Ansible Playbooks: Practical Examples for Real-World Infrastructure

Introduction

Ansible playbooks are the cornerstone of infrastructure automation, transforming complex manual processes into repeatable, version-controlled configurations. While ad-hoc commands are useful for quick tasks, playbooks provide the power to orchestrate multi-step operations, manage complex infrastructure, and implement sophisticated deployment strategies.

This comprehensive guide presents practical, production-ready Ansible playbook examples that you can adapt to your infrastructure needs. Each example is designed to solve real-world problems faced by system administrators and DevOps engineers, from deploying full application stacks to implementing disaster recovery procedures.

Whether you're managing a handful of servers or orchestrating thousands of cloud instances, these playbooks will help you automate repetitive tasks, reduce human error, and adopt infrastructure-as-code best practices. Each example pairs complete working code with explanations you can apply to your own projects immediately.

Understanding Playbook Structure

Before diving into examples, let's understand the anatomy of a well-structured playbook:

---
# Top-level play
- name: Descriptive play name
  hosts: target_hosts
  become: yes  # Privilege escalation
  gather_facts: yes  # Gather system information

  vars:
    # Play-specific variables
    app_version: "1.0.0"

  pre_tasks:
    # Tasks that run before roles
    - name: Update cache
      apt:
        update_cache: yes

  roles:
    # Reusable role includes
    - common
    - webserver

  tasks:
    # Main tasks
    - name: Task description
      module_name:
        parameter: value
      notify: handler_name

  post_tasks:
    # Tasks that run after everything
    - name: Final verification
      uri:
        url: http://localhost

  handlers:
    # Event-driven tasks
    - name: handler_name
      systemd:
        name: nginx
        state: restarted

Prerequisites

To use these playbooks effectively, ensure you have:

  • Ansible 2.9 or higher installed on your control node
  • SSH access to managed nodes with key-based authentication
  • Sudo/root privileges on managed nodes
  • Basic understanding of YAML syntax
  • Properly configured inventory file
  • Python 3.6+ on all managed nodes
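
For reference, a production inventory for the host groups used throughout this guide might look like the following (hostnames and connection settings are illustrative):

```ini
# inventory/production (illustrative hostnames)
[loadbalancers]
lb1.example.com

[webservers]
web1.example.com
web2.example.com

[appservers]
app1.example.com
app2.example.com

[databases]
db1.example.com

[monitoring]
mon1.example.com

[all:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3
```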

Project Structure

Organize your Ansible project like this:

ansible-project/
├── ansible.cfg
├── inventory/
│   ├── production
│   ├── staging
│   └── development
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/
│   └── special-host.yml
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   └── postgresql/
└── files/
    └── templates/
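
A minimal ansible.cfg that ties this layout together might look like the following; the values shown are illustrative defaults, not requirements:

```ini
# ansible.cfg (illustrative defaults)
[defaults]
inventory = inventory/production
roles_path = roles
forks = 10
retry_files_enabled = False

[privilege_escalation]
become = True
become_method = sudo
```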

Example 1: Complete LEMP Stack Deployment

This playbook deploys a full LEMP stack (Linux, Nginx, MariaDB in place of MySQL, and PHP) with security hardening:

---
# playbooks/lemp-stack.yml
- name: Deploy LEMP Stack
  hosts: webservers
  become: yes

  vars:
    php_version: "8.2"
    mysql_root_password: "{{ vault_mysql_root_password }}"
    app_user: "www-data"
    app_domain: "example.com"

  tasks:
    # System preparation
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install system dependencies
      apt:
        name:
          - software-properties-common
          - apt-transport-https
          - ca-certificates
          - curl
          - gnupg
        state: present

    # Nginx installation and configuration
    - name: Install Nginx
      apt:
        name: nginx
        state: present

    - name: Create web root directory
      file:
        path: "/var/www/{{ app_domain }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0755'

    - name: Configure Nginx virtual host
      template:
        src: templates/nginx-vhost.j2
        dest: "/etc/nginx/sites-available/{{ app_domain }}"
        mode: '0644'
      notify: reload nginx

    - name: Enable Nginx site
      file:
        src: "/etc/nginx/sites-available/{{ app_domain }}"
        dest: "/etc/nginx/sites-enabled/{{ app_domain }}"
        state: link
      notify: reload nginx

    - name: Remove default Nginx site
      file:
        path: /etc/nginx/sites-enabled/default
        state: absent
      notify: reload nginx

    # MariaDB installation
    - name: Install MariaDB server
      apt:
        name:
          - mariadb-server
          - mariadb-client
          - python3-pymysql
        state: present

    - name: Start and enable MariaDB
      systemd:
        name: mariadb
        state: started
        enabled: yes

    - name: Set MariaDB root password
      mysql_user:
        name: root
        password: "{{ mysql_root_password }}"
        login_unix_socket: /var/run/mysqld/mysqld.sock
        state: present

    - name: Create MariaDB configuration for root
      template:
        src: templates/my.cnf.j2
        dest: /root/.my.cnf
        mode: '0600'

    - name: Remove anonymous MariaDB users
      mysql_user:
        name: ''
        host_all: yes
        state: absent

    - name: Remove MariaDB test database
      mysql_db:
        name: test
        state: absent

    # PHP installation (ppa:ondrej/php is Ubuntu-only; Debian hosts need a different repository)
    - name: Add PHP repository
      apt_repository:
        repo: "ppa:ondrej/php"
        state: present

    - name: Install PHP and extensions
      apt:
        name:
          - "php{{ php_version }}-fpm"
          - "php{{ php_version }}-mysql"
          - "php{{ php_version }}-curl"
          - "php{{ php_version }}-gd"
          - "php{{ php_version }}-mbstring"
          - "php{{ php_version }}-xml"
          - "php{{ php_version }}-zip"
          - "php{{ php_version }}-opcache"
        state: present

    - name: Configure PHP-FPM pool
      template:
        src: templates/php-fpm-pool.j2
        dest: "/etc/php/{{ php_version }}/fpm/pool.d/www.conf"
        mode: '0644'
      notify: restart php-fpm

    - name: Configure PHP settings
      lineinfile:
        path: "/etc/php/{{ php_version }}/fpm/php.ini"
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^;?upload_max_filesize', line: 'upload_max_filesize = 64M' }
        - { regexp: '^;?post_max_size', line: 'post_max_size = 64M' }
        - { regexp: '^;?memory_limit', line: 'memory_limit = 256M' }
        - { regexp: '^;?max_execution_time', line: 'max_execution_time = 300' }
      notify: restart php-fpm

    # Security hardening
    - name: Install and configure UFW
      apt:
        name: ufw
        state: present

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: 'incoming', policy: 'deny' }
        - { direction: 'outgoing', policy: 'allow' }

    - name: Allow SSH
      ufw:
        rule: allow
        port: '22'
        proto: tcp

    - name: Allow HTTP
      ufw:
        rule: allow
        port: '80'
        proto: tcp

    - name: Allow HTTPS
      ufw:
        rule: allow
        port: '443'
        proto: tcp

    - name: Enable UFW
      ufw:
        state: enabled

    # SSL certificate with Let's Encrypt
    - name: Install Certbot
      apt:
        name:
          - certbot
          - python3-certbot-nginx
        state: present

    - name: Obtain SSL certificate
      command: >
        certbot --nginx --non-interactive --agree-tos
        --email admin@{{ app_domain }}
        -d {{ app_domain }} -d www.{{ app_domain }}
      args:
        creates: "/etc/letsencrypt/live/{{ app_domain }}/fullchain.pem"

    - name: Setup SSL renewal cron job
      cron:
        name: "Renew Let's Encrypt certificates"
        minute: "0"
        hour: "3"
        job: "certbot renew --quiet --post-hook 'systemctl reload nginx'"

    # Deploy sample application
    - name: Deploy index.php
      copy:
        content: |
          <?php
          phpinfo();
          ?>
        dest: "/var/www/{{ app_domain }}/index.php"
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0644'

  handlers:
    - name: reload nginx
      systemd:
        name: nginx
        state: reloaded

    - name: restart php-fpm
      systemd:
        name: "php{{ php_version }}-fpm"
        state: restarted
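
The playbook above references templates/my.cnf.j2 without showing it. A minimal sketch that lets the later mysql_user and mysql_db tasks authenticate as root via /root/.my.cnf might be:

```jinja
# templates/my.cnf.j2 (minimal sketch)
[client]
user=root
password={{ mysql_root_password }}
```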

Required Template: nginx-vhost.j2

# templates/nginx-vhost.j2
server {
    listen 80;
    listen [::]:80;
    server_name {{ app_domain }} www.{{ app_domain }};
    root /var/www/{{ app_domain }};

    index index.php index.html index.htm;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php{{ php_version }}-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss;
}
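
The mysql_root_password variable above resolves to vault_mysql_root_password, which is expected to live in an Ansible Vault-encrypted vars file. One common layout (the file path and placeholder value are illustrative) is:

```yaml
# group_vars/webservers/vault.yml
# Encrypt with: ansible-vault encrypt group_vars/webservers/vault.yml
vault_mysql_root_password: "change-me"
```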

Example 2: Multi-Tier Application Deployment

Deploy a complete application with load balancer, web servers, and database cluster:

---
# playbooks/multi-tier-app.yml
- name: Configure load balancers
  hosts: loadbalancers
  become: yes

  tasks:
    - name: Install HAProxy
      apt:
        name: haproxy
        state: present

    - name: Configure HAProxy
      template:
        src: templates/haproxy.cfg.j2
        dest: /etc/haproxy/haproxy.cfg
        mode: '0644'
        validate: 'haproxy -f %s -c'
      notify: restart haproxy

    - name: Enable HAProxy
      systemd:
        name: haproxy
        enabled: yes
        state: started

  handlers:
    - name: restart haproxy
      systemd:
        name: haproxy
        state: restarted

- name: Configure web application servers
  hosts: appservers
  become: yes
  serial: 1  # Rolling deployment

  vars:
    app_name: "myapp"
    app_version: "{{ deploy_version | default('HEAD') }}"  # 'HEAD' is a valid git ref for the default branch
    app_port: 3000

  tasks:
    - name: Install Node.js
      apt:
        name:
          - nodejs
          - npm
        state: present

    - name: Create application user
      user:
        name: "{{ app_name }}"
        system: yes
        shell: /bin/bash
        home: "/opt/{{ app_name }}"

    - name: Create app directory
      file:
        path: "/opt/{{ app_name }}"
        state: directory
        owner: "{{ app_name }}"
        group: "{{ app_name }}"
        mode: '0755'

    - name: Deploy application code
      git:
        repo: "https://github.com/yourorg/{{ app_name }}.git"
        dest: "/opt/{{ app_name }}/app"
        version: "{{ app_version }}"
        force: yes
      become_user: "{{ app_name }}"
      notify: restart app

    - name: Install npm dependencies
      npm:
        path: "/opt/{{ app_name }}/app"
        production: yes
      become_user: "{{ app_name }}"
      notify: restart app

    - name: Create environment file
      template:
        src: templates/app-env.j2
        dest: "/opt/{{ app_name }}/.env"
        owner: "{{ app_name }}"
        group: "{{ app_name }}"
        mode: '0600'
      notify: restart app

    - name: Create systemd service
      template:
        src: templates/app-service.j2
        dest: "/etc/systemd/system/{{ app_name }}.service"
        mode: '0644'
      notify:
        - reload systemd
        - restart app

    - name: Enable and start application
      systemd:
        name: "{{ app_name }}"
        enabled: yes
        state: started

    - name: Wait for application to be ready
      uri:
        url: "http://localhost:{{ app_port }}/health"
        status_code: 200
      register: result
      until: result.status == 200
      retries: 10
      delay: 3

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart app
      systemd:
        name: "{{ app_name }}"
        state: restarted

- name: Configure database servers
  hosts: databases
  become: yes

  vars:
    postgres_version: "15"
    db_name: "myapp_production"
    db_user: "myapp"
    db_password: "{{ vault_db_password }}"

  tasks:
    - name: Install PostgreSQL
      apt:
        name:
          - "postgresql-{{ postgres_version }}"
          - "postgresql-contrib-{{ postgres_version }}"
          - python3-psycopg2
        state: present

    - name: Ensure PostgreSQL is running
      systemd:
        name: postgresql
        state: started
        enabled: yes

    - name: Create application database
      postgresql_db:
        name: "{{ db_name }}"
        state: present
      become_user: postgres

    - name: Create application user
      postgresql_user:
        name: "{{ db_user }}"
        password: "{{ db_password }}"
        db: "{{ db_name }}"
        priv: ALL
        state: present
      become_user: postgres

    - name: Configure PostgreSQL for network access
      lineinfile:
        path: "/etc/postgresql/{{ postgres_version }}/main/postgresql.conf"
        regexp: "^#?listen_addresses"
        line: "listen_addresses = '*'"
      notify: restart postgresql

    - name: Allow application servers to connect
      postgresql_pg_hba:
        dest: "/etc/postgresql/{{ postgres_version }}/main/pg_hba.conf"
        contype: host
        users: "{{ db_user }}"
        source: "{{ hostvars[item]['ansible_default_ipv4']['address'] }}/32"
        databases: "{{ db_name }}"
        method: md5
      loop: "{{ groups['appservers'] }}"
      notify: restart postgresql

  handlers:
    - name: restart postgresql
      systemd:
        name: postgresql
        state: restarted

- name: Run database migrations
  hosts: appservers[0]
  become: yes
  become_user: myapp

  tasks:
    - name: Run migrations
      command: npm run migrate
      args:
        chdir: /opt/myapp/app
      run_once: yes
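
Example 2 references templates/app-service.j2 without showing it. A plausible systemd unit for the Node.js app, assuming the application starts with `npm start`, is:

```ini
# templates/app-service.j2 (sketch; assumes the app starts with `npm start`)
[Unit]
Description={{ app_name }}
After=network.target

[Service]
User={{ app_name }}
WorkingDirectory=/opt/{{ app_name }}/app
EnvironmentFile=/opt/{{ app_name }}/.env
Environment=PORT={{ app_port }}
ExecStart=/usr/bin/npm start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```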

Example 3: Disaster Recovery and Backup Automation

Comprehensive backup solution with rotation and off-site storage:

---
# playbooks/backup-automation.yml
- name: Configure automated backups
  hosts: all
  become: yes

  vars:
    backup_dir: "/var/backups"
    backup_retention_days: 7
    backup_s3_bucket: "company-backups"
    backup_schedule: "0 2 * * *"  # 2 AM daily

  tasks:
    - name: Install backup tools
      apt:
        name:
          - rsync
          - borgbackup
          - awscli
          - pigz
        state: present

    - name: Create backup directory
      file:
        path: "{{ backup_dir }}"
        state: directory
        mode: '0700'
        owner: root
        group: root

    - name: Create backup script
      copy:
        content: |
          #!/bin/bash
          set -euo pipefail

          # Configuration
          BACKUP_DIR="{{ backup_dir }}"
          RETENTION_DAYS={{ backup_retention_days }}
          S3_BUCKET="{{ backup_s3_bucket }}"
          HOSTNAME=$(hostname -f)
          TIMESTAMP=$(date +%Y%m%d_%H%M%S)

          # Logging
          LOG_FILE="${BACKUP_DIR}/backup.log"
          exec 1> >(tee -a "${LOG_FILE}")
          exec 2>&1

          echo "=== Backup started at $(date) ==="

          # Backup system files
          echo "Backing up system files..."
          tar -czf "${BACKUP_DIR}/system_${TIMESTAMP}.tar.gz" \
            /etc \
            /home \
            /root \
            --exclude='/home/*/.cache' \
            --exclude='/home/*/tmp'

          {% if 'databases' in group_names %}
          # Database backup
          echo "Backing up databases..."
          if systemctl is-active --quiet postgresql; then
            sudo -u postgres pg_dumpall | pigz > "${BACKUP_DIR}/postgres_${TIMESTAMP}.sql.gz"
          fi

          if systemctl is-active --quiet mariadb; then
            mysqldump --all-databases --single-transaction | pigz > "${BACKUP_DIR}/mysql_${TIMESTAMP}.sql.gz"
          fi
          {% endif %}

          {% if 'webservers' in group_names %}
          # Web content backup
          echo "Backing up web content..."
          tar -czf "${BACKUP_DIR}/web_${TIMESTAMP}.tar.gz" /var/www
          {% endif %}

          # Upload to S3
          echo "Uploading to S3..."
          aws s3 sync "${BACKUP_DIR}" "s3://${S3_BUCKET}/${HOSTNAME}/" \
            --exclude "*.log" \
            --storage-class STANDARD_IA

          # Cleanup old local backups
          echo "Cleaning up old backups..."
          find "${BACKUP_DIR}" -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete
          find "${BACKUP_DIR}" -name "*.sql.gz" -mtime +${RETENTION_DAYS} -delete

          echo "=== Backup completed at $(date) ==="
        dest: /usr/local/bin/automated-backup.sh
        mode: '0700'
        owner: root
        group: root

    - name: Ensure AWS config directory exists
      file:
        path: /root/.aws
        state: directory
        mode: '0700'

    - name: Configure AWS credentials
      template:
        src: templates/aws-credentials.j2
        dest: /root/.aws/credentials
        mode: '0600'

    - name: Schedule backup cron job
      cron:
        name: "Automated system backup"
        minute: "{{ backup_schedule.split()[0] }}"
        hour: "{{ backup_schedule.split()[1] }}"
        job: "/usr/local/bin/automated-backup.sh"
        state: present

    - name: Create backup monitoring script
      copy:
        content: |
          #!/bin/bash
          BACKUP_DIR="{{ backup_dir }}"
          MAX_AGE_HOURS=26

          LATEST_BACKUP=$(find "${BACKUP_DIR}" -name "*.tar.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")

          if [ -z "$LATEST_BACKUP" ]; then
            echo "CRITICAL: No backups found"
            exit 2
          fi

          AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))

          if [ $AGE_HOURS -gt $MAX_AGE_HOURS ]; then
            echo "WARNING: Latest backup is ${AGE_HOURS} hours old"
            exit 1
          fi

          echo "OK: Latest backup is ${AGE_HOURS} hours old"
          exit 0
        dest: /usr/local/bin/check-backup.sh
        mode: '0755'

    - name: Test backup script
      command: /usr/local/bin/automated-backup.sh
      async: 3600
      poll: 0
      register: backup_test

    - name: Verify backup completion
      async_status:
        jid: "{{ backup_test.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 60
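
The cron task above pulls the minute and hour out of backup_schedule by whitespace position. The same extraction in plain Python makes clear which field each index maps to:

```python
# Mirror of the Jinja expressions backup_schedule.split()[0] and [1]
backup_schedule = "0 2 * * *"  # minute hour day-of-month month day-of-week

fields = backup_schedule.split()
minute, hour = fields[0], fields[1]
print(minute, hour)  # → 0 2
```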

Example 4: Zero-Downtime Rolling Deployments

Implement blue-green deployments with health checks:

---
# playbooks/rolling-deployment.yml
- name: Blue-Green deployment with zero downtime
  hosts: webservers
  become: yes
  serial: 1
  max_fail_percentage: 0

  vars:
    app_name: "webapp"
    app_version: "{{ deploy_version }}"
    app_port: 8080
    health_check_url: "http://localhost:{{ app_port }}/health"
    health_check_retries: 30
    health_check_delay: 2

  pre_tasks:
    - name: Remove from load balancer
      haproxy:
        state: disabled
        host: "{{ inventory_hostname }}"
        socket: /run/haproxy/admin.sock
        backend: app_backend
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"

    - name: Wait for connections to drain
      wait_for:
        timeout: 10

  tasks:
    - name: Stop current application
      systemd:
        name: "{{ app_name }}"
        state: stopped

    - name: Backup current version
      command: >
        mv /opt/{{ app_name }}/current
        /opt/{{ app_name }}/rollback_{{ ansible_date_time.epoch }}
      args:
        removes: /opt/{{ app_name }}/current
      ignore_errors: yes

    - name: Deploy new version
      git:
        repo: "https://github.com/yourorg/{{ app_name }}.git"
        dest: "/opt/{{ app_name }}/releases/{{ app_version }}"
        version: "{{ app_version }}"
      become_user: "{{ app_name }}"

    - name: Install dependencies
      npm:
        path: "/opt/{{ app_name }}/releases/{{ app_version }}"
        production: yes
      become_user: "{{ app_name }}"

    - name: Create symlink to current version
      file:
        src: "/opt/{{ app_name }}/releases/{{ app_version }}"
        dest: "/opt/{{ app_name }}/current"
        state: link

    - name: Start application
      systemd:
        name: "{{ app_name }}"
        state: started

    - name: Wait for application health check
      uri:
        url: "{{ health_check_url }}"
        status_code: 200
        timeout: 5
      register: health_check
      until: health_check.status == 200
      retries: "{{ health_check_retries }}"
      delay: "{{ health_check_delay }}"
      failed_when: false

    - name: Rollback if health check fails
      block:
        - name: Stop failed deployment
          systemd:
            name: "{{ app_name }}"
            state: stopped

        - name: Restore previous version
          shell: |
            rm -f /opt/{{ app_name }}/current
            ROLLBACK=$(ls -t /opt/{{ app_name }}/rollback_* | head -1)
            mv "$ROLLBACK" /opt/{{ app_name }}/current
          args:
            executable: /bin/bash

        - name: Start rolled back version
          systemd:
            name: "{{ app_name }}"
            state: started

        - name: Fail deployment
          fail:
            msg: "Deployment failed health check, rolled back to previous version"
      when: health_check.status != 200

  post_tasks:
    - name: Add back to load balancer
      haproxy:
        state: enabled
        host: "{{ inventory_hostname }}"
        socket: /run/haproxy/admin.sock
        backend: app_backend
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"

    - name: Verify in load balancer rotation
      uri:
        url: "http://{{ hostvars[item]['ansible_default_ipv4']['address'] }}/haproxy?stats"
        return_content: yes
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"
      register: lb_status
      failed_when: inventory_hostname not in lb_status.content

    - name: Cleanup old releases
      shell: |
        cd /opt/{{ app_name }}/releases
        ls -t | tail -n +4 | xargs -r rm -rf
        cd /opt/{{ app_name }}
        ls -t rollback_* 2>/dev/null | tail -n +3 | xargs -r rm -rf
      args:
        executable: /bin/bash
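
The haproxy tasks above toggle servers in a backend named app_backend over the admin socket. The matching pieces of templates/haproxy.cfg.j2 (referenced in Example 2 but never shown) might look like this; note that the server names must match inventory_hostname for the haproxy module to find them:

```jinja
# templates/haproxy.cfg.j2 (sketch; server names must match inventory_hostname)
global
    stats socket /run/haproxy/admin.sock mode 660 level admin

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_in
    bind *:80
    default_backend app_backend

backend app_backend
    balance roundrobin
{% for host in groups['appservers'] %}
    server {{ host }} {{ hostvars[host]['ansible_default_ipv4']['address'] }}:{{ app_port | default(8080) }} check
{% endfor %}
```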

Example 5: Infrastructure Monitoring Setup

Deploy complete monitoring stack with Prometheus and Grafana:

---
# playbooks/monitoring-stack.yml
- name: Deploy Prometheus monitoring
  hosts: monitoring
  become: yes

  vars:
    prometheus_version: "2.45.0"

  tasks:
    - name: Create prometheus user
      user:
        name: prometheus
        system: yes
        shell: /bin/false
        create_home: no

    - name: Create prometheus directories
      file:
        path: "{{ item }}"
        state: directory
        owner: prometheus
        group: prometheus
        mode: '0755'
      loop:
        - /etc/prometheus
        - /var/lib/prometheus

    - name: Download Prometheus
      get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
        dest: /tmp/prometheus.tar.gz

    - name: Extract Prometheus
      unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /tmp
        remote_src: yes

    - name: Copy Prometheus binaries
      copy:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-amd64/{{ item }}"
        dest: "/usr/local/bin/{{ item }}"
        mode: '0755'
        remote_src: yes
      loop:
        - prometheus
        - promtool

    - name: Configure Prometheus
      template:
        src: templates/prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
        owner: prometheus
        group: prometheus
        mode: '0644'
      notify: reload prometheus

    - name: Create Prometheus systemd service
      copy:
        content: |
          [Unit]
          Description=Prometheus
          Wants=network-online.target
          After=network-online.target

          [Service]
          User=prometheus
          Group=prometheus
          Type=simple
          ExecStart=/usr/local/bin/prometheus \
            --config.file=/etc/prometheus/prometheus.yml \
            --storage.tsdb.path=/var/lib/prometheus/ \
            --web.console.templates=/etc/prometheus/consoles \
            --web.console.libraries=/etc/prometheus/console_libraries \
            --web.listen-address=0.0.0.0:9090

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/prometheus.service
        mode: '0644'
      notify:
        - reload systemd
        - restart prometheus

    - name: Start Prometheus
      systemd:
        name: prometheus
        state: started
        enabled: yes

    # Grafana installation (the signing key must be in place before apt_repository refreshes the cache)
    - name: Add Grafana GPG key
      apt_key:
        url: https://packages.grafana.com/gpg.key
        state: present

    - name: Add Grafana repository
      apt_repository:
        repo: "deb https://packages.grafana.com/oss/deb stable main"
        state: present
        filename: grafana

    - name: Install Grafana
      apt:
        name: grafana
        state: present
        update_cache: yes

    - name: Configure Grafana
      template:
        src: templates/grafana.ini.j2
        dest: /etc/grafana/grafana.ini
        mode: '0640'
        owner: grafana
        group: grafana
      notify: restart grafana

    - name: Start Grafana
      systemd:
        name: grafana-server
        state: started
        enabled: yes

    - name: Configure firewall for Prometheus
      ufw:
        rule: allow
        port: '9090'
        proto: tcp

    - name: Configure firewall for Grafana
      ufw:
        rule: allow
        port: '3000'
        proto: tcp

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart prometheus
      systemd:
        name: prometheus
        state: restarted

    - name: reload prometheus
      systemd:
        name: prometheus
        state: reloaded

    - name: restart grafana
      systemd:
        name: grafana-server
        state: restarted

- name: Deploy Node Exporters
  hosts: all
  become: yes

  vars:
    node_exporter_version: "1.7.0"

  tasks:
    - name: Create node_exporter user
      user:
        name: node_exporter
        system: yes
        shell: /bin/false
        create_home: no

    - name: Download Node Exporter
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz

    - name: Extract Node Exporter
      unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /tmp
        remote_src: yes

    - name: Copy Node Exporter binary
      copy:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
        dest: /usr/local/bin/node_exporter
        mode: '0755'
        remote_src: yes

    - name: Create Node Exporter systemd service
      copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target

          [Service]
          User=node_exporter
          Group=node_exporter
          Type=simple
          ExecStart=/usr/local/bin/node_exporter \
            --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/) \
            --collector.netclass.ignored-devices=^(veth.*|docker.*|br-.*)$$

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service
        mode: '0644'

    - name: Start Node Exporter
      systemd:
        name: node_exporter
        state: started
        enabled: yes
        daemon_reload: yes
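
templates/prometheus.yml.j2 is referenced above but not shown. A minimal scrape configuration that picks up Prometheus itself and the node exporters deployed in the second play might be:

```yaml
# templates/prometheus.yml.j2 (minimal sketch)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - "{{ hostvars[host]['ansible_default_ipv4']['address'] }}:9100"
{% endfor %}
```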

Example 6: Security Compliance and Hardening

Implement CIS benchmarks and security best practices:

---
# playbooks/security-hardening.yml
- name: Apply security hardening
  hosts: all
  become: yes

  vars:
    allowed_ssh_users: ["admin", "deploy"]
    ssh_port: 22
    max_auth_tries: 3
    password_max_days: 90
    password_min_days: 1
    password_warn_age: 7

  tasks:
    # System updates
    - name: Update all packages
      apt:
        upgrade: dist
        update_cache: yes
        autoremove: yes
        autoclean: yes

    - name: Install security tools
      apt:
        name:
          - aide
          - auditd
          - fail2ban
          - rkhunter
          - lynis
        state: present

    # SSH hardening
    - name: Configure SSH daemon
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
        state: present
        validate: '/usr/sbin/sshd -t -f %s'
      loop:
        - { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
        - { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
        - { regexp: '^#?PubkeyAuthentication', line: 'PubkeyAuthentication yes' }
        - { regexp: '^#?PermitEmptyPasswords', line: 'PermitEmptyPasswords no' }
        - { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
        - { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries {{ max_auth_tries }}' }
        - { regexp: '^#?ClientAliveInterval', line: 'ClientAliveInterval 300' }
        - { regexp: '^#?ClientAliveCountMax', line: 'ClientAliveCountMax 2' }
        - { regexp: '^#?AllowUsers', line: 'AllowUsers {{ allowed_ssh_users | join(" ") }}' }
      notify: restart sshd

    # Password policies
    - name: Configure password aging
      lineinfile:
        path: /etc/login.defs
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^PASS_MAX_DAYS', line: 'PASS_MAX_DAYS {{ password_max_days }}' }
        - { regexp: '^PASS_MIN_DAYS', line: 'PASS_MIN_DAYS {{ password_min_days }}' }
        - { regexp: '^PASS_WARN_AGE', line: 'PASS_WARN_AGE {{ password_warn_age }}' }

    # Kernel hardening
    - name: Configure sysctl security parameters
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_file: /etc/sysctl.d/99-security.conf
      loop:
        # Network security
        - { name: 'net.ipv4.conf.all.rp_filter', value: '1' }
        - { name: 'net.ipv4.conf.default.rp_filter', value: '1' }
        - { name: 'net.ipv4.icmp_echo_ignore_broadcasts', value: '1' }
        - { name: 'net.ipv4.conf.all.accept_source_route', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_source_route', value: '0' }
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.all.secure_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.secure_redirects', value: '0' }
        - { name: 'net.ipv4.conf.all.send_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.send_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'net.ipv4.tcp_timestamps', value: '0' }
        # Kernel security
        - { name: 'kernel.dmesg_restrict', value: '1' }
        - { name: 'kernel.kptr_restrict', value: '2' }
        - { name: 'kernel.yama.ptrace_scope', value: '1' }
        - { name: 'fs.suid_dumpable', value: '0' }

    # Fail2Ban configuration
    - name: Configure Fail2Ban for SSH
      copy:
        content: |
          [sshd]
          enabled = true
          port = {{ ssh_port }}
          filter = sshd
          logpath = /var/log/auth.log
          maxretry = 3
          bantime = 3600
          findtime = 600
        dest: /etc/fail2ban/jail.d/sshd.conf
        mode: '0644'
      notify: restart fail2ban

    # Audit daemon
    - name: Configure auditd rules
      copy:
        content: |
          # Delete all existing rules
          -D

          # Buffer size
          -b 8192

          # Failure mode
          -f 1

          # Monitor user/group changes
          -w /etc/group -p wa -k identity
          -w /etc/passwd -p wa -k identity
          -w /etc/gshadow -p wa -k identity
          -w /etc/shadow -p wa -k identity

          # Monitor system calls
          -a always,exit -F arch=b64 -S adjtimex -S settimeofday -k time-change
          -a always,exit -F arch=b32 -S adjtimex -S settimeofday -S stime -k time-change

          # Monitor network environment
          -a always,exit -F arch=b64 -S sethostname -S setdomainname -k system-locale
          -a always,exit -F arch=b32 -S sethostname -S setdomainname -k system-locale

          # Monitor login/logout events
          -w /var/log/faillog -p wa -k logins
          -w /var/log/lastlog -p wa -k logins

          # Monitor sudo usage
          -w /etc/sudoers -p wa -k sudo_changes
          -w /etc/sudoers.d/ -p wa -k sudo_changes
        dest: /etc/audit/rules.d/hardening.rules
        mode: '0640'
      notify: restart auditd

    # File integrity monitoring
    - name: Initialize AIDE database
      command: aideinit
      args:
        creates: /var/lib/aide/aide.db.new
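
    # On Debian-based systems aideinit writes the fresh database to
    # aide.db.new, while 'aide --check' reads aide.db, so promote the new
    # database once. Paths below assume Debian defaults; adjust as needed.
    - name: Activate new AIDE database
      copy:
        src: /var/lib/aide/aide.db.new
        dest: /var/lib/aide/aide.db
        remote_src: yes
        mode: '0600'
        force: no   # never overwrite an established baseline automatically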

    - name: Setup AIDE cron job
      cron:
        name: "AIDE file integrity check"
        minute: "0"
        hour: "5"
        job: "/usr/bin/aide --check | mail -s 'AIDE Report' root@localhost"

    # Disable unnecessary services
    - name: Disable unnecessary services
      systemd:
        name: "{{ item }}"
        state: stopped
        enabled: no
      loop:
        - bluetooth
        - cups
        - avahi-daemon
      ignore_errors: yes  # some of these services may not exist on every host

    # Remove unnecessary packages
    - name: Remove unnecessary packages
      apt:
        name:
          - telnet
          - rsh-client
          - rsh-redone-client
        state: absent
        purge: yes

  handlers:
    - name: restart sshd
      systemd:
        name: sshd
        state: restarted

    - name: restart fail2ban
      systemd:
        name: fail2ban
        state: restarted

    - name: restart auditd
      systemd:
        name: auditd
        state: restarted

Best Practices for Production Playbooks

1. Use Ansible Vault for Secrets

# Create encrypted variable file
ansible-vault create group_vars/production/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/production/vault.yml

# Content example:
vault_mysql_root_password: "super_secret_password"
vault_api_keys:
  aws: "AKIAIOSFODNN7EXAMPLE"
  sendgrid: "SG.example123"

Reference in playbooks:

vars:
  mysql_root_password: "{{ vault_mysql_root_password }}"

2. Implement Proper Error Handling

- name: Task with error handling
  command: /usr/bin/some-command
  register: result
  failed_when: false            # never mark the task failed; inspect 'result' instead
  changed_when: result.rc == 0  # report a change only when the command succeeded

- name: Handle errors gracefully
  block:
    - name: Risky operation
      command: /usr/bin/risky-command
  rescue:
    - name: Handle failure
      debug:
        msg: "Command failed, rolling back"
    - name: Rollback action
      command: /usr/bin/rollback-command
  always:
    - name: Cleanup
      file:
        path: /tmp/tempfile
        state: absent
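
Inside a rescue section, Ansible also exposes the failing task and its result through the ansible_failed_task and ansible_failed_result variables, which makes failure reports far more actionable. A minimal sketch of a rescue task using them:

```yaml
  rescue:
    - name: Report what failed
      debug:
        msg: >-
          Task '{{ ansible_failed_task.name }}' failed:
          {{ ansible_failed_result.msg | default('no error message') }}
```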

3. Use Tags Strategically

- name: Full application setup
  hosts: appservers

  tasks:
    - name: Install dependencies
      apt:
        name: "{{ packages }}"
      tags: [install, packages]

    - name: Deploy code
      git:
        repo: "{{ repo_url }}"
        dest: /opt/app
      tags: [deploy, code]

    - name: Configure application
      template:
        src: config.j2
        dest: /opt/app/config.yml
      tags: [configure, config]

Run specific tags:

ansible-playbook site.yml --tags "deploy"
ansible-playbook site.yml --tags "install,configure"
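
Two built-in tags have special semantics: tasks tagged always run on every invocation regardless of --tags, and tasks tagged never run only when their tag is requested explicitly. A sketch (the vars file and cache path are illustrative):

```yaml
- name: Load release variables
  include_vars: release.yml
  tags: [always]

- name: Wipe application cache
  file:
    path: /opt/app/cache
    state: absent
  tags: [never, cleanup]   # runs only with --tags cleanup
```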

4. Implement Testing and Validation

- name: Validate deployment
  hosts: webservers

  tasks:
    - name: Check if service is running
      systemd:
        name: nginx
        state: started
      check_mode: yes
      register: service_status
      failed_when: false

    - name: Verify HTTP response
      uri:
        url: http://localhost
        status_code: 200
        timeout: 5
      register: http_check
      until: http_check.status == 200
      retries: 5
      delay: 2

    - name: Validate configuration syntax
      command: nginx -t
      changed_when: false

    - name: Assert all checks passed
      assert:
        that:
          - not service_status.changed  # check mode reported no change, so the service was already running
          - http_check.status == 200
        fail_msg: "Validation failed"
        success_msg: "All validations passed"
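
Validation can also happen at write time: the template and copy modules accept a validate parameter that runs a syntax check against the rendered file (substituted for %s) before it replaces the target, aborting the change if the check fails. For example:

```yaml
- name: Deploy nginx config only if it passes a syntax check
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    validate: 'nginx -t -c %s'
```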

5. Document with Comments and Metadata

---
# ============================================================================
# Playbook: production-deployment.yml
# Description: Deploy application to production environment
# Author: DevOps Team <[email protected]>
# Version: 2.1.0
# Last Updated: 2024-01-15
#
# Dependencies:
#   - Ansible 2.9+
#   - Python 3.6+
#   - AWS CLI configured
#
# Variables Required:
#   - deploy_version: Application version to deploy
#   - environment: Target environment (production/staging)
#
# Usage:
#   ansible-playbook production-deployment.yml -e deploy_version=v1.2.3
# ============================================================================

- name: Deploy application ({{ deploy_version }})
  hosts: production

  # Task execution settings
  serial: 2              # Deploy 2 servers at a time
  max_fail_percentage: 10 # Abort if more than 10% of hosts in a batch fail

  tasks:
    # Each task should have a clear, descriptive name
    - name: Validate deployment prerequisites
      assert:
        that:
          - deploy_version is defined
          - deploy_version is match('^v[0-9]+\.[0-9]+\.[0-9]+$')
        fail_msg: "deploy_version must be in format v1.2.3"

Troubleshooting Playbook Executions

Debug Failed Tasks

- name: Debug playbook execution
  hosts: all

  tasks:
    - name: Run command with debugging
      command: /usr/bin/my-command
      register: command_result
      ignore_errors: yes

    - name: Display command output
      debug:
        var: command_result
        verbosity: 2

    - name: Show specific values
      debug:
        msg: "Return code: {{ command_result.rc }}, Output: {{ command_result.stdout }}"

Run with verbosity:

ansible-playbook debug.yml -v    # verbose
ansible-playbook debug.yml -vv   # more verbose
ansible-playbook debug.yml -vvv  # debug
ansible-playbook debug.yml -vvvv # connection debug

Dry Run and Check Mode

# Test without making changes
ansible-playbook site.yml --check

# Show what would change
ansible-playbook site.yml --check --diff

# Step through playbook interactively
ansible-playbook site.yml --step
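
Check-mode behaviour can also be controlled per task: check_mode: no forces a task to execute even under --check (useful for read-only commands whose output later tasks need), and the ansible_check_mode boolean lets you skip steps explicitly. A sketch with an illustrative status command:

```yaml
- name: Gather current app status (safe to run during --check)
  command: /opt/app/bin/status   # hypothetical read-only command
  check_mode: no
  changed_when: false
  register: app_status

- name: Restart app (explicitly skipped under --check)
  systemd:
    name: myapp
    state: restarted
  when: not ansible_check_mode
```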

Conclusion

These practical Ansible playbook examples demonstrate real-world automation scenarios that you can adapt to your infrastructure needs. From simple LEMP stack deployments to complex multi-tier applications with zero-downtime deployments, Ansible provides the flexibility and power to automate virtually any infrastructure task.

Key takeaways:

  • Structure playbooks for reusability and maintainability
  • Implement proper error handling and rollback mechanisms
  • Use variables and templates for environment-specific configurations
  • Apply security best practices from the start
  • Test thoroughly before deploying to production
  • Document your playbooks comprehensively
  • Use version control for all Ansible code

As you build your Ansible automation library, focus on creating idempotent, well-tested playbooks that can be safely run multiple times. Start with simple playbooks and gradually increase complexity as you gain experience. Remember that the goal is not just to automate, but to create reliable, maintainable infrastructure-as-code that your entire team can understand and contribute to.

Continue exploring advanced topics such as custom modules, dynamic inventory, Ansible Tower/AWX for enterprise orchestration, and integration with CI/CD pipelines to take your automation to the next level.