Ansible Playbooks: Practical Examples for Real-World Infrastructure

Introduction

Ansible playbooks are the cornerstone of infrastructure automation, turning complex manual processes into repeatable, version-controlled configurations. While ad-hoc commands are handy for quick tasks, playbooks provide the power to orchestrate multi-step operations, manage complex infrastructure, and implement sophisticated deployment strategies.

This guide presents practical, production-ready Ansible playbook examples you can adapt to your infrastructure's needs. Each example is designed to solve real-world problems faced by system administrators and DevOps engineers, from deploying complete application stacks to implementing disaster recovery procedures.

Whether you manage a handful of servers or orchestrate thousands of cloud instances, these playbook examples will help you automate repetitive tasks, reduce human error, and apply infrastructure-as-code best practices. Each example includes detailed explanations, complete working code, and best practices you can apply to your projects right away.

Understanding Playbook Structure

Before diving into the examples, let's look at the anatomy of a well-structured playbook:

---
# Top-level play
- name: Descriptive play name
  hosts: target_hosts
  become: yes  # Privilege escalation
  gather_facts: yes  # Gather system information

  vars:
    # Play-specific variables
    app_version: "1.0.0"

  pre_tasks:
    # Tasks that run before roles
    - name: Update cache
      apt:
        update_cache: yes

  roles:
    # Reusable role includes
    - common
    - webserver

  tasks:
    # Main tasks
    - name: Task description
      module_name:
        parameter: value
      notify: handler_name

  post_tasks:
    # Tasks that run after everything
    - name: Final verification
      uri:
        url: http://localhost

  handlers:
    # Event-driven tasks
    - name: handler_name
      systemd:
        name: nginx
        state: restarted
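
On the control node, a playbook built from this skeleton is typically validated before it is applied. The playbook and inventory paths below are illustrative:

```shell
# Check YAML/playbook syntax without contacting any hosts
ansible-playbook playbooks/site.yml --syntax-check

# Dry run against staging: report what would change, showing diffs
ansible-playbook -i inventory/staging playbooks/site.yml --check --diff

# Apply for real against production
ansible-playbook -i inventory/production playbooks/site.yml
```

Note that `--check` mode is only an approximation: tasks whose behavior depends on the result of earlier changed tasks may report differently on a real run.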

Prerequisites

To use these playbooks effectively, make sure you have:

  • Ansible 2.9 or later installed on your control node
  • SSH access to managed nodes with key-based authentication
  • Sudo/root privileges on managed nodes
  • A basic understanding of YAML syntax
  • A properly configured inventory file
  • Python 3.6+ on all managed nodes
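
A minimal INI-style inventory covering the host groups used throughout these examples might look like this (hostnames are illustrative):

```ini
# inventory/production (illustrative hostnames)
[webservers]
web1.example.com
web2.example.com

[appservers]
app1.example.com
app2.example.com

[loadbalancers]
lb1.example.com

[databases]
db1.example.com

[monitoring]
mon1.example.com

[all:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3
```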

Project Structure

Organize your Ansible project like this:

ansible-project/
├── ansible.cfg
├── inventory/
│   ├── production
│   ├── staging
│   └── development
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/
│   └── special-host.yml
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   └── postgresql/
└── files/
    └── templates/
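
An ansible.cfg at the project root ties this layout together. The values below are common starting points, not requirements:

```ini
# ansible.cfg (suggested defaults for this layout)
[defaults]
inventory = inventory/production
roles_path = roles
retry_files_enabled = False
forks = 10
interpreter_python = auto_silent

[privilege_escalation]
become = True
become_method = sudo
```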

Example 1: Complete LEMP Stack Deployment

This playbook deploys a complete Linux, Nginx, MySQL (MariaDB), and PHP stack with security hardening:

---
# playbooks/lemp-stack.yml
- name: Deploy LEMP Stack
  hosts: webservers
  become: yes

  vars:
    php_version: "8.2"
    mysql_root_password: "{{ vault_mysql_root_password }}"
    app_user: "www-data"
    app_domain: "example.com"

  tasks:
    # System preparation
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install system dependencies
      apt:
        name:
          - software-properties-common
          - apt-transport-https
          - ca-certificates
          - curl
          - gnupg
        state: present

    # Nginx installation and configuration
    - name: Install Nginx
      apt:
        name: nginx
        state: present

    - name: Create web root directory
      file:
        path: "/var/www/{{ app_domain }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0755'

    - name: Configure Nginx virtual host
      template:
        src: templates/nginx-vhost.j2
        dest: "/etc/nginx/sites-available/{{ app_domain }}"
        mode: '0644'
      notify: reload nginx

    - name: Enable Nginx site
      file:
        src: "/etc/nginx/sites-available/{{ app_domain }}"
        dest: "/etc/nginx/sites-enabled/{{ app_domain }}"
        state: link
      notify: reload nginx

    - name: Remove default Nginx site
      file:
        path: /etc/nginx/sites-enabled/default
        state: absent
      notify: reload nginx

    # MariaDB installation
    - name: Install MariaDB server
      apt:
        name:
          - mariadb-server
          - mariadb-client
          - python3-pymysql
        state: present

    - name: Start and enable MariaDB
      systemd:
        name: mariadb
        state: started
        enabled: yes

    - name: Set MariaDB root password
      mysql_user:
        name: root
        password: "{{ mysql_root_password }}"
        login_unix_socket: /var/run/mysqld/mysqld.sock
        state: present

    - name: Create MariaDB configuration for root
      template:
        src: templates/my.cnf.j2
        dest: /root/.my.cnf
        mode: '0600'

    - name: Remove anonymous MariaDB users
      mysql_user:
        name: ''
        host_all: yes
        state: absent

    - name: Remove MariaDB test database
      mysql_db:
        name: test
        state: absent

    # PHP installation
    - name: Add PHP repository
      apt_repository:
        repo: "ppa:ondrej/php"
        state: present

    - name: Install PHP and extensions
      apt:
        name:
          - "php{{ php_version }}-fpm"
          - "php{{ php_version }}-mysql"
          - "php{{ php_version }}-curl"
          - "php{{ php_version }}-gd"
          - "php{{ php_version }}-mbstring"
          - "php{{ php_version }}-xml"
          - "php{{ php_version }}-zip"
          - "php{{ php_version }}-opcache"
        state: present

    - name: Configure PHP-FPM pool
      template:
        src: templates/php-fpm-pool.j2
        dest: "/etc/php/{{ php_version }}/fpm/pool.d/www.conf"
        mode: '0644'
      notify: restart php-fpm

    - name: Configure PHP settings
      lineinfile:
        path: "/etc/php/{{ php_version }}/fpm/php.ini"
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^;?upload_max_filesize', line: 'upload_max_filesize = 64M' }
        - { regexp: '^;?post_max_size', line: 'post_max_size = 64M' }
        - { regexp: '^;?memory_limit', line: 'memory_limit = 256M' }
        - { regexp: '^;?max_execution_time', line: 'max_execution_time = 300' }
      notify: restart php-fpm

    # Security hardening
    - name: Install and configure UFW
      apt:
        name: ufw
        state: present

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: 'incoming', policy: 'deny' }
        - { direction: 'outgoing', policy: 'allow' }

    - name: Allow SSH
      ufw:
        rule: allow
        port: '22'
        proto: tcp

    - name: Allow HTTP
      ufw:
        rule: allow
        port: '80'
        proto: tcp

    - name: Allow HTTPS
      ufw:
        rule: allow
        port: '443'
        proto: tcp

    - name: Enable UFW
      ufw:
        state: enabled

    # SSL certificate with Let's Encrypt
    - name: Install Certbot
      apt:
        name:
          - certbot
          - python3-certbot-nginx
        state: present

    - name: Obtain SSL certificate
      command: >
        certbot --nginx --non-interactive --agree-tos
        --email admin@{{ app_domain }}
        -d {{ app_domain }} -d www.{{ app_domain }}
      args:
        creates: "/etc/letsencrypt/live/{{ app_domain }}/fullchain.pem"

    - name: Setup SSL renewal cron job
      cron:
        name: "Renew Let's Encrypt certificates"
        minute: "0"
        hour: "3"
        job: "certbot renew --quiet --post-hook 'systemctl reload nginx'"

    # Deploy sample application
    - name: Deploy index.php
      copy:
        content: |
          <?php
          phpinfo();
          ?>
        dest: "/var/www/{{ app_domain }}/index.php"
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0644'

  handlers:
    - name: reload nginx
      systemd:
        name: nginx
        state: reloaded

    - name: restart php-fpm
      systemd:
        name: "php{{ php_version }}-fpm"
        state: restarted

Required Template: nginx-vhost.j2

# templates/nginx-vhost.j2
server {
    listen 80;
    listen [::]:80;
    server_name {{ app_domain }} www.{{ app_domain }};
    root /var/www/{{ app_domain }};

    index index.php index.html index.htm;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php{{ php_version }}-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss;
}
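
Example 1 also references templates/my.cnf.j2, which lets the later mysql_user and mysql_db tasks authenticate as root without explicit login parameters (the MySQL client libraries read /root/.my.cnf by default). A minimal sketch:

```ini
# templates/my.cnf.j2 -- deployed to /root/.my.cnf with mode 0600
[client]
user=root
password={{ mysql_root_password }}
```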

Example 2: Multi-Tier Application Deployment

Deploy a complete application with a load balancer, web servers, and a database cluster:

---
# playbooks/multi-tier-app.yml
- name: Configure load balancers
  hosts: loadbalancers
  become: yes

  tasks:
    - name: Install HAProxy
      apt:
        name: haproxy
        state: present

    - name: Configure HAProxy
      template:
        src: templates/haproxy.cfg.j2
        dest: /etc/haproxy/haproxy.cfg
        mode: '0644'
        validate: 'haproxy -f %s -c'
      notify: restart haproxy

    - name: Enable HAProxy
      systemd:
        name: haproxy
        enabled: yes
        state: started

  handlers:
    - name: restart haproxy
      systemd:
        name: haproxy
        state: restarted

- name: Configure web application servers
  hosts: appservers
  become: yes
  serial: 1  # Rolling deployment

  vars:
    app_name: "myapp"
    app_version: "{{ deploy_version | default('latest') }}"
    app_port: 3000

  tasks:
    - name: Install Node.js
      apt:
        name:
          - nodejs
          - npm
        state: present

    - name: Create application user
      user:
        name: "{{ app_name }}"
        system: yes
        shell: /bin/bash
        home: "/opt/{{ app_name }}"

    - name: Create app directory
      file:
        path: "/opt/{{ app_name }}"
        state: directory
        owner: "{{ app_name }}"
        group: "{{ app_name }}"
        mode: '0755'

    - name: Deploy application code
      git:
        repo: "https://github.com/yourorg/{{ app_name }}.git"
        dest: "/opt/{{ app_name }}/app"
        version: "{{ app_version }}"
        force: yes
      become_user: "{{ app_name }}"
      notify: restart app

    - name: Install npm dependencies
      npm:
        path: "/opt/{{ app_name }}/app"
        production: yes
      become_user: "{{ app_name }}"
      notify: restart app

    - name: Create environment file
      template:
        src: templates/app-env.j2
        dest: "/opt/{{ app_name }}/.env"
        owner: "{{ app_name }}"
        group: "{{ app_name }}"
        mode: '0600'
      notify: restart app

    - name: Create systemd service
      template:
        src: templates/app-service.j2
        dest: "/etc/systemd/system/{{ app_name }}.service"
        mode: '0644'
      notify:
        - reload systemd
        - restart app

    - name: Enable and start application
      systemd:
        name: "{{ app_name }}"
        enabled: yes
        state: started
        daemon_reload: yes  # handlers run at end of play; load the fresh unit now

    - name: Wait for application to be ready
      uri:
        url: "http://localhost:{{ app_port }}/health"
        status_code: 200
      register: result
      until: result.status == 200
      retries: 10
      delay: 3

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart app
      systemd:
        name: "{{ app_name }}"
        state: restarted

- name: Configure database servers
  hosts: databases
  become: yes

  vars:
    postgres_version: "15"
    db_name: "myapp_production"
    db_user: "myapp"
    db_password: "{{ vault_db_password }}"

  tasks:
    - name: Install PostgreSQL
      apt:
        name:
          - "postgresql-{{ postgres_version }}"
          - "postgresql-contrib-{{ postgres_version }}"
          - python3-psycopg2
        state: present

    - name: Ensure PostgreSQL is running
      systemd:
        name: postgresql
        state: started
        enabled: yes

    - name: Create application database
      postgresql_db:
        name: "{{ db_name }}"
        state: present
      become_user: postgres

    - name: Create application user
      postgresql_user:
        name: "{{ db_user }}"
        password: "{{ db_password }}"
        db: "{{ db_name }}"
        priv: ALL
        state: present
      become_user: postgres

    - name: Configure PostgreSQL for network access
      lineinfile:
        path: "/etc/postgresql/{{ postgres_version }}/main/postgresql.conf"
        regexp: "^#?listen_addresses"
        line: "listen_addresses = '*'"
      notify: restart postgresql

    - name: Allow application servers to connect
      postgresql_pg_hba:
        dest: "/etc/postgresql/{{ postgres_version }}/main/pg_hba.conf"
        contype: host
        users: "{{ db_user }}"
        source: "{{ hostvars[item]['ansible_default_ipv4']['address'] }}/32"
        databases: "{{ db_name }}"
        method: md5
      loop: "{{ groups['appservers'] }}"
      notify: restart postgresql

  handlers:
    - name: restart postgresql
      systemd:
        name: postgresql
        state: restarted

- name: Run database migrations
  hosts: appservers[0]
  become: yes
  become_user: myapp

  tasks:
    - name: Run migrations
      command: npm run migrate
      args:
        chdir: /opt/myapp/app
      run_once: yes
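
The app-server play above installs templates/app-service.j2, which is not shown. A minimal sketch, assuming a Node.js entry point of server.js (hypothetical; adjust ExecStart and the environment to your application):

```ini
# templates/app-service.j2 -- hypothetical sketch
[Unit]
Description={{ app_name }} application
After=network.target

[Service]
Type=simple
User={{ app_name }}
WorkingDirectory=/opt/{{ app_name }}/app
EnvironmentFile=/opt/{{ app_name }}/.env
ExecStart=/usr/bin/node server.js
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```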

Example 3: Backup and Disaster Recovery Automation

A complete backup solution with rotation and off-site storage:

---
# playbooks/backup-automation.yml
- name: Configure automated backups
  hosts: all
  become: yes

  vars:
    backup_dir: "/var/backups"
    backup_retention_days: 7
    backup_s3_bucket: "company-backups"
    backup_schedule: "0 2 * * *"  # 2 AM daily

  tasks:
    - name: Install backup tools
      apt:
        name:
          - rsync
          - borgbackup
          - awscli
          - pigz
        state: present

    - name: Create backup directory
      file:
        path: "{{ backup_dir }}"
        state: directory
        mode: '0700'
        owner: root
        group: root

    - name: Create backup script
      copy:
        content: |
          #!/bin/bash
          set -euo pipefail

          # Configuration
          BACKUP_DIR="{{ backup_dir }}"
          RETENTION_DAYS={{ backup_retention_days }}
          S3_BUCKET="{{ backup_s3_bucket }}"
          HOSTNAME=$(hostname -f)
          TIMESTAMP=$(date +%Y%m%d_%H%M%S)

          # Logging
          LOG_FILE="${BACKUP_DIR}/backup.log"
          exec 1> >(tee -a "${LOG_FILE}")
          exec 2>&1

          echo "=== Backup started at $(date) ==="

          # Backup system files
          echo "Backing up system files..."
          tar -czf "${BACKUP_DIR}/system_${TIMESTAMP}.tar.gz" \
            /etc \
            /home \
            /root \
            --exclude='/home/*/.cache' \
            --exclude='/home/*/tmp'

          {% if 'databases' in group_names %}
          # Database backup
          echo "Backing up databases..."
          if systemctl is-active --quiet postgresql; then
            sudo -u postgres pg_dumpall | pigz > "${BACKUP_DIR}/postgres_${TIMESTAMP}.sql.gz"
          fi

          if systemctl is-active --quiet mariadb; then
            mysqldump --all-databases --single-transaction | pigz > "${BACKUP_DIR}/mysql_${TIMESTAMP}.sql.gz"
          fi
          {% endif %}

          {% if 'webservers' in group_names %}
          # Web content backup
          echo "Backing up web content..."
          tar -czf "${BACKUP_DIR}/web_${TIMESTAMP}.tar.gz" /var/www
          {% endif %}

          # Upload to S3
          echo "Uploading to S3..."
          aws s3 sync "${BACKUP_DIR}" "s3://${S3_BUCKET}/${HOSTNAME}/" \
            --exclude "*.log" \
            --storage-class STANDARD_IA

          # Cleanup old local backups
          echo "Cleaning up old backups..."
          find "${BACKUP_DIR}" -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete
          find "${BACKUP_DIR}" -name "*.sql.gz" -mtime +${RETENTION_DAYS} -delete

          echo "=== Backup completed at $(date) ==="
        dest: /usr/local/bin/automated-backup.sh
        mode: '0700'
        owner: root
        group: root

    - name: Ensure /root/.aws directory exists
      file:
        path: /root/.aws
        state: directory
        mode: '0700'

    - name: Configure AWS credentials
      template:
        src: templates/aws-credentials.j2
        dest: /root/.aws/credentials
        mode: '0600'

    - name: Schedule backup cron job
      cron:
        name: "Automated system backup"
        minute: "{{ backup_schedule.split()[0] }}"
        hour: "{{ backup_schedule.split()[1] }}"
        job: "/usr/local/bin/automated-backup.sh"
        state: present

    - name: Create backup monitoring script
      copy:
        content: |
          #!/bin/bash
          BACKUP_DIR="{{ backup_dir }}"
          MAX_AGE_HOURS=26

          LATEST_BACKUP=$(find "${BACKUP_DIR}" -name "*.tar.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")

          if [ -z "$LATEST_BACKUP" ]; then
            echo "CRITICAL: No backups found"
            exit 2
          fi

          AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))

          if [ $AGE_HOURS -gt $MAX_AGE_HOURS ]; then
            echo "WARNING: Latest backup is ${AGE_HOURS} hours old"
            exit 1
          fi

          echo "OK: Latest backup is ${AGE_HOURS} hours old"
          exit 0
        dest: /usr/local/bin/check-backup.sh
        mode: '0755'

    - name: Test backup script
      command: /usr/local/bin/automated-backup.sh
      async: 3600
      poll: 0
      register: backup_test

    - name: Verify backup completion
      async_status:
        jid: "{{ backup_test.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 60
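
The monitoring script's core is the freshness check: find the newest archive and fail if it is too old. Here is that logic isolated in a standalone demo, run against a scratch directory so no real backups are touched (GNU find/stat assumed):

```shell
#!/bin/sh
# Demo of the freshness check used in check-backup.sh, in a throwaway directory.
BACKUP_DIR=$(mktemp -d)
touch "${BACKUP_DIR}/system_demo.tar.gz"

# Newest *.tar.gz by modification time -- the same pipeline the script uses
LATEST_BACKUP=$(find "${BACKUP_DIR}" -name "*.tar.gz" -type f -printf '%T@ %p\n' \
  | sort -n | tail -1 | cut -f2- -d" ")

# Age in whole hours; a file created moments ago reports 0
AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))
echo "latest=$(basename "$LATEST_BACKUP") age_hours=${AGE_HOURS}"
# prints: latest=system_demo.tar.gz age_hours=0

rm -rf "${BACKUP_DIR}"
```

The script's exit codes (0/1/2 for OK/WARNING/CRITICAL) follow the Nagios plugin convention, so check-backup.sh can be dropped straight into Nagios, Icinga, or an NRPE check.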

Example 4: Zero-Downtime Rolling Deployments

Implement blue-green deployments with health checks:

---
# playbooks/rolling-deployment.yml
- name: Blue-Green deployment with zero downtime
  hosts: webservers
  become: yes
  serial: 1
  max_fail_percentage: 0

  vars:
    app_name: "webapp"
    app_version: "{{ deploy_version }}"
    app_port: 8080
    health_check_url: "http://localhost:{{ app_port }}/health"
    health_check_retries: 30
    health_check_delay: 2

  pre_tasks:
    - name: Remove from load balancer
      haproxy:
        state: disabled
        host: "{{ inventory_hostname }}"
        socket: /run/haproxy/admin.sock
        backend: app_backend
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"

    - name: Wait for connections to drain
      wait_for:
        timeout: 10

  tasks:
    - name: Stop current application
      systemd:
        name: "{{ app_name }}"
        state: stopped

    - name: Backup current version
      command: >
        mv /opt/{{ app_name }}/current
        /opt/{{ app_name }}/rollback_{{ ansible_date_time.epoch }}
      args:
        removes: /opt/{{ app_name }}/current
      ignore_errors: yes

    - name: Deploy new version
      git:
        repo: "https://github.com/yourorg/{{ app_name }}.git"
        dest: "/opt/{{ app_name }}/releases/{{ app_version }}"
        version: "{{ app_version }}"
      become_user: "{{ app_name }}"

    - name: Install dependencies
      npm:
        path: "/opt/{{ app_name }}/releases/{{ app_version }}"
        production: yes
      become_user: "{{ app_name }}"

    - name: Create symlink to current version
      file:
        src: "/opt/{{ app_name }}/releases/{{ app_version }}"
        dest: "/opt/{{ app_name }}/current"
        state: link

    - name: Start application
      systemd:
        name: "{{ app_name }}"
        state: started

    - name: Wait for application health check
      uri:
        url: "{{ health_check_url }}"
        status_code: 200
        timeout: 5
      register: health_check
      until: health_check.status == 200
      retries: "{{ health_check_retries }}"
      delay: "{{ health_check_delay }}"
      failed_when: false

    - name: Rollback if health check fails
      block:
        - name: Stop failed deployment
          systemd:
            name: "{{ app_name }}"
            state: stopped

        - name: Restore previous version
          shell: |
            rm -f /opt/{{ app_name }}/current
            ROLLBACK=$(ls -t /opt/{{ app_name }}/rollback_* | head -1)
            mv "$ROLLBACK" /opt/{{ app_name }}/current
          args:
            executable: /bin/bash

        - name: Start rolled back version
          systemd:
            name: "{{ app_name }}"
            state: started

        - name: Fail deployment
          fail:
            msg: "Deployment failed health check, rolled back to previous version"
      when: health_check.status != 200

  post_tasks:
    - name: Add back to load balancer
      haproxy:
        state: enabled
        host: "{{ inventory_hostname }}"
        socket: /run/haproxy/admin.sock
        backend: app_backend
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"

    - name: Verify in load balancer rotation
      uri:
        url: "http://{{ hostvars[item]['ansible_default_ipv4']['address'] }}/haproxy?stats"
        return_content: yes
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"
      register: lb_status
      failed_when: inventory_hostname not in lb_status.content

    - name: Cleanup old releases
      shell: |
        cd /opt/{{ app_name }}/releases
        ls -t | tail -n +4 | xargs -r rm -rf
        cd /opt/{{ app_name }}
        ls -t rollback_* 2>/dev/null | tail -n +3 | xargs -r rm -rf
      args:
        executable: /bin/bash
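
The cleanup task relies on `ls -t | tail -n +4` to keep only the three newest releases. A standalone sketch of that pipeline, exercised in a scratch directory with fabricated timestamps:

```shell
#!/bin/sh
# Demo of the "keep only the three newest releases" cleanup in a scratch dir.
RELEASES=$(mktemp -d)
cd "$RELEASES"
for i in 1 2 3 4 5; do
  mkdir "v$i"
  touch -d "2024-01-0$i 12:00" "v$i"   # v5 ends up newest (GNU touch assumed)
done

# Same pipeline as the playbook: newest first, skip the first three, delete the rest
ls -t | tail -n +4 | xargs -r rm -rf

KEPT=$(ls -t | xargs)
echo "kept: ${KEPT}"
# prints: kept: v5 v4 v3
cd / && rm -rf "$RELEASES"
```

`tail -n +4` starts printing at the fourth line, so exactly the three newest entries survive; `xargs -r` skips the delete entirely when there is nothing to remove.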

Example 5: Infrastructure Monitoring Setup

Deploy a complete monitoring stack with Prometheus and Grafana:

---
# playbooks/monitoring-stack.yml
- name: Deploy Prometheus monitoring
  hosts: monitoring
  become: yes

  vars:
    prometheus_version: "2.45.0"
    grafana_version: "latest"
    alertmanager_version: "0.26.0"

  tasks:
    - name: Create prometheus user
      user:
        name: prometheus
        system: yes
        shell: /bin/false
        create_home: no

    - name: Create prometheus directories
      file:
        path: "{{ item }}"
        state: directory
        owner: prometheus
        group: prometheus
        mode: '0755'
      loop:
        - /etc/prometheus
        - /var/lib/prometheus

    - name: Download Prometheus
      get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
        dest: /tmp/prometheus.tar.gz

    - name: Extract Prometheus
      unarchive:
        src: /tmp/prometheus.tar.gz
        dest: /tmp
        remote_src: yes

    - name: Copy Prometheus binaries
      copy:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-amd64/{{ item }}"
        dest: "/usr/local/bin/{{ item }}"
        mode: '0755'
        remote_src: yes
      loop:
        - prometheus
        - promtool

    - name: Copy Prometheus console assets
      copy:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-amd64/{{ item }}/"
        dest: "/etc/prometheus/{{ item }}"
        owner: prometheus
        group: prometheus
        remote_src: yes
      loop:
        - consoles
        - console_libraries

    - name: Configure Prometheus
      template:
        src: templates/prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
        owner: prometheus
        group: prometheus
        mode: '0644'
      notify: reload prometheus

    - name: Create Prometheus systemd service
      copy:
        content: |
          [Unit]
          Description=Prometheus
          Wants=network-online.target
          After=network-online.target

          [Service]
          User=prometheus
          Group=prometheus
          Type=simple
          ExecStart=/usr/local/bin/prometheus \
            --config.file=/etc/prometheus/prometheus.yml \
            --storage.tsdb.path=/var/lib/prometheus/ \
            --web.console.templates=/etc/prometheus/consoles \
            --web.console.libraries=/etc/prometheus/console_libraries \
            --web.listen-address=0.0.0.0:9090

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/prometheus.service
        mode: '0644'
      notify:
        - reload systemd
        - restart prometheus

    - name: Start Prometheus
      systemd:
        name: prometheus
        state: started
        enabled: yes
        daemon_reload: yes  # handlers run at end of play; load the fresh unit now

    # Grafana installation
    - name: Add Grafana GPG key
      apt_key:
        url: https://packages.grafana.com/gpg.key
        state: present

    - name: Add Grafana repository
      apt_repository:
        repo: "deb https://packages.grafana.com/oss/deb stable main"
        state: present
        filename: grafana

    - name: Install Grafana
      apt:
        name: grafana
        state: present
        update_cache: yes

    - name: Configure Grafana
      template:
        src: templates/grafana.ini.j2
        dest: /etc/grafana/grafana.ini
        mode: '0640'
        owner: grafana
        group: grafana
      notify: restart grafana

    - name: Start Grafana
      systemd:
        name: grafana-server
        state: started
        enabled: yes

    - name: Configure firewall for Prometheus
      ufw:
        rule: allow
        port: '9090'
        proto: tcp

    - name: Configure firewall for Grafana
      ufw:
        rule: allow
        port: '3000'
        proto: tcp

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart prometheus
      systemd:
        name: prometheus
        state: restarted

    - name: reload prometheus
      systemd:
        name: prometheus
        state: reloaded

    - name: restart grafana
      systemd:
        name: grafana-server
        state: restarted

- name: Deploy Node Exporters
  hosts: all
  become: yes

  vars:
    node_exporter_version: "1.7.0"

  tasks:
    - name: Create node_exporter user
      user:
        name: node_exporter
        system: yes
        shell: /bin/false
        create_home: no

    - name: Download Node Exporter
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz

    - name: Extract Node Exporter
      unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /tmp
        remote_src: yes

    - name: Copy Node Exporter binary
      copy:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
        dest: /usr/local/bin/node_exporter
        mode: '0755'
        remote_src: yes

    - name: Create Node Exporter systemd service
      copy:
        content: |
          [Unit]
          Description=Node Exporter
          After=network.target

          [Service]
          User=node_exporter
          Group=node_exporter
          Type=simple
          ExecStart=/usr/local/bin/node_exporter \
            --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/) \
            --collector.netclass.ignored-devices=^(veth.*|docker.*|br-.*)$$

          [Install]
          WantedBy=multi-user.target
        dest: /etc/systemd/system/node_exporter.service
        mode: '0644'

    - name: Start Node Exporter
      systemd:
        name: node_exporter
        state: started
        enabled: yes
        daemon_reload: yes
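
The Prometheus play templates /etc/prometheus/prometheus.yml from templates/prometheus.yml.j2, which is not shown. A minimal sketch that scrapes Prometheus itself plus every Node Exporter in the inventory (assumes fact gathering has run so ansible_default_ipv4 is populated):

```yaml
# templates/prometheus.yml.j2 -- minimal sketch; tune intervals and jobs
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - "{{ hostvars[host]['ansible_default_ipv4']['address'] }}:9100"
{% endfor %}
```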

Example 6: Security Compliance and Hardening

Implement CIS benchmarks and security best practices:

---
# playbooks/security-hardening.yml
- name: Apply security hardening
  hosts: all
  become: yes

  vars:
    allowed_ssh_users: ["admin", "deploy"]
    ssh_port: 22
    max_auth_tries: 3
    password_max_days: 90
    password_min_days: 1
    password_warn_age: 7

  tasks:
    # System updates
    - name: Update all packages
      apt:
        upgrade: dist
        update_cache: yes
        autoremove: yes
        autoclean: yes

    - name: Install security tools
      apt:
        name:
          - aide
          - auditd
          - fail2ban
          - rkhunter
          - lynis
        state: present

    # SSH hardening
    - name: Configure SSH daemon
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
        state: present
        validate: '/usr/sbin/sshd -t -f %s'
      loop:
        - { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
        - { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
        - { regexp: '^#?PubkeyAuthentication', line: 'PubkeyAuthentication yes' }
        - { regexp: '^#?PermitEmptyPasswords', line: 'PermitEmptyPasswords no' }
        - { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
        - { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries {{ max_auth_tries }}' }
        - { regexp: '^#?ClientAliveInterval', line: 'ClientAliveInterval 300' }
        - { regexp: '^#?ClientAliveCountMax', line: 'ClientAliveCountMax 2' }
        - { regexp: '^#?Protocol', line: 'Protocol 2' }
        - { regexp: '^#?AllowUsers', line: 'AllowUsers {{ allowed_ssh_users | join(" ") }}' }
      notify: restart sshd

    # Password policies
    - name: Configure password aging
      lineinfile:
        path: /etc/login.defs
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^PASS_MAX_DAYS', line: 'PASS_MAX_DAYS {{ password_max_days }}' }
        - { regexp: '^PASS_MIN_DAYS', line: 'PASS_MIN_DAYS {{ password_min_days }}' }
        - { regexp: '^PASS_WARN_AGE', line: 'PASS_WARN_AGE {{ password_warn_age }}' }

    # Kernel hardening
    - name: Configure sysctl security parameters
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
        sysctl_file: /etc/sysctl.d/99-security.conf
      loop:
        # Network security
        - { name: 'net.ipv4.conf.all.rp_filter', value: '1' }
        - { name: 'net.ipv4.conf.default.rp_filter', value: '1' }
        - { name: 'net.ipv4.icmp_echo_ignore_broadcasts', value: '1' }
        - { name: 'net.ipv4.conf.all.accept_source_route', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_source_route', value: '0' }
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.all.secure_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.secure_redirects', value: '0' }
        - { name: 'net.ipv4.conf.all.send_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.send_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'net.ipv4.tcp_timestamps', value: '0' }
        # Kernel security
        - { name: 'kernel.dmesg_restrict', value: '1' }
        - { name: 'kernel.kptr_restrict', value: '2' }
        - { name: 'kernel.yama.ptrace_scope', value: '1' }
        - { name: 'fs.suid_dumpable', value: '0' }

    # Fail2Ban configuration
    - name: Configure Fail2Ban for SSH
      copy:
        content: |
          [sshd]
          enabled = true
          port = {{ ssh_port }}
          filter = sshd
          logpath = /var/log/auth.log
          maxretry = 3
          bantime = 3600
          findtime = 600
        dest: /etc/fail2ban/jail.d/sshd.conf
        mode: '0644'
      notify: restart fail2ban

    # Audit daemon
    - name: Configure auditd rules
      copy:
        content: |
          # Delete all existing rules
          -D

          # Buffer size
          -b 8192

          # Failure mode
          -f 1

          # Monitor user/group changes
          -w /etc/group -p wa -k identity
          -w /etc/passwd -p wa -k identity
          -w /etc/gshadow -p wa -k identity
          -w /etc/shadow -p wa -k identity

          # Monitor system calls
          -a always,exit -F arch=b64 -S adjtimex -S settimeofday -k time-change
          -a always,exit -F arch=b32 -S adjtimex -S settimeofday -S stime -k time-change

          # Monitor network environment
          -a always,exit -F arch=b64 -S sethostname -S setdomainname -k system-locale
          -a always,exit -F arch=b32 -S sethostname -S setdomainname -k system-locale

          # Monitor login/logout events
          -w /var/log/faillog -p wa -k logins
          -w /var/log/lastlog -p wa -k logins

          # Monitor sudo usage
          -w /etc/sudoers -p wa -k sudo_changes
          -w /etc/sudoers.d/ -p wa -k sudo_changes
        dest: /etc/audit/rules.d/hardening.rules
        mode: '0640'
      notify: restart auditd

    # File integrity monitoring
    - name: Initialize AIDE database
      command: aideinit
      args:
        creates: /var/lib/aide/aide.db.new
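    # NOTE (assumption): on Debian-based systems the newly generated database
    # typically must be promoted to the active path before `aide --check` can
    # use it; exact paths vary by distribution, so treat this as a sketch.
    - name: Promote new AIDE database to active copy
      copy:
        src: /var/lib/aide/aide.db.new
        dest: /var/lib/aide/aide.db
        remote_src: yes
        mode: '0600'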

    - name: Setup AIDE cron job
      cron:
        name: "AIDE file integrity check"
        minute: "0"
        hour: "5"
        job: "/usr/bin/aide --check | mail -s 'AIDE Report' root@localhost"

    # Disable unnecessary services
    - name: Disable unnecessary services
      systemd:
        name: "{{ item }}"
        state: stopped
        enabled: no
      loop:
        - bluetooth
        - cups
        - avahi-daemon
      ignore_errors: yes

    # Remove unnecessary packages
    - name: Remove unnecessary packages
      apt:
        name:
          - telnet
          - rsh-client
          - rsh-redone-client
        state: absent
        purge: yes

  handlers:
    - name: restart sshd
      systemd:
        name: sshd
        state: restarted

    - name: restart fail2ban
      systemd:
        name: fail2ban
        state: restarted

    - name: restart auditd
      systemd:
        name: auditd
        state: restarted

Best Practices for Production Playbooks

1. Use Ansible Vault for Secrets

# Create encrypted variable file
ansible-vault create group_vars/production/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/production/vault.yml

# Content example:
vault_mysql_root_password: "super_secret_password"
vault_api_keys:
  aws: "AKIAIOSFODNN7EXAMPLE"
  sendgrid: "SG.example123"

Reference them in playbooks:

vars:
  mysql_root_password: "{{ vault_mysql_root_password }}"
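To run a playbook that uses vaulted variables, supply the vault password at execution time. Both flags below are standard ansible-playbook options (the password file path is only an example):

# Prompt interactively for the vault password
ansible-playbook site.yml --ask-vault-pass

# Or read it from a protected file
ansible-playbook site.yml --vault-password-file ~/.vault_pass.txt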

2. Implement Proper Error Handling

- name: Task with error handling
  command: /usr/bin/some-command
  register: result
  failed_when: false
  changed_when: result.rc == 0

- name: Handle errors gracefully
  block:
    - name: Risky operation
      command: /usr/bin/risky-command
  rescue:
    - name: Handle failure
      debug:
        msg: "Command failed, rolling back"
    - name: Rollback action
      command: /usr/bin/rollback-command
  always:
    - name: Cleanup
      file:
        path: /tmp/tempfile
        state: absent

3. Use Tags Strategically

- name: Full application setup
  hosts: appservers

  tasks:
    - name: Install dependencies
      apt:
        name: "{{ packages }}"
      tags: [install, packages]

    - name: Deploy code
      git:
        repo: "{{ repo_url }}"
        dest: /opt/app
      tags: [deploy, code]

    - name: Configure application
      template:
        src: config.j2
        dest: /opt/app/config.yml
      tags: [configure, config]

Run specific tags:

ansible-playbook site.yml --tags "deploy"
ansible-playbook site.yml --tags "install,configure"
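Tags can also be excluded or inspected without executing the play; both flags are standard ansible-playbook options:

# Skip tasks tagged "packages"
ansible-playbook site.yml --skip-tags "packages"

# List every tag defined in the playbook without running it
ansible-playbook site.yml --list-tags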

4. Implement Testing and Validation

- name: Validate deployment
  hosts: webservers

  tasks:
    - name: Check if service is running
      systemd:
        name: nginx
        state: started
      check_mode: yes
      register: service_status
      failed_when: false

    - name: Verify HTTP response
      uri:
        url: http://localhost
        status_code: 200
        timeout: 5
      register: http_check
      until: http_check.status == 200
      retries: 5
      delay: 2

    - name: Validate configuration syntax
      command: nginx -t
      changed_when: false

    - name: Assert all checks passed
      assert:
        that:
          - not service_status.changed  # in check mode, unchanged means the service was already running
          - http_check.status == 200
        fail_msg: "Validation failed"
        success_msg: "All validations passed"

5. Document with Comments and Metadata

---
# ============================================================================
# Playbook: production-deployment.yml
# Description: Deploy application to production environment
# Author: DevOps Team <[email protected]>
# Version: 2.1.0
# Last Updated: 2024-01-15
#
# Dependencies:
#   - Ansible 2.9+
#   - Python 3.6+
#   - AWS CLI configured
#
# Variables Required:
#   - deploy_version: Application version to deploy
#   - environment: Target environment (production/staging)
#
# Usage:
#   ansible-playbook production-deployment.yml -e deploy_version=v1.2.3
# ============================================================================

- name: Deploy application (v{{ deploy_version }})
  hosts: production

  # Task execution settings
  serial: 2              # Deploy 2 servers at a time
  max_fail_percentage: 10 # Fail if more than 10% of hosts fail

  tasks:
    # Each task should have a clear, descriptive name
    - name: Validate deployment prerequisites
      assert:
        that:
          - deploy_version is defined
          - deploy_version is match('^v[0-9]+\.[0-9]+\.[0-9]+$')
        fail_msg: "deploy_version must be in format v1.2.3"

Troubleshooting Playbook Runs

Debugging Failed Tasks

- name: Debug playbook execution
  hosts: all

  tasks:
    - name: Run command with debugging
      command: /usr/bin/my-command
      register: command_result
      ignore_errors: yes

    - name: Display command output
      debug:
        var: command_result
        verbosity: 2

    - name: Show specific values
      debug:
        msg: "Return code: {{ command_result.rc }}, Output: {{ command_result.stdout }}"

Run with increased verbosity:

ansible-playbook debug.yml -v    # verbose
ansible-playbook debug.yml -vv   # more verbose
ansible-playbook debug.yml -vvv  # debug
ansible-playbook debug.yml -vvvv # connection debug

Dry Runs and Check Mode

# Test without making changes
ansible-playbook site.yml --check

# Show what would change
ansible-playbook site.yml --check --diff

# Step through playbook interactively
ansible-playbook site.yml --step
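When iterating on a long playbook, two more standard ansible-playbook flags help narrow a run (the task name shown is an example):

# Resume execution at a named task
ansible-playbook site.yml --start-at-task "Deploy code"

# Restrict the run to a subset of the inventory
ansible-playbook site.yml --limit webservers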

Conclusion

These practical Ansible playbook examples demonstrate real-world automation scenarios you can adapt to your infrastructure needs. From simple LEMP stack deployments to complex multi-tier applications with zero-downtime releases, Ansible provides the flexibility and power to automate virtually any infrastructure task.

Key takeaways:

  • Structure playbooks for reuse and maintainability
  • Implement proper error handling and rollback mechanisms
  • Use variables and templates for environment-specific configuration
  • Apply security best practices from the start
  • Test thoroughly before deploying to production
  • Document your playbooks completely
  • Keep all Ansible code under version control

As you build your Ansible automation library, focus on creating idempotent, well-tested playbooks that can safely be run multiple times. Start with simple playbooks and gradually increase complexity as you gain experience. Remember that the goal is not just to automate, but to create reliable, maintainable infrastructure as code that your whole team can understand and contribute to.

Continue exploring advanced topics such as custom modules, dynamic inventory, Ansible Tower/AWX for enterprise orchestration, and integration with CI/CD pipelines to take your automation to the next level.