Ansible Playbooks: Practical Examples for Real-World Infrastructure
Introduction
Ansible playbooks are the cornerstone of infrastructure automation, transforming complex manual processes into repeatable, version-controlled configurations. While ad-hoc commands are useful for quick tasks, playbooks provide the power to orchestrate multi-step operations, manage complex infrastructure, and implement sophisticated deployment strategies.
This comprehensive guide presents practical, production-ready Ansible playbook examples that you can adapt to your infrastructure needs. Each example is designed to solve real-world problems faced by system administrators and DevOps engineers, from deploying full application stacks to implementing disaster recovery procedures.
Whether you're managing a handful of servers or orchestrating thousands of cloud instances, these examples will help you automate repetitive tasks, reduce human error, and implement infrastructure-as-code best practices. Each comes with a detailed explanation and complete working code you can apply to your projects right away.
Understanding Playbook Structure
Before diving into examples, let's understand the anatomy of a well-structured playbook:
---
# Top-level play
- name: Descriptive play name
hosts: target_hosts
become: yes # Privilege escalation
gather_facts: yes # Gather system information
vars:
# Play-specific variables
app_version: "1.0.0"
pre_tasks:
# Tasks that run before roles
- name: Update cache
apt:
update_cache: yes
roles:
# Reusable role includes
- common
- webserver
tasks:
# Main tasks
- name: Task description
module_name:
parameter: value
notify: handler_name
post_tasks:
# Tasks that run after everything
- name: Final verification
uri:
url: http://localhost
handlers:
# Event-driven tasks
- name: handler_name
systemd:
name: nginx
state: restarted
Prerequisites
To use these playbooks effectively, ensure you have:
- Ansible 2.9 or higher installed on your control node
- SSH access to managed nodes with key-based authentication
- Sudo/root privileges on managed nodes
- Basic understanding of YAML syntax
- Properly configured inventory file
- Python 3.6+ on all managed nodes
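The inventory file mentioned above can be a simple INI file that groups hosts. The group names used throughout these examples (webservers, databases, loadbalancers, appservers, monitoring) must exist in your inventory; the hostnames and connection settings below are placeholders to adapt:

```ini
# inventory/production (hostnames are examples only)
[webservers]
web1.example.com
web2.example.com

[appservers]
app1.example.com
app2.example.com

[databases]
db1.example.com

[loadbalancers]
lb1.example.com

[monitoring]
mon1.example.com

[all:vars]
ansible_user=deploy
ansible_python_interpreter=/usr/bin/python3
```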
Project Structure
Organize your Ansible project like this:
ansible-project/
├── ansible.cfg
├── inventory/
│   ├── production
│   ├── staging
│   └── development
├── group_vars/
│   ├── all.yml
│   ├── webservers.yml
│   └── databases.yml
├── host_vars/
│   └── special-host.yml
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   └── postgresql/
├── files/
└── templates/
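A minimal ansible.cfg for this layout might look like the following; the specific values are suggestions, not requirements, and you should weigh each against your environment:

```ini
[defaults]
inventory = inventory/production
roles_path = roles
forks = 20
# Convenient for lab setups; in production, manage known_hosts instead
host_key_checking = False

[privilege_escalation]
become = True
become_method = sudo
```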
Example 1: Complete LEMP Stack Deployment
This playbook deploys a complete LEMP stack (Linux, Nginx, MariaDB, PHP) with security hardening:
---
# playbooks/lemp-stack.yml
- name: Deploy LEMP Stack
hosts: webservers
become: yes
vars:
php_version: "8.2"
mysql_root_password: "{{ vault_mysql_root_password }}"
app_user: "www-data"
app_domain: "example.com"
tasks:
# System preparation
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
- name: Install system dependencies
apt:
name:
- software-properties-common
- apt-transport-https
- ca-certificates
- curl
- gnupg
state: present
# Nginx installation and configuration
- name: Install Nginx
apt:
name: nginx
state: present
- name: Create web root directory
file:
path: "/var/www/{{ app_domain }}"
state: directory
owner: "{{ app_user }}"
group: "{{ app_user }}"
mode: '0755'
- name: Configure Nginx virtual host
template:
src: templates/nginx-vhost.j2
dest: "/etc/nginx/sites-available/{{ app_domain }}"
mode: '0644'
notify: reload nginx
- name: Enable Nginx site
file:
src: "/etc/nginx/sites-available/{{ app_domain }}"
dest: "/etc/nginx/sites-enabled/{{ app_domain }}"
state: link
notify: reload nginx
- name: Remove default Nginx site
file:
path: /etc/nginx/sites-enabled/default
state: absent
notify: reload nginx
# MariaDB installation
- name: Install MariaDB server
apt:
name:
- mariadb-server
- mariadb-client
- python3-pymysql
state: present
- name: Start and enable MariaDB
systemd:
name: mariadb
state: started
enabled: yes
- name: Set MariaDB root password
mysql_user:
name: root
password: "{{ mysql_root_password }}"
login_unix_socket: /var/run/mysqld/mysqld.sock
state: present
- name: Create MariaDB configuration for root
template:
src: templates/my.cnf.j2
dest: /root/.my.cnf
mode: '0600'
- name: Remove anonymous MariaDB users
mysql_user:
name: ''
host_all: yes
state: absent
- name: Remove MariaDB test database
mysql_db:
name: test
state: absent
# PHP installation
- name: Add PHP repository
apt_repository:
repo: "ppa:ondrej/php"
state: present
- name: Install PHP and extensions
apt:
name:
- "php{{ php_version }}-fpm"
- "php{{ php_version }}-mysql"
- "php{{ php_version }}-curl"
- "php{{ php_version }}-gd"
- "php{{ php_version }}-mbstring"
- "php{{ php_version }}-xml"
- "php{{ php_version }}-zip"
- "php{{ php_version }}-opcache"
state: present
- name: Configure PHP-FPM pool
template:
src: templates/php-fpm-pool.j2
dest: "/etc/php/{{ php_version }}/fpm/pool.d/www.conf"
mode: '0644'
notify: restart php-fpm
- name: Configure PHP settings
lineinfile:
path: "/etc/php/{{ php_version }}/fpm/php.ini"
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
loop:
- { regexp: '^;?upload_max_filesize', line: 'upload_max_filesize = 64M' }
- { regexp: '^;?post_max_size', line: 'post_max_size = 64M' }
- { regexp: '^;?memory_limit', line: 'memory_limit = 256M' }
- { regexp: '^;?max_execution_time', line: 'max_execution_time = 300' }
notify: restart php-fpm
# Security hardening
- name: Install and configure UFW
apt:
name: ufw
state: present
- name: Configure UFW defaults
ufw:
direction: "{{ item.direction }}"
policy: "{{ item.policy }}"
loop:
- { direction: 'incoming', policy: 'deny' }
- { direction: 'outgoing', policy: 'allow' }
- name: Allow SSH
ufw:
rule: allow
port: '22'
proto: tcp
- name: Allow HTTP
ufw:
rule: allow
port: '80'
proto: tcp
- name: Allow HTTPS
ufw:
rule: allow
port: '443'
proto: tcp
- name: Enable UFW
ufw:
state: enabled
# SSL certificate with Let's Encrypt
- name: Install Certbot
apt:
name:
- certbot
- python3-certbot-nginx
state: present
- name: Obtain SSL certificate
command: >
certbot --nginx --non-interactive --agree-tos
--email admin@{{ app_domain }}
-d {{ app_domain }} -d www.{{ app_domain }}
args:
creates: "/etc/letsencrypt/live/{{ app_domain }}/fullchain.pem"
- name: Setup SSL renewal cron job
cron:
name: "Renew Let's Encrypt certificates"
minute: "0"
hour: "3"
job: "certbot renew --quiet --post-hook 'systemctl reload nginx'"
# Deploy sample application
- name: Deploy index.php
copy:
content: |
<?php
phpinfo();
?>
dest: "/var/www/{{ app_domain }}/index.php"
owner: "{{ app_user }}"
group: "{{ app_user }}"
mode: '0644'
handlers:
- name: reload nginx
systemd:
name: nginx
state: reloaded
- name: restart php-fpm
systemd:
name: "php{{ php_version }}-fpm"
state: restarted
Required Template: nginx-vhost.j2
# templates/nginx-vhost.j2
server {
listen 80;
listen [::]:80;
server_name {{ app_domain }} www.{{ app_domain }};
root /var/www/{{ app_domain }};
index index.php index.html index.htm;
location / {
try_files $uri $uri/ =404;
}
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_pass unix:/var/run/php/php{{ php_version }}-fpm.sock;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;
}
location ~ /\.ht {
deny all;
}
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
# Gzip compression
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss;
}
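Example 1 also references a my.cnf.j2 template for root's MariaDB client credentials. Without it, the later mysql_user and mysql_db tasks (removing anonymous users and the test database) cannot authenticate once the root password is set. A minimal version, using the mysql_root_password variable from the play:

```jinja
# templates/my.cnf.j2
[client]
user=root
password={{ mysql_root_password }}
```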
Example 2: Multi-Tier Application Deployment
Deploy a complete application with load balancer, web servers, and database cluster:
---
# playbooks/multi-tier-app.yml
- name: Configure load balancers
hosts: loadbalancers
become: yes
tasks:
- name: Install HAProxy
apt:
name: haproxy
state: present
- name: Configure HAProxy
template:
src: templates/haproxy.cfg.j2
dest: /etc/haproxy/haproxy.cfg
mode: '0644'
validate: 'haproxy -f %s -c'
notify: restart haproxy
- name: Enable HAProxy
systemd:
name: haproxy
enabled: yes
state: started
handlers:
- name: restart haproxy
systemd:
name: haproxy
state: restarted
- name: Configure web application servers
hosts: appservers
become: yes
serial: 1 # Rolling deployment
vars:
app_name: "myapp"
app_version: "{{ deploy_version | default('latest') }}"
app_port: 3000
tasks:
- name: Install Node.js
apt:
name:
- nodejs
- npm
state: present
- name: Create application user
user:
name: "{{ app_name }}"
system: yes
shell: /bin/bash
home: "/opt/{{ app_name }}"
- name: Create app directory
file:
path: "/opt/{{ app_name }}"
state: directory
owner: "{{ app_name }}"
group: "{{ app_name }}"
mode: '0755'
- name: Deploy application code
git:
repo: "https://github.com/yourorg/{{ app_name }}.git"
dest: "/opt/{{ app_name }}/app"
version: "{{ app_version }}"
force: yes
become_user: "{{ app_name }}"
notify: restart app
- name: Install npm dependencies
npm:
path: "/opt/{{ app_name }}/app"
production: yes
become_user: "{{ app_name }}"
notify: restart app
- name: Create environment file
template:
src: templates/app-env.j2
dest: "/opt/{{ app_name }}/.env"
owner: "{{ app_name }}"
group: "{{ app_name }}"
mode: '0600'
notify: restart app
- name: Create systemd service
template:
src: templates/app-service.j2
dest: "/etc/systemd/system/{{ app_name }}.service"
mode: '0644'
notify:
- reload systemd
- restart app
- name: Enable and start application
systemd:
name: "{{ app_name }}"
enabled: yes
state: started
- name: Wait for application to be ready
uri:
url: "http://localhost:{{ app_port }}/health"
status_code: 200
register: result
until: result.status == 200
retries: 10
delay: 3
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart app
systemd:
name: "{{ app_name }}"
state: restarted
- name: Configure database servers
hosts: databases
become: yes
vars:
postgres_version: "15"
db_name: "myapp_production"
db_user: "myapp"
db_password: "{{ vault_db_password }}"
tasks:
- name: Install PostgreSQL
apt:
name:
- "postgresql-{{ postgres_version }}"
- "postgresql-contrib-{{ postgres_version }}"
- python3-psycopg2
state: present
- name: Ensure PostgreSQL is running
systemd:
name: postgresql
state: started
enabled: yes
- name: Create application database
postgresql_db:
name: "{{ db_name }}"
state: present
become_user: postgres
- name: Create application user
postgresql_user:
name: "{{ db_user }}"
password: "{{ db_password }}"
db: "{{ db_name }}"
priv: ALL
state: present
become_user: postgres
- name: Configure PostgreSQL for network access
lineinfile:
path: "/etc/postgresql/{{ postgres_version }}/main/postgresql.conf"
regexp: "^#?listen_addresses"
line: "listen_addresses = '*'"
notify: restart postgresql
- name: Allow application servers to connect
postgresql_pg_hba:
dest: "/etc/postgresql/{{ postgres_version }}/main/pg_hba.conf"
contype: host
users: "{{ db_user }}"
source: "{{ hostvars[item]['ansible_default_ipv4']['address'] }}/32"
databases: "{{ db_name }}"
method: md5
loop: "{{ groups['appservers'] }}"
notify: restart postgresql
handlers:
- name: restart postgresql
systemd:
name: postgresql
state: restarted
- name: Run database migrations
hosts: appservers[0]
become: yes
become_user: myapp
tasks:
- name: Run migrations
command: npm run migrate
args:
chdir: /opt/myapp/app
run_once: yes
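Example 2 references an app-service.j2 template for the application's systemd unit. A minimal sketch, assuming the app is started with npm start and reads its configuration from the .env file deployed above (adjust the npm path and start command to your application):

```jinja
# templates/app-service.j2
[Unit]
Description={{ app_name }} application
After=network.target

[Service]
Type=simple
User={{ app_name }}
WorkingDirectory=/opt/{{ app_name }}/app
# Environment file deployed by the "Create environment file" task
EnvironmentFile=/opt/{{ app_name }}/.env
ExecStart=/usr/bin/npm start
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```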
Example 3: Disaster Recovery and Backup Automation
Comprehensive backup solution with rotation and off-site storage:
---
# playbooks/backup-automation.yml
- name: Configure automated backups
hosts: all
become: yes
vars:
backup_dir: "/var/backups"
backup_retention_days: 7
backup_s3_bucket: "company-backups"
backup_schedule: "0 2 * * *" # 2 AM daily
tasks:
- name: Install backup tools
apt:
name:
- rsync
- borgbackup
- awscli
- pigz
state: present
- name: Create backup directory
file:
path: "{{ backup_dir }}"
state: directory
mode: '0700'
owner: root
group: root
- name: Create backup script
copy:
content: |
#!/bin/bash
set -euo pipefail
# Configuration
BACKUP_DIR="{{ backup_dir }}"
RETENTION_DAYS={{ backup_retention_days }}
S3_BUCKET="{{ backup_s3_bucket }}"
HOSTNAME=$(hostname -f)
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
# Logging
LOG_FILE="${BACKUP_DIR}/backup.log"
exec 1> >(tee -a "${LOG_FILE}")
exec 2>&1
echo "=== Backup started at $(date) ==="
# Backup system files
echo "Backing up system files..."
tar -czf "${BACKUP_DIR}/system_${TIMESTAMP}.tar.gz" \
/etc \
/home \
/root \
--exclude='/home/*/.cache' \
--exclude='/home/*/tmp'
{% if 'databases' in group_names %}
# Database backup
echo "Backing up databases..."
if systemctl is-active --quiet postgresql; then
sudo -u postgres pg_dumpall | pigz > "${BACKUP_DIR}/postgres_${TIMESTAMP}.sql.gz"
fi
if systemctl is-active --quiet mariadb; then
mysqldump --all-databases --single-transaction | pigz > "${BACKUP_DIR}/mysql_${TIMESTAMP}.sql.gz"
fi
{% endif %}
{% if 'webservers' in group_names %}
# Web content backup
echo "Backing up web content..."
tar -czf "${BACKUP_DIR}/web_${TIMESTAMP}.tar.gz" /var/www
{% endif %}
# Upload to S3
echo "Uploading to S3..."
aws s3 sync "${BACKUP_DIR}" "s3://${S3_BUCKET}/${HOSTNAME}/" \
--exclude "*.log" \
--storage-class STANDARD_IA
# Cleanup old local backups
echo "Cleaning up old backups..."
find "${BACKUP_DIR}" -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "*.sql.gz" -mtime +${RETENTION_DAYS} -delete
echo "=== Backup completed at $(date) ==="
dest: /usr/local/bin/automated-backup.sh
mode: '0700'
owner: root
group: root
- name: Ensure AWS config directory exists
file:
path: /root/.aws
state: directory
mode: '0700'
- name: Configure AWS credentials
template:
src: templates/aws-credentials.j2
dest: /root/.aws/credentials
mode: '0600'
- name: Schedule backup cron job
cron:
name: "Automated system backup"
minute: "{{ backup_schedule.split()[0] }}"
hour: "{{ backup_schedule.split()[1] }}"
job: "/usr/local/bin/automated-backup.sh"
state: present
- name: Create backup monitoring script
copy:
content: |
#!/bin/bash
BACKUP_DIR="{{ backup_dir }}"
MAX_AGE_HOURS=26
LATEST_BACKUP=$(find "${BACKUP_DIR}" -name "*.tar.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -f2- -d" ")
if [ -z "$LATEST_BACKUP" ]; then
echo "CRITICAL: No backups found"
exit 2
fi
AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))
if [ $AGE_HOURS -gt $MAX_AGE_HOURS ]; then
echo "WARNING: Latest backup is ${AGE_HOURS} hours old"
exit 1
fi
echo "OK: Latest backup is ${AGE_HOURS} hours old"
exit 0
dest: /usr/local/bin/check-backup.sh
mode: '0755'
- name: Test backup script
command: /usr/local/bin/automated-backup.sh
async: 3600
poll: 0
register: backup_test
- name: Verify backup completion
async_status:
jid: "{{ backup_test.ansible_job_id }}"
register: job_result
until: job_result.finished
retries: 60
delay: 60
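The aws-credentials.j2 template should pull its secrets from Ansible Vault rather than plaintext variables. A minimal sketch; vault_aws_access_key and vault_aws_secret_key are assumed vault-encrypted variable names, not ones defined elsewhere in this guide:

```jinja
# templates/aws-credentials.j2
[default]
aws_access_key_id = {{ vault_aws_access_key }}
aws_secret_access_key = {{ vault_aws_secret_key }}
```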
Example 4: Zero-Downtime Rolling Deployments
Implement blue-green deployments with health checks:
---
# playbooks/rolling-deployment.yml
- name: Blue-Green deployment with zero downtime
hosts: webservers
become: yes
serial: 1
max_fail_percentage: 0
vars:
app_name: "webapp"
app_version: "{{ deploy_version }}"
app_port: 8080
health_check_url: "http://localhost:{{ app_port }}/health"
health_check_retries: 30
health_check_delay: 2
pre_tasks:
- name: Remove from load balancer
haproxy:
state: disabled
host: "{{ inventory_hostname }}"
socket: /run/haproxy/admin.sock
backend: app_backend
delegate_to: "{{ item }}"
loop: "{{ groups['loadbalancers'] }}"
- name: Wait for connections to drain
wait_for:
timeout: 10
tasks:
- name: Stop current application
systemd:
name: "{{ app_name }}"
state: stopped
- name: Backup current version
command: >
mv /opt/{{ app_name }}/current
/opt/{{ app_name }}/rollback_{{ ansible_date_time.epoch }}
args:
removes: /opt/{{ app_name }}/current
- name: Deploy new version
git:
repo: "https://github.com/yourorg/{{ app_name }}.git"
dest: "/opt/{{ app_name }}/releases/{{ app_version }}"
version: "{{ app_version }}"
become_user: "{{ app_name }}"
- name: Install dependencies
npm:
path: "/opt/{{ app_name }}/releases/{{ app_version }}"
production: yes
become_user: "{{ app_name }}"
- name: Create symlink to current version
file:
src: "/opt/{{ app_name }}/releases/{{ app_version }}"
dest: "/opt/{{ app_name }}/current"
state: link
- name: Start application
systemd:
name: "{{ app_name }}"
state: started
- name: Wait for application health check
uri:
url: "{{ health_check_url }}"
status_code: 200
timeout: 5
register: health_check
until: health_check.status == 200
retries: "{{ health_check_retries }}"
delay: "{{ health_check_delay }}"
failed_when: false
- name: Rollback if health check fails
block:
- name: Stop failed deployment
systemd:
name: "{{ app_name }}"
state: stopped
- name: Restore previous version
shell: |
rm -f /opt/{{ app_name }}/current
ROLLBACK=$(ls -t /opt/{{ app_name }}/rollback_* | head -1)
mv "$ROLLBACK" /opt/{{ app_name }}/current
args:
executable: /bin/bash
- name: Start rolled back version
systemd:
name: "{{ app_name }}"
state: started
- name: Fail deployment
fail:
msg: "Deployment failed health check, rolled back to previous version"
when: health_check.status != 200
post_tasks:
- name: Add back to load balancer
haproxy:
state: enabled
host: "{{ inventory_hostname }}"
socket: /run/haproxy/admin.sock
backend: app_backend
delegate_to: "{{ item }}"
loop: "{{ groups['loadbalancers'] }}"
- name: Verify in load balancer rotation
uri:
url: "http://{{ hostvars[item]['ansible_default_ipv4']['address'] }}/haproxy?stats"
return_content: yes
delegate_to: "{{ item }}"
loop: "{{ groups['loadbalancers'] }}"
register: lb_status
failed_when: inventory_hostname not in lb_status.content
- name: Cleanup old releases
shell: |
cd /opt/{{ app_name }}/releases
ls -t | tail -n +4 | xargs -r rm -rf
cd /opt/{{ app_name }}
ls -t rollback_* 2>/dev/null | tail -n +3 | xargs -r rm -rf
args:
executable: /bin/bash
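The pre_tasks and post_tasks above assume the HAProxy configuration from Example 2 exposes an admin socket at /run/haproxy/admin.sock and defines a backend named app_backend whose server names match the inventory hostnames. A sketch of the relevant parts of haproxy.cfg.j2 under those assumptions (the port 3000 must match the app_port used on the app servers):

```jinja
# templates/haproxy.cfg.j2 (abridged sketch)
global
    log /dev/log local0
    # Admin socket used by the haproxy module to enable/disable servers
    stats socket /run/haproxy/admin.sock mode 660 level admin

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend http_in
    bind *:80
    default_backend app_backend

backend app_backend
    balance roundrobin
    option httpchk GET /health
{% for host in groups['appservers'] %}
    # Server name = inventory hostname, so haproxy module host: matches
    server {{ host }} {{ hostvars[host]['ansible_default_ipv4']['address'] }}:3000 check
{% endfor %}
```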
Example 5: Infrastructure Monitoring Setup
Deploy complete monitoring stack with Prometheus and Grafana:
---
# playbooks/monitoring-stack.yml
- name: Deploy Prometheus monitoring
hosts: monitoring
become: yes
vars:
prometheus_version: "2.45.0"
tasks:
- name: Create prometheus user
user:
name: prometheus
system: yes
shell: /bin/false
create_home: no
- name: Create prometheus directories
file:
path: "{{ item }}"
state: directory
owner: prometheus
group: prometheus
mode: '0755'
loop:
- /etc/prometheus
- /var/lib/prometheus
- name: Download Prometheus
get_url:
url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-amd64.tar.gz"
dest: /tmp/prometheus.tar.gz
- name: Extract Prometheus
unarchive:
src: /tmp/prometheus.tar.gz
dest: /tmp
remote_src: yes
- name: Copy Prometheus binaries
copy:
src: "/tmp/prometheus-{{ prometheus_version }}.linux-amd64/{{ item }}"
dest: "/usr/local/bin/{{ item }}"
mode: '0755'
remote_src: yes
loop:
- prometheus
- promtool
- name: Configure Prometheus
template:
src: templates/prometheus.yml.j2
dest: /etc/prometheus/prometheus.yml
owner: prometheus
group: prometheus
mode: '0644'
notify: reload prometheus
- name: Create Prometheus systemd service
copy:
content: |
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
dest: /etc/systemd/system/prometheus.service
mode: '0644'
notify:
- reload systemd
- restart prometheus
- name: Start Prometheus
systemd:
name: prometheus
state: started
enabled: yes
# Grafana installation
- name: Add Grafana GPG key
apt_key:
url: https://packages.grafana.com/gpg.key
state: present
- name: Add Grafana repository
apt_repository:
repo: "deb https://packages.grafana.com/oss/deb stable main"
state: present
filename: grafana
- name: Install Grafana
apt:
name: grafana
state: present
update_cache: yes
- name: Configure Grafana
template:
src: templates/grafana.ini.j2
dest: /etc/grafana/grafana.ini
mode: '0640'
owner: grafana
group: grafana
notify: restart grafana
- name: Start Grafana
systemd:
name: grafana-server
state: started
enabled: yes
- name: Configure firewall for Prometheus
ufw:
rule: allow
port: '9090'
proto: tcp
- name: Configure firewall for Grafana
ufw:
rule: allow
port: '3000'
proto: tcp
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart prometheus
systemd:
name: prometheus
state: restarted
- name: reload prometheus
systemd:
name: prometheus
state: reloaded
- name: restart grafana
systemd:
name: grafana-server
state: restarted
- name: Deploy Node Exporters
hosts: all
become: yes
vars:
node_exporter_version: "1.7.0"
tasks:
- name: Create node_exporter user
user:
name: node_exporter
system: yes
shell: /bin/false
create_home: no
- name: Download Node Exporter
get_url:
url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
dest: /tmp/node_exporter.tar.gz
- name: Extract Node Exporter
unarchive:
src: /tmp/node_exporter.tar.gz
dest: /tmp
remote_src: yes
- name: Copy Node Exporter binary
copy:
src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
dest: /usr/local/bin/node_exporter
mode: '0755'
remote_src: yes
- name: Create Node Exporter systemd service
copy:
content: |
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/) \
--collector.netclass.ignored-devices=^(veth.*|docker.*|br-.*)$$
[Install]
WantedBy=multi-user.target
dest: /etc/systemd/system/node_exporter.service
mode: '0644'
- name: Start Node Exporter
systemd:
name: node_exporter
state: started
enabled: yes
daemon_reload: yes
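The prometheus.yml.j2 template referenced earlier can generate its scrape targets straight from the inventory, so every host running Node Exporter (default port 9100) is picked up automatically. A minimal sketch; it assumes facts have been gathered so ansible_default_ipv4 is populated for each host:

```jinja
# templates/prometheus.yml.j2
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    static_configs:
      - targets:
{% for host in groups['all'] %}
          - "{{ hostvars[host]['ansible_default_ipv4']['address'] }}:9100"
{% endfor %}
```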
Example 6: Security Compliance and Hardening
Implement CIS benchmarks and security best practices:
---
# playbooks/security-hardening.yml
- name: Apply security hardening
hosts: all
become: yes
vars:
allowed_ssh_users: ["admin", "deploy"]
ssh_port: 22
max_auth_tries: 3
password_max_days: 90
password_min_days: 1
password_warn_age: 7
tasks:
# System updates
- name: Update all packages
apt:
upgrade: dist
update_cache: yes
autoremove: yes
autoclean: yes
- name: Install security tools
apt:
name:
- aide
- auditd
- fail2ban
- rkhunter
- lynis
state: present
# SSH hardening
- name: Configure SSH daemon
lineinfile:
path: /etc/ssh/sshd_config
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
state: present
validate: '/usr/sbin/sshd -t -f %s'
loop:
- { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
- { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
- { regexp: '^#?PubkeyAuthentication', line: 'PubkeyAuthentication yes' }
- { regexp: '^#?PermitEmptyPasswords', line: 'PermitEmptyPasswords no' }
- { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
- { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries {{ max_auth_tries }}' }
- { regexp: '^#?ClientAliveInterval', line: 'ClientAliveInterval 300' }
- { regexp: '^#?ClientAliveCountMax', line: 'ClientAliveCountMax 2' }
- { regexp: '^#?AllowUsers', line: 'AllowUsers {{ allowed_ssh_users | join(" ") }}' }
notify: restart sshd
# Password policies
- name: Configure password aging
lineinfile:
path: /etc/login.defs
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
loop:
- { regexp: '^PASS_MAX_DAYS', line: 'PASS_MAX_DAYS {{ password_max_days }}' }
- { regexp: '^PASS_MIN_DAYS', line: 'PASS_MIN_DAYS {{ password_min_days }}' }
- { regexp: '^PASS_WARN_AGE', line: 'PASS_WARN_AGE {{ password_warn_age }}' }
# Kernel hardening
- name: Configure sysctl security parameters
sysctl:
name: "{{ item.name }}"
value: "{{ item.value }}"
state: present
reload: yes
sysctl_file: /etc/sysctl.d/99-security.conf
loop:
# Network security
- { name: 'net.ipv4.conf.all.rp_filter', value: '1' }
- { name: 'net.ipv4.conf.default.rp_filter', value: '1' }
- { name: 'net.ipv4.icmp_echo_ignore_broadcasts', value: '1' }
- { name: 'net.ipv4.conf.all.accept_source_route', value: '0' }
- { name: 'net.ipv4.conf.default.accept_source_route', value: '0' }
- { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
- { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
- { name: 'net.ipv4.conf.all.secure_redirects', value: '0' }
- { name: 'net.ipv4.conf.default.secure_redirects', value: '0' }
- { name: 'net.ipv4.conf.all.send_redirects', value: '0' }
- { name: 'net.ipv4.conf.default.send_redirects', value: '0' }
- { name: 'net.ipv4.tcp_syncookies', value: '1' }
- { name: 'net.ipv4.tcp_timestamps', value: '0' }
# Kernel security
- { name: 'kernel.dmesg_restrict', value: '1' }
- { name: 'kernel.kptr_restrict', value: '2' }
- { name: 'kernel.yama.ptrace_scope', value: '1' }
- { name: 'fs.suid_dumpable', value: '0' }
# Fail2Ban configuration
- name: Configure Fail2Ban for SSH
copy:
content: |
[sshd]
enabled = true
port = {{ ssh_port }}
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600
dest: /etc/fail2ban/jail.d/sshd.conf
mode: '0644'
notify: restart fail2ban
# Audit daemon
- name: Configure auditd rules
copy:
content: |
# Delete all existing rules
-D
# Buffer size
-b 8192
# Failure mode
-f 1
# Monitor user/group changes
-w /etc/group -p wa -k identity
-w /etc/passwd -p wa -k identity
-w /etc/gshadow -p wa -k identity
-w /etc/shadow -p wa -k identity
# Monitor system calls
-a always,exit -F arch=b64 -S adjtimex -S settimeofday -k time-change
-a always,exit -F arch=b32 -S adjtimex -S settimeofday -S stime -k time-change
# Monitor network environment
-a always,exit -F arch=b64 -S sethostname -S setdomainname -k system-locale
-a always,exit -F arch=b32 -S sethostname -S setdomainname -k system-locale
# Monitor login/logout events
-w /var/log/faillog -p wa -k logins
-w /var/log/lastlog -p wa -k logins
# Monitor sudo usage
-w /etc/sudoers -p wa -k sudo_changes
-w /etc/sudoers.d/ -p wa -k sudo_changes
dest: /etc/audit/rules.d/hardening.rules
mode: '0640'
notify: restart auditd
# File integrity monitoring
- name: Initialize AIDE database
command: aideinit
args:
creates: /var/lib/aide/aide.db.new
- name: Setup AIDE cron job
cron:
name: "AIDE file integrity check"
minute: "0"
hour: "5"
job: "/usr/bin/aide --check | mail -s 'AIDE Report' root@localhost"
# Disable unnecessary services
- name: Disable unnecessary services
systemd:
name: "{{ item }}"
state: stopped
enabled: no
loop:
- bluetooth
- cups
- avahi-daemon
ignore_errors: yes
# Remove unnecessary packages
- name: Remove unnecessary packages
apt:
name:
- telnet
- rsh-client
- rsh-redone-client
state: absent
purge: yes
handlers:
- name: restart sshd
systemd:
name: sshd
state: restarted
- name: restart fail2ban
systemd:
name: fail2ban
state: restarted
- name: restart auditd
systemd:
name: auditd
state: restarted
Best Practices for Production Playbooks
1. Use Ansible Vault for Secrets
# Create encrypted variable file
ansible-vault create group_vars/production/vault.yml
# Edit encrypted file
ansible-vault edit group_vars/production/vault.yml
# Content example:
vault_mysql_root_password: "super_secret_password"
vault_api_keys:
aws: "AKIAIOSFODNN7EXAMPLE"
sendgrid: "SG.example123"
Reference in playbooks:
vars:
mysql_root_password: "{{ vault_mysql_root_password }}"
2. Implement Proper Error Handling
- name: Task with error handling
command: /usr/bin/some-command
register: result
failed_when: false
changed_when: result.rc == 0
- name: Handle errors gracefully
block:
- name: Risky operation
command: /usr/bin/risky-command
rescue:
- name: Handle failure
debug:
msg: "Command failed, rolling back"
- name: Rollback action
command: /usr/bin/rollback-command
always:
- name: Cleanup
file:
path: /tmp/tempfile
state: absent
3. Use Tags Strategically
- name: Full application setup
hosts: appservers
tasks:
- name: Install dependencies
apt:
name: "{{ packages }}"
tags: [install, packages]
- name: Deploy code
git:
repo: "{{ repo_url }}"
dest: /opt/app
tags: [deploy, code]
- name: Configure application
template:
src: config.j2
dest: /opt/app/config.yml
tags: [configure, config]
Run specific tags:
ansible-playbook site.yml --tags "deploy"
ansible-playbook site.yml --tags "install,configure"
4. Implement Testing and Validation
- name: Validate deployment
hosts: webservers
tasks:
- name: Check if service is running
systemd:
name: nginx
state: started
check_mode: yes
register: service_status
failed_when: false
- name: Verify HTTP response
uri:
url: http://localhost
status_code: 200
timeout: 5
register: http_check
until: http_check.status == 200
retries: 5
delay: 2
- name: Validate configuration syntax
command: nginx -t
changed_when: false
- name: Assert all checks passed
assert:
that:
- service_status.state == "started"
- http_check.status == 200
fail_msg: "Validation failed"
success_msg: "All validations passed"
5. Document with Comments and Metadata
---
# ============================================================================
# Playbook: production-deployment.yml
# Description: Deploy application to production environment
# Author: DevOps Team <[email protected]>
# Version: 2.1.0
# Last Updated: 2024-01-15
#
# Dependencies:
# - Ansible 2.9+
# - Python 3.6+
# - AWS CLI configured
#
# Variables Required:
# - deploy_version: Application version to deploy
# - environment: Target environment (production/staging)
#
# Usage:
# ansible-playbook production-deployment.yml -e deploy_version=v1.2.3
# ============================================================================
- name: Deploy application (v{{ deploy_version }})
hosts: production
# Task execution settings
serial: 2 # Deploy 2 servers at a time
max_fail_percentage: 10 # Fail if more than 10% of hosts fail
tasks:
# Each task should have a clear, descriptive name
- name: Validate deployment prerequisites
assert:
that:
- deploy_version is defined
- deploy_version is match('^v[0-9]+\.[0-9]+\.[0-9]+$')
fail_msg: "deploy_version must be in format v1.2.3"
Troubleshooting Playbook Executions
Debug Failed Tasks
- name: Debug playbook execution
hosts: all
tasks:
- name: Run command with debugging
command: /usr/bin/my-command
register: command_result
ignore_errors: yes
- name: Display command output
debug:
var: command_result
verbosity: 2
- name: Show specific values
debug:
msg: "Return code: {{ command_result.rc }}, Output: {{ command_result.stdout }}"
Run with verbosity:
ansible-playbook debug.yml -v # verbose
ansible-playbook debug.yml -vv # more verbose
ansible-playbook debug.yml -vvv # debug
ansible-playbook debug.yml -vvvv # connection debug
Dry Run and Check Mode
# Test without making changes
ansible-playbook site.yml --check
# Show what would change
ansible-playbook site.yml --check --diff
# Step through playbook interactively
ansible-playbook site.yml --step
Conclusion
These practical Ansible playbook examples demonstrate real-world automation scenarios that you can adapt to your infrastructure needs. From simple LEMP stack deployments to complex multi-tier applications with zero-downtime deployments, Ansible provides the flexibility and power to automate virtually any infrastructure task.
Key takeaways:
- Structure playbooks for reusability and maintainability
- Implement proper error handling and rollback mechanisms
- Use variables and templates for environment-specific configurations
- Apply security best practices from the start
- Test thoroughly before deploying to production
- Document your playbooks comprehensively
- Use version control for all Ansible code
As you build your Ansible automation library, focus on creating idempotent, well-tested playbooks that can be safely run multiple times. Start with simple playbooks and gradually increase complexity as you gain experience. Remember that the goal is not just to automate, but to create reliable, maintainable infrastructure-as-code that your entire team can understand and contribute to.
Continue exploring advanced topics such as custom modules, dynamic inventory, Ansible Tower/AWX for enterprise orchestration, and integration with CI/CD pipelines to take your automation to the next level.