Server Configuration Drift Detection

Configuration drift occurs when actual server state diverges from the desired configuration because of manual changes, failed deployments, or untracked modifications. Detecting and remediating drift is critical for keeping infrastructure consistent and predictable. This guide covers drift detection with Ansible, Terraform, AIDE, and Tripwire, along with package monitoring, automated detection, remediation, and alerting.

Table of Contents

  1. Configuration Drift Overview
  2. Ansible Check Mode
  3. Terraform Plan Analysis
  4. File Integrity Monitoring with AIDE
  5. Tripwire for Change Detection
  6. System Package Monitoring
  7. Automated Drift Detection
  8. Drift Remediation
  9. Monitoring and Alerting
  10. Conclusion

Configuration Drift Overview

Configuration drift is the deviation of actual infrastructure from the declared desired state. It occurs through ad-hoc changes, manual fixes, security patches, or failed deployments.

Common causes:

  • Manual Changes: Direct SSH modifications to fix issues
  • Failed Deployments: Incomplete automation updates
  • Security Patches: OS updates outside of automation
  • Third-party Tools: Changes from monitoring or logging systems
  • Emergency Fixes: Temporary changes during incidents
  • Unchecked Automation: Automation that doesn't idempotently enforce state

Drift detection benefits:

  • Consistency: Ensure infrastructure matches configuration
  • Security: Detect unauthorized changes
  • Compliance: Maintain compliance state
  • Predictability: Know actual vs desired state
  • Audit Trail: Track what changed and when

Drift detection workflow:

┌────────────────────────────────────────┐
│    Desired State (Configuration)       │
└────────────────────────┬───────────────┘
                         │
                    Compare
                         │
                         ▼
┌────────────────────────────────────────┐
│    Actual State (Real Servers)         │
└────────────────────────────────────────┘
                         │
                    Drift Report
                         │
                         ▼
┌────────────────────────────────────────┐
│    Remediation (Auto or Manual)        │
└────────────────────────────────────────┘
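
The compare step in the diagram can be sketched for a single file. This is only a minimal illustration, assuming the desired state is available as a reference copy on disk; the function name check_drift and the paths in the usage line are hypothetical.

```shell
#!/bin/bash
# Compare one desired-state file with its live counterpart.
# check_drift and the example paths are illustrative names, not part of any tool.
check_drift() {
  local desired=$1 actual=$2
  if [ ! -e "$actual" ]; then
    echo "missing"      # the live file does not exist at all
  elif cmp -s "$desired" "$actual"; then
    echo "in-sync"      # byte-for-byte identical
  else
    echo "drifted"      # content differs from the desired copy
  fi
}

# Usage: check_drift /opt/desired/nginx.conf /etc/nginx/nginx.conf
```

The tools in the following sections apply the same idea at larger scale: they compare a declared or recorded state with what is actually on the server.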

Ansible Check Mode

Use Ansible's check mode to detect configuration drift without applying changes.

Check mode fundamentals:

# Run in check mode (dry-run, no changes applied)
ansible-playbook site.yml --check

# Check mode with verbose output
ansible-playbook site.yml --check -v

# Check specific hosts
ansible-playbook site.yml --check --limit webservers

# Show differences detected
ansible-playbook site.yml --check --diff
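
In check mode, a task that reports "changed" is one that would have modified the host, so a non-zero changed count in the PLAY RECAP is a reasonable drift signal. The sketch below lists such hosts; it assumes the usual recap line layout ("host : ok=N changed=N ..."), and the function name hosts_with_drift is hypothetical.

```shell
#!/bin/bash
# Print every host whose recap line shows changed > 0.
# Reads ansible-playbook output on stdin; hosts_with_drift is an illustrative name.
hosts_with_drift() {
  awk '/changed=[0-9]+/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^changed=/) {
        split($i, kv, "=")
        if (kv[2] + 0 > 0) print $1   # $1 is the host name
      }
    }
  }'
}

# Usage: ansible-playbook site.yml --check | hosts_with_drift
```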

Playbook for drift detection:

# drift-check.yml
# Run this playbook normally (without --check). Command tasks are skipped
# in check mode, so the read-only tasks below set check_mode themselves.
---
- hosts: all
  gather_facts: yes

  tasks:
    - name: Refresh package cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      changed_when: false

    - name: Check for pending package upgrades (simulated, nothing is installed)
      apt:
        upgrade: dist
      register: apt_check
      check_mode: yes

    - name: Verify Nginx installed
      package:
        name: nginx
        state: present
      register: nginx_check
      check_mode: yes

    - name: Check Nginx configuration
      stat:
        path: /etc/nginx/nginx.conf
      register: nginx_conf_stat

    - name: Validate Nginx configuration
      command: nginx -t
      register: nginx_validate
      changed_when: false
      failed_when: false  # record the result so the report is still produced

    - name: Check service status
      systemd:
        name: nginx
      register: nginx_status

    - name: Report drift
      debug:
        msg: |
          Drift Detection Report:
          - Packages available for update: {{ apt_check.changed }}
          - Nginx installed: {{ not nginx_check.changed }}
          - Config exists: {{ nginx_conf_stat.stat.exists }}
          - Config valid: {{ nginx_validate.rc == 0 }}
          - Service active: {{ nginx_status.status.ActiveState == 'active' }}

    - name: Save drift report
      copy:
        content: |
          Drift Report - {{ ansible_date_time.iso8601 }}

          Host: {{ inventory_hostname }}

          Packages:
          - Updates available: {{ apt_check.changed }}

          Nginx:
          - Installed: {{ not nginx_check.changed }}
          - Config valid: {{ nginx_validate.rc == 0 }}
          - Service active: {{ nginx_status.status.ActiveState }}
        dest: /var/log/drift-report-{{ ansible_date_time.date }}.txt

Ansible with check and enforce:

# deploy-with-drift-check.yml
# Assumes tasks/config-check.yml sets two facts: drift_detected (bool) and
# drift_changes (list of descriptions). Registering an include_tasks result
# does not expose the "changed" status of the included tasks.
# Enable remediation with: -e auto_remediate=true
---
- hosts: all
  gather_facts: yes

  tasks:
    - name: Check configuration
      block:
        - name: Run configuration check
          include_tasks: tasks/config-check.yml

        - name: Report drift
          debug:
            msg: "Configuration drift detected: {{ drift_changes | default([]) | join(', ') }}"
          when: drift_detected | default(false) | bool

      rescue:
        - name: Remediation needed
          debug:
            msg: "Configuration check failed; manual intervention may be needed"

    - name: Remediate drift
      block:
        - name: Apply configuration
          include_tasks: tasks/config-apply.yml

      when:
        - drift_detected | default(false) | bool
        - auto_remediate | default(false) | bool

    - name: Verify remediation
      include_tasks: tasks/config-check.yml
      when:
        - drift_detected | default(false) | bool
        - auto_remediate | default(false) | bool

Terraform Plan Analysis

Use Terraform's plan output to detect infrastructure drift.

Terraform plan for drift detection:

# Refresh state and plan
terraform plan -out=tfplan

# Show plan changes
terraform show tfplan

# Human-readable diff
terraform plan -out=tfplan && terraform show tfplan

# JSON representation of the saved plan, for analysis
terraform show -json tfplan > tfplan.json

# Check for specific resource changes
terraform plan -no-color | grep "will be created\|will be updated in-place\|will be destroyed\|must be replaced"

Drift detection script:

#!/bin/bash
# terraform-drift-check.sh

set -euo pipefail

TERRAFORM_DIR="${1:-.}"
DRIFT_REPORT="/tmp/drift-report-$(date +%s).txt"

cd "$TERRAFORM_DIR"

# "terraform plan" refreshes state itself; the standalone
# "terraform refresh" command is deprecated.
# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
echo "Running Terraform plan..."
plan_status=0
terraform plan -no-color -detailed-exitcode > "$DRIFT_REPORT" 2>&1 || plan_status=$?

case "$plan_status" in
  0)
    echo "No drift detected"
    exit 0
    ;;
  2)
    echo "Drift detected!"
    echo ""
    echo "Changes needed:"
    grep "will be created\|will be updated in-place\|will be destroyed\|must be replaced" "$DRIFT_REPORT" || true

    echo ""
    echo "Full report:"
    cat "$DRIFT_REPORT"

    exit 1
    ;;
  *)
    echo "Terraform plan failed:" >&2
    cat "$DRIFT_REPORT" >&2
    exit 2
    ;;
esac
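
A saved plan report can also be reduced to its summary counts, which is handy for logging or metrics. The following is a sketch that assumes the report was produced with -no-color and contains Terraform's usual "Plan: N to add, N to change, N to destroy." line; the function name plan_counts is hypothetical.

```shell
#!/bin/bash
# Extract the add/change/destroy counts from plain-text plan output on stdin.
# Prints nothing when there is no "Plan:" summary line (for example, no changes).
# plan_counts is an illustrative name.
plan_counts() {
  sed -n '/^Plan:/ s/.* \([0-9][0-9]*\) to add, \([0-9][0-9]*\) to change, \([0-9][0-9]*\) to destroy.*/add=\1 change=\2 destroy=\3/p'
}

# Usage: plan_counts < /tmp/drift-report-1700000000.txt
```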

Scheduled drift checks:

# Crontab entries for drift detection (install with "crontab -e").
# Each entry must fit on a single line; cron does not support
# backslash line continuation.

# Run every 6 hours and mail the plan
0 */6 * * * cd /opt/terraform && terraform plan -no-color 2>&1 | mail -s "Drift Detection Report" [email protected]

# More sophisticated with alerting: post each drift message to Slack
0 */6 * * * cd /opt/terraform && terraform plan -json | jq -c 'select(.type=="resource_drift") | {text: ("Terraform drift detected: " + .["@message"])}' | while read -r payload; do curl -X POST -H 'Content-Type: application/json' -d "$payload" https://slack-webhook.example.com; done

File Integrity Monitoring with AIDE

AIDE (Advanced Intrusion Detection Environment) monitors file changes.

Install and configure AIDE:

# Install AIDE
sudo apt-get install -y aide aide-common

# Initialize database
sudo aideinit

# Wait for database creation (can take several minutes)
# This creates /var/lib/aide/aide.db.new

# Move to production location
sudo mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db

AIDE configuration:

# /etc/aide/aide.conf.d/custom
# Monitor application directories
/opt/app R+b+sha512
/etc/app R+b+sha512

# Monitor critical system files
/etc/passwd R+b+sha512
/etc/shadow R+b+sha512
/etc/sudoers R+b+sha512

# Exclude frequently changing files
!/var/log
!/var/cache
!/tmp

Run AIDE checks:

# Check against database
sudo aide --check

# Generate report
sudo aide --check > /tmp/aide-report.txt

# Compare the baseline database with the database_new file named in the configuration
sudo aide --compare

# Update database after approved changes
sudo aide --update
sudo mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
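
The exit status of aide --check encodes what was found: according to the AIDE manual, 0 means no differences, values 1 to 7 are a bitmask of added (1), removed (2), and changed (4) entries, and 14 or higher indicates an error. The sketch below decodes that status; the function name decode_aide_status is hypothetical, and any status above 7 is simply reported as an error.

```shell
#!/bin/bash
# Translate an "aide --check" exit status into words.
# decode_aide_status is an illustrative name.
decode_aide_status() {
  local status=$1
  local parts=()
  if [ "$status" -eq 0 ]; then
    echo "clean"
    return 0
  fi
  if [ "$status" -gt 7 ]; then
    echo "error"
    return 0
  fi
  if (( status & 1 )); then parts+=("added"); fi
  if (( status & 2 )); then parts+=("removed"); fi
  if (( status & 4 )); then parts+=("changed"); fi
  echo "${parts[*]}"
}

# Usage:
#   sudo aide --check > /tmp/aide-report.txt; decode_aide_status "$?"
```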

Automated AIDE monitoring:

#!/bin/bash
# aide-monitor.sh

REPORT_FILE="/var/log/aide-report-$(date +%Y%m%d).txt"

# Run check. Exit status 0 means no differences; 1-7 is a bitmask of
# added (1), removed (2), and changed (4) entries; higher values are errors.
sudo aide --check > "$REPORT_FILE" 2>&1
status=$?

if [ "$status" -eq 0 ]; then
  echo "No changes detected"
  exit 0
elif [ "$status" -le 7 ]; then
  echo "File changes detected (aide exit status $status)"

  # Send alert with the full report
  mail -s "AIDE: File Changes Detected" [email protected] < "$REPORT_FILE"

  exit 1
else
  echo "AIDE failed with exit status $status" >&2
  exit 2
fi

Cron job for AIDE:

# Run AIDE checks hourly
0 * * * * /usr/local/bin/aide-monitor.sh

# Run AIDE checks daily
0 2 * * * sudo aide --check > /var/log/aide-daily-$(date +\%Y\%m\%d).txt 2>&1

Tripwire for Change Detection

Tripwire provides advanced file integrity monitoring.

Install Tripwire:

# Ubuntu/Debian
sudo apt-get install -y tripwire

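# Configure (source and RPM installs; the Debian/Ubuntu package asks for
# the keys during installation instead)
sudo twinstall.sh

# Set the site and local passphrases when prompted.
# Tripwire has no default password; choose strong passphrases and store them safely.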

# Initialize database
sudo tripwire --init

# Create baseline report
sudo tripwire --check --email-report

Tripwire policy configuration:

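# /etc/tripwire/twpol.txt

# Variable definitions (declared before the rules that use them)
# p=permissions i=inode n=link count u=owner g=group s=size b=blocks
# m=mtime c=ctime M=MD5 S=SHA
# Tripwire's hash properties are C (CRC-32), M (MD5), S (SHA), and H (HAVAL);
# it has no RIPEMD-160 property, so SHA is used here.
NORMAL          = +pinugsbmcMS;
PERMS           = +pug;

# Monitor application
/opt/app        -> $(NORMAL);
/opt/app/bin    -> $(NORMAL);
/opt/app/conf   -> $(NORMAL);

# Monitor system configuration
/etc/passwd     -> $(PERMS);
/etc/shadow     -> $(PERMS);
/etc/sudoers    -> $(PERMS);
/etc/hosts      -> $(NORMAL);

# Skip frequently changing files
!/var/log;
!/var/cache;
!/tmp;
!/var/tmp;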

Run Tripwire checks:

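# Compile and sign the policy file
sudo twadmin --create-polfile -S /etc/tripwire/site.key /etc/tripwire/twpol.txt

# Check integrity
sudo tripwire --check

# Email report
sudo tripwire --check --email-report

# Generate a detailed text report from a saved report file
# (the .twr file name below is an example)
sudo tripwire --check --twrfile /var/lib/tripwire/report/drift-check.twr
sudo twprint --print-report --report-level 3 --twrfile /var/lib/tripwire/report/drift-check.twr > /tmp/tripwire-report.txt

# Update database after approved changes. Re-running --init here would
# instead rebuild the baseline from scratch and accept every current change.
sudo tripwire --update --twrfile /var/lib/tripwire/report/drift-check.twr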

System Package Monitoring

Monitor installed packages for drift.

Package inventory:

#!/bin/bash
# package-monitor.sh

# Generate package list
dpkg -l > /var/log/packages-installed.txt

# Compare with previous
if [ -f /var/log/packages-installed.prev ]; then
  diff /var/log/packages-installed.prev /var/log/packages-installed.txt > /tmp/package-changes.txt
  
  if [ -s /tmp/package-changes.txt ]; then
    echo "Package changes detected:"
    cat /tmp/package-changes.txt
    
    # Send alert
    mail -s "Package Changes Detected" [email protected] < /tmp/package-changes.txt
  fi
fi

# Update baseline
cp /var/log/packages-installed.txt /var/log/packages-installed.prev
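
Raw diff output on dpkg -l listings can be noisy. As an alternative sketch, the two inventories can be reduced to "name version" lines and compared as sets, so that each added or removed entry is reported once. It assumes the inventories were written with dpkg-query as shown in the usage comment; the function name package_changes is hypothetical.

```shell
#!/bin/bash
# Report package lines that exist only in the old ("-") or only in the
# new ("+") inventory. An upgraded package shows up as one "-" and one "+" line.
# package_changes is an illustrative name.
package_changes() {
  local old=$1 new=$2
  LC_ALL=C comm -23 <(LC_ALL=C sort "$old") <(LC_ALL=C sort "$new") | sed 's/^/- /'
  LC_ALL=C comm -13 <(LC_ALL=C sort "$old") <(LC_ALL=C sort "$new") | sed 's/^/+ /'
}

# Usage:
#   dpkg-query -W -f='${Package} ${Version}\n' > /var/log/packages-installed.txt
#   package_changes /var/log/packages-installed.prev /var/log/packages-installed.txt
```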

Check for security updates:

#!/bin/bash
# security-updates.sh

echo "Checking for available security updates..."

# Count security updates. Matching lines look like
# "pkg/jammy-security 1.2 amd64 [upgradable from: 1.1]".
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -c -- "-security")

if [ "$SECURITY_UPDATES" -gt 0 ]; then
  echo "Security updates available: $SECURITY_UPDATES"
  
  # List them
  apt list --upgradable 2>/dev/null | grep -- "-security"
  
  # Alert
  echo "Security updates available" | \
    mail -s "Security Updates Needed" [email protected]
fi

Automated Drift Detection

Set up automated monitoring systems.

Drift detection daemon:

#!/bin/bash
# drift-detection-daemon.sh

DRIFT_CHECK_INTERVAL=3600  # 1 hour
DRIFT_LOG="/var/log/drift-detection.log"

while true; do
  echo "$(date): Running drift detection..." >> "$DRIFT_LOG"
  
  # Run Terraform plan (-json emits one JSON object per line)
  (cd /opt/terraform && terraform plan -json | \
    jq -r 'select(.type=="resource_drift") | .["@message"]' >> "$DRIFT_LOG") || true
  
  # Run Ansible check
  (ansible-playbook /opt/ansible/drift-check.yml --check --diff >> "$DRIFT_LOG") || true
  
  # Run file integrity check
  sudo aide --check >> "$DRIFT_LOG" 2>&1 || true
  
  # Sleep before next check
  sleep "$DRIFT_CHECK_INTERVAL"
done
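
The loop above discards each check's exit status with "|| true", so the log does not show which check found drift or failed. One possible refinement, sketched here with a hypothetical helper named run_check, records the status of every check in the same log file.

```shell
#!/bin/bash
# Log file used by the daemon; can be overridden through the environment.
DRIFT_LOG="${DRIFT_LOG:-/var/log/drift-detection.log}"

# Run one check, append its output to the log, then append a status line.
# Always returns 0 so a failing check does not stop the daemon loop.
# run_check is an illustrative name.
run_check() {
  local name=$1
  shift
  local status=0
  "$@" >> "$DRIFT_LOG" 2>&1 || status=$?
  printf '%s check=%s exit=%d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$status" >> "$DRIFT_LOG"
  return 0
}

# Usage inside the loop:
#   run_check ansible ansible-playbook /opt/ansible/drift-check.yml --check --diff
#   run_check aide sudo aide --check
```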

Systemd service for drift detection:

# /etc/systemd/system/drift-detection.service
[Unit]
Description=Configuration Drift Detection
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/drift-detection-daemon.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Drift Remediation

Automatically fix drift when detected.

Auto-remediation workflow:

# remediate-drift.yml
# Assumes tasks/drift-check.yml sets two facts: drift_detected (bool) and
# drift_changes (list). A registered include_tasks result does not carry them.
---
- hosts: all
  serial: 1  # One host at a time
  
  tasks:
    - name: Check for drift
      include_tasks: tasks/drift-check.yml

    - name: Remember the initial result
      set_fact:
        drift_before: "{{ drift_detected | default(false) | bool }}"

    - name: Log drift detection
      lineinfile:
        path: /var/log/drift-remediation.log
        line: "[{{ ansible_date_time.iso8601 }}] Drift detected on {{ inventory_hostname }}: {{ drift_changes | default([]) | join(', ') }}"
        create: yes
      delegate_to: localhost
      when: drift_before | bool

    - name: Remediate drift
      when: drift_before | bool
      block:
        - name: Apply desired configuration
          command: terraform apply -auto-approve
          args:
            chdir: /opt/terraform
          register: remediation_result

        - name: Verify remediation
          include_tasks: tasks/drift-check.yml

        - name: Report success
          debug:
            msg: "Drift remediated successfully"
          when: not (drift_detected | default(false) | bool)

      rescue:
        - name: Remediation failed
          debug:
            msg: "Failed to remediate drift"

        - name: Alert on failure
          mail:
            host: smtp.example.com
            port: 25
            subject: "Drift Remediation Failed - {{ inventory_hostname }}"
            body: "Automatic remediation failed. Manual intervention required."
            to: [email protected]

Monitoring and Alerting

Alert on drift detection and remediation actions.

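Prometheus alert rules:

# /etc/prometheus/rules/drift.yml
# drift_detection_changes_total and aide_violations_total are custom metrics
# that your drift checks must export; Prometheus does not provide them.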
groups:
  - name: drift_detection
    interval: 1m
    rules:
      - alert: ConfigurationDriftDetected
        expr: drift_detection_changes_total > 0
        for: 5m
        annotations:
          summary: "Configuration drift detected on {{ $labels.instance }}"
          description: "{{ $value }} configuration changes detected"

      - alert: FileIntegrityViolation
        expr: aide_violations_total > 0
        for: 5m
        annotations:
          summary: "File integrity violation on {{ $labels.instance }}"

Alert dispatch script:

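#!/bin/bash
# send-alert.sh
# Note: ALERT_MESSAGE is interpolated into JSON unescaped, so it must not
# contain double quotes, backslashes, or newlines.

ALERT_MESSAGE="$1"
SEVERITY="${2:-warning}"

# Send to multiple channels
case "$SEVERITY" in
  critical)
    # Send Slack alert
    curl -X POST -H 'Content-Type: application/json' https://hooks.slack.com/services/... \
      -d "{\"text\":\":rotating_light: CRITICAL: $ALERT_MESSAGE\"}"
    
    # Send PagerDuty (Events API v2 requires event_action plus payload
    # summary, source, and severity)
    curl -X POST -H 'Content-Type: application/json' https://events.pagerduty.com/v2/enqueue \
      -d "{\"routing_key\":\"...\",\"event_action\":\"trigger\",\"payload\":{\"summary\":\"$ALERT_MESSAGE\",\"source\":\"$(hostname)\",\"severity\":\"critical\"}}"
    ;;
  warning)
    # Send email
    echo "$ALERT_MESSAGE" | mail -s "Drift Warning" [email protected]
    ;;
esac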
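
Alert text built from tool output often contains quotes or newlines, which would break a hand-assembled JSON payload. A small escaping helper is one way to guard against that. The sketch below covers backslashes, double quotes, newlines, and tabs only (not other control characters), and the function name json_escape is hypothetical.

```shell
#!/bin/bash
# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, and tab only.
# json_escape is an illustrative name.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}        # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}        # double quote
  s=${s//$'\n'/\\n}      # newline
  s=${s//$'\t'/\\t}      # tab
  printf '%s' "$s"
}

# Usage:
#   payload="{\"text\":\"$(json_escape "$ALERT_MESSAGE")\"}"
```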

Conclusion

Configuration drift detection is critical for maintaining infrastructure consistency and security. Combining Ansible check mode for quick host-level checks, Terraform plan analysis for infrastructure changes, file integrity monitoring with AIDE and Tripwire, and scheduled automated checks gives you a comprehensive drift detection and remediation framework. Automated remediation, backed by alerting and logging, helps keep infrastructure in the desired state while preserving an audit trail of what changed.