Incident Response Playbook for Linux Servers

An incident response playbook defines the structured steps your team takes when a security event occurs on a Linux server, from initial detection through containment, forensic analysis, and recovery. This guide provides a practical playbook covering breach detection, containment, evidence collection, eradication, recovery, and post-incident review for VPS and bare-metal Linux servers.

Prerequisites

  • Linux server running Ubuntu 20.04/22.04, CentOS 8, or Rocky Linux 8+
  • Root or sudo access
  • Tools: netstat/ss, ps, lsof, awk, grep, tar, sha256sum
  • Incident response tools: chkrootkit, rkhunter, and optionally Volatility for memory image analysis
  • A secondary secure location for evidence storage (separate from the compromised system)
  • Team communication channel (Slack, PagerDuty, or similar)
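Before an incident, it is worth confirming that these tools are actually installed on every server, since a compromised host is the wrong place to discover that `lsof` is missing. A minimal shell sketch; the function name `missing_tools` is hypothetical:

```shell
# Print each named tool that is not on PATH; return 1 if any are missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_tools ss ps lsof awk grep tar sha256sum rkhunter chkrootkit` prints nothing and returns 0 when everything is present.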

Phase 1: Detection and Initial Assessment

When an incident is suspected, act quickly but methodically.

# --- INITIAL TRIAGE SCRIPT ---
# Run these commands in order and save output

INCIDENT_DIR="/root/incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "${INCIDENT_DIR}"
exec > >(tee "${INCIDENT_DIR}/triage.log") 2>&1

echo "=== INCIDENT TRIAGE - $(date) ==="
echo "Hostname: $(hostname)"
echo "Uptime: $(uptime)"

# Check current logged-in users
echo "=== Currently logged in ==="
w
who

# Check recent login history
echo "=== Last logins ==="
last -n 20

# Check failed login attempts
echo "=== Failed SSH logins (last 50) ==="
grep "Failed password" /var/log/auth.log 2>/dev/null | tail -50   # Ubuntu
grep "Failed password" /var/log/secure 2>/dev/null | tail -50     # CentOS/Rocky

# Check currently running processes
echo "=== Running processes ==="
ps auxf > "${INCIDENT_DIR}/ps_output.txt"
cat "${INCIDENT_DIR}/ps_output.txt"

# Check network connections
echo "=== Active network connections ==="
ss -tulnp > "${INCIDENT_DIR}/network_connections.txt"
ss -anp >> "${INCIDENT_DIR}/network_connections.txt"
cat "${INCIDENT_DIR}/network_connections.txt"

# Check listening ports (TCP and UDP; -l already limits output to listening sockets)
echo "=== Listening ports ==="
ss -tulnp
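Long runs of failed logins from one address usually point to brute forcing. As a sketch, assuming the default OpenSSH `Failed password ... from <IP> port <N>` log format (`failed_ssh_by_ip` is a hypothetical helper), the failures can be counted per source address:

```shell
# Read sshd log lines on stdin; print "<count> <ip>" for failed password
# attempts, most frequent source first.
failed_ssh_by_ip() {
  grep 'Failed password' \
    | sed -nE 's/.* from ([0-9a-fA-F:.]+) port .*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}
```

Run it as `failed_ssh_by_ip < /var/log/auth.log` (or `/var/log/secure` on CentOS/Rocky). An address with a high count that later appears in an `Accepted` line deserves immediate attention.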

Next, look for persistence mechanisms and tampering that help identify the incident type:

# Check for suspicious cron jobs and systemd timers
echo "=== Cron jobs ==="
crontab -l 2>/dev/null
cat /etc/crontab
ls -la /etc/cron.* /var/spool/cron/
for user in $(cut -f1 -d: /etc/passwd); do
  crontab -u "$user" -l 2>/dev/null | grep -v '^#' | grep -v '^$' \
    && echo "  ^ from user: $user"
done
echo "=== Systemd timers ==="
systemctl list-timers --all --no-pager

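# Check recently modified or changed files (last 24 hours). ctime is included
# because touch can fake a file's mtime but cannot easily reset its ctime.
echo "=== Recently modified files in /etc, /bin, /usr/bin, /usr/sbin, /sbin ==="
find /etc /bin /usr/bin /usr/sbin /sbin \
  -type f \( -mtime -1 -o -ctime -1 \) 2>/dev/null \
  | tee "${INCIDENT_DIR}/recently_modified.txt"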

# Check for suspicious SUID/SGID binaries
echo "=== SUID/SGID binaries ==="
find / -perm /4000 -o -perm /2000 2>/dev/null \
  | tee "${INCIDENT_DIR}/suid_files.txt"

# Check kernel modules (rootkit detection)
echo "=== Loaded kernel modules ==="
lsmod | tee "${INCIDENT_DIR}/lsmod.txt"
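Attackers commonly persist through cron entries that download and run a script, decode a base64 payload, or open a reverse shell through bash's `/dev/tcp`. The following filter is a heuristic sketch only: the patterns are assumptions, the function name `flag_suspicious_cron` is hypothetical, and a match is a lead to investigate, not proof:

```shell
# Read crontab lines on stdin and print the ones matching common
# download-and-execute or reverse-shell patterns.
flag_suspicious_cron() {
  # Drop comment lines, then keep lines matching any suspicious pattern
  grep -v '^[[:space:]]*#' \
    | grep -E '(curl|wget)[^|]*[|][[:space:]]*(ba)?sh|base64 +(-d|--decode)|/dev/tcp/'
}
```

For example, `cat /etc/crontab /etc/cron.d/* | flag_suspicious_cron`, or pipe each user's `crontab -u "$user" -l` output through it inside the per-user cron loop.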

Phase 2: Containment

Contain the incident to prevent further damage before deep analysis:

# --- NETWORK CONTAINMENT ---

# Option A: Block all external traffic except your management IP
# Replace 203.0.113.10 with your admin IP
ADMIN_IP="203.0.113.10"

# Save existing iptables rules first
iptables-save > "${INCIDENT_DIR}/iptables_before.rules"

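# Apply containment rules. Double-check ADMIN_IP first: a typo locks you out.
iptables -I INPUT 1 -s "${ADMIN_IP}" -j ACCEPT
iptables -I OUTPUT 1 -d "${ADMIN_IP}" -j ACCEPT
iptables -I INPUT 2 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -I INPUT 3 -i lo -j ACCEPT     # Keep loopback so local services still work
iptables -I OUTPUT 2 -o lo -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP

echo "Containment rules applied. Only ${ADMIN_IP} (and loopback) can connect."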

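# Option B: Terminate connections from a specific suspicious IP and block it
SUSPICIOUS_IP="198.51.100.55"
iptables -I INPUT 1 -s "${SUSPICIOUS_IP}" -j DROP     # Prevent reconnection
iptables -I OUTPUT 1 -d "${SUSPICIOUS_IP}" -j DROP
ss -K dst "${SUSPICIOUS_IP}"   # Kill existing sockets (needs kernel CONFIG_INET_DIAG_DESTROY)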

# --- PROCESS CONTAINMENT ---
# Suspend (not kill) a suspicious process to preserve evidence
SUSPICIOUS_PID=12345
kill -STOP "${SUSPICIOUS_PID}"

# Kill a confirmed malicious process only after collecting its volatile
# evidence (Phase 3); once it exits, /proc/<PID> is gone
kill -KILL "${SUSPICIOUS_PID}"

# Disable suspicious user accounts
COMPROMISED_USER="webapp"
usermod -L -e 1 "${COMPROMISED_USER}"   # Lock password and expire account (also blocks SSH key logins)
pkill -u "${COMPROMISED_USER}"          # Kill active sessions

# Revoke SSH keys for a compromised user (home directory looked up, not assumed)
USER_HOME="$(getent passwd "${COMPROMISED_USER}" | cut -d: -f6)"
mv "${USER_HOME}/.ssh/authorized_keys" \
   "${INCIDENT_DIR}/authorized_keys_${COMPROMISED_USER}.bak"
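A mistyped `ADMIN_IP` in Option A locks you out of the server. One way to reduce that risk, sketched here with hypothetical function names (`is_ipv4`, `containment_rules`), is to validate the address and print the rules for a second responder to review instead of applying them directly. The rules follow Option A's approach, plus loopback rules (an assumption) so local services keep working:

```shell
# Return 0 if $1 is a single dotted-quad IPv4 address with every octet in 0-255.
is_ipv4() {
  printf '%s\n' "$1" | awk -F. '
    NF == 4 {
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1
      ok = 1
    }
    END { exit !(ok && NR == 1) }'
}

# Print (do not apply) the containment rules for a given admin IP.
containment_rules() {
  if ! is_ipv4 "$1"; then
    echo "refusing: '$1' is not a valid IPv4 address" >&2
    return 1
  fi
  cat <<EOF
iptables -I INPUT 1 -i lo -j ACCEPT
iptables -I OUTPUT 1 -o lo -j ACCEPT
iptables -I INPUT 2 -s $1 -j ACCEPT
iptables -I OUTPUT 2 -d $1 -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
EOF
}
```

For example, `containment_rules 203.0.113.10 > /root/contain.sh`, review the file, save the current ruleset with `iptables-save`, and only then run `bash /root/contain.sh`.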

Phase 3: Evidence Collection and Forensics

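Collect evidence before eradication changes anything on the system. Capture volatile data (process state, open files, network state) first, because it is lost when a process exits or the server reboots: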

# Create a timestamped evidence directory
EVIDENCE_DIR="${INCIDENT_DIR}/evidence"
mkdir -p "${EVIDENCE_DIR}"

# --- VOLATILE DATA (collect first - lost on reboot) ---

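# Memory map, command line, environment, open files, and binary of the suspicious process
SUSPICIOUS_PID=12345
cat "/proc/${SUSPICIOUS_PID}/maps" > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_maps.txt"
tr '\0' ' ' < "/proc/${SUSPICIOUS_PID}/cmdline" > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_cmdline.txt"
tr '\0' '\n' < "/proc/${SUSPICIOUS_PID}/environ" > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_environ.txt"
ls -la "/proc/${SUSPICIOUS_PID}/fd/" > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_fds.txt"
# Copying /proc/<PID>/exe recovers the binary even if it was deleted from disk
cp "/proc/${SUSPICIOUS_PID}/exe" "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_exe.bin"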

# All open files
lsof -n > "${EVIDENCE_DIR}/lsof_all.txt"
lsof -n -P -i > "${EVIDENCE_DIR}/lsof_network.txt"

# ARP cache (neighbors on the local network segment this host recently talked to)
arp -a > "${EVIDENCE_DIR}/arp_cache.txt" 2>/dev/null   # Needs net-tools; ip neigh covers the same data
ip neigh > "${EVIDENCE_DIR}/ip_neigh.txt"

# Routing table
ip route > "${EVIDENCE_DIR}/routes.txt"

# --- LOG PRESERVATION ---
# Copy all logs, preserving timestamps and ownership
cp -a /var/log "${EVIDENCE_DIR}/var_log"

# Capture auth logs
cp /var/log/auth.log "${EVIDENCE_DIR}/" 2>/dev/null
cp /var/log/secure "${EVIDENCE_DIR}/" 2>/dev/null
cp /var/log/syslog "${EVIDENCE_DIR}/" 2>/dev/null
journalctl --no-pager > "${EVIDENCE_DIR}/journalctl_all.txt"

# --- FILESYSTEM FORENSICS ---

# Hash critical binaries for integrity checking
echo "=== Binary hash verification ==="
for bin in /bin/ls /bin/ps /bin/netstat /bin/ss /usr/bin/find /usr/bin/who /usr/sbin/sshd; do
  sha256sum "$bin" 2>/dev/null
done | tee "${EVIDENCE_DIR}/binary_hashes.txt"

# Verify installed files against the package database
dpkg -V 2>/dev/null | tee "${EVIDENCE_DIR}/dpkg_verify.txt"   # Ubuntu
rpm -Va 2>/dev/null | tee "${EVIDENCE_DIR}/rpm_verify.txt"    # CentOS/Rocky

# Find files with no owner or group (orphaned files are suspicious), skipping /proc and /sys
find / \( -path /proc -o -path /sys \) -prune -o \( -nouser -o -nogroup \) -print 2>/dev/null \
  | tee "${EVIDENCE_DIR}/orphaned_files.txt"

# Check bash history for all users
for home in /root /home/*; do
  if [ -f "${home}/.bash_history" ]; then
    echo "=== ${home}/.bash_history ===" >> "${EVIDENCE_DIR}/bash_histories.txt"
    cat "${home}/.bash_history" >> "${EVIDENCE_DIR}/bash_histories.txt"
  fi
done

# Run rootkit checker
which rkhunter && rkhunter --check --skip-keypress --logfile "${EVIDENCE_DIR}/rkhunter.log"
which chkrootkit && chkrootkit 2>/dev/null | tee "${EVIDENCE_DIR}/chkrootkit.txt"

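# Create a compressed evidence archive and record its checksum. The name is
# stored once so the checksum refers to the same file (a second $(date) call
# could produce a different timestamp).
ARCHIVE="/root/evidence-$(hostname)-$(date +%Y%m%d%H%M%S).tar.gz"
tar -czf "${ARCHIVE}" "${INCIDENT_DIR}/"
sha256sum "${ARCHIVE}" | tee "${ARCHIVE}.sha256"

# Move the archive and checksum off this host to your secure evidence location,
# e.g. with scp (the destination below is a placeholder):
# scp "${ARCHIVE}" "${ARCHIVE}.sha256" user@evidence-host:/evidence/
echo "Evidence collection complete. Archive: ${ARCHIVE}"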
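To show later that individual evidence files were not altered after collection, a per-file checksum manifest complements the single archive hash. A minimal sketch assuming GNU `find`, `sort`, `xargs`, and `sha256sum`; the function names and the `MANIFEST.sha256` file name are hypothetical:

```shell
# Write MANIFEST.sha256 listing the SHA-256 of every file under directory $1.
make_manifest() {
  (
    cd "$1" || exit 1
    find . -type f ! -name MANIFEST.sha256 -print0 \
      | sort -z \
      | xargs -0 -r sha256sum > MANIFEST.sha256
  )
}

# Re-hash every file in the manifest; exits non-zero if any file changed or is missing.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c MANIFEST.sha256 )
}
```

For example, run `make_manifest "${EVIDENCE_DIR}"` before creating the archive, then `verify_manifest` on the extracted copy at the secure evidence location.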

Phase 4: Eradication

Remove the attacker's foothold after evidence is secured:

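# Review every authorized_keys file: print each key's fingerprint so you can
# match it against your legitimate keys, then delete unknown keys by hand
find /root /home -name "authorized_keys*" -type f 2>/dev/null | while read -r f; do
  echo "=== ${f}"
  ssh-keygen -lf "${f}"
done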

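# Remove malicious cron jobs. "crontab -r" deletes a user's ENTIRE crontab, so use it
# only for a compromised account; for other users, edit out just the bad entries.
crontab -u "${COMPROMISED_USER}" -r
crontab -e   # Edit root's crontab by hand
# Review and clean /etc/crontab, /etc/cron.d/, /etc/cron.daily/, etc.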

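# Find candidate web shells (PHP example): files modified in the last 7 days that
# call functions commonly abused by web shells. Widen -mtime to cover the whole
# incident window; expect some legitimate matches.
find /var/www -name "*.php" -mtime -7 \
  -exec grep -lE "eval\(|base64_decode|system\(|passthru|shell_exec" {} + \
  | tee "${INCIDENT_DIR}/webshells_found.txt"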

# Review found files before deleting
# cat <suspicious_file>
# rm -f <confirmed_webshell>

# Remove malicious systemd services and timers
# List unit files modified in the last 7 days (widen to cover the incident window)
find /etc/systemd/system /lib/systemd/system \
  \( -name "*.service" -o -name "*.timer" \) -mtime -7 2>/dev/null

# Disable and remove a malicious service
systemctl disable --now malicious.service
rm -f /etc/systemd/system/malicious.service
systemctl daemon-reload

# Update all packages after eradication
apt update && apt upgrade -y  # Ubuntu
dnf update -y                 # CentOS/Rocky

# Rotate all credentials
# - SSH keys for all accounts
# - Application passwords
# - API tokens
# - Database passwords
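Checking every key by hand is slow when there are many accounts. A sketch that compares each `authorized_keys` entry against an allowlist of known-good public keys (a hypothetical file you maintain, one key per line) and prints anything not on it; the key-blob detection is a heuristic that assumes standard OpenSSH key lines, and `unknown_keys` is a hypothetical name:

```shell
# Print authorized_keys entries ($1) whose key blob is not in the allowlist file ($2).
unknown_keys() {
  awk '
    # Heuristic: the key blob is the first field that looks like base64 starting with "AAAA"
    function blob(   i) {
      for (i = 1; i <= NF; i++)
        if ($i ~ "^AAAA[0-9A-Za-z+/=]+$") return $i
      return ""
    }
    /^[ \t]*#/ || /^[ \t]*$/ { next }            # skip comments and blank lines
    FILENAME == ARGV[1] { ok[blob()] = 1; next }  # first file: the allowlist
    { b = blob(); if (b == "" || !(b in ok)) print FILENAME ": " $0 }
  ' "$2" "$1"
}
```

For example: `for f in /root/.ssh/authorized_keys /home/*/.ssh/authorized_keys; do [ -f "$f" ] && unknown_keys "$f" /root/allowed_keys.pub; done`, where `/root/allowed_keys.pub` is a placeholder path. Matching on the key blob rather than the comment field matters because attackers can copy a legitimate-looking comment.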

Phase 5: Recovery

Restore services carefully after eradication:

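# Verify system integrity before bringing services back up. No output means every
# packaged file matches the package database; entries marked "c" are config files,
# which are often changed legitimately.
dpkg -V 2>/dev/null     # Ubuntu
rpm -Va 2>/dev/null     # CentOS/Rocky

# Reinstall packages whose non-config files fail verification (Ubuntu). If the
# attacker had root, rebuilding from a known-good image is safer than reinstalling.
dpkg -V 2>/dev/null | awk '$2 != "c" {print $NF}' \
  | xargs -r dpkg -S 2>/dev/null | grep -v '^diversion' | cut -d: -f1 | sort -u \
  | xargs -r apt install --reinstall -y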

# Restore iptables to allow normal traffic
iptables-restore < "${INCIDENT_DIR}/iptables_before.rules"
# Or apply a hardened ruleset

# Re-enable services one by one, verifying each
systemctl start nginx && systemctl status nginx --no-pager
systemctl start php8.1-fpm && systemctl status php8.1-fpm --no-pager   # Unit name depends on PHP version

# Monitor logs after recovery (Ubuntu paths; on CentOS/Rocky use /var/log/secure and /var/log/messages)
tail -f /var/log/auth.log /var/log/nginx/error.log /var/log/syslog

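# Harden SSH after the incident. sshd uses the FIRST value it reads for each
# keyword, so lines appended to the end of sshd_config can be silently ignored.
# Where sshd_config starts with "Include /etc/ssh/sshd_config.d/*.conf" (check with
# grep Include /etc/ssh/sshd_config), a drop-in takes precedence; the 00- prefix
# sorts it before other drop-ins. Otherwise, edit the existing directives in place.
cat > /etc/ssh/sshd_config.d/00-incident-hardening.conf << 'EOF'
PermitRootLogin no
PasswordAuthentication no
MaxAuthTries 3
LoginGraceTime 20
AllowUsers deploy ops
EOF
# AllowUsers must list your real admin accounts, or you will lock yourself out.
# Validate the config before restarting, then confirm the effective values.
sshd -t && systemctl restart sshd
sshd -T | grep -Ei '^(permitrootlogin|passwordauthentication|maxauthtries|allowusers)'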
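`sshd -T` prints the effective configuration as lowercase `keyword value` lines, which makes it easy to check that the hardening actually took effect. A sketch of such a check; `check_sshd_effective` is hypothetical, and the expected values are the ones from the hardening block:

```shell
# Read "sshd -T" output on stdin; report settings that differ from the
# expected values and exit non-zero if any do.
check_sshd_effective() {
  awk '
    BEGIN {
      want["permitrootlogin"] = "no"
      want["passwordauthentication"] = "no"
      want["maxauthtries"] = "3"
      want["logingracetime"] = "20"
    }
    ($1 in want) { seen[$1] = $2 }
    END {
      bad = 0
      for (k in want) {
        if (!(k in seen)) {
          printf "MISSING %s (want %s)\n", k, want[k]; bad = 1
        } else if (seen[k] != want[k]) {
          printf "MISMATCH %s: want %s, got %s\n", k, want[k], seen[k]; bad = 1
        }
      }
      exit bad
    }'
}
```

Run it as `sshd -T | check_sshd_effective`; because it exits non-zero on drift, it can also run from a periodic monitoring job.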

Phase 6: Post-Incident Review

Document findings and improve defenses:

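# Generate an authentication timeline from the preserved logs, including rotated
# and compressed copies (auth.log on Ubuntu, secure on CentOS/Rocky).
# Syslog timestamps carry no year, so entries spanning New Year sort incorrectly.
zcat -f "${EVIDENCE_DIR}"/var_log/auth.log* "${EVIDENCE_DIR}"/var_log/secure* 2>/dev/null | \
  grep -E "Accepted|Failed|Invalid|session opened" | \
  sort -k1,1M -k2,2n -k3,3 > "${INCIDENT_DIR}/auth_timeline.txt"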

# Record response metrics (replace X with values from the incident timeline)
echo "Incident Metrics" > "${INCIDENT_DIR}/metrics.txt"
echo "Time to Detection: X minutes" >> "${INCIDENT_DIR}/metrics.txt"
echo "Time to Containment: X minutes" >> "${INCIDENT_DIR}/metrics.txt"
echo "Time to Recovery: X hours" >> "${INCIDENT_DIR}/metrics.txt"
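The metrics are differences between milestone timestamps. A sketch using GNU `date` that computes them in minutes; it assumes all timestamps are in UTC and that detection is measured from the first attacker activity while containment and recovery are measured from detection (adjust to your team's definitions). The function names are hypothetical:

```shell
# Whole minutes between two timestamps, both interpreted as UTC (GNU date syntax).
minutes_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 60 ))
}

# Args: first attacker activity, detection, containment, recovery.
# Assumption: detection is measured from first activity; containment and
# recovery are measured from detection.
print_metrics() {
  echo "Incident Metrics"
  echo "Time to Detection: $(minutes_between "$1" "$2") minutes"
  echo "Time to Containment: $(minutes_between "$2" "$3") minutes"
  echo "Time to Recovery: $(minutes_between "$2" "$4") minutes"
}
```

For example, with illustrative timestamps taken from the timeline: `print_metrics "2024-01-10 02:14" "2024-01-10 09:40" "2024-01-10 10:05" "2024-01-10 16:40" > "${INCIDENT_DIR}/metrics.txt"`.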

Post-incident review checklist:

  • Root cause: How did the attacker gain access? (Weak password, unpatched CVE, misconfiguration)
  • Impact assessment: What data or systems were accessed?
  • Timeline reconstruction: Correlate all log sources into a single timeline
  • Detection gaps: Why wasn't this caught earlier? What alert was missing?
  • Process improvements: Update firewall rules, patch management, monitoring
  • Documentation: Write an incident report with all findings and remediation steps
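For timeline reconstruction, sources are easiest to correlate once every line starts with a sortable timestamp. `journalctl -o short-iso` produces such lines; other logs need converting first. A sketch that merges such files and tags each line with its source, assuming every file uses ISO 8601 timestamps with the same UTC offset (`merge_timelines` is hypothetical):

```shell
# Merge files whose lines start with an ISO 8601 timestamp into one sorted
# timeline. Output columns (tab-separated): timestamp, source file, rest of line.
merge_timelines() {
  local f
  for f in "$@"; do
    awk -v src="$(basename "$f")" \
      '{ print $1 "\t" src "\t" substr($0, length($1) + 2) }' "$f"
  done | LC_ALL=C sort -s -k1,1
}
```

For example: `journalctl -o short-iso --no-pager > journal.log`, then `merge_timelines journal.log web.log > timeline.tsv`. The stable sort (`-s`) keeps same-second events from one file in their original order.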

Automation Scripts

Create a quick-start triage script to run at the start of any incident:

cat > /usr/local/bin/ir-triage << 'SCRIPT'
#!/bin/bash
# Incident Response Quick Triage
umask 077   # Evidence can contain secrets; keep it readable by root only
DIR="/root/ir-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$DIR"
echo "Starting triage, saving to $DIR"
w > "$DIR/who.txt"
ps auxf > "$DIR/processes.txt"
ss -anp > "$DIR/connections.txt"
last -n 30 > "$DIR/last_logins.txt"
find /tmp /var/tmp /dev/shm -type f > "$DIR/tmp_files.txt" 2>/dev/null
lsof -n -P -i > "$DIR/open_network_files.txt" 2>&1
crontab -l > "$DIR/root_crontab.txt" 2>&1
journalctl -n 500 --no-pager > "$DIR/journal.txt"
tar -czf "${DIR}.tar.gz" "$DIR/"
sha256sum "${DIR}.tar.gz" > "${DIR}.tar.gz.sha256"
echo "Triage complete: ${DIR}.tar.gz"
SCRIPT
chmod +x /usr/local/bin/ir-triage
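To roll the triage script out to a fleet before an incident, a sketch like this copies it over SSH. It assumes key-based root SSH access and a hypothetical hosts file with one hostname per line; `maybe_run` and `deploy_triage` are hypothetical names. Setting `DRY_RUN=1` prints the commands instead of running them, which is useful for review:

```shell
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
maybe_run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy ir-triage to every host in file $1 (one per line; blank lines and
# "#" comments are skipped). Assumes key-based SSH login as root.
deploy_triage() {
  local host
  while IFS= read -r host; do
    case "$host" in ''|'#'*) continue ;; esac
    maybe_run scp -q /usr/local/bin/ir-triage "root@${host}:/usr/local/bin/ir-triage" </dev/null
    maybe_run ssh -n "root@${host}" chmod 700 /usr/local/bin/ir-triage
  done < "$1"
}
```

Preview with `DRY_RUN=1 deploy_triage hosts.txt`, then run `deploy_triage hosts.txt`. The `-n` flag and the `</dev/null` redirect stop `ssh` and `scp` from reading the host list on standard input, which would otherwise end the loop after the first host.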

Troubleshooting

Cannot access server during incident:

# Use your hosting provider's out-of-band console (VPS console/IPMI)
# Take a snapshot before making changes (if possible)

Evidence archive is too large to transfer:

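# Record the archive checksum, then split it into smaller pieces
sha256sum evidence.tar.gz > evidence.tar.gz.sha256
split -b 500M evidence.tar.gz evidence_part_
sha256sum evidence_part_* > evidence_parts.sha256

# On the receiving side: verify the pieces, reassemble, and verify the whole archive
sha256sum -c evidence_parts.sha256
cat evidence_part_* > evidence.tar.gz
sha256sum -c evidence.tar.gz.sha256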

Rootkit tools give false positives:

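# Cross-reference with the package manager
dpkg -S /bin/ls        # Which package owns the file (Ubuntu)
dpkg -V coreutils      # Verify that package's files (no output means they match)
rpm -qf /bin/ls        # Owning package on CentOS/Rocky
sha256sum /bin/ls
# Compare against a known-good system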
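When you can build a clean reference host from the same image and package versions, comparing full hash lists is more reliable than spot-checking single binaries. A sketch that compares two `sha256sum` outputs, assuming paths without spaces (`diff_hashes` is hypothetical):

```shell
# Compare two "sha256sum" outputs: $1 from a known-good reference host,
# $2 from the suspect host. Print suspect paths whose hash differs or
# that the reference does not list.
diff_hashes() {
  awk '
    FILENAME == ARGV[1] { ref[$2] = $1; next }
    !($2 in ref)        { print $2 " (not in reference)"; next }
    ref[$2] != $1       { print $2 }
  ' "$1" "$2"
}
```

For example, run `sha256sum /usr/bin/* /usr/sbin/* 2>/dev/null > good.txt` on the reference host and the same command into `suspect.txt` on the affected host, then `diff_hashes good.txt suspect.txt`. A binary that rkhunter flags but whose hash matches the reference is likely a false positive.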

Conclusion

An effective incident response playbook transforms a chaotic breach situation into a structured process with clear phases: detect, contain, collect evidence, eradicate, recover, and review. Practice these procedures in tabletop exercises before a real incident occurs, automate the initial triage script deployment across all servers, and maintain an up-to-date asset inventory to accelerate response time when a security event strikes.