Incident Response Playbook for Linux Servers
An incident response playbook defines the structured steps your team takes when a security event occurs on a Linux server, from initial detection through containment, forensic analysis, and recovery. This guide provides a practical playbook covering breach detection, evidence collection, and post-incident review for VPS and bare-metal Linux servers.
Prerequisites
- Linux server running Ubuntu 20.04/22.04, CentOS 8, or Rocky Linux 8+
- Root or sudo access
- Tools: netstat/ss, ps, lsof, awk, grep, tar, sha256sum
- Incident response tools: chkrootkit, rkhunter, volatility (optional)
- A secondary secure location for evidence storage (separate from the compromised system)
- Team communication channel (Slack, PagerDuty, or similar)
Phase 1: Detection and Initial Assessment
When an incident is suspected, act quickly but methodically.
# --- INITIAL TRIAGE SCRIPT ---
# Run these commands in order and save output
INCIDENT_DIR="/root/incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "${INCIDENT_DIR}"
exec > >(tee "${INCIDENT_DIR}/triage.log") 2>&1
echo "=== INCIDENT TRIAGE - $(date) ==="
echo "Hostname: $(hostname)"
echo "Uptime: $(uptime)"
# Check current logged-in users
echo "=== Currently logged in ==="
w
who
# Check recent login history
echo "=== Last logins ==="
last -n 20
# Check failed login attempts
echo "=== Failed SSH logins (last 50) ==="
grep "Failed password" /var/log/auth.log 2>/dev/null | tail -50
grep "Failed password" /var/log/secure 2>/dev/null | tail -50 # CentOS
# Check currently running processes
echo "=== Running processes ==="
ps auxf > "${INCIDENT_DIR}/ps_output.txt"
cat "${INCIDENT_DIR}/ps_output.txt"
# Check network connections
echo "=== Active network connections ==="
ss -tlnp > "${INCIDENT_DIR}/network_connections.txt"
ss -anp >> "${INCIDENT_DIR}/network_connections.txt"
cat "${INCIDENT_DIR}/network_connections.txt"
# Check listening ports
echo "=== Listening ports ==="
ss -tlnp | grep LISTEN
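The raw "Failed password" lines collected above are easier to assess once they are grouped by source address. The following is a minimal sketch rather than part of the original triage script. The function name `top_failed_ips` is hypothetical, and it assumes the usual OpenSSH log wording in which the client address follows the word "from".

```shell
# Hypothetical helper: count "Failed password" lines per source address.
# Usage: top_failed_ips LOGFILE [LIMIT]
top_failed_ips() {
    local logfile="$1" limit="${2:-10}"
    grep "Failed password" "$logfile" 2>/dev/null \
        | awk '{
              # Assumption: the address is the field after the last "from".
              ip = ""
              for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1)
              if (ip != "") print ip
          }' \
        | sort | uniq -c | sort -rn | head -n "$limit"
}
```

For example, `top_failed_ips /var/log/auth.log 5` (or `/var/log/secure` on CentOS/Rocky) prints the five addresses with the most failed attempts, one count and address per line.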
Identify the incident type:
# Check for suspicious cron jobs
echo "=== Cron jobs ==="
crontab -l 2>/dev/null
ls -la /etc/cron.* /var/spool/cron/
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -v '^#' | grep -v '^$' \
&& echo " ^ from user: $user"
done
# Check recently modified files (last 24 hours)
echo "=== Recently modified files in /etc, /bin, /usr/bin ==="
find /etc /bin /usr/bin /usr/sbin /sbin \
-mtime -1 -type f 2>/dev/null \
| tee "${INCIDENT_DIR}/recently_modified.txt"
# Check for suspicious SUID/SGID binaries (regular files only, so setgid directories are not listed)
echo "=== SUID/SGID binaries ==="
find / -type f \( -perm /4000 -o -perm /2000 \) 2>/dev/null \
| tee "${INCIDENT_DIR}/suid_files.txt"
# Check kernel modules (rootkit detection)
echo "=== Loaded kernel modules ==="
lsmod | tee "${INCIDENT_DIR}/lsmod.txt"
Phase 2: Containment
Contain the incident to prevent further damage before deep analysis:
# --- NETWORK CONTAINMENT ---
# Option A: Block all external traffic except your management IP
# Replace 203.0.113.10 with your admin IP
ADMIN_IP="203.0.113.10"
# Save existing iptables rules first
iptables-save > "${INCIDENT_DIR}/iptables_before.rules"
# Apply containment rules
iptables -I INPUT 1 -s "${ADMIN_IP}" -j ACCEPT
iptables -I OUTPUT 1 -d "${ADMIN_IP}" -j ACCEPT
iptables -I INPUT 2 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
echo "Containment rules applied. Only ${ADMIN_IP} can connect."
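Because a mistyped admin address under a DROP policy can lock you out of the server, it can help to print the containment commands for review before running them. The sketch below is a dry run only: the function name `print_containment_rules` is hypothetical, it emits the same six rules shown above, and its address check validates the dotted-quad format only, not the octet ranges.

```shell
# Hypothetical dry-run helper: print the Option A rules without applying them.
# Usage: print_containment_rules ADMIN_IP
print_containment_rules() {
    local admin_ip="$1"
    # Format check only: four dot-separated groups of 1 to 3 digits.
    if ! printf '%s\n' "$admin_ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
        echo "error: '$admin_ip' does not look like an IPv4 address" >&2
        return 1
    fi
    cat <<EOF
iptables -I INPUT 1 -s ${admin_ip} -j ACCEPT
iptables -I OUTPUT 1 -d ${admin_ip} -j ACCEPT
iptables -I INPUT 2 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
EOF
}
```

Run `print_containment_rules "${ADMIN_IP}"`, read the output, and only then execute the printed commands as root.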
# Option B: Terminate specific suspicious connections
# Kill connections from a suspicious IP
SUSPICIOUS_IP="198.51.100.55"
ss -K dst "${SUSPICIOUS_IP}"
# --- PROCESS CONTAINMENT ---
# Suspend (not kill) a suspicious process to preserve evidence
SUSPICIOUS_PID=12345
kill -STOP "${SUSPICIOUS_PID}"
# Kill the process only once it is confirmed malicious and its /proc data has been captured (see Phase 3)
kill -9 "${SUSPICIOUS_PID}"
# Disable suspicious user accounts
COMPROMISED_USER="webapp"
usermod -L "${COMPROMISED_USER}" # Lock account
pkill -u "${COMPROMISED_USER}" # Kill active sessions
# Revoke SSH keys for a compromised user (moved, not deleted, so the file is kept as evidence)
mv "/home/${COMPROMISED_USER}/.ssh/authorized_keys" \
"${INCIDENT_DIR}/authorized_keys_${COMPROMISED_USER}.bak"
Phase 3: Evidence Collection and Forensics
Collect evidence before making any further changes to the system, starting with memory and other volatile data:
# Create a timestamped evidence directory
EVIDENCE_DIR="${INCIDENT_DIR}/evidence"
mkdir -p "${EVIDENCE_DIR}"
# --- VOLATILE DATA (collect first - lost on reboot) ---
# Memory map of suspicious process
SUSPICIOUS_PID=12345
cat /proc/${SUSPICIOUS_PID}/maps > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_maps.txt"
cat /proc/${SUSPICIOUS_PID}/cmdline | tr '\0' ' ' > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_cmdline.txt"
ls -la /proc/${SUSPICIOUS_PID}/fd/ > "${EVIDENCE_DIR}/proc_${SUSPICIOUS_PID}_fds.txt"
# All open files
lsof -n > "${EVIDENCE_DIR}/lsof_all.txt"
lsof -i > "${EVIDENCE_DIR}/lsof_network.txt"
# ARP cache (who has been communicating on the network)
arp -a > "${EVIDENCE_DIR}/arp_cache.txt"
ip neigh > "${EVIDENCE_DIR}/ip_neigh.txt"
# Routing table
ip route > "${EVIDENCE_DIR}/routes.txt"
# --- LOG PRESERVATION ---
# Copy all relevant logs; -a preserves timestamps, ownership, and permissions
cp -a /var/log "${EVIDENCE_DIR}/var_log"
# Keep top-level copies of the main auth and system logs for quick access
cp -p /var/log/auth.log "${EVIDENCE_DIR}/" 2>/dev/null
cp -p /var/log/secure "${EVIDENCE_DIR}/" 2>/dev/null
cp -p /var/log/syslog "${EVIDENCE_DIR}/" 2>/dev/null
journalctl --no-pager > "${EVIDENCE_DIR}/journalctl_all.txt"
# --- FILESYSTEM FORENSICS ---
# Hash critical binaries for integrity checking
echo "=== Binary hash verification ==="
for bin in /bin/ls /bin/ps /bin/netstat /usr/bin/find /usr/bin/who; do
sha256sum "$bin" 2>/dev/null
done | tee "${EVIDENCE_DIR}/binary_hashes.txt"
# Compare installed files against package manager records
dpkg -V 2>/dev/null | tee "${EVIDENCE_DIR}/dpkg_verify.txt" # Debian/Ubuntu
rpm -Va 2>/dev/null | tee "${EVIDENCE_DIR}/rpm_verify.txt" # CentOS/Rocky
# Find files with no owner (orphaned files - suspicious)
find / -nouser -o -nogroup 2>/dev/null | tee "${EVIDENCE_DIR}/orphaned_files.txt"
# Check bash history for all users
for home in /root /home/*; do
if [ -f "${home}/.bash_history" ]; then
echo "=== ${home}/.bash_history ===" >> "${EVIDENCE_DIR}/bash_histories.txt"
cat "${home}/.bash_history" >> "${EVIDENCE_DIR}/bash_histories.txt"
fi
done
# Run rootkit checker
which rkhunter && rkhunter --check --skip-keypress --logfile "${EVIDENCE_DIR}/rkhunter.log"
which chkrootkit && chkrootkit 2>/dev/null | tee "${EVIDENCE_DIR}/chkrootkit.txt"
# Create a compressed evidence archive
# Build the filename once so tar and sha256sum refer to the same file
ARCHIVE="/root/evidence-$(hostname)-$(date +%Y%m%d%H%M%S).tar.gz"
tar -czf "${ARCHIVE}" "${INCIDENT_DIR}/"
sha256sum "${ARCHIVE}" | tee "${ARCHIVE}.sha256"
echo "Evidence collection complete. Archive created."
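A single archive checksum shows that the archive was not altered, but not which file inside it changed. One way to cover that is a per-file hash manifest written into the evidence directory before archiving. This is a sketch under stated assumptions: the function names and the `SHA256SUMS` filename are hypothetical, and it relies on GNU `find`, `sort -z`, and `xargs -r`.

```shell
# Hypothetical helpers: write and check a per-file SHA-256 manifest.
# Usage: write_manifest DIR ; verify_manifest DIR
write_manifest() {
    local dir="$1"
    (
        cd "$dir" || exit 1
        # Hash every regular file except the manifest itself, in stable order.
        find . -type f ! -name SHA256SUMS -print0 \
            | sort -z \
            | xargs -0 -r sha256sum > SHA256SUMS
    )
}

verify_manifest() {
    local dir="$1"
    # Returns non-zero if any listed file is missing or its hash differs.
    ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
}
```

Calling `write_manifest "${EVIDENCE_DIR}"` before the `tar` step means the manifest travels inside the archive, and `verify_manifest` can be run on the extracted copy at the secondary storage location.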
Phase 4: Eradication
Remove the attacker's foothold after evidence is secured:
# List authorized_keys entries from all users that do not match your known-good public key
# Review the output, then remove unauthorized entries by hand
find /root /home -name "authorized_keys" \
-exec grep -HvF "your-legitimate-public-key" {} +
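# Remove malicious cron jobs
crontab -l > "${INCIDENT_DIR}/crontab_$(id -un).bak" 2>/dev/null # Keep a copy as evidence
crontab -r # Removes the current user's entire crontab, including legitimate entries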
# Review and clean /etc/cron.d/, /etc/cron.daily/, etc.
# Find and remove web shells (PHP example)
find /var/www -name "*.php" -newer /etc/passwd \
-exec grep -l "eval\|base64_decode\|system\|passthru\|shell_exec" {} \; \
| tee "${INCIDENT_DIR}/webshells_found.txt"
# Review found files before deleting
# cat <suspicious_file>
# rm -f <confirmed_webshell>
# Remove malicious systemd services
# List recently created services
find /etc/systemd/system /lib/systemd/system \
-name "*.service" -newer /etc/passwd 2>/dev/null
# Disable and remove a malicious service
systemctl disable --now malicious.service
rm -f /etc/systemd/system/malicious.service
systemctl daemon-reload
# Update all packages after eradication
apt update && apt upgrade -y # Ubuntu
dnf update -y # CentOS/Rocky
# Rotate all credentials
# - SSH keys for all accounts
# - Application passwords
# - API tokens
# - Database passwords
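When more than one key is legitimate, a single grep pattern is not enough to separate good SSH keys from bad ones. The sketch below compares an `authorized_keys` file against an allowlist file of known-good public keys. The function name `unknown_keys` and the allowlist file are hypothetical, and it assumes the key material is the first field beginning with `AAAA`, which is how base64-encoded OpenSSH public keys start.

```shell
# Hypothetical helper: print authorized_keys entries that are not in an allowlist.
# Usage: unknown_keys AUTHORIZED_KEYS_FILE ALLOWLIST_FILE
unknown_keys() {
    local auth_file="$1" allow_file="$2"
    awk '
        # Return the base64 key blob: the first field that starts with "AAAA".
        function blob(    i) {
            for (i = 1; i <= NF; i++) if ($i ~ /^AAAA/) return $i
            return ""
        }
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        # First file argument is the allowlist: remember its key blobs.
        FILENAME == ARGV[1] { b = blob(); if (b != "") ok[b] = 1; next }
        # Second file: report entries whose blob is not allowlisted.
        { b = blob(); if (!(b in ok)) print FILENAME ": " $0 }
    ' "$allow_file" "$auth_file"
}
```

Matching on the key blob rather than the whole line means an entry is still recognized when its comment or options prefix differs from the allowlist copy.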
Phase 5: Recovery
Restore services carefully after eradication:
# Verify system integrity before bringing services back up
dpkg -V --no-pager 2>/dev/null | grep -v "^$"
rpm -Va 2>/dev/null | grep -v "^$"
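# Reinstall potentially compromised packages (Ubuntu)
# This example reinstalls only the first 50 installed packages; adjust the list to the packages flagged by dpkg -V
apt install --reinstall $(dpkg -l | awk '/^ii/{print $2}' | head -50)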
# Restore iptables to allow normal traffic
iptables-restore < "${INCIDENT_DIR}/iptables_before.rules"
# Or apply a hardened ruleset
# Re-enable services one by one, verifying each
systemctl start nginx && systemctl status nginx
systemctl start php8.1-fpm && systemctl status php8.1-fpm
# Monitor logs after recovery
tail -f /var/log/auth.log /var/log/nginx/error.log /var/log/syslog
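# Harden SSH after incident
# Note: sshd uses the first value it finds for each keyword, so remove or comment out
# any earlier conflicting lines in sshd_config; appended settings do not override them
cat >> /etc/ssh/sshd_config << 'EOF'
PermitRootLogin no
PasswordAuthentication no
MaxAuthTries 3
LoginGraceTime 20
AllowUsers deploy ops
EOF
# Validate the configuration before restarting so a syntax error cannot lock you out
sshd -t && systemctl restart sshd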
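Because sshd uses the first value it obtains for each keyword, settings appended to the end of `sshd_config` take effect only if no earlier line sets the same keyword. The following sketch reports which value comes first for a given keyword. The function name `first_sshd_value` is hypothetical, and as a simplification it reads a single file, ignores `Include` directives, and stops at the first `Match` block.

```shell
# Hypothetical helper: print the first uncommented value set for a keyword.
# Usage: first_sshd_value CONFIG_FILE KEYWORD
first_sshd_value() {
    local file="$1" key="$2"
    awk -v key="$key" '
        tolower($1) == "match"      { exit }             # global section ends here
        tolower($1) == tolower(key) { print $2; exit }   # keywords are case-insensitive
    ' "$file"
}
```

If `first_sshd_value /etc/ssh/sshd_config PermitRootLogin` prints `yes`, an earlier line is overriding the appended `no`. Running `sshd -T` as root prints the full effective configuration and is the more reliable check.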
Phase 6: Post-Incident Review
Document findings and improve defenses:
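# Generate incident timeline from logs (current month only, traditional syslog timestamps)
# Sort numerically by day, then by time, so the order does not depend on locale
grep "^$(date +%b)" /var/log/auth.log | \
grep -E "Accepted|Failed|Invalid|session opened" | \
sort -s -k2,2n -k3,3 > "${INCIDENT_DIR}/auth_timeline.txt"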
# Calculate metrics
echo "Incident Metrics" > "${INCIDENT_DIR}/metrics.txt"
echo "Time to Detection: X minutes" >> "${INCIDENT_DIR}/metrics.txt"
echo "Time to Containment: X minutes" >> "${INCIDENT_DIR}/metrics.txt"
echo "Time to Recovery: X hours" >> "${INCIDENT_DIR}/metrics.txt"
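The `X` placeholders above can be filled in from the timestamps recorded during the incident. The sketch below computes whole minutes between two timestamps. The function name `minutes_between` is hypothetical, it depends on GNU `date -d`, and it treats both timestamps as UTC.

```shell
# Hypothetical helper: whole minutes elapsed between two timestamps (read as UTC).
# Usage: minutes_between "START" "END"
minutes_between() {
    local start end
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 60 ))
}
```

For instance, `minutes_between "2024-03-01 10:00:00" "2024-03-01 10:45:00"` prints `45`, which can be written into `metrics.txt` as the time to containment.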
Post-incident review checklist:
- Root cause: How did the attacker gain access? (Weak password, unpatched CVE, misconfiguration)
- Impact assessment: What data or systems were accessed?
- Timeline reconstruction: Correlate all log sources into a single timeline
- Detection gaps: Why wasn't this caught earlier? What alert was missing?
- Process improvements: Update firewall rules, patch management, monitoring
- Documentation: Write an incident report with all findings and remediation steps
Automation Scripts
Create a quick-start triage script to run at the start of any incident:
cat > /usr/local/bin/ir-triage << 'SCRIPT'
#!/bin/bash
# Incident Response Quick Triage
DIR="/root/ir-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$DIR"
echo "Starting triage, saving to $DIR"
w > "$DIR/who.txt"
ps auxf > "$DIR/processes.txt"
ss -anp > "$DIR/connections.txt"
last -n 30 > "$DIR/last_logins.txt"
find /tmp /var/tmp -type f > "$DIR/tmp_files.txt"
lsof -i > "$DIR/open_network_files.txt"
crontab -l > "$DIR/root_crontab.txt" 2>&1
journalctl -n 500 --no-pager > "$DIR/journal.txt"
tar -czf "${DIR}.tar.gz" "$DIR/"
echo "Triage complete: ${DIR}.tar.gz"
SCRIPT
chmod +x /usr/local/bin/ir-triage
Troubleshooting
Cannot access server during incident:
# Use your hosting provider's out-of-band console (VPS console/IPMI)
# Take a snapshot before making changes (if possible)
Evidence archive is too large to transfer:
# Split into smaller pieces
split -b 500M evidence.tar.gz evidence_part_
# Verify integrity with checksums
sha256sum evidence_part_*
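On the receiving side, the pieces need to be joined and checked against the checksum of the original archive. This is a minimal sketch: the function name `reassemble_and_verify` is hypothetical, and it assumes the default `split` suffixes (`aa`, `ab`, ...) so that the shell glob returns the parts in order, and that the output filename does not itself begin with the part prefix.

```shell
# Hypothetical helper: join split parts and compare against a known SHA-256.
# Usage: reassemble_and_verify PART_PREFIX OUTPUT_FILE EXPECTED_SHA256
reassemble_and_verify() {
    local prefix="$1" out="$2" expected="$3" actual
    # The glob expands in sorted order, matching the order split wrote the parts.
    cat "${prefix}"* > "$out" || return 1
    actual=$(sha256sum "$out" | awk '{print $1}')
    # Succeeds only when the rebuilt file matches the expected hash.
    [ "$actual" = "$expected" ]
}
```

A typical call would be `reassemble_and_verify evidence_part_ rebuilt.tar.gz "<hash recorded on the source server>"`, with a non-zero exit status indicating a missing or corrupted part.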
Rootkit tools give false positives:
# Cross-reference with package manager
dpkg -S /bin/ls # Check which package owns the file
sha256sum /bin/ls
# Compare against a known-good system
Conclusion
An effective incident response playbook transforms a chaotic breach situation into a structured process with clear phases: detect, contain, collect evidence, eradicate, recover, and review. Practice these procedures in tabletop exercises before a real incident occurs, automate the initial triage script deployment across all servers, and maintain an up-to-date asset inventory to accelerate response time when a security event strikes.


