Rundeck Job Automation and Incident Response
Rundeck is an open-source runbook automation platform that enables teams to create self-service operations, schedule jobs, and coordinate incident response across nodes. This guide covers installing Rundeck on Linux, defining jobs, managing nodes, configuring ACL policies, and building incident response runbooks.
Prerequisites
- Ubuntu 20.04/22.04 or CentOS 8/Rocky Linux 8+
- Java 11+ (OpenJDK)
- At least 2 GB RAM (4 GB recommended)
- SSH access to target nodes
- sudo privileges on the Rundeck server
Install Rundeck on Linux
Ubuntu/Debian:
# Install Java
sudo apt update
sudo apt install -y openjdk-11-jdk-headless
# Add Rundeck repository
curl -s https://packagecloud.io/install/repositories/pagerduty/rundeck/script.deb.sh | sudo bash
# Install Rundeck
sudo apt install -y rundeck
# Enable and start the service
sudo systemctl enable rundeck
sudo systemctl start rundeck
# Check status
sudo systemctl status rundeck
CentOS/Rocky Linux:
# Install Java
sudo dnf install -y java-11-openjdk-headless
# Add Rundeck repository
curl -s https://packagecloud.io/install/repositories/pagerduty/rundeck/script.rpm.sh | sudo bash
# Install Rundeck
sudo dnf install -y rundeck
sudo systemctl enable --now rundeck
Rundeck listens on port 4440 by default. Open it in your firewall:
# Ubuntu (UFW)
sudo ufw allow 4440/tcp
# CentOS (firewalld)
sudo firewall-cmd --permanent --add-port=4440/tcp
sudo firewall-cmd --reload
Access the web UI at http://your-server:4440. Default credentials are admin / admin.
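Rundeck can take a minute or more to finish starting, so the web UI may not respond right after the service starts. The sketch below retries a probe command until it succeeds. The helper name `wait_for` and the login URL in the commented example are assumptions for illustration.

```shell
# Hypothetical helper: run a probe command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  try=1
  while :; do
    # Succeed as soon as the probe command exits with status 0
    "$@" && return 0
    # Give up once the attempt budget is spent
    [ "$try" -ge "$attempts" ] && return 1
    try=$((try + 1))
    sleep "$delay"
  done
}

# Example (assumed URL): wait up to about two minutes for the login page.
# wait_for 24 5 curl -sf -o /dev/null http://localhost:4440/user/login
```

Passing the probe as arguments keeps the retry loop independent of curl, so the same helper can wrap `ss`, `nc`, or any other check.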
Initial Setup and Projects
Rundeck organizes work into projects. Create your first project:
# Via CLI (rd tool)
# Install the rd CLI
sudo apt install -y rundeck-cli
# Configure rd CLI
export RD_URL=http://localhost:4440
export RD_USER=admin
export RD_PASSWORD=admin
# Create a project
rd projects create --project myops -- \
--project.name=myops \
--project.description="Operations Project"
# List projects
rd projects list
Via the web UI: New Project > Enter name > Save.
Change the admin password immediately after first login:
# Edit the realm.properties file
sudo nano /etc/rundeck/realm.properties
# Change: admin:admin,user,admin,architect,deploy,build
# To: admin:NewSecurePassword,user,admin,architect,deploy,build
sudo systemctl restart rundeck
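The realm.properties entry above stores the password in clear text. The Jetty-style property file that Rundeck reads also accepts hashed credentials with a prefix such as `MD5:`. The following is a minimal sketch, assuming `md5sum` is installed; the helper name `realm_line` is hypothetical. MD5 is a weak hash, so treat this only as an improvement over plain text and prefer a stronger scheme if your Rundeck version offers one.

```shell
# Hypothetical helper: print a realm.properties entry with an MD5-hashed password.
# Usage: realm_line <user> <password> <comma-separated-roles>
realm_line() {
  # printf '%s' avoids hashing the trailing newline that echo would add
  hash=$(printf '%s' "$2" | md5sum | awk '{print $1}')
  printf '%s:MD5:%s,%s\n' "$1" "$hash" "$3"
}

# Prints a line that can replace the admin entry in /etc/rundeck/realm.properties
realm_line admin 'NewSecurePassword' 'user,admin,architect,deploy,build'
```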
Node Management
Nodes are the targets where Rundeck executes commands. The local server is already a node.
Add remote nodes by editing the project's resources.xml or resources.yaml:
# Create a YAML node resource file
sudo mkdir -p /var/rundeck/projects/myops/etc
sudo tee /var/rundeck/projects/myops/etc/resources.yaml > /dev/null << 'EOF'
web01:
  nodename: web01
  hostname: 192.168.1.10
  username: deploy
  description: Web server 1
  tags: web,production
  ssh-keypath: /var/lib/rundeck/.ssh/id_rsa
  osFamily: unix
db01:
  nodename: db01
  hostname: 192.168.1.20
  username: deploy
  description: Database server
  tags: db,production
  ssh-keypath: /var/lib/rundeck/.ssh/id_rsa
  osFamily: unix
EOF
# Configure project to use this resource file
# In the project settings, set Resource Model Source to File
# Path: /var/rundeck/projects/myops/etc/resources.yaml
# Add Rundeck's SSH key to target nodes
sudo -u rundeck ssh-keygen -t ed25519 -f /var/lib/rundeck/.ssh/id_rsa -N ""
sudo cat /var/lib/rundeck/.ssh/id_rsa.pub
# Copy this key to all target nodes: ssh-copy-id deploy@192.168.1.10
Test node connectivity from the dashboard: Nodes > web01 > Run Command.
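When the node list grows, writing each entry by hand becomes error-prone. As a rough sketch, the entries can be generated from a simple host list. The helper name `node_entry` and the three-column input format are assumptions for illustration; the default username and key path follow the examples above.

```shell
# Hypothetical helper: emit one resources.yaml node entry.
# Usage: node_entry <name> <host> <tags> [user] [keypath]
node_entry() {
  cat <<EOF
$1:
  nodename: $1
  hostname: $2
  username: ${4:-deploy}
  tags: $3
  ssh-keypath: ${5:-/var/lib/rundeck/.ssh/id_rsa}
  osFamily: unix
EOF
}

# Read "name host tags" triples and print the combined resource file.
while read -r name host tags; do
  node_entry "$name" "$host" "$tags"
done <<'HOSTS'
web01 192.168.1.10 web,production
db01 192.168.1.20 db,production
HOSTS
```

Redirect the output through `sudo tee` to write it to the project's resources.yaml.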
Job Definitions
Jobs define the workflow of commands to execute. Create a job from a YAML definition, the same format Rundeck uses for job exports:
# Create a job definition file
cat > /tmp/deploy-job.yaml << 'EOF'
- name: Deploy Application
  id: deploy-app
  description: Pull latest code and restart service
  project: myops
  loglevel: INFO
  nodefilters:
    filter: "tags: web"
  sequence:
    keepgoing: false
    strategy: node-first
    commands:
      - script: |
          #!/bin/bash
          set -e
          echo "Pulling latest code..."
          cd /var/www/app
          git pull origin main
          echo "Installing dependencies..."
          composer install --no-dev --quiet
          echo "Running migrations..."
          php artisan migrate --force
          echo "Restarting PHP-FPM..."
          sudo systemctl restart php8.1-fpm
          echo "Deploy complete on $(hostname)"
  notification:
    onfailure:
      email:
        recipients: ops@example.com  # placeholder address
        subject: "Deploy FAILED on ${node.name}"
    onsuccess:
      email:
        recipients: ops@example.com  # placeholder address
        subject: "Deploy succeeded"
  scheduleEnabled: true
  schedule:
    time:
      hour: '2'
      minute: '0'
    month: '*'
    weekday:
      day: '*'
EOF
# Import the job
rd jobs load --project myops --file /tmp/deploy-job.yaml --format yaml
# List jobs
rd jobs list --project myops
# Run a job immediately
rd run --project myops --job "Deploy Application"
# Follow job execution
rd executions follow -e <execution-id>
ACL Policies
ACL policies control who can do what in Rundeck. Create role-based policies:
# Create an ACL policy for a developer role that can view and run jobs
sudo tee /etc/rundeck/acl/developer.aclpolicy > /dev/null << 'EOF'
description: Developer view-and-run access
context:
  project: myops
for:
  resource:
    - allow: [read]
  adhoc:
    - allow: [read]
  job:
    - allow: [read, run]
  node:
    - allow: [read, run]
by:
  group: developers
---
description: Developer system access
context:
  application: rundeck
for:
  resource:
    - equals:
        kind: project
      allow: [read]
    - equals:
        kind: system
      allow: [read]
  project:
    - match:
        name: myops
      allow: [read]
by:
  group: developers
EOF
# Create an ops engineer policy
sudo tee /etc/rundeck/acl/ops.aclpolicy > /dev/null << 'EOF'
description: Ops full access
context:
  project: myops
for:
  resource:
    - allow: [read, create, update, delete]
  adhoc:
    - allow: [read, run, kill]
  job:
    - allow: [read, create, update, delete, run, kill]
  node:
    - allow: [read, run]
by:
  group: ops
---
description: Ops system access
context:
  application: rundeck
for:
  resource:
    - allow: [read, create, update, delete]
  project:
    - allow: [read, configure, delete, import, export]
by:
  group: ops
EOF
sudo systemctl restart rundeck
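A policy file can hold several YAML documents separated by `---`, and a document that lacks one of its top-level sections is easy to miss. The sketch below is a rough pre-flight check, assuming every document needs `description`, `context`, `for`, and `by`. The function name `check_aclpolicy` is hypothetical, and the check is not a substitute for Rundeck's own validation.

```shell
# Hypothetical pre-flight check for an .aclpolicy file.
# Prints one line per missing top-level key and exits non-zero if any is missing.
check_aclpolicy() {
  awk '
    function flush() {
      if (seen_any) {
        docs++
        n = split("description context for by", keys, " ")
        for (i = 1; i <= n; i++)
          if (!(keys[i] in seen)) {
            printf "document %d: missing %s\n", docs, keys[i]
            bad = 1
          }
      }
      delete seen
      seen_any = 0
    }
    # A line of three dashes closes the current YAML document
    /^---[ \t]*$/ { flush(); next }
    # Record unindented keys only; nested keys start with whitespace
    /^[A-Za-z]+:/ { key = $0; sub(/:.*/, "", key); seen[key] = 1; seen_any = 1 }
    END { flush(); exit bad }
  ' "$1"
}

# Example: check_aclpolicy /etc/rundeck/acl/ops.aclpolicy
```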
Webhook Triggers
Trigger jobs via webhooks from external systems (GitHub, monitoring alerts, etc.):
# Create a webhook in the Rundeck UI:
# Project > Webhooks > Add Webhook
# Name: deploy-on-push
# Event Handler: Run Job
# Job: Deploy Application
# The webhook URL format:
# http://your-server:4440/api/45/webhook/<token>
# Test the webhook
curl -X POST "http://localhost:4440/api/45/webhook/YourWebhookToken" \
-H "Content-Type: application/json" \
-d '{"event": "push", "branch": "main"}'
# Pass webhook payload fields to the job as options
# In the webhook's Run Job handler, set Options to, for example: -branch ${data.branch}
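External systems usually call the webhook from a script, so it helps to wrap the curl test above in a small function that can be exercised without a live server. This is a sketch only: the names `webhook_url` and `post_webhook` and the `DRY_RUN` switch are hypothetical, and API version 45 is taken from the URL format shown above.

```shell
# Hypothetical helper: build the webhook URL from base URL, token, and API version.
webhook_url() {
  # ${1%/} drops a trailing slash so the path does not contain "//"
  printf '%s/api/%s/webhook/%s\n' "${1%/}" "${3:-45}" "$2"
}

# Hypothetical helper: POST a JSON payload, or only print the request when DRY_RUN=1.
# Usage: post_webhook <base-url> <token> <json-payload>
post_webhook() {
  url=$(webhook_url "$1" "$2")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s\n%s\n' "$url" "$3"
  else
    curl -sS --fail -X POST "$url" \
      -H 'Content-Type: application/json' \
      -d "$3"
  fi
}

# Dry run: shows the request without contacting the server
DRY_RUN=1 post_webhook "http://localhost:4440" "YourWebhookToken" '{"event": "push", "branch": "main"}'
```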
Incident Response Runbooks
Create structured incident response runbooks as Rundeck jobs:
# Create a database high-CPU incident runbook
cat > /tmp/db-incident-runbook.yaml << 'EOF'
- name: DB High CPU - Incident Response
  description: Automated steps for database high CPU incidents
  project: myops
  loglevel: INFO
  nodefilters:
    filter: "tags: db"
  sequence:
    keepgoing: true
    strategy: sequential
    commands:
      - description: "Step 1: Capture current process list"
        script: |
          #!/bin/bash
          # @option.NAME@ tokens are replaced by Rundeck with the job option values
          echo "=== Active Queries (top 20) ==="
          mysql -u root -p"@option.DB_PASS@" -e "
          SELECT id, user, host, db, command, time, state, info
          FROM information_schema.processlist
          WHERE command != 'Sleep'
          ORDER BY time DESC LIMIT 20;
          " 2>/dev/null || echo "Could not query MySQL"
          echo "=== System load ==="
          uptime
          echo "=== Top CPU processes ==="
          ps aux --sort=-%cpu | head -20
      - description: "Step 2: Check slow query log"
        script: |
          #!/bin/bash
          echo "=== Recent slow queries ==="
          tail -n 50 /var/log/mysql/slow-query.log 2>/dev/null || \
          echo "Slow query log not found"
      - description: "Step 3: Kill long-running queries (>300s)"
        script: |
          #!/bin/bash
          mysql -u root -p"@option.DB_PASS@" -e "
          SELECT GROUP_CONCAT('KILL ', id SEPARATOR '; ')
          FROM information_schema.processlist
          WHERE command = 'Query' AND time > 300
          " -s -N 2>/dev/null | mysql -u root -p"@option.DB_PASS@" 2>/dev/null || true
          echo "Long-running query kill attempt complete"
      - description: "Step 4: Notify team"
        script: |
          #!/bin/bash
          curl -s -X POST "@option.SLACK_WEBHOOK@" \
          -H 'Content-Type: application/json' \
          -d "{\"text\": \"DB incident runbook executed on $(hostname). Check Rundeck logs for details.\"}"
  options:
    - name: DB_PASS
      required: true
      secure: true
      valueExposed: true  # must be exposed so @option.DB_PASS@ expands in scripts
    - name: SLACK_WEBHOOK
      required: true
EOF
rd jobs load --project myops --file /tmp/db-incident-runbook.yaml --format yaml
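Step 3 kills queries without showing which ones it selected. During an incident it can be safer to review the statements first. The sketch below, which assumes tab-separated `id`, `command`, `time` rows such as those produced by `mysql -s -N`, prints the KILL statements instead of running them. The function name `kill_statements` is hypothetical.

```shell
# Hypothetical dry-run helper: read "id<TAB>command<TAB>time" rows on stdin and
# print a KILL statement for each query running longer than the threshold.
# Usage: kill_statements [threshold-seconds]   (default 300)
kill_statements() {
  awk -F '\t' -v limit="${1:-300}" \
    '$2 == "Query" && $3 > limit { printf "KILL %d;\n", $1 }'
}

# Sample rows: only the first is a query older than 300 seconds
printf '11\tQuery\t420\n12\tSleep\t900\n13\tQuery\t12\n' | kill_statements 300
```

Once the output has been reviewed, it can be piped into `mysql` in the same way as Step 3.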
Troubleshooting
Rundeck won't start:
# Check Java version
java -version # Must be 11+
# Check logs
sudo journalctl -u rundeck -n 50
sudo tail -n 50 /var/log/rundeck/service.log
# Verify port 4440 is not in use
ss -tlnp | grep 4440
SSH connection to nodes fails:
# Test SSH manually as rundeck user
sudo -u rundeck ssh -i /var/lib/rundeck/.ssh/id_rsa deploy@192.168.1.10
# Check node definition (username and key path)
# Verify the target node has Rundeck's public key in authorized_keys
grep "rundeck" /home/deploy/.ssh/authorized_keys
Job execution permission denied:
# Check ACL policy syntax
rd acl validate --file /etc/rundeck/acl/developer.aclpolicy
# Reload ACL policies
sudo systemctl restart rundeck
Out of memory errors:
# Increase Java heap size
sudo nano /etc/rundeck/profile
# Add or change: RDECK_JVM_SETTINGS="-Xmx2g -Xms512m"
sudo systemctl restart rundeck
Conclusion
Rundeck brings runbook automation and self-service operations to your infrastructure, reducing mean time to resolution for incidents by providing structured, auditable workflows. Use ACL policies to safely delegate job execution to developers, configure webhooks to trigger automated responses to monitoring alerts, and build incident runbooks to codify your team's institutional knowledge.


