Service Monitoring with systemd
Introduction
systemd has become the standard init system and service manager for most modern Linux distributions, replacing traditional SysV init and Upstart. Beyond service management, systemd provides powerful built-in monitoring capabilities that enable administrators to track service health, resource usage, failures, and dependencies without installing additional monitoring software.
Understanding systemd's monitoring features is essential for effective service management in modern Linux environments. systemd tracks detailed service metrics, maintains comprehensive logs via journald, offers automatic restart capabilities, implements dependency management, and provides real-time status information. These capabilities enable proactive service monitoring and rapid troubleshooting directly from the command line.
This guide covers how to monitor service status, track resource consumption, configure automatic restart policies, analyze service logs, write custom monitoring scripts, and set up alerting for service failures. The same techniques apply whether you manage a single server or many services across several hosts.
Prerequisites
Before exploring systemd monitoring, ensure you have:
- A Linux distribution using systemd (Ubuntu 16.04+, Debian 8+, CentOS 7+, Rocky Linux 8+)
- Root or sudo access for service management
- Basic understanding of systemd service units
- Familiarity with Linux command line
- Services configured under systemd management
Verify systemd is running:
# Check systemd version
systemctl --version
# Verify systemd is PID 1
ps -p 1 -o comm=
# Should output: systemd
# Check systemd status
systemctl status
Understanding systemd Service States
Service States
systemd services can be in various states:
Active States:
- active (running) - Service is running normally
- active (exited) - One-time service completed successfully
- active (waiting) - Service is waiting for an event
Inactive, Failed, and Transitional States:
- inactive (dead) - Service is not running
- failed - Service failed to start or crashed
- activating - Service is starting
- deactivating - Service is stopping
Enable States:
- enabled - Service starts automatically at boot
- disabled - Service doesn't start at boot
- static - Service can't be enabled (typically dependencies)
- masked - Service can't be started (completely disabled)
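For scripting, the activity states listed above can be collapsed into a few coarse health levels. The sketch below is one possible mapping, not systemd terminology: the function name `classify_state` and the category names (`ok`, `transitional`, `down`) are assumptions made for illustration.

```shell
# classify_state ACTIVE_STATE
# Map an ActiveState value to a coarse health category.
# The category names are illustrative, not defined by systemd.
classify_state() {
    case "$1" in
        active)                   echo "ok" ;;
        activating|deactivating)  echo "transitional" ;;
        inactive)                 echo "down" ;;
        failed)                   echo "failed" ;;
        *)                        echo "unknown" ;;
    esac
}

# Sample values of the kind printed by:
#   systemctl show nginx -p ActiveState --value
for state in active activating inactive failed; do
    printf '%-12s -> %s\n' "$state" "$(classify_state "$state")"
done
```

Keeping `transitional` separate from `down` helps avoid alerting on a service that is merely in the middle of a restart.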
Basic Service Monitoring Commands
Check service status:
# Basic status
systemctl status nginx
# Show all properties
systemctl show nginx
# Check if service is active
systemctl is-active nginx
# Check if service is enabled
systemctl is-enabled nginx
# Check if service failed
systemctl is-failed nginx
List all services:
# List all loaded services
systemctl list-units --type=service
# List all services (including inactive)
systemctl list-units --type=service --all
# List failed services
systemctl list-units --state=failed
# List enabled services
systemctl list-unit-files --type=service --state=enabled
# List running services
systemctl list-units --type=service --state=running
Monitoring Service Status
Detailed Service Status
Get comprehensive service information:
# Full status with recent logs
systemctl status nginx -l --no-pager
# Status of multiple services
systemctl status nginx mysql redis
# Show service dependency tree
systemctl list-dependencies nginx
# Show reverse dependencies (what depends on this service)
systemctl list-dependencies nginx --reverse
Service properties:
# Show all properties
systemctl show nginx
# Show specific property
systemctl show nginx -p MainPID
systemctl show nginx -p ActiveState
systemctl show nginx -p SubState
systemctl show nginx -p LoadState
systemctl show nginx -p UnitFileState
# Multiple properties
systemctl show nginx -p MainPID -p MemoryCurrent -p CPUUsageNSec
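Because `systemctl show` prints one KEY=VALUE pair per line, a script can fetch several properties in one call and pick out individual values afterwards. The helper below is a minimal sketch of that idea; `get_prop` is a hypothetical name and the sample values are made up.

```shell
# get_prop NAME: read KEY=VALUE lines on stdin and print the value for NAME.
# Only the first "=" separates key from value, so values containing "="
# are returned intact.
get_prop() {
    awk -v key="$1" '
        {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }'
}

# Sample text in the format systemctl show prints (values are made up).
sample='MainPID=1234
ActiveState=active
ExecMainStatus=0
Environment=LANG=C'

printf '%s\n' "$sample" | get_prop MainPID       # prints 1234
printf '%s\n' "$sample" | get_prop Environment   # prints LANG=C
```

On a live system the input would come from something like `systemctl show nginx -p MainPID -p ActiveState`, captured once into a variable and then queried repeatedly.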
Real-Time Service Monitoring
Watch service status:
# Continuously monitor service status (updates every 2 seconds)
watch -n 2 'systemctl status nginx'
# Monitor multiple services
watch -n 2 'systemctl status nginx mysql redis | grep -E "Active|Main PID|Memory"'
# Monitor failed services
watch -n 5 'systemctl list-units --state=failed'
Follow service logs in real-time:
# Follow service logs
journalctl -u nginx -f
# Follow with more context
journalctl -u nginx -f -n 100
# Follow multiple services
journalctl -u nginx -u mysql -f
# Follow all service logs
journalctl -f
Resource Monitoring
CPU and Memory Usage
Check resource consumption:
# Show resource usage for service
systemctl status nginx
# Detailed resource statistics
systemd-cgtop
# Resource usage for specific service
systemctl show nginx -p CPUUsageNSec -p MemoryCurrent
# Human-readable memory usage
systemctl show nginx -p MemoryCurrent | awk -F= '{printf "Memory: %.2f MB\n", $2/1024/1024}'
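The awk one-liner above assumes `MemoryCurrent` is always a number. When memory accounting is unavailable the property is reported as `[not set]`, so a slightly more defensive formatter can be useful. This is a sketch only; `human_bytes` is a hypothetical helper name.

```shell
# human_bytes VALUE: format a byte count with a binary unit.
# Prints "n/a" for non-numeric input such as "[not set]".
human_bytes() {
    case "$1" in
        ''|*[!0-9]*) echo "n/a"; return 0 ;;
    esac
    awk -v b="$1" 'BEGIN {
        split("B KiB MiB GiB TiB", unit, " ")
        i = 1
        while (b >= 1024 && i < 5) { b /= 1024; i++ }
        printf "%.2f %s\n", b, unit[i]
    }'
}

human_bytes 52428800      # prints 50.00 MiB
human_bytes "[not set]"   # prints n/a
```

A typical call would pass the output of `systemctl show nginx -p MemoryCurrent --value` as the argument.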
Monitor resource limits:
# Check configured limits
systemctl show nginx -p LimitNOFILE -p LimitNPROC -p LimitMEMLOCK
# Check current vs limit
systemctl show nginx | grep -E "Limit|Current"
systemd-cgtop for Real-Time Resource Monitoring
Interactive resource monitoring:
# Launch systemd-cgtop (like top for services)
systemd-cgtop
# Press 'p' to sort by path
# Press 't' to sort by tasks
# Press 'c' to sort by CPU
# Press 'm' to sort by memory
# Press 'q' to quit
# Batch mode (single output)
systemd-cgtop -n 1 --batch
# Monitor specific services
systemd-cgtop | grep -E "nginx|mysql|redis"
Resource usage script:
#!/bin/bash
# service-resources.sh - Monitor service resource usage
SERVICES=("nginx" "mysql" "redis")
echo "Service Resource Usage Report"
echo "=============================="
echo "Date: $(date)"
echo ""
for service in "${SERVICES[@]}"; do
    if systemctl is-active --quiet "$service"; then
        echo "Service: $service"
        # Get PID
        PID=$(systemctl show "$service" -p MainPID | cut -d= -f2)
        echo " PID: $PID"
        # Get memory usage (reported as "[not set]" when accounting is unavailable)
        MEM=$(systemctl show "$service" -p MemoryCurrent | cut -d= -f2)
        if [ "$MEM" != "[not set]" ]; then
            MEM_MB=$(echo "scale=2; $MEM/1024/1024" | bc)
            echo " Memory: ${MEM_MB} MB"
        fi
        # Get CPU usage (accumulated since the service started)
        CPU=$(systemctl show "$service" -p CPUUsageNSec | cut -d= -f2)
        if [ "$CPU" != "[not set]" ]; then
            CPU_SEC=$(echo "scale=2; $CPU/1000000000" | bc)
            echo " CPU Time: ${CPU_SEC}s"
        fi
        # Get task count
        TASKS=$(systemctl show "$service" -p TasksCurrent | cut -d= -f2)
        echo " Tasks: $TASKS"
        echo ""
    else
        echo "Service: $service - NOT RUNNING"
        echo ""
    fi
done
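`CPUUsageNSec` is a cumulative counter, so the script above reports total CPU time rather than current load. One way to estimate recent load is to sample the counter twice and divide the difference by the elapsed time. The following is a minimal sketch of that arithmetic; `cpu_percent` is a hypothetical helper and the readings are invented.

```shell
# cpu_percent START_NSEC END_NSEC INTERVAL_SEC
# Average CPU use over the interval, as a percentage of one core.
cpu_percent() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN {
        if (t <= 0 || b < a) { print "n/a"; exit }
        printf "%.1f\n", (b - a) / (t * 1000000000) * 100
    }'
}

# Two hypothetical CPUUsageNSec readings taken 10 seconds apart:
cpu_percent 5000000000 7500000000 10   # prints 25.0
```

Values above 100 are possible for multi-threaded services, because the counter sums CPU time across all cores.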
Service Failure Monitoring
Detecting Service Failures
Check for failed services:
# List all failed services
systemctl --failed
# Count failed services
systemctl --failed --no-legend | wc -l
# Get failure reason
systemctl status nginx | grep -A 5 "Process"
# Show failure details
systemctl show nginx -p Result -p ExecMainStatus
Failed service details:
# Get exit code
systemctl show nginx -p ExecMainStatus
# Get failure result
systemctl show nginx -p Result
# Possible values include: success, exit-code, signal, core-dump, timeout, watchdog, start-limit-hit, resources, protocol
# View recent failures
journalctl -u nginx --since "1 hour ago" | grep -i "failed\|error"
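The `Result` property is terse, so alert messages often read better with a short explanation attached. The lookup below is an illustrative sketch: `explain_result` is a hypothetical helper, the wording of each explanation is mine, and `watchdog` and `start-limit-hit` are additional values described in the systemd.service documentation.

```shell
# explain_result RESULT: print a one-line explanation of a Result value.
explain_result() {
    case "$1" in
        success)          echo "no failure recorded" ;;
        exit-code)        echo "main process exited with a non-zero status" ;;
        signal)           echo "main process was terminated by a signal" ;;
        core-dump)        echo "main process crashed and dumped core" ;;
        timeout)          echo "a start or stop operation timed out" ;;
        watchdog)         echo "watchdog keep-alive was missed" ;;
        start-limit-hit)  echo "restart rate limit was reached" ;;
        *)                echo "unrecognized result: $1" ;;
    esac
}

explain_result exit-code   # prints main process exited with a non-zero status
explain_result timeout     # prints a start or stop operation timed out
```

The argument would normally come from `systemctl show nginx -p Result --value`.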
Automated Failure Detection Script
#!/bin/bash
# monitor-failed-services.sh - Alert on service failures
EMAIL="admin@example.com"  # placeholder; set to your alert recipient
HOSTNAME=$(hostname)
STATE_FILE="/var/lib/monitoring/failed-services-state"
mkdir -p /var/lib/monitoring
# Get currently failed services (--plain drops the leading status glyph so $1 is the unit name)
FAILED=$(systemctl --failed --no-legend --plain | awk '{print $1}')
if [ -n "$FAILED" ]; then
    # Check if this is a new failure
    if [ ! -f "$STATE_FILE" ] || ! diff -q <(echo "$FAILED") "$STATE_FILE" > /dev/null 2>&1; then
        # Send alert
        {
            echo "Service Failure Alert on $HOSTNAME"
            echo "=================================="
            echo "Time: $(date)"
            echo ""
            echo "Failed Services:"
            echo "$FAILED"
            echo ""
            echo "Details:"
            echo "--------"
            for service in $FAILED; do
                echo ""
                echo "Service: $service"
                systemctl status "$service" --no-pager -l
                echo ""
                echo "Recent Logs:"
                journalctl -u "$service" -n 20 --no-pager
                echo "---"
            done
        } | mail -s "ALERT: Service Failures on $HOSTNAME" "$EMAIL"
        # Update state file
        echo "$FAILED" > "$STATE_FILE"
    fi
else
    # No failures, remove state file
    rm -f "$STATE_FILE"
fi
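The script above sends an alert whenever the set of failed units differs from the saved state, which includes the case where one of several failed services recovers. If alerts should fire only for newly failed units, the two lists can be compared as sets. This is a sketch under that assumption; `new_failures` is a hypothetical helper and the file names are placeholders.

```shell
# new_failures OLD_FILE NEW_FILE
# Print units listed in NEW_FILE but not in OLD_FILE (one unit per line).
new_failures() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    # comm -13 hides lines unique to the first file and lines common to both
    comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Demonstration with made-up state files
workdir=$(mktemp -d)
printf 'nginx.service\n' > "$workdir/previous"
printf 'nginx.service\nredis.service\n' > "$workdir/current"
new_failures "$workdir/previous" "$workdir/current"   # prints redis.service
rm -rf "$workdir"
```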
Service Restart Policies
Automatic Restart Configuration
Configure service restart:
# Edit service unit
sudo systemctl edit nginx
Add restart configuration:
# Rate-limit settings live in [Unit] on current systemd versions
[Unit]
StartLimitIntervalSec=200s
StartLimitBurst=3
[Service]
Restart=on-failure
RestartSec=5s
Restart policy options:
- Restart=no - Never restart (default)
- Restart=on-success - Restart only on clean exit
- Restart=on-failure - Restart on failures
- Restart=on-abnormal - Restart on crashes, watchdog, timeouts
- Restart=on-abort - Restart on unclean signal
- Restart=on-watchdog - Restart on watchdog timeout
- Restart=always - Always restart
Example configurations:
# Web server - restart on failure
[Service]
Restart=on-failure
RestartSec=10s
# Database - restart only on clean exit
[Service]
Restart=on-success
RestartSec=30s
# Critical service - always restart with rate limiting
[Unit]
StartLimitIntervalSec=300s
StartLimitBurst=5
[Service]
Restart=always
RestartSec=5s
# Worker process - restart on abnormal exit
[Service]
Restart=on-abnormal
RestartSec=15s
Apply changes:
# Reload systemd configuration
sudo systemctl daemon-reload
# Restart service
sudo systemctl restart nginx
# Verify new configuration (the restart delay is exposed as RestartUSec)
systemctl show nginx -p Restart -p RestartUSec
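The restart delay and the start rate limit interact: if the delay between restarts is long enough, the burst count may never be reached within the interval and a crashing service will keep restarting indefinitely. The check below is a rough sketch of that arithmetic. It assumes the service fails immediately after each start and ignores startup time, and `start_limit_reachable` is a hypothetical helper name.

```shell
# start_limit_reachable RESTART_SEC BURST INTERVAL_SEC
# Assumes the service fails right after each start, so consecutive starts are
# about RESTART_SEC apart. BURST starts are allowed per interval, and start
# number BURST+1 happens after roughly BURST * RESTART_SEC seconds.
start_limit_reachable() {
    if [ $(( $1 * $2 )) -lt "$3" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

start_limit_reachable 5 3 200    # prints yes (15s is well inside 200s)
start_limit_reachable 90 5 300   # prints no  (450s exceeds 300s)
```

When the answer is "no", consider shortening the restart delay, raising the interval, or accepting that the unit will retry forever.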
Monitor Restart Activity
Check restart count:
# View service restarts
systemctl show nginx -p NRestarts
# View with status
systemctl status nginx | grep -i restart
# Check restart rate limiting (the interval is exposed as StartLimitIntervalUSec)
systemctl show nginx -p StartLimitBurst -p StartLimitIntervalUSec
Track restart history:
# View restart events in journal
journalctl -u nginx | grep -E "Started|Stopped|Failed"
# Count restarts in last hour
journalctl -u nginx --since "1 hour ago" | grep -c "Started"
# View restart timestamps
journalctl -u nginx -o short-precise | grep "Started"
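To see whether restarts cluster at particular times, the "Started" messages can be grouped by hour. The sketch below assumes input in the `journalctl -o short-iso` format, where each line begins with an ISO 8601 timestamp; `starts_per_hour` is a hypothetical helper and the sample lines are invented.

```shell
# starts_per_hour: read journalctl -o short-iso lines on stdin and count
# "Started" messages per hour (the first 13 characters of the timestamp).
starts_per_hour() {
    awk '/Started/ { count[substr($1, 1, 13)]++ }
         END { for (h in count) print h, count[h] }' | sort
}

# Made-up journal lines for demonstration
printf '%s\n' \
    '2024-01-11T10:05:12+0000 web1 systemd[1]: Started nginx.service.' \
    '2024-01-11T10:40:03+0000 web1 systemd[1]: Started nginx.service.' \
    '2024-01-11T11:02:47+0000 web1 systemd[1]: Started nginx.service.' |
    starts_per_hour
# prints:
# 2024-01-11T10 2
# 2024-01-11T11 1
```

On a live system the input would be `journalctl -u nginx -o short-iso --no-pager`.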
Watchdog Monitoring
Configure Watchdog
systemd can monitor services using watchdog functionality.
Enable watchdog in service:
sudo systemctl edit myapp
[Service]
WatchdogSec=30s
Restart=on-watchdog
Application must send watchdog notifications:
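# Python example using the python-systemd library
import time

import systemd.daemon

def process_data():
    """Placeholder for the application's real work."""

while True:
    # Do work
    process_data()
    # Notify watchdog (service is alive)
    systemd.daemon.notify('WATCHDOG=1')
    # Sleep for less than half of WatchdogSec (30s) so a ping is never late
    time.sleep(10)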
Monitor watchdog status:
# Check watchdog configuration (the timeout is exposed as WatchdogUSec)
systemctl show myapp -p WatchdogUSec -p WatchdogTimestamp
# View watchdog events
journalctl -u myapp | grep watchdog
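When a watchdog is configured, systemd passes the timeout to the service in the `WATCHDOG_USEC` environment variable (in microseconds), and its documentation recommends sending a keep-alive at about half that interval. A shell-based service could derive its ping interval as sketched below; `watchdog_ping_interval` is a hypothetical helper name.

```shell
# watchdog_ping_interval: print how often (in whole seconds) to send
# WATCHDOG=1, using half of WATCHDOG_USEC.
# Prints nothing and returns 1 when the watchdog is not enabled.
watchdog_ping_interval() {
    [ -n "${WATCHDOG_USEC:-}" ] || return 1
    interval=$(( WATCHDOG_USEC / 2 / 1000000 ))
    [ "$interval" -ge 1 ] || interval=1
    echo "$interval"
}

# With WatchdogSec=30s the service sees WATCHDOG_USEC=30000000.
( WATCHDOG_USEC=30000000; watchdog_ping_interval )   # prints 15
```

A shell service would then loop, doing its work and calling `systemd-notify WATCHDOG=1` once per interval. Depending on the setup this may require `NotifyAccess=all`, since the notification comes from a child process rather than the main PID.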
Dependency Monitoring
Service Dependencies
View dependencies:
# What this service requires
systemctl list-dependencies nginx
# What requires this service
systemctl list-dependencies nginx --reverse
# Full dependency tree
systemctl list-dependencies nginx --all
# Just direct dependencies (read from the unit's properties)
systemctl show nginx -p Requires -p Wants
Dependency types:
# View unit file to see dependency configuration
systemctl cat nginx
# Common dependency directives:
# Requires= - Hard dependency (fails if dependency fails)
# Wants= - Soft dependency (continues if dependency fails)
# After= - Order dependency (start after)
# Before= - Order dependency (start before)
# BindsTo= - Strong binding (stops if dependency stops)
Monitor Dependency Failures
#!/bin/bash
# check-service-dependencies.sh
SERVICE="$1"
if [ -z "$SERVICE" ]; then
    echo "Usage: $0 <service-name>"
    exit 1
fi
echo "Checking dependencies for $SERVICE"
echo "==================================="
# Get required dependencies
REQUIRES=$(systemctl show "$SERVICE" -p Requires | cut -d= -f2)
if [ -n "$REQUIRES" ]; then
    echo "Required dependencies:"
    for dep in $REQUIRES; do
        STATUS=$(systemctl is-active "$dep")
        if [ "$STATUS" != "active" ]; then
            echo " [WARN] $dep: $STATUS"
        else
            echo " [OK] $dep: $STATUS"
        fi
    done
else
    echo "No hard dependencies"
fi
echo ""
# Get wanted dependencies
WANTS=$(systemctl show "$SERVICE" -p Wants | cut -d= -f2)
if [ -n "$WANTS" ]; then
    echo "Optional dependencies:"
    for dep in $WANTS; do
        STATUS=$(systemctl is-active "$dep")
        echo " $dep: $STATUS"
    done
fi
Logging and Journal Monitoring
Journal Integration
Service-specific logs:
# View logs for service
journalctl -u nginx
# Last 100 lines
journalctl -u nginx -n 100
# Follow logs
journalctl -u nginx -f
# Since specific time
journalctl -u nginx --since "2024-01-11 10:00:00"
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since today
# Date range
journalctl -u nginx --since "2024-01-11" --until "2024-01-12"
# Priority filtering (errors only)
journalctl -u nginx -p err
# Multiple services
journalctl -u nginx -u mysql
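The `-p` filter uses syslog severity levels, and a single level such as `err` matches that level plus everything more severe. The helper below spells this out as a small sketch; `priorities_matched` is a hypothetical name.

```shell
# priorities_matched LEVEL: list the syslog levels that journalctl -p LEVEL
# includes, that is LEVEL itself and everything more severe.
priorities_matched() {
    result=""
    for name in emerg alert crit err warning notice info debug; do
        result="${result:+$result }$name"
        if [ "$name" = "$1" ]; then
            echo "$result"
            return 0
        fi
    done
    echo "unknown level: $1" >&2
    return 1
}

priorities_matched err       # prints emerg alert crit err
priorities_matched warning   # prints emerg alert crit err warning
```

So `journalctl -u nginx -p warning` is a superset of `journalctl -u nginx -p err`, which matters when counting entries for alert thresholds.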
Log analysis:
# Count errors (-q suppresses informational lines such as "-- No entries --")
journalctl -u nginx --since today -p err --no-pager -q | wc -l
# Extract specific patterns
journalctl -u nginx --since today | grep "404\|500"
# Export to file
journalctl -u nginx --since "1 hour ago" > /tmp/nginx-logs.txt
# JSON output
journalctl -u nginx -n 10 -o json-pretty
# Show kernel messages that mention the service (for example OOM kills)
journalctl -k --no-pager | grep -i nginx
Custom Monitoring Scripts
Comprehensive Service Monitor
#!/bin/bash
# comprehensive-service-monitor.sh - Complete service monitoring
SERVICES=("nginx" "mysql" "redis" "ssh")
REPORT_FILE="/tmp/service-monitor-$(date +%Y%m%d-%H%M).txt"
ALERT_EMAIL="admin@example.com"  # placeholder; set to your alert recipient
declare -a ALERTS=()
{
    echo "========================================="
    echo "Service Monitoring Report"
    echo "Date: $(date)"
    echo "Hostname: $(hostname)"
    echo "========================================="
    echo ""
    for service in "${SERVICES[@]}"; do
        echo "--- Service: $service ---"
        # Check if service exists
        if ! systemctl list-unit-files | grep -q "^${service}.service"; then
            echo " Status: NOT INSTALLED"
            echo ""
            continue
        fi
        # Get status
        STATUS=$(systemctl is-active "$service")
        # is-enabled exits non-zero for "disabled", so fall back only on empty output
        ENABLED=$(systemctl is-enabled "$service" 2>/dev/null)
        ENABLED=${ENABLED:-unknown}
        echo " Status: $STATUS"
        echo " Enabled: $ENABLED"
        if [ "$STATUS" = "active" ]; then
            # Get resource usage
            MEM=$(systemctl show "$service" -p MemoryCurrent | cut -d= -f2)
            if [ "$MEM" != "[not set]" ] && [ "$MEM" -gt 0 ]; then
                MEM_MB=$(echo "scale=2; $MEM/1024/1024" | bc)
                echo " Memory: ${MEM_MB} MB"
            fi
            # Get restart count
            RESTARTS=$(systemctl show "$service" -p NRestarts | cut -d= -f2)
            echo " Restarts: $RESTARTS"
            # Check for recent errors (-q keeps "-- No entries --" out of the count)
            ERROR_COUNT=$(journalctl -u "$service" --since "1 hour ago" -p err --no-pager -q | wc -l)
            echo " Recent Errors (1h): $ERROR_COUNT"
            if [ "$ERROR_COUNT" -gt 10 ]; then
                ALERTS+=("High error count for $service: $ERROR_COUNT errors in last hour")
            fi
        else
            echo " [ALERT] Service is not active!"
            ALERTS+=("Service $service is $STATUS")
        fi
        # Check last restart
        LAST_START=$(systemctl show "$service" -p ActiveEnterTimestamp | cut -d= -f2)
        echo " Last Started: $LAST_START"
        echo ""
    done
    # Summary
    echo "========================================="
    echo "Summary"
    echo "========================================="
    ACTIVE_COUNT=0
    INACTIVE_COUNT=0
    for service in "${SERVICES[@]}"; do
        if systemctl is-active --quiet "$service"; then
            ((ACTIVE_COUNT++))
        else
            ((INACTIVE_COUNT++))
        fi
    done
    echo "Active Services: $ACTIVE_COUNT"
    echo "Inactive Services: $INACTIVE_COUNT"
    if [ ${#ALERTS[@]} -gt 0 ]; then
        echo ""
        echo "ALERTS:"
        for alert in "${ALERTS[@]}"; do
            echo " - $alert"
        done
    else
        echo ""
        echo "No alerts - all services healthy"
    fi
} > "$REPORT_FILE"
# Display report
cat "$REPORT_FILE"
# Send email if there are alerts
if [ ${#ALERTS[@]} -gt 0 ]; then
    mail -s "Service Alert: $(hostname)" "$ALERT_EMAIL" < "$REPORT_FILE"
fi
Service Availability Monitor
#!/bin/bash
# service-availability.sh - Track service uptime and availability
SERVICE="$1"
if [ -z "$SERVICE" ]; then
    echo "Usage: $0 <service-name>"
    exit 1
fi
STATS_FILE="/var/lib/monitoring/service-stats-${SERVICE}.json"
mkdir -p /var/lib/monitoring
# Check if service is active
if systemctl is-active --quiet "$SERVICE"; then
    STATUS="up"
else
    STATUS="down"
fi
# Update statistics
if [ -f "$STATS_FILE" ]; then
    # Load existing stats
    TOTAL_CHECKS=$(jq -r '.total_checks' "$STATS_FILE")
    UP_CHECKS=$(jq -r '.up_checks' "$STATS_FILE")
    LAST_STATUS=$(jq -r '.last_status' "$STATS_FILE")
    # Increment counters
    ((TOTAL_CHECKS++))
    if [ "$STATUS" = "up" ]; then
        ((UP_CHECKS++))
    fi
    # Check for status change
    if [ "$STATUS" != "$LAST_STATUS" ]; then
        echo "Status change detected: $LAST_STATUS -> $STATUS"
        # Log to journal
        logger -t service-monitor "Service $SERVICE changed from $LAST_STATUS to $STATUS"
    fi
else
    # Initialize stats
    TOTAL_CHECKS=1
    if [ "$STATUS" = "up" ]; then
        UP_CHECKS=1
    else
        UP_CHECKS=0
    fi
fi
# Calculate availability (multiply before dividing to keep precision;
# awk's printf always emits a leading zero, which keeps the JSON valid)
AVAILABILITY=$(awk -v up="$UP_CHECKS" -v total="$TOTAL_CHECKS" 'BEGIN { printf "%.2f", up * 100 / total }')
# Save stats
cat > "$STATS_FILE" <<EOF
{
"service": "$SERVICE",
"last_check": "$(date -Iseconds)",
"current_status": "$STATUS",
"last_status": "$STATUS",
"total_checks": $TOTAL_CHECKS,
"up_checks": $UP_CHECKS,
"availability": $AVAILABILITY
}
EOF
echo "Service: $SERVICE"
echo "Status: $STATUS"
echo "Availability: ${AVAILABILITY}%"
echo "Checks: $UP_CHECKS/$TOTAL_CHECKS"
Alerting Integration
systemd OnFailure Integration
Configure alert on service failure:
# Create alert service (a template unit; the text after "@" becomes %i)
sudo nano /etc/systemd/system/service-alert@.service
[Unit]
Description=Alert on service failure for %i
[Service]
Type=oneshot
ExecStart=/usr/local/bin/send-service-alert.sh %i
Create alert script:
sudo nano /usr/local/bin/send-service-alert.sh
#!/bin/bash
SERVICE="$1"
EMAIL="admin@example.com"  # placeholder; set to your alert recipient
HOSTNAME=$(hostname)
{
    echo "Service Failure Alert"
    echo "===================="
    echo "Service: $SERVICE"
    echo "Hostname: $HOSTNAME"
    echo "Time: $(date)"
    echo ""
    echo "Status:"
    systemctl status "$SERVICE" --no-pager -l
    echo ""
    echo "Recent Logs:"
    journalctl -u "$SERVICE" -n 50 --no-pager
} | mail -s "CRITICAL: $SERVICE failed on $HOSTNAME" "$EMAIL"
sudo chmod +x /usr/local/bin/send-service-alert.sh
Add to service configuration:
sudo systemctl edit nginx
[Unit]
OnFailure=service-alert@%n.service
sudo systemctl daemon-reload
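It can help to trace what the `%n` specifier turns into. `%n` is the full name of the failing unit, including its `.service` suffix, so the alert unit's instance name (and therefore the argument the alert script receives) is the full unit name. The sketch below imitates that expansion with plain string handling; both function names are hypothetical, and only `%n` is handled.

```shell
# expand_specifiers TEMPLATE UNIT_NAME
# Replace %n in TEMPLATE with the full unit name. Suitable only for simple
# unit names (no backslashes or "&"); other specifiers are left untouched.
expand_specifiers() {
    printf '%s\n' "$1" | sed "s/%n/$2/g"
}

# instance_name UNIT: print the instance part of a template instance,
# which is what %i expands to inside the alert unit.
instance_name() {
    name=${1#*@}        # drop everything up to the first "@"
    echo "${name%.*}"   # drop the final unit-type suffix
}

expand_specifiers 'service-alert@%n.service' 'nginx.service'
# prints service-alert@nginx.service.service

instance_name 'service-alert@nginx.service.service'
# prints nginx.service
```

The doubled `.service` looks odd but is harmless: the alert script is called with `nginx.service`, which `systemctl status` and `journalctl -u` both accept.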
Performance Monitoring
Benchmark Service Startup
# Analyze service startup time
systemd-analyze blame | grep nginx
# Show critical chain
systemd-analyze critical-chain nginx.service
# Plot boot chart (writes an SVG directly; no extra tools needed)
systemd-analyze plot > boot.svg
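The times printed by `systemd-analyze blame` mix units (for example `523ms`, `1.234s`, or `1min 2.5s`), which makes them awkward to compare or threshold in scripts. The converter below is a sketch that normalizes them to milliseconds; `blame_to_ms` is a hypothetical helper, and it assumes each line ends with the unit name and that only `min`, `s`, `ms`, and `us` suffixes appear.

```shell
# blame_to_ms: read systemd-analyze blame lines on stdin and print
# "<milliseconds> <unit>" for each, summing multi-part times.
blame_to_ms() {
    awk '{
        ms = 0
        for (i = 1; i < NF; i++) {
            v = $i + 0    # awk uses the leading numeric part of the token
            if      ($i ~ /min$/) ms += v * 60000
            else if ($i ~ /ms$/)  ms += v
            else if ($i ~ /us$/)  ms += v / 1000
            else if ($i ~ /s$/)   ms += v * 1000
        }
        printf "%.0f %s\n", ms, $NF
    }'
}

# Made-up sample lines
printf '%s\n' '  1.234s nginx.service' '523ms ssh.service' '1min 2.5s db.service' |
    blame_to_ms
# prints:
# 1234 nginx.service
# 523 ssh.service
# 62500 db.service
```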
Monitor Service Performance
#!/bin/bash
# service-performance.sh - Track service performance metrics
SERVICE="$1"
if [ -z "$SERVICE" ]; then
    echo "Usage: $0 <service-name>"
    exit 1
fi
echo "Performance Metrics for $SERVICE"
echo "================================="
# Startup time
STARTUP_TIME=$(systemd-analyze blame | grep "$SERVICE" | awk '{print $1}')
echo "Startup Time: $STARTUP_TIME"
# Memory usage
MEM=$(systemctl show "$SERVICE" -p MemoryCurrent | cut -d= -f2)
if [ "$MEM" != "[not set]" ]; then
    MEM_MB=$(echo "scale=2; $MEM/1024/1024" | bc)
    echo "Memory Usage: ${MEM_MB} MB"
fi
# CPU time
CPU=$(systemctl show "$SERVICE" -p CPUUsageNSec | cut -d= -f2)
if [ "$CPU" != "[not set]" ]; then
    CPU_SEC=$(echo "scale=2; $CPU/1000000000" | bc)
    echo "CPU Time: ${CPU_SEC}s"
fi
# Tasks
TASKS=$(systemctl show "$SERVICE" -p TasksCurrent | cut -d= -f2)
echo "Active Tasks: $TASKS"
# File descriptors of the main process, counted from /proc
# (needs root when the service runs as another user)
PID=$(systemctl show "$SERVICE" -p MainPID | cut -d= -f2)
if [ "$PID" -gt 0 ] 2>/dev/null; then
    FD=$(ls "/proc/$PID/fd" 2>/dev/null | wc -l)
    echo "Open FDs: $FD"
fi
Conclusion
systemd provides comprehensive built-in monitoring capabilities that enable effective service management without requiring external monitoring tools. By mastering systemd's monitoring features, you can track service health, detect failures, analyze resource usage, and automate remediation directly from the Linux command line.
Key takeaways:
- Built-in monitoring - systemd tracks extensive service metrics natively
- Resource tracking - Monitor CPU, memory, and other resources per service
- Automatic restart - Configure intelligent restart policies for resilience
- Journal integration - Unified logging with powerful filtering and search
- Dependency awareness - Monitor and manage service dependencies
Best practices:
- Configure appropriate restart policies for each service type
- Monitor failed services regularly
- Implement alerting for critical service failures
- Track resource usage to identify performance issues
- Use journalctl for centralized log analysis
- Document service dependencies
- Automate routine monitoring tasks
- Integrate with external monitoring for comprehensive coverage
While systemd provides excellent built-in monitoring, consider complementing it with dedicated monitoring solutions like Prometheus, Nagios, or Zabbix for historical metrics, advanced alerting, and distributed monitoring across multiple servers. systemd's monitoring capabilities form the foundation for effective service management in modern Linux infrastructure.


