Docker Health Checks Configuration
Health checks provide Docker with the ability to determine if a container is properly functioning or in a failed state. This guide covers implementing HEALTHCHECK instructions in Dockerfiles, configuring health checks in Docker Compose, managing container restart policies, and integrating health checks with orchestration systems. Proper health check configuration ensures your infrastructure can automatically detect and recover from application failures without manual intervention.
Table of Contents
- Understanding Health Checks
- HEALTHCHECK Instruction in Dockerfile
- Health Check Parameters
- Common Health Check Patterns
- Docker Compose Health Checks
- Orchestration Integration
- Monitoring Health Status
- Troubleshooting Health Checks
- Advanced Health Check Strategies
- Conclusion
Understanding Health Checks
Health checks are automated tests that verify whether a container's application is running and responding properly. The standalone Docker Engine records health status but does not restart or remove unhealthy containers on its own; orchestrators such as Docker Swarm use the health data to replace unhealthy tasks or remove them from load balancing.
Health check states:
- starting: Container just started, health status unknown (initial period)
- healthy: Health check passed, container is operational
- unhealthy: Health check failed, container application may be broken
- none: No health check configured (default)
# Check current health status
docker ps --format "table {{.Names}}\t{{.Status}}"
# Detailed health status
docker inspect <container-id> --format='{{.State.Health.Status}}'
# View health check history
docker inspect <container-id> | grep -A 20 Health
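The same status can be read from a script. The sketch below is a minimal example, assuming the JSON shape printed by `docker inspect` (a list with one object per container, where `State.Health` is missing when no health check is configured). The helper names `health_status` and `inspect_container` are hypothetical.

```python
import json
import subprocess


def health_status(inspect_json: str) -> str:
    """Extract the health status from `docker inspect` JSON output."""
    containers = json.loads(inspect_json)
    health = containers[0].get("State", {}).get("Health")
    # Containers without a health check have no Health object at all.
    return health["Status"] if health else "none"


def inspect_container(name: str) -> str:
    """Run `docker inspect` for one container and return its raw JSON."""
    result = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    )
    return result.stdout


# Sample data trimmed to the keys used above; not real daemon output.
sample = '[{"State": {"Status": "running", "Health": {"Status": "healthy"}}}]'
print(health_status(sample))  # prints: healthy
```

With a running daemon, `health_status(inspect_container("web"))` would combine the two helpers.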
Benefits of proper health checks:
- Automatic detection of application failures
- Orchestrators can restart unhealthy containers
- Load balancers exclude unhealthy instances
- Clear visibility into infrastructure health
- Reduces manual intervention and human error
HEALTHCHECK Instruction in Dockerfile
The HEALTHCHECK instruction defines how Docker should test if a container is healthy.
Basic HEALTHCHECK syntax:
# Dockerfile with simple health check
cat > Dockerfile <<'EOF'
FROM nginx:latest
# Basic health check using curl
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost/ || exit 1
EXPOSE 80
EOF
# Build and test
docker build -t nginx-health:latest .
docker run -d -p 8080:80 --name web nginx-health:latest
# Monitor health status
docker inspect web --format='{{.State.Health.Status}}'
sleep 10
docker inspect web --format='{{.State.Health.Status}}'
HEALTHCHECK forms:
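# Form 1: CMD in shell form (runs via /bin/sh -c, so || and && work)
HEALTHCHECK CMD curl -f http://localhost/ || exit 1
# Form 2: CMD in exec form (JSON array, no shell, direct command execution)
HEALTHCHECK CMD ["curl", "-f", "http://localhost/"]
# Note: CMD-SHELL is used in the Compose "test" key, not as a Dockerfile keyword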
# Form 3: NONE (disable inherited health check)
HEALTHCHECK NONE
Practical examples:
# Web server health check (assumes the nginx configuration serves a /health location)
cat > Dockerfile.web <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=15s --timeout=5s --retries=2 \
CMD curl -f http://localhost/health || exit 1
EOF
# Database health check
cat > Dockerfile.db <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=10s --timeout=5s --retries=3 \
CMD pg_isready -U postgres
EOF
# Application health check
cat > Dockerfile.app <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()"
CMD ["python", "app.py"]
EOF
Health Check Parameters
Understand and configure health check timing and retry parameters.
Parameter details:
# Dockerfile with all health check parameters
cat > Dockerfile <<'EOF'
FROM nginx:alpine
HEALTHCHECK \
--interval=30s \
--timeout=10s \
--start-period=40s \
--retries=3 \
CMD curl -f http://localhost/ || exit 1
EXPOSE 80
EOF
# Parameters explanation:
# --interval: Wait this long between health checks (default: 30s)
# --timeout: Allow this long for check to complete (default: 30s)
# --start-period: Grace period after start; failed checks in this window do not count toward retries (default: 0s)
# --retries: Mark unhealthy after this many consecutive failures (default: 3)
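How these parameters interact is easier to see in a small simulation. The sketch below is a simplified model, not Docker's implementation: it assumes the first probe runs one interval after container start, that probes finish instantly, and it ignores the newer `--start-interval` option. The function name `simulate` is hypothetical.

```python
def simulate(results, interval=30, start_period=0, retries=3):
    """Return the health state after each probe.

    results: iterable of booleans, True meaning the probe exited with 0.
    """
    state = "starting"
    failures = 0
    history = []
    for probe_number, ok in enumerate(results, start=1):
        elapsed = probe_number * interval
        if ok:
            # Any success resets the failure streak and marks the container healthy.
            failures = 0
            state = "healthy"
        elif state == "starting" and elapsed < start_period:
            # Failures inside the start period are not counted.
            pass
        else:
            failures += 1
            if failures >= retries:
                state = "unhealthy"
        history.append(state)
    return history


print(simulate([False, False, True, False, False, False],
               interval=10, start_period=25, retries=3))
```

In this run the two early failures fall inside the 25 second start period, so the container stays in `starting` until the first success and only turns `unhealthy` after three consecutive failures later on.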
Optimize parameters for different scenarios:
# Fast-responding service (web server)
cat > Dockerfile.fast <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=2 \
CMD curl -f http://localhost/ || exit 1
EOF
# Slow-starting service (database)
cat > Dockerfile.slow <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=15s --timeout=10s --start-period=60s --retries=5 \
CMD pg_isready -U postgres
EOF
# Long-running background job
cat > Dockerfile.background <<'EOF'
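FROM python:3.11-slim
# python:3.11-slim does not ship curl, so probe with the standard library
HEALTHCHECK --interval=60s --timeout=30s --start-period=120s --retries=2 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health', timeout=25)"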
EOF
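When comparing profiles like these, it helps to estimate how long a failure can go unnoticed. A rough estimate, assuming every failing probe runs to its full timeout and the next probe starts one interval after the previous one finishes, is retries × (interval + timeout). The function name below is hypothetical.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Rough time from the first failing probe cycle to the unhealthy state."""
    return retries * (interval + timeout)


# (interval, timeout, retries) taken from the three Dockerfiles above.
profiles = {
    "fast web server": (10, 5, 2),
    "slow database": (15, 10, 5),
    "background job": (60, 30, 2),
}
for name, (interval, timeout, retries) in profiles.items():
    seconds = worst_case_detection_seconds(interval, timeout, retries)
    print(f"{name}: up to about {seconds}s before marked unhealthy")
```

The estimate makes the trade-off visible: short intervals detect failures quickly but run the probe more often.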
Health check exit codes:
# Exit code 0: success, the container is healthy
# Exit code 1: failure, the container is unhealthy
# Exit code 2: reserved, do not use
# Tools such as curl return other non-zero codes on failure, hence the "|| exit 1"
# Example with error handling
cat > Dockerfile <<'EOF'
FROM nginx:alpine
HEALTHCHECK CMD \
curl -f http://localhost/ || exit 1
# Multiple conditions (shell form already runs under /bin/sh -c; nginx:alpine has no bash)
# HEALTHCHECK CMD curl -f http://localhost/ && curl -f http://localhost/api || exit 1
# With logging
# HEALTHCHECK CMD curl -f http://localhost/ || (echo "Health check failed"; exit 1)
EOF
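The `|| exit 1` idiom can also be expressed in a wrapper script. The sketch below is one possible approach, assuming the probe is an arbitrary command line: it collapses any failure, including a missing binary or a timeout, to exit code 1. The name `run_probe` is hypothetical.

```python
import subprocess


def run_probe(argv, timeout=5.0):
    """Run a command and reduce its result to 0 (healthy) or 1 (unhealthy)."""
    try:
        completed = subprocess.run(
            argv,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable, permission error, or the command ran too long.
        return 1
    return 0 if completed.returncode == 0 else 1
```

A script built on this would end with `sys.exit(run_probe(["curl", "-f", "http://localhost/"]))`, so Docker only ever sees 0 or 1.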
Common Health Check Patterns
Implement health checks for various application types.
HTTP-based health checks:
# Web application with specific health endpoint
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir flask
COPY app.py .
# python:3.11-slim does not include curl, so the probe uses urllib instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=5)"
EXPOSE 5000
CMD ["python", "app.py"]
EOF
# Create test application
cat > app.py <<'EOF'
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    return jsonify({"status": "healthy"}), 200

@app.route('/')
def hello():
    return "Hello World"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
EOF
docker build -t flask-app:health .
docker run -d -p 5000:5000 --name app flask-app:health
docker inspect app --format='{{.State.Health.Status}}'
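In images that do not include curl, an HTTP probe can be written with the Python standard library instead. The following is a minimal sketch under that assumption; the function name `http_probe` and the choice to accept only 2xx responses are illustrative, not part of Docker.

```python
import http.client
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP error statuses and bad URLs.
        return 1
```

Saved as a script that ends with `sys.exit(http_probe("http://localhost:5000/health"))`, it could be referenced from a `HEALTHCHECK CMD python ...` line.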
Database health checks:
# PostgreSQL health check
cat > Dockerfile.pg <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=10s --timeout=5s --start-period=10s --retries=3 \
CMD pg_isready -U admin -d mydb
ENV POSTGRES_DB=mydb
ENV POSTGRES_USER=admin
ENV POSTGRES_PASSWORD=secret
EOF
docker build -f Dockerfile.pg -t postgres-health:latest .
docker run -d --name db postgres-health:latest
sleep 15
docker inspect db --format='{{json .State.Health}}'
# MySQL health check
cat > Dockerfile.mysql <<'EOF'
FROM mysql:8.0
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
CMD mysqladmin ping -h 127.0.0.1 -u root -p"$MYSQL_ROOT_PASSWORD"
ENV MYSQL_ROOT_PASSWORD=secret
EOF
# Redis health check
cat > Dockerfile.redis <<'EOF'
FROM redis:7-alpine
HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=3 \
CMD redis-cli ping | grep -q PONG
EOF
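What `redis-cli ping | grep -q PONG` verifies can be sketched over a raw socket. This is an illustrative example that assumes a Redis server without authentication, which answers an inline `PING` with `+PONG`; the function name `redis_ping` is hypothetical.

```python
import socket


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and report whether the reply starts with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, or reset all count as a failed check.
        return False
    return reply.startswith(b"+PONG")
```

Unlike a plain port check, this confirms that the server is actually answering commands.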
Custom script health checks:
# Application with custom health check script
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
COPY healthcheck.py .
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python healthcheck.py
EXPOSE 5000
CMD ["python", "app.py"]
EOF
# Create health check script
cat > healthcheck.py <<'EOF'
#!/usr/bin/env python
import socket
import sys

def check_port(port):
    # connect_ex returns 0 when the TCP connection succeeds
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(5)
    try:
        return sock.connect_ex(('localhost', port)) == 0
    finally:
        sock.close()

try:
    if check_port(5000):
        sys.exit(0)
    else:
        sys.exit(1)
except Exception as e:
    print(f"Health check failed: {e}")
    sys.exit(1)
EOF
chmod +x healthcheck.py
Docker Compose Health Checks
Configure health checks in docker-compose.yml files.
Basic compose health check:
cat > docker-compose.yml <<'EOF'
version: '3.9'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
networks:
  default:
EOF
docker-compose up -d
docker-compose ps
Complex service dependencies with health checks:
cat > docker-compose.yml <<'EOF'
version: '3.9'
services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
    ports:
      - "6379:6379"
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    ports:
      - "5432:5432"
  app:
    build: .
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      DATABASE_URL: postgresql://admin:secret@db:5432/appdb
      REDIS_URL: redis://redis:6379
EOF
docker-compose up -d
docker-compose ps --no-trunc
docker-compose logs -f
Service startup orchestration with depends_on:
cat > docker-compose.yml <<'EOF'
version: '3.9'
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
  app:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    environment:
      DATABASE_URL: postgresql://user:pass@postgres:5432/mydb
EOF
# Start with proper ordering
docker-compose up -d
sleep 5
docker-compose ps
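The gating that `condition: service_healthy` provides can be pictured as a polling loop. The sketch below is a simplified model of that behavior, not Compose's code: it assumes a caller-supplied `get_status` function that returns the dependency's current health string, and the name `wait_until_healthy` is hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str], timeout: float = 60.0, poll: float = 1.0
) -> bool:
    """Poll until the dependency is healthy; give up on unhealthy or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy" or time.monotonic() >= deadline:
            # A dependency that turns unhealthy will not be waited on further.
            return False
        time.sleep(poll)


# Example with canned statuses instead of a real container.
statuses = iter(["starting", "starting", "healthy"])
print(wait_until_healthy(lambda: next(statuses), timeout=5, poll=0))  # prints: True
```

The same loop is useful in deployment scripts that run outside Compose and need to wait for a database before starting migrations.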
Orchestration Integration
Integrate health checks with Docker Swarm and other orchestration platforms.
Swarm service with health checks:
# Health checks in Swarm services
docker service create \
--name web \
--replicas 3 \
--publish 80:80 \
--health-cmd="curl -f http://localhost/ || exit 1" \
--health-interval=30s \
--health-timeout=10s \
--health-retries=3 \
--health-start-period=40s \
nginx:alpine
# Verify health status
docker service ps web
# Update health check on running service
docker service update \
--health-cmd="curl -f http://localhost/health || exit 1" \
web
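The `--health-*` flags mirror the keys of a Compose `healthcheck` mapping. As an illustration, the hypothetical helper below converts such a mapping into the flag list used above. It assumes the `test` value is a string, a `CMD-SHELL` list, or a `CMD` list, and joins `CMD` lists into one shell string because `--health-cmd` takes a single command string.

```python
import shlex


def health_flags(healthcheck: dict) -> list:
    """Translate a Compose-style healthcheck mapping into --health-* flags."""
    test = healthcheck["test"]
    if isinstance(test, str):
        command = test
    elif test[0] == "CMD-SHELL":
        command = test[1]
    elif test[0] == "CMD":
        command = shlex.join(test[1:])
    else:
        raise ValueError(f"unsupported test form: {test[0]!r}")

    flags = [f"--health-cmd={command}"]
    option_flags = {
        "interval": "--health-interval",
        "timeout": "--health-timeout",
        "retries": "--health-retries",
        "start_period": "--health-start-period",
    }
    for key, flag in option_flags.items():
        if key in healthcheck:
            flags.append(f"{flag}={healthcheck[key]}")
    return flags


print(health_flags({"test": ["CMD", "curl", "-f", "http://localhost/"],
                    "interval": "30s", "retries": 3}))
```

The resulting list can be spliced into a `docker service create` argument vector when services are created from scripts.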
Stack deployment with health checks:
cat > stack.yml <<'EOF'
version: '3.9'
services:
web:
image: nginx:alpine
deploy:
replicas: 3
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
api:
image: myapi:latest
deploy:
replicas: 2
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- "5000:5000"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
networks:
default:
driver: overlay
EOF
docker stack deploy -c stack.yml myapp
docker stack ps myapp
Monitoring Health Status
Track and monitor container health across your infrastructure.
Check health status:
# View health status of running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.State}}"
# Detailed health information
docker inspect <container-id> --format='{{json .State.Health}}' | jq
# Health event log
docker inspect <container-id> | jq '.State.Health.Log'
# Format: array of health check results with timestamps
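The health object is easier to act on once summarized. The sketch below assumes the structure printed by `{{json .State.Health}}`, with `Status`, `FailingStreak`, and a `Log` array whose entries carry `ExitCode` and `Output`; the function name and the sample data are illustrative.

```python
import json


def summarize_health(health_json: str) -> dict:
    """Condense a .State.Health JSON document into a few headline numbers."""
    health = json.loads(health_json)
    log = health.get("Log") or []
    failed = [entry for entry in log if entry["ExitCode"] != 0]
    return {
        "status": health["Status"],
        "failing_streak": health.get("FailingStreak", 0),
        "checks": len(log),
        "failed": len(failed),
        "last_output": log[-1]["Output"].strip() if log else "",
    }


# Hand-written sample in the assumed shape, not captured from a real daemon.
sample_health = json.dumps({
    "Status": "unhealthy",
    "FailingStreak": 2,
    "Log": [
        {"ExitCode": 0, "Output": "ok\n"},
        {"ExitCode": 1, "Output": "curl: (7) Failed to connect\n"},
        {"ExitCode": 1, "Output": "curl: (7) Failed to connect\n"},
    ],
})
print(summarize_health(sample_health))
```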
Real-time health monitoring:
# Watch health status changes
watch -n 1 'docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "unhealthy|healthy|starting"'
# Monitor a specific container (prints its name and health status every 5 seconds)
watch -n 5 "docker inspect --format='{{.Name}}: {{.State.Health.Status}}' <container-id>"
# Track health transitions
docker events --filter type=container --filter event=health_status
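# Health check output is stored in the health log, not in the container's stdout/stderr logs
docker inspect <container-id> --format='{{range .State.Health.Log}}{{.ExitCode}} {{.Output}}{{end}}'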
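Health transitions from `docker events` can be consumed by a script when the stream is requested as JSON (`--format '{{json .}}'`). The parser below is a sketch that assumes each line carries an `Action` such as `health_status: unhealthy` and the container name under `Actor.Attributes.name`; treat the field layout as an assumption to verify against your Docker version.

```python
import json


def parse_health_event(line: str):
    """Return (container_name, status) for health events, else None."""
    event = json.loads(line)
    action = event.get("Action", "")
    if not action.startswith("health_status"):
        return None
    _, _, status = action.partition(":")
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "")
    return name, status.strip()


# Hand-written sample line in the assumed shape.
sample_event = json.dumps({
    "Type": "container",
    "Action": "health_status: unhealthy",
    "Actor": {"ID": "abc123", "Attributes": {"name": "web"}},
})
print(parse_health_event(sample_event))  # prints: ('web', 'unhealthy')
```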
Logging and alerting:
# Log health check failures
cat > monitor-health.sh <<'EOF'
#!/bin/bash
CONTAINERS=$(docker ps -q)
for container in $CONTAINERS; do
    # Strip the leading slash Docker puts in front of container names
    NAME=$(docker inspect "$container" --format='{{.Name}}' | sed 's|^/||')
    # Containers without a health check have no .State.Health, so report "none"
    STATUS=$(docker inspect "$container" --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}')
    if [ "$STATUS" = "unhealthy" ]; then
        echo "$(date): Container $NAME is unhealthy" >> /var/log/docker-health.log
        # Send alert (email, Slack, PagerDuty, etc.)
    fi
done
EOF
chmod +x monitor-health.sh
# Schedule health monitoring (crontab entry; this schedule runs at the top of every hour)
0 * * * * /path/to/monitor-health.sh
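A scheduled check like this reports the same unhealthy container on every run. One way to avoid repeated alerts, sketched below with a hypothetical helper, is to compare the current snapshot of statuses with the previous one and alert only on transitions into `unhealthy`.

```python
def newly_unhealthy(previous: dict, current: dict) -> list:
    """Return containers that are unhealthy now but were not in the last run."""
    return sorted(
        name
        for name, status in current.items()
        if status == "unhealthy" and previous.get(name) != "unhealthy"
    )


# Snapshots map container name to health status; the values here are made up.
last_run = {"web": "healthy", "db": "unhealthy"}
this_run = {"web": "unhealthy", "db": "unhealthy", "cache": "healthy"}
print(newly_unhealthy(last_run, this_run))  # prints: ['web']
```

Persisting the previous snapshot between runs (for example as a small JSON file) is left out of the sketch.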
Troubleshooting Health Checks
Diagnose and resolve health check issues.
Debug health check failures:
# View health check log
docker inspect <container-id> | jq '.State.Health'
# Example output shows each check's output, exit code, and timestamp
# Test health check command manually
docker exec <container-id> curl -f http://localhost/ || echo "Check failed with code: $?"
# Increase verbosity from inside the running container
docker exec -it <container-id> sh
# curl -v http://localhost/
# Check if required tools are installed
docker exec <container-id> which curl
docker exec <container-id> which pg_isready
Common health check issues:
# Issue 1: Health check command not found
# Solution: Ensure tool is installed in image
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
# Need curl before health check
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
HEALTHCHECK CMD curl -f http://localhost/ || exit 1
EOF
# Issue 2: Port not listening yet
# Solution: Increase --start-period
HEALTHCHECK --start-period=60s \
CMD curl -f http://localhost/ || exit 1
# Issue 3: Health check runs too frequently, affecting performance
# Solution: Increase --interval
HEALTHCHECK --interval=60s \
CMD curl -f http://localhost/ || exit 1
# Issue 4: Timeout too short for check
# Solution: Increase --timeout
HEALTHCHECK --timeout=30s \
CMD curl -f http://localhost/ || exit 1
Advanced Health Check Strategies
Implement sophisticated health check patterns for complex applications.
Composite health checks:
# Health check with multiple conditions
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
# healthcheck.sh needs both curl and psql (from postgresql-client)
RUN apt-get update && apt-get install -y curl postgresql-client && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /healthcheck
COPY healthcheck.sh /healthcheck/
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD /healthcheck/healthcheck.sh
EOF
cat > healthcheck.sh <<'EOF'
#!/bin/bash
set -e
# Check HTTP endpoint
curl -f http://localhost:8080/health || exit 1
# Check database connectivity (psql reads the password from PGPASSWORD or ~/.pgpass)
psql -h db -U user -d mydb -c "SELECT 1" || exit 1
# Check service dependencies
curl -f http://api:5000/status || exit 1
echo "All health checks passed"
exit 0
EOF
chmod +x healthcheck.sh
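The same fail-fast composition can be written in Python when the individual checks are functions rather than shell commands. This is a minimal sketch with hypothetical names: each check is a callable returning True or False, and the first failure or exception ends the run with exit code 1.

```python
def run_checks(checks) -> int:
    """Run (name, callable) pairs in order; return 0 only if all succeed."""
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:
            # An exception inside a check counts as a failed check.
            print(f"{name}: error: {exc}")
            return 1
        if not ok:
            print(f"{name}: failed")
            return 1
    print("All health checks passed")
    return 0


# Stand-in checks; real ones would probe HTTP endpoints or a database.
demo_checks = [("http", lambda: True), ("database", lambda: True)]
print(run_checks(demo_checks))
```

Returning at the first failure keeps the probe inside its timeout when a later dependency is slow to respond.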
Weighted health scoring:
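cat > healthcheck.py <<'EOF'
#!/usr/bin/env python3
import socket
import sys

import requests

def http_ok(url):
    # raise_for_status turns 4xx/5xx responses into exceptions
    requests.get(url, timeout=2).raise_for_status()

def tcp_ok(host, port):
    # PostgreSQL and Redis do not speak HTTP, so only test that the port accepts connections
    socket.create_connection((host, port), timeout=2).close()

checks = {
    'http': {'weight': 50, 'probe': lambda: http_ok('http://localhost:8080/health')},
    'db': {'weight': 30, 'probe': lambda: tcp_ok('localhost', 5432)},
    'cache': {'weight': 20, 'probe': lambda: tcp_ok('localhost', 6379)},
}
score = 0
total = 0
for name, check in checks.items():
    total += check['weight']
    try:
        check['probe']()
        score += check['weight']
    except Exception as e:
        print(f"Check failed: {name} ({e})")
if score >= total * 0.8:  # 80% threshold
    print(f"Health score: {score}/{total}")
    sys.exit(0)
else:
    print(f"Unhealthy: {score}/{total}")
    sys.exit(1)
EOF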
Progressive health checks:
cat > Dockerfile <<'EOF'
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install
HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=2 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
CMD ["npm", "start"]
EOF
Conclusion
Health checks are a critical component of production container infrastructure, enabling automatic detection and recovery from application failures. By implementing appropriate health check commands, configuring suitable timing parameters, and integrating with orchestration systems, you create self-healing infrastructure that requires minimal manual intervention. Start with simple HTTP-based health checks for web applications, progress to database-specific checks for stateful services, and eventually implement composite health checks for complex microservices. Regularly review your health check configuration as your applications evolve, and consider using monitoring tools to track health trends across your entire infrastructure. Proper health check configuration separates amateur container deployments from professional, reliable systems.


