Docker Health Checks Configuration

Health checks let Docker determine whether a container is functioning properly or has failed. This guide covers implementing HEALTHCHECK instructions in Dockerfiles, configuring health checks in Docker Compose, managing container restart policies, and integrating health checks with orchestration systems. Properly configured health checks let your infrastructure detect application failures automatically and, when paired with an orchestrator, recover from them without manual intervention.

Understanding Health Checks

Health checks are automated tests that verify whether a container's application is running and responding properly. Docker Engine records the health status but does not, on its own, restart or stop an unhealthy container; orchestrators such as Docker Swarm use the health data to replace unhealthy containers or remove them from load balancing.

Health check states:

  • starting: Container just started, health status unknown (initial period)
  • healthy: Health check passed, container is operational
  • unhealthy: Health check failed, container application may be broken
  • none: No health check configured (default)

# Check current health status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Detailed health status
docker inspect <container-id> --format='{{.State.Health.Status}}'

# View health check history
docker inspect <container-id> | grep -A 20 Health
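
The same lookup can be scripted. The sketch below is a minimal, hypothetical helper (the name `health_status` is an assumption, not part of Docker) that parses the JSON printed by `docker inspect` and falls back to `none` when the container has no health check, because the `Health` key is absent from `State` in that case. The sample JSON is trimmed and made up for illustration.

```python
import json

def health_status(inspect_output: str) -> str:
    """Return the health status from `docker inspect` JSON, or "none" if no health check exists."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    state = containers[0].get("State", {})
    health = state.get("Health")  # key is missing when no health check is configured
    return health["Status"] if health else "none"

# Trimmed, made-up sample of what `docker inspect <container-id>` might print
sample = '[{"State": {"Status": "running", "Health": {"Status": "healthy", "FailingStreak": 0, "Log": []}}}]'
print(health_status(sample))
```

Guarding against the missing key matters when a script loops over every running container, since not all of them will define a health check.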

Benefits of proper health checks:

  • Automatic detection of application failures
  • Orchestrators can restart unhealthy containers
  • Load balancers exclude unhealthy instances
  • Clear visibility into infrastructure health
  • Reduces manual intervention and human error

HEALTHCHECK Instruction in Dockerfile

The HEALTHCHECK instruction defines how Docker should test if a container is healthy.

Basic HEALTHCHECK syntax:

# Dockerfile with simple health check
cat > Dockerfile <<'EOF'
FROM nginx:latest

# Basic health check using curl
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost/ || exit 1

EXPOSE 80
EOF

# Build and test
docker build -t nginx-health:latest .
docker run -d -p 8080:80 --name web nginx-health:latest

# Monitor health status
docker inspect web --format='{{.State.Health.Status}}'
sleep 10
docker inspect web --format='{{.State.Health.Status}}'

HEALTHCHECK forms:

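# Form 1: CMD, shell form (command runs via /bin/sh -c)
HEALTHCHECK CMD curl -f http://localhost/ || exit 1

# Form 2: CMD, exec form (JSON array, no shell, direct command execution)
HEALTHCHECK CMD ["curl", "-f", "http://localhost/"]
# Note: CMD-SHELL is a Compose/API keyword for the "test" field, not a Dockerfile keyword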

# Form 3: NONE (disable inherited health check)
HEALTHCHECK NONE

Practical examples:

# Web server health check (assumes your nginx config serves /health; the stock config returns 404 there)
cat > Dockerfile.web <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=15s --timeout=5s --retries=2 \
    CMD curl -f http://localhost/health || exit 1
EOF

# Database health check
cat > Dockerfile.db <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=10s --timeout=5s --retries=3 \
    CMD pg_isready -U postgres
EOF

# Application health check
cat > Dockerfile.app <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()"

CMD ["python", "app.py"]
EOF

Health Check Parameters

Understand and configure health check timing and retry parameters.

Parameter details:

# Dockerfile with all health check parameters
cat > Dockerfile <<'EOF'
FROM nginx:alpine

HEALTHCHECK \
    --interval=30s \
    --timeout=10s \
    --start-period=40s \
    --retries=3 \
    CMD curl -f http://localhost/ || exit 1

EXPOSE 80
EOF

# Parameters explanation:
# --interval: Wait this long between health checks (default: 30s)
# --timeout: Allow this long for check to complete (default: 30s)
# --start-period: Give container this long to start before checking (default: 0s)
# --retries: Mark unhealthy after this many consecutive failures (default: 3)
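
To get a feel for how these parameters combine, the following sketch estimates how long a failure can go unnoticed. It is a rough upper bound under stated assumptions, not a description of Docker's internals: the application breaks just after a passing check, every failing check runs until it times out, and the start period has already elapsed. The function name is hypothetical.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Rough upper bound on the delay between an app failing and Docker marking it unhealthy.

    Each of the `retries` consecutive failures costs one wait of `interval`
    plus up to `timeout` for the check itself.
    """
    return retries * (interval + timeout)

# Values from the Dockerfile above: --interval=30s --timeout=10s --retries=3
print(worst_case_detection_seconds(interval=30, timeout=10, retries=3))  # 120
```

Shortening the interval or the timeout tightens detection, at the cost of running the check more often.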

Optimize parameters for different scenarios:

# Fast-responding service (web server)
cat > Dockerfile.fast <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=2 \
    CMD curl -f http://localhost/ || exit 1
EOF

# Slow-starting service (database)
cat > Dockerfile.slow <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=15s --timeout=10s --start-period=60s --retries=5 \
    CMD pg_isready -U postgres
EOF

# Long-running background job
cat > Dockerfile.background <<'EOF'
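FROM python:3.11-slim
# The slim image does not include curl, so install it for the health check
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
HEALTHCHECK --interval=60s --timeout=30s --start-period=120s --retries=2 \
    CMD curl -f http://localhost:8080/health || exit 1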
EOF

Health check exit codes:

# Exit code 0: success, the container is healthy
# Exit code 1: unhealthy, the container is not working correctly
# Exit code 2: reserved, do not use

# Example with error handling
cat > Dockerfile <<'EOF'
FROM nginx:alpine

HEALTHCHECK CMD \
    curl -f http://localhost/ || exit 1

# Multiple conditions (shell form already runs via /bin/sh -c; nginx:alpine has no bash)
# HEALTHCHECK CMD curl -f http://localhost/ && curl -f http://localhost/api || exit 1

# With logging
# HEALTHCHECK CMD curl -f http://localhost/ || (echo "Health check failed"; exit 1)
EOF
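
When the check is a script rather than a one-liner, it helps to funnel every outcome into exit code 0 or 1. The sketch below shows one way to do that; `run_check` is a hypothetical helper name, and treating any exception as "unhealthy" is an assumption about what you want.

```python
import sys
from typing import Callable

def run_check(check: Callable[[], bool]) -> int:
    """Run a check and map the outcome to the exit codes Docker expects: 0 healthy, 1 unhealthy."""
    try:
        return 0 if check() else 1
    except Exception as exc:  # a crashing check counts as unhealthy
        print(f"health check error: {exc}", file=sys.stderr)
        return 1

# In a real script the last line would be: sys.exit(run_check(my_check))
print(run_check(lambda: True))
```

Anything the check prints is captured in the container's health log, so a short error message makes later debugging easier.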

Common Health Check Patterns

Implement health checks for various application types.

HTTP-based health checks:

# Web application with specific health endpoint
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app

RUN pip install flask

COPY app.py .

# python:3.11-slim does not ship curl, so probe with the standard library instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=5)" || exit 1

EXPOSE 5000
CMD ["python", "app.py"]
EOF

# Create test application
cat > app.py <<'EOF'
from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/health')
def health():
    return jsonify({"status": "healthy"}), 200

@app.route('/')
def hello():
    return "Hello World"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
EOF

docker build -t flask-app:health .
docker run -d -p 5000:5000 --name app flask-app:health
docker inspect app --format='{{.State.Health.Status}}'
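
For images that do not include curl, an HTTP probe can also be written with the Python standard library. This is a minimal sketch: the `probe` name and the injectable `opener` parameter are assumptions made so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True when the endpoint answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx
        # (urlopen raises HTTPError, a URLError subclass, for those)
        return False
```

A health check script could then end with `sys.exit(0 if probe("http://localhost:5000/health") else 1)`.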

Database health checks:

# PostgreSQL health check
cat > Dockerfile.pg <<'EOF'
FROM postgres:15-alpine

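# Probe with the user and database this image creates (avoids "role does not exist" log noise)
HEALTHCHECK --interval=10s --timeout=5s --start-period=10s --retries=3 \
    CMD pg_isready -U admin -d mydb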

ENV POSTGRES_DB=mydb
ENV POSTGRES_USER=admin
ENV POSTGRES_PASSWORD=secret
EOF

docker build -f Dockerfile.pg -t postgres-health:latest .
docker run -d --name db postgres-health:latest
sleep 15
docker inspect db --format='{{.State.Health}}'

# MySQL health check
cat > Dockerfile.mysql <<'EOF'
FROM mysql:8.0

HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
    CMD mysqladmin ping -h 127.0.0.1 -u root -p"$MYSQL_ROOT_PASSWORD"

ENV MYSQL_ROOT_PASSWORD=secret
EOF

# Redis health check
cat > Dockerfile.redis <<'EOF'
FROM redis:7-alpine

HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=3 \
    CMD redis-cli ping | grep -q PONG
EOF

Custom script health checks:

# Application with custom health check script
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .
COPY healthcheck.py .

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python healthcheck.py

EXPOSE 5000
CMD ["python", "app.py"]
EOF

# Create health check script
cat > healthcheck.py <<'EOF'
#!/usr/bin/env python
"""Exit 0 if the app accepts TCP connections on port 5000, otherwise exit 1."""
import socket
import sys

def check_port(port, timeout=3.0):
    # connect_ex returns 0 on success instead of raising on connection errors
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex(('localhost', port)) == 0

try:
    sys.exit(0 if check_port(5000) else 1)
except Exception as e:
    print(f"Health check failed: {e}")
    sys.exit(1)
EOF

chmod +x healthcheck.py

Docker Compose Health Checks

Configure health checks in docker-compose.yml files.

Basic compose health check:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

networks:
  default:

EOF

docker-compose up -d
docker-compose ps
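
The two `test` forms used above behave differently: `CMD` executes the arguments directly, while `CMD-SHELL` hands a single string to the container's shell. The sketch below models that mapping as described in the Compose specification. The function name is hypothetical, and `/bin/sh` is an assumption that holds for Linux containers.

```python
from typing import List, Optional, Union

def healthcheck_argv(test: Union[str, List[str]]) -> Optional[List[str]]:
    """Return the argv a Compose `test` value resolves to, or None when checks are disabled."""
    if isinstance(test, str):
        return ["/bin/sh", "-c", test]  # a bare string behaves like CMD-SHELL
    kind, *args = test
    if kind == "NONE":
        return None  # disables any health check inherited from the image
    if kind == "CMD":
        return list(args)  # executed directly, no shell features such as || or pipes
    if kind == "CMD-SHELL":
        return ["/bin/sh", "-c", args[0]]
    raise ValueError(f"unsupported test form: {kind!r}")

print(healthcheck_argv(["CMD-SHELL", "pg_isready -U postgres"]))
```

This is why a check that relies on `||`, pipes, or variable expansion needs `CMD-SHELL` (or a plain string) rather than `CMD`.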

Complex service dependencies with health checks:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
    ports:
      - "6379:6379"

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    ports:
      - "5432:5432"

  app:
    build: .
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      DATABASE_URL: postgresql://admin:secret@db:5432/appdb
      REDIS_URL: redis://redis:6379

EOF

docker-compose up -d
docker-compose ps --no-trunc
docker-compose logs -f
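
The `depends_on` entries above form a dependency graph. As an illustration only (this is not how Compose is implemented), the start order they imply can be derived with a topological sort from the standard library (Python 3.9+). The `startup_order` name is hypothetical.

```python
from graphlib import TopologicalSorter

def startup_order(depends_on: dict) -> list:
    """Order services so each one comes after everything it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessors
    return list(TopologicalSorter(depends_on).static_order())

# Mirrors the example above: app waits for db and redis
deps = {"app": {"db", "redis"}, "db": set(), "redis": set()}
order = startup_order(deps)
print(order)
```

With `condition: service_healthy`, ordering alone is not enough: Compose also waits for each dependency to report healthy before starting the dependent service.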

Service startup orchestration with depends_on:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

  app:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    environment:
      DATABASE_URL: postgresql://user:pass@postgres:5432/mydb

EOF

# Start with proper ordering
docker-compose up -d
sleep 5
docker-compose ps
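
A fixed `sleep` only guesses how long startup takes. A polling loop is usually more robust, and the sketch below shows the idea. `wait_until_healthy` and its parameters are hypothetical, and `get_status` is assumed to be a callable that returns the current health status string, for example by wrapping `docker inspect`.

```python
import time
from typing import Callable

def wait_until_healthy(get_status: Callable[[], str], timeout: float = 60.0,
                       poll: float = 1.0, sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll get_status() until it returns "healthy"; give up on "unhealthy" or after timeout."""
    deadline = clock() + timeout
    while clock() < deadline:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        sleep(poll)  # still "starting": wait and try again
    return False
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real delays.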

Orchestration Integration

Integrate health checks with Docker Swarm and other orchestration platforms.

Swarm service with health checks:

# Health checks in Swarm services
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:80 \
  --health-cmd="curl -f http://localhost/ || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  --health-start-period=40s \
  nginx:alpine

# Verify health status
docker service ps web

# Update health check on running service
docker service update \
  --health-cmd="curl -f http://localhost/health || exit 1" \
  web
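
When services are created from a script, the health flags shown above can be assembled as an argument list, which avoids shell quoting problems around the `||` in the command. This is a minimal sketch: `health_flags` is a hypothetical helper, and its defaults are assumptions chosen to match the Dockerfile defaults listed earlier.

```python
def health_flags(cmd: str, interval: str = "30s", timeout: str = "30s",
                 retries: int = 3, start_period: str = "0s") -> list:
    """Build the --health-* arguments used with `docker service create` above."""
    return [
        f"--health-cmd={cmd}",
        f"--health-interval={interval}",
        f"--health-timeout={timeout}",
        f"--health-retries={retries}",
        f"--health-start-period={start_period}",
    ]

flags = health_flags("curl -f http://localhost/ || exit 1", timeout="10s", start_period="40s")
print(flags)
```

The list could then be passed to `subprocess.run(["docker", "service", "create", "--name", "web", *flags, "nginx:alpine"])`.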

Stack deployment with health checks:

cat > stack.yml <<'EOF'
version: '3.9'

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  api:
    image: myapi:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

networks:
  default:
    driver: overlay

EOF

docker stack deploy -c stack.yml myapp
docker stack ps myapp

Monitoring Health Status

Track and monitor container health across your infrastructure.

Check health status:

# View health status of running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.State}}"

# Detailed health information
docker inspect <container-id> --format='{{json .State.Health}}' | jq

# Health event log
docker inspect <container-id> | jq '.State.Health.Log'

# Format: array of health check results with timestamps
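
The health log is easy to filter once it is parsed. The sketch below pulls out only the failed probes from the JSON printed by `docker inspect --format='{{json .State.Health}}'`. The function name is hypothetical and the sample data is made up for illustration.

```python
import json

def failed_checks(health_json: str) -> list:
    """Return (end_time, exit_code, output) for every failed probe in the health log."""
    health = json.loads(health_json)
    return [
        (entry["End"], entry["ExitCode"], entry["Output"].strip())
        for entry in health.get("Log") or []
        if entry["ExitCode"] != 0
    ]

# Made-up sample in the shape of .State.Health
sample = json.dumps({
    "Status": "healthy",
    "FailingStreak": 1,
    "Log": [
        {"Start": "2024-01-01T10:00:00Z", "End": "2024-01-01T10:00:01Z",
         "ExitCode": 0, "Output": "ok\n"},
        {"Start": "2024-01-01T10:00:31Z", "End": "2024-01-01T10:00:32Z",
         "ExitCode": 1, "Output": "curl: (7) Failed to connect\n"},
    ],
})
print(failed_checks(sample))
```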

Real-time health monitoring:

# Watch health status changes
watch -n 1 'docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "unhealthy|healthy|starting"'

# Monitor specific container (refreshes every 5 seconds)
watch -n 5 "docker inspect <container-id> --format='{{.State.Health.Status}}'"

# Track health transitions
docker events --filter type=container --filter event=health_status

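# Review recent health check output (probe output is stored in the container state, not in docker logs)
docker inspect <container-id> --format='{{range .State.Health.Log}}{{.End}} exit={{.ExitCode}} {{.Output}}{{end}}'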

Logging and alerting:

# Log health check failures
cat > monitor-health.sh <<'EOF'
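#!/bin/bash
# Append a log line for every running container whose health status is "unhealthy".
for container in $(docker ps -q); do
    # Container names are reported with a leading slash; strip it
    NAME=$(docker inspect "$container" --format='{{.Name}}' | sed 's|^/||')
    # Containers without a health check have no .State.Health, so guard the lookup
    STATUS=$(docker inspect "$container" --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}')

    if [ "$STATUS" = "unhealthy" ]; then
        echo "$(date): Container $NAME is unhealthy" >> /var/log/docker-health.log
        # Send alert (email, Slack, PagerDuty, etc.)
    fi
done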
EOF

chmod +x monitor-health.sh

# Schedule health monitoring (crontab entry; runs at the top of every hour)
0 * * * * /path/to/monitor-health.sh
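
A scheduled monitor that alerts on every run will repeat the same alert until the container recovers. One way to avoid that, sketched here with a hypothetical helper, is to compare the current statuses with those from the previous run and report only new failures. Where the previous snapshot is stored (for example a small JSON file) is left as an assumption.

```python
def newly_unhealthy(previous: dict, current: dict) -> list:
    """Names that are unhealthy now but were not unhealthy on the previous run."""
    return sorted(
        name for name, status in current.items()
        if status == "unhealthy" and previous.get(name) != "unhealthy"
    )

previous = {"web": "healthy", "db": "unhealthy"}
current = {"web": "unhealthy", "db": "unhealthy", "cache": "starting"}
print(newly_unhealthy(previous, current))  # ['web']
```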

Troubleshooting Health Checks

Diagnose and resolve health check issues.

Debug health check failures:

# View health check log
docker inspect <container-id> | jq '.State.Health'

# Example output shows each check's output, exit code, and timestamp

# Test health check command manually
docker exec <container-id> curl -f http://localhost/ || echo "Check failed with code: $?"

# Increase verbosity: open a shell in the running container and rerun the check by hand
docker exec -it <container-id> sh
# curl -v http://localhost/

# Check if required tools are installed
docker exec <container-id> which curl
docker exec <container-id> which pg_isready

Common health check issues:

# Issue 1: Health check command not found
# Solution: Ensure tool is installed in image

cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
# Need curl before health check
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
HEALTHCHECK CMD curl -f http://localhost/ || exit 1
EOF

# Issue 2: Port not listening yet
# Solution: Increase --start-period

HEALTHCHECK --start-period=60s \
    CMD curl -f http://localhost/ || exit 1

# Issue 3: Health check runs too frequently, affecting performance
# Solution: Increase --interval

HEALTHCHECK --interval=60s \
    CMD curl -f http://localhost/ || exit 1

# Issue 4: Timeout too short for check
# Solution: Increase --timeout

HEALTHCHECK --timeout=30s \
    CMD curl -f http://localhost/ || exit 1
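
When a check is run by hand with `docker exec`, the raw exit code often points straight at one of the issues above. The lookup below is a small sketch: the function and table names are hypothetical, 126 and 127 follow the usual shell conventions, and the curl entries are the documented curl exit codes for connection failure, HTTP errors with `-f`, and timeouts.

```python
SHELL_HINTS = {
    0: "check passed",
    1: "check ran and reported failure",
    126: "command found but not executable (check file permissions)",
    127: "command not found (install the tool in the image)",
}

CURL_HINTS = {
    7: "could not connect (port not listening yet, consider a longer start period)",
    22: "server answered with HTTP status 400 or above (curl -f)",
    28: "operation timed out (consider a longer timeout)",
}

def explain_exit(code: int, tool: str = "") -> str:
    """Translate an exit code from a manually run health check into a likely cause."""
    if tool == "curl" and code in CURL_HINTS:
        return CURL_HINTS[code]
    return SHELL_HINTS.get(code, f"unrecognized exit code {code}")

print(explain_exit(127))
print(explain_exit(22, tool="curl"))
```

Note that a check ending in `|| exit 1` collapses all of these into 1, so run the bare command when diagnosing.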

Advanced Health Check Strategies

Implement sophisticated health check patterns for complex applications.

Composite health checks:

# Health check with multiple conditions
cat > Dockerfile <<'EOF'
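FROM ubuntu:22.04
# The health check script needs both curl and psql inside the image
RUN apt-get update && apt-get install -y curl postgresql-client && rm -rf /var/lib/apt/lists/*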
RUN mkdir -p /healthcheck
COPY healthcheck.sh /healthcheck/

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD /healthcheck/healthcheck.sh
EOF

cat > healthcheck.sh <<'EOF'
#!/bin/bash
set -e

# Check HTTP endpoint
curl -f http://localhost:8080/health || exit 1

# Check database connectivity (psql must not prompt: supply the password via PGPASSWORD or ~/.pgpass)
psql -h db -U user -d mydb -c "SELECT 1" || exit 1

# Check service dependencies
curl -f http://api:5000/status || exit 1

echo "All health checks passed"
exit 0
EOF

chmod +x healthcheck.sh

Weighted health scoring:

cat > healthcheck.py <<'EOF'
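#!/usr/bin/env python3
import socket
import sys

import requests

def http_ok(url):
    # 4xx/5xx responses raise via raise_for_status(), so they count as failures
    requests.get(url, timeout=2).raise_for_status()

def tcp_ok(host, port):
    # PostgreSQL and Redis do not speak HTTP, so only test that the port accepts connections
    socket.create_connection((host, port), timeout=2).close()

checks = {
    'http': {'weight': 50, 'probe': lambda: http_ok('http://localhost:8080/health')},
    'db': {'weight': 30, 'probe': lambda: tcp_ok('localhost', 5432)},
    'cache': {'weight': 20, 'probe': lambda: tcp_ok('localhost', 6379)},
}

score = 0
total = 0

for name, check in checks.items():
    total += check['weight']
    try:
        check['probe']()
        score += check['weight']
    except (requests.RequestException, OSError) as exc:
        print(f"Check failed: {name}: {exc}")

if score >= total * 0.8:  # 80% threshold
    print(f"Health score: {score}/{total}")
    sys.exit(0)
else:
    print(f"Unhealthy: {score}/{total}")
    sys.exit(1)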
EOF

Progressive health checks:

cat > Dockerfile <<'EOF'
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=2 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

CMD ["npm", "start"]
EOF

Conclusion

Health checks are a critical component of production container infrastructure: they let Docker detect application failures automatically and give orchestrators the signal they need to recover from them. By implementing appropriate health check commands, configuring suitable timing parameters, and integrating with orchestration systems, you create self-healing infrastructure that requires minimal manual intervention. Start with simple HTTP-based health checks for web applications, progress to database-specific checks for stateful services, and eventually implement composite health checks for complex microservices. Review your health check configuration as your applications evolve, and consider using monitoring tools to track health trends across your entire infrastructure. Well-tuned health checks are a large part of what makes a container deployment reliable in production.