Docker Health Checks Configuration

Health checks let Docker determine whether a container is functioning properly or has failed. This guide covers implementing HEALTHCHECK instructions in Dockerfiles, configuring health checks in Docker Compose, managing container restart policies, and integrating health checks with orchestration systems. A proper health check configuration lets your infrastructure detect and recover from application failures automatically, without manual intervention.

Understanding Health Checks

Health checks are automated tests that verify whether a container's application is running and responding correctly. Docker records the health status but does not act on it by itself (restart policies react to a container exiting, not to it becoming unhealthy); orchestrators such as Docker Swarm use the health data to replace containers or remove them from load balancing.

Health check states:

  • starting: The container has just started and no check has passed yet (initial period)
  • healthy: The health check passed; the container is operational
  • unhealthy: The health check failed the configured number of times in a row (--retries); the application may be broken
  • none: No health check configured (default)

# Check current health status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Detailed health status
docker inspect <container-id> --format='{{.State.Health.Status}}'

# View health check history
docker inspect <container-id> | grep -A 20 Health
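
The same information can be read from a script. A minimal Python sketch, assuming you pass it the raw JSON printed by `docker inspect <container-id>`; the helper name `health_summary` is hypothetical, and it relies on Docker leaving `State.Health` out entirely when no health check is configured, which it reports as `none`.

```python
import json
from typing import Any, Dict


def health_summary(inspect_json: str) -> Dict[str, Any]:
    """Summarize the health data in `docker inspect <container>` output.

    `docker inspect` prints a JSON array; this sketch reads only the first entry.
    """
    container = json.loads(inspect_json)[0]
    health = container.get("State", {}).get("Health")
    if health is None:
        # No health check configured: Docker omits State.Health entirely
        return {"status": "none", "failing_streak": 0, "last_output": ""}
    log = health.get("Log") or []
    return {
        "status": health["Status"],
        "failing_streak": health.get("FailingStreak", 0),
        # Log keeps the most recent probe results, oldest first
        "last_output": log[-1]["Output"].strip() if log else "",
    }
```

For example, `health_summary(subprocess.run(["docker", "inspect", "web"], capture_output=True, text=True, check=True).stdout)` would summarize a container named `web`.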

Benefits of proper health checks:

  • Automatic detection of application failures
  • Orchestrators can restart unhealthy containers
  • Load balancers exclude unhealthy instances
  • Clear visibility into infrastructure health
  • Reduces manual intervention and human error

HEALTHCHECK Instruction in Dockerfile

The HEALTHCHECK instruction defines how Docker should test whether a container is healthy.

Basic HEALTHCHECK syntax:

# Dockerfile with a simple health check
cat > Dockerfile <<'EOF'
FROM nginx:latest

# Basic health check using curl
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost/ || exit 1

EXPOSE 80
EOF

# Build and test
docker build -t nginx-health:latest .
docker run -d -p 8080:80 --name web nginx-health:latest

# Monitor health status
docker inspect web --format='{{.State.Health.Status}}'
sleep 35   # the first check runs one interval (30s) after the container starts
docker inspect web --format='{{.State.Health.Status}}'
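
Fixed `sleep` calls are fragile because the time a container needs to become healthy varies. A minimal polling sketch in Python, assuming the `docker` CLI is on `PATH`; `wait_until_healthy` and `docker_health_status` are hypothetical helpers, and the status reader is passed in so the loop can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Read .State.Health.Status through the docker CLI.

    Expect CalledProcessError for a container without a health check,
    because the format template then fails.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    poll_every: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until 'healthy' (True); stop on 'unhealthy' or timeout (False)."""
    waited = 0.0
    while waited <= timeout:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False  # Retries exhausted; treat this wait as failed
        sleep(poll_every)
        waited += poll_every
    return False
```

`wait_until_healthy(lambda: docker_health_status("web"))` waits up to two minutes for the container above. `unhealthy` is not permanent (a later passing check flips the status back to `healthy`), so stopping at the first `unhealthy` is a policy choice of this sketch.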

HEALTHCHECK forms:

# Form 1: shell form (runs via /bin/sh -c; stored as CMD-SHELL)
HEALTHCHECK CMD curl -f http://localhost/ || exit 1

# Form 2: exec form (JSON array, no shell, so `|| exit 1` is not available)
HEALTHCHECK CMD ["curl", "-f", "http://localhost/"]

# Form 3: NONE (disable a health check inherited from the base image)
HEALTHCHECK NONE
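
Docker records whichever form you use in the image config as `Config.Healthcheck.Test`, a list whose first element says how to run the rest (visible with `docker inspect --format '{{json .Config.Healthcheck}}' <image>`). A minimal Python sketch of that mapping, assuming a shell-form command is given as a string and an exec-form command as a list; `to_test_array` is a hypothetical helper, not part of any Docker SDK.

```python
from typing import List, Optional, Union


def to_test_array(command: Optional[Union[str, List[str]]]) -> List[str]:
    """Mirror how a Dockerfile HEALTHCHECK is stored in Config.Healthcheck.Test."""
    if command is None:
        # HEALTHCHECK NONE
        return ["NONE"]
    if isinstance(command, str):
        # Shell form: run through /bin/sh -c, so `||` and `&&` work
        return ["CMD-SHELL", command]
    # Exec form: arguments are passed as-is with no shell in between
    return ["CMD", *command]
```

These are the same arrays Docker Compose's `test:` key accepts, which is why Compose files write `["CMD-SHELL", "..."]` or `["CMD", ...]`.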

Practical examples:

# Web server health check (assumes your nginx config serves /health; the default config returns 404 there)
cat > Dockerfile.web <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=15s --timeout=5s --retries=2 \
    CMD curl -f http://localhost/health || exit 1
EOF

# Database health check
cat > Dockerfile.db <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=10s --timeout=5s --retries=3 \
    CMD pg_isready -U postgres
EOF

# Application health check (python:3.11-slim has no curl, so the check uses Python's urllib)
cat > Dockerfile.app <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()"

CMD ["python", "app.py"]
EOF
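
When the one-liner grows, a small script with an explicit timeout and exit codes is easier to maintain. A sketch using only the standard library, assuming it is copied into the image as `healthprobe.py` (a hypothetical file name) next to `app.py`:

```python
import http.client
import urllib.request


def check_http(url: str, timeout: float = 5.0) -> int:
    """Probe an HTTP endpoint and return a HEALTHCHECK-style exit code.

    0 means healthy and 1 means unhealthy (never 2, which Docker reserves).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen raises HTTPError for 4xx/5xx, so this is a non-error status
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, refused connections and timeouts are OSErrors;
        # HTTPException covers malformed replies from non-HTTP listeners
        print(f"Health check failed: {exc}")
        return 1
```

It could be wired up as `HEALTHCHECK CMD python -c "import sys, healthprobe; sys.exit(healthprobe.check_http('http://localhost:5000/health'))"`, assuming the check runs in the `WORKDIR` (/app), which `python -c` puts on the import path.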

Health Check Parameters

Understand and configure health check timing and retry parameters.

Parameter details:

# Dockerfile with all health check parameters
cat > Dockerfile <<'EOF'
FROM nginx:alpine

HEALTHCHECK \
    --interval=30s \
    --timeout=10s \
    --start-period=40s \
    --retries=3 \
    CMD curl -f http://localhost/ || exit 1

EXPOSE 80
EOF

# Parameters explanation:
# --interval: Time between checks; the first check runs one interval after the container starts (default: 30s)
# --timeout: Maximum time a single check may run before it counts as failed (default: 30s)
# --start-period: Grace period for slow starters; failures during it don't count toward retries, but a success marks the container healthy (default: 0s)
# --retries: Consecutive failures needed to mark the container unhealthy (default: 3)
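
The parameters combine: Docker waits `--interval` after each check finishes before starting the next one, and needs `--retries` consecutive failures to report `unhealthy`. A back-of-the-envelope Python sketch of how long detection can take once the start period is over; it ignores probe overhead and is an estimate, not Docker's scheduler. `detection_window` is a hypothetical name.

```python
from typing import Tuple


def detection_window(interval: float, timeout: float, retries: int) -> Tuple[float, float]:
    """Rough bounds, in seconds, from the app breaking to Docker reporting unhealthy.

    Assumes the start period is over and that each probe starts `interval`
    seconds after the previous one finished.
    """
    # Fastest: the app breaks just before a probe and every probe fails at once,
    # so only the gaps between the `retries` failing probes add up
    fastest = (retries - 1) * interval
    # Slowest: the app breaks right after a passing probe, and each failing
    # probe hangs until `timeout` kills it before the next interval starts
    slowest = retries * (interval + timeout)
    return fastest, slowest
```

For the Dockerfile above (30s interval, 10s timeout, 3 retries) this gives roughly 60 to 120 seconds; with the defaults (30s, 30s, 3) the worst case grows to 180 seconds.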

Tune parameters for different scenarios:

# Fast-responding service (web server)
cat > Dockerfile.fast <<'EOF'
FROM nginx:alpine
HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=2 \
    CMD curl -f http://localhost/ || exit 1
EOF

# Slow-starting service (database)
cat > Dockerfile.slow <<'EOF'
FROM postgres:15-alpine
HEALTHCHECK --interval=15s --timeout=10s --start-period=60s --retries=5 \
    CMD pg_isready -U postgres
EOF

# Long-running background job (python:3.11-slim has no curl, so use Python itself)
cat > Dockerfile.background <<'EOF'
FROM python:3.11-slim
HEALTHCHECK --interval=60s --timeout=30s --start-period=120s --retries=2 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health', timeout=25)"
EOF

Health check exit codes:

# Exit code 0: check passed (healthy)
# Exit code 1: check failed (unhealthy)
# Exit code 2: reserved by Docker; do not use it
# Docker counts any non-zero code as a failed check, so map failures to 1 with `|| exit 1`

# Example with error handling
cat > Dockerfile <<'EOF'
FROM nginx:alpine

HEALTHCHECK CMD \
    curl -f http://localhost/ || exit 1

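# Multiple conditions (nginx:alpine has no bash; the shell form already runs under /bin/sh)
# HEALTHCHECK CMD curl -f http://localhost/ && curl -f http://localhost/api || exit 1

# With logging: the echoed text is stored in the health log (.State.Health.Log)
# HEALTHCHECK CMD curl -f http://localhost/ || (echo "Health check failed"; exit 1)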
EOF
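
These exit-code rules interact with `--retries` and `--start-period`. A simplified Python model of the documented behavior (not Docker's source code): any non-zero exit counts as a failure, failures inside the start period are ignored until the first success, and `retries` consecutive failures flip the status to `unhealthy`. `simulate_health` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def simulate_health(
    results: Iterable[Tuple[float, int]],
    start_period: float = 0.0,
    retries: int = 3,
) -> List[str]:
    """Replay probe results and return the status after each one.

    `results` holds (seconds since container start, exit code) pairs.
    """
    status = "starting"
    failing_streak = 0
    history = []
    for started_at, exit_code in results:
        if exit_code == 0:
            status = "healthy"
            failing_streak = 0
        elif status == "starting" and started_at < start_period:
            pass  # Still warming up: this failure is not counted
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history
```

The exit code `curl -f` returns for HTTP errors (22) is simply another failure in this model, which is why `|| exit 1` is a convention rather than a requirement.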

Common Health Check Patterns

Implement health checks for various application types.

HTTP-based health checks:

# Web application with specific health endpoint
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app

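# python:3.11-slim does not include curl, so install it for the health check
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir flask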

COPY app.py .

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

EXPOSE 5000
CMD ["python", "app.py"]
EOF

# Create the test application
cat > app.py <<'EOF'
from flask import Flask, jsonify
app = Flask(__name__)

@app.route('/health')
def health():
    return jsonify({"status": "healthy"}), 200

@app.route('/')
def hello():
    return "Hello World"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
EOF

docker build -t flask-app:health .
docker run -d -p 5000:5000 --name app flask-app:health
docker inspect app --format='{{.State.Health.Status}}'

Database health checks:

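# PostgreSQL health check
cat > Dockerfile.pg <<'EOF'
FROM postgres:15-alpine

# Shell form, so the variables below are expanded when the check runs
HEALTHCHECK --interval=10s --timeout=5s --start-period=10s --retries=3 \
    CMD pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"

# Demo values only; pass real credentials at runtime instead of baking them in
ENV POSTGRES_DB=mydb
ENV POSTGRES_USER=admin
ENV POSTGRES_PASSWORD=secret
EOF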

docker build -f Dockerfile.pg -t postgres-health:latest .
docker run -d --name db postgres-health:latest
sleep 15
docker inspect db --format='{{json .State.Health}}'

# MySQL health check
cat > Dockerfile.mysql <<'EOF'
FROM mysql:8.0

# Single $: the shell expands the variable at check time ($$ is only Compose-file escaping)
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
    CMD mysqladmin ping -h 127.0.0.1 -u root -p"$MYSQL_ROOT_PASSWORD"

ENV MYSQL_ROOT_PASSWORD=secret
EOF

# Redis health check
cat > Dockerfile.redis <<'EOF'
FROM redis:7-alpine

HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=3 \
    CMD redis-cli ping | grep -q PONG
EOF

Custom script health checks:

# Application with a custom health check script
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .
COPY healthcheck.py .

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD python healthcheck.py

EXPOSE 5000
CMD ["python", "app.py"]
EOF

# Create the health check script
cat > healthcheck.py <<'EOF'
#!/usr/bin/env python
import socket
import sys

def check_port(port):
    # Succeeds if something accepts TCP connections on the port
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(3)
    result = sock.connect_ex(('localhost', port))
    sock.close()
    return result == 0

try:
    if check_port(5000):
        sys.exit(0)
    else:
        sys.exit(1)
except Exception as e:
    print(f"Health check failed: {e}")
    sys.exit(1)
EOF

chmod +x healthcheck.py

Docker Compose Health Checks

Configure health checks in docker-compose.yml files.

Basic Compose health check:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

networks:
  default:

EOF

docker-compose up -d
docker-compose ps

Complex service dependencies with health checks:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 5s
    ports:
      - "6379:6379"

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    ports:
      - "5432:5432"

  app:
    build: .
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      DATABASE_URL: postgresql://admin:secret@db:5432/appdb
      REDIS_URL: redis://redis:6379

EOF

docker-compose up -d
docker-compose ps --no-trunc
docker-compose logs -f
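
With `condition: service_healthy`, Compose does not start `app` until both `db` and `redis` report healthy, while services that do not depend on each other can start together. A rough model of that ordering using Python's standard `graphlib` module (3.9+); `startup_waves` is a hypothetical helper that reproduces only the ordering, not how Compose itself is implemented.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def startup_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves; each wave waits for the previous one to be healthy.

    `depends_on` maps a service to the services it waits for
    (its `condition: service_healthy` entries).
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # Raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # This is the point where Compose waits for the health checks to pass
        sorter.done(*ready)
    return waves
```

For the file above, `startup_waves({"app": {"db", "redis"}})` yields `[['db', 'redis'], ['app']]`.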

Service startup orchestration with depends_on:

cat > docker-compose.yml <<'EOF'
version: '3.9'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

  app:
    build: .
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    environment:
      DATABASE_URL: postgresql://user:pass@postgres:5432/mydb

EOF

# Start with proper ordering (app is created only after postgres is healthy)
docker-compose up -d
sleep 5
docker-compose ps

Orchestration Integration

Integrate health checks with Docker Swarm and other orchestration platforms.

Swarm service with health checks:

# Health checks in Swarm services
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:80 \
  --health-cmd="curl -f http://localhost/ || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  --health-start-period=40s \
  nginx:alpine

# Check task state; Swarm replaces tasks whose containers become unhealthy
docker service ps web

# Update the health check on a running service (assumes the image serves /health)
docker service update \
  --health-cmd="curl -f http://localhost/health || exit 1" \
  web
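
The same four timing settings appear under different names in the Dockerfile (`--interval`), Compose (`interval:`) and the Swarm CLI (`--health-interval`). A small hypothetical helper that turns a Compose-style `healthcheck` mapping into `docker service create` flags makes the correspondence explicit; it assumes `test` is a string or a `CMD`/`CMD-SHELL` list.

```python
import shlex
from typing import Dict, List, Union

# Compose `healthcheck` keys and the matching `docker service create` flags
COMPOSE_TO_FLAG = {
    "interval": "--health-interval",
    "timeout": "--health-timeout",
    "retries": "--health-retries",
    "start_period": "--health-start-period",
}


def service_health_flags(healthcheck: Dict[str, Union[str, int, List[str]]]) -> List[str]:
    """Translate a Compose-style healthcheck mapping into Swarm CLI flags."""
    test = healthcheck["test"]
    if isinstance(test, str):
        command = test  # Compose treats a plain string like CMD-SHELL
    elif test[0] == "CMD-SHELL":
        command = test[1]
    elif test[0] == "CMD":
        # --health-cmd runs through a shell, so quote each exec-form argument
        command = shlex.join(test[1:])
    else:
        raise ValueError(f"unsupported test type: {test[0]!r}")
    flags = [f"--health-cmd={command}"]
    for key, flag in COMPOSE_TO_FLAG.items():
        if key in healthcheck:
            flags.append(f"{flag}={healthcheck[key]}")
    return flags
```

Each returned flag is a single argument, so the list can go straight into `subprocess.run(["docker", "service", "create", "--name", "web", *flags, "nginx:alpine"])` without extra quoting.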

Stack deployment with health checks:

cat > stack.yml <<'EOF'
version: '3.9'

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  api:
    image: myapi:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
    ports:
      - "5000:5000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

networks:
  default:
    driver: overlay

EOF

docker stack deploy -c stack.yml myapp
docker stack ps myapp

Monitoring Health Status

Track and monitor container health across your infrastructure.

Check health status:

# View health status of running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.State}}"

# Detailed health information
docker inspect <container-id> --format='{{json .State.Health}}' | jq

# Health event log (docker inspect returns an array, hence .[0])
docker inspect <container-id> | jq '.[0].State.Health.Log'

# Format: array of the most recent check results (Docker keeps the last five), each with Start, End, ExitCode, and Output

Real-time health monitoring:

# Watch health status changes
watch -n 1 'docker ps --format "table {{.Names}}\t{{.Status}}" | grep -E "unhealthy|healthy|starting"'

# Monitor a specific container (the function name is arbitrary)
health_of() { echo "Container: $1 - Status: $(docker inspect "$1" --format='{{.State.Health.Status}}')"; }
health_of <container-id>

# Track health transitions
docker events --filter type=container --filter event=health_status

# Show health check output: it is stored in the health log, not in docker logs
docker inspect <container-id> --format='{{range .State.Health.Log}}{{.End}} exit={{.ExitCode}} {{.Output}}{{end}}'
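
To feed health transitions into alerting, `docker events --filter type=container --filter event=health_status --format '{{json .}}'` prints one JSON object per line. A minimal Python sketch that extracts the container name and new status, assuming the Engine's event fields `Action` (for example `health_status: unhealthy`) and `Actor.Attributes.name`; verify them against your Docker version. `parse_health_event` is a hypothetical name.

```python
import json
from typing import Optional, Tuple


def parse_health_event(line: str) -> Optional[Tuple[str, str]]:
    """Return (container name, new status) for a health event line, else None."""
    event = json.loads(line)
    action = event.get("Action", "")
    if not action.startswith("health_status"):
        return None
    # Action looks like "health_status: unhealthy"
    _, _, status = action.partition(":")
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "?")
    return name, status.strip()
```

Reading the command's output line by line (for example, `for line in sys.stdin:` in a script the output is piped into) turns it into a stream of `(name, status)` pairs to forward to email, Slack, or a pager.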

Logging and alerting:

# Log unhealthy containers
cat > monitor-health.sh <<'EOF'
#!/bin/bash
# Log every running container that Docker currently reports as unhealthy
LOG_FILE=/var/log/docker-health.log

for container in $(docker ps -q --filter health=unhealthy); do
    NAME=$(docker inspect "$container" --format='{{.Name}}')
    NAME=${NAME#/}   # Docker prefixes container names with a slash
    echo "$(date): Container $NAME is unhealthy" >> "$LOG_FILE"
    # Send alert (email, Slack, PagerDuty, etc.)
done
EOF

chmod +x monitor-health.sh

# Schedule health monitoring with a crontab entry (runs at the top of every hour)
0 * * * * /path/to/monitor-health.sh

Troubleshooting Health Checks

Diagnose and resolve health check issues.

Debug health check failures:

# View the health check log (docker inspect returns an array, hence .[0])
docker inspect <container-id> | jq '.[0].State.Health'

# The output shows each check's output, exit code, and timestamps

# Run the health check command manually
docker exec <container-id> curl -f http://localhost/ || echo "Check failed with code: $?"

# Increase verbosity from a shell inside the running container
docker exec -it <container-id> sh
# curl -v http://localhost/

# Check if required tools are installed
docker exec <container-id> which curl
docker exec <container-id> which pg_isready
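
Running the probe by hand is most useful when it is exactly the command Docker runs. A Python sketch, assuming the `docker` CLI is available: it reads `Config.Healthcheck.Test` from `docker inspect` and rebuilds the matching `docker exec` call, running shell-form checks through `/bin/sh -c` as Docker does on Linux by default (an image-level `SHELL` instruction may change that). Both function names are hypothetical.

```python
import json
import subprocess
from typing import List


def exec_argv_for_probe(container: str, test: List[str]) -> List[str]:
    """Build a `docker exec` command that runs a container's health probe by hand.

    `test` is Config.Healthcheck.Test as reported by `docker inspect`.
    """
    kind, args = test[0], test[1:]
    if kind == "CMD-SHELL":
        # Shell-form checks run through /bin/sh -c
        return ["docker", "exec", container, "/bin/sh", "-c", args[0]]
    if kind == "CMD":
        # Exec-form checks run the arguments directly
        return ["docker", "exec", container, *args]
    raise ValueError(f"no probe to run for health check type {kind!r}")


def run_probe(container: str) -> int:
    """Run the configured probe once and return its exit code."""
    inspect = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Healthcheck.Test}}", container],
        capture_output=True, text=True, check=True,
    )
    argv = exec_argv_for_probe(container, json.loads(inspect.stdout))
    return subprocess.run(argv).returncode
```

A non-zero result from `run_probe("web")` together with the printed output usually points straight at the missing tool, wrong port, or wrong path.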

Common health check issues:

# Issue 1: Health check command not found
# Solution: Ensure the tool is installed in the image

cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
# Install curl before the health check can use it
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
HEALTHCHECK CMD curl -f http://localhost/ || exit 1
EOF

# Issue 2: Port not listening yet
# Solution: Increase the start period

HEALTHCHECK --start-period=60s \
    CMD curl -f http://localhost/

# Issue 3: Health check runs too frequently, affecting performance
# Solution: Increase the interval

HEALTHCHECK --interval=60s \
    CMD curl -f http://localhost/

# Issue 4: Timeout too short for the check
# Solution: Increase the timeout

HEALTHCHECK --timeout=30s \
    CMD curl -f http://localhost/

Advanced Health Check Strategies

Implement sophisticated health check patterns for complex applications.

Composite health checks:

# Health check with multiple conditions
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
# curl for the HTTP checks, postgresql-client for pg_isready
RUN apt-get update && apt-get install -y --no-install-recommends curl postgresql-client \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /healthcheck
COPY healthcheck.sh /healthcheck/

HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD /healthcheck/healthcheck.sh
EOF

cat > healthcheck.sh <<'EOF'
#!/bin/bash
# Any failing check exits with 1 immediately

# Check HTTP endpoint
curl -fsS http://localhost:8080/health > /dev/null || exit 1

# Check database connectivity (pg_isready needs no password, unlike psql)
pg_isready -h db -U user -d mydb || exit 1

# Check service dependencies
curl -fsS http://api:5000/status > /dev/null || exit 1

echo "All health checks passed"
exit 0
EOF

chmod +x healthcheck.sh

Weighted health scoring:

cat > healthcheck.py <<'EOF'
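#!/usr/bin/env python3
import http.client
import socket
import sys
import urllib.request


def http_ok(url):
    # urlopen raises for 4xx/5xx responses and connection errors
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200


def tcp_ok(host, port):
    # PostgreSQL and Redis don't speak HTTP, so only check that the port accepts connections
    with socket.create_connection((host, port), timeout=2):
        return True


checks = {
    'http': {'weight': 50, 'probe': lambda: http_ok('http://localhost:8080/health')},
    'db': {'weight': 30, 'probe': lambda: tcp_ok('localhost', 5432)},
    'cache': {'weight': 20, 'probe': lambda: tcp_ok('localhost', 6379)},
}

score = 0
total = 0

for name, check in checks.items():
    total += check['weight']
    try:
        if check['probe']():
            score += check['weight']
        else:
            print(f"Check failed: {name}")
    except (OSError, http.client.HTTPException) as exc:
        print(f"Check failed: {name} ({exc})")

if score >= total * 0.8:  # 80% threshold
    print(f"Health score: {score}/{total}")
    sys.exit(0)
else:
    print(f"Unhealthy: {score}/{total}")
    sys.exit(1)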
EOF
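
Weighted thresholds are easy to misjudge, so it is worth enumerating which failures the 80% rule actually tolerates. A small worked sketch; the weights mirror the script above, and `tolerated_failures` is a hypothetical helper.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def tolerated_failures(weights: Dict[str, int], threshold_pct: int) -> List[Tuple[str, ...]]:
    """Return every set of failed checks whose remaining score still passes."""
    total = sum(weights.values())
    names = sorted(weights)
    passing = []
    for k in range(len(names) + 1):
        for failed in combinations(names, k):
            score = total - sum(weights[name] for name in failed)
            # Integer comparison avoids floating-point edge cases at the threshold
            if score * 100 >= total * threshold_pct:
                passing.append(failed)
    return passing
```

With 50/30/20 weights and an 80% threshold, only the cache check can fail on its own (80/100); losing the database (70/100) or the HTTP endpoint (50/100) already marks the container unhealthy.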

Progressive health checks:

cat > Dockerfile <<'EOF'
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install

# Exit explicitly so the probe never waits on an open connection; errors exit 1
HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=2 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

CMD ["npm", "start"]
EOF

Conclusion

Health checks are a critical component of production container infrastructure, enabling automatic detection of and recovery from application failures. By implementing appropriate health check commands, configuring suitable timing parameters, and integrating with orchestration systems, you create self-healing infrastructure that requires minimal manual intervention. Start with simple HTTP-based health checks for web applications, progress to database-specific checks for stateful services, and eventually implement composite health checks for complex microservices. Review your health check configuration regularly as your applications evolve, and consider using monitoring tools to track health trends across your entire infrastructure. Proper health check configuration separates amateur container deployments from professional, reliable systems.