HAProxy Health Checks and Failover

HAProxy provides active health checks that keep traffic away from unhealthy backend servers. Unlike passive checks, which only react once real requests fail, HAProxy probes each backend server on a schedule, which enables fast failover and automatic recovery. This guide covers HTTP and TCP health checks, check parameters, backup servers, sorry servers, agent checks, and monitoring strategies.

Health Check Overview

Health checks detect unhealthy servers before client requests fail. HAProxy supports:

Active checks: Send test requests to backends on a fixed interval
HTTP checks: Verify HTTP response codes and content
TCP checks: Verify TCP connectivity
Agent checks: Query a custom agent running on the backend server

Active health checks provide:

Fast failure detection, bounded by the check interval and the failure threshold
Automatic removal from rotation
Fast recovery when servers come back
Lower client-side error rates

HTTP Health Checks

Basic HTTP health check configuration:

cat > /etc/haproxy/haproxy.cfg <<'EOF'
global
    log stdout local0
    stats socket /run/haproxy/admin.sock mode 660 level admin

defaults
    mode http
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend web_in
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check
    server web3 192.168.1.102:8000 check
EOF

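With option httpchk, HAProxy sends GET /health HTTP/1.1 to every server that has check enabled; by default any 2xx or 3xx response counts as healthy. Reload HAProxy to apply the configuration: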

sudo systemctl reload haproxy

Monitor health status (show stat returns CSV; the status column is field 18):

echo "show stat" | socat - /run/haproxy/admin.sock | cut -d, -f1,2,18
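
The CSV returned by show stat can also be processed in a script. The following is a minimal sketch, assuming the output has already been captured as text; the function name parse_server_status and the truncated sample are illustrative, not part of HAProxy.

```python
import csv
import io

def parse_server_status(show_stat_output):
    """Return {(proxy, server): status} from the CSV text of `show stat`."""
    # The header line starts with "# "; strip it so DictReader sees clean names.
    text = show_stat_output.lstrip()
    if text.startswith("# "):
        text = text[2:]
    statuses = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Skip the FRONTEND/BACKEND summary rows; keep real servers only.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        statuses[(row["pxname"], row["svname"])] = row["status"]
    return statuses

# Hypothetical, heavily truncated sample (real output has many more columns).
SAMPLE = (
    "# pxname,svname,status\n"
    "web_servers,web1,UP\n"
    "web_servers,web2,DOWN\n"
    "web_servers,BACKEND,UP\n"
)

for (proxy, server), status in parse_server_status(SAMPLE).items():
    print(f"{proxy}/{server}: {status}")
```

Looking columns up by header name rather than by position keeps the script working if the column order differs between HAProxy versions.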

TCP Health Checks

Use TCP checks for non-HTTP services:

backend database_servers
    balance roundrobin
    option tcp-check
    tcp-check connect port 5432
    
    server db1 192.168.1.150:5432 check
    server db2 192.168.1.151:5432 check

As configured here, the TCP check only verifies that the port accepts connections; it performs no application-level validation.
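
To see what a connect-only check amounts to, the probe can be reproduced by hand. This is a rough sketch of the same idea in Python, not HAProxy's implementation; the function name tcp_port_reachable and the 2-second default timeout are assumptions made for illustration.

```python
import socket

def tcp_port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the TCP handshake; nothing is sent afterwards.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, tcp_port_reachable("192.168.1.150", 5432) would report whether db1 accepts connections, but a True result says nothing about whether the database can actually answer queries.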

Bound how long each check may take with timeout check (tcp-check connect itself takes no timeout argument):

backend cache_servers
    balance roundrobin
    option tcp-check
    timeout check 2s
    tcp-check connect port 6379
    
    server redis1 192.168.1.160:6379 check
    server redis2 192.168.1.161:6379 check

Health Check Parameters

Fine-tune health check behavior with per-server parameters:

backend api_servers
    balance roundrobin
    option httpchk GET /api/health HTTP/1.1\r\nHost:\ api.example.com
    
    server api1 192.168.1.100:8080 check inter 2000 fall 3 rise 2 weight 1
    server api2 192.168.1.101:8080 check inter 2000 fall 3 rise 2 weight 1
    server api3 192.168.1.102:8080 check inter 2000 fall 3 rise 2 weight 1 backup

Parameter explanations:

check: Enable health checks for the server
inter 2000: Check interval in milliseconds (default 2000)
fall 3: Mark the server down after 3 consecutive failed checks (default 3)
rise 2: Mark the server up after 2 consecutive successful checks (default 2)
weight 1: Server weight for load balancing
backup: Use the server only when no primary server is available
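
The interaction between fall and rise is easier to follow as a small state machine. The sketch below is a simplified model written for this guide, not HAProxy's code: it assumes the server starts UP and ignores check timeouts and the fastinter/downinter options. The names ServerHealth and approx_detection_ms are illustrative.

```python
class ServerHealth:
    """Tracks UP/DOWN state using consecutive-result counters (fall/rise)."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.up = True    # assumption: the server starts UP
        self.streak = 0   # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed one check result; return the state after applying it."""
        if check_passed == self.up:
            # The result agrees with the current state: reset the counter.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = not self.up
                self.streak = 0
        return "UP" if self.up else "DOWN"


def approx_detection_ms(inter_ms=2000, fall=3):
    """Rough upper bound on the time needed to mark a dead server DOWN."""
    return inter_ms * fall


health = ServerHealth(fall=3, rise=2)
print([health.record(r) for r in [False, False, False, True, True]])
# → ['UP', 'UP', 'DOWN', 'DOWN', 'UP']
print(approx_detection_ms(2000, 3), "ms")
# → 6000 ms
```

With inter 2000 and fall 3, a failed server keeps receiving traffic for up to about six seconds, so lowering either value trades faster detection against more check traffic and a higher chance of flapping.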

Advanced Health Checks

Validate HTTP response status codes:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check

Check specific response headers (hdr matching requires HAProxy 2.2 or later):

backend api_servers
    option httpchk GET /status HTTP/1.1\r\nHost:\ api.example.com
    http-check expect status 200
    http-check expect hdr name "Content-Type" value -m beg "application/json"
    
    server api1 192.168.1.110:8080 check
    server api2 192.168.1.111:8080 check

Validate response body content:

backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    http-check expect string OK
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check
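
When an expect rule fails unexpectedly, it can help to reproduce the probe outside HAProxy. The following sketch approximates a status-plus-body check using only the Python standard library; the function name http_health_check and its parameters are hypothetical, and the matching logic is a simplification of what HAProxy does.

```python
import http.client

def http_health_check(host, port, path="/health", host_header=None,
                      expect_status=200, expect_body=None, timeout=2.0):
    """Return (ok, detail) for one HTTP probe, loosely mirroring option httpchk."""
    headers = {"Host": host_header} if host_header else {}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException) as exc:
        return False, f"connection error: {exc}"
    finally:
        conn.close()
    # Status is compared first, then the optional substring match on the body.
    if resp.status != expect_status:
        return False, f"unexpected status {resp.status}"
    if expect_body is not None and expect_body not in body:
        return False, "expected text not found in body"
    return True, "ok"
```

Calling http_health_check("192.168.1.100", 8000, host_header="example.com", expect_body="OK") would mirror the backend above and return a short reason whenever the probe fails.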

Negate a match so the check fails when the response body contains an error marker:

backend dynamic_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    http-check expect ! string error
    
    server srv1 192.168.1.100:8000 check
    server srv2 192.168.1.101:8000 check

Backup Servers

Designate backup servers as failover targets:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    # Primary servers
    server web1 192.168.1.100:8000 check inter 2000 fall 3 rise 2
    server web2 192.168.1.101:8000 check inter 2000 fall 3 rise 2
    
    # Backup servers (used only when all primaries are down)
    server web3 192.168.1.102:8000 check inter 2000 fall 3 rise 2 backup
    server web4 192.168.1.103:8000 check inter 2000 fall 3 rise 2 backup

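When all primary servers are down, HAProxy routes traffic to the backup servers. By default only the first available backup receives traffic; add option allbackups to the backend to balance across all of them.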
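
The selection rule can be summarized in a few lines. This is an illustrative model of the behavior, assuming each server is described by a small dictionary; the function name eligible_servers and the data layout are made up for this example.

```python
def eligible_servers(servers, allbackups=False):
    """Pick the servers that may receive traffic.

    `servers` is a list of dicts with keys: name, up (bool), backup (bool).
    """
    active = [s["name"] for s in servers if s["up"] and not s["backup"]]
    if active:
        # At least one primary is healthy: backups stay idle.
        return active
    backups = [s["name"] for s in servers if s["up"] and s["backup"]]
    # Without allbackups, only the first healthy backup is used.
    return backups if allbackups else backups[:1]


pool = [
    {"name": "web1", "up": False, "backup": False},
    {"name": "web2", "up": False, "backup": False},
    {"name": "web3", "up": True, "backup": True},
    {"name": "web4", "up": True, "backup": True},
]
print(eligible_servers(pool))                   # → ['web3']
print(eligible_servers(pool, allbackups=True))  # → ['web3', 'web4']
```

As soon as any primary passes its checks again, the function returns only primaries, which matches the automatic fail-back that backup servers provide.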

Sorry Servers

A "sorry server" shows a maintenance message when all real backends are unavailable:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check
    server sorry_server 127.0.0.1:8888 backup

listen sorry_backend
    bind 127.0.0.1:8888
    mode http
    
    # No servers are defined here, so every request receives the custom 503 page
    errorfile 503 /etc/haproxy/sorry.http

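Create /etc/haproxy/sorry.http. The file is a raw HTTP response, headers included. Content-Length is left out here because a value that does not match the body size exactly yields a broken response:

HTTP/1.1 503 Service Unavailable
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache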

<!DOCTYPE html>
<html>
<head>
    <title>Maintenance</title>
    <style>
        body { font-family: Arial, sans-serif; text-align: center; padding: 50px; }
        h1 { color: #333; }
    </style>
</head>
<body>
    <h1>Service Unavailable</h1>
    <p>We are currently performing maintenance.</p>
    <p>Please try again later.</p>
</body>
</html>
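
If the errorfile carries a Content-Length header, the value should equal the size of the body in bytes, which is easy to get wrong when editing by hand. The sketch below generates the file contents with a computed length; build_errorfile is a hypothetical helper written for this guide, not an HAProxy tool.

```python
def build_errorfile(html_body, status_line="HTTP/1.1 503 Service Unavailable"):
    """Return the raw bytes of an errorfile with a matching Content-Length."""
    body = html_body.encode("utf-8")
    headers = [
        status_line,
        "Content-Type: text/html; charset=utf-8",
        "Cache-Control: no-cache",
        # Length is counted in bytes after encoding, not in characters.
        f"Content-Length: {len(body)}",
    ]
    # HTTP uses CRLF line endings and an empty line between headers and body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body
```

The result can be written in binary mode to /etc/haproxy/sorry.http, for example with open(path, "wb"), so that the line endings are preserved exactly.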

Agent Checks

Use HAProxy agent checks for more sophisticated health decisions. The agent protocol is plain TCP rather than HTTP: HAProxy connects to the agent port and reads a single line of ASCII text. Deploy a small agent on each backend server:

cat > /usr/local/bin/haproxy-agent.py <<'EOF'
#!/usr/bin/env python3
"""Minimal HAProxy agent: answers each TCP connection with one status line."""
import socketserver

def check_health():
    # Implement custom health logic (load, queue depth, disk space, ...)
    return {'healthy': True, 'weight': 100}

class AgentHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # The agent protocol is plain TCP, not HTTP: HAProxy connects,
        # reads one ASCII line terminated by "\n", then closes.
        status = check_health()
        if status['healthy']:
            reply = f"up {status['weight']}%\n"
        else:
            reply = "down\n"
        self.request.sendall(reply.encode('ascii'))

if __name__ == '__main__':
    # Listen on all interfaces so that HAProxy can reach the agent remotely;
    # restrict access to port 5555 with a firewall.
    socketserver.TCPServer.allow_reuse_address = True
    with socketserver.TCPServer(('0.0.0.0', 5555), AgentHandler) as server:
        server.serve_forever()
EOF

chmod +x /usr/local/bin/haproxy-agent.py

Configure the HAProxy agent check:

backend api_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ api.example.com
    
    # Agent check for dynamic weight adjustment
    server api1 192.168.1.100:8080 check agent-check agent-port 5555
    server api2 192.168.1.101:8080 check agent-check agent-port 5555

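The agent reports a weight as a percentage of the server's configured weight (for example 75%), optionally combined with keywords such as up, down, drain, or maint, which allows dynamic load-balancing adjustment.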
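
One possible way to derive the reported weight is from system load. The function below is only a sketch of such a policy: the linear scaling, the 1% floor, and the name agent_reply are all assumptions chosen for illustration, and a real agent would likely combine several signals.

```python
def agent_reply(load_1min, cpu_count, healthy=True):
    """Build one agent-check reply line from the 1-minute load average."""
    if not healthy:
        return "down\n"
    # Load per CPU: 0.0 means idle, 1.0 or more means saturated.
    ratio = load_1min / max(cpu_count, 1)
    # Scale the weight down linearly, but keep at least 1% while healthy
    # so the server is not drained by load alone.
    percent = max(1, min(100, round(100 * (1.0 - ratio))))
    return f"up {percent}%\n"


print(repr(agent_reply(0.0, 4)))  # → 'up 100%\n'
print(repr(agent_reply(2.0, 4)))  # → 'up 50%\n'
print(repr(agent_reply(8.0, 4)))  # → 'up 1%\n'
```

Inside an agent, the inputs could come from os.getloadavg()[0] and os.cpu_count(), both available in the Python standard library on Unix systems.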

Persistence During Failover

Maintain session persistence even when servers fail:

backend api_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ api.example.com
    http-check expect status 200
    
    stick-table type string len 32 size 100k expire 30m
    stick on req.cook(JSESSIONID)
    stick on src if !{ req.hdr(Authorization) -m found }
    
    server api1 192.168.1.100:8080 check
    server api2 192.168.1.101:8080 check
    server api3 192.168.1.102:8080 check

When a server fails while a client's sticky session points to it, HAProxy:

Marks the server down
Redispatches the request to another healthy server
Keeps later requests from that client on the newly selected server

Monitoring Health Status

Use the statistics page to monitor health:

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 5s
    stats show-legends

Access the statistics:

curl http://localhost:8404/stats

Extract health information through the admin socket:

echo "show servers state" | socat - /run/haproxy/admin.sock
echo "show backend" | socat - /run/haproxy/admin.sock

Monitor a specific backend:

watch -n 1 'echo "show stat" | socat - /run/haproxy/admin.sock | grep "api_servers"'

Troubleshooting

Check whether health checks are running:

sudo tcpdump -i any -n "port 8000 and (tcp[tcpflags] & tcp-syn) != 0"

Verify health check connectivity manually:

curl -v http://192.168.1.100:8000/health

Repeat the request with the same Host header that HAProxy sends:

curl -v "http://192.168.1.100:8000/health" \
  -H "Host: example.com"

Check the HAProxy logs for health check failures:

tail -f /var/log/haproxy.log | grep -i "health\|down\|up"

Validate the HAProxy configuration:

haproxy -f /etc/haproxy/haproxy.cfg -c

Monitor server state changes:

sudo journalctl -u haproxy -f | grep -i "server\|health"

Increase logging verbosity:

global
    log stdout local0 debug

Reload and test:

sudo systemctl reload haproxy
curl http://localhost/test

Conclusion

HAProxy's health check and failover mechanisms support high availability and reliability. By actively monitoring backend health, detecting failures quickly, and handling failover through backup and sorry servers, HAProxy keeps services available during infrastructure problems. Combined with sticky sessions and tunable check parameters, it provides production-grade resilience for critical applications.
