HAProxy Health Checks and Failover

HAProxy provides sophisticated health checking mechanisms to ensure traffic routes only to healthy backend servers. Unlike passive health checking that waits for failures, HAProxy actively probes backend servers, enabling rapid failover and automatic recovery. This guide covers HTTP and TCP health checks, configuration parameters, backup servers, sorry servers, and monitoring strategies.

Table of Contents

  1. Health Check Overview
  2. HTTP Health Checks
  3. TCP Health Checks
  4. Health Check Parameters
  5. Advanced Health Checks
  6. Backup Servers
  7. Sorry Servers
  8. Agent Checks
  9. Persistence During Failover
  10. Monitoring Health Status
  11. Troubleshooting

Health Check Overview

Health checks detect unhealthy servers before requests fail. HAProxy supports:

  • Active Checks: Proactively send test requests to backends
  • HTTP Checks: Verify HTTP response codes and content
  • TCP Checks: Verify TCP connectivity
  • Agent Checks: Custom agent-based checks

Active health checking enables:

  • Immediate detection of failures
  • Automatic removal from rotation
  • Quick recovery when servers return
  • Reduced client-side error rates
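Health checks govern which servers receive traffic; connection-level failover is tuned separately. A minimal defaults sketch combining the two (values are illustrative):

```
defaults
    mode http
    retries 3            # retry a failed connection attempt up to 3 times
    option redispatch    # after retries are exhausted, try a different server
    timeout connect 5s
```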

HTTP Health Checks

Basic HTTP health check configuration:

cat > /etc/haproxy/haproxy.cfg <<'EOF'
global
    log stdout local0
    stats socket /run/haproxy/admin.sock mode 660 level admin

defaults
    mode http
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend web_in
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check
    server web3 192.168.1.102:8000 check
EOF

The option httpchk directive makes HAProxy send GET /health requests, with the given Host header, to each backend at every check interval. Apply the configuration:

sudo systemctl reload haproxy
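On HAProxy 2.2 and later, the same check can be expressed with the newer http-check send directive instead of the legacy inline syntax (behavior is equivalent):

```
backend web_servers
    balance roundrobin
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host example.com
    http-check expect status 200

    server web1 192.168.1.100:8000 check
```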

Monitor health status:

echo "show stat" | socat - /run/haproxy/admin.sock | grep -i status

TCP Health Checks

Use TCP checks for non-HTTP services:

backend database_servers
    balance roundrobin
    option tcp-check
    tcp-check connect port 5432
    
    server db1 192.168.1.150:5432 check
    server db2 192.168.1.151:5432 check

TCP checks verify only that the port accepts connections; they perform no application-level validation.

Tune the check timeout separately with timeout check; the tcp-check connect rule itself takes no timeout argument:

backend cache_servers
    balance roundrobin
    option tcp-check
    timeout check 2s
    tcp-check connect port 6379
    
    server redis1 192.168.1.160:6379 check
    server redis2 192.168.1.161:6379 check
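When reachability alone is not enough, tcp-check can also exchange protocol payloads. A sketch for Redis, sending PING and expecting +PONG (addresses reused from above):

```
backend cache_servers
    option tcp-check
    tcp-check connect port 6379
    tcp-check send PING\r\n
    tcp-check expect string +PONG

    server redis1 192.168.1.160:6379 check
```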

Health Check Parameters

Fine-tune health check behavior with specific parameters:

backend api_servers
    balance roundrobin
    option httpchk GET /api/health HTTP/1.1\r\nHost:\ api.example.com
    
    server api1 192.168.1.100:8080 check inter 2000 fall 3 rise 2 weight 1
    server api2 192.168.1.101:8080 check inter 2000 fall 3 rise 2 weight 1
    server api3 192.168.1.102:8080 check inter 2000 fall 3 rise 2 weight 1 backup

Parameter explanations:

  • check: Enable health checking
  • inter 2000: Check interval in milliseconds (default 2000)
  • fall 3: Mark down after 3 consecutive failures
  • rise 2: Mark up after 2 consecutive successes
  • weight 1: Server weight for load balancing
  • backup: Use only when primary servers fail
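With these values, a dead server is detected in roughly inter x fall = 6 seconds. The transition intervals can be tuned independently with fastinter (checks while a state change is pending) and downinter (checks while the server is down):

```
backend api_servers
    server api1 192.168.1.100:8080 check inter 2000 fastinter 500 downinter 5000 fall 3 rise 2
```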

Advanced Health Checks

Validate HTTP response status codes:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check

Check for specific response headers (requires HAProxy 2.2 or later):

backend api_servers
    option httpchk GET /status HTTP/1.1\r\nHost:\ api.example.com
    http-check expect status 200
    http-check expect hdr name "Content-Type" value "application/json"
    
    server api1 192.168.1.110:8080 check
    server api2 192.168.1.111:8080 check

Validate response body content:

backend web_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    http-check expect string OK
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check

Negate a match to fail the check when the response body contains an error marker:

backend dynamic_servers
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    http-check expect ! rstring error
    
    server srv1 192.168.1.100:8000 check
    server srv2 192.168.1.101:8000 check

Backup Servers

Designate backup servers as failover targets:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    # Primary servers
    server web1 192.168.1.100:8000 check inter 2000 fall 3 rise 2
    server web2 192.168.1.101:8000 check inter 2000 fall 3 rise 2
    
    # Backup servers (used only if all primaries are down)
    server web3 192.168.1.102:8000 check inter 2000 fall 3 rise 2 backup
    server web4 192.168.1.103:8000 check inter 2000 fall 3 rise 2 backup

When all primary servers fail, HAProxy routes to backup servers.
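By default, HAProxy sends traffic only to the first available backup server. To load-balance across all backup servers instead, add option allbackups:

```
backend web_servers
    option allbackups   # balance across all backups rather than using only the first
```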

Sorry Servers

A "sorry server" displays a maintenance message when all real backends are unavailable:

backend web_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    
    server web1 192.168.1.100:8000 check
    server web2 192.168.1.101:8000 check
    server sorry_server 127.0.0.1:8888 backup

listen sorry_backend
    bind 127.0.0.1:8888
    mode http
    errorfile 503 /etc/haproxy/sorry.http

Because this listener defines no servers, every request to it receives the 503 error file.

Create /etc/haproxy/sorry.http:

HTTP/1.1 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html>
<head>
    <title>Maintenance</title>
    <style>
        body { font-family: Arial, sans-serif; text-align: center; padding: 50px; }
        h1 { color: #333; }
    </style>
</head>
<body>
    <h1>Service Unavailable</h1>
    <p>We are currently performing maintenance.</p>
    <p>Please try again later.</p>
</body>
</html>

Agent Checks

Use HAProxy agent checks for more sophisticated health determination. Deploy a small agent on each backend server:

cat > /usr/local/bin/haproxy-agent.py <<'EOF'
#!/usr/bin/env python3
"""HAProxy agent-check responder.

HAProxy's agent-check speaks a plain TCP protocol, not HTTP: it opens a
connection, reads a single ASCII line (e.g. "up", "down", "drain", or a
weight such as "75%"), and closes the connection.
"""
import socketserver


def check_health():
    # Implement custom health logic (CPU load, queue depth, dependencies, ...)
    return {'healthy': True}


class AgentHandler(socketserver.BaseRequestHandler):
    def handle(self):
        status = check_health()
        reply = '100%\n' if status['healthy'] else 'down\n'
        self.request.sendall(reply.encode('ascii'))


if __name__ == '__main__':
    # Bind on all interfaces so HAProxy can reach the agent from another host
    with socketserver.TCPServer(('0.0.0.0', 5555), AgentHandler) as server:
        server.serve_forever()
EOF

chmod +x /usr/local/bin/haproxy-agent.py

Configure HAProxy agent check:

backend api_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ api.example.com
    
    # Agent check for dynamic weight adjustment
    server api1 192.168.1.100:8080 check agent-check agent-port 5555
    server api2 192.168.1.101:8080 check agent-check agent-port 5555

The agent replies with a single ASCII line, typically a weight percentage such as 75% or a keyword like up, down, or drain, allowing dynamic load-balancing adjustment.
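The agent polling interval can be set independently of the regular health check with agent-inter, a standard server option:

```
backend api_servers
    server api1 192.168.1.100:8080 check agent-check agent-port 5555 agent-inter 5s
```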

Persistence During Failover

Maintain session persistence even when servers fail:

backend api_servers
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ api.example.com
    http-check expect status 200
    
    stick-table type string len 32 size 100k expire 30m
    stick on req.cook(JSESSIONID)   # clients without the cookie fall back to the balance algorithm
    
    server api1 192.168.1.100:8080 check
    server api2 192.168.1.101:8080 check
    server api3 192.168.1.102:8080 check

When a server holding a client's sticky session fails, HAProxy:

  1. Marks the server down and removes it from rotation
  2. Picks a replacement using the balance algorithm
  3. Overwrites the stick-table entry, so subsequent requests follow the new server
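An alternative sketch uses HAProxy-inserted cookies together with option redispatch; the SERVERID cookie name is arbitrary:

```
backend api_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    option redispatch

    server api1 192.168.1.100:8080 check cookie api1
    server api2 192.168.1.101:8080 check cookie api2
```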

Monitoring Health Status

Use the stats page to monitor health:

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 5s
    stats show-legends
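If the stats page is reachable beyond localhost, it is worth protecting, and it can optionally be made interactive; the credentials below are placeholders:

```
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats auth admin:changeme   # placeholder credentials
    stats admin if LOCALHOST    # allow server enable/disable from the UI
```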

Access stats:

curl http://localhost:8404/stats

Extract health information via admin socket:

echo "show servers state" | socat - /run/haproxy/admin.sock
echo "show backend" | socat - /run/haproxy/admin.sock

Monitor specific backend:

watch -n 1 'echo "show stat" | socat - /run/haproxy/admin.sock | grep "api_servers"'

Troubleshooting

Check if health checks are running:

sudo tcpdump -i any -n "port 8000 and (tcp[tcpflags] & tcp-syn) != 0"

Verify health check connectivity manually:

curl -v http://192.168.1.100:8000/health

Test HTTP health check response:

curl -v "http://192.168.1.100:8000/health" \
  -H "Host: example.com"

Check HAProxy logs for health check failures:

tail -f /var/log/haproxy.log | grep -i "health\|down\|up"
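To make health check transitions appear in the logs in the first place, enable per-check logging in the affected backend:

```
backend web_servers
    option log-health-checks   # log a line for each health check state change
```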

Review HAProxy configuration:

haproxy -f /etc/haproxy/haproxy.cfg -c

Monitor server state changes:

sudo journalctl -u haproxy -f | grep -i "server\|health"

Increase logging detail:

global
    log stdout local0 debug

Reload and test:

sudo systemctl reload haproxy
curl http://localhost/test

Conclusion

HAProxy's robust health checking and failover mechanisms ensure high availability and reliability. By actively monitoring backend health, quickly detecting failures, and managing failover through backup and sorry servers, HAProxy maintains service availability even during infrastructure issues. Combined with sticky sessions and sophisticated check parameters, HAProxy provides production-grade resilience for critical applications.