PostgreSQL High Availability with Patroni
Patroni is a modern, robust solution for PostgreSQL high availability that manages PostgreSQL initialization, configuration, and replication without requiring custom failover scripts. It uses a distributed configuration store (typically etcd, Consul, or ZooKeeper) to manage cluster state and automatically handles failover, switchover, and cluster topology management. This guide provides a comprehensive walkthrough of deploying PostgreSQL HA with Patroni on Linux systems, covering etcd setup, Patroni installation, configuration, failover mechanisms, and practical operational considerations.
Table of Contents
- Architecture Overview
- etcd Cluster Setup
- Patroni Installation
- Patroni Configuration
- Streaming Replication
- Failover and Recovery
- Switchover Operations
- HAProxy Load Balancing
- Monitoring and Alerting
- Maintenance and Best Practices
- Conclusion
Architecture Overview
A Patroni HA cluster consists of three main components: PostgreSQL servers running on cluster nodes, a distributed key-value store (etcd) for configuration and state management, and a load balancer (HAProxy) for client connection routing. In a typical three-node setup, each node runs PostgreSQL and Patroni, with one node serving as the primary and the others as replicas. The etcd cluster stores Patroni state, enabling automatic failover when the primary becomes unavailable.
The distributed state store continuously tracks PostgreSQL health and coordinates failover decisions. When the primary fails, Patroni automatically promotes the most advanced replica to primary, updates the topology, and redirects clients through HAProxy. This process typically completes in 10-30 seconds, minimizing downtime and data loss.
etcd Cluster Setup
etcd is a distributed, strongly consistent key-value store that provides the distributed consensus Patroni relies on. Deploy a three-node etcd cluster with one etcd instance per PostgreSQL cluster node for proper quorum and fault tolerance.
Install etcd on all three nodes. First, download and install the latest stable version:
# Download etcd
ETCD_VER=v3.5.10
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
# Extract and move to /usr/local/bin
tar xzf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo cp etcd-${ETCD_VER}-linux-amd64/{etcd,etcdctl} /usr/local/bin/
sudo chmod +x /usr/local/bin/etcd /usr/local/bin/etcdctl
# Verify installation
etcd --version
etcdctl version
Create an etcd systemd service file. Adjust node names and IPs according to your infrastructure:
sudo nano /etc/systemd/system/etcd.service
For node1 (192.168.1.10), use this configuration:
[Unit]
Description=etcd - Distributed reliable key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
PermissionsStartOnly=true
ProtectSystem=full
ProtectHome=yes
NoNewPrivileges=true
ExecStart=/usr/local/bin/etcd \
--name node1 \
--listen-client-urls http://192.168.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.10:2379 \
--listen-peer-urls http://192.168.1.10:2380 \
--initial-advertise-peer-urls http://192.168.1.10:2380 \
--initial-cluster node1=http://192.168.1.10:2380,node2=http://192.168.1.11:2380,node3=http://192.168.1.12:2380 \
--initial-cluster-state new \
--initial-cluster-token etcd-cluster \
--data-dir /var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Create the etcd data directory and set permissions:
sudo mkdir -p /var/lib/etcd
sudo useradd -r -s /bin/false etcd 2>/dev/null || true
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd
On node2 and node3, modify the service file with the appropriate node identifiers:
# Node 2 configuration
--name node2 \
--listen-client-urls http://192.168.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.11:2379 \
--listen-peer-urls http://192.168.1.11:2380 \
--initial-advertise-peer-urls http://192.168.1.11:2380 \
# Node 3 configuration
--name node3 \
--listen-client-urls http://192.168.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.12:2379 \
--listen-peer-urls http://192.168.1.12:2380 \
--initial-advertise-peer-urls http://192.168.1.12:2380 \
Start etcd on all nodes:
sudo systemctl daemon-reload
sudo systemctl start etcd
sudo systemctl enable etcd
sudo systemctl status etcd
Verify that the etcd cluster is healthy:
# Check cluster health
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
# List cluster members
etcdctl --endpoints=http://127.0.0.1:2379 member list
# Test writing a key
etcdctl --endpoints=http://127.0.0.1:2379 put /patroni/test "value"
etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/test
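Beyond basic health checks, `etcdctl endpoint status` reports the leader flag, Raft term, and DB size for each member, which is useful when diagnosing quorum problems. The wrapper function below is a hypothetical convenience (not part of etcd) that queries all three endpoints at once:

```shell
# Hypothetical helper: summarize etcd cluster state across all members.
etcd_cluster_summary() {
  local eps=${1:-"http://192.168.1.10:2379,http://192.168.1.11:2379,http://192.168.1.12:2379"}
  # Per-member details: leader flag, Raft term, DB size
  etcdctl --endpoints="$eps" endpoint status --write-out=table
  # Simple up/down health per endpoint
  etcdctl --endpoints="$eps" endpoint health
}
```

Call it with no arguments to use the default endpoint list, or pass a custom comma-separated list as the first argument.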
Patroni Installation
Install Patroni on all three PostgreSQL nodes. First, install the PostgreSQL server and development tools:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-contrib-15 python3-pip
# RHEL/CentOS
sudo dnf install -y postgresql-server postgresql-contrib postgresql-devel python3-pip
# Stop and disable the packaged PostgreSQL service (Patroni will manage PostgreSQL itself)
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Install Patroni and its dependencies using pip:
# Install Patroni with etcd v3 support (etcd 3.5 no longer enables the
# v2 API that Patroni's plain "etcd" backend uses)
sudo pip3 install 'patroni[etcd3]'
# Install additional optional packages
sudo pip3 install psycopg2-binary requests pyyaml
# Verify installation
patroni --version
Patroni must run as the same OS user as PostgreSQL: it starts the postmaster directly, and PostgreSQL requires the data directory to be owned by the user that launches it. The PostgreSQL packages already create the postgres account, so no separate Patroni user is needed.
Create Patroni's configuration directory:
sudo mkdir -p /etc/patroni
sudo chown postgres:postgres /etc/patroni
sudo chmod 750 /etc/patroni
Patroni Configuration
Create the main Patroni configuration file. This YAML file contains all cluster topology, PostgreSQL, and Patroni-specific settings:
sudo nano /etc/patroni/patroni.yml
For node1 (192.168.1.10), use this configuration:
scope: pg-cluster
namespace: /patroni
name: node1

restapi:
  listen: 192.168.1.10:8008
  connect_address: 192.168.1.10:8008
  authentication:
    username: patroni_api_user
    password: patroni_api_password

etcd3:
  hosts:
    - 192.168.1.10:2379
    - 192.168.1.11:2379
    - 192.168.1.12:2379

bootstrap:
  method: initdb
  initdb:
    - encoding: UTF8
    - data-checksums
    - locale: en_US.UTF-8
  pg_hba:
    - local all all trust
    - host all all 127.0.0.1/32 md5
    - host all all ::1/128 md5
    - host replication replication 192.168.0.0/16 md5
  users:
    admin:
      password: admin_password
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.1.10:5432
  connect_address: 192.168.1.10:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    superuser:
      username: postgres
      password: postgres_password
    replication:
      username: replication
      password: replication_password
  basebackup:
    max-rate: '100M'
    checkpoint: 'fast'
  parameters:
    wal_level: logical
    max_wal_senders: 5
    max_replication_slots: 5
    wal_keep_size: 1GB
    hot_standby: 'on'
    shared_preload_libraries: 'pg_stat_statements'
    shared_buffers: '256MB'
    effective_cache_size: '1GB'
    maintenance_work_mem: '64MB'
    log_min_duration_statement: 1000
    logging_collector: 'on'
    log_directory: 'log'
    log_filename: 'postgresql-%Y-%m-%d_%H%M%S.log'
    log_file_mode: '0600'
    log_rotation_age: '1d'

watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
For nodes 2 and 3, change the name and the listen/connect addresses to match each host. By convention, the Patroni member name matches the etcd member name on the same host:
# Node 2 configuration changes
name: node2
restapi:
  listen: 192.168.1.11:8008
  connect_address: 192.168.1.11:8008
postgresql:
  listen: 192.168.1.11:5432
  connect_address: 192.168.1.11:5432
Set proper ownership and permissions on the configuration file:
sudo chown postgres:postgres /etc/patroni/patroni.yml
sudo chmod 600 /etc/patroni/patroni.yml
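Patroni 2.0 and later ship a built-in schema validator, and checking the file before starting the service catches YAML and schema mistakes early. The wrapper function below is a hypothetical convenience around that validator:

```shell
# Hypothetical helper: sanity-check a Patroni configuration file.
validate_patroni_config() {
  local cfg=${1:-/etc/patroni/patroni.yml}
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
  # --validate-config is available in Patroni >= 2.0
  patroni --validate-config "$cfg"
}
```

Run `validate_patroni_config` after every configuration change and before restarting the service.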
Create a systemd service file for Patroni:
sudo nano /etc/systemd/system/patroni.service
Use this service configuration:
[Unit]
Description=Patroni - PostgreSQL HA solution
After=network.target etcd.service
Wants=etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
WorkingDirectory=/etc/patroni
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start Patroni:
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
# Check status
sudo systemctl status patroni
sudo journalctl -u patroni -f
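patronictl, installed alongside Patroni, is the primary CLI for cluster inspection. Always passing the configuration file gets tedious, so a small wrapper helps; the function name is a hypothetical convenience:

```shell
# Hypothetical helper: run patronictl against this cluster's config.
pctl() {
  patronictl -c /etc/patroni/patroni.yml "$@"
}
# Typical calls:
#   pctl list      # member table: role, state, timeline, lag
#   pctl history   # failover/switchover timeline history
```

`pctl list` is the quickest way to confirm all three members have registered and one of them holds the leader role.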
Streaming Replication
Patroni automatically configures streaming replication between the primary and the replicas. Verify that replication is working correctly:
# Connect to the primary node
psql -h 192.168.1.10 -U postgres -d postgres
# Check replication slots
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
# Check replication connections
SELECT usename, application_name, client_addr, state, sync_state FROM pg_stat_replication;
# Monitor replication lag (run on a replica; returns NULL on the primary)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Monitor replication lag across the cluster:
# Create a monitoring script
cat > /usr/local/bin/check-patroni-replication.sh << 'EOF'
#!/bin/bash
NODES=("192.168.1.10" "192.168.1.11" "192.168.1.12")
for node in "${NODES[@]}"; do
  echo "=== Node $node ==="
  psql -h "$node" -U postgres -d postgres -t -c \
    "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
done
EOF
chmod +x /usr/local/bin/check-patroni-replication.sh
/usr/local/bin/check-patroni-replication.sh
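Replication lag in bytes is simply the difference between two WAL positions (LSNs). PostgreSQL computes this server-side with pg_wal_lsn_diff(); the sketch below does the same arithmetic locally, which is handy when you only have LSN strings from the Patroni API or from logs. An LSN such as 0/3000060 encodes a 64-bit position as two hexadecimal halves:

```shell
# Convert an LSN like '0/3000060' to an absolute byte position.
lsn_to_bytes() {
  local hi=${1%/*} lo=${1#*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))   # hi * 2^32 + lo
}

# Byte lag between a primary LSN and a replica LSN.
lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For example, `lag_bytes 0/3000060 0/3000000` prints 96, matching what pg_wal_lsn_diff('0/3000060', '0/3000000') would return.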
Verify replication status through the Patroni REST API:
# Query the Patroni API on the primary for the full cluster view
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster
# Returns HTTP 200 on the current primary, 503 otherwise
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/primary
# Returns HTTP 200 on a healthy replica, 503 otherwise
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/replica
Failover and Recovery
Patroni automatically detects primary failure and promotes the most advanced replica. Monitor the automatic failover process:
# Watch etcd for leader changes
watch -n 1 "etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/pg-cluster/leader"
# Watch Patroni status
watch -n 2 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'"
Simulate a primary failure to test failover:
# Kill the PostgreSQL postmaster on the primary (be careful in production!)
sudo pkill -9 -u postgres postgres
# Or kill the Patroni process so its leader lease expires
sudo kill -9 $(pgrep -f "patroni /etc/patroni")
# Monitor failover progress
sudo journalctl -u patroni -f
Manually recover a failed node:
# Check node status after the failure
sudo systemctl status patroni
# Clean recovery: wipe the data directory and let Patroni re-bootstrap the node
sudo rm -rf /var/lib/postgresql/15/main/*
sudo systemctl start patroni
# Patroni will automatically perform a basebackup and rejoin the cluster
sudo journalctl -u patroni -f | grep -i basebackup
Configure callback scripts in patroni.yml to run custom logic when a node starts, stops, or changes role. Patroni supports on_start, on_stop, on_restart, on_reload, and on_role_change callbacks:
postgresql:
  callbacks:
    on_start: /usr/local/bin/patroni-callback.sh
    on_stop: /usr/local/bin/patroni-callback.sh
    on_role_change: /usr/local/bin/patroni-callback.sh
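Patroni invokes each callback with three arguments: the action (for example on_role_change), the node's new role, and the cluster scope. A minimal logging hook is sketched below; the script path and log location are assumptions, and older Patroni versions report the primary role as "master" while newer ones report "primary", so both are matched:

```shell
#!/bin/bash
# Hypothetical callback script, e.g. /usr/local/bin/patroni-callback.sh
# Patroni passes: $1=action  $2=role  $3=scope
log_callback() {
  local action=$1 role=$2 scope=$3
  local log=${CALLBACK_LOG:-/var/log/patroni-callbacks.log}
  echo "$(date -Is) action=$action role=$role scope=$scope" >> "$log"
  case "$action:$role" in
    on_role_change:master|on_role_change:primary)
      # Promotion hook: e.g. update DNS, move a VIP, or page on-call
      :
      ;;
  esac
}
if [ $# -gt 0 ]; then log_callback "$@"; fi
```

Callbacks run asynchronously and must be fast; long-running work should be handed off to a background job.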
Switchover Operations
Switchover is a planned, non-disruptive role change from one node to another. Perform a switchover for maintenance or load balancing:
# Request switchover via Patroni REST API
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node1","candidate":"node2"}' \
http://192.168.1.10:8008/switchover
# Verify switchover completion
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
Monitor switchover progress:
# Watch cluster status during switchover
watch -n 1 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role, state}'"
# Check the current WAL write position (total bytes since cluster initialization)
psql -h 192.168.1.10 -U postgres -d postgres -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');"
Handle switchover failures:
# Check Patroni logs for error details
sudo journalctl -u patroni -n 50 -p err
# Retry switchover
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node2","candidate":"node1"}' \
http://192.168.1.10:8008/switchover
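The same operation is available through patronictl, which prompts interactively unless --force is given; these flags exist in Patroni 2.x. The wrapper function is a hypothetical convenience:

```shell
# Hypothetical helper: planned switchover; --force skips the confirmation prompt.
do_switchover() {
  local leader=$1 candidate=$2
  patronictl -c /etc/patroni/patroni.yml switchover pg-cluster \
    --leader "$leader" --candidate "$candidate" --force
}
```

For example, `do_switchover node1 node2` moves the primary role from node1 to node2; add --scheduled to patronictl for a timed switchover instead.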
HAProxy Load Balancing
HAProxy distributes connections to the PostgreSQL cluster, routing write requests to the primary and read requests to the replicas. Install HAProxy on a dedicated server or co-locate it with an application server:
# Install HAProxy
sudo apt-get install -y haproxy
# Back up the original configuration
sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
Configure HAProxy for the PostgreSQL cluster:
sudo nano /etc/haproxy/haproxy.cfg
Add this configuration, replacing default content:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# Stats interface
listen stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /stats
    stats admin if TRUE

# PostgreSQL primary (writes)
listen pg-primary
    bind 0.0.0.0:5432
    mode tcp
    option httpchk GET /primary HTTP/1.0
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008 backup
    server node3 192.168.1.12:5432 check port 8008 backup

# PostgreSQL replicas (read-only)
listen pg-replica
    bind 0.0.0.0:5433
    mode tcp
    option httpchk GET /replica HTTP/1.0
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008 fall 10 rise 1
    server node2 192.168.1.11:5432 check port 8008 fall 10 rise 1
    server node3 192.168.1.12:5432 check port 8008 fall 10 rise 1

# All nodes
listen pg-all
    bind 0.0.0.0:5434
    mode tcp
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008
    server node3 192.168.1.12:5432 check port 8008
Enable and start HAProxy:
sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo systemctl status haproxy
# Verify HAProxy is listening
netstat -tlnp | grep haproxy
Test HAProxy connectivity:
# Connect via HAProxy to the primary
psql -h haproxy.server -p 5432 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
# Returns 'f' (not in recovery) on the primary
# Returns 't' (in recovery) on replicas
# Test a replica read-only connection
psql -h haproxy.server -p 5433 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
Monitor HAProxy:
# View HAProxy statistics
curl http://127.0.0.1:8404/stats | head -50
# Watch real-time HAProxy activity
watch "curl -s http://127.0.0.1:8404/stats | grep -E 'pg-|#'"
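The admin socket configured in the global section also serves machine-readable CSV: in the output of the "show stat" command, field 18 is the server status (UP/DOWN). The parsing helper below is a hypothetical sketch and assumes socat is installed:

```shell
# Hypothetical helper: print proxy, server, and status columns from "show stat".
haproxy_status() {
  local sock=${1:-/run/haproxy/admin.sock}
  echo "show stat" | socat stdio "$sock" \
    | awk -F, 'NR > 1 && $1 != "" { print $1, $2, $18 }'
}
```

This gives a one-line-per-backend view (e.g. "pg-primary node1 UP") that is easier to script against than the HTML stats page.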
Monitoring and Alerting
Implement comprehensive monitoring of the Patroni cluster using Prometheus and Grafana. Export Patroni metrics for monitoring:
# Create a Prometheus exporter for Patroni
cat > /usr/local/bin/patroni-exporter.py << 'EOF'
#!/usr/bin/env python3
import time

import requests
from prometheus_client import start_http_server, Gauge

# Gauge metrics
cluster_size = Gauge('patroni_cluster_size', 'Number of nodes in the cluster')
primary_node = Gauge('patroni_primary_node', 'Is this node the primary', ['node'])
replication_lag = Gauge('patroni_replication_lag', 'Replication lag as reported by the Patroni /cluster endpoint', ['node'])

def get_cluster_status():
    try:
        response = requests.get('http://localhost:8008/cluster', timeout=5)
        return response.json()
    except Exception as e:
        print(f"Error fetching cluster status: {e}")
        return None

def update_metrics():
    while True:
        cluster_info = get_cluster_status()
        if cluster_info:
            members = cluster_info.get('members', [])
            cluster_size.set(len(members))
            for member in members:
                is_primary = 1 if member['role'] == 'leader' else 0
                primary_node.labels(node=member['name']).set(is_primary)
                lag = member.get('lag')
                if isinstance(lag, (int, float)):
                    replication_lag.labels(node=member['name']).set(lag)
        time.sleep(10)

if __name__ == '__main__':
    start_http_server(8009)
    update_metrics()
EOF
chmod +x /usr/local/bin/patroni-exporter.py
python3 /usr/local/bin/patroni-exporter.py &
Configure Prometheus to scrape the Patroni metrics:
sudo nano /etc/prometheus/prometheus.yml
Add this scrape configuration:
scrape_configs:
  - job_name: 'patroni'
    scrape_interval: 15s
    static_configs:
      - targets: ['192.168.1.10:8009', '192.168.1.11:8009', '192.168.1.12:8009']
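A matching Prometheus alert rule can page when the exporter's patroni_cluster_size metric drops. The rule below is a sketch: the metric name assumes the exporter from the previous section, and the rules path and helper function are assumptions:

```shell
# Hypothetical helper: write a Patroni alert rule file for Prometheus.
write_patroni_rules() {
  local rules=${1:-/etc/prometheus/rules/patroni.yml}
  mkdir -p "$(dirname "$rules")"
  cat > "$rules" << 'EOF'
groups:
  - name: patroni
    rules:
      - alert: PatroniClusterDegraded
        expr: patroni_cluster_size < 3
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Patroni cluster has fewer than 3 members"
EOF
}
```

Reference the rules file from prometheus.yml via rule_files and reload Prometheus for it to take effect.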
Create Grafana dashboards for cluster visualization. Useful data sources:
# Query the Patroni REST API for monitoring data
curl -s http://192.168.1.10:8008/cluster | jq '.'
# Get this node's detailed status
curl -s http://192.168.1.10:8008/patroni | jq '.'
# Check synchronous replication status
psql -h 192.168.1.10 -U postgres -c \
"SELECT client_addr, application_name, sync_state FROM pg_stat_replication;"
Set up alerting for critical events:
# Create an alert script
cat > /usr/local/bin/patroni-alert.sh << 'EOF'
#!/bin/bash
check_cluster_health() {
  CLUSTER_INFO=$(curl -s http://localhost:8008/cluster)
  SIZE=$(echo "$CLUSTER_INFO" | jq '.members | length')
  if [ "$SIZE" -lt 2 ]; then
    echo "ALERT: Cluster size is $SIZE (expected 3)"
  fi
  # /primary returns HTTP 200 only on the current leader
  if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8008/primary)" = "200" ]; then
    echo "INFO: This node is the primary"
  else
    echo "INFO: This node is a replica"
  fi
}
check_cluster_health
EOF
chmod +x /usr/local/bin/patroni-alert.sh
Maintenance and Best Practices
Perform regular cluster maintenance while preserving high availability. Rolling restart procedure for updates:
# 1. Restart the replicas first
for replica in node2 node3; do
  echo "Restarting $replica..."
  ssh "$replica" "sudo systemctl restart patroni"
  sleep 30
  # Verify the node rejoined the cluster
  curl -s http://192.168.1.10:8008/cluster | jq ".members[] | select(.name==\"$replica\") | .state"
done
# 2. Finally, restart the primary (this triggers an automatic failover)
ssh node1 "sudo systemctl restart patroni"
sleep 60
# Verify the cluster recovered with a new primary
curl -s http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
Backup procedures for the cluster:
# Dump a single database in tar format (use pg_dumpall for globals and all databases)
pg_dump -h 192.168.1.10 -U postgres -d postgres -F tar -f backup-$(date +%Y%m%d).tar
# Verify the backup
tar -tf backup-$(date +%Y%m%d).tar | head
# Dump an individual database to plain SQL
pg_dump -h 192.168.1.10 -U postgres -d mydb > mydb-backup.sql
# Full physical cluster backup using pg_basebackup
pg_basebackup -h 192.168.1.10 -U replication -F tar -z -D ./backup-$(date +%Y%m%d)
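Date-stamped backups accumulate quickly, so a small retention helper that keeps only the newest N archives is useful. This is a hypothetical convenience, not part of any PostgreSQL tooling, and assumes the backup-* naming scheme used above:

```shell
# Hypothetical helper: keep the newest N backup archives, delete the rest.
prune_backups() {
  local dir=$1 keep=${2:-7}
  # ls -1t sorts newest first; everything past line $keep is removed
  ls -1t "$dir"/backup-* 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```

For example, `prune_backups /var/backups/pg 7` keeps the seven most recent archives. Run it from cron after each backup job completes.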
Handle split-brain scenarios properly:
# Check whether the nodes agree on cluster state
patronictl -c /etc/patroni/patroni.yml list
# If split-brain occurs, rejoin the isolated node
# On the isolated node:
sudo systemctl restart patroni
# Compare database system identifiers to confirm both nodes belong to the same cluster
sudo -u postgres /usr/lib/postgresql/15/bin/pg_controldata /var/lib/postgresql/15/main | grep "Database system identifier"
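When a replica has diverged (for example after a resolved split-brain), `patronictl reinit` wipes its data directory and re-bootstraps it from the current primary. The wrapper function is a hypothetical convenience:

```shell
# Hypothetical helper: re-bootstrap a diverged replica from the primary.
reinit_member() {
  local member=$1
  # --force skips the interactive confirmation
  patronictl -c /etc/patroni/patroni.yml reinit pg-cluster "$member" --force
}
```

For example, `reinit_member node2` destroys node2's local data and streams a fresh copy, so only use it on a node you are certain should no longer be primary.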
Document and test disaster recovery:
# Test recovery from a backup
mkdir -p /tmp/recovery
cd /tmp/recovery
# Extract the backup (pg_basebackup -z produces gzip-compressed tars)
tar -xzf backup-20240101.tar.gz -C /tmp/recovery/
# Restore to a new server (detailed recovery procedures depend on your backup type)
Conclusion
PostgreSQL with Patroni delivers enterprise-grade high availability with automatic failover, seamless switchover, and minimal operational overhead. By properly configuring etcd for distributed consensus, setting up streaming replication, and deploying HAProxy for intelligent connection routing, you create a robust, self-healing database infrastructure. Regular monitoring, tested failover procedures, and documented maintenance processes ensure that your cluster continues to provide reliable service. Remember that while Patroni automates many HA operations, understanding the underlying PostgreSQL replication mechanics and etcd consensus principles remains essential for effective cluster management and troubleshooting.


