PostgreSQL High Availability with Patroni
Patroni is a modern, robust solution for PostgreSQL high availability that manages PostgreSQL initialization, configuration, and replication without requiring custom failover scripts. It uses a distributed configuration store (typically etcd, Consul, or ZooKeeper) to manage cluster state and automatically handles failover, switchover, and cluster topology management. This guide provides a comprehensive walkthrough of deploying PostgreSQL HA with Patroni on Linux systems, covering etcd setup, Patroni installation, configuration, failover mechanisms, and practical operational considerations.
Table of Contents
- Architecture Overview
- etcd Cluster Setup
- Patroni Installation
- Patroni Configuration
- Streaming Replication
- Failover and Recovery
- Switchover Operations
- HAProxy Load Balancing
- Monitoring and Alerting
- Maintenance and Best Practices
- Conclusion
Architecture Overview
A Patroni HA cluster consists of three main components: PostgreSQL servers running on cluster nodes, a distributed key-value store (etcd) for configuration and state management, and a load balancer (HAProxy) for client connection routing. In a typical three-node setup, each node runs PostgreSQL and Patroni, with one node serving as the primary and the others as replicas. The etcd cluster stores Patroni state, enabling automatic failover when the primary becomes unavailable.
The distributed state store continuously tracks PostgreSQL health and coordinates failover decisions. When the primary fails, Patroni automatically promotes the most advanced replica to primary, updates the topology, and redirects clients through HAProxy. This process typically completes in 10-30 seconds, minimizing downtime and data loss.
etcd Cluster Setup
etcd is a distributed, strongly consistent key-value store that provides the distributed consensus Patroni relies on. Deploy a three-node etcd cluster with one etcd instance per PostgreSQL cluster node for proper quorum and fault tolerance.
Install etcd on all three nodes. First, download and install the latest stable version:
# Download etcd
ETCD_VER=v3.5.10
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
# Extract and move to /usr/local/bin
tar xzf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo cp etcd-${ETCD_VER}-linux-amd64/{etcd,etcdctl} /usr/local/bin/
sudo chmod +x /usr/local/bin/etcd /usr/local/bin/etcdctl
# Verify installation
etcd --version
etcdctl version
Create an etcd systemd service file. Adjust node names and IPs according to your infrastructure:
sudo nano /etc/systemd/system/etcd.service
For node1 (192.168.1.10), use this configuration:
[Unit]
Description=etcd - Distributed reliable key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
PermissionsStartOnly=true
ProtectSystem=full
ProtectHome=yes
NoNewPrivileges=true
ExecStart=/usr/local/bin/etcd \
--name node1 \
--listen-client-urls http://192.168.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.10:2379 \
--listen-peer-urls http://192.168.1.10:2380 \
--initial-advertise-peer-urls http://192.168.1.10:2380 \
--initial-cluster node1=http://192.168.1.10:2380,node2=http://192.168.1.11:2380,node3=http://192.168.1.12:2380 \
--initial-cluster-state new \
--initial-cluster-token etcd-cluster \
--data-dir /var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Create the etcd data directory and set permissions:
sudo mkdir -p /var/lib/etcd
sudo useradd -r -s /bin/false etcd 2>/dev/null || true
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd
On node2 and node3, modify the service file with the appropriate node identifiers:
# Node 2 configuration
--name node2 \
--listen-client-urls http://192.168.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.11:2379 \
--listen-peer-urls http://192.168.1.11:2380 \
--initial-advertise-peer-urls http://192.168.1.11:2380 \
# Node 3 configuration
--name node3 \
--listen-client-urls http://192.168.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.12:2379 \
--listen-peer-urls http://192.168.1.12:2380 \
--initial-advertise-peer-urls http://192.168.1.12:2380 \
Start etcd on all nodes:
sudo systemctl daemon-reload
sudo systemctl start etcd
sudo systemctl enable etcd
sudo systemctl status etcd
Verify that the etcd cluster is healthy:
# Check cluster health
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
# List cluster members
etcdctl --endpoints=http://127.0.0.1:2379 member list
# Test writing a key
etcdctl --endpoints=http://127.0.0.1:2379 put /patroni/test "value"
etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/test
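Beyond basic health checks, `etcdctl endpoint status` reports the leader flag, Raft term, and DB size for each member, which is useful when diagnosing quorum problems. The wrapper function below is a hypothetical convenience (not part of etcd) that queries all three endpoints at once:

```shell
# Hypothetical helper: summarize etcd cluster state across all members.
etcd_cluster_summary() {
  local eps=${1:-"http://192.168.1.10:2379,http://192.168.1.11:2379,http://192.168.1.12:2379"}
  # Per-member details: leader flag, Raft term, DB size
  etcdctl --endpoints="$eps" endpoint status --write-out=table
  # Simple up/down health per endpoint
  etcdctl --endpoints="$eps" endpoint health
}
```

Call it with no arguments to use the default endpoint list, or pass a custom comma-separated list as the first argument.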
Patroni Installation
Install Patroni on all three PostgreSQL nodes. First, install the PostgreSQL server and development tools:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-contrib-15 python3-pip
# RHEL/CentOS
sudo dnf install -y postgresql-server postgresql-contrib postgresql-devel python3-pip
# Stop and disable the packaged PostgreSQL service (Patroni will manage PostgreSQL itself)
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Install Patroni and its dependencies using pip:
# Install Patroni with etcd v3 support (etcd 3.5 no longer enables the
# v2 API that Patroni's plain "etcd" backend uses)
sudo pip3 install 'patroni[etcd3]'
# Install additional optional packages
sudo pip3 install psycopg2-binary requests pyyaml
# Verify installation
patroni --version
Patroni must run as the same OS user as PostgreSQL: it starts the postmaster directly, and PostgreSQL requires the data directory to be owned by the user that launches it. The PostgreSQL packages already create the postgres account, so no separate Patroni user is needed.
Create Patroni's configuration directory:
sudo mkdir -p /etc/patroni
sudo chown postgres:postgres /etc/patroni
sudo chmod 750 /etc/patroni
Patroni Configuration
Create the main Patroni configuration file. This YAML file contains all cluster topology, PostgreSQL, and Patroni-specific settings:
sudo nano /etc/patroni/patroni.yml
For node1 (192.168.1.10), use this configuration:
scope: pg-cluster
namespace: /patroni
name: node1

restapi:
  listen: 192.168.1.10:8008
  connect_address: 192.168.1.10:8008
  authentication:
    username: patroni_api_user
    password: patroni_api_password

etcd3:
  hosts:
    - 192.168.1.10:2379
    - 192.168.1.11:2379
    - 192.168.1.12:2379

bootstrap:
  method: initdb
  initdb:
    - encoding: UTF8
    - data-checksums
    - locale: en_US.UTF-8
  pg_hba:
    - local all all trust
    - host all all 127.0.0.1/32 md5
    - host all all ::1/128 md5
    - host replication replication 192.168.0.0/16 md5
  users:
    admin:
      password: admin_password
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.1.10:5432
  connect_address: 192.168.1.10:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /var/lib/postgresql/.pgpass
  authentication:
    superuser:
      username: postgres
      password: postgres_password
    replication:
      username: replication
      password: replication_password
  basebackup:
    max-rate: '100M'
    checkpoint: 'fast'
  parameters:
    wal_level: logical
    max_wal_senders: 5
    max_replication_slots: 5
    wal_keep_size: 1GB
    hot_standby: 'on'
    shared_preload_libraries: 'pg_stat_statements'
    shared_buffers: '256MB'
    effective_cache_size: '1GB'
    maintenance_work_mem: '64MB'
    log_min_duration_statement: 1000
    logging_collector: 'on'
    log_directory: 'log'
    log_filename: 'postgresql-%Y-%m-%d_%H%M%S.log'
    log_file_mode: '0600'
    log_rotation_age: '1d'

watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
For nodes 2 and 3, change the name and the listen/connect addresses to match each host. By convention, the Patroni member name matches the etcd member name on the same host:
# Node 2 configuration changes
name: node2
restapi:
  listen: 192.168.1.11:8008
  connect_address: 192.168.1.11:8008
postgresql:
  listen: 192.168.1.11:5432
  connect_address: 192.168.1.11:5432
Set proper ownership and permissions on the configuration file:
sudo chown postgres:postgres /etc/patroni/patroni.yml
sudo chmod 600 /etc/patroni/patroni.yml
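Patroni 2.0 and later ship a built-in schema validator, and checking the file before starting the service catches YAML and schema mistakes early. The wrapper function below is a hypothetical convenience around that validator:

```shell
# Hypothetical helper: sanity-check a Patroni configuration file.
validate_patroni_config() {
  local cfg=${1:-/etc/patroni/patroni.yml}
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
  # --validate-config is available in Patroni >= 2.0
  patroni --validate-config "$cfg"
}
```

Run `validate_patroni_config` after every configuration change and before restarting the service.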
Create a systemd service file for Patroni:
sudo nano /etc/systemd/system/patroni.service
Use this service configuration:
[Unit]
Description=Patroni - PostgreSQL HA solution
After=network.target etcd.service
Wants=etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
WorkingDirectory=/etc/patroni
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start Patroni:
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
# Check status
sudo systemctl status patroni
sudo journalctl -u patroni -f
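patronictl, installed alongside Patroni, is the primary CLI for cluster inspection. Always passing the configuration file gets tedious, so a small wrapper helps; the function name is a hypothetical convenience:

```shell
# Hypothetical helper: run patronictl against this cluster's config.
pctl() {
  patronictl -c /etc/patroni/patroni.yml "$@"
}
# Typical calls:
#   pctl list      # member table: role, state, timeline, lag
#   pctl history   # failover/switchover timeline history
```

`pctl list` is the quickest way to confirm all three members have registered and one of them holds the leader role.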
Streaming Replication
Patroni automatically configures streaming replication between the primary and the replicas. Verify that replication is working correctly:
# Connect to the primary node
psql -h 192.168.1.10 -U postgres -d postgres
# Check replication slots
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
# Check replication connections
SELECT usename, application_name, client_addr, state, sync_state FROM pg_stat_replication;
# Monitor replication lag (run on a replica; returns NULL on the primary)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Monitor replication lag across the cluster:
# Create a monitoring script
cat > /usr/local/bin/check-patroni-replication.sh << 'EOF'
#!/bin/bash
NODES=("192.168.1.10" "192.168.1.11" "192.168.1.12")
for node in "${NODES[@]}"; do
  echo "=== Node $node ==="
  psql -h "$node" -U postgres -d postgres -t -c \
    "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
done
EOF
chmod +x /usr/local/bin/check-patroni-replication.sh
/usr/local/bin/check-patroni-replication.sh
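Replication lag in bytes is simply the difference between two WAL positions (LSNs). PostgreSQL computes this server-side with pg_wal_lsn_diff(); the sketch below does the same arithmetic locally, which is handy when you only have LSN strings from the Patroni API or from logs. An LSN such as 0/3000060 encodes a 64-bit position as two hexadecimal halves:

```shell
# Convert an LSN like '0/3000060' to an absolute byte position.
lsn_to_bytes() {
  local hi=${1%/*} lo=${1#*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))   # hi * 2^32 + lo
}

# Byte lag between a primary LSN and a replica LSN.
lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For example, `lag_bytes 0/3000060 0/3000000` prints 96, matching what pg_wal_lsn_diff('0/3000060', '0/3000000') would return.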
Verify replication status through the Patroni REST API:
# Query the Patroni API on the primary for the full cluster view
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster
# Returns HTTP 200 on the current primary, 503 otherwise
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/primary
# Returns HTTP 200 on a healthy replica, 503 otherwise
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/replica
Failover and Recovery
Patroni automatically detects primary failure and promotes the most advanced replica. Monitor the automatic failover process:
# Watch etcd for leader changes
watch -n 1 "etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/pg-cluster/leader"
# Watch Patroni status
watch -n 2 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'"
Simulate a primary failure to test failover:
# Kill the PostgreSQL postmaster on the primary (be careful in production!)
sudo pkill -9 -u postgres postgres
# Or kill the Patroni process so its leader lease expires
sudo kill -9 $(pgrep -f "patroni /etc/patroni")
# Monitor failover progress
sudo journalctl -u patroni -f
Manually recover a failed node:
# Check node status after the failure
sudo systemctl status patroni
# Clean recovery: wipe the data directory and let Patroni re-bootstrap the node
sudo rm -rf /var/lib/postgresql/15/main/*
sudo systemctl start patroni
# Patroni will automatically perform a basebackup and rejoin the cluster
sudo journalctl -u patroni -f | grep -i basebackup
Configure callback scripts in patroni.yml to run custom logic when a node starts, stops, or changes role. Patroni supports on_start, on_stop, on_restart, on_reload, and on_role_change callbacks:
postgresql:
  callbacks:
    on_start: /usr/local/bin/patroni-callback.sh
    on_stop: /usr/local/bin/patroni-callback.sh
    on_role_change: /usr/local/bin/patroni-callback.sh
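Patroni invokes each callback with three arguments: the action (for example on_role_change), the node's new role, and the cluster scope. A minimal logging hook is sketched below; the script path and log location are assumptions, and older Patroni versions report the primary role as "master" while newer ones report "primary", so both are matched:

```shell
#!/bin/bash
# Hypothetical callback script, e.g. /usr/local/bin/patroni-callback.sh
# Patroni passes: $1=action  $2=role  $3=scope
log_callback() {
  local action=$1 role=$2 scope=$3
  local log=${CALLBACK_LOG:-/var/log/patroni-callbacks.log}
  echo "$(date -Is) action=$action role=$role scope=$scope" >> "$log"
  case "$action:$role" in
    on_role_change:master|on_role_change:primary)
      # Promotion hook: e.g. update DNS, move a VIP, or page on-call
      :
      ;;
  esac
}
if [ $# -gt 0 ]; then log_callback "$@"; fi
```

Callbacks run asynchronously and must be fast; long-running work should be handed off to a background job.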
Switchover Operations
Switchover is a planned, non-disruptive role change from one node to another. Perform a switchover for maintenance or load balancing:
# Request switchover via Patroni REST API
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node1","candidate":"node2"}' \
http://192.168.1.10:8008/switchover
# Verify switchover completion
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
Monitor switchover progress:
# Watch cluster status during switchover
watch -n 1 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role, state}'"
# Check the current WAL write position (total bytes since cluster initialization)
psql -h 192.168.1.10 -U postgres -d postgres -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');"
Handle switchover failures:
# Check Patroni logs for error details
sudo journalctl -u patroni -n 50 -p err
# Retry switchover
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node2","candidate":"node1"}' \
http://192.168.1.10:8008/switchover
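The same operation is available through patronictl, which prompts interactively unless --force is given; these flags exist in Patroni 2.x. The wrapper function is a hypothetical convenience:

```shell
# Hypothetical helper: planned switchover; --force skips the confirmation prompt.
do_switchover() {
  local leader=$1 candidate=$2
  patronictl -c /etc/patroni/patroni.yml switchover pg-cluster \
    --leader "$leader" --candidate "$candidate" --force
}
```

For example, `do_switchover node1 node2` moves the primary role from node1 to node2; add --scheduled to patronictl for a timed switchover instead.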
HAProxy Load Balancing
HAProxy distributes connections to the PostgreSQL cluster, routing write requests to the primary and read requests to the replicas. Install HAProxy on a dedicated server or co-locate it with an application server:
# Install HAProxy
sudo apt-get install -y haproxy
# Back up the original configuration
sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
Configure HAProxy for the PostgreSQL cluster:
sudo nano /etc/haproxy/haproxy.cfg
Add this configuration, replacing default content:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# Stats interface
listen stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /stats
    stats admin if TRUE

# PostgreSQL primary (writes)
listen pg-primary
    bind 0.0.0.0:5432
    mode tcp
    option httpchk GET /primary HTTP/1.0
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008 backup
    server node3 192.168.1.12:5432 check port 8008 backup

# PostgreSQL replicas (read-only)
listen pg-replica
    bind 0.0.0.0:5433
    mode tcp
    option httpchk GET /replica HTTP/1.0
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008 fall 10 rise 1
    server node2 192.168.1.11:5432 check port 8008 fall 10 rise 1
    server node3 192.168.1.12:5432 check port 8008 fall 10 rise 1

# All nodes
listen pg-all
    bind 0.0.0.0:5434
    mode tcp
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008
    server node3 192.168.1.12:5432 check port 8008
Enable and start HAProxy:
sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo systemctl status haproxy
# Verify HAProxy is listening
netstat -tlnp | grep haproxy
Test HAProxy connectivity:
# Connect via HAProxy to the primary
psql -h haproxy.server -p 5432 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
# Returns 'f' (not in recovery) on the primary
# Returns 't' (in recovery) on replicas
# Test a replica read-only connection
psql -h haproxy.server -p 5433 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
Monitor HAProxy:
# View HAProxy statistics
curl http://127.0.0.1:8404/stats | head -50
# Watch real-time HAProxy activity
watch "curl -s http://127.0.0.1:8404/stats | grep -E 'pg-|#'"
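The admin socket configured in the global section also serves machine-readable CSV: in the output of the "show stat" command, field 18 is the server status (UP/DOWN). The parsing helper below is a hypothetical sketch and assumes socat is installed:

```shell
# Hypothetical helper: print proxy, server, and status columns from "show stat".
haproxy_status() {
  local sock=${1:-/run/haproxy/admin.sock}
  echo "show stat" | socat stdio "$sock" \
    | awk -F, 'NR > 1 && $1 != "" { print $1, $2, $18 }'
}
```

This gives a one-line-per-backend view (e.g. "pg-primary node1 UP") that is easier to script against than the HTML stats page.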
Monitoring and Alerting
Implement comprehensive monitoring of the Patroni cluster using Prometheus and Grafana. Export Patroni metrics for monitoring:
# Create a Prometheus exporter for Patroni
cat > /usr/local/bin/patroni-exporter.py << 'EOF'
#!/usr/bin/env python3
import time

import requests
from prometheus_client import start_http_server, Gauge

# Gauge metrics
cluster_size = Gauge('patroni_cluster_size', 'Number of nodes in the cluster')
primary_node = Gauge('patroni_primary_node', 'Is this node the primary', ['node'])
replication_lag = Gauge('patroni_replication_lag', 'Replication lag as reported by the Patroni /cluster endpoint', ['node'])

def get_cluster_status():
    try:
        response = requests.get('http://localhost:8008/cluster', timeout=5)
        return response.json()
    except Exception as e:
        print(f"Error fetching cluster status: {e}")
        return None

def update_metrics():
    while True:
        cluster_info = get_cluster_status()
        if cluster_info:
            members = cluster_info.get('members', [])
            cluster_size.set(len(members))
            for member in members:
                is_primary = 1 if member['role'] == 'leader' else 0
                primary_node.labels(node=member['name']).set(is_primary)
                lag = member.get('lag')
                if isinstance(lag, (int, float)):
                    replication_lag.labels(node=member['name']).set(lag)
        time.sleep(10)

if __name__ == '__main__':
    start_http_server(8009)
    update_metrics()
EOF
chmod +x /usr/local/bin/patroni-exporter.py
python3 /usr/local/bin/patroni-exporter.py &
Configure Prometheus to scrape the Patroni metrics:
sudo nano /etc/prometheus/prometheus.yml
Add this scrape configuration:
scrape_configs:
  - job_name: 'patroni'
    scrape_interval: 15s
    static_configs:
      - targets: ['192.168.1.10:8009', '192.168.1.11:8009', '192.168.1.12:8009']
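A matching Prometheus alert rule can page when the exporter's patroni_cluster_size metric drops. The rule below is a sketch: the metric name assumes the exporter from the previous section, and the rules path and helper function are assumptions:

```shell
# Hypothetical helper: write a Patroni alert rule file for Prometheus.
write_patroni_rules() {
  local rules=${1:-/etc/prometheus/rules/patroni.yml}
  mkdir -p "$(dirname "$rules")"
  cat > "$rules" << 'EOF'
groups:
  - name: patroni
    rules:
      - alert: PatroniClusterDegraded
        expr: patroni_cluster_size < 3
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Patroni cluster has fewer than 3 members"
EOF
}
```

Reference the rules file from prometheus.yml via rule_files and reload Prometheus for it to take effect.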
Create Grafana dashboards for cluster visualization. Useful data sources:
# Query the Patroni REST API for monitoring data
curl -s http://192.168.1.10:8008/cluster | jq '.'
# Get this node's detailed status
curl -s http://192.168.1.10:8008/patroni | jq '.'
# Check synchronous replication status
psql -h 192.168.1.10 -U postgres -c \
"SELECT client_addr, application_name, sync_state FROM pg_stat_replication;"
Set up alerting for critical events:
# Create an alert script
cat > /usr/local/bin/patroni-alert.sh << 'EOF'
#!/bin/bash
check_cluster_health() {
  CLUSTER_INFO=$(curl -s http://localhost:8008/cluster)
  SIZE=$(echo "$CLUSTER_INFO" | jq '.members | length')
  if [ "$SIZE" -lt 2 ]; then
    echo "ALERT: Cluster size is $SIZE (expected 3)"
  fi
  # /primary returns HTTP 200 only on the current leader
  if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8008/primary)" = "200" ]; then
    echo "INFO: This node is the primary"
  else
    echo "INFO: This node is a replica"
  fi
}
check_cluster_health
EOF
chmod +x /usr/local/bin/patroni-alert.sh
Maintenance and Best Practices
Perform regular cluster maintenance while preserving high availability. Rolling restart procedure for updates:
# 1. Restart the replicas first
for replica in node2 node3; do
  echo "Restarting $replica..."
  ssh "$replica" "sudo systemctl restart patroni"
  sleep 30
  # Verify the node rejoined the cluster
  curl -s http://192.168.1.10:8008/cluster | jq ".members[] | select(.name==\"$replica\") | .state"
done
# 2. Finally, restart the primary (this triggers an automatic failover)
ssh node1 "sudo systemctl restart patroni"
sleep 60
# Verify the cluster recovered with a new primary
curl -s http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
Backup procedures for the cluster:
# Dump a single database in tar format (use pg_dumpall for globals and all databases)
pg_dump -h 192.168.1.10 -U postgres -d postgres -F tar -f backup-$(date +%Y%m%d).tar
# Verify the backup
tar -tf backup-$(date +%Y%m%d).tar | head
# Dump an individual database to plain SQL
pg_dump -h 192.168.1.10 -U postgres -d mydb > mydb-backup.sql
# Full physical cluster backup using pg_basebackup
pg_basebackup -h 192.168.1.10 -U replication -F tar -z -D ./backup-$(date +%Y%m%d)
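Date-stamped backups accumulate quickly, so a small retention helper that keeps only the newest N archives is useful. This is a hypothetical convenience, not part of any PostgreSQL tooling, and assumes the backup-* naming scheme used above:

```shell
# Hypothetical helper: keep the newest N backup archives, delete the rest.
prune_backups() {
  local dir=$1 keep=${2:-7}
  # ls -1t sorts newest first; everything past line $keep is removed
  ls -1t "$dir"/backup-* 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```

For example, `prune_backups /var/backups/pg 7` keeps the seven most recent archives. Run it from cron after each backup job completes.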
Handle split-brain scenarios properly:
# Check whether the nodes agree on cluster state
patronictl -c /etc/patroni/patroni.yml list
# If split-brain occurs, rejoin the isolated node
# On the isolated node:
sudo systemctl restart patroni
# Compare database system identifiers to confirm both nodes belong to the same cluster
sudo -u postgres /usr/lib/postgresql/15/bin/pg_controldata /var/lib/postgresql/15/main | grep "Database system identifier"
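When a replica has diverged (for example after a resolved split-brain), `patronictl reinit` wipes its data directory and re-bootstraps it from the current primary. The wrapper function is a hypothetical convenience:

```shell
# Hypothetical helper: re-bootstrap a diverged replica from the primary.
reinit_member() {
  local member=$1
  # --force skips the interactive confirmation
  patronictl -c /etc/patroni/patroni.yml reinit pg-cluster "$member" --force
}
```

For example, `reinit_member node2` destroys node2's local data and streams a fresh copy, so only use it on a node you are certain should no longer be primary.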
Document and test disaster recovery:
# Test recovery from a backup
mkdir -p /tmp/recovery
cd /tmp/recovery
# Extract the backup (pg_basebackup -z produces gzip-compressed tars)
tar -xzf backup-20240101.tar.gz -C /tmp/recovery/
# Restore to a new server (detailed recovery procedures depend on your backup type)
Conclusion
PostgreSQL with Patroni delivers enterprise-grade high availability with automatic failover, seamless switchover, and minimal operational overhead. By properly configuring etcd for distributed consensus, setting up streaming replication, and deploying HAProxy for intelligent connection routing, you create a robust, self-healing database infrastructure. Regular monitoring, tested failover procedures, and documented maintenance processes ensure that your cluster continues to provide reliable service. Remember that while Patroni automates many HA operations, understanding the underlying PostgreSQL replication mechanics and etcd consensus principles remains essential for effective cluster management and troubleshooting.


