PostgreSQL High Availability with Patroni
Patroni is a modern, robust solution for PostgreSQL high availability that manages PostgreSQL initialization, configuration, and replication without requiring custom failover scripts or manual intervention. It uses a distributed configuration store (typically etcd, Consul, or ZooKeeper) to manage cluster state and automatically handles failover, switchover, and cluster topology management. This guide provides a comprehensive walkthrough of deploying PostgreSQL HA with Patroni on Linux systems, covering etcd setup, Patroni installation, configuration, failover mechanisms, and practical operational considerations.
Table of Contents
- Architecture Overview
- etcd Cluster Setup
- Patroni Installation
- Patroni Configuration
- Streaming Replication
- Failover and Recovery
- Switchover Operations
- HAProxy Load Balancing
- Monitoring and Alerting
- Maintenance and Best Practices
- Conclusion
Architecture Overview
A Patroni HA cluster consists of three main components: PostgreSQL servers running on cluster nodes, a distributed key-value store (etcd) for configuration and state management, and a load balancer (HAProxy) for client connection routing. In a typical three-node setup, each node runs PostgreSQL and Patroni, with one node serving as the primary and others as replicas. The etcd cluster manages Patroni state, facilitating automatic failover when the primary becomes unavailable.
Patroni continuously monitors PostgreSQL health and coordinates failover decisions through the distributed state store. When the primary fails, Patroni automatically promotes the most advanced replica to primary and updates the cluster topology; HAProxy detects the role change through its health checks against Patroni's REST API and redirects client connections. This process typically completes in 10-30 seconds, minimizing downtime and data loss.
etcd Cluster Setup
etcd is a distributed, strongly consistent key-value store essential for Patroni's distributed consensus. Deploy a three-node etcd cluster with one etcd instance per PostgreSQL cluster node for proper quorum and fault tolerance.
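The three-node recommendation comes straight from Raft quorum arithmetic: an n-member cluster needs floor(n/2)+1 votes to elect a leader, so it tolerates n minus that majority in member failures. A quick illustration of why odd cluster sizes are preferred:

```python
# Raft quorum arithmetic for etcd cluster sizing: an n-member cluster
# needs a majority of floor(n/2) + 1 votes and therefore survives
# n - majority member failures.

def majority(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return n - majority(n)

for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
# A 3-node cluster tolerates 1 failure; a 4th node raises the quorum to 3
# without improving fault tolerance, which is why odd sizes are used.
```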
Install etcd on all three nodes. First, download and install the latest stable version:
# Download etcd
ETCD_VER=v3.5.10
curl -L https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
# Extract and move to /usr/local/bin
tar xzf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo cp etcd-${ETCD_VER}-linux-amd64/{etcd,etcdctl} /usr/local/bin/
sudo chmod +x /usr/local/bin/etcd /usr/local/bin/etcdctl
# Verify installation
etcd --version
etcdctl version
Create an etcd systemd service file. Adjust node names and IPs according to your infrastructure:
sudo nano /etc/systemd/system/etcd.service
For node1 (192.168.1.10), use this configuration:
[Unit]
Description=etcd - Distributed reliable key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
PermissionsStartOnly=true
ProtectSystem=full
ProtectHome=yes
NoNewPrivileges=true
ExecStart=/usr/local/bin/etcd \
--name node1 \
--listen-client-urls http://192.168.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.10:2379 \
--listen-peer-urls http://192.168.1.10:2380 \
--initial-advertise-peer-urls http://192.168.1.10:2380 \
--initial-cluster node1=http://192.168.1.10:2380,node2=http://192.168.1.11:2380,node3=http://192.168.1.12:2380 \
--initial-cluster-state new \
--initial-cluster-token etcd-cluster \
--data-dir /var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Create the etcd data directory and set permissions:
sudo mkdir -p /var/lib/etcd
sudo useradd -r -s /bin/false etcd 2>/dev/null || true
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd
On node2 and node3, modify the service with appropriate node identifiers:
# Node 2 configuration
--name node2 \
--listen-client-urls http://192.168.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.11:2379 \
--listen-peer-urls http://192.168.1.11:2380 \
--initial-advertise-peer-urls http://192.168.1.11:2380 \
# Node 3 configuration
--name node3 \
--listen-client-urls http://192.168.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.1.12:2379 \
--listen-peer-urls http://192.168.1.12:2380 \
--initial-advertise-peer-urls http://192.168.1.12:2380 \
Start etcd on all nodes:
sudo systemctl daemon-reload
sudo systemctl start etcd
sudo systemctl enable etcd
sudo systemctl status etcd
Verify the etcd cluster is healthy:
# Check cluster health
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
# List cluster members
etcdctl --endpoints=http://127.0.0.1:2379 member list
# Test writing a key
etcdctl --endpoints=http://127.0.0.1:2379 put /patroni/test "value"
etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/test
Patroni Installation
Install Patroni on all three PostgreSQL nodes. First, install PostgreSQL server and development tools:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y postgresql-15 postgresql-contrib-15 python3-pip
# RHEL/CentOS
sudo dnf install -y postgresql-server postgresql-contrib postgresql-devel python3-pip
# Stop and disable the distribution's PostgreSQL service; Patroni will manage PostgreSQL itself
sudo systemctl stop postgresql
sudo systemctl disable postgresql
Install Patroni and dependencies using pip:
# Install Patroni with etcd v3 support (etcd 3.5 serves only the v3 API by default)
sudo pip3 install 'patroni[etcd3]'
# Install additional optional packages
sudo pip3 install psycopg2-binary requests pyyaml
# Verify installation
patroni --version
Patroni should run as the postgres operating system user: it starts the postgres process directly, and PostgreSQL requires its data directory to be owned by the user that runs it. Verify the user exists (it is created by the PostgreSQL packages):
id postgres
Create Patroni's configuration directory:
sudo mkdir -p /etc/patroni
sudo chown postgres:postgres /etc/patroni
sudo chmod 750 /etc/patroni
Patroni Configuration
Create the main Patroni configuration file. This YAML file contains all cluster topology, PostgreSQL, and Patroni-specific settings:
sudo nano /etc/patroni/patroni.yml
For node1 (primary), use this comprehensive configuration:
scope: pg-cluster
namespace: /patroni
name: node1

restapi:
  listen: 192.168.1.10:8008
  connect_address: 192.168.1.10:8008
  authentication:
    username: patroni_api_user
    password: patroni_api_password

etcd3:
  hosts:
    - 192.168.1.10:2379
    - 192.168.1.11:2379
    - 192.168.1.12:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: logical
        max_wal_senders: 5
        max_replication_slots: 5
        wal_keep_size: 1GB
        hot_standby: "on"
  method: initdb
  initdb:
    - encoding: UTF8
    - data-checksums
    - locale: en_US.UTF-8
  pg_hba:
    - local all all trust
    - host all all 127.0.0.1/32 md5
    - host all all ::1/128 md5
    - host replication replication 192.168.0.0/16 md5
  users:
    admin:
      password: admin_password
      options:
        - createrole
        - createdb

postgresql:
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  pgpass: /var/lib/postgresql/.pgpass
  listen: 192.168.1.10:5432
  connect_address: 192.168.1.10:5432
  authentication:
    superuser:
      username: postgres
      password: postgres_password
    replication:
      username: replication
      password: replication_password
  basebackup:
    max-rate: 100M
    checkpoint: fast
  parameters:
    shared_preload_libraries: 'pg_stat_statements'
    shared_buffers: '256MB'
    effective_cache_size: '1GB'
    maintenance_work_mem: '64MB'
    log_min_duration_statement: 1000
    logging_collector: "on"
    log_directory: 'log'
    log_filename: 'postgresql-%Y-%m-%d_%H%M%S.log'
    log_file_mode: '0600'
    log_rotation_age: '1d'

watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
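A YAML indentation slip in this file usually surfaces only when Patroni fails to start. A small structural check of the parsed configuration can catch missing sections early; the trimmed dict below stands in for the result of yaml.safe_load on the real file:

```python
# Minimal structural check for a Patroni configuration. EXAMPLE_CFG is a
# trimmed stand-in for yaml.safe_load("/etc/patroni/patroni.yml").
EXAMPLE_CFG = {
    "scope": "pg-cluster",
    "name": "node1",
    "restapi": {"listen": "192.168.1.10:8008", "connect_address": "192.168.1.10:8008"},
    "etcd3": {"hosts": ["192.168.1.10:2379", "192.168.1.11:2379", "192.168.1.12:2379"]},
    "postgresql": {
        "data_dir": "/var/lib/postgresql/15/main",
        "listen": "192.168.1.10:5432",
        "connect_address": "192.168.1.10:5432",
    },
}

REQUIRED = ("scope", "name", "restapi", "postgresql")
DCS_SECTIONS = ("etcd", "etcd3", "consul", "zookeeper", "kubernetes")

def check_config(cfg: dict) -> list:
    """Return a list of problems found in a parsed Patroni config."""
    problems = [f"missing section: {key}" for key in REQUIRED if key not in cfg]
    if not any(k in cfg for k in DCS_SECTIONS):
        problems.append("no DCS section configured")
    return problems

print(check_config(EXAMPLE_CFG))  # → []
```

An empty list means the top-level shape is plausible; Patroni itself remains the final validator of the section contents.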
For nodes 2 and 3, change name, listen, and connect_address to reflect each host. The Patroni member name does not have to match the etcd member name, but keeping them consistent simplifies operations:
# Node 2 configuration changes
name: node2

restapi:
  listen: 192.168.1.11:8008
  connect_address: 192.168.1.11:8008

postgresql:
  listen: 192.168.1.11:5432
  connect_address: 192.168.1.11:5432
Set proper permissions on the configuration file:
sudo chown postgres:postgres /etc/patroni/patroni.yml
sudo chmod 600 /etc/patroni/patroni.yml
Create a systemd service file for Patroni:
sudo nano /etc/systemd/system/patroni.service
Use this service configuration:
[Unit]
Description=Patroni - PostgreSQL HA solution
After=network.target etcd.service
Wants=etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
WorkingDirectory=/etc/patroni
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start Patroni:
sudo systemctl daemon-reload
sudo systemctl enable patroni
sudo systemctl start patroni
# Check status
sudo systemctl status patroni
sudo journalctl -u patroni -f
Streaming Replication
Patroni automatically configures streaming replication between the primary and replicas. Verify replication is working correctly:
# Connect to primary node
psql -h 192.168.1.10 -U postgres -d postgres
# Check replication slots
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;
# Check replication connections
SELECT usename, application_name, client_addr, state, sync_state FROM pg_stat_replication;
# Monitor replication lag (run on a replica; returns NULL on the primary)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
Monitor replication lag across the cluster:
# Create a monitoring script
cat > /usr/local/bin/check-patroni-replication.sh << 'EOF'
#!/bin/bash
NODES=("192.168.1.10" "192.168.1.11" "192.168.1.12")
for node in "${NODES[@]}"; do
echo "=== Node $node ==="
psql -h $node -U postgres -d postgres -t -c \
"SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
done
EOF
chmod +x /usr/local/bin/check-patroni-replication.sh
/usr/local/bin/check-patroni-replication.sh
Verify replication status through Patroni REST API:
# Query the Patroni API for the full cluster topology
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster
# /primary returns HTTP 200 only on the current leader
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/primary
# /replica returns HTTP 200 only on a healthy replica
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/replica
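The /cluster response is plain JSON and is easy to post-process. The sample payload below mirrors the shape Patroni returns (names, hosts, and lag values are illustrative); in practice you would fetch it with curl or requests:

```python
# Summarize a Patroni /cluster response: one line per member with role,
# state, and replication lag. SAMPLE mirrors the JSON shape Patroni
# returns; in practice fetch it with requests.get(...).json().
import json

SAMPLE = json.loads("""
{"members": [
  {"name": "node1", "role": "leader",  "state": "running", "host": "192.168.1.10"},
  {"name": "node2", "role": "replica", "state": "running", "host": "192.168.1.11", "lag": 0},
  {"name": "node3", "role": "replica", "state": "running", "host": "192.168.1.12", "lag": 16384}
]}
""")

def summarize(cluster: dict) -> list:
    lines = []
    for m in cluster["members"]:
        lag = f", lag={m['lag']}B" if "lag" in m else ""
        lines.append(f"{m['name']}: {m['role']} ({m['state']}{lag})")
    return lines

print("\n".join(summarize(SAMPLE)))
```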
Failover and Recovery
Patroni automatically detects primary failure and promotes the most advanced replica. Monitor the automatic failover process:
# Monitor etcd for leader changes
watch -n 1 "etcdctl --endpoints=http://127.0.0.1:2379 get /patroni/pg-cluster/leader"
# Watch Patroni status
watch -n 2 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'"
Simulate primary failure to test failover:
# Stop Patroni on the primary (be careful in production!). Patroni manages
# the postgres process, so stop the patroni service, not a postgresql unit.
sudo systemctl stop patroni
# Or kill the Patroni process
sudo kill -9 $(pgrep -f "patroni /etc/patroni")
# Monitor failover progress
sudo journalctl -u patroni -f
Manually recover a failed node:
# Check node status after failure
sudo systemctl status patroni
# Preferred: reinitialize the failed node from the current primary
patronictl -c /etc/patroni/patroni.yml reinit pg-cluster node1
# Alternative: remove the stale data directory and restart Patroni;
# it will run pg_basebackup and rejoin the cluster as a replica
sudo rm -rf /var/lib/postgresql/15/main/*
sudo systemctl start patroni
sudo journalctl -u patroni -f | grep -i basebackup
Configure callback scripts in patroni.yml to react to cluster events. (The recovery_conf section is obsolete with PostgreSQL 12+ and Patroni 2.x; Patroni writes recovery parameters itself.)
postgresql:
  callbacks:
    on_start: /usr/local/bin/on-start-hook.sh
    on_stop: /usr/local/bin/on-stop-hook.sh
    on_role_change: /usr/local/bin/on-role-change-hook.sh
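Patroni invokes each callback script with three arguments: the action, the node's current role, and the cluster (scope) name. A sketch of what such a hook might do (the promotion actions are placeholders):

```python
# Sketch of a Patroni callback handler. Patroni calls the configured
# script as: <script> <action> <role> <scope>, e.g.
#   on-role-change-hook.sh on_role_change primary pg-cluster
import sys

def handle_event(action: str, role: str, scope: str) -> str:
    if action == "on_role_change" and role in ("primary", "master"):
        # Placeholder: move a virtual IP, warm caches, notify the on-call
        return f"{scope}: this node was promoted to primary"
    if action == "on_role_change":
        return f"{scope}: this node was demoted to {role}"
    return f"{scope}: {action} as {role}"

if __name__ == "__main__" and len(sys.argv) >= 4:
    print(handle_event(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Both "primary" and the older "master" spelling are checked, since the role string Patroni reports has changed across versions.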
Switchover Operations
Switchover is a planned, non-disruptive role change from primary to replica. Perform switchover for maintenance or load balancing:
# Request switchover via Patroni REST API
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node1","candidate":"node2"}' \
http://192.168.1.10:8008/switchover
# Verify switchover completion
curl -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
Monitor switchover progress:
# Watch cluster status during switchover
watch -n 1 "curl -s -u patroni_api_user:patroni_api_password http://192.168.1.10:8008/cluster | jq '.members[] | {name, role, state}'"
# Check the current WAL write position on the primary
psql -h 192.168.1.10 -U postgres -d postgres -c "SELECT pg_current_wal_lsn();"
Handle switchover failures:
# Check Patroni logs for error details
sudo journalctl -u patroni -n 50 -p err
# Retry switchover (send the request to the current leader's REST API,
# which is node2 after the first switchover)
curl -X POST \
-u patroni_api_user:patroni_api_password \
-H "Content-Type: application/json" \
-d '{"leader":"node2","candidate":"node1"}' \
http://192.168.1.11:8008/switchover
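The curl calls above are easy to script. A sketch that builds the same request (the endpoint and credentials are the ones configured earlier; the actual POST is shown as a comment so the example stays self-contained):

```python
# Build the switchover request Patroni's REST API expects: a POST to
# /switchover on the current leader with the desired leader/candidate pair.
import json

def build_switchover(leader: str, candidate: str, api_host: str):
    """Return (url, body) for a Patroni switchover request."""
    url = f"http://{api_host}:8008/switchover"
    payload = json.dumps({"leader": leader, "candidate": candidate}).encode()
    return url, payload

url, payload = build_switchover("node1", "node2", "192.168.1.10")
print(url)
print(payload.decode())
# In practice, POST it with basic auth, e.g.:
#   requests.post(url, data=payload, auth=("patroni_api_user", "..."),
#                 headers={"Content-Type": "application/json"})
```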
HAProxy Load Balancing
HAProxy distributes connections to the PostgreSQL cluster, routing write requests to the primary and read requests to replicas. Install HAProxy on a dedicated server or co-locate with an application server:
# Install HAProxy
sudo apt-get install -y haproxy
# Backup original configuration
sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
Configure HAProxy for PostgreSQL cluster:
sudo nano /etc/haproxy/haproxy.cfg
Add this configuration, replacing default content:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# Stats interface
listen stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /stats
    stats admin if TRUE

# PostgreSQL Primary (write). Only the node whose Patroni API answers
# GET /primary with HTTP 200 receives traffic, so no backup flags are needed.
listen pg-primary
    bind 0.0.0.0:5432
    mode tcp
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008
    server node3 192.168.1.12:5432 check port 8008

# PostgreSQL Replicas (read-only)
listen pg-replica
    bind 0.0.0.0:5433
    mode tcp
    option httpchk GET /replica
    http-check expect status 200
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008 fall 10 rise 1
    server node2 192.168.1.11:5432 check port 8008 fall 10 rise 1
    server node3 192.168.1.12:5432 check port 8008 fall 10 rise 1

# All nodes
listen pg-all
    bind 0.0.0.0:5434
    mode tcp
    balance roundrobin
    default-server fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.1.10:5432 check port 8008
    server node2 192.168.1.11:5432 check port 8008
    server node3 192.168.1.12:5432 check port 8008
Enable and start HAProxy:
sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo systemctl status haproxy
# Verify HAProxy is listening
ss -tlnp | grep haproxy
Test HAProxy connectivity:
# Connect via HAProxy to the primary (port 5432 always routes to the current leader)
psql -h haproxy.server -p 5432 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
# Should return 'f' (not in recovery), since port 5432 targets the primary
# Test replica read-only connections (port 5433 round-robins across replicas)
psql -h haproxy.server -p 5433 -U postgres -d postgres -c "SELECT pg_is_in_recovery();"
# Should return 't' (in recovery) on replicas
Monitor HAProxy:
# View HAProxy statistics
curl http://127.0.0.1:8404/stats | head -50
# Watch real-time HAProxy activity
watch "curl -s http://127.0.0.1:8404/stats | grep -E 'pg-|#'"
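HAProxy also exposes its statistics as CSV at the stats URI with a ;csv suffix, which is friendlier for scripts than the HTML page. A parsing sketch (the sample is heavily abbreviated; real output has dozens of columns, but pxname, svname, and status are among them):

```python
# Parse HAProxy's CSV stats (the /stats;csv endpoint) and report each
# backend server's health. SAMPLE is an abbreviated illustration; in
# practice fetch http://127.0.0.1:8404/stats;csv instead.
import csv
import io

SAMPLE = """# pxname,svname,status,check_status
pg-primary,node1,UP,L7OK
pg-primary,node2,DOWN,L7STS
pg-primary,node3,DOWN,L7STS
pg-replica,node2,UP,L7OK
pg-replica,node3,UP,L7OK
"""

def server_states(raw: str) -> dict:
    """Map (backend, server) -> status from HAProxy CSV stats."""
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))
    return {(row["pxname"], row["svname"]): row["status"] for row in reader}

states = server_states(SAMPLE)
print(states[("pg-primary", "node1")])  # → UP
```

With Patroni health checks in place, exactly one server should be UP in pg-primary while the replicas are reported DOWN there by design.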
Monitoring and Alerting
Implement comprehensive monitoring of the Patroni cluster using Prometheus and Grafana. Export Patroni metrics for monitoring:
# Create a Prometheus exporter for Patroni
cat > /usr/local/bin/patroni-exporter.py << 'EOF'
#!/usr/bin/env python3
import time

import requests
from prometheus_client import start_http_server, Gauge

# Gauge metrics
cluster_size = Gauge('patroni_cluster_size', 'Number of nodes in cluster')
primary_node = Gauge('patroni_primary_node', 'Is this node primary', ['node'])
replication_lag = Gauge('patroni_replication_lag_bytes', 'Replication lag in bytes', ['node'])

def get_cluster_status():
    try:
        response = requests.get('http://localhost:8008/cluster', timeout=5)
        return response.json()
    except Exception as e:
        print(f"Error fetching cluster status: {e}")
        return None

def update_metrics():
    while True:
        cluster_info = get_cluster_status()
        if cluster_info:
            members = cluster_info.get('members', [])
            cluster_size.set(len(members))
            for member in members:
                is_primary = 1 if member['role'] == 'leader' else 0
                primary_node.labels(node=member['name']).set(is_primary)
                # Patroni reports lag in bytes for replicas; skip non-numeric values
                lag = member.get('lag')
                if isinstance(lag, (int, float)):
                    replication_lag.labels(node=member['name']).set(lag)
        time.sleep(10)

if __name__ == '__main__':
    start_http_server(8009)
    update_metrics()
EOF
chmod +x /usr/local/bin/patroni-exporter.py
python3 /usr/local/bin/patroni-exporter.py &
Configure Prometheus to scrape Patroni metrics:
sudo nano /etc/prometheus/prometheus.yml
Add this scrape configuration:
scrape_configs:
  - job_name: 'patroni'
    scrape_interval: 15s
    static_configs:
      - targets: ['192.168.1.10:8009', '192.168.1.11:8009', '192.168.1.12:8009']
Create Grafana dashboards for cluster visualization:
# Query Patroni REST API for monitoring data
curl -s http://192.168.1.10:8008/cluster | jq '.'
# Get failover and switchover history
curl -s http://192.168.1.10:8008/history | jq '.'
# Check synchronous replication status
psql -h 192.168.1.10 -U postgres -c \
"SELECT client_addr, application_name, sync_state FROM pg_stat_replication;"
Set up alerting for critical events:
# Create alert script
cat > /usr/local/bin/patroni-alert.sh << 'EOF'
#!/bin/bash
check_cluster_health() {
    CLUSTER_INFO=$(curl -s http://localhost:8008/cluster)
    SIZE=$(echo "$CLUSTER_INFO" | jq '.members | length')
    if [ "$SIZE" -lt 3 ]; then
        echo "ALERT: Cluster size is $SIZE (expected 3)"
    fi
    # Check this node's role via the local Patroni endpoint
    ROLE=$(curl -s http://localhost:8008/patroni | jq -r '.role')
    if [ "$ROLE" = "primary" ] || [ "$ROLE" = "master" ]; then
        echo "INFO: This node is primary"
    else
        echo "INFO: This node is replica"
    fi
}
check_cluster_health
EOF
chmod +x /usr/local/bin/patroni-alert.sh
Maintenance and Best Practices
Perform regular cluster maintenance while maintaining high availability. Rolling restart procedure for updates:
# 1. Restart replicas first
for replica in node2 node3; do
echo "Restarting $replica..."
ssh $replica "sudo systemctl restart patroni"
sleep 30
# Verify node rejoined cluster
curl -s http://192.168.1.10:8008/cluster | jq ".members[] | select(.name==\"$replica\") | .state"
done
# 2. Move the primary role away with a switchover, then restart the old primary
patronictl -c /etc/patroni/patroni.yml switchover pg-cluster --leader node1 --candidate node2 --force
ssh node1 "sudo systemctl restart patroni"
sleep 60
# Verify the cluster recovered with the new primary
curl -s http://192.168.1.10:8008/cluster | jq '.members[] | {name, role}'
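The replicas-first ordering used above can be derived directly from the /cluster member list. A sketch (member dicts mirror the /cluster payload; the REST restart endpoint noted in the comment is an alternative to restarting the systemd unit when only PostgreSQL needs a restart):

```python
# Order cluster members for a rolling restart: replicas first, the
# current leader last (after a switchover). Member dicts mirror the
# /cluster payload.
def restart_order(members: list) -> list:
    replicas = [m["name"] for m in members if m["role"] != "leader"]
    leader = [m["name"] for m in members if m["role"] == "leader"]
    return replicas + leader

members = [
    {"name": "node1", "role": "leader"},
    {"name": "node2", "role": "replica"},
    {"name": "node3", "role": "replica"},
]
print(restart_order(members))  # → ['node2', 'node3', 'node1']
# To restart only PostgreSQL (not the Patroni daemon) on a member:
#   requests.post(f"http://{host}:8008/restart", auth=(user, password))
```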
Backup procedures for cluster:
# Back up all databases regularly (pg_dumpall also includes roles and tablespaces)
pg_dumpall -h 192.168.1.10 -U postgres -f backup-$(date +%Y%m%d).sql
# Verify backup
head -20 backup-$(date +%Y%m%d).sql
# Backup individual databases
pg_dump -h 192.168.1.10 -U postgres -d mydb > mydb-backup.sql
# Full physical backup using pg_basebackup (writes gzipped tar files into the target directory)
pg_basebackup -h 192.168.1.10 -U replication -F tar -z -D ./backup-$(date +%Y%m%d)
Handle split-brain scenarios properly:
# Check cluster state with patronictl
patronictl -c /etc/patroni/patroni.yml list
# If a node was isolated, restart Patroni so it rejoins the cluster
# (Patroni will rewind or re-sync the node from the current leader)
sudo systemctl restart patroni
# Compare the database system identifier if you suspect diverged timelines
# (pg_controldata is a command-line tool, not a SQL function)
/usr/lib/postgresql/15/bin/pg_controldata /var/lib/postgresql/15/main | grep "Database system identifier"
Document and test disaster recovery:
# Test recovery from backup
mkdir -p /tmp/recovery
cd /tmp/recovery
# Extract a pg_basebackup archive (base.tar.gz when taken with -F tar -z)
tar -xzf backup-20240101/base.tar.gz -C /tmp/recovery/
# Restore to new server (detailed recovery procedures specific to your backup type)
Conclusion
PostgreSQL with Patroni delivers enterprise-grade high availability with automatic failover, seamless switchover, and minimal operational overhead. By properly configuring etcd for distributed consensus, setting up streaming replication, and deploying HAProxy for intelligent connection routing, you create a robust, self-healing database infrastructure. Regular monitoring, testing of failover procedures, and documented maintenance processes ensure your cluster continues to provide reliable service. Remember that while Patroni automates many HA operations, understanding the underlying PostgreSQL replication mechanics and etcd consensus principles remains essential for effective cluster management and troubleshooting.