PostgreSQL Replication Configuration: Complete Streaming Replication Guide
Introduction
PostgreSQL replication is a critical feature for building highly available, scalable database systems that provide data redundancy, disaster recovery capabilities, and improved read performance. Unlike traditional backup methods, replication creates live copies of your database that stay synchronized with the primary server in real-time, ensuring minimal data loss and maximum availability for your applications.
PostgreSQL offers multiple replication methods, with streaming replication being the most popular and reliable approach. Streaming replication works by continuously shipping Write-Ahead Log (WAL) records from the primary server to one or more standby servers, where they are replayed to keep the databases in sync. This mechanism provides near-instantaneous failover capabilities and allows read-only queries on standby servers, effectively distributing database load across multiple servers.
Understanding PostgreSQL replication is essential for database administrators managing production environments where downtime is costly and data integrity is paramount. This comprehensive guide covers everything from basic streaming replication setup to advanced configurations including synchronous replication, cascading replication, and automated failover mechanisms. Whether you're building a simple master-standby setup or a complex multi-tier replication topology, this guide provides the knowledge and practical steps needed for successful implementation.
Prerequisites
Before configuring PostgreSQL replication, ensure you have the following:
Server Requirements
- Two or more Linux servers (Ubuntu 20.04+, CentOS 8+, or Debian 11+)
- One server designated as the primary (master)
- One or more servers designated as standby (replica) servers
- Network connectivity between all servers with low latency
- Synchronized system clocks using NTP
PostgreSQL Installation
- PostgreSQL 12+ installed on all servers (PostgreSQL 14+ recommended)
- Identical PostgreSQL versions across all servers
- PostgreSQL service running on the primary server
- Root or sudo access to all servers
System Requirements
- At least 2GB RAM per server (4GB+ recommended for production)
- Sufficient disk space for WAL archives and database storage
- Fast disk I/O (SSD recommended for WAL storage)
- Static IP addresses or reliable DNS names
Network Requirements
- Port 5432 (PostgreSQL default) open between servers
- SSH access (port 22) for initial setup and administration
- Sufficient bandwidth for WAL streaming (minimum 10Mbps)
Knowledge Requirements
- PostgreSQL administration fundamentals
- Linux command line proficiency
- Understanding of SQL and database concepts
- Basic networking knowledge
Understanding PostgreSQL Replication
Replication Architecture
PostgreSQL streaming replication uses a Write-Ahead Log (WAL) based approach:
- Primary Server: Writes all changes to WAL files
- WAL Sender Process: Streams WAL records to standby servers
- Standby Servers: Receive WAL records via WAL receiver processes
- WAL Replay: Standby servers continuously apply WAL records
- Hot Standby: Standby servers can accept read-only queries
Replication Types
Physical Replication: Byte-level replication of the entire database cluster. Most common and efficient, but requires identical hardware architectures.
Logical Replication: Table-level replication that replicates data changes at a logical level. More flexible but with higher overhead.
Streaming Replication: Continuous WAL streaming from primary to standby (focus of this guide).
File-Based WAL Shipping: Periodic transfer of completed WAL files. Simple but has higher lag than streaming.
Synchronization Modes
Asynchronous Replication: Primary commits transactions without waiting for standby confirmation. Fastest but may lose recent transactions on failure.
Synchronous Replication: Primary waits for at least one standby to confirm WAL receipt before committing. Slower but guarantees data safety.
Quorum-Based Synchronous: Requires confirmation from a specified number of standbys before commit.
Step-by-Step Configuration
Step 1: Configure the Primary Server
Connect to your primary server and edit the PostgreSQL configuration file:
sudo nano /etc/postgresql/14/main/postgresql.conf
Add or modify the following parameters:
# --- REPLICATION SETTINGS ---
# Enable WAL archiving for replication
wal_level = replica
# Number of concurrent standby servers
max_wal_senders = 10
# Number of replication slots (one per standby)
max_replication_slots = 10
# WAL retention for standby recovery
wal_keep_size = 1GB
# Archive settings (optional but recommended)
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/14/archive/%f && cp %p /var/lib/postgresql/14/archive/%f'
# Hot standby settings
hot_standby = on
# Timeout for replication connections
wal_sender_timeout = 60s
# For synchronous replication (optional)
# synchronous_commit = on
# synchronous_standby_names = 'standby1,standby2'
# Listen on all interfaces
listen_addresses = '*'
# Connection settings
max_connections = 100
Create the WAL archive directory:
sudo mkdir -p /var/lib/postgresql/14/archive
sudo chown postgres:postgres /var/lib/postgresql/14/archive
sudo chmod 700 /var/lib/postgresql/14/archive
Step 2: Configure Host-Based Authentication
Edit the pg_hba.conf file to allow replication connections:
sudo nano /etc/postgresql/14/main/pg_hba.conf
Add the following lines:
# TYPE DATABASE USER ADDRESS METHOD
# Replication connections
host replication replication 192.168.1.0/24 scram-sha-256
host replication replication standby_server_ip/32 scram-sha-256
# Allow connections from standby servers
host all all 192.168.1.0/24 scram-sha-256
Replace 192.168.1.0/24 with your actual network range and standby_server_ip with your standby server's IP address.
Step 3: Create Replication User
Connect to PostgreSQL on the primary server:
sudo -u postgres psql
Create a dedicated replication user:
-- Create replication user
CREATE ROLE replication WITH REPLICATION LOGIN PASSWORD 'StrongPassword123!';
-- Grant necessary privileges
ALTER ROLE replication WITH CONNECTION LIMIT 10;
-- Verify the user was created
\du replication
Exit psql:
\q
Step 4: Restart Primary PostgreSQL Server
Apply the configuration changes:
sudo systemctl restart postgresql
Verify PostgreSQL is running:
sudo systemctl status postgresql
Check if replication settings are active:
sudo -u postgres psql -c "SHOW wal_level;"
sudo -u postgres psql -c "SHOW max_wal_senders;"
Step 5: Create Base Backup for Standby
On the primary server, create a base backup using pg_basebackup:
# Stop PostgreSQL on standby server first (if running)
# ssh standby_server "sudo systemctl stop postgresql"
# Create backup directory on standby
ssh standby_server "sudo mkdir -p /var/lib/postgresql/14/main"
ssh standby_server "sudo chown postgres:postgres /var/lib/postgresql/14/main"
# From primary, create base backup directly to standby
sudo -u postgres pg_basebackup \
-h primary_server_ip \
-D /tmp/pgbackup \
-U replication \
-P \
-v \
-R \
-X stream \
-C -S standby1_slot
Parameters explanation:
-D: Target directory for backup-U: Replication user-P: Show progress-v: Verbose output-R: Create standby.signal and write connection info-X stream: Include WAL files in backup-C: Create replication slot-S: Replication slot name
Alternatively, backup to a tar file and transfer:
# Create backup as tar
sudo -u postgres pg_basebackup \
-h localhost \
-D - \
-U replication \
-Ft \
-z \
-P \
> /tmp/pgbackup.tar.gz
# Transfer to standby server
scp /tmp/pgbackup.tar.gz standby_server:/tmp/
Step 6: Configure the Standby Server
On the standby server, prepare the PostgreSQL data directory:
# Stop PostgreSQL if running
sudo systemctl stop postgresql
# Backup existing data directory
sudo mv /var/lib/postgresql/14/main /var/lib/postgresql/14/main.backup
# Create new data directory
sudo mkdir -p /var/lib/postgresql/14/main
sudo chown postgres:postgres /var/lib/postgresql/14/main
sudo chmod 700 /var/lib/postgresql/14/main
If you created a tar backup, extract it:
sudo -u postgres tar -xzf /tmp/pgbackup.tar.gz -C /var/lib/postgresql/14/main
Or sync the backup directory from primary:
sudo rsync -av --delete \
root@primary_server:/tmp/pgbackup/ \
/var/lib/postgresql/14/main/
sudo chown -R postgres:postgres /var/lib/postgresql/14/main
Step 7: Create Standby Configuration
If you didn't use the -R flag with pg_basebackup, manually create the standby configuration:
sudo -u postgres nano /var/lib/postgresql/14/main/standby.signal
This file can be empty; its presence indicates standby mode.
Create or edit postgresql.auto.conf:
sudo -u postgres nano /var/lib/postgresql/14/main/postgresql.auto.conf
Add connection information:
primary_conninfo = 'host=primary_server_ip port=5432 user=replication password=StrongPassword123! application_name=standby1'
primary_slot_name = 'standby1_slot'
For better security, use a password file instead:
# Create .pgpass file
sudo -u postgres nano /var/lib/postgresql/.pgpass
Add:
primary_server_ip:5432:replication:replication:StrongPassword123!
Secure the file:
sudo chmod 600 /var/lib/postgresql/.pgpass
sudo chown postgres:postgres /var/lib/postgresql/.pgpass
Update postgresql.auto.conf to remove password:
primary_conninfo = 'host=primary_server_ip port=5432 user=replication application_name=standby1'
primary_slot_name = 'standby1_slot'
Step 8: Configure Standby PostgreSQL Settings
Edit the standby's postgresql.conf:
sudo nano /etc/postgresql/14/main/postgresql.conf
Add or modify:
# Hot standby settings
hot_standby = on
hot_standby_feedback = on
# WAL receiver settings
wal_receiver_status_interval = 10s
wal_receiver_timeout = 60s
# Standby queries
max_standby_streaming_delay = 30s
max_standby_archive_delay = 30s
# Recovery settings
restore_command = 'cp /var/lib/postgresql/14/archive/%f %p'
recovery_target_timeline = 'latest'
Step 9: Start Standby Server
Start PostgreSQL on the standby server:
sudo systemctl start postgresql
Check the status:
sudo systemctl status postgresql
Verify standby mode:
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
Should return t (true) indicating standby mode.
Step 10: Verify Replication Status
On the primary server, check replication status:
sudo -u postgres psql
-- Check connected standbys
SELECT * FROM pg_stat_replication;
-- View replication slots
SELECT * FROM pg_replication_slots;
-- Check WAL sender processes
SELECT pid, state, application_name, client_addr, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
On the standby server:
-- Check if in recovery mode
SELECT pg_is_in_recovery();
-- View replay status
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(),
pg_last_xact_replay_timestamp();
-- Calculate replication lag
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
Testing Replication
Test Data Replication
On the primary server, create test data:
-- Create test database
CREATE DATABASE replication_test;
\c replication_test
-- Create test table
CREATE TABLE test_replication (
id SERIAL PRIMARY KEY,
test_data VARCHAR(100),
created_at TIMESTAMP DEFAULT NOW()
);
-- Insert test data
INSERT INTO test_replication (test_data)
VALUES ('Test data 1'), ('Test data 2'), ('Test data 3');
-- View data
SELECT * FROM test_replication;
On the standby server, verify replication:
-- Connect to test database
\c replication_test
-- Query replicated data
SELECT * FROM test_replication;
-- Verify you cannot write (should fail)
INSERT INTO test_replication (test_data) VALUES ('This should fail');
Monitor Replication Lag
Create a monitoring query:
-- On primary
SELECT client_addr, state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sent_lsn::text::pg_lsn - replay_lsn::text::pg_lsn AS byte_lag
FROM pg_stat_replication;
-- On standby
SELECT pg_is_in_recovery() AS is_standby,
pg_last_wal_receive_lsn() AS receive_lsn,
pg_last_wal_replay_lsn() AS replay_lsn,
pg_last_wal_receive_lsn()::text::pg_lsn -
pg_last_wal_replay_lsn()::text::pg_lsn AS replay_lag_bytes,
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
Advanced Configuration
Implementing Synchronous Replication
For zero data loss, configure synchronous replication.
On the primary, edit postgresql.conf:
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
Options for synchronous_standby_names:
'standby1'- Single standby must confirm'FIRST 1 (standby1, standby2)'- First available standby must confirm'ANY 1 (standby1, standby2)'- Any one standby must confirm'2 (standby1, standby2, standby3)'- Two standbys must confirm
Reload configuration:
sudo systemctl reload postgresql
Verify synchronous replication:
SELECT application_name, sync_state, sync_priority
FROM pg_stat_replication;
Setting Up Cascading Replication
Configure a standby to replicate from another standby.
On the intermediate standby, edit postgresql.conf:
# Enable WAL sending from standby
max_wal_senders = 5
wal_level = replica
hot_standby = on
On the cascading standby, configure to connect to intermediate:
primary_conninfo = 'host=intermediate_standby_ip port=5432 user=replication application_name=cascading_standby'
Implementing Replication Slots
Replication slots prevent WAL deletion before standby consumption:
On the primary:
-- Create replication slot
SELECT * FROM pg_create_physical_replication_slot('standby1_slot');
-- View existing slots
SELECT * FROM pg_replication_slots;
-- Drop slot if needed
SELECT pg_drop_replication_slot('standby1_slot');
On standby, configure to use the slot:
primary_slot_name = 'standby1_slot'
Monitoring with pg_stat_wal_receiver
On standby servers, monitor WAL receiver:
SELECT * FROM pg_stat_wal_receiver;
This shows connection status, last received LSN, and timeline information.
Automated Monitoring Scripts
Create Replication Health Check Script
sudo nano /usr/local/bin/check_pg_replication.sh
#!/bin/bash
# PostgreSQL Replication Health Check Script
LOG_FILE="/var/log/postgresql/replication-check.log"
PRIMARY_IP="primary_server_ip"
log_message() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
# Check if running as standby
IS_STANDBY=$(sudo -u postgres psql -t -c "SELECT pg_is_in_recovery();")
if [ "$IS_STANDBY" = " t" ]; then
log_message "Running as STANDBY server"
# Check replication lag
LAG=$(sudo -u postgres psql -t -c "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()));")
if [ -z "$LAG" ]; then
log_message "ERROR: Cannot determine replication lag"
exit 2
fi
LAG_INT=$(echo "$LAG" | cut -d. -f1)
if [ "$LAG_INT" -gt 300 ]; then
log_message "WARNING: Replication lag is ${LAG_INT} seconds"
exit 1
else
log_message "OK: Replication lag is ${LAG_INT} seconds"
exit 0
fi
else
log_message "Running as PRIMARY server"
# Check connected standbys
STANDBY_COUNT=$(sudo -u postgres psql -t -c "SELECT count(*) FROM pg_stat_replication;")
if [ "$STANDBY_COUNT" -eq 0 ]; then
log_message "WARNING: No standby servers connected"
exit 1
else
log_message "OK: $STANDBY_COUNT standby server(s) connected"
# Show standby details
sudo -u postgres psql -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"
exit 0
fi
fi
Make executable:
sudo chmod +x /usr/local/bin/check_pg_replication.sh
Schedule with cron:
sudo crontab -e
Add:
# Check replication every 5 minutes
*/5 * * * * /usr/local/bin/check_pg_replication.sh
Create Lag Monitoring Script
sudo nano /usr/local/bin/monitor_replication_lag.sh
#!/bin/bash
# Monitor and log replication lag
LOGFILE="/var/log/postgresql/replication-lag.log"
LAG_BYTES=$(sudo -u postgres psql -t -c "
SELECT
COALESCE(
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()),
0
)::bigint;
")
LAG_TIME=$(sudo -u postgres psql -t -c "
SELECT
COALESCE(
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())),
0
)::int;
")
echo "$(date '+%Y-%m-%d %H:%M:%S'),$LAG_BYTES,$LAG_TIME" >> "$LOGFILE"
# Alert if lag exceeds thresholds
if [ "$LAG_TIME" -gt 60 ]; then
echo "ALERT: Replication lag is ${LAG_TIME} seconds" | \
mail -s "PostgreSQL Replication Lag Alert" [email protected]
fi
Troubleshooting
Replication Not Starting
Issue: Standby server not connecting to primary.
Solution: Check connectivity and authentication:
# Test network connectivity
telnet primary_server_ip 5432
# Test replication connection
psql "host=primary_server_ip port=5432 user=replication dbname=replication" -c "IDENTIFY_SYSTEM;"
# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-14-main.log
# Verify pg_hba.conf allows replication
sudo grep replication /etc/postgresql/14/main/pg_hba.conf
High Replication Lag
Issue: Standby is falling behind primary.
Solution: Investigate and optimize:
-- Check current lag
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes;
-- Check for long-running queries on standby
SELECT pid, usename, query_start, state, query
FROM pg_stat_activity
WHERE state = 'active';
-- Terminate blocking queries
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE ...;
Optimize standby performance:
# Increase WAL receiver buffer
wal_receiver_status_interval = 1s
# Adjust standby delay settings
max_standby_streaming_delay = 30s
hot_standby_feedback = on
Replication Slot Bloat
Issue: Replication slot consuming excessive disk space.
Solution: Monitor and manage slots:
-- Check slot status
SELECT slot_name,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
-- Drop inactive slots
SELECT pg_drop_replication_slot('slot_name');
-- Monitor WAL files
SELECT count(*) FROM pg_ls_waldir();
WAL Segment Not Found
Issue: Error: "requested WAL segment has already been removed"
Solution: Increase WAL retention or rebuild standby:
# Increase WAL retention
wal_keep_size = 2GB
# Or use replication slots
max_replication_slots = 5
If standby is too far behind, rebuild from base backup:
# Stop standby
sudo systemctl stop postgresql
# Remove old data
sudo rm -rf /var/lib/postgresql/14/main/*
# Create new base backup
sudo -u postgres pg_basebackup -h primary_ip -D /var/lib/postgresql/14/main -U replication -P -R
# Start standby
sudo systemctl start postgresql
Timeline History Issues
Issue: Error: "timeline ID does not match"
Solution: Reset standby to correct timeline:
# Remove standby.signal
sudo rm /var/lib/postgresql/14/main/standby.signal
# Create new base backup
sudo -u postgres pg_basebackup -h primary_ip -D /tmp/newbackup -U replication -P -R
# Replace data directory
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/14/main/*
sudo cp -a /tmp/newbackup/* /var/lib/postgresql/14/main/
sudo chown -R postgres:postgres /var/lib/postgresql/14/main
sudo systemctl start postgresql
Best Practices
1. Use Replication Slots
Always use replication slots to prevent WAL deletion before standby consumption. This prevents standbys from falling too far behind.
2. Monitor Replication Health
Implement comprehensive monitoring:
- Replication lag (time and bytes)
- Connected standbys count
- Replication slot status
- WAL disk usage
- Hot standby feedback
3. Regular Failover Testing
Test failover procedures quarterly:
- Promote standby to primary
- Redirect application traffic
- Rebuild old primary as new standby
- Document lessons learned
4. Secure Replication Connections
Use SSL/TLS for replication traffic:
# In pg_hba.conf
hostssl replication replication 0.0.0.0/0 scram-sha-256
# In postgresql.conf
ssl = on
ssl_cert_file = '/etc/postgresql/server.crt'
ssl_key_file = '/etc/postgresql/server.key'
5. Implement Proper Monitoring
Use monitoring tools like:
- pgBadger for log analysis
- pg_stat_statements for query performance
- Prometheus + Grafana for metrics visualization
- Nagios/Icinga for alerting
6. Document Your Configuration
Maintain documentation including:
- Server roles and IP addresses
- Replication topology diagram
- Failover procedures
- Recovery procedures
- Contact information
7. Plan Capacity Properly
Monitor and plan for:
- Disk space for WAL files
- Network bandwidth requirements
- I/O capacity for replay
- CPU for hot standby queries
8. Use Synchronous Replication for Critical Data
For zero data loss requirements, use synchronous replication with at least two standbys for redundancy.
9. Implement Automated Failover
Consider using tools like:
- Patroni for automatic failover
- repmgr for replication management
- Pgpool-II for connection pooling and load balancing
10. Keep PostgreSQL Updated
Regularly update PostgreSQL to benefit from:
- Performance improvements
- Bug fixes
- Security patches
- New replication features
Conclusion
PostgreSQL streaming replication provides a robust, high-performance solution for database high availability and scalability. By following this comprehensive guide, you've learned how to configure physical streaming replication from basic asynchronous setup through advanced synchronous and cascading configurations.
The implementation covered setting up WAL-based streaming replication, creating and managing replication users and slots, configuring both primary and standby servers, implementing monitoring and alerting systems, and troubleshooting common replication issues. You've also learned best practices for maintaining a healthy replication environment and ensuring data consistency across your PostgreSQL infrastructure.
Replication significantly improves your database infrastructure's resilience and performance, but it requires ongoing maintenance and monitoring. Continue to test your failover procedures regularly, monitor replication lag and health metrics, maintain adequate disk space for WAL files, keep PostgreSQL updated with latest patches, and document all configuration changes and procedures.
As your needs evolve, consider implementing logical replication for selective table replication, using connection pooling solutions like PgBouncer or Pgpool-II, deploying automated failover solutions like Patroni, or integrating with cloud-native PostgreSQL services. The foundation you've built with this guide will serve you well as you scale and enhance your PostgreSQL replication infrastructure to meet growing demands and increasing complexity.


