Database Migration with pg_dump and pg_restore

PostgreSQL's pg_dump and pg_restore are the standard tools for database backup and migration. They support multiple dump formats, parallel processing, and selective restoration, making them suitable for databases of any size. Mastering these tools enables safe migrations between servers, PostgreSQL versions, and cloud providers with minimal downtime.

Prerequisites

  • PostgreSQL client tools installed (pg_dump, pg_restore, psql)
  • Source and target database access credentials
  • Sufficient disk space (at least 1.5x the database size for compressed dumps)
  • Network access between source and target servers

# Install PostgreSQL client tools only (Ubuntu/Debian)
sudo apt-get install -y postgresql-client

# For CentOS/Rocky
sudo dnf install -y postgresql

# Verify versions
pg_dump --version
pg_restore --version

# Set password to avoid interactive prompts
export PGPASSWORD="your-password"
# Or use a .pgpass file
echo "hostname:5432:dbname:username:password" >> ~/.pgpass
chmod 600 ~/.pgpass
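
The .pgpass setup above can be wrapped in a small helper that also guards against duplicate entries and wrong permissions (libpq silently ignores the file unless it is mode 600). A sketch; the `add_pgpass_entry` function name and the temporary `PGPASSFILE` used for the demo are illustrative:

```shell
# Append a credential line to a pgpass file, skipping duplicates and
# enforcing the 0600 permissions PostgreSQL requires.
# PGPASSFILE points libpq at a non-default location.
add_pgpass_entry() {
  local pgpass="${PGPASSFILE:-$HOME/.pgpass}"
  local entry="$1"   # format: host:port:dbname:user:password
  touch "$pgpass"
  # Only append if the exact line is not already present
  grep -qxF "$entry" "$pgpass" || echo "$entry" >> "$pgpass"
  chmod 600 "$pgpass"
}

# Demo against a temporary file; drop the export to use ~/.pgpass
export PGPASSFILE="$(mktemp)"
add_pgpass_entry "source-host:5432:myapp:postgres:secret"
```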

Dump Formats Overview

pg_dump supports four formats with different use cases:

Format      Flag  Extension  Features
Plain SQL   -Fp   .sql       Human-readable, use with psql
Custom      -Fc   .dump      Compressed, selective restore, parallel restore
Directory   -Fd   directory  Parallel dump, parallel restore
Tar         -Ft   .tar       Portable, selective restore

Always use custom (-Fc) or directory (-Fd) format for production migrations — they are compressed, faster to restore in parallel, and allow selective table restoration.
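
A practical consequence of the format differences: custom-format archives start with the magic bytes "PGDMP", so a script can detect which restore tool to use before trying. A sketch (the `dump_format` helper is illustrative; tar and plain dumps fall into the catch-all branch):

```shell
# Identify a dump's format before choosing a restore tool:
# custom-format archives begin with "PGDMP", directory-format dumps
# contain a toc.dat file, and plain SQL dumps are ordinary text.
dump_format() {
  local path="$1"
  if [ -d "$path" ] && [ -f "$path/toc.dat" ]; then
    echo "directory"    # restore with pg_restore -j N
  elif [ "$(head -c 5 "$path" 2>/dev/null)" = "PGDMP" ]; then
    echo "custom"       # restore with pg_restore
  else
    echo "plain-or-tar" # try psql -f, or pg_restore for tar
  fi
}
```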

Basic pg_dump Usage

# Dump a single database in custom format (recommended)
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  -Fc \
  -f /backup/myapp_$(date +%Y%m%d_%H%M%S).dump

# Include verbose output to track progress
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  -Fc \
  -v \
  -f /backup/myapp.dump

# Dump to stdout and compress on the fly
pg_dump -h source-host -U postgres -d myapp -Fp | \
  gzip > /backup/myapp.sql.gz

# Dump all databases (creates plain SQL, use for full server migration)
pg_dumpall \
  -h source-host \
  -U postgres \
  --globals-only \
  -f /backup/globals.sql  # Roles and tablespaces only

# Then dump each database with pg_dump
for db in $(psql -h source-host -U postgres -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('template0','template1','postgres')"); do
  pg_dump -h source-host -U postgres -Fc -d "$db" -f "/backup/${db}.dump"
done
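
After a loop like this, it is worth recording a checksum manifest so the dump files can be verified after transfer to the target host. A minimal sketch; `write_manifest` and `verify_manifest` are illustrative helper names, and GNU coreutils sha256sum is assumed:

```shell
# Write a sha256 manifest next to the dumps, and verify it after
# copying the directory to the target host.
write_manifest() {
  ( cd "$1" && sha256sum *.dump > MANIFEST.sha256 )
}
verify_manifest() {
  # Exits non-zero and reports the file if any checksum mismatches
  ( cd "$1" && sha256sum -c --quiet MANIFEST.sha256 )
}

# On the source:  write_manifest /backup
# On the target:  verify_manifest /backup
```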

Parallel Dumps for Large Databases

# Directory format enables parallel dump (8 parallel jobs)
pg_dump \
  -h source-host \
  -U postgres \
  -d large_database \
  -Fd \
  -j 8 \
  -f /backup/large_database_dir \
  -v

# The directory will contain:
# toc.dat        - table of contents
# *.dat.gz       - compressed table data files (one per table)

# Control compression level (0=none, 9=max, default=6)
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  -Fc \
  --compress=9 \
  -f /backup/myapp_compressed.dump

# For very large databases, stream directly to remote host
pg_dump -h source-host -U postgres -d myapp -Fc | \
  ssh target-host "pg_restore -h localhost -U postgres -d myapp_new -Fc"

Schema-Only and Data-Only Exports

# Export schema only (no data) - useful for setting up new environment
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  --schema-only \
  -f /backup/myapp_schema.sql

# Export data only (no DDL) - useful for refreshing data
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  --data-only \
  --disable-triggers \
  -Fc \
  -f /backup/myapp_data.dump

# Export specific tables
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  -t public.users \
  -t public.orders \
  -t public.products \
  -Fc \
  -f /backup/myapp_partial.dump

# Exclude specific tables (e.g., large log/audit tables)
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  --exclude-table=public.audit_log \
  --exclude-table=public.session_log \
  -Fc \
  -f /backup/myapp_no_logs.dump

# Export specific schema
pg_dump \
  -h source-host \
  -U postgres \
  -d myapp \
  -n reporting \
  -Fc \
  -f /backup/reporting_schema.dump

Restoring with pg_restore

# Restore to an existing (empty) database
createdb -h target-host -U postgres myapp_new

pg_restore \
  -h target-host \
  -U postgres \
  -d myapp_new \
  -v \
  /backup/myapp.dump

# Parallel restore (-j works with custom or directory format dumps;
# the job count need not match the one used for the dump)
pg_restore \
  -h target-host \
  -U postgres \
  -d myapp_new \
  -j 8 \
  -v \
  /backup/large_database_dir

# Restore with --clean to drop and recreate objects
pg_restore \
  -h target-host \
  -U postgres \
  -d myapp_new \
  --clean \
  --if-exists \
  -v \
  /backup/myapp.dump

# Restore a single table from a full dump
# (pg_restore's -t takes an unqualified table name; select the schema with -n)
pg_restore \
  -h target-host \
  -U postgres \
  -d myapp_new \
  -n public \
  -t users \
  /backup/myapp.dump

# List what's in a dump before restoring
pg_restore --list /backup/myapp.dump | head -50

# Restore only specific items using a TOC file
pg_restore --list /backup/myapp.dump > /tmp/toc.txt
# Edit toc.txt to comment out items you don't want to restore
pg_restore \
  -h target-host \
  -U postgres \
  -d myapp_new \
  --use-list=/tmp/toc.txt \
  /backup/myapp.dump
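
Editing the TOC by hand can be scripted: pg_restore ignores TOC lines that begin with `;`, so skipping entries is just a text transformation. A sketch assuming GNU sed and the usual one-line-per-entry TOC layout; `skip_table_data` is an illustrative helper:

```shell
# Comment out every TABLE DATA entry for one table so that
# pg_restore --use-list skips its data during restore.
skip_table_data() {
  local table="$1" toc="$2"
  # TOC data lines look like: "2398; 0 16462 TABLE DATA public audit_log owner"
  sed -i -E "s/^([0-9]+; .* TABLE DATA [^ ]+ ${table} )/;\1/" "$toc"
}

# Usage:
#   pg_restore --list /backup/myapp.dump > /tmp/toc.txt
#   skip_table_data audit_log /tmp/toc.txt
#   pg_restore --use-list=/tmp/toc.txt -d myapp_new /backup/myapp.dump
```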

# Restore plain SQL dump (stop on the first error instead of continuing)
psql -h target-host -U postgres -d myapp_new \
  -v ON_ERROR_STOP=1 --single-transaction -f /backup/myapp.sql
# Or from compressed:
gunzip -c /backup/myapp.sql.gz | \
  psql -h target-host -U postgres -d myapp_new -v ON_ERROR_STOP=1

Cross-Version Migration

Migrating between PostgreSQL major versions requires careful planning:

# Check current version
psql -h source-host -U postgres -c "SELECT version();"

# Method 1: pg_dump from old, restore to new (always compatible)
# Use the NEWER pg_dump version when possible
pg_dump -h old-server -U postgres -d myapp -Fc -f myapp_v14.dump
createdb -h new-server -U postgres myapp
pg_restore -h new-server -U postgres -d myapp -v myapp_v14.dump

# Method 2: pg_upgrade (same server, faster for large databases)
# Stop the old cluster first
sudo systemctl stop postgresql-14

# Run pg_upgrade
sudo -u postgres /usr/lib/postgresql/16/bin/pg_upgrade \
  -b /usr/lib/postgresql/14/bin \
  -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/14/main \
  -D /var/lib/postgresql/16/main \
  --link  # Hard-link data files instead of copying (faster, but old cluster unusable after)

# After upgrade, update statistics
sudo -u postgres vacuumdb --all --analyze-in-stages

# Method 3: Logical replication for zero-downtime migration
# On source (PostgreSQL 14):
psql -h source -U postgres -c "CREATE PUBLICATION migration_pub FOR ALL TABLES;"

# On target (PostgreSQL 16):
createdb -h target -U postgres myapp
pg_restore -h target -U postgres -d myapp --schema-only myapp_schema.dump

psql -h target -U postgres -d myapp <<EOF
CREATE SUBSCRIPTION migration_sub
  CONNECTION 'host=source-host port=5432 dbname=myapp user=postgres password=pass'
  PUBLICATION migration_pub;
EOF

# Monitor replication lag
psql -h target -U postgres -d myapp \
  -c "SELECT * FROM pg_stat_subscription;"

# Once caught up, sync sequence values manually (logical replication
# does not replicate sequences), then switch the application
# connection string to the target. Then clean up:
psql -h source -U postgres -c "DROP PUBLICATION migration_pub;"
psql -h target -U postgres -d myapp -c "DROP SUBSCRIPTION migration_sub;"

Large Database Strategies

# Use pg_dump with compression and measure throughput
time pg_dump \
  -h source-host \
  -U postgres \
  -d large_db \
  -Fd \
  -j 16 \
  -f /backup/large_db \
  -v 2>&1 | tee /tmp/dump_log.txt

# Stream dump directly to target to save disk space
pg_dump -h source -U postgres -Fc -d myapp | \
  pg_restore -h target -U postgres -d myapp_new

# Use table-level dumps for very large tables
# Useful when one table dominates the size (e.g., events/logs)
pg_dump -h source -U postgres -d myapp \
  --exclude-table=public.events \
  -Fc -f /backup/myapp_no_events.dump

# Dump large table separately with COPY format for max speed
psql -h source -U postgres -d myapp \
  -c "\COPY public.events TO '/backup/events.csv' CSV"

# Then restore with COPY
psql -h target -U postgres -d myapp \
  -c "\COPY public.events FROM '/backup/events.csv' CSV"

# For tables with 100M+ rows, split the export into chunks
# (the \COPY commands below run sequentially; the restore step
# afterwards runs in parallel)
psql -h source -U postgres -d myapp <<EOF
\COPY (SELECT * FROM events WHERE id % 4 = 0) TO '/backup/events_part0.csv' CSV
\COPY (SELECT * FROM events WHERE id % 4 = 1) TO '/backup/events_part1.csv' CSV
\COPY (SELECT * FROM events WHERE id % 4 = 2) TO '/backup/events_part2.csv' CSV
\COPY (SELECT * FROM events WHERE id % 4 = 3) TO '/backup/events_part3.csv' CSV
EOF

# Restore in parallel (4 background processes)
for i in 0 1 2 3; do
  psql -h target -U postgres -d myapp \
    -c "\COPY events FROM '/backup/events_part${i}.csv' CSV" &
done
wait
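
Before restoring, a quick sanity check that the chunked export is complete: sum the lines across the part files and compare with the source's `count(*)`. A sketch that assumes no quoted embedded newlines (so one CSV line equals one row); `sum_csv_rows` is an illustrative helper:

```shell
# Sum the row counts of all exported chunk files for the events table.
sum_csv_rows() {
  local dir="$1" total=0 f rows
  for f in "$dir"/events_part*.csv; do
    [ -f "$f" ] || continue
    rows=$(wc -l < "$f")
    total=$((total + rows))
  done
  echo "$total"
}

echo "rows in CSV parts: $(sum_csv_rows /backup)"
# Compare with the source row count:
#   psql -h source -U postgres -d myapp -tAc "SELECT count(*) FROM events"
```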

Troubleshooting

Dump fails with "out of memory":

# Reduce the number of parallel jobs
pg_dump -Fd -j 2 ... # Instead of -j 8

Dump hangs waiting on locks:

# Fail fast instead of waiting indefinitely for a table lock
pg_dump --lock-wait-timeout=60s ...

Restore fails with "role does not exist":

# Create missing roles before restoring
pg_dumpall -h source -U postgres --roles-only -f roles.sql
psql -h target -U postgres -f roles.sql

# Or restore without ownership
pg_restore --no-owner --no-privileges ...

Restore is extremely slow:

# Temporarily disable autovacuum during restore
psql -h target -U postgres -c "ALTER SYSTEM SET autovacuum = off;"
psql -h target -U postgres -c "SELECT pg_reload_conf();"

# Increase maintenance_work_mem for the restore session
# (a SET in a separate psql session has no effect on pg_restore;
# pass it through the connection with PGOPTIONS instead)
PGOPTIONS='-c maintenance_work_mem=1GB' pg_restore -h target -U postgres ...

# After restore, re-enable and analyze
psql -h target -U postgres -c "ALTER SYSTEM SET autovacuum = on;"
vacuumdb -h target -U postgres --all --analyze

Data integrity check after migration:

# Compare row counts (the column in pg_stat_user_tables is relname,
# and n_live_tup is an estimate, so run ANALYZE first for accuracy)
psql -h source -U postgres -d myapp -c \
  "SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;"

psql -h target -U postgres -d myapp -c \
  "SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;"
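
The two count queries can be automated into a diff of per-table results. A sketch; `compare_counts` is an illustrative helper, and because `n_live_tup` is an estimate, ANALYZE both sides first or tolerate small drift on busy tables:

```shell
# Compare two "table count" files (one line per table) and report
# whether they match; on mismatch, show only the differing lines.
compare_counts() {
  if diff -u "$1" "$2" > /dev/null; then
    echo "row counts match"
  else
    echo "row counts differ"
    diff -u "$1" "$2" | grep '^[+-][^+-]' || true
  fi
}

# Usage:
#   q="SELECT relname || ' ' || n_live_tup FROM pg_stat_user_tables ORDER BY relname"
#   psql -h source -U postgres -d myapp -tAc "$q" > /tmp/src.txt
#   psql -h target -U postgres -d myapp -tAc "$q" > /tmp/dst.txt
#   compare_counts /tmp/src.txt /tmp/dst.txt
```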

Conclusion

pg_dump and pg_restore remain the most reliable tools for PostgreSQL migrations, supporting everything from small single-table exports to parallel dumps of terabyte-scale databases. Use custom or directory format for production work, leverage parallel jobs for large databases, and validate row counts after migration. For zero-downtime migrations between major versions, combine a schema-only dump with logical replication to minimize application disruption.