MySQL to PostgreSQL Migration Guide

Migrating from MySQL to PostgreSQL brings stronger standards compliance, powerful JSON support, built-in full-text search, and superior extensibility, but it requires careful schema conversion and query adaptation. This guide covers the complete migration process using pgLoader for automated data transfer, along with the schema differences and application query changes you need to address.

Prerequisites

  • MySQL source database access (read access is sufficient)
  • PostgreSQL target server (same or newer version)
  • pgLoader installed
  • Sufficient disk space on target (at least 1.5x source database size)
  • Application source code access for query adaptation

# Install pgLoader (Ubuntu/Debian)
sudo apt-get install -y pgloader

# Or build from source for latest version
sudo apt-get install -y sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev
git clone https://github.com/dimitri/pgloader.git
cd pgloader
make pgloader
sudo mv build/bin/pgloader /usr/local/bin/  # make builds the binary under build/bin/

# Install PostgreSQL client
sudo apt-get install -y postgresql-client

# Verify versions
pgloader --version
psql --version
mysql --version

Schema Differences and Conversion

MySQL and PostgreSQL have significant schema incompatibilities that require manual attention:

-- AUTO_INCREMENT -> SERIAL or IDENTITY
-- MySQL:
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100)
);
-- PostgreSQL:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    -- Or modern syntax:
    -- id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name VARCHAR(100)
);

-- Data type conversions
-- MySQL TINYINT(1) -> PostgreSQL BOOLEAN
-- MySQL DATETIME -> PostgreSQL TIMESTAMP
-- MySQL ENUM -> PostgreSQL ENUM or VARCHAR with CHECK constraint
-- MySQL TEXT variants -> PostgreSQL TEXT (no size variants needed)
-- MySQL BLOB -> PostgreSQL BYTEA
-- MySQL UNSIGNED INT -> PostgreSQL BIGINT (or use CHECK constraint)
-- MySQL JSON -> PostgreSQL JSONB (better indexing)
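
As one example of the UNSIGNED conversion above, a CHECK constraint preserves the non-negative guarantee that UNSIGNED provided (the table and column names here are illustrative):

```sql
-- MySQL:
-- quantity INT UNSIGNED NOT NULL
-- PostgreSQL: widen the type and keep the non-negative guarantee
CREATE TABLE inventory (
    quantity BIGINT NOT NULL CHECK (quantity >= 0)
);
```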

-- MySQL ENUM (pgLoader handles this automatically)
-- MySQL:
CREATE TABLE orders (
    status ENUM('pending','processing','shipped','delivered')
);
-- PostgreSQL:
CREATE TYPE order_status AS ENUM ('pending','processing','shipped','delivered');
CREATE TABLE orders (
    status order_status
);

-- String comparison: MySQL is case-insensitive by default
-- PostgreSQL is case-sensitive; use ILIKE or citext extension
-- PostgreSQL:
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (
    email citext UNIQUE  -- Case-insensitive comparison
);

-- MySQL backtick identifiers -> PostgreSQL double quotes
-- MySQL: SELECT `user`, `order` FROM `table`
-- PostgreSQL: SELECT "user", "order" FROM "table"
-- Best practice: use lowercase unquoted names

Key behavioral differences to document before migration:

-- NULL handling differences
-- NULL = NULL yields NULL in both databases; use IS NULL or IS NOT DISTINCT FROM
-- '' (empty string) != NULL in both databases
-- But several MySQL functions handle NULL differently from their PostgreSQL counterparts

-- String concatenation
-- MySQL: CONCAT(a, b) returns NULL if any argument is NULL; || requires PIPES_AS_CONCAT mode
-- PostgreSQL: a || b (propagates NULL) or CONCAT(a, b) (skips NULL arguments)

-- Date functions
-- MySQL: NOW(), CURDATE(), DATE_FORMAT()
-- PostgreSQL: NOW(), CURRENT_DATE, TO_CHAR()

-- MySQL: DATE_FORMAT(created_at, '%Y-%m-%d')
-- PostgreSQL: TO_CHAR(created_at, 'YYYY-MM-DD')
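
The DATE_FORMAT-to-TO_CHAR translation is mechanical for common specifiers. As a hedged illustration, a small Python helper (the mapping table below is my own and covers only a frequently used subset) can batch-convert format strings:

```python
# Map common MySQL DATE_FORMAT specifiers to PostgreSQL TO_CHAR patterns.
# Only a subset is covered; extend the table as needed.
SPECIFIERS = {
    "%Y": "YYYY",  # 4-digit year
    "%y": "YY",    # 2-digit year
    "%m": "MM",    # month, zero-padded
    "%d": "DD",    # day of month, zero-padded
    "%H": "HH24",  # hour 00-23
    "%i": "MI",    # minutes (MySQL uses %i here, not %M)
    "%s": "SS",    # seconds
}

def mysql_to_pg_format(fmt: str) -> str:
    """Translate a MySQL DATE_FORMAT string to a TO_CHAR pattern."""
    for mysql_spec, pg_spec in SPECIFIERS.items():
        fmt = fmt.replace(mysql_spec, pg_spec)
    return fmt

print(mysql_to_pg_format("%Y-%m-%d %H:%i:%s"))  # YYYY-MM-DD HH24:MI:SS
```

Review each converted string: TO_CHAR treats bare letters as pattern characters, so literal text in a format string may need double quotes in PostgreSQL.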

-- MySQL: DATEDIFF(date1, date2) -> integer days
-- PostgreSQL: DATE subtraction yields integer days directly;
--   TIMESTAMP subtraction yields an interval, so use
--   EXTRACT(EPOCH FROM ts1 - ts2) / 86400
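
A concrete rewrite of the day-difference idiom (the dates are chosen purely for illustration):

```sql
-- MySQL:
SELECT DATEDIFF('2024-03-10', '2024-03-01');   -- 9
-- PostgreSQL: DATE operands subtract to integer days
SELECT DATE '2024-03-10' - DATE '2024-03-01';  -- 9
-- PostgreSQL: TIMESTAMP operands subtract to an interval
SELECT EXTRACT(EPOCH FROM TIMESTAMP '2024-03-10' - TIMESTAMP '2024-03-01') / 86400;  -- 9
```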

-- IFNULL vs COALESCE
-- MySQL: IFNULL(col, 'default')
-- PostgreSQL: COALESCE(col, 'default')  [also works in MySQL]

-- Pagination
-- MySQL: LIMIT 10 OFFSET 20
-- PostgreSQL: LIMIT 10 OFFSET 20  [same syntax - compatible]

-- String operations
-- MySQL: SUBSTRING_INDEX(str, delim, count)
-- PostgreSQL: SPLIT_PART(str, delim, n) returns only the n-th field,
--   while SUBSTRING_INDEX returns everything up to the count-th delimiter
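
The behavioral difference matters in practice: SUBSTRING_INDEX returns a prefix (or suffix) of the string, while SPLIT_PART returns a single field:

```sql
-- MySQL:
SELECT SUBSTRING_INDEX('a.b.c', '.', 2);  -- 'a.b' (everything before the 2nd delimiter)
-- PostgreSQL:
SELECT SPLIT_PART('a.b.c', '.', 2);       -- 'b' (just the 2nd field)
```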

Installing and Configuring pgLoader

# Create target PostgreSQL database
createdb -h pg-host -U postgres myapp_new

# Create a pgLoader command file
cat > migrate-myapp.load <<'EOF'
LOAD DATABASE
  FROM mysql://mysqluser:mysqlpassword@mysql-host:3306/myapp
  INTO postgresql://pguser:pgpassword@pg-host:5432/myapp_new

  WITH
    workers = 8,            -- Parallel threads
    concurrency = 2,        -- Concurrent table loads
    on error stop,          -- Abort on the first error
    create tables,          -- Migrate schema and data
    include no drop,        -- Do not DROP existing target tables
    create indexes,
    reset sequences,
    foreign keys,
    downcase identifiers    -- Convert identifiers to lowercase

  SET PostgreSQL PARAMETERS
    maintenance_work_mem to '512MB',
    work_mem to '128MB'

  CAST
    -- Convert MySQL TINYINT(1) to PostgreSQL BOOLEAN
    type tinyint when (= precision 1) to boolean drop typemod using tinyint-to-boolean,
    -- Convert BIT(1) to BOOLEAN
    type bit when (= precision 1) to boolean drop typemod using bits-to-boolean,
    -- Convert YEAR to integer
    type year to integer drop typemod,
    -- Convert BIGINT UNSIGNED to NUMERIC to avoid overflow
    type bigint when unsigned to numeric drop typemod

  EXCLUDING TABLE NAMES MATCHING
    ~<temp_.*>,       -- Exclude temp tables
    'schema_migrations'  -- Exclude migration tracking table

  AFTER LOAD DO
    $$ ALTER TABLE users ADD CONSTRAINT email_unique UNIQUE (email); $$,
    $$ CREATE INDEX idx_orders_status ON orders(status); $$;

EOF

# Validate connections first (--dry-run checks connectivity; it does not simulate the load)
pgloader --dry-run migrate-myapp.load

# Run the actual migration
pgloader --verbose migrate-myapp.load 2>&1 | tee migration.log

Running the Migration

# Step 1: Migrate schema only first to review and fix issues
cat > schema-only.load <<'EOF'
LOAD DATABASE
  FROM mysql://user:pass@mysql-host/myapp
  INTO postgresql://user:pass@pg-host/myapp_new
  WITH schema only
EOF

pgloader schema-only.load

# Review the generated schema
psql -h pg-host -U postgres -d myapp_new -c "\dt"
psql -h pg-host -U postgres -d myapp_new -c "\d users"

# Step 2: Fix any schema issues manually
psql -h pg-host -U postgres -d myapp_new <<'EOF'
-- Add missing constraints
ALTER TABLE orders ADD CONSTRAINT fk_user
  FOREIGN KEY (user_id) REFERENCES users(id);

-- Create missing indexes  
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);

-- Fix any enum types
CREATE TYPE payment_method AS ENUM ('credit_card','debit_card','paypal','bank_transfer');
ALTER TABLE payments ALTER COLUMN method TYPE payment_method USING method::payment_method;
EOF

# Step 3: Load data
cat > data-only.load <<'EOF'
LOAD DATABASE
  FROM mysql://user:pass@mysql-host/myapp
  INTO postgresql://user:pass@pg-host/myapp_new
  WITH
    data only,
    workers = 8,
    concurrency = 2
EOF

pgloader --verbose data-only.load

# Step 4: Compare row counts (MySQL's table_rows and PostgreSQL's n_live_tup
# are both estimates; run SELECT COUNT(*) per table for exact verification)
mysql -h mysql-host -u user -p myapp -e "
  SELECT table_name, table_rows
  FROM information_schema.tables
  WHERE table_schema = 'myapp'
  ORDER BY table_rows DESC;"

psql -h pg-host -U postgres -d myapp_new -c "
  SELECT schemaname, tablename,
         n_live_tup AS estimated_rows
  FROM pg_stat_user_tables
  ORDER BY n_live_tup DESC;"

Application Query Adaptation

Common query changes needed after migration:

# Create a script to find MySQL-specific syntax in your codebase
grep -rn "IFNULL\|DATE_FORMAT\|SUBSTRING_INDEX\|GROUP_CONCAT\|LIMIT.*,.*\|REPLACE INTO" \
  ./app/src/ --include="*.py" --include="*.php" --include="*.js" \
  > mysql_specific_queries.txt

# Review and fix each occurrence
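
Some of the rewrites are purely mechanical. As a first-pass aid, a small Python sketch (the rewrite table is my own and deliberately minimal) can apply them; every change still needs human review:

```python
import re

# Mechanical MySQL -> PostgreSQL rewrites; a first pass only.
REWRITES = [
    (re.compile(r"\bIFNULL\s*\(", re.IGNORECASE), "COALESCE("),   # IFNULL -> COALESCE
    (re.compile(r"\bRAND\s*\(\s*\)", re.IGNORECASE), "RANDOM()"), # RAND() -> RANDOM()
    (re.compile(r"`([^`]+)`"), r'"\1"'),                          # backticks -> double quotes
]

def rewrite_query(sql: str) -> str:
    """Apply simple syntactic rewrites to one SQL string."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql

print(rewrite_query("SELECT IFNULL(`name`, 'n/a') FROM `users`"))
# SELECT COALESCE("name", 'n/a') FROM "users"
```

Regex-based rewriting cannot understand SQL context (string literals, comments), so treat the output as suggestions, not a safe automated transform.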

Key query rewrites:

-- GROUP_CONCAT -> STRING_AGG
-- MySQL:
SELECT user_id, GROUP_CONCAT(tag ORDER BY tag SEPARATOR ',') AS tags
FROM user_tags GROUP BY user_id;
-- PostgreSQL:
SELECT user_id, STRING_AGG(tag, ',' ORDER BY tag) AS tags
FROM user_tags GROUP BY user_id;

-- REPLACE INTO -> INSERT ... ON CONFLICT
-- MySQL:
REPLACE INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com');
-- PostgreSQL:
INSERT INTO users (id, name, email) VALUES (1, 'Alice', 'alice@example.com')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email;

-- INSERT IGNORE -> ON CONFLICT DO NOTHING
-- MySQL: INSERT IGNORE INTO users (email) VALUES ('bob@example.com');
-- PostgreSQL:
INSERT INTO users (email) VALUES ('bob@example.com')
ON CONFLICT DO NOTHING;

-- LIMIT with offset (comma syntax)
-- MySQL: SELECT * FROM users LIMIT 20, 10;  (offset, count)
-- PostgreSQL: SELECT * FROM users LIMIT 10 OFFSET 20;

-- Full-text search
-- MySQL: WHERE MATCH(title, body) AGAINST ('search term' IN BOOLEAN MODE)
-- PostgreSQL:
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@
      to_tsquery('english', 'search & term');
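
For repeated searches, a stored tsvector with a GIN index avoids recomputing the vector per row. A sketch assuming PostgreSQL 12+ (for generated columns) and the articles(title, body) table from above:

```sql
ALTER TABLE articles ADD COLUMN search_vec tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX idx_articles_search ON articles USING GIN (search_vec);

SELECT * FROM articles
WHERE search_vec @@ to_tsquery('english', 'search & term');
```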

-- JSON operations
-- MySQL: JSON_EXTRACT(data, '$.key')
-- PostgreSQL: data->>'key' or data->'key'
-- MySQL: JSON_SET(data, '$.key', 'value')
-- PostgreSQL: jsonb_set(data, '{key}', '"value"')
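
Nested paths and indexing round out the JSON story; the events table and data column below are illustrative:

```sql
-- MySQL:    SELECT JSON_EXTRACT(data, '$.user.name') FROM events;
-- PostgreSQL nested path (returns text):
SELECT data #>> '{user,name}' FROM events;

-- JSONB supports GIN indexing for containment queries:
CREATE INDEX idx_events_data ON events USING GIN (data);
SELECT * FROM events WHERE data @> '{"user": {"name": "Alice"}}';
```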

Testing Strategies

# 1. Schema validation script
cat > validate_schema.sh <<'EOF'
#!/bin/bash
MYSQL_CMD="mysql -h mysql-host -u user -ppassword myapp -e"
PG_CMD="psql -h pg-host -U postgres -d myapp_new -t -c"

echo "=== Table count ==="
echo "MySQL: $(${MYSQL_CMD} "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='myapp' AND table_type='BASE TABLE';" | tail -1)"
echo "PostgreSQL: $(${PG_CMD} "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE';")"

echo "=== Row counts ==="
for table in users orders products; do
  mysql_count=$(${MYSQL_CMD} "SELECT COUNT(*) FROM ${table};" | tail -1)
  pg_count=$(${PG_CMD} "SELECT COUNT(*) FROM ${table};")
  echo "${table}: MySQL=${mysql_count} PG=${pg_count}"
done
EOF

chmod +x validate_schema.sh
./validate_schema.sh

# 2. Data checksum validation
psql -h pg-host -U postgres -d myapp_new <<'EOF'
-- Compare aggregates that should be identical
SELECT
  COUNT(*) AS total_users,
  SUM(id) AS id_sum,
  MAX(created_at) AS latest_user
FROM users;
EOF
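
For a stronger check than simple aggregates, a per-table checksum can be compared across both databases. The columns and table here are illustrative; the concatenation and ordering must match exactly on both sides for the hashes to agree:

```sql
-- MySQL (raise group_concat_max_len first for large tables):
SELECT MD5(GROUP_CONCAT(CONCAT(id, ':', name) ORDER BY id SEPARATOR ',')) FROM users;
-- PostgreSQL:
SELECT MD5(STRING_AGG(id || ':' || name, ',' ORDER BY id)) FROM users;
```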

# 3. Run application test suite against PostgreSQL
DATABASE_URL=postgresql://user:pass@pg-host/myapp_new npm test
# or
DATABASE_URL=postgresql://user:pass@pg-host/myapp_new python -m pytest

Rollback Planning

# Keep MySQL running during validation period
# Use a feature flag or environment variable to switch databases

# Application-level transition strategy:
# 1. Keep writes going to MySQL (it remains the source of truth)
# 2. Gradually point read traffic at PostgreSQL to validate query results
# 3. Refresh the PostgreSQL copy from MySQL before cutover
#    (pgLoader has no change-data-capture mode; every run is a full load)

cat > refresh-sync.load <<'EOF'
LOAD DATABASE
  FROM mysql://user:pass@mysql-host/myapp
  INTO postgresql://user:pass@pg-host/myapp_new
  WITH
    data only,
    truncate,
    workers = 4
EOF

# Re-run before cutover to pick up recent MySQL writes. Note this truncates
# and fully reloads each table, so schedule the final refresh inside a
# write freeze on MySQL.
pgloader refresh-sync.load

# Switch application to PostgreSQL
# If issues arise, rollback by switching back to MySQL connection string
# MySQL data is unaffected (pgLoader only reads from it)

Troubleshooting

pgLoader fails with encoding errors:

# Add encoding setting to pgLoader config
# In the LOAD command:
# SET client_encoding TO 'UTF8'

# Fix MySQL source encoding first
mysql -h mysql-host -u user -p myapp \
  -e "SHOW VARIABLES LIKE 'character%';"

Sequences out of sync after migration:

# Reset all sequences to match current max IDs
psql -h pg-host -U postgres -d myapp_new <<'EOF'
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN SELECT tablename FROM pg_tables WHERE schemaname = 'public'
  LOOP
    BEGIN
      -- format() quotes names safely (%L = literal, %I = identifier)
      EXECUTE format(
        'SELECT setval(pg_get_serial_sequence(%L, ''id''), COALESCE(MAX(id), 1)) FROM %I',
        r.tablename, r.tablename);
    EXCEPTION WHEN OTHERS THEN
      NULL;
    END;
  END LOOP;
END $$;
EOF
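
To spot-check a single sequence after the reset (the sequence name below assumes the default SERIAL naming convention):

```sql
SELECT pg_get_serial_sequence('users', 'id');  -- e.g. public.users_id_seq
SELECT last_value FROM users_id_seq;           -- should be >= MAX(id)
SELECT MAX(id) FROM users;
```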

Application queries failing after migration:

# Enable PostgreSQL query logging to capture errors
# In postgresql.conf:
log_min_error_statement = error
log_min_duration_statement = 1000  # Log slow queries

# Review PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log

Conclusion

Migrating from MySQL to PostgreSQL is a multi-step process that requires schema conversion, data migration with pgLoader, and query adaptation in your application layer. The keys to success are thorough testing against a PostgreSQL copy before switching production traffic, maintaining MySQL as a fallback during validation, and resetting sequences after the data load. Once migrated, PostgreSQL features such as JSONB, window functions, and built-in full-text search provide significant capabilities beyond what MySQL offers.