MariaDB Galera Cluster Configuration

MariaDB Galera Cluster is a synchronous multi-master replication solution that provides high availability and scalability for MariaDB databases. Unlike traditional asynchronous replication, Galera ensures data consistency across all cluster nodes through write-set replication and certification-based conflict resolution. This guide covers the complete installation, configuration, and operational aspects of setting up a production-grade MariaDB Galera Cluster on Linux systems.

Prerequisites and Planning

Before deploying MariaDB Galera Cluster, ensure you have the following requirements in place. You'll need a minimum of three nodes for a production cluster, preferably distributed across different physical locations or availability zones. Each node requires at least 2GB RAM and 10GB storage, though production workloads typically need more. All nodes must have network connectivity with low latency between them, ideally sub-10ms round-trip time.

Ensure you have root or sudo access on all nodes. The MariaDB version should be consistent across all cluster nodes, preferably the same minor version. Firewall rules must allow TCP port 3306 (client connections), TCP and UDP port 4567 (Galera replication), TCP port 4568 (incremental state transfer), and TCP port 4444 (state snapshot transfer).
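The exact firewall commands depend on your distribution; as a sketch, the helper below prints the ufw rules a Galera node needs (it is a dry run that only echoes the commands, so you can review them before piping to a shell; the 192.168.1.0/24 network is an example):

```shell
#!/bin/bash
# Print the ufw commands needed for a Galera node (dry run - review,
# then pipe to "bash" to apply). 4444 is the default SST port, 4568 is IST.
galera_fw_rules() {
  local peer="$1"   # address or CIDR of the cluster network
  local p
  for p in 3306/tcp 4567/tcp 4567/udp 4568/tcp 4444/tcp; do
    echo "ufw allow from ${peer} to any port ${p%/*} proto ${p#*/}"
  done
}

galera_fw_rules 192.168.1.0/24
```

On RHEL-family systems the same ports would be opened with firewall-cmd instead.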

Document your cluster topology before installation. Decide whether you'll use rsync or mariabackup for state snapshot transfer, as this affects initial configuration. Plan for a load balancer such as HAProxy, ProxySQL, or MariaDB MaxScale to distribute client connections across cluster nodes.
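HAProxy is one common choice for this. A minimal TCP-balancing fragment might look like the following (the bind port and server names are illustrative; a production setup would add a proper Galera-aware health check rather than a bare TCP check):

```
# /etc/haproxy/haproxy.cfg (illustrative fragment)
listen galera
    bind *:3307
    mode tcp
    balance leastconn
    option tcpka
    server node1 192.168.1.10:3306 check
    server node2 192.168.1.11:3306 check
    server node3 192.168.1.12:3306 check
```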

Installation

Begin by updating your system repositories and installing MariaDB server with Galera support. On Ubuntu or Debian systems, use the official MariaDB repository to ensure you get the correct version with Galera included.

# Ubuntu 20.04/22.04
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=11.0

# Install MariaDB with Galera
sudo apt-get update
sudo apt-get install -y mariadb-server mariadb-backup galera-4

# Verify installation
mariadb --version
mariadbd --version

On CentOS/RHEL systems, configure the MariaDB repository and install:

# CentOS/RHEL 8/9
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=11.0

# Install packages
sudo dnf install -y MariaDB-server MariaDB-backup galera-4

# Verify installation
mariadb --version

After installation, ensure the MariaDB service is stopped before configuration:

sudo systemctl stop mariadb
sudo systemctl disable mariadb

# Verify the service is fully stopped
sudo ps aux | grep -i mariadb

Install additional packages needed for SST operations:

# For rsync-based SST
sudo apt-get install -y rsync

# For mariabackup-based SST and enhanced features
sudo apt-get install -y mariadb-backup socat

# On CentOS/RHEL
sudo dnf install -y rsync socat

Wsrep Configuration

The wsrep (Write-Set Replication) configuration is the core of Galera clustering. Create or modify the MariaDB configuration file to include Galera-specific settings. Create a dedicated Galera configuration file for clarity:

sudo nano /etc/mysql/conf.d/99-galera.cnf

Add the following configuration, adjusting node names and IPs for your environment. Assume we're setting up a three-node cluster with nodes at 192.168.1.10, 192.168.1.11, and 192.168.1.12:

[mysqld]
# Galera cluster name - must be identical across all nodes
wsrep_cluster_name="mariadb-cluster"

# Node identification
wsrep_node_name="node1"
wsrep_node_address="192.168.1.10"

# Galera provider library path (on RHEL-based systems this is typically
# /usr/lib64/galera-4/libgalera_smm.so)
wsrep_provider="/usr/lib/galera/libgalera_smm.so"

# Cluster connection string - all nodes listed
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"

# SST method (rsync or mariabackup)
wsrep_sst_method=mariabackup

# SST authentication - a dedicated user with backup privileges
# (must match the SST user created after bootstrap)
wsrep_sst_auth="sst_user:sst_password"

# Enable Galera replication
wsrep_on=ON

# Replication format - row-based for consistency
binlog_format=ROW

# Enable binary logging for SST and recovery
log_bin=mariadb-bin
log_bin_index=mariadb-bin.index

# Server ID must be unique per node
server_id=1

# Write-set limits: maximum rows and maximum size in bytes per transaction
wsrep_max_ws_rows=131072
wsrep_max_ws_size=2147483648

# Certify transactions on tables without a primary key (default ON)
wsrep_certify_nonPK=ON

# Allow the donor node to keep serving queries during SST
wsrep_sst_donor_rejects_queries=OFF

# Performance and optimization settings
query_cache_size=0
query_cache_type=0
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=2

For the second node, modify the configuration:

sudo nano /etc/mysql/conf.d/99-galera.cnf

Update these parameters for node2 (192.168.1.11):

wsrep_node_name="node2"
wsrep_node_address="192.168.1.11"
server_id=2

And for node3 (192.168.1.12):

wsrep_node_name="node3"
wsrep_node_address="192.168.1.12"
server_id=3
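Since only three values differ between nodes, the per-node overrides above can be generated from a small template. A hypothetical helper (how you merge its output into 99-galera.cnf is up to you):

```shell
#!/bin/bash
# Emit the node-specific part of 99-galera.cnf.
# Usage: galera_node_cnf <index> <ip>   (index is 1-based)
galera_node_cnf() {
  local idx="$1" ip="$2"
  cat <<EOF
wsrep_node_name="node${idx}"
wsrep_node_address="${ip}"
server_id=${idx}
EOF
}

# Example: generate node2's overrides
galera_node_cnf 2 192.168.1.11
```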

Verify the configuration parses cleanly (MariaDB has no --validate-config option; a dry run of the help output will surface syntax errors):

sudo my_print_defaults --mysqld
sudo mariadbd --user=mysql --verbose --help > /dev/null

State Snapshot Transfer Methods

MariaDB Galera supports multiple State Snapshot Transfer (SST) methods. The most common are rsync and mariabackup, each with different characteristics.

Rsync is simpler but slower and locks the donor node during transfer. It's suitable for smaller datasets:

# Rsync SST configuration (donor side)
[sst]
method=rsync
rsync_path="/usr/bin/rsync"

Mariabackup is preferred for production as it performs non-blocking backups on the donor. Note that mariabackup SST authenticates with the credentials in wsrep_sst_auth under [mysqld], not with options in the [sst] section:

# Mariabackup SST configuration
[mysqld]
wsrep_sst_method=mariabackup
# Backup user must have appropriate privileges (see below)
wsrep_sst_auth="sst_user:sst_password"

[mariabackup]
# Parallel copy threads for a faster backup - set to roughly the CPU core count
parallel=4

Create the SST user on the first node after initial bootstrap:

mariadb -u root -p -e "CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'sst_password';"
mariadb -u root -p -e "GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';"
mariadb -u root -p -e "FLUSH PRIVILEGES;"

Note that Galera's rsync SST starts its own rsync daemon on the joiner (port 4444 by default) rather than running rsync over SSH, so no SSH keys are needed. Verify that rsync is installed on every node and that the SST port is open between them:

# Confirm rsync is available
rsync --version

# On node2, open a temporary listener on the SST port
nc -l 4444

# From node1, confirm the port is reachable through the firewall
nc -zv 192.168.1.11 4444

Cluster Bootstrap

Bootstrap the first node to create the initial cluster state. The bootstrap process initializes the cluster UUID and establishes the first node as the reference point for other nodes:

# On node1, bootstrap the cluster with galera_new_cluster
# (a plain "systemctl start" would try to join an existing cluster and fail)
sudo galera_new_cluster

# Verify the service started successfully
sudo systemctl status mariadb

# Check cluster status
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

Expected output should show:

  • wsrep_local_state_uuid: (a UUID value)
  • wsrep_cluster_size: 1
  • wsrep_ready: ON
  • wsrep_connected: ON
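These checks are easy to script. The sketch below reads SHOW STATUS output on stdin and succeeds only when the node reports both ready and connected (it assumes the tab-separated output of the mariadb client):

```shell
#!/bin/bash
# Succeeds (exit 0) when piped "SHOW STATUS LIKE 'wsrep%'" output
# reports wsrep_ready=ON and wsrep_connected=ON.
wsrep_healthy() {
  local input
  input=$(cat)
  echo "$input" | grep -q $'wsrep_ready\tON' &&
  echo "$input" | grep -q $'wsrep_connected\tON'
}

# Usage:
# mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';" | wsrep_healthy && echo OK
```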

If the node fails to start, check the error log (the path varies by distribution; journalctl captures startup errors as well):

sudo journalctl -u mariadb --no-pager | tail -50
sudo tail -50 /var/log/mysql/error.log

Common bootstrap issues include incorrect paths to galera library or permission problems. Verify the galera library exists:

ls -la /usr/lib/galera/libgalera_smm.so

If bootstrap fails with a wsrep error such as "It may not be safe to bootstrap the cluster from this node", mark the node as safe to bootstrap rather than deleting its state file:

# Only if you are certain this node holds the most recent data
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
sudo galera_new_cluster

After successful bootstrap, confirm that the SST user referenced by wsrep_sst_auth exists with the privileges shown earlier. Galera's inter-node replication itself does not authenticate through a MySQL account, so no separate replication user is needed:

mariadb -u root -p -e "SHOW GRANTS FOR 'sst_user'@'localhost';"

Node Recovery and Joining

With the first node running, add the remaining nodes to the cluster. Ensure the configuration on nodes 2 and 3 is complete and correctly references the cluster address.

Start the second node:

# On node2
sudo systemctl start mariadb

# Monitor the startup and SST process
sudo tail -f /var/log/mysql/error.log

# Check progress
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

The node will perform State Snapshot Transfer from node1, which may take several minutes depending on data size. Monitor the process:

# Check SST progress
ps aux | grep -i mariabackup
iostat -x 1 5  # Monitor disk I/O

# Check port 4567 traffic during SST
netstat -tn | grep 4567

Once node2 shows wsrep_cluster_size=2 and wsrep_ready=ON, start node3:

# On node3
sudo systemctl start mariadb

# Verify it joins the cluster
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

# Should eventually show cluster_size=3

Verify all three nodes are communicating:

# Run on any node
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';" | grep -E '(wsrep_cluster_size|wsrep_ready|wsrep_connected)'

All nodes should show cluster_size=3, ready=ON, and connected=ON.
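To check every node's view at once, a loop such as the following can be used (a sketch: it assumes root SSH access to each node and password-less local mariadb access on them; set GALERA_NODES to run the live check):

```shell
#!/bin/bash
# Compare a node's reported cluster size against the expected value.
check_size() {   # check_size <reported> <expected>
  if [ "$1" = "$2" ]; then echo "OK"; else echo "MISMATCH (got $1, want $2)"; fi
}

# Set GALERA_NODES to run the live check, e.g.
#   GALERA_NODES="192.168.1.10 192.168.1.11 192.168.1.12"
for n in $GALERA_NODES; do
  size=$(ssh "root@${n}" \
    "mariadb -N -e \"SHOW STATUS LIKE 'wsrep_cluster_size';\"" | awk '{print $2}')
  echo "${n}: $(check_size "$size" 3)"
done
```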

Monitoring and Health Checks

Implement comprehensive monitoring of your Galera cluster. Create a monitoring script to check cluster health:

#!/bin/bash
# /usr/local/bin/check-galera-health.sh

MYSQL_USER="root"
MYSQL_PASS="$1"

echo "=== Galera Cluster Health Check ==="
echo "Timestamp: $(date)"

mariadb -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep%';" | grep -E '(wsrep_cluster_size|wsrep_ready|wsrep_connected|wsrep_local_state|wsrep_flow_control)' | awk '{print $1": "$2}'

echo ""
echo "=== Replication Lag ==="
mariadb -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW STATUS LIKE 'wsrep%';" | grep wsrep_local_recv_queue

echo ""
echo "=== Open Connections ==="
mariadb -u $MYSQL_USER -p$MYSQL_PASS -e "SHOW PROCESSLIST;" | wc -l

Use the script for quick health checks:

chmod +x /usr/local/bin/check-galera-health.sh
/usr/local/bin/check-galera-health.sh your_root_password
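To run the check periodically, a cron entry can append its output to a log (the path, interval, and password handling are examples; in practice the password is better kept in a credentials file readable only by root):

```
# /etc/cron.d/galera-health - run every 5 minutes as root
*/5 * * * * root /usr/local/bin/check-galera-health.sh 'your_root_password' >> /var/log/galera-health.log 2>&1
```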

Monitor key metrics with a more detailed view:

# Real-time cluster monitoring
watch -n 2 "mariadb -u root -e \"SHOW STATUS LIKE 'wsrep%';\""

# Check for flow control events (indicates replication lag)
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_flow_control%';"

# Check certification failures (conflicting writes across nodes)
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_local_cert_failures';"

Set up MySQL monitoring with Prometheus exporter:

# Install MySQL exporter
sudo apt-get install -y prometheus-mysqld-exporter

# Configure exporter
sudo nano /etc/default/prometheus-mysqld-exporter

# Add credentials
MYSQLD_EXPORTER_PASSWORD="exporter_password"
MYSQLD_EXPORTER_USERNAME="exporter"

# Start exporter
sudo systemctl start prometheus-mysqld-exporter
sudo systemctl enable prometheus-mysqld-exporter

# Verify metrics are available
curl http://localhost:9104/metrics | grep wsrep

Split-Brain Prevention

Split-brain occurs when cluster nodes become isolated into separate partitions. Galera prevents this through quorum-based writes and proper configuration.

Keep Galera's built-in quorum protection enabled. The pc.ignore_sb provider option must remain OFF so that a node that loses quorum refuses writes instead of diverging:

-- Verify on any node (OFF is the safe default)
SET GLOBAL wsrep_provider_options="pc.ignore_sb=OFF";

Monitor for potential split-brain conditions:

# Check cluster status
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_status';"

# If status is "non-Primary", the node is not in majority partition
# Restart or rejoin the cluster appropriately
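This status check is easy to automate for a load balancer health probe. The sketch below reads SHOW STATUS output on stdin and exits non-zero unless the node is in the Primary component:

```shell
#!/bin/bash
# Exit 0 only when the piped status output reports the Primary component.
in_primary() {
  grep -q $'wsrep_cluster_status\tPrimary'
}

# Usage:
# mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_status';" \
#   | in_primary || echo "node is NOT in the primary partition"
```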

Once the network is restored, a node stuck in a non-Primary partition can be told to re-establish cluster membership; re-setting the cluster address forces a reconnect:

-- On a node in the minority partition after connectivity returns
SET GLOBAL wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12";

For true split-brain recovery, you may need manual intervention:

# Check node status
mariadb -u root -e "SHOW PROCESSLIST; SHOW STATUS LIKE 'wsrep%';"

# If node is stuck in "Joining" state
sudo systemctl restart mariadb

# Verify it rejoins with correct UUID
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_local_state_uuid';"

Maintenance Operations

Perform rolling maintenance without downtime by taking nodes offline sequentially. This allows you to apply updates, patches, or configuration changes while the cluster remains operational.

Perform a rolling restart:

# Node 1: desynchronize it so flow control ignores it while it drains
mariadb -u root -e "SET GLOBAL wsrep_desync=ON;"

# Stop node 1
sudo systemctl stop mariadb

# Apply updates, configuration changes, etc.
sudo apt-get update
sudo apt-get install --only-upgrade -y mariadb-server

# Restart node 1
sudo systemctl start mariadb

# Verify it rejoins
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

# Repeat for nodes 2 and 3

Perform backup of the entire cluster consistently:

# Use mariabackup on a lightly loaded node; --galera-info records the
# cluster position alongside the backup
mariadb-backup --backup --galera-info --target-dir=/backup/mariadb-$(date +%Y%m%d) \
  --user=backup_user --password=backup_password

# Prepare the backup
mariadb-backup --prepare --target-dir=/backup/mariadb-$(date +%Y%m%d)

# Verify backup integrity
ls -la /backup/mariadb-$(date +%Y%m%d)/

Handle node failures and recovery:

# If a node crashes, it will automatically recover upon restart
sudo systemctl start mariadb

# Monitor recovery progress
tail -f /var/log/mysql/error.log | grep -i "wsrep\|innodb"

# Verify cluster membership
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

Performance Tuning

Optimize Galera cluster performance for your specific workload. Configure flow control settings to prevent replication lag:

# In /etc/mysql/conf.d/99-galera.cnf

# Parallel applier threads
wsrep_slave_threads=4

# Flow control is tuned through provider options, not standalone variables:
# gcs.fc_limit is the receive-queue length that triggers flow control,
# gcs.fc_factor is the fraction of the limit at which replication resumes
wsrep_provider_options="gcs.fc_limit=256;gcs.fc_factor=0.9"

Tune binary log settings for performance:

binlog_cache_size=32K
binlog_row_image=MINIMAL
wsrep_max_ws_size=2147483648

Optimize InnoDB settings for Galera:

# Roughly 50-70% of physical memory on a dedicated database host, e.g.:
innodb_buffer_pool_size=4G
innodb_log_file_size=500M
innodb_flush_method=O_DIRECT
innodb_autoinc_lock_mode=2
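The buffer pool value can be derived from installed memory. A small helper that suggests half of MemTotal (adjust the fraction for your workload; half is a conservative starting point):

```shell
#!/bin/bash
# Suggest an innodb_buffer_pool_size (in MB) as half of physical RAM.
half_mem_mb() {   # half_mem_mb <MemTotal in kB>
  echo "$(( $1 / 2 / 1024 ))M"
}

mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "innodb_buffer_pool_size=$(half_mem_mb "$mem_kb")"
```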

Monitor and adjust wsrep_max_ws_rows based on workload:

-- Monitor for certification issues
SHOW STATUS LIKE 'wsrep_local_cert_failures';

-- If failures are high, check write-set sizes
SHOW STATUS LIKE 'wsrep_local_bf_aborts';

Test cluster performance:

# Use sysbench to simulate load
sudo apt-get install -y sysbench

# Prepare test tables (the oltp_read_write.lua script provides a prepare step)
sysbench /usr/share/sysbench/oltp_read_write.lua \
  --mysql-user=root \
  --mysql-password=password \
  --mysql-db=sbtest \
  --tables=10 \
  --table-size=100000 prepare

# Run benchmark
sysbench /usr/share/sysbench/oltp_read_write.lua \
  --mysql-user=root \
  --mysql-password=password \
  --mysql-db=sbtest \
  --threads=8 \
  --time=300 run

Conclusion

MariaDB Galera Cluster provides a robust, production-grade solution for highly available database infrastructure. By carefully planning your cluster topology, configuring wsrep settings appropriately, and choosing the right SST method for your environment, you can achieve both high availability and data consistency. Regular monitoring and proper maintenance procedures ensure your cluster continues to perform reliably over time. Remember to test all procedures in a non-production environment first, document your cluster configuration comprehensively, and maintain regular backups independent of cluster replication for additional safety and disaster recovery capabilities.