MariaDB Galera Cluster Configuration

MariaDB Galera Cluster is a synchronous multi-master replication solution that provides high availability and scalability for MariaDB databases. Unlike traditional asynchronous replication, Galera keeps data consistent across all cluster nodes through write-set replication and certification-based conflict resolution. This guide covers the complete installation, configuration, and operation of a production-grade MariaDB Galera Cluster on Linux systems.

Requirements and Planning

Before deploying MariaDB Galera Cluster, make sure the following requirements are in place. You need a minimum of three nodes for a production cluster, preferably distributed across different physical locations or availability zones. Each node requires at least 2 GB of RAM and 10 GB of storage, though production workloads typically need more. All nodes must have low-latency network connectivity to each other, ideally with a round-trip time under 10 ms.

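Make sure you have root or sudo access on all nodes. Run the same MariaDB version on every node, preferably the same minor version. Firewall rules must allow these TCP ports between the nodes: 3306 (client connections, also needed from your application servers or load balancer), 4567 (Galera group communication), 4568 (Incremental State Transfer), and 4444 (State Snapshot Transfer). UDP port 4567 is only needed if you configure Galera to use multicast.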
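
If the nodes use ufw, the rules can be generated rather than typed by hand. The sketch below only prints ufw commands (it runs nothing); it assumes the default Galera ports 3306, 4567, 4568, and 4444, and the function name galera_ufw_rules is hypothetical.

```shell
#!/bin/sh
# Print (do not run) ufw commands that open the default MariaDB/Galera
# TCP ports to each peer address passed as an argument:
# 3306 client, 4567 group communication, 4568 IST, 4444 SST.
galera_ufw_rules() {
  for peer in "$@"; do
    for port in 3306 4567 4568 4444; do
      echo "ufw allow from $peer to any port $port proto tcp"
    done
  done
}

# Example: rules to apply on node1 for its two peers
galera_ufw_rules 192.168.1.11 192.168.1.12
```

Review the output, then run each line with sudo on node1; repeat on the other nodes with their own peer lists, and add separate rules for the clients that need port 3306.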

Document your cluster topology before installation. Decide whether you will use rsync or mariabackup for State Snapshot Transfer (SST), as this affects the initial configuration. Plan for a load balancer or TCP proxy (such as Nginx in stream mode) to distribute client connections across the cluster nodes.

Installation

Begin by updating your system repositories and installing MariaDB Server with Galera support. On Ubuntu or Debian systems, use the official MariaDB repository to make sure you get the correct version with Galera included.

# Ubuntu 20.04/22.04
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=11.0

# Install MariaDB with Galera
sudo apt-get update
sudo apt-get install -y mariadb-server mariadb-backup galera-4

# Verify installation
mariadb --version
mariadbd --version

On CentOS/RHEL systems, configure the MariaDB repository and install:

# CentOS/RHEL 8/9
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=11.0

# Install packages
sudo dnf install -y MariaDB-server MariaDB-backup galera-4

# Verify installation
mariadb --version

After installation, make sure the MariaDB service is stopped before you configure it:

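sudo systemctl stop mariadb
# Keep it from starting at boot until the node is configured
# (re-enable it once the node has joined the cluster)
sudo systemctl disable mariadb

# Verify the service is fully stopped (no output means no server process)
ps aux | grep -i '[m]ariadbd'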

Install the additional packages needed for SST operations:

# For rsync-based SST
sudo apt-get install -y rsync

# For mariabackup-based SST (socat carries the backup stream between nodes)
sudo apt-get install -y mariadb-backup socat

# On CentOS/RHEL
sudo dnf install -y rsync socat
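
A quick way to confirm that the SST tools are present on each node is a small check over the command names. This is a sketch: the function name missing_commands is hypothetical, and it assumes the tools are expected on PATH as rsync, socat, and mariadb-backup.

```shell
#!/bin/sh
# Print each command from the argument list that is not found on PATH.
# Returns 0 when everything is present, 1 when anything is missing.
missing_commands() {
  _rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      _rc=1
    fi
  done
  return "$_rc"
}

# Example: tools used by the rsync and mariabackup SST methods
if ! missing_commands rsync socat mariadb-backup; then
  echo "install the packages for the commands listed above" >&2
fi
```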

Wsrep Configuration

The wsrep (Write-Set Replication) configuration is the core of Galera clustering. Rather than editing the main MariaDB configuration file, create a dedicated file for the Galera-specific settings (on RHEL-based systems the include directory is /etc/my.cnf.d/ instead):

sudo nano /etc/mysql/conf.d/99-galera.cnf

Add the following configuration, adjusting node names and IP addresses for your environment. This example assumes a three-node cluster with nodes at 192.168.1.10, 192.168.1.11, and 192.168.1.12:

[mysqld]
# Galera cluster name - must be identical across all nodes
wsrep_cluster_name="mariadb-cluster"

# Node identification
wsrep_node_name="node1"
wsrep_node_address="192.168.1.10"

# Galera provider library path (Debian/Ubuntu location;
# on RHEL it is usually /usr/lib64/galera-4/libgalera_smm.so)
wsrep_provider="/usr/lib/galera/libgalera_smm.so"

# Cluster connection string - all nodes listed
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"

# SST method (rsync or mariabackup)
wsrep_sst_method=mariabackup

# SST authentication - a dedicated SST user (see the SST section)
wsrep_sst_auth="sst_user:sst_password"

# Enable Galera replication
wsrep_on=ON

# Galera requires row-based replication and InnoDB tables
binlog_format=ROW
default_storage_engine=InnoDB

# Optional: Galera does not need the binary log, but it is useful
# for point-in-time recovery and for asynchronous replicas
log_bin=mariadb-bin
log_bin_index=mariadb-bin.index

# Server ID must be unique per node
server_id=1

# Limits for a single write set: rows (0 = unlimited)
# and size in bytes (2147483647 is the maximum)
wsrep_max_ws_rows=131072
wsrep_max_ws_size=2147483647

# Certify rows in tables that have no primary key (default ON)
wsrep_certify_nonpk=ON

# OFF: the donor does not reject client queries during SST
wsrep_sst_donor_rejects_queries=OFF

# Performance and optimization settings
query_cache_size=0
query_cache_type=0
# Required by Galera: interleaved auto-increment locking
innodb_autoinc_lock_mode=2
# Flush the redo log about once per second instead of at every commit;
# the other nodes also hold each committed write set
innodb_flush_log_at_trx_commit=2

On the second node, create the same file with the same contents:

sudo nano /etc/mysql/conf.d/99-galera.cnf

Then change these parameters for node2 (192.168.1.11):

wsrep_node_name="node2"
wsrep_node_address="192.168.1.11"
server_id=2

And for node3 (192.168.1.12):

wsrep_node_name="node3"
wsrep_node_address="192.168.1.12"
server_id=3
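
Because only these lines differ between nodes, they can be derived from the ordered list of node addresses. The following sketch prints the node-specific settings for node N, numbering nodes from 1 in list order and naming them node1, node2, and so on as in this guide; the function name galera_node_settings is hypothetical.

```shell
#!/bin/sh
# Print the node-specific Galera settings for node number IDX (1-based)
# from an ordered list of node addresses, plus the shared gcomm:// address.
galera_node_settings() {
  idx="$1"; shift
  addr=""
  cluster=""
  i=1
  for ip in "$@"; do
    if [ "$i" -eq "$idx" ]; then
      addr="$ip"
    fi
    cluster="${cluster:+$cluster,}$ip"   # comma-join all addresses
    i=$((i + 1))
  done
  if [ -z "$addr" ]; then
    echo "node index $idx out of range" >&2
    return 1
  fi
  echo "wsrep_node_name=\"node$idx\""
  echo "wsrep_node_address=\"$addr\""
  echo "server_id=$idx"
  echo "wsrep_cluster_address=\"gcomm://$cluster\""
}

# Example: settings for node2 of the three-node cluster in this guide
galera_node_settings 2 192.168.1.10 192.168.1.11 192.168.1.12
```

Paste the printed lines into the [mysqld] section of that node's 99-galera.cnf in place of the node1 values.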

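Review the options the server will read from its configuration files and confirm that the wsrep settings are picked up:

sudo mariadbd --print-defaults | tr ' ' '\n' | grep -i wsrep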
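
An unrecognized option in the [mysqld] section stops the server from starting, and a missing required setting may only surface later. As an extra pre-flight check, the sketch below scans one option file for the settings Galera requires (binlog_format=ROW, innodb_autoinc_lock_mode=2, wsrep_on=ON, and non-empty wsrep_provider and wsrep_cluster_address). It is a simplified, hypothetical helper named galera_lint: it ignores section headers and !include directives and checks a single file only.

```shell
#!/bin/sh
# Check one option file for the settings Galera requires. Prints one line
# per problem and returns 1 if any were found. Simplifications: section
# headers are ignored, only key=value lines count, and '-' in option
# names is treated like '_' (as MariaDB does).
galera_lint() {
  awk '
    BEGIN {
      want["binlog_format"] = "ROW"
      want["innodb_autoinc_lock_mode"] = "2"
      want["wsrep_on"] = "ON"
      need["wsrep_provider"] = 1
      need["wsrep_cluster_address"] = 1
    }
    /^[ \t]*[#;]/ { next }          # skip comment lines
    /=/ {
      key = $0; sub(/=.*/, "", key); gsub(/[ \t]/, "", key); gsub(/-/, "_", key)
      val = $0; sub(/^[^=]*=/, "", val); gsub(/^[ \t"]+|[ \t"]+$/, "", val)
      seen[key] = val
    }
    END {
      bad = 0
      for (k in want) if (toupper(seen[k]) != want[k]) { print k " should be " want[k]; bad = 1 }
      for (k in need) if (seen[k] == "") { print k " is not set"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

For example, galera_lint /etc/mysql/conf.d/99-galera.cnf prints nothing and returns 0 when that file contains all the required settings.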

State Snapshot Transfer Methods

MariaDB Galera supports multiple State Snapshot Transfer (SST) methods. The most common are rsync and mariabackup, each with different characteristics.

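Rsync is simple and usually fast, but it blocks the donor node for writes during the whole transfer, which makes it best suited to smaller datasets. Select it with wsrep_sst_method in the [mysqld] section: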

# Rsync SST: set on every node in 99-galera.cnf
[mysqld]
wsrep_sst_method=rsync

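Mariabackup is preferred for production because it blocks the donor only briefly. The method and its credentials are the wsrep_sst_method and wsrep_sst_auth settings in [mysqld]; options for the SST script itself go in an [sst] section: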

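# Mariabackup SST: method and credentials in [mysqld] (99-galera.cnf)
[mysqld]
wsrep_sst_method=mariabackup
# Dedicated SST user, created below; the donor uses it locally
wsrep_sst_auth="sst_user:sst_password"

# Options for the SST script
[sst]
# Encrypt the SST stream with TLS using these certificates
# (example paths; use your own CA, certificate, and key)
encrypt=4
ssl-ca=/etc/my.cnf.d/certificates/ca.pem
ssl-cert=/etc/my.cnf.d/certificates/server-cert.pem
ssl-key=/etc/my.cnf.d/certificates/server-key.pem

# Options read by mariabackup: number of parallel copy threads
# (for example, the number of CPU cores)
[mariabackup]
parallel=4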

Create the SST user on the first node once it has been bootstrapped (see Cluster Bootstrap); the account replicates to the other nodes:

mariadb -u root -p -e "CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'sst_password';"
mariadb -u root -p -e "GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';"
mariadb -u root -p -e "FLUSH PRIVILEGES;"

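Check that rsync is installed on every node. The rsync SST method does not use SSH: the joining node starts an rsync daemon on port 4444 and the donor connects to it, so no SSH keys are needed, but that port must be reachable between the nodes:

# On every node
rsync --version

# From node1, once MariaDB is running on node2, confirm its Galera port is reachable
# (port 4444 only listens while an SST is in progress)
nc -zv 192.168.1.11 4567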

Cluster Bootstrap

Bootstrap the first node to create the initial cluster state. Bootstrapping generates the cluster UUID and makes the first node the reference point that the other nodes join. A plain systemctl start would instead try to contact the other, not yet running, nodes listed in wsrep_cluster_address, so use the galera_new_cluster wrapper:

# On node1 only, bootstrap a new cluster
sudo galera_new_cluster

# Verify the service started successfully
sudo systemctl status mariadb

# Check cluster status
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

The output should include:

  • wsrep_local_state_uuid: (a UUID value)
  • wsrep_cluster_size: 1
  • wsrep_cluster_status: Primary
  • wsrep_ready: ON
  • wsrep_connected: ON
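
To check these values in scripts rather than by eye, the tab-separated output of mariadb -N -B can be parsed. The sketch below (the function name wsrep_expect is hypothetical) reads that output on standard input and reports anything that differs from a healthy node in a cluster of the expected size.

```shell
#!/bin/sh
# Read "name<TAB>value" lines, as printed by
#   mariadb -N -B -e "SHOW STATUS LIKE 'wsrep%';"
# and compare them with a healthy node in a cluster of size $1.
# Prints one line per problem; returns 1 if any problem was found.
wsrep_expect() {
  awk -F '\t' -v size="$1" '
    { v[$1] = $2 }
    END {
      bad = 0
      if (v["wsrep_cluster_size"] != size) { print "wsrep_cluster_size=" v["wsrep_cluster_size"] " (expected " size ")"; bad = 1 }
      if (v["wsrep_cluster_status"] != "Primary") { print "wsrep_cluster_status=" v["wsrep_cluster_status"] " (expected Primary)"; bad = 1 }
      if (v["wsrep_ready"] != "ON") { print "wsrep_ready=" v["wsrep_ready"] " (expected ON)"; bad = 1 }
      if (v["wsrep_connected"] != "ON") { print "wsrep_connected=" v["wsrep_connected"] " (expected ON)"; bad = 1 }
      exit bad
    }'
}

# Example (on node1 right after bootstrap):
#   mariadb -u root -N -B -e "SHOW STATUS LIKE 'wsrep%';" | wsrep_expect 1
```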

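If the node fails to start, check the error log. Depending on your configuration, MariaDB writes it to a file or to the systemd journal:

sudo tail -50 /var/log/mysql/error.log
sudo journalctl -u mariadb -n 50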

Common bootstrap issues include an incorrect path to the Galera library or permission problems. Verify that the Galera library exists:

ls -la /usr/lib/galera/libgalera_smm.so
# On RHEL-based systems
ls -la /usr/lib64/galera-4/libgalera_smm.so

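If bootstrap fails with the error "It may not be safe to bootstrap the cluster from this node", the node was not the last one to leave the cluster. Bootstrap only from the node with the most recent data (the highest seqno in grastate.dat), and only after confirming that no other node is running:

# DANGEROUS: only on the most advanced node, with all other nodes stopped
sudo cat /var/lib/mysql/grastate.dat

# Mark this node as safe to bootstrap, then bootstrap again
sudo sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
sudo galera_new_cluster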
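
To decide which node is the most advanced, compare the seqno recorded in each node's grastate.dat. The sketch below (the function name grastate_rank is hypothetical) assumes you have copied each node's file to the machine running it, for example as node1, node2, and node3, and lists them with the highest seqno first. A seqno of -1 means the node did not shut down cleanly; recover its position first (for example with mariadbd --wsrep-recover) before comparing.

```shell
#!/bin/sh
# Print "seqno safe_to_bootstrap path" for each grastate.dat copy given,
# sorted so that the highest seqno (most advanced node) comes first.
grastate_rank() {
  for f in "$@"; do
    awk -v path="$f" '
      $1 == "seqno:" { seqno = $2 }
      $1 == "safe_to_bootstrap:" { safe = $2 }
      END { print seqno, safe, path }
    ' "$f"
  done | sort -k1,1nr
}

# Example:
#   grastate_rank node1 node2 node3
```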

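After a successful bootstrap, create any additional accounts. Galera replicates between nodes without a MySQL replication account: the only account it uses is the one named in wsrep_sst_auth, which the donor uses locally and which needs just the privileges mariabackup requires. A REPLICATION SLAVE account is only needed if you attach asynchronous replicas to the cluster:

mariadb -u root -p << 'EOF'
-- Only needed for asynchronous replicas that replicate from a cluster node
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl_user'@'%';
-- Local SST account, needed only if wsrep_sst_auth names repl_user
CREATE USER 'repl_user'@'localhost' IDENTIFIED BY 'repl_password';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'repl_user'@'localhost';
EOF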

Node Recovery and Joining

With the first node running, add the remaining nodes to the cluster. Make sure the configuration on nodes 2 and 3 is complete and references the same cluster address.

Start the second node:

# On node2
sudo systemctl start mariadb

# Monitor the startup and SST process
sudo tail -f /var/log/mysql/error.log

# Check progress
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

The node will perform a State Snapshot Transfer from node1, which may take several minutes depending on data size. Monitor the process:

# Check SST progress
ps aux | grep -i '[m]ariabackup'
iostat -x 1 5  # Monitor disk I/O

# SST data flows over port 4444; regular Galera traffic uses 4567
ss -tn | grep -E ':(4444|4567)'

Once node2 shows wsrep_cluster_size=2 and wsrep_ready=ON, start node3:

# On node3
sudo systemctl start mariadb

# Verify that it joins the cluster
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';"

# Should eventually show wsrep_cluster_size=3

Verify that all three nodes are communicating:

# Run on any node
mariadb -u root -e "SHOW STATUS LIKE 'wsrep%';" | grep -E '(wsrep_cluster_size|wsrep_ready|wsrep_connected)'

All nodes should show wsrep_cluster_size=3, wsrep_ready=ON, and wsrep_connected=ON. If you disabled the mariadb service during installation, re-enable it on each node now (sudo systemctl enable mariadb) so that it starts at boot.

Monitoring and Health Checks

Implement comprehensive monitoring of your Galera cluster. Create a monitoring script to check cluster health:

#!/bin/bash
# /usr/local/bin/check-galera-health.sh
# Reads credentials from the [client] section of ~/.my.cnf instead of
# taking a password argument, which would be visible in the process list.

status() {
  # -N: no column headers, -B: tab-separated output
  mariadb -N -B -e "SHOW GLOBAL STATUS LIKE '$1';"
}

echo "=== Galera Cluster Health Check ==="
echo "Timestamp: $(date)"

status 'wsrep%' | grep -E '^(wsrep_cluster_size|wsrep_cluster_status|wsrep_ready|wsrep_connected|wsrep_local_state_comment|wsrep_flow_control_paused)[[:space:]]' | awk '{print $1": "$2}'

echo ""
echo "=== Receive Queue (apply backlog) ==="
status 'wsrep_local_recv_queue%' | awk '{print $1": "$2}'

echo ""
echo "=== Connected Clients ==="
status 'Threads_connected' | awk '{print $1": "$2}'

Store the credentials in ~/.my.cnf with restrictive permissions, then run the script for quick health checks:

cat > ~/.my.cnf << 'EOF'
[client]
user=root
password=your_root_password
EOF
chmod 600 ~/.my.cnf

sudo chmod +x /usr/local/bin/check-galera-health.sh
/usr/local/bin/check-galera-health.sh

Monitor key metrics in more detail:

# Real-time cluster monitoring
watch -n 2 'mariadb -u root -e "SHOW STATUS LIKE '"'"'wsrep%'"'"';"'

# Check flow control counters (pauses mean a node cannot keep up)
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_flow_control%';"

# Certification failures (conflicting writes from different nodes)
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_local_cert_failures';"
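
The cumulative counter wsrep_flow_control_paused_ns (nanoseconds spent paused by flow control) is easier to interpret as a fraction of an interval. This sketch computes that fraction from two samples taken a known number of seconds apart; the function name fc_paused_fraction is hypothetical. Values approaching 1.0 mean replication was paused most of the time, so some node cannot keep up.

```shell
#!/bin/sh
# Fraction of an interval that replication spent paused by flow control,
# from two samples of the cumulative wsrep_flow_control_paused_ns counter.
# Arguments: first sample, second sample, seconds between the samples.
fc_paused_fraction() {
  awk -v a="$1" -v b="$2" -v secs="$3" \
    'BEGIN { printf "%.3f\n", (b - a) / (secs * 1000000000) }'
}

# Example against a live node:
#   q="SHOW STATUS LIKE 'wsrep_flow_control_paused_ns';"
#   a=$(mariadb -u root -N -B -e "$q" | cut -f2); sleep 10
#   b=$(mariadb -u root -N -B -e "$q" | cut -f2)
#   fc_paused_fraction "$a" "$b" 10
```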

Set up monitoring with the Prometheus mysqld exporter:

# Install the exporter
sudo apt-get install -y prometheus-mysqld-exporter

# Configure the exporter
sudo nano /etc/default/prometheus-mysqld-exporter

# Add credentials to that file (the supported variable names differ between
# exporter versions; check the comments in the file). The exporter account
# needs the PROCESS, REPLICATION CLIENT, and SELECT privileges.
MYSQLD_EXPORTER_PASSWORD="exporter_password"
MYSQLD_EXPORTER_USERNAME="exporter"

# Start the exporter
sudo systemctl start prometheus-mysqld-exporter
sudo systemctl enable prometheus-mysqld-exporter

# Verify that wsrep metrics are available
curl -s http://localhost:9104/metrics | grep wsrep

Split-Brain Prevention

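Split-brain occurs when cluster nodes become isolated into separate partitions that could each accept writes independently. Galera prevents this with quorum: only a partition that holds a majority of the cluster stays Primary and accepts queries, while nodes in a minority partition refuse them until they rejoin.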
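
The majority rule can be expressed as simple arithmetic. This sketch assumes all nodes have equal weight (the default pc.weight) and that nodes fail or become unreachable rather than leaving gracefully (a clean shutdown shrinks the membership instead). The function names has_quorum and tolerated_failures are hypothetical.

```shell
#!/bin/sh
# Majority rule with equal node weights:
#   has_quorum TOTAL REACHABLE  -> succeeds if REACHABLE is a strict majority
#   tolerated_failures TOTAL    -> nodes that can fail while a majority remains
has_quorum() { [ $(( 2 * $2 )) -gt "$1" ]; }
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 2 3 4 5; do
  echo "$n nodes: survives $(tolerated_failures "$n") simultaneous failure(s)"
done
```

The loop shows why even node counts add little: a four-node cluster tolerates one failure, the same as a three-node cluster, and a two-node cluster tolerates none.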

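Quorum protection is on by default. Make sure it has not been disabled (pc.ignore_sb and pc.ignore_quorum must stay false), and run an odd number of nodes so that one side of a two-way split always holds a majority:

-- Run on any node; check that pc.ignore_sb and pc.ignore_quorum are false
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';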

Monitor for potential split-brain conditions:

# Check cluster status
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_status';"

# If the status is "non-Primary", this node is not in the majority partition
# Restart it or rejoin it to the cluster once connectivity is restored

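A minority partition normally rejoins the cluster on its own once the network is restored. If a node stays non-Primary, you can make it reconnect by resetting its cluster address:

-- On a node stuck in a minority partition, after the network is restored
-- Setting the cluster address makes the node reconnect
SET GLOBAL wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12";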

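For a true split-brain, where no partition kept a majority (for example, when a network failure splits an even number of nodes in half), you need manual intervention:

# Check node status
mariadb -u root -e "SHOW PROCESSLIST; SHOW STATUS LIKE 'wsrep%';"

# If every partition is non-Primary, promote the partition with the most
# recent data (highest wsrep_last_committed) by running this on ONE of its nodes
mariadb -u root -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"

# If a node is stuck in the "Joining" state
sudo systemctl restart mariadb

# Verify that it rejoins with the cluster's state UUID
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_local_state_uuid';"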

Maintenance Operations

Perform rolling maintenance without downtime by taking nodes offline one at a time. This lets you apply updates, patches, or configuration changes while the cluster remains operational.

Perform a rolling restart:

# Node 1: confirm the cluster is healthy before taking this node down
mariadb -u root -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size', 'wsrep_local_state_comment');"

# Stop node 1 (after a clean shutdown it can rejoin with a quick IST
# if the changes it missed are still in the other nodes' gcache)
sudo systemctl stop mariadb

# Apply updates, configuration changes, etc.
sudo apt-get update
sudo apt-get install -y mariadb-server

# Start node 1 again
sudo systemctl start mariadb

# Verify that it rejoins
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

# Wait until node 1 reports Synced, then repeat for nodes 2 and 3
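
Moving on before a restarted node is Synced can take a second node out of service while the first is still catching up. The sketch below polls until wsrep_local_state_comment reports Synced or a number of attempts runs out; the function names wait_for_synced and local_state are hypothetical, and the status command is passed in as an argument so the loop can be tried without a server.

```shell
#!/bin/sh
# Run COMMAND until it prints "Synced", up to ATTEMPTS times, sleeping
# DELAY seconds between attempts. Returns 0 once Synced, 1 on timeout.
# Usage: wait_for_synced ATTEMPTS DELAY COMMAND [ARGS...]
wait_for_synced() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    state=$("$@" 2>/dev/null)
    if [ "$state" = "Synced" ]; then
      echo "node is Synced"
      return 0
    fi
    echo "attempt $i/$attempts: state is '${state:-unknown}'"
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Prints this node's state, e.g. Joining, Joined, Donor/Desynced, Synced
local_state() {
  mariadb -u root -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" | cut -f2
}

# Example: wait up to 10 minutes after restarting a node
#   wait_for_synced 60 10 local_state
```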

Take a consistent backup from one node; every node holds the full dataset:

# Run on one node, preferably one that is not receiving application traffic
BACKUP_DIR=/backup/mariadb-$(date +%Y%m%d)
# --galera-info records the node's cluster position with the backup
mariadb-backup --backup --galera-info --target-dir="$BACKUP_DIR" \
  --user=backup_user --password=backup_password

# Prepare the backup (makes the data files consistent)
mariadb-backup --prepare --target-dir="$BACKUP_DIR"

# Inspect the backup files
ls -la "$BACKUP_DIR"/
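
Listing the directory does not show whether the prepare step completed. mariabackup records the backup state in a checkpoints file, whose backup_type field becomes full-prepared after a successful --prepare of a full backup. The file has traditionally been named xtrabackup_checkpoints; newer releases may call it mariadb_backup_checkpoints, which is an assumption here, so the sketch accepts either. The function name backup_is_prepared is hypothetical.

```shell
#!/bin/sh
# Report whether a mariabackup target directory has been prepared, by
# reading backup_type from its checkpoints file. Returns 0 if prepared,
# 1 if not, 2 if no checkpoints file was found.
backup_is_prepared() {
  for f in "$1/xtrabackup_checkpoints" "$1/mariadb_backup_checkpoints"; do
    if [ -f "$f" ]; then
      btype=$(awk -F ' *= *' '$1 == "backup_type" { print $2 }' "$f")
      echo "backup_type = ${btype:-missing}"
      [ "$btype" = "full-prepared" ]
      return
    fi
  done
  echo "no checkpoints file found in $1" >&2
  return 2
}

# Example:
#   backup_is_prepared /backup/mariadb-$(date +%Y%m%d)
```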

Handle node failures and recovery:

# If a node crashes, it recovers and rejoins the cluster when restarted
sudo systemctl start mariadb

# Monitor recovery progress
tail -f /var/log/mysql/error.log | grep -i "wsrep\|innodb"

# Verify cluster membership
mariadb -u root -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

Performance Tuning

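Optimize Galera cluster performance for your specific workload. Configure parallel appliers and flow control to keep nodes from falling behind:

# In /etc/mysql/conf.d/99-galera.cnf, [mysqld] section
# Parallel applier threads for incoming write sets
wsrep_slave_threads=4
# Flow control (example values): pause replication when a node's receive
# queue exceeds gcs.fc_limit write sets, resume below gcs.fc_limit * gcs.fc_factor
wsrep_provider_options="gcs.fc_limit=256;gcs.fc_factor=0.9"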

Tune binary log and write-set settings:

binlog_cache_size=32K
binlog_row_image=MINIMAL
# Maximum write-set size in bytes (2147483647 is the upper limit)
wsrep_max_ws_size=2147483647

Optimize InnoDB settings for Galera:

# Size to your server: about 50% of RAM on a dedicated node (8G for 16 GB)
innodb_buffer_pool_size=8G
innodb_log_file_size=500M
innodb_flush_method=O_DIRECT
innodb_autoinc_lock_mode=2
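
Rather than hard-coding a buffer pool size, you can derive a starting value from the machine's memory. The sketch below reads MemTotal from /proc/meminfo (Linux) and prints an innodb_buffer_pool_size line for a given percentage; 50 percent is a common starting point for a dedicated server, not a rule, and the function name buffer_pool_size is hypothetical.

```shell
#!/bin/sh
# Print an innodb_buffer_pool_size line for PERCENT of MEMTOTAL_KB
# (kilobytes, as in the MemTotal line of /proc/meminfo), in whole MiB.
# Usage: buffer_pool_size MEMTOTAL_KB PERCENT
buffer_pool_size() {
  echo "innodb_buffer_pool_size=$(( $1 * $2 / 100 / 1024 ))M"
}

# Example for this machine, assuming a dedicated database server
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  buffer_pool_size "$mem_kb" 50
fi
```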

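Monitor certification conflicts and write-set sizes, and adjust wsrep_max_ws_rows to your workload:

-- Local transactions that failed certification (write conflicts)
SHOW STATUS LIKE 'wsrep_local_cert_failures';

-- Local transactions aborted by conflicting replicated ones
SHOW STATUS LIKE 'wsrep_local_bf_aborts';

-- Write sets replicated from this node and their total size in bytes
SHOW STATUS LIKE 'wsrep_replicated%';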
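
A raw failure count is hard to judge without the number of transactions behind it. This sketch turns wsrep_local_cert_failures and wsrep_local_commits into the percentage of locally originated transactions that failed certification; the function name cert_failure_pct is hypothetical, and what counts as too high depends on your workload.

```shell
#!/bin/sh
# Percentage of local transactions that failed certification.
# Arguments: wsrep_local_cert_failures, wsrep_local_commits
cert_failure_pct() {
  awk -v f="$1" -v c="$2" 'BEGIN {
    total = f + c
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", 100 * f / total
  }'
}

# Example against a live node:
#   get() { mariadb -u root -N -B -e "SHOW STATUS LIKE '$1';" | cut -f2; }
#   cert_failure_pct "$(get wsrep_local_cert_failures)" "$(get wsrep_local_commits)"
```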

Test cluster performance:

# Use sysbench to simulate load
sudo apt-get install -y sysbench

# Create the test database (it replicates to every node)
mariadb -u root -p -e "CREATE DATABASE IF NOT EXISTS sbtest;"

# Prepare the test tables
sysbench /usr/share/sysbench/oltp_read_write.lua \
  --mysql-user=root \
  --mysql-password=password \
  --mysql-db=sbtest \
  --tables=10 \
  --table-size=100000 prepare

# Run the benchmark (use the same --tables/--table-size as for prepare)
sysbench /usr/share/sysbench/oltp_read_write.lua \
  --mysql-user=root \
  --mysql-password=password \
  --mysql-db=sbtest \
  --tables=10 \
  --table-size=100000 \
  --threads=8 \
  --time=300 run
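
To compare runs (for example, connecting to a single node versus going through the load balancer), it helps to extract throughput from the report. The sketch below assumes the sysbench 1.x report format, in which the SQL statistics section contains a line such as "transactions: 12006 (40.01 per sec.)"; the function name sysbench_tps is hypothetical.

```shell
#!/bin/sh
# Print the transactions-per-second figure from a sysbench 1.x report on
# stdin, taken from the "(N per sec.)" part of the "transactions:" line.
sysbench_tps() {
  awk '$1 == "transactions:" { v = $3; sub(/^\(/, "", v); print v; exit }'
}

# Example, with run.log holding the output of the benchmark run:
#   sysbench_tps < run.log
```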

Conclusion

MariaDB Galera Cluster provides a robust, production-grade solution for highly available database infrastructure. By carefully planning your cluster topology, configuring the wsrep settings appropriately, and choosing the right SST method for your environment, you can achieve both high availability and data consistency. Regular monitoring and sound maintenance procedures help keep the cluster performing reliably over time. Test all procedures in a non-production environment first, document your cluster configuration thoroughly, and keep regular backups independent of cluster replication for disaster recovery.