GlusterFS Distributed Storage Configuration

GlusterFS is a scalable network filesystem that can grow to several petabytes and serve thousands of clients. Running on commodity hardware, it combines the simplicity of NAS with the flexibility of a scale-out architecture. This guide covers GlusterFS installation, configuration of the main volume types, and advanced features including geo-replication and disaster recovery.

Table of Contents

  1. Architecture and Concepts
  2. Installation and Setup
  3. Peer Clustering
  4. Volume Creation and Configuration
  5. Distributed Volume Types
  6. Replication and High Availability
  7. Geo-Replication for Disaster Recovery
  8. Monitoring and Optimization
  9. Conclusion

Architecture and Concepts

GlusterFS employs a client-server architecture where multiple servers aggregate storage resources into a unified namespace:

  • Brick: Local filesystem directory exported by a GlusterFS server
  • Volume: Logical collection of bricks presented to clients as a single namespace
  • Peer: GlusterFS server that participates in trusted storage pool
  • Client: System accessing GlusterFS volumes via FUSE or NFS

Three volume types address different requirements:

  • Distributed: Data spread across bricks (scale-out, no redundancy)
  • Replicated: Data copied across multiple bricks (high availability)
  • Dispersed: Erasure coding for space-efficient redundancy
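
The trade-off between the three types is usable capacity versus fault tolerance. A rough illustration with hypothetical numbers (six 100 GB bricks):

```shell
# Hypothetical sizing: six bricks of 100 GB each
BRICKS=6 BRICK_GB=100
echo "Distributed:      $((BRICKS * BRICK_GB)) GB usable, no brick may fail"
echo "Replicated (x3):  $((BRICKS * BRICK_GB / 3)) GB usable, 2 of every 3 bricks may fail"
echo "Dispersed (4+2):  $((BRICKS * BRICK_GB * 4 / 6)) GB usable, any 2 bricks may fail"
```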

Installation and Setup

Installing GlusterFS Server

Prepare infrastructure on each storage node:

# Add GlusterFS repository
sudo add-apt-repository ppa:glusterfs-maintainers/glusterfs-latest
sudo apt-get update

# Or with yum (CentOS/RHEL)
sudo yum install centos-release-gluster
sudo yum update

# Install GlusterFS server packages (Debian/Ubuntu; on CentOS/RHEL
# use: sudo yum install -y glusterfs-server)
sudo apt-get install -y glusterfs-server glusterfs-client

# Start and enable glusterd service
sudo systemctl start glusterd
sudo systemctl enable glusterd

# Verify service status
sudo systemctl status glusterd

# Check installed version
glusterd --version

Preparing Brick Storage

Each server requires dedicated storage devices formatted as bricks:

# List available storage devices
lsblk

# Create physical volumes (repeat for each device)
sudo pvcreate /dev/sdb
sudo pvcreate /dev/sdc

# Create volume group
sudo vgcreate gfs-vg /dev/sdb /dev/sdc

# Create logical volumes
sudo lvcreate -L 100G -n brick1 gfs-vg
sudo lvcreate -L 100G -n brick2 gfs-vg

# Format filesystems
sudo mkfs.ext4 /dev/gfs-vg/brick1
sudo mkfs.ext4 /dev/gfs-vg/brick2

# Create mount points
sudo mkdir -p /bricks/brick1
sudo mkdir -p /bricks/brick2

# Mount filesystems
sudo mount /dev/gfs-vg/brick1 /bricks/brick1
sudo mount /dev/gfs-vg/brick2 /bricks/brick2

# Add to fstab for persistence
echo '/dev/gfs-vg/brick1 /bricks/brick1 ext4 defaults 0 2' | sudo tee -a /etc/fstab
echo '/dev/gfs-vg/brick2 /bricks/brick2 ext4 defaults 0 2' | sudo tee -a /etc/fstab

# Verify mounts
df -h /bricks/

Firewall Configuration

GlusterFS requires specific ports:

# UFW firewall (Ubuntu)
sudo ufw allow 24007/tcp      # Glusterd daemon
sudo ufw allow 24008/tcp      # Glusterd
sudo ufw allow 49152:49251/tcp # Brick services

# firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-service=glusterfs
sudo firewall-cmd --permanent --add-port=24007-24008/tcp
sudo firewall-cmd --permanent --add-port=49152-49251/tcp
sudo firewall-cmd --reload

# Verify open ports
sudo ss -tlnp | grep -E "24007|24008|492"

Peer Clustering

Establishing Peer Relationships

Create a trusted storage pool by adding peers:

# On node1, add node2 and node3
sudo gluster peer probe node2
sudo gluster peer probe node3

# Verify peer status
sudo gluster peer status

# List peers in pool
sudo gluster pool list

# Get detailed peer information
sudo gluster peer info

# Remove peer from pool (if needed)
sudo gluster peer detach node3

Pool Configuration

# Enable SSL for cluster communication (optional);
# requires certificates generated and distributed to every node first
sudo gluster volume set all auth.ssl-allow '*'
sudo gluster volume set all server.ssl on

# Check current pool configuration
sudo gluster volume info all
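
Certificate generation can be sketched as follows. This is a minimal self-signed setup; the paths are the defaults glusterd looks for, and the CN value is a placeholder for each node's resolvable hostname:

```shell
# On each node: generate a key and self-signed certificate
# (CN is a placeholder; use the node's own hostname)
openssl genrsa -out /etc/ssl/glusterfs.key 2048
openssl req -new -x509 -key /etc/ssl/glusterfs.key \
  -subj "/CN=node1" -days 365 -out /etc/ssl/glusterfs.pem

# Concatenate every node's certificate into a shared CA bundle and
# distribute it to all nodes as /etc/ssl/glusterfs.ca
cat node1.pem node2.pem node3.pem > /etc/ssl/glusterfs.ca

# Enable TLS on the management path (on every node), then restart glusterd
touch /var/lib/glusterd/secure-access
systemctl restart glusterd
```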

Volume Creation and Configuration

Creating Basic Volumes

# Create distributed volume (3 nodes, 1 brick each)
# Note: a brick directory can belong to only one volume, so each volume
# below uses its own subdirectory under the brick mount
sudo gluster volume create distributed-vol \
  node1:/bricks/brick1/dist \
  node2:/bricks/brick1/dist \
  node3:/bricks/brick1/dist

# Create replicated volume (3-way replication)
sudo gluster volume create replicated-vol replica 3 \
  node1:/bricks/brick1/rep \
  node2:/bricks/brick1/rep \
  node3:/bricks/brick1/rep

# Create dispersed volume (4+2 erasure coding; 'force' is needed because
# each node hosts two bricks of the same disperse set)
sudo gluster volume create dispersed-vol disperse 6 redundancy 2 \
  node1:/bricks/brick1/disp \
  node2:/bricks/brick1/disp \
  node3:/bricks/brick1/disp \
  node1:/bricks/brick2/disp \
  node2:/bricks/brick2/disp \
  node3:/bricks/brick2/disp force

# Start volumes
sudo gluster volume start distributed-vol
sudo gluster volume start replicated-vol
sudo gluster volume start dispersed-vol

# Verify volume status
sudo gluster volume status
sudo gluster volume info

# Verify volume status
sudo gluster volume status
sudo gluster volume info

Volume Mounting

On client systems:

# Install client packages
sudo apt-get install -y glusterfs-client

# Create mount point
sudo mkdir -p /mnt/glusterfs

# Mount via FUSE (preferred)
sudo mount -t glusterfs node1:/distributed-vol /mnt/glusterfs

# Verify mount
df -h /mnt/glusterfs

# Add to fstab
echo 'node1:/distributed-vol /mnt/glusterfs glusterfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab
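
The server named in the mount command is only contacted to fetch the volume layout; afterward the client talks to all bricks directly. To avoid a single point of failure at mount time, the FUSE client accepts a backup-volfile-servers option (node names here follow the examples above):

```shell
# Mount with fallback volfile servers
sudo mount -t glusterfs \
  -o backup-volfile-servers=node2:node3 \
  node1:/distributed-vol /mnt/glusterfs

# Equivalent fstab entry
echo 'node1:/distributed-vol /mnt/glusterfs glusterfs defaults,_netdev,backup-volfile-servers=node2:node3 0 0' | sudo tee -a /etc/fstab
```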

Volume Options and Tuning

# Enable performance optimization
sudo gluster volume set distributed-vol performance.write-behind on
sudo gluster volume set distributed-vol performance.quick-read on
sudo gluster volume set distributed-vol performance.readdir-ahead on

# Set rebalance parameters
sudo gluster volume set distributed-vol cluster.min-free-disk 10%

# Configure self-healing (for replicated volumes)
sudo gluster volume set replicated-vol cluster.self-heal-daemon on
sudo gluster volume set replicated-vol cluster.heal-timeout 600

# Apply and verify
sudo gluster volume get distributed-vol all

Distributed Volume Types

Distributed-Replicated Volume

Combines distribution with replication for scalability and redundancy:

# Create a distributed-replicated volume: 2 replica pairs (replica 2 is
# prone to split-brain; prefer replica 3 or an arbiter in production)
sudo gluster volume create dist-rep-vol replica 2 \
  node1:/bricks/brick1/distrep node2:/bricks/brick1/distrep \
  node3:/bricks/brick1/distrep node1:/bricks/brick2/distrep

# Start and verify
sudo gluster volume start dist-rep-vol
sudo gluster volume status dist-rep-vol

# Mount
sudo mount -t glusterfs node1:/dist-rep-vol /mnt/dist-rep

Distributed-Dispersed Volume

Combines distribution with erasure coding:

# Create a distributed-dispersed volume: two 2+1 disperse sets
sudo gluster volume create dist-disp-vol disperse 3 redundancy 1 \
  node1:/bricks/brick1/distdisp node2:/bricks/brick1/distdisp node3:/bricks/brick1/distdisp \
  node1:/bricks/brick2/distdisp node2:/bricks/brick2/distdisp node3:/bricks/brick2/distdisp

# Start volume
sudo gluster volume start dist-disp-vol

# Mount and test
sudo mount -t glusterfs node1:/dist-disp-vol /mnt/dist-disp

Adding Bricks to Volumes

Scale volumes by adding additional bricks:

# Add a brick to the distributed volume
# (prepare and mount /bricks/brick3 first, as in the brick setup above)
sudo gluster volume add-brick distributed-vol node1:/bricks/brick3

# Start rebalance to redistribute data
sudo gluster volume rebalance distributed-vol start

# Monitor rebalance progress
sudo gluster volume rebalance distributed-vol status

# Stop rebalance if needed
sudo gluster volume rebalance distributed-vol stop

# Fix layout only (rehash directories without migrating existing data)
sudo gluster volume rebalance distributed-vol fix-layout start
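
Shrinking works symmetrically: remove-brick migrates data off the departing brick before it is dropped from the volume (brick path follows the example above):

```shell
# Begin migrating data off the brick
sudo gluster volume remove-brick distributed-vol node1:/bricks/brick3 start

# Watch migration progress
sudo gluster volume remove-brick distributed-vol node1:/bricks/brick3 status

# Finalize once status shows completed
sudo gluster volume remove-brick distributed-vol node1:/bricks/brick3 commit
```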

Replication and High Availability

Self-Healing Configuration

Automatic healing restores data consistency in replicated volumes:

# Enable self-healing daemon
sudo gluster volume set replicated-vol cluster.self-heal-daemon on

# Configure healing split-brain resolution
sudo gluster volume set replicated-vol cluster.favorite-child-policy mtime

# Monitor healing status
sudo gluster volume heal replicated-vol info
sudo gluster volume heal replicated-vol info summary

# Manual heal operation
sudo gluster volume heal replicated-vol full

# Check heal split-brain status
sudo gluster volume heal replicated-vol info split-brain
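
Files listed as split-brain can be resolved from the CLI by picking a winning copy; the file and brick paths below are placeholders:

```shell
# Keep the copy with the most recent modification time
sudo gluster volume heal replicated-vol split-brain latest-mtime /path/to/file

# Or keep the larger copy
sudo gluster volume heal replicated-vol split-brain bigger-file /path/to/file

# Or declare one brick's copy authoritative
sudo gluster volume heal replicated-vol split-brain source-brick node1:/bricks/brick1 /path/to/file
```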

Quorum Configuration

Prevent split-brain conditions with quorum enforcement:

# Enable server quorum
sudo gluster volume set replicated-vol cluster.server-quorum-type server

# Set the quorum ratio (a pool-wide option, so it is set on 'all')
sudo gluster volume set all cluster.server-quorum-ratio 51%

# Monitor quorum status
sudo gluster volume status replicated-vol
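
To see what a given ratio means in practice: the minimum number of live peers is the smallest count whose share of the pool meets the ratio. A quick sketch, using the 51% ratio from above:

```shell
# Smallest number of live peers with peers/total >= RATIO percent
NODES=3 RATIO=51
echo "Quorum holds with $(( (NODES * RATIO + 99) / 100 )) of $NODES peers up"
```

With three nodes, 51% means two peers must stay up; bricks on a partitioned minority node are taken offline rather than risk split-brain.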

Geo-Replication for Disaster Recovery

Setting Up Geo-Replication

Replicate volumes to remote GlusterFS clusters:

# On the primary cluster, generate the shared pem keys for geo-replication
sudo gluster system:: execute gsec_create

# On the secondary cluster, create a volume to receive the replication
# (assumed here to be named secondary-vol on secondary-node1)

# Create the geo-replication session (pushes the pem keys to the secondary)
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol create push-pem

# Start the session
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol start

# Verify geo-replication status
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol status

# Get detailed status
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol status detail

Geo-Replication Monitoring

# Monitor replication synchronization
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol status detail

# Check for failures
sudo gluster volume geo-replication replicated-vol status detail | grep -i "faulty\|failed"

# View geo-replication logs
sudo tail -f /var/log/glusterfs/geo-replication/replicated-vol*/*.log

# Pause geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol pause

# Resume geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol resume

# Stop geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol stop

Failover and Recovery

# In a disaster scenario, promote the secondary to primary.
# On the primary cluster (if still reachable), stop the session:
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol stop

# On the secondary cluster, make the volume writable
sudo gluster volume set secondary-vol features.read-only off

# Trigger a full heal to verify secondary data consistency
sudo gluster volume heal secondary-vol full

# Repoint clients at the recovered volume
sudo umount /mnt/glusterfs
sudo mount -t glusterfs secondary-node1:/secondary-vol /mnt/glusterfs

Monitoring and Optimization

Volume Health Monitoring

# Get comprehensive volume status
sudo gluster volume status all

# Monitor individual volume
sudo gluster volume status replicated-vol detail

# Check an individual brick (use a brick path from 'gluster volume info')
sudo gluster volume status replicated-vol node1:/bricks/brick1/rep

# Monitor cluster topology
sudo gluster pool list

Performance Monitoring

# View volume profile statistics
sudo gluster volume profile replicated-vol start
sudo gluster volume profile replicated-vol info

# Stop profiling
sudo gluster volume profile replicated-vol stop

# Monitor top operations
sudo gluster volume top replicated-vol open
sudo gluster volume top replicated-vol read
sudo gluster volume top replicated-vol write

Troubleshooting and Logs

# Enable debug logging
sudo gluster volume set replicated-vol diagnostics.brick-log-level DEBUG

# View server logs
sudo tail -f /var/log/glusterfs/glusterd.log

# View brick logs
sudo tail -f /var/log/glusterfs/bricks/*.log

# View client logs
sudo tail -f /var/log/glusterfs/mnt*.log

# Check system logs
sudo journalctl -u glusterd -f

Backup Procedures

# Backup GlusterFS configuration
sudo mkdir -p /backup/glusterfs
sudo cp -r /var/lib/glusterd /backup/glusterfs/

# Backup volume data (from mount point)
sudo rsync -av /mnt/glusterfs/ /backup/glusterfs-data/

# Back up brick metadata (rsync -R preserves the full source path)
sudo find /bricks -name ".glusterfs" -type d -exec rsync -aR {} /backup/ \;

Conclusion

GlusterFS delivers flexible, scalable distributed storage suitable for diverse workload patterns. By mastering volume types—distributed, replicated, and dispersed—you can architect storage solutions matching specific performance and availability requirements. Geo-replication capabilities enable robust disaster recovery strategies, while comprehensive monitoring tools ensure operational visibility. Whether deploying for high-performance computing, cloud storage, or backup infrastructure, GlusterFS's unified namespace and horizontal scalability make it an excellent choice for modern data center environments requiring reliable, distributed storage infrastructure.