GlusterFS Distributed Storage Configuration

GlusterFS is a scalable network filesystem that can grow to several petabytes and serve thousands of clients. Built from commodity hardware, it combines the advantages of NAS with the flexibility of a scale-out architecture. This guide covers GlusterFS installation, configuration of the main volume types, and advanced features including geo-replication and disaster recovery.

Architecture and Concepts

GlusterFS employs a client-server architecture where multiple servers aggregate storage resources into a unified namespace:

  • Brick: Local filesystem directory exported by a GlusterFS server
  • Volume: Logical collection of bricks presented to clients as a single namespace
  • Peer: GlusterFS server that participates in the trusted storage pool
  • Client: System accessing GlusterFS volumes via FUSE or NFS

Three volume types address different requirements:

  • Distributed: Data spread across bricks (scale-out, no redundancy)
  • Replicated: Data copied across multiple bricks (high availability)
  • Dispersed: Erasure coding for space-efficient redundancy
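Capacity planning differs sharply between these layouts. As a rough sketch (brick counts and sizes are hypothetical), usable capacity for each type can be estimated with simple arithmetic:

```shell
# Hypothetical cluster: 6 bricks of 100 GB each
BRICKS=6
BRICK_GB=100
REPLICA=3       # replica count for the replicated layout
DISPERSE=6      # bricks per disperse set
REDUNDANCY=2    # redundancy bricks per disperse set

echo "Distributed: $(( BRICKS * BRICK_GB )) GB usable"
echo "Replicated:  $(( BRICKS * BRICK_GB / REPLICA )) GB usable"
echo "Dispersed:   $(( BRICKS * BRICK_GB * (DISPERSE - REDUNDANCY) / DISPERSE )) GB usable"
```

A dispersed volume keeps (disperse − redundancy)/disperse of the raw capacity, so a 4+2 layout retains two thirds of it, sitting between the distributed and replicated extremes.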

Installation and Setup

Installing GlusterFS Server

Prepare infrastructure on each storage node:

# Add GlusterFS repository (Debian/Ubuntu; adjust to the desired release series)
sudo add-apt-repository ppa:gluster/glusterfs-11
sudo apt-get update

# Install GlusterFS server packages (Debian/Ubuntu)
sudo apt-get install -y glusterfs-server glusterfs-client

# Or with yum (CentOS/RHEL)
sudo yum install -y centos-release-gluster
sudo yum install -y glusterfs-server glusterfs-client

# Start and enable glusterd service
sudo systemctl start glusterd
sudo systemctl enable glusterd

# Verify service status
sudo systemctl status glusterd

# Check installed version
glusterd --version

Preparing Brick Storage

Each server requires dedicated storage devices formatted as bricks:

# List available storage devices
lsblk

# Create physical volumes (repeat for each device)
sudo pvcreate /dev/sdb
sudo pvcreate /dev/sdc

# Create volume group
sudo vgcreate gfs-vg /dev/sdb /dev/sdc

# Create logical volumes
sudo lvcreate -L 100G -n brick1 gfs-vg
sudo lvcreate -L 100G -n brick2 gfs-vg

# Format filesystems (upstream recommends XFS with 512-byte inodes for bricks)
sudo mkfs.xfs -i size=512 /dev/gfs-vg/brick1
sudo mkfs.xfs -i size=512 /dev/gfs-vg/brick2

# Create mount points
sudo mkdir -p /bricks/brick1
sudo mkdir -p /bricks/brick2

# Mount filesystems
sudo mount /dev/gfs-vg/brick1 /bricks/brick1
sudo mount /dev/gfs-vg/brick2 /bricks/brick2

# Add to fstab for persistence
echo '/dev/gfs-vg/brick1 /bricks/brick1 xfs defaults 0 2' | sudo tee -a /etc/fstab
echo '/dev/gfs-vg/brick2 /bricks/brick2 xfs defaults 0 2' | sudo tee -a /etc/fstab

# Verify mounts
df -h /bricks/

Firewall Configuration

GlusterFS requires specific ports:

# UFW firewall (Ubuntu)
sudo ufw allow 24007/tcp      # Glusterd daemon
sudo ufw allow 24008/tcp      # Glusterd (RDMA management)
sudo ufw allow 49152:49251/tcp # Brick services

# firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-service=glusterfs
sudo firewall-cmd --permanent --add-port=24007-24008/tcp
sudo firewall-cmd --permanent --add-port=49152-49251/tcp
sudo firewall-cmd --reload

# Verify open ports
sudo ss -tlnp | grep -E "24007|24008|492"
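Checking local listeners does not prove a peer can get through the firewall. A small sketch for probing a remote port using bash's built-in /dev/tcp (the check_port helper is hypothetical):

```shell
# check_port: report whether a TCP port on a host accepts connections
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example: verify a peer's glusterd port from another node
# check_port node2 24007
```

Run this from each node against every peer before probing the pool; a "closed" result usually points at the firewall rules above.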

Peer Clustering

Establishing Peer Relationships

Create a trusted storage pool by adding peers:

# On node1, add node2 and node3
sudo gluster peer probe node2
sudo gluster peer probe node3

# Verify peer status
sudo gluster peer status

# List peers in pool
sudo gluster pool list

# Get detailed peer information
sudo gluster peer info

# Remove peer from pool (if needed)
sudo gluster peer detach node3

Pool Configuration

# Enable TLS on the I/O path for a volume (requires certificates at
# /etc/ssl/glusterfs.pem, glusterfs.key, and glusterfs.ca on each node)
sudo gluster volume set replicated-vol client.ssl on
sudo gluster volume set replicated-vol server.ssl on
sudo gluster volume set replicated-vol auth.ssl-allow '*'

# Enable TLS for management traffic (create this file on every node,
# then restart glusterd)
sudo touch /var/lib/glusterd/secure-access

# Check current pool configuration
sudo gluster volume info all

Volume Creation and Configuration

Creating Basic Volumes

# Note: a brick can belong to only one volume, so each example below uses
# a dedicated subdirectory on the brick mount. Placing bricks in a
# subdirectory (rather than the mount point itself) also stops GlusterFS
# from writing to the root filesystem if a brick mount is missing.

# Create distributed volume (3 nodes, 1 brick each)
sudo gluster volume create distributed-vol \
  node1:/bricks/brick1/dist \
  node2:/bricks/brick1/dist \
  node3:/bricks/brick1/dist

# Create replicated volume (3-way replication)
sudo gluster volume create replicated-vol replica 3 \
  node1:/bricks/brick1/repl \
  node2:/bricks/brick1/repl \
  node3:/bricks/brick1/repl

# Create dispersed volume (4+2 erasure coding)
sudo gluster volume create dispersed-vol disperse 6 redundancy 2 \
  node1:/bricks/brick1/disp \
  node2:/bricks/brick1/disp \
  node3:/bricks/brick1/disp \
  node1:/bricks/brick2/disp \
  node2:/bricks/brick2/disp \
  node3:/bricks/brick2/disp
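Dispersed volumes rebuild lost fragments from the survivors plus redundancy data. A toy illustration of the idea, using single-redundancy XOR parity (far simpler than the Reed-Solomon coding GlusterFS actually uses):

```shell
# Three data fragments and one parity fragment (values are arbitrary)
a=5; b=9; c=12
parity=$(( a ^ b ^ c ))

# If fragment 'b' is lost, rebuild it from the survivors plus parity
rebuilt_b=$(( a ^ c ^ parity ))
echo "$rebuilt_b"   # prints 9
```

A 4+2 layout generalizes this: any two of the six fragments may be lost and the file is still recoverable.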

# Start volume
sudo gluster volume start distributed-vol
sudo gluster volume start replicated-vol
sudo gluster volume start dispersed-vol

# Verify volume status
sudo gluster volume status
sudo gluster volume info

Volume Mounting

On client systems:

# Install client packages
sudo apt-get install -y glusterfs-client

# Create mount point
sudo mkdir -p /mnt/glusterfs

# Mount via FUSE (preferred); backup-volfile-servers provides fallback
# servers for fetching the volume layout if node1 is down at mount time
sudo mount -t glusterfs -o backup-volfile-servers=node2:node3 node1:/distributed-vol /mnt/glusterfs

# Verify mount
df -h /mnt/glusterfs

# Add to fstab (_netdev delays mounting until the network is up; the
# fsck pass for a network filesystem must be 0)
echo 'node1:/distributed-vol /mnt/glusterfs glusterfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab

Volume Options and Tuning

# Enable performance optimization
sudo gluster volume set distributed-vol performance.write-behind on
sudo gluster volume set distributed-vol performance.quick-read on
sudo gluster volume set distributed-vol performance.readdir-ahead on

# Set rebalance parameters
sudo gluster volume set distributed-vol cluster.min-free-disk 10%

# Configure self-healing (for replicated volumes)
sudo gluster volume set replicated-vol cluster.self-heal-daemon on
sudo gluster volume set replicated-vol cluster.heal-timeout 600

# Apply and verify
sudo gluster volume get distributed-vol all

Distributed Volume Types

Distributed-Replicated Volume

Combines distribution with replication for scalability and redundancy:

# Create 2-way replicated volume distributed across 2 pairs
# (replica 2 is prone to split-brain; prefer replica 3 or an arbiter
# brick in production)
sudo gluster volume create dist-rep-vol replica 2 \
  node1:/bricks/brick1/distrep node2:/bricks/brick1/distrep \
  node3:/bricks/brick1/distrep node1:/bricks/brick2/distrep

# Start and verify
sudo gluster volume start dist-rep-vol
sudo gluster volume status dist-rep-vol

# Mount
sudo mount -t glusterfs node1:/dist-rep-vol /mnt/dist-rep

Distributed-Dispersed Volume

Combines distribution with erasure coding:

# Create two 2+1 dispersed subvolumes distributed across the cluster
sudo gluster volume create dist-disp-vol disperse 3 redundancy 1 \
  node1:/bricks/brick1/distdisp node2:/bricks/brick1/distdisp node3:/bricks/brick1/distdisp \
  node1:/bricks/brick2/distdisp node2:/bricks/brick2/distdisp node3:/bricks/brick2/distdisp

# Start volume
sudo gluster volume start dist-disp-vol

# Mount and test
sudo mount -t glusterfs node1:/dist-disp-vol /mnt/dist-disp

Adding Bricks to Volumes

Scale volumes by adding additional bricks:

# Add brick to distributed volume (assumes a third brick has been
# prepared and mounted at /bricks/brick3 as shown earlier)
sudo gluster volume add-brick distributed-vol node1:/bricks/brick3/dist

# Start rebalance to redistribute data
sudo gluster volume rebalance distributed-vol start

# Monitor rebalance progress
sudo gluster volume rebalance distributed-vol status

# Stop rebalance if needed
sudo gluster volume rebalance distributed-vol stop

# Fix the directory layout without migrating data
sudo gluster volume rebalance distributed-vol fix-layout start

Replication and High Availability

Self-Healing Configuration

Automatic healing restores data consistency in replicated volumes:

# Enable self-healing daemon
sudo gluster volume set replicated-vol cluster.self-heal-daemon on

# Configure healing split-brain resolution
sudo gluster volume set replicated-vol cluster.favorite-child-policy mtime

# Monitor healing status
sudo gluster volume heal replicated-vol info
sudo gluster volume heal replicated-vol info summary

# Manual heal operation
sudo gluster volume heal replicated-vol full

# List files currently in split-brain
sudo gluster volume heal replicated-vol info split-brain

# Resolve a specific split-brain file by keeping the newest copy
# (the path is illustrative)
sudo gluster volume heal replicated-vol split-brain latest-mtime /path/to/file

Quorum Configuration

Prevent split-brain conditions with quorum enforcement:

# Enable server quorum
sudo gluster volume set replicated-vol cluster.server-quorum-type server

# Set quorum ratio (a cluster-wide option, so it is set on "all")
sudo gluster volume set all cluster.server-quorum-ratio 51%

# Monitor quorum status
sudo gluster volume status replicated-vol
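The ratio is a simple percentage test against the pool size. A sketch of the arithmetic, assuming a hypothetical 3-node pool with one node unreachable:

```shell
# Illustrative server-quorum arithmetic (numbers are hypothetical)
TOTAL_SERVERS=3
ACTIVE_SERVERS=2
RATIO=51   # percent

# A node keeps its bricks online only while the reachable fraction of
# the pool meets the configured ratio
if [ $(( ACTIVE_SERVERS * 100 )) -ge $(( TOTAL_SERVERS * RATIO )) ]; then
  echo "quorum met: bricks remain online"
else
  echo "quorum lost: glusterd stops local bricks"
fi
```

With 2 of 3 servers up, 200 ≥ 153 and quorum holds; losing a second server drops it, which is why odd-sized pools tolerate faults more gracefully than even-sized ones.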

Geo-Replication for Disaster Recovery

Setting Up Geo-Replication

Replicate volumes to remote GlusterFS clusters:

# On one primary node, set up passwordless SSH to the secondary node
sudo ssh-keygen
sudo ssh-copy-id root@secondary-node1

# Generate and distribute geo-replication ssh keys across the pool
sudo gluster system:: execute gsec_create

# Create the session (the secondary cluster must already have a volume
# named secondary-vol)
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol create push-pem

# Start the session
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol start

# Verify geo-replication status
sudo gluster volume geo-replication replicated-vol status

# Get detailed status
sudo gluster volume geo-replication replicated-vol status detail

Geo-Replication Monitoring

# Monitor replication synchronization
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol status detail

# Check for failures
sudo gluster volume geo-replication replicated-vol status | grep -i "faulty\|failed"

# View geo-replication logs (one directory per session)
sudo tail -f /var/log/glusterfs/geo-replication/*/*.log

# Pause geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol pause

# Resume geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol resume

# Stop geo-replication
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol stop

Failover and Recovery

# In a disaster scenario, promote the secondary volume to serve clients.
# On the primary cluster (if still reachable), stop the session:
sudo gluster volume geo-replication replicated-vol \
  secondary-node1::secondary-vol stop

# On the secondary cluster, make the volume writable
# (if it had been set read-only while acting as a replication target)
sudo gluster volume set secondary-vol features.read-only off

# Verify secondary data integrity (applies if secondary-vol is replicated)
sudo gluster volume heal secondary-vol full

# Repoint clients at the secondary volume
sudo umount /mnt/glusterfs
sudo mount -t glusterfs secondary-node1:/secondary-vol /mnt/glusterfs

Monitoring and Optimization

Volume Health Monitoring

# Get comprehensive volume status
sudo gluster volume status all

# Monitor individual volume
sudo gluster volume status replicated-vol detail

# List clients connected to each brick
sudo gluster volume status replicated-vol clients

# Monitor cluster topology
sudo gluster pool list

Performance Monitoring

# View volume profile statistics
sudo gluster volume profile replicated-vol start
sudo gluster volume profile replicated-vol info

# Stop profiling
sudo gluster volume profile replicated-vol stop

# Monitor top operations
sudo gluster volume top replicated-vol open
sudo gluster volume top replicated-vol read
sudo gluster volume top replicated-vol write

Troubleshooting and Logs

# Enable debug logging
sudo gluster volume set replicated-vol diagnostics.brick-log-level DEBUG

# View server logs
sudo tail -f /var/log/glusterfs/glusterd.log

# View brick logs
sudo tail -f /var/log/glusterfs/bricks/*.log

# View client logs
sudo tail -f /var/log/glusterfs/mnt*.log

# Check system logs
sudo journalctl -u glusterd -f

Backup Procedures

# Backup GlusterFS configuration
sudo mkdir -p /backup/glusterfs
sudo cp -r /var/lib/glusterd /backup/glusterfs/

# Backup volume data (from mount point)
sudo rsync -av /mnt/glusterfs/ /backup/glusterfs-data/

# Backup brick-internal metadata (.glusterfs directories), preserving paths
sudo rsync -aR /bricks/*/.glusterfs /backup/brick-metadata/

Conclusion

GlusterFS delivers flexible, scalable distributed storage suitable for diverse workload patterns. By mastering volume types—distributed, replicated, and dispersed—you can architect storage solutions matching specific performance and availability requirements. Geo-replication capabilities enable robust disaster recovery strategies, while comprehensive monitoring tools ensure operational visibility. Whether deploying for high-performance computing, cloud storage, or backup infrastructure, GlusterFS's unified namespace and horizontal scalability make it an excellent choice for modern data center environments requiring reliable, distributed storage infrastructure.