GlusterFS Distributed Storage Configuration
GlusterFS is a scalable network filesystem that can grow to several petabytes and serve thousands of clients. Running on commodity hardware, it combines the advantages of NAS with the flexibility of a scale-out architecture. This guide covers GlusterFS installation, configuration of the various volume types, and advanced features including geo-replication and disaster recovery.
Table of Contents
- Architecture and Concepts
- Installation and Setup
- Peer Clustering
- Volume Creation and Configuration
- Distributed Volume Types
- Replication and High Availability
- Geo-Replication for Disaster Recovery
- Monitoring and Optimization
- Conclusion
Architecture and Concepts
GlusterFS employs a client-server architecture where multiple servers aggregate storage resources into a unified namespace:
- Brick: Local filesystem directory exported by a GlusterFS server
- Volume: Logical collection of bricks, presented to clients as a single namespace
- Peer: GlusterFS server that participates in trusted storage pool
- Client: System accessing GlusterFS volumes via FUSE or NFS
Three volume types address different requirements:
- Distributed: Data spread across bricks (scale-out, no redundancy)
- Replicated: Data copied across multiple bricks (high availability)
- Dispersed: Erasure coding for space-efficient redundancy
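The trade-off between the three types is usable capacity versus fault tolerance, and it is worth sanity-checking before building a volume. A rough sketch of the capacity math (the helper names are ours; sizes in GB):

```shell
# Usable capacity per volume type (brick sizes in GB, integer math).

usable_distributed() {  # every brick contributes capacity, no redundancy
  echo $(( $1 * $2 ))   # brick_size * brick_count
}

usable_replicated() {   # capacity divided by the replica count
  echo $(( $1 * $2 / $3 ))  # brick_size * brick_count / replica
}

usable_dispersed() {    # only the data fragments of each set hold user data
  echo $(( $1 * ($2 - $3) ))  # brick_size * (disperse - redundancy)
}

# Six 100 GB bricks:
usable_distributed 100 6      # 600
usable_replicated  100 6 3    # 200
usable_dispersed   100 6 2    # 400 (4 data + 2 redundancy)
```

Dispersed volumes sit between the other two: here they tolerate two brick failures while keeping twice the usable space of 3-way replication.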
Installation and Setup
Installing GlusterFS Server
Prepare infrastructure on each storage node:
# Add GlusterFS repository (Ubuntu; substitute the release series you want)
sudo add-apt-repository ppa:gluster/glusterfs-11
sudo apt-get update
# Or with yum (CentOS/RHEL)
sudo yum install centos-release-gluster
sudo yum update
# Install GlusterFS server packages
sudo apt-get install -y glusterfs-server glusterfs-client
# Start and enable glusterd service
sudo systemctl start glusterd
sudo systemctl enable glusterd
# Verify service status
sudo systemctl status glusterd
# Check installed version
glusterd --version
Preparing Brick Storage
Each server requires dedicated storage devices formatted as bricks:
# List available storage devices
lsblk
# Create physical volumes (repeat for each device)
sudo pvcreate /dev/sdb
sudo pvcreate /dev/sdc
# Create volume group
sudo vgcreate gfs-vg /dev/sdb /dev/sdc
# Create logical volumes
sudo lvcreate -L 100G -n brick1 gfs-vg
sudo lvcreate -L 100G -n brick2 gfs-vg
# Format filesystems (upstream recommends XFS with 512-byte inodes for bricks)
sudo mkfs.xfs -f -i size=512 /dev/gfs-vg/brick1
sudo mkfs.xfs -f -i size=512 /dev/gfs-vg/brick2
# Create mount points
sudo mkdir -p /bricks/brick1
sudo mkdir -p /bricks/brick2
# Mount filesystems
sudo mount /dev/gfs-vg/brick1 /bricks/brick1
sudo mount /dev/gfs-vg/brick2 /bricks/brick2
# Add to fstab for persistence
echo '/dev/gfs-vg/brick1 /bricks/brick1 xfs defaults 0 2' | sudo tee -a /etc/fstab
echo '/dev/gfs-vg/brick2 /bricks/brick2 xfs defaults 0 2' | sudo tee -a /etc/fstab
# Verify mounts
df -h /bricks/
Firewall Configuration
GlusterFS requires specific ports:
# UFW firewall (Ubuntu)
sudo ufw allow 24007/tcp # Glusterd management
sudo ufw allow 24008/tcp # Glusterd RDMA management
sudo ufw allow 49152:49251/tcp # Brick services (one port per brick)
# firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-service=glusterfs
# Or open the ports explicitly if the glusterfs service definition is absent
sudo firewall-cmd --permanent --add-port=24007-24008/tcp
sudo firewall-cmd --permanent --add-port=49152-49251/tcp
sudo firewall-cmd --reload
# Verify open ports
sudo ss -tlnp | grep -E "24007|24008|492"
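The brick port range above is not arbitrary: each brick process listens on its own TCP port, allocated upward from 49152, so the range you open must cover the number of bricks you expect per node. A small sketch of that mapping (the helper name is ours):

```shell
# Each brick process needs one TCP port, allocated upward from 49152.
# brick_port_range <bricks_per_node> prints the inclusive range to open.
brick_port_range() {
  base=49152
  last=$(( base + $1 - 1 ))
  echo "${base}:${last}"
}

brick_port_range 2     # two bricks per node -> 49152:49153
brick_port_range 100   # 49152:49251, the range used above
# The result can feed straight into ufw, e.g.:
# sudo ufw allow "$(brick_port_range 2)/tcp"
```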
Peer Clustering
Establishing Peer Relationships
Create a trusted storage pool by adding peers:
# On node1, add node2 and node3
sudo gluster peer probe node2
sudo gluster peer probe node3
# Verify peer status
sudo gluster peer status
# List peers in pool
sudo gluster pool list
# Get detailed peer information
sudo gluster peer info
# Remove peer from pool (if needed)
sudo gluster peer detach node3
Pool Configuration
# Enable TLS for management and I/O traffic (optional; requires a certificate,
# key, and CA bundle distributed to every node first)
# These are per-volume options, applied once a volume exists:
sudo gluster volume set <volume-name> server.ssl on
sudo gluster volume set <volume-name> client.ssl on
sudo gluster volume set <volume-name> auth.ssl-allow '*'
# Review the configuration of all volumes
sudo gluster volume info all
Volume Creation and Configuration
Creating Basic Volumes
# Create distributed volume (3 nodes, 1 brick each)
sudo gluster volume create distributed-vol \
node1:/bricks/brick1 \
node2:/bricks/brick1 \
node3:/bricks/brick1
# Create replicated volume (3-way replication)
sudo gluster volume create replicated-vol replica 3 \
node1:/bricks/brick1 \
node2:/bricks/brick1 \
node3:/bricks/brick1
# Create dispersed volume (4+2 erasure coding)
sudo gluster volume create dispersed-vol disperse 6 redundancy 2 \
node1:/bricks/brick1 \
node2:/bricks/brick1 \
node3:/bricks/brick1 \
node1:/bricks/brick2 \
node2:/bricks/brick2 \
node3:/bricks/brick2
# Start volume
sudo gluster volume start distributed-vol
sudo gluster volume start replicated-vol
sudo gluster volume start dispersed-vol
# Verify volume status
sudo gluster volume status
sudo gluster volume info
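gluster rejects disperse geometries in which the redundancy count is not strictly less than half the brick count, since such a set could not reliably reconstruct data. A sketch of that validity check (the helper name is ours):

```shell
# Validate a disperse/redundancy combination the way gluster does:
# redundancy > 0 and bricks > 2 * redundancy.
check_disperse() {
  n=$1; r=$2
  if [ "$r" -gt 0 ] && [ "$n" -gt $(( 2 * r )) ]; then
    echo "ok: $(( n - r )) data + $r redundancy per set"
  else
    echo "invalid"
  fi
}

check_disperse 6 2   # ok: 4 data + 2 redundancy per set (the volume above)
check_disperse 3 1   # ok: 2 data + 1 redundancy per set
check_disperse 4 2   # invalid (redundancy must be under half the bricks)
```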
Volume Mounting
On client systems:
# Install client packages
sudo apt-get install -y glusterfs-client
# Create mount point
sudo mkdir -p /mnt/glusterfs
# Mount via FUSE (preferred)
sudo mount -t glusterfs node1:/distributed-vol /mnt/glusterfs
# Verify mount
df -h /mnt/glusterfs
# Add to fstab (_netdev defers mounting until networking is up; fsck pass is 0
# because fsck does not apply to a network filesystem)
echo 'node1:/distributed-vol /mnt/glusterfs glusterfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab
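One caveat with mounting this way: node1 is only contacted to fetch the volume layout at mount time, but if node1 is down the mount itself fails. The FUSE client's `backup-volfile-servers` mount option lists fallbacks; a config sketch using the node names from this guide:

```shell
# Mount with fallback volfile servers so mounting survives node1 being down
sudo mount -t glusterfs \
  -o backup-volfile-servers=node2:node3 \
  node1:/distributed-vol /mnt/glusterfs

# Equivalent fstab entry:
# node1:/distributed-vol /mnt/glusterfs glusterfs defaults,_netdev,backup-volfile-servers=node2:node3 0 0
```

Once mounted, I/O goes directly to all bricks regardless of which server supplied the volfile.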
Volume Options and Tuning
# Enable performance optimization
sudo gluster volume set distributed-vol performance.write-behind on
sudo gluster volume set distributed-vol performance.quick-read on
sudo gluster volume set distributed-vol performance.readdir-ahead on
# Stop placing new files on bricks that fall below 10% free space
sudo gluster volume set distributed-vol cluster.min-free-disk 10%
# Configure self-healing (for replicated volumes)
sudo gluster volume set replicated-vol cluster.self-heal-daemon on
sudo gluster volume set replicated-vol cluster.heal-timeout 600
# Apply and verify
sudo gluster volume get distributed-vol all
Distributed Volume Types
Distributed-Replicated Volume
Combines distribution with replication for scalability and redundancy:
# Create a 2-way replicated volume distributed across 2 pairs
# (replica 2 is prone to split-brain; prefer replica 3 or an arbiter in production)
sudo gluster volume create dist-rep-vol replica 2 \
node1:/bricks/brick1 node2:/bricks/brick1 \
node3:/bricks/brick1 node1:/bricks/brick2
# Start and verify
sudo gluster volume start dist-rep-vol
sudo gluster volume status dist-rep-vol
# Mount
sudo mount -t glusterfs node1:/dist-rep-vol /mnt/dist-rep
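Brick order in the create command is significant: gluster forms each replica set from consecutive bricks, so consecutive bricks must live on different servers or a single node failure takes out a whole set. A sketch that groups a brick list the same way (the helper name is ours):

```shell
# Group bricks into replica sets of size $1, consuming the remaining
# arguments in order, mirroring how `volume create ... replica N`
# pairs consecutive bricks.
replica_sets() {
  n=$1; shift
  set_no=1; i=0; line=""
  for brick in "$@"; do
    line="$line $brick"
    i=$(( i + 1 ))
    if [ "$i" -eq "$n" ]; then
      echo "set $set_no:$line"
      set_no=$(( set_no + 1 )); i=0; line=""
    fi
  done
}

replica_sets 2 \
  node1:/bricks/brick1 node2:/bricks/brick1 \
  node3:/bricks/brick1 node1:/bricks/brick2
# set 1: node1:/bricks/brick1 node2:/bricks/brick1
# set 2: node3:/bricks/brick1 node1:/bricks/brick2
```

Note that each printed set spans two different servers, which is exactly the property to check before running the real create command.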
Distributed-Dispersed Volume
Combines distribution with erasure coding:
# Create dispersed volume across distribution
sudo gluster volume create dist-disp-vol disperse 3 redundancy 1 \
node1:/bricks/brick1 node2:/bricks/brick1 node3:/bricks/brick1 \
node1:/bricks/brick2 node2:/bricks/brick2 node3:/bricks/brick2
# Start volume
sudo gluster volume start dist-disp-vol
# Mount and test
sudo mount -t glusterfs node1:/dist-disp-vol /mnt/dist-disp
Adding Bricks to Volumes
Scale volumes by adding additional bricks:
# Add brick to distributed volume
sudo gluster volume add-brick distributed-vol node1:/bricks/brick3
# Start rebalance to redistribute data
sudo gluster volume rebalance distributed-vol start
# Monitor rebalance progress
sudo gluster volume rebalance distributed-vol status
# Stop rebalance if needed
sudo gluster volume rebalance distributed-vol stop
# Fix the layout without migrating data
sudo gluster volume rebalance distributed-vol fix-layout start
Replication and High Availability
Self-Healing Configuration
Automatic healing restores data consistency in replicated volumes:
# Enable self-healing daemon
sudo gluster volume set replicated-vol cluster.self-heal-daemon on
# Configure healing split-brain resolution
sudo gluster volume set replicated-vol cluster.favorite-child-policy mtime
# Monitor healing status
sudo gluster volume heal replicated-vol info
sudo gluster volume heal replicated-vol info summary
# Manual heal operation
sudo gluster volume heal replicated-vol full
# Check heal split-brain status
sudo gluster volume heal replicated-vol info split-brain
Quorum Configuration
Prevent split-brain conditions with quorum enforcement:
# Enable server quorum
sudo gluster volume set replicated-vol cluster.server-quorum-type server
# Set quorum ratio (a pool-wide option, so it is set on "all")
sudo gluster volume set all cluster.server-quorum-ratio 51%
# Monitor quorum status
sudo gluster volume status replicated-vol
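The ratio translates into a minimum peer count: at 51% on a 3-node pool, at least 2 peers must be up for bricks to keep serving writes. A sketch of that arithmetic (the helper name is ours):

```shell
# Check whether server quorum holds: active peers must be at least
# ratio% of the pool (integer math, rounding the threshold up).
quorum_met() {
  active=$1; total=$2; ratio=$3
  needed=$(( (total * ratio + 99) / 100 ))
  if [ "$active" -ge "$needed" ]; then echo yes; else echo no; fi
}

quorum_met 2 3 51   # yes: 2 of 3 peers is ~66%
quorum_met 1 3 51   # no: 1 of 3 peers is ~33%, bricks would be stopped
```

Anything above 50% guarantees that two partitioned halves of the pool can never both hold quorum, which is the point of the setting.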
Geo-Replication for Disaster Recovery
Setting Up Geo-Replication
Replicate volumes to remote GlusterFS clusters:
# On the primary cluster, generate the geo-replication pem keys
sudo gluster system:: execute gsec_create
# On the secondary cluster, create and start a volume to receive the data
# (shown here as secondary-vol on secondary-node1)
# Create the session; push-pem distributes the keys to the secondary
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol create push-pem
# Start replication
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol start
# Verify geo-replication status
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol status
# Get detailed status
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol status detail
Geo-Replication Monitoring
# Monitor replication synchronization
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol status detail
# Check for failures
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol status | grep -i "faulty\|failed"
# View geo-replication logs (one directory per session)
sudo tail -f /var/log/glusterfs/geo-replication/replicated-vol*/gsyncd.log
# Pause geo-replication
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol pause
# Resume geo-replication
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol resume
# Stop geo-replication
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol stop
Failover and Recovery
# In a disaster, stop the session (if the primary is still reachable)
sudo gluster volume geo-replication replicated-vol \
secondary-node1::secondary-vol stop
# On the secondary cluster, make the volume writable to promote it
sudo gluster volume set secondary-vol features.read-only off
# Verify data integrity (applies if secondary-vol is itself replicated)
sudo gluster volume heal secondary-vol full
# Repoint clients at the promoted volume
sudo umount /mnt/glusterfs
sudo mount -t glusterfs secondary-node1:/secondary-vol /mnt/glusterfs
Monitoring and Optimization
Volume Health Monitoring
# Get comprehensive volume status
sudo gluster volume status all
# Monitor individual volume
sudo gluster volume status replicated-vol detail
# Check an individual brick
sudo gluster volume status replicated-vol node1:/bricks/brick1
# Monitor cluster topology
sudo gluster pool list
Performance Monitoring
# View volume profile statistics
sudo gluster volume profile replicated-vol start
sudo gluster volume profile replicated-vol info
# Stop profiling
sudo gluster volume profile replicated-vol stop
# Monitor top operations
sudo gluster volume top replicated-vol open
sudo gluster volume top replicated-vol read
sudo gluster volume top replicated-vol write
Troubleshooting and Logs
# Enable debug logging
sudo gluster volume set replicated-vol diagnostics.brick-log-level DEBUG
# View server logs
sudo tail -f /var/log/glusterfs/glusterd.log
# View brick logs
sudo tail -f /var/log/glusterfs/bricks/*.log
# View client logs (named after the mount path, slashes become dashes)
sudo tail -f /var/log/glusterfs/mnt-glusterfs.log
# Check system logs
sudo journalctl -u glusterd -f
Backup Procedures
# Backup GlusterFS configuration
sudo mkdir -p /backup/glusterfs
sudo cp -r /var/lib/glusterd /backup/glusterfs/
# Backup volume data (from mount point)
sudo rsync -av /mnt/glusterfs/ /backup/glusterfs-data/
# Backup brick metadata (-R preserves the source paths under /backup)
sudo rsync -aR /bricks/*/.glusterfs /backup/
Conclusion
GlusterFS delivers flexible, scalable distributed storage suitable for diverse workload patterns. By mastering the distributed, replicated, and dispersed volume types, you can architect storage that matches specific performance and availability requirements. Geo-replication enables robust disaster recovery strategies, while the built-in monitoring tools provide operational visibility. Whether deployed for high-performance computing, cloud storage, or backup infrastructure, GlusterFS's unified namespace and horizontal scalability make it a strong choice for modern data centers requiring reliable distributed storage.


