ZFS Installation and Configuration on Linux

ZFS is a modern, advanced filesystem and logical volume manager that provides exceptional reliability, data integrity, and performance characteristics. Originally developed by Sun Microsystems for Solaris, OpenZFS brings full ZFS functionality to Linux with enhanced compatibility and community support. This comprehensive guide covers OpenZFS installation, pool management, advanced features including snapshots and replication, and optimization techniques for production deployments.

Table of Contents

  1. ZFS Concepts and Benefits
  2. OpenZFS Installation
  3. Storage Pool Creation
  4. Dataset Management
  5. Snapshots and Clones
  6. Replication and Backup
  7. Performance Optimization
  8. Pool Maintenance
  9. Conclusion

ZFS Concepts and Benefits

ZFS provides advantages over traditional filesystems through integrated volume management:

  • Copy-on-Write (CoW): Data modification creates new blocks, preserving previous versions
  • Data Integrity: Built-in checksums detect silent data corruption
  • Snapshots: Zero-copy point-in-time filesystem snapshots
  • Compression: Transparent dataset compression (LZ4, ZSTD, gzip)
  • Deduplication: Optional block-level elimination of duplicate data (memory-intensive)
  • Automatic Repair: RAID-like redundancy with repair on read

OpenZFS Installation

Prerequisites and Kernel Requirements

# Check kernel version
uname -r

# Verify kernel is compatible (5.4 or later recommended)
cat /proc/version

# Check that DKMS is available (used to build the kernel module)
which dkms

Installing OpenZFS on Ubuntu

# ZFS is included in Ubuntu's standard repositories (20.04 and later);
# no third-party PPA is required

# Update package lists
sudo apt-get update

# Install ZFS utilities (the kernel module ships with Ubuntu's kernel packages)
sudo apt-get install -y zfsutils-linux

# Load ZFS kernel module
sudo modprobe zfs

# Verify installation
zfs --version
zpool --version

# Check loaded module
lsmod | grep zfs

Installing OpenZFS on CentOS/RHEL

# Install EPEL repository
sudo yum install -y epel-release

# Add the OpenZFS yum repository (check the OpenZFS docs for the current
# zfs-release package version)
sudo yum install -y https://zfsonlinux.org/epel/zfs-release-2-3$(rpm --eval "%{dist}").noarch.rpm

# Install kernel headers and ZFS (the zfs package builds its module via DKMS)
sudo yum install -y kernel-devel zfs

# Load module
sudo modprobe zfs

# Verify installation
zfs --version
zpool --version

Persistent Module Loading

Ensure ZFS module loads at boot:

# Add zfs to modules file
echo "zfs" | sudo tee /etc/modules-load.d/zfs.conf

# Verify module configuration
cat /etc/modules-load.d/zfs.conf

# Load module immediately
sudo modprobe zfs

Storage Pool Creation

Identifying Storage Devices

# List all block devices
lsblk

# Show device details with manufacturer info
sudo hdparm -I /dev/sda | grep -E "Model|Serial"

# Check disk health with SMART
sudo smartctl -a /dev/sda

# List only disk-type devices (partitions and loop devices excluded)
lsblk --list | grep disk

Creating Basic Pools

# Single device pool (not recommended for production)
sudo zpool create tank /dev/sda

# Note: for production pools, prefer persistent device names
# (/dev/disk/by-id/...) over /dev/sdX, which can change between boots

# Mirrored pool (2-way RAID-1)
sudo zpool create tank mirror /dev/sda /dev/sdb

# 3-way mirror for higher redundancy
sudo zpool create tank mirror /dev/sda /dev/sdb /dev/sdc

# Striped mirrors (RAID-10 equivalent)
sudo zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# RAIDZ1 (similar to RAID-5, tolerates 1 drive failure)
sudo zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# RAIDZ2 (similar to RAID-6, tolerates 2 drive failures)
sudo zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

# RAIDZ3 (tolerates 3 drive failures)
sudo zpool create tank raidz3 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

Pool Configuration Options

Create pools with optimizations:

# Pool with advanced options
sudo zpool create \
  -f \
  -o ashift=12 \
  -o autotrim=on \
  tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Explanation of options:
# -f: Force creation even if devices appear in use or carry old labels
# ashift=12: Optimal for 4KB-sector drives (2^12 = 4096 bytes); cannot be changed after creation
# autotrim=on: Automatically trim unused space (for SSDs)

# Verify pool configuration
zpool get all tank
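
Because ashift is fixed at pool creation, it is worth verifying the arithmetic against the drive's reported sector size first. A quick sketch of the mapping:

```shell
#!/bin/bash
# ashift is log2 of the sector size: a pool created with ashift=N
# addresses storage in 2^N-byte blocks.
for ashift in 9 12 13; do
  echo "ashift=$ashift -> $((1 << ashift))-byte sectors"
done

# On a live system, compare against the drive's reported sector sizes
# before creating the pool:
#   lsblk -o NAME,PHY-SEC,LOG-SEC
```

ashift=12 is a safe default for modern drives, including many that report 512-byte logical sectors but use 4 KiB physically.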

Verifying Pool Status

# List all pools
zpool list

# Detailed pool status
zpool status tank

# Monitor pool with continuous updates
watch -n 1 'zpool status tank'

# Check pool capacity
zfs list tank

# View pool iostat
zpool iostat tank 1

Dataset Management

Creating Datasets

Datasets are hierarchical containers within pools:

# Create dataset
sudo zfs create tank/home

# Create nested dataset
sudo zfs create tank/home/alice
sudo zfs create tank/home/bob

# List datasets
zfs list

# View dataset properties
zfs get all tank/home

Dataset Properties

Configure compression, caching, and other options:

# Enable compression (LZ4 is fast and effective)
sudo zfs set compression=lz4 tank/home

# Use ZSTD for higher compression ratio (slower)
sudo zfs set compression=zstd tank/home

# Disable compression if needed
sudo zfs set compression=off tank/data

# Enable deduplication (memory intensive, use carefully)
sudo zfs set dedup=on tank/home

# Set quota to limit dataset size
sudo zfs set quota=100G tank/home/alice

# Set reservation to guarantee space
sudo zfs set reservation=50G tank/home/bob

# Enable synchronous writes (safer but slower)
sudo zfs set sync=always tank/critical

# Disable synchronous writes (faster but less safe)
sudo zfs set sync=disabled tank/bulk

# View current properties
zfs get compression,dedup,quota tank/home
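
The effect of these settings shows up later in the compressratio property, which is simply logical (uncompressed) size divided by physical size on disk. A small helper illustrating the calculation (the byte counts below are made up; on a real dataset read them with `zfs get used,logicalused`):

```shell
#!/bin/bash
# compressratio = logicalused / used, reported to two decimals with an "x" suffix
ratio() {
  awk -v logical="$1" -v physical="$2" 'BEGIN { printf "%.2fx\n", logical / physical }'
}

# Illustrative: 100 GiB of logical data occupying 40 GiB on disk
ratio $((100 * 1024**3)) $((40 * 1024**3))   # -> 2.50x
```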

Mounting Datasets

# Automatic mounting (default)
zfs list -o name,mounted

# Mount specific dataset
sudo zfs mount tank/home

# Mount all datasets
sudo zfs mount -a

# Unmount dataset
sudo zfs unmount tank/home

# Check mount points
df -h | grep tank

# Mount with custom options
sudo zfs set mountpoint=/home/users tank/home
sudo zfs mount tank/home

Snapshots and Clones

Creating and Managing Snapshots

Snapshots capture filesystem state at a point in time with minimal overhead:

# Create snapshot
sudo zfs snapshot tank/home@backup-20240101

# Create recursive snapshots (all datasets under tank/home)
sudo zfs snapshot -r tank/home@backup-20240101

# List snapshots
zfs list -t snapshot

# View snapshot details
zfs list -o name,creation,used tank/home@backup-20240101

# Delete snapshot
sudo zfs destroy tank/home@backup-20240101

# Delete recursive snapshots
sudo zfs destroy -r tank/home@backup-20240101

Snapshot Scheduling with cron

# Create automated daily snapshots
cat > /tmp/zfs-snapshot.sh <<'EOF'
#!/bin/bash
DATE=$(date +%Y%m%d_%H%M%S)
zfs snapshot -r tank@daily_$DATE
# Keep only the last 7 daily snapshots (sorted so the oldest are destroyed first)
zfs list -H -t snapshot -o name | grep "^tank@daily_" | sort | head -n -7 | xargs -r -n1 zfs destroy
EOF

sudo mv /tmp/zfs-snapshot.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/zfs-snapshot.sh

# Add to crontab
(sudo crontab -l 2>/dev/null; echo "0 1 * * * /usr/local/bin/zfs-snapshot.sh") | sudo crontab -
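
The retention pipeline in the script can be sanity-checked offline by feeding it sample snapshot names (the names below are illustrative):

```shell
#!/bin/bash
# Simulate the keep-last-7 policy: with 9 daily snapshots, the 2 oldest
# should be selected for destruction by the sort | head -n -7 pipeline.
snapshots=$(printf 'tank@daily_202401%02d_010000\n' $(seq 1 9))
to_destroy=$(printf '%s\n' "$snapshots" | sort | head -n -7)
printf 'Would destroy:\n%s\n' "$to_destroy"
```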

Creating Clones

Clones are writable copies of snapshots:

# Create clone from snapshot
sudo zfs clone tank/home@backup-20240101 tank/home-restored

# Set different properties on clone
sudo zfs set mountpoint=/mnt/recovered tank/home-restored

# Mount clone
sudo zfs mount tank/home-restored

# List clones (a clone's origin property names its source snapshot)
zfs list -o name,origin

# Optionally promote the clone so it no longer depends on the origin snapshot
sudo zfs promote tank/home-restored

# Delete clone (destroying a clone does NOT destroy its origin snapshot;
# the origin cannot be destroyed while a dependent clone exists)
sudo zfs destroy tank/home-restored

Replication and Backup

Send and Receive Operations

Replicate datasets to other pools or remote systems:

# Create snapshot for replication
sudo zfs snapshot tank/home@backup-20240101

# Send snapshot to file (full backup)
sudo zfs send tank/home@backup-20240101 > /backup/home-20240101.zfs

# Send incremental snapshot (only changes)
sudo zfs send -i tank/home@backup-20240101 tank/home@backup-20240102 > /backup/home-incremental.zfs

# Receive the full snapshot stream into a new dataset
sudo zfs receive backup/home < /backup/home-20240101.zfs

# Receive the incremental stream (the target must already hold the base snapshot)
sudo zfs receive backup/home < /backup/home-incremental.zfs

# Verify received dataset
zfs list backup/home

Remote Replication via SSH

# Replicate to remote system
sudo zfs snapshot tank/home@backup-20240101

# Full send to remote
sudo zfs send tank/home@backup-20240101 | \
  ssh backup-server 'zfs receive backup/home'

# Incremental send to remote
sudo zfs send -i tank/home@backup-20240101 tank/home@backup-20240102 | \
  ssh backup-server 'zfs receive backup/home'

# Verify replication on remote
ssh backup-server 'zfs list backup/home'

Automated Replication Script

sudo tee /usr/local/bin/zfs-replicate.sh > /dev/null <<'EOF'
#!/bin/bash
SOURCE_POOL="tank"
SOURCE_DATASET="home"
REMOTE_HOST="backup-server"
REMOTE_POOL="backup"
DATE=$(date +%Y%m%d_%H%M%S)
SNAPSHOT="${SOURCE_DATASET}@repl_${DATE}"

# Find the most recent replication snapshot BEFORE creating the new one,
# so the incremental send has a valid base snapshot
LAST_SNAP=$(zfs list -H -t snapshot -o name -s creation | \
  grep "^${SOURCE_POOL}/${SOURCE_DATASET}@repl_" | tail -1)

# Create snapshot
zfs snapshot ${SOURCE_POOL}/${SNAPSHOT}

if [ -z "$LAST_SNAP" ]; then
  # Full backup
  zfs send ${SOURCE_POOL}/${SNAPSHOT} | \
    ssh ${REMOTE_HOST} "zfs receive ${REMOTE_POOL}/${SOURCE_DATASET}"
else
  # Incremental backup
  zfs send -i ${LAST_SNAP} ${SOURCE_POOL}/${SNAPSHOT} | \
    ssh ${REMOTE_HOST} "zfs receive ${REMOTE_POOL}/${SOURCE_DATASET}"
fi

echo "Replication of ${SOURCE_POOL}/${SOURCE_DATASET} completed"
EOF

sudo chmod +x /usr/local/bin/zfs-replicate.sh

# Schedule replication
(sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/zfs-replicate.sh") | sudo crontab -

Performance Optimization

ARC (Adaptive Replacement Cache) Tuning

ARC is ZFS's intelligent memory cache:

# Check current ARC usage (size, target, and limits)
grep -wE "^(size|c|c_min|c_max|p)" /proc/spl/kstat/zfs/arcstats

# View ARC metrics more clearly
cat <<'EOF' > /tmp/arc-stats.sh
#!/bin/bash
arc_stats=/proc/spl/kstat/zfs/arcstats
if [ -f "$arc_stats" ]; then
  echo "=== ZFS ARC Statistics ==="
  grep -w "^size" "$arc_stats" | awk '{print "ARC Size:", $3 / 1073741824, "GB"}'
  grep -w "^c" "$arc_stats" | awk '{print "Target Size:", $3 / 1073741824, "GB"}'
  grep -w "^data_size" "$arc_stats" | awk '{print "Data:", $3 / 1073741824, "GB"}'
fi
EOF
chmod +x /tmp/arc-stats.sh
/tmp/arc-stats.sh
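
The single most useful ARC metric is the hit ratio, derived from the hits and misses counters in the same file. A small parser for arcstats-format lines (three columns: name, type, value; the sample counters below are illustrative):

```shell
#!/bin/bash
# Compute the ARC hit ratio from "name type value" lines in the format of
# /proc/spl/kstat/zfs/arcstats.
hit_ratio() {
  awk '$1 == "hits"   { hits = $3 }
       $1 == "misses" { misses = $3 }
       END { printf "%.1f%%\n", 100 * hits / (hits + misses) }'
}

# Sample counters; on a live system run: hit_ratio < /proc/spl/kstat/zfs/arcstats
printf 'hits 4 900\nmisses 4 100\n' | hit_ratio   # -> 90.0%
```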

Limiting ARC Cache

Set maximum ARC size:

# Create ZFS module configuration (applies at next module load)
echo "options zfs zfs_arc_max=8589934592" | sudo tee /etc/modprobe.d/zfs.conf

# 8589934592 bytes = 8 GiB maximum ARC size
# Apply immediately at runtime (zfs_arc_max is writable while loaded)
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# Alternatively, reload the module (all pools must be exported first)
sudo modprobe -r zfs
sudo modprobe zfs

# Verify setting
cat /sys/module/zfs/parameters/zfs_arc_max
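
Since zfs_arc_max takes a raw byte count, the value is easy to mistype; a one-liner to generate it (8 GiB shown, matching the configuration above):

```shell
#!/bin/bash
# Convert GiB to the byte count zfs_arc_max expects.
gib_to_bytes() {
  echo $(( $1 * 1024**3 ))
}

gib_to_bytes 8   # -> 8589934592
```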

Prefetch Tuning

Optimize prefetch for workload type:

# Disable prefetch for random-I/O workloads (1 = disabled)
echo "1" | sudo tee /sys/module/zfs/parameters/zfs_prefetch_disable

# Re-enable prefetch for sequential-read workloads (0 = enabled, the default)
echo "0" | sudo tee /sys/module/zfs/parameters/zfs_prefetch_disable

# Check current setting
cat /sys/module/zfs/parameters/zfs_prefetch_disable

Pool Maintenance

Regular Scrubbing

Scrub detects and repairs data corruption:

# Run pool scrub
sudo zpool scrub tank

# Monitor scrub progress
watch -n 1 'zpool status tank'

# Check scrub statistics
zpool status tank | grep -A 5 "scan:"

# Schedule regular scrubs (monthly) without clobbering existing crontab entries
(sudo crontab -l 2>/dev/null; echo "0 2 1 * * zpool scrub tank") | sudo crontab -

# List scheduled scrub jobs
sudo crontab -l
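
For monitoring, the scrub result can be extracted from the status output in scripts. A sketch that parses a sample scan line (the exact wording varies between OpenZFS versions, so treat the pattern as illustrative):

```shell
#!/bin/bash
# Pull the "scan:" summary line out of `zpool status`-style output.
scan_line() {
  grep '^ *scan:' | sed 's/^ *scan: *//'
}

# Sample output; on a live system: zpool status tank | scan_line
sample='  scan: scrub repaired 0B in 01:23:45 with 0 errors on Sun Jan  7 03:23:45 2024'
echo "$sample" | scan_line
```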

Pool Updates and Upgrades

# Check pool and feature versions
zpool get version tank

# Upgrade pool to enable the latest feature flags
# (one-way: older OpenZFS releases cannot import an upgraded pool)
sudo zpool upgrade tank

# Upgrade all pools
sudo zpool upgrade -a

# Check filesystem version
zfs get version tank/home

# Upgrade filesystem
sudo zfs upgrade tank/home

# Upgrade all filesystems
sudo zfs upgrade -a

Removing and Replacing Devices

# Remove device from mirror
sudo zpool detach tank /dev/sdb

# Replace device (online replacement)
sudo zpool replace tank /dev/sda /dev/sdc

# Resilver status
zpool status tank

# Monitor resilver progress
watch -n 1 'zpool status tank'

Conclusion

ZFS represents the modern standard for reliable, enterprise-grade filesystems. By mastering pool creation, dataset management, and advanced features like snapshots and replication, you establish a robust storage foundation. Proper ARC tuning and regular maintenance ensure optimal performance and data integrity. Whether deploying for backup infrastructure, NAS systems, or critical database storage, ZFS's copy-on-write architecture, checksumming capabilities, and integrated volume management deliver the reliability and performance modern infrastructure demands.