Longhorn Distributed Storage for Kubernetes

Longhorn is a lightweight, cloud-native distributed block storage system for Kubernetes, originally developed by Rancher Labs and now a CNCF project. It provides persistent volumes with built-in replication, snapshots, and backup capabilities, runs entirely within your Kubernetes cluster, and manages block storage across nodes, making it ideal for bare metal and VPS deployments where cloud-provider storage is not available.

Prerequisites

  • Kubernetes 1.21+
  • Each node needs: open-iscsi, nfs-common, util-linux, curl, grep, awk, blkid, lsblk
  • Minimum 2 worker nodes for replication
  • Dedicated disks recommended (though Longhorn can also use a directory on an existing filesystem, /var/lib/longhorn by default)
  • No other storage providers managing the same disks

Check prerequisites on all nodes:

# Install required packages (Ubuntu/Debian)
sudo apt-get install -y open-iscsi nfs-common util-linux curl

# Enable iSCSI
sudo systemctl enable --now iscsid

# For CentOS/Rocky
sudo yum install -y iscsi-initiator-utils nfs-utils
sudo systemctl enable --now iscsid

# Optionally install open-iscsi on every node via a Longhorn-provided DaemonSet
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.0/deploy/prerequisite/longhorn-iscsi-installation.yaml

# Wait for the installer pods (one per node) to complete
kubectl -n default get pods -l app=longhorn-iscsi-installation -w

# For a fuller preflight check, Longhorn's longhornctl CLI provides:
#   longhornctl check preflight
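The package checks above can be bundled into a quick script to run on each node before installing. A minimal sketch (it only verifies binaries exist in PATH; it does not confirm that iscsid is actually running):

```shell
#!/bin/sh
# check-longhorn-node.sh - report Longhorn prerequisite binaries missing on this node

check_bins() {
  # Print the space-separated subset of the given binaries not found in PATH.
  missing=""
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
  done
  printf '%s' "${missing# }"
}

missing=$(check_bins iscsiadm mount.nfs blkid lsblk curl grep awk)
if [ -n "$missing" ]; then
  echo "Missing prerequisite binaries: $missing (install the packages above)" >&2
else
  echo "All prerequisite binaries present."
fi
```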

Installing Longhorn

# Install via Helm (recommended)
helm repo add longhorn https://charts.longhorn.io
helm repo update

helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultReplicaCount=3 \
  --set defaultSettings.storageMinimalAvailablePercentage=15 \
  --set defaultSettings.storageReservedPercentageForDefaultDisk=25 \
  --set persistence.defaultClassReplicaCount=3

# Or using kubectl
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.0/deploy/longhorn.yaml

# Verify all components are running
kubectl -n longhorn-system get pods -w

# Access Longhorn UI
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Open http://localhost:8080
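Port-forwarding works for a quick look; for ongoing access, the UI (which has no authentication of its own) can sit behind an Ingress with basic auth. A sketch assuming the NGINX ingress controller and hypothetical host and auth-secret names:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ui
  namespace: longhorn-system
  annotations:
    # Basic auth via a pre-created secret (hypothetical name "longhorn-auth")
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: longhorn-auth
spec:
  ingressClassName: nginx
  rules:
    - host: longhorn.example.com    # replace with your hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
```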

StorageClass Configuration

Longhorn creates a default StorageClass. Customize for different workload requirements:

# High-replication StorageClass for critical data
cat > storageclass-ha.yaml <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-ha
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"  # 48 hours
  fromBackup: ""
  diskSelector: "ssd"     # Only use SSD-labeled disks
  nodeSelector: "storage"  # Only on storage-labeled nodes
  encrypted: "true"        # Volume encryption; also requires CSI secret parameters referencing a key secret
EOF

kubectl apply -f storageclass-ha.yaml

# Fast single-replica StorageClass for dev/test
cat > storageclass-fast.yaml <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"  # Co-locate data with pod if possible
EOF

kubectl apply -f storageclass-fast.yaml

# Create default disks with tags only on labeled nodes (requires the
# "Create Default Disk on Labeled Nodes" setting to be enabled)
kubectl label node worker-01 node.longhorn.io/create-default-disk=config
kubectl annotate node worker-01 \
  node.longhorn.io/default-disks-config='[{"path":"/mnt/ssd","allowScheduling":true,"tags":["ssd"]}]' \
  node.longhorn.io/default-node-tags='["storage"]'

Creating Persistent Volumes

Deploy a stateful application using Longhorn storage:

# PVC using Longhorn HA StorageClass
cat > postgres-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-ha
  resources:
    requests:
      storage: 50Gi
EOF

kubectl apply -f postgres-pvc.yaml

# StatefulSet with Longhorn storage
cat > postgres-statefulset.yaml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-ha
        resources:
          requests:
            storage: 50Gi
EOF

kubectl apply -f postgres-statefulset.yaml

# Expand a volume (online expansion supported)
kubectl patch pvc postgres-data -n production \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

Backup to S3

Configure Longhorn to backup volumes to S3-compatible storage:

# Create S3 credentials secret
kubectl -n longhorn-system create secret generic aws-s3-secret \
  --from-literal=AWS_ACCESS_KEY_ID=your-access-key \
  --from-literal=AWS_SECRET_ACCESS_KEY=your-secret-key \
  --from-literal=AWS_ENDPOINTS=https://s3.amazonaws.com

# Configure backup target in Longhorn settings
# Via Longhorn UI: Settings > General > Backup Target
# Or via kubectl:
kubectl -n longhorn-system patch settings.longhorn.io backup-target \
  --type merge \
  -p '{"value":"s3://your-bucket@us-east-1/"}'

kubectl -n longhorn-system patch settings.longhorn.io backup-target-credential-secret \
  --type merge \
  -p '{"value":"aws-s3-secret"}'

# Create a recurring backup job
cat > recurring-backup.yaml <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"  # 2 AM daily
  task: backup
  groups:
    - default
  retain: 7
  concurrency: 2
  labels:
    backup: daily
EOF

kubectl apply -f recurring-backup.yaml

# The "default" group automatically covers volumes that have no recurring
# job labels of their own. To attach the job to a specific volume, label it
# (Longhorn volume names look like pvc-<uid>; list them first)
kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system label volumes.longhorn.io <volume-name> \
  recurring-job.longhorn.io/daily-backup=enabled

Volume Snapshots and Restore

# Install Volume Snapshot CRDs (if not present)
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v6.3.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v6.3.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v6.3.3/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

# Create a VolumeSnapshotClass
cat > snapshot-class.yaml <<EOF
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-snapshot
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap
EOF

kubectl apply -f snapshot-class.yaml

# Take a snapshot
cat > snapshot.yaml <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-$(date +%Y%m%d)
  namespace: production
spec:
  volumeSnapshotClassName: longhorn-snapshot
  source:
    persistentVolumeClaimName: postgres-data
EOF

kubectl apply -f snapshot.yaml

# Restore from snapshot
cat > restore-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
  namespace: production
spec:
  storageClassName: longhorn-ha
  dataSource:
    name: postgres-snapshot-20240115  # the snapshot name created above (date suffix will differ)
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
EOF

kubectl apply -f restore-pvc.yaml

Disaster Recovery

Restore a volume from backup to a different cluster:

# On the recovery cluster, configure the same backup target and credential
# secret as above; existing backups then appear in the Longhorn UI

# Restore by provisioning from backup: create a StorageClass whose fromBackup
# parameter carries the backup URL (copy it from the Longhorn UI or API)
cat > dr-restore.yaml <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-dr-restore
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  fromBackup: "s3://your-bucket@us-east-1/?backup=<backup-name>&volume=<volume-name>"
EOF

kubectl apply -f dr-restore.yaml

# Then create a PVC (as in earlier examples) with
# storageClassName: longhorn-dr-restore; the volume is populated from the backup

# Or use Longhorn UI: Backup > Select backup > Restore

# List available backups via API
kubectl -n longhorn-system exec -it ds/longhorn-manager -- \
  curl -s http://longhorn-backend:9500/v1/backupvolumes

Monitoring

# Longhorn exposes Prometheus metrics
# Add ServiceMonitor for Prometheus Operator
cat > longhorn-servicemonitor.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
      - longhorn-system
  endpoints:
    - port: manager
      path: /metrics
      interval: 30s
EOF

kubectl apply -f longhorn-servicemonitor.yaml

# Check volume health
kubectl -n longhorn-system get volumes.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io

# Check disk space
kubectl -n longhorn-system get nodes.longhorn.io -o wide
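Once metrics are scraped, alerting on replica health closes the loop. A sketch for the Prometheus Operator, assuming the `longhorn_volume_robustness` gauge (where 2 means degraded) exposed by recent Longhorn releases:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
  namespace: monitoring
  labels:
    release: prometheus   # must match your Prometheus ruleSelector
spec:
  groups:
    - name: longhorn
      rules:
        - alert: LonghornVolumeDegraded
          # Fires when a volume has been degraded for 10 minutes
          expr: longhorn_volume_robustness == 2
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is degraded"
```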

Troubleshooting

Volume stuck in attaching state:

# Check which node the volume is trying to attach to
kubectl -n longhorn-system get volumes.longhorn.io <volume-name> \
  -o jsonpath='{.spec.nodeID}'

# Check iSCSI daemon on nodes
kubectl -n longhorn-system exec -it ds/longhorn-manager -- iscsiadm -m session

# Restart the instance manager on the affected node (note: this detaches the
# volumes it serves, so stop or drain those workloads first)
kubectl -n longhorn-system delete pod \
  -l longhorn.io/component=instance-manager,longhorn.io/node=<node-name>

Replica degraded:

# View replica details
kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=<volume-name>

# Check disk space on nodes
kubectl -n longhorn-system get nodes.longhorn.io -o json | \
  jq '.items[] | {name: .metadata.name, available: .status.diskStatus}'

Backup failing:

# Verify the backup target is reachable (status carries the error if not)
kubectl -n longhorn-system get backuptargets.longhorn.io -o yaml

# Confirm the credentials secret exists with the expected keys
kubectl -n longhorn-system get secret aws-s3-secret -o jsonpath='{.data}'

# Check manager logs for backup errors
kubectl -n longhorn-system logs ds/longhorn-manager | grep -i backup

Conclusion

Longhorn provides enterprise-grade distributed block storage for Kubernetes without requiring external storage systems, making it perfect for bare metal and VPS clusters. With built-in replication, S3 backup, snapshots, and online volume expansion, it covers the full storage lifecycle. Deploy it alongside monitoring to keep visibility into replica health and disk usage across your nodes.