Docker Swarm Cluster Configuration
Docker Swarm is Docker's native orchestration platform that allows you to manage a cluster of Docker hosts as a single virtual system. This guide covers the essential aspects of setting up, configuring, and managing a Docker Swarm cluster on your VPS or bare-metal infrastructure. Whether you're deploying a small 3-node cluster or scaling to dozens of servers, understanding Swarm fundamentals is critical for reliable container orchestration.
Table of Contents
- Prerequisites and Planning
- Initializing the Swarm
- Adding Nodes to the Cluster
- Deploying Services in Swarm
- Understanding Overlay Networks
- Managing Stacks and Compose Files
- Rolling Updates and Service Scaling
- Draining and Removing Nodes
- Monitoring and Troubleshooting
- Conclusion
Prerequisites and Planning
Before initializing a Docker Swarm cluster, ensure all nodes meet the following requirements:
- Docker Engine 1.12 or later installed on each node
- Port 2377/tcp open between all nodes (cluster management)
- Port 7946/tcp and 7946/udp open between all nodes (node communication)
- Port 4789/udp open for overlay network traffic
- All nodes able to communicate on a private network
- Consistent system time across all nodes (NTP recommended)
- At least 2GB RAM per node for stable operation
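If ufw manages your firewall, the port list above can be turned into rules. A minimal sketch, assuming Ubuntu's ufw and a private subnet of 192.168.1.0/24 (both are assumptions; adjust to your environment):

```shell
# Print ufw rules for the Swarm ports; SUBNET is an assumed example value
SUBNET=192.168.1.0/24
for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
  port=${rule%/*}    # "2377" from "2377/tcp"
  proto=${rule#*/}   # "tcp" from "2377/tcp"
  echo "ufw allow proto ${proto} from ${SUBNET} to any port ${port}"
done
```

Review the printed commands, then run them with sudo (or pipe the output through `sudo sh`) on each node.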
Verify each node before joining it to the cluster:
# Check Docker version and ensure compatibility
docker --version
# Check current kernel and cgroup settings
uname -r
cat /proc/cgroups | head
# Verify network connectivity between planned nodes
ping -c 4 <other-node-ip>
# Test DNS resolution for node hostnames
nslookup <node-hostname>
Plan your Swarm architecture with the following considerations:
- Manager nodes: minimum 1; use an odd number (3 recommended) so Raft keeps quorum during failures
- Worker nodes: Scale based on workload requirements
- Distribution: Spread across physical hosts when possible
- Backup strategy: Regular snapshots of manager state
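The manager count can be sanity-checked with Raft's quorum arithmetic: a cluster of N managers needs floor(N/2)+1 of them reachable, and therefore tolerates floor((N-1)/2) failures, which is why odd numbers are recommended:

```shell
# Raft quorum math: N managers tolerate floor((N-1)/2) failures
for n in 1 3 5 7; do
  echo "$n managers: quorum $(( n / 2 + 1 )), tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note that going from 1 to 2 managers adds no fault tolerance (quorum becomes 2 of 2), so grow the manager set in odd steps.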
Initializing the Swarm
Initialize Swarm on your first manager node. This node becomes the leader and generates certificate material for secure communication.
# Initialize Docker Swarm on the first manager node
docker swarm init
# Specify advertise address if multiple network interfaces exist
docker swarm init --advertise-addr 192.168.1.10
# Enable autolock so Raft logs are encrypted at rest (recommended)
docker swarm init --advertise-addr 192.168.1.10 --autolock
# View current Swarm status
docker info | grep -A 5 Swarm
# Get the cluster information
docker node ls
When you initialize Swarm, Docker creates:
- A unique cluster ID and TLS certificates
- Two join tokens: one for managers, one for workers
- A root certificate authority on the first manager
- A Raft store that persists the cluster state
Verify the initialization was successful:
# Check Swarm status
docker info --format '{{.Swarm.LocalNodeState}}'
# List all nodes (only one initially)
docker node ls
# View manager status
docker node inspect --pretty self
# Confirm the Docker daemon (which hosts Swarm mode) is active
systemctl status docker
Adding Nodes to the Cluster
To scale your cluster, add additional manager or worker nodes. Always use the appropriate token for the node type.
Obtain the join tokens:
# Get the worker join token
docker swarm join-token worker
# Get the manager join token
docker swarm join-token manager
# Rotate tokens for security
docker swarm join-token --rotate worker
# Rotate manager token
docker swarm join-token --rotate manager
Add a worker node to the cluster:
# On the new worker node, execute the join command
docker swarm join \
--token SWMTKN-1-5k9w2r7y1z3q4p9m8n7o6l5k4j3h2g1f \
192.168.1.10:2377
# Verify the join succeeded on the worker
docker info --format '{{.Swarm.LocalNodeState}}'
Add manager nodes for high availability:
# On the new manager node, join using manager token
docker swarm join \
--token SWMTKN-1-0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z \
192.168.1.10:2377
# On any existing manager, verify the new manager joined
docker node ls
Configure node labels for workload placement:
# Add labels to nodes (useful for constraining services)
docker node update --label-add environment=production node1
# Add multiple labels
docker node update --label-add tier=web --label-add region=us-east node2
# View node details including labels
docker node inspect node1
# Remove a label
docker node update --label-rm environment node1
Set node availability:
# Drain a node (prevent new tasks, reschedule existing)
docker node update --availability drain node-name
# Make a node active again
docker node update --availability active node-name
# Pause a node (no new tasks, but existing continue)
docker node update --availability pause node-name
# Check node status
docker node ls
Deploying Services in Swarm
Services are the primary way to run containers in Swarm. A service defines the desired container image, replicas, port mappings, and resource constraints.
Create a simple service:
# Deploy a service with 3 replicas
docker service create \
--name web \
--replicas 3 \
--publish 80:80 \
nginx:latest
# Verify service creation
docker service ls
# Check service details
docker service inspect web
# View service tasks (running containers)
docker service ps web
Configure service constraints and placement:
# Deploy service only on production nodes
docker service create \
--name db \
--constraint node.labels.environment==production \
--replicas 1 \
postgres:15
# Use spread placement strategy
docker service create \
--name api \
--placement-pref spread=node.hostname \
--replicas 3 \
myapp:latest
# Deploy on specific node
docker service create \
--name cache \
--constraint node.hostname==cache-node-01 \
redis:7-alpine
Configure resource limits and reservations:
# Limit memory and CPU
docker service create \
--name web \
--limit-memory 512m \
--limit-cpu 0.5 \
--reserve-memory 256m \
--reserve-cpu 0.25 \
--replicas 2 \
nginx:latest
# View container resource usage on the local node
docker stats
Configure environment variables and secrets:
# Pass environment variables
docker service create \
--name app \
--env DATABASE_URL=postgres://db:5432/myapp \
--env LOG_LEVEL=info \
--replicas 2 \
myapp:latest
# Update service environment
docker service update \
--env-add NEW_VAR=value \
app
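For sensitive values like database passwords, Swarm secrets are safer than environment variables. A minimal stack-file sketch, assuming a secret named db_password was created beforehand and that the image reads a *_FILE variable (the secret name, variable, and image are illustrative):

```yaml
# Hypothetical stack fragment: mounts the secret at /run/secrets/db_password
services:
  app:
    image: myapp:latest
    secrets:
      - db_password
    environment:
      # Many images accept a *_FILE variable pointing at the secret file
      DATABASE_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true   # created beforehand with: docker secret create db_password -
```

Secrets are mounted as in-memory files inside the container, so they never appear in `docker inspect` output the way environment variables do.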
Understanding Overlay Networks
Overlay networks enable secure container-to-container communication across multiple hosts in the Swarm.
Create and manage overlay networks:
# Create an overlay network
docker network create --driver overlay mynetwork
# Create with encryption enabled (recommended)
docker network create \
--driver overlay \
--opt encrypted \
--subnet 10.0.0.0/24 \
secure-net
# List networks
docker network ls
# Inspect network
docker network inspect mynetwork
# Remove network
docker network rm mynetwork
Connect services to overlay networks:
# Create a network
docker network create --driver overlay backend
# Deploy service on network
docker service create \
--name web \
--network backend \
--publish 80:80 \
nginx:latest
# Deploy another service on same network
docker service create \
--name app \
--network backend \
myapp:latest
# Services can now communicate via service name
# From app container: curl http://web/
Create attachable overlay networks so standalone containers can join:
# Create attachable network (allows standalone containers)
docker network create \
--driver overlay \
--attachable \
shared-net
# Connect a standalone container (if needed)
docker run -d \
--network shared-net \
--name standalone \
nginx:latest
# Connect service to network
docker service create \
--name api \
--network shared-net \
myapi:latest
Service discovery in overlay networks:
# All services in same network can resolve by name
# Internal DNS: <service-name>:<network-name>
# Example from within a container on the network:
# nslookup web
# Resolves to virtual IP (VIP) of web service
# Service VIP load balances across all replicas
# curl http://web:80/ will round-robin across replicas
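In a stack file, this DNS-based discovery is what makes short connection strings work. A sketch (service and image names are illustrative):

```yaml
# Both services share the overlay network, so "db" resolves from app
services:
  app:
    image: myapp:latest
    environment:
      DATABASE_URL: postgres://db:5432/myapp   # "db" resolves to the db service VIP
    networks: [backend]
  db:
    image: postgres:15
    networks: [backend]

networks:
  backend:
    driver: overlay
```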
Managing Stacks and Compose Files
Stacks are a convenient way to deploy and manage multi-service applications using Docker Compose format files.
Create a stack from a compose file:
# Create a docker-compose.yml file
cat > docker-compose.yml <<EOF
version: '3.9'
services:
  web:
    image: nginx:latest
    ports:
      - "80:80"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
  app:
    image: myapp:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
  db:
    image: postgres:15
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.tier == database
    environment:
      POSTGRES_PASSWORD: secret
networks:
  default:
    driver: overlay
    driver_opts:
      encrypted: "true"
EOF
# Deploy the stack
docker stack deploy -c docker-compose.yml myapp
# List stacks
docker stack ls
# Check stack services
docker stack services myapp
# View stack details
docker stack ps myapp
Update and manage stacks:
# Update service in running stack
docker service update \
--image myapp:v2.0 \
myapp_app
# Or update via compose file changes
docker stack deploy -c docker-compose.yml myapp
# Remove entire stack and services
docker stack rm myapp
# View stack tasks
docker stack ps myapp --no-trunc
Rolling Updates and Service Scaling
Implement zero-downtime deployments with rolling updates and manage service scaling efficiently.
Configure update strategy:
# Create service with update config
docker service create \
--name api \
--update-delay 10s \
--update-parallelism 1 \
--update-failure-action pause \
--replicas 4 \
myapi:v1.0
# View update config
docker service inspect api | grep -A 5 UpdateConfig
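The same update policy can live in a stack file, together with an explicit rollback policy. A sketch, assuming Compose file format 3.7 or later for rollback_config:

```yaml
services:
  api:
    image: myapi:v1.0
    deploy:
      replicas: 4
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: pause
      rollback_config:
        parallelism: 2
        delay: 5s
```

Keeping the policy in the stack file means every `docker stack deploy` applies it consistently instead of relying on per-update flags.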
Perform a rolling update:
# Update service image
docker service update \
--image myapi:v2.0 \
api
# Monitor the rolling update
docker service ps api --no-trunc
# Update with custom timing
docker service update \
--image myapi:v3.0 \
--update-delay 5s \
--update-parallelism 2 \
api
# Roll back to the previously deployed version if needed
docker service update --rollback api
Scale services up and down:
# Scale service to desired replicas
docker service scale api=6
# Scale multiple services
docker service scale web=5 app=3 cache=2
# Monitor scaling
docker service ps api
watch -n 1 'docker service ps api'
# Check task distribution
docker node ls
docker service ps api --filter desired-state=running
Configure health-based auto-recovery:
# Create service with restart policy
docker service create \
--name web \
--restart-condition on-failure \
--restart-delay 5s \
--restart-max-attempts 3 \
--replicas 3 \
nginx:latest
# Review failed tasks (failed tasks are moved to desired state "shutdown")
docker service ps web --filter desired-state=shutdown
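A restart policy only reacts to container exits; to have Swarm also replace containers that hang while still running, add a healthcheck. A stack-file sketch, assuming the image ships curl and serves HTTP on port 80 (both are assumptions about the image):

```yaml
services:
  web:
    image: nginx:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
```

When a task fails its healthcheck three times in a row, Swarm marks it unhealthy and reschedules a replacement.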
Draining and Removing Nodes
Safely remove nodes from the cluster with proper task migration.
Drain a node before removal:
# Put node in drain state (migrate all tasks)
docker node update --availability drain node-to-remove
# Monitor task migration off the node
watch -n 1 'docker node ps node-to-remove'
# Wait for all tasks to migrate
# Check when node shows no running tasks
docker service ps api --filter node=node-to-remove
# Once drained, remove from cluster (on the target node)
docker swarm leave
Forcefully remove a node from another manager:
# If node is unresponsive, remove from any manager
docker node rm node-offline-id
# View node ID
docker node ls
# Remove with force flag
docker node rm --force unresponsive-node-id
Rejoin a node after removal:
# Get new join token
docker swarm join-token worker
# On the node, execute join command
docker swarm join \
--token SWMTKN-1-newtoken \
192.168.1.10:2377
# Verify rejoin
docker node ls
Monitoring and Troubleshooting
Monitor cluster health and diagnose common issues.
Check cluster status:
# View cluster information
docker info
# List all nodes with status
docker node ls
# Detailed node information
docker node inspect node1 --pretty
# View detailed Swarm state (raft settings, manager addresses)
docker system info | grep -A 10 Swarm
# View tasks scheduled on the current node
docker node ps self
Diagnose node issues:
# Check node state
docker node inspect node2
# View node events
docker events --filter type=node
# Check node state and address from a manager
docker node inspect --format '{{.Status.State}}' node2
docker node inspect --format '{{.Status.Addr}}' node2
Monitor services and tasks:
# Real-time service monitoring
docker stats
# View tasks on every node in the cluster
docker node ps $(docker node ls -q)
# Monitor specific service
docker service ps api --no-trunc
# Check task logs
docker service logs api
# Follow task logs
docker service logs -f api
Troubleshoot common issues:
# Service won't start - deploy a known-good image to isolate the issue
docker service create --name test --replicas 1 busybox:latest sleep 3600
docker service ps test --no-trunc
# Tasks stuck pending - check constraints
docker service inspect api | grep -A 5 Placement
# Network issues - test connectivity
docker exec <container-id> ping <service-name>
# Check overlay network
docker network inspect <network-name>
# Verify node certificate status
docker node inspect --pretty self
Backup and restore Swarm:
# Back up a manager node (stop Docker first so the Raft data is consistent)
sudo systemctl stop docker
sudo tar -czf /backup/swarm-backup.tar.gz /var/lib/docker/swarm
# Restore from backup
sudo systemctl stop docker
sudo tar -xzf /backup/swarm-backup.tar.gz -C /
sudo systemctl start docker
# Verify cluster state
docker node ls
docker service ls
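To keep backups current, the same steps can run on a schedule. A sketch of a root crontab entry, assuming /backup exists and a brief nightly Docker stop is acceptable; on a multi-manager cluster, back up one manager at a time so quorum is preserved (the schedule and path are illustrative):

```shell
# Nightly at 03:00: stop Docker, archive the Swarm state, restart Docker
# (% must be escaped in crontab entries, hence \%F)
0 3 * * * systemctl stop docker && tar -czf /backup/swarm-$(date +\%F).tar.gz /var/lib/docker/swarm && systemctl start docker
```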
Conclusion
Docker Swarm provides a straightforward yet powerful platform for orchestrating containerized applications at scale. By understanding initialization procedures, node management, overlay networks, and service deployment strategies, you can build reliable, scalable infrastructure. Start with a small cluster, practice rolling updates and scaling operations, and gradually expand your deployment complexity. Monitor your cluster consistently and maintain regular backups of manager nodes to ensure business continuity. With these fundamentals in place, Docker Swarm becomes an excellent choice for organizations seeking a native Docker orchestration solution without Kubernetes complexity.


