HashiCorp Nomad Installation and Job Scheduling

HashiCorp Nomad is a flexible workload orchestrator that can schedule containers, virtual machines, and standalone binaries across a cluster with minimal operational overhead. This guide covers installing Nomad on Linux, setting up a cluster, writing job specifications, and integrating with Docker and Consul for service discovery.

Prerequisites

  • 3+ Linux servers for Nomad servers (Ubuntu 22.04/Debian 12 or CentOS/Rocky 9)
  • Additional nodes for Nomad clients (application workers)
  • Docker installed on client nodes (for container workloads)
  • Consul installed for service discovery (recommended)
  • Ports 4646 (HTTP), 4647 (RPC), 4648 (Serf) open between nodes
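If a host firewall is active, the ports above must be open between cluster nodes. A minimal sketch assuming ufw (the 192.168.1.0/24 subnet is an example; substitute your cluster network, and adapt for firewalld on RHEL-family hosts):

```shell
# Example cluster subnet -- replace with your own
CLUSTER_NET="192.168.1.0/24"

# Nomad ports: 4646 HTTP API/UI, 4647 RPC, 4648 Serf gossip (TCP and UDP)
if command -v ufw >/dev/null 2>&1; then
  for port in 4646 4647 4648; do
    sudo ufw allow from "$CLUSTER_NET" to any port "$port" proto tcp
  done
  sudo ufw allow from "$CLUSTER_NET" to any port 4648 proto udp
fi
```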

Install Nomad

Install on all server and client nodes:

# Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
  https://apt.releases.hashicorp.com $(lsb_release -cs) main" \
  | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y nomad

# CentOS/Rocky
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y nomad

# Verify
nomad version

# Create directories
sudo mkdir -p /etc/nomad.d /opt/nomad/data
sudo chown nomad:nomad /opt/nomad/data 2>/dev/null || true

Configure Nomad Servers

Create the server configuration on each of your 3 server nodes:

# /etc/nomad.d/nomad.hcl - adjust per node
sudo tee /etc/nomad.d/nomad.hcl << 'EOF'
datacenter = "dc1"
region     = "global"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3

  # Gossip encryption key - generate with: nomad operator keygen
  encrypt = "YOUR_GOSSIP_KEY_HERE"
}

# Integrate with Consul for service discovery
consul {
  address = "127.0.0.1:8500"
}

# Telemetry
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
EOF

sudo chmod 640 /etc/nomad.d/nomad.hcl

Generate a gossip key (run once and use on all nodes):

nomad operator keygen
# Example output: HhZgJgNPMNAiUGQbm1jSMg==
# Replace YOUR_GOSSIP_KEY_HERE in the config above
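Gossip encryption uses an AES key, so the base64 value must decode to 16, 24, or 32 bytes (recent versions of `nomad operator keygen` emit 32-byte keys). A quick sanity check, using the example key above:

```shell
# Example key from above -- substitute your generated key
KEY="HhZgJgNPMNAiUGQbm1jSMg=="

# Decode the base64 value and count the raw bytes; expect 16, 24, or 32
bytes=$(printf '%s' "$KEY" | base64 -d | wc -c)
echo "key length: $bytes bytes"
```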

Configure Nomad Clients

Client nodes run actual workloads. Install Docker if using container tasks:

# Install Docker on client nodes
sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker nomad

# Client configuration
sudo tee /etc/nomad.d/nomad.hcl << 'EOF'
datacenter = "dc1"
region     = "global"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

client {
  enabled = true

  # Nomad servers to join
  servers = ["192.168.1.10:4647", "192.168.1.11:4647", "192.168.1.12:4647"]

  # Resources to reserve for the OS (not allocatable)
  reserved {
    cpu    = 500    # MHz
    memory = 512    # MB
    disk   = 1024   # MB
  }

  # Node metadata for job placement constraints
  meta {
    role = "worker"
    env  = "production"
  }
}

server {
  enabled = false
}

consul {
  address = "127.0.0.1:8500"
}
EOF
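The meta values above become node attributes that jobs can target. For example, a hypothetical job could pin itself to these workers with a constraint stanza:

```hcl
# Inside a job or group stanza -- matches the client meta defined above
constraint {
  attribute = "${meta.role}"
  value     = "worker"
}
```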

Bootstrap the Cluster

# Start Nomad on all servers, then clients
sudo systemctl enable --now nomad

# Verify cluster (run from any server)
export NOMAD_ADDR=http://192.168.1.10:4646
nomad server members
nomad node status

# Check cluster status
nomad status
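Leader election can also be confirmed over the HTTP API: /v1/status/leader returns the address of the current Raft leader. A small sketch (the server address is an assumption; adjust for your cluster):

```shell
# Assumed server address -- adjust for your cluster
NOMAD_ADDR="${NOMAD_ADDR:-http://192.168.1.10:4646}"

# /v1/status/leader returns e.g. "192.168.1.10:4647" once a leader exists
leader=$(curl -s --max-time 5 "$NOMAD_ADDR/v1/status/leader" || true)
if [ -n "$leader" ]; then
  echo "leader: $leader"
else
  echo "no leader yet -- check 'nomad server members' and the agent logs"
fi
```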

Write and Run Jobs

Nomad uses HCL job specifications. Create a simple web server job:

# Save the job spec to /tmp/nginx.nomad
cat > /tmp/nginx.nomad << 'EOF'
job "nginx" {
  datacenters = ["dc1"]
  type        = "service"    # long-running service

  group "web" {
    count = 3   # Run 3 instances

    # Spread across different hosts
    spread {
      attribute = "${node.unique.id}"
    }

    network {
      port "http" {
        static = 80
        to     = 80
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:1.25-alpine"
        ports = ["http"]
      }

      resources {
        cpu    = 200   # MHz
        memory = 128   # MB
      }

      # Health check
      service {
        name = "nginx"
        port = "http"

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF

# Submit the job
nomad job run /tmp/nginx.nomad

# Monitor job status
nomad job status nginx
nomad alloc status <alloc-id>
nomad alloc logs <alloc-id>
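Before resubmitting a changed spec, `nomad job plan` performs a dry run and reports what the scheduler would create, update, or destroy; the check index it prints can be passed to `nomad job run -check-index` to guard against concurrent edits. A sketch, guarded so it is a no-op on machines without a cluster:

```shell
JOB_FILE=/tmp/nginx.nomad

if command -v nomad >/dev/null 2>&1; then
  # Dry run; plan exits 1 when changes are pending, so don't treat that as failure
  nomad job plan "$JOB_FILE" || true

  # Then submit with the printed check index, e.g. (42 is a placeholder):
  # nomad job run -check-index 42 /tmp/nginx.nomad
fi
```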

Run a batch job (one-time or periodic task):

cat > /tmp/backup.nomad << 'EOF'
job "db-backup" {
  datacenters = ["dc1"]
  type        = "batch"

  # Run every night at 2am
  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true
  }

  group "backup" {
    task "pg-dump" {
      driver = "exec"

      config {
        # exec does not run commands through a shell, so invoke one
        # explicitly for the $(date ...) expansion to work
        command = "/bin/sh"
        args    = ["-c", "/usr/local/bin/backup-postgres.sh --output=/backup/pg-$(date +%Y%m%d).dump"]
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}
EOF

nomad job run /tmp/backup.nomad
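Periodic jobs sit dormant between schedule windows; to test one without waiting for 2am, trigger an immediate run with `nomad job periodic force` (guarded here so the snippet is a no-op without a cluster):

```shell
JOB_NAME=db-backup

if command -v nomad >/dev/null 2>&1; then
  # Launch an immediate run of the periodic job
  nomad job periodic force "$JOB_NAME" || true

  # Each forced or scheduled run appears as a child job
  nomad job status "$JOB_NAME" || true
fi
```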

Docker and Exec Drivers

Docker driver - for containerized workloads:

task "app" {
  driver = "docker"

  config {
    image       = "myapp:v2.1.0"
    ports       = ["http"]
    volumes     = ["/data/app:/app/data"]
    force_pull  = false

    # Resource limits at Docker level
    cpu_hard_limit = true
  }

  env {
    DB_HOST = "postgres.service.consul"
    DB_PORT = "5432"
  }

  # Inject Consul-retrieved secrets
  template {
    data        = "{{ with secret \"secret/myapp/db\" }}DB_PASS={{ .Data.password }}{{ end }}"
    destination = "${NOMAD_SECRETS_DIR}/db.env"
    env         = true
  }
}
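For service tasks like these, rollout behavior is controlled by an update stanza at the job or group level. A typical rolling-update sketch (values are illustrative; tune them to your workload):

```hcl
# Example rolling-update policy
update {
  max_parallel     = 1       # replace one allocation at a time
  min_healthy_time = "30s"   # new alloc must stay healthy this long
  healthy_deadline = "5m"    # give up if not healthy within 5 minutes
  auto_revert      = true    # roll back to the last good version on failure
}
```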

Exec driver - for native binary workloads:

task "api-server" {
  driver = "exec"

  config {
    command = "/usr/local/bin/api-server"
    args    = ["-port", "${NOMAD_PORT_http}", "-config", "/etc/api/config.yaml"]
  }

  resources {
    cpu    = 1000
    memory = 512
  }
}

Service Discovery with Consul

Nomad integrates natively with Consul for service registration and health checking:

group "api" {
  network {
    # Consul Connect requires bridge networking at the group level
    mode = "bridge"
    port "http" {
      to = 8080
    }
  }

  # Connect-enabled services must be registered at the group level,
  # not inside the task
  service {
    name = "myapi"
    port = "http"
    tags = ["urlprefix-/api strip=/api"]   # Fabio load balancer tag

    check {
      type     = "http"
      path     = "/health"
      interval = "15s"
      timeout  = "5s"
      expose   = true   # expose the check path through the Envoy sidecar
    }

    # Consul Connect sidecar for service mesh
    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "postgres"
            local_bind_port  = 5432
          }
        }
      }
    }
  }

  task "api" {
    driver = "docker"

    config {
      image = "myapi:latest"
    }
  }
}

Query registered services from Nomad jobs:

# Nomad injects Consul environment variables automatically
# NOMAD_UPSTREAM_ADDR_postgres = 127.0.0.1:5432 (from the upstream above)

# Or use Consul DNS
curl http://myapi.service.consul/health

# List all Nomad-registered services in Consul
consul catalog services | grep -v consul

Troubleshooting

Job stuck in pending state:

# Check why allocations aren't being placed
nomad job status nginx
nomad alloc status <pending-alloc-id>

# Check node eligibility and resources
nomad node status -verbose <node-id>

# Check for placement failures
nomad eval status <eval-id>
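Once the placement blocker is fixed (a node drained back in, resources freed), you can ask the scheduler to re-evaluate the job immediately rather than wait for the next evaluation. A sketch using `nomad job eval`, guarded for machines without a cluster:

```shell
JOB=nginx

if command -v nomad >/dev/null 2>&1; then
  # Create a new evaluation and force rescheduling of failed allocations
  nomad job eval -force-reschedule "$JOB" || true
fi
```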

Allocation fails to start:

# Get allocation logs
nomad alloc logs <alloc-id>
nomad alloc logs -stderr <alloc-id>

# Check Nomad client agent logs
journalctl -u nomad -n 100 --no-pager

Cluster leadership issues:

nomad operator raft list-peers
nomad server members

# Force re-election if needed (use with caution)
nomad operator raft remove-peer -peer-address=<dead-peer>:4647

Docker image pull failures:

# Verify Docker is running on client
docker info
systemctl status docker

# Test image pull manually
docker pull nginx:1.25-alpine

Conclusion

HashiCorp Nomad provides workload orchestration with significantly less operational complexity than Kubernetes, supporting Docker containers, native binaries, and JVM applications from the same scheduler. Its native Consul integration makes service discovery automatic, while the job specification format provides fine-grained control over placement constraints, resource limits, and update strategies. Nomad is particularly well-suited for teams running mixed workloads (containers + legacy binaries) across a shared cluster.