HashiCorp Nomad Installation and Job Scheduling
HashiCorp Nomad is a flexible workload orchestrator that can schedule containers, virtual machines, and standalone binaries across a cluster with minimal operational overhead. This guide covers installing Nomad on Linux, setting up a cluster, writing job specifications, and integrating with Docker and Consul for service discovery.
Prerequisites
- 3+ Linux servers for Nomad servers (Ubuntu 22.04/Debian 12 or CentOS/Rocky 9)
- Additional nodes for Nomad clients (application workers)
- Docker installed on client nodes (for container workloads)
- Consul installed for service discovery (recommended)
- Ports 4646 (HTTP), 4647 (RPC), 4648 (Serf) open between nodes
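Before installing, it helps to confirm those ports are reachable between nodes. A minimal sketch using bash's /dev/tcp; the 127.0.0.1 target is a placeholder, point it at a peer node:

```shell
# Check whether Nomad's ports answer on a given host
# (127.0.0.1 here is a placeholder; substitute a peer's address)
check_port() {
  timeout 2 bash -c "</dev/tcp/$1/$2" 2>/dev/null \
    && echo "$1:$2 open" || echo "$1:$2 closed"
}

for p in 4646 4647 4648; do
  check_port 127.0.0.1 "$p"
done
```

Run it from each node against every other node; a "closed" result usually means a firewall rule is missing.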
Install Nomad
Install on all server and client nodes:
# Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" \
| sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y nomad
# CentOS/Rocky
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y nomad
# Verify
nomad version
# Create directories
sudo mkdir -p /etc/nomad.d /opt/nomad/data
sudo chown nomad:nomad /opt/nomad/data 2>/dev/null || true
Configure Nomad Servers
Create the server configuration on each of your 3 server nodes:
# /etc/nomad.d/nomad.hcl - adjust per node
sudo tee /etc/nomad.d/nomad.hcl << 'EOF'
datacenter = "dc1"
region     = "global"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 3
  # Gossip encryption key - generate with: nomad operator keygen
  encrypt = "YOUR_GOSSIP_KEY_HERE"
}

# Integrate with Consul for service discovery
consul {
  address = "127.0.0.1:8500"
}

# Telemetry
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
EOF
sudo chmod 640 /etc/nomad.d/nomad.hcl
Generate a gossip key (run once and use on all nodes):
nomad operator keygen
# Example output: HhZgJgNPMNAiUGQbm1jSMg==
# Replace YOUR_GOSSIP_KEY_HERE in the config above
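The key is just random bytes, base64-encoded. If you need to pre-generate keys on a machine without Nomad installed, openssl can produce compatible material (assumption: recent Nomad versions accept 32-byte gossip keys; older releases used 16 bytes, as in the shorter example output above):

```shell
# Generate a 32-byte base64 key equivalent to `nomad operator keygen`
KEY=$(openssl rand -base64 32)
echo "$KEY"
```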
Configure Nomad Clients
Client nodes run actual workloads. Install Docker if using container tasks:
# Install Docker on client nodes
sudo apt install -y docker.io
sudo systemctl enable --now docker
# Note: the docker group only matters if your unit file runs the agent
# as the "nomad" user; Nomad client agents are typically run as root
sudo usermod -aG docker nomad
# Client configuration
sudo tee /etc/nomad.d/nomad.hcl << 'EOF'
datacenter = "dc1"
region     = "global"
data_dir   = "/opt/nomad/data"
log_level  = "INFO"
bind_addr  = "0.0.0.0"

client {
  enabled = true

  # Announce to Nomad servers
  servers = ["192.168.1.10:4647", "192.168.1.11:4647", "192.168.1.12:4647"]

  # Resources to reserve for the OS (not allocatable to tasks)
  reserved {
    cpu    = 500  # MHz
    memory = 512  # MB
    disk   = 1024 # MB
  }

  # Node metadata for job placement constraints
  meta {
    role = "worker"
    env  = "production"
  }
}

server {
  enabled = false
}

consul {
  address = "127.0.0.1:8500"
}
EOF
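If tasks need persistent paths on the host, the client block can also declare host volumes that jobs mount explicitly. A sketch; the volume name and path are illustrative:

```hcl
client {
  # ...existing client settings...

  # Expose a host directory to jobs that request it
  # via a group "volume" and task "volume_mount" stanza
  host_volume "pg-data" {
    path      = "/opt/postgres/data"
    read_only = false
  }
}
```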
Bootstrap the Cluster
# Start Nomad on all servers, then clients
sudo systemctl enable --now nomad
# Verify cluster (run from any server)
export NOMAD_ADDR=http://192.168.1.10:4646
nomad server members
nomad node status
# Check cluster status
nomad status
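Everything the CLI shows is also available as JSON over the HTTP API, which is handy for scripting. A sketch filtering node health with jq, run here against an inlined sample shaped like the /v1/nodes response rather than a live cluster:

```shell
# Sample shaped like GET /v1/nodes output (inlined; not from a live cluster)
cat > /tmp/nodes.json << 'EOF'
[
  {"Name": "worker-1", "Status": "ready"},
  {"Name": "worker-2", "Status": "down"}
]
EOF

# Against a real cluster you would pipe from:
#   curl -s "$NOMAD_ADDR/v1/nodes"
# List nodes that are not ready
jq -r '.[] | select(.Status != "ready") | .Name' /tmp/nodes.json
```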
Write and Run Jobs
Nomad uses HCL job specifications. Create a simple web server job:
# Save as /tmp/nginx.nomad (any path works)
cat > /tmp/nginx.nomad << 'EOF'
job "nginx" {
  datacenters = ["dc1"]
  type        = "service" # long-running service

  group "web" {
    count = 3 # Run 3 instances

    # Spread across different hosts
    spread {
      attribute = "${node.unique.id}"
    }

    network {
      port "http" {
        static = 80
        to     = 80
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:1.25-alpine"
        ports = ["http"]
      }

      resources {
        cpu    = 200 # MHz
        memory = 128 # MB
      }

      # Register in Consul with a health check
      service {
        name = "nginx"
        port = "http"

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF
# Submit the job
nomad job run /tmp/nginx.nomad
# Monitor job status
nomad job status nginx
nomad alloc status <alloc-id>
nomad alloc logs <alloc-id>
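Service jobs can also control how new versions roll out with an update stanza at the group level. A sketch of a rolling update; the timing values are illustrative:

```hcl
group "web" {
  # Rolling update: replace one instance at a time, and only proceed
  # once the new allocation has been healthy for min_healthy_time
  update {
    max_parallel     = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert      = true # roll back automatically on a failed deploy
  }
}
```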
Run a batch job (one-time or periodic task):
cat > /tmp/backup.nomad << 'EOF'
job "db-backup" {
  datacenters = ["dc1"]
  type        = "batch"

  # Run every night at 2am
  periodic {
    cron             = "0 2 * * *"
    prohibit_overlap = true
  }

  group "backup" {
    task "pg-dump" {
      driver = "exec"

      config {
        # The exec driver does not invoke a shell, so $(date ...) would be
        # passed literally; wrap the command in sh -c to get the expansion
        command = "/bin/sh"
        args    = ["-c", "/usr/local/bin/backup-postgres.sh --output=/backup/pg-$(date +%Y%m%d).dump"]
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}
EOF
nomad job run /tmp/backup.nomad
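Jobs can target the node metadata defined earlier in the client config (role, env) with a constraint stanza, so the backup only lands on matching nodes:

```hcl
job "db-backup" {
  # Only place this job on nodes whose client config sets meta.role = "worker"
  constraint {
    attribute = "${meta.role}"
    value     = "worker"
  }
}
```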
Docker and Exec Drivers
Docker driver - for containerized workloads:
task "app" {
  driver = "docker"

  config {
    image      = "myapp:v2.1.0"
    ports      = ["http"]
    volumes    = ["/data/app:/app/data"]
    force_pull = false
    # Enforce the CPU allocation as a hard cap at the Docker level
    cpu_hard_limit = true
  }

  env {
    DB_HOST = "postgres.service.consul"
    DB_PORT = "5432"
  }

  # Render Vault secrets into environment variables
  # (requires Vault integration and a vault stanza on the task)
  template {
    data        = "{{ with secret \"secret/myapp/db\" }}DB_PASS={{ .Data.password }}{{ end }}"
    destination = "${NOMAD_SECRETS_DIR}/db.env"
    env         = true
  }
}
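A template that reads from Vault only works if the Nomad agents are configured for Vault and the task is granted a token. A sketch using the classic token-based integration; the policy name is illustrative:

```hcl
task "app" {
  # Grant this task a Vault token carrying the named policy so its
  # template stanza can read secret/myapp/db
  vault {
    policies = ["myapp-db-read"]
  }
}
```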
Exec driver - for native binary workloads:
task "api-server" {
  driver = "exec"

  config {
    command = "/usr/local/bin/api-server"
    args    = ["-port", "${NOMAD_PORT_http}", "-config", "/etc/api/config.yaml"]
  }

  resources {
    cpu    = 1000
    memory = 512
  }
}
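${NOMAD_PORT_http} in the args above is only populated if the group declares a port labeled "http". A sketch of the matching network block using a dynamic port:

```hcl
group "api" {
  network {
    # Dynamic host port: Nomad picks a free port and exposes it to the
    # task as NOMAD_PORT_http / NOMAD_ADDR_http
    port "http" {}
  }
}
```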
Service Discovery with Consul
Nomad integrates natively with Consul for service registration and health checking:
group "api" {
  network {
    # Consul Connect requires bridge networking
    mode = "bridge"
    port "http" {
      to = 8080 # adjust to the port your app listens on
    }
  }

  # Connect-enabled services must be registered at the group level
  service {
    name = "myapi"
    port = "http"
    tags = ["urlprefix-/api strip=/api"] # Fabio load balancer tag

    check {
      type     = "http"
      path     = "/health"
      interval = "15s"
      timeout  = "5s"
    }

    # Consul Connect sidecar for service mesh
    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "postgres"
            local_bind_port  = 5432
          }
        }
      }
    }
  }

  task "api" {
    driver = "docker"

    config {
      image = "myapi:latest"
      ports = ["http"]
    }
  }
}
Query registered services from Nomad jobs:
# For Connect upstreams, Nomad injects address variables automatically:
# NOMAD_UPSTREAM_ADDR_postgres = 127.0.0.1:5432 (from the upstream above)
# Or use Consul DNS
curl http://myapi.service.consul/health
# List all Nomad-registered services in Consul
consul catalog services | grep -v consul
Troubleshooting
Job stuck in pending state:
# Check why allocations aren't being placed
nomad job status nginx
nomad alloc status <pending-alloc-id>
# Check node eligibility and resources
nomad node status -verbose <node-id>
# Check for placement failures
nomad eval status <eval-id>
Allocation fails to start:
# Get allocation logs
nomad alloc logs <alloc-id>
nomad alloc logs -stderr <alloc-id>
# Check Nomad client agent logs
journalctl -u nomad -n 100 --no-pager
Cluster leadership issues:
nomad operator raft list-peers
nomad server members
# Force re-election if needed (use with caution)
nomad operator raft remove-peer -peer-address=<dead-peer>:4647
Docker image pull failures:
# Verify Docker is running on client
docker info
systemctl status docker
# Test image pull manually
docker pull nginx:1.25-alpine
Conclusion
HashiCorp Nomad provides workload orchestration with significantly less operational complexity than Kubernetes, supporting Docker containers, native binaries, and JVM applications from the same scheduler. Its native Consul integration makes service discovery automatic, while the job specification format provides fine-grained control over placement constraints, resource limits, and update strategies. Nomad is particularly well-suited for teams running mixed workloads (containers + legacy binaries) across a shared cluster.


