Healthchecks.io Self-Hosted Cron Monitoring
Healthchecks is a self-hosted dead-man's switch service that monitors your cron jobs and scheduled tasks by expecting periodic pings. If a job doesn't check in on time, Healthchecks sends an alert — catching silent failures that would otherwise go unnoticed until damage is done.
Prerequisites
- Docker and Docker Compose installed
- A domain name (optional but recommended for SSL)
- A mail server or SMTP relay for email notifications
- Ports 8000 (or 80/443) accessible
Docker Deployment
# Create project directory
mkdir -p /opt/healthchecks && cd /opt/healthchecks
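The compose file below expects a long random value for SECRET_KEY. One way to generate a suitable 50-character value, assuming python3 is available on the host:

```shell
# Generate a random 50-character value for the SECRET_KEY setting
SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_urlsafe(64)[:50])")
echo "$SECRET_KEY"
```

Paste the result into the SECRET_KEY lines below; anyone who knows this value can forge session cookies, so keep it out of version control.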
Create docker-compose.yml:
version: '3'

services:
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: hc
      POSTGRES_USER: hc
      POSTGRES_PASSWORD: hc-db-password
    volumes:
      - pg_data:/var/lib/postgresql/data

  web:
    image: healthchecks/healthchecks:latest
    depends_on:
      - db
    # The &shared_env anchor lets the worker reuse this exact environment
    environment: &shared_env
      DEBUG: "False"
      SECRET_KEY: "your-50-char-random-secret-key-here-abcdefghijklmnop"
      ALLOWED_HOSTS: "healthchecks.yourdomain.com,localhost"
      DEFAULT_FROM_EMAIL: "[email protected]"
      DB: postgres
      DB_HOST: db
      DB_PORT: "5432"
      DB_NAME: hc
      DB_USER: hc
      DB_PASSWORD: hc-db-password
      EMAIL_HOST: smtp.sendgrid.net
      EMAIL_PORT: "587"
      EMAIL_USE_TLS: "True"
      EMAIL_HOST_USER: apikey
      EMAIL_HOST_PASSWORD: your-sendgrid-key
      SITE_ROOT: "https://healthchecks.yourdomain.com"
      SITE_NAME: "Healthchecks"
      REGISTRATION_OPEN: "False"
    ports:
      - "8000:8000"
    restart: unless-stopped

  worker:
    image: healthchecks/healthchecks:latest
    depends_on:
      - db
    # Reference the web service's environment: the alert sender needs the
    # same database, email, and SITE_ROOT settings as the web process
    environment: *shared_env
    command: python manage.py sendalerts
    restart: unless-stopped

volumes:
  pg_data:
# Start the stack
docker compose up -d
# Apply database migrations (must run before creating the superuser)
docker compose exec web python manage.py migrate
# Create initial superuser
docker compose exec web python manage.py createsuperuser
# View logs
docker compose logs web --tail 50
Access the dashboard at http://your-server-ip:8000.
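For a public deployment on the domain from the prerequisites, put a TLS-terminating reverse proxy in front of port 8000. A minimal nginx sketch, assuming Let's Encrypt certificates already exist at the standard paths (server name and certificate paths are placeholders to adapt):

```nginx
server {
    listen 443 ssl;
    server_name healthchecks.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/healthchecks.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/healthchecks.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The proxied hostname must appear in ALLOWED_HOSTS, and SITE_ROOT must use the https:// URL, or Healthchecks will reject requests and generate wrong ping links.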
Creating Checks
Via Web Interface
- Log in to the dashboard
- Click Add Check
- Set:
- Name: Daily Backup
- Tags: production, backup
- Period: 1 day
- Grace Time: 1 hour (how long to wait before alerting)
- Click Save — you get a unique ping URL
Via API
# Create an API key in the dashboard: Settings → API Access
# Create a check via API
curl -X POST "https://healthchecks.yourdomain.com/api/v3/checks/" \
-H "X-Api-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"name": "Daily Database Backup",
"tags": "production backup",
"desc": "PostgreSQL backup runs at 3am daily",
"timeout": 86400,
"grace": 3600,
"channels": "*"
}'
# Returns the check UUID and ping URL
# Create a cron-schedule check (more precise than period-based)
curl -X POST "https://healthchecks.yourdomain.com/api/v3/checks/" \
-H "X-Api-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"name": "Hourly Data Sync",
"tags": "sync",
"schedule": "0 * * * *",
"tz": "UTC",
"grace": 300
}'
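Each create call returns a JSON check object that includes a ping_url field, which scripts can capture for use in crontabs. A sketch of extracting it with python3 (the response here is a trimmed sample with a made-up UUID; in practice, substitute the actual curl output):

```shell
# Sample of the JSON a create call returns (trimmed to relevant fields)
response='{"name": "Hourly Data Sync", "status": "new", "ping_url": "https://healthchecks.yourdomain.com/ping/31365bce-8da9-4729-8ff3-aadd521f2ec9"}'

# Extract ping_url (replace the sample with real curl output in practice)
ping_url=$(printf '%s' "$response" | python3 -c "import sys, json; print(json.load(sys.stdin)['ping_url'])")
echo "$ping_url"
```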
Ping Protocols
Healthchecks supports multiple ping signals:
# Basic success ping (GET or POST)
curl https://healthchecks.yourdomain.com/ping/CHECK-UUID
# Signal start of a job
curl https://healthchecks.yourdomain.com/ping/CHECK-UUID/start
# Signal successful completion
curl https://healthchecks.yourdomain.com/ping/CHECK-UUID
# Signal failure explicitly
curl https://healthchecks.yourdomain.com/ping/CHECK-UUID/fail
# Send logs/output with the ping (POST with body)
curl -X POST "https://healthchecks.yourdomain.com/ping/CHECK-UUID" \
--data-raw "Backup completed. 42 files, 2.1 GB archived."
# Ping with the job's exit code appended (0=success, nonzero=failure)
# Healthchecks records any nonzero code as a failed run
/usr/local/bin/backup.sh
curl "https://healthchecks.yourdomain.com/ping/CHECK-UUID/$?"
Integrating with Cron and Systemd Timers
Cron Jobs
# Basic pattern: run job, then ping on success
0 3 * * * /usr/local/bin/backup.sh && curl -fsS --retry 3 https://healthchecks.yourdomain.com/ping/CHECK-UUID
# Better: ping start, run job, ping result ("hc" abbreviates your Healthchecks host)
0 3 * * * curl -fsS https://hc/ping/UUID/start; /usr/local/bin/backup.sh; curl -fsS https://hc/ping/UUID/$?
# Full example with output logging, written as a single /etc/cron.d/backup entry
# (cron does not support backslash line continuation, and the "root" user field
# is only valid in /etc/crontab and /etc/cron.d, not in user crontabs).
# Capturing output in a variable keeps $? pointing at backup.sh; piping the
# output straight into curl would expand $? before the job even runs.
0 3 * * * root curl -fsS --retry 3 "https://healthchecks.yourdomain.com/ping/CHECK-UUID/start" > /dev/null; m=$(/usr/local/bin/backup.sh 2>&1); curl -fsS --retry 3 --data-raw "$m" "https://healthchecks.yourdomain.com/ping/CHECK-UUID/$?" > /dev/null
Systemd Timer with Healthchecks
# /etc/systemd/system/backup.service
[Unit]
Description=Database Backup
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
ExecStartPre=/usr/bin/curl -fsS --retry 3 \
https://healthchecks.yourdomain.com/ping/CHECK-UUID/start
ExecStart=/usr/local/bin/backup.sh
ExecStartPost=/usr/bin/curl -fsS --retry 3 \
https://healthchecks.yourdomain.com/ping/CHECK-UUID
OnFailure=backup-failure-notify.service
# /etc/systemd/system/backup-failure-notify.service
[Unit]
Description=Notify Healthchecks on backup failure
[Service]
Type=oneshot
ExecStart=/usr/bin/curl -fsS --retry 3 \
https://healthchecks.yourdomain.com/ping/CHECK-UUID/fail
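The service units above only define what runs; the schedule itself lives in a matching timer unit. A minimal sketch for a daily 3am run (the unit name must match backup.service):

```ini
# /etc/systemd/system/backup.timer
[Unit]
Description=Schedule the database backup

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Activate it with `systemctl daemon-reload && systemctl enable --now backup.timer`. Persistent=true runs a missed job at the next boot, which pairs well with Healthchecks: the late ping clears the alert automatically.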
Using a Ping Wrapper Script
# A small shell wrapper avoids repeating the curl boilerplate in every job
cat > /usr/local/bin/healthchecks-wrap << 'EOF'
#!/bin/bash
# Usage: healthchecks-wrap CHECK-UUID command [args...]
HC_URL="https://healthchecks.yourdomain.com/ping/$1"
shift
curl -fsS --retry 3 "$HC_URL/start" > /dev/null
# Capture output and exit code before invoking curl -- expanding "$?"
# inside a pipeline would report the status of the start ping instead
out=$("$@" 2>&1)
rc=$?
printf '%s' "$out" | tail -c 100000 | curl -fsS --retry 3 --data-binary @- "$HC_URL/$rc" > /dev/null
exit "$rc"
EOF
chmod +x /usr/local/bin/healthchecks-wrap
# Usage
healthchecks-wrap CHECK-UUID /usr/local/bin/backup.sh
Notification Channels
Email Notifications
Configure in Integrations → Email:
- Click Add Integration → Email
- Enter the email address to notify
- Verify the email address
Slack
- Create a Slack incoming webhook in your workspace
- In Healthchecks: Integrations → Slack
- Paste the webhook URL
Ntfy / Gotify (Webhook)
# Use Healthchecks "Webhook" integration
# URL: https://ntfy.yourdomain.com/backups
# HTTP Method: POST
# Headers: Authorization: Basic base64(user:password)
# Request body: $NAME is $STATUS
# (Healthchecks substitutes $NAME, $STATUS, and similar placeholders at send time)
PagerDuty / Opsgenie
Configure via the built-in integrations in Healthchecks Settings → Integrations.
API Usage
# List all checks
curl "https://healthchecks.yourdomain.com/api/v3/checks/" \
-H "X-Api-Key: your-api-key"
# Get a specific check status
curl "https://healthchecks.yourdomain.com/api/v3/checks/CHECK-UUID" \
-H "X-Api-Key: your-api-key"
# Update a check
curl -X POST "https://healthchecks.yourdomain.com/api/v3/checks/CHECK-UUID" \
-H "X-Api-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{"name": "Updated Name", "grace": 7200}'
# Pause a check (temporarily stop alerting)
curl -X POST "https://healthchecks.yourdomain.com/api/v3/checks/CHECK-UUID/pause" \
-H "X-Api-Key: your-api-key"
# Get check's ping log
curl "https://healthchecks.yourdomain.com/api/v3/checks/CHECK-UUID/pings/" \
-H "X-Api-Key: your-api-key"
# Get check's status log
curl "https://healthchecks.yourdomain.com/api/v3/checks/CHECK-UUID/flips/" \
-H "X-Api-Key: your-api-key"
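The list endpoint returns a JSON object with a checks array. A sketch of summarizing name and status per check (a trimmed sample response is inlined here; in practice, pipe the real curl output into the python3 command):

```shell
# Sample of the list endpoint's JSON shape (trimmed to relevant fields)
checks='{"checks": [{"name": "Daily Backup", "status": "up"}, {"name": "Hourly Sync", "status": "down"}]}'

# Print one "name: status" line per check
printf '%s' "$checks" | python3 -c '
import sys, json
for c in json.load(sys.stdin)["checks"]:
    print(c["name"] + ": " + c["status"])
'
```

The same pattern works for quick shell dashboards or for alert deduplication scripts that only care about checks currently in the "down" state.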
Troubleshooting
Checks showing "New" and never "Up":
# Check hasn't received a ping yet
# Test the ping URL
curl -v https://healthchecks.yourdomain.com/ping/CHECK-UUID
# Verify the check UUID matches
# Check Healthchecks logs
docker compose logs web --tail 50
Not receiving email alerts:
# Test email configuration
docker compose exec web python manage.py shell -c "
from django.core.mail import send_mail
send_mail('Test', 'Test message', '[email protected]', ['[email protected]'])
"
# Check SMTP settings in environment variables
docker compose exec web env | grep EMAIL
Alerts firing too quickly (flapping):
# Increase the grace period for the check
# Grace time should cover: max job duration + scheduling jitter + network latency
# For a job that can run 30 minutes, set grace to at least 45 minutes
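The rule above is simple arithmetic; a sketch with illustrative durations:

```shell
# Grace = longest observed run + slack for scheduling jitter and latency
max_duration=1800   # longest observed run: 30 minutes, in seconds
slack=900           # headroom: 15 minutes
grace=$((max_duration + slack))
echo "$grace"       # 2700 seconds = 45 minutes
```

Pass the computed value as the "grace" field when creating or updating a check via the API.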
Worker not sending alerts:
# Check worker container is running
docker compose ps worker
# View worker logs
docker compose logs worker --tail 50
# Restart worker
docker compose restart worker
Conclusion
Healthchecks fills a critical gap in server monitoring by catching the "silent failures" that traditional uptime monitors miss — cron jobs that don't run, backup scripts that time out, or data syncs that never complete. By pinging a check from within your scheduled tasks, you get immediate alerts when a job misses its schedule or fails, with logs attached to each alert for quick diagnosis.