Grafana OnCall Installation for Incident Management
Grafana OnCall is an open-source incident management and on-call scheduling tool that centralizes alert routing, escalation policies, and on-call rotations with integrations for Slack, Telegram, PagerDuty, and more. This guide covers deploying Grafana OnCall, creating schedules, configuring escalation policies, routing alerts, and managing on-call rotations.
Prerequisites
- Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
- Docker and Docker Compose
- Grafana 9.x+ (OnCall is a Grafana plugin)
- 2 GB RAM minimum
- Outbound internet access for notification channels (Slack, SMS, etc.)
Installing Grafana OnCall with Docker Compose
Grafana OnCall runs as several services: the OnCall engine, Celery workers, Redis, a database, and, if you set RABBITMQ_URI as in the example below, RabbitMQ as the message broker.
# Clone the OnCall repository for docker-compose templates
git clone https://github.com/grafana/oncall.git
cd oncall
# Copy and edit the environment file
cp .env.example .env
nano .env
Edit the .env file with your settings:
# .env key settings
SECRET_KEY=your-random-secret-key-here # openssl rand -hex 32
DATABASE_TYPE=sqlite3 # or postgresql
RABBITMQ_URI=amqp://rabbitmq:rabbitmq@rabbitmq:5672/
REDIS_URI=redis://redis:6379/0
DJANGO_SETTINGS_MODULE=settings.hobby # hobby or production
# For PostgreSQL (recommended for production)
# DATABASE_TYPE=postgresql
# DATABASE_HOST=db
# DATABASE_PORT=5432
# DATABASE_USER=oncall
# DATABASE_PASSWORD=oncallpassword
# DATABASE_NAME=oncall
# Grafana connection
GRAFANA_API_URL=http://grafana:3000
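Before starting the stack, it can help to confirm that the .env file actually defines the settings listed above. The following is a minimal sketch, not part of OnCall: the helper names (`parse_env`, `missing_keys`, `new_secret_key`) and the choice of required keys are assumptions made for illustration.

```python
import secrets

# Assumption: these are the keys this guide treats as mandatory.
REQUIRED_KEYS = ("SECRET_KEY", "DATABASE_TYPE", "REDIS_URI", "GRAFANA_API_URL")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "sqlite3 # or postgresql".
        value = value.split(" #", 1)[0].strip()
        settings[key.strip()] = value
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


def new_secret_key():
    """Return 32 random bytes as hex, similar to `openssl rand -hex 32`."""
    return secrets.token_hex(32)


# Hypothetical file contents used only to demonstrate the check.
sample = """
SECRET_KEY=
DATABASE_TYPE=sqlite3 # or postgresql
REDIS_URI=redis://redis:6379/0
"""
settings = parse_env(sample)
print(missing_keys(settings))
print(new_secret_key())
```

In practice you would read the real file with `open(".env").read()` instead of the inline sample.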
# Start all services
docker compose -f docker-compose-hobby.yml up -d
# Check status
docker compose -f docker-compose-hobby.yml ps
# Watch logs during initialization
docker compose -f docker-compose-hobby.yml logs -f engine
# OnCall engine is accessible at http://localhost:8080
For production, use the full docker-compose.yml with PostgreSQL:
# Start with production config
docker compose up -d
# Run database migrations
docker compose run --rm engine python manage.py migrate
# Create a superuser
docker compose run --rm engine python manage.py createsuperuser
Connecting to Grafana
OnCall works as a Grafana plugin. Install it in your Grafana instance:
Option 1: Install from Grafana UI
- In Grafana: Administration → Plugins → search "OnCall"
- Click Install then Enable
Option 2: Install via CLI
grafana-cli plugins install grafana-oncall-app
sudo systemctl restart grafana-server
Configure the plugin:
- Go to Grafana → OnCall (left sidebar)
- In the plugin settings, set the OnCall API URL: http://oncall-engine:8080
- Enter the admin token from the OnCall engine
- Click Connect
Get the admin token:
docker compose exec engine python manage.py shell -c "
from apps.user_management.models import User
print(User.objects.get(username='admin').auth_token.key)
"
Creating On-Call Schedules
Schedules define who is on-call at any given time.
- In Grafana, go to OnCall → Schedules → + New Schedule
- Choose schedule type:
- Calendar - manual scheduling via drag-and-drop
- API - programmatic control via API
- iCal - import from Google Calendar, Outlook, etc.
Calendar schedule example:
- Select Calendar type
- Add team members from your user list
- Use the drag-and-drop calendar to assign shifts:
- Primary: main on-call (receives alerts first)
- Override: covers for other team members
- Set the rotation time zone
- Click Save schedule
API-based schedule via REST:
# Get your API token from OnCall → Settings → API tokens
ONCALL_TOKEN="your-api-token"
ONCALL_URL="http://localhost:8080"
# Create a schedule
curl -X POST "${ONCALL_URL}/api/v1/schedules/" \
-H "Authorization: ${ONCALL_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering On-Call",
"type": "calendar",
"team_id": null,
"time_zone": "America/New_York",
"slack": {"channel_id": "C1234567890", "user_group_id": "S1234567890"}
}'
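The same request can be prepared from a script. This is a minimal sketch using only the Python standard library; the function name `build_schedule_request` is hypothetical, the Slack fields are left out because their IDs are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_schedule_request(base_url, token, name, time_zone):
    """Build (but do not send) the POST request that creates a schedule."""
    payload = {
        "name": name,
        "type": "calendar",
        "team_id": None,
        "time_zone": time_zone,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/schedules/",
        data=json.dumps(payload).encode("utf-8"),
        # The token is passed as-is in the Authorization header, as in the curl call.
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_schedule_request(
    "http://localhost:8080", "your-api-token", "Engineering On-Call", "America/New_York"
)
print(req.get_method(), req.full_url)

# To send it against a running engine, uncomment:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, json.load(resp))
```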
Escalation Chains
Escalation chains define what happens when an alert fires: who gets notified, in what order, and after how long.
- Go to OnCall → Escalation Chains → + New Escalation Chain
- Name it (e.g., "P1 Critical Escalation")
- Add steps:
Example escalation chain:
| Step | Action | Wait |
|---|---|---|
| 1 | Notify on-call from schedule "Engineering On-Call" | 5 min |
| 2 | Notify whole team via Slack | 10 min |
| 3 | Notify on-call from schedule "Management On-Call" | - |
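To make the timing in the table concrete, here is a small model of the chain. It is a sketch under two assumptions: the "Wait" value is the delay between a step and the next one, and nobody acknowledges the alert in the meantime. The `Step` class and `timeline` function are illustrative and are not OnCall APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    action: str
    wait_minutes: Optional[int]  # None means there is no wait after this step


def timeline(steps):
    """Return (minutes_after_alert, action) pairs for an unacknowledged alert."""
    elapsed = 0
    result = []
    for step in steps:
        result.append((elapsed, step.action))
        elapsed += step.wait_minutes or 0
    return result


chain = [
    Step('Notify on-call from schedule "Engineering On-Call"', 5),
    Step("Notify whole team via Slack", 10),
    Step('Notify on-call from schedule "Management On-Call"', None),
]

for minute, action in timeline(chain):
    print(f"t+{minute:>2} min: {action}")
```

Under these assumptions, the management schedule is only paged 15 minutes after the alert fires, which is a useful number to check against your response-time targets.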
Steps available:
- Notify users - specific users or on-call from a schedule
- Notify user group - Slack user group
- Trigger webhook - custom HTTP callback
- Resolve - auto-resolve the alert
- Wait - pause before next step
Configure notification preferences per user:
- Go to OnCall → Users → click your profile
- Set notification methods: Slack, SMS, phone call, Telegram, email
- Configure default and important notification methods separately
Alert Routes and Integrations
Integrations connect external alert sources to OnCall.
Create an integration:
- OnCall → Integrations → + New Integration
- Select the source type:
  - Grafana Alerting (most common)
  - Prometheus Alertmanager
  - PagerDuty (migration)
  - Webhook (generic)
  - Email inbound
- Copy the generated webhook URL
- Configure your alert source to send to that URL
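Once you have the webhook URL, a quick way to exercise the integration is to post a test alert to it. The sketch below only builds the request. The URL is a placeholder for the one you copied, and the payload field names (`title`, `message`, `state`) are illustrative assumptions, so adjust them to whatever your integration's templates expect.

```python
import json
import urllib.request


def build_test_alert(webhook_url, title, message, state="alerting"):
    """Build (but do not send) a JSON POST carrying a test alert."""
    # Assumption: these field names are examples, not a required schema.
    payload = {"title": title, "message": message, "state": state}
    return urllib.request.Request(
        url=webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder: paste the webhook URL generated by your integration here.
WEBHOOK_URL = "http://localhost:8080/REPLACE-WITH-COPIED-WEBHOOK-PATH/"

alert = build_test_alert(WEBHOOK_URL, "Test alert", "Sent from the setup guide")
print(alert.get_method(), alert.full_url)

# To actually send it, uncomment:
# with urllib.request.urlopen(alert, timeout=10) as resp:
#     print(resp.status)
```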
Configure Grafana Alerting to send to OnCall:
- In Grafana: Alerting → Contact points → + Add contact point
- Select Grafana OnCall as the integration type
- Select your OnCall integration from the dropdown
- Save, then add it to your notification policies
Routing - send different alerts to different escalation chains:
- Go to the integration → Routes
- Add a route with a Jinja2 filter:
# Route critical alerts to P1 escalation
{{ payload.labels.severity == "critical" }}
# Route database alerts to DBA team
{{ "database" in payload.labels.alertname }}
# Default route (no filter) goes to the default escalation chain
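The routing logic can be modeled as "first matching filter wins, otherwise use the default". The sketch below mirrors the two filters above with plain Python predicates instead of Jinja2, assumes routes are evaluated in order, and uses hypothetical chain names ("DBA Escalation", "Default Escalation"). It illustrates the idea and is not how OnCall implements it.

```python
def match_route(payload, routes, default_chain):
    """Return the escalation chain of the first route whose filter matches."""
    for predicate, chain in routes:
        try:
            if predicate(payload):
                return chain
        except (KeyError, TypeError):
            # A payload without the expected labels counts as "no match".
            continue
    return default_chain


# Ordered list of (filter, escalation chain) pairs.
routes = [
    (lambda p: p["labels"]["severity"] == "critical", "P1 Critical Escalation"),
    (lambda p: "database" in p["labels"]["alertname"], "DBA Escalation"),
]

samples = [
    {"labels": {"severity": "critical", "alertname": "database-down"}},
    {"labels": {"severity": "warning", "alertname": "database-slow"}},
    {"labels": {"severity": "warning", "alertname": "disk-full"}},
]
for payload in samples:
    print(payload["labels"], "->", match_route(payload, routes, "Default Escalation"))
```

Note that the first sample matches both filters; because order decides, it goes to the P1 chain. Put the most specific or most urgent routes first.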
Slack and Telegram Integration
Slack integration:
- Go to OnCall → Settings → Chat Ops → Slack
- Click Install Slack App and follow the OAuth flow
- Select the default Slack channel for notifications
- In user profiles, link each user's Slack account
Once connected, OnCall posts a threaded Slack message for each alert. The following commands and actions are available in Slack:
- /oncall who - shows who is currently on-call
- /oncall new - creates a new incident
- Buttons in alert messages: Acknowledge, Resolve, Silence
Telegram integration:
- Create a Telegram bot via @BotFather
- Go to OnCall → Settings → Chat Ops → Telegram
- Enter the bot token
- Add the bot to your Telegram group/channel
- Users link their Telegram accounts via a verification code
Managing On-Call Rotations
Override shifts (when someone is covering for another):
- Go to the schedule → click the shift you want to override
- Click + Add override
- Select the covering user and time range
View who is on-call now:
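# API query: export today's final shifts (set SCHEDULE_ID to your schedule's ID)
TODAY="$(date -u +%Y-%m-%d)"
curl "${ONCALL_URL}/api/v1/schedules/${SCHEDULE_ID}/final_shifts?start_date=${TODAY}&end_date=${TODAY}" \
-H "Authorization: ${ONCALL_TOKEN}"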
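The API returns a list of shifts rather than a single name, so a script still has to pick the shift that covers the current moment. The sketch below shows that filtering step. The response shape (`results` with `user_username`, `shift_start`, `shift_end`) and the sample users are assumptions for illustration; check them against the JSON your instance actually returns.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def on_call_at(shifts, moment):
    """Return usernames whose shift covers `moment` (start inclusive, end exclusive)."""
    return [
        shift["user_username"]
        for shift in shifts
        if parse_ts(shift["shift_start"]) <= moment < parse_ts(shift["shift_end"])
    ]


# Hypothetical response body; field names are assumed, not confirmed.
sample_response = {
    "results": [
        {"user_username": "alice", "shift_start": "2024-01-02T09:00:00Z", "shift_end": "2024-01-02T17:00:00Z"},
        {"user_username": "bob", "shift_start": "2024-01-02T17:00:00Z", "shift_end": "2024-01-03T09:00:00Z"},
    ]
}

moment = datetime(2024, 1, 2, 18, 0, tzinfo=timezone.utc)
print(on_call_at(sample_response["results"], moment))
```

For a live check, replace `moment` with `datetime.now(timezone.utc)` and `sample_response` with the parsed JSON from the curl call.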
iCal export for personal calendars:
- Go to a schedule → click the export icon
- Copy the iCal URL
- Add to Google Calendar, Apple Calendar, or Outlook
Vacation / out-of-office:
- Create an override shift covering the vacation period
- Assign a colleague to cover
- OnCall automatically routes alerts to the covering person
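The effect of an override can be pictured as a simple precedence rule: if an override covers the moment an alert fires, the covering user is notified, otherwise the regular rotation applies. The sketch below models only that rule, with hypothetical users and dates. It is not OnCall code.

```python
from datetime import datetime


def effective_on_call(moment, rotation_user, overrides):
    """Return the covering user if an override spans `moment`, else the rotation user."""
    for start, end, covering_user in overrides:
        if start <= moment < end:
            return covering_user
    return rotation_user


# Hypothetical vacation: bob covers for alice from July 1 up to (not including) July 8.
overrides = [(datetime(2024, 7, 1), datetime(2024, 7, 8), "bob")]

print(effective_on_call(datetime(2024, 7, 3, 2, 30), "alice", overrides))
print(effective_on_call(datetime(2024, 7, 9), "alice", overrides))
```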
Troubleshooting
Engine container failing to start:
docker compose logs engine | tail -30
# Common causes: wrong DATABASE_* settings in .env, missing SECRET_KEY
Notifications not being sent:
# Check Celery worker logs (handles async notification delivery)
docker compose logs celery | tail -30
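# Open a Django shell to test a messaging backend manually
docker compose exec engine python manage.py shell -c "
from apps.base.messaging import get_messaging_backend_from_id
# This import only verifies that the messaging module loads.
# The call that sends a test notification is not shown here.
"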
Grafana plugin shows "Connection failed":
# Verify OnCall engine is reachable from Grafana container
docker compose exec grafana wget -q -O- http://engine:8080/api/v1/info/
Alert not routing to the right escalation:
# Check route evaluation in the integration
# OnCall shows which route matched and why
# Review Jinja2 templates in route filters
Schedule showing wrong timezone:
- Ensure user profiles have the correct timezone set
- Verify the schedule timezone setting
Conclusion
Grafana OnCall provides a complete open-source incident management workflow: alert ingestion from Grafana Alerting, escalation chains, and on-call schedules with Slack and Telegram notifications. The calendar-based schedule editor and visual escalation chain builder make it accessible to teams without dedicated SRE tooling budgets. For production deployments, use PostgreSQL as the backend database, run multiple Celery workers for reliable notification delivery, and set up schedule overrides in advance to handle planned absences.


