Grafana OnCall Installation for Incident Management

Grafana OnCall is an open-source incident management and on-call scheduling tool that centralizes alert routing, escalation policies, and on-call rotations with integrations for Slack, Telegram, PagerDuty, and more. This guide covers deploying Grafana OnCall, creating schedules, configuring escalation policies, routing alerts, and managing on-call rotations.

Prerequisites

  • Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
  • Docker and Docker Compose
  • Grafana 9.x+ (OnCall is a Grafana plugin)
  • 2 GB RAM minimum
  • Outbound internet access for notification channels (Slack, SMS, etc.)

Installing Grafana OnCall with Docker Compose

Grafana OnCall requires several services: the OnCall engine, Celery workers, Redis, and a database.

# Clone the OnCall repository for docker-compose templates
git clone https://github.com/grafana/oncall.git
cd oncall

# Copy and edit the environment file
cp .env.example .env
nano .env
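
The .env file needs a long, random SECRET_KEY. A minimal sketch for generating one with openssl rand -hex 32; the /dev/urandom fallback for hosts without openssl is an assumption, and gen_secret is an illustrative helper name, not part of OnCall:

```shell
# gen_secret: print 64 hex characters (32 random bytes)
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Assumed fallback: read 32 bytes from the kernel RNG and hex-encode them
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

SECRET_KEY="$(gen_secret)"
# Print a line that can be pasted into .env
printf 'SECRET_KEY=%s\n' "$SECRET_KEY"
```

Generate the key once and keep it stable; replacing it later is likely to invalidate anything signed with the old value.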

Edit the .env file with your settings:

# .env key settings
SECRET_KEY=your-random-secret-key-here  # openssl rand -hex 32
DATABASE_TYPE=sqlite3                    # or postgresql
RABBITMQ_URI=amqp://rabbitmq:rabbitmq@rabbitmq:5672/
REDIS_URI=redis://redis:6379/0
DJANGO_SETTINGS_MODULE=settings.hobby   # hobby or production

# For PostgreSQL (recommended for production)
# DATABASE_TYPE=postgresql
# DATABASE_HOST=db
# DATABASE_PORT=5432
# DATABASE_USER=oncall
# DATABASE_PASSWORD=oncallpassword
# DATABASE_NAME=oncall

# Grafana connection
GRAFANA_API_URL=http://grafana:3000

Start the hobby stack:

# Start all services
docker compose -f docker-compose-hobby.yml up -d

# Check status
docker compose -f docker-compose-hobby.yml ps

# Watch logs during initialization
docker compose -f docker-compose-hobby.yml logs -f engine

# OnCall engine is accessible at http://localhost:8080
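
The engine can take a while to finish its first start, so a small polling loop may be more convenient than watching logs. The sketch below only defines a generic retry helper (an illustrative name, not part of OnCall) and makes no assumptions about specific health endpoints:

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Rerun COMMAND until it succeeds; give up after ATTEMPTS tries.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}
```

For example, retry 30 2 curl -sS -o /dev/null http://localhost:8080/ polls the engine address for up to about a minute. Without curl's -f flag, any HTTP response counts as "up" regardless of status code.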

For production, use the full docker-compose.yml with PostgreSQL:

# Start with production config
docker compose up -d

# Run database migrations
docker compose run --rm engine python manage.py migrate

# Create a superuser
docker compose run --rm engine python manage.py createsuperuser

Connecting to Grafana

OnCall works as a Grafana plugin. Install it in your Grafana instance:

Option 1: Install from Grafana UI

  1. In Grafana: Administration → Plugins → search "OnCall"
  2. Click Install then Enable

Option 2: Install via CLI

grafana-cli plugins install grafana-oncall-app
sudo systemctl restart grafana-server

Configure the plugin:

  1. Go to Grafana → OnCall (left sidebar)
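  2. In the plugin settings, set the OnCall API URL to the address Grafana uses to reach the engine, e.g. http://engine:8080 when Grafana runs on the same Compose network (engine is the service name used in the commands above)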
  3. Enter the admin token from the OnCall engine
  4. Click Connect

Get the admin token:

docker compose exec engine python manage.py shell -c "
from apps.user_management.models import User
print(User.objects.get(username='admin').auth_token.key)
"

Creating On-Call Schedules

Schedules define who is on-call at any given time.

  1. In Grafana, go to OnCall → Schedules → + New Schedule
  2. Choose schedule type:
    • Calendar - manual scheduling via drag-and-drop
    • API - programmatic control via API
    • iCal - import from Google Calendar, Outlook, etc.

Calendar schedule example:

  1. Select Calendar type
  2. Add team members from your user list
  3. Use the drag-and-drop calendar to assign shifts:
    • Primary: main on-call (receives alerts first)
    • Override: covers for other team members
  4. Set the rotation time zone
  5. Click Save schedule

API-based schedule via REST:

# Get your API token from OnCall → Settings → API tokens
ONCALL_TOKEN="your-api-token"
ONCALL_URL="http://localhost:8080"

# Create a schedule
curl -X POST "${ONCALL_URL}/api/v1/schedules/" \
  -H "Authorization: ${ONCALL_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering On-Call",
    "type": "calendar",
    "team_id": null,
    "time_zone": "America/New_York",
    "slack": {"channel_id": "C1234567890", "user_group_id": "S1234567890"}
  }'
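
When several API calls are needed, repeating the headers gets tedious. A minimal wrapper sketch that reuses ONCALL_URL and ONCALL_TOKEN from above; oncall_endpoint and oncall_api are hypothetical helper names, not part of OnCall:

```shell
ONCALL_URL="${ONCALL_URL:-http://localhost:8080}"
ONCALL_TOKEN="${ONCALL_TOKEN:-your-api-token}"

# oncall_endpoint PATH: join the base URL and an API path,
# tolerating a trailing slash on the base and a leading slash on the path
oncall_endpoint() {
  printf '%s/%s\n' "${ONCALL_URL%/}" "${1#/}"
}

# oncall_api METHOD PATH [JSON_BODY]: thin curl wrapper that adds the auth header
oncall_api() {
  api_method=$1
  api_path=$2
  api_body=${3:-}
  if [ -n "$api_body" ]; then
    curl -fsS -X "$api_method" "$(oncall_endpoint "$api_path")" \
      -H "Authorization: ${ONCALL_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$api_body"
  else
    curl -fsS -X "$api_method" "$(oncall_endpoint "$api_path")" \
      -H "Authorization: ${ONCALL_TOKEN}"
  fi
}
```

For example, oncall_api GET /api/v1/schedules/ should list the existing schedules, which is a quick way to confirm that the POST above succeeded.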

Escalation Chains

Escalation chains define what happens when an alert fires: who gets notified, in what order, and after how long.

  1. Go to OnCall → Escalation Chains → + New Escalation Chain
  2. Name it (e.g., "P1 Critical Escalation")
  3. Add steps:

Example escalation chain:

Step | Action                                             | Wait
1    | Notify on-call from schedule "Engineering On-Call" | 5 min
2    | Notify whole team via Slack                        | 10 min
3    | Notify on-call from schedule "Management On-Call"  | -
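
Assuming each wait starts once its step has run and nobody acknowledges the alert in the meantime, the time at which each step of this example fires can be worked out by accumulating the waits:

```shell
# Wait (minutes) after each step in the example chain; the last step has no wait
waits="5 10 0"
elapsed=0
step=1
timeline=""
for w in $waits; do
  # Record when this step fires, then add its wait before the next step
  timeline="${timeline}step ${step} at +${elapsed} min
"
  elapsed=$((elapsed + w))
  step=$((step + 1))
done
printf '%s' "$timeline"
# step 1 at +0 min
# step 2 at +5 min
# step 3 at +15 min
```

Under these assumptions, the management schedule is paged 15 minutes after the alert fires if nobody responds earlier.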

Steps available:

  • Notify users - specific users or on-call from a schedule
  • Notify user group - Slack user group
  • Trigger webhook - custom HTTP callback
  • Resolve - auto-resolve the alert
  • Wait - pause before next step

Configure notification preferences per user:

  • Go to OnCall → Users → click your profile
  • Set notification methods: Slack, SMS, phone call, Telegram, email
  • Configure default and important notification methods separately

Alert Routes and Integrations

Integrations connect external alert sources to OnCall.

Create an integration:

  1. Go to OnCall → Integrations → + New Integration
  2. Select the source type:
    • Grafana Alerting (most common)
    • Prometheus Alertmanager
    • PagerDuty (migration)
    • Webhook (generic)
    • Email inbound
  3. Copy the generated webhook URL
  4. Configure your alert source to send to that URL

Configure Grafana Alerting to send to OnCall:

  1. In Grafana: Alerting → Contact points → + Add contact point
  2. Select Grafana OnCall as the integration type
  3. Select your OnCall integration from the dropdown
  4. Save, then add it to your notification policies

Routing - send different alerts to different escalation chains:

  1. Go to the integration → Routes
  2. Add a route with a Jinja2 filter:

# Route critical alerts to P1 escalation
{{ payload.labels.severity == "critical" }}

# Route database alerts to DBA team
{{ "database" in payload.labels.alertname }}

# Default route (no filter) goes to the default escalation chain
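
To check which route a filter selects, it can help to send a hand-built test alert to the integration's webhook URL. The sketch below assumes a generic webhook integration that accepts arbitrary JSON and a payload shaped like the filters above (payload.labels.*). WEBHOOK_URL and make_test_alert are hypothetical names, and the alert values are made up:

```shell
# Hypothetical placeholder: paste the URL copied from the integration page
WEBHOOK_URL="${WEBHOOK_URL:-}"

# make_test_alert ALERTNAME SEVERITY: print a minimal JSON alert payload
make_test_alert() {
  printf '{"labels": {"alertname": "%s", "severity": "%s"}}\n' "$1" "$2"
}

payload="$(make_test_alert "database-disk-full" "critical")"
printf '%s\n' "$payload"

# Send only when a webhook URL has been provided
if [ -n "$WEBHOOK_URL" ]; then
  curl -fsS -X POST "$WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

This sample satisfies both filters shown above, so it is also a convenient way to see how route order decides which escalation chain is used.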

Slack and Telegram Integration

Slack integration:

  1. Go to OnCall → Settings → Chat Ops → Slack
  2. Click Install Slack App and follow OAuth flow
  3. Select the default Slack channel for notifications
  4. In user profiles, link each user's Slack account

Once connected, OnCall posts a threaded Slack message for each alert and exposes the following commands and controls:

  • /oncall who - shows who is currently on-call
  • /oncall new - create a new incident
  • Buttons in alert messages: Acknowledge, Resolve, Silence

Telegram integration:

  1. Create a Telegram bot via @BotFather
  2. In OnCall, go to Settings → Chat Ops → Telegram
  3. Enter the bot token
  4. Add the bot to your Telegram group/channel
  5. Users link their Telegram accounts via a verification code
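
Before entering the bot token in OnCall, it can be sanity-checked against the Telegram Bot API getMe method, which returns the bot's identity for a valid token. A small sketch; TELEGRAM_BOT_TOKEN and telegram_getme_url are illustrative names:

```shell
# Hypothetical placeholder: the token issued by @BotFather
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"

# telegram_getme_url TOKEN: print the Bot API getMe URL for a token
telegram_getme_url() {
  printf 'https://api.telegram.org/bot%s/getMe\n' "$1"
}

# Query Telegram only when a token has been provided
if [ -n "$TELEGRAM_BOT_TOKEN" ]; then
  curl -fsS "$(telegram_getme_url "$TELEGRAM_BOT_TOKEN")"
fi
```

A JSON response containing "ok":true indicates the token is valid. An HTTP error from curl usually points to a mistyped or revoked token.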

Managing On-Call Rotations

Override shifts (when someone is covering for another):

  1. Go to the schedule → click the shift you want to override
  2. Click + Add override
  3. Select the covering user and time range

View who is on-call now:

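# API query: final shifts for today (replace SCHEDULE_ID with the schedule's ID)
SCHEDULE_ID="your-schedule-id"
TODAY="$(date -u +%Y-%m-%d)"
curl "${ONCALL_URL}/api/v1/schedules/${SCHEDULE_ID}/final_shifts?start_date=${TODAY}&end_date=${TODAY}" \
  -H "Authorization: ${ONCALL_TOKEN}"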

iCal export for personal calendars:

  1. Go to a schedule → click the export icon
  2. Copy the iCal URL
  3. Add to Google Calendar, Apple Calendar, or Outlook
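
The exported feed can be checked from the command line before it is added to a calendar client. A minimal sketch that counts the events in iCalendar data; count_events is an illustrative helper, and ICAL_URL in the usage note stands for the URL copied in step 2:

```shell
# count_events: count VEVENT entries in iCalendar data read from stdin
count_events() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the status clean
  grep -c '^BEGIN:VEVENT' || true
}
```

For example, curl -fsS "$ICAL_URL" | count_events prints the number of shifts in the feed. A result of 0 suggests the schedule has no shifts yet or the URL is wrong.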

Vacation / out-of-office:

  • Create an override shift covering the vacation period
  • Assign a colleague to cover
  • OnCall automatically routes alerts to the covering person

Troubleshooting

Engine container failing to start:

docker compose logs engine | tail -30
# Common: wrong DATABASE_* settings, missing SECRET_KEY
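
Missing or empty settings can be caught before restarting the stack. A minimal sketch that checks an env file for required keys; check_env is a hypothetical helper, and the keys to require depend on your setup:

```shell
# check_env FILE KEY...: report keys that are missing or empty in an env file
check_env() {
  env_file=$1
  shift
  env_missing=0
  for env_key in "$@"; do
    # Require an uncommented KEY=value line with a non-empty value
    if ! grep -Eq "^${env_key}=.+" "$env_file"; then
      echo "missing or empty: ${env_key}"
      env_missing=1
    fi
  done
  return "$env_missing"
}
```

For example, check_env .env SECRET_KEY DATABASE_TYPE REDIS_URI prints one line per problem and returns a non-zero status if any key is absent.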

Notifications not being sent:

# Check Celery worker logs (handles async notification delivery)
docker compose logs celery | tail -30
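
Worker logs are often noisy, so filtering for failure-looking lines can speed things up. A small sketch; only_errors is an illustrative helper, and the match pattern is an assumption about what failures look like in the logs:

```shell
# only_errors: keep log lines from stdin that look like failures
only_errors() {
  # "|| true" keeps the exit status clean when nothing matches
  grep -iE 'error|exception|traceback' || true
}
```

For example, docker compose logs --since 1h celery 2>&1 | only_errors shows only the suspicious lines from the last hour.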

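# Starting point for a manual test: confirm the messaging module imports cleanly
# (this snippet does not send anything by itself)
docker compose exec engine python manage.py shell -c "
from apps.base.messaging import get_messaging_backend_from_id
# Call the backend for your channel here to trigger a test notification
"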

Grafana plugin shows "Connection failed":

# Verify OnCall engine is reachable from Grafana container
docker compose exec grafana wget -q -O- http://engine:8080/api/v1/info/

Alert not routing to the right escalation:

# Check route evaluation in the integration
# OnCall shows which route matched and why
# Review Jinja2 templates in route filters

Schedule showing wrong timezone:

  • Ensure user profiles have the correct timezone set
  • Verify the schedule timezone setting

Conclusion

Grafana OnCall provides a complete open-source incident management workflow: alerts arrive from Grafana Alerting or other integrations, pass through routes and escalation chains, and reach whoever is on call via Slack, Telegram, or other notification channels. The calendar-based schedule editor and visual escalation chain builder make it accessible to teams without dedicated SRE tooling budgets. For production deployments, use PostgreSQL as the backend database, run multiple Celery workers for reliable notification delivery, and set up schedule overrides in advance to handle planned absences.