Blackbox Exporter for Endpoint Monitoring

Blackbox Exporter enables external endpoint monitoring by probing targets and exposing metrics about their availability and performance. Unlike agent-based monitoring, blackbox monitoring tests actual user-facing endpoints without requiring installation on target systems. This guide covers installation, module configuration, Prometheus integration, and Grafana dashboards.

Table of Contents

Introduction

Blackbox Exporter simulates user interactions with endpoints, providing synthetic monitoring insights. It tests HTTP/HTTPS endpoints, TCP connections, DNS resolution, ICMP pings, and other protocols. This user perspective complements internal metrics for complete observability.

System Requirements

  • Linux kernel 2.6.32 or later
  • At least 256MB RAM
  • 100MB disk space
  • Network access to monitored endpoints
  • Privileged access for ICMP (ping)
  • Root or CAP_NET_RAW capabilities

Installation

Step 1: Download and Install

# Create user
sudo useradd --no-create-home --shell /bin/false blackbox-exporter

# Download
cd /tmp
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
tar -xvzf blackbox_exporter-0.24.0.linux-amd64.tar.gz
cd blackbox_exporter-0.24.0.linux-amd64

# Install
sudo cp blackbox_exporter /usr/local/bin/
sudo chown blackbox-exporter:blackbox-exporter /usr/local/bin/blackbox_exporter
sudo chmod +x /usr/local/bin/blackbox_exporter

# Create directories
sudo mkdir -p /etc/blackbox-exporter
sudo chown blackbox-exporter:blackbox-exporter /etc/blackbox-exporter

Step 2: Create Configuration

sudo tee /etc/blackbox-exporter/blackbox.yml > /dev/null << 'EOF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 204, 206, 301, 302, 304, 307, 308]
      method: GET
      preferred_ip_protocol: "ip4"

  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"test":"data"}'

  tcp_connect:
    prober: tcp
    timeout: 5s

  tcp_connect_tls:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false

  dns:
    prober: dns
    timeout: 5s
    dns:
      transport_protocol: "tcp"
      preferred_ip_protocol: "ip4"
      query_name: "www.prometheus.io"
      query_type: "A"
      validate_answer_rrs:
        fail_if_matches_regexp:
          - "nonexistent\\.invalid"
        fail_if_all_match_regexp:
          - "127\\.0\\.0\\.1"
        fail_if_none_matches_regexp:
          - ".*"

  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

  pop3s_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^\\+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false

  ssh_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
      query_response_log: true
EOF

sudo chown blackbox-exporter:blackbox-exporter /etc/blackbox-exporter/blackbox.yml

Step 3: Create Systemd Service

sudo tee /etc/systemd/system/blackbox-exporter.service > /dev/null << 'EOF'
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox-exporter
Group=blackbox-exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox-exporter/blackbox.yml \
  --web.listen-address=0.0.0.0:9115

Restart=always
RestartSec=10

StandardOutput=journal
StandardError=journal
SyslogIdentifier=blackbox-exporter

# Capabilities for ICMP
AmbientCapabilities=CAP_NET_RAW
CapabilityBoundingSet=CAP_NET_RAW
SecureBits=keep-caps

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable blackbox-exporter
sudo systemctl start blackbox-exporter

Step 4: Verify Installation

# Check service
sudo systemctl status blackbox-exporter

# Test endpoint
curl http://localhost:9115/metrics | head -20

# Test probe
curl 'http://localhost:9115/probe?target=https://www.google.com&module=http_2xx'

Module Configuration

HTTP Monitoring

modules:
  # Basic HTTP check
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200, 301, 302]

  # HTTPS with custom headers
  https_with_auth:
    prober: http
    timeout: 10s
    http:
      method: GET
      headers:
        Authorization: "Bearer YOUR_TOKEN"
        User-Agent: "Blackbox Exporter"
      valid_status_codes: [200]
      body_regexp:
        - "success"

  # SSL Certificate check
  https_cert:
    prober: http
    timeout: 5s
    http:
      tls_config:
        insecure_skip_verify: false
      follow_redirects: false

TCP Monitoring

modules:
  tcp_database:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      query_response:
        - send: ""
          expect: ""

  mysql_check:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      tls: false

  postgresql_check:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: true

DNS Monitoring

modules:
  dns_a_record:
    prober: dns
    timeout: 5s
    dns:
      transport_protocol: "udp"
      preferred_ip_protocol: "ip4"
      query_name: "example.com"
      query_type: "A"
      validate_answer_rrs:
        fail_if_matches_regexp:
          - "127\\.0\\.0\\.1"
        fail_if_none_matches_regexp:
          - ".*"

  dns_mx_record:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "MX"

ICMP Monitoring

modules:
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
      dont_fragment: false

Prometheus Integration

Add Blackbox Scrape Configuration

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://www.google.com
          - https://github.com
          - https://your-api.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 192.168.1.50:5432
          - 192.168.1.51:3306
          - 192.168.1.52:27017
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
          - 192.168.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-dns'
    metrics_path: /probe
    params:
      module: [dns_a_record]
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Reload Prometheus

curl -X POST http://localhost:9090/-/reload

Grafana Dashboards

Create Dashboard Panels

Endpoint Availability

# HTTP status
probe_http_status_code

# Success rate
rate(probe_success[5m]) * 100

# Response time
histogram_quantile(0.95, probe_duration_seconds)

TCP Connectivity

# TCP success
probe_success{job="blackbox-tcp"}

# Connection time
probe_connect_duration_seconds

DNS Resolution

# DNS lookup time
probe_dns_lookup_duration_seconds

# DNS success rate
probe_success{job="blackbox-dns"}

ICMP Ping

# Ping success
probe_success{job="blackbox-icmp"}

# RTT
probe_icmp_duration_seconds

Advanced Monitoring

Custom Probe Targets

# Test specific endpoint
curl 'http://localhost:9115/probe?target=https://api.example.com/health&module=http_2xx'

# Monitor with retries
curl 'http://localhost:9115/probe?target=database.example.com:5432&module=tcp_connect'

Multi-Module Checks

# Create combined checks
probe_dns_lookup_duration_seconds + probe_http_duration_seconds

Synthetic Monitoring

Monitor user journeys:

modules:
  login_check:
    prober: http
    timeout: 10s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"username":"test","password":"test"}'
      valid_status_codes: [200]

Alerting Rules

Create Alert Rules

# /etc/prometheus/alert_rules.yml

groups:
  - name: blackbox_alerts
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Endpoint {{ $labels.instance }} is down"
          description: "Endpoint {{ $labels.instance }} has been down for 5 minutes"

      - alert: SlowEndpoint
        expr: histogram_quantile(0.95, probe_duration_seconds) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Endpoint {{ $labels.instance }} is slow"
          description: "Endpoint response time exceeds 2 seconds"

      - alert: SSLCertificateExpiring
        expr: probe_ssl_earliest_cert_expiry - time() < 7 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate for {{ $labels.instance }} expires in less than 7 days"

      - alert: HighErrorRate
        expr: rate(probe_http_status_code{code=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.instance }}"

Performance Tuning

Concurrent Probes

Limit concurrent probes to avoid overwhelming targets:

# Add to systemd service
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox-exporter/blackbox.yml \
  --web.listen-address=0.0.0.0:9115 \
  --config.check-interval=30s

Timeout Configuration

modules:
  # Fast checks
  http_quick:
    prober: http
    timeout: 2s

  # Slower operations
  http_full:
    prober: http
    timeout: 10s

Troubleshooting

Check Probe Results

# Test HTTP probe
curl -v 'http://localhost:9115/probe?target=https://example.com&module=http_2xx'

# Test TCP probe
curl 'http://localhost:9115/probe?target=example.com:443&module=tcp_connect_tls'

# Test ICMP (requires capabilities)
curl 'http://localhost:9115/probe?target=8.8.8.8&module=icmp'

Debug Configuration

# Validate config
blackbox_exporter --config.file=/etc/blackbox-exporter/blackbox.yml

# Check logs
journalctl -u blackbox-exporter -f

# Verify metrics
curl http://localhost:9115/metrics | grep probe

Common Issues

# ICMP not working - check capabilities
getcap /usr/local/bin/blackbox_exporter
sudo setcap cap_net_raw=ep /usr/local/bin/blackbox_exporter

# TLS errors - disable verification if needed
# Update module: tls_config.insecure_skip_verify = true

# Firewall blocking - verify connectivity
telnet target 443

Conclusion

Blackbox Exporter provides comprehensive external endpoint monitoring without requiring agents. By following this guide, you've deployed synthetic monitoring that simulates user interactions. Focus on monitoring all critical user-facing endpoints, setting appropriate alerting thresholds based on SLOs, and regularly reviewing probe results to identify trends. Combined with internal metrics, blackbox monitoring provides complete visibility into application availability and performance.