Blackbox Exporter for Endpoint Monitoreo

Blackbox Exporter enables external endpoint monitoreo by probing targets and exposing metrics about their availability and rendimiento. UnComo agent-based monitoreo, blackbox monitoreo tests actual user-facing endpoints without requiring installation on target systems. Esta guía covers installation, module configuration, Prometheus integration, and Grafana dashboards.

Tabla de Contenidos

Introducción

Blackbox Exporter simulates user interactions with endpoints, providing synthetic monitoreo insights. It tests HTTP/HTTPS endpoints, TCP connections, DNS resolution, ICMP pings, and other protocols. This user perspective complements internal metrics for complete observability.

Requisitos del Sistema

  • Linux kernel 2.6.32 or later
  • At least 256MB RAM
  • 100MB disk space
  • Red access Para monitoreared endpoints
  • Privileged access for ICMP (ping)
  • Root or CAP_NET_RAW capabilities

Instalación

Paso 1: Download and Install

# Create user
sudo useradd --no-create-home --shell /bin/false blackbox-exporter

# Download
cd /tmp
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.24.0/blackbox_exporter-0.24.0.linux-amd64.tar.gz
tar -xvzf blackbox_exporter-0.24.0.linux-amd64.tar.gz
cd blackbox_exporter-0.24.0.linux-amd64

# Install
sudo cp blackbox_exporter /usr/local/bin/
sudo chown blackbox-exporter:blackbox-exporter /usr/local/bin/blackbox_exporter
sudo chmod +x /usr/local/bin/blackbox_exporter

# Create directories
sudo mkdir -p /etc/blackbox-exporter
sudo chown blackbox-exporter:blackbox-exporter /etc/blackbox-exporter

Paso 2: Crear Configuración

sudo tee /etc/blackbox-exporter/blackbox.yml > /dev/null << 'EOF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 204, 206, 301, 302, 304, 307, 308]
      method: GET
      preferred_ip_protocol: "ip4"

  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"test":"data"}'

  tcp_connect:
    prober: tcp
    timeout: 5s

  tcp_connect_tls:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false

  dns:
    prober: dns
    timeout: 5s
    dns:
      transport_protocol: "tcp"
      preferred_ip_protocol: "ip4"
      query_name: "www.prometheus.io"
      query_type: "A"
      validate_answer_rrs:
        fail_if_matches_regexp:
          - "nonexistent\\.invalid"
        fail_if_all_match_regexp:
          - "127\\.0\\.0\\.1"
        fail_if_none_matches_regexp:
          - ".*"

  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

  pop3s_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^\\+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false

  ssh_banner:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^SSH-2.0-"
      query_response_log: true
EOF

sudo chown blackbox-exporter:blackbox-exporter /etc/blackbox-exporter/blackbox.yml

Paso 3: Crear Systemd Servicio

sudo tee /etc/systemd/system/blackbox-exporter.service > /dev/null << 'EOF'
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox-exporter
Group=blackbox-exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox-exporter/blackbox.yml \
  --web.listen-address=0.0.0.0:9115

Restart=always
RestartSec=10

StandardOutput=journal
StandardError=journal
SyslogIdentifier=blackbox-exporter

# Capabilities for ICMP
AmbientCapabilities=CAP_NET_RAW
CapabilityBoundingSet=CAP_NET_RAW
SecureBits=keep-caps

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable blackbox-exporter
sudo systemctl start blackbox-exporter

Paso 4: Verificar Instalación

# Check service
sudo systemctl status blackbox-exporter

# Test endpoint
curl http://localhost:9115/metrics | head -20

# Test probe
curl 'http://localhost:9115/probe?target=https://www.google.com&module=http_2xx'

Module Configuración

HTTP Monitoreo

modules:
  # Basic HTTP check
  http_2xx:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200, 301, 302]

  # HTTPS with custom headers
  https_with_auth:
    prober: http
    timeout: 10s
    http:
      method: GET
      headers:
        Authorization: "Bearer YOUR_TOKEN"
        User-Agent: "Blackbox Exporter"
      valid_status_codes: [200]
      body_regexp:
        - "success"

  # SSL Certificate check
  https_cert:
    prober: http
    timeout: 5s
    http:
      tls_config:
        insecure_skip_verify: false
      follow_redirects: false

TCP Monitoreo

modules:
  tcp_database:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      query_response:
        - send: ""
          expect: ""

  mysql_check:
    prober: tcp
    timeout: 5s
    tcp:
      preferred_ip_protocol: "ip4"
      tls: false

  postgresql_check:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: true

DNS Monitoreo

modules:
  dns_a_record:
    prober: dns
    timeout: 5s
    dns:
      transport_protocol: "udp"
      preferred_ip_protocol: "ip4"
      query_name: "example.com"
      query_type: "A"
      validate_answer_rrs:
        fail_if_matches_regexp:
          - "127\\.0\\.0\\.1"
        fail_if_none_matches_regexp:
          - ".*"

  dns_mx_record:
    prober: dns
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "MX"

ICMP Monitoreo

modules:
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
      dont_fragment: false

Prometheus Integración

Agregar Blackbox Scrape Configuración

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://www.google.com
          - https://github.com
          - https://your-api.example.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 192.168.1.50:5432
          - 192.168.1.51:3306
          - 192.168.1.52:27017
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
          - 192.168.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: 'blackbox-dns'
    metrics_path: /probe
    params:
      module: [dns_a_record]
    static_configs:
      - targets:
          - 8.8.8.8
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Reload Prometheus

curl -X POST http://localhost:9090/-/reload

Grafana Paneles

Crear Panel Panels

Endpoint Availability

# HTTP status
probe_http_status_code

# Success rate
rate(probe_success[5m]) * 100

# Response time
histogram_quantile(0.95, probe_duration_seconds)

TCP Connectivity

# TCP success
probe_success{job="blackbox-tcp"}

# Connection time
probe_connect_duration_seconds

DNS Resolution

# DNS lookup time
probe_dns_lookup_duration_seconds

# DNS success rate
probe_success{job="blackbox-dns"}

ICMP Ping

# Ping success
probe_success{job="blackbox-icmp"}

# RTT
probe_icmp_duration_seconds

Avanzado Monitoreo

Custom Probe Targets

# Test specific endpoint
curl 'http://localhost:9115/probe?target=https://api.example.com/health&module=http_2xx'

# Monitor with retries
curl 'http://localhost:9115/probe?target=database.example.com:5432&module=tcp_connect'

Multi-Module Checks

# Create combined checks
probe_dns_lookup_duration_seconds + probe_http_duration_seconds

Synthetic Monitoreo

Monitor user journeys:

modules:
  login_check:
    prober: http
    timeout: 10s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"username":"test","password":"test"}'
      valid_status_codes: [200]

Alerting Rules

Crear Alerta Rules

# /etc/prometheus/alert_rules.yml

groups:
  - name: blackbox_alerts
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Endpoint {{ $labels.instance }} is down"
          description: "Endpoint {{ $labels.instance }} has been down for 5 minutes"

      - alert: SlowEndpoint
        expr: histogram_quantile(0.95, probe_duration_seconds) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Endpoint {{ $labels.instance }} is slow"
          description: "Endpoint response time exceeds 2 seconds"

      - alert: SSLCertificateExpiring
        expr: probe_ssl_earliest_cert_expiry - time() < 7 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate for {{ $labels.instance }} expires in less than 7 days"

      - alert: HighErrorRate
        expr: rate(probe_http_status_code{code=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on {{ $labels.instance }}"

Rendimiento Tuning

Concurrent Probes

Limit concurrent probes Para evitar overwhelming targets:

# Add to systemd service
ExecStart=/usr/local/bin/blackbox_exporter \
  --config.file=/etc/blackbox-exporter/blackbox.yml \
  --web.listen-address=0.0.0.0:9115 \
  --config.check-interval=30s

Timeout Configuración

modules:
  # Fast checks
  http_quick:
    prober: http
    timeout: 2s

  # Slower operations
  http_full:
    prober: http
    timeout: 10s

Solución de Problemas

Verificar Probe Results

# Test HTTP probe
curl -v 'http://localhost:9115/probe?target=https://example.com&module=http_2xx'

# Test TCP probe
curl 'http://localhost:9115/probe?target=example.com:443&module=tcp_connect_tls'

# Test ICMP (requires capabilities)
curl 'http://localhost:9115/probe?target=8.8.8.8&module=icmp'

Debug Configuración

# Validate config
blackbox_exporter --config.file=/etc/blackbox-exporter/blackbox.yml

# Check logs
journalctl -u blackbox-exporter -f

# Verify metrics
curl http://localhost:9115/metrics | grep probe

Common Issues

# ICMP not working - check capabilities
getcap /usr/local/bin/blackbox_exporter
sudo setcap cap_net_raw=ep /usr/local/bin/blackbox_exporter

# TLS errors - disable verification if needed
# Update module: tls_config.insecure_skip_verify = true

# Firewall blocking - verify connectivity
telnet target 443

Conclusión

Blackbox Exporter provides comprehensive external endpoint monitoreo without requiring agents. By following Esta guía, you've deployed synthetic monitoreo that simulates user interactions. Focus on monitoreo all critical user-facing endpoints, setting appropriate alerting thresholds based on SLOs, and regularly reviewing probe results to identify trends. Combined with internal metrics, blackbox monitoreo provides complete visibility into application availability and rendimiento.