Prometheus Installation and Configuration

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from configured targets at regular intervals, evaluates alerting rules against them, and fires alerts when predefined conditions are met. This guide covers everything needed to install, configure, and secure Prometheus on your infrastructure.

Introduction

Prometheus works by pulling (scraping) metrics over HTTP from instrumented applications and infrastructure components. Unlike traditional push-based monitoring, the pull model lets Prometheus control when and how often each target is scraped, keeps the architecture simple, and makes debugging easier, since you can fetch a target's metrics endpoint yourself and see exactly what Prometheus sees. Metrics are stored in a time-series database and queried with PromQL.
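To make the pull model concrete, here is a minimal sketch of a target Prometheus could scrape: a standard-library Python HTTP server that exposes one counter in the Prometheus text exposition format. The metric name demo_scrapes_total and port 8000 are hypothetical; real services would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter: how many times /metrics has been scraped.
# HTTPServer handles one request at a time, so no lock is needed.
_scrapes = 0


def render_metrics(samples):
    """Render (name, type, help, value) tuples in the Prometheus text format."""
    lines = []
    for name, metric_type, help_text, value in samples:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global _scrapes
        if self.path != "/metrics":
            self.send_error(404)
            return
        _scrapes += 1
        body = render_metrics(
            [("demo_scrapes_total", "counter", "Total scrapes of this endpoint.", _scrapes)]
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # The target only answers requests; Prometheus decides when to pull.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

Listing localhost:8000 under a job's static_configs would make Prometheus fetch this endpoint once per scrape_interval.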

System Requirements

Before installing Prometheus, make sure your system meets these requirements:

  • Linux kernel 2.6.32 or later
  • At least 1GB RAM (2GB+ recommended for production)
  • At least 10GB storage (scale based on retention period and metric volume)
  • Internet connectivity for downloading packages
  • Root or sudo access

Installation

Step 1: Download Prometheus

Start by downloading a stable release of Prometheus. This guide uses v2.50.0; check the GitHub releases page for the latest version:

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.50.0.linux-amd64.tar.gz
cd prometheus-2.50.0.linux-amd64
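It is also worth checking the archive's integrity, ideally before running tar. Prometheus releases publish a sha256sums.txt file alongside the tarballs; this sketch assumes you downloaded it into /tmp next to the archive and compares digests using only the standard library. On systems with GNU coreutils, `sha256sum --ignore-missing -c sha256sums.txt` performs the same check.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha256(sums_text, filename):
    """Find the digest for filename in sha256sum-style text ("<hex>  <name>")."""
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    raise KeyError(f"{filename} is not listed in the checksum file")


def verify(tarball_path, sums_path):
    """True if the tarball's digest matches the entry in the checksum file."""
    with open(sums_path, encoding="utf-8") as handle:
        expected = expected_sha256(handle.read(), os.path.basename(tarball_path))
    return file_sha256(tarball_path) == expected


if __name__ == "__main__":
    print(verify("/tmp/prometheus-2.50.0.linux-amd64.tar.gz", "/tmp/sha256sums.txt"))
```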

Step 2: Create a System User and Directories

Create a dedicated user for Prometheus and set up the necessary directories:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

Step 3: Copy Binaries and Files

Copy the Prometheus binaries and files to system locations:

sudo cp prometheus promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp prometheus.yml /etc/prometheus/
sudo cp -r consoles /etc/prometheus/
sudo cp -r console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus

Step 4: Verify the Installation

Verify that Prometheus is properly installed:

prometheus --version
promtool --version

Configuration

Basic Configuration

The main configuration file is located at /etc/prometheus/prometheus.yml. Here's a minimal production-ready configuration:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus-prod'
    environment: 'production'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

rule_files:
  - '/etc/prometheus/rules/*.yml'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

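Advanced Configuration Options

For production environments, consider these additional settings. The remote_write and remote_read sections require a compatible remote storage backend; the URLs below assume one listening on localhost:9009: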

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    cluster: 'us-east-1'
    environment: 'production'

remote_write:
  - url: 'http://localhost:9009/api/v1/push'
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_wait_time: 5s
      min_backoff: 30ms
      max_backoff: 100ms

remote_read:
  - url: 'http://localhost:9009/api/v1/read'
    read_recent: true
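Fields such as scrape_interval and scrape_timeout use Prometheus' duration syntax, and Prometheus refuses to load a configuration whose scrape_timeout is longer than its scrape_interval. The sketch below is a simplified stand-in for the real parser: it converts durations to seconds, enforces that constraint, and reports how many samples each series produces per day at the chosen interval.

```python
import re

# Units must appear in descending order, e.g. "1h30m", "15s", "500ms".
_DURATION = re.compile(
    r"(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?"
)
# Milliseconds per unit, in the same order as the capture groups (1y = 365d).
_UNIT_MS = [
    365 * 24 * 3600 * 1000,
    7 * 24 * 3600 * 1000,
    24 * 3600 * 1000,
    3600 * 1000,
    60 * 1000,
    1000,
    1,
]


def parse_duration(text):
    """Convert a Prometheus duration string to seconds (float)."""
    if text == "0":
        return 0.0
    match = _DURATION.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    total_ms = sum(int(group) * ms for group, ms in zip(match.groups(), _UNIT_MS) if group)
    return total_ms / 1000


def check_scrape_settings(interval, timeout):
    """Reject timeout > interval; return samples per series per day."""
    interval_s, timeout_s = parse_duration(interval), parse_duration(timeout)
    if timeout_s > interval_s:
        raise ValueError(f"scrape_timeout {timeout} exceeds scrape_interval {interval}")
    return 86400 / interval_s


print(check_scrape_settings("15s", "10s"))  # 5760.0
```

At a 15s interval every series stores 5,760 samples per day, which is the figure to multiply by your series count when planning storage.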

Scrape Configuration

Scrape Target Configuration

Define targets to monitor with static target lists, per-target labels, and per-job overrides:

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.10:9100', '192.168.1.11:9100']
        labels:
          datacenter: 'us-east-1'
          rack: '1a'
      - targets: ['192.168.1.12:9100']
        labels:
          datacenter: 'us-west-1'

  - job_name: 'mysql-servers'
    static_configs:
      - targets: 
          - '192.168.1.20:9104'
          - '192.168.1.21:9104'
        labels:
          environment: 'production'

  - job_name: 'postgres-servers'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9187']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'pg_stat_.*'
        action: drop
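metric_relabel_configs run after each scrape, on every sample, and Prometheus anchors relabel regexes at both ends, so 'pg_stat_.*' only matches names that start with pg_stat_. The sketch below mimics the drop rule from the postgres-servers job with Python's re.fullmatch; the metric names are illustrative, and Python's regex dialect differs from Go's RE2 in some edge cases.

```python
import re


def drop_matching(metric_names, regex):
    """Mimic a metric_relabel 'drop' on __name__: remove names the anchored regex matches."""
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name) is None]


scraped = ["pg_up", "pg_stat_database_xact_commit", "pg_settings_max_connections", "my_pg_stat_x"]
print(drop_matching(scraped, "pg_stat_.*"))
# ['pg_up', 'pg_settings_max_connections', 'my_pg_stat_x']
```

Note that my_pg_stat_x survives: because of the implicit anchoring, the regex would need a leading .* to match it.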

Service Discovery Methods

For dynamic environments, use service discovery:

scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        datacenter: 'us-east-1'
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service

  - job_name: 'docker-containers'
    docker_sd_configs:
      - host: 'unix:///var/run/docker.sock'
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        target_label: container

Relabeling Configuration

Use relabeling to add, drop, or modify labels:

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: 'app-(web|api)'
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
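The last rule above is the trickiest: Prometheus joins the source label values with the default separator ';', matches the anchored regex against the joined string, and expands $1 and $2 in the replacement. The following is a simplified Python model of the keep, drop, and replace actions only (it ignores other actions and Go's ${name} syntax), applied to a hypothetical pod's labels.

```python
import re


def _expand(template, match):
    """Replace $1, $2, ... in template with regex groups (missing groups become '')."""
    def group(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ""
        return match.group(index) or ""
    return re.sub(r"\$(\d+)", group, template)


def apply_relabel(labels, rule):
    """Apply one keep/drop/replace rule; return the new labels, or None if dropped."""
    values = [labels.get(name, "") for name in rule["source_labels"]]
    joined = rule.get("separator", ";").join(values)
    match = re.fullmatch(rule.get("regex", "(.*)"), joined)
    action = rule.get("action", "replace")
    if action == "keep":
        return labels if match else None
    if action == "drop":
        return None if match else labels
    if action == "replace":
        if match is None:
            return labels
        value = _expand(rule.get("replacement", "$1"), match)
        updated = dict(labels)
        if value:
            updated[rule["target_label"]] = value
        else:
            updated.pop(rule["target_label"], None)  # an empty result removes the label
        return updated
    raise ValueError(f"unsupported action: {action}")


# Hypothetical pod discovered by kubernetes_sd_configs.
pod = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_pod_label_app": "app-web",
    "__meta_kubernetes_pod_annotation_prometheus_port": "9102",
}
keep = {"source_labels": ["__meta_kubernetes_pod_label_app"], "action": "keep", "regex": "app-(web|api)"}
rewrite = {
    "source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_port"],
    "action": "replace",
    "regex": r"([^:]+)(?::\d+)?;(\d+)",
    "replacement": "$1:$2",
    "target_label": "__address__",
}
result = apply_relabel(apply_relabel(pod, keep), rewrite)
print(result["__address__"])  # 10.0.0.5:9102
```

The joined string here is "10.0.0.5:8080;9102": group 1 captures the host, the optional (?::\d+) swallows the old port, and group 2 captures the annotation's port.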

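Service Management

Create a systemd Service

Create a systemd service file for Prometheus. The --web.enable-lifecycle flag enables the /-/reload and /-/quit HTTP endpoints, so keep access to port 9090 restricted (see Security Considerations):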

sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle

Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

Enable and Start the Service

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
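For scripts and deployment tooling, polling Prometheus' /-/ready endpoint (HTTP 200 once the server can serve traffic, 503 while it is still starting) is more reliable than parsing systemctl status output. A minimal sketch, assuming Prometheus listens on localhost:9090 as set in the unit file; the opener parameter exists only so the function can be tested without a running server.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:9090/-/ready", timeout=60.0, interval=2.0,
                     opener=urllib.request.urlopen):
    """Poll the readiness endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and HTTPError (e.g. 503) all subclass OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "not ready after 60s")
```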

View Service Logs

sudo journalctl -u prometheus -f
sudo journalctl -u prometheus --since "1 hour ago"

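Data Retention

Configure the Retention Policy

Set retention time and size limits by adding flags to the ExecStart line of the systemd unit (without them, Prometheus keeps 15 days of data and applies no size limit). Open the full unit file for editing:

sudo systemctl edit --full prometheus

Add a trailing backslash to the --web.enable-lifecycle line, then append:

--storage.tsdb.retention.time=30d \
--storage.tsdb.retention.size=50GB

Save the file and restart the service:

sudo systemctl restart prometheus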
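When both limits are set, whichever is reached first causes the oldest blocks to be removed. To choose values, the Prometheus storage documentation suggests estimating disk needs as retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with roughly 1 to 2 bytes per sample after compression. This sketch applies that rule of thumb and parses retention.size values, which Prometheus interprets in powers of two (1KB = 1024B). The rate of 100,000 samples/s is hypothetical; on a live server it can be estimated with rate(prometheus_tsdb_head_samples_appended_total[1h]).

```python
SECONDS_PER_DAY = 86400
_SIZE_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30,
               "TB": 1 << 40, "PB": 1 << 50, "EB": 1 << 60}


def parse_size(text):
    """Parse values like '50GB' using base-2 units."""
    for unit in sorted(_SIZE_UNITS, key=len, reverse=True):  # try two-letter units first
        if text.endswith(unit):
            return int(text[: -len(unit)]) * _SIZE_UNITS[unit]
    raise ValueError(f"unknown size unit in {text!r}")


def needed_bytes(retention_days, samples_per_second, bytes_per_sample=2):
    """Rule-of-thumb disk estimate for a retention time."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


def days_that_fit(size_text, samples_per_second, bytes_per_sample=2):
    """How many days of data a retention.size limit can hold at this rate."""
    return parse_size(size_text) / (SECONDS_PER_DAY * samples_per_second * bytes_per_sample)


if __name__ == "__main__":
    rate = 100_000  # hypothetical ingestion rate (samples/s)
    print(f"30d needs ~{needed_bytes(30, rate) / 1e9:.1f} GB")
    print(f"50GB holds ~{days_that_fit('50GB', rate):.1f} days")
```

At that hypothetical rate, 30 days would need roughly 518 GB, so the 50GB size limit, not the 30d time limit, would decide retention (about three days).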

Monitor Storage Usage

du -sh /var/lib/prometheus/
df -h /var/lib/prometheus/

# Inspect write-ahead log (WAL) segments
ls -la /var/lib/prometheus/wal/

# List persisted blocks (one ULID-named directory per block)
ls -la /var/lib/prometheus/
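To see where the space goes (WAL segments, the chunks_head directory, or persisted blocks), a small script can total each top-level entry of the data directory, much like running du on each subdirectory. The path matches the --storage.tsdb.path used in this guide; run it as a user that can read the directory.

```python
import os
import shutil


def tree_size(path):
    """Size in bytes of a file, or of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except FileNotFoundError:
                continue  # compaction may delete files while we walk
    return total


def usage_by_entry(data_dir):
    """Map each top-level entry (wal, chunks_head, block ULIDs, ...) to its size, largest first."""
    sizes = {name: tree_size(os.path.join(data_dir, name)) for name in os.listdir(data_dir)}
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    data_dir = "/var/lib/prometheus"
    for name, size in usage_by_entry(data_dir).items():
        print(f"{size / 2**20:10.1f} MiB  {name}")
    disk = shutil.disk_usage(data_dir)
    print(f"filesystem: {disk.used / disk.total:.0%} used")
```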

Cleanup and Maintenance

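Prometheus automatically deletes blocks that fall outside the retention policy, so routine cleanup needs no manual action. It also attempts to repair WAL corruption on startup; promtool has no repair command. Useful maintenance commands:

# Validate the configuration
promtool check config /etc/prometheus/prometheus.yml

# List persisted TSDB blocks
promtool tsdb list /var/lib/prometheus/

# Free disk space held by series deleted through the admin API (requires --web.enable-admin-api)
curl -X POST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones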

PromQL Basics

Simple Queries

Retrieve current metric values:

# CPU time counters (seconds spent in each mode, per CPU)
node_cpu_seconds_total

# Available memory in bytes
node_memory_MemAvailable_bytes

# Filter to a specific instance
node_memory_MemAvailable_bytes{instance="192.168.1.10:9100"}

Range Vectors

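Query metrics over time ranges. A range selector returns the raw samples within the window and is usually passed to a function such as rate() rather than graphed directly:

# Raw CPU counter samples from the last 5 minutes
node_cpu_seconds_total[5m]

# Available-memory samples from the last hour
node_memory_MemAvailable_bytes[1h]

# Target up/down samples from the last 7 days
up[7d]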
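Functions such as rate() turn raw counter samples into a per-second value. Conceptually, rate() sums the increases between consecutive samples, treating any drop in value as a counter reset, and divides by the elapsed time. This Python sketch shows only that core idea on hypothetical (timestamp, value) pairs; the real implementation also extrapolates to the edges of the range window, so its results differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    A decrease between consecutive samples is treated as a counter reset, so the
    post-reset value counts as the increase (the counter restarted from zero).
    """
    if len(samples) < 2:
        return None  # rate() likewise needs at least two samples
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Four scrapes 15s apart; the counter reset between t=15 and t=30.
points = [(0, 100), (15, 115), (30, 5), (45, 20)]
print(round(simple_rate(points), 3))  # 0.778
```

Without reset handling, the drop from 115 to 5 would produce a large negative increase, which is why rate() and increase() should only be applied to counters.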

Aggregation and Functions

Perform calculations on metrics:

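# Fraction of CPU time busy per instance (averaged across CPUs)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Total requests per second across all series
sum(rate(http_requests_total[5m]))

# Top 5 instances by memory in use
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)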

# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100

Advanced PromQL Queries

Complex queries for real-world monitoring:

# CPU usage percentage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Request latency p95 (aggregate buckets by le first)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Service error rate (percentage of requests returning 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# Instances with less than 10% of memory available
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
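histogram_quantile() estimates a quantile from cumulative bucket counts by locating the bucket that contains the target rank and interpolating linearly inside it, which assumes observations are spread evenly within each bucket. A simplified Python sketch of that interpolation with hypothetical bucket counts; the real function also handles edge cases such as missing +Inf buckets and non-monotonic counts.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by bound, with positive bounds, and end with +Inf,
    like a histogram's le="+Inf" series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # cannot interpolate into +Inf; use the previous bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Cumulative request durations (seconds): 50 <= 0.1s, 90 <= 0.5s, 100 <= 1s.
latency = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.95, latency))  # 0.75
```

The 95th request falls halfway between the 90 counted at 0.5s and the 100 counted at 1s, so the estimate is 0.75s; bucket boundaries therefore limit the precision of any reported percentile.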

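Security Considerations

Network Security

Configure firewall rules to restrict access:

# Allow only specific networks
sudo ufw allow from 192.168.1.0/24 to any port 9090 proto tcp
sudo ufw allow from 10.0.0.0/8 to any port 9090 proto tcp

# Allow local access
sudo ufw allow from 127.0.0.1 to any port 9090 proto tcp

# Deny all other sources (rules are matched in order, so add this after the allow rules)
sudo ufw deny 9090/tcp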

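Authentication and Reverse Proxy

Prometheus serves its UI and API without authentication unless configured otherwise. A common approach is to put it behind a reverse proxy that enforces authentication:

# Install Nginx and htpasswd (provided by apache2-utils)
sudo apt-get update
sudo apt-get install -y nginx apache2-utils

# Create basic auth file
sudo htpasswd -c /etc/nginx/.htpasswd prometheus_user

Configure Nginx for Prometheus. Once the proxy is in place, change --web.listen-address in the systemd unit to 127.0.0.1:9090 so clients cannot bypass it: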

upstream prometheus {
    server 127.0.0.1:9090;
}

server {
    listen 443 ssl http2;
    server_name prometheus.example.com;

    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    auth_basic "Prometheus";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://prometheus;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
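With basic authentication enabled, API clients must send credentials on every request. A minimal sketch that runs an instant query through the proxy using Python's standard library; the hostname matches the server_name above, and the password is a placeholder for whatever you stored with htpasswd.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, query, user, password):
    """Build a GET request for /api/v1/query with an HTTP Basic Authorization header."""
    url = f"{base_url.rstrip('/')}/api/v1/query?{urllib.parse.urlencode({'query': query})}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder credentials; use the user and password created with htpasswd.
    req = build_query_request("https://prometheus.example.com", "up", "prometheus_user", "changeme")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["status"])
```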

File Permissions

Ensure proper file permissions:

sudo chown -R prometheus:prometheus /etc/prometheus
sudo chmod -R 750 /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo chmod -R 750 /var/lib/prometheus

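Monitoring Prometheus

Self-Monitoring

Enable Prometheus to monitor itself. The relabel rule below copies the target address into the instance label; Prometheus already does this by default, so the rule is optional: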

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

Key Metrics to Monitor

# Prometheus health
up{job="prometheus"}

# Scrape duration
scrape_duration_seconds{job="prometheus"}

# WAL segment creation failures (should stay at 0)
prometheus_tsdb_wal_segment_creation_failures_total

# Memory usage
process_resident_memory_bytes{job="prometheus"}

# Goroutine count
go_goroutines{job="prometheus"}
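These expressions can also be evaluated programmatically through the HTTP API's /api/v1/query endpoint, which returns JSON whose data.result is a list of series, each with a metric label set and a [timestamp, "value"] pair (the value is a string). A small sketch that fetches and parses such a vector result, assuming Prometheus is reachable on localhost:9090 without authentication.

```python
import json
import urllib.parse
import urllib.request


def parse_vector(payload):
    """Turn an instant-vector API response into {sorted label tuple: float value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected vector, got {data['resultType']}")
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def instant_query(expr, base_url="http://localhost:9090"):
    url = f"{base_url}/api/v1/query?{urllib.parse.urlencode({'query': expr})}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))


if __name__ == "__main__":
    for labels, value in instant_query('process_resident_memory_bytes{job="prometheus"}').items():
        print(dict(labels), f"{value / 2**20:.1f} MiB")
```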

Troubleshooting

Configuration Validation

Validate configuration changes before applying them (--lint-fatal also makes lint findings fail the check):

promtool check config /etc/prometheus/prometheus.yml
promtool check config --lint-fatal /etc/prometheus/prometheus.yml
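Because the systemd unit in this guide starts Prometheus with --web.enable-lifecycle, a validated configuration can be applied without a restart by sending an HTTP POST to /-/reload (sending SIGHUP to the process also works). A minimal sketch that reloads only if promtool accepts the file; it assumes promtool is on PATH and Prometheus listens on localhost:9090.

```python
import subprocess
import urllib.request


def config_is_valid(path="/etc/prometheus/prometheus.yml"):
    """Run `promtool check config` and report whether it exited successfully."""
    result = subprocess.run(["promtool", "check", "config", path], capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode == 0


def reload_request(base_url="http://localhost:9090"):
    """POST /-/reload; only available when --web.enable-lifecycle is set."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/-/reload", data=b"", method="POST")


def validate_and_reload(path="/etc/prometheus/prometheus.yml", base_url="http://localhost:9090"):
    if not config_is_valid(path):
        return False
    # A failed reload returns an HTTP error status, which urlopen raises as HTTPError.
    with urllib.request.urlopen(reload_request(base_url), timeout=10) as resp:
        return resp.status == 200


if __name__ == "__main__":
    print("reloaded" if validate_and_reload() else "configuration rejected; not reloaded")
```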

Verify Rules

Verify alerting rule syntax:

promtool check rules /etc/prometheus/rules/*.yml

Performance Issues

Check performance metrics:

# Check scrape duration per target
promtool query instant http://localhost:9090 'scrape_duration_seconds'

# View active targets
curl -s http://localhost:9090/api/v1/targets | jq .

# Show targets whose last scrape failed
curl -s 'http://localhost:9090/api/v1/targets?state=active' | jq '.data.activeTargets[] | select(.health == "down")'
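The targets API returns every active target with its health ("up", "down", or "unknown") and its last scrape error, so failing targets can also be filtered client-side. A sketch that does the filtering in Python instead of jq, assuming an unauthenticated server on localhost:9090.

```python
import json
import urllib.request


def down_targets(payload):
    """Return (job, instance, lastError) for every active target whose health is not 'up'."""
    return [
        (t["labels"].get("job", ""), t["labels"].get("instance", ""), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


if __name__ == "__main__":
    url = "http://localhost:9090/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=10) as resp:
        for job, instance, error in down_targets(json.load(resp)):
            print(f"{job:20} {instance:25} {error}")
```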

Storage Issues

Diagnose storage problems:

# List blocks with human-readable sizes
promtool tsdb list /var/lib/prometheus/ --human-readable

# Analyze series churn and label cardinality
promtool tsdb analyze /var/lib/prometheus/

# Show the first entries of the block list
promtool tsdb list /var/lib/prometheus/ | head -20

Debug Logging

Enable debug logging:

sudo systemctl edit --full prometheus

Add a trailing backslash to the last ExecStart flag, then append:

--log.level=debug

Then restart:

sudo systemctl restart prometheus

Conclusion

Prometheus provides a robust foundation for monitoring infrastructure and applications. By installing, configuring, and maintaining it with attention to security and performance, you create a reliable monitoring backbone. Regularly back up configuration files, monitor the monitoring system itself, and stay current with new releases to keep your observability platform effective and secure. Start with basic monitoring, gradually add exporters and complexity, and use PromQL to gain deep insights into your systems.