Prometheus Installation and Configuration

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from configured targets at regular intervals, evaluates alerting rules against them, and sends firing alerts to Alertmanager for routing and notification. This guide covers installing, configuring, operating, and securing Prometheus on a Linux host.

Introduction

Prometheus works by pulling (scraping) metrics over HTTP from instrumented applications and exporters. Compared with push-based monitoring, the pull model keeps the list of targets in one place on the server, lets Prometheus detect an unreachable target directly through the up metric, and makes a target easy to debug because its metrics endpoint can be opened in a browser or fetched with curl. Samples are stored in a local time-series database and queried with PromQL.
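
To make the pull model concrete, the following is a minimal sketch of the target side of a scrape: a small HTTP server that exposes counters in the Prometheus text exposition format. The metric name demo_requests_total, the REQUEST_COUNTS dictionary, and port 8000 are assumptions made for this illustration; real applications would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real application would update these as it works.
REQUEST_COUNTS = {"GET": 0, "POST": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by HTTP method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the current counter values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and adding a scrape job whose target is localhost:8000 would let Prometheus collect these values on every scrape interval; the application never needs to know where the Prometheus server lives.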

System Requirements

Before installing Prometheus, ensure your system meets these requirements:

  • Linux kernel 2.6.32 or later
  • At least 1GB RAM (2GB+ recommended for production)
  • At least 10GB storage (scale based on retention period and metric volume)
  • Internet connectivity for downloading packages
  • Root or sudo access

Installation

Step 1: Download Prometheus

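Start by downloading a stable release of Prometheus. This guide uses v2.50.0; check the project's releases page for the current version and adjust the file names accordingly: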

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.50.0.linux-amd64.tar.gz
cd prometheus-2.50.0.linux-amd64

Step 2: Create System User and Directories

Create a dedicated user for Prometheus and set up necessary directories:

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

Step 3: Copy Binaries and Files

Copy the Prometheus binaries and supporting files to system locations:

sudo cp prometheus promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp prometheus.yml /etc/prometheus/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus

Step 4: Verify Installation

Verify that Prometheus is properly installed:

prometheus --version
promtool --version

Configuration

Basic Configuration

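The main configuration file is located at /etc/prometheus/prometheus.yml. Here's a minimal configuration that scrapes Prometheus itself, loads rule files from /etc/prometheus/rules/, and sends alerts to an Alertmanager on localhost:9093: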

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus-prod'
    environment: 'production'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

rule_files:
  - '/etc/prometheus/rules/*.yml'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Advanced Configuration Options

For production environments, consider these additional settings:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    cluster: 'us-east-1'
    environment: 'production'

remote_write:
  - url: 'http://localhost:9009/api/v1/push'
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms

remote_read:
  - url: 'http://localhost:9009/api/v1/read'
    read_recent: true

Scrape Configuration

Scrape Targets Setup

Define static targets for each job, attaching labels where they help you group or filter instances:

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.10:9100', '192.168.1.11:9100']
        labels:
          datacenter: 'us-east-1'
          rack: '1a'
      - targets: ['192.168.1.12:9100']
        labels:
          datacenter: 'us-west-1'

  - job_name: 'mysql-servers'
    static_configs:
      - targets:
          - '192.168.1.20:9104'
          - '192.168.1.21:9104'
        labels:
          environment: 'production'

  - job_name: 'postgres-servers'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9187']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'pg_stat_.*'
        action: drop

Service Discovery Methods

For dynamic environments, use service discovery:

scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        datacenter: 'us-east-1'
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service

  - job_name: 'docker-containers'
    docker_sd_configs:
      - host: 'unix:///var/run/docker.sock'
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        target_label: container

Relabeling Configuration

Use relabeling to add, drop, or modify labels:

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: 'app-(web|api)'
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
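
The keep and replace rules above can be easier to reason about when simulated outside Prometheus. The following is a simplified sketch, not Prometheus' implementation: the helper names relabel_keep and relabel_replace are hypothetical, and the model only covers the behavior used here (source label values joined with ";", a regex that must match the whole joined string, and $1-style group references in the replacement).

```python
import re


def _joined(labels, source_labels, separator=";"):
    """Join the source label values the way a relabel rule sees them."""
    return separator.join(labels.get(name, "") for name in source_labels)


def relabel_keep(labels, source_labels, regex):
    """Return True if a 'keep' rule would retain this target."""
    # The regex is anchored at both ends, hence fullmatch.
    return re.fullmatch(regex, _joined(labels, source_labels)) is not None


def relabel_replace(labels, source_labels, regex, replacement, target_label):
    """Apply one 'replace' rule and return a new label dict."""
    match = re.fullmatch(regex, _joined(labels, source_labels))
    if match is None:
        return dict(labels)  # no match: the rule leaves the labels untouched
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... group references.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Example pod labels (illustrative values, not taken from a real cluster).
pod = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_pod_label_app": "app-web",
    "__meta_kubernetes_pod_annotation_prometheus_port": "9102",
}
rewritten = relabel_replace(
    pod,
    ["__address__", "__meta_kubernetes_pod_annotation_prometheus_port"],
    r"([^:]+)(?::\d+)?;(\d+)",
    "$1:$2",
    "__address__",
)
```

In this sketch the pod is kept because app-web matches app-(web|api) in full, while a value such as app-web-canary would be dropped because of the anchoring. The address rule swaps the discovered port for the annotated one, and a pod without the annotation keeps its original address because the regex does not match.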

Service Management

Create Systemd Service

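Create a systemd service file that runs Prometheus as the prometheus user. Note that --web.enable-lifecycle exposes the /-/reload and /-/quit HTTP endpoints to anyone who can reach port 9090, so keep that flag only if access to the port is restricted: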

sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle

Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

Enable and Start Service

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
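
Because the unit starts Prometheus with --web.enable-lifecycle, a configuration change can be applied without restarting the service by sending an HTTP POST to /-/reload. The following is a minimal sketch of that call; the function names and the default base URL http://localhost:9090 are assumptions for this guide's setup.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request that asks Prometheus to reload its configuration."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090", timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError if Prometheus rejects the reload,
    for example because the new configuration is invalid.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Running promtool check config before calling reload_config() is a sensible precaution, since a rejected reload leaves the previous configuration in effect.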

View Service Logs

sudo journalctl -u prometheus -f
sudo journalctl -u prometheus --since "1 hour ago"

Data Retention

Configure Retention Policy

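Retention is controlled by command-line flags rather than prometheus.yml, so set it in the systemd unit. Open the full unit file for editing:

sudo systemctl edit --full prometheus

Append the following flags to the ExecStart command, adding a trailing backslash to the line that currently ends it:

  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=50GB

Prometheus deletes the oldest blocks as soon as either limit is exceeded. Restart the service to apply the change:

sudo systemctl restart prometheus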
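
When choosing retention limits, a rough disk estimate helps. The sketch below applies the rule of thumb from the Prometheus storage documentation (retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample, where a sample typically takes 1 to 2 bytes after compression). The function names and the example workload of 100,000 active series scraped every 15 seconds are assumptions for illustration only.

```python
def ingestion_rate(active_series, scrape_interval_seconds):
    """Samples ingested per second for a given number of active series."""
    return active_series / scrape_interval_seconds


def estimate_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds x ingestion rate x bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed workload: 100,000 active series, 15s scrape interval, 30d retention.
rate = ingestion_rate(active_series=100_000, scrape_interval_seconds=15)
estimate = estimate_disk_bytes(retention_days=30, samples_per_second=rate)
print(f"{estimate / 1e9:.1f} GB")
```

Under these assumptions the estimate comes to roughly 34.6 billion bytes, which would fit under a 50GB size limit with some headroom. Actual usage depends on label cardinality and churn, so treat the result as a starting point and compare it with the measured disk usage.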

Monitor Storage Usage

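sudo du -sh /var/lib/prometheus/
df -h /var/lib/prometheus/

# List WAL segments (recent samples not yet compacted into blocks)
sudo ls -la /var/lib/prometheus/wal/

# List persisted blocks (one ULID-named directory per block)
sudo ls -la /var/lib/prometheus/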

Cleanup and Maintenance

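Prometheus deletes expired blocks automatically according to the retention flags, and it attempts to repair a damaged WAL on startup, so routine manual cleanup is not needed. The following commands inspect the configuration and the database without modifying them:

# Validate the configuration
promtool check config /etc/prometheus/prometheus.yml

# List blocks with their time ranges and sizes
sudo -u prometheus promtool tsdb list -r /var/lib/prometheus/

# Report label cardinality and churn for the most recent block
sudo -u prometheus promtool tsdb analyze /var/lib/prometheus/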

PromQL Basics

Simple Queries

Retrieve current metric values:

# Get cumulative CPU time in seconds, per core and mode (a counter)
node_cpu_seconds_total

# Get memory available
node_memory_MemAvailable_bytes

# Get specific instance
node_memory_MemAvailable_bytes{instance="192.168.1.10:9100"}

Range Vectors

Query metrics over time ranges:

# Raw CPU counter samples from the last 5 minutes
node_cpu_seconds_total[5m]

# Last hour of memory usage
node_memory_MemAvailable_bytes[1h]

# Last 7 days
up[7d]

Aggregation and Functions

Perform calculations on metrics:

# Average idle CPU fraction per instance
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Sum of requests per second
sum(rate(http_requests_total[5m]))

# Top 5 instances by memory in use
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)

# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100
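
Counters such as http_requests_total only ever increase, which is why they are wrapped in rate() before aggregation. The following sketch shows the core idea behind rate() on a single series: sum the increases between consecutive samples, treat a drop in value as a counter reset, and divide by the elapsed time. It is a simplification, since PromQL's rate() also extrapolates to the edges of the window; the function name simple_rate and the sample data are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a list of (timestamp, value) samples.

    Simplified: handles counter resets but skips the extrapolation to the
    window boundaries that PromQL's rate() also performs.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (for example a process restart): count from zero again.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# Illustrative samples scraped every 15 seconds, with a restart before t=45.
window = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
per_second = simple_rate(window)
```

In this example the counter grows by 30 in each interval, including the one after the restart, so the sketch reports 2 requests per second even though the raw value dropped from 160 to 30.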

Advanced PromQL Queries

Complex queries for real-world monitoring:

# CPU usage percentage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

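# Request latency p95
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Service error rate (percentage of requests answered with a 5xx status)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# Memory pressure (instances with less than 10% of memory available)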
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1
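
The p95 query relies on histogram_quantile(), which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the requested rank. The sketch below is a simplified model of that calculation for classic histograms with non-negative bounds; the function name estimate_quantile and the bucket values are assumptions for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified model: buckets are cumulative, bounds are non-negative, and
    the last bucket has an upper bound of +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for index, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0]
            in_bucket = count - count_below
            if in_bucket == 0:
                return lower_bound
            # Linear interpolation between the bucket's lower and upper bounds.
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, count
    return math.nan  # not reached for well-formed cumulative buckets


# Illustrative latency buckets in seconds: (le, cumulative count).
latency = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
median = estimate_quantile(0.5, latency)
p95 = estimate_quantile(0.95, latency)
```

With these buckets the median is interpolated inside the 0.1 to 0.5 bucket, while the p95 falls into the +Inf bucket and is reported as 1.0, the highest finite bound. This is why bucket boundaries should bracket the latencies you care about: a quantile can never be estimated more precisely than the bucket layout allows.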

Security Considerations

Network Security

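Configure firewall rules so that only trusted networks can reach port 9090. These rules assume that ufw's default policy denies incoming traffic:

# Allow only specific networks
sudo ufw allow from 192.168.1.0/24 to any port 9090 proto tcp
sudo ufw allow from 10.0.0.0/8 to any port 9090 proto tcp

# Or allow local access only
sudo ufw allow from 127.0.0.1 to any port 9090 proto tcp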

Authentication and Reverse Proxy

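Prometheus can sit behind a reverse proxy that handles TLS and authentication. When you do this, change --web.listen-address to 127.0.0.1:9090 so that clients cannot bypass the proxy:

# Install Nginx and the htpasswd utility
sudo apt-get update
sudo apt-get install -y nginx apache2-utils

# Create basic auth file (prompts for a password)
sudo htpasswd -c /etc/nginx/.htpasswd prometheus_user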

Configure Nginx for Prometheus:

upstream prometheus {
    server 127.0.0.1:9090;
}

server {
    listen 443 ssl http2;
    server_name prometheus.example.com;

    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    auth_basic "Prometheus";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://prometheus;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
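
Once basic authentication is enforced by the proxy, scripts that call the Prometheus HTTP API must send credentials with each request. The following is a minimal sketch of building such a request for the /api/v1/query endpoint; the function name build_query_request and the password shown are placeholders, and the host name is the example one from the Nginx configuration.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(base_url, promql, username, password):
    """Build a GET request for /api/v1/query with an HTTP Basic Auth header."""
    query_string = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + query_string
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials; read real ones from a protected file or environment variable.
request = build_query_request(
    "https://prometheus.example.com", "up", "prometheus_user", "change-me"
)
```

Passing the request to urllib.request.urlopen() would perform the query over TLS. Basic authentication only encodes the credentials, so it should be used exclusively over HTTPS, as in the server block above.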

File Permissions

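Restrict the configuration and data directories to the prometheus user and group. The capital X grants execute permission on directories only, so configuration files are not made executable:

sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chmod -R u=rwX,g=rX,o= /etc/prometheus /var/lib/prometheus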

Monitoring Prometheus

Self-Monitoring

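Enable Prometheus to monitor itself by scraping its own /metrics endpoint. The relabel rule shown here is optional, because the instance label already defaults to the target address: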

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

Key Metrics to Monitor

# Prometheus health
up{job="prometheus"}

# Scrape duration
scrape_duration_seconds{job="prometheus"}

# WAL size
prometheus_tsdb_wal_storage_size_bytes

# Memory usage
process_resident_memory_bytes{job="prometheus"}

# Goroutine count
go_goroutines{job="prometheus"}

Troubleshooting

Configuration Validation

Validate the configuration before applying changes:

promtool check config /etc/prometheus/prometheus.yml
promtool check config --lint-fatal /etc/prometheus/prometheus.yml

Verify Rules

Check alerting rules syntax:

promtool check rules /etc/prometheus/rules/*.yml

Performance Issues

Check performance metrics:

# Check scrape duration per target
promtool query instant http://localhost:9090 'scrape_duration_seconds'

# View active targets
curl -s http://localhost:9090/api/v1/targets | jq .

# Show targets whose last scrape failed
curl -s 'http://localhost:9090/api/v1/targets?state=active' | jq '.data.activeTargets[] | select(.health != "up")'
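
The targets endpoint returns a JSON document whose data.activeTargets list carries a health field and the last scrape error for each target. For scripted checks, the following sketch extracts the targets that are not healthy from such a response. The function name unhealthy_targets is hypothetical, and the sample payload is a hand-written illustration that includes only the fields the sketch reads.

```python
import json


def unhealthy_targets(response_text):
    """Return (scrape URL, last error) for every active target that is not up."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("unexpected API status: %r" % payload.get("status"))
    active = payload["data"]["activeTargets"]
    return [
        (target["scrapeUrl"], target.get("lastError", ""))
        for target in active
        if target.get("health") != "up"
    ]


# Hand-written sample response (illustrative, trimmed to the fields used above).
sample = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up", "lastError": ""},
            {"scrapeUrl": "http://192.168.1.10:9100/metrics", "health": "down",
             "lastError": "connection refused"},
        ],
        "droppedTargets": [],
    },
})
failing = unhealthy_targets(sample)
```

Feeding the function the body returned by the targets endpoint gives a short list that can be printed or turned into an exit code for a health check script.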

Storage Issues

Diagnose storage problems:

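# List blocks with human-readable time ranges
sudo -u prometheus promtool tsdb list --human-readable /var/lib/prometheus/

# Analyze label cardinality and churn in the most recent block
sudo -u prometheus promtool tsdb analyze /var/lib/prometheus/

# Count WAL corruptions detected since startup
promtool query instant http://localhost:9090 'prometheus_tsdb_wal_corruptions_total'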

Debug Logging

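To enable debug logging, open the full unit file for editing:

sudo systemctl edit --full prometheus

Append this flag to the ExecStart command, adding a trailing backslash to the line that currently ends it:

  --log.level=debug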

Then restart:

sudo systemctl restart prometheus

Conclusion

Prometheus provides a robust foundation for monitoring infrastructure and applications. Installed under a dedicated user, configured with validated scrape and rule files, given explicit retention limits, and placed behind a firewall or an authenticating proxy, it forms a reliable monitoring backbone. Back up configuration files regularly, monitor Prometheus itself, and keep up with new releases. Start with basic monitoring, add exporters gradually, and use PromQL to gain deeper insight into your systems.