Prometheus Installation and Configuration
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects metrics from configured targets at regular intervals, evaluates alerting rules, and sends the resulting alerts to Alertmanager for routing and notification. This guide covers installing, configuring, and securing Prometheus on a Linux host.
Table of Contents
- Introduction
- System Requirements
- Installation
- Configuration
- Scrape Configuration
- Service Management
- Data Retention
- PromQL Basics
- Security Considerations
- Monitoring Prometheus
- Troubleshooting
- Conclusion
Introduction
Prometheus works by pulling (scraping) metrics over HTTP from instrumented applications and exporters. Compared with push-based monitoring, the pull model keeps the target list and scrape frequency under the server's control, and it makes an unreachable target easy to spot because every scrape records whether it succeeded in the up metric. Metrics are stored in a local time-series database and queried with PromQL.
System Requirements
Before installing Prometheus, ensure your system meets these requirements:
- Linux kernel 2.6.32 or later
- At least 1GB RAM (2GB+ recommended for production)
- At least 10GB storage (scale based on retention period and metric volume)
- Internet connectivity for downloading packages
- Root or sudo access
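The storage requirement can be estimated with the sizing rule from the Prometheus storage documentation: disk needed is roughly retention time in seconds, times ingested samples per second, times bytes per sample, where a sample typically occupies 1 to 2 bytes. The following is a minimal sketch of that arithmetic; estimate_tsdb_bytes is a hypothetical helper and the input numbers are illustrative, not measurements.

```shell
#!/bin/sh
# estimate_tsdb_bytes is a hypothetical helper, not part of Prometheus.
# Arguments: active series, scrape interval (s), retention (days), bytes per sample.
estimate_tsdb_bytes() {
  series="$1"; interval_s="$2"; retention_days="$3"; bytes_per_sample="$4"
  # Samples per second = series / interval. Multiply before dividing so the
  # integer arithmetic does not lose precision.
  echo $(( series * retention_days * 86400 * bytes_per_sample / interval_s ))
}

# Example: 100,000 series scraped every 15s, kept for 30 days, 2 bytes per sample
estimate_tsdb_bytes 100000 15 30 2   # prints 34560000000 (about 34.6 GB)
```

Treat the result as a starting point and compare it with actual disk usage once Prometheus has been running for a while.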
Installation
Step 1: Download Prometheus
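Start by downloading a stable release of Prometheus. The commands below use v2.50.0; check the project's releases page on GitHub for the current version and substitute it in the URL and file names: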
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
tar -xvzf prometheus-2.50.0.linux-amd64.tar.gz
cd prometheus-2.50.0.linux-amd64
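It is worth checking the downloaded archive against its published checksum before using the extracted binaries. The sketch below assumes GNU sha256sum is installed and that you have taken the expected hash from a trusted source (Prometheus releases on GitHub include a sha256sums.txt file). verify_sha256 is a hypothetical helper name.

```shell
#!/bin/sh
# verify_sha256 is a hypothetical helper: compare a file's SHA-256 with an expected value.
verify_sha256() {
  file="$1"; expected="$2"
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Usage (replace <expected-hash> with the value from the release's checksum file):
# verify_sha256 /tmp/prometheus-2.50.0.linux-amd64.tar.gz <expected-hash>
```

The function returns a non-zero status on mismatch, so it can gate the remaining installation steps in a script.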
Step 2: Create System User and Directories
Create a dedicated user for Prometheus and set up necessary directories:
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
Step 3: Copy Binaries and Files
Copy the Prometheus binaries and supporting files to their system locations:
sudo cp prometheus promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo cp prometheus.yml /etc/prometheus/
sudo cp -r consoles /etc/prometheus/
sudo cp -r console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/consoles /etc/prometheus/console_libraries
Step 4: Verify Installation
Verify that Prometheus is properly installed:
prometheus --version
promtool --version
Configuration
Basic Configuration
The main configuration file is located at /etc/prometheus/prometheus.yml. Here's a minimal production-ready configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus-prod'
    environment: 'production'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'
rule_files:
  - '/etc/prometheus/rules/*.yml'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
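The rule_files entry above loads every .yml file under /etc/prometheus/rules/, but this guide does not create any rule files. As a sketch of what one might contain, the snippet below writes a single alert that fires when a target has been unreachable for five minutes. The group name, alert name, severity label, file name, and the helper write_example_rules are all illustrative assumptions. The target directory is an argument so the file can be reviewed before it is copied into place.

```shell
#!/bin/sh
# write_example_rules is a hypothetical helper that writes one sample rule file.
write_example_rules() {
  dir="$1"
  mkdir -p "$dir"
  # The quoted 'EOF' keeps the shell from expanding $labels in the template.
  cat > "$dir/instance_down.yml" << 'EOF'
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}

# Usage: write_example_rules ./rules && promtool check rules ./rules/instance_down.yml
```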
Advanced Configuration Options
For production environments, consider these additional settings:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
  external_labels:
    cluster: 'us-east-1'
    region: 'production'
remote_write:
  - url: 'http://localhost:9009/api/v1/push'
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
remote_read:
  - url: 'http://localhost:9009/api/v1/read'
    read_recent: true
Scrape Configuration
Scrape Targets Setup
Define static targets for each job, optionally attaching labels to each target group:
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100', '192.168.1.10:9100', '192.168.1.11:9100']
        labels:
          datacenter: 'us-east-1'
          rack: '1a'
      - targets: ['192.168.1.12:9100']
        labels:
          datacenter: 'us-west-1'
  - job_name: 'mysql-servers'
    static_configs:
      - targets:
          - '192.168.1.20:9104'
          - '192.168.1.21:9104'
        labels:
          environment: 'production'
  - job_name: 'postgres-servers'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9187']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'pg_stat_.*'
        action: drop
Service Discovery Methods
For dynamic environments, use service discovery:
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'localhost:8500'
        datacenter: 'us-east-1'
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
  - job_name: 'docker-containers'
    docker_sd_configs:
      - host: 'unix:///var/run/docker.sock'
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        target_label: container
Relabeling Configuration
Use relabeling to add, drop, or modify labels:
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: 'app-(web|api)'
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod_name
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
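To see what the last rule above does, recall that Prometheus joins the source_labels values with a semicolon, matches the fully anchored regex against the joined string, and writes the replacement into target_label. The sketch below imitates that single rule with sed so the rewrite can be tried on sample values. It translates the regex into POSIX extended syntax, which shifts the port from capture group 2 to group 3. rewrite_address is a hypothetical helper, not a Prometheus tool.

```shell
#!/bin/sh
# rewrite_address is a hypothetical helper imitating one relabel rule.
# Arguments: current __address__ value, port taken from the pod annotation.
rewrite_address() {
  joined="$1;$2"   # source label values joined with ';'
  rewritten=$(printf '%s\n' "$joined" |
    sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/')
  if [ "$rewritten" = "$joined" ]; then
    # No match: like Prometheus, leave the target label unchanged.
    printf '%s\n' "$1"
  else
    printf '%s\n' "$rewritten"
  fi
}

rewrite_address 10.0.0.5:8080 9102   # prints 10.0.0.5:9102
rewrite_address 10.0.0.5 9102        # prints 10.0.0.5:9102
rewrite_address 10.0.0.5:8080 ""     # prints 10.0.0.5:8080 (annotation missing)
```

The third call shows why the rule is safe for pods without the annotation: the regex does not match, so the discovered address is kept.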
Service Management
Create Systemd Service
Create a systemd service file for Prometheus:
sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
Enable and Start Service
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
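After the service starts, it can take a few seconds before Prometheus is ready to serve queries. A small retry helper, sketched below, polls until a command succeeds. wait_until is a hypothetical name, and the usage line assumes Prometheus listens on localhost:9090 as configured in the unit file and relies on its /-/ready endpoint.

```shell
#!/bin/sh
# wait_until is a hypothetical helper: retry a command once per second.
# Arguments: maximum attempts, then the command to run.
wait_until() {
  attempts="$1"; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: wait up to 30 seconds for Prometheus to report ready
# wait_until 30 curl -fsS http://localhost:9090/-/ready && echo "Prometheus is ready"
```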
View Service Logs
sudo journalctl -u prometheus -f
sudo journalctl -u prometheus --since "1 hour ago"
Data Retention
Configure Retention Policy
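Set retention time and size limits with flags on the Prometheus command line. Running systemctl edit opens a drop-in override rather than the unit file itself:
sudo systemctl edit prometheus
In the override, add a [Service] section, clear the inherited command with an empty ExecStart= line, and then repeat the full ExecStart command from the unit file with these flags appended:
--storage.tsdb.retention.time=30d \
--storage.tsdb.retention.size=50GB
When both limits are set, whichever is reached first triggers removal of the oldest blocks. Restart Prometheus for the change to take effect.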
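A complete override might look like the file written below. It resets the inherited command with an empty ExecStart= line, then restates the flags from the unit file in this guide with the two retention flags appended. The helper name write_retention_override is hypothetical, and the output directory is an argument; on a real host the drop-in directory would be /etc/systemd/system/prometheus.service.d.

```shell
#!/bin/sh
# write_retention_override is a hypothetical helper that writes a systemd drop-in.
write_retention_override() {
  dir="$1"
  mkdir -p "$dir"
  # The quoted 'EOF' preserves the trailing backslashes exactly as written.
  cat > "$dir/override.conf" << 'EOF'
[Service]
ExecStart=
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=50GB
EOF
}

# Usage (as root), followed by a reload and restart:
# write_retention_override /etc/systemd/system/prometheus.service.d
# systemctl daemon-reload && systemctl restart prometheus
```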
Monitor Storage Usage
du -sh /var/lib/prometheus/
df -h /var/lib/prometheus/
# List the write-ahead log segments
ls -la /var/lib/prometheus/wal/
# List the block directories
ls -la /var/lib/prometheus/
Cleanup and Maintenance
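Prometheus deletes expired blocks automatically according to the retention flags, so manual cleanup is not normally needed. The commands below validate the configuration and inspect the TSDB without modifying any data:
# Validate the configuration
promtool check config /etc/prometheus/prometheus.yml
# List blocks with their time ranges and sizes
promtool tsdb list /var/lib/prometheus/
# Report cardinality and churn statistics for the most recent block
promtool tsdb analyze /var/lib/prometheus/
promtool has no repair subcommand. Prometheus attempts to repair a corrupted write-ahead log itself at startup and reports the outcome in its logs.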
PromQL Basics
Simple Queries
Retrieve current metric values:
# Get cumulative CPU seconds per core and mode (a counter, not a usage percentage)
node_cpu_seconds_total
# Get memory available
node_memory_MemAvailable_bytes
# Get specific instance
node_memory_MemAvailable_bytes{instance="192.168.1.10:9100"}
Range Vectors
Query metrics over time ranges:
# Last 5 minutes of CPU counter samples
node_cpu_seconds_total[5m]
# Last hour of available-memory samples
node_memory_MemAvailable_bytes[1h]
# Last 7 days
up[7d]
Aggregation and Functions
Perform calculations on metrics:
# Average fraction of CPU time spent non-idle, per instance
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
# Sum of requests per second
sum(rate(http_requests_total[5m]))
# Top 5 instances by memory in use
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100
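The rate() function used above turns an ever-increasing counter into a per-second value. As a rough illustration of the idea, the sketch below computes the per-second increase between two samples and treats a drop in value as a counter reset. This is a deliberate simplification: the real rate() uses all samples in the range and extrapolates to the window boundaries. simple_rate and the sample numbers are hypothetical.

```shell
#!/bin/sh
# simple_rate is a hypothetical helper, a two-sample approximation of rate().
# Arguments: first value, first timestamp (s), second value, second timestamp (s).
simple_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" 'BEGIN {
    # If the counter went down, assume it restarted from zero between samples.
    increase = (v2 >= v1) ? v2 - v1 : v2
    printf "%g\n", increase / (t2 - t1)
  }'
}

simple_rate 100 0 400 60   # prints 5
simple_rate 900 0 30 60    # prints 0.5 (counter reset between the samples)
```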
Advanced PromQL Queries
Complex queries for real-world monitoring:
# CPU usage percentage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
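# Request latency p95, aggregated across all series
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Service error rate (percentage of requests answered with a 5xx status)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100
# Memory pressure (instances with less than 10% of memory available)
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.1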
Security Considerations
Network Security
Configure firewall rules to restrict access:
# Allow only specific source networks
sudo ufw allow from 192.168.1.0/24 to any port 9090
sudo ufw allow from 10.0.0.0/8 to any port 9090
# Allow local access only
sudo ufw allow from 127.0.0.1 to any port 9090
Authentication and Reverse Proxy
Use a reverse proxy for authentication:
# Install Nginx and the htpasswd utility (provided by apache2-utils)
sudo apt-get update
sudo apt-get install -y nginx apache2-utils
# Create basic auth file (prompts for a password)
sudo htpasswd -c /etc/nginx/.htpasswd prometheus_user
Configure Nginx for Prometheus:
upstream prometheus {
    server 127.0.0.1:9090;
}
server {
    listen 443 ssl http2;
    server_name prometheus.example.com;
    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;
    auth_basic "Prometheus";
    auth_basic_user_file /etc/nginx/.htpasswd;
    location / {
        proxy_pass http://prometheus;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
File Permissions
Ensure proper file permissions:
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chmod -R 750 /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
sudo chmod -R 750 /var/lib/prometheus
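Note that chmod -R 750 also sets the execute bit on regular files such as prometheus.yml. If you would rather have directories at 750 and files at 640, one option is to set the two separately with find, as in this sketch. harden_tree is a hypothetical helper, and it needs to run as root on the real paths.

```shell
#!/bin/sh
# harden_tree is a hypothetical helper: directories get 750, regular files get 640.
harden_tree() {
  root="$1"
  find "$root" -type d -exec chmod 750 {} +
  find "$root" -type f -exec chmod 640 {} +
}

# Usage (as root): harden_tree /etc/prometheus
```

Directories keep the execute bit because it is what allows the prometheus user to enter them; regular configuration and data files do not need it.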
Monitoring Prometheus
Self-Monitoring
Enable Prometheus to monitor itself:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
Key Metrics to Monitor
# Prometheus health
up{job="prometheus"}
# Scrape duration
scrape_duration_seconds{job="prometheus"}
# WAL size
prometheus_tsdb_wal_storage_size_bytes
# Memory usage
process_resident_memory_bytes{job="prometheus"}
# Goroutine count
go_goroutines{job="prometheus"}
Troubleshooting
Configuration Validation
Before applying configuration changes:
promtool check config /etc/prometheus/prometheus.yml
promtool check config --lint-fatal /etc/prometheus/prometheus.yml
Verify Rules
Check alerting rules syntax:
promtool check rules /etc/prometheus/rules/*.yml
Performance Issues
Check performance metrics:
# Check scrape duration per target
promtool query instant http://localhost:9090 'scrape_duration_seconds'
# View active targets
curl -s http://localhost:9090/api/v1/targets | jq .
# List targets whose last scrape failed
curl -s 'http://localhost:9090/api/v1/targets?state=active' | jq '.data.activeTargets[] | select(.health == "down")'
Storage Issues
Diagnose storage problems:
# List blocks with human-readable times and sizes
promtool tsdb list /var/lib/prometheus/ --human-readable
# Analyze cardinality and churn in the most recent block
promtool tsdb analyze /var/lib/prometheus/
# Show only the first lines of the block list
promtool tsdb list /var/lib/prometheus/ | head -20
Debug Logging
Enable debug logging:
sudo systemctl edit prometheus
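In the drop-in override that opens, add a [Service] section, clear the inherited command with an empty ExecStart= line, and repeat the full ExecStart command from the unit file with this flag appended: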
--log.level=debug
Then restart:
sudo systemctl restart prometheus
Conclusion
Prometheus provides a robust foundation for monitoring infrastructure and applications. By properly installing, configuring, and maintaining Prometheus with attention to security and performance, you create a reliable monitoring backbone. Regular backup of configuration files, monitoring the monitoring system itself, and staying updated with new releases ensure your observability platform remains effective and secure. Start with basic monitoring, gradually add more exporters and complexity, and leverage the powerful PromQL language to gain deep insights into your systems.


