Promtail and Loki for Log Aggregation
Grafana Loki is a horizontally scalable log aggregation system inspired by Prometheus, using label-based indexing instead of full-text indexing to keep costs low while providing fast log queries through LogQL. Paired with Promtail for log scraping and Grafana for visualization, the PLG stack (Promtail-Loki-Grafana) delivers Prometheus-style observability for your logs.
Prerequisites
- Ubuntu/Debian or CentOS/Rocky Linux server
- Grafana 9.x+ (for visualization)
- Object storage like S3 or MinIO (for production Loki storage)
- Ports: 3100 (Loki HTTP), 9080 (Promtail HTTP)
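Before installing, it can be worth confirming that those ports are not already taken by another service. A quick sketch, assuming `ss` from iproute2 is available:

```shell
# Check that Loki (3100, 9095 gRPC) and Promtail (9080) ports are free
for p in 3100 9080 9095; do
  if ss -tln 2>/dev/null | grep -q ":$p "; then
    echo "port $p is already in use"
  else
    echo "port $p is free"
  fi
done
```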
Installing Loki
# Download and install Loki binary
LOKI_VERSION=3.0.0
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
# Create Loki user and directories
sudo useradd -r -s /bin/false loki
sudo mkdir -p /etc/loki /var/lib/loki/{chunks,rules,wal,compactor}
sudo chown -R loki:loki /etc/loki /var/lib/loki
# Basic Loki configuration (single-node, local filesystem storage)
sudo tee /etc/loki/loki-config.yaml > /dev/null <<'EOF'
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h # 7 days
  ingestion_rate_mb: 64
  ingestion_burst_size_mb: 128
  max_query_series: 5000
  max_query_lookback: 0
  retention_period: 744h # 31 days

# table_manager was removed in Loki 3.x; retention is handled by the compactor
compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
EOF
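Recent Loki releases can validate a configuration file without starting the server, via the `-verify-config` flag. A guarded sketch:

```shell
# Validate the config before starting the service (skips if loki is not on PATH)
if command -v loki >/dev/null 2>&1; then
  loki -config.file=/etc/loki/loki-config.yaml -verify-config && CHECK=ok || CHECK=failed
else
  CHECK=skipped
  echo "loki binary not found; skipping config check"
fi
```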
# Create systemd service
sudo tee /etc/systemd/system/loki.service > /dev/null <<'EOF'
[Unit]
Description=Grafana Loki
Wants=network-online.target
After=network-online.target
[Service]
User=loki
Group=loki
ExecStart=/usr/local/bin/loki -config.file=/etc/loki/loki-config.yaml
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now loki
sudo systemctl status loki
# Verify Loki is running
curl -s http://localhost:3100/ready
curl -s http://localhost:3100/metrics | grep loki_build
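Beyond the readiness check, you can confirm end-to-end ingestion by pushing a test entry to the push API. A sketch assuming Loki listens on localhost:3100; note that Loki expects timestamps in nanoseconds:

```shell
# Loki's push API takes JSON streams with [timestamp_ns, line] value pairs
TS_NS=$(date +%s%N)  # nanosecond timestamp (GNU date)
PAYLOAD='{"streams":[{"stream":{"job":"smoke-test"},"values":[["'"$TS_NS"'","hello loki"]]}]}'
echo "$PAYLOAD"
curl -s -H "Content-Type: application/json" \
  -X POST http://localhost:3100/loki/api/v1/push \
  -d "$PAYLOAD" || echo "Loki not reachable"
```

If the push succeeds, the entry is queryable as `{job="smoke-test"}`.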
Configuring Promtail
Promtail scrapes log files and sends them to Loki:
# Download and install Promtail
LOKI_VERSION=3.0.0
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
# Create Promtail configuration (directories first, so the write below succeeds)
sudo mkdir -p /etc/promtail /var/lib/promtail
sudo tee /etc/promtail/promtail-config.yaml > /dev/null <<EOF
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml # Tracks read positions

clients:
  - url: http://loki-server:3100/loki/api/v1/push

scrape_configs:
  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: ${HOSTNAME} # Expanded by the shell when this file is written
          __path__: /var/log/syslog

  # Auth logs
  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: ${HOSTNAME}
          __path__: /var/log/auth.log

  # Nginx access logs
  - job_name: nginx_access
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: access
          host: ${HOSTNAME}
          __path__: /var/log/nginx/access.log

  # Nginx error logs
  - job_name: nginx_error
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: error
          host: ${HOSTNAME}
          __path__: /var/log/nginx/error.log

  # Application logs (multiple files with glob)
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          environment: production
          host: ${HOSTNAME}
          __path__: /var/log/app/*.log
EOF
sudo mkdir -p /var/lib/promtail /etc/promtail
sudo useradd -r -s /bin/false promtail
# Add promtail user to adm group to read system logs
sudo usermod -a -G adm promtail
sudo chown -R promtail:promtail /var/lib/promtail /etc/promtail
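Permission problems are the most common reason Promtail silently ships nothing, so it can help to verify the promtail user can actually read each target file. A sketch, assuming Debian/Ubuntu log paths:

```shell
# sudo -n avoids hanging on a password prompt; adm group grants read on Debian/Ubuntu logs
for f in /var/log/syslog /var/log/auth.log; do
  if sudo -n -u promtail test -r "$f" 2>/dev/null; then
    echo "readable: $f"
  else
    echo "NOT readable (or sudo/user unavailable): $f"
  fi
done
```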
sudo tee /etc/systemd/system/promtail.service > /dev/null <<'EOF'
[Unit]
Description=Promtail log shipper
Wants=network-online.target
After=network-online.target
[Service]
User=promtail
Group=promtail
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yaml
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now promtail
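Once the service is up, Promtail exposes its own readiness and metrics endpoints on port 9080. A quick guarded check:

```shell
# Readiness endpoint answers once Promtail is up
STATUS=$(curl -s http://localhost:9080/ready || echo "not reachable")
echo "promtail: $STATUS"
# Count Promtail's own metrics as a liveness signal
curl -s http://localhost:9080/metrics | grep -c '^promtail_' || true
```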
Label Extraction and Pipeline Stages
Promtail pipeline stages parse and enrich log entries:
# Enhanced Promtail config with pipeline stages
scrape_configs:
  - job_name: nginx_parsed
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: server-01
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      # Parse nginx combined log format
      - regex:
          expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<request_uri>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)'
      # Promote parsed fields to labels
      - labels:
          status:
          method:
      # Emit a response-time histogram (request_time requires a custom nginx log format
      # that includes $request_time; the combined format above does not)
      - metrics:
          nginx_response_time_seconds:
            type: Histogram
            description: "Nginx response time"
            source: request_time
            config:
              buckets: [0.001, 0.01, 0.1, 0.5, 1.0, 5.0]
      # Use the log's own timestamp instead of scrape time
      - timestamp:
          source: time_local
          format: "02/Jan/2006:15:04:05 -0700"
      # Drop noisy health check requests
      - drop:
          expression: '.*healthz.*'

  # JSON log pipeline
  - job_name: json_app
    static_configs:
      - targets:
          - localhost
        labels:
          job: json-app
          __path__: /var/log/app/app.json.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            trace_id: trace_id
            service: service
      - labels:
          level:
          service:
      # Tag error lines with a static severity label
      - match:
          selector: '{job="json-app"} |= "error"'
          stages:
            - static_labels:
                severity: error
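Before wiring a regex into the pipeline, it is worth sanity-checking its structure against a real log line. The sketch below uses a plain ERE with `grep -E` (named groups dropped, since Promtail's Go/RE2 named-group syntax is not needed for a match test); the sample line is hypothetical:

```shell
# A hypothetical nginx combined-format line
LINE='192.168.1.10 - alice [15/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234'
# Same structure as the pipeline regex, with named capture groups removed for grep -E
REGEX='^[A-Za-z0-9_.]+ - [^ ]+ \[[^]]+\] "[^ ]+ [^ ]+ [^ ]+" [0-9]+ [0-9]+'
if echo "$LINE" | grep -Eq "$REGEX"; then MATCH=yes; else MATCH=no; fi
echo "match: $MATCH"
```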
Loki Storage Backends
# Production: S3 storage backend
common:
  storage:
    s3:
      endpoint: s3.amazonaws.com
      bucketnames: your-loki-bucket
      region: us-east-1
      access_key_id: YOUR_ACCESS_KEY
      secret_access_key: YOUR_SECRET_KEY
      s3forcepathstyle: false

# Production: MinIO storage (self-hosted S3-compatible)
common:
  storage:
    s3:
      endpoint: minio.internal:9000
      bucketnames: loki-data
      region: us-east-1 # Required but ignored by MinIO
      access_key_id: minio-access-key
      secret_access_key: minio-secret-key
      s3forcepathstyle: true
      insecure: false

# Configure chunk caching with Redis
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      db: 0
      expiration: 1h

# Retention policy configuration
limits_config:
  retention_period: 744h # 31 days globally; per-tenant overrides require auth_enabled: true

ruler:
  storage:
    type: s3
    s3:
      bucketnames: loki-rules
      # ... same s3 config as above

compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: s3
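Hardcoding S3 credentials in the config file is best avoided; Loki can expand environment variables in the config when started with `-config.expand-env=true`. A sketch (the variable names are illustrative):

```shell
# Export credentials in the service environment (e.g. a systemd EnvironmentFile)
export LOKI_S3_ACCESS_KEY=minio-access-key
export LOKI_S3_SECRET_KEY=minio-secret-key
# Then reference them in loki-config.yaml and start Loki with -config.expand-env=true:
#   access_key_id: ${LOKI_S3_ACCESS_KEY}
#   secret_access_key: ${LOKI_S3_SECRET_KEY}
echo "access_key_id: ${LOKI_S3_ACCESS_KEY}"
```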
LogQL Queries
LogQL is Loki's query language:
# Basic log stream selection
{job="nginx"}
# Filter by label and content
{job="nginx", status="500"} |= "upstream"
# Show error logs (LogQL has no limit operator; the result count is capped
# by the query's limit parameter in Grafana or the HTTP API)
{job="nginx"} |= "error"
# Parse nginx access log and filter slow requests
{job="nginx"}
| regexp `(?P<method>\w+) (?P<path>\S+).*" (?P<status>\d+) .* (?P<response_time>[\d.]+)$`
| response_time > 2.0
# Per-second error rate over a 5-minute window, by host (metric query)
sum(rate({job="nginx"} |= "error" [5m])) by (host)
# Count HTTP status codes
sum by (status) (rate({job="nginx"} [5m]))
# Error rate percentage
sum(rate({job="nginx", status=~"5.."}[5m]))
/ sum(rate({job="nginx"}[5m])) * 100
# Find failed SSH logins
{job="auth"} |= "Failed password"
| regexp `Failed password for (?P<user>\S+) from (?P<ip>[\d.]+)`
| line_format "User: {{.user}} from IP: {{.ip}}"
# JSON log parsing
{job="json-app"}
| json
| level="error"
| line_format "{{.timestamp}} [{{.level}}] {{.message}} trace={{.trace_id}}"
# Top 10 slowest endpoints (average request duration over 5m)
topk(10,
  avg by (path) (
    avg_over_time({job="nginx"}
      | regexp `(?P<path>/[^\s?]+).*(?P<duration>[\d.]+)$`
      | unwrap duration [5m])
  )
)
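These queries can also be run outside Grafana against Loki's HTTP API; `query_range` accepts `query`, `limit`, and `since` parameters among others. A sketch assuming Loki on localhost:3100:

```shell
# Run a LogQL query over the HTTP API; --data-urlencode handles the LogQL braces
QUERY='{job="nginx"} |= "error"'
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode "query=${QUERY}" \
  --data-urlencode "limit=100" \
  --data-urlencode "since=1h" || echo "Loki not reachable"
```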
Grafana Log Visualization
# Add Loki data source in Grafana
curl -s -X POST \
-H "Content-Type: application/json" \
-u admin:grafana-password \
http://grafana:3000/api/datasources \
-d '{
"name": "Loki",
"type": "loki",
"url": "http://loki:3100",
"access": "proxy",
"basicAuth": false
}'
# Create a log dashboard panel via API or Grafana UI
# In Grafana: + > Dashboard > Add visualization > Select Loki datasource
# Use "Logs" visualization type for raw log viewing
# Use "Time series" or "Bar chart" for aggregated metrics from LogQL metric queries
# Example: Nginx dashboard with Logs panel (| json assumes JSON-formatted access logs)
# Query: {job="nginx"} | json | line_format "{{.remote_addr}} {{.method}} {{.request_uri}} {{.status}}"
# Visualization: Logs
# Labels: status, method
# Correlate logs with Prometheus metrics
# In Grafana, use the "Explore" view to correlate:
# - Prometheus panel showing latency spike
# - Switch to Logs explorer with same time range
# - Filter {job="nginx"} |= "error"
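As an alternative to the API call above, the data source can be provisioned from a file that Grafana loads at startup. A sketch of the provisioning format; copy the file into /etc/grafana/provisioning/datasources/ on the Grafana host and restart Grafana:

```shell
# Write a Grafana data source provisioning file for Loki
cat > loki-datasource.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    access: proxy
    isDefault: false
EOF
grep -q 'type: loki' loki-datasource.yaml && echo "provisioning file written"
```

Provisioned data sources survive Grafana reinstalls and are easier to keep in version control than API calls.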
Troubleshooting
Promtail not sending logs:
# Check Promtail is running and reading files
curl -s http://localhost:9080/metrics | grep promtail_files_active_total
# Check positions file to see where Promtail is reading
cat /var/lib/promtail/positions.yaml
# View Promtail logs
sudo journalctl -u promtail -n 50
# Test the Promtail configuration: dry-run reads targets and prints entries instead of shipping
promtail -config.file=/etc/promtail/promtail-config.yaml --dry-run
Loki returning "too many outstanding requests":
# Increase query concurrency limits in loki-config.yaml
query_scheduler:
  max_outstanding_requests_per_tenant: 2048
# Or reduce query time range in LogQL
# Instead of last 7 days, use last 1 hour
High memory usage:
# Reduce ingestion rate limits
limits_config:
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64
# Enable chunk compression (set under the ingester block)
ingester:
  chunk_encoding: snappy
Labels not being extracted:
# Test pipeline stages locally with promtail in dry-run mode
echo '127.0.0.1 - - [15/Jan/2024:10:00:00 +0000] "GET /api/health HTTP/1.1" 200 42' | \
promtail -config.file=/etc/promtail/promtail-config.yaml --stdin --dry-run
# Check Promtail targets and labels
curl -s http://localhost:9080/targets | jq
Conclusion
The Promtail + Loki + Grafana stack provides a cost-effective, horizontally scalable log aggregation solution that integrates naturally with Prometheus-based monitoring. Loki's label-based indexing approach keeps storage costs dramatically lower than full-text search solutions, while LogQL provides powerful querying capabilities. Start with filesystem storage for development and migrate to S3-compatible object storage as your log volume grows, using Loki's retention policies to automatically manage storage costs.