Grafana Loki and Promtail Complete Setup
Building a complete logging stack with Grafana, Loki, and Promtail provides comprehensive log aggregation and visualization. This guide covers deploying the entire stack, configuring complex pipeline stages, integrating with Grafana, and setting up log-based alerting for production environments.
Table of Contents
- Overview
- Architecture
- System Preparation
- Loki Server Setup
- Promtail Configuration
- Advanced Pipeline Stages
- Grafana Integration
- Creating Log Dashboards
- Alerting on Logs
- Performance Optimization
- Troubleshooting
- Conclusion
Overview
A complete logging stack captures, processes, stores, and visualizes logs from all infrastructure components. Loki's label-based architecture reduces storage costs while Promtail's flexible configuration handles diverse log sources. Grafana provides unified visualization across metrics and logs.
Architecture
Stack Components
┌─────────────────────────────────────────────┐
│      Application Logs / Syslog / Files      │
└───────────────────┬─────────────────────────┘
                    │
          ┌─────────▼─────────┐
          │     Promtail      │
          │   - Collection    │
          │   - Parsing       │
          │   - Labeling      │
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │    Loki Server    │
          │   - Ingestion     │
          │   - Indexing      │
          │   - Storage       │
          └─────────┬─────────┘
                    │
       ┌────────────┼────────────┐
       │            │            │
    BoltDB        S3/GCS     Cassandra
                    │
          ┌─────────▼─────────┐
          │      Grafana      │
          │  - Visualization  │
          │  - Alerting       │
          └───────────────────┘
System Preparation
Prerequisites
# System updates
sudo apt-get update && sudo apt-get upgrade -y
# Install dependencies
sudo apt-get install -y \
curl wget unzip \
git gcc make \
openssl ca-certificates
# Create logging user
sudo useradd --no-create-home --shell /bin/false logging
Directory Structure
# Create directories
sudo mkdir -p /opt/logging/{loki,promtail,grafana}
sudo mkdir -p /var/lib/loki/{chunks,index,cache}
sudo mkdir -p /var/log/loki
sudo mkdir -p /etc/loki /etc/promtail
# Set permissions
sudo chown -R logging:logging /opt/logging
sudo chown -R logging:logging /var/lib/loki
sudo chown -R logging:logging /var/log/loki
Loki Server Setup
Installation
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
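Before writing any configuration, it's worth a quick sanity check that the binary runs at all:

```shell
# Print version/build information; a failure here usually means a
# wrong-architecture download or a missing execute permission
loki -version
```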
Production Configuration
sudo tee /etc/loki/loki-config.yml > /dev/null << 'EOF'
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  log_format: json

ingester:
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  max_chunk_age: 2h
  chunk_encoding: snappy
  chunk_target_size: 1572864
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
      heartbeat_timeout: 5m
    num_tokens: 128

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_line_size: 2097152
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 200
  max_entries_limit_per_query: 10000
  max_global_streams_per_user: 10000
  retention_period: 720h
  cardinality_limit: 100000

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    shared_store: filesystem
    cache_location: /var/lib/loki/cache
  filesystem:
    directory: /var/lib/loki/chunks

chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 1h

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
  poll_interval: 10m
  creation_grace_period: 10m

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      default_validity: 1h

tracing:
  enabled: false
EOF
sudo chown logging:logging /etc/loki/loki-config.yml
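Loki can validate a configuration file without starting the server, which catches YAML and schema mistakes before they take down the service:

```shell
# Parse and validate the config, then exit without starting Loki
loki -config.file=/etc/loki/loki-config.yml -verify-config
```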
Systemd Service
sudo tee /etc/systemd/system/loki.service > /dev/null << 'EOF'
[Unit]
Description=Grafana Loki
Documentation=https://grafana.com/loki
After=network.target
[Service]
User=logging
Group=logging
Type=simple
ExecStart=/usr/local/bin/loki -config.file=/etc/loki/loki-config.yml
Restart=on-failure
RestartSec=5
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=loki
# Resource limits
LimitNOFILE=65536
LimitNPROC=65536
# Security
ProtectSystem=full
ProtectHome=yes
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable loki
sudo systemctl start loki
Promtail Configuration
Base Configuration
sudo tee /etc/promtail/promtail-config.yml > /dev/null << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/loki/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          # Requires -config.expand-env=true on the Promtail command line
          host: ${HOSTNAME}
          __path__: /var/log/{syslog,messages}

  - job_name: kernel
    static_configs:
      - targets:
          - localhost
        labels:
          job: kernel
          __path__: /var/log/kern.log

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          __path__: /var/log/auth.log

  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - json:
          expressions:
            output: log
            stream: stream
            attrs_status: attrs.status
      - output:
          source: output

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
      - regex:
          expression: '^(?P<remote>[\w\.]+) (?P<host>[\w\.]+) (?P<user>[\w\-\.]+) \[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] "(?P<method>\w+) (?P<path>[^\s]+) (?P<protocol>[\w/\.]+)" (?P<status>\d+|-) (?P<bytes>\d+|-)\s?"?(?P<referer>[^\s]*)"?\s?"?(?P<agent>[^"]*)"?'
      - timestamp:
          source: timestamp
          format: '02/Jan/2006:15:04:05 -0700'
      # Keep labels low-cardinality: status and method are bounded sets,
      # but raw request paths are not, so path stays out of the label set
      - labels:
          status:
          method:
      - metrics:
          http_requests_total:
            type: Counter
            description: "Total HTTP requests"
            prefix: nginx_
            max_idle_duration: 30s
            config:
              match_all: true
              action: inc
          # Only populated if the nginx log_format includes $request_time
          # and an earlier stage extracts it as response_time
          http_request_duration_seconds:
            type: Histogram
            description: "Request duration"
            prefix: nginx_
            source: response_time
            max_idle_duration: 30s
            config:
              buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          env: production
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            level: level
            msg: message
            trace_id: trace_id
            user_id: user_id
            request_path: request.path
            status_code: request.status_code
            response_time: response_time_ms
      - timestamp:
          source: timestamp
          format: '2006-01-02T15:04:05Z07:00'
      # trace_id and user_id are effectively unbounded, so they are
      # extracted but deliberately not promoted to labels
      - labels:
          level:
      - drop:
          expression: '.*health_check.*'
      - metrics:
          app_errors_total:
            type: Counter
            description: "Total errors"
            source: level
            config:
              value: error
              action: inc
          app_request_duration_milliseconds:
            type: Histogram
            description: "Request duration in milliseconds"
            source: response_time
            config:
              buckets: [10, 50, 100, 250, 500, 1000, 2500, 5000]
EOF
sudo chown logging:logging /etc/promtail/promtail-config.yml
Promtail Service
# Download Promtail
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail
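Before wiring up the service, verify the binary and the configuration. Promtail's dry-run mode prints what would be shipped instead of pushing to Loki, and since Promtail will run as the unprivileged logging user, it also needs read access to the system logs (on Debian/Ubuntu most files under /var/log are readable by group adm):

```shell
# Confirm the binary works
promtail -version

# Parse the config and echo scraped entries to stdout instead of
# pushing them to Loki (Ctrl-C to stop)
sudo promtail -config.file=/etc/promtail/promtail-config.yml -dry-run

# Grant the service user read access to group-adm-owned logs
sudo usermod -aG adm logging
```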
# Create systemd service
sudo tee /etc/systemd/system/promtail.service > /dev/null << 'EOF'
[Unit]
Description=Grafana Promtail
Documentation=https://grafana.com/loki
After=network.target loki.service
[Service]
User=logging
Group=logging
Type=simple
# Expose the machine hostname so env expansion in the config can use it
Environment=HOSTNAME=%H
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yml -config.expand-env=true
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=promtail
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
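With both services running, a useful end-to-end smoke test is to write a uniquely tagged line into a scraped file and query it back. This assumes the system job above is tailing /var/log/syslog:

```shell
# Emit a unique marker line through syslog
marker="loki-smoke-$(date +%s)"
logger "$marker"

# Give Promtail a moment to ship the entry, then query it back from Loki
sleep 5
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode "query={job=\"syslog\"} |= \"$marker\"" | jq '.data.result'
```

An empty result usually points at a positions-file permission problem or a label mismatch between the scrape config and the query.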
Advanced Pipeline Stages
Multiline Log Parsing
scrape_configs:
  - job_name: java-app
    static_configs:
      - targets:
          - localhost
        labels:
          job: java
          __path__: /var/log/java-app/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}'
          max_wait_time: 3s
      - regex:
          expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) \[(?P<thread>[^\]]+)\] (?P<logger>[^\s]+)\s-\s(?P<message>.*)$'
      - timestamp:
          source: timestamp
          format: '2006-01-02 15:04:05'
      - labels:
          level:
          thread:
          logger:
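Multiline pipelines are easiest to debug by feeding sample lines through Promtail's stdin mode; a stack trace should come out attached to the log line that precedes it as a single entry. This sketch assumes the snippet above is embedded in a complete Promtail config saved as /etc/promtail/java-app.yml (the filename is illustrative):

```shell
# Feed a sample multiline Java log through the pipeline and print the
# resulting entries instead of shipping them
cat << 'EOF' | promtail -config.file=/etc/promtail/java-app.yml --stdin -dry-run
2024-01-15 10:00:01 ERROR [main] com.example.Service - request failed
java.lang.NullPointerException: boom
    at com.example.Service.handle(Service.java:42)
2024-01-15 10:00:02 INFO [main] com.example.Service - recovered
EOF
```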
Complex JSON Parsing
scrape_configs:
  - job_name: structured-logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: structured
          __path__: /var/log/app/*.json
    pipeline_stages:
      - json:
          expressions:
            # JMESPath: a key containing '@' must itself be quoted
            timestamp: '"@timestamp"'
            level: log.level
            message: message
            service: service.name
            trace_id: trace.id
            user_email: user.email
            duration_ms: duration_ms
            status_code: http.status_code
      - timestamp:
          source: timestamp
          format: '2006-01-02T15:04:05.000Z07:00'
      # trace_id is unbounded; keep it out of the label set and use
      # line filters to search by it instead
      - labels:
          level:
          service:
          status_code:
      - metrics:
          app_request_total:
            type: Counter
            description: "Total requests"
            config:
              match_all: true
              action: inc
          app_error_total:
            type: Counter
            description: "Total errors"
            source: level
            config:
              value: error
              action: inc
          app_duration_milliseconds:
            type: Histogram
            description: "Request duration in milliseconds"
            source: duration_ms
            config:
              buckets: [10, 50, 100, 500, 1000, 5000, 10000]
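Once labels like level and service exist, LogQL can slice by them directly, and extracted numeric fields can be unwrapped for quantiles. For example, against the instant-query API:

```shell
# Per-service error rate over the last 5 minutes
curl -G -s 'http://localhost:3100/loki/api/v1/query' \
  --data-urlencode 'query=sum by (service) (rate({job="structured", level="error"}[5m]))' | jq .

# p99 request duration from the extracted duration_ms field
curl -G -s 'http://localhost:3100/loki/api/v1/query' \
  --data-urlencode 'query=quantile_over_time(0.99, {job="structured"} | json | unwrap duration_ms [5m])' | jq .
```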
Conditional Processing
scrape_configs:
  - job_name: conditional-processing
    static_configs:
      - targets:
          - localhost
        labels:
          job: conditional
          __path__: /var/log/app/*.log
    pipeline_stages:
      - regex:
          expression: '(?P<method>\w+) (?P<path>\S+) HTTP'
      # match selectors operate on labels, so method must be promoted first
      - labels:
          method:
      - drop:
          expression: '(health_check|status_check|metrics)'
      - match:
          selector: '{method="GET"}'
          stages:
            - regex:
                expression: 'duration=(?P<duration>\d+)'
            - metrics:
                get_requests_total:
                  type: Counter
                  config:
                    match_all: true
                    action: inc
      - match:
          selector: '{method="POST"}'
          stages:
            - metrics:
                post_requests_total:
                  type: Counter
                  config:
                    match_all: true
                    action: inc
Grafana Integration
Add Loki Data Source
curl -X POST http://admin:admin@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d '{
"name": "Loki",
"type": "loki",
"url": "http://localhost:3100",
"access": "proxy",
"isDefault": false,
"jsonData": {
"maxLines": 1000
}
}'
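The response echoes the created data source. You can confirm it registered, and discover the uid Grafana generated for it, with a follow-up read:

```shell
# Look up the Loki data source by name and print its uid
curl -s http://admin:admin@localhost:3000/api/datasources/name/Loki | jq '.uid'
```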
Creating Log Dashboards
Dashboard JSON with Log Panels
{
  "dashboard": {
    "title": "Application Logs Dashboard",
    "panels": [
      {
        "title": "Log Volume",
        "targets": [
          {
            "expr": "sum(rate({job=\"app\"}[5m])) by (level)",
            "legendFormat": "{{level}}"
          }
        ],
        "type": "timeseries"
      },
      {
        "title": "Error Logs",
        "targets": [
          {
            "expr": "{job=\"app\", level=\"error\"}"
          }
        ],
        "type": "logs"
      },
      {
        "title": "Request Duration (p95)",
        "targets": [
          {
            "expr": "quantile_over_time(0.95, {job=\"app\"} | json | unwrap response_time_ms [5m])"
          }
        ],
        "type": "timeseries"
      }
    ]
  }
}
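A dashboard defined as JSON can be imported through the HTTP API instead of the UI, which is handy for provisioning. This assumes the JSON above is saved as app-logs-dashboard.json (an illustrative filename):

```shell
# Import the dashboard via the Grafana API; re-running the call with
# "overwrite": true in the payload updates it in place
curl -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @app-logs-dashboard.json
```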
Alerting on Logs
Create Log-Based Alerts
# Alert on error rate via the Grafana alert provisioning API.
# Adjust folderUID, ruleGroup, and datasourceUid to match your instance
# (the values below are placeholders).
curl -X POST http://admin:admin@localhost:3000/api/v1/provisioning/alert-rules \
  -H "Content-Type: application/json" \
  -d '{
    "uid": "error-rate-alert",
    "title": "High Error Rate",
    "ruleGroup": "logs",
    "folderUID": "logs-folder",
    "condition": "A",
    "data": [
      {
        "refId": "A",
        "datasourceUid": "loki",
        "relativeTimeRange": { "from": 600, "to": 0 },
        "model": {
          "expr": "sum(rate({job=\"app\"} |= \"error\" [5m]))"
        }
      }
    ],
    "noDataState": "NoData",
    "execErrState": "Alerting",
    "for": "5m",
    "annotations": {
      "summary": "High error rate detected"
    }
  }'
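Grafana's alert provisioning API also supports reads, which is the quickest way to confirm the rule exists and inspect what was stored:

```shell
# List all provisioned alert rules
curl -s http://admin:admin@localhost:3000/api/v1/provisioning/alert-rules | jq .

# Fetch a single rule by its uid
curl -s http://admin:admin@localhost:3000/api/v1/provisioning/alert-rules/error-rate-alert | jq .
```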
Performance Optimization
Tuning Loki
# Increase chunk size for high-volume logging
# Edit loki-config.yml (ingester section)
chunk_target_size: 2097152    # 2MB instead of the 1.5MB default

# Increase ingestion rate (limits_config section)
ingestion_rate_mb: 200
ingestion_burst_size_mb: 400

# Adjust the query result cache
query_range:
  cache_results: true
  results_cache:
    cache:
      default_validity: 2h
Promtail Optimization
# Increase batching
clients:
  - url: http://localhost:3100/loki/api/v1/push
    batchwait: 2s
    batchsize: 2097152  # 2MB batches
    backoff_config:
      min_period: 100ms
      max_period: 10s
      max_retries: 5
Troubleshooting
Health Checks
# Check Loki readiness
curl -f http://localhost:3100/ready
# Check Loki metrics
curl http://localhost:3100/metrics | grep loki_ingester
# Check Promtail metrics
curl http://localhost:9080/metrics | grep promtail
Verify Data Flow
# Query recent logs (query_range defaults to the last hour)
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={job="app"}' \
  --data-urlencode 'limit=100' | jq .
# Check Promtail position tracking
tail -20 /var/lib/loki/positions.yaml
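For reproducible queries, or ranges older than the default window, pass explicit start and end values. Loki accepts nanosecond Unix epochs, which are easy to build from date:

```shell
# Build an explicit one-hour range as nanosecond epochs (GNU date)
start=$(($(date -d '1 hour ago' +%s) * 1000000000))
end=$(($(date +%s) * 1000000000))

# Query the range and print how many streams matched
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={job="app"}' \
  --data-urlencode "start=${start}" \
  --data-urlencode "end=${end}" | jq '.data.result | length'
```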
Debug Issues
# Enable debug logging
# Edit loki-config.yml
server:
  log_level: debug
# Restart services
sudo systemctl restart loki promtail
# Monitor logs
sudo journalctl -u loki -f
sudo journalctl -u promtail -f
Conclusion
A complete Loki stack provides cost-effective log aggregation with powerful querying and visualization. By following this guide, you've built a production-ready logging infrastructure. Focus on designing efficient label hierarchies, leveraging pipeline stages for intelligent parsing, and setting appropriate retention policies. This foundation scales to handle large-scale logging requirements while maintaining fast query performance and keeping operational costs low.


