Promtail and Loki for Log Aggregation

Grafana Loki is a horizontally scalable log aggregation system inspired by Prometheus, using label-based indexing instead of full-text indexing to keep costs low while providing fast log queries through LogQL. Paired with Promtail for log scraping and Grafana for visualization, the PLG stack (Promtail-Loki-Grafana) delivers Prometheus-style observability for your logs.

Prerequisites

  • Ubuntu/Debian or CentOS/Rocky Linux server
  • Grafana 9.x+ (for visualization)
  • Object storage like S3 or MinIO (for production Loki storage)
  • Ports: 3100 (Loki HTTP), 9080 (Promtail HTTP)

Installing Loki

# Download and install Loki binary
LOKI_VERSION=3.0.0
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki

# Create Loki user and directories
sudo useradd -r -s /bin/false loki
sudo mkdir -p /etc/loki /var/lib/loki/{chunks,rules,wal,compactor}
sudo chown -R loki:loki /etc/loki /var/lib/loki

# Basic Loki configuration (single-node, local filesystem storage)
sudo tee /etc/loki/loki-config.yaml >/dev/null <<'EOF'
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h  # 7 days
  ingestion_rate_mb: 64
  ingestion_burst_size_mb: 128
  max_query_series: 5000
  max_query_lookback: 0  # 0 = unlimited
  retention_period: 744h  # 31 days, enforced by the compactor below

# table_manager was removed in Loki 3.x; retention is handled by the compactor
compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
EOF

# Create systemd service
sudo tee /etc/systemd/system/loki.service >/dev/null <<'EOF'
[Unit]
Description=Grafana Loki
After=network-online.target

[Service]
User=loki
Group=loki
ExecStart=/usr/local/bin/loki -config.file=/etc/loki/loki-config.yaml
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable --now loki
sudo systemctl status loki

# Verify Loki is running (the /ready endpoint may report not-ready for ~15s after startup)
curl -s http://localhost:3100/ready
curl -s http://localhost:3100/metrics | grep loki_build
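To confirm end-to-end ingestion before wiring up Promtail, you can push a synthetic entry straight to Loki's push API and query it back. A sketch, assuming Loki listens on localhost:3100; the `smoke-test` job label is arbitrary:

```shell
# Loki's push API expects nanosecond-precision string timestamps.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"
NOW_NS="$(date +%s%N)"
PAYLOAD="{\"streams\":[{\"stream\":{\"job\":\"smoke-test\"},\"values\":[[\"${NOW_NS}\",\"hello from the smoke test\"]]}]}"
echo "$PAYLOAD"

# Push the entry (HTTP 204 on success), then read it back.
# || true keeps the script going if Loki is not reachable yet.
curl -s -o /dev/null -w "%{http_code}\n" -H "Content-Type: application/json" \
  -X POST "${LOKI_URL}/loki/api/v1/push" -d "$PAYLOAD" || true
curl -s -G "${LOKI_URL}/loki/api/v1/query_range" \
  --data-urlencode 'query={job="smoke-test"}' || true
```

If the push succeeds, the query response should contain the pushed line under the `{job="smoke-test"}` stream.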

Configuring Promtail

Promtail scrapes log files and sends them to Loki:

# Download and install Promtail
LOKI_VERSION=3.0.0
wget https://github.com/grafana/loki/releases/download/v${LOKI_VERSION}/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail

# Create the config directory first, then write the Promtail configuration
# (the unquoted EOF lets the shell expand ${HOSTNAME} at write time)
sudo mkdir -p /etc/promtail
sudo tee /etc/promtail/promtail-config.yaml >/dev/null <<EOF
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml  # Tracks read positions

clients:
  - url: http://loki-server:3100/loki/api/v1/push  # Replace loki-server with your Loki host

scrape_configs:
  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: ${HOSTNAME}
          __path__: /var/log/syslog

  # Auth logs
  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: ${HOSTNAME}
          __path__: /var/log/auth.log

  # Nginx access logs with pipeline
  - job_name: nginx_access
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: access
          host: ${HOSTNAME}
          __path__: /var/log/nginx/access.log

  # Nginx error logs
  - job_name: nginx_error
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          log_type: error
          host: ${HOSTNAME}
          __path__: /var/log/nginx/error.log

  # Application logs (multiple files with glob)
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          environment: production
          host: ${HOSTNAME}
          __path__: /var/log/app/*.log
EOF

sudo mkdir -p /var/lib/promtail /etc/promtail
sudo useradd -r -s /bin/false promtail
# Add promtail user to adm group to read system logs
sudo usermod -a -G adm promtail
sudo chown -R promtail:promtail /var/lib/promtail /etc/promtail

sudo tee /etc/systemd/system/promtail.service >/dev/null <<'EOF'
[Unit]
Description=Promtail log shipper
After=network-online.target

[Service]
User=promtail
Group=promtail
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yaml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable --now promtail
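Beyond tailing files, Promtail can also read the systemd journal directly. A sketch of an additional scrape config, assuming a Promtail build with journal support (the official Linux amd64 release binaries generally include it):

```yaml
scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h          # Ignore entries older than this on first start
      labels:
        job: systemd-journal
    relabel_configs:
      # Turn the systemd unit name into a queryable label
      - source_labels: ['__journal__systemd_unit']
        target_label: unit
```

With the `unit` label in place, queries like `{job="systemd-journal", unit="nginx.service"}` select a single service's journal output.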

Label Extraction and Pipeline Stages

Promtail pipeline stages parse and enrich log entries:

# Enhanced Promtail config with pipeline stages
scrape_configs:
  - job_name: nginx_parsed
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: server-01
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      # Parse nginx combined log format. Note: request_time is NOT part of the
      # combined format; it is captured optionally here and requires adding
      # $request_time to your nginx log_format to actually appear.
      - regex:
          expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<request_uri>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<body_bytes_sent>\d+)(?: .* (?P<request_time>[\d.]+))?'
      # Extract labels from parsed fields
      - labels:
          status:
          method:
      # Emit a histogram metric from the extracted request_time field
      - metrics:
          nginx_response_time_seconds:
            type: Histogram
            description: "Nginx response time"
            source: request_time
            config:
              buckets: [0.001, 0.01, 0.1, 0.5, 1.0, 5.0]
      # Add timestamp from log
      - timestamp:
          source: time_local
          format: "02/Jan/2006:15:04:05 -0700"
      # Drop noisy health check requests
      - drop:
          expression: '.*healthz.*'

  # JSON log pipeline
  - job_name: json_app
    static_configs:
      - targets:
          - localhost
        labels:
          job: json-app
          __path__: /var/log/app/app.json.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            trace_id: trace_id
            service: service
      - labels:
          level:
          service:
      # Tag error lines with a static severity label
      - match:
          selector: '{job="json-app"} |= "error"'
          stages:
            - static_labels:
                severity: error
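Stack traces and other multi-line events would otherwise arrive as separate log entries; Promtail's `multiline` stage can stitch them back together. A sketch, assuming each logical entry starts with an ISO-style date:

```yaml
    pipeline_stages:
      - multiline:
          # A new entry starts with a timestamp like 2024-01-15; any line that
          # doesn't match (e.g. a stack trace line) is appended to the previous entry.
          firstline: '^\d{4}-\d{2}-\d{2}'
          # Flush a partial entry if no continuation arrives within this window
          max_wait_time: 3s
```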

Loki Storage Backends

# Production: S3 storage backend
common:
  storage:
    s3:
      endpoint: s3.amazonaws.com
      bucketnames: your-loki-bucket
      region: us-east-1
      access_key_id: YOUR_ACCESS_KEY
      secret_access_key: YOUR_SECRET_KEY
      s3forcepathstyle: false

# Production: MinIO storage (self-hosted S3-compatible)
common:
  storage:
    s3:
      endpoint: minio.internal:9000
      bucketnames: loki-data
      region: us-east-1  # Required but ignored by MinIO
      access_key_id: minio-access-key
      secret_access_key: minio-secret-key
      s3forcepathstyle: true
      insecure: false

# Configure chunk caching with Redis
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      db: 0
      expiration: 1h

# Retention policy configuration (enforced by the compactor)
limits_config:
  retention_period: 744h  # 31 days globally
  # Per-tenant overrides additionally require auth_enabled: true and a runtime overrides file

# Ruler storage for alerting/recording rules
ruler:
  storage:
    type: s3
    s3:
      bucketnames: loki-rules
      # ... same s3 config

compactor:
  working_directory: /var/lib/loki/compactor
  retention_enabled: true
  delete_request_store: s3
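With the compactor handling retention, you can also shorten (or extend) retention for specific streams via `retention_stream` in `limits_config`. A sketch that keeps debug logs for only one day:

```yaml
limits_config:
  retention_period: 744h            # Default for everything else
  retention_stream:
    - selector: '{job="app", level="debug"}'
      priority: 1                   # Higher priority wins when selectors overlap
      period: 24h
```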

LogQL Queries

LogQL is Loki's query language:

# Basic log stream selection
{job="nginx"}

# Filter by label and content
{job="nginx", status="500"} |= "upstream"

# Show error logs (the line limit is a query-API/Grafana parameter, not LogQL syntax)
{job="nginx"} |= "error"

# Parse nginx access log and filter slow requests
{job="nginx"}
| regexp `(?P<method>\w+) (?P<path>\S+).*" (?P<status>\d+) .* (?P<response_time>[\d.]+)$`
| response_time > 2.0

# Per-second error rate over a 5-minute window, by host (metric query)
sum(rate({job="nginx"} |= "error" [5m])) by (host)

# Count HTTP status codes
sum by (status) (rate({job="nginx"} [5m]))

# Error rate percentage
sum(rate({job="nginx", status=~"5.."}[5m])) 
/ sum(rate({job="nginx"}[5m])) * 100

# Find failed SSH logins
{job="auth"} |= "Failed password"
| regexp `Failed password for (?P<user>\S+) from (?P<ip>[\d.]+)`
| line_format "User: {{.user}} from IP: {{.ip}}"

# JSON log parsing
{job="json-app"} 
| json 
| level="error" 
| line_format "{{.timestamp}} [{{.level}}] {{.message}} trace={{.trace_id}}"

# Top 10 slowest endpoints (average of the extracted duration over 5m)
topk(10,
  avg_over_time(
    {job="nginx"}
    | regexp `(?P<path>/[^\s?]+).*(?P<duration>[\d.]+)$`
    | unwrap duration [5m]
  ) by (path)
)
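Note that the number of returned log lines is not part of LogQL itself; it is a parameter of the query API (or Grafana's "Line limit" option). Hitting the HTTP API directly with curl, assuming Loki on localhost:3100:

```shell
# Range query over the last hour; limit caps the number of returned lines.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"
START="$(date -d '1 hour ago' +%s)000000000"   # API takes nanosecond timestamps
END="$(date +%s)000000000"
echo "querying ${LOKI_URL} from ${START} to ${END}"
curl -s -G "${LOKI_URL}/loki/api/v1/query_range" \
  --data-urlencode 'query={job="nginx"} |= "error"' \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode 'limit=100' || true
```

The same endpoint accepts metric queries (e.g. the `sum by (status)` example above), returning a Prometheus-style matrix instead of log streams.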

Grafana Log Visualization

# Add Loki data source in Grafana
curl -s -X POST \
  -H "Content-Type: application/json" \
  -u admin:grafana-password \
  http://grafana:3000/api/datasources \
  -d '{
    "name": "Loki",
    "type": "loki",
    "url": "http://loki:3100",
    "access": "proxy",
    "basicAuth": false
  }'
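Instead of the API call above, the data source can also be provisioned declaratively; Grafana loads files like this from /etc/grafana/provisioning/datasources/ at startup:

```yaml
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    access: proxy
    isDefault: false
```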

# Create a log dashboard panel via API or Grafana UI
# In Grafana: + > Dashboard > Add visualization > Select Loki datasource
# Use "Logs" visualization type for raw log viewing
# Use "Time series" or "Bar chart" for aggregated metrics from LogQL metric queries

# Example: Nginx dashboard with Logs panel
# Query: {job="nginx", log_type="access"}  (add "| json" only if nginx emits JSON logs)
# Visualization: Logs
# Labels: status, method

# Correlate logs with Prometheus metrics
# In Grafana, use the "Explore" view to correlate:
# - Prometheus panel showing latency spike
# - Switch to Logs explorer with same time range
# - Filter {job="nginx"} |= "error"

Troubleshooting

Promtail not sending logs:

# Check Promtail is running and reading files
curl -s http://localhost:9080/metrics | grep promtail_files_active

# Check positions file to see where Promtail is reading
cat /var/lib/promtail/positions.yaml

# View Promtail logs
sudo journalctl -u promtail -n 50

# Test Promtail configuration
promtail -config.file=/etc/promtail/promtail-config.yaml -dry-run

Loki returning "too many outstanding requests":

# Increase query concurrency limits in loki-config.yaml
query_scheduler:
  max_outstanding_requests_per_tenant: 2048

# Or reduce query time range in LogQL
# Instead of last 7 days, use last 1 hour

High memory usage:

# Reduce ingestion rate limits
limits_config:
  ingestion_rate_mb: 32
  ingestion_burst_size_mb: 64

# Enable chunk compression (this setting lives under the ingester block)
ingester:
  chunk_encoding: snappy

Labels not being extracted:

# Test pipeline stages locally with promtail in dry-run mode
echo '127.0.0.1 - - [15/Jan/2024:10:00:00 +0000] "GET /api/health HTTP/1.1" 200 42' | \
  promtail -config.file=/etc/promtail/promtail-config.yaml -stdin -dry-run

# Check Promtail targets and labels
curl -s http://localhost:9080/targets | jq

Conclusion

The Promtail + Loki + Grafana stack provides a cost-effective, horizontally scalable log aggregation solution that integrates naturally with Prometheus-based monitoring. Loki's label-based indexing approach keeps storage costs dramatically lower than full-text search solutions, while LogQL provides powerful querying capabilities. Start with filesystem storage for development and migrate to S3-compatible object storage as your log volume grows, using Loki's retention policies to automatically manage storage costs.