Loki Installation for Log Aggregation

Loki is a log aggregation system designed for DevOps and observability. Unlike traditional log aggregation systems, Loki uses a label-based index strategy similar to Prometheus, making it cost-effective and easy to operate. This guide covers Loki server deployment, Promtail configuration, label strategies, LogQL queries, and retention policies.

Table of Contents

  • Introduction
  • Architecture
  • Loki Installation
  • Configuration
  • Promtail Installation
  • Label Strategy
  • LogQL Queries
  • Data Retention
  • Grafana Integration
  • Troubleshooting
  • Conclusion

Introduction

Loki provides horizontally-scalable, highly-available log aggregation without indexing the full content of logs. By indexing only metadata (labels), Loki dramatically reduces storage costs while maintaining powerful querying capabilities through LogQL. It integrates seamlessly with Prometheus and Grafana for complete observability.

Architecture

Component Overview

Applications/Services
        ↓
   Promtail Agents
        ↓
   Loki Server (3 modes)
   ├─ Single Binary
   ├─ Simple Scalable
   └─ Microservices
        ↓
   Index & Chunk Storage
    ├─ Object Storage (S3, GCS, filesystem)
    └─ Index Store (BoltDB Shipper, Cassandra)
        ↓
   Grafana Visualization

System Requirements

  • Linux kernel 2.6.32 or later
  • Minimum 2GB RAM (4GB+ recommended)
  • 10GB+ storage (depends on retention policy)
  • Go 1.19+ (only if building from source)
  • Object storage access (S3, GCS, MinIO) or local storage
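
The storage requirement above depends heavily on ingest volume and retention. A back-of-envelope sizing helper (the ~10x compression ratio and ~2% index overhead are assumptions; measure against your own data):

```shell
# Rough chunk-storage estimate in MB: daily ingest x retention days,
# divided by an assumed compression ratio, plus ~2% index overhead.
estimate_loki_storage_mb() {
  local daily_mb=$1 retention_days=$2 compression=${3:-10}
  echo $(( daily_mb * retention_days / compression * 102 / 100 ))
}

estimate_loki_storage_mb 5000 30   # 5 GB/day, 30-day retention
```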

Loki Installation

Step 1: Download and Install

# Create Loki user
sudo useradd --no-create-home --shell /bin/false loki

# Download latest release
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip

# Install binary
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chown loki:loki /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki

# Verify installation
/usr/local/bin/loki --version
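
Grafana publishes SHA256 checksums alongside each release; verifying the archive before installing is cheap insurance. The flow, demonstrated on a throwaway file (for a real download, fetch the published checksum file from the release page rather than generating one):

```shell
# Demonstrate sha256sum verification on a local file.
echo "example content" > /tmp/loki-download-test
sha256sum /tmp/loki-download-test > /tmp/loki-download-test.sha256
sha256sum -c /tmp/loki-download-test.sha256 && echo "checksum OK"
```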

Step 2: Create Directories

# Create configuration and data directories
sudo mkdir -p /etc/loki /var/lib/loki
sudo chown loki:loki /etc/loki /var/lib/loki
sudo chmod 750 /etc/loki /var/lib/loki

# Create log directory
sudo mkdir -p /var/log/loki
sudo chown loki:loki /var/log/loki
sudo chmod 755 /var/log/loki

Step 3: Create Configuration File

sudo tee /etc/loki/loki-config.yml > /dev/null << 'EOF'
auth_enabled: false

ingester:
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  max_chunk_age: 1h
  chunk_encoding: snappy
  chunk_target_size: 1048576
  chunk_block_size: 262144
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_line_size: 1024000
  ingestion_rate_mb: 12
  ingestion_burst_size_mb: 20
  per_stream_rate_limit: 3MB

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

server:
  http_listen_port: 3100
  log_level: info

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    shared_store: filesystem
    cache_location: /var/lib/loki/cache
  filesystem:
    directory: /var/lib/loki/chunks

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
EOF

sudo chown loki:loki /etc/loki/loki-config.yml
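
Before wiring up systemd, it is worth a quick sanity check that the file contains the sections Loki needs. A minimal sketch (it checks the presence of top-level keys, not their validity):

```shell
# Fail fast if a required top-level section is missing from the config.
check_loki_config() {
  local f=$1 key
  for key in server ingester schema_config storage_config; do
    grep -q "^${key}:" "$f" || { echo "missing section: ${key}"; return 1; }
  done
  echo "all required sections present"
}

check_loki_config /etc/loki/loki-config.yml || echo "fix the config before starting loki"
```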

Step 4: Create Systemd Service

sudo tee /etc/systemd/system/loki.service > /dev/null << 'EOF'
[Unit]
Description=Loki
After=network.target

[Service]
User=loki
Group=loki
Type=simple
ExecStart=/usr/local/bin/loki -config.file=/etc/loki/loki-config.yml
Restart=on-failure
RestartSec=5

StandardOutput=journal
StandardError=journal
SyslogIdentifier=loki

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable loki
sudo systemctl start loki
sudo systemctl status loki

Configuration

Single Binary Configuration for Production

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  log_format: json

ingester:
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  max_chunk_age: 2h
  chunk_encoding: snappy
  chunk_target_size: 1572864
  chunk_block_size: 262144
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    num_tokens: 128
    heartbeat_timeout: 5m

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_line_size: 2097152
  ingestion_rate_mb: 50
  ingestion_burst_size_mb: 100
  per_stream_rate_limit: 10MB
  retention_period: 720h

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    shared_store: s3
    cache_location: /var/lib/loki/cache
  aws:
    bucketnames: YOUR_BUCKET
    region: us-east-1
    access_key_id: YOUR_KEY
    secret_access_key: YOUR_SECRET
    s3forcepathstyle: false

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
  poll_interval: 10m
  creation_grace_period: 10m
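
Hardcoding S3 credentials in the file is risky. Loki (and Promtail) accept a -config.expand-env=true flag that expands ${VAR} references in the config from the process environment, so the keys can live in the systemd unit (Environment=...) instead. The substitution behaves like a plain text replacement, illustrated here with sed (the variable name is made up):

```shell
# Simulate what -config.expand-env=true does to a config line:
# ${S3_ACCESS_KEY} is replaced with the value from the environment.
S3_ACCESS_KEY=AKIAEXAMPLE
sed "s/\${S3_ACCESS_KEY}/${S3_ACCESS_KEY}/" <<'YAML'
access_key_id: ${S3_ACCESS_KEY}
YAML
```

In the unit file this would look like Environment=S3_ACCESS_KEY=... under [Service], with -config.expand-env=true appended to ExecStart.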

Promtail Installation

Step 1: Download and Install Promtail

# Create promtail user
sudo useradd --no-create-home --shell /bin/false promtail

# Download
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip

# Install
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail

# Create directories
sudo mkdir -p /etc/promtail
sudo chown promtail:promtail /etc/promtail

# Allow promtail to read system logs (adm group owns them on Debian/Ubuntu)
sudo usermod -a -G adm promtail

Step 2: Create Promtail Configuration

sudo tee /etc/promtail/promtail-config.yml > /dev/null << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: system
          env: production
          host: ${HOSTNAME}  # requires -config.expand-env=true on the promtail command line
          __path__: /var/log/*log

  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          __path__: /var/log/syslog

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          __path__: /var/log/auth.log

  - job_name: kernel
    static_configs:
      - targets:
          - localhost
        labels:
          job: kernel
          __path__: /var/log/kern.log

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] "(?P<method>\w+) (?P<path>[^ ]*) (?P<version>[^"]*)" (?P<status>[^ ]*) (?P<size>[^ ]*)'
      - labels:
          status:
          method:
          # path is deliberately not promoted to a label: URL paths are
          # unbounded and would explode stream cardinality; filter on the
          # parsed field at query time instead
EOF

sudo chown promtail:promtail /etc/promtail/promtail-config.yml
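
The nginx regex in the pipeline above is easy to get subtly wrong, and a bad regex silently drops extracted fields. Since grep -P (PCRE) accepts the same (?P<name>...) named-group syntax as Promtail's RE2 engine for this pattern, it can be checked against a sample access-log line before restarting anything:

```shell
# Sample combined-format line plus the pipeline regex from the config.
line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
re='^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] "(?P<method>\w+) (?P<path>[^ ]*) (?P<version>[^"]*)" (?P<status>[^ ]*) (?P<size>[^ ]*)'
echo "$line" | grep -qP "$re" && echo "regex matches"
```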

Step 3: Create Promtail Systemd Service

sudo tee /etc/systemd/system/promtail.service > /dev/null << 'EOF'
[Unit]
Description=Promtail
After=network.target

[Service]
User=promtail
Group=promtail
Type=simple
ExecStart=/usr/local/bin/promtail -config.expand-env=true -config.file=/etc/promtail/promtail-config.yml
Restart=on-failure
RestartSec=5

StandardOutput=journal
StandardError=journal
SyslogIdentifier=promtail

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
sudo systemctl status promtail

Label Strategy

Design Label Sets

Effective labeling is crucial for Loki performance: every unique combination of label values creates a separate stream, so keep labels few and static (job, env, region, service) and leave high-cardinality values such as user IDs, request IDs, and URL paths in the log line itself:

scrape_configs:
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          env: production
          region: us-east-1
          team: backend
          service: api
          __path__: /var/log/app/*.log

  - job_name: database
    static_configs:
      - targets:
          - localhost
        labels:
          job: database
          env: production
          region: us-east-1
          team: database
          service: postgresql
          __path__: /var/log/postgresql/*.log
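
Every distinct combination of label values becomes its own stream, and stream count grows multiplicatively, which is why the sets above stick to a handful of static values. A quick illustration with made-up value counts:

```shell
# 2 jobs x 3 envs x 4 regions x 5 services -> a manageable 120 streams.
streams=$(( 2 * 3 * 4 * 5 ))
echo "base streams: ${streams}"

# Adding a user_id label with 10,000 distinct values multiplies that.
echo "with user_id label: $(( streams * 10000 ))"
```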

Pipeline Stages for Label Extraction

Extract labels from log content:

scrape_configs:
  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          __path__: /var/log/app/*.log
    pipeline_stages:
      # Parse JSON logs
      - json:
          expressions:
            level: level
            msg: message
            request_id: request_id
            user_id: user_id
      # Promote only low-cardinality fields to labels; request_id and
      # user_id are unbounded and belong in the log line, where LogQL
      # filters can still reach them
      - labels:
          level:
      # Add timestamp
      - timestamp:
          source: timestamp
          format: 2006-01-02T15:04:05Z07:00

LogQL Queries

Basic Log Queries

# Get all logs from a job
{job="app"}

# Get logs with specific label value
{job="app", env="production"}

# Get logs from multiple services
{job=~"app|api", env="production"}

# Exclude logs
{job="app", env!="development"}

# Regex matching
{job="app", host=~".*prod.*"}

Filter by Content

# Show lines containing "error"
{job="app"} |= "error"

# Show lines NOT containing "200" (HTTP status)
{job="app"} != "200"

# Case-insensitive matching
{job="app"} |~ "(?i)failed"

# Regular expression filtering
{job="app"} |~ "error|exception|fatal"

Aggregation Queries

# Count of log lines per service
sum(count_over_time({job="app"}[5m])) by (service)

# Error rate
sum(count_over_time({job="app"} |= "error" [1m])) 
/ 
sum(count_over_time({job="app"}[1m]))

# Average response time from logs (unwrap reads a numeric field)
avg_over_time({job="app"} | json | unwrap response_time [5m])
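
As a sanity check on the error-rate expression above: it is simply error lines divided by total lines over the window. With 12 error lines out of 400 in a minute:

```shell
# The error-rate query reduces to errors/total for the window.
awk 'BEGIN { printf "error rate: %.2f%%\n", 12 / 400 * 100 }'
```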

Label Aggregations

# Count logs by severity
sum(count_over_time({job="app"}[5m])) by (level)

# Top 10 users by log volume
topk(10, sum(count_over_time({job="app"}[5m])) by (user_id))

# Request count by endpoint
sum(count_over_time({job="api"} | json | uri != "" [5m])) by (uri)

Parsing and Extraction

# Extract duration from logs
{job="app"} | json | duration > 1000

# Parse key-value pairs
{job="app"} | logfmt | status >= 500

# Extract from log format
{job="app"} | pattern `<_> <_> <level> <_> <method> <path> <status>`

Data Retention

Retention Configuration

Set retention policies in loki-config.yml:

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h  # 30 days
  poll_interval: 10m
  creation_grace_period: 10m
  
limits_config:
  retention_period: 720h

Note: with the boltdb-shipper store used in this guide, retention deletes are carried out by the compactor (set retention_enabled: true in a compactor block); the table manager only manages table-based stores.

Configure Per-Label Retention

Advanced retention by labels:

limits_config:
  retention_period: 720h

  # Shorter retention for verbose logs
  retention_stream:
    - selector: '{level="debug"}'
      priority: 1
      period: 48h
    - selector: '{job="debug-app"}'
      priority: 1
      period: 24h
    - selector: '{env="production"}'
      priority: 2
      period: 720h
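
Loki expects these durations in hours, which invites arithmetic slips when thinking in days; a tiny helper keeps the conversion honest:

```shell
# Convert a day count to the "<hours>h" form retention_period expects.
days_to_retention() { echo "$(( $1 * 24 ))h"; }

days_to_retention 30   # 720h, as used above
days_to_retention 2    # 48h, matching the debug-stream override
```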

Monitor Retention

# Check current streams
curl -G -s 'http://localhost:3100/loki/api/v1/series' --data-urlencode 'match[]={job=~".+"}' | jq .

# View label values
curl -s http://localhost:3100/loki/api/v1/label/job/values | jq .

# Check index size
du -sh /var/lib/loki/index/
du -sh /var/lib/loki/chunks/

Grafana Integration

Add Loki Data Source

  1. Grafana > Configuration > Data Sources
  2. Add New Data Source > Loki
  3. URL: http://localhost:3100
  4. Save & Test

Create Loki Dashboard

  1. Create Dashboard > Add Panel
  2. Data Source: Loki
  3. Query Example:
{job="nginx"} | json | status >= 400

Set Up Log Alerts

  1. Edit panel > Alert tab
  2. Create alert rule on query
  3. Example:
count_over_time({job="app"} |= "error" [5m]) > 10

Troubleshooting

Check Loki Status

# Service status
sudo systemctl status loki

# Metrics endpoint
curl -s http://localhost:3100/metrics | head -30

# View running configuration
curl -s http://localhost:3100/config | head -30

# Readiness check
curl -s http://localhost:3100/ready

Debug Promtail

# Check Promtail status
sudo systemctl status promtail

# Validate configuration syntax
/usr/local/bin/promtail -check-syntax -config.file=/etc/promtail/promtail-config.yml

# Dry run: print scraped entries to stdout instead of shipping them
/usr/local/bin/promtail -dry-run -config.file=/etc/promtail/promtail-config.yml

# View logs
sudo journalctl -u promtail -f

Verify Data Flow

# Check Loki ingestion (last hour; start/end are nanosecond epochs)
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={job="app"}' \
  --data-urlencode "start=$(date -d '1 hour ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" \
  --data-urlencode 'limit=100' | jq .

# Check indices
ls -la /var/lib/loki/index/

# Monitor disk usage
df -h /var/lib/loki/

Performance Issues

# Check chunk metrics (ages, utilization, flush reasons)
curl -s http://localhost:3100/metrics | grep loki_ingester_chunk

# Monitor ingestion rate
curl -s http://localhost:3100/metrics | grep loki_distributor_lines_received

# Check memory usage
ps aux | grep loki

Conclusion

Loki provides cost-effective log aggregation by indexing only labels rather than full content. Combined with Promtail for collection and Grafana for visualization, it creates a complete observability solution. Focus on designing clear label hierarchies, leveraging pipeline stages for efficient parsing, and setting appropriate retention policies for your use case. This foundation supports growing log volumes while maintaining fast query performance and minimal operational overhead.