# Grafana Loki and Promtail: Complete Configuration

Building a complete logging stack with Grafana, Loki, and Promtail provides comprehensive log aggregation and visualization. This guide covers deploying the entire stack, configuring complex pipeline stages, integrating with Grafana, and setting up log-based alerting for production environments.
## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [System Preparation](#system-preparation)
- [Loki Server Configuration](#loki-server-configuration)
- [Promtail Configuration](#promtail-configuration)
- [Advanced Pipeline Stages](#advanced-pipeline-stages)
- [Grafana Integration](#grafana-integration)
- [Creating Log Dashboards](#creating-log-dashboards)
- [Alerting on Logs](#alerting-on-logs)
- [Performance Optimization](#performance-optimization)
- [Troubleshooting](#troubleshooting)
- [Conclusion](#conclusion)
## Overview

A complete logging stack captures, processes, stores, and visualizes logs from all infrastructure components. Loki's label-based architecture keeps storage costs low, while Promtail's flexible configuration handles diverse log sources. Grafana provides unified visualization across metrics and logs.
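To make the label-based model concrete: Loki indexes only the labels attached to each stream, and everything else is filtered at query time with LogQL. The queries below assume the `job` and `level` labels that the Promtail configuration in this guide attaches:

```logql
{job="app", level="error"}              # stream selector: uses the label index only
{job="app"} |= "timeout" | json         # line filter, then parse fields at query time
sum(rate({job="app"}[5m])) by (level)   # log volume per level
```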
## Architecture

### Stack Components

```
┌─────────────────────────────────────────────┐
│   Application Logs / Syslog / Files         │
└───────────────────┬─────────────────────────┘
                    │
          ┌─────────▼─────────┐
          │     Promtail      │
          │  - Collection     │
          │  - Parsing        │
          │  - Labeling       │
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │    Loki Server    │
          │  - Ingestion      │
          │  - Indexing       │
          │  - Storage        │
          └─────────┬─────────┘
                    │
       ┌────────────┼────────────┐
       │            │            │
    BoltDB       S3/GCS      Cassandra
                    │
          ┌─────────▼─────────┐
          │      Grafana      │
          │  - Visualization  │
          │  - Alerting       │
          └───────────────────┘
```
## System Preparation

### Prerequisites

```shell
# System updates
sudo apt-get update && sudo apt-get upgrade -y

# Install dependencies
sudo apt-get install -y \
  curl wget unzip \
  git gcc make \
  openssl ca-certificates

# Create logging user
sudo useradd --no-create-home --shell /bin/false logging
```

### Directory Structure

```shell
# Create directories
sudo mkdir -p /opt/logging/{loki,promtail,grafana}
sudo mkdir -p /var/lib/loki/{chunks,index,cache}
sudo mkdir -p /var/log/loki
sudo mkdir -p /etc/loki /etc/promtail

# Set permissions
sudo chown -R logging:logging /opt/logging
sudo chown -R logging:logging /var/lib/loki
sudo chown -R logging:logging /var/log/loki
```
## Loki Server Configuration

### Installation

```shell
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki
sudo chmod +x /usr/local/bin/loki
```

### Production Configuration
```shell
sudo tee /etc/loki/loki-config.yml > /dev/null << 'EOF'
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  log_format: json
  graceful_shutdown_timeout: 10s

ingester:
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  max_chunk_age: 2h
  chunk_encoding: snappy
  chunk_target_size: 1048576
  chunk_block_size: 262144
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
      heartbeat_timeout: 5m
    num_tokens: 128

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_line_size: 2097152
  ingestion_rate_mb: 100
  ingestion_burst_size_mb: 200
  max_entries_limit_per_query: 10000
  max_global_streams_per_user: 10000
  retention_period: 720h
  cardinality_limit: 100000

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    shared_store: filesystem
    cache_location: /var/lib/loki/cache
  filesystem:
    directory: /var/lib/loki/chunks

chunk_store_config:
  max_look_back_period: 0s
  chunk_cache_config:
    enable_fifocache: true
    default_validity: 1h

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
  poll_interval: 10m
  creation_grace_period: 10m

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      default_validity: 1h

tracing:
  enabled: false
EOF

sudo chown logging:logging /etc/loki/loki-config.yml
```
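The filesystem object store is fine for a single node, but for durability boltdb-shipper can ship chunks and index to object storage. A sketch of the equivalent `storage_config` for S3 — the region and bucket name are placeholders, and `object_store` in `schema_config` must be changed to `s3` to match:

```yaml
storage_config:
  boltdb_shipper:
    active_index_directory: /var/lib/loki/index
    shared_store: s3
    cache_location: /var/lib/loki/cache
  aws:
    s3: s3://us-east-1/my-loki-chunks   # placeholder region/bucket
    s3forcepathstyle: false
```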
### Systemd Service

```shell
sudo tee /etc/systemd/system/loki.service > /dev/null << 'EOF'
[Unit]
Description=Grafana Loki
Documentation=https://grafana.com/loki
After=network.target

[Service]
User=logging
Group=logging
Type=simple
ExecStart=/usr/local/bin/loki -config.file=/etc/loki/loki-config.yml
Restart=on-failure
RestartSec=5

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=loki

# Resource limits
LimitNOFILE=65536
LimitNPROC=65536

# Security
ProtectSystem=full
ProtectHome=yes
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable loki
sudo systemctl start loki
```
## Promtail Configuration

### Base Configuration

```shell
sudo tee /etc/promtail/promtail-config.yml > /dev/null << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/loki/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: __HOSTNAME__
          __path__: /var/log/{syslog,messages}

  - job_name: kernel
    static_configs:
      - targets:
          - localhost
        labels:
          job: kernel
          __path__: /var/log/kern.log

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          __path__: /var/log/auth.log

  - job_name: docker
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      - json:
          expressions:
            output: log
            stream: stream
            attrs_status: attrs.status
      - output:
          source: output

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
      - regex:
          expression: '^(?P<remote>[\w\.]+) (?P<host>[\w\.]+) (?P<user>[\w\-\.]+) \[(?P<timestamp>[\w:/]+\s[+\-]\d{4})\] "(?P<method>\w+) (?P<path>[^\s]+) (?P<protocol>[\w/\.]+)" (?P<status>\d+|-) (?P<bytes>\d+|-)\s?"?(?P<referer>[^\s]*)"?\s?"?(?P<agent>[^"]*)"?'
      - timestamp:
          source: timestamp
          format: '02/Jan/2006:15:04:05 -0700'
      - labels:
          status:
          method:
          path:
      - metrics:
          nginx_http_requests_total:
            type: Counter
            description: "Total HTTP requests"
            prefix: nginx_
            max_idle_duration: 30s
            config:
              match_all: true
              action: inc
          nginx_http_request_duration_seconds:
            type: Histogram
            description: "Request duration"
            source: response_time
            prefix: nginx_
            max_idle_duration: 30s
            config:
              buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

  - job_name: application
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          env: production
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            level: level
            msg: message
            trace_id: trace_id
            user_id: user_id
            request_path: request.path
            status_code: request.status_code
            response_time: response_time_ms
      - timestamp:
          source: timestamp
          format: '2006-01-02T15:04:05Z07:00'
      - labels:
          level:
          trace_id:
          user_id:
      - drop:
          expression: '.*health_check.*'
      - match:
          selector: '{level="error"}'
          stages:
            - metrics:
                app_errors_total:
                  type: Counter
                  description: "Total errors"
                  config:
                    match_all: true
                    action: inc
      - metrics:
          app_request_duration_seconds:
            type: Histogram
            description: "Request duration"
            source: response_time
            config:
              buckets: [10, 50, 100, 250, 500, 1000, 2500, 5000]
EOF

sudo chown logging:logging /etc/promtail/promtail-config.yml
```
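On systemd hosts, Promtail can also read the journal directly instead of tailing files. A sketch of such a scrape config — note that journal support requires a Promtail build with journal support enabled (the stock release binary may lack it) and read access to `/var/log/journal`:

```yaml
scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h                 # ignore entries older than this on startup
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit         # expose the originating unit as a label
```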
### Promtail Service

```shell
# Download Promtail
cd /tmp
wget https://github.com/grafana/loki/releases/download/v2.9.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
sudo chmod +x /usr/local/bin/promtail

# Create systemd service
sudo tee /etc/systemd/system/promtail.service > /dev/null << 'EOF'
[Unit]
Description=Grafana Promtail
Documentation=https://grafana.com/loki
After=network.target loki.service

[Service]
User=logging
Group=logging
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yml
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=promtail
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable promtail
sudo systemctl start promtail
```
## Advanced Pipeline Stages

### Multiline Log Parsing

```yaml
scrape_configs:
  - job_name: java-app
    static_configs:
      - targets:
          - localhost
        labels:
          job: java
          __path__: /var/log/java-app/*.log
    pipeline_stages:
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2}'
      - regex:
          expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>\w+) \[(?P<thread>[^\]]+)\] (?P<logger>[^\s]+)\s-\s(?P<message>.*)$'
      - timestamp:
          source: timestamp
          format: '2006-01-02 15:04:05'
      - labels:
          level:
          thread:
          logger:
```
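The multiline stage buffers lines until the next `firstline` match, so two tuning knobs are worth knowing: `max_wait_time` flushes a partial block when no new line arrives, and `max_lines` caps how many lines are merged into one entry. A sketch with illustrative values:

```yaml
- multiline:
    firstline: '^\d{4}-\d{2}-\d{2}'
    max_wait_time: 3s    # flush a partial stack trace after 3s of silence
    max_lines: 128       # cap the lines merged into a single entry
```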
### Complex JSON Parsing

```yaml
scrape_configs:
  - job_name: structured-logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: structured
          __path__: /var/log/app/*.json
    pipeline_stages:
      - json:
          expressions:
            timestamp: '"@timestamp"'
            level: log.level
            message: message
            service: service.name
            trace_id: trace.id
            user_email: user.email
            duration_ms: duration_ms
            status_code: http.status_code
      - timestamp:
          source: timestamp
          format: '2006-01-02T15:04:05.000Z07:00'
      - labels:
          level:
          service:
          trace_id:
          status_code:
      - metrics:
          app_request_total:
            type: Counter
            description: "Total requests"
            config:
              match_all: true
              action: inc
          app_duration_seconds:
            type: Histogram
            description: "Request duration"
            source: duration_ms
            config:
              buckets: [10, 50, 100, 500, 1000, 5000, 10000]
      - match:
          selector: '{level="error"}'
          stages:
            - metrics:
                app_error_total:
                  type: Counter
                  description: "Total errors"
                  config:
                    match_all: true
                    action: inc
```
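For reference, a hypothetical log line that this pipeline would parse — the nested field names match the JMESPath expressions above:

```json
{"@timestamp": "2024-01-15T10:23:45.123Z", "log": {"level": "error"}, "message": "upstream timeout", "service": {"name": "checkout"}, "trace": {"id": "abc123def456"}, "user": {"email": "user@example.com"}, "duration_ms": 5042, "http": {"status_code": 504}}
```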
### Conditional Processing

```yaml
scrape_configs:
  - job_name: conditional-processing
    static_configs:
      - targets:
          - localhost
        labels:
          job: conditional
          __path__: /var/log/app/*.log
    pipeline_stages:
      - regex:
          expression: '(?P<method>\w+) (?P<path>\S+) HTTP'
      - labels:
          method:
      - drop:
          expression: '(health_check|status_check|metrics)'
      - match:
          selector: '{method="GET"}'
          stages:
            - regex:
                expression: 'duration=(?P<duration>\d+)'
            - metrics:
                get_requests_total:
                  type: Counter
                  config:
                    match_all: true
                    action: inc
      - match:
          selector: '{method="POST"}'
          stages:
            - metrics:
                post_requests_total:
                  type: Counter
                  config:
                    match_all: true
                    action: inc
```

Note that `method` must be promoted to a label before the `match` stages, because match selectors operate on stream labels.
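A `match` stage can also discard whole streams instead of running sub-stages, which is useful for noisy traffic that the plain `drop` stage's line regex cannot express. A sketch using Promtail's `action: drop` option (the selector and reason are illustrative):

```yaml
- match:
    selector: '{job="conditional"} |= "debug"'
    action: drop
    drop_counter_reason: noisy_debug   # labels the dropped-entries counter metric
```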
## Grafana Integration

### Adding the Loki Data Source

```shell
curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Loki",
    "type": "loki",
    "url": "http://localhost:3100",
    "access": "proxy",
    "isDefault": false,
    "jsonData": {
      "maxLines": 1000
    }
  }'
```
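If a tracing backend is also wired into Grafana, the same `jsonData` payload can carry `derivedFields`, so that trace IDs in log lines become clickable links. A sketch — the regex and `datasourceUid` are placeholders for your own log format and tracing data source:

```json
{
  "jsonData": {
    "maxLines": 1000,
    "derivedFields": [
      {
        "name": "TraceID",
        "matcherRegex": "trace_id=(\\w+)",
        "url": "${__value.raw}",
        "datasourceUid": "my-tempo-uid"
      }
    ]
  }
}
```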
## Creating Log Dashboards

### Dashboard JSON with Log Panels

```json
{
  "dashboard": {
    "title": "Application Logs Dashboard",
    "panels": [
      {
        "title": "Log Volume",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(rate({job=\"app\"}[5m])) by (level)",
            "legendFormat": "{{level}}"
          }
        ]
      },
      {
        "title": "Error Logs",
        "type": "logs",
        "targets": [
          {
            "expr": "{job=\"app\", level=\"error\"}"
          }
        ]
      },
      {
        "title": "Request Duration (p95)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "quantile_over_time(0.95, {job=\"app\"} | json | unwrap response_time_ms [5m])"
          }
        ]
      }
    ]
  }
}
```
## Alerting on Logs

### Creating Log-Based Alerts

```shell
# Alert on error rate
curl -X POST http://admin:admin@localhost:3000/api/ruler/grafana/rules/logs \
  -H "Content-Type: application/json" \
  -d '{
    "uid": "error-rate-alert",
    "title": "High Error Rate",
    "condition": "A",
    "data": [
      {
        "refId": "A",
        "queryType": "logs",
        "model": {
          "expr": "sum(rate({job=\"app\"} |= \"error\" [5m]))"
        }
      }
    ],
    "noDataState": "NoData",
    "execErrState": "Alerting",
    "for": "5m",
    "annotations": {
      "summary": "High error rate detected"
    }
  }'
```
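As an alternative to Grafana-managed rules, Loki ships its own ruler that evaluates LogQL alerting rules from Prometheus-style rule files. A sketch — the file path and threshold are assumptions, and the `ruler` block must be configured in `loki-config.yml` for this to take effect:

```yaml
# e.g. /etc/loki/rules/fake/app-alerts.yml (path depends on ruler config)
groups:
  - name: app-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate({job="app"} |= "error" [5m])) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High error rate detected
```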
## Performance Optimization

### Tuning Loki

```yaml
# Increase chunk size for high-volume logging (edit loki-config.yml)
ingester:
  chunk_target_size: 2097152   # 2MB instead of 1MB
  chunk_block_size: 3145728    # 3MB uncompressed blocks

# Increase ingestion rate
limits_config:
  ingestion_rate_mb: 200
  ingestion_burst_size_mb: 400

# Adjust query cache
query_range:
  cache_results: true
  results_cache:
    cache:
      default_validity: 2h
```
### Promtail Optimization

```yaml
# Increase batching (promtail-config.yml)
clients:
  - url: http://localhost:3100/loki/api/v1/push
    batchwait: 2s
    batchsize: 2097152  # 2MB batches
    backoff_config:
      min_period: 100ms
      max_period: 10s
      max_retries: 5
```
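Promtail also has a `limits_config` block that can rate-limit reads at the source rather than letting Loki reject them after the fact; the values here are illustrative, not recommendations:

```yaml
limits_config:
  readline_rate_enabled: true
  readline_rate: 10000    # lines per second across all tailed files
  readline_burst: 20000
```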
## Troubleshooting

### Health Checks

```shell
# Check Loki readiness
curl -f http://localhost:3100/ready

# Check Loki metrics
curl -s http://localhost:3100/metrics | grep loki_ingester

# Check Promtail metrics
curl -s http://localhost:9080/metrics | grep promtail
```

### Verifying Data Flow

```shell
# Query recent logs (start/end are placeholders here; the API expects
# Unix epoch nanosecond timestamps)
curl -g 'http://localhost:3100/loki/api/v1/query_range?query={job="app"}&start=1000&end=2000&limit=100' | jq .

# Check Promtail position tracking
tail -20 /var/lib/loki/positions.yaml
```
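Because `query_range` takes Unix epoch nanosecond timestamps, it is easiest to compute the window in the shell. A small sketch that builds a one-hour window ending now and prints a ready-to-use URL (GNU `date` assumed):

```shell
# query_range expects epoch *nanoseconds*; build a one-hour window ending now
START=$(( $(date -d '1 hour ago' +%s) * 1000000000 ))
END=$(( $(date +%s) * 1000000000 ))
echo "http://localhost:3100/loki/api/v1/query_range?query=%7Bjob%3D%22app%22%7D&start=${START}&end=${END}&limit=100"
```

The printed URL (with the stream selector already percent-encoded) can be passed straight to `curl ... | jq .`.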
### Debugging Issues

```shell
# Enable debug logging in loki-config.yml:
#   server:
#     log_level: debug

# Restart services
sudo systemctl restart loki promtail

# Monitor logs
sudo journalctl -u loki -f
sudo journalctl -u promtail -f
```
## Conclusion

A complete Loki stack provides cost-effective log aggregation with powerful querying and visualization. By following this guide, you have built a production-ready logging infrastructure. Focus on designing efficient label hierarchies, leveraging pipeline stages for intelligent parsing, and setting appropriate retention policies. This foundation scales to large log volumes while maintaining fast query performance and keeping operational costs low.


