The TIG Stack: Telegraf, InfluxDB, Grafana

The TIG stack (Telegraf, InfluxDB, Grafana) provides a complete metrics collection, storage, and visualization pipeline. Unlike Prometheus, which scrapes its targets, the TIG stack uses a push model, and InfluxDB is a time-series database optimized for high-volume metric ingestion and long-term storage. This guide covers InfluxDB installation, Telegraf configuration, metrics collection, and Grafana dashboards.

Introduction

The TIG stack takes a different approach to metrics collection than Prometheus: Telegraf agents push data to InfluxDB rather than being scraped, which suits cloud environments and high-frequency metrics. Telegraf's plugin architecture enables collection from hundreds of data sources with minimal configuration.

Architecture

TIG Stack Overview

┌──────────────────────────────────────┐
│   Applications & Infrastructure      │
│  ┌────────────────────────────────┐  │
│  │   System Metrics               │  │
│  │   Application Logs & Events    │  │
│  │   Database Performance         │  │
│  └──────────────────┬─────────────┘  │
└─────────────────────┼─────────────────┘
                      │
         ┌────────────▼────────────┐
         │      Telegraf Agents    │
         │  - Collection           │
         │  - Aggregation          │
         │  - Transformation       │
         └────────────┬────────────┘
                      │
                      │ Push (HTTP/TCP)
                      │
         ┌────────────▼─────────────┐
         │     InfluxDB Server      │
         │  - Time-series Storage   │
         │  - Retention Policies    │
         │  - Downsampling          │
         └────────────┬─────────────┘
                      │
         ┌────────────▼─────────────┐
         │       Grafana            │
         │  - Visualization         │
         │  - Dashboards            │
         │  - Alerting              │
         └──────────────────────────┘

System Requirements

  • Linux (Ubuntu 20.04+, CentOS 8+, Debian 11+)
  • Minimum 2GB RAM for InfluxDB
  • At least 10GB storage (scales with metrics volume and retention)
  • Internet connectivity for downloads
  • Root or sudo access
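
The RAM and disk minimums above can be sanity-checked before installing. A quick sketch using /proc/meminfo and df (the thresholds match the list; adjust them to your target sizing):

```shell
#!/usr/bin/env bash
# Preflight check against the minimums listed above.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
avail_kb=$(df -k --output=avail / | tail -1 | tr -d ' ')

[ "$mem_kb" -ge $((2 * 1024 * 1024)) ] \
  && echo "RAM OK (${mem_kb} kB)" || echo "RAM below the 2GB minimum"
[ "$avail_kb" -ge $((10 * 1024 * 1024)) ] \
  && echo "Disk OK (${avail_kb} kB free)" || echo "Less than 10GB free on /"
```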

InfluxDB Installation

Step 1: Add Repository and Install

# Add InfluxDB repository (Ubuntu/Debian)
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '943666ed83d68847d957f4db127ac0c2f3b7614b40ee23581f3842fda7537541' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list

# Install
sudo apt-get update
sudo apt-get install -y influxdb2

Step 2: Start InfluxDB

sudo systemctl enable influxdb
sudo systemctl start influxdb
sudo systemctl status influxdb

# Verify service
curl -I http://localhost:8086/health

Step 3: Initial Configuration

# Run the initial setup (replace the example credentials)
influx setup \
  --username admin \
  --password admin_password \
  --org myorg \
  --bucket mybucket \
  --retention 30d \
  --force

Step 4: Create an API Token

# Generate API token via CLI
influx auth create \
  --org myorg \
  --description "Telegraf token" \
  --write-buckets

# Save token for Telegraf configuration
export INFLUX_TOKEN="your-generated-token"
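
The token can be smoke-tested by pushing a single point over InfluxDB's v2 write API. The org and bucket names are the values used in `influx setup` above; the measurement name is just an example:

```shell
# One point in line protocol: measurement,tag=value field=value timestamp
POINT="tig_smoke_test,host=$(hostname) value=1i $(date +%s%N)"
echo "$POINT"

# Push it; HTTP 204 (no output) means the token and bucket are good
curl -s --fail -XPOST \
  "http://localhost:8086/api/v2/write?org=myorg&bucket=mybucket&precision=ns" \
  -H "Authorization: Token $INFLUX_TOKEN" \
  --data-binary "$POINT" || echo "write failed (is InfluxDB running?)"
```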

Telegraf Installation

Step 1: Install Telegraf

# Add repository (skip if it was already added during the InfluxDB install)
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '943666ed83d68847d957f4db127ac0c2f3b7614b40ee23581f3842fda7537541' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list

# Install
sudo apt-get update
sudo apt-get install -y telegraf

# Enable service
sudo systemctl enable telegraf

Step 2: Generate Configuration

# Generate default config
telegraf config > telegraf.conf

# Copy to /etc/telegraf
sudo cp telegraf.conf /etc/telegraf/telegraf.conf

# Create directory for custom configs
sudo mkdir -p /etc/telegraf/telegraf.d

Telegraf Input Plugins

System Metrics

# Edit telegraf config
sudo nano /etc/telegraf/telegraf.conf

# Or create specific config
sudo tee /etc/telegraf/telegraf.d/system.conf > /dev/null << 'EOF'
# System CPU usage
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  interval = "10s"

# System memory
[[inputs.mem]]
  interval = "10s"

# Disk metrics
[[inputs.disk]]
  mount_points = ["/"]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
  interval = "30s"

# Disk I/O
[[inputs.diskio]]
  interval = "10s"

# Network interfaces
[[inputs.net]]
  interface_include = ["eth0", "eth1"]
  interval = "10s"

# System processes
[[inputs.processes]]
  interval = "10s"

# Load average
[[inputs.system]]
  interval = "10s"

# Kernel metrics
[[inputs.linux_sysctl_fs]]
  interval = "30s"
EOF
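
With the inputs in place, they can be dry-run without touching InfluxDB; --test collects one round of metrics and prints them to stdout:

```shell
# Collect once from the cpu and mem inputs and print the results
if ! command -v telegraf >/dev/null; then
  echo "telegraf not installed yet"; exit 0
fi
telegraf --config /etc/telegraf/telegraf.conf \
  --config-directory /etc/telegraf/telegraf.d \
  --input-filter cpu:mem \
  --test
```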

Application Monitoring

# MySQL monitoring
sudo tee /etc/telegraf/telegraf.d/mysql.conf > /dev/null << 'EOF'
[[inputs.mysql]]
  servers = ["user:password@tcp(localhost:3306)/"]
  perf_events_statements_digest_text_limit = 120
  perf_events_statements_limit = 250
  perf_events_statements_interval = 60
  metric_database = "performance_schema"
  interval = "30s"
EOF

# PostgreSQL monitoring
sudo tee /etc/telegraf/telegraf.d/postgres.conf > /dev/null << 'EOF'
[[inputs.postgresql]]
  address = "host=localhost user=telegraf password=pwd dbname=postgres sslmode=disable"
  databases = ["postgres"]
  interval = "30s"
EOF

# Redis monitoring
sudo tee /etc/telegraf/telegraf.d/redis.conf > /dev/null << 'EOF'
[[inputs.redis]]
  servers = ["tcp://localhost:6379"]
  interval = "30s"
EOF

# Nginx monitoring
sudo tee /etc/telegraf/telegraf.d/nginx.conf > /dev/null << 'EOF'
[[inputs.nginx]]
  urls = ["http://localhost/nginx_status"]
  interval = "10s"
EOF
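
inputs.nginx only works if nginx actually exposes its stub_status endpoint at the URL configured above. A minimal sketch of enabling it (the conf.d path and listen address are assumptions; adjust to your nginx layout):

```shell
# Expose stub_status to localhost only, matching the URL used by inputs.nginx
sudo tee /etc/nginx/conf.d/status.conf > /dev/null << 'EOF'
server {
    listen 127.0.0.1:80;
    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
EOF

# Validate and apply
sudo nginx -t && sudo systemctl reload nginx
```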

# Docker monitoring
sudo tee /etc/telegraf/telegraf.d/docker.conf > /dev/null << 'EOF'
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  timeout = "5s"
  perdevice = true
  total = true
  interval = "30s"
EOF

Custom Metrics via the exec Plugin

sudo tee /etc/telegraf/telegraf.d/custom.conf > /dev/null << 'EOF'
# Execute custom scripts
[[inputs.exec]]
  commands = [
    "bash /opt/scripts/custom_metric.sh",
    "python3 /opt/scripts/app_metrics.py"
  ]
  timeout = "5s"
  data_format = "json"
  interval = "60s"
  tag_keys = ["hostname", "service"]
EOF
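
A sketch of what one of those scripts might look like — the script path and field names here are placeholders, not part of the stack itself. With data_format = "json", the exec plugin expects a JSON object on stdout; numeric keys become fields, and keys listed in tag_keys become tags:

```shell
#!/usr/bin/env bash
# Hypothetical /opt/scripts/custom_metric.sh: emits one JSON object.
# "hostname" and "service" are promoted to tags via tag_keys above.
cat << JSON
{
  "hostname": "$(hostname)",
  "service": "myapp",
  "open_files": $(ls /proc/self/fd | wc -l),
  "uptime_seconds": $(awk '{print int($1)}' /proc/uptime)
}
JSON
```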

Telegraf Output Configuration

Main InfluxDB Output

sudo tee /etc/telegraf/telegraf.d/outputs.conf > /dev/null << 'EOF'
[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  token = "your-api-token"
  organization = "myorg"
  bucket = "telegraf"

  # TLS (if needed)
  insecure_skip_verify = false
  tls_ca = "/etc/telegraf/ca.pem"
  tls_cert = "/etc/telegraf/cert.pem"
  tls_key = "/etc/telegraf/key.pem"
EOF

# Note: batching (flush_interval, metric_buffer_limit) is configured in the
# [agent] section, and retention is a property of the InfluxDB bucket, not
# of this output plugin.

Multiple Output Destinations

# Send to multiple InfluxDB instances
[[outputs.influxdb_v2]]
  urls = ["http://primary:8086"]
  token = "token1"
  organization = "myorg"
  bucket = "telegraf"

[[outputs.influxdb_v2]]
  urls = ["http://backup:8086"]
  token = "token2"
  organization = "myorg"
  bucket = "telegraf"
  # Only metrics tagged backup=true go to this output
  [outputs.influxdb_v2.tagpass]
    backup = ["true"]

InfluxDB Queries

InfluxQL Queries

# Connect to InfluxDB CLI
influx v1 shell

# List databases (buckets with a DBRP mapping)
show databases

# Query metrics
SELECT * FROM cpu WHERE time > now() - 1h

# Aggregations
SELECT mean(usage_user) FROM cpu GROUP BY time(1m), host

# Query several measurements at once (InfluxQL has no JOIN)
SELECT * FROM /^(cpu|mem)$/ WHERE time > now() - 1h
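
Note that InfluxQL only sees InfluxDB 2.x buckets that have a database/retention-policy (DBRP) mapping. A sketch of creating one with the influx CLI (bucket and database names follow the earlier examples):

```shell
# Map the "telegraf" bucket to an InfluxQL database of the same name
if ! command -v influx >/dev/null; then
  echo "influx CLI not installed"; exit 0
fi
BUCKET_ID=$(influx bucket list --name telegraf --hide-headers | awk '{print $1}')
influx v1 dbrp create \
  --db telegraf \
  --rp autogen \
  --bucket-id "$BUCKET_ID" \
  --default
```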

Flux Queries (InfluxDB 2.x)

// Basic query
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")

// Aggregation
from(bucket: "telegraf")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "mem")
  |> aggregateWindow(every: 1m, fn: mean)

// Multi-series
from(bucket: "telegraf")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement =~ /^(cpu|mem|disk)$/)
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 5m, fn: mean)

// Downsampling
from(bucket: "telegraf")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: 1h, fn: mean)
  |> to(bucket: "telegraf-downsampled")
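
Rather than running the downsampling query by hand, it can be scheduled as an InfluxDB task. A sketch using the influx CLI (the task name and interval are examples):

```shell
# Write the task definition, then register it
cat > /tmp/downsample_cpu.flux << 'EOF'
option task = {name: "downsample-cpu", every: 1h}

from(bucket: "telegraf")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> aggregateWindow(every: 1h, fn: mean)
  |> to(bucket: "telegraf-downsampled")
EOF

if command -v influx >/dev/null; then
  influx task create --file /tmp/downsample_cpu.flux
else
  echo "influx CLI not installed"
fi
```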

Grafana Integration

Add InfluxDB Data Source

curl -X POST http://admin:admin@localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "InfluxDB",
    "type": "influxdb",
    "url": "http://localhost:8086",
    "access": "proxy",
    "isDefault": true,
    "jsonData": {
      "version": "Flux",
      "organization": "myorg",
      "defaultBucket": "telegraf"
    },
    "secureJsonData": {
      "token": "your-api-token"
    }
  }'

Create a Panel

Example Flux query for a Grafana panel:

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "cpu")
  |> filter(fn: (r) => r._field == "usage_user")
  |> aggregateWindow(every: v.windowPeriod, fn: mean)

Advanced Configuration

Processor Plugins (Data Transformation)

sudo tee /etc/telegraf/telegraf.d/processors.conf > /dev/null << 'EOF'
# Rename fields
[[processors.rename]]
  [[processors.rename.replace]]
    field = "usage_user"
    dest = "cpu_usage_percent"

# Add tags (via the override processor)
[[processors.override]]
  [processors.override.tags]
    environment = "production"

# Regular-expression tag rewriting
[[processors.regex]]
  [[processors.regex.tags]]
    key = "host"
    pattern = "^(.+?)\\."
    replacement = "${1}"
EOF

# To drop fields, use the fieldexclude filter on the input plugin itself,
# e.g. fieldexclude = ["fieldname1", "fieldname2"] under [[inputs.cpu]].

Aggregator Plugins (Windowing)

sudo tee /etc/telegraf/telegraf.d/aggregators.conf > /dev/null << 'EOF'
# Min/Max aggregation
[[aggregators.minmax]]
  period = "30s"
  drop_original = false

# Quantiles (Telegraf has no "percentile" aggregator; quantile covers this)
[[aggregators.quantile]]
  period = "60s"
  quantiles = [0.5, 0.9, 0.95, 0.99]
  fieldpass = ["response_time"]
EOF

Performance Optimization

Batch Configuration

sudo nano /etc/telegraf/telegraf.conf

# Batching and buffering are agent-level settings
# (they are not per-output options in outputs.influxdb_v2)
[agent]
  metric_batch_size = 5000
  metric_buffer_limit = 20000
  flush_interval = "10s"
  flush_jitter = "0s"

Sampling and Filtering

sudo tee /etc/telegraf/telegraf.d/sampling.conf > /dev/null << 'EOF'
# Default sample interval (agent-wide)
[agent]
  interval = "30s"
  round_interval = true

# Collect only specific metrics
[[inputs.cpu]]
  interval = "60s"
  percpu = false
  totalcpu = true
EOF

# Tag filtering belongs on the output plugin definition, e.g. in outputs.conf:
#   [[outputs.influxdb_v2]]
#     ...
#     [outputs.influxdb_v2.tagpass]
#       environment = ["production"]
#     [outputs.influxdb_v2.tagdrop]
#       test = ["true"]

Troubleshooting

Verify the Telegraf Service

# Service status
sudo systemctl status telegraf

# Check logs
sudo journalctl -u telegraf -f

# Test configuration
telegraf -test -config /etc/telegraf/telegraf.conf

Verify InfluxDB Connectivity

# Check health
curl -I http://localhost:8086/health

# Verify token
influx auth list

# Dry-run the write path (--test prints to stdout; it does not write to outputs)
telegraf --input-filter=cpu --output-filter=influxdb_v2 --test

Query Metrics

# Via influx CLI
influx query 'from(bucket: "telegraf") |> range(start: -1h) |> limit(n: 10)'

# Check bucket contents
influx bucket list

# View stored metrics
influx query 'from(bucket: "telegraf") |> group(columns: ["_measurement"])'

Retention and Cleanup

# Update retention policy
influx bucket update \
  --id bucket-id \
  --retention 30d

# List retention policies
influx bucket list --org myorg
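
The bucket ID can be resolved by name, so retention changes are scriptable. A sketch (bucket and org names follow the earlier examples; the 90d value is arbitrary):

```shell
# Look up the bucket ID by name, then change its retention period
if ! command -v influx >/dev/null; then
  echo "influx CLI not installed"; exit 0
fi
BUCKET_ID=$(influx bucket list --org myorg --name telegraf --hide-headers | awk '{print $1}')
influx bucket update --id "$BUCKET_ID" --retention 90d
```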

Conclusion

The TIG stack provides a robust metrics collection and visualization platform with excellent scalability. By following this guide, you've deployed a high-performance monitoring system capable of handling thousands of metrics per second. Focus on Telegraf configurations tailored to your infrastructure, retention policies for cost-effective storage, and Grafana dashboards that provide actionable insights. The flexibility of the TIG stack makes it well suited to cloud-native and high-volume metrics scenarios.