The TIG Stack: Telegraf, InfluxDB, and Grafana
The TIG stack (Telegraf, InfluxDB, Grafana) provides a complete metrics collection, storage, and visualization pipeline. Unlike Prometheus, which scrapes targets on a pull model, the TIG stack uses a push model: Telegraf agents collect metrics and push them to InfluxDB, a time-series database optimized for high-volume ingestion and long-term storage. This guide covers InfluxDB installation, Telegraf configuration, metrics collection, and Grafana dashboards.
Table of Contents
- Introduction
- Architecture
- System Requirements
- InfluxDB Installation
- Telegraf Installation
- Telegraf Input Plugins
- Telegraf Output Configuration
- InfluxDB Queries
- Grafana Integration
- Advanced Configurations
- Performance Optimization
- Troubleshooting
- Conclusion
Introduction
The TIG stack offers a different approach to metrics collection compared to Prometheus. Telegraf agents push data to InfluxDB, which suits cloud environments, short-lived workloads, and high-frequency metrics. Telegraf's plugin architecture enables collection from hundreds of data sources with minimal configuration.
Architecture
TIG Stack Overview
┌──────────────────────────────────────┐
│ Applications & Infrastructure │
│ ┌────────────────────────────────┐ │
│ │ System Metrics │ │
│ │ Application Logs & Events │ │
│ │ Database Performance │ │
│ └────────────────┬───────────────┘ │
└─────────────────────┼──────────────────┘
│
┌────────────▼────────────┐
│ Telegraf Agents │
│ - Collection │
│ - Aggregation │
│ - Transformation │
└────────────┬────────────┘
│
│ Push (HTTP/TCP)
│
┌────────────▼─────────────┐
│ InfluxDB Server │
│ - Time-series Storage │
│ - Retention Policies │
│ - Downsampling │
└────────────┬─────────────┘
│
┌────────────▼─────────────┐
│ Grafana │
│ - Visualization │
│ - Dashboards │
│ - Alerting │
└──────────────────────────┘
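The push arrow in the diagram carries InfluxDB line protocol, a text format of the shape `measurement,tag=value field=value timestamp`. A minimal Python sketch of how one metric point is serialized (the helper name and sample values are illustrative, not part of Telegraf):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one metric point as InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... timestamp
    Integer fields get an 'i' suffix; string fields are double-quoted.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def fmt(v):
        if isinstance(v, bool):          # bool before int: bool is an int subclass
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"
        if isinstance(v, str):
            return f'"{v}"'
        return repr(v)                   # floats pass through as-is

    field_part = ",".join(f"{k}={fmt(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "cpu",
    {"host": "web-1", "cpu": "cpu0"},
    {"usage_user": 12.5, "uptime": 3600},
    1700000000000000000,
)
print(line)  # cpu,cpu=cpu0,host=web-1 usage_user=12.5,uptime=3600i 1700000000000000000
```

This sketch skips escaping of spaces and commas in tag values, which the real protocol requires.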
System Requirements
- Linux (Ubuntu 20.04+, CentOS 8+, Debian 11+)
- Minimum 2GB RAM for InfluxDB
- At least 10GB storage (scales with metrics volume and retention)
- Internet connectivity for downloads
- Root or sudo access
InfluxDB Installation
Step 1: Add Repository and Install
# Add InfluxDB repository (Ubuntu/Debian)
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '943666ed83d68847d957f4db127ac0c2f3b7614b40ee23581f3842fda7537541' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
# Install
sudo apt-get update
sudo apt-get install -y influxdb2
Step 2: Start InfluxDB
sudo systemctl enable influxdb
sudo systemctl start influxdb
sudo systemctl status influxdb
# Verify service
curl -I http://localhost:8086/health
Step 3: Initial Setup
# Access CLI
influx setup \
--username admin \
--password admin_password \
--org myorg \
--bucket mybucket \
--retention 30d \
--force
Step 4: Create API Token
# Generate API token via CLI
influx auth create \
--org myorg \
--description "Telegraf token" \
--write-buckets
# Save token for Telegraf configuration
export INFLUX_TOKEN="your-generated-token"
Telegraf Installation
Step 1: Install Telegraf
# Add repository
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '943666ed83d68847d957f4db127ac0c2f3b7614b40ee23581f3842fda7537541' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
# Install
sudo apt-get update
sudo apt-get install -y telegraf
# Enable service
sudo systemctl enable telegraf
Step 2: Generate Configuration
# Generate default config
telegraf config > telegraf.conf
# Copy to /etc/telegraf
sudo cp telegraf.conf /etc/telegraf/telegraf.conf
# Create directory for custom configs
sudo mkdir -p /etc/telegraf/telegraf.d
Telegraf Input Plugins
System Metrics
# Edit telegraf config
sudo nano /etc/telegraf/telegraf.conf
# Or create specific config
sudo tee /etc/telegraf/telegraf.d/system.conf > /dev/null << 'EOF'
# System CPU usage
[[inputs.cpu]]
percpu = true
totalcpu = true
interval = "10s"
# System memory
[[inputs.mem]]
interval = "10s"
# Disk metrics
[[inputs.disk]]
mount_points = ["/"]
ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
interval = "30s"
# Disk I/O
[[inputs.diskio]]
interval = "10s"
# Network interfaces
[[inputs.net]]
interfaces = ["eth0", "eth1"]
interval = "10s"
# System processes
[[inputs.processes]]
interval = "10s"
# Load average
[[inputs.system]]
interval = "10s"
# Kernel metrics
[[inputs.linux_sysctl_fs]]
interval = "30s"
EOF
Application Monitoring
# MySQL monitoring
sudo tee /etc/telegraf/telegraf.d/mysql.conf > /dev/null << 'EOF'
[[inputs.mysql]]
servers = ["user:password@tcp(localhost:3306)/"]
perf_events_statements_digest_text_limit = 120
perf_events_statements_limit = 250
perf_events_statements_time_limit = 86400
metric_version = 2
interval = "30s"
EOF
# PostgreSQL monitoring
sudo tee /etc/telegraf/telegraf.d/postgres.conf > /dev/null << 'EOF'
[[inputs.postgresql]]
address = "host=localhost user=telegraf password=pwd dbname=postgres sslmode=disable"
databases = ["postgres"]
interval = "30s"
EOF
# Redis monitoring
sudo tee /etc/telegraf/telegraf.d/redis.conf > /dev/null << 'EOF'
[[inputs.redis]]
servers = ["tcp://localhost:6379"]
interval = "30s"
EOF
# Nginx monitoring
sudo tee /etc/telegraf/telegraf.d/nginx.conf > /dev/null << 'EOF'
[[inputs.nginx]]
urls = ["http://localhost/nginx_status"]
interval = "10s"
EOF
# Docker monitoring
sudo tee /etc/telegraf/telegraf.d/docker.conf > /dev/null << 'EOF'
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
gather_services = false
timeout = "5s"
perdevice = true
total = true
interval = "30s"
EOF
Custom Metrics via exec Plugin
sudo tee /etc/telegraf/telegraf.d/custom.conf > /dev/null << 'EOF'
# Execute custom scripts
[[inputs.exec]]
commands = [
"bash /opt/scripts/custom_metric.sh",
"python3 /opt/scripts/app_metrics.py"
]
timeout = "5s"
data_format = "json"
interval = "60s"
tag_keys = ["hostname", "service"]
EOF
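With `data_format = "json"`, each command must print a flat JSON object to stdout: keys listed in `tag_keys` become tags, and the remaining numeric values become fields. A sketch of what a script like `app_metrics.py` might look like (the metric names and values are invented for illustration):

```python
#!/usr/bin/env python3
"""Example exec-plugin script: prints one flat JSON object to stdout.

With tag_keys = ["hostname", "service"], those keys become tags;
the remaining numeric keys become fields.
"""
import json
import socket

metrics = {
    "hostname": socket.gethostname(),  # becomes a tag
    "service": "app",                  # becomes a tag
    "queue_depth": 42,                 # becomes a field
    "response_time_ms": 12.5,          # becomes a field
}
print(json.dumps(metrics))
```

Telegraf runs the script every `interval`, so the script should finish well inside the configured `timeout`.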
Telegraf Output Configuration
Main InfluxDB Output
sudo tee /etc/telegraf/telegraf.d/outputs.conf > /dev/null << 'EOF'
[[outputs.influxdb_v2]]
urls = ["http://localhost:8086"]
token = "your-api-token"
organization = "myorg"
bucket = "telegraf"
# Batching
flush_interval = "10s"
metric_buffer_limit = 10000
# Retention is a property of the InfluxDB bucket, not of this output
# (see `influx bucket update --retention` below)
# TLS (uncomment if InfluxDB is served over HTTPS)
# insecure_skip_verify = false
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
EOF
Multiple Output Destinations
# Send to multiple InfluxDB instances
[[outputs.influxdb_v2]]
urls = ["http://primary:8086"]
token = "token1"
organization = "myorg"
bucket = "telegraf"
[[outputs.influxdb_v2]]
urls = ["http://backup:8086"]
token = "token2"
organization = "myorg"
bucket = "telegraf"
tagpass = {"backup" = ["true"]}
InfluxDB Queries
InfluxQL Queries
# Connect to InfluxDB CLI
influx v1 shell
# List databases (buckets exposed to InfluxQL via DBRP mappings)
show databases
# Query metrics
SELECT * FROM cpu WHERE time > now() - 1h
# Aggregations
SELECT mean(usage_user) FROM cpu GROUP BY time(1m), host
# Query several measurements at once (InfluxQL does not support JOIN)
SELECT * FROM cpu, mem WHERE time > now() - 1h
Flux Queries (Modern InfluxDB)
# Basic query
from(bucket: "telegraf")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu")
# Aggregation
from(bucket: "telegraf")
|> range(start: -24h)
|> filter(fn: (r) => r._measurement == "mem")
|> aggregateWindow(every: 1m, fn: mean)
# Multi-series
from(bucket: "telegraf")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement =~ /^(cpu|mem|disk)$/)
|> group(columns: ["_measurement"])
|> aggregateWindow(every: 5m, fn: mean)
# Downsampling
from(bucket: "telegraf")
|> range(start: -30d)
|> filter(fn: (r) => r._measurement == "cpu")
|> aggregateWindow(every: 1h, fn: mean)
|> to(bucket: "telegraf-downsampled")
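The `aggregateWindow` call groups points into fixed time windows and applies `fn` to each window. The same idea in plain Python, with invented sample data (window size in seconds for brevity):

```python
from collections import defaultdict

def aggregate_window(points, every_s, fn):
    """Group (timestamp_s, value) points into fixed windows and reduce each.

    Mirrors Flux aggregateWindow(every: ..., fn: ...): the window key is
    the timestamp truncated down to a multiple of `every_s`.
    """
    windows = defaultdict(list)
    for ts, value in points:
        windows[ts - ts % every_s].append(value)
    return {start: fn(vals) for start, vals in sorted(windows.items())}

points = [(0, 10.0), (30, 20.0), (70, 40.0), (110, 60.0)]
downsampled = aggregate_window(points, 60, lambda vs: sum(vs) / len(vs))
print(downsampled)  # {0: 15.0, 60: 50.0}
```

Writing the reduced series to a second bucket (as `to()` does above) is what keeps long retention cheap: the raw bucket can expire quickly while the downsampled one is kept for months.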
Grafana Integration
Add InfluxDB Data Source
curl -X POST http://admin:admin@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d '{
"name": "InfluxDB",
"type": "influxdb",
"url": "http://localhost:8086",
"access": "proxy",
"isDefault": true,
"jsonData": {
"version": "Flux",
"organization": "myorg",
"defaultBucket": "telegraf",
"token": "your-api-token"
}
}'
Create Dashboard Panel
Example Flux query for Grafana:
from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "cpu")
|> filter(fn: (r) => r._field == "usage_user")
|> aggregateWindow(every: v.windowPeriod, fn: mean)
Advanced Configurations
Processor Plugin (Data Transformation)
sudo tee /etc/telegraf/telegraf.d/processors.conf > /dev/null << 'EOF'
# Rename fields
[[processors.rename]]
[[processors.rename.replace]]
field = "usage_user"
dest = "cpu_usage_percent"
# Add tags to all metrics (Telegraf has no "tags" processor; use override)
[[processors.override]]
[processors.override.tags]
environment = "production"
# Drop fields: there is no "fields" processor; use the fieldexclude
# metric filter on the relevant input or output plugin instead, e.g.
#   [[inputs.cpu]]
#     fieldexclude = ["fieldname1", "fieldname2"]
# Regular expression
[[processors.regex]]
[[processors.regex.tags]]
key = "host"
pattern = "^(.+?)\\."
replacement = "${1}"
EOF
Aggregator Plugin (Windowing)
sudo tee /etc/telegraf/telegraf.d/aggregators.conf > /dev/null << 'EOF'
# Min/Max aggregation
[[aggregators.minmax]]
period = "30s"
drop_original = false
# Percentiles (via the quantile aggregator)
[[aggregators.quantile]]
period = "60s"
quantiles = [0.50, 0.90, 0.95, 0.99]
fieldpass = ["response_time"]
EOF
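Over each period, a percentile aggregator collects the field's samples and reports the requested quantiles. A sketch of the nearest-rank computation such an aggregator performs (the function name and sample latencies are illustrative):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p% of the sorted sample lies at or below it."""
    if not values:
        raise ValueError("empty sample")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank - 1, 0)]

response_times = [12, 15, 20, 22, 30, 41, 55, 90, 120, 300]
for p in (50, 90, 95, 99):
    print(f"p{p} = {percentile(response_times, p)}")
# p50 = 30, p90 = 120, p95 = 300, p99 = 300
```

The p95/p99 collapse onto the maximum here because the sample has only ten points; with realistic window sizes the high quantiles separate.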
Performance Optimization
Batch Configuration
sudo nano /etc/telegraf/telegraf.conf
# Batching settings
[agent]
metric_buffer_limit = 20000
flush_interval = "10s"
flush_jitter = "0s"
# Output plugins are arrays of tables: double brackets
[[outputs.influxdb_v2]]
flush_interval = "10s"
metric_buffer_limit = 10000
Sampling and Filtering
sudo tee /etc/telegraf/telegraf.d/sampling.conf > /dev/null << 'EOF'
# Sample interval
[agent]
interval = "30s"
round_interval = true
# Collect only specific metrics
[[inputs.cpu]]
interval = "60s"
percpu = false
totalcpu = true
# Tag filtering (output plugins use double brackets)
[[outputs.influxdb_v2]]
tagpass = {"environment" = ["production"]}
tagdrop = {"test" = ["true"]}
EOF
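`tagpass` keeps only metrics whose tag value matches one of the listed values, while `tagdrop` discards matches. A sketch of that filtering logic in Python (the metric dicts are invented; the precedence when both filters are set is an assumption here, so check the Telegraf docs before combining them):

```python
def passes_filter(tags, tagpass=None, tagdrop=None):
    """Emulate tagpass/tagdrop semantics for one metric's tag set.

    tagpass: keep only if some listed tag key has a matching value.
    tagdrop: drop if some listed tag key has a matching value.
    In this sketch tagdrop is checked first (an assumption).
    """
    def matches(rules):
        return any(tags.get(key) in values for key, values in rules.items())

    if tagdrop and matches(tagdrop):
        return False
    if tagpass:
        return matches(tagpass)
    return True

metrics = [
    {"environment": "production", "test": "false"},
    {"environment": "staging"},
    {"environment": "production", "test": "true"},
]
kept = [m for m in metrics
        if passes_filter(m, tagpass={"environment": ["production"]},
                         tagdrop={"test": ["true"]})]
print(kept)  # [{'environment': 'production', 'test': 'false'}]
```

Filtering at the agent like this reduces both network traffic and InfluxDB cardinality, which is usually the dominant cost factor.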
Troubleshooting
Verify Telegraf Service
# Service status
sudo systemctl status telegraf
# Check logs
sudo journalctl -u telegraf -f
# Test configuration
telegraf -test -config /etc/telegraf/telegraf.conf
Verify InfluxDB Connectivity
# Check health
curl -I http://localhost:8086/health
# Verify token
influx auth list
# Dry-run collection (--test prints metrics to stdout and skips outputs)
telegraf --config /etc/telegraf/telegraf.conf --input-filter cpu --test
# One-shot real write: collect once, flush to InfluxDB, and exit
telegraf --config /etc/telegraf/telegraf.conf --once
Query Metrics
# Via influx CLI
influx query 'from(bucket: "telegraf") |> range(start: -1h) |> limit(n: 10)'
# Check bucket contents
influx bucket list
# View stored metrics
influx query 'from(bucket: "telegraf") |> group(columns: ["_measurement"])'
Retention and Cleanup
# Update retention policy
influx bucket update \
--id bucket-id \
--retention 30d
# List retention policies
influx bucket list --org myorg
Conclusion
The TIG stack provides a robust metrics collection and visualization platform with excellent scalability. By following this guide, you've deployed a high-performance monitoring system capable of handling thousands of metrics per second. Focus on efficient Telegraf configurations tailored to your infrastructure, leveraging retention policies for cost-effective storage, and creating Grafana dashboards that provide actionable insights. The flexibility of the TIG stack makes it ideal for cloud-native and high-volume metrics scenarios.