Thanos for Long-Term Prometheus Storage
Thanos extends Prometheus with effectively unlimited retention by shipping metrics to object storage (S3, GCS, MinIO). It also provides a global query view across multiple Prometheus instances and downsamples old data so long-range queries stay fast. This guide covers deploying Thanos on Linux with sidecar setup, object storage configuration, global querying, compaction, and high availability.
Prerequisites
- Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
- Prometheus 2.x already running
- S3-compatible object storage: AWS S3, GCS, MinIO, or Wasabi
- 1 GB RAM per component
- Go 1.21+ (if building from source)
Installing Thanos
# Download pre-built Thanos binary
THANOS_VERSION="0.36.1"
wget https://github.com/thanos-io/thanos/releases/download/v${THANOS_VERSION}/thanos-${THANOS_VERSION}.linux-amd64.tar.gz
tar xzf thanos-${THANOS_VERSION}.linux-amd64.tar.gz
sudo mv thanos-${THANOS_VERSION}.linux-amd64/thanos /usr/local/bin/
thanos --version
# Create directory structure
sudo mkdir -p /etc/thanos /var/lib/thanos
sudo useradd -r -s /sbin/nologin thanos
sudo chown -R thanos:thanos /var/lib/thanos /etc/thanos
Thanos is a single binary with subcommands for each component:
- thanos sidecar - sits next to Prometheus, uploads TSDB blocks
- thanos store - serves historical data from object storage
- thanos query - global PromQL query layer
- thanos compact - compacts and downsamples blocks in object storage
- thanos rule - evaluates alerting/recording rules against the global view
Object Storage Configuration
Create an object store configuration file. Thanos supports AWS S3, GCS, Azure Blob, and any S3-compatible storage. The file holds exactly one configuration, so keep only the variant below that matches your provider:
# /etc/thanos/objstore.yaml
# Option 1: AWS S3
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  # Use an IAM role instead of keys when running on EC2:
  # access_key: ""
  # secret_key: ""
# Option 2: MinIO (self-hosted S3)
type: S3
config:
  bucket: thanos
  endpoint: minio.example.com:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: false
  signature_version2: false
# Option 3: Google Cloud Storage
type: GCS
config:
  bucket: my-thanos-bucket
  service_account: |
    {
      "type": "service_account",
      "project_id": "my-project",
      ...
    }
Protect this file:
sudo chmod 600 /etc/thanos/objstore.yaml
sudo chown thanos:thanos /etc/thanos/objstore.yaml
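If you would rather not keep credentials in a hand-edited file, the configuration can be generated at deploy time. The following is a minimal sketch, not part of Thanos itself: the render_objstore function and the THANOS_S3_* variable names are hypothetical, chosen only for this example. It writes an S3-type configuration with mode 600 and refuses to run when the bucket or endpoint is missing.

```shell
# Sketch: render an S3-type objstore.yaml from environment variables.
# The THANOS_S3_* variable names are hypothetical, chosen for this example.
render_objstore() (
  out="$1"
  : "${THANOS_S3_BUCKET:?THANOS_S3_BUCKET is required}"
  : "${THANOS_S3_ENDPOINT:?THANOS_S3_ENDPOINT is required}"
  umask 077                 # a newly created file gets mode 600
  cat > "$out" <<EOF
type: S3
config:
  bucket: "${THANOS_S3_BUCKET}"
  endpoint: "${THANOS_S3_ENDPOINT}"
  access_key: "${THANOS_S3_ACCESS_KEY:-}"
  secret_key: "${THANOS_S3_SECRET_KEY:-}"
EOF
  chmod 600 "$out"          # also tighten a file that already existed
)

# Example run with placeholder values, written to a temporary directory
demo_dir="$(mktemp -d)"
THANOS_S3_BUCKET=thanos THANOS_S3_ENDPOINT=minio.example.com:9000 \
  render_objstore "$demo_dir/objstore.yaml"
cat "$demo_dir/objstore.yaml"
```

In a real deployment you would point the function at /etc/thanos/objstore.yaml and run it as root before the chown step above. Values containing double quotes or backslashes would need escaping, which this sketch does not attempt.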
Thanos Sidecar Setup
The sidecar runs alongside Prometheus. It:
- Exposes Prometheus data via the Thanos gRPC StoreAPI
- Uploads completed TSDB blocks to object storage
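Configure Prometheus so Thanos can upload its blocks: set --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value, which disables Prometheus's local compaction. Each Prometheus instance also needs unique external_labels in prometheus.yml (for example a replica label), which Thanos uses to identify and deduplicate its data: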
# Edit Prometheus startup flags
# Add: --storage.tsdb.min-block-duration=2h
# Add: --storage.tsdb.max-block-duration=2h
# This creates 2-hour blocks that Thanos can safely upload
sudo systemctl edit prometheus
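After restarting Prometheus, it is worth confirming that the flags took effect. The sketch below assumes the JSON returned by Prometheus's /api/v1/status/flags endpoint is piped in on stdin; the function names flag_value and check_block_durations are made up for this example, and the parsing is a simple grep rather than a full JSON parser.

```shell
# Sketch: verify that Prometheus runs with equal min/max block durations.
# Input: JSON body of Prometheus's /api/v1/status/flags endpoint on stdin.
flag_value() (
  # $1 = flag name without leading dashes; prints its value, or nothing if absent
  name="$1"
  grep -o "\"${name}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 | cut -d'"' -f4
)

check_block_durations() (
  json="$(cat)"
  min="$(printf '%s' "$json" | flag_value 'storage.tsdb.min-block-duration')"
  max="$(printf '%s' "$json" | flag_value 'storage.tsdb.max-block-duration')"
  echo "min=${min:-unset} max=${max:-unset}"
  # Succeed only when both values are present and identical
  [ -n "$min" ] && [ "$min" = "$max" ]
)

# Example with a hand-written sample response
sample='{"status":"success","data":{"storage.tsdb.max-block-duration":"2h","storage.tsdb.min-block-duration":"2h"}}'
printf '%s' "$sample" | check_block_durations
```

Against a live server the call would look like: curl -s http://localhost:9090/api/v1/status/flags | check_block_durations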
Create the sidecar service:
# The redirect must run as root, so pipe the heredoc through sudo tee
sudo tee /etc/systemd/system/thanos-sidecar.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Sidecar
After=prometheus.service
[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos sidecar \
--tsdb.path=/var/lib/prometheus \
--prometheus.url=http://localhost:9090 \
--objstore.config-file=/etc/thanos/objstore.yaml \
--http-address=0.0.0.0:19191 \
--grpc-address=0.0.0.0:10901 \
--log.level=info
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now thanos-sidecar
sudo systemctl status thanos-sidecar
Verify the sidecar is uploading blocks:
# Check sidecar metrics (the upload counter increases after each 2-hour block is cut)
curl -s http://localhost:19191/metrics | grep -E 'thanos_shipper_uploads_total|thanos_objstore_bucket_operations_total'
# List uploaded blocks in S3/MinIO
aws s3 ls s3://my-thanos-metrics/ --recursive | head -20
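The metrics check above can be turned into a pass/fail test for monitoring scripts. This is a sketch under the assumption that the sidecar exposes the thanos_shipper_uploads_total and thanos_shipper_upload_failures_total counters in Prometheus text format; check_shipper is a hypothetical helper name.

```shell
# Sketch: read Prometheus text-format metrics on stdin, print a summary,
# and fail if no block was uploaded yet or any upload failed.
check_shipper() (
  awk '
    $1 ~ /^thanos_shipper_uploads_total/         { up += $NF }
    $1 ~ /^thanos_shipper_upload_failures_total/ { fail += $NF }
    END {
      printf "uploads=%d failures=%d\n", up, fail
      exit !(up > 0 && fail == 0)
    }'
)

# Example with sample metrics text
sample_metrics='# TYPE thanos_shipper_uploads_total counter
thanos_shipper_uploads_total 3
# TYPE thanos_shipper_upload_failures_total counter
thanos_shipper_upload_failures_total 0'
printf '%s\n' "$sample_metrics" | check_shipper
```

With a running sidecar, feed it real data: curl -s http://localhost:19191/metrics | check_shipper. Expect it to fail for roughly the first two hours, until Prometheus cuts its first block.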
Thanos Store Gateway
The Store Gateway serves historical data from object storage, making old blocks available for queries:
# The redirect must run as root, so pipe the heredoc through sudo tee
sudo tee /etc/systemd/system/thanos-store.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Store Gateway
After=network.target
[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos store \
--data-dir=/var/lib/thanos/store \
--objstore.config-file=/etc/thanos/objstore.yaml \
--http-address=0.0.0.0:19193 \
--grpc-address=0.0.0.0:10903 \
--log.level=info
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now thanos-store
The store gateway downloads block index data to disk (/var/lib/thanos/store) for fast metadata lookups. As a rough starting point, allocate disk equal to about 10% of your bucket's data size for this cache, then adjust based on observed usage.
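To make the 10% sizing rule concrete, here is a small worked calculation. It is only a sketch of the rule of thumb stated above, and store_cache_gib is a hypothetical helper name. It takes the bucket size in bytes and prints 10% of it in whole GiB, rounded up.

```shell
# Sketch: 10% of the bucket size, in whole GiB, rounded up.
# $1 = total bucket size in bytes
store_cache_gib() (
  bytes="$1"
  ten_gib=$((10 * 1024 * 1024 * 1024))
  # Dividing by 10 GiB gives "10% in GiB"; adding ten_gib - 1 rounds up
  echo $(( (bytes + ten_gib - 1) / ten_gib ))
)

store_cache_gib 5497558138880   # a 5 TiB bucket -> 512
```

One way to obtain the byte count on S3 is the "Total Size" line printed by aws s3 ls s3://my-thanos-metrics/ --recursive --summarize.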
Thanos Querier (Global View)
The Querier provides a single PromQL interface that queries all sidecars and store gateways, deduplicating results:
# The redirect must run as root, so pipe the heredoc through sudo tee
sudo tee /etc/systemd/system/thanos-query.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Querier
After=network.target
[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos query \
--http-address=0.0.0.0:19192 \
--grpc-address=0.0.0.0:10902 \
--endpoint=prometheus-1:10901 \
--endpoint=prometheus-2:10901 \
--endpoint=localhost:10903 \
--query.replica-label=replica \
--log.level=info
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now thanos-query
Access the Thanos Query UI at http://your-server:19192. It closely resembles the Prometheus UI, with extra controls for deduplication and a page listing connected stores.
For service discovery (dynamic sidecar registration), use file-based service discovery:
# /etc/thanos/stores.yaml
- targets:
    - prometheus-1:10901
    - prometheus-2:10901
    - store-gateway:10903
# Add to thanos-query ExecStart:
--store.sd-files=/etc/thanos/stores.yaml
--store.sd-interval=5m
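Because the Querier re-reads this file on the configured interval, the file can be generated by a script whenever your inventory changes. A minimal sketch follows, where render_stores is a hypothetical helper that prints the same YAML structure shown above for whatever host:port pairs it is given.

```shell
# Sketch: print a file-SD document listing each host:port argument as a target.
render_stores() {
  [ "$#" -gt 0 ] || { echo "usage: render_stores host:port..." >&2; return 1; }
  echo "- targets:"
  for target in "$@"; do
    printf '    - %s\n' "$target"
  done
}

# Example: two sidecars and one store gateway
render_stores prometheus-1:10901 prometheus-2:10901 store-gateway:10903
```

When writing to the real path, consider rendering to a temporary file in the same directory and renaming it into place, so the Querier never reads a half-written file.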
Configure Grafana to use Thanos Querier:
- URL: http://thanos-query-host:19192
- Type: Prometheus
Thanos Compactor
The Compactor merges small blocks, applies downsampling, and enforces retention:
# Compactor runs as a one-shot job or continuous process
# The redirect must run as root, so pipe the heredoc through sudo tee
sudo tee /etc/systemd/system/thanos-compactor.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Compactor
After=network.target
[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos compact \
--data-dir=/var/lib/thanos/compact \
--objstore.config-file=/etc/thanos/objstore.yaml \
--http-address=0.0.0.0:19194 \
--retention.resolution-raw=90d \
--retention.resolution-5m=1y \
--retention.resolution-1h=2y \
--wait \
--wait-interval=2h \
--log.level=info
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now thanos-compactor
Retention flags:
- --retention.resolution-raw - keep raw (non-downsampled) data for 90 days
- --retention.resolution-5m - keep 5-minute downsampled data for 1 year
- --retention.resolution-1h - keep 1-hour downsampled data for 2 years
Important: Run only one Compactor per object storage bucket to avoid conflicts.
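Retention values interact with downsampling: according to the Thanos compactor documentation, 5-minute data is created only from blocks older than 40 hours and 1-hour data only from blocks older than 10 days, so a shorter retention at the finer resolution deletes data before the next level can be built. The sketch below checks a pair of retention values against those thresholds. The to_hours and check_retention helpers are hypothetical, and the thresholds should be verified against the documentation for your Thanos version.

```shell
# Sketch: convert a duration such as 36h, 90d, 2w or 1y to hours (1y = 365d).
to_hours() (
  n="${1%[hdwy]}"
  unit="${1#"$n"}"
  case "$n" in ''|*[!0-9]*) echo "unsupported duration: $1" >&2; exit 1 ;; esac
  case "$unit" in
    h) echo "$n" ;;
    d) echo $((n * 24)) ;;
    w) echo $((n * 24 * 7)) ;;
    y) echo $((n * 24 * 365)) ;;
    *) echo "unsupported duration: $1" >&2; exit 1 ;;
  esac
)

# $1 = raw retention, $2 = 5m retention; a value of 0 means "keep forever"
check_retention() (
  raw="$(to_hours "$1")" || exit 1
  five="$(to_hours "$2")" || exit 1
  if [ "$raw" -ne 0 ] && [ "$raw" -lt 40 ]; then
    echo "raw retention $1 is under 40h: 5m downsampling would never run"; exit 1
  fi
  if [ "$five" -ne 0 ] && [ "$five" -lt 240 ]; then
    echo "5m retention $2 is under 10d: 1h downsampling would never run"; exit 1
  fi
  echo "ok"
)

check_retention 90d 1y   # the values used in the unit file above
```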
Thanos Ruler
The Ruler evaluates recording and alerting rules against the global query view:
# /etc/thanos/rules.yaml
groups:
  - name: infrastructure
    interval: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate on {{ $labels.job }}"
# The redirect must run as root, so pipe the heredoc through sudo tee
sudo tee /etc/systemd/system/thanos-ruler.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Ruler
After=thanos-query.service
[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos rule \
--data-dir=/var/lib/thanos/ruler \
--eval-interval=1m \
--rule-file=/etc/thanos/rules.yaml \
--alertmanagers.url=http://alertmanager:9093 \
--query=http://localhost:19192 \
--objstore.config-file=/etc/thanos/objstore.yaml \
--http-address=0.0.0.0:19195 \
--grpc-address=0.0.0.0:10905 \
--log.level=info
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now thanos-ruler
Troubleshooting
Sidecar not uploading blocks:
# Check sidecar logs
sudo journalctl -u thanos-sidecar -f
# Verify Prometheus TSDB path is correct
ls -la /var/lib/prometheus/
# Should see directories like 01HQK... (ULID-named blocks)
Querier shows no stores:
# Check querier stores
curl http://localhost:19192/api/v1/stores | python3 -m json.tool
# Verify gRPC ports are reachable
nc -zv prometheus-host 10901
Compactor conflicts:
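# Thanos keeps no lock file in the bucket, so nothing stops two compactors from running.
# Confirm only one instance is active, then check whether it halted (1 = halted, e.g. on overlapping blocks)
curl -s http://localhost:19194/metrics | grep thanos_compact_halted
# Inspect block metadata in the bucket
thanos tools bucket inspect --objstore.config-file=/etc/thanos/objstore.yaml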
High S3 API costs:
# Reduce bucket listing calls with a longer sync interval
# Add to thanos-store (run 'thanos store --help' to see the default for your version):
--sync-block-duration=30m
Query timeout on long range:
# Increase query timeout
# Add to thanos-query:
--query.timeout=5m
Conclusion
Thanos addresses Prometheus's main limitation, finite local storage, by shipping TSDB blocks to cheap object storage while keeping full PromQL compatibility. The sidecar approach needs only small Prometheus changes (matching block durations and unique external labels), and the global query view lets you query across multiple Prometheus instances from a single endpoint. For cost efficiency, let the Compactor downsample old data to 5-minute and 1-hour resolution while keeping raw data only for recent months.


