Thanos for Long-Term Prometheus Storage

Thanos extends Prometheus with effectively unlimited data retention by shipping metrics to object storage (S3, GCS, MinIO). It also provides a global query view across multiple Prometheus instances and downsamples old data so that long-range queries stay fast. This guide covers deploying Thanos on Linux: installing the binary, configuring object storage, and running the sidecar, store gateway, querier (with deduplication across replicated Prometheus instances), compactor, and ruler.

Prerequisites

  • Ubuntu 20.04+ or CentOS 8+ / Rocky Linux 8+
  • Prometheus 2.x already running
  • S3-compatible object storage: AWS S3, GCS, MinIO, or Wasabi
  • 1 GB RAM per component
  • Go 1.21+ (if building from source)

Installing Thanos

# Download pre-built Thanos binary
THANOS_VERSION="0.36.1"
wget https://github.com/thanos-io/thanos/releases/download/v${THANOS_VERSION}/thanos-${THANOS_VERSION}.linux-amd64.tar.gz

tar xzf thanos-${THANOS_VERSION}.linux-amd64.tar.gz
sudo mv thanos-${THANOS_VERSION}.linux-amd64/thanos /usr/local/bin/
thanos --version

# Create directory structure
sudo mkdir -p /etc/thanos /var/lib/thanos
sudo useradd -r -s /sbin/nologin thanos
sudo chown -R thanos:thanos /var/lib/thanos /etc/thanos

Thanos is a single binary with subcommands for each component:

  • thanos sidecar - sits next to Prometheus, uploads TSDB blocks
  • thanos store - serves historical data from object storage
  • thanos query - global PromQL query frontend
  • thanos compact - compacts and downsamples blocks in object storage
  • thanos rule - alerting/recording rules against global view

Object Storage Configuration

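Create an object store configuration file. Thanos supports AWS S3, GCS, Azure Blob, and any S3-compatible storage. The examples below are alternatives: the file must contain exactly one type/config pair, so copy only the block for your provider.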

# /etc/thanos/objstore.yaml

# For AWS S3
type: S3
config:
  bucket: my-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  # On EC2 with an IAM role, leave both keys empty so that Thanos falls back
  # to the instance credentials:
  # access_key: ""
  # secret_key: ""

# Alternative: MinIO (self-hosted S3)
type: S3
config:
  bucket: thanos
  endpoint: minio.example.com:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: false
  signature_version2: false

# Alternative: Google Cloud Storage
type: GCS
config:
  bucket: my-thanos-bucket
  service_account: |
    {
      "type": "service_account",
      "project_id": "my-project",
      ...
    }

Protect this file:

sudo chmod 600 /etc/thanos/objstore.yaml
sudo chown thanos:thanos /etc/thanos/objstore.yaml
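
Because every provider example starts with its own top-level type: key, pasting more than one of them into objstore.yaml produces a file with duplicate keys. The following is a minimal pre-flight sketch, not a YAML validator: count_objstore_types is a hypothetical helper name, and it only counts lines that begin with type: in the first column (the indented "type" inside a GCS service account JSON is ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count top-level "type:" keys in an objstore config.
# A result other than 1 suggests provider examples were pasted together
# (or that no provider is configured at all).
count_objstore_types() {
    # grep -c prints 0 but exits non-zero when nothing matches, so "|| true"
    # keeps the helper safe under "set -e".
    grep -c '^type:' "$1" || true
}

# Demo against a throwaway file rather than the real /etc/thanos/objstore.yaml.
demo_cfg="$(mktemp)"
cat > "$demo_cfg" << 'EOF'
type: S3
config:
  bucket: thanos
  endpoint: minio.example.com:9000
EOF

if [ "$(count_objstore_types "$demo_cfg")" -eq 1 ]; then
    echo "objstore config declares exactly one provider"
else
    echo "objstore config must declare exactly one provider" >&2
fi
rm -f "$demo_cfg"
```

To check the real file, call count_objstore_types /etc/thanos/objstore.yaml as a user that is allowed to read it.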

Thanos Sidecar Setup

The sidecar runs alongside Prometheus. It:

  1. Exposes Prometheus data via the Thanos gRPC StoreAPI
  2. Uploads completed TSDB blocks to object storage

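Configure Prometheus so that the sidecar can safely upload its blocks: set --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value, which disables Prometheus's local compaction. Each Prometheus instance also needs unique external_labels in prometheus.yml (for example, a replica label, which the Querier uses later for deduplication), because the sidecar requires them to identify where blocks came from.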

# Edit Prometheus startup flags
# Add: --storage.tsdb.min-block-duration=2h
# Add: --storage.tsdb.max-block-duration=2h
# This creates 2-hour blocks that Thanos can safely upload

sudo systemctl edit prometheus
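
After editing the unit, it can help to confirm that both flags are present and equal. The sketch below is one way to do that under stated assumptions: flag_value and block_durations_match are hypothetical helper names, the sample command line and binary path are illustrative, and the parser assumes --flag=value arguments separated by single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a --flag=value argument found in a
# space-separated command line string. Prints nothing if the flag is absent.
flag_value() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Succeeds only when both block-duration flags are set and equal.
block_durations_match() {
    local min max
    min="$(flag_value --storage.tsdb.min-block-duration "$1")"
    max="$(flag_value --storage.tsdb.max-block-duration "$1")"
    [ -n "$min" ] && [ "$min" = "$max" ]
}

# Illustrative command line (the binary path is an assumption). On a live host
# you could pass "$(tr '\0' ' ' < /proc/<prometheus-pid>/cmdline)" instead.
sample='/usr/local/bin/prometheus --storage.tsdb.path=/var/lib/prometheus --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h'

if block_durations_match "$sample"; then
    echo "block durations match: $(flag_value --storage.tsdb.min-block-duration "$sample")"
else
    echo "block durations differ or are unset" >&2
fi
```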

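Create the sidecar service. The sidecar reads blocks from the Prometheus data directory and keeps its upload state there, so the thanos user needs read and write access to /var/lib/prometheus: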

sudo tee /etc/systemd/system/thanos-sidecar.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Sidecar
After=prometheus.service

[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos sidecar \
    --tsdb.path=/var/lib/prometheus \
    --prometheus.url=http://localhost:9090 \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --http-address=0.0.0.0:19191 \
    --grpc-address=0.0.0.0:10901 \
    --log.level=info
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now thanos-sidecar
sudo systemctl status thanos-sidecar

Verify the sidecar is uploading blocks:

# Check sidecar metrics
curl -s http://localhost:19191/metrics | grep thanos_objstore_bucket_operations_total

# List uploaded blocks in S3/MinIO
aws s3 ls s3://my-thanos-metrics/ --recursive | head -20
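
The grep above prints every bucket operation. To reduce that to a single number, a small filter can sum only the upload operations. This is a sketch: count_upload_ops is a hypothetical helper, and the sample metric lines (including the bucket and operation labels) are illustrative assumptions about the metric's shape, so compare them with your sidecar's real output first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the summed value of thanos_objstore_bucket_operations_total for uploads.
count_upload_ops() {
    awk '
        /^thanos_objstore_bucket_operations_total[{]/ && /operation="upload"/ { total += $NF }
        END { printf "%d\n", total }
    '
}

# Illustrative sample; the label set is an assumption, not captured output.
sample_metrics='# TYPE thanos_objstore_bucket_operations_total counter
thanos_objstore_bucket_operations_total{bucket="my-thanos-metrics",operation="get"} 4
thanos_objstore_bucket_operations_total{bucket="my-thanos-metrics",operation="upload"} 12'

printf '%s\n' "$sample_metrics" | count_upload_ops

# On a live host:
#   curl -s http://localhost:19191/metrics | count_upload_ops
```

A value that stays at 0 for more than one block duration (2 hours in this guide) is a hint to check the sidecar logs.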

Thanos Store Gateway

The Store Gateway serves historical data from object storage, making old blocks available for queries:

sudo tee /etc/systemd/system/thanos-store.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Store Gateway
After=network.target

[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos store \
    --data-dir=/var/lib/thanos/store \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --http-address=0.0.0.0:19193 \
    --grpc-address=0.0.0.0:10903 \
    --log.level=info
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now thanos-store

The store gateway downloads block indexes to disk (/var/lib/thanos/store) for fast metadata lookups. As a rough starting point, reserve disk space equal to about 10% of your object storage data size for this cache, then adjust based on actual usage.
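
The 10% guideline is simple arithmetic, sketched here with a hypothetical helper named store_cache_gib. It rounds up so that small buckets still get at least 1 GiB; the 500 GiB input is an illustrative figure, not a measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disk to reserve for the store gateway cache, in GiB,
# using the 10% rule of thumb from this guide (integer ceiling of size / 10).
store_cache_gib() {
    local bucket_gib="$1"
    echo $(( (bucket_gib + 9) / 10 ))
}

# Illustrative bucket size of 500 GiB.
echo "reserve $(store_cache_gib 500) GiB for /var/lib/thanos/store"

# The bucket size itself can be read from the "Total Size" line of:
#   aws s3 ls s3://my-thanos-metrics/ --recursive --summarize
```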

Thanos Querier (Global View)

The Querier provides a single PromQL interface that fans out to all sidecars and store gateways. It deduplicates series from replicated Prometheus instances by ignoring the label named in --query.replica-label:

sudo tee /etc/systemd/system/thanos-query.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Querier
After=network.target

[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos query \
    --http-address=0.0.0.0:19192 \
    --grpc-address=0.0.0.0:10902 \
    --endpoint=prometheus-1:10901 \
    --endpoint=prometheus-2:10901 \
    --endpoint=localhost:10903 \
    --query.replica-label=replica \
    --log.level=info
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now thanos-query

Access the Thanos Query UI at http://your-server:19192. It closely resembles the Prometheus UI.

To add or remove sidecars and store gateways without editing the unit file, use file-based service discovery:

# /etc/thanos/stores.yaml
- targets:
    - prometheus-1:10901
    - prometheus-2:10901
    - store-gateway:10903

# Add to the thanos-query ExecStart, in place of the static endpoint list:
--store.sd-files=/etc/thanos/stores.yaml
--store.sd-interval=5m
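
When the endpoint list comes from an inventory or a script, the discovery file can be generated rather than edited by hand. A minimal sketch, assuming a hypothetical helper named render_store_targets that emits the same single-group layout shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a file-SD document with one target group from
# the host:port arguments it is given.
render_store_targets() {
    local target
    echo "- targets:"
    for target in "$@"; do
        echo "    - $target"
    done
}

# Print the document for the three endpoints used in this guide.
render_store_targets prometheus-1:10901 prometheus-2:10901 store-gateway:10903

# To install it on the Querier host:
#   render_store_targets prometheus-1:10901 prometheus-2:10901 store-gateway:10903 \
#       | sudo tee /etc/thanos/stores.yaml > /dev/null
```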

Configure Grafana to use Thanos Querier:

  • URL: http://thanos-query-host:19192
  • Type: Prometheus

Thanos Compactor

The Compactor merges small blocks, applies downsampling, and enforces retention:

# Compactor runs as a one-shot job or continuous process
sudo tee /etc/systemd/system/thanos-compactor.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Compactor
After=network.target

[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos compact \
    --data-dir=/var/lib/thanos/compact \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --http-address=0.0.0.0:19194 \
    --retention.resolution-raw=90d \
    --retention.resolution-5m=1y \
    --retention.resolution-1h=2y \
    --wait \
    --wait-interval=2h \
    --log.level=info
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now thanos-compactor

Retention flags:

  • --retention.resolution-raw - keep raw (non-downsampled) data for 90 days
  • --retention.resolution-5m - keep 5-minute downsampled data for 1 year
  • --retention.resolution-1h - keep 1-hour downsampled data for 2 years

Important: Run only one Compactor per object storage bucket to avoid conflicts.
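
Retention and downsampling interact: if raw data is deleted before the Compactor has downsampled it, the coarser resolutions are never produced. The sketch below checks a pair of retention values against thresholds that are assumptions taken from my reading of the Thanos compactor documentation (5-minute data built from raw blocks older than 40 hours, 1-hour data built from 5-minute blocks older than 10 days); verify them for your version. The helper names are hypothetical, and the special value 0d (keep forever) is not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration such as 90d, 1y, or 40h to hours.
# Assumes one integer followed by one unit, with w = 7d and y = 365d.
duration_hours() {
    local value unit
    value="${1%?}"          # strip the trailing unit character
    unit="${1#"$value"}"    # what remains is the unit
    case "$unit" in
        h) echo "$value" ;;
        d) echo $(( value * 24 )) ;;
        w) echo $(( value * 24 * 7 )) ;;
        y) echo $(( value * 24 * 365 )) ;;
        *) echo "unsupported duration: $1" >&2; return 1 ;;
    esac
}

# Assumed thresholds in hours: raw retention >= 40h, 5m retention >= 10d (240h).
# $1 = --retention.resolution-raw value, $2 = --retention.resolution-5m value.
retention_allows_downsampling() {
    local raw five
    raw="$(duration_hours "$1")" || return 1
    five="$(duration_hours "$2")" || return 1
    [ "$raw" -ge 40 ] && [ "$five" -ge 240 ]
}

if retention_allows_downsampling 90d 1y; then
    echo "raw=$(duration_hours 90d)h, 5m=$(duration_hours 1y)h: downsampling has time to run"
else
    echo "retention is shorter than the assumed downsampling thresholds" >&2
fi
```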

Thanos Ruler

The Ruler evaluates recording and alerting rules against the global query view:

# /etc/thanos/rules.yaml
groups:
  - name: infrastructure
    interval: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)

      - alert: HighErrorRate
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High HTTP error rate on {{ $labels.job }}"

Create the ruler service:

sudo tee /etc/systemd/system/thanos-ruler.service > /dev/null << 'EOF'
[Unit]
Description=Thanos Ruler
After=thanos-query.service

[Service]
User=thanos
Group=thanos
ExecStart=/usr/local/bin/thanos rule \
    --data-dir=/var/lib/thanos/ruler \
    --eval-interval=1m \
    --rule-file=/etc/thanos/rules.yaml \
    --alertmanagers.url=http://alertmanager:9093 \
    --query=http://localhost:19192 \
    --objstore.config-file=/etc/thanos/objstore.yaml \
    --http-address=0.0.0.0:19195 \
    --grpc-address=0.0.0.0:10905 \
    --log.level=info
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now thanos-ruler
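
With all five services enabled, a quick readiness sweep shows which components are up. The sketch below builds the readiness URL for each HTTP port used in this guide; component_ready_urls is a hypothetical helper, and the probing loop is left as a comment because it needs the services to be running.

```shell
#!/usr/bin/env bash
# Print "<component> <readiness URL>" for each HTTP port used in this guide.
# $1 = host name (defaults to localhost).
component_ready_urls() {
    local host entry
    host="${1:-localhost}"
    for entry in sidecar:19191 query:19192 store:19193 compact:19194 rule:19195; do
        echo "${entry%%:*} http://${host}:${entry##*:}/-/ready"
    done
}

component_ready_urls localhost

# On a live host, probe each URL (curl -f fails on non-2xx responses):
#   component_ready_urls | while read -r name url; do
#       if curl -fsS -o /dev/null "$url"; then echo "$name ready"; else echo "$name NOT ready"; fi
#   done
```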

Troubleshooting

Sidecar not uploading blocks:

# Check sidecar logs
sudo journalctl -u thanos-sidecar -f

# Verify Prometheus TSDB path is correct
ls -la /var/lib/prometheus/
# Should see directories like 01HQK... (ULID-named blocks)
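
To turn that visual check into a number, the ULID-style names can be counted. A minimal sketch, assuming a hypothetical helper named count_tsdb_blocks; it matches entry names only (26 characters of Crockford base32) and does not inspect block contents. The block name in the demo is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count entries in a TSDB path whose names look like
# ULIDs (26 characters of Crockford base32, which omits I, L, O and U).
count_tsdb_blocks() {
    # LC_ALL=C keeps the character ranges ASCII-only; "|| true" covers grep's
    # non-zero exit status when the count is 0.
    ls -1 "$1" | LC_ALL=C grep -Ec '^[0-9A-HJKMNP-TV-Z]{26}$' || true
}

# Demo against a throwaway directory with one fake block and two non-block dirs.
demo_tsdb="$(mktemp -d)"
mkdir "$demo_tsdb/01HQKZ7M3N8P5R2T4V6W8X0Y1Z" "$demo_tsdb/wal" "$demo_tsdb/chunks_head"
echo "blocks found: $(count_tsdb_blocks "$demo_tsdb")"
rm -rf "$demo_tsdb"

# On a live host:
#   count_tsdb_blocks /var/lib/prometheus
```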

Querier shows no stores:

# Check querier stores
curl -s http://localhost:19192/api/v1/stores | python3 -m json.tool

# Verify gRPC ports are reachable
nc -zv prometheus-host 10901

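Compactor conflicts:

# Thanos does not lock the bucket, so a second Compactor can create overlapping blocks.
# Confirm that only one instance is running, then check whether it has halted:
curl -s http://localhost:19194/metrics | grep thanos_compact_halted
sudo journalctl -u thanos-compactor -n 100

# Inspect block metadata in the bucket
sudo -u thanos thanos tools bucket inspect --objstore.config-file=/etc/thanos/objstore.yaml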

High S3 API costs:

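# Reduce listing calls with a longer sync interval
# Add to the thanos-store ExecStart. The default differs between releases
# (see thanos store --help), so choose a value above it, for example:
--sync-block-duration=30m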

Query timeout on long range:

# Increase query timeout
# Add to thanos-query:
--query.timeout=5m

Conclusion

Thanos removes Prometheus's main constraint, limited local storage, by shipping TSDB blocks to inexpensive object storage while keeping full PromQL compatibility. The sidecar approach needs only two Prometheus changes: fixed block durations and unique external labels. The global query view lets you query multiple Prometheus instances through a single endpoint. For cost efficiency, let the Compactor downsample old data to 5-minute and 1-hour resolution and keep raw data only for recent months.