Jaeger Installation for Distributed Tracing

Jaeger is an open-source distributed tracing platform for monitoring and troubleshooting complex microservices architectures. It helps track requests across multiple services, identify performance bottlenecks, and understand service dependencies. This guide covers all-in-one deployment, Elasticsearch backend setup, application instrumentation, trace analysis, and sampling strategies.

Table of Contents

  • Introduction
  • Architecture
  • System Requirements
  • All-In-One Deployment
  • Binary Installation
  • Elasticsearch Backend
  • Collector and Agent
  • Application Instrumentation
  • Trace Analysis
  • Sampling Strategies
  • Performance Tuning
  • Troubleshooting
  • Conclusion

Introduction

Jaeger solves the observability challenge of distributed systems by providing end-to-end request tracing. Unlike metrics that aggregate behavior, traces show individual request paths through microservices, revealing exactly where latency occurs and why services fail.

Architecture

Jaeger Components

┌─────────────────────────────────┐
│   Application Services          │
│  ├─ Jaeger Client Libraries    │
│  └─ OpenTelemetry SDKs         │
└────────────┬────────────────────┘
             │ Emit Traces (UDP/HTTP)
    ┌────────▼─────────┐
    │   Jaeger Agent   │ (Ports 6831/udp, 6832/udp, 5778)
    │  ├─ Batching    │
    │  └─ Sampling    │
    └────────┬─────────┘
             │
    ┌────────▼──────────────┐
    │  Jaeger Collector     │ (Port 14250, 14268)
    │  ├─ Authentication    │
    │  ├─ Batching         │
    │  └─ Validation       │
    └────────┬──────────────┘
             │
    ┌────────▼──────────────┐
    │ Storage Backend       │
    │ ├─ Elasticsearch      │
    │ ├─ Cassandra         │
    │ ├─ BadgerDB          │
    │ └─ Memory            │
    └────────┬──────────────┘
             │
    ┌────────▼──────────────┐
    │   Jaeger UI           │ (Port 16686)
    │   ├─ Search Traces   │
    │   ├─ Visualize       │
    │   └─ Analyze         │
    └───────────────────────┘
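
The batching step in the agent box above can be sketched as a buffer that flushes when it is full or when the flush interval elapses. This is a toy model with an injected clock for testability, not the agent's actual implementation:

```python
class SpanBatcher:
    """Toy model of agent-side batching: flush when the batch is full
    or when the flush interval elapses (clock injected for testing)."""
    def __init__(self, max_batch=3, flush_interval=0.2, now=lambda: 0.0):
        self.max_batch = max_batch
        self.flush_interval = flush_interval
        self.now = now
        self.batch = []
        self.last_flush = now()
        self.sent = []  # batches "forwarded" to the collector

    def add(self, span):
        self.batch.append(span)
        if (len(self.batch) >= self.max_batch
                or self.now() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.batch:
            self.sent.append(self.batch)
            self.batch = []
        self.last_flush = self.now()

b = SpanBatcher()
for s in ["s1", "s2", "s3", "s4"]:
    b.add(s)
b.flush()
print(b.sent)  # [['s1', 's2', 's3'], ['s4']]
```

Batching is why spans can take a moment to appear in the UI, and why a crash can lose the tail of a batch.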

System Requirements

  • Linux (Ubuntu 20.04+, CentOS 8+)
  • Minimum 2GB RAM for all-in-one
  • 4GB+ for production with Elasticsearch
  • 10GB+ storage for trace backend
  • No separate Java install needed (Elasticsearch 7.x+ bundles its own JDK)
  • Network connectivity for trace collection

All-In-One Deployment

# Pull and run all-in-one container
docker run -d \
  --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:latest

# Verify container
docker logs jaeger
docker ps | grep jaeger

Binary Installation

# Download latest release
cd /tmp
wget https://github.com/jaegertracing/jaeger/releases/download/v1.48.0/jaeger-1.48.0-linux-amd64.tar.gz
tar -xzf jaeger-1.48.0-linux-amd64.tar.gz

# Create user
sudo useradd --no-create-home --shell /bin/false jaeger

# Install binaries
sudo mkdir -p /opt/jaeger
sudo mv jaeger-1.48.0-linux-amd64/* /opt/jaeger/
sudo chown -R jaeger:jaeger /opt/jaeger

Systemd Service

sudo tee /etc/systemd/system/jaeger.service > /dev/null << 'EOF'
[Unit]
Description=Jaeger All-In-One
After=network.target

[Service]
User=jaeger
Group=jaeger
Type=simple
ExecStart=/opt/jaeger/jaeger-all-in-one \
  --collector.zipkin.host-port=:9411 \
  --collector.grpc-server.host-port=:14250 \
  --collector.http-server.host-port=:14268 \
  --query.base-path=/jaeger \
  --memory.max-traces=100000

Restart=on-failure
RestartSec=10

StandardOutput=journal
StandardError=journal
SyslogIdentifier=jaeger

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable jaeger
sudo systemctl start jaeger

Access Jaeger UI

Navigate to http://localhost:16686

Elasticsearch Backend

Install Elasticsearch

# Add repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install
sudo apt-get update
sudo apt-get install -y elasticsearch

# Enable and start
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

# Verify (Elasticsearch 8.x enables TLS and authentication by default;
# use the elastic password generated during installation)
curl -u elastic --cacert /etc/elasticsearch/certs/http_ca.crt https://localhost:9200/

Configure Jaeger with Elasticsearch

# The collector is configured via CLI flags or the matching environment
# variables, not a YAML file
sudo mkdir -p /etc/jaeger
sudo tee /etc/jaeger/collector.env > /dev/null << 'EOF'
SPAN_STORAGE_TYPE=elasticsearch
ES_SERVER_URLS=http://localhost:9200
ES_INDEX_PREFIX=jaeger
ES_BULK_ACTIONS=1000
ES_BULK_FLUSH_INTERVAL=200ms
ES_BULK_SIZE=5000000
EOF

# Sampling strategies go in a separate JSON file, passed to the
# collector with --sampling.strategies-file
sudo tee /etc/jaeger/sampling.json > /dev/null << 'EOF'
{
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.01
  },
  "service_strategies": [
    {
      "service": "critical-service",
      "type": "probabilistic",
      "param": 1.0
    }
  ]
}
EOF

Collector and Agent

Jaeger Agent Setup

Install agent on each application host:

# The agent binary ships inside the release tarball downloaded earlier
sudo cp /tmp/jaeger-1.48.0-linux-amd64/jaeger-agent /usr/local/bin/jaeger-agent
sudo chmod +x /usr/local/bin/jaeger-agent

# Create systemd service
sudo tee /etc/systemd/system/jaeger-agent.service > /dev/null << 'EOF'
[Unit]
Description=Jaeger Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/jaeger-agent \
  --reporter.grpc.host-port=jaeger-collector:14250

Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable jaeger-agent
sudo systemctl start jaeger-agent

Collector Configuration

# Docker Compose setup
cat > docker-compose.yml << 'EOF'
version: '3'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"

  jaeger-collector:
    image: jaegertracing/jaeger-collector:latest
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - ES_INDEX_PREFIX=jaeger
    ports:
      - "14250:14250"
      - "14268:14268"
    depends_on:
      - elasticsearch

  jaeger-query:
    image: jaegertracing/jaeger-query:latest
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - ES_INDEX_PREFIX=jaeger
      - QUERY_BASE_PATH=/jaeger
    ports:
      - "16686:16686"
    depends_on:
      - elasticsearch
EOF

docker-compose up -d

Application Instrumentation

Python Application

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure Jaeger
jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(jaeger_exporter)
)

tracer = trace.get_tracer(__name__)

# Instrument requests
import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor

RequestsInstrumentor().instrument()

# Use tracer
with tracer.start_as_current_span(
        "my-operation", kind=trace.SpanKind.SERVER) as span:
    span.set_attribute("component", "my-service")
    
    # Your application code
    response = requests.get("http://example.com")
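
Between services, the trace context travels in HTTP headers; instrumented clients inject it and instrumented servers extract it automatically. As an illustration of what is on the wire, here is a minimal parser for the W3C traceparent header ("version-traceid-spanid-flags"), which the OTel propagators normally handle for you:

```python
# Minimal sketch of parsing a W3C traceparent header; real code
# should use the OpenTelemetry propagator API instead.
def parse_traceparent(header):
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent")
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": int(flags, 16) & 0x01 == 1,
    }

ctx = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["sampled"])  # True
```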

Go Application

package main

import (
	"io"
	"log"

	opentracing "github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go/config"
)

func initTracer(serviceName string) (io.Closer, error) {
	cfg := &config.Configuration{
		ServiceName: serviceName,
		Sampler: &config.SamplerConfig{
			Type:  "const",
			Param: 1,
		},
		Reporter: &config.ReporterConfig{
			LogSpans:           true,
			LocalAgentHostPort: "localhost:6831",
		},
	}

	closer, err := cfg.InitGlobalTracer(serviceName)
	if err != nil {
		return nil, err
	}

	return closer, nil
}

func main() {
	closer, err := initTracer("my-service")
	if err != nil {
		log.Fatal(err)
	}
	defer closer.Close()

	tracer := opentracing.GlobalTracer()
	span := tracer.StartSpan("my-operation")
	defer span.Finish()

	// Your application code
}

Node.js Application

const initTracer = require('jaeger-client').initTracer;
const opentracing = require('opentracing');

const config = {
  serviceName: 'my-service',
  sampler: {
    type: 'const',
    param: 1,
  },
  reporter: {
    logSpans: true,
    agentHost: 'localhost',
    agentPort: 6831,
  },
};

const options = {
  logger: console,
};

const tracer = initTracer(config, options);

const span = tracer.startSpan('my-operation');
span.setTag('component', 'my-service');
span.log({ event: 'request_start' });

// Your application code

span.finish();

Trace Analysis

Search Traces

Via Jaeger UI at http://localhost:16686:

  1. Service: Select application
  2. Operation: Choose specific operation
  3. Tags: Filter by request attributes
  4. Min/Max Duration: Set time ranges

Trace Inspection

Trace Details View:
- Service list with timing
- Span details and logs
- Tags and baggage
- Error information
- Dependencies graph

Latency Analysis

1. Sort search results by duration
2. Identify the slowest spans within a trace
3. Compare fast and slow traces of the same operation
4. Export traces (JSON) for offline analysis

Query examples:
- error=true
- http.status_code=500
- duration>100ms
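
The same filters can be applied programmatically to traces exported from the query API. The field names below are simplified assumptions for illustration, not the exact Jaeger JSON schema:

```python
# Mimicking UI-style filters (error=true, duration>100ms) over
# exported span dictionaries; field names are illustrative.
def matches(span, min_duration_us=0, tags=None):
    if span["duration_us"] < min_duration_us:
        return False
    for key, value in (tags or {}).items():
        if span.get("tags", {}).get(key) != value:
            return False
    return True

spans = [
    {"op": "charge", "duration_us": 250_000, "tags": {"error": True}},
    {"op": "verify", "duration_us": 40_000, "tags": {}},
]
slow_errors = [s["op"] for s in spans
               if matches(s, min_duration_us=100_000, tags={"error": True})]
print(slow_errors)  # ['charge']
```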

Sampling Strategies

Probabilistic Sampling

sampling:
  default-strategy:
    type: probabilistic
    param: 0.1  # 10% of requests

  strategies:
    - service-name: "frontend"
      type: probabilistic
      param: 1.0  # 100% of frontend requests
    
    - service-name: "backend"
      type: probabilistic
      param: 0.01  # 1% of backend
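
Probabilistic sampling is typically decided deterministically from the trace ID, so every service participating in a trace makes the same keep/drop decision. A sketch of the idea (not Jaeger's exact algorithm):

```python
# Deterministic probabilistic sampling: keep a trace if its 64-bit ID
# falls below the probability threshold. Illustrative sketch only.
MAX_ID = 2**64

def sample(trace_id, probability):
    return trace_id < probability * MAX_ID

# IDs below the threshold are kept, regardless of which service asks:
print(sample(2**62, 0.5), sample(3 * 2**62, 0.5))  # True False
```

Because the decision is a pure function of the trace ID, a downstream service never samples a trace whose upstream spans were dropped.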

Rate Limiting Sampling

sampling:
  default-strategy:
    type: rate-limiting
    param: 100  # Max 100 traces per second

  strategies:
    - service-name: "*"
      type: rate-limiting
      param: 1000
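
Rate-limiting sampling behaves like a token bucket: each sampled trace spends a token, and tokens refill at the configured rate. A simplified sketch with an injected clock (not Jaeger's implementation):

```python
class RateLimitingSampler:
    """Token-bucket sketch of rate-limiting sampling: at most
    max_per_second traces pass; clock injected for testability."""
    def __init__(self, max_per_second, now=lambda: 0.0):
        self.rate = max_per_second
        self.capacity = max(max_per_second, 1.0)
        self.tokens = self.capacity
        self.now = now
        self.last = now()

    def is_sampled(self):
        t = self.now()
        # Refill tokens for the time elapsed, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 2 traces/s: two pass immediately, the third is dropped, and after
# one simulated second the bucket has refilled.
clock = iter([0.0, 0.0, 0.0, 0.0, 1.0]).__next__
s = RateLimitingSampler(2, now=clock)
print([s.is_sampled() for _ in range(4)])  # [True, True, False, True]
```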

Adaptive Sampling

Adaptive sampling computes per-service, per-operation probabilities at
runtime, converging on a target trace rate rather than a fixed
percentage. It is enabled on the collector (not in the strategies
file) and needs a storage backend that supports adaptive sampling:

# Collector configuration for adaptive sampling
SAMPLING_CONFIG_TYPE=adaptive \
jaeger-collector \
  --sampling.target-samples-per-second=1 \
  --sampling.initial-sampling-probability=0.001
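
Conceptually, adaptive sampling is a feedback loop: observe the achieved trace rate, then scale the probability toward the target. One step of such a loop might look like the following simplified sketch (not Jaeger's actual controller):

```python
def adjust_probability(current_p, observed_rate, target_rate,
                       min_p=1e-5, max_p=1.0):
    """One feedback step: scale the sampling probability so the
    observed trace rate converges on the target. Illustrative only."""
    if observed_rate <= 0:
        return max_p  # nothing sampled yet: open up fully
    scaled = current_p * (target_rate / observed_rate)
    return min(max_p, max(min_p, scaled))

# Sampling 50% of traffic yields 4 traces/s but we only want 2/s:
print(adjust_probability(0.5, 4.0, 2.0))  # 0.25
```

Repeating this per operation is what lets low-traffic endpoints keep a high probability while chatty endpoints are throttled down.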

Performance Tuning

Memory Configuration

# Increase Java heap for Elasticsearch
export ES_JAVA_OPTS="-Xms2g -Xmx2g"

# Jaeger collector buffer
jaeger-collector \
  --collector.num-workers=100 \
  --collector.queue-size=2000

Batch Processing

# Equivalent environment variables for the collector
COLLECTOR_QUEUE_SIZE=2000
COLLECTOR_NUM_WORKERS=100

Index Management

# Reduce index overhead
ES_INDEX_PREFIX=jaeger
ES_INDEX_DATE_SEPARATOR="-"

# Enable ILM (Index Lifecycle Management) in Elasticsearch
PUT _ilm/policy/jaeger-ilm-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Tell Jaeger to use write aliases and the ILM policy
ES_USE_ALIASES=true
ES_USE_ILM=true

Troubleshooting

Verify Collector Health

# Check collector logs
docker logs jaeger-collector

# Test collector endpoint (it accepts POSTed spans; a GET returning
# 405 still confirms the port is listening)
curl -i http://localhost:14268/api/traces

# Check Elasticsearch
curl http://localhost:9200/_cat/indices

# View Jaeger metrics
curl http://localhost:14269/metrics

Trace Not Appearing

# Check the sampling strategy served to a service (agent port 5778)
curl "http://localhost:5778/sampling?service=my-service"

# Monitor spans
docker logs jaeger | grep -i span

Performance Issues

# Check storage backend (use "elasticsearch" as hostname from inside
# the Compose network, localhost from the host)
curl http://localhost:9200/_stats

# Monitor Jaeger metrics
curl http://localhost:14269/metrics | grep jaeger_collector

# Check disk usage
du -sh /var/lib/elasticsearch/

Conclusion

Jaeger enables visibility into distributed microservices architectures through comprehensive request tracing. By following this guide, you've deployed a powerful tracing platform with multiple storage backends and sampling strategies. Focus on instrumenting critical paths in your services, setting appropriate sampling rates to manage cost, and regularly analyzing traces to identify performance bottlenecks. The insights from tracing combined with metrics and logs create complete system observability.