OpenTelemetry Collector Configuration

The OpenTelemetry Collector is a vendor-agnostic service for receiving, processing, and exporting telemetry data (metrics, traces, logs) from applications to various backends. This guide covers installation, receiver configuration, processors for data transformation, exporters to multiple destinations, and service configuration for reliable telemetry processing.

Introduction

The OpenTelemetry Collector solves the challenge of collecting telemetry from diverse sources and routing it to multiple backends. It decouples instrumentation from infrastructure choices, so you can switch observability platforms without changing application code.

Architecture

Collector Pipeline

Applications
    ↓
Instrumented with OpenTelemetry SDKs
    ↓
OTLP Protocol (gRPC/HTTP)
    ↓
┌─────────────────────────────────┐
│   OpenTelemetry Collector       │
├─────────────────────────────────┤
│                                 │
│  Receivers                      │
│  ├─ OTLP (gRPC/HTTP)            │
│  ├─ Prometheus                  │
│  ├─ Jaeger                      │
│  └─ Syslog                      │
│         ↓                       │
│  Processors                     │
│  ├─ Batch                       │
│  ├─ Memory Limiter              │
│  ├─ Sampling                    │
│  └─ Attribute Processor         │
│         ↓                       │
│  Exporters                      │
│  ├─ Prometheus                  │
│  ├─ Jaeger                      │
│  ├─ OTLP Backends               │
│  └─ Multiple Destinations       │
│                                 │
└─────────────────────────────────┘
    ↓
Observability Platforms
(Prometheus, Jaeger, Datadog, etc.)
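
The pipeline above maps directly onto the Collector's YAML configuration. A minimal end-to-end sketch (the receiver endpoint is the OTLP default; the backend address is an assumption):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlp:
    endpoint: backend.example.com:4317  # assumed backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Every component must be both defined in its section and referenced in a pipeline; defining a receiver alone does nothing.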

System Requirements

  • Linux, macOS, or Windows
  • Minimum 512MB RAM
  • 100MB disk space
  • Go 1.17+ (only needed when building from source)
  • Network connectivity
  • Telemetry-generating applications

Installation

Binary Installation

# Download latest release
OTEL_VERSION="0.88.0"
wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz

tar -xzf otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz

# Install
sudo mv otelcol-contrib /usr/local/bin/
sudo chmod +x /usr/local/bin/otelcol-contrib

# Verify
otelcol-contrib --version

Docker Installation

# Pull image
docker pull otel/opentelemetry-collector-contrib:latest

# Run container
docker run -d \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9411:9411 \
  -p 14250:14250 \
  -p 55679:55679 \
  -v $(pwd)/otel-collector-config.yml:/etc/otel-collector-config.yml \
  --name otel-collector \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otel-collector-config.yml
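
For repeatable local setups, the same container can be described in Docker Compose. A sketch mirroring the docker run command above (the file name and mount paths are assumptions):

```yaml
# docker-compose.yml (sketch)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```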

Systemd Service

sudo tee /etc/systemd/system/otel-collector.service > /dev/null << 'EOF'
[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otel-collector/config.yml
Restart=on-failure
RestartSec=5

StandardOutput=journal
StandardError=journal
SyslogIdentifier=otel-collector

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable otel-collector
sudo systemctl start otel-collector

Receivers Configuration

OTLP Receivers

receivers:
  # OpenTelemetry Protocol over gRPC
  otlp/grpc:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

  # OpenTelemetry Protocol over HTTP
  otlp/http:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins: ["*"]
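
Production receivers usually terminate TLS rather than accept plaintext. A hedged sketch (the certificate paths are assumptions):

```yaml
receivers:
  otlp/tls:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otel/certs/server.crt  # assumed path
          key_file: /etc/otel/certs/server.key   # assumed path
```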

Prometheus Receiver

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['localhost:9090']
        
        - job_name: 'node-exporter'
          static_configs:
            - targets: ['localhost:9100']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: 'node_network_.*'
              action: drop

Jaeger Receiver

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268

Syslog Receiver

receivers:
  syslog:
    udp:
      listen_address: 0.0.0.0:514  # ports below 1024 require elevated privileges
    protocol: rfc5424

Filelog Receiver

receivers:
  filelog:
    include:
      - /var/log/app/*.log
      - /var/log/syslog
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'
    operators:
      - type: json_parser
        parse_from: body
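
Operators can chain further parsing steps. A sketch using a regex parser to extract a timestamp and severity from plain-text lines (the log path, format, and field names are assumptions):

```yaml
receivers:
  filelog/plaintext:
    include:
      - /var/log/app/plain.log  # assumed path
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<severity>\w+) (?P<message>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S'
```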

Processors for Data Transformation

Batch Processor

processors:
  batch:
    send_batch_size: 1024
    timeout: 10s
    send_batch_max_size: 2048

Memory Limiter

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

Sampling Processor

processors:
  # Head-based (probabilistic) sampling
  probabilistic_sampler:
    sampling_percentage: 10  # Sample 10% of traces

  # Or tail-based sampling, which decides after buffering complete traces
  tail_sampling:
    policies:
      - name: error-spans
        type: status_code
        status_code:
          status_codes: [ERROR]
      
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000
      
      - name: default-sampling
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
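
Policies can also be combined with an and policy, for example to keep only slow traces from one service. A sketch of an additional entry for the policies list above (the service name and threshold are assumptions):

```yaml
      - name: slow-checkout-traces
        type: and
        and:
          and_sub_policy:
            - name: service-filter
              type: string_attribute
              string_attribute:
                key: service.name
                values: [checkout-service]  # assumed service name
            - name: latency-filter
              type: latency
              latency:
                threshold_ms: 500
```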

Attribute Processor

processors:
  attributes:
    actions:
      - key: service.version
        value: 1.0.0
        action: insert
      
      - key: environment
        value: production
        action: insert
      
      - key: internal_id
        action: delete
      
      - key: db.password
        value: "****"
        action: update  # overwrite the value when the attribute exists
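
Regex work in the attributes processor is done with the extract action, which copies named capture groups from one attribute into new attributes. A sketch over the standard http.url attribute (the group names are choices, not fixed):

```yaml
processors:
  attributes/extract:
    actions:
      - key: http.url
        pattern: ^https?://(?P<host>[^/]+)(?P<path>/.*)$
        action: extract
```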

Resource Processor

processors:
  resource:
    attributes:
      - key: service.name
        value: my-service
        action: insert
      
      - key: host.name
        from_attribute: hostname
        action: insert

Span Processor

processors:
  span:
    name:
      to_attributes:
        rules:
          - ^/api/(?P<method>[^/]+)
          - ^(?P<operation>[^/]+)
    status:
      code: Error
      description: Span has error status
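
The span processor can be scoped with include/exclude match properties so the renaming only applies to selected services. A sketch (the service-name pattern is an assumption):

```yaml
processors:
  span/scoped:
    include:
      match_type: regexp
      services: ["api-.*"]  # assumed service-name pattern
    name:
      to_attributes:
        rules:
          - ^/api/(?P<method>[^/]+)
```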

Exporters to Backends

Prometheus Exporter

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889  # 8888 is the Collector's own telemetry port
    resource_to_telemetry_conversion:
      enabled: true

Jaeger Exporter

Note: recent contrib releases deprecate the dedicated Jaeger exporter in favor of sending OTLP directly to Jaeger, which accepts OTLP natively; the configuration below applies to older releases.

exporters:
  jaeger/grpc:
    endpoint: localhost:14250
    tls:
      insecure: true
  
  jaeger/http:
    endpoint: http://localhost:14268/api/traces

OTLP Exporters

exporters:
  # Export to external OTLP backend
  otlp/grpc:
    endpoint: otel-backend.example.com:4317
    tls:
      insecure: false

  # Export to Grafana Cloud
  otlp/http:
    endpoint: https://otlp-gateway-prod-us-central-1.grafana.net/otlp
    headers:
      Authorization: "Bearer YOUR_TOKEN"
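
During development, the debug exporter (which replaces the older logging exporter in recent releases) prints telemetry to the Collector's own log, which helps verify a pipeline before wiring real backends:

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```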

Multiple Backends

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  
  jaeger/grpc:
    endpoint: jaeger:14250
  
  datadog:
    api:
      key: YOUR_API_KEY
      site: datadoghq.com
  
  # OTLP gRPC to a local agent; note 4317 clashes with the Collector's own receiver port
  otlp/datadog-apm:
    endpoint: localhost:4317

Service Configuration

Basic Service Setup

service:
  pipelines:
    # Traces pipeline
    traces:
      receivers: [otlp/grpc, otlp/http, jaeger]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [jaeger/grpc, otlp/grpc]

    # Metrics pipeline
    metrics:
      receivers: [otlp/grpc, otlp/http, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheus, otlp/http]

    # Logs pipeline
    logs:
      receivers: [otlp/grpc, otlp/http, syslog, filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp/grpc]

Multi-Backend Configuration

service:
  pipelines:
    traces:
      receivers: [otlp/grpc]
      processors: [memory_limiter, tail_sampling, batch]
      exporters:
        - jaeger/grpc
        - otlp/datadog
        - otlp/honeycomb

    metrics:
      receivers: [otlp/grpc, prometheus]
      processors: [memory_limiter, batch]
      exporters:
        - prometheus
        - otlp/datadog
        - otlp/grafana

    logs:
      receivers: [otlp/grpc, syslog, filelog]
      processors: [memory_limiter, batch]
      exporters:
        - otlp/grafana
        - otlp/datadog

Advanced Scenarios

Conditional Routing

processors:
  routing:
    default_exporters:
      - otlp/default
    table:
      - value: production
        exporters: [otlp/prod]
      - value: staging
        exporters: [otlp/staging]
    from_attribute: environment

service:
  pipelines:
    traces:
      receivers: [otlp/grpc]
      processors: [routing, batch]
      exporters: [otlp/default, otlp/prod, otlp/staging]

Multi-Region Deployment

receivers:
  otlp/us_east:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  
  otlp/eu_west:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    traces:
      receivers: [otlp/us_east, otlp/eu_west]
      processors: [memory_limiter, batch]
      exporters:
        - jaeger/us_east
        - jaeger/eu_west

Collector as Gateway

receivers:
  otlp/app1:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  
  otlp/app2:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

processors:
  resource/app1:
    attributes:
      - key: app
        value: app1
        action: insert

  resource/app2:
    attributes:
      - key: app
        value: app2
        action: insert

service:
  pipelines:
    traces:
      receivers: [otlp/app1]
      processors: [resource/app1, memory_limiter, batch]
      exporters: [jaeger/grpc]
    
    traces/app2:
      receivers: [otlp/app2]
      processors: [resource/app2, memory_limiter, batch]
      exporters: [jaeger/grpc]

Performance Tuning

Memory Configuration

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024      # hard limit for collector memory
    spike_limit_mib: 256 # headroom subtracted from limit_mib to form the soft limit
  
  batch:
    send_batch_size: 2048
    timeout: 10s
    send_batch_max_size: 4096

extensions:
  memory_ballast:
    size_mib: 512  # Reserve memory; deprecated in newer releases in favor of the GOMEMLIMIT environment variable

Queue Configuration

exporters:
  otlp/grpc:
    endpoint: backend:4317
    
    # Retry configuration
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 5m
    
    # Queue settings
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
      storage: file_storage
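
The storage: file_storage reference only works if the corresponding extension is defined and enabled; otherwise the queue is memory-only and lost on restart. A sketch (the directory is an assumption and must exist and be writable):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage  # assumed path

service:
  extensions: [file_storage]
```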

Telemetry Configuration

service:
  extensions: [pprof, zpages]  # extensions must be enabled here, not just defined
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed

extensions:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

Troubleshooting

Check Collector Status

# Service status
systemctl status otel-collector

# View logs
journalctl -u otel-collector -f

# Check health (requires the health_check extension)
curl http://localhost:13133/
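
The health check above responds only when the health_check extension is defined and listed under the service extensions. A sketch of the extension block:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
    # path: /healthz   # optional; the default path is "/"
```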

Debug Configuration

# Validate configuration
otelcol-contrib validate --config=config.yml

# Run with debug logging
otelcol-contrib --config=config.yml --set=service.telemetry.logs.level=debug

# Enable pprof for profiling
curl http://localhost:1888/debug/pprof/

Monitor Metrics

# Collector's own telemetry metrics (default port 8888)
curl http://localhost:8888/metrics

# Check pipeline stats
curl http://localhost:8888/metrics | grep otelcol_

# Monitor memory usage
curl http://localhost:8888/metrics | grep process_runtime_go_

Conclusion

The OpenTelemetry Collector provides a flexible, vendor-neutral way to collect and process telemetry data. By following this guide, you've deployed a robust telemetry pipeline capable of handling traces, metrics, and logs from diverse sources. Focus on designing efficient processor chains, setting appropriate memory limits, and leveraging multiple exporters for comprehensive observability. The collector's flexibility makes it essential infrastructure in modern observability architectures.