OpenTelemetry Collector Configuration

The OpenTelemetry Collector is a vendor-agnostic service for receiving, processing, and exporting telemetry data (metrics, traces, logs) from applications to various backends. This guide covers installation, receiver configuration, processors for data transformation, exporters to multiple destinations, and service configuration for reliable telemetry processing.

Introduction

The OpenTelemetry Collector solves the challenge of collecting telemetry from diverse sources and routing it to multiple backends. It decouples instrumentation from infrastructure choices, allowing teams to switch observability platforms without changing application code.

Architecture

Collector Pipeline

Applications
    ↓
Instrumented with OpenTelemetry SDKs
    ↓
OTLP Protocol (gRPC/HTTP)
    ↓
┌─────────────────────────────────┐
│   OpenTelemetry Collector       │
├─────────────────────────────────┤
│                                 │
│  Receivers                      │
│  ├─ OTLP (gRPC/HTTP)            │
│  ├─ Prometheus                  │
│  ├─ Jaeger                      │
│  └─ Syslog                      │
│         ↓                       │
│  Processors                     │
│  ├─ Batch                       │
│  ├─ Memory Limiter              │
│  ├─ Sampling                    │
│  └─ Attribute Processor         │
│         ↓                       │
│  Exporters                      │
│  ├─ Prometheus                  │
│  ├─ Jaeger                      │
│  ├─ OTLP Backends               │
│  └─ Multiple Destinations       │
│                                 │
└─────────────────────────────────┘
    ↓
Observability Platforms
(Prometheus, Jaeger, Datadog, etc.)

System Requirements

  • Linux, macOS, or Windows
  • Minimum 512MB RAM
  • 100MB disk space
  • Go 1.17+ (only if building from source)
  • Network connectivity to backends
  • Telemetry-generating applications

Installation

Binary Installation

# Download the latest release (check GitHub for the current version)
OTEL_VERSION="0.88.0"
wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v${OTEL_VERSION}/otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz

tar -xzf otelcol-contrib_${OTEL_VERSION}_linux_amd64.tar.gz

# Install
sudo mv otelcol-contrib /usr/local/bin/
sudo chmod +x /usr/local/bin/otelcol-contrib

# Verify
otelcol-contrib --version

Docker Installation

# Pull image
docker pull otel/opentelemetry-collector-contrib:latest

# Run container
docker run -d \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 9411:9411 \
  -p 14250:14250 \
  -p 55679:55679 \
  -v $(pwd)/otel-collector-config.yml:/etc/otel-collector-config.yml \
  --name otel-collector \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otel-collector-config.yml
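
For repeatable deployments, the same container can be described declaratively. A minimal Docker Compose sketch (file and service names are illustrative):

```yaml
# docker-compose.yml (sketch)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    restart: unless-stopped
```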

Systemd Service

sudo tee /etc/systemd/system/otel-collector.service > /dev/null << 'EOF'
[Unit]
Description=OpenTelemetry Collector
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/otelcol-contrib --config=/etc/otel-collector/config.yml
Restart=on-failure
RestartSec=5

StandardOutput=journal
StandardError=journal
SyslogIdentifier=otel-collector

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable otel-collector
sudo systemctl start otel-collector

Receiver Configuration

OTLP Receivers

receivers:
  # OpenTelemetry Protocol over gRPC
  otlp/grpc:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

  # OpenTelemetry Protocol over HTTP
  otlp/http:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins: ["*"]  # permissive; restrict to trusted origins in production
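
Receivers only take effect once they are wired into a pipeline. A minimal end-to-end config that accepts OTLP and prints what it receives (the debug exporter is named logging in older collector releases):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```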

Prometheus Receiver

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['localhost:9090']
        
        - job_name: 'node-exporter'
          static_configs:
            - targets: ['localhost:9100']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: 'node_network_.*'
              action: drop

Jaeger Receiver

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268

Syslog Receiver

receivers:
  syslog:
    tcp:
      listen_address: 0.0.0.0:514  # ports below 1024 require elevated privileges
    protocol: rfc5424

Filelog Receiver

receivers:
  filelog:
    include:
      - /var/log/app/*.log
      - /var/log/syslog
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'
    operators:
      - type: json_parser
        parse_from: body
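
Beyond JSON, the filelog receiver's operator chain can parse free-form lines with regular expressions and promote timestamp and severity fields. A sketch assuming a `TIME LEVEL message` log format:

```yaml
receivers:
  filelog:
    include:
      - /var/log/app/*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\S+) (?P<sev>[A-Z]+) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S'
        severity:
          parse_from: attributes.sev
```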

Processors for Data Transformation

Batch Processor

processors:
  batch:
    send_batch_size: 1024
    timeout: 10s
    send_batch_max_size: 2048

Memory Limiter

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
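
The memory_limiter only protects the collector if it runs first in every pipeline, so backpressure is applied before data accumulates in later stages:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]  # memory_limiter must come first
      exporters: [otlp]
```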

Sampling Processor

processors:
  # Head-based sampling: the decision is made as spans arrive
  probabilistic_sampler:
    sampling_percentage: 10  # Sample 10% of traces

  # Or tail-based sampling: the decision is made after the whole trace is buffered
  tail_sampling:
    policies:
      - name: error-spans
        type: status_code
        status_code:
          status_codes: [ERROR]
      
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000
      
      - name: default-sampling
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
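
Tail sampling buffers complete traces before deciding, so the buffer itself needs tuning. A sketch of the top-level knobs (values are illustrative):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s               # how long to wait for a trace's spans to arrive
    num_traces: 50000                # max traces held in memory at once
    expected_new_traces_per_sec: 100
    policies:
      - name: error-spans
        type: status_code
        status_code:
          status_codes: [ERROR]
```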

Attribute Processor

processors:
  attributes:
    actions:
      - key: service.version
        value: 1.0.0
        action: insert
      
      - key: environment
        value: production
        action: insert
      
      - key: internal_id
        action: delete
      
      # Mask sensitive values: the attributes processor has no regex
      # replacement, so hash or delete sensitive keys instead
      - key: db.password
        action: hash

Resource Processor

processors:
  resource:
    attributes:
      - key: service.name
        value: my-service
        action: insert
      
      - key: host.name
        from_attribute: hostname
        action: insert
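
Instead of hard-coding resource attributes, the contrib resourcedetection processor can discover them from the environment at startup. A sketch:

```yaml
processors:
  resourcedetection:
    detectors: [env, system]  # cloud detectors such as ec2 and gcp also exist
    timeout: 5s
    override: false           # keep attributes already set by the SDK
```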

Span Processor

processors:
  span:
    name:
      to_attributes:
        rules:
          - ^/api/(?P<method>[^/]+)
          - ^(?P<operation>[^/]+)
    status:
      code: Error
      description: Span has error status

Exporters to Backends

Prometheus Exporter

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889  # 8888 is the collector's own telemetry port
    resource_to_telemetry_conversion:
      enabled: true

Jaeger Exporter

Note: the dedicated jaeger exporter was removed from collector releases after v0.85; on current builds, export to Jaeger (v1.35+) over OTLP instead. On older builds:

exporters:
  jaeger/grpc:
    endpoint: localhost:14250
    tls:
      insecure: true
  
  jaeger/http:
    endpoint: http://localhost:14268/api/traces
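
Since recent Jaeger releases ingest OTLP natively, a plain otlp exporter pointed at Jaeger's OTLP gRPC port works on current collector builds. A sketch (the hostname is illustrative):

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317  # Jaeger's OTLP gRPC port
    tls:
      insecure: true
```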

OTLP Exporters

exporters:
  # Export to external OTLP backend
  otlp/grpc:
    endpoint: otel-backend.example.com:4317
    tls:
      insecure: false

  # Export to Grafana Cloud
  otlp/http:
    endpoint: https://otlp-gateway-prod-us-central-1.grafana.net/otlp
    headers:
      Authorization: "Bearer YOUR_TOKEN"

Multiple Backends

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  
  jaeger/grpc:
    endpoint: jaeger:14250
  
  datadog:
    api:
      key: YOUR_API_KEY
      site: datadoghq.com
  
  otlp/datadog-apm:
    endpoint: http://localhost:4317

Service Configuration

Basic Service Configuration

service:
  pipelines:
    # Traces pipeline
    traces:
      receivers: [otlp/grpc, otlp/http, jaeger]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [jaeger/grpc, otlp/grpc]

    # Metrics pipeline
    metrics:
      receivers: [otlp/grpc, otlp/http, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheus, otlp/http]

    # Logs pipeline
    logs:
      receivers: [otlp/grpc, otlp/http, syslog, filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp/grpc]
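
The health endpoint used in the Troubleshooting section comes from the health_check extension, which must be both defined and listed under service.extensions. A sketch:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp/grpc]
      processors: [memory_limiter, batch]
      exporters: [otlp/grpc]
```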

Multi-Backend Configuration

service:
  pipelines:
    traces:
      receivers: [otlp/grpc]
      processors: [memory_limiter, tail_sampling, batch]
      exporters:
        - jaeger/grpc
        - otlp/datadog
        - otlp/honeycomb

    metrics:
      receivers: [otlp/grpc, prometheus]
      processors: [memory_limiter, batch]
      exporters:
        - prometheus
        - otlp/datadog
        - otlp/grafana

    logs:
      receivers: [otlp/grpc, syslog, filelog]
      processors: [memory_limiter, batch]
      exporters:
        - otlp/grafana
        - otlp/datadog

Advanced Scenarios

Conditional Routing

processors:
  routing:
    default_exporters:
      - otlp/default
    table:
      - value: production
        exporters: [otlp/prod]
      - value: staging
        exporters: [otlp/staging]
    from_attribute: environment

service:
  pipelines:
    traces:
      receivers: [otlp/grpc]
      processors: [routing, batch]
      exporters: [otlp/default, otlp/prod, otlp/staging]

Multi-Region Deployment

receivers:
  otlp/us_east:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  
  otlp/eu_west:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    traces:
      receivers: [otlp/us_east, otlp/eu_west]
      processors: [memory_limiter, batch]
      exporters:
        - jaeger/us_east
        - jaeger/eu_west

Collector as Gateway

receivers:
  otlp/app1:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  
  otlp/app2:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4318

processors:
  resource/app1:
    attributes:
      - key: app
        value: app1
        action: insert

  resource/app2:
    attributes:
      - key: app
        value: app2
        action: insert

service:
  pipelines:
    traces/app1:
      receivers: [otlp/app1]
      processors: [resource/app1, memory_limiter, batch]
      exporters: [jaeger/grpc]
    
    traces/app2:
      receivers: [otlp/app2]
      processors: [resource/app2, memory_limiter, batch]
      exporters: [jaeger/grpc]
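
In a typical agent/gateway deployment, a lightweight collector on each host forwards to a central gateway like the one above. A sketch of the agent-side config (the gateway hostname is illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  otlp:
    endpoint: gateway.internal:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```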

Performance Tuning

Memory Configuration

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024      # Overall limit
    spike_limit_mib: 256 # Spike allowance
  
  batch:
    send_batch_size: 2048
    timeout: 10s
    send_batch_max_size: 4096

extensions:
  memory_ballast:
    size_mib: 512  # Reserve memory (deprecated in newer releases; prefer the GOMEMLIMIT env var)

Queue Configuration

exporters:
  otlp/grpc:
    endpoint: backend:4317
    
    # Retry configuration
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 5m
    
    # Queue settings; storage: file_storage requires the file_storage
    # extension to be defined and enabled under service.extensions
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
      storage: file_storage
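
The storage: file_storage reference above assumes a file_storage extension that persists the queue across restarts. A sketch (the directory path is illustrative):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage
    timeout: 1s

service:
  extensions: [file_storage]
```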

Telemetry Configuration

service:
  extensions: [pprof, zpages]  # extensions must be listed here to start
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed

extensions:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

Troubleshooting

Check Collector Status

# Service status
systemctl status otel-collector

# View logs
journalctl -u otel-collector -f

# Check health (requires the health_check extension)
curl http://localhost:13133

Debug Configuration

# Validate configuration
otelcol-contrib validate --config=config.yml

# Run with debug logging
otelcol-contrib --config=config.yml --set=service.telemetry.logs.level=debug

# Enable pprof for profiling
curl http://localhost:1888/debug/pprof/

Monitor Metrics

# Export metrics endpoint
curl http://localhost:8888/metrics

# Check pipeline stats
curl http://localhost:8888/metrics | grep otelcol_

# Monitor memory usage
curl http://localhost:8888/metrics | grep process_runtime_go_
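
The collector's internal telemetry on port 8888 can itself be scraped by Prometheus, so you can alert on dropped spans or queue growth. A sketch of the Prometheus side:

```yaml
scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ['localhost:8888']
```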

Conclusion

The OpenTelemetry Collector provides a flexible, vendor-neutral way to collect and process telemetry data. By following this guide, you've deployed a telemetry pipeline capable of handling traces, metrics, and logs from diverse sources. Focus on designing efficient processor chains, setting appropriate memory limits, and using multiple exporters for comprehensive observability. This flexibility makes the collector essential infrastructure in modern observability architectures.