Log Shipping with Filebeat to Elasticsearch

Filebeat is a lightweight log shipper from the Elastic Beats family that monitors log files, collects events, and forwards them to Elasticsearch or Logstash with minimal CPU and memory overhead. With built-in modules for popular software like nginx, MySQL, and system logs, Filebeat accelerates log pipeline setup while its processors and Ingest Node pipelines enable on-the-fly data enrichment.

Prerequisites

  • Ubuntu/Debian or CentOS/Rocky Linux server
  • Elasticsearch 8.x cluster accessible from Filebeat nodes
  • Kibana 8.x (for dashboards and management)
  • Filebeat 8.x (the major version should match Elasticsearch; Filebeat must not be newer than the cluster it ships to)
  • Log files readable by the filebeat user

Installing Filebeat

# Install Filebeat via APT (Ubuntu/Debian)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install -y apt-transport-https
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install -y filebeat

# For CentOS/Rocky Linux
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
cat > /etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

sudo yum install -y filebeat

# Verify installation
filebeat version

# Configure Elasticsearch connection
sudo nano /etc/filebeat/filebeat.yml

Basic filebeat.yml configuration:

# /etc/filebeat/filebeat.yml

# Output to Elasticsearch
output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  username: "elastic"
  password: "your-elastic-password"
  # For Elasticsearch with self-signed certs, trust the CA:
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  # Or skip certificate verification (testing only - never in production):
  # ssl.verification_mode: none

# Kibana for dashboard imports
setup.kibana:
  host: "https://kibana:5601"
  username: "elastic"
  password: "your-elastic-password"

# ILM (Index Lifecycle Management) - manages index rotation
# In 8.x Filebeat writes to a data stream, so rollover aliases and index
# patterns are handled automatically; only the policy is configurable here
setup.ilm.enabled: true
setup.ilm.policy_name: "filebeat-30d"

# Global settings
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s

# Processors applied to all events
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
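The "filebeat-30d" policy name above is a custom name, so the policy must exist in Elasticsearch before Filebeat references it. A minimal 30-day policy body, applied with `PUT _ilm/policy/filebeat-30d`, might look like the following sketch (rollover thresholds are illustrative, not recommendations):

```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Indices roll over daily or at 50 GB, whichever comes first, and are deleted 30 days after rollover.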

Module Setup

Filebeat modules provide pre-configured parsers and Kibana dashboards:

# List available modules
filebeat modules list

# Enable common modules
sudo filebeat modules enable system nginx mysql

# Configure system module
cat > /etc/filebeat/modules.d/system.yml <<EOF
- module: system
  syslog:
    enabled: true
    var.paths: ["/var/log/syslog*", "/var/log/messages*"]
  auth:
    enabled: true
    var.paths: ["/var/log/auth.log*", "/var/log/secure*"]
EOF

# Configure nginx module
cat > /etc/filebeat/modules.d/nginx.yml <<EOF
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log*"]
  ingress_controller:
    enabled: false
EOF

# Configure MySQL module
cat > /etc/filebeat/modules.d/mysql.yml <<EOF
- module: mysql
  error:
    enabled: true
    var.paths: ["/var/log/mysql/error.log*"]
  slowlog:
    enabled: true
    var.paths: ["/var/log/mysql/mysql-slow.log*"]
EOF

# Set up dashboards and ILM policies
sudo filebeat setup --dashboards --index-management

# Start Filebeat
sudo systemctl enable --now filebeat
sudo systemctl status filebeat

# Test the connection to Elasticsearch (verifies connectivity, not delivery)
sudo filebeat test output
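One subtlety with the var.paths globs above: a pattern like access.log* also matches compressed rotations, which Filebeat cannot read. A quick sketch in a scratch directory (paths here are throwaway examples) shows what the glob picks up; for filestream inputs, such files can be skipped with `prospector.scanner.exclude_files: ['\.gz$']`:

```shell
# Simulate a rotated nginx log directory (scratch path, illustrative only)
demo=$(mktemp -d)
touch "$demo/access.log" "$demo/access.log.1" "$demo/access.log.2.gz"

# The access.log* glob matches the gzip rotation too,
# even though Filebeat cannot parse compressed files
ls "$demo"/access.log*
```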

Custom Log Inputs

Configure Filebeat for custom application logs:

# /etc/filebeat/filebeat.yml - inputs section
filebeat.inputs:
  # JSON application logs
  - type: filestream
    id: app-json-logs
    enabled: true
    paths:
      - /var/log/app/*.log
      - /var/log/app/**/*.json
    parsers:
      - ndjson:
          target: "json"
          add_error_key: true
          overwrite_keys: true
    fields:
      service: "webapp"
      environment: "production"
    fields_under_root: true
    tags: ["json", "application"]

  # Multiple applications with labels
  - type: filestream
    id: api-gateway-logs
    enabled: true
    paths:
      - /var/log/api-gateway/access.log
    processors:
      - dissect:
          tokenizer: '%{client_ip} - %{user} [%{timestamp}] "%{method} %{path} %{protocol}" %{status} %{bytes}'
          field: "message"
          target_prefix: "http"
    fields:
      service: "api-gateway"
    fields_under_root: true

  # Docker container logs (the input tails the container JSON log files;
  # the Docker socket below is only used to enrich events with metadata)
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"
          labels.dedot: true
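Before pointing the ndjson parser at application logs, it is worth confirming the files really are one JSON object per line; lines that fail to parse end up with an error key instead of structured fields when add_error_key is set. A quick local check against sample lines (the content is illustrative):

```shell
# Validate sample NDJSON lines locally before shipping them
printf '%s\n' \
  '{"level":"info","msg":"service started"}' \
  '{"level":"error","msg":"db timeout","attempt":3}' \
  | python3 -m json.tool --json-lines > /dev/null \
  && echo "all lines parse as JSON"
```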

Multiline Log Handling

Handle Java stack traces, Python tracebacks, and other multi-line logs:

# With the filestream input, multiline is configured under parsers
# (top-level multiline settings apply only to the older log input)
filebeat.inputs:
  # Java application logs with stack traces
  - type: filestream
    id: java-app
    paths:
      - /var/log/java-app/app.log
    parsers:
      - multiline:
          # Start a new event when a line begins with a timestamp
          type: pattern
          pattern: '^\d{4}-\d{2}-\d{2}'
          negate: true
          match: after
          max_lines: 200
          timeout: 5s

  # Python traceback handling
  - type: filestream
    id: python-app
    paths:
      - /var/log/python-app/*.log
    parsers:
      - multiline:
          type: pattern
          # New log entries start with a log level
          pattern: '^(INFO|WARNING|ERROR|DEBUG|CRITICAL)'
          negate: true
          match: after

  # Log4j XML-formatted events
  - type: filestream
    id: log4j-app
    paths:
      - /var/log/log4j-app/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^<log4j'
          negate: false
          match: after

  # Go panics and goroutine dumps
  - type: filestream
    id: go-app
    paths:
      - /var/log/go-app/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^goroutine \d+ \['
          negate: true
          match: after
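The pattern/negate/match semantics are easy to get backwards: with negate: true and match: after, a line matching the pattern starts a new event and every non-matching line is appended to the previous one. Counting pattern matches in a sample therefore predicts the event count; a rough check with grep (sample lines are illustrative):

```shell
# Two lines match the timestamp pattern, so these four
# lines would collapse into two multiline events
printf '%s\n' \
  '2024-01-15 10:00:01 ERROR Unhandled exception' \
  'java.lang.NullPointerException: null' \
  '    at com.example.App.main(App.java:42)' \
  '2024-01-15 10:00:02 INFO Request handled' \
  | grep -cE '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
# prints 2
```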

Processors and Field Enrichment

filebeat.inputs:
  - type: filestream
    id: webapp
    paths:
      - /var/log/webapp/access.log
    processors:
      # Parse nginx combined log format with dissect
      - dissect:
          tokenizer: '%{source.ip} - %{user.name} [%{@timestamp}] "%{http.request.method} %{url.path} HTTP/%{http.version}" %{http.response.status_code} %{http.response.body.bytes}'
          field: "message"
          target_prefix: ""
      # Note: Filebeat itself has no geoip processor; GeoIP enrichment is
      # done server-side with the geoip processor in an Elasticsearch
      # ingest pipeline (see the Ingest Node Pipelines section)
      # Convert status code to integer
      - convert:
          fields:
            - {from: "http.response.status_code", type: integer}
            - {from: "http.response.body.bytes", type: long}
      # Add custom fields
      - add_fields:
          target: ''
          fields:
            datacenter: "us-east-1"
            cluster: "production"
      # Drop health check logs to reduce noise
      - drop_event:
          when:
            or:
              - contains:
                  url.path: "/health"
              - contains:
                  url.path: "/metrics"

# Global processors
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  # Rename fields to suit downstream consumers (adjust as needed)
  - rename:
      fields:
        - {from: "host.name", to: "agent.hostname"}
      ignore_missing: true
  # Drop specific fields to reduce storage
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.id", "ecs.version"]
      ignore_missing: true
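To sanity-check a dissect tokenizer before deploying it, the same line can be split by hand. A rough awk sketch against one combined-format line (field positions are assumed from the log format; this is not a full dissect implementation):

```shell
# One nginx combined-format access line (content is illustrative)
line='203.0.113.7 - alice [15/Jan/2024:10:00:01 +0000] "GET /orders HTTP/1.1" 200 1532'

echo "$line" | awk '{
  gsub(/"/, "")                       # strip quotes so fields split cleanly
  print "source.ip=" $1
  print "http.request.method=" $6
  print "url.path=" $7
  print "http.response.status_code=" $9
}'
```

If the hand-split fields disagree with what the tokenizer extracts, the tokenizer (not the log format) usually needs adjusting.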

Kibana Dashboards

# Import module dashboards
sudo filebeat setup --dashboards

# Verify dashboards were created in Kibana
curl -s -u elastic:password \
  "https://kibana:5601/api/saved_objects/_find?type=dashboard&search=Filebeat" | \
  jq '.saved_objects[].attributes.title'

# Create a custom index template
curl -s -X PUT \
  -H "Content-Type: application/json" \
  -u elastic:password \
  "https://elasticsearch:9200/_index_template/filebeat-custom" \
  -d '{
    "index_patterns": ["filebeat-*"],
    "template": {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
        "index.lifecycle.name": "filebeat-policy"
      },
      "mappings": {
        "properties": {
          "http.response.status_code": {"type": "integer"},
          "http.response.body.bytes": {"type": "long"},
          "source.geo.location": {"type": "geo_point"}
        }
      }
    }
  }'

# Key dashboards available after setup:
# [Filebeat System] Syslog dashboard
# [Filebeat Nginx] Overview
# [Filebeat MySQL] Overview
# Access via Kibana > Dashboards > search "Filebeat"

Ingest Node Pipelines

Use Elasticsearch Ingest Node for server-side processing:

# Create a custom ingest pipeline for application logs
curl -s -X PUT \
  -H "Content-Type: application/json" \
  -u elastic:password \
  "https://elasticsearch:9200/_ingest/pipeline/webapp-logs" \
  -d '{
    "description": "Process webapp logs",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{IPORHOST:source_ip} - %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{NOTSPACE:path} HTTP/%{NUMBER:http_version}\" %{NUMBER:status_code:int} %{NUMBER:bytes_sent:int} \"%{DATA:referrer}\" \"%{DATA:user_agent}\""
          ]
        }
      },
      {
        "date": {
          "field": "timestamp",
          "formats": ["dd/MMM/yyyy:HH:mm:ss Z"],
          "timezone": "UTC"
        }
      },
      {
        "geoip": {
          "field": "source_ip",
          "target_field": "geo"
        }
      },
      {
        "user_agent": {
          "field": "user_agent"
        }
      },
      {
        "remove": {
          "field": ["message", "timestamp"]
        }
      }
    ]
  }'

# Configure Filebeat to use the pipeline
# In filebeat.yml output.elasticsearch section:
# output.elasticsearch:
#   pipeline: webapp-logs
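The date processor's format string must match the grok HTTPDATE capture exactly, or documents fail to index with parse errors. The Java-style layout dd/MMM/yyyy:HH:mm:ss Z corresponds to Python's %d/%b/%Y:%H:%M:%S %z, which allows a quick local check (sample timestamp is illustrative):

```shell
python3 -c '
from datetime import datetime
# Same layout the ingest date processor expects: dd/MMM/yyyy:HH:mm:ss Z
ts = "15/Jan/2024:10:00:01 +0000"
print(datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").isoformat())
'
# prints 2024-01-15T10:00:01+00:00
```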

Troubleshooting

Filebeat not shipping logs:

# Validate the configuration file first
sudo filebeat test config

# Then test output connectivity
sudo filebeat test output

# Check Filebeat logs
sudo journalctl -u filebeat -n 50
# Or:
sudo tail -f /var/log/filebeat/filebeat

# Enable debug logging temporarily
sudo filebeat -e -d "*" 2>&1 | head -100

# Check file permissions
sudo -u filebeat ls /var/log/nginx/access.log
# Add filebeat to the adm group if needed:
sudo usermod -a -G adm filebeat

Logs appear in wrong index:

# Check where events are landing in Kibana
# In 8.x, Filebeat writes to a data stream named filebeat-<version>;
# backing indices look like .ds-filebeat-<version>-YYYY.MM.dd-000001

# Verify data stream
curl -s -u elastic:password \
  "https://elasticsearch:9200/_cat/indices/filebeat-*?v" | head -5

# Check the ILM policy (the default policy is simply named "filebeat")
curl -s -u elastic:password \
  "https://elasticsearch:9200/_ilm/policy/filebeat" | jq

Multiline events not combining correctly:

# Test multiline pattern manually
echo "2024-01-15 ERROR Main thread
    at java.lang.Exception
    at com.example.App.main" | filebeat -e \
  -E "filebeat.inputs=[{type:stdin,multiline.pattern:'^[0-9]{4}',multiline.negate:true,multiline.match:after}]" \
  -E "output.console.pretty=true"

High CPU usage:

# Tune harvester buffering and file-handle behavior
# In filebeat.yml:
filebeat.inputs:
  - type: filestream
    # ...
    harvester_buffer_size: 65536          # read in larger chunks (default 16384)
    close.on_state_change.inactive: 5m    # close handles for idle files
    # Scan for new files less often (default 10s)
    prospector.scanner.check_interval: 30s

Conclusion

Filebeat provides a reliable, lightweight foundation for shipping logs to Elasticsearch, with built-in modules that handle parsing for dozens of popular applications out of the box. Custom inputs with dissect and grok processors cover most log formats, while ingest pipelines offload transformation to the Elasticsearch cluster. Combined with Kibana dashboards and ILM policies, the result is a complete, auto-rotating log management pipeline that scales from a single server to thousands of nodes.