Log Shipping with Filebeat to Elasticsearch
Filebeat is a lightweight log shipper from the Elastic Beats family that monitors log files, collects events, and forwards them to Elasticsearch or Logstash with minimal CPU and memory overhead. Built-in modules for popular services such as nginx and MySQL, plus system logs, speed up pipeline setup, while processors and Ingest Node pipelines enable on-the-fly data enrichment.
Prerequisites
- Ubuntu/Debian or CentOS/Rocky Linux server
- Elasticsearch 8.x cluster accessible from Filebeat nodes
- Kibana 8.x (for dashboards and management)
- Filebeat 8.x (must match Elasticsearch version)
- Log files readable by the filebeat user
Installing Filebeat
# Install Filebeat via APT (Ubuntu/Debian)
sudo apt-get install -y apt-transport-https gnupg
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install -y filebeat
# For CentOS/Rocky Linux
sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/elasticsearch.repo > /dev/null <<EOF
[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install -y filebeat
# Verify installation
filebeat version
# Configure Elasticsearch connection
sudo nano /etc/filebeat/filebeat.yml
Basic filebeat.yml configuration:
# /etc/filebeat/filebeat.yml
# Output to Elasticsearch
output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  username: "elastic"
  password: "your-elastic-password"
  # For Elasticsearch with self-signed certs:
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  # Or for testing only (never in production):
  # ssl.verification_mode: none
# Kibana for dashboard imports
setup.kibana:
  host: "https://kibana:5601"
  username: "elastic"
  password: "your-elastic-password"
# ILM (Index Lifecycle Management) - manages index rotation
setup.ilm.enabled: true
setup.ilm.rollover_alias: "filebeat"
setup.ilm.pattern: "{now/d}-000001"
setup.ilm.policy_name: "filebeat-30d"
# Global settings
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s
# Processors applied to all events
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
Module Setup
Filebeat modules provide pre-configured parsers and Kibana dashboards:
# List available modules
filebeat modules list
# Enable common modules
sudo filebeat modules enable system nginx mysql
# Configure system module
sudo tee /etc/filebeat/modules.d/system.yml > /dev/null <<EOF
- module: system
  syslog:
    enabled: true
    var.paths: ["/var/log/syslog*", "/var/log/messages*"]
  auth:
    enabled: true
    var.paths: ["/var/log/auth.log*", "/var/log/secure*"]
EOF
# Configure nginx module
sudo tee /etc/filebeat/modules.d/nginx.yml > /dev/null <<EOF
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log*"]
  ingress_controller:
    enabled: false
EOF
# Configure MySQL module
sudo tee /etc/filebeat/modules.d/mysql.yml > /dev/null <<EOF
- module: mysql
  error:
    enabled: true
    var.paths: ["/var/log/mysql/error.log*"]
  slowlog:
    enabled: true
    var.paths: ["/var/log/mysql/mysql-slow.log*"]
EOF
# Set up dashboards and ILM policies
sudo filebeat setup --dashboards --index-management
# Start Filebeat
sudo systemctl enable --now filebeat
sudo systemctl status filebeat
# Verify logs are shipping
sudo filebeat test output
Custom Log Inputs
Configure Filebeat for custom application logs:
# /etc/filebeat/filebeat.yml - inputs section
filebeat.inputs:
  # JSON application logs
  - type: filestream
    id: app-json-logs
    enabled: true
    paths:
      - /var/log/app/*.log
      - /var/log/app/**/*.json
    parsers:
      - ndjson:
          target: "json"
          add_error_key: true
          overwrite_keys: true
    fields:
      service: "webapp"
      environment: "production"
    fields_under_root: true
    tags: ["json", "application"]

  # Multiple applications with labels
  - type: filestream
    id: api-gateway-logs
    enabled: true
    paths:
      - /var/log/api-gateway/access.log
    processors:
      - dissect:
          tokenizer: '%{client_ip} - %{user} [%{timestamp}] "%{method} %{path} %{protocol}" %{status} %{bytes}'
          field: "message"
          target_prefix: "http"
    fields:
      service: "api-gateway"
    fields_under_root: true

  # Docker container logs (metadata is enriched from the Docker socket)
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"
          labels.dedot: true
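Before enabling the ndjson parser above, it is worth confirming that the application really emits one valid JSON object per line; malformed lines only gain an error field because add_error_key is set, and they clutter the index. A quick offline check (the sample file and its contents are invented for illustration):

```shell
# Write a sample of what the app is assumed to emit (hypothetical content).
cat > /tmp/sample-app.log <<'EOF'
{"level":"info","msg":"user login","user":"alice"}
{"level":"error","msg":"db timeout","retries":3}
not-json line that would trigger add_error_key
EOF

# Count lines that fail to parse as JSON.
bad=0
while IFS= read -r line; do
  printf '%s' "$line" | python3 -m json.tool >/dev/null 2>&1 || bad=$((bad+1))
done < /tmp/sample-app.log
echo "non-JSON lines: $bad"   # → non-JSON lines: 1
```

Run the same loop against a slice of the real log file before pointing Filebeat at it.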
Multiline Log Handling
Handle Java stack traces, Python tracebacks, and other multi-line logs:
filebeat.inputs:
  # Java application logs with stack traces
  - type: filestream
    id: java-app
    paths:
      - /var/log/java-app/app.log
    # For filestream inputs, multiline is configured as a parser
    parsers:
      - multiline:
          # Start a new event when a line begins with a timestamp
          type: pattern
          pattern: '^\d{4}-\d{2}-\d{2}'
          negate: true
          match: after
          max_lines: 200
          timeout: 5s

  # Python traceback handling
  - type: filestream
    id: python-app
    paths:
      - /var/log/python-app/*.log
    parsers:
      - multiline:
          type: pattern
          # New log entries start with a log level
          pattern: '^(INFO|WARNING|ERROR|DEBUG|CRITICAL)'
          negate: true
          match: after

  # Log4j/Logback XML format
  - type: filestream
    id: log4j-app
    paths:
      - /var/log/log4j-app/*.log
    parsers:
      - multiline:
          type: pattern
          # Each event starts with the opening <log4j tag
          pattern: '^<log4j'
          negate: true
          match: after

  # Go panic messages
  - type: filestream
    id: go-app
    paths:
      - /var/log/go-app/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^goroutine \d+ \['
          negate: true
          match: after
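A multiline pattern can be sanity-checked offline before restarting Filebeat: with negate: true and match: after, every line that matches the pattern starts a new event, and everything else is folded into the previous one. Counting pattern matches with grep (using the POSIX [0-9] equivalent of the Java timestamp pattern above; the log excerpt is invented):

```shell
# A hypothetical log excerpt: one stack trace between two normal entries.
cat > /tmp/java-sample.log <<'EOF'
2024-01-15 10:00:01 ERROR Request failed
java.lang.NullPointerException: null
    at com.example.App.handle(App.java:42)
    at com.example.App.main(App.java:10)
2024-01-15 10:00:02 INFO Recovered
EOF

# Lines matching the event-start pattern = events Filebeat will emit.
starts=$(grep -cE '^[0-9]{4}-[0-9]{2}-[0-9]{2}' /tmp/java-sample.log)
echo "events: $starts"   # → events: 2 (5 lines collapse into 2 events)
```

If the count equals the number of raw lines, the pattern is not matching anything and every line will ship as its own event.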
Processors and Field Enrichment
filebeat.inputs:
  - type: filestream
    id: webapp
    paths:
      - /var/log/webapp/access.log
    processors:
      # Parse nginx combined log format with dissect
      - dissect:
          tokenizer: '%{source.ip} - %{user.name} [%{@timestamp}] "%{http.request.method} %{url.path} HTTP/%{http.version}" %{http.response.status_code} %{http.response.body.bytes}'
          field: "message"
          target_prefix: ""
      # Convert status code and byte count to numeric types
      - convert:
          fields:
            - {from: "http.response.status_code", type: "integer"}
            - {from: "http.response.body.bytes", type: "long"}
      # Add custom fields
      - add_fields:
          target: ''
          fields:
            datacenter: "us-east-1"
            cluster: "production"
      # Drop health check logs to reduce noise
      - drop_event:
          when:
            or:
              - contains:
                  url.path: "/health"
              - contains:
                  url.path: "/metrics"

# Note: GeoIP lookup is not a Filebeat processor. Do that enrichment
# server-side with the geoip processor in an Elasticsearch ingest
# pipeline (see the Ingest Node Pipelines section).

# Global processors
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  # Rename fields to match your schema
  - rename:
      fields:
        - {from: "host.name", to: "agent.hostname"}
      ignore_missing: true
  # Drop specific fields to reduce storage
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.id", "ecs.version"]
      ignore_missing: true
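To estimate how much noise the drop_event condition above will remove, you can replay an existing access log through an equivalent grep filter before deploying the change (the sample entries below are fabricated):

```shell
# Fabricated access-log sample with two health/metrics probes.
cat > /tmp/access-sample.log <<'EOF'
10.0.0.1 - - [15/Jan/2024:10:00:00 +0000] "GET /api/users HTTP/1.1" 200 512
10.0.0.2 - - [15/Jan/2024:10:00:01 +0000] "GET /health HTTP/1.1" 200 2
10.0.0.2 - - [15/Jan/2024:10:00:02 +0000] "GET /metrics HTTP/1.1" 200 900
10.0.0.3 - - [15/Jan/2024:10:00:03 +0000] "POST /api/orders HTTP/1.1" 201 128
EOF

# Lines surviving the /health|/metrics filter = events that would ship.
total=$(grep -c . /tmp/access-sample.log)
kept=$(grep -vcE '/health|/metrics' /tmp/access-sample.log)
echo "would ship $kept of $total events"   # → would ship 2 of 4 events
```

On a busy host with frequent load-balancer probes, this ratio is often the single biggest storage win.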
Kibana Dashboards
# Import module dashboards
sudo filebeat setup --dashboards
# Verify dashboards were created in Kibana
curl -s -u elastic:password \
"https://kibana:5601/api/saved_objects/_find?type=dashboard&search=Filebeat" | \
jq '.saved_objects[].attributes.title'
# Create a custom index template
curl -s -X PUT \
-H "Content-Type: application/json" \
-u elastic:password \
"https://elasticsearch:9200/_index_template/filebeat-custom" \
-d '{
"index_patterns": ["filebeat-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "filebeat-policy"
},
"mappings": {
"properties": {
"http.response.status_code": {"type": "integer"},
"http.response.body.bytes": {"type": "long"},
"source.geo.location": {"type": "geo_point"}
}
}
}
}'
# Key dashboards available after setup:
# [Filebeat System] Syslog dashboard
# [Filebeat Nginx] Overview
# [Filebeat MySQL] Overview
# Access via Kibana > Dashboards > search "Filebeat"
Ingest Node Pipelines
Use Elasticsearch Ingest Node for server-side processing:
# Create a custom ingest pipeline for application logs
curl -s -X PUT \
-H "Content-Type: application/json" \
-u elastic:password \
"https://elasticsearch:9200/_ingest/pipeline/webapp-logs" \
-d '{
"description": "Process webapp logs",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{IPORHOST:source_ip} - %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{NOTSPACE:path} HTTP/%{NUMBER:http_version}\" %{NUMBER:status_code:int} %{NUMBER:bytes_sent:int} \"%{DATA:referrer}\" \"%{DATA:user_agent}\""
]
}
},
{
"date": {
"field": "timestamp",
"formats": ["dd/MMM/yyyy:HH:mm:ss Z"],
"timezone": "UTC"
}
},
{
"geoip": {
"field": "source_ip",
"target_field": "geo"
}
},
{
"user_agent": {
"field": "user_agent"
}
},
{
"remove": {
"field": ["message", "timestamp"]
}
}
]
}'
# Configure Filebeat to use the pipeline
# In filebeat.yml output.elasticsearch section:
# output.elasticsearch:
# pipeline: webapp-logs
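The pipeline can be dry-run with the Elasticsearch _simulate API before any real traffic flows through it; nothing is indexed. The host and credentials below reuse the placeholders from earlier in this guide, and the sample log line is invented:

```shell
# Request body: one sample access-log line for the webapp-logs pipeline.
cat > /tmp/simulate.json <<'EOF'
{
  "docs": [
    {
      "_source": {
        "message": "203.0.113.5 - alice [15/Jan/2024:10:00:00 +0000] \"GET /api/users HTTP/1.1\" 200 512 \"-\" \"curl/8.0\""
      }
    }
  ]
}
EOF

# Validate the JSON locally before sending it.
python3 -m json.tool /tmp/simulate.json >/dev/null && echo "body OK"

# Dry-run the pipeline; the response shows the transformed document.
curl -s --connect-timeout 5 -X POST \
  -H "Content-Type: application/json" \
  -u elastic:password \
  "https://elasticsearch:9200/_ingest/pipeline/webapp-logs/_simulate" \
  -d @/tmp/simulate.json || echo "note: Elasticsearch not reachable"
```

If the grok processor fails on your real log format, the response contains the error instead of a silently dropped field, which makes _simulate far faster to iterate on than reshipping logs.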
Troubleshooting
Filebeat not shipping logs:
# Test output connectivity
sudo filebeat test output
# Check Filebeat logs
sudo journalctl -u filebeat -n 50
# Or, when file logging is enabled (8.x writes NDJSON log files):
sudo tail -f /var/log/filebeat/filebeat-*.ndjson
# Enable debug logging temporarily
sudo filebeat -e -d "*" 2>&1 | head -100
# Check file permissions
sudo -u filebeat ls /var/log/nginx/access.log
# Add filebeat to the adm group if needed:
sudo usermod -a -G adm filebeat
Logs appear in wrong index:
# Check the data view in Kibana
# In 8.x, Filebeat writes to a data stream named filebeat-<version>
# Verify data stream
curl -s -u elastic:password \
"https://elasticsearch:9200/_cat/indices/filebeat-*?v" | head -5
# List ILM policies and confirm yours exists
curl -s -u elastic:password \
  "https://elasticsearch:9200/_ilm/policy" | jq 'keys'
Multiline events not combining correctly:
# Test multiline pattern manually
echo "2024-01-15 ERROR Main thread
at java.lang.Exception
at com.example.App.main" | filebeat -e \
-E "filebeat.inputs=[{type:stdin,multiline.pattern:'^[0-9]{4}',multiline.negate:true,multiline.match:after}]" \
-E "output.console.pretty=true"
High CPU usage:
# Reduce file-scanning and harvester overhead
# In filebeat.yml (filestream input options):
filebeat.inputs:
  - type: filestream
    # ...
    prospector.scanner.check_interval: 30s  # scan for new files less often (default 10s)
    close.on_state_change.inactive: 2m      # release idle file handles sooner (default 5m)
Conclusion
Filebeat provides a reliable, lightweight foundation for shipping logs to Elasticsearch, with built-in modules that handle parsing for dozens of popular applications out of the box. Custom inputs with dissect and grok processors cover arbitrary log formats, while Ingest Node pipelines offload transformation to the Elasticsearch cluster. Combine Filebeat with Kibana dashboards and ILM policies for a complete, auto-rotating log management pipeline that scales from a single server to thousands of nodes.