Fluent Bit Lightweight Log Processor

Fluent Bit is a fast, lightweight log processor and forwarder written in C, designed for high-throughput log collection with minimal CPU and memory usage. This guide covers deploying Fluent Bit on Linux, configuring input plugins, parsing rules, filters, output destinations, and Kubernetes integration for memory-efficient log forwarding.

Prerequisites

  • Ubuntu 20.04+ / Debian 11+ or CentOS 8+ / Rocky Linux 8+
  • 64 MB RAM (typical production usage is under 5 MB)
  • Root or sudo access
  • Log sources: systemd, files, TCP/UDP syslog, etc.

Installing Fluent Bit

# Ubuntu 22.04/24.04
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh

# Or manually via APT
curl -fsSL https://packages.fluentbit.io/fluentbit.key | sudo gpg --dearmor -o /usr/share/keyrings/fluentbit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/fluentbit-keyring.gpg] https://packages.fluentbit.io/ubuntu/$(lsb_release -cs) $(lsb_release -cs) main" \
  | sudo tee /etc/apt/sources.list.d/fluent-bit.list

sudo apt-get update && sudo apt-get install -y fluent-bit

# CentOS/Rocky Linux
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
# or:
sudo rpm --import https://packages.fluentbit.io/fluentbit.key
sudo tee /etc/yum.repos.d/fluent-bit.repo > /dev/null << 'EOF'
[fluent-bit]
name = Fluent Bit
baseurl = https://packages.fluentbit.io/centos/8/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1
EOF
sudo dnf install -y fluent-bit

# Verify installation
fluent-bit --version

# Enable and start
sudo systemctl enable --now fluent-bit
sudo systemctl status fluent-bit

Configuration Structure

Fluent Bit uses a classic INI-like format (.conf) or YAML. Config files live in /etc/fluent-bit/:

/etc/fluent-bit/
├── fluent-bit.conf   # Main config
├── parsers.conf      # Parser definitions
└── plugins.conf      # External plugins

The main config has sections: [SERVICE], [INPUT], [FILTER], [OUTPUT]. Data flows from INPUT through FILTERs to OUTPUT.

# /etc/fluent-bit/fluent-bit.conf

[SERVICE]
    # Flush buffered records every 5 seconds
    Flush         5
    # One of: debug, info, warn, error
    Log_Level     info
    # systemd manages the process
    Daemon        Off
    Parsers_File  parsers.conf
    # Built-in HTTP server for metrics
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

[INPUT]
    Name   tail
    Path   /var/log/nginx/access.log
    Tag    nginx.access
    Parser nginx

[OUTPUT]
    Name  stdout
    Match *

Input Plugins

Tail (follow log files):

[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    # Add the source filename to each record
    Path_Key          filename
    Tag               app.*
    Parser            json
    # Persist file offsets across restarts
    DB                /var/lib/fluent-bit/tail.db
    Refresh_Interval  10
    # Seconds to keep a rotated file open
    Rotate_Wait       30
    Skip_Long_Lines   On
    Mem_Buf_Limit     50MB

Systemd journal:

[INPUT]
    Name            systemd
    Tag             systemd.*
    Systemd_Filter  _SYSTEMD_UNIT=nginx.service
    Systemd_Filter  _SYSTEMD_UNIT=postgresql.service
    Read_From_Tail  On
    # Remove the leading _ from journal field names
    Strip_Underscores On

TCP/UDP Syslog:

[INPUT]
    Name    syslog
    Parser  syslog-rfc5424
    Listen  0.0.0.0
    Port    5140
    Mode    tcp
    Tag     syslog
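
The <pri> value at the start of each RFC 5424 message packs facility and severity into a single integer (pri = facility * 8 + severity); the syslog-rfc5424 parser shown in the next section captures it as pri. A small decoder for reference:

```python
# Decode an RFC 5424 priority value: pri = facility * 8 + severity.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

def decode_pri(pri: int) -> tuple[int, str]:
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]

print(decode_pri(165))  # → (20, 'notice') i.e. local4/notice
```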

Forward (receive from Fluentd/other Fluent Bit):

[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224
    Buffer_Chunk_Size 1M
    Buffer_Max_Size   6M

Parsing Logs

Define parsers in /etc/fluent-bit/parsers.conf:

[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name        json
    Format      json
    Time_Key    timestamp
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    Time_Keep   On

[PARSER]
    Name        syslog-rfc5424
    Format      regex
    Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
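
Fluent Bit's regex parsers use the Onigmo engine, whose named groups are written (?<name>…); Python's re module spells them (?P<name>…). With that one substitution you can sanity-check the nginx pattern, and its Time_Format, offline before deploying:

```python
import re
from datetime import datetime

# Onigmo (?<name>...) groups -> Python (?P<name>...). The blind substitution
# is safe here because the pattern contains no lookbehinds (also "(?<").
onigmo = (r'^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] '
          r'"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" '
          r'(?<code>[^ ]*) (?<size>[^ ]*)'
          r'(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$')
pattern = re.compile(onigmo.replace("(?<", "(?P<"))

line = ('127.0.0.1 - alice [01/Jan/2024:12:00:00 +0000] '
        '"GET /index.html HTTP/1.1" 200 1234 "-" "curl/8.5"')
m = pattern.match(line)
print(m.groupdict())

# Verify Time_Format (%d/%b/%Y:%H:%M:%S %z) against the captured time field
ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
print(ts.isoformat())  # 2024-01-01T12:00:00+00:00
```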

Filtering

Grep filter (include/exclude records):

# Keep only error-level logs
[FILTER]
    Name  grep
    Match app.*
    Regex level (error|critical|fatal)

# Exclude health check endpoints
[FILTER]
    Name    grep
    Match   nginx.*
    Exclude path /health

Record Modifier (add/remove fields):

[FILTER]
    Name          record_modifier
    Match         *
    Record        hostname ${HOSTNAME}
    Record        environment production
    Remove_key    password
    Remove_key    secret

Lua filter (custom logic):

[FILTER]
    Name    lua
    Match   app.*
    Script  /etc/fluent-bit/transform.lua
    Call    transform_record

-- /etc/fluent-bit/transform.lua
function transform_record(tag, timestamp, record)
    -- Normalize log level
    if record["level"] then
        record["level"] = string.upper(record["level"])
    end

    -- Add derived fields (tonumber guards against string values)
    local duration = tonumber(record["duration_ms"])
    if duration and duration > 1000 then
        record["slow_request"] = true
    end

    return 1, timestamp, record
end
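
Because the Lua runtime embedded in Fluent Bit is awkward to unit-test, it can help to keep an offline mirror of the transform rules in a language with a test runner. A Python sketch of the same logic (illustrative only; Fluent Bit runs the Lua version):

```python
def transform_record(record: dict) -> dict:
    """Offline mirror of transform.lua's rules, for quick testing."""
    # Normalize log level
    if "level" in record:
        record["level"] = record["level"].upper()
    # Flag slow requests (coerce in case the field arrived as a string)
    try:
        duration = float(record.get("duration_ms"))
    except (TypeError, ValueError):
        duration = None
    if duration is not None and duration > 1000:
        record["slow_request"] = True
    return record

print(transform_record({"level": "error", "duration_ms": 1500}))
# → {'level': 'ERROR', 'duration_ms': 1500, 'slow_request': True}
```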

Throttle filter (rate limiting):

[FILTER]
    Name      throttle
    Match     *
    # Average limit of 100 records per interval (1s by default),
    # computed over a sliding window of 5 intervals
    Rate      100
    Window    5
    Print_Status On

Output Destinations

Elasticsearch / OpenSearch:

[OUTPUT]
    Name            es
    Match           *
    Host            elasticsearch.example.com
    Port            9200
    Index           logs
    # Elasticsearch 8+ rejects mapping types; replaces the old Type _doc
    Suppress_Type_Name On
    Logstash_Format On
    Logstash_Prefix fluentbit
    Time_Key        @timestamp
    HTTP_User       elastic
    HTTP_Passwd     yourpassword
    tls             On
    tls.verify      On

Grafana Loki:

[OUTPUT]
    Name            loki
    Match           *
    Host            loki.example.com
    Port            3100
    Labels          job=fluent-bit,env=production,host=${HOSTNAME}
    Label_Keys      $app,$level
    Line_Format     json

S3 (archive):

[OUTPUT]
    Name                         s3
    Match                        *
    Region                       us-east-1
    Bucket                       my-log-archive
    Total_File_Size              100M
    Upload_Timeout               10m
    S3_Key_Format                /logs/%Y/%m/%d/%H/$UUID.gz
    Compression                  gzip
    Content_Type                 application/json
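
S3_Key_Format mixes strftime specifiers (filled from the chunk's timestamp) with plugin tokens like $UUID. A quick way to preview the object keys this format will produce (the real expansion happens inside the s3 plugin):

```python
import uuid
from datetime import datetime, timezone

# Approximate the expansion of /logs/%Y/%m/%d/%H/$UUID.gz:
# strftime fills the date parts, $UUID becomes a random identifier.
now = datetime.now(timezone.utc)
key = now.strftime("/logs/%Y/%m/%d/%H/") + str(uuid.uuid4()) + ".gz"
print(key)  # e.g. /logs/2024/01/01/12/3f2b8c1e-....gz
```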

Forward to another Fluent Bit or Fluentd:

[OUTPUT]
    Name          forward
    Match         *
    Host          aggregator.internal
    Port          24224
    Shared_Key    mysecretkey
    Self_Hostname ${HOSTNAME}
    tls           On

Kubernetes Integration

Deploy Fluent Bit as a DaemonSet in Kubernetes using the official Helm chart:

# Add Helm repo
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

# Create values override
cat > fluent-bit-values.yaml << 'EOF'
config:
  inputs: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On

  filters: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

  outputs: |
    [OUTPUT]
        Name  loki
        Match kube.*
        Host  loki.monitoring.svc.cluster.local
        Port  3100
        Labels job=fluent-bit,namespace=$kubernetes['namespace_name'],pod=$kubernetes['pod_name']

tolerations:
  - operator: Exists   # Schedule on all nodes including masters
EOF

helm install fluent-bit fluent/fluent-bit \
  --namespace monitoring \
  --create-namespace \
  --values fluent-bit-values.yaml
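
The kubernetes filter works by stripping Kube_Tag_Prefix from the tail tag and parsing the remaining container log filename, which follows the kubelet's <pod>_<namespace>_<container>-<container-id>.log convention. A sketch of that parsing (illustrative of the convention, not the plugin's actual code; the example tag is made up):

```python
# Recover pod/namespace/container metadata from a container log tag,
# mirroring what the kubernetes filter derives before its API lookup.
def parse_container_log_tag(tag: str,
                            prefix: str = "kube.var.log.containers.") -> dict:
    name = tag[len(prefix):]           # <pod>_<namespace>_<container>-<id>.log
    name = name.removesuffix(".log")
    pod, namespace, rest = name.split("_", 2)
    container, _, container_id = rest.rpartition("-")
    return {"pod_name": pod, "namespace_name": namespace,
            "container_name": container, "container_id": container_id}

print(parse_container_log_tag(
    "kube.var.log.containers.web-5d7f9_default_nginx-0a1b2c3d.log"))
```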

Troubleshooting

Check metrics and pipeline health:

# Fluent Bit exposes metrics in JSON and Prometheus formats
curl http://localhost:2020/api/v1/metrics
curl http://localhost:2020/api/v1/metrics/prometheus
curl http://localhost:2020/api/v1/uptime

# Overall pipeline health (requires Health_Check On in [SERVICE])
curl http://localhost:2020/api/v1/health

Debug log output:

# Run interactively with debug logging
fluent-bit -c /etc/fluent-bit/fluent-bit.conf -v

# Or change Log_Level in [SERVICE] to debug
sudo systemctl restart fluent-bit && journalctl -u fluent-bit -f

Logs not being picked up (tail input):

# The tail input tracks file offsets in a SQLite DB (when DB is set)
# Delete it to re-read files from the beginning
sudo find /var/lib/fluent-bit/ -name "*.db" -delete
sudo systemctl restart fluent-bit

High memory usage:

# Reduce Mem_Buf_Limit on inputs and enable filesystem buffering for
# backpressure; filesystem storage needs storage.path set in [SERVICE]
[SERVICE]
    storage.path     /var/lib/fluent-bit/buffer

[INPUT]
    Name             tail
    # Buffer to disk instead of RAM
    storage.type     filesystem
    Mem_Buf_Limit    5MB

Parser not matching:

# Test a parser against a sample log line (-R loads parser definitions)
echo '127.0.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET / HTTP/1.1" 200 1234' \
  | fluent-bit -R /etc/fluent-bit/parsers.conf \
               -i stdin -p tag=test \
               -F parser -p key_name=log -p parser=nginx \
               -o stdout -f 1

Conclusion

Fluent Bit's C implementation makes it ideal for resource-constrained environments where heavier agents like Logstash would add unacceptable overhead. With its plugin system covering dozens of inputs and outputs, Lua scripting for custom transforms, and native Kubernetes metadata enrichment, Fluent Bit handles the full log forwarding pipeline for both small VPS deployments and large Kubernetes clusters. Configure filesystem buffering and appropriate memory limits to ensure reliable delivery even during downstream outages.