bpftrace for System Tracing and Debugging

bpftrace is a high-level tracing language for Linux built on eBPF (extended Berkeley Packet Filter) that allows dynamic tracing of kernel and userspace events with minimal overhead. This guide covers installing bpftrace, probe types, writing one-liners and scripts, kernel and syscall tracing, userspace probes, and practical performance analysis recipes.

Prerequisites

  • Linux kernel 4.9+ (5.x recommended for full feature support)
  • Ubuntu 20.04+, CentOS/Rocky 8+, or Debian 10+
  • Root access (eBPF programs require root or CAP_BPF)
  • BPF filesystem mounted: ls /sys/fs/bpf

Installing bpftrace

# Ubuntu/Debian
sudo apt update
sudo apt install -y bpftrace

# CentOS/Rocky 8+
sudo dnf install -y bpftrace

# Verify installation
bpftrace --version

# Test basic functionality
sudo bpftrace -e 'BEGIN { printf("bpftrace is working\n"); exit(); }'

For newer versions, build from source:

# Ubuntu - install from snap for latest version
sudo snap install bpftrace

# Or use the static binary
wget https://github.com/iovisor/bpftrace/releases/latest/download/bpftrace
chmod +x bpftrace
sudo mv bpftrace /usr/local/bin/

List available probes on your system:

# List all kernel tracepoints
sudo bpftrace -l 'tracepoint:*' | head -30

# List kprobes (kernel functions)
sudo bpftrace -l 'kprobe:*' | grep tcp | head -20

# List uprobes for a binary
sudo bpftrace -l 'uprobe:/usr/bin/python3:*' | head -20

Probe Types and Syntax

bpftrace programs follow this structure:

probe_type:target:event /filter/ {
    actions
}
Probe TypeExampleWhen it fires
kprobekprobe:tcp_connectKernel function entry
kretprobekretprobe:tcp_connectKernel function return
tracepointtracepoint:syscalls:sys_enter_readKernel tracepoint
uprobeuprobe:/bin/bash:readlineUserspace function entry
uretprobeuretprobe:/bin/bash:readlineUserspace function return
usdtusdt:/usr/bin/python3:function__entryUserspace static probe
profileprofile:hz:99Timer-based sampling
intervalinterval:s:5Periodic interval
BEGINBEGINProgram start
ENDENDProgram exit (Ctrl+C)

Built-in variables:

comm    - process name
pid     - process ID
tid     - thread ID
uid     - user ID
cpu     - CPU number
nsecs   - timestamp (nanoseconds)
retval  - return value (kretprobe/uretprobe)
args    - function arguments
curtask - current task_struct pointer

Essential One-Liners

# Trace all syscalls for a specific process
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 1234/ { printf("%s\n", probe); }'

# Count syscalls by process name
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Trace file opens with filenames
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

# Show disk I/O latency as a histogram (ms)
sudo bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
  kprobe:blk_account_io_done /@start[arg0]/ {
    @latency_ms = hist((nsecs - @start[arg0]) / 1000000);
    delete(@start[arg0]);
  }'

# Count TCP connections by destination port
sudo bpftrace -e 'kprobe:tcp_connect {
  $sk = (struct sock *)arg0;
  @ports[$sk->__sk_common.skc_dport] = count();
}'

# Trace slow function calls (>1ms)
sudo bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; }
  kretprobe:vfs_read /@start[tid] && (nsecs - @start[tid]) > 1000000/ {
    printf("slow vfs_read: %s %dms\n", comm, (nsecs - @start[tid]) / 1000000);
    delete(@start[tid]);
  }'

# Profile CPU stack traces (for flame graphs)
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' -o /tmp/stacks.txt

Writing bpftrace Scripts

Save scripts as .bt files for reuse:

sudo nano /usr/local/bin/trace-read-latency.bt
#!/usr/local/bin/bpftrace

/*
 * trace-read-latency.bt - Trace read() latency by process
 * Usage: sudo bpftrace trace-read-latency.bt
 */

BEGIN {
    printf("Tracing read() latency... Ctrl+C to stop\n");
}

tracepoint:syscalls:sys_enter_read {
    @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
    $latency = nsecs - @start[tid];
    
    // Only record if read returned data (retval > 0)
    if (retval > 0) {
        @latency_us[comm] = hist($latency / 1000);
        @total_bytes[comm] = sum(retval);
    }
    
    // Alert on reads taking more than 10ms
    if ($latency > 10000000) {
        printf("SLOW read: pid=%d comm=%s latency=%dms bytes=%d\n",
               pid, comm, $latency / 1000000, retval);
    }
    
    delete(@start[tid]);
}

END {
    printf("\nRead latency histograms (microseconds):\n");
    print(@latency_us);
    printf("\nTotal bytes read per process:\n");
    print(@total_bytes);
    clear(@start);
}
sudo chmod +x /usr/local/bin/trace-read-latency.bt
sudo bpftrace /usr/local/bin/trace-read-latency.bt

Kernel and Syscall Tracing

# Find available tracepoints for a subsystem
sudo bpftrace -l 'tracepoint:net:*'
sudo bpftrace -l 'tracepoint:block:*'
sudo bpftrace -l 'tracepoint:sched:*'

# View tracepoint arguments
sudo bpftrace -lv 'tracepoint:syscalls:sys_enter_write'

Trace network packet drops:

sudo bpftrace -e '
kprobe:kfree_skb {
    $skb = (struct sk_buff *)arg0;
    @drops[comm] = count();
}
interval:s:5 {
    print(@drops);
    clear(@drops);
}' 

Trace scheduler events (context switches):

sudo bpftrace -e '
tracepoint:sched:sched_switch {
    @[args->next_comm] = count();
}
END { 
    printf("Context switches by process (top 10):\n");
    print(@, 10); 
}'

Trace memory allocations (kernel):

sudo bpftrace -e '
kprobe:kmalloc {
    @bytes[comm] = sum(arg1);
}
END {
    printf("Kernel memory allocations (bytes) by process:\n");
    print(@bytes, 10);
}'

Userspace Probes (USDT and Uprobes)

Trace specific functions in user processes:

# List uprobes in a binary
sudo bpftrace -l 'uprobe:/usr/bin/python3:*' | grep -i "PyEval"

# Trace Python function calls
sudo bpftrace -e '
usdt:/usr/bin/python3:python:function__entry {
    printf("%s:%s\n", str(arg0), str(arg2));
}'

Trace a Go binary's HTTP handlers:

# First, find the function names
objdump -t /usr/local/bin/mygoapp | grep "ServeHTTP"

# Trace the function
sudo bpftrace -e '
uprobe:/usr/local/bin/mygoapp:"net/http.(*ServeMux).ServeHTTP" {
    printf("HTTP request pid=%d\n", pid);
}
uretprobe:/usr/local/bin/mygoapp:"net/http.(*ServeMux).ServeHTTP" {
    printf("HTTP response latency=%dms\n", elapsed / 1000000);
}'

Trace MySQL queries:

# MySQL has USDT probes
sudo bpftrace -l 'usdt:/usr/sbin/mysqld:*'

sudo bpftrace -e '
usdt:/usr/sbin/mysqld:mysql:query__start {
    printf("query: %s\n", str(arg0));
    @start[tid] = nsecs;
}
usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ {
    $latency = (nsecs - @start[tid]) / 1000000;
    if ($latency > 100) {
        printf("SLOW QUERY: %dms\n", $latency);
    }
    delete(@start[tid]);
}'

Performance Analysis Recipes

CPU flame graph generation:

# Install Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph

# Collect stack samples for 30 seconds
sudo bpftrace -e '
profile:hz:99 /comm == "nginx"/ {
    @[kstack, ustack] = count();
}
interval:s:30 { exit(); }' > /tmp/nginx-stacks.txt

# Generate flame graph
/opt/flamegraph/stackcollapse-bpftrace.pl /tmp/nginx-stacks.txt | \
/opt/flamegraph/flamegraph.pl > /tmp/nginx-flamegraph.svg

echo "Open /tmp/nginx-flamegraph.svg in a browser"

Off-CPU time analysis (identifying blocking):

sudo bpftrace -e '
tracepoint:sched:sched_switch /args->prev_state != 0 && args->prev_pid != 0/ {
    @offcpu_start[args->prev_pid] = nsecs;
}
tracepoint:sched:sched_switch /@offcpu_start[args->next_pid]/ {
    $delta = nsecs - @offcpu_start[args->next_pid];
    @offcpu_us[args->next_comm] = hist($delta / 1000);
    delete(@offcpu_start[args->next_pid]);
}
interval:s:10 { print(@offcpu_us); exit(); }'

File system latency:

sudo bpftrace -e '
kprobe:ext4_file_read_iter { @start[tid] = nsecs; }
kretprobe:ext4_file_read_iter /@start[tid]/ {
    @read_latency_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
kprobe:ext4_file_write_iter { @start[tid] = nsecs; }
kretprobe:ext4_file_write_iter /@start[tid]/ {
    @write_latency_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
interval:s:30 { print(@read_latency_us); print(@write_latency_us); exit(); }'

Troubleshooting

"No such file or directory" for kernel headers:

# Ubuntu
sudo apt install linux-headers-$(uname -r)
# CentOS/Rocky
sudo dnf install kernel-devel

"Failed to attach probe" errors:

# Check if the function exists in the kernel
sudo bpftrace -l "kprobe:$FUNCTION_NAME"
# Kernel functions may be inlined (not traceable)
# Try tracepoints instead:
sudo bpftrace -l "tracepoint:syscalls:*read*"

Missing USDT probes in binary:

# Check if binary was compiled with USDT support
readelf -n /usr/bin/python3 | grep stapsdt
# If no output, binary lacks USDT probes—use uprobes instead

bpftrace output lost (too many events):

# Increase perf ring buffer
sudo bpftrace -e '...' --perf-rb-pages 256
# Or reduce event volume with filters
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_read /pid == 1234/ { ... }'

Permission denied:

# Check BPF capabilities
sudo capsh --print | grep bpf
# Ensure /sys/fs/bpf is mounted
mount | grep bpf
sudo mount -t bpf bpf /sys/fs/bpf

Conclusion

bpftrace provides unprecedented visibility into Linux system behavior with minimal overhead, making it invaluable for production debugging and performance analysis. By combining kprobes for kernel function tracing, tracepoints for structured events, uprobes for userspace functions, and profile-based CPU sampling, you can answer complex performance questions that traditional tools cannot address. Start with the built-in one-liners, then write custom scripts as your investigation deepens.