bpftrace for System Tracing and Debugging
bpftrace is a high-level tracing language for Linux built on eBPF (extended Berkeley Packet Filter) that allows dynamic tracing of kernel and userspace events with minimal overhead. This guide covers installing bpftrace, probe types, writing one-liners and scripts, kernel and syscall tracing, userspace probes, and practical performance analysis recipes.
Prerequisites
- Linux kernel 4.9+ (5.x recommended for full feature support)
- Ubuntu 20.04+, CentOS/Rocky 8+, or Debian 10+
- Root access (eBPF programs require root or
CAP_BPF) - BPF filesystem mounted:
ls /sys/fs/bpf
Installing bpftrace
# Ubuntu/Debian
sudo apt update
sudo apt install -y bpftrace
# CentOS/Rocky 8+
sudo dnf install -y bpftrace
# Verify installation
bpftrace --version
# Test basic functionality
sudo bpftrace -e 'BEGIN { printf("bpftrace is working\n"); exit(); }'
For newer versions, build from source:
# Ubuntu - install from snap for latest version
sudo snap install bpftrace
# Or use the static binary
wget https://github.com/iovisor/bpftrace/releases/latest/download/bpftrace
chmod +x bpftrace
sudo mv bpftrace /usr/local/bin/
List available probes on your system:
# List all kernel tracepoints
sudo bpftrace -l 'tracepoint:*' | head -30
# List kprobes (kernel functions)
sudo bpftrace -l 'kprobe:*' | grep tcp | head -20
# List uprobes for a binary
sudo bpftrace -l 'uprobe:/usr/bin/python3:*' | head -20
Probe Types and Syntax
bpftrace programs follow this structure:
probe_type:target:event /filter/ {
actions
}
| Probe Type | Example | When it fires |
|---|---|---|
kprobe | kprobe:tcp_connect | Kernel function entry |
kretprobe | kretprobe:tcp_connect | Kernel function return |
tracepoint | tracepoint:syscalls:sys_enter_read | Kernel tracepoint |
uprobe | uprobe:/bin/bash:readline | Userspace function entry |
uretprobe | uretprobe:/bin/bash:readline | Userspace function return |
usdt | usdt:/usr/bin/python3:function__entry | Userspace static probe |
profile | profile:hz:99 | Timer-based sampling |
interval | interval:s:5 | Periodic interval |
BEGIN | BEGIN | Program start |
END | END | Program exit (Ctrl+C) |
Built-in variables:
comm - process name
pid - process ID
tid - thread ID
uid - user ID
cpu - CPU number
nsecs - timestamp (nanoseconds)
retval - return value (kretprobe/uretprobe)
args - function arguments
curtask - current task_struct pointer
Essential One-Liners
# Trace all syscalls for a specific process
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 1234/ { printf("%s\n", probe); }'
# Count syscalls by process name
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Trace file opens with filenames
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
# Show disk I/O latency as a histogram (ms)
sudo bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
@latency_ms = hist((nsecs - @start[arg0]) / 1000000);
delete(@start[arg0]);
}'
# Count TCP connections by destination port
sudo bpftrace -e 'kprobe:tcp_connect {
$sk = (struct sock *)arg0;
@ports[$sk->__sk_common.skc_dport] = count();
}'
# Trace slow function calls (>1ms)
sudo bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid] && (nsecs - @start[tid]) > 1000000/ {
printf("slow vfs_read: %s %dms\n", comm, (nsecs - @start[tid]) / 1000000);
delete(@start[tid]);
}'
# Profile CPU stack traces (for flame graphs)
sudo bpftrace -e 'profile:hz:99 { @[kstack] = count(); }' -o /tmp/stacks.txt
Writing bpftrace Scripts
Save scripts as .bt files for reuse:
sudo nano /usr/local/bin/trace-read-latency.bt
#!/usr/local/bin/bpftrace
/*
* trace-read-latency.bt - Trace read() latency by process
* Usage: sudo bpftrace trace-read-latency.bt
*/
BEGIN {
printf("Tracing read() latency... Ctrl+C to stop\n");
}
tracepoint:syscalls:sys_enter_read {
@start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
$latency = nsecs - @start[tid];
// Only record if read returned data (retval > 0)
if (retval > 0) {
@latency_us[comm] = hist($latency / 1000);
@total_bytes[comm] = sum(retval);
}
// Alert on reads taking more than 10ms
if ($latency > 10000000) {
printf("SLOW read: pid=%d comm=%s latency=%dms bytes=%d\n",
pid, comm, $latency / 1000000, retval);
}
delete(@start[tid]);
}
END {
printf("\nRead latency histograms (microseconds):\n");
print(@latency_us);
printf("\nTotal bytes read per process:\n");
print(@total_bytes);
clear(@start);
}
sudo chmod +x /usr/local/bin/trace-read-latency.bt
sudo bpftrace /usr/local/bin/trace-read-latency.bt
Kernel and Syscall Tracing
# Find available tracepoints for a subsystem
sudo bpftrace -l 'tracepoint:net:*'
sudo bpftrace -l 'tracepoint:block:*'
sudo bpftrace -l 'tracepoint:sched:*'
# View tracepoint arguments
sudo bpftrace -lv 'tracepoint:syscalls:sys_enter_write'
Trace network packet drops:
sudo bpftrace -e '
kprobe:kfree_skb {
$skb = (struct sk_buff *)arg0;
@drops[comm] = count();
}
interval:s:5 {
print(@drops);
clear(@drops);
}'
Trace scheduler events (context switches):
sudo bpftrace -e '
tracepoint:sched:sched_switch {
@[args->next_comm] = count();
}
END {
printf("Context switches by process (top 10):\n");
print(@, 10);
}'
Trace memory allocations (kernel):
sudo bpftrace -e '
kprobe:kmalloc {
@bytes[comm] = sum(arg1);
}
END {
printf("Kernel memory allocations (bytes) by process:\n");
print(@bytes, 10);
}'
Userspace Probes (USDT and Uprobes)
Trace specific functions in user processes:
# List uprobes in a binary
sudo bpftrace -l 'uprobe:/usr/bin/python3:*' | grep -i "PyEval"
# Trace Python function calls
sudo bpftrace -e '
usdt:/usr/bin/python3:python:function__entry {
printf("%s:%s\n", str(arg0), str(arg2));
}'
Trace a Go binary's HTTP handlers:
# First, find the function names
objdump -t /usr/local/bin/mygoapp | grep "ServeHTTP"
# Trace the function
sudo bpftrace -e '
uprobe:/usr/local/bin/mygoapp:"net/http.(*ServeMux).ServeHTTP" {
printf("HTTP request pid=%d\n", pid);
}
uretprobe:/usr/local/bin/mygoapp:"net/http.(*ServeMux).ServeHTTP" {
printf("HTTP response latency=%dms\n", elapsed / 1000000);
}'
Trace MySQL queries:
# MySQL has USDT probes
sudo bpftrace -l 'usdt:/usr/sbin/mysqld:*'
sudo bpftrace -e '
usdt:/usr/sbin/mysqld:mysql:query__start {
printf("query: %s\n", str(arg0));
@start[tid] = nsecs;
}
usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ {
$latency = (nsecs - @start[tid]) / 1000000;
if ($latency > 100) {
printf("SLOW QUERY: %dms\n", $latency);
}
delete(@start[tid]);
}'
Performance Analysis Recipes
CPU flame graph generation:
# Install Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
# Collect stack samples for 30 seconds
sudo bpftrace -e '
profile:hz:99 /comm == "nginx"/ {
@[kstack, ustack] = count();
}
interval:s:30 { exit(); }' > /tmp/nginx-stacks.txt
# Generate flame graph
/opt/flamegraph/stackcollapse-bpftrace.pl /tmp/nginx-stacks.txt | \
/opt/flamegraph/flamegraph.pl > /tmp/nginx-flamegraph.svg
echo "Open /tmp/nginx-flamegraph.svg in a browser"
Off-CPU time analysis (identifying blocking):
sudo bpftrace -e '
tracepoint:sched:sched_switch /args->prev_state != 0 && args->prev_pid != 0/ {
@offcpu_start[args->prev_pid] = nsecs;
}
tracepoint:sched:sched_switch /@offcpu_start[args->next_pid]/ {
$delta = nsecs - @offcpu_start[args->next_pid];
@offcpu_us[args->next_comm] = hist($delta / 1000);
delete(@offcpu_start[args->next_pid]);
}
interval:s:10 { print(@offcpu_us); exit(); }'
File system latency:
sudo bpftrace -e '
kprobe:ext4_file_read_iter { @start[tid] = nsecs; }
kretprobe:ext4_file_read_iter /@start[tid]/ {
@read_latency_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}
kprobe:ext4_file_write_iter { @start[tid] = nsecs; }
kretprobe:ext4_file_write_iter /@start[tid]/ {
@write_latency_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}
interval:s:30 { print(@read_latency_us); print(@write_latency_us); exit(); }'
Troubleshooting
"No such file or directory" for kernel headers:
# Ubuntu
sudo apt install linux-headers-$(uname -r)
# CentOS/Rocky
sudo dnf install kernel-devel
"Failed to attach probe" errors:
# Check if the function exists in the kernel
sudo bpftrace -l "kprobe:$FUNCTION_NAME"
# Kernel functions may be inlined (not traceable)
# Try tracepoints instead:
sudo bpftrace -l "tracepoint:syscalls:*read*"
Missing USDT probes in binary:
# Check if binary was compiled with USDT support
readelf -n /usr/bin/python3 | grep stapsdt
# If no output, binary lacks USDT probes—use uprobes instead
bpftrace output lost (too many events):
# Increase perf ring buffer
sudo bpftrace -e '...' --perf-rb-pages 256
# Or reduce event volume with filters
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_read /pid == 1234/ { ... }'
Permission denied:
# Check BPF capabilities
sudo capsh --print | grep bpf
# Ensure /sys/fs/bpf is mounted
mount | grep bpf
sudo mount -t bpf bpf /sys/fs/bpf
Conclusion
bpftrace provides unprecedented visibility into Linux system behavior with minimal overhead, making it invaluable for production debugging and performance analysis. By combining kprobes for kernel function tracing, tracepoints for structured events, uprobes for userspace functions, and profile-based CPU sampling, you can answer complex performance questions that traditional tools cannot address. Start with the built-in one-liners, then write custom scripts as your investigation deepens.


