Linux perf Purpose Guide agents through perf for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation. Triggers "Which function is consuming the most CPU?" "How do I measure cache misses / IPC?" "How do I use perf to find hotspots?" "How do I generate a flamegraph from perf data?" "perf shows [unknown] or [kernel] frames" Workflow 1. Prerequisites
Install
sudo apt install linux-perf
Debian/Ubuntu (version-matched)
sudo dnf install perf
Fedora/RHEL
Check permissions
By default perf requires root or paranoid level ≤ 1
cat /proc/sys/kernel/perf_event_paranoid
2 = only CPU stats (not kernel), 1 = user+kernel, 0 = all, -1 = no restrictions
Temporarily lower (session only)
sudo sysctl -w kernel.perf_event_paranoid = 1
Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf sudo sysctl -p /etc/sysctl.d/99-perf.conf Compile the target with debug symbols for useful frame data: gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
-fno-omit-frame-pointer: essential for frame-pointer-based unwinding
Alternative: compile with DWARF CFI and use --call-graph=dwarf
- perf stat — quick counters
Basic hardware counters
perf stat ./prog
With specific events
perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog
Wall-clock comparison: N runs
perf stat -r 5 ./prog
Attach to existing process
perf stat -p 12345 sleep 10 Interpret perf stat output: IPC (instructions per cycle) < 1.0: memory-bound or stalled pipeline cache-miss rate
5%: significant cache pressure branch-miss rate 5%: branch predictor struggling 3. perf record — sampling
Default: sample at 1000 Hz (cycles event)
perf record -g ./prog
Specify frequency
perf record -F 999 -g ./prog
Specific event
perf record -e cache-misses -g ./prog
Attach to running process
perf record -F 999 -g -p 12345 sleep 30
Off-CPU profiling (time spent waiting)
perf record -e sched:sched_switch -ag sleep 10
DWARF call graphs (better for binaries without frame pointers)
perf record -F 999 --call-graph = dwarf ./prog
Save to named file
perf record -o myapp.perf.data -g ./prog 4. perf report — interactive analysis perf report
reads perf.data
perf report -i myapp.perf.data perf report --no-children
self time only (not cumulative)
perf report --sort comm,dso,sym
sort by fields
perf report --stdio
non-interactive text output
Navigation in TUI: Enter — expand a symbol a — annotate (show assembly with hit counts) s — show source (needs debug info) d — filter by DSO (library) t — filter by thread ? — help 5. perf annotate — hot instructions
Show assembly with hit percentages
perf annotate sym_name
From report: press 'a' on a symbol
Or directly:
perf annotate -i perf.data --symbol = hot_function --stdio High hit count on a mov or vmovdqa suggests a cache miss at that load. 6. perf top — live profiling
Live top, like 'top' but for functions
sudo perf top -g
Filter by process
sudo perf top -p 12345 7. Feed into flamegraphs
Generate perf script output
perf script
out.perf
Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph ./FlameGraph/stackcollapse-perf.pl out.perf
out.folded ./FlameGraph/flamegraph.pl out.folded
flamegraph.svg
Open flamegraph.svg in browser
See skills/profilers/flamegraphs for reading flamegraphs and interpreting results. 8. Common issues Problem Cause Fix Permission denied perf_event_paranoid too high Lower paranoid level or run with sudo [unknown] frames Missing frame pointers or debug info Recompile with -fno-omit-frame-pointer or use --call-graph=dwarf [kernel] everywhere Kernel symbols not visible Use sudo perf record ; install linux-image-$(uname -r)-dbgsym No kallsyms Kernel symbols unavailable `echo 0 Empty report for short program Program exits too fast Use -F 9999 or instrument longer workload DWARF unwinding slow Large DWARF stack Limit with --call-graph dwarf,512 9. Useful events
List all available events
perf list
Common hardware events
cycles instructions cache-references cache-misses branch-instructions branch-misses stalled-cycles-frontend stalled-cycles-backend
Software events
context-switches cpu-migrations page-faults
Tracepoints (requires root)
sched:sched_switch syscalls:sys_enter_read For a counter reference and interpretation guide, see references/events.md .