Linux Tracing

😋😋😋

Some general notes about the various tracers inside Linux kernel.

Link to my old notes about the profilers/tracers in LegoOS: notes. The profile points, which is able to profile arbitray code piece is still my favorite thing.

In Linux kernel, we have:

  • ftrace
  • kprobe
  • uprobe
  • perf_event
  • tracepoints
  • eBPF

I tend to think this way:

  • Tracing needs two parts, 1) Mechanims to do callback. This means we need a way to let our tracing/profiling code got invoked on a running system. This can be static or dynamic. Static means we added our tracing code to source code, like tracepoints. Dynamic means we added our tracing code when system is running, like ftrace and kprobe. 2) Do stuff within callback. All of them provide some sort of handling. But eBPF is the most extensive one.
  • For example, ftrace, kprobe, and perf_event include the callback facilities, although they are not just limited to this. ftrace has the call mount way to do callback on every single function invocation. kprobe dynamically patch instructions and to do callback within exception handlers. perf_event can let CPU fire NMI interrupt. Those are all mechanisms to catch perf data.
  • The blog from Julia explains it well: Linux tracing systems & how they fit together

ftrace:

  • Mechanism
    • For each un-inlined function, gcc inserts a call mcount, or a call fentry instruction at the very beginning. This means whenever a function is called, the mcount() or the fentry() callback will be invoked.
    • Having these call instructions introduce a lot overheads. So by default kernel replace call with nop. Only after we echo something > setup_filter_functions will the ftrace code replace nop with call. Do note, Linux uses the linker magic again here, check Steven’s slides.
    • You can do a objdump vmlinux -d, and able to see the following instructions for almost all functions: callq ffffffff81a01560 <__fentry__>.
    • x86 related code: arch/x86/kernel/ftrace_64.S, arch/x86/kernel/ftrace.c
    • Questions: it seems we can know when a function got called by using fentry, but how can we know the function has returned? The trick is: the returning address is pushed to stack when a function got called. So ftrace, again, can replace that return address, so it can catch the exit time, and calculate the latency of a function. Neat!!
  • Resources
  • Usage
    • Files under /sys/kernel/debug/tracing/*
    • perf help ftrace

kprobe:

  • Mechanism
    • Kprobe replaces the original assembly instruction with an int3 trap instruction. So when we ran into the PC of the original instruction, an int3 CPU exception will happen. Within do_in3(), kernel will callback to core kprobe layer to do pre-handler. After singlestep, CPU have debug exception. Kernel walks into do_debug(), where kprobe run post-handler.
    • Kprobe is powerful, because it’s able to trace almost everything at instruction level.
    • Kprobe can NOT touch things inside entry.S. It needs a valid pt_regs to operate.
  • Resources

eBPF:

  • Mechanism
    • I think the most important thing is to understand what’s the relationship between eBPF and the others.
    • Part I: Hook. eBPF attach its program to kprobe/uprobe/ftrace/perf_event. You can think eBPF of a generic callback layer for kprobe/uprobe/ftrace/perf_event. It’s essentially the second part of tracing as we mentioned above.
    • Part II Run: eBPF run program when the above hook got invoked. eBPF is event-driven. The program can be user-written eBPF code. Other articles explained it well.
  • Resources

perf:

Trace in real time:

Print the number of page faults happen in every one second:

1
perf stat -e "page-faults" -I 1000 -a -- sleep 10

Print the numberf of mmap syscall happen in every one second:

1
perf stat -e "syscalls:sys_enter_mmap" -I 1000 -a -- sleep 10

Dynamically trace kernel functions:

1
2
3
perf probe --add do_anonymous_page
perf stat -I 5000 -e "page-faults,probe:do_anonymous_page" -- sleep 10
perf probe --del=probe:do_anonymous_page


Yizhou Shan
Created: Jun 10, 2019
Last Updated: Jul 21, 2019