Load BPF program at boot-time?


Shung-Hsi Yu
 

Hi,

Is it possible to load a BPF program at boot time?
What I'm trying to achieve is to trace every single call to a certain
function from the moment the kernel starts, without missing anything.

More specifically, I'm trying to debug iommu_alloc failures by looking
at the stacktrace to find out which subsystem/driver allocated too
many IOMMU slots on a ppc64le system, which I do not have direct
access to.

I've considered writing a systemd unit file that loads a BPF program
before the sysinit target[1], but I'm not sure if that's early enough.
An alternative seems to be boot-time tracing with ftrace[2] (which is
what I ended up doing), but it requires recompiling the kernel in order
to add tracepoints that retrieve the function call arguments, and there
isn't an easy way to stop tracing before the trace buffer overflows (I
ended up writing a systemd unit file that sets an ftrace event trigger
that turns off tracing).
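
For readers following along, the trigger step might look roughly like the
sketch below. It builds a `traceoff` event-trigger command in the
`traceoff[:count] [if filter]` form described in the kernel's event tracing
documentation and writes it to an event's `trigger` file. The helper names
and the tracefs root, subsystem, and event arguments are assumptions made
for illustration; the thread does not name the tracepoints that were added.

```python
from pathlib import Path


def traceoff_trigger(count=None, filter_expr=None):
    """Build an ftrace 'traceoff' event-trigger command string.

    count: optional number of times the trigger may fire.
    filter_expr: optional event filter, e.g. "ret == 0" (hypothetical field).
    """
    cmd = "traceoff"
    if count is not None:
        cmd += f":{count}"
    if filter_expr:
        cmd += f" if {filter_expr}"
    return cmd


def install_trigger(tracefs_root, subsystem, event, command):
    """Write a trigger command to <tracefs_root>/events/<subsystem>/<event>/trigger.

    Mirrors `echo '<command>' > .../trigger`; on a real system this
    needs root and a mounted tracefs.
    """
    trigger = Path(tracefs_root) / "events" / subsystem / event / "trigger"
    with open(trigger, "w") as f:
        f.write(command + "\n")
    return trigger
```

A oneshot systemd unit could then call something like
`install_trigger("/sys/kernel/tracing", "iommu", "alloc_fail", traceoff_trigger(count=1))`,
where the subsystem and event names are placeholders for whatever
tracepoint was actually added.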

Maybe there is a better way to do something like this?


Much thanks,
Shung-Hsi Yu

[1]: https://www.freedesktop.org/software/systemd/man/bootup.html
[2]: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html


Yonghong Song
 

On Sun, Sep 6, 2020 at 7:55 AM Shung-Hsi Yu <@shunghsiyu> wrote:

> Hi,
>
> Is it possible to load a BPF program at boot time?

It is possible. See the patch below:
https://lore.kernel.org/bpf/20200819042759.51280-1-alexei.starovoitov@.../

I tried to load a BPF program and pin it in bpffs. The mechanism could
be extended to load a BPF program, and even attach it once the other
subsystems are ready, but this needs kernel work.

> What I'm trying to achieve is to trace every single call to a certain
> function from the moment the kernel starts, without missing anything.
>
> More specifically, I'm trying to debug iommu_alloc failures by looking
> at the stacktrace to find out which subsystem/driver allocated too
> many IOMMU slots on a ppc64le system, which I do not have direct
> access to.
>
> I've considered writing a systemd unit file that loads a BPF program
> before the sysinit target[1], but I'm not sure if that's early enough.
> An alternative seems to be boot-time tracing with ftrace[2] (which is
> what I ended up doing), but it requires recompiling the kernel in order
> to add tracepoints that retrieve the function call arguments, and there
> isn't an easy way to stop tracing before the trace buffer overflows (I
> ended up writing a systemd unit file that sets an ftrace event trigger
> that turns off tracing).

A BPF program seems a good choice here, since it can store arbitrary
data in its maps and can stop tracing based on the tracing state.
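
To make that idea concrete, here is a plain-Python model of the logic such
a program could implement. It is only a sketch of the bookkeeping, not BPF
code: the `Counter` stands in for a hash map keyed by stack trace, `seen`
and `enabled` stand in for a small state map, and the class name, the
`max_events` budget, and the stack frame names used with it are all
hypothetical.

```python
from collections import Counter


class StackTraceBudget:
    """Model of a map-backed tracer that disables itself after a budget."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.counts = Counter()  # stands in for a hash map: stack -> hit count
        self.seen = 0            # stands in for a one-slot state map
        self.enabled = True      # tracing state consulted on every call

    def on_call(self, stack):
        """Record one traced call; return False once tracing has stopped."""
        if not self.enabled:
            return False
        self.counts[tuple(stack)] += 1
        self.seen += 1
        if self.seen >= self.max_events:
            # Budget exhausted: stop recording so nothing is overwritten.
            self.enabled = False
        return True

    def top(self, n=1):
        """Return the n most frequent stacks with their hit counts."""
        return self.counts.most_common(n)
```

Feeding it one stack per traced `iommu_alloc` call and reading `top()`
afterwards would show which caller dominates the allocations, which is the
question the original post is trying to answer.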

There are still some open issues, though. Ideally you would not need to
recompile the kernel at all: changing the BPF program, recompiling it,
and rebooting should just work, but that is not available today. I guess
this can probably be improved. If you are interested, please take a look
at the above patch; you may be able to improve the kernel to cover your
use case.

