
Load BPF program at boot-time?

Shung-Hsi Yu
 

Hi,

Is it possible to load a BPF program at boot time?
What I'm trying to achieve is to trace every single call to a certain
function from the moment the kernel starts, without missing anything.

More specifically, I'm trying to debug iommu_alloc failures by looking
at the stacktrace to find out which subsystem/driver allocated too
many IOMMU slots on a ppc64le system, which I do not have direct
access to.

I've considered writing a systemd unit file that loads a BPF program
before the sysinit target[1], but I'm not sure if that's early enough.
An alternative seems to be boot-time tracing with ftrace[2] (which is
what I ended up doing), but it requires recompiling the kernel in
order to add tracepoints that expose the function call arguments, and
there isn't an easy way to stop tracing before the trace buffer
overflows (I ended up writing a systemd unit file that sets an ftrace
event trigger to turn tracing off).

Maybe there is a better way to do something like this?


Much thanks,
Shung-Hsi Yu

[1]: https://www.freedesktop.org/software/systemd/man/bootup.html
[2]: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Mon, Aug 31, 2020 at 12:03 PM Ian <icampbe14@gmail.com> wrote:

Interestingly enough, adding just -g in my Makefile built the BPF programs and allowed the BTF section to be found and properly loaded. My BPF program was loaded and is running properly with my desired functionality. I am confused, though, as to why the -g flag fixed this problem. According to the clang man page:

-g Generate debug information.

Is BTF information considered debug information? Is that in general or just in this case? Is this unexpected behavior? Perhaps a bug in Clang's non -g compiled binaries for BPF? It would seem to me that the BTF information should not be purged from a non -g binary. I am interested to hear your thoughts on this, Andrii!
It's expected right now. BTF started out as purely debug information,
but got elevated into pretty much a mandatory thing for modern BPF
applications. We've talked about making .BTF emitted without -g, but
that hasn't happened in Clang yet (there are some technical
difficulties).

Again, thank you so much for your help. There is no way I would have figured that out on my own.

Ian


Re: Reading Pinned maps in eBPF Programs

Ian
 

Interestingly enough, adding just -g in my Makefile built the BPF programs and allowed the BTF section to be found and properly loaded. My BPF program was loaded and is running properly with my desired functionality. I am confused, though, as to why the -g flag fixed this problem. According to the clang man page:
-g Generate debug information.
Is BTF information considered debug information? Is that in general or just in this case? Is this unexpected behavior? Perhaps a bug in Clang's non -g compiled binaries for BPF? It would seem to me that the BTF information should not be purged from a non -g binary. I am interested to hear your thoughts on this, Andrii!

Again, thank you so much for your help. There is no way I would have figured that out on my own. 

Ian


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Sun, Aug 30, 2020 at 4:35 PM Ian <icampbe14@gmail.com> wrote:

Hello,

Here are the libbpf logs at all levels for the opensnoop program when using the pinned option for a map. This was tested on Linux kernel v5.4 with libbpf 0.0.9, 0.1.0, and the current version. The logs were the same in every case, so I have only posted a single copy here. Let me know what you think and what the next steps might be! I appreciate the help and am having a good time trying to piece this together.
[...]


libbpf: section(14) .rel.eh_frame, size 32, link 15, flags 0, type=9
libbpf: skip relo .rel.eh_frame(14) for section(13)
libbpf: section(15) .symtab, size 408, link 1, flags 0, type=2
libbpf: BTF is required, but is missing or corrupted.
Ok, this is a very different issue than the kernel missing BTF. libbpf
is complaining that your opensnoop.bpf.o itself is missing BTF. And
right, BTF is required to parse map definitions properly, but it
doesn't depend on having kernel support for BTF at all. Make sure you
use a recent enough Clang (v10+) and that you build your
opensnoop.bpf.o with -target bpf **and** the -g flag, which generates
debug info (including the .BTF ELF section).


Ian


Re: Reading Pinned maps in eBPF Programs

Ian
 

Hello, 

Here are the libbpf logs at all levels for the opensnoop program when using the pinned option for a map. This was tested on Linux kernel v5.4 with libbpf 0.0.9, 0.1.0, and the current version. The logs were the same in every case, so I have only posted a single copy here. Let me know what you think and what the next steps might be! I appreciate the help and am having a good time trying to piece this together.

libbpf: loading bpf-library/bpf_objs/opensnoop.bpf.o
libbpf: section(1) .strtab, size 289, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) tracepoint/syscalls/sys_enter_openat, size 1632, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_openat
libbpf: section(4) .reltracepoint/syscalls/sys_enter_openat, size 32, link 15, flags 0, type=9
libbpf: section(5) tracepoint/syscalls/sys_enter_open, size 1368, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_open
libbpf: section(6) .reltracepoint/syscalls/sys_enter_open, size 32, link 15, flags 0, type=9
libbpf: section(7) .data, size 4, link 0, flags 3, type=1
libbpf: section(8) maps, size 20, link 0, flags 3, type=1
libbpf: section(9) .rodata.str1.1, size 9, link 0, flags 32, type=1
libbpf: skip section(9) .rodata.str1.1
libbpf: section(10) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of bpf-library/bpf_objs/opensnoop.bpf.o is 50422
libbpf: section(11) license, size 4, link 0, flags 3, type=1
libbpf: license of bpf-library/bpf_objs/opensnoop.bpf.o is GPL
libbpf: section(12) .maps, size 40, link 0, flags 3, type=1
libbpf: section(13) .eh_frame, size 80, link 0, flags 2, type=1
libbpf: skip section(13) .eh_frame
libbpf: section(14) .rel.eh_frame, size 32, link 15, flags 0, type=9
libbpf: skip relo .rel.eh_frame(14) for section(13)
libbpf: section(15) .symtab, size 408, link 1, flags 0, type=2
libbpf: BTF is required, but is missing or corrupted.
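
A side note on the "kernel version ... is 50422" line in the log above. The sketch below assumes that libbpf prints the contents of the "version" section in hexadecimal and that the value was built from LINUX_VERSION_CODE, which packs the version as (major << 16) + (minor << 8) + patch. Under those assumptions 0x50422 decodes to 5.4.34. The struct and function names are made up for illustration and are not part of libbpf.

```c
#include <stdint.h>

/* Hypothetical helper type, not part of libbpf or the kernel headers. */
struct kver {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

/* Unpack a LINUX_VERSION_CODE-style value:
 * (major << 16) + (minor << 8) + patch. */
struct kver kver_decode(uint32_t code)
{
    struct kver v;

    v.major = code >> 16;
    v.minor = (code >> 8) & 0xff;
    v.patch = code & 0xff;
    return v;
}
```

Calling kver_decode(0x50422) gives major 5, minor 4, patch 34, which is at least consistent with the v5.4 kernel mentioned in the message.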
 
Ian


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Thu, Aug 27, 2020 at 6:55 AM Ian <icampbe14@gmail.com> wrote:

Hey Andrii,

I tried using the same BPF program with the declarative pinning of maps with libbpf v0.0.9, v0.1.0 and the current master branch at commit 7bc52e6. All of these produced the same error about BTF being required. I will update this post with the libbpf debug messages once I figure out how to set those up/find them! Is there anything else you might need from me?
Check example [0] for how to set custom logging callback and print all
libbpf logs (including those at DEBUG level of verbosity).

[0] https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqslower.c#L136



By the way, thank you so much for all your help!
You are welcome!


Ian


Re: Reading Pinned maps in eBPF Programs

Ian
 

Hey Andrii, 

I tried using the same BPF program with the declarative pinning of maps with libbpf v0.0.9, v0.1.0 and the current master branch at commit 7bc52e6. All of these produced the same error about BTF being required. I will update this post with the libbpf debug messages once I figure out how to set those up/find them! Is there anything else you might need from me?

 

By the way, thank you so much for all your help! 

Ian


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Wed, Aug 26, 2020 at 6:54 AM Tristan Mayfield
<mayfieldtristan@gmail.com> wrote:

I wanted to chime in and mention that I've seen the BTF error before when trying to declare maps the way shown in https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_pinning.c.

I have tested kernel 4.15 and 5.4 (vanilla Ubuntu 18.04 and 20 respectively) and both have the same issue. Looking through libbpf it looks like the call would be coming from:

bpf_object__open() -> __bpf_object__open() -> bpf_object__elf_collect() -> bpf_object__init/finalize_btf()

I haven't run through a debugger yet to verify that's the issue, but I have verified on the opensnoop code Ian posted.
I'm not sure why the deprecated version of map declaration doesn't cause this BTF workflow while the newer one does, but I'll look through and debug today and if I can find it I'll send out a message. I'd be interested to know if that above code is doing something that triggers BTF reliance though.
Which version of libbpf are you seeing this on? We've had bugs in
libbpf where we'd attempt to load kernel BTF unnecessarily, but I
believe we've fixed all those issues. Can you please double-check with
latest released libbpf and see if that's still happening? If it is,
could you provide a repro and full libbpf debug logs for me to
investigate? Thanks!

Tristan


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Sun, Aug 23, 2020 at 12:36 PM Ian <icampbe14@gmail.com> wrote:

Hello! Sorry for the wait, I just started back at uni and things are a little bit crazy around here!

Anyways, this is the source code for my version of opensnoop, which is what I have been testing with. This does not contain the changes for map reading. My goal is to have this opensnoop program open/read a map with one element after it gets the PID, so that it can compare them. It is also worth noting that I am tracking both open and openat within the same file.
[...]

I don't see anything needing kernel BTF in there, so if libbpf still
fails on not being able to load kernel BTF, that might be a bug in
libbpf. Can you please double-check this with the latest released (or
just plain latest) libbpf and if that's still happening, please
provide debug-level logs from libbpf. Thank you!



Re: Reading Pinned maps in eBPF Programs

Tristan Mayfield
 

I wanted to chime in and mention that I've seen the BTF error before when trying to declare maps the way shown in https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_pinning.c.

I have tested kernel 4.15 and 5.4 (vanilla Ubuntu 18.04 and 20 respectively) and both have the same issue. Looking through libbpf it looks like the call would be coming from:

bpf_object__open() -> __bpf_object__open() -> bpf_object__elf_collect() -> bpf_object__init/finalize_btf()

I haven't run through a debugger yet to verify that's the issue, but I have verified on the opensnoop code Ian posted.
I'm not sure why the deprecated version of map declaration doesn't cause this BTF workflow while the newer one does, but I'll look through and debug today and if I can find it I'll send out a message. I'd be interested to know if that above code is doing something that triggers BTF reliance though.

Tristan


Re: Reading Pinned maps in eBPF Programs

Ian
 

Hello! Sorry for the wait, I just started back at uni and things are a little bit crazy around here!

Anyways, this is the source code for my version of opensnoop, which is what I have been testing with. This does not contain the changes for map reading. My goal is to have this opensnoop program open/read a map with one element after it gets the PID, so that it can compare them. It is also worth noting that I am tracking both open and openat within the same file.

#include <linux/bpf.h>   // BPF UAPI header that ships with the OS
#include "bpf_helpers.h" // bpf_helper functions
#include <linux/version.h>

// For navigating the task struct
#include <linux/sched.h>
#include <linux/nsproxy.h>
#include <linux/pid_namespace.h>
#include <linux/ns_common.h>

#define MAX_CPUS 4

/**
 * Struct to pass data to the perf buffer
 */
#pragma pack(1)
struct opensnoop_data_t {
    u32 pid;
    u32 tgid;
    char program_name[16]; // max comm length is 16
    char file[255];
    u32 namespace;
    u64 time;
};

struct sys_enter_openat_args {
    long long pad;
    long __syscall_nr;
    long dfd;
    const char *filename;
    long flags;
    long mode;
};

struct sys_enter_open_args {
    long long pad;
    long __syscall_nr;
    const char *filename;
    long flags;
    long mode;
};

/**
 * Using the magic macro SEC this struct declares
 * and creates a new bpf map of a type PERF that we
 * can use to pass data to userspace
 */
struct bpf_map_def SEC("maps") opensnoop_events = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = MAX_CPUS,
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {

    struct opensnoop_data_t data = {};
    data.pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: tgid (the user-space process ID)
    data.tgid = bpf_get_current_pid_tgid();      // lower 32 bits: kernel pid (the thread ID)
    data.time = bpf_ktime_get_ns();

    bpf_get_current_comm(&data.program_name, sizeof(data.program_name)); // puts current comm into char array

    int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
    if (err < 0) { // bpf_probe_read_str returns a negative value on failure
        char msg[] = "Err: %d\n";
        bpf_trace_printk(msg, sizeof(msg), err);
    }

    struct task_struct *task = (struct task_struct *)bpf_get_current_task(); // sched.h

    struct nsproxy *nsprox = 0;      // nsproxy.h
    struct pid_namespace *pidns = 0; // pid_namespace.h
    struct ns_common *nsc = 0;       // ns_common.h
    struct ns_common n = {};
    // Walk task->nsproxy->pid_ns_for_children->ns.inum, one bpf_probe_read per pointer hop
    data.namespace = ({
        typeof(unsigned int) _val;
        __builtin_memset(&_val, 0, sizeof(_val)); // set bytes to 0
        bpf_probe_read(&_val, sizeof(_val), &({
            typeof(struct pid_namespace *) _val;
            __builtin_memset(&_val, 0, sizeof(_val));
            bpf_probe_read(&_val, sizeof(_val), &({
                typeof(struct nsproxy *) _val;
                __builtin_memset(&_val, 0, sizeof(_val));
                bpf_probe_read(&_val, sizeof(_val), &task->nsproxy);
                _val;
            })->pid_ns_for_children);
            _val;
        })->ns.inum);
        _val;
    });

#ifdef DEBUG
    char debug_msg[] = "Tracepoint on syscalls/sys_enter_openat was called for process %d\n";
    bpf_trace_printk(debug_msg, sizeof(debug_msg), data.pid);
#endif

    bpf_perf_event_output(ctx, &opensnoop_events, BPF_F_CURRENT_CPU /*run on current cpu*/, &data, sizeof(data));

    return 0;
}

SEC("tracepoint/syscalls/sys_enter_open")
int sys_enter_open_prog(struct sys_enter_open_args *ctx) {

    struct opensnoop_data_t data = {};

    data.pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: tgid (the user-space process ID)
    data.tgid = bpf_get_current_pid_tgid();      // lower 32 bits: kernel pid (the thread ID)
    data.time = bpf_ktime_get_ns();

    bpf_get_current_comm(&data.program_name, sizeof(data.program_name)); // puts current comm into char array

    int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
    if (err < 0) { // bpf_probe_read_str returns a negative value on failure
        char msg[] = "Err: %d\n";
        bpf_trace_printk(msg, sizeof(msg), err);
    }

#ifdef DEBUG
    char debug_msg[] = "Tracepoint on syscalls/sys_enter_open was called for process %d\n";
    bpf_trace_printk(debug_msg, sizeof(debug_msg), data.pid);
#endif

    bpf_perf_event_output(ctx, &opensnoop_events, BPF_F_CURRENT_CPU /*run on current cpu*/, &data, sizeof(data));

    return 0;
}

u32 _version SEC("version") = LINUX_VERSION_CODE;
char _license[] SEC("license") = "GPL"; // required for GPL-only helpers
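
A note on the two bpf_get_current_pid_tgid() assignments in the program above: the helper returns one 64-bit value with the tgid (what user space calls the process ID) in the upper 32 bits and the kernel pid (the thread ID) in the lower 32 bits. Below is a minimal user-space sketch of that split, handy for checking which half ends up in which field. The function names are hypothetical.

```c
#include <stdint.h>

/* Upper 32 bits: tgid, i.e. the process ID as seen from user space. */
uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid, i.e. the thread ID. */
uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

With this layout, the ">> 32" assignment in the program stores the tgid in data.pid and the plain assignment stores the thread ID in data.tgid, so a user-space consumer needs to read the fields with that in mind.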


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Thu, Aug 20, 2020 at 5:35 AM Ian <icampbe14@gmail.com> wrote:

Interestingly enough I am using clang version 10.0.0! Even with that creating a structure from the examples like so:

struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1);
__type(key, __u32);
__type(value, __u32);
__uint(pinning, LIBBPF_PIN_BY_NAME);
} pid_map SEC(".maps");

I still get: libbpf: BTF is required, but is missing or corrupted.
Your BPF code must be relying on CO-RE. I can check if you can show me
your BPF source code.

The pinning and map definition itself doesn't rely on CO-RE and thus
doesn't need kernel BTF.


Here is my clang version output:

vagrant@vagrant:/vagrant$ clang -v
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

I will continue looking into new clang versions to see if mine is slightly out of date!



Re: Reading Pinned maps in eBPF Programs

Ian
 

Interestingly enough I am using clang version 10.0.0! Even with that creating a structure from the examples like so:

struct {     
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} pid_map SEC(".maps");
 
I still get: libbpf: BTF is required, but is missing or corrupted.

Here is my clang version output: 

vagrant@vagrant:/vagrant$ clang -v
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

I will continue looking into new clang versions to see if mine is slightly out of date!



Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Wed, Aug 19, 2020 at 3:40 PM Ian <icampbe14@gmail.com> wrote:

Libbpf supports declarative pinning of maps, that's how you easily get
"map re-use" from BPF side. See [0] for example.

These examples are exactly what I am looking for, but it appears that they either require BTF activated in the kernel or require a 5.8 kernel. Unfortunately I am targeting the new Ubuntu 20.04 system with "out-of-the-box" configurations. So that means I am saddled with kernel v5.4 and BTF not active. Why does libbpf's declarative map pinning require BTF? Does the metadata within BTF support the ability to correctly find and open the map?
It doesn't require kernel BTF for that. Only BPF program's BTF
generated by Clang. So you'll need something like Clang 10 (or maybe
Clang 9 will do as well), but no requirements for kernel BTF.


Re: Reading Pinned maps in eBPF Programs

Ian
 

Libbpf supports declarative pinning of maps, that's how you easily get
"map re-use" from BPF side. See [0] for example.
These examples are exactly what I am looking for, but it appears that they either require BTF activated in the kernel or require a 5.8 kernel. Unfortunately I am targeting the new Ubuntu 20.04 system with "out-of-the-box" configurations. So that means I am saddled with kernel v5.4 and BTF not active. Why does libbpf's declarative map pinning require BTF? Does the metadata within BTF support the ability to correctly find and open the map?


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Mon, Aug 17, 2020 at 6:36 AM Ian <icampbe14@gmail.com> wrote:

You can use bpf_obj_get() API to get a reference to the pinned map.

It was my understanding that bpf_obj_get was intended to be used as a user space API. I am looking to "open" or obtain a reference to a map in the actual eBPF program that is loaded into kernel space. My eBPF programs do include linux/bpf.h but not the uapi bpf.h. Can/should you use it in the actual BPF program? Or is there a different way to achieve this?
Libbpf supports declarative pinning of maps, that's how you easily get
"map re-use" from BPF side. See [0] for example.

But there is also bpf_map__pin() and bpf_map__reuse_fd() API on
user-space side to set everything up, if you need to do it more
flexibly.

[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_pinning.c

I have seen a function called bpf_obj_get_user in linux/bpf.h but I cannot find any documentation on it. It also just returns an unsupported error in my kernel's source code.

static inline int bpf_obj_get_user(const char __user *pathname, int flags)
{
        return -EOPNOTSUPP;
}

BPF_ANNOTATE_KV_PAIR is the old way to provide map key/value types,
mostly for pretty printing. bcc still uses it. libbpf can use more
advanced mechanisms with direct .maps section attributes.

Ahh interesting!


Re: Reading Pinned maps in eBPF Programs

Ian
 

You can use bpf_obj_get() API to get a reference to the pinned map.

It was my understanding that bpf_obj_get was intended to be used as a user space API. I am looking to "open" or obtain a reference to a map in the actual eBPF program that is loaded into kernel space. My eBPF programs do include linux/bpf.h but not the uapi bpf.h. Can/should you use it in the actual BPF program? Or is there a different way to achieve this?

I have seen a function called bpf_obj_get_user in linux/bpf.h but I cannot find any documentation on it. It also just returns an unsupported error in my kernel's source code. 

static inline int bpf_obj_get_user(const char __user *pathname, int flags)
{
        return -EOPNOTSUPP;
}

BPF_ANNOTATE_KV_PAIR is the old way to provide map key/value types,
mostly for pretty printing. bcc still uses it. libbpf can use more
advanced mechanisms with direct .maps section attributes.
Ahh interesting! 


Re: Reading Pinned maps in eBPF Programs

Yonghong Song
 

On Fri, Aug 14, 2020 at 12:05 PM Ian <icampbe14@gmail.com> wrote:

Hello BPF Community!

Hope you are all doing well. I am trying to have a user space program create a BPF hash map with a single element containing its PID. This map could then be read by all the BPF programs loaded by the user space program. For any event the BPF programs handle, they would first compare the event's PID with that of the user space program. If the PIDs match (this is a single-threaded application), the event would be thrown out, so that events caused by the user space program's own activity are not fed back into it. I was doing some research into this and found a similar post here: https://lists.iovisor.org/g/iovisor-dev/message/1389?p=,,,20,0,0,0::Created,,Userspace+Maps,20,2,0,23673879 that discusses the possibility of this in C++ and BCC. I am curious as to how this could be done using the standard BPF functions and the libbpf library on Ubuntu 20.04 and Linux kernel v5.4. NOTE: BTF is not currently compiled into this kernel.

I have created and pinned the map in my user space program like this:

char map_name[] = "pid_map";
int fd = bpf_create_map_name(BPF_MAP_TYPE_HASH, map_name, sizeof(u32), sizeof(u32), 1, 0);
u32 key = 1;
bpf_map_update_elem(fd, &key, &PID, BPF_ANY);
char pid_map_path[] = "/sys/fs/bpf/pid_map";
bpf_obj_pin(fd, pid_map_path);

NOTE: Error checking and some syntax stuff was removed for brevity.

In my BPF programs I know I cannot "open" a map using bpf_obj_open. Therefore, I need a reference. I looked into the link provided above, essentially in the BPF program all they did was define the map as an extern map def. So I reproduced this in my BPF program like this:
You can use bpf_obj_get() API to get a reference to the pinned map.


extern struct bpf_map_def pid_map;
u32 *pid = bpf_map_lookup_elem(&pid_map, &key);

I did this to see if the BPF loading process would catch the matching map names. Interestingly, this resulted in a libbpf error:
libbpf: failed to find BTF for extern 'pid_map': -3

Looking at this error message, it would appear that there is a way to get this kind of functionality using BTF. The error message implies that some sort of BTF metadata is being searched in some location to match the extern map I have declared. Knowing this, I am curious as to how I can create a reference that multiple BPF programs could use to read the data in pid_map to prevent feedback issues. I have looked into libbpf and the standard bpf.h functions but couldn't really find anything that seemed plausible. One thing I did see and am also curious about is the usage of BPF_ANNOTATE_KV_PAIR. This macro seemed like a possibility, but with my limited understanding of BTF I have not been able to confirm it. I also wasn't sure if using bpf_helpers.h in a user space program was ideal.
BPF_ANNOTATE_KV_PAIR is the old way to provide map key/value types,
mostly for pretty printing. bcc still uses it. libbpf can use more
advanced mechanisms with direct .maps section attributes.



Thank you so much in advance for any response! I really have been amazed at how responsive the community is here. You all have helped me learn so much about BPF!

Ian


Reading Pinned maps in eBPF Programs

Ian
 

Hello BPF Community! 

Hope you are all doing well. I am trying to have a user space program create a BPF hash map with a single element containing its PID. This map could then be read by all the BPF programs loaded by the user space program. For any event the BPF programs handle, they would first compare the event's PID with that of the user space program. If the PIDs match (this is a single-threaded application), the event would be thrown out, so that events caused by the user space program's own activity are not fed back into it. I was doing some research into this and found a similar post here: https://lists.iovisor.org/g/iovisor-dev/message/1389?p=,,,20,0,0,0::Created,,Userspace+Maps,20,2,0,23673879 that discusses the possibility of this in C++ and BCC. I am curious as to how this could be done using the standard BPF functions and the libbpf library on Ubuntu 20.04 and Linux kernel v5.4. NOTE: BTF is not currently compiled into this kernel.
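
The filtering step described above can be sketched in plain C. This is only a model of the comparison, not BPF code. In a real BPF program the 64-bit value would presumably come from bpf_get_current_pid_tgid() and the stored PID from bpf_map_lookup_elem() on the shared map. The function name is_own_event is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether an event was caused by the monitoring process itself.
 * pid_tgid is laid out like the return value of bpf_get_current_pid_tgid()
 * (tgid in the upper 32 bits). self_tgid points at the value user space
 * stored in the map; NULL models a failed lookup.
 */
bool is_own_event(uint64_t pid_tgid, const uint32_t *self_tgid)
{
    if (self_tgid == NULL)
        return false; /* nothing stored yet: keep the event */
    return (uint32_t)(pid_tgid >> 32) == *self_tgid;
}
```

A handler would return early when is_own_event() is true, before writing anything to the perf buffer. Comparing against the upper 32 bits matches every thread of the process, which is a slightly wider filter than the single-threaded case needs.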

I have created and pinned the map in my user space program like this: 

    char map_name[] = "pid_map";
    int fd = bpf_create_map_name(BPF_MAP_TYPE_HASH, map_name, sizeof(u32), sizeof(u32), 1, 0);
    u32 key = 1;
    bpf_map_update_elem(fd, &key, &PID, BPF_ANY);
    char pid_map_path[] = "/sys/fs/bpf/pid_map";
    bpf_obj_pin(fd, pid_map_path);

NOTE: Error checking and some syntax stuff was removed for brevity.

In my BPF programs I know I cannot "open" a map using bpf_obj_open. Therefore, I need a reference. I looked into the link provided above, essentially in the BPF program all they did was define the map as an extern map def. So I reproduced this in my BPF program like this:

extern struct bpf_map_def pid_map;
u32 *pid = bpf_map_lookup_elem(&pid_map, &key);

I did this to see if the BPF loading process would catch the matching map names. Interestingly, this resulted in a libbpf error:
libbpf: failed to find BTF for extern 'pid_map': -3

Looking at this error message, it would appear that there is a way to get this kind of functionality using BTF. The error message implies that some sort of BTF metadata is being searched in some location to match the extern map I have declared. Knowing this, I am curious as to how I can create a reference that multiple BPF programs could use to read the data in pid_map to prevent feedback issues. I have looked into libbpf and the standard bpf.h functions but couldn't really find anything that seemed plausible. One thing I did see and am also curious about is the usage of BPF_ANNOTATE_KV_PAIR. This macro seemed like a possibility, but with my limited understanding of BTF I have not been able to confirm it. I also wasn't sure if using bpf_helpers.h in a user space program was ideal.


Thank you so much in advance for any response! I really have been amazed at how responsive the community is here. You all have helped me learn so much about BPF! 

Ian


Re: Polling multiple BPF_MAP_TYPE_PERF_EVENT_ARRAY causing dropped events

Andrii Nakryiko
 

On Wed, Aug 12, 2020 at 5:38 AM Ian <icampbe14@gmail.com> wrote:

If you have the luxury of using Linux kernel 5.8 or newer, you can try
a new BPF ring buffer map, that provides MPSC queue (so you can queue
from multiple CPUs simultaneously, while BPF perf buffer allows you to
only enqueue on your current CPU). But what's more important for you,
libbpf's ring_buffer interface allows you to do exactly what you need:
poll multiple independent ring buffers simultaneously from a single
epoll FD. See [0] for example of using that API in user-space, plus
[1] for corresponding BPF-side code.

But having said that, we should probably extend libbpf's perf_buffer
API to support similar use cases. I'll try to do this some time soon.

[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c#L54-L62
[1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c

Unfortunately my project is currently targeting Ubuntu 20.04 which ships with linux kernel version 5.4. It is a shame because the new ring buffer interface looks excellent! That said, would you still suggest we use the perf functionality? Or is this currently an incorrect usage? (More on possible changes below)
No, perf buffer is just fine for passing data from the BPF program in
the kernel to the user-space part for post-processing.


Yes, after your handle_event() callback returns, libbpf marks that
sample as consumed and the space it was taking is now available for
new samples to be enqueued. You are right, though, that by increasing
the size of each per-CPU perf ring buffer, you'll delay the drops,
because now you can accumulate more samples in the ring before the
ring buffer is full.
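
To put a rough number on "delay the drops", here is a back-of-the-envelope sketch. It assumes constant production and consumption rates, which real workloads will not have, and the function name is made up for illustration.

```c
/*
 * Seconds until a ring that holds `capacity` samples overflows when
 * samples arrive at `produce_rate` per second and are consumed at
 * `consume_rate` per second. Returns a negative value when the
 * consumer keeps up and the ring never fills.
 */
double seconds_until_first_drop(double capacity, double produce_rate,
                                double consume_rate)
{
    if (produce_rate <= consume_rate)
        return -1.0;
    return capacity / (produce_rate - consume_rate);
}
```

Under this simple model, doubling the capacity doubles the time before the first drop but does not prevent drops: as long as the producer is faster than the consumer, the ring eventually fills.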

When you say delay the drops, do you mean that the threshold for dropping events is larger? So if I made my page size 256, would that make it far less likely to receive dropped events altogether? What would a suggested page size be? I initially thought 16 seemed like plenty, but I haven't found any research to support this. Will I always lose some events? Because that is the behavior I am witnessing right now. It seems like I always eventually start to lose events. Some of this might be due to a feedback loop where my BPF program that monitors file opens collects events triggered by my user space program. I was thinking about using a BPF map that is written by my user space program containing its PID and having all my BPF programs read that map and not write any corresponding events with matching PIDs. Any advice or thoughts on this would be appreciated!
It's hard to give you any definitive answer, it all depends. But think
about this. Perf buffer is a queue. Let's say that your per-CPU buffer
size is 1MB and each of your samples is, say, 1KB. What does that mean?
It means that at any given time you can have at most 1024 samples
enqueued. So, if your BPF program in the kernel generates those 1024
samples faster than the user-space side consumes them, then you'll
have drops. So you have many ways to reduce drops:

1. Generate events at a lower rate. E.g., add sampling, filter out
events you don't need, etc. This will give the user-space side time to
consume.
2. Speed up user-space. Many things can influence this. You can do
less work per item. You can ensure you start reacting to items sooner
by increasing the priority of your consumer thread and/or pinning it
to a dedicated CPU, etc.
3. Reduce the size of the event. If you can reduce sample size from
1KB to 512B by more effective data encoding or dropping unnecessary
data, you suddenly will be able to produce up to 2048 events before
running out of space. That will give your user-space more time to
consume data.
4. Increase per-CPU buffer size. Going from 1MB to 2MB will have the
same effect as reducing sample size from 1KB to 512B, again,
increasing the capacity of your buffer and thus giving more time to
consume data.
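
The arithmetic behind options 3 and 4 can be written down as a small sketch. It ignores the per-record header that the perf buffer adds to each sample, so real capacities will be somewhat lower. The function name is hypothetical.

```c
#include <stddef.h>

/* Number of whole samples that fit in one per-CPU buffer.
 * Simplification: per-record header overhead is not counted. */
size_t max_samples(size_t buffer_bytes, size_t sample_bytes)
{
    if (sample_bytes == 0)
        return 0;
    return buffer_bytes / sample_bytes;
}
```

With a 1MB buffer and 1KB samples this gives 1024. Halving the sample size to 512B and doubling the buffer to 2MB both give 2048, which is why the two options are equivalent in terms of capacity.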

Hope that makes sense and helps show why I can't answer your
questions directly; you'll need to do the analysis on your own based
on your specific implementation and problem domain.

Some of the event loss might also be attributed to the inefficiencies of my looping mechanism, although I think the feedback loop might be the bigger culprit. I am thinking about following the Sysdig approach, which is to have a single perf buffer that is used by all my BPF programs (16 in total). This would remove the loop and eliminate all but 1 perf buffer. I would think that would be more efficient because I am removing 15 perf buffers and their epoll_waits. Then I would use an ID member in each passed data structure to properly read the data.
Yes, that would be a good approach. It's better to have 16x bigger
single perf_buffer shared across all BPF programs, than 16 separate
smaller perf buffers. Because you can absorb event spikes more
effectively.
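
A minimal sketch of the "ID member" idea for a single shared buffer, as plain user-space C. The event IDs, struct names, and the dispatcher are all hypothetical. A real consumer would call something like this from its perf buffer sample callback, with data and size taken from the sample.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event IDs, one per BPF program sharing the buffer. */
enum event_id {
    EVENT_OPEN = 1,
    EVENT_OPENAT = 2,
};

/* Common header placed first in every sample. */
struct event_hdr {
    uint32_t id;
};

/* Per-event counters updated by the dispatcher. */
struct event_stats {
    unsigned long open;
    unsigned long openat;
    unsigned long unknown;
};

/* Route one raw sample by the ID stored in its leading header. */
void dispatch_sample(struct event_stats *stats, const void *data, size_t size)
{
    struct event_hdr hdr;

    if (size < sizeof(hdr)) {
        stats->unknown++; /* too short to carry a header */
        return;
    }
    memcpy(&hdr, data, sizeof(hdr)); /* avoid unaligned access */

    switch (hdr.id) {
    case EVENT_OPEN:
        stats->open++;
        break;
    case EVENT_OPENAT:
        stats->openat++;
        break;
    default:
        stats->unknown++;
        break;
    }
}
```

Each BPF program would fill in its own ID before emitting the sample, and the per-case branches are where the payload would be cast to the matching struct.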

One way I can help you, if you do need to have multiple
PERF_EVENT_ARRAY maps that you need to consume, is to add perf_buffer
APIs similar to ring_buffer that would allow you to epoll all of them
simultaneously. Let me know if you are interested. That will
effectively eliminate your outer LIST_FOREACH(evt, &event_head, list)
loop; you'll just be doing while (true) perf_buffer__poll() across
all perf buffers simultaneously. But a single perf_buffer allows you
to do the same, effectively.


