Reading Pinned maps in eBPF Programs
Ian
Hello BPF Community!

char map_name[] = "pid_map";
int fd = bpf_create_map_name(BPF_MAP_TYPE_HASH, map_name, sizeof(u32), sizeof(u32), 1, 0);
u32 key = 1;
bpf_map_update_elem(fd, &key, &PID, BPF_ANY);
char pid_map_path[] = "/sys/fs/bpf/pid_map";
bpf_obj_pin(fd, pid_map_path);
|
Yonghong Song
On Fri, Aug 14, 2020 at 12:05 PM Ian <icampbe14@...> wrote:
You can use the bpf_obj_get() API to get a reference to the pinned map.

BPF_ANNOTATE_KV_PAIR is the old way to provide map key/value types, mostly for pretty print. bcc still uses it. libbpf can use more advanced mechanisms with direct .maps section attribute.
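As a rough user-space illustration of the suggestion above, the sketch below opens a pinned map and reads one element. To stay free of a libbpf dependency it calls the bpf(2) syscall directly with the BPF_OBJ_GET and BPF_MAP_LOOKUP_ELEM commands, which is what libbpf's bpf_obj_get() and bpf_map_lookup_elem() wrap. The helper names sys_bpf, pinned_map_open and pinned_map_lookup_u32 are hypothetical, and the u32 key/value sizes are an assumption taken from the pid_map snippet in the first message.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the bpf(2) syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Hypothetical helper: open a pinned BPF object by path.
 * Returns a file descriptor, or -1 with errno set.
 * With libbpf this would be bpf_obj_get(path). */
static int pinned_map_open(const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    return (int)sys_bpf(BPF_OBJ_GET, &attr);
}

/* Hypothetical helper: look up one element in a map whose key and
 * value are both 32-bit (an assumption matching pid_map).
 * Returns 0 on success, -1 with errno set.
 * With libbpf this would be bpf_map_lookup_elem(fd, &key, value). */
static int pinned_map_lookup_u32(int fd, uint32_t key, uint32_t *value)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)&key;
    attr.value = (uint64_t)(uintptr_t)value;
    return (int)sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
}
```

A caller would do something like pinned_map_open("/sys/fs/bpf/pid_map") followed by pinned_map_lookup_u32(fd, 1, &pid). Both calls usually need root privileges, and this only covers the user-space side, not access from inside a BPF program.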
|
Ian
> You can use bpf_obj_get() API to get a reference to the pinned map.

It was my understanding that bpf_obj_get was intended to be used as a user space API. I am looking to "open" or obtain a reference to a map in the actual eBPF program that is loaded into the kernel space. My eBPF programs do include linux/bpf.h but not the uapi bpf.h. Can/should you use it in the actual BPF program? Or is there a different way to achieve this?

I have seen a function called bpf_obj_get_user in linux/bpf.h but I cannot find any documentation on it. It also just returns an unsupported error in my kernel's source code.

static inline int bpf_obj_get_user(const char __user *pathname, int flags)
{
    return -EOPNOTSUPP;
}

> BPF_ANNOTATE_KV_PAIR is old way to provide map key/value types, mostly

Ahh interesting! |
Andrii Nakryiko
On Mon, Aug 17, 2020 at 6:36 AM Ian <icampbe14@...> wrote:
Libbpf supports declarative pinning of maps, that's how you easily get "map re-use" from BPF side. See [0] for example. But there is also bpf_map__pin() and bpf_map__reuse_fd() API on user-space side to set everything up, if you need to do it more flexibly.

[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_pinning.c |
Ian
> Libbpf supports declarative pinning of maps, that's how you easily get

These examples are exactly what I am looking for, but it appears that they require either BTF activated in the kernel or a 5.8 kernel. Unfortunately I am targeting the new Ubuntu 20.04 system with "out-of-the-box" configurations, so I am saddled with kernel v5.4 and BTF not active. Why does libbpf's declarative map pinning require BTF? Does the metadata within BTF support the ability to correctly find and open the map? |
Andrii Nakryiko
On Wed, Aug 19, 2020 at 3:40 PM Ian <icampbe14@...> wrote:
It doesn't require kernel BTF for that. Only BPF program's BTF generated by Clang. So you'll need something like Clang 10 (or maybe Clang 9 will do as well), but no requirements for kernel BTF. |
Ian
Interestingly enough I am using clang version 10.0.0! Even with that, creating a structure from the examples like so:

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} pid_map SEC(".maps");

I still get:

libbpf: BTF is required, but is missing or corrupted.

Here is my clang version output:

vagrant@vagrant:/vagrant$ clang -v
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64

I will continue looking into new clang versions to see if mine is slightly out of date! |
Andrii Nakryiko
On Thu, Aug 20, 2020 at 5:35 AM Ian <icampbe14@...> wrote:
Your BPF code must be relying on CO-RE. I can check if you can show me your BPF source code. The pinning and map definition itself doesn't rely on CO-RE and thus doesn't need kernel BTF.
|
Ian
Hello! Sorry for the wait, I just started back at uni and things are a little bit crazy around here! Anyways, this is the source code for my version of open snoop, which is what I have been testing with. It does not contain the changes for map reading. My goal is to have this open snoop file open/read a map with one element after it gets the PID to compare them. It is also worth noting that I am tracking both open and openat within the same file.

#include <linux/bpf.h>    // BPF asm file that ships with the OS
#include "bpf_helpers.h"  // bpf_helper functions
#include <linux/version.h>

// For navigating the task struct
#include <linux/sched.h>
#include <linux/nsproxy.h>
#include <linux/pid_namespace.h>
#include <linux/ns_common.h>

#define MAX_CPUS 4

/**
 * Struct to pass data to the perf buffer
 */
#pragma pack(1)
struct opensnoop_data_t {
    u32 pid;
    u32 tgid;
    char program_name[16]; // max comm length is 16
    char file[255];
    u32 namespace;
    u64 time;
};

struct sys_enter_openat_args {
    long long pad;
    long __syscall_nr;
    long dfd;
    const char *filename;
    long flags;
    long mode;
};

struct sys_enter_open_args {
    long long pad;
    long __syscall_nr;
    const char *filename;
    long flags;
    long mode;
};

/**
 * Using the magic macro SEC this struct declares
 * and creates a new bpf map of a type PERF that we
 * can use to pass data to userspace
 */
struct bpf_map_def SEC("maps") opensnoop_events = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = MAX_CPUS,
};
SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
    struct opensnoop_data_t data = {};

    // bpf_get_current_pid_tgid() packs the tgid into the upper 32 bits
    // and the kernel pid (thread ID) into the lower 32 bits
    data.pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: tgid, the PID as seen from user space
    data.tgid = bpf_get_current_pid_tgid();      // truncated to the lower 32 bits: kernel pid (thread ID)
    data.time = bpf_ktime_get_ns();

    bpf_get_current_comm(&data.program_name, sizeof(data.program_name)); // puts current comm into char array

    int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
    if (err < 0) { // a negative return value means the read failed
        char msg[] = "Err: %d\n";
        bpf_trace_printk(msg, sizeof(msg), err);
    }

    struct task_struct *task = (struct task_struct *)bpf_get_current_task(); // sched.h

    struct nsproxy *nsprox = 0;      // nsproxy.h
    struct pid_namespace *pidns = 0; // pid_namespace.h
    struct ns_common *nsc = 0;       // ns_common.h
    struct ns_common n = {};

    // Walk task->nsproxy->pid_ns_for_children->ns.inum, one bpf_probe_read per pointer hop
    data.namespace = ({
        typeof(unsigned int) _val;
        __builtin_memset(&_val, 0, sizeof(_val)); // set bytes to 0
        bpf_probe_read(&_val, sizeof(_val), &({
            typeof(struct pid_namespace *) _val;
            __builtin_memset(&_val, 0, sizeof(_val));
            bpf_probe_read(&_val, sizeof(_val), &({
                typeof(struct nsproxy *) _val;
                __builtin_memset(&_val, 0, sizeof(_val));
                bpf_probe_read(&_val, sizeof(_val), &task->nsproxy);
                _val;
            })->pid_ns_for_children);
            _val;
        })->ns.inum);
        _val;
    });

#ifdef DEBUG
    char debug_msg[] = "Tracepoint on syscalls/sys_enter_openat was called for process %d\n";
    bpf_trace_printk(debug_msg, sizeof(debug_msg), data.pid);
#endif

    bpf_perf_event_output(ctx, &opensnoop_events, BPF_F_CURRENT_CPU /*run on current cpu*/, &data, sizeof(data));

    return 0;
}

SEC("tracepoint/syscalls/sys_enter_open")
int sys_enter_open_prog(struct sys_enter_open_args *ctx)
{
    struct opensnoop_data_t data = {};

    data.pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: tgid, the PID as seen from user space
    data.tgid = bpf_get_current_pid_tgid();      // truncated to the lower 32 bits: kernel pid (thread ID)
    data.time = bpf_ktime_get_ns();

    bpf_get_current_comm(&data.program_name, sizeof(data.program_name)); // puts current comm into char array

    int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
    if (err < 0) { // a negative return value means the read failed
        char msg[] = "Err: %d\n";
        bpf_trace_printk(msg, sizeof(msg), err);
    }

#ifdef DEBUG
    char debug_msg[] = "Tracepoint on syscalls/sys_enter_open was called for process %d\n";
    bpf_trace_printk(debug_msg, sizeof(debug_msg), data.pid);
#endif

    bpf_perf_event_output(ctx, &opensnoop_events, BPF_F_CURRENT_CPU /*run on current cpu*/, &data, sizeof(data));

    return 0;
}

u32 _version SEC("version") = LINUX_VERSION_CODE;
char _license[] SEC("license") = "GPL"; // necessary to use types of kernel ABI's
|
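For orientation, here is a minimal sketch of how the stated goal (comparing the current PID against the single entry of a pinned map) might look on the BPF side, combining the pid_map declaration from earlier in the thread with a lookup. It is not the poster's final code. The program name filter_openat, the include path <bpf/bpf_helpers.h>, and the choice of key 1 (taken from the first message's user-space snippet) are assumptions. It is meant to be compiled for the BPF target, not run on the host.

```c
/* Sketch only. Assumed build command: clang -O2 -g -target bpf -c file.c */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h> /* assumed install path of libbpf's helpers */

/* Declaratively pinned map. With LIBBPF_PIN_BY_NAME, libbpf pins the map
 * under its pin root (by default /sys/fs/bpf) using the map's name, and
 * reuses an already pinned, compatible map when the object is loaded. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} pid_map SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_openat")
int filter_openat(void *ctx) /* hypothetical program name */
{
    __u32 key = 1; /* assumption: user space stored the PID under key 1 */
    __u32 tgid = bpf_get_current_pid_tgid() >> 32; /* user-visible PID */
    __u32 *watched = bpf_map_lookup_elem(&pid_map, &key);

    /* The verifier requires the NULL check before dereferencing. */
    if (!watched || *watched != tgid)
        return 0;

    /* ... fill in and emit the event as in the program above ... */
    return 0;
}

char _license[] SEC("license") = "GPL";
```

No explicit "open" call is needed inside the BPF program: the reference to pid_map is resolved by libbpf at load time.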
Tristan Mayfield
I wanted to chime in and mention that I've seen the BTF error before when trying to declare maps the way shown in https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_pinning.c. I haven't run through a debugger yet to verify that's the issue, but I have verified on the opensnoop code Ian posted. |
Andrii Nakryiko
On Sun, Aug 23, 2020 at 12:36 PM Ian <icampbe14@...> wrote:
[...] I don't see anything needing kernel BTF in there, so if libbpf still fails on not being able to load kernel BTF, that might be a bug in libbpf. Can you please double-check this with the latest released (or just plain latest) libbpf and if that's still happening, please provide debug-level logs from libbpf. Thank you! |
Andrii Nakryiko
On Wed, Aug 26, 2020 at 6:54 AM Tristan Mayfield <mayfieldtristan@...> wrote:
Which version of libbpf are you seeing this on? We've had bugs in libbpf where we'd attempt to load kernel BTF unnecessarily, but I believe we've fixed all those issues. Can you please double-check with latest released libbpf and see if that's still happening? If it is, could you provide a repro and full libbpf debug logs for me to investigate? Thanks! |
Ian
Hey Andrii,
By the way, thank you so much for all your help! Ian |
Andrii Nakryiko
On Thu, Aug 27, 2020 at 6:55 AM Ian <icampbe14@...> wrote:
Check example [0] for how to set custom logging callback and print all libbpf logs (including those at DEBUG level of verbosity).

[0] https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqslower.c#L136

You are welcome!
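A minimal sketch of such a logging callback follows, assuming libbpf's headers are installed as <bpf/libbpf.h>. The callback name print_all_levels and the small main() around it are hypothetical. The callback is installed with libbpf_set_print() before the object is opened, so that parsing and loading messages are captured too.

```c
#include <stdarg.h>
#include <stdio.h>
#include <bpf/libbpf.h>

/* Hypothetical callback: forward every libbpf message to stderr,
 * regardless of level (WARN, INFO and DEBUG alike). */
static int print_all_levels(enum libbpf_print_level level,
                            const char *format, va_list args)
{
    (void)level; /* no filtering: DEBUG output is kept */
    return vfprintf(stderr, format, args);
}

int main(int argc, char **argv)
{
    struct bpf_object *obj;
    long err;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <object.bpf.o>\n", argv[0]);
        return 1;
    }

    /* Install the callback first so open/load messages are printed. */
    libbpf_set_print(print_all_levels);

    obj = bpf_object__open_file(argv[1], NULL);
    err = libbpf_get_error(obj);
    if (err) {
        fprintf(stderr, "open failed: %ld\n", err);
        return 1;
    }

    err = bpf_object__load(obj);
    if (err)
        fprintf(stderr, "load failed: %ld\n", err);

    bpf_object__close(obj);
    return err ? 1 : 0;
}
```

Linking needs -lbpf, and loading normally requires root.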
|
Ian
Hello,
Here are the libbpf logs at all levels for the open snoop program when using the pinned option for a map. This was tested on Linux kernel v5.4 with libbpf 0.0.9, 0.1.0, and the current version. The logs were the same in every case, so I have only posted a single copy here. Let me know what you think and what the next steps might be! I appreciate the help and am having a good time trying to piece this together.

libbpf: loading bpf-library/bpf_objs/opensnoop.bpf.o
libbpf: section(1) .strtab, size 289, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) tracepoint/syscalls/sys_enter_openat, size 1632, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_openat
libbpf: section(4) .reltracepoint/syscalls/sys_enter_openat, size 32, link 15, flags 0, type=9
libbpf: section(5) tracepoint/syscalls/sys_enter_open, size 1368, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_open
libbpf: section(6) .reltracepoint/syscalls/sys_enter_open, size 32, link 15, flags 0, type=9
libbpf: section(7) .data, size 4, link 0, flags 3, type=1
libbpf: section(8) maps, size 20, link 0, flags 3, type=1
libbpf: section(9) .rodata.str1.1, size 9, link 0, flags 32, type=1
libbpf: skip section(9) .rodata.str1.1
libbpf: section(10) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of bpf-library/bpf_objs/opensnoop.bpf.o is 50422
libbpf: section(11) license, size 4, link 0, flags 3, type=1
libbpf: license of bpf-library/bpf_objs/opensnoop.bpf.o is GPL
libbpf: section(12) .maps, size 40, link 0, flags 3, type=1
libbpf: section(13) .eh_frame, size 80, link 0, flags 2, type=1
libbpf: skip section(13) .eh_frame
libbpf: section(14) .rel.eh_frame, size 32, link 15, flags 0, type=9
libbpf: skip relo .rel.eh_frame(14) for section(13)
libbpf: section(15) .symtab, size 408, link 1, flags 0, type=2
libbpf: BTF is required, but is missing or corrupted.
|
Andrii Nakryiko
On Sun, Aug 30, 2020 at 4:35 PM Ian <icampbe14@...> wrote:
[...] Ok, this is a very different issue than the kernel missing BTF. libbpf is complaining that your opensnoop.bpf.o itself is missing BTF. And right, BTF is required to parse map definitions properly, but it doesn't depend on having kernel support for BTF at all.

Make sure you use recent enough Clang (v10+) and you build your opensnoop.bpf.o with -target bpf **and** -g flag to generate debug info (including .BTF ELF section). |
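One way to confirm whether a compiled object actually carries the .BTF section is to look at its ELF section names. The sketch below does that with only the C library and <elf.h>. The function name elf64_has_section is hypothetical, and the code assumes a 64-bit ELF file in the host's byte order with fewer than 0xff00 sections, which should hold for a BPF object built on a little-endian x86-64 machine.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a
 * section called `name`, 0 if it does not, -1 on I/O or format errors. */
static int elf64_has_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    char *names = NULL;
    size_t names_len = 0;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof(eh), 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Load the section header table. */
    shdrs = calloc(eh.e_shnum, sizeof(*shdrs));
    if (!shdrs || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof(*shdrs), eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the section-name string table. */
    names_len = shdrs[eh.e_shstrndx].sh_size;
    names = malloc(names_len + 1);
    if (!names ||
        fseek(f, (long)shdrs[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_len, f) != names_len)
        goto out;
    names[names_len] = '\0';

    /* Compare each section's name against the requested one. */
    result = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (shdrs[i].sh_name < names_len &&
            strcmp(names + shdrs[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }
out:
    free(names);
    free(shdrs);
    fclose(f);
    return result;
}
```

Calling elf64_has_section("opensnoop.bpf.o", ".BTF") on an object built without -g would be expected to return 0, matching the section list in the log above, where no .BTF section appears. readelf -S on the object gives the same information from the command line.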
Ian
Interestingly enough, adding just -g in my Makefile built the BPF programs and allowed the BTF section to be found and properly loaded. My BPF program was loaded and is running properly with my desired functionality. I am confused, though, as to why the -g flag fixed this problem. According to the clang man page:

-g Generate debug information.

Is BTF information considered debug information? Is that in general or in this case? Is this unexpected behavior? Perhaps a bug in how Clang builds BPF binaries without -g? It would seem to me that the BTF information should not be purged from a non -g binary. I am interested to hear your thoughts on this, Andrii! Again, thank you so much for your help. There is no way I would have figured that out on my own. Ian |
Andrii Nakryiko
On Mon, Aug 31, 2020 at 12:03 PM Ian <icampbe14@...> wrote:
It's expected right now. BTF started out as purely debug information, but got elevated into pretty much a mandatory thing for modern BPF applications. We've talked about making .BTF emitted without -g, but that hasn't happened in Clang yet (there are some technical difficulties). |