Extracting data from tracepoints (and anything else)
Tristan Mayfield
I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the code below, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm still a little bit rusty) and I think I followed the correct flow, where data must be copied from the kernel to the VM before it can be used.
If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs, that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

    struct bpf_map_def SEC("maps") events = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = MAX_CPUS,
    };

    struct sys_enter_openat_args {
        u16 common_type;
        u8 common_flags;
        u8 common_preempt_count;
        int common_pid;
        int __syscall_nr;
        int dfd;
        char *filename;
        int flags;
        __mode_t mode;
    };

    SEC("tracepoint/syscalls/sys_enter_openat")
    int bpf_prog(struct sys_enter_openat_args *ctx) {
        struct data_t data;
        struct sys_enter_openat_args *args;
        int res = bpf_probe_read(args, sizeof(ctx), ctx);
        if(!res) {
            data.file = "couldn't get file";
        } else {
            data.file = args->filename;
        }
        data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
        bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array
        bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
        return 0;
    }

Error Message:

    bpf_load_program() err=13
    0: (bf) r6 = r1
    1: (b7) r2 = 8
    2: (bf) r3 = r6
    3: (85) call bpf_probe_read#4
    R1 type=ctx expected=fp
    The kernel didn't load the BPF program

If more code would be helpful, I'm happy to share. I recognize that libbpf and CO-RE in later kernels provide an easier API for dealing with char * (bpf_probe_read_str() I believe), but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?

Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.

Tristan
Andrii Nakryiko
On Mon, Mar 23, 2020 at 9:38 AM <mayfieldtristan@...> wrote:
nit: this is the legacy syntax for specifying BPF maps; please see [0] for some newer examples.

[0] https://github.com/iovisor/bcc/tree/master/libbpf-tools

You don't need to bpf_probe_read() ctx here; you can just access its members directly.

> if(!res) {

But here, if you want to read the filename contents itself, you'll need to use bpf_probe_read_str(). Having the data_t definition would also be helpful.

> }

This error from the verifier is quite misleading, but what the verifier complains about here is that you try to read an uninitialized pointer (args) and pass it as the first parameter to bpf_probe_read(). But see above: you don't need to bpf_probe_read() anything, and even if you wanted to, it would have to be done very differently:

    struct sys_enter_openat_args args; /* notice no pointer here */
    bpf_probe_read(&args, sizeof(args), ctx); /* taking address of args, taking size of args, not its pointer */

> The kernel didn't load the BPF program

It's a mix. If your data should/can be pre-aggregated in the kernel, using a map might benefit you in that you will be sending much less data to user-space. But if you want to send every piece of information, then perf_buffer is faster and more convenient than having user-space query BPF maps all the time.

You're welcome. Check libbpf-tools in the BCC repo; it should give you some examples to work off of.
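The pointer-versus-object distinction in the reply above can be tried out in ordinary userspace C. The following is only a sketch under stated assumptions: `fake_probe_read` is a hypothetical stand-in built on `memcpy` (the real `bpf_probe_read()` is callable only from BPF programs), and `struct ctx_args` is an invented, simplified context. It shows that `sizeof(ctx)` is the size of a pointer (the `r2 = 8` in the verifier log is consistent with that on a 64-bit target), and that the destination must be a real object whose address is passed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a tracepoint context. */
struct ctx_args {
    long pad;
    long syscall_nr;
    long dfd;
    const char *filename;
};

/* Userspace stand-in for bpf_probe_read(dst, size, src): copies size bytes
 * and returns 0 on success. This is NOT the real helper. */
static int fake_probe_read(void *dst, size_t size, const void *src)
{
    memcpy(dst, src, size);
    return 0;
}

/* Pattern from the reply: a real object on the stack, its address as the
 * destination, and its own size as the length. */
int copy_ctx(const struct ctx_args *ctx, struct ctx_args *out)
{
    struct ctx_args args;                       /* notice: no pointer here */
    int res = fake_probe_read(&args, sizeof(args), ctx);

    if (res == 0)                               /* 0 means success */
        *out = args;
    return res;
}

/* What the original snippet asked for: the size of the pointer itself. */
size_t pointer_size(const struct ctx_args *ctx)
{
    return sizeof(ctx);
}

/* What was actually wanted: the size of the whole struct. */
size_t struct_size(void)
{
    return sizeof(struct ctx_args);
}
```

Note that a return value of 0 signals success, so a check written as `if (!res)` selects the success path rather than the failure path.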
Andrii Nakryiko
Adding back mailing list.
On Mon, Mar 23, 2020 at 12:33 PM <mayfieldtristan@...> wrote:

bpf_probe_read_str() has been there for a long time, at least 4.12 or even older.

samples/bpf are part of the kernel, so yes, they are using libbpf from the kernel sources. For a stand-alone application I'd go with github.com/libbpf/libbpf.

comm is 16 and unlikely to ever change. No need to waste 256 bytes here.

> char *file;

data has to be initialized here: struct data_t data = {};

> struct sys_enter_openat_args args;

data is not completely initialized, see above.

Please keep this discussion on the mailing list, though; it might benefit someone else.
Tristan Mayfield
> bpf_probe_read_str() has been there for a long time, at least 4.12 or

I found out that the kernel tree I cloned from the Ubuntu repo for Bionic (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") was the issue. For some reason it doesn't have an up-to-date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go. I appreciate the pointers for my BPF program.

If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've run into some difficult linking issues I haven't been able to resolve.

> Please keep this discussion on mailing list, though, it might benefit

Agreed; I accidentally sent my last reply to just you. Thanks again for the help.
Andrii Nakryiko
On Wed, Mar 25, 2020 at 6:45 AM <mayfieldtristan@...> wrote:
Take a closer look. libbpf-tools do not use bpf_load.h; that one is deprecated and its use is discouraged. libbpf-tools rely on a code-generated BPF skeleton. But really, take a close look at libbpf-tools; it has everything you need to get started.
Tristan Mayfield
> Take a closer look. libbpf-tools do not use bpf_load.h, that one is

Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.
Andrii Nakryiko
On Wed, Mar 25, 2020 at 11:39 AM <mayfieldtristan@...> wrote:
Yes. A lot of newer functionality relies on kernel BTF as well, and to compile a portable BPF program you also need kernel BTF (for the BPF CO-RE stuff).
Tristan Mayfield
I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.
    #include "bpf_helpers.h"
    // To get kernel datatypes. Haven't figured out how to do this
    // without cloning the kernel source tree yet.
    #include "/kernel-src/tools/include/linux/types.h"
    #include <linux/version.h>
    #include <asm/ptrace.h>
    #include <unistd.h>

    #define MAX_CPUS 4

    struct bpf_map_def SEC("maps") events = {
        .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(u32),
        .max_entries = MAX_CPUS,
    };

    // Struct to pass data via perf buffer
    struct data_t {
        u32 pid;
        u32 tgid;
        char program_name[16]; // max comm length is arbitrary
        char file[255];
    };

    struct sys_enter_openat_args {
        // struct fields obtained from tplist.py output
        long long pad;
        int __syscall_nr;
        int dfd;
        const char * filename;
        int flags;
        __mode_t mode; // used __mode_t instead of umode_t
    };

    SEC("tracepoint/syscalls/sys_enter_openat")
    int bpf_prog(struct sys_enter_openat_args *ctx) {
        struct data_t data = {};
        data.pid = bpf_get_current_pid_tgid() >> 32;
        data.tgid = bpf_get_current_pid_tgid();
        bpf_get_current_comm(&data.program_name, sizeof(data.program_name));
        int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
        // debugging
        char msg[] = "Probe read results: %d\n";
        bpf_trace_printk(msg, sizeof(msg), ctx->err);
        bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
        return 0;
    }

    char _license[] SEC("license") = "GPL";
    u32 _version SEC("version") = LINUX_VERSION_CODE;

With the above code, err = -14 and ctx->filename = -100. I took a look at an article written by
Andrii Nakryiko
On Wed, Apr 1, 2020 at 12:52 PM <mayfieldtristan@...> wrote:
Just answered on another GitHub issue (https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793), please check it there as well. Short answer: no. Unless you can pretty much guarantee that it will be exactly the same **binary** compiled version of the kernel (not just the same version).

> The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.

These should come from kernel-devel packages.

> #include <linux/version.h>

nit: this is a deprecated form of declaring maps; please see kernel selftests for better examples.

It's not arbitrary, it's set at 16 in the kernel.

> char file[255];

I haven't checked the order of fields, but each field has to be long in size (so 8 bytes on a 64-bit arch). BPF is a 64-bit arch, so long is 64-bit there. I'm not sure how this plays out on a 32-bit target architecture, but assuming you are on x86-64, switch all int fields to long and make __mode_t also long.

ctx->err doesn't exist according to the definition above?

0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise you'll get data only on CPU #0 (if the tracepoint gets triggered on that CPU).

_version is not necessary with modern libbpf and kernel.

This is due to the invalid memory layout of struct sys_enter_openat_args: you are reading the wrong pointer. Sometimes filename might not be in memory and you will get -EFAULT (-14), but that should not happen all the time for sure.

> I took a look at an article written by Gianluca Borello (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for Sysdig's approach, and thought that using a raw tracepoint would be easier to get the filename arg than the above approach. I tried it out, but couldn't get it to compile.

Better to include the bpf_tracing.h header from libbpf and use PT_REGS_PARM2_CORE(regs) instead of directly referencing fields of pt_regs.

This is due to different definitions of struct pt_regs in user-space and kernel-space. Using libbpf's bpf_tracing.h header and PT_REGS macros should eliminate a lot of those. Sticking to vmlinux.h also helps, but requires BPF CO-RE.

> If CO-RE compiled programs can run on non-BTF supported kernels, then I would be more than happy to shift to that approach. Otherwise, it's nice to have non-BTF reliant code.

No, unfortunately, it can't.

> As a final note, I was working through some examples for XDP in https://github.com/xdp-project/xdp-tutorial and was thinking that something similar would be helpful for general BPF programming. The API may be too volatile at this point, but if people who have the technical expertise are interested, I'm willing to donate some of my own time to help build something similar. BCC's libbpf-tools has been extremely helpful, but it seems that there's not any resources (I've found) that are as in-depth and cohesive as the tutorial linked above. Again, I don't know if it's completely appropriate at this stage of development, but I know there's a lot of interest out there in using BPF at a more granular level and with less overhead than what is offered with BCC.

I agree that such a tutorial is sorely missing. libbpf-tools and kernel selftests (not so much samples/bpf, though) are probably the best way to see usage of all the newer features. It would be awesome for someone to prepare an approachable and comprehensive set of tutorials, of course. Please do give it a try and the community will certainly help you with answering the questions you have!
Andrii Nakryiko
adding back mailing list
On Mon, Apr 6, 2020 at 7:58 AM <mayfieldtristan@...> wrote:

Notice the offsets: they are all (except for the first 4 fields, which fit in the first 8 bytes) 8-byte aligned. You can do that in your struct definitions as:

    int __syscall_nr __attribute__((aligned(8)));

OR just use long.

> The other issue I've been confused about, is __syscall_nr has an offset of 8 and size 4, but dfd has an offset 16 where I'd expect 12.

Not sure which part you mean? Field alignment, sizes, and padding are all part of standard C. As for tracepoints, selftests in the kernel and various BCC and libbpf examples should be a good starting point.

You can use libbpf-tools/Makefile for inspiration on how to do this: https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile

You might need to define __TARGET_ARCH_x86 and __KERNEL__ explicitly otherwise. It's easier with vmlinux.h, though.

BPF is still rapidly evolving, so yeah, that's a concern. It definitely requires dedication and time to maintain good up-to-date documentation. No way around that, unfortunately.

Sure, you are welcome.
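The offset question above can be checked with `offsetof` in plain userspace C. This is a minimal sketch, assuming an LP64 target such as x86-64 (8-byte `long` and pointers); both struct names are invented for illustration. With the layout first posted, `filename` lands at offset 16, which is where the tracepoint stores `dfd`. That would be consistent with the -100 reported earlier, since AT_FDCWD is defined as -100 on Linux. The offsets 24, 32, and 40 follow from giving each argument its own 8-byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as first written in the thread: dfd packs right after __syscall_nr. */
struct naive_openat_args {
    long long pad;            /* 8 bytes holding the common_* fields */
    int __syscall_nr;         /* offset 8 */
    int dfd;                  /* offset 12, but the tracepoint uses 16 */
    const char *filename;     /* offset 16, overlapping the real dfd slot */
    int flags;
    unsigned int mode;
};

/* Layout with every syscall argument widened to an 8-byte slot. */
struct aligned_openat_args {
    long long pad;            /* 8 bytes holding the common_* fields */
    int __syscall_nr;         /* offset 8, size 4, then 4 bytes of padding */
    long dfd;                 /* offset 16 */
    const char *filename;     /* offset 24 */
    long flags;               /* offset 32 */
    long mode;                /* offset 40 */
};

/* Compile-time statement of the assumption this sketch relies on. */
_Static_assert(sizeof(long) == 8 && sizeof(void *) == 8,
               "sketch assumes an LP64 target such as x86-64");

size_t naive_filename_offset(void)
{
    return offsetof(struct naive_openat_args, filename);
}

size_t aligned_dfd_offset(void)
{
    return offsetof(struct aligned_openat_args, dfd);
}

size_t aligned_filename_offset(void)
{
    return offsetof(struct aligned_openat_args, filename);
}
```

Writing `int dfd __attribute__((aligned(8)));` instead of `long dfd;` should produce the same offsets; the choice between the two is a matter of taste.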
Tristan Mayfield
I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.
At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the CPU registers in which those values live? For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()". load_elf_binary() only has one argument: "struct linux_binprm *bprm". So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and CPU registers, but I believe that would be the %rdi register?

With that in mind, here's my code and build.

    #include <linux/bpf.h>

Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

    elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
    struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
                                                       ^~~~~~~~~~~~~~~~~~
    /home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
    #define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent, which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change, since the average user likely isn't keen on recompiling and installing a kernel).

I also looked at the brief C Appendix of "BPF Performance Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet. I think the final question that may (or may not) solve this issue is: which pt_regs should be used? Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?

After I have these things figured out with some working examples, I think I will publish a GitHub repo with a tutorial, as discussed with Andrii a few messages above. Appreciate any feedback and help.
Andrii Nakryiko
On Thu, Apr 16, 2020 at 8:42 AM <mayfieldtristan@...> wrote:
You are not really accessing CPU registers; you access the values they held before the program was interrupted. Those values are stored in the pt_regs struct. It's a technicality in this case, but you can't access CPU registers directly in BPF.

BTW, raw_tracepoints are completely different, but you should be able to find examples in selftests for those.

> For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()". load_elf_binary() only has one argument: "struct linux_binprm *bprm". So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and CPU registers, but I believe that would be the %rdi register?

Yes, the rdi register, which is accessed from pt_regs using PT_REGS_PARM1().

> With that in mind, here's my code and build.

So <linux/ptrace.h> in your case is taken from the UAPI headers, not the kernel-internal headers. They have different names for the fields. Drop the -D__KERNEL__ part and it should work.

kretprobes can only safely access the return value, which you would use PT_REGS_RC(ctx) to get. Input arguments are clobbered by the time the kretprobe fires, so using PT_REGS_PARM1(ctx) would return you something, but most probably it won't be the correct value of the first input argument.

raw_tracepoints are similar to fentry/fexit in that each input argument is 8 bytes long. See progs/test_vmlinux.c in selftests/bpf for an example of getting a syscall number on sys_entry. BPF_PROG is a useful macro for such use cases.
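The field-name mismatch behind the "no member named 'di'" error can be imitated in plain C. This is a simplified sketch, not the real headers: the two structs and the `DEMO_*` macros are hypothetical and model only two registers. It assumes, as the reply above indicates, that on x86-64 the kernel-internal pt_regs uses names like `di` and `ax` while the UAPI definition uses `rdi` and `rax`, and that the accessor macro picks one set of names depending on whether `__KERNEL__` is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-field mirrors of the two x86-64 pt_regs flavours.
 * The real structs have many more members. */
struct pt_regs_kernel { uint64_t di;  uint64_t ax;  };  /* kernel-internal names */
struct pt_regs_uapi   { uint64_t rdi; uint64_t rax; };  /* UAPI header names */

/* Illustrative accessors mimicking how a PT_REGS_* macro expands to a
 * field name. Pairing a macro with the wrong struct fails to compile,
 * which is the kind of error reported in the thread. */
#define DEMO_PARM1_KERNEL(x) ((x)->di)
#define DEMO_PARM1_UAPI(x)   ((x)->rdi)
#define DEMO_RC_UAPI(x)      ((x)->rax)

/* First function argument as saved at kprobe entry (kernel-style names). */
uint64_t first_arg_kernel(const struct pt_regs_kernel *regs)
{
    return DEMO_PARM1_KERNEL(regs);
}

/* Same value reached through the UAPI-style names. */
uint64_t first_arg_uapi(const struct pt_regs_uapi *regs)
{
    return DEMO_PARM1_UAPI(regs);
}

/* Return value, the only thing a kretprobe can safely read. */
uint64_t return_value_uapi(const struct pt_regs_uapi *regs)
{
    return DEMO_RC_UAPI(regs);
}
```

Swapping `DEMO_PARM1_KERNEL` into `first_arg_uapi` reproduces the shape of the compiler error quoted earlier, because `struct pt_regs_uapi` has no member named `di`.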