Re: Extracting data from tracepoints (and anything else)


Andrii Nakryiko
 

On Wed, Apr 1, 2020 at 12:52 PM <mayfieldtristan@...> wrote:

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.

Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
Just answered on another Github issue
(https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793),
please check it there as well. Short answer: no. Unless you can pretty
much guarantee that it will be exactly the same **binary** compiled
version of the kernel (not just same version).

The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.

#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
These should come from kernel-devel packages.

#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is deprecated form of declaring maps, please see kernel
selftests for better examples.



// Struct to pass data via perf buffer
struct data_t {
u32 pid;
u32 tgid;
char program_name[16]; // max comm length is arbitrary
It's not arbitrary, it's set at 16 in kernel.

char file[255];
};

struct sys_enter_openat_args {
// struct fields obtained from tplist.py output
long long pad;
int __syscall_nr;
int dfd;
const char * filename;
int flags;
__mode_t mode; // used __mode_t instead of umode_t
};
I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on 32-bit target
architecture, but assuming you are on x86-64, all switch int to long
and make __mode_t also long.


SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

// debugging
char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), ctx->err);
ctx->err doesn't exist according to definition above?


bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).


return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
_version is not necessary with modern libbpf and kernel.


With the above code, err = -14 and ctx->filename = -100.
This is due to invalid memory layour of struct sys_enter_openat_args,
you are reading wrong pointer. But sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.


I took a look at an article written by Gianluca Borello (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for Sysdig's approach, and thought that using a raw tracepoint would be easier to get the filename arg than the above approach. I tried it out, but couldn't get it to compile.
Here's the new function:

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
unsigned long syscall_id = ctx->args[1];
volatile struct pt_regs *regs;
volatile const char *pathname;

regs = (struct pt_regs *)ctx->args[0];
pathname = (const char *)regs->si;
better include bpf_tracing.h header from libbpf and use
PT_REGS_PARM2_CORE(regs) instead of directly referencing fields of
pt_regs.


struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), syscall_id);

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

With this code I get a compilation error:
file_open_kern.c:77:34: error: no member named 'si' in 'struct pt_regs'
pathname = (const char *)regs->si;
~~~~ ^
This error is strange to me because ptrace.h does list %si as a valid field. Perhaps I'm using the wrong header. Hopefully this is enough information to be clear.
This is due to different definitions of struct pt_regs in user-space
and kernel-space. Using libbpf's bpf_tracing.h header and PT_REGS
macros should eliminate a lot of those. Sticking to vmlinux.h also
helps, but requires BPF CO-RE.

If CO-RE compiled programs can run on non-BTF supported kernels, then I would be more than happy to shift to that approach. Otherwise, it's nice to have non-BTF reliant code.
No, unfortunately, it can't.

As a final note, I was working through some examples for XDP in https://github.com/xdp-project/xdp-tutorial and was thinking that something similar would be helpful for general BPF programming. The API may be too volatile at this point, but if people who have the technical expertise are interested, I'm willing to donate some of my own time to help build something similar. BCC's libbpf-tools has been extremely helpful, but it seems that there's not any resources (I've found) that are as in-depth and cohesive as the tutorial linked above. Again, I don't know if it's completely appropriate at this stage of development, but I know there's a lot of interest out there in using BPF at a more granular level and with less overhead than what is offered with BCC.
I agree that such tutorial is sorely missing. libbpf-tools and kernel
selftests (not so much samples/bpf, though) are probably the best way
to see usage of all the newer features. It would be awesome for
someone to prepare an approachable and comprehensive set of tutorials,
of course. Please do give it a try and community will certainly help
you with answering questions you have!

Join {iovisor-dev@lists.iovisor.org to automatically receive all group messages.