Re: Extracting data from tracepoints (and anything else)


Andrii Nakryiko
 

Adding the mailing list back.


On Mon, Apr 6, 2020 at 7:58 AM <mayfieldtristan@...> wrote:

Andrii, thanks for the reply!

It's not arbitrary; it's set at 16 in the kernel.

ctx->err doesn't exist according to the definition above?

Sorry, those were my mistakes. I neglected to clean up my code properly before sending it here. I thought I had caught my leftover comments and weird experiments, but I hadn't.
Really sorry.


I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on a 64-bit arch). BPF is a 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on a 32-bit target
architecture, but assuming you are on x86-64, switch all ints to long
and make __mode_t long as well.
Interesting. Here's the tracepoint field order for reference (if nothing else, so the information is in one place for people who may read this):

root@ubuntu-focal:~# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 622
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;

I tried matching the struct to the fields listed, but I am on x86_64, so I guess the ints and umode_t should be long.
Notice the offsets: they are all 8-byte aligned (except for the first 4
fields, which fit in the first 8 bytes). You can express that in your
struct definitions as:

int __syscall_nr __attribute__((aligned(8)));

OR just use long.
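A minimal sketch of both options, assuming an x86-64 (LP64, little-endian) target where long is 8 bytes, as it also is for clang -target bpf. The struct names are illustrative; the _Static_assert lines pin fields to the offsets reported by the format file above, so a layout mistake fails at compile time instead of silently reading the wrong slot at run time.

```c
#include <stddef.h> /* offsetof */

/* Option 1: use long for every syscall argument slot. __syscall_nr can stay
 * int: the following long is naturally 8-byte aligned, so it lands at 16. */
struct sys_enter_openat_args {
    unsigned short common_type;          /* offset 0  */
    unsigned char  common_flags;         /* offset 2  */
    unsigned char  common_preempt_count; /* offset 3  */
    int            common_pid;           /* offset 4  */
    int            __syscall_nr;         /* offset 8, then 4 bytes of padding */
    long           dfd;                  /* offset 16 */
    const char    *filename;             /* offset 24 */
    long           flags;                /* offset 32 */
    long           mode;                 /* offset 40 */
};

/* Option 2: keep the declared types but force each slot onto an 8-byte
 * boundary with the aligned attribute (a GCC/Clang extension). */
struct sys_enter_openat_args_aligned {
    unsigned short common_type;
    unsigned char  common_flags;
    unsigned char  common_preempt_count;
    int            common_pid;
    int            __syscall_nr __attribute__((aligned(8)));
    int            dfd          __attribute__((aligned(8)));
    const char    *filename;    /* pointers are already 8-byte aligned */
    int            flags        __attribute__((aligned(8)));
    unsigned short mode         __attribute__((aligned(8))); /* umode_t */
};

/* Fail the build if either layout drifts from the format file. */
_Static_assert(offsetof(struct sys_enter_openat_args, dfd) == 16, "dfd");
_Static_assert(offsetof(struct sys_enter_openat_args, filename) == 24, "filename");
_Static_assert(offsetof(struct sys_enter_openat_args, mode) == 40, "mode");
_Static_assert(offsetof(struct sys_enter_openat_args_aligned, dfd) == 16, "dfd");
_Static_assert(offsetof(struct sys_enter_openat_args_aligned, filename) == 24, "filename");
_Static_assert(offsetof(struct sys_enter_openat_args_aligned, mode) == 40, "mode");
```

In a libbpf-style program, either struct would be the ctx type of the handler, e.g. `int prog(struct sys_enter_openat_args *ctx)` under SEC("tracepoint/syscalls/sys_enter_openat"). Option 2 only reads correct int values because x86-64 is little-endian, so the low 4 bytes of each 8-byte slot come first; Option 1 avoids relying on that.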

The other issue I've been confused about is that __syscall_nr has an offset of 8 and size 4, but dfd has an offset of 16 where I'd expect 12.
Does that mean there's just meaningless data in that area that should be accounted for?
And, if the data are longs, does that mean the information given in "format" is incorrect?
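On the padding question: bytes 12 to 15 carry no data. As far as I know, the kernel records every syscall-entry event as `struct syscall_trace_enter`: a `struct trace_entry` header, an `int nr`, then an array of `unsigned long` arguments. So the offsets and sizes in the format file are accurate, and only the `int`/`umode_t` type names come from the syscall prototype. The host-side mirror that follows is a sketch of that layout; the `_mirror` names, the helper functions, and the fixed six-slot array (the kernel uses a flexible array member) are illustrative assumptions.

```c
#include <stddef.h> /* offsetof, size_t */

/* Mirror of the common tracepoint header: 8 bytes in total. */
struct trace_entry_mirror {
    unsigned short type;
    unsigned char  flags;
    unsigned char  preempt_count;
    int            pid;
};

/* Mirror of the syscall-entry record: header, syscall number, then one
 * unsigned long per argument (x86-64 syscalls take at most six). */
struct syscall_enter_mirror {
    struct trace_entry_mirror ent; /* bytes 0..7 */
    int nr;                        /* bytes 8..11 */
    unsigned long args[6];         /* 8-byte alignment pushes this to 16 */
};

/* Unused bytes between the end of `nr` and the first argument slot. */
static size_t padding_after_nr(void)
{
    return offsetof(struct syscall_enter_mirror, args)
         - (offsetof(struct syscall_enter_mirror, nr) + sizeof(int));
}

/* Byte offset of the i-th argument slot inside the record. */
static size_t arg_offset(int i)
{
    return offsetof(struct syscall_enter_mirror, args)
         + (size_t)i * sizeof(unsigned long);
}
```

On x86-64 this gives a 4-byte gap after `nr` and argument slots at 16, 24, 32, and 40, which are exactly the offsets of dfd, filename, flags, and mode in the format file.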


0 is not right here; use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only from CPU #0 (when the tracepoint is triggered on
that CPU).
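To illustrate why a flags value of 0 only yields events from CPU #0, here is a simplified model (a sketch of the helper's behavior as I understand it, not kernel code) of how bpf_perf_event_output() interprets its flags argument. The low 32 bits select a slot in the BPF_MAP_TYPE_PERF_EVENT_ARRAY, and BPF_F_CURRENT_CPU means "the CPU the program is running on". User space typically opens the perf event in slot i on CPU i, and the kernel returns an error (-EOPNOTSUPP) rather than write into an event bound to a different CPU, so a hard-coded 0 silently drops events fired elsewhere. The function names are hypothetical.

```c
#include <stdint.h>

/* Same values as the enum in the UAPI header <linux/bpf.h>. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/* Which perf-event-array slot the helper writes to.
 * In the BPF program the real call would look like:
 *   bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data));
 * where `events` is a hypothetical perf event array map. */
static uint64_t perf_output_index(uint64_t flags, uint32_t running_cpu)
{
    uint64_t index = flags & BPF_F_INDEX_MASK;
    if (index == BPF_F_CURRENT_CPU)
        index = running_cpu;
    return index;
}

/* Assuming slot i holds a perf event opened on CPU i, the write only
 * succeeds when the chosen slot belongs to the running CPU. */
static int would_deliver(uint64_t flags, uint32_t running_cpu)
{
    return perf_output_index(flags, running_cpu) == running_cpu;
}
```

With flags = 0 the model delivers only when the program happens to run on CPU 0; with BPF_F_CURRENT_CPU it delivers on every CPU.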
Ah, that is really helpful! I think I just took 0 from some code at https://github.com/bpftools/linux-observability-with-bpf
and hadn't looked into those arguments yet, assuming they were correct!

This is due to the invalid memory layout of struct sys_enter_openat_args:
you are reading the wrong pointer. Sometimes filename might not be paged
into memory and you will get -EFAULT (-14), but that should certainly
not happen all the time.
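A sketch of why the pointer was wrong, assuming the original struct declared the arguments as int and the mode as __mode_t (that struct is not quoted in this message, so the reconstruction below is hypothetical). With 4-byte ints, filename lands at offset 16, which is the slot where the kernel stored dfd. The program then treats the dfd value (for example AT_FDCWD, i.e. -100) as a pointer, and the probe read fails. The slot table comes from the format file; `kernel_slot_at` is a hypothetical helper.

```c
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical reconstruction of the original, int-based ctx struct. */
struct openat_args_int {
    unsigned short common_type;
    unsigned char  common_flags;
    unsigned char  common_preempt_count;
    int            common_pid;
    int            __syscall_nr; /* offset 8: correct */
    int            dfd;          /* offset 12: kernel slot is at 16 */
    const char    *filename;     /* offset 16: kernel slot is at 24 */
    int            flags;        /* offset 24 */
    unsigned int   mode;         /* __mode_t, offset 28 */
};

/* Argument slots of sys_enter_openat, in format-file order. */
static const char *const openat_slots[] = { "dfd", "filename", "flags", "mode" };

/* Name of the argument the kernel actually stored at a given byte offset
 * (first slot at 16, 8 bytes each). */
static const char *kernel_slot_at(size_t offset)
{
    if (offset < 16 || (offset - 16) % 8 != 0 || (offset - 16) / 8 >= 4)
        return "(not an argument slot)";
    return openat_slots[(offset - 16) / 8];
}
```

Here `kernel_slot_at(offsetof(struct openat_args_int, filename))` returns "dfd": the struct's filename field overlaps the kernel's dfd slot, which is exactly the "reading the wrong pointer" symptom.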
Okay, so fixing the *ctx struct to use longs did, in fact, work! Is there a resource I should have read in order to know that?
I'm actually really excited I can finally read tracepoint data :)
Not sure which part you mean? Field alignment, sizes, and padding
are all part of standard C. As for tracepoints, the selftests in the kernel
and various BCC and libbpf examples should be a good starting point.


Since that worked, I'm a little less concerned with the raw tracepoints, but still interested. Here's my modified code for it:

#include "bpf_tracing.h"
#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx) {
    volatile struct pt_regs *regs;
    volatile const char *pathname;
    regs = (struct pt_regs *)ctx->args[0];
    pathname = PT_REGS_PARM2_CORE(regs); // instead of (const char *)regs->si;

    char msg[] = "Path: %d\n";
    bpf_trace_printk(msg, sizeof(msg), pathname);

    return 0;
}
char _license[] SEC("license") = "GPL";

With this, I get a compiler warning that "implicit declaration of function 'PT_REGS_PARM2_CORE' is invalid in C99",
which indicates to me that the #if defined(...) guards in bpf_tracing.h are keeping me from accessing the macro.
I looked over the bpf_tracing.h file to see if it was an easy fix, but the problem hasn't been obvious to me yet.
I'll keep fiddling with it, and look at selftests, and see if I can get it working.
You can use libbpf-tools/Makefile for inspiration on how to do this:
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile

Otherwise, you might need to define __TARGET_ARCH_x86 and __KERNEL__
explicitly. It's easier with vmlinux.h, though.
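For completeness, a sketch of the raw tracepoint program that avoids the CO-RE macro entirely; it has not been built or loaded here, and it only compiles with clang -target bpf, not as a host program. Assumptions: x86-64, libbpf's bpf_helpers.h and bpf_tracing.h copied next to the source, and the UAPI struct pt_regs from <linux/ptrace.h>. With __KERNEL__ left undefined, bpf_tracing.h of this era should map PT_REGS_PARM2() to the UAPI field name rsi. Because a raw tracepoint's ctx->args[0] is an untyped pointer into kernel memory, the register is copied out with bpf_probe_read() rather than dereferenced, and the path string is copied with bpf_probe_read_str(). The file name and NR_OPENAT are hypothetical names.

```c
/* openat_rawtp.bpf.c (hypothetical name). Build sketch:
 *   clang -g -O2 -target bpf -D__TARGET_ARCH_x86 \
 *         -I/usr/include/x86_64-linux-gnu -c openat_rawtp.bpf.c -o openat_rawtp.bpf.o
 * (the -I path for the <asm/...> headers is distro-specific)
 */
#include <linux/bpf.h>
#include <linux/ptrace.h>   /* UAPI struct pt_regs (rdi, rsi, ...) */
#include "bpf_helpers.h"
#include "bpf_tracing.h"    /* PT_REGS_PARM2(); needs __TARGET_ARCH_x86 */

#define NR_OPENAT 257       /* openat syscall number on x86-64 */

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
    /* sys_enter passes (struct pt_regs *regs, long id) */
    if (ctx->args[1] != NR_OPENAT)
        return 0;

    struct pt_regs *regs = (struct pt_regs *)ctx->args[0];
    const char *pathname = NULL;
    char path[64];
    char fmt[] = "Path: %s\n";

    /* regs points into kernel memory: copy the register out, don't dereference */
    if (bpf_probe_read(&pathname, sizeof(pathname), &PT_REGS_PARM2(regs)) < 0)
        return 0;

    /* pathname is a user-space pointer: copy the NUL-terminated string */
    if (bpf_probe_read_str(path, sizeof(path), pathname) < 0)
        return 0;

    bpf_trace_printk(fmt, sizeof(fmt), path);
    return 0;
}

char _license[] SEC("license") = "GPL";
```

Filtering on args[1] keeps the trace pipe readable, since sys_enter fires for every syscall; the output should appear in /sys/kernel/debug/tracing/trace_pipe.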



Finally, I am definitely interested in starting a tutorial. Right now I can load, attach, and unload BPF programs, and use perf buffers. I'm sure I could use other map types, as they're pretty simple; I just haven't dabbled in them yet. I can also read data from tracepoints ;)
I'm going to start on kprobes this week, and hopefully that will be a little more straightforward after the work on tracepoints.
That's about what I could start a tutorial with right now. I might start one this week with some basic "hello world" material, but I'm nervous about getting too deep into technical details unless the community is willing to at least look it over and make sure I'm not steering people in the wrong direction. From the sound of it, that's not a huge worry, but it's a concern of mine nonetheless. There's a lot of outdated information about BPF out there, and I don't want to create another outdated resource.
BPF is still rapidly evolving, so yeah, that's a concern. It
definitely requires dedication and time to maintain good up-to-date
documentation. No way around that, unfortunately.


Cheers again for helping me debug my tracepoint code! I'm excited it's working!
Sure, you are welcome.
