Extracting data from tracepoints (and anything else)


Tristan Mayfield
 

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

struct sys_enter_openat_args {
        u16 common_type;
        u8 common_flags;
        u8 common_preempt_count;
        int common_pid;
        int __syscall_nr;
        int dfd;
        char *filename;
        int flags;
        __mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
  struct data_t data;
  struct sys_enter_openat_args *args;

  int res = bpf_probe_read(args, sizeof(ctx), ctx);
  if(!res) {
         data.file = "couldn't get file";
  } else {
         data.file = args->filename;
  }

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
The kernel didn't load the BPF program

  data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
  bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provides an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?

Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.

Tristan


Andrii Nakryiko
 

On Mon, Mar 23, 2020 at 9:38 AM <mayfieldtristan@...> wrote:

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is a legacy syntax of specifying BPF maps, please see [0]
for some newer examples

[0] https://github.com/iovisor/bcc/tree/master/libbpf-tools


struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
struct sys_enter_openat_args *args;

int res = bpf_probe_read(args, sizeof(ctx), ctx);
you don't need to bpf_probe_read() ctx here, you can just access its
members directly.

if(!res) {
data.file = "couldn't get file";
} else {
data.file = args->filename;
But here if you want to read filename contents itself, you'll need to
use bpf_probe_read_str().

Having data_t definition would be also helpful.

}

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
this error from verifier is quite misleading, but what verifier
complains about here is that you try to read uninitialized pointer
(arg) and pass it as a first parameter into bpf_probe_read(). But see
above, you don't need to bpf_probe_read() anything, and even if you
wanted to it would have to be done very differently:

struct sys_enter_openat_args args; /* notice no pointer here */
bpf_probe_read(&args, sizeof(args), ctx); /* taking address of args,
taking size of args, not its pointer */

The kernel didn't load the BPF program

data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provides an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?
It's a mix. If your data should/can be pre-aggregated in kernel, using
map might benefit you in that you will be sending much less data to
user-space. But if you want to send every piece of information than
perf_buffer is faster and more convenient than having user-space query
BPF maps all the time.


Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.
You're welcome. Check libbpf-tools in BCC repo, it should give you
some examples to work off of.


Tristan


Andrii Nakryiko
 

Adding back mailing list.

On Mon, Mar 23, 2020 at 12:33 PM <mayfieldtristan@...> wrote:

Thanks for the reply. All of your suggestions make sense. Because I'm targeting a kernel 4.15 for this specific bit of code, I can't use bpf_probe_read_str().. I think? The way I'm developing is that I cloned the kernel src of kernel 4.15 to a certain depth, and then compiled libbpf. Should I use the standalone libbpf repo instead? I've tried that but have struggled to get samples/bpf/ to compile taking that approach.
bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

samples/bpf are part of kernel, so yes, they are using libbpf from
kernel sources. For stand-alone application I'd go with
github.com/libbpf/libbpf


Regardless, I took your advice, and my code now looks like:

struct data_t {
u32 pid;
char program_name[256]; // max comm length is arbitrary
comm is 16 and unlikely to ever change. No need to waste 256 bytes here.

char *file;
};

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};



struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
data has to be initialized here:

struct data_t data = {};

struct sys_enter_openat_args args;

int res = bpf_probe_read(&args, sizeof(args), ctx); // read the ctx into bpf space
if(!res) {
data.file = "couldn't get file";
} else {
data.file = args.filename;
}

data.pid = bpf_get_current_pid_tgid();
bpf_get_current_comm(data.program_name, sizeof(data.program_name));

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

---
With a new error code

bpf_load_program() err=13
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -304
3: (b7) r2 = 32
4: (bf) r3 = r6
5: (85) call bpf_probe_read#4
6: (67) r0 <<= 32
7: (77) r0 >>= 32
8: (55) if r0 != 0x0 goto pc+3
R0=inv0 R6=ctx(id=0,off=0,imm=0) R10=fp0
9: (18) r1 = 0xffff88ac7c953000
11: (05) goto pc+1
13: (7b) *(u64 *)(r10 -8) = r1
14: (85) call bpf_get_current_pid_tgid#14
15: (63) *(u32 *)(r10 -272) = r0
16: (bf) r1 = r10
17: (07) r1 += -268
18: (b7) r2 = 256
19: (85) call bpf_get_current_comm#16
20: (bf) r4 = r10
21: (07) r4 += -272
22: (bf) r1 = r6
23: (18) r2 = 0xffff88ac7c953000
25: (b7) r3 = 0
26: (b7) r5 = 272
27: (85) call bpf_perf_event_output#25
invalid indirect read from stack off -272+260 size 272
The kernel didn't load the BPF program
---

Thanks for the pointer about the libbpf-tools in the BCC repo! I had seen it before but for some reason didn't make any kind of note about it. It's extremely helpful.

I will change the map declaration, just trying to understand getting data from the tracepoint right now.
data is not completely initialized, see above.

Please keep this discussion on mailing list, though, it might benefit
someone else.


Tristan Mayfield
 

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.
I found out that the cloned the kernel tree from the Ubuntu repo (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") for Bionic was the issue. For some reason it doesn't have an up to date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Please keep this discussion on mailing list, though, it might benefit
someone else.
Agreed, the last message I replied to just you accidentally.
Thanks again for the help.


Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 6:45 AM <mayfieldtristan@...> wrote:

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

I found out that the cloned the kernel tree from the Ubuntu repo (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") for Bionic was the issue. For some reason it doesn't have an up to date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.


Please keep this discussion on mailing list, though, it might benefit
someone else.

Agreed, the last message I replied to just you accidentally.
Thanks again for the help.


Tristan Mayfield
 

Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.

Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.


Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 11:39 AM <mayfieldtristan@...> wrote:

Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.


Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.
Yes. A lot of newer functionality relies on kernel BTF as well. But to
compile portable BPF program you also need kernel BTF (for BPF CO-RE
stuff).


Tristan Mayfield
 

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.
  • Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
  • The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.
#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

// Struct to pass data via perf buffer
struct data_t {
    u32 pid;
    u32 tgid;
    char program_name[16]; // max comm length is arbitrary
    char file[255];
};

struct sys_enter_openat_args {
    // struct fields obtained from tplist.py output
    long long pad;
    int __syscall_nr;
    int dfd;
    const char * filename;
    int flags;
    __mode_t mode;  // used __mode_t instead of umode_t
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
  struct data_t data = {};

  data.pid = bpf_get_current_pid_tgid() >> 32;
  data.tgid = bpf_get_current_pid_tgid();
  bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

  int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

  // debugging
  char msg[] = "Probe read results: %d\n";
  bpf_trace_printk(msg, sizeof(msg), ctx->err);

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

With the above code, err = -14 and ctx->filename = -100.
I took a look at an article written by


Andrii Nakryiko
 

On Wed, Apr 1, 2020 at 12:52 PM <mayfieldtristan@...> wrote:

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.

Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
Just answered on another Github issue
(https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793),
please check it there as well. Short answer: no. Unless you can pretty
much guarantee that it will be exactly the same **binary** compiled
version of the kernel (not just same version).

The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.

#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
These should come from kernel-devel packages.

#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is deprecated form of declaring maps, please see kernel
selftests for better examples.



// Struct to pass data via perf buffer
struct data_t {
u32 pid;
u32 tgid;
char program_name[16]; // max comm length is arbitrary
It's not arbitrary, it's set at 16 in kernel.

char file[255];
};

struct sys_enter_openat_args {
// struct fields obtained from tplist.py output
long long pad;
int __syscall_nr;
int dfd;
const char * filename;
int flags;
__mode_t mode; // used __mode_t instead of umode_t
};
I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on 32-bit target
architecture, but assuming you are on x86-64, all switch int to long
and make __mode_t also long.


SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

// debugging
char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), ctx->err);
ctx->err doesn't exist according to definition above?


bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).


return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
_version is not necessary with modern libbpf and kernel.


With the above code, err = -14 and ctx->filename = -100.
This is due to invalid memory layour of struct sys_enter_openat_args,
you are reading wrong pointer. But sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.


I took a look at an article written by Gianluca Borello (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for Sysdig's approach, and thought that using a raw tracepoint would be easier to get the filename arg than the above approach. I tried it out, but couldn't get it to compile.
Here's the new function:

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
unsigned long syscall_id = ctx->args[1];
volatile struct pt_regs *regs;
volatile const char *pathname;

regs = (struct pt_regs *)ctx->args[0];
pathname = (const char *)regs->si;
better include bpf_tracing.h header from libbpf and use
PT_REGS_PARM2_CORE(regs) instead of directly referencing fields of
pt_regs.


struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), syscall_id);

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

With this code I get a compilation error:
file_open_kern.c:77:34: error: no member named 'si' in 'struct pt_regs'
pathname = (const char *)regs->si;
~~~~ ^
This error is strange to me because ptrace.h does list %si as a valid field. Perhaps I'm using the wrong header. Hopefully this is enough information to be clear.
This is due to different definitions of struct pt_regs in user-space
and kernel-space. Using libbpf's bpf_tracing.h header and PT_REGS
macros should eliminate a lot of those. Sticking to vmlinux.h also
helps, but requires BPF CO-RE.

If CO-RE compiled programs can run on non-BTF supported kernels, then I would be more than happy to shift to that approach. Otherwise, it's nice to have non-BTF reliant code.
No, unfortunately, it can't.

As a final note, I was working through some examples for XDP in https://github.com/xdp-project/xdp-tutorial and was thinking that something similar would be helpful for general BPF programming. The API may be too volatile at this point, but if people who have the technical expertise are interested, I'm willing to donate some of my own time to help build something similar. BCC's libbpf-tools has been extremely helpful, but it seems that there's not any resources (I've found) that are as in-depth and cohesive as the tutorial linked above. Again, I don't know if it's completely appropriate at this stage of development, but I know there's a lot of interest out there in using BPF at a more granular level and with less overhead than what is offered with BCC.
I agree that such tutorial is sorely missing. libbpf-tools and kernel
selftests (not so much samples/bpf, though) are probably the best way
to see usage of all the newer features. It would be awesome for
someone to prepare an approachable and comprehensive set of tutorials,
of course. Please do give it a try and community will certainly help
you with answering questions you have!


Andrii Nakryiko
 

adding back mailing list


On Mon, Apr 6, 2020 at 7:58 AM <mayfieldtristan@...> wrote:

Andrii, thanks for the reply!

It's not arbitrary, it's set at 16 in kernel.

ctx->err doesn't exist according to definition above?

Sorry, these were my mistake. I neglected cleaning my code up properly before sending here. I thought I had caught my relic comments and weird experiments, but hadn't.
Really sorry.


I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on 32-bit target
architecture, but assuming you are on x86-64, all switch int to long
and make __mode_t also long.
Interesting. Here's the tracepoint field order for reference (if nothing else so the information is in one place for people who may read this):

root@ubuntu-focal:~# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 622
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;

I tried matching the struct to the fields listed, but I am on x86_64 so I guess the ints and umode_t should be long.
Notice offsets, they are all (except for first 4 fields which fit in
first 8 bytes) 8-byte aligned. You can do that in your struct
definitions as:

int __syscall_nr __attribute__((aligned(8)));

OR just use long.

The other issue I've been confused about, is __syscall_nr has an offset of 8 and size 4, but dfd has an offset 16 where I'd expect 12.
Does that mean that there's just meaningless data in that area that should be accounted for?
And, if the data are longs, does that mean that the information given in "format" is incorrect?


0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).
Ah, that is really helpful! I think I just took 0 from some code at https://github.com/bpftools/linux-observability-with-bpf
and just hadn't looked into those arguments yet, assuming they were correct!

This is due to invalid memory layour of struct sys_enter_openat_args,
you are reading wrong pointer. But sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.
Okay, so fixing the *ctx struct to use longs did, in fact, work! Is there a resource or way that I should have read in order to know that?
I'm actually really excited I can finally read tracepoint data :)
Not sure which part do you mean? Field alignment, sizes, and padding
are all part of standard C. As for tracepoint, selftests in kernel and
various BCC and libbpf examples should be a good starting point.


Since that worked, I'm a little less concerned with the raw tracepoints, but still interested. Here's my modified code for it:

#include "bpf_tracing.h"
#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx) {

volatile struct pt_regs *regs;
volatile const char *pathname;
regs = (struct pt_regs *)ctx->args[0];
pathname = PT_REGS_PARM2_CORE(regs); // instead of (const char *)regs->si;

char msg[] = "Path: %d\n";
bpf_trace_printk(msg, sizeof(msg), pathname);

return 0;
}
char _license[] SEC("license") = "GPL";

With this, I get a compiler error warning that "implicit declaration of function 'PT_REGS_PARM2_CORE' is invalid in C99"
which indicates to me that the defined guards in bpf_tracing.h are keeping me from accessing the macro.
I looked over the bpf_tracing.h file to see if it was an easy error, but it hasn't been obvious to me yet.
I'll keep fiddling with it, and look at selftests, and see if I can get it working.
You can use libbpf-tools/Makefile for inspiration on how to do this:
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile

You might need to define __TARGET_ARCH_x86 and __KERNEL__ explicitly
otherwise. It's easier with vmlinux.h, though.



Finally, I definitely am interested in starting up a tutorial. Right now I can load, attach, and unload BPF programs. Use perf buffers. I'm sure I could use other maps types as they're pretty simple, just haven't dabbled in them yet. I can also read data from tracepoints ;)
I'm going to start on kprobes this week, and hopefully that will be a little more straightforward after doing the work on tracepoints.
That's about what I could start a tutorial with right now. I'll maybe start one this week with some basic "hello world" type stuff, but I'm nervous to get too deep into technical details if the community isn't willing to at least look over it and make sure I'm not steering information the wrong direction. From the sound of it, that's not a huge worry, but a concern of mine nonetheless. There's a lot of deprecated information about BPF out there, and I don't want to make another deprecated resource.
BPF is still rapidly evolving, so yeah, that's a concern. It
definitely requires dedication and time to maintain good up-to-date
documentation. No way around that, unfortunately.


Cheers again for helping me debug my tracepoint code! I'm excited it's working!
Sure, you are welcome.


Tristan Mayfield
 

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?
For example, if I'm porting Brenden Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binrprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARAM1(x). I don't have the greatest understanding of asm and cpu registers, but I believe that would be the %rdi register?
With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
char msg[] = "hello world\n"; // for verification that the bpf program is running at all
bpf_trace_printk(msg, sizeof(msg));

struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

return 0;
}

char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o

Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
  struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
                                                     ^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performace Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?

Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?

After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.


Andrii Nakryiko
 

On Thu, Apr 16, 2020 at 8:42 AM <mayfieldtristan@...> wrote:

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?
You are not really accessing CPU registers, but you access their
values before the program was interrupted. Those values are stored in
pt_regs struct. It's a technicality in this case, but you can't access
CPU registers directly in BPF.

BTW, raw_tracepoints are completely different, but you should be able
to find examples in selftests for those.

For example, if I'm porting Brenden Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binrprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARAM1(x). I don't have the greatest understanding of asm and cpu registers, but I believe that would be the %rdi register?
Yes, rdi register, which is accesed from pt_regs using PT_REGS_PARM1()

With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
char msg[] = "hello world\n"; // for verification that the bpf program is running at all
bpf_trace_printk(msg, sizeof(msg));

struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

return 0;
}
char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o


Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performace Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?
So <linux/ptrace.h> in your case is taken from UAPI headers, not
kernel internal headers. They have different names for field. Drop
-D__KERNEL__ part and it should work.


Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?
kretprobes can only safely access return value, which you would use
PT_REGS_RC(ctx) to get. Input arguments are clobbered by the time
kretprobe fires, so using PT_REGS_PARM1(ctx) would return you
something, but most probably it won't be a correct value of first
input argument.

raw_tracepoints are similar to fentry/fexit in that each input
argument is 8-byte long. See progs/test_vmlinux.c in selftests/bpf for
an example of getting a syscall number on sys_entry. BPF_PROG is
useful macro for such use cases.


After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.