
Re: BPF Concurrency

Andrii Nakryiko
 

On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@...> wrote:

Hi,


I’ve been reading that hash map’s update element is atomic and also that we can use BPF_XADD to make the entire map update atomically.


But I think that doesn’t guarantee that these updates are thread safe, meaning one CPU core can overwrite another core’s update.


Is there a clean way of keeping them thread safe? Unfortunately I can’t use per-cpu maps as I need global counters.


And spin locks sound like a costly operation. Can you please throw some light?
Stating that spin locks are costly without empirical data seems
premature. What's the scenario? What's the number of CPUs? What's the
level of contention? Under light contention, spin locks in practice
would be almost as fast as atomic increments. Under heavy contention,
spin locks would probably be even better than atomics because they
will not waste as much CPU as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably
describe to get a better answer), you can either:
- do atomic increment/decrement if you need to update a counter (see
examples in kernel selftests using __sync_fetch_and_add);
- use map with bpf_spin_lock (there are also examples in selftests).
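
To make the first option more concrete, here is a minimal userspace sketch (not BPF program code) of the `__sync_fetch_and_add` builtin, next to the kind of compare-and-swap retry loop mentioned above. The names `hits`, `record_hit`, and `record_hit_cas` are hypothetical and exist only for illustration. Note that BPF_XADD itself may not make the previous value available to the program, so treat the return values here as a userspace convenience.

```c
#include <stdint.h>

/* Hypothetical shared counter; in a BPF program this would live in a map value. */
static uint64_t hits;

/* Single atomic read-modify-write: adds n and returns the value seen before the add. */
uint64_t record_hit(uint64_t n)
{
    return __sync_fetch_and_add(&hits, n);
}

/* The same update written as a compare-and-swap retry loop.
 * Under contention the loop body may run several times. */
uint64_t record_hit_cas(uint64_t n)
{
    uint64_t old, seen;

    do {
        old = *(volatile uint64_t *)&hits;
        seen = __sync_val_compare_and_swap(&hits, old, old + n);
    } while (seen != old); /* another CPU got in first: retry with the fresh value */

    return old;
}
```

Both functions leave the counter in the same state; the difference is that the first is one atomic operation while the second can spin when several CPUs update the same counter.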



Regards,

Kanthi


Re: BPF Concurrency

Yonghong Song
 

On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@...> wrote:

Hi,


I’ve been reading that hash map’s update element is atomic and also that we can use BPF_XADD to make the entire map update atomically.
BPF_XADD makes the update of a single map element inside the kernel atomic.

Could you file an issue with more details? This way, we will have a
better record.



But I think that doesn’t guarantee that these updates are thread safe, meaning one CPU core can overwrite another core’s update.
Not sure what you mean here. Yes, one CPU can update a map element and
another can then modify it.
What kind of primitive do you want? Compare-and-exchange?
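
For readers unsure what a compare-and-exchange primitive buys over a plain atomic add, here is a minimal userspace sketch using the GCC/Clang `__sync_val_compare_and_swap` builtin. It keeps a running maximum, which an add alone cannot express. The function name `update_max` is hypothetical, and whether a given kernel and compiler accept this inside a BPF program is not covered in this thread.

```c
#include <stdint.h>

/* Raise *slot to candidate if candidate is larger.
 * Returns 1 if this call stored the value, 0 if the slot was already >= candidate. */
int update_max(uint64_t *slot, uint64_t candidate)
{
    uint64_t cur = *(volatile uint64_t *)slot;

    while (candidate > cur) {
        uint64_t seen = __sync_val_compare_and_swap(slot, cur, candidate);

        if (seen == cur)
            return 1;   /* the slot still held cur, so our store went through */
        cur = seen;     /* someone else changed the slot; re-check against their value */
    }
    return 0;
}
```

The swap only succeeds if the slot still holds the value that was read, so a concurrent update is never silently overwritten; it is detected and the comparison is redone.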



Is there a clean way of keeping them thread safe? Unfortunately I can’t use per-cpu maps as I need global counters.


And spin locks sound like a costly operation. Can you please throw some light?


Regards,

Kanthi


Re: USDT probe to trace based on path to binary

Yonghong Song
 

On Fri, May 22, 2020 at 7:14 PM Vallish Guru.V. <vallishguru@...> wrote:

Hello,


I am trying to introduce USDT probe to an application and my bcc script is failing with following error:


<snip>

Traceback (most recent call last):

File "test.py", line 40, in <module>

b = BPF(text=prog, usdt_contexts=[u])

File "/usr/lib/python3.6/dist-packages/bcc/__init__.py", line 318, in __init__

"locations")

Exception: can't generate USDT probe arguments; possible cause is missing pid when a probe in a shared object has multiple locations

<end snip>


I am trying to run bcc script based on path to binary and not pid. The closest discussion I found related to this error is:

https://github.com/iovisor/bcc/issues/1774

Application that I am trying to instrument is not a shared object as discussed in issue #1774. The code that I have instrumented is in the application. Since there are more than
Could you file a separate issue so it is easy for people to help?

Do you have a minimal reproducible test case? That would make this easy
to debug. Tracing an application by path, without a pid, should work, so
a test case will help us investigate.

one instance of the application, tracing through pid becomes messy. Is
there something obvious that I have missed in my script? I would
appreciate any pointers to unblock me.


Thanks.

-Vallish


USDT probe to trace based on path to binary

Vallish Guru.V.
 

Hello,


I am trying to introduce a USDT probe to an application and my bcc script is failing with the following error:


<snip>

Traceback (most recent call last):

  File "test.py", line 40, in <module>

    b = BPF(text=prog, usdt_contexts=[u])

  File "/usr/lib/python3.6/dist-packages/bcc/__init__.py", line 318, in __init__

    "locations")

Exception: can't generate USDT probe arguments; possible cause is missing pid when a probe in a shared object has multiple locations

<end snip>


I am trying to run bcc script based on path to binary and not pid.  The closest discussion I found related to this error is:

https://github.com/iovisor/bcc/issues/1774

The application that I am trying to instrument is not a shared object as discussed in issue #1774. The code that I have instrumented is in the application. Since there is more than one instance of the application, tracing through pid becomes messy. Is there something obvious that I have missed in my script? I would appreciate any pointers to unblock me.


Thanks.

-Vallish


BPF Concurrency

Kanthi P
 

Hi,


I’ve been reading that hash map’s update element is atomic and also that we can use BPF_XADD to make the entire map update atomically.


But I think that doesn’t guarantee that these updates are thread safe, meaning one CPU core can overwrite another core’s update.


Is there a clean way of keeping them thread safe? Unfortunately I can’t use per-cpu maps as I need global counters.


And spin locks sound like a costly operation. Can you please throw some light?


Regards,

Kanthi


Re: Building BPF programs and kernel persistence

Andrii Nakryiko
 

On Mon, May 18, 2020 at 9:23 AM Tristan Mayfield
<mayfieldtristan@...> wrote:

Thanks for the reply Andrii.

Managed to get a build working outside of the kernel tree for BPF programs.
The two major things that I learned were that first, the order in which files are
included in the build command is more important than I previously thought.
The second thing was learning how clang deals with asm differently than gcc.
I had to use samples/bpf/asm_goto_workaround.h to fix those errors.
The meat of the makefile is as follows:

CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include
INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm
BTW, everyone seems to be using -O2 for compiling BPF programs. Not
sure how well-supported -O3 will be.

[...]


I suspect that these warnings come from my aggressive warning flags during
compilation rather than from actual issues in the kernel.

Right, pinning map or program doesn't ensure that program is still
attached to whatever BPF hook you attached to it. As you mentioned,
XDP, tc, cgroup-bpf programs are persistent. We are actually moving
towards the model of auto-detachment for those as well. See recent
activity around bpf_link. The solution with bpf_link to make such
attachments persistent is through pinning **link** itself, not program
or map. bpf_link is relatively recent addition, so on older kernels
you'd have to make sure you still have some process around that would
keep BPF attachment FD around.

I have been looking at the commits surrounding the pinning of bpf_link. It looks like it's only
working in kernel 5.7? I did actually go through and attempt to attach links for kprobes,
tracepoints, and raw_tracepoints in kernel 5.4 but, as you suggested, it seems unsupported.
I have yet to try on kernel 5.5-5.7 so I'll take a look this week or next.

As I mentioned before, with basic functionality in place here, I'm interested in working on
some sort of BPF tutorial similar to the XDP tutorial (https://github.com/xdp-project/xdp-tutorial)
with perhaps a more in-depth look at the technology included as well.

I'm still fuzzy on the relationship between bpf(2) and perf(1). Would it be correct to say that for
tracepoints, kprobes, and uprobes BPF leverages perf "under the hood" while for XDP and tc,
this is more like classic BPF in that its implementation doesn't involve perf?
"Classic BPF" is an entirely different thing; don't use that term in this
context, it will just confuse people.

perf is used as a means to trigger BPF program execution for
tracepoint and kprobes. It is, essentially, a BPF hook provider, if
you will. For XDP, BPF hook is provided by networking layer and
drivers. For cgroup BPF programs, hooks are "provided", in a sense, by
cgroup subsystem. So perf is just one of many ways to specify where
and when BPF program is going to be executed, and with what context.

If that's the case then is the bpf_link object the tool to bridge BPF and perf? I noticed that when
checking for pinned BPF programs with bpftool in kernel 5.4 that unless a kprobe, tracepoint,
or uprobe is listed in "bpftool perf list", the program doesn't seem to be running. Is the use of
perf to load BPF programs potentially a way to make them "headless" instead of pinning the bpf_link objects?
No, bpf_link is a way to marry a BPF hook with a BPF program. It's not
specific to perf or XDP, or whatever. Right now, perf-based
BPF hooks (kprobe, tracepoint) do not create a bpf_link under the
covers, so you won't be able to pin them.



Regardless, I'm excited to have a more reliable build system than I have in the past. I think I'll start looking more into CO-RE and libbpf on kernels 5.5-5.7.
Awesome, have fun!

Hope everyone is staying healthy out there,
Tristan


Re: Building BPF programs and kernel persistence

Tristan Mayfield
 

Thanks for the reply Andrii.

Managed to get a build working outside of the kernel tree for BPF programs.
The two major things that I learned were that first, the order in which files are
included in the build command is more important than I previously thought.
The second thing was learning how clang deals with asm differently than gcc.
I had to use samples/bpf/asm_goto_workaround.h to fix those errors.
The meat of the makefile is as follows:

CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include
INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm

linuxhdrs := /usr/src/linux-headers-$(shell uname -r)
LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
                                -include asm_workaround.h \
                                -I$(linuxhdrs)/arch/x86/include/ \
                                -I$(linuxhdrs)/arch/x86/include/uapi \
                                -I$(linuxhdrs)/arch/x86/include/generated \
                                -I$(linuxhdrs)/arch/x86/include/generated/uapi \
                                -I$(linuxhdrs)/include \
                                -I$(linuxhdrs)/include/uapi \
                                -I$(linuxhdrs)/include/generated/uapi \

COMPILERFLAGS := -D__KERNEL__ -D__ASM_SYSREG_H \
                                -D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \

# Builds all the targets from corresponding .c files
$(BPFOBJDIR)/%.o:$(BPFSRCDIR)/%.c
        $(CC) $(INC_FLAGS) $(COMPILERFLAGS) \
                $(LINUXINCLUDE) $(LIBBPF_HDRS) \
                $(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@

I wanted to include that sample for whatever soul in the future wants to tread the
same path with similar systems experience levels.
I still get about 100+ warnings when building that are the same as or similar to:

/usr/src/linux-headers-5.4.0-26-generic/arch/x86/include/asm/atomic.h:194:9: warning: unused variable '__ptr' [-Wunused-variable]
        return arch_cmpxchg(&v->counter, old, new);
                  ^

/usr/src/linux-headers-5.4.0-26-generic/arch/x86/include/asm/msr.h:100:26: warning: variable 'low' is uninitialized when used here [-Wuninitialized]
        return EAX_EDX_VAL(val, low, high);
                                                    ^~~

I suspect that these warnings come from my aggressive warning flags during
compilation rather than from actual issues in the kernel.

Right, pinning map or program doesn't ensure that program is still
attached to whatever BPF hook you attached to it. As you mentioned,
XDP, tc, cgroup-bpf programs are persistent. We are actually moving
towards the model of auto-detachment for those as well. See recent
activity around bpf_link. The solution with bpf_link to make such
attachments persistent is through pinning **link** itself, not program
or map. bpf_link is relatively recent addition, so on older kernels
you'd have to make sure you still have some process around that would
keep BPF attachment FD around.

I have been looking at the commits surrounding the pinning of bpf_link. It looks like it's only
working in kernel 5.7? I did actually go through and attempt to attach links for kprobes,
tracepoints, and raw_tracepoints in kernel 5.4 but, as you suggested, it seems unsupported.
I have yet to try on kernel 5.5-5.7 so I'll take a look this week or next.

As I mentioned before, with basic functionality in place here, I'm interested in working on
some sort of BPF tutorial similar to the XDP tutorial (https://github.com/xdp-project/xdp-tutorial)
with perhaps a more in-depth look at the technology included as well.

I'm still fuzzy on the relationship between bpf(2) and perf(1). Would it be correct to say that for
tracepoints, kprobes, and uprobes BPF leverages perf "under the hood" while for XDP and tc,
this is more like classic BPF in that its implementation doesn't involve perf?
If that's the case then is the bpf_link object the tool to bridge BPF and perf? I noticed that when
checking for pinned BPF programs with bpftool in kernel 5.4 that unless a kprobe, tracepoint,
or uprobe is listed in "bpftool perf list", the program doesn't seem to be running. Is the use of
perf to load BPF programs potentially a way to make them "headless" instead of pinning the bpf_link objects?

Regardless, I'm excited to have a more reliable build system than I have in the past. I think I'll start looking more into CO-RE and libbpf on kernels 5.5-5.7.

Hope everyone is staying healthy out there,
Tristan

On Thu, May 14, 2020 at 5:51 PM Andrii Nakryiko <andrii.nakryiko@...> wrote:
On Mon, May 11, 2020 at 10:06 AM <mayfieldtristan@...> wrote:
>
>
> Hi all, hope everyone is staying healthy out there.

Hi! For the future, I think cc'ing bpf@... would be a good
idea, there are a lot of folks who are probably not watching iovisor
mailing list, but could help with issues like this.

>
> I've been working on building BPF programs, and have run into a few issues that I think might be clang (vs gcc) based.
> It seems that either clang isn't the most friendly of compilers when it comes to building Linux-native programs, or my lack of experience makes it seem so.
> I've been trying to build the simple BPF program below:
>
>
> #include "bpf_helpers.h"
> #include <linux/bpf.h>
> #include <linux/version.h>
> #include <linux/types.h>
> #include <linux/tcp.h>
> #include <net/sock.h>
>
> struct inet_sock_set_state_args {
>         long long pad;
> const void * skaddr;
> int oldstate;
> int newstate;
> u16 sport;
>   u16 dport;
> u16 family;
> u8 protocol;
> u8 saddr[4];
> u8 daddr[4];
> u8 saddr_v6[16];
> u8 daddr_v6[16];
> };
>
>
> SEC("tracepoint/sock/inet_sock_set_state")
> int bpf_prog(struct inet_sock_set_state_args *args) {
>
>   struct sock *sk = (struct sock *)args->skaddr;
>   short lport = args->sport;
>
>   char msg[] = "lport: %d\n";
>   bpf_trace_printk(msg, sizeof(msg), lport);
>
>   return 0;
> }
>
> char _license[] SEC("license") = "GPL";
>
>
>
> I've been looking through selftests/bpf/, samples/bpf/, and examples on various blogs and articles.
> From this, I've come up with the following makefile:
>
>
> ## Build tools
> LLC := llc
> CC := clang
> HOSTCC := clang
> CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include
>
> ## Some useful flags
> INC_FLAGS := -nostdinc -isystem $(CLANGINC)
> EXTRA_FLAGS := -O3 -emit-llvm
>
> ## Includes
> linuxhdrs := /usr/src/linux-headers-$(shell uname -r)
> LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
> -include /usr/include/linux/bpf.h \
> -I$(linuxhdrs)/arch/x86/include/ \
> -I$(linuxhdrs)/arch/x86/include/uapi \
> -I$(linuxhdrs)/arch/x86/include/generated \
> -I$(linuxhdrs)/arch/x86/include/generated/uapi \
> -I$(linuxhdrs)/include \
> -I$(linuxhdrs)/include/uapi \
> -I$(linuxhdrs)/include/generated/uapi \
> LIBBPF :=  -I/home/vagrant/libbpf/src/
> OBJS := tcptest.bpf.o
>
> $(OBJS): %.o:%.c
> $(CC) $(INC_FLAGS) \
> -target bpf -D__KERNEL__ -D __ASM_SYSREG_H \
> -D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \
> -Wno-unused-value -Wno-pointer-sign \
> -Wno-compare-distinct-pointer-types \
> -Wno-gnu-variable-sized-type-not-at-end \
> -Wno-address-of-packed-member \
> -Wno-tautological-compare \
> -Wno-unknown-warning-option \
> -Wall -v \
> $(LINUXINCLUDE) $(LIBBPF) \
> $(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@
>
>
> Unfortunately, I keep running into what seems to be asm errors. I've tried reorganizing the list of include statements, taking out "-target bpf", not including some files, including other files, etc etc.
> This stackoverflow post suggests that it's a kconfig.h error, but I seem to be including the file just fine (https://stackoverflow.com/questions/56975861/error-compiling-ebpf-c-code-out-of-kernel-tree/56990939#56990939).
> I'm not really sure where to go from here with building BPF programs and including files that have the kernel datatypes. Maybe I'm missing something that's obvious that I'm just ignorant of?

I'd start with actually specifying what compilation errors you run
into. Also check out
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile to
see how BPF programs can be compiled properly outside of kernel tree.
Though that one pretty much assumes vmlinux.h, which simplifies a
bunch of compilation issues, probably.

>
>
>
> As additional information, and regarding kernel persistence, I am working on a monitoring project that uses BPF programs to continuously monitor the system without the bulky dependencies that BCC includes. I'm concurrently working on a BTF/CO-RE solution but I'm emphasizing a non-CO-RE approach at the moment. I can load and run BPF programs but upon termination of my userspace loader the BPF programs themselves also terminate.
>
>
>
> I would like to have the BPF program persist in the kernel even after the user space loader has completed its execution. I read in various documentation and in a 2015 LWN article that persistent BPF programs can be created by pinning programs and maps to the BPF vfs so as to keep the fds open. I have attempted pinning the entire BPF object, various programs and various maps, and no matter what I've tried the kernel BPF program terminates when the userspace process terminates. Using bpftool I have verified that the BPF files are pinned to the location and that BPF programs themselves all work. I know that persistent BPF programs are a part of projects like XDP and tc. Is there a way to do this for a generic BPF loader without having to implement customized kernel functions?  Below I have included a simplified version of my code. In which I outline the basic steps I take to load the compiled bpf programs and attempt to make persistent instances of them.

Right, pinning map or program doesn't ensure that program is still
attached to whatever BPF hook you attached to it. As you mentioned,
XDP, tc, cgroup-bpf programs are persistent. We are actually moving
towards the model of auto-detachment for those as well. See recent
activity around bpf_link. The solution with bpf_link to make such
attachments persistent is through pinning **link** itself, not program
or map. bpf_link is relatively recent addition, so on older kernels
you'd have to make sure you still have some process around that would
keep BPF attachment FD around.


>
>
>
> #include <stdio.h>
>
> #include <stdlib.h>
>
> #include <string.h>
>
> #include <errno.h>
>
> #include <getopt.h>
>
> #include <dirent.h>
>
> #include <sys/stat.h>
>
> #include <unistd.h>
>
> #include <assert.h>
>
> #include <linux/version.h>
>
>
>
> #include "libbpf.h"
>
> #include "bpf.h"
>
> #include "loader_helpers.h"
>
>
>
> #include <stdbool.h>
>
> #include <fcntl.h>
>
> #include <poll.h>
>
> #include <linux/perf_event.h>
>
> #include <assert.h>
>
> #include <sys/syscall.h>
>
> #include <sys/ioctl.h>
>
> #include <sys/mman.h>
>
> #include <time.h>
>
> #include <signal.h>
>
> #include <linux/ptrace.h>
>
>
>
> int main(int argc, char **argv) {
>
>
>
>     struct bpf_object *bpf_obj;
>
>     struct bpf_program *bpf_prog;
>
>     struct bpf_map *map;
>
>     char * license = "GPL";
>
>     __u32 kernelvers = LINUX_VERSION_CODE;
>
>     struct bpf_link * link;
>
>     int err;
>
>     int prog_fd;
>
>
>
>     bpf_obj = bpf_object__open("test_file.bpf.o");
>
>
>
>     bpf_prog = bpf_program__next(NULL, bpf_obj);
>
>
>
>     err = bpf_program__set_tracepoint(bpf_prog);
>
>     if(err) {
>
>         fprintf(stderr, "ERR couldn't setup program type\n");
>
>         return -1;
>
>     }
>
>     err = bpf_program__load(bpf_prog, license, kernelvers);
>
>     if(err) {
>
>         fprintf(stderr, "ERR couldn't setup program phase\n");
>
>         return -1;
>
>     }
>
>     prog_fd = bpf_program__fd(bpf_prog);
>
>
>
>     link = bpf_program__attach_tracepoint(bpf_prog, "syscalls", "sys_enter_openat");
>
>     if(!link) {
>
>         fprintf(stderr, "ERROR ATTACHING TRACEPOINT\n");
>
>         return -1;
>
>     }
>
>
>
>     assert(bpf_program__is_tracepoint(bpf_prog));
>
>
>
> pin:
>
>     err = bpf_program__pin(bpf_prog, "/sys/fs/bpf/tpprogram");
>
>     if(err) {
>
>         if(err == -17) {
>
>             printf("Program exists...trying to unpin and retry!\n");
>
>             err = bpf_program__unpin(bpf_prog, "/sys/fs/bpf/tpprogram");
>
>             if(!err) {
>
>                 goto pin;
>
>            }
>
>             printf("The pining already exists but it couldn't be removed...\n");
>
>             return -1;
>
>         }
>
>         printf("We couldn't pin...%d\n", err);
>
>         return -1;
>
>     }
>
>
>
>     printf("Program pinned and working...\n");
>
>
>
>     return 0;
>
> }
>
>
>
>
> Thanks for having a look and I hope these issues can be cleared up. Seems like building is the last major hurdle I have to get rolling with better engineering solutions than manually including structs in my files.
> Hope everyone stays well!


Hope the above helped. Please cc bpf@... (and ideally send
plain-text emails; kernel mailing lists don't accept HTML emails).

>


Re: eBPF map - Control and Data plane concurrency #bcc

Andrii Nakryiko
 

On Tue, May 12, 2020 at 2:19 AM <simonemagnani.96@...> wrote:

Thanks for the suggestion, now I feel more confident about this solution.

However, I still have problems with the map-in-map type: is it possible to use a map which has as key the TCP session 4-tuple {srcIp, dstIp, srcPort, dstPort} and as value a BPF_ARRAY which is a list of some packets' headers belonging to that session?
As far as I understood, a BPF_HASH_OF_MAPS key is coded as an integer, and the value retrieved with lookup is the inner table's file descriptor. But how do you initialize those inner arrays? I've tried to insert something, but what should I put as value? The inner map's file descriptor (how do I know it)?
No, HASH_OF_MAPS allows arbitrary-sized keys, just like a normal
HASHMAP. Libbpf recently gained support for nicer map-in-map
declaration and initialization; you might want to check it out: [0].

[0] https://patchwork.ozlabs.org/project/netdev/patch/20200428064140.122796-4-andriin@fb.com/
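
Since the key can be an arbitrary fixed-size struct, the session 4-tuple from the question can be used as the key directly. The following is a minimal plain-C sketch of such a key; the struct name `session_key`, its field names, and the helper `make_session_key` are hypothetical and are not taken from bcc or libbpf. It assumes the map compares keys as raw bytes, which is why the helper zeroes the struct before filling it in.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key for a TCP session: 4 + 4 + 2 + 2 bytes, no padding. */
struct session_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Build a key with every byte defined, so two keys for the same
 * session are byte-for-byte identical. */
struct session_key make_session_key(uint32_t src_ip, uint32_t dst_ip,
                                    uint16_t src_port, uint16_t dst_port)
{
    struct session_key k;

    memset(&k, 0, sizeof(k)); /* matters if fields are later added that introduce padding */
    k.src_ip = src_ip;
    k.dst_ip = dst_ip;
    k.src_port = src_port;
    k.dst_port = dst_port;
    return k;
}
```

The same struct definition would need to be shared between the BPF program and the control-plane code so that both sides produce identical key bytes.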



Re: Building BPF programs and kernel persistence

Andrii Nakryiko
 

On Mon, May 11, 2020 at 10:06 AM <mayfieldtristan@...> wrote:


Hi all, hope everyone is staying healthy out there.
Hi! For the future, I think cc'ing bpf@... would be a good
idea, there are a lot of folks who are probably not watching iovisor
mailing list, but could help with issues like this.


I've been working on building BPF programs, and have run into a few issues that I think might be clang (vs gcc) based.
It seems that either clang isn't the most friendly of compilers when it comes to building Linux-native programs, or my lack of experience makes it seem so.
I've been trying to build the simple BPF program below:


#include "bpf_helpers.h"
#include <linux/bpf.h>
#include <linux/version.h>
#include <linux/types.h>
#include <linux/tcp.h>
#include <net/sock.h>

struct inet_sock_set_state_args {
long long pad;
const void * skaddr;
int oldstate;
int newstate;
u16 sport;
u16 dport;
u16 family;
u8 protocol;
u8 saddr[4];
u8 daddr[4];
u8 saddr_v6[16];
u8 daddr_v6[16];
};


SEC("tracepoint/sock/inet_sock_set_state")
int bpf_prog(struct inet_sock_set_state_args *args) {

struct sock *sk = (struct sock *)args->skaddr;
short lport = args->sport;

char msg[] = "lport: %d\n";
bpf_trace_printk(msg, sizeof(msg), lport);

return 0;
}

char _license[] SEC("license") = "GPL";



I've been looking through selftests/bpf/, samples/bpf/, and examples on various blogs and articles.
From this, I've come up with the following makefile:


## Build tools
LLC := llc
CC := clang
HOSTCC := clang
CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include

## Some useful flags
INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm

## Includes
linuxhdrs := /usr/src/linux-headers-$(shell uname -r)
LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
-include /usr/include/linux/bpf.h \
-I$(linuxhdrs)/arch/x86/include/ \
-I$(linuxhdrs)/arch/x86/include/uapi \
-I$(linuxhdrs)/arch/x86/include/generated \
-I$(linuxhdrs)/arch/x86/include/generated/uapi \
-I$(linuxhdrs)/include \
-I$(linuxhdrs)/include/uapi \
-I$(linuxhdrs)/include/generated/uapi \
LIBBPF := -I/home/vagrant/libbpf/src/
OBJS := tcptest.bpf.o

$(OBJS): %.o:%.c
$(CC) $(INC_FLAGS) \
-target bpf -D__KERNEL__ -D __ASM_SYSREG_H \
-D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \
-Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member \
-Wno-tautological-compare \
-Wno-unknown-warning-option \
-Wall -v \
$(LINUXINCLUDE) $(LIBBPF) \
$(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@


Unfortunately, I keep running into what seems to be asm errors. I've tried reorganizing the list of include statements, taking out "-target bpf", not including some files, including other files, etc etc.
This stackoverflow post suggests that it's a kconfig.h error, but I seem to be including the file just fine (https://stackoverflow.com/questions/56975861/error-compiling-ebpf-c-code-out-of-kernel-tree/56990939#56990939).
I'm not really sure where to go from here with building BPF programs and including files that have the kernel datatypes. Maybe I'm missing something that's obvious that I'm just ignorant of?
I'd start with actually specifying what compilation errors you run
into. Also check out
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile to
see how BPF programs can be compiled properly outside of kernel tree.
Though that one pretty much assumes vmlinux.h, which simplifies a
bunch of compilation issues, probably.




As additional information, and regarding kernel persistence, I am working on a monitoring project that uses BPF programs to continuously monitor the system without the bulky dependencies that BCC includes. I'm concurrently working on a BTF/CO-RE solution but I'm emphasizing a non-CO-RE approach at the moment. I can load and run BPF programs but upon termination of my userspace loader the BPF programs themselves also terminate.



I would like to have the BPF program persist in the kernel even after the user space loader has completed its execution. I read in various documentation and in a 2015 LWN article that persistent BPF programs can be created by pinning programs and maps to the BPF vfs so as to keep the fds open. I have attempted pinning the entire BPF object, various programs and various maps, and no matter what I've tried the kernel BPF program terminates when the userspace process terminates. Using bpftool I have verified that the BPF files are pinned to the location and that BPF programs themselves all work. I know that persistent BPF programs are a part of projects like XDP and tc. Is there a way to do this for a generic BPF loader without having to implement customized kernel functions? Below I have included a simplified version of my code. In which I outline the basic steps I take to load the compiled bpf programs and attempt to make persistent instances of them.
Right, pinning map or program doesn't ensure that program is still
attached to whatever BPF hook you attached to it. As you mentioned,
XDP, tc, cgroup-bpf programs are persistent. We are actually moving
towards the model of auto-detachment for those as well. See recent
activity around bpf_link. The solution with bpf_link to make such
attachments persistent is through pinning **link** itself, not program
or map. bpf_link is relatively recent addition, so on older kernels
you'd have to make sure you still have some process around that would
keep BPF attachment FD around.





#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <errno.h>

#include <getopt.h>

#include <dirent.h>

#include <sys/stat.h>

#include <unistd.h>

#include <assert.h>

#include <linux/version.h>



#include "libbpf.h"

#include "bpf.h"

#include "loader_helpers.h"



#include <stdbool.h>

#include <fcntl.h>

#include <poll.h>

#include <linux/perf_event.h>

#include <assert.h>

#include <sys/syscall.h>

#include <sys/ioctl.h>

#include <sys/mman.h>

#include <time.h>

#include <signal.h>

#include <linux/ptrace.h>



int main(int argc, char **argv) {



struct bpf_object *bpf_obj;

struct bpf_program *bpf_prog;

struct bpf_map *map;

char * license = "GPL";

__u32 kernelvers = LINUX_VERSION_CODE;

struct bpf_link * link;

int err;

int prog_fd;



bpf_obj = bpf_object__open("test_file.bpf.o");



bpf_prog = bpf_program__next(NULL, bpf_obj);



err = bpf_program__set_tracepoint(bpf_prog);

if(err) {

fprintf(stderr, "ERR couldn't setup program type\n");

return -1;

}

err = bpf_program__load(bpf_prog, license, kernelvers);

if(err) {

fprintf(stderr, "ERR couldn't setup program phase\n");

return -1;

}

prog_fd = bpf_program__fd(bpf_prog);



link = bpf_program__attach_tracepoint(bpf_prog, "syscalls", "sys_enter_openat");

if(!link) {

fprintf(stderr, "ERROR ATTACHING TRACEPOINT\n");

return -1;

}



assert(bpf_program__is_tracepoint(bpf_prog));



pin:

err = bpf_program__pin(bpf_prog, "/sys/fs/bpf/tpprogram");

if(err) {

if(err == -17) {

printf("Program exists...trying to unpin and retry!\n");

err = bpf_program__unpin(bpf_prog, "/sys/fs/bpf/tpprogram");

if(!err) {

goto pin;

}

printf("The pinning already exists but it couldn't be removed...\n");

return -1;

}

printf("We couldn't pin...%d\n", err);

return -1;

}



printf("Program pinned and working...\n");



return 0;

}




Thanks for having a look and I hope these issues can be cleared up. Seems like building is the last major hurdle I have to get rolling with better engineering solutions than manually including structs in my files.
Hope everyone stays well!

Hope the above helped. Please cc bpf@... (and ideally send
plain-text emails; kernel mailing lists don't accept HTML emails).



Re: eBPF map - Control and Data plane concurrency #bcc

Simone Magnani
 

Thanks for the suggestion, now I feel more confident about this solution.

However, I still have problems with the map-in-map type: is it possible to use a map which has as key the TCP session 4-tuple {srcIp, dstIp, srcPort, dstPort} and as value a BPF_ARRAY which is a list of some packets' headers belonging to that session?
As far as I understood, a BPF_HASH_OF_MAPS key is coded as an integer, and the value retrieved with lookup is the inner table's file descriptor. But how do you initialize those inner arrays? I've tried to insert something, but what should I put as value? The inner map's file descriptor (how do I know it)?


Building BPF programs and kernel persistence

Tristan Mayfield
 


Hi all, hope everyone is staying healthy out there.

I've been working on building BPF programs, and have run into a few issues that I think might be clang (vs gcc) based.
It seems that either clang isn't the most friendly of compilers when it comes to building Linux-native programs, or my lack of experience makes it seem so.
I've been trying to build the simple BPF program below:
 
#include "bpf_helpers.h"
#include <linux/bpf.h>
#include <linux/version.h>
#include <linux/types.h>
#include <linux/tcp.h>
#include <net/sock.h>

struct inet_sock_set_state_args {
        long long pad;
	const void * skaddr;
	int oldstate;
	int newstate;
	u16 sport;
 	u16 dport;
	u16 family;
	u8 protocol;
	u8 saddr[4];
	u8 daddr[4];
	u8 saddr_v6[16];
	u8 daddr_v6[16];
};


SEC("tracepoint/sock/inet_sock_set_state")
int bpf_prog(struct inet_sock_set_state_args *args) {

  struct sock *sk = (struct sock *)args->skaddr;
  short lport = args->sport;

  char msg[] = "lport: %d\n";
  bpf_trace_printk(msg, sizeof(msg), lport);

  return 0;
}

char _license[] SEC("license") = "GPL";
 
 
I've been looking through selftests/bpf/, samples/bpf/, and examples on various blogs and articles.
From this, I've come up with the following makefile:
 
## Build tools
LLC := llc
CC := clang
HOSTCC := clang
CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include

## Some useful flags
INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm

## Includes
linuxhdrs := /usr/src/linux-headers-$(shell uname -r)
LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
				-include /usr/include/linux/bpf.h \
				-I$(linuxhdrs)/arch/x86/include/ \
				-I$(linuxhdrs)/arch/x86/include/uapi \
				-I$(linuxhdrs)/arch/x86/include/generated \
				-I$(linuxhdrs)/arch/x86/include/generated/uapi \
				-I$(linuxhdrs)/include \
				-I$(linuxhdrs)/include/uapi \
				-I$(linuxhdrs)/include/generated/uapi
LIBBPF :=  -I/home/vagrant/libbpf/src/
OBJS := tcptest.bpf.o

$(OBJS): %.o:%.c
	$(CC) $(INC_FLAGS) \
		-target bpf -D__KERNEL__ -D __ASM_SYSREG_H \
		-D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \
		-Wno-unused-value -Wno-pointer-sign \
		-Wno-compare-distinct-pointer-types \
		-Wno-gnu-variable-sized-type-not-at-end \
		-Wno-address-of-packed-member \
		-Wno-tautological-compare \
		-Wno-unknown-warning-option \
		-Wall -v \
		$(LINUXINCLUDE) $(LIBBPF) \
		$(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@
 
Unfortunately, I keep running into what seems to be asm errors. I've tried reorganizing the list of include statements, taking out "-target bpf", not including some files, including other files, etc etc.
This stackoverflow post suggests that it's a kconfig.h error, but I seem to be including the file just fine (https://stackoverflow.com/questions/56975861/error-compiling-ebpf-c-code-out-of-kernel-tree/56990939#56990939).
I'm not really sure where to go from here with building BPF programs and including files that have the kernel datatypes. Maybe I'm missing something that's obvious that I'm just ignorant of?
 
 

As additional information, and regarding kernel persistence, I am working on a monitoring project that uses BPF programs to continuously monitor the system without the bulky dependencies that BCC includes. I'm concurrently working on a BTF/CO-RE solution but I'm emphasizing a non-CO-RE approach at the moment. I can load and run BPF programs but upon termination of my userspace loader the BPF programs themselves also terminate.

 

I would like to have the BPF program persist in the kernel even after the user space loader has completed its execution. I read in various documentation and in a 2015 LWN article that persistent BPF programs can be created by pinning programs and maps to the BPF vfs so as to keep the fds open. I have attempted pinning the entire BPF object, various programs and various maps, and no matter what I've tried the kernel BPF program terminates when the userspace process terminates. Using bpftool I have verified that the BPF files are pinned to the location and that the BPF programs themselves all work. I know that persistent BPF programs are a part of projects like XDP and tc. Is there a way to do this for a generic BPF loader without having to implement customized kernel functions? Below I have included a simplified version of my code, in which I outline the basic steps I take to load the compiled BPF programs and attempt to make persistent instances of them.

 

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <getopt.h>
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>
#include <assert.h>
#include <linux/version.h>

#include "libbpf.h"
#include "bpf.h"
#include "loader_helpers.h"

#include <stdbool.h>
#include <fcntl.h>
#include <poll.h>
#include <linux/perf_event.h>
#include <assert.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>
#include <signal.h>
#include <linux/ptrace.h>

int main(int argc, char **argv) {

    struct bpf_object *bpf_obj;
    struct bpf_program *bpf_prog;
    struct bpf_map *map;
    char * license = "GPL";
    __u32 kernelvers = LINUX_VERSION_CODE;
    struct bpf_link * link;
    int err;
    int prog_fd;

    bpf_obj = bpf_object__open("test_file.bpf.o");

    bpf_prog = bpf_program__next(NULL, bpf_obj);

    err = bpf_program__set_tracepoint(bpf_prog);
    if(err) {
        fprintf(stderr, "ERR couldn't setup program type\n");
        return -1;
    }
    err = bpf_program__load(bpf_prog, license, kernelvers);
    if(err) {
        fprintf(stderr, "ERR couldn't setup program phase\n");
        return -1;
    }
    prog_fd = bpf_program__fd(bpf_prog);

    link = bpf_program__attach_tracepoint(bpf_prog, "syscalls", "sys_enter_openat");
    if(!link) {
        fprintf(stderr, "ERROR ATTACHING TRACEPOINT\n");
        return -1;
    }

    assert(bpf_program__is_tracepoint(bpf_prog));

pin:
    err = bpf_program__pin(bpf_prog, "/sys/fs/bpf/tpprogram");
    if(err) {
        if(err == -17) {
            printf("Program exists...trying to unpin and retry!\n");
            err = bpf_program__unpin(bpf_prog, "/sys/fs/bpf/tpprogram");
            if(!err) {
                goto pin;
            }
            printf("The pining already exists but it couldn't be removed...\n");
            return -1;
        }
        printf("We couldn't pin...%d\n", err);
        return -1;
    }

    printf("Program pinned and working...\n");

    return 0;
}




Thanks for having a look and I hope these issues can be cleared up. Seems like building is the last major hurdle I have to get rolling with better engineering solutions than manually including structs in my files.
Hope everyone stays well!
 


Re: eBPF map - Control and Data plane concurrency #bcc

Yonghong Song
 

Your approach seems okay. You can use two maps or use map-in-map.
Using batch operations from user space should speed up the
deletion.

On Sat, May 9, 2020 at 5:31 AM <simonemagnani.96@...> wrote:

Hi everybody,

I am writing this email to ask for an opinion about how to address the following problem.

Lately, I have been trying to develop an eBPF program that extracts some per-flow values and stores them into an eBPF HASH_MAP, which is then read by the user-space that extracts the stored information.
When I start reading the map from user-space, all the entries should be deleted at the same time, and the data plane should keep storing the incoming data.
The solution that I have found is to use two maps (and programs) that are continuously swapped when the user-space read is triggered. In this way, when we read the 'old' map, the 'new' map keeps storing the new data.

At the same time, to speed up the lookup and delete operation, I could use the recently added "bpf_map_lookup_and_delete_batch" function to read and clear the map.

Do you think this could be an optimal solution, or are there other more efficient methods?

Thanks in advance for all the suggestions

Best Regards,
Simone
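
For readers following the thread, the two-map swap discussed above can be sketched in plain user-space C. This is only an illustration of the double-buffering idea and not BPF code: flow_table, record_packet and swap_and_drain are hypothetical names, the fixed-size arrays stand in for the two hash maps, and in a real setup the drain step would presumably be the batch lookup-and-delete call mentioned in the message, run against the inactive map.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_FLOWS 8

/* Stand-in for one hash map: a per-flow packet counter indexed by flow id. */
struct flow_table {
    unsigned long packets[MAX_FLOWS];
};

static struct flow_table tables[2];  /* the "old" and the "new" map */
static atomic_int active_idx;        /* which table the data plane writes to */

/* Data plane: always writes into the currently active table. */
static void record_packet(size_t flow_id)
{
    int idx = atomic_load(&active_idx);
    tables[idx].packets[flow_id % MAX_FLOWS]++;
}

/* Control plane: flip the active table, then read and clear the old one. */
static void swap_and_drain(struct flow_table *out)
{
    assert(out != NULL);
    int old = atomic_fetch_xor(&active_idx, 1);  /* returns the previous index */
    *out = tables[old];
    memset(&tables[old], 0, sizeof(tables[old]));
}
```

One thing the sketch glosses over: a writer that loaded the index just before the flip may still touch the old table while it is being drained, so a real implementation has to tolerate (or account for) a few late updates.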


eBPF map - Control and Data plane concurrency #bcc

Simone Magnani
 

Hi everybody,

I am writing this email to ask for an opinion about how to address the following problem.

Lately, I have been trying to develop an eBPF program that extracts some per-flow values and stores them into an eBPF HASH_MAP, which is then read by the user-space that extracts the stored information.
When I start reading the map from user-space, all the entries should be deleted at the same time, and the data plane should keep storing the incoming data.
The solution that I have found is to use two maps (and programs) that are continuously swapped when the user-space read is triggered. In this way, when we read the 'old' map, the 'new' map keeps storing the new data.

At the same time, to speed up the lookup and delete operation, I could use the recently added "bpf_map_lookup_and_delete_batch" function to read and clear the map.

Do you think this could be an optimal solution, or are there other more efficient methods?

Thanks in advance for all the suggestions

Best Regards,
Simone


Re: #bcc - skb_network_header crashes in a BPF Kernel trace function #bcc

Yonghong Song
 

On Wed, May 6, 2020 at 11:00 PM Yonghong Song via lists.iovisor.org
<ys114321=gmail.com@...> wrote:

On Wed, May 6, 2020 at 9:26 AM <vigs.prof@...> wrote:

Hello - I am looking to trace ip_forward_finish. The intent is to trace latency of all TCP connections going through a linux based gateway router. Hence thought of tracing ip_forward_finish kernel function. And capture the time-stamp of SYN, SYN-ACK and ACK messages at the router.

The issue is accessing iphdr inside the trace function crashes with the below error:

bpf: Failed to load program: Permission denied
0: (79) r6 = *(u64 *)(r1 +96)
1: (b7) r1 = 0
2: (6b) *(u16 *)(r10 -24) = r1
3: (bf) r3 = r6
4: (07) r3 += 192
5: (bf) r1 = r10
6: (07) r1 += -24
7: (b7) r2 = 2
8: (85) call bpf_probe_read#4
9: (69) r1 = *(u16 *)(r10 -24)
10: (55) if r1 != 0x8 goto pc+7
R0=inv(id=0) R1=inv8 R6=inv(id=0) R10=fp0
11: (69) r1 = *(u16 *)(r6 +196)
R6 invalid mem access 'inv'

You did not show the code which actually caused the problem.

bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb));

if ( (ip_Hdr.protocol != IPPROTO_TCP))
return 0;

return 0;
}

There must be code after "if ( (ip_Hdr.protocol != IPPROTO_TCP)) return 0;" .
You may need bpf_probe_read() for memory accesses there.


HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.

The code fragment I originally had was as below and the crash occurs when an access to ip_Hdr->protocol is made. And I also checked that ip_Hdr is not null.

int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{

if (skb->protocol != htons(ETH_P_IP)) return 0;

struct iphdr* ip_Hdr = (struct iphdr *) skb_network_header(skb);

if (ip_Hdr->protocol != IPPROTO_TCP)
return 0;


/// Other code

}

Per the HINT in the message, I did try to change to bpf_probe_read but still the same outcome

int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
if (skb->protocol != htons(ETH_P_IP)) return 0;

struct iphdr ip_Hdr;
bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb));

I see the issue now. ip_hdr(skb) eventually expands to

static inline unsigned char *skb_network_header(const struct sk_buff *skb)
{
return skb->head + skb->network_header;
}

Both field reads above (skb->head and skb->network_header) need bpf_probe_read.
Unfortunately, you may need to put the above function in your BPF program
so that you can use bpf_probe_read to access skb->head and skb->network_header.


if ( (ip_Hdr.protocol != IPPROTO_TCP))
return 0;

return 0;
}

Any help would be appreciated.
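
The advice in the reply above (read skb->head and skb->network_header through bpf_probe_read before computing the header address) can be sketched as follows. To keep it compilable outside the kernel this is a user-space mock, which is an assumption of the sketch: skb_mock is a hypothetical cut-down struct, and probe_read is a memcpy stand-in for bpf_probe_read. In an actual BCC program the same three reads would go through bpf_probe_read on the real struct sk_buff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the two sk_buff fields involved. */
struct skb_mock {
    unsigned char *head;
    unsigned short network_header;
};

/* User-space stand-in for bpf_probe_read(dst, size, src); always succeeds. */
static int probe_read(void *dst, unsigned int size, const void *src)
{
    memcpy(dst, src, size);
    return 0;
}

/* Returns the IPv4 protocol byte (offset 9 in the header), or -1 on failure. */
static int read_ip_protocol(const struct skb_mock *skb)
{
    unsigned char *head;
    unsigned short off;
    unsigned char proto;

    assert(skb != NULL);
    /* Copy each field out of the skb first instead of dereferencing it. */
    if (probe_read(&head, sizeof(head), &skb->head))
        return -1;
    if (probe_read(&off, sizeof(off), &skb->network_header))
        return -1;
    /* Only now compute head + offset and copy the byte from the header. */
    if (probe_read(&proto, sizeof(proto), head + off + 9))
        return -1;
    return proto;
}
```

The point of the structure is that no pointer obtained from the skb is ever dereferenced directly; every load goes through the copy helper.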


Re: #bcc - skb_network_header crashes in a BPF Kernel trace function #bcc

Yonghong Song
 

On Wed, May 6, 2020 at 9:26 AM <vigs.prof@...> wrote:

Hello - I am looking to trace ip_forward_finish. The intent is to trace latency of all TCP connections going through a linux based gateway router. Hence thought of tracing ip_forward_finish kernel function. And capture the time-stamp of SYN, SYN-ACK and ACK messages at the router.

The issue is accessing iphdr inside the trace function crashes with the below error:

bpf: Failed to load program: Permission denied
0: (79) r6 = *(u64 *)(r1 +96)
1: (b7) r1 = 0
2: (6b) *(u16 *)(r10 -24) = r1
3: (bf) r3 = r6
4: (07) r3 += 192
5: (bf) r1 = r10
6: (07) r1 += -24
7: (b7) r2 = 2
8: (85) call bpf_probe_read#4
9: (69) r1 = *(u16 *)(r10 -24)
10: (55) if r1 != 0x8 goto pc+7
R0=inv(id=0) R1=inv8 R6=inv(id=0) R10=fp0
11: (69) r1 = *(u16 *)(r6 +196)
R6 invalid mem access 'inv'

You did not show the code which actually caused the problem.

bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb));

if ( (ip_Hdr.protocol != IPPROTO_TCP))
return 0;

return 0;
}

There must be code after "if ( (ip_Hdr.protocol != IPPROTO_TCP)) return 0;" .
You may need bpf_probe_read() for memory accesses there.


HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.

The code fragment I originally had was as below and the crash occurs when an access to ip_Hdr->protocol is made. And I also checked that ip_Hdr is not null.

int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{

if (skb->protocol != htons(ETH_P_IP)) return 0;

struct iphdr* ip_Hdr = (struct iphdr *) skb_network_header(skb);

if (ip_Hdr->protocol != IPPROTO_TCP)
return 0;


/// Other code

}

Per the HINT in the message, I did try to change to bpf_probe_read but still the same outcome

int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
if (skb->protocol != htons(ETH_P_IP)) return 0;

struct iphdr ip_Hdr;
bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb));

if ( (ip_Hdr.protocol != IPPROTO_TCP))
return 0;

return 0;
}

Any help would be appreciated.


#bcc - skb_network_header crashes in a BPF Kernel trace function #bcc

vigs.prof@...
 

Hello - I am looking to trace ip_forward_finish. The intent is to trace latency of all TCP connections going through a linux based gateway router.  Hence thought of tracing ip_forward_finish kernel function. And capture the time-stamp of SYN, SYN-ACK and ACK messages at the router. 
 
The issue is accessing iphdr inside the trace function crashes with the below error:
 
bpf: Failed to load program: Permission denied
0: (79) r6 = *(u64 *)(r1 +96)
1: (b7) r1 = 0
2: (6b) *(u16 *)(r10 -24) = r1
3: (bf) r3 = r6
4: (07) r3 += 192
5: (bf) r1 = r10
6: (07) r1 += -24
7: (b7) r2 = 2
8: (85) call bpf_probe_read#4
9: (69) r1 = *(u16 *)(r10 -24)
10: (55) if r1 != 0x8 goto pc+7
 R0=inv(id=0) R1=inv8 R6=inv(id=0) R10=fp0
11: (69) r1 = *(u16 *)(r6 +196)
R6 invalid mem access 'inv'
 
HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.
 
The code fragment I originally had was as below and the crash occurs when an access to ip_Hdr->protocol is made. And I also checked that ip_Hdr is not null. 
 
int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
 
    if (skb->protocol != htons(ETH_P_IP)) return 0;
 
    struct iphdr* ip_Hdr = (struct iphdr *) skb_network_header(skb);
 
    if (ip_Hdr->protocol != IPPROTO_TCP)
         return 0;
 
 
    /// Other code
 
  }
 
Per the HINT in the message, I did try to change to bpf_probe_read but still the same outcome
 
int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
    if (skb->protocol != htons(ETH_P_IP)) return 0;
 
    struct iphdr ip_Hdr;
    bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb)); 
    
    if ( (ip_Hdr.protocol != IPPROTO_TCP))
         return 0;
 
    return 0;
}
 
Any help would be appreciated. 


Seeking candidates for PhD position related to XDP/eBPF

Jesper Dangaard Brouer
 

Hi Potential PhD student,

Reminder: Application deadline 15.May 2020 is really soon for our PhD
position located in Sweden, at Karlstads University. See:

"PhD position in Computer Science, programmable networks"
https://kau.varbi.com/en/what:job/jobID:315513

This PhD position is related to XDP/eBPF. The Red Hat engineers you
will be cooperating with are Toke and I. Red Hat is funding the
position, but employment happens under University terms, with the
exception the work should be released under an Open Source license.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Thu, Apr 16, 2020 at 8:42 AM <mayfieldtristan@...> wrote:

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?

You are not really accessing CPU registers; you access the values
they held at the moment the probed code was interrupted. Those values
are stored in the pt_regs struct. It's a technicality in this case,
but you can't access CPU registers directly in BPF.

BTW, raw_tracepoints are completely different, but you should be able
to find examples in selftests for those.

For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and cpu registers, but I believe that would be the %rdi register?

Yes, the rdi register, which is accessed from pt_regs using PT_REGS_PARM1().

With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
char msg[] = "hello world\n"; // for verification that the bpf program is running at all
bpf_trace_printk(msg, sizeof(msg));

struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

return 0;
}
char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o


Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performance Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?

So <linux/ptrace.h> in your case is taken from UAPI headers, not
kernel internal headers. They have different names for the fields. Drop
the -D__KERNEL__ part and it should work.
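
To illustrate why the define matters, here is a small user-space sketch of the naming split. It is a mock under stated assumptions: pt_regs_mock and PARM1_MOCK are hypothetical names, and the struct keeps only two fields. It mirrors the fact that on x86-64 the kernel-internal pt_regs calls the field di (as the error message above shows) while the UAPI struct calls it rdi.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field stand-ins for the two x86-64 pt_regs flavours. */
#ifdef __KERNEL__
struct pt_regs_mock { unsigned long si; unsigned long di; };    /* kernel-internal names */
#define PARM1_MOCK(x) ((x)->di)
#else
struct pt_regs_mock { unsigned long rsi; unsigned long rdi; };  /* UAPI names */
#define PARM1_MOCK(x) ((x)->rdi)
#endif

/* First argument of the probed function, as saved when the probe fired. */
static unsigned long first_arg(const struct pt_regs_mock *regs)
{
    assert(regs != NULL);
    return PARM1_MOCK(regs);
}
```

If the macro and the struct definition disagree about __KERNEL__, as in the build above, the field lookup fails at compile time with exactly the "no member named 'di'" kind of error.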


Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?

kretprobes can only safely access the return value, which you get
with PT_REGS_RC(ctx). Input arguments are clobbered by the time the
kretprobe fires, so PT_REGS_PARM1(ctx) would return
something, but most probably not the correct value of the first
input argument.

raw_tracepoints are similar to fentry/fexit in that each input
argument is 8 bytes long. See progs/test_vmlinux.c in selftests/bpf for
an example of getting a syscall number on sys_enter. BPF_PROG is
a useful macro for such use cases.
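
As a rough picture of the "each argument is an 8-byte slot" point, here is a user-space mock of unpacking a raw tracepoint context. raw_tp_args_mock and syscall_id are hypothetical names, and the sketch assumes the sys_enter tracepoint passes (struct pt_regs *regs, long id) in that order; a real program would use struct bpf_raw_tracepoint_args (or the BPF_PROG macro) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the raw tracepoint context: 8-byte slots. */
struct raw_tp_args_mock {
    uint64_t args[2];
};

/* For sys_enter the slots are assumed to be (struct pt_regs *regs, long id). */
static long syscall_id(const struct raw_tp_args_mock *ctx)
{
    assert(ctx != NULL);
    return (long)ctx->args[1];  /* second slot: syscall number */
}
```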


After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?
For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and cpu registers, but I believe that would be the %rdi register?
With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
char msg[] = "hello world\n"; // for verification that the bpf program is running at all
bpf_trace_printk(msg, sizeof(msg));

struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

return 0;
}

char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o

Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
  struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
                                                     ^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performance Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?

Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?

After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

adding back mailing list


On Mon, Apr 6, 2020 at 7:58 AM <mayfieldtristan@...> wrote:

Andrii, thanks for the reply!

It's not arbitrary, it's set at 16 in kernel.

ctx->err doesn't exist according to definition above?

Sorry, these were my mistake. I neglected cleaning my code up properly before sending here. I thought I had caught my relic comments and weird experiments, but hadn't.
Really sorry.


I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on a 64-bit arch). BPF is a 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on a 32-bit target
architecture, but assuming you are on x86-64, switch all ints to long
and make __mode_t long as well.

Interesting. Here's the tracepoint field order for reference (if nothing else so the information is in one place for people who may read this):

root@ubuntu-focal:~# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 622
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;

I tried matching the struct to the fields listed, but I am on x86_64 so I guess the ints and umode_t should be long.

Notice the offsets: all of them (except for the first 4 fields, which fit
in the first 8 bytes) are 8-byte aligned. You can do that in your struct
definitions as:

int __syscall_nr __attribute__((aligned(8)));

OR just use long.
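
For anyone reading along, the layout implied by the format file quoted above can be checked at compile time. This is a sketch that assumes an LP64 target (long and pointers are 8 bytes, as on x86-64); the struct name sys_enter_openat_args is the one used elsewhere in this thread, and the field list is simply copied from the format output.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the context layout implied by the format file quoted above. */
struct sys_enter_openat_args {
    unsigned short common_type;          /* offset 0 */
    unsigned char common_flags;          /* offset 2 */
    unsigned char common_preempt_count;  /* offset 3 */
    int common_pid;                      /* offset 4 */
    int __syscall_nr;                    /* offset 8, then 4 bytes of padding */
    long dfd;                            /* offset 16 */
    const char *filename;                /* offset 24 */
    long flags;                          /* offset 32 */
    long mode;                           /* offset 40 */
};

/* Compile-time checks against the offsets printed in the format file. */
static_assert(offsetof(struct sys_enter_openat_args, __syscall_nr) == 8, "__syscall_nr");
static_assert(offsetof(struct sys_enter_openat_args, dfd) == 16, "dfd");
static_assert(offsetof(struct sys_enter_openat_args, filename) == 24, "filename");
static_assert(offsetof(struct sys_enter_openat_args, flags) == 32, "flags");
static_assert(offsetof(struct sys_enter_openat_args, mode) == 40, "mode");
```

The gap between offset 12 and offset 16 is ordinary C padding: the compiler inserts it so that the 8-byte dfd field starts on an 8-byte boundary.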

The other issue I've been confused about, is __syscall_nr has an offset of 8 and size 4, but dfd has an offset 16 where I'd expect 12.
Does that mean that there's just meaningless data in that area that should be accounted for?
And, if the data are longs, does that mean that the information given in "format" is incorrect?


0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).

Ah, that is really helpful! I think I just took 0 from some code at https://github.com/bpftools/linux-observability-with-bpf
and just hadn't looked into those arguments yet, assuming they were correct!

This is due to the invalid memory layout of struct sys_enter_openat_args:
you are reading the wrong pointer. Sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.

Okay, so fixing the *ctx struct to use longs did, in fact, work! Is there a resource or way that I should have read in order to know that?
I'm actually really excited I can finally read tracepoint data :)

Not sure which part you mean. Field alignment, sizes, and padding
are all part of standard C. As for tracepoints, selftests in the kernel and
various BCC and libbpf examples should be a good starting point.


Since that worked, I'm a little less concerned with the raw tracepoints, but still interested. Here's my modified code for it:

#include "bpf_tracing.h"
#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx) {

volatile struct pt_regs *regs;
volatile const char *pathname;
regs = (struct pt_regs *)ctx->args[0];
pathname = PT_REGS_PARM2_CORE(regs); // instead of (const char *)regs->si;

char msg[] = "Path: %d\n";
bpf_trace_printk(msg, sizeof(msg), pathname);

return 0;
}
char _license[] SEC("license") = "GPL";

With this, I get a compiler error warning that "implicit declaration of function 'PT_REGS_PARM2_CORE' is invalid in C99"
which indicates to me that the defined guards in bpf_tracing.h are keeping me from accessing the macro.
I looked over the bpf_tracing.h file to see if it was an easy error, but it hasn't been obvious to me yet.
I'll keep fiddling with it, and look at selftests, and see if I can get it working.

You can use libbpf-tools/Makefile for inspiration on how to do this:
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile

You might need to define __TARGET_ARCH_x86 and __KERNEL__ explicitly
otherwise. It's easier with vmlinux.h, though.



Finally, I definitely am interested in starting up a tutorial. Right now I can load, attach, and unload BPF programs. Use perf buffers. I'm sure I could use other maps types as they're pretty simple, just haven't dabbled in them yet. I can also read data from tracepoints ;)
I'm going to start on kprobes this week, and hopefully that will be a little more straightforward after doing the work on tracepoints.
That's about what I could start a tutorial with right now. I'll maybe start one this week with some basic "hello world" type stuff, but I'm nervous to get too deep into technical details if the community isn't willing to at least look over it and make sure I'm not steering information the wrong direction. From the sound of it, that's not a huge worry, but a concern of mine nonetheless. There's a lot of deprecated information about BPF out there, and I don't want to make another deprecated resource.
BPF is still rapidly evolving, so yeah, that's a concern. It
definitely requires dedication and time to maintain good up-to-date
documentation. No way around that, unfortunately.


Cheers again for helping me debug my tracepoint code! I'm excited it's working!

Sure, you are welcome.
