Re: How to get function param in kretprobe bpf program? #bcc #pragma

Andrii Nakryiko
 

On Sun, Aug 9, 2020 at 8:10 PM <forrest0579@...> wrote:

On Fri, Aug 7, 2020 at 11:31 AM, Andrii Nakryiko wrote:

You can't do it reliably with kretprobe. kretprobe is executed right
before the function exits; by that time, all the registers that
contained input parameters could have been used for something else. So
you got lucky with struct sock * here, but as a general rule you
shouldn't rely on this. You either have to pair a kprobe with a
kretprobe and store the input arguments, or take a look at the fexit
program type: it is just like kretprobe, but faster, and it guarantees
that the input arguments are preserved.

Thanks for the reply.
It seems fexit is a new feature and I'm using Linux v4.15, so fexit can't help here.
Pairing a kprobe with a kretprobe is an option and I've found a lot of examples in bcc, but I am also wondering whether it is always correct to use pid_tgid as the key to store params and retrieve them from the kretprobe.
I am wondering if there is a chance that the following case could happen:

0. Attach a kprobe program to tcp_set_state and store the params in a HASHMAP using pid_tgid as the key; attach a kretprobe to tcp_set_state and look the params up using pid_tgid.
1. The kprobe program is triggered twice with the same pid_tgid before the kretprobe executes, so only the last params can be retrieved.

I have this concern because I'm using golang, and two goroutines may map to one kernel thread. If one goroutine gets interrupted while executing tcp_set_state, another one could execute tcp_set_state with the same pid_tgid.

I don't think golang can interrupt a thread while it's executing in
the kernel. So from the golang perspective I wouldn't worry: the
kernel will execute both the kprobe and the corresponding kretprobe
before the golang runtime can do anything about it. But in general,
it's possible to attach a kprobe to a kernel function that could be
called multiple times (re-entrantly) for a given thread, at which
point pid_tgid won't be enough. This cannot happen for syscalls and
many other kernel functions, though, and I would imagine it's not the
case for tcp_set_state either. But please double-check the kernel
sources to be absolutely sure.
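
For readers looking for the pairing pattern discussed above, here is a minimal BCC-style sketch of a kprobe and kretprobe keyed by pid_tgid. The map and struct names (entryinfo, args_t) are illustrative, not from the thread:

---
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

struct args_t {
    struct sock *sk;
    int state;
};

/* entry arguments, keyed by pid_tgid */
BPF_HASH(entryinfo, u64, struct args_t);

int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)
{
    u64 id = bpf_get_current_pid_tgid();
    struct args_t args = {};

    args.sk = sk;
    args.state = state;
    entryinfo.update(&id, &args);
    return 0;
}

int kretprobe__tcp_set_state(struct pt_regs *ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    struct args_t *args = entryinfo.lookup(&id);

    if (!args)
        return 0;
    /* args->sk and args->state hold the values captured at entry */
    entryinfo.delete(&id);
    return 0;
}
---

As discussed above, this assumes the traced function is not re-entered on the same thread before its kretprobe fires.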

Re: Polling multiple BPF_MAP_TYPE_PERF_EVENT_ARRAY causing dropped events

Andrii Nakryiko
 

On Mon, Aug 10, 2020 at 5:22 AM Ian <@iwcampbell> wrote:

[Edited Message Follows]

The project I am working on generically loads BPF object files, pins their respective maps, and then proceeds to use perf_buffer__poll from libbpf to poll the maps. I currently am polling the multiple maps this way after loading and setting everything else up:

        while(true) {
            LIST_FOREACH(evt, &event_head, list) {
                if(evt->map_loaded == 1) {
                    err = perf_buffer__poll(evt->pb, 10);
                    if(err < 0) {
                        break;
                    }
                }
            }
        }

Where an evt is a structure that looks like:

struct evt_struct {
    char * map_name;
    FILE * fp;
    int map_loaded;
    ...<some elements removed for clarity>...
    struct perf_buffer * pb;
    LIST_ENTRY(evt_struct) list;
};

Essentially each event (evt) in this program correlates to a BPF program. I am looping through the events and calling perf_buffer__poll for each of them. This doesn't seem efficient, and to me it makes the epoll_wait that perf_buffer__poll calls lose any of its efficiencies by looping through the events beforehand. In perf_buffer__poll, epoll is used to poll each CPU. Is there a more efficient way to poll multiple maps like this? Does it involve dropping perf? I don't like that I have to make a separate epoll context for each BPF program I am going to poll, one that just checks the CPUs. It would be better if I just had two sets for epoll to monitor, but then I would lose the built-in perf functionality. Beyond efficiency, my current polling implementation drops a significant number of events (i.e. the lost-event callback in the perf options is called). This is the issue that really must be fixed. I have some ideas that might be worth trying, but I wanted to gather more information before I do any substantial refactoring:

1) I was thinking about dropping perf and just using another BPF map type (Hash, Array) to pass elements back to user space, then using a standard epoll context to monitor all the maps' FDs. I wouldn't lose any events that way (or if I did, I would never know). But I have read in various books that perf maps are the ideal way to send data to user space...
If you have the luxury of using Linux kernel 5.8 or newer, you can try
the new BPF ring buffer map, which provides an MPSC queue (so you can
enqueue from multiple CPUs simultaneously, while the BPF perf buffer
only allows you to enqueue on your current CPU). But what's more
important for you, libbpf's ring_buffer interface allows you to do
exactly what you need: poll multiple independent ring buffers
simultaneously through a single epoll FD. See [0] for an example of
using that API in user space, plus [1] for the corresponding BPF-side
code.

But having said that, we should probably extend libbpf's perf_buffer
API to support similar use cases. I'll try to do this some time soon.

[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/ringbuf_multi.c#L54-L62
[1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
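
For illustration, a minimal user-space sketch of that pattern, along the lines of the selftest in [0]; the map FDs, callback, and function name here are placeholders:

---
#include <bpf/libbpf.h>

static int handle_sample(void *ctx, void *data, size_t len)
{
    /* process one sample from whichever ring buffer produced it */
    return 0;
}

int poll_two_ringbufs(int map_fd1, int map_fd2)
{
    struct ring_buffer *rb;
    int err;

    /* one epoll FD underneath, created for the first ringbuf map... */
    rb = ring_buffer__new(map_fd1, handle_sample, NULL, NULL);
    if (!rb)
        return -1;

    /* ...and the second map is added to the same ring_buffer instance */
    err = ring_buffer__add(rb, map_fd2, handle_sample, NULL);
    if (err)
        goto out;

    /* a single poll call now waits on both ring buffers */
    while ((err = ring_buffer__poll(rb, 100 /* ms */)) >= 0)
        ;
out:
    ring_buffer__free(rb);
    return err;
}
---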


2) Do perf maps or their buffer pages (for the mmap ring buffer) get cleaned up automatically? When do analyzed entries get removed? I tried increasing the page size of my perf buffer and it just took longer for me to start getting lost events, which almost suggests I am leaking memory. Am I using perf incorrectly? Each perf buffer is created by:

pb_opts.sample_cb = handle_events;
pb_opts.lost_cb = handle_lost_events;
evt->pb = perf_buffer__new(map_fd, 16, &pb_opts); // Where the map_fd is received from a bpf_object_get call
Yes, after your handle_event() callback returns, libbpf marks that
sample as consumed and the space it was taking is now available for
new samples to be enqueued. You are right, though, that by increasing
the size of each per-CPU perf ring buffer, you'll delay the drops,
because now you can accumulate more samples in the ring before the
ring buffer is full.


Any help or advice would be appreciated!

- Ian

Polling multiple BPF_MAP_TYPE_PERF_EVENT_ARRAY causing dropped events

Ian
 
Edited

The project I am working on generically loads BPF object files, pins their respective maps, and then proceeds to use perf_buffer__poll from libbpf to poll the maps. I currently am polling the multiple maps this way after loading and setting everything else up:

        while(true) {
            LIST_FOREACH(evt, &event_head, list) {
                if(evt->map_loaded == 1) {
                    err = perf_buffer__poll(evt->pb, 10);
                    if(err < 0) {
                        break;
                    }
                }
            }
        }

Where an evt is a structure that looks like:

struct evt_struct {
    char * map_name;
    FILE * fp;
    int map_loaded;
    ...<some elements removed for clarity>...
    struct perf_buffer * pb;
    LIST_ENTRY(evt_struct) list;
};

Essentially each event (evt) in this program correlates to a BPF program. I am looping through the events and calling perf_buffer__poll for each of them. This doesn't seem efficient, and to me it makes the epoll_wait that perf_buffer__poll calls lose any of its efficiencies by looping through the events beforehand. In perf_buffer__poll, epoll is used to poll each CPU. Is there a more efficient way to poll multiple maps like this? Does it involve dropping perf? I don't like that I have to make a separate epoll context for each BPF program I am going to poll, one that just checks the CPUs. It would be better if I just had two sets for epoll to monitor, but then I would lose the built-in perf functionality. Beyond efficiency, my current polling implementation drops a significant number of events (i.e. the lost-event callback in the perf options is called). This is the issue that really must be fixed. I have some ideas that might be worth trying, but I wanted to gather more information before I do any substantial refactoring:

1) I was thinking about dropping perf and just using another BPF map type (Hash, Array) to pass elements back to user space, then using a standard epoll context to monitor all the maps' FDs. I wouldn't lose any events that way (or if I did, I would never know). But I have read in various books that perf maps are the ideal way to send data to user space...

2) Do perf maps or their buffer pages (for the mmap ring buffer) get cleaned up automatically? When do analyzed entries get removed? I tried increasing the page size of my perf buffer and it just took longer for me to start getting lost events, which almost suggests I am leaking memory. Am I using perf incorrectly? Each perf buffer is created by:

pb_opts.sample_cb = handle_events;
pb_opts.lost_cb = handle_lost_events;
evt->pb = perf_buffer__new(map_fd, 16, &pb_opts); // Where the map_fd is received from a bpf_object_get call
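
For completeness, a minimal sketch of the setup around the snippet above, using the perf_buffer_opts-based libbpf API shown in this post; the function names and the error handling are illustrative:

---
#include <bpf/libbpf.h>
#include <stdio.h>

static void handle_events(void *ctx, int cpu, void *data, __u32 size)
{
    /* one sample from the per-CPU perf ring */
}

static void handle_lost_events(void *ctx, int cpu, __u64 cnt)
{
    fprintf(stderr, "lost %llu events on CPU %d\n", (unsigned long long)cnt, cpu);
}

int setup_and_poll(int map_fd)
{
    struct perf_buffer_opts pb_opts = {};
    struct perf_buffer *pb;
    int err;

    pb_opts.sample_cb = handle_events;
    pb_opts.lost_cb = handle_lost_events;

    /* 16 pages per CPU; a larger count only delays drops, as noted in the reply */
    pb = perf_buffer__new(map_fd, 16, &pb_opts);
    err = libbpf_get_error(pb);
    if (err)
        return err;

    while ((err = perf_buffer__poll(pb, 10 /* ms */)) >= 0)
        ;
    perf_buffer__free(pb);
    return err;
}
---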

Any help or advice would be appreciated!

- Ian
 

Re: How to get function param in kretprobe bpf program? #bcc #pragma

forrest0579@...
 

On Fri, Aug 7, 2020 at 11:31 AM, Andrii Nakryiko wrote:
You can't do it reliably with kretprobe. kretprobe is executed right
before the function exits; by that time, all the registers that
contained input parameters could have been used for something else. So
you got lucky with struct sock * here, but as a general rule you
shouldn't rely on this. You either have to pair a kprobe with a
kretprobe and store the input arguments, or take a look at the fexit
program type: it is just like kretprobe, but faster, and it guarantees
that the input arguments are preserved.

Thanks for the reply.
It seems fexit is a new feature and I'm using Linux v4.15, so fexit can't help here.
Pairing a kprobe with a kretprobe is an option and I've found a lot of examples in bcc, but I am also wondering whether it is always correct to use pid_tgid as the key to store params and retrieve them from the kretprobe.
I am wondering if there is a chance that the following case could happen:

0. Attach a kprobe program to tcp_set_state and store the params in a HASHMAP using pid_tgid as the key; attach a kretprobe to tcp_set_state and look the params up using pid_tgid.
1. The kprobe program is triggered twice with the same pid_tgid before the kretprobe executes, so only the last params can be retrieved.

I have this concern because I'm using golang, and two goroutines may map to one kernel thread. If one goroutine gets interrupted while executing tcp_set_state, another one could execute tcp_set_state with the same pid_tgid.

Re: How to get function param in kretprobe bpf program? #bcc #pragma

Andrii Nakryiko
 

On Fri, Aug 7, 2020 at 12:45 AM <forrest0579@...> wrote:

When using a kprobe in bcc, I can get the params directly, like `int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)`.
But this doesn't seem to work in a kretprobe. I've found that I can get the first param by using `struct sock *sk = (void*)ctx->bx`,
but I can't get the second param through `ctx->cx`.
Am I reading the wrong register? I'm on x86-64.

You can't do it reliably with kretprobe. kretprobe is executed right
before the function exits; by that time, all the registers that
contained input parameters could have been used for something else. So
you got lucky with struct sock * here, but as a general rule you
shouldn't rely on this. You either have to pair a kprobe with a
kretprobe and store the input arguments, or take a look at the fexit
program type: it is just like kretprobe, but faster, and it guarantees
that the input arguments are preserved.
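
For readers on a newer kernel, a minimal libbpf-style sketch of the fexit alternative mentioned above; it assumes BTF, a generated vmlinux.h, and fentry/fexit support (v5.5+), so it is not usable on v4.15, and the program name is illustrative:

---
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("fexit/tcp_set_state")
int BPF_PROG(tcp_set_state_exit, struct sock *sk, int state)
{
    /* unlike a kretprobe, fexit gives verified access to the original
     * input arguments (sk, state) at function exit */
    bpf_printk("tcp_set_state exit: state=%d", state);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
---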

How to get function param in kretprobe bpf program? #bcc #pragma

forrest0579@...
 

When using a kprobe in bcc, I can get the params directly, like `int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)`.
But this doesn't seem to work in a kretprobe. I've found that I can get the first param by using `struct sock *sk = (void*)ctx->bx`,
but I can't get the second param through `ctx->cx`.
Am I reading the wrong register? I'm on x86-64.

Clang target bpf compile issue/fail on Ubuntu and Debian

Jesper Dangaard Brouer
 

The BPF UAPI header file <linux/bpf.h> includes <linux/types.h>, which gives
BPF programs access to types such as __u32, __u64, __u8, etc.

On Ubuntu/Debian, when compiling with the clang option[1] "-target bpf", the
compile fails because clang cannot find the file <asm/types.h>, which is
included from <linux/types.h>. This is because Ubuntu/Debian tries to
support multiple architectures on a single system[2]. On x86_64 the file
<asm/types.h> is located in /usr/include/x86_64-linux-gnu/, which the distro
compiler will add to its search path (/usr/include/<triplet> [3]). Note that it
works when not specifying -target bpf, but as explained in the kernel doc[1],
the clang target bpf should really be used (to avoid other issues).

There are two workarounds: (1) add an extra include dir on Ubuntu (which
seems too x86 specific), e.g. CFLAGS += -I/usr/include/x86_64-linux-gnu,
or (2) install the gcc-multilib package on Ubuntu.
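
For illustration, a minimal file that reproduces the failure; the file name and the variable are made up, and only the include matters:

---
/* repro.bpf.c -- fails on Ubuntu/Debian multi-arch systems with:
 *   clang -O2 -target bpf -c repro.bpf.c -o repro.bpf.o
 * because <linux/bpf.h> pulls in <linux/types.h>, which includes
 * <asm/types.h> from /usr/include/x86_64-linux-gnu/ on x86_64.
 * Workaround (1): add -I/usr/include/x86_64-linux-gnu to the command line.
 */
#include <linux/bpf.h>

__u32 dummy_value = 0; /* just to use a type provided via <linux/types.h> */
---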

The question is: Should Ubuntu/Debian have a /usr/include/<triplet>
directory for BPF (as part of their multi-arch approach)?

Or should clang use the compile host's triplet for the /usr/include/<triplet>
path even when given the clang -target bpf option?

P.S. GCC chose the 'bpf-unknown-none' target triplet for BPF.


Links:
[1] https://www.kernel.org/doc/html/latest/bpf/bpf_devel_QA.html#q-clang-flag-for-target-bpf
[2] https://wiki.ubuntu.com/MultiarchSpec
[3] https://wiki.osdev.org/Target_Triplet
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

Accessing current netns info in a TC eBPF program

siva.gaggara@...
 

Hi,

I am trying to attach the same TC eBPF program instance to both host
and container interfaces. So some of the maps need to be qualified
with the netns id. I was wondering if there is a way to access the
'current' netns info in a TC eBPF program. It would be quite helpful if
you could provide me with some pointers.

Thanks

Siva

Re: Invalid filename/mode in openat tracepoint data

alessandro.gario@...
 

Hello Tristan!

That is the same path I found when debugging with strace! I think I also saw a missing comm string during my tests (with printk from BCC), but I would have to reproduce it again to be sure.

I'm going to test this one more time on kernel 4.18, as I don't remember finding this problem when I started writing the library on Ubuntu 18.10 (and maybe I'll also try to take a look at the openat implementation).

Thanks so much for your help!

Alessandro Gario

On Fri, Jul 24, 2020 at 10:11 am, Tristan Mayfield <mayfieldtristan@...> wrote:
Alessandro,
I figured out that it's non-deterministic. So sometimes certain commands (git, awk, rm, uname, etc.) will have an openat with no filename, and other times they won't.
I ran these commands experimentally and got results similar to what I have below for all of them:
$ rm something
sys_enter_openat comm: rm pid:3512 filename:/etc/ld.so.cache (140398792747904)
sys_enter_openat comm: rm pid:3512 filename:/lib/x86_64-linux-gnu/libc.so.6 (140398792789520)
sys_enter_openat comm: rm pid:3512 filename:/usr/lib/locale/locale-archive (140398792339408)
sys_enter_openat comm: rm pid:3514 filename:/etc/ld.so.cache (139648615484288)
sys_enter_openat comm: rm pid:3514 filename:/lib/x86_64-linux-gnu/libc.so.6 (139648615525904)
sys_enter_openat comm: rm pid:3514 filename: (139648615075792)
Because it's been so consistent, I believe the missing file is... always? Most of the time? At least a good part of the time "/usr/lib/locale/locale-archive".
I'm not sure why an archive file would behave differently, but it seems to be causing this issue. You can use the below bpftrace script to figure out which commands most often create the no-name situation.
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
        if (str(args->filename) == "") {
                printf("sys_enter_openat comm: %s pid:%d filename:%s (%ld)\n", comm, pid, str(args->filename), args->filename);
        }
}
Tristan

Re: Invalid filename/mode in openat tracepoint data

Tristan Mayfield
 

Alessandro,

I figured out that it's non-deterministic. So sometimes certain commands (git, awk, rm, uname, etc.) will have an openat with no filename, and other times they won't.
I ran these commands experimentally and got results similar to what I have below for all of them:

$ rm something
sys_enter_openat comm: rm pid:3512 filename:/etc/ld.so.cache (140398792747904)
sys_enter_openat comm: rm pid:3512 filename:/lib/x86_64-linux-gnu/libc.so.6 (140398792789520)
sys_enter_openat comm: rm pid:3512 filename:/usr/lib/locale/locale-archive (140398792339408)

sys_enter_openat comm: rm pid:3514 filename:/etc/ld.so.cache (139648615484288)
sys_enter_openat comm: rm pid:3514 filename:/lib/x86_64-linux-gnu/libc.so.6 (139648615525904)
sys_enter_openat comm: rm pid:3514 filename: (139648615075792)

Because it's been so consistent, I believe the missing file is... always? Most of the time? At least a good part of the time "/usr/lib/locale/locale-archive".
I'm not sure why an archive file would behave differently, but it seems to be causing this issue. You can use the below bpftrace script to figure out which commands most often create the no-name situation.

tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
        if (str(args->filename) == "") {
                printf("sys_enter_openat comm: %s pid:%d filename:%s (%ld)\n", comm, pid, str(args->filename), args->filename);
        }
}

Tristan

Re: Invalid filename/mode in openat tracepoint data

Tristan Mayfield
 

I ran the same test with strace. One of the file data points that doesn't show up is this:

bpftrace:
sys_enter_openat mode:0 filename: (93911401193582)

strace:
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3

But "locale-archive" does show up in different contexts in bpftrace.
The major commonality I'm seeing is that the file opened right before the "no-name" file seems to be a shared object that was (presumably) dynamically used. Here are some examples:

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6 (140092012560096)
sys_enter_openat mode:0 filename: (93826516217966)

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libtinfo.so.6 (139814679237888)
sys_enter_openat mode:0 filename: (139814679027664)

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6 (140231836626656)
sys_enter_openat mode:0 filename: (94880667103342)

This might be a linking issue where openat isn't getting supplied a filename? I'll keep debugging since this is interesting. Have you looked through the bug reports for bpftrace or BCC?

Tristan

Re: Invalid filename/mode in openat tracepoint data

alessandro.gario@...
 

Hello Tristan,

thanks for spending the time to check this out!

One thing I forgot to mention is that I can verify with strace that the filename parameter is always present.
I initially suspected that the pointer wasn't mapped at the time the probe attempted to read from it, but shouldn't the tracepoint interface make sure it is accessible?

Alessandro Gario

On Fri, Jul 24, 2020 at 10:27 am, Tristan Mayfield <mayfieldtristan@...> wrote:
I don't have an answer, but I verified this with the following
bpftrace script and using the action of switching to zsh/oh-my-zsh
from bash.
---
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
printf("sys_enter_openat mode:%ld filename:%s (%ld)\n",
args->mode, str(args->filename), args->filename);
}
---
Here's some example data (not all the generated output) with spaces
around some of the issue lines:
sys_enter_openat mode:0 filename: (94797689127022)
sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(139635662831568)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728940893664)
sys_enter_openat mode:0
filename:/usr/share/locale/en_US/LC_MESSAGES/git.mo (94797710736928)
sys_enter_openat mode:0
filename:/usr/share/locale/en/LC_MESSAGES/git.mo (94797710737472)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en_US/LC_MESSAGES/git.mo
(94797710737712)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en/LC_MESSAGES/git.mo
(94797710737584)
sys_enter_openat mode:438 filename:/dev/null (139809161489144)
sys_enter_openat mode:0 filename:/etc/ld.so.cache (140236659837824)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpcre2-8.so.0
(140236659879440)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libz.so.1
(140236659639520)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpthread.so.0
(140236659640784)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6
(140236659642080)
sys_enter_openat mode:0 filename: (94426721874030)
sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(140236658581456)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728357496384)
I'm tempted to think that this is some behavior of the system I don't
understand yet, rather than being a bug. But I can't say for sure.
Tristan
On 7/24/20, alessandro.gario@... <alessandro.gario@...> wrote:
Hello everyone,
I'll start with some backstory first: I wrote my own BPF library to
trace functions/syscalls and yesterday I noticed that I am sometimes
receiving broken openat() tracepoint data. This happens randomly, often
when processes are created in a short burst (like opening a new
terminal instance with zsh + oh-my-zsh installed).
I initially thought it was my fault, and proceeded to debug the
generated IR code and double check my tracepoint data definition
(which, for reference, can be found here:
https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425).
I ended up giving up, not finding the reason this was failing.
Today, I have tried to replicate the same functionality using BCC so I
could compare the output with my library and I ended up inside the same
weird behavior:
Full script here:
https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py
--
bpf_trace_printk("sys_enter_openat mode:%ld "
"filename:%s (%ld)\\n",
args->mode,
args->filename,
args->filename);
2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename:
(93849603522670)
--
I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch
Linux (5.7.9) and Ubuntu 19.10 (5.3.0).
Has anyone ever encountered this problem, or does anyone have a few
pointers as to why it is happening?
Thanks!
Alessandro

Re: Invalid filename/mode in openat tracepoint data

Tristan Mayfield
 

I don't have an answer, but I verified this with the following
bpftrace script and using the action of switching to zsh/oh-my-zsh
from bash.

---
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
printf("sys_enter_openat mode:%ld filename:%s (%ld)\n",
args->mode, str(args->filename), args->filename);
}
---

Here's some example data (not all the generated output) with spaces
around some of the issue lines:

sys_enter_openat mode:0 filename: (94797689127022)

sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(139635662831568)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728940893664)
sys_enter_openat mode:0
filename:/usr/share/locale/en_US/LC_MESSAGES/git.mo (94797710736928)
sys_enter_openat mode:0
filename:/usr/share/locale/en/LC_MESSAGES/git.mo (94797710737472)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en_US/LC_MESSAGES/git.mo
(94797710737712)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en/LC_MESSAGES/git.mo
(94797710737584)
sys_enter_openat mode:438 filename:/dev/null (139809161489144)
sys_enter_openat mode:0 filename:/etc/ld.so.cache (140236659837824)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpcre2-8.so.0
(140236659879440)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libz.so.1
(140236659639520)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpthread.so.0
(140236659640784)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6
(140236659642080)

sys_enter_openat mode:0 filename: (94426721874030)

sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(140236658581456)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728357496384)

I'm tempted to think that this is some behavior of the system I don't
understand yet, rather than being a bug. But I can't say for sure.

Tristan

On 7/24/20, alessandro.gario@... <alessandro.gario@...> wrote:
Hello everyone,

I'll start with some backstory first: I wrote my own BPF library to
trace functions/syscalls and yesterday I noticed that I am sometimes
receiving broken openat() tracepoint data. This happens randomly, often
when processes are created in a short burst (like opening a new
terminal instance with zsh + oh-my-zsh installed).

I initially thought it was my fault, and proceeded to debug the
generated IR code and double check my tracepoint data definition
(which, for reference, can be found here:
https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425).

I ended up giving up, not finding the reason this was failing.

Today, I have tried to replicate the same functionality using BCC so I
could compare the output with my library and I ended up inside the same
weird behavior:

Full script here:
https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py

--
bpf_trace_printk("sys_enter_openat mode:%ld "
"filename:%s (%ld)\\n",
args->mode,
args->filename,
args->filename);

2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename:
(93849603522670)
--

I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch
Linux (5.7.9) and Ubuntu 19.10 (5.3.0).

Has anyone ever encountered this problem, or does anyone have a few
pointers as to why it is happening?

Thanks!

Alessandro





Invalid filename/mode in openat tracepoint data

alessandro.gario@...
 

Hello everyone,

I'll start with some backstory first: I wrote my own BPF library to trace functions/syscalls and yesterday I noticed that I am sometimes receiving broken openat() tracepoint data. This happens randomly, often when processes are created in a short burst (like opening a new terminal instance with zsh + oh-my-zsh installed).

I initially thought it was my fault, and proceeded to debug the generated IR code and double check my tracepoint data definition (which, for reference, can be found here: https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425). I ended up giving up, not finding the reason this was failing.

Today, I have tried to replicate the same functionality using BCC so I could compare the output with my library and I ended up inside the same weird behavior:

Full script here: https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py

--
bpf_trace_printk("sys_enter_openat mode:%ld "
"filename:%s (%ld)\\n",
args->mode,
args->filename,
args->filename);

2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename: (93849603522670)
--

I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch Linux (5.7.9) and Ubuntu 19.10 (5.3.0).

Has anyone ever encountered this problem, or does anyone have a few pointers as to why it is happening?

Thanks!

Alessandro

Port mirroring using bpf_clone_redirect

Kanthi P
 

Hello,

I am trying a port mirroring use case that basically mirrors traffic from host1 to host2. On host1 I have two interfaces, eth0 and eth1, and I have configured a vxlan interface on eth1. I have used bpf_clone_redirect on both the ingress and egress of eth0 and mirrored them to vxlan1 (on eth1). This vxlan tunnel ends on host2. So I am actually seeing all the packets on host2, but the order of the packets is badly jumbled. Could this be because bpf_clone_redirect on ingress/egress is just redirecting both in parallel? Strangely, though, the packet capture on host1's ethernet interface is all in the right order.
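
For context, a minimal sketch of the kind of TC program described above; VXLAN_IFINDEX is a placeholder for the ifindex of vxlan1, and the section/program names are illustrative. It would be attached with tc on both the ingress and egress hooks of eth0:

---
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define VXLAN_IFINDEX 5 /* placeholder: ifindex of vxlan1 */

SEC("classifier")
int mirror(struct __sk_buff *skb)
{
    /* clone the packet to the vxlan interface (flags = 0 means egress on
     * that interface); the original packet continues unchanged */
    bpf_clone_redirect(skb, VXLAN_IFINDEX, 0);
    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
---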

Appreciate your inputs!

Regards,
Kanthi

bpf batch support for queue/stack

Simone Magnani
 

Hi,

Lately, I've been working on in-kernel traffic analysis with eBPF and
the newest features released in the latest kernel versions
(queue/stack, batch operations,...).
I couldn't help but notice that the queue and stack BPF map
types don't support batch operations at all, and I was wondering
why. Is there a reason this decision was made, or is it just
temporary and you are planning to implement it later on?

Reference file: linux/kernel/bpf/queue_stack_maps.c (and all the
others belonging to the same directory)

Thanks in advance,

Regards,
Simone
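
As an aside, in the absence of batch operations for these map types, elements can still be drained from user space one at a time with bpf_map_lookup_and_delete_elem; a minimal sketch, with the map FD, value type, and function name as placeholders:

---
#include <bpf/bpf.h>
#include <errno.h>
#include <stdio.h>

/* queue_fd is assumed to refer to a BPF_MAP_TYPE_QUEUE map with __u64 values */
static void drain_queue(int queue_fd)
{
    __u64 value;

    /* for queue/stack maps the key argument is NULL; each successful call
     * pops one element, and ENOENT means the queue is empty */
    while (bpf_map_lookup_and_delete_elem(queue_fd, NULL, &value) == 0)
        printf("popped value: %llu\n", (unsigned long long)value);

    if (errno != ENOENT)
        perror("bpf_map_lookup_and_delete_elem");
}
---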

Re: BPF Concurrency

Kanthi P
 

Thanks, fetch_and_add would be more appropriate to my use-case


On Sun, Jun 21, 2020 at 06:02 PM, Yonghong Song wrote:
You cannot use the return value. A recent llvm should return an error
if you try to use it.

There is some preliminary work to have more atomic operations in the
BPF ISA. https://reviews.llvm.org/D72184. We could add a version of
fetch_and_add with proper return value. This may take some time as we
need to ensure kernel has proper support.

Re: BPF Concurrency

Yonghong Song
 

On Sun, Jun 21, 2020 at 4:17 PM Kanthi P <Pavuluri.kanthi@...> wrote:

Thanks Andrii. __sync_fetch_and_add doesn't seem to work as expected: it adds the increment, but it returns the wrong value.
I am actually hitting the same issue mentioned here: https://lists.iovisor.org/g/iovisor-dev/topic/problems_with/23670176?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,23670176

Can anyone tell me whether it has been fixed recently? I am on a 4.15 kernel.
You cannot use the return value. A recent llvm should return an error
if you try to use it.

There is some preliminary work to have more atomic operations in the
BPF ISA. https://reviews.llvm.org/D72184. We could add a version of
fetch_and_add with proper return value. This may take some time as we
need to ensure kernel has proper support.
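
For reference, a minimal BCC-style sketch of the pattern that does work today: use __sync_fetch_and_add purely for its side effect and ignore the return value. The map name, hook, and counter layout are illustrative only:

---
#include <uapi/linux/ptrace.h>

BPF_ARRAY(counters, u64, 1);

int kprobe__tcp_set_state(struct pt_regs *ctx)
{
    int key = 0;
    u64 *val = counters.lookup(&key);

    if (val)
        /* atomic add; per the reply above, do NOT use the return value */
        __sync_fetch_and_add(val, 1);
    return 0;
}
---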


Thanks,
Kanthi

Re: BPF Concurrency

Kanthi P
 

Hi Jesper and Quentin,

Nice, I checked that logic. If I understand it right, that implementation would also need a few operations to be atomic, for example the window movements (whenever R and B are added or subtracted).
That's the issue I am attempting to solve, but I haven't been able to conclude anything yet.

Regards,
Kanthi

Re: BPF Concurrency

Kanthi P
 

Thanks Andrii. __sync_fetch_and_add doesn't seem to work as expected: it adds the increment, but it returns the wrong value.
I am actually hitting the same issue mentioned here: https://lists.iovisor.org/g/iovisor-dev/topic/problems_with/23670176?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,23670176

Can anyone tell me whether it has been fixed recently? I am on a 4.15 kernel.

Thanks,
Kanthi