
Re: Invalid filename/mode in openat tracepoint data

Tristan Mayfield
 

I ran the same test with strace. One of the files whose name doesn't show up is this one:

bpftrace:
sys_enter_openat mode:0 filename: (93911401193582)

strace:
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3

But "locale-archive" does show up in different contexts in bpftrace.
The major commonality I'm seeing is that the file opened right before the "no-name" file seems to be a shared object that was (presumably) dynamically used. Here are some examples:

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6 (140092012560096)
sys_enter_openat mode:0 filename: (93826516217966)

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libtinfo.so.6 (139814679237888)
sys_enter_openat mode:0 filename: (139814679027664)

sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6 (140231836626656)
sys_enter_openat mode:0 filename: (94880667103342)

Could this be a dynamic-linking issue where openat() isn't being supplied a filename? I'll keep debugging, since this is interesting. Have you looked through the bug reports for bpftrace or BCC?

Tristan


Re: Invalid filename/mode in openat tracepoint data

alessandro.gario@...
 

Hello Tristan,

thanks for spending the time to check this out!

One thing I forgot to mention is that I can verify with strace that the filename parameter is always present.
I initially suspected that the pointer wasn't mapped at the time the probe attempted to read from it, but shouldn't the tracepoint interface make sure it is accessible?

Alessandro Gario

On Fri, Jul 24, 2020 at 10:27 am, Tristan Mayfield <mayfieldtristan@...> wrote:
I don't have an answer, but I verified this with the following
bpftrace script and using the action of switching to zsh/oh-my-zsh
from bash.
---
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
printf("sys_enter_openat mode:%ld filename:%s (%ld)\n",
       args->mode, str(args->filename), args->filename);
}
---
Here's some example data (not all the generated output) with spaces
around some of the issue lines:
sys_enter_openat mode:0 filename: (94797689127022)
sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(139635662831568)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728940893664)
sys_enter_openat mode:0
filename:/usr/share/locale/en_US/LC_MESSAGES/git.mo (94797710736928)
sys_enter_openat mode:0
filename:/usr/share/locale/en/LC_MESSAGES/git.mo (94797710737472)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en_US/LC_MESSAGES/git.mo
(94797710737712)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en/LC_MESSAGES/git.mo
(94797710737584)
sys_enter_openat mode:438 filename:/dev/null (139809161489144)
sys_enter_openat mode:0 filename:/etc/ld.so.cache (140236659837824)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpcre2-8.so.0
(140236659879440)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libz.so.1
(140236659639520)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpthread.so.0
(140236659640784)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6
(140236659642080)
sys_enter_openat mode:0 filename: (94426721874030)
sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(140236658581456)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728357496384)
I'm tempted to think that this is some behavior of the system I don't
understand yet, rather than being a bug. But I can't say for sure.
Tristan
On 7/24/20, alessandro.gario@... <alessandro.gario@...> wrote:
Hello everyone,
I'll start with some backstory first: I wrote my own BPF library to
trace functions/syscalls and yesterday I noticed that I am sometimes
receiving broken openat() tracepoint data. This happens randomly, often
when processes are created in a short burst (like opening a new
terminal instance with zsh + oh-my-zsh installed).
I initially thought it was my fault, and proceeded to debug the
generated IR code and double check my tracepoint data definition
(which, for reference, can be found here:
https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425).
I ended up giving up, not finding the reason this was failing.
Today, I tried to replicate the same functionality using BCC so I
could compare the output with my library, and I ran into the same
weird behavior:
Full script here:
https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py
--
bpf_trace_printk("sys_enter_openat mode:%ld "
                 "filename:%s (%ld)\\n",
                 args->mode,
                 args->filename,
                 args->filename);
2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename:
(93849603522670)
--
I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch
Linux (5.7.9) and Ubuntu 19.10 (5.3.0).
Has anyone ever encountered this problem, or does anyone have a few
pointers as to why it is happening?
Thanks!
Alessandro


Re: Invalid filename/mode in openat tracepoint data

Tristan Mayfield
 

I don't have an answer, but I verified this with the following
bpftrace script and using the action of switching to zsh/oh-my-zsh
from bash.

---
tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
printf("sys_enter_openat mode:%ld filename:%s (%ld)\n",
       args->mode, str(args->filename), args->filename);
}
---

Here's some example data (not all the generated output) with spaces
around some of the issue lines:

sys_enter_openat mode:0 filename: (94797689127022)

sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(139635662831568)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728940893664)
sys_enter_openat mode:0
filename:/usr/share/locale/en_US/LC_MESSAGES/git.mo (94797710736928)
sys_enter_openat mode:0
filename:/usr/share/locale/en/LC_MESSAGES/git.mo (94797710737472)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en_US/LC_MESSAGES/git.mo
(94797710737712)
sys_enter_openat mode:0
filename:/usr/share/locale-langpack/en/LC_MESSAGES/git.mo
(94797710737584)
sys_enter_openat mode:438 filename:/dev/null (139809161489144)
sys_enter_openat mode:0 filename:/etc/ld.so.cache (140236659837824)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpcre2-8.so.0
(140236659879440)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libz.so.1
(140236659639520)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libpthread.so.0
(140236659640784)
sys_enter_openat mode:0 filename:/lib/x86_64-linux-gnu/libc.so.6
(140236659642080)

sys_enter_openat mode:0 filename: (94426721874030)

sys_enter_openat mode:0 filename:/usr/lib/locale/locale-archive
(140236658581456)
sys_enter_openat mode:0 filename:/usr/share/locale/locale.alias
(140728357496384)

I'm tempted to think that this is some behavior of the system I don't
understand yet, rather than being a bug. But I can't say for sure.

Tristan

On 7/24/20, alessandro.gario@... <alessandro.gario@...> wrote:
Hello everyone,

I'll start with some backstory first: I wrote my own BPF library to
trace functions/syscalls and yesterday I noticed that I am sometimes
receiving broken openat() tracepoint data. This happens randomly, often
when processes are created in a short burst (like opening a new
terminal instance with zsh + oh-my-zsh installed).

I initially thought it was my fault, and proceeded to debug the
generated IR code and double check my tracepoint data definition
(which, for reference, can be found here:
https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425).

I ended up giving up, not finding the reason this was failing.

Today, I tried to replicate the same functionality using BCC so I
could compare the output with my library, and I ran into the same
weird behavior:

Full script here:
https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py

--
bpf_trace_printk("sys_enter_openat mode:%ld "
                 "filename:%s (%ld)\\n",
                 args->mode,
                 args->filename,
                 args->filename);

2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename:
(93849603522670)
--

I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch
Linux (5.7.9) and Ubuntu 19.10 (5.3.0).

Has anyone ever encountered this problem, or does anyone have a few
pointers as to why it is happening?

Thanks!

Alessandro






Invalid filename/mode in openat tracepoint data

alessandro.gario@...
 

Hello everyone,

I'll start with some backstory first: I wrote my own BPF library to trace functions/syscalls and yesterday I noticed that I am sometimes receiving broken openat() tracepoint data. This happens randomly, often when processes are created in a short burst (like opening a new terminal instance with zsh + oh-my-zsh installed).

I initially thought it was my fault, and proceeded to debug the generated IR code and double check my tracepoint data definition (which, for reference, can be found here: https://github.com/trailofbits/ebpfpub/blob/master/ebpfpub/src/tracepointserializers.cpp#L425). I ended up giving up, not finding the reason this was failing.

Today, I tried to replicate the same functionality using BCC so I could compare the output with my library, and I ran into the same weird behavior:

Full script here: https://gist.github.com/alessandrogario/968b9c3ea78559f470bc650c8496449e#file-bcc_openat_tracepoint-py

--
bpf_trace_printk("sys_enter_openat mode:%ld "
                 "filename:%s (%ld)\\n",
                 args->mode,
                 args->filename,
                 args->filename);

2608.223222000 b'git' 8998 b'sys_enter_openat mode:0 filename: (93849603522670)
--

I was able to replicate this problem on Ubuntu 20.04 (5.4.0), Arch Linux (5.7.9) and Ubuntu 19.10 (5.3.0).

Has anyone ever encountered this problem, or does anyone have a few pointers as to why it is happening?

Thanks!

Alessandro


Port mirroring using bpf_clone_redirect

Kanthi P
 

Hello,

I am trying a port-mirroring use case that mirrors traffic from host1 to host2. On host1 I have two interfaces, eth0 and eth1, and have configured a vxlan interface on eth1. I used bpf_clone_redirect on both ingress and egress of eth0 and mirrored them to vxlan1 (on eth1). This vxlan tunnel terminates on host2. I am actually seeing all the packets on host2, but their order is badly jumbled. Could this be because clone-and-redirect on ingress/egress redirects both in parallel? Strangely, the packet capture on host1's ethernet interface shows the correct order.

Appreciate your inputs!

Regards,
Kanthi


bpf batch support for queue/stack

Simone Magnani
 

Hi,

Lately, I've been working on in-kernel traffic analysis with eBPF and
the newest features released in the latest kernel versions
(queue/stack, batch operations,...).
I couldn't help but notice that the Queue and Stack BPF map
types don't support batch operations at all, and I was wondering why.
Was this a deliberate decision, or is it just temporary, with support
planned for later on?

Reference file: linux/kernel/bpf/queue_stack_maps.c (and all the
others belonging to the same directory)

Thanks in advance,

Regards,
Simone


Re: BPF Concurrency

Kanthi P
 

Thanks, fetch_and_add would be more appropriate for my use case


On Sun, Jun 21, 2020 at 06:02 PM, Yonghong Song wrote:
You cannot use the return value. A recent llvm should return an error
if you try to use it.

There is some preliminary work to have more atomic operations in the
BPF ISA. https://reviews.llvm.org/D72184. We could add a version of
fetch_and_add with proper return value. This may take some time as we
need to ensure kernel has proper support.


Re: BPF Concurrency

Yonghong Song
 

On Sun, Jun 21, 2020 at 4:17 PM Kanthi P <Pavuluri.kanthi@...> wrote:

Thanks Andrii. __sync_fetch_and_add doesn't seem to work as expected: it adds the increment, but returns the wrong value.
I am actually hitting the same issue mentioned here: https://lists.iovisor.org/g/iovisor-dev/topic/problems_with/23670176?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,23670176

Can anyone suggest if it is fixed recently? I am on 4.15 kernel.
You cannot use the return value. A recent llvm should return an error
if you try to use it.

There is some preliminary work to have more atomic operations in the
BPF ISA. https://reviews.llvm.org/D72184. We could add a version of
fetch_and_add with proper return value. This may take some time as we
need to ensure kernel has proper support.


Thanks,
Kanthi


Re: BPF Concurrency

Kanthi P
 

Hi Jesper and Quentin,

Nice, I checked that logic. If I understand it right, that implementation would also need a few operations to be atomic, for example the window movements (whenever R and B are added or subtracted).
That's the issue I am attempting to solve, but I haven't been able to conclude anything yet.

Regards,
Kanthi


Re: BPF Concurrency

Kanthi P
 

Thanks Andrii. __sync_fetch_and_add doesn't seem to work as expected: it adds the increment, but returns the wrong value.
I am actually hitting the same issue mentioned here: https://lists.iovisor.org/g/iovisor-dev/topic/problems_with/23670176?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,20,23670176

Can anyone suggest if it is fixed recently? I am on 4.15 kernel.

Thanks,
Kanthi


Re: BPF Concurrency

Quentin Monnet
 

2020-06-16 14:18 UTC+0200 ~ Jesper Dangaard Brouer <brouer@...>
Hi Kanthi and Quentin,

You mention token counter as the use-case, for somehow ratelimiting
(TCP connections in your case?).

That reminds me that Quentin implemented[1] a two-color token-bucket
ratelimiter in BPF. (Notice, code is 3 years old, and e.g. use the old
syntax for maps). There is a really good README[1] with illustrations:

[1] https://github.com/qmonnet/tbpoc-bpf

What I found interesting, is that the code actually doesn't implement
this via token counter, but instead uses a sliding time window. Based
on getting timestamping via bpf_ktime_get_ns() code[2]. (p.s. as we have
discussed on xdp-newbies list, getting this timestamp also have
overhead, but we can mitigate overhead by only doing it once per NAPI
cycle).

I would like to hear from Quentin why he avoided maintaining the token counter.

-Jesper

Hi Jesper, Kanthi,

This token bucket implementation was realised in the context of a
research project (BEBA), for which the OPP interface, based on “extended
finite state machines”, was developed. This abstraction interface can be
used to implement various programs, including the token bucket
application, on several platforms (we had proof-of-concept
implementations on FPGAs or in software switches for example). Since it
is strictly event-based, the model does not really allow for “external”
interactions such as map update from user space, which would be
necessary to increment the token counter periodically. More information
on BEBA and OPP is available from the links on the GitHub README.

My objective at the time was simply to translate this token bucket
example into an eBPF program, in order to show that eBPF could be a
target for OPP, so I stuck to the OPP-based algorithm we had. It should
work like a regular token bucket implementation; the sliding window is
just another way to represent the maximum number of packets to accept in
a given period of time, without having to periodically increment the
counter. For example, a long shift of the sliding window (case 2 on the
diagram) equates to refilling the bucket entirely.

We may have run some benchmarks at the time, but we did not compare it
to a more standard implementation, so I could not really tell which is
best in terms of performance.

The code is from 2017, but if you want to test it, I would still expect
it to load (I haven't checked recently though). Jesper, the syntax for
maps is still the one used today when loading programs with iproute2 I
believe? Anyway, not sure how I can help further, but let me know if you
have questions.

Best regards,
Quentin


Re: BPF Concurrency

Jesper Dangaard Brouer
 

Hi Kanthi and Quentin,

You mention token counter as the use-case, for somehow ratelimiting
(TCP connections in your case?).

That reminds me that Quentin implemented[1] a two-color token-bucket
ratelimiter in BPF. (Notice, code is 3 years old, and e.g. use the old
syntax for maps). There is a really good README[1] with illustrations:

[1] https://github.com/qmonnet/tbpoc-bpf

What I found interesting, is that the code actually doesn't implement
this via token counter, but instead uses a sliding time window. Based
on getting timestamping via bpf_ktime_get_ns() code[2]. (p.s. as we have
discussed on xdp-newbies list, getting this timestamp also have
overhead, but we can mitigate overhead by only doing it once per NAPI
cycle).

I would like to hear from Quentin why he avoided maintaining the token counter.

-Jesper

[2] https://github.com/qmonnet/tbpoc-bpf/blob/master/tokenbucket.c#L98

On Mon, 15 Jun 2020 05:07:25 +0530
"Kanthi P" <Pavuluri.kanthi@...> wrote:

[Edited Message Follows]

Thanks Song and Andrii for the response.

Use-case is global rate-limiting for incoming TCP connections. And we
want to implement the token bucket algorithm using XDP for this
purpose.

So we are planning to have a map that holds a token counter which
gets two kinds of updates:

1. Periodic increments with 'x' number of tokens per second
2. Decrements as and when we get a new TCP connection request.

Most of our systems are 64-core machines. Since every core would
try to update the counter in parallel as packets arrive, the
problem I am imagining is that I might miss a few updates of the
counter, as one core's update can overwrite another's.

I guess it is still ok to lose the case 2 type of updates, as that
might just allow a small fraction of more or fewer connections
than what is configured.

But I cannot afford to lose case 1 kind of updates, as that could
mean that I cannot process a bunch of connections until the next
second.

So if I use "__sync_fetch_and_add" for incrementing the counter
(for case 1), would it guarantee that this update is never missed
(though some other core is trying to update the map to decrement
the counter to account for the incoming connection at the same
time)?

My understanding is that __sync_fetch_and_add translates to BPF_XADD
internally.  And it looks like spin locks are only supported from
5.x kernel versions; we are on a lower version, so we can't try that
one atm.

Regards,
Kanthi

P.S. There was some problem sending the reply, which resulted in
multiple edits and deletes; please bear with me



On Wed, May 27, 2020 at 1:29 AM Andrii Nakryiko < andrii.nakryiko@...
wrote:
On Fri, May 22, 2020 at 1:07 PM Kanthi P < Pavuluri.kanthi@... >
wrote:

Hi,


I’ve been reading that hash map’s update element is atomic and also that
we can use BPF_XADD to make the entire map update atomically.


But I think that doesn’t guarantee that these updates are thread safe,
meaning one cpu core can overwrite other core’s update.


Is there a clean way of keeping them thread safe. Unfortunately I can’t
use per-cpu maps as I need global counters.


And spin locks sounds a costly operation. Can you please throw some
light?
Stating that spin locks are costly without empirical data seems
premature. What's the scenario? What's the number of CPUs? What's the
level of contention? Under light contention, spin locks in practice
would be almost as fast as atomic increments. Under heavy contention,
spin locks would probably be even better than atomics because they
will not waste as much CPU, as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably
describe to get a better answer), you can either:
- do atomic increment/decrement if you need to update a counter (see
examples in kernel selftests using __sync_fetch_and_add);
- use map with bpf_spin_lock (there are also examples in selftests).


--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: BPF Concurrency

Andrii Nakryiko
 

On Sun, Jun 14, 2020 at 4:45 PM Kanthi P <Pavuluri.kanthi@...> wrote:

[Edited Message Follows]

Thanks Song and Andrii for the response.

Use-case is global rate-limiting for incoming TCP connections. And we want to implement the token bucket algorithm using XDP for this purpose.

So we are planning to have a map that holds a token counter which gets two kinds of updates:

1. Periodic increments with 'x' number of tokens per second
2. Decrements as and when we get a new TCP connection request.

Most of our systems are 64-core machines. Since every core would try to update the counter in parallel as packets arrive, the problem I am imagining is that I might miss a few updates of the counter, as one core's update can overwrite another's.

I guess it is still ok to lose the case 2 type of updates, as that might just allow a small fraction of more or fewer connections than what is configured.

But I cannot afford to lose case 1 kind of updates, as that could mean that I cannot process a bunch of connections until the next second.

So if I use "__sync_fetch_and_add" for incrementing the counter (for case 1), would it guarantee that this update is never missed (though some other core is trying to update the map to decrement the counter to account for the incoming connection at the same time)?
You should use __sync_fetch_and_add() for both cases, and then yes,
you won't lose any update. You probably would want
__sync_add_and_fetch() to get the counter after update, but that's not
supported by BPF yet. But you should still get far enough with
__sync_fetch_and_add().

Also, if you could use BPF global variables instead of BPF maps
directly, you will avoid map lookup overhead on BPF side. See BPF
selftests for examples, global vars are being used quite extensively
there.

BTW, you mentioned that you are going to update counter on every
packet, right? On 64-core machine, even __sync_fetch_and_add() might
be too much overhead. I recommend looking at Paul McKenney's book
([0]), see chapter on counting. It might provide you with good ideas
how to scale this further to per-CPU counters, if need be.

[0] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html


My understanding is that __sync_fetch_and_add translates to BPF_XADD internally. And it looks like spin locks are only supported from 5.x kernel versions; we are on a lower version, so we can't try that one atm.

Regards,
Kanthi

P.S. There was some problem sending the reply, which resulted in multiple edits and deletes; please bear with me


On Wed, May 27, 2020 at 1:29 AM Andrii Nakryiko <andrii.nakryiko@...> wrote:

On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@...> wrote:

Hi,


I’ve been reading that hash map’s update element is atomic and also that we can use BPF_XADD to make the entire map update atomically.


But I think that doesn’t guarantee that these updates are thread safe, meaning one cpu core can overwrite other core’s update.


Is there a clean way of keeping them thread safe. Unfortunately I can’t use per-cpu maps as I need global counters.


And spin locks sounds a costly operation. Can you please throw some light?
Stating that spin locks are costly without empirical data seems
premature. What's the scenario? What's the number of CPUs? What's the
level of contention? Under light contention, spin locks in practice
would be almost as fast as atomic increments. Under heavy contention,
spin locks would probably be even better than atomics because they
will not waste as much CPU, as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably
describe to get a better answer), you can either:
- do atomic increment/decrement if you need to update a counter (see
examples in kernel selftests using __sync_fetch_and_add);
- use map with bpf_spin_lock (there are also examples in selftests).



Regards,

Kanthi



Re: BPF Concurrency

Kanthi P
 
Edited

Thanks Song and Andrii for the response.

Use-case is global rate-limiting for incoming TCP connections. And we want to implement the token bucket algorithm using XDP for this purpose.

So we are planning to have a map that holds a token counter which gets two kinds of updates:

1. Periodic increments with 'x' number of tokens per second
2. Decrements as and when we get a new TCP connection request.

Most of our systems are 64-core machines. Since every core would try to update the counter in parallel as packets arrive, the problem I am imagining is that I might miss a few updates of the counter, as one core's update can overwrite another's.

I guess it is still ok to lose the case 2 type of updates, as that might just allow a small fraction of more or fewer connections than what is configured.

But I cannot afford to lose case 1 kind of updates, as that could mean that I cannot process a bunch of connections until the next second.

So if I use "__sync_fetch_and_add" for incrementing the counter (for case 1), would it guarantee that this update is never missed (though some other core is trying to update the map to decrement the counter to account for the incoming connection at the same time)?

My understanding is that __sync_fetch_and_add translates to BPF_XADD internally.  And it looks like spin locks are only supported from 5.x kernel versions; we are on a lower version, so we can't try that one atm.

Regards,
Kanthi

P.S. There was some problem sending the reply, which resulted in multiple edits and deletes; please bear with me


On Wed, May 27, 2020 at 1:29 AM Andrii Nakryiko <andrii.nakryiko@...> wrote:
On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@...> wrote:
>
> Hi,
>
>
> I’ve been reading that hash map’s update element is atomic and also that we can use BPF_XADD to make the entire map update atomically.
>
>
> But I think that doesn’t guarantee that these updates are thread safe, meaning one cpu core can overwrite other core’s update.
>
>
> Is there a clean way of keeping them thread safe. Unfortunately I can’t use per-cpu maps as I need global counters.
>
>
> And spin locks sounds a costly operation. Can you please throw some light?

Stating that spin locks are costly without empirical data seems
premature. What's the scenario? What's the number of CPUs? What's the
level of contention? Under light contention, spin locks in practice
would be almost as fast as atomic increments. Under heavy contention,
spin locks would probably be even better than atomics because they
will not waste as much CPU, as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably
describe to get a better answer), you can either:
  - do atomic increment/decrement if you need to update a counter (see
examples in kernel selftests using __sync_fetch_and_add);
  - use map with bpf_spin_lock (there are also examples in selftests).

>
>
> Regards,
>
> Kanthi
>

 

 


Re: Tracing malloc/free calls in a Kubernetes Pod

Lorenzo Fontana
 


On Sun, 14 Jun 2020 at 20:32 <adelstaging+iovisor@...> wrote:
Hey folks,

I have been experimenting with bpf(trace) on a Kubernetes cluster and have gotten kubectl-trace instrumenting an application running in a Pod. Now I want to instrument the code to chase down a memory leak happening in one of the applications - originally I was hoping to use the memleak BCC tool but it seemed a pain to get it working generically, so I turned my attention to bpftrace and kubectl-trace. The problem I'm running into is I believe I need to instrument libc to listen on those calls, but I don't know of a way to point at the Pod's libc in kubectl-trace.

As I understand it, much of kubectl-trace's functionality is figuring out a Pod's process ID in the node's root namespace and exposing it via $container_pid, but the bpftrace program itself still just runs on the node, which makes sense. With the $container_pid variable we can then point at an application process via the node's procfs, i.e. /proc/$container_pid/exe. However I have not been able to figure out how to point to the $container_pid's libc, if that is at all possible?

Any suggestions would be much appreciated. Thanks!



Replying here again for the record since you posted the same question on the k8s slack.

kubectl-trace replaces $container_pid so you can access the pid folder in the host's procfs; it's not specific to exe.
That means you can instrument anything from that directory using the root symlink inside that pid folder.

E.g.: /proc/$container_pid/root/lib/yourlib.so


Thanks for the PR today,
Lore


Re: Error loading xdp program that worked with bpf_load

Andrii Nakryiko
 

On Thu, Jun 11, 2020 at 1:41 PM Elerion <elerion1000@...> wrote:

I am using libbpf from here https://github.com/libbpf/libbpf I'm not
using ebpf. I just linked to the ebpf issue because it seems like the
only thing related to this problem when I googled it.
Ok, that I can help with, then.

What's the kernel version? Where I can find repro? Steps, etc.
Basically, a bit more context would help, as I wasn't part of initial
discussion.



On Thu, Jun 11, 2020 at 9:34 AM Andrii Nakryiko
<andrii.nakryiko@...> wrote:

On Thu, Jun 11, 2020 at 4:00 AM Jesper Dangaard Brouer
<brouer@...> wrote:

(Cross-posting to iovisor-dev)

Seeking input from BPF-llvm developers. How come Clang/LLVM 10+ is
generating incompatible BTF-info in ELF file, and downgrading to LLVM-9
fixes the issue ?


On Wed, 10 Jun 2020 14:50:27 -0700 Elerion <elerion1000@...> wrote:

Never mind, I fixed it by downgrading to Clang 9.

It appears to be an issue with Clang/LLVM 10+

https://github.com/cilium/ebpf/issues/43
This is newer Clang recording that function is global, not static.
libbpf is sanitizing BTF to remove this flag, if kernel doesn't
support this. But given this is re-implementation of libbpf, that's
probably not happening, right?


On Wed, Jun 10, 2020 at 2:38 PM Toke Høiland-Jørgensen <toke@...> wrote:

Elerion <elerion1000@...> writes:

[69] FUNC xdp_program type_id=68 vlen != 0
'vlen != 0' is the error. Not sure why you hit that; what's the output
of 'bpftool btf dump file yourprog.o' ?

-Toke

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: Error loading xdp program that worked with bpf_load

Alexei Starovoitov
 

On Thu, Jun 11, 2020 at 9:35 AM Andrii Nakryiko
<andrii.nakryiko@...> wrote:

On Thu, Jun 11, 2020 at 4:00 AM Jesper Dangaard Brouer
<brouer@...> wrote:

(Cross-posting to iovisor-dev)

Seeking input from BPF-llvm developers. How come Clang/LLVM 10+ is
generating incompatible BTF-info in ELF file, and downgrading to LLVM-9
fixes the issue ?


On Wed, 10 Jun 2020 14:50:27 -0700 Elerion <elerion1000@...> wrote:

Never mind, I fixed it by downgrading to Clang 9.

It appears to be an issue with Clang/LLVM 10+

https://github.com/cilium/ebpf/issues/43
This is newer Clang recording that function is global, not static.
libbpf is sanitizing BTF to remove this flag, if kernel doesn't
support this. But given this is re-implementation of libbpf, that's
probably not happening, right?
just running ./test_xdp_veth.sh on the latest bpf-next with the latest
clang I see:
BTF debug data section '.BTF' rejected: Invalid argument (22)!
- Length: 514
Verifier analysis:
...
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1 size == 0


BTF debug data section '.BTF' rejected: Invalid argument (22)!
- Length: 494
Verifier analysis:
...
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1 size == 0


BTF debug data section '.BTF' rejected: Invalid argument (22)!
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1 size == 0

PING 10.1.1.33 (10.1.1.33) 56(84) bytes of data.
64 bytes from 10.1.1.33: icmp_seq=1 ttl=64 time=0.042 ms

--- 10.1.1.33 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.042/0.042/0.042/0.000 ms
selftests: xdp_veth [PASS]

Is that just the noise from libbpf probing or what?


Re: Error loading xdp program that worked with bpf_load

Andrii Nakryiko
 

On Thu, Jun 11, 2020 at 4:00 AM Jesper Dangaard Brouer
<brouer@...> wrote:

(Cross-posting to iovisor-dev)

Seeking input from BPF-llvm developers. How come Clang/LLVM 10+ is
generating incompatible BTF-info in ELF file, and downgrading to LLVM-9
fixes the issue ?


On Wed, 10 Jun 2020 14:50:27 -0700 Elerion <elerion1000@...> wrote:

Never mind, I fixed it by downgrading to Clang 9.

It appears to be an issue with Clang/LLVM 10+

https://github.com/cilium/ebpf/issues/43
This is newer Clang recording that function is global, not static.
libbpf is sanitizing BTF to remove this flag, if kernel doesn't
support this. But given this is re-implementation of libbpf, that's
probably not happening, right?


On Wed, Jun 10, 2020 at 2:38 PM Toke Høiland-Jørgensen <toke@...> wrote:

Elerion <elerion1000@...> writes:

[69] FUNC xdp_program type_id=68 vlen != 0
'vlen != 0' is the error. Not sure why you hit that; what's the output
of 'bpftool btf dump file yourprog.o' ?

-Toke

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: Error loading xdp program that worked with bpf_load

Jesper Dangaard Brouer
 

(Cross-posting to iovisor-dev)

Seeking input from BPF-llvm developers. How come Clang/LLVM 10+ is
generating incompatible BTF-info in ELF file, and downgrading to LLVM-9
fixes the issue ?

On Wed, 10 Jun 2020 14:50:27 -0700 Elerion <elerion1000@...> wrote:

Never mind, I fixed it by downgrading to Clang 9.

It appears to be an issue with Clang/LLVM 10+

https://github.com/cilium/ebpf/issues/43

On Wed, Jun 10, 2020 at 2:38 PM Toke Høiland-Jørgensen <toke@...> wrote:

Elerion <elerion1000@...> writes:

[69] FUNC xdp_program type_id=68 vlen != 0
'vlen != 0' is the error. Not sure why you hit that; what's the output
of 'bpftool btf dump file yourprog.o' ?

-Toke
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


LPM Trie methods not available in user space program (python)

mdimolianis@...
 

Hello all,
I am trying to retrieve the keys of an LPM Trie in my user space program (similar to https://github.com/iovisor/bcc/blob/master/examples/networking/xdp/xdp_macswap_count.py); however, I am actually getting nothing.
The appropriate keys and values are inserted from the kernel space program (these are actually inserted, I have validated it by printing the values from the LPM Trie that match my packets - in the kernel space program -).
The weirdest thing is that when I substitute the LPM Trie with a BPF_HASH, I can retrieve the existing keys. According to https://github.com/iovisor/bcc/blob/master/src/python/bcc/table.py, both the LPM Trie and BPF_HASH tables inherit from the same class and share the same methods for manipulating keys and values.
If you have any thoughts or recommendations please share them.
Thank you in advance.

P.S.
I am using an Ubuntu machine 16.04.6 LTS with kernel 4.15.0-60-generic and bcc (0.10.0).
