
Re: Questions about current eBPF usages

Yonghong Song
 

On Thu, Oct 15, 2020 at 4:06 PM Jiada Tu via lists.iovisor.org
<jtu3=hawk.iit.edu@...> wrote:

Hello BPF community,

I am looking for a way to move a user space program's disk I/O scheduling related logic down to kernel space, and then have the new kernel logic communicate with the user space program to make better I/O scheduling decisions. The reason that the user space program itself has I/O scheduling logic is because it needs to prioritize certain read or write requests.

I started looking at eBPF for that purpose. After doing some research, I learned that eBPF is very good at kernel profiling and tracing, but I didn't find much information about modifying kernel functions / data-structure using eBPF.

I am wondering:

1. Instead of calling an eBPF function before / after calling a kernel function and then returning to that kernel function, is it possible for eBPF programs to totally replace a kernel function or module logic?
Currently, no. The kernel supports replacing a bpf program, but not
a kernel function. Replacing kernel functions could easily cause the
kernel to misbehave. There are some proposals to explicitly specify
functions which can be replaced. This work is not done yet.


2. Is it possible for eBPF programs to tamper with the parameters and return value of a kernel function, or can eBPF programs only read kernel data structures but not modify them? (Some searching indicates that this was not possible a few years ago, but I am not sure whether it has changed recently.)
No for input parameters.
Yes for return values in certain cases. For any kernel function
annotated with ALLOW_ERROR_INJECTION, you can attach a bpf program to
that function to change its return value.

All tracing programs can read kernel data structures today, either
with bpf_probe_read or, in later kernels, with direct memory access
that behaves like bpf_probe_read.
Writing to kernel data structures requires extreme care, as it can
easily crash the kernel or cause the kernel to misbehave. This has to
be done in a controlled way, e.g., in networking, through specific
helpers.

In your case, the bpf program is meant to influence I/O scheduling
decisions. You could keep the kernel data structure writes in kernel
code, but add a hook that calls a bpf program to make the decision;
based on the bpf program's return value, the kernel can decide what to
schedule.
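
The split described above can be sketched in plain user-space C. This is only a model, not kernel or BPF code: struct io_req, io_policy_fn, prefer_reads and pick_next are hypothetical names invented for illustration, and no such hook exists in the block layer today. The function pointer stands in for an attached bpf program; it only reads the request and returns a value, while the "kernel" side owns the queue and makes the actual choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request descriptor; the policy gets a read-only view. */
struct io_req {
    int id;
    int is_read;          /* 1 for a read, 0 for a write */
    unsigned int bytes;
};

/* Stand-in for the attached bpf program: returns a priority, writes nothing. */
typedef int (*io_policy_fn)(const struct io_req *req);

/* Example policy: reads go ahead of writes. */
static int prefer_reads(const struct io_req *req)
{
    return req->is_read ? 1 : 0;
}

/*
 * "Kernel" side: owns the queue and does every write. It asks the hook for a
 * priority per request and returns the first request with the highest one.
 * With no policy attached, every priority is 0 and the result is plain FIFO.
 */
static const struct io_req *pick_next(const struct io_req *queue, size_t n,
                                      io_policy_fn policy)
{
    const struct io_req *best = NULL;
    int best_prio = 0;

    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int prio = policy ? policy(&queue[i]) : 0;
        if (best == NULL || prio > best_prio) {
            best = &queue[i];
            best_prio = prio;
        }
    }
    return best;
}
```

The point of the design is that a faulty policy can at worst return a poor priority; it never touches the queue itself.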



Thank you!
Jiada


Questions about current eBPF usages

Jiada Tu
 

Hello BPF community,

I am looking for a way to move a user space program's disk I/O scheduling related logic down to kernel space, and then have the new kernel logic communicate with the user space program to make better I/O scheduling decisions. The reason that the user space program itself has I/O scheduling logic is because it needs to prioritize certain read or write requests.

I started looking at eBPF for that purpose. After doing some research, I learned that eBPF is very good at kernel profiling and tracing, but I didn't find much information about modifying kernel functions / data-structure using eBPF.

I am wondering:

1. Instead of calling eBPF function before / after calling a kernel function and then returning back to that kernel function, is it possible for eBPF programs to totally replace a kernel function or module logic?

2. Is it possible for eBPF programs to tamper the parameter and return value of a kernel function, or eBPF program can only read kernel data-structure but can not modify them? (some search indicates that it can not few years ago, but I am not sure if it is changed recently)


Thank you!
Jiada


Re: Tracepoint/Kprobe for tracking inbound connections

Yonghong Song
 

On Wed, Oct 14, 2020 at 11:57 AM Kanthi P <Pavuluri.kanthi@...> wrote:

[Edited Message Follows]

Nice, thanks Song. I am actually looking to track it till it is closed, so might have to remove that tag when the socket goes to closed state.
And once I have the concurrent connections info, say in a map, I am using XDP to drop the connections after they reach a threshold

So also wanted to ask if there is any way I can read the concurrent connections in XDP since the kernel already keeps track of them at /proc/net/tcp*?
That would help me avoid placing another tracepoint to track the connection count.
XDP only sees raw packets. No skb or other metadata is available at
that point.
You either need to track the connections yourself or add another
skb- or sk-level hook.


Appreciate your help!

Thanks,
Kanthi

On Thu, Oct 1, 2020 at 11:26 AM Y Song <ys114321@...> wrote:

On Tue, Sep 29, 2020 at 4:14 AM Kanthi P <Pavuluri.kanthi@...> wrote:

Hi,

I am looking to track inbound connections on a system using tracepoints/kprobes.

I was checking "trace_inet_sock_set_state", with which we can track the state changes during connection establishment and closure. It seems straightforward to track total connections, but since we only want inbound ones, one way would be to look up the IP addresses/ports this node listens on and, while tracking the state changes, check whether the local address/port matches one of them to decide whether it is an inbound connection. This looks a bit roundabout to me, so I thought of asking for suggestions on a simpler way.

Another way is to store the socket address when the TCP_SYN_RECV to TCP_ESTABLISHED state change happens; during closure we can check whether the event is for that socket, so we know it is an inbound connection. But this would make the map grow too large, as we have about 50k concurrent connections.

Can you suggest a better way to do this?
Maybe you can use sk_local_storage? You can attach a piece of
information to the socket during TCP_SYN_RECV, check that data later
during TCP_ESTABLISHED, and delete it from the socket once you no
longer need it, all in the bpf program.
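
The tag lifecycle suggested above can be modelled in plain user-space C. This is a sketch under stated assumptions, not BPF code: struct sock_model, struct conn_tag and on_state_change are hypothetical names, and malloc/free stand in for attaching and deleting per-socket storage. In a real program the tag would live in a BPF_MAP_TYPE_SK_STORAGE map accessed with bpf_sk_storage_get and bpf_sk_storage_delete, and whether those helpers can be called depends on the program type and kernel version.

```c
#include <assert.h>
#include <stdlib.h>

/* Values as in the kernel's include/net/tcp_states.h. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_SYN_RECV = 3, TCP_CLOSE = 7 };

/* Payload that would live in the socket's local storage. */
struct conn_tag {
    int established;      /* set once the tagged socket reaches ESTABLISHED */
};

/* Hypothetical stand-in for a kernel socket; tag == NULL means no storage. */
struct sock_model {
    struct conn_tag *tag;
};

/* Called once per state change; *count tracks established inbound sockets. */
static void on_state_change(struct sock_model *sk, int newstate, long *count)
{
    assert(sk != NULL && count != NULL);

    if (newstate == TCP_SYN_RECV && sk->tag == NULL) {
        /* Passive open: attach the tag to the socket itself. */
        sk->tag = calloc(1, sizeof(*sk->tag));
    } else if (newstate == TCP_ESTABLISHED && sk->tag != NULL) {
        /* Only sockets tagged during SYN_RECV count as inbound. */
        if (!sk->tag->established) {
            sk->tag->established = 1;
            (*count)++;
        }
    } else if (newstate == TCP_CLOSE && sk->tag != NULL) {
        /* Connection is gone: undo the count and drop the tag. */
        if (sk->tag->established)
            (*count)--;
        free(sk->tag);
        sk->tag = NULL;
    }
}
```

Because the tag travels with the socket, there is no separate address-keyed map that has to hold an entry for each of the roughly 50k connections.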


Thanks,
Kanthi


Re: Tracepoint/Kprobe for tracking inbound connections

Kanthi P
 

Thanks Forrest!


On Wed, Oct 7, 2020 at 1:03 PM Forrest Chen <forrest0579@...> wrote:
You can attach a kprobe to 'tcp_conn_request' for inbound connections.

--
forrest0579@...






Re: Tracepoint/Kprobe for tracking inbound connections

Kanthi P
 

Nice, thanks Song. I am actually looking to track it till it is closed, so might have to remove that tag when the socket goes to closed state.
And once I have the concurrent connections info, say in a map, I am using XDP to drop the connections after they reach a threshold
 
So also wanted to ask if there is any way I can read the concurrent connections in XDP since the kernel already keeps track of them at /proc/net/tcp*?
That would help me avoid placing another tracepoint to track the connection count.
 
Appreciate your help!
 
Thanks,
Kanthi

On Thu, Oct 1, 2020 at 11:26 AM Y Song <ys114321@...> wrote:

On Tue, Sep 29, 2020 at 4:14 AM Kanthi P <Pavuluri.kanthi@...> wrote:
>
> Hi,
>
> I am looking to track inbound connections on a system using tracepoints/kprobes.
>
> I was checking "trace_inet_sock_set_state", with which we can track the state changes during connection establishment and closure. It seems straightforward to track total connections, but since we only want inbound ones, one way would be to look up the IP addresses/ports this node listens on and, while tracking the state changes, check whether the local address/port matches one of them to decide whether it is an inbound connection. This looks a bit roundabout to me, so I thought of asking for suggestions on a simpler way.
>
> Another way is to store the socket address when the TCP_SYN_RECV to TCP_ESTABLISHED state change happens; during closure we can check whether the event is for that socket, so we know it is an inbound connection. But this would make the map grow too large, as we have about 50k concurrent connections.
>
> Can you suggest a better way to do this?

Maybe you can use sk_local_storage? You can attach a piece of
information to the socket during TCP_SYN_RECV, check that data later
during TCP_ESTABLISHED, and delete it from the socket once you no
longer need it, all in the bpf program.

>
> Thanks,
> Kanthi
>


Re: Question about inet_set_socket_state trace point

Raga lahari
 

Hi,


I am observing a discrepancy of about 20% in the established connection counter (a mismatch of 30-40 connections out of 200) after one day, which grows to 30% by day 2, and so on.


This observation is with this code:

TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    if (args->newstate == TCP_ESTABLISHED)
        __sync_fetch_and_add(val, 1);

    if (args->oldstate == TCP_ESTABLISHED)
        __sync_fetch_and_add(val, -1);

    return 0;
}

There was a typo in my first message.

 


Regards,
Ragalahari


Re: Question about inet_set_socket_state trace point

Raga lahari
 

Hello,

Correcting typo in code snippet

<code>

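TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    /* val: pointer to the counter stored in the BPF map (lookup not shown) */
    if (args->newstate == TCP_ESTABLISHED)
        __sync_fetch_and_add(val, 1);   /* socket entered ESTABLISHED */

    if (args->oldstate == TCP_ESTABLISHED)
        __sync_fetch_and_add(val, -1);  /* socket left ESTABLISHED */

    return 0;
}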



Thanks & Regards,
Ragalahari


On Wed, Oct 14, 2020 at 10:35 AM Raga lahari <ragalahari.potti@...> wrote:

Hi everyone,


I am using the inet_sock_set_state tracepoint to get the current established connection count.

Here, I increment a counter in a BPF map when the new state is TCP_ESTABLISHED and decrement it when the old state is TCP_ESTABLISHED.


But I observed that the map count has a discrepancy with what netstat shows. When we start the probe, it all looks fine, but when we leave it running for, say, 2-3 days, we see a difference, and this difference builds over time.

Can someone please help me figure out whether I am missing something?


<code>

TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    if (args->newstate >= TCP_ESTABLISHED)
        __sync_fetch_and_add(val, 1);

    if (args->newstate >= TCP_ESTABLISHED)
        __sync_fetch_and_add(val, -1);
}


netstat -tanp  | grep -i "EST" | wc -l

Thanks,
Ragalahari


Re: Question about inet_set_socket_state trace point

Tristan Mayfield
 

Hi Ragalahari,

In your code you don't seem to check "oldstate" before decrementing. It looks like you are adding 1 and then immediately subtracting 1 under the same condition. That might be your problem? You never stated what the difference between it and netstat is, so I can't be sure.
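
To make the point concrete, here is a small user-space model of the two handlers, not BPF code. It replays a list of state transitions through the logic as first posted and through the corrected logic. The names struct transition, count_as_posted and count_corrected are made up for this sketch; the TCP state values are the ones from the kernel's tcp_states.h.

```c
#include <assert.h>
#include <stddef.h>

/* Values as in the kernel's include/net/tcp_states.h. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_SYN_RECV = 3,
       TCP_FIN_WAIT1 = 4, TCP_CLOSE = 7 };

/* One inet_sock_set_state event, reduced to the two fields the probe reads. */
struct transition {
    int oldstate;
    int newstate;
};

/* Logic as first posted: both branches test newstate with the same condition. */
static long count_as_posted(const struct transition *t, size_t n)
{
    long val = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].newstate >= TCP_ESTABLISHED)
            val += 1;
        if (t[i].newstate >= TCP_ESTABLISHED)
            val -= 1;   /* cancels the increment from the line above */
    }
    return val;
}

/* Corrected logic: +1 on entering ESTABLISHED, -1 on leaving it. */
static long count_corrected(const struct transition *t, size_t n)
{
    long val = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].newstate == TCP_ESTABLISHED)
            val += 1;
        if (t[i].oldstate == TCP_ESTABLISHED)
            val -= 1;
    }
    return val;
}
```

With the posted logic every event nets to zero, so the counter never moves whatever the traffic looks like; the corrected version follows the number of sockets currently in ESTABLISHED, counting only transitions seen while the probe is attached.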

Tristan


Question about inet_set_socket_state trace point

Raga lahari
 

Hi everyone,


I am using the inet_sock_set_state tracepoint to get the current established connection count.

Here, I increment a counter in a BPF map when the new state is TCP_ESTABLISHED and decrement it when the old state is TCP_ESTABLISHED.


But I observed that the map count has a discrepancy with what netstat shows. When we start the probe, it all looks fine, but when we leave it running for, say, 2-3 days, we see a difference, and this difference builds over time.

Can someone please help me figure out whether I am missing something?


<code>

TRACEPOINT_PROBE(sock, inet_sock_set_state) {
    if (args->newstate >= TCP_ESTABLISHED)
        __sync_fetch_and_add(val, 1);

    if (args->newstate >= TCP_ESTABLISHED)
        __sync_fetch_and_add(val, -1);
}


netstat -tanp  | grep -i "EST" | wc -l

Thanks,
Ragalahari


Re: [vagrant] accept PR to bring iovisor/vagrant to ubuntu 20.04 (from ubuntu 14.04)

Brenden Blanco
 

Sure I can accept a PR.

On Fri, Oct 9, 2020 at 5:59 AM <github@...> wrote:

I have had to create a test environment (based on Vagrant) over the last couple of days, and I've done this with Ubuntu 20.04 as the base image.

Is the repository https://github.com/iovisor/vagrant still active?
If yes, I would create a PR to update this repository.


[vagrant] accept PR to bring iovisor/vagrant to ubuntu 20.04 (from ubuntu 14.04)

github@...
 

I have had to create a test environment (based on Vagrant) over the last couple of days, and I've done this with Ubuntu 20.04 as the base image.

Is the repository https://github.com/iovisor/vagrant still active?
If yes, I would create a PR to update this repository.


Re: Tracepoint/Kprobe for tracking inbound connections

Forrest Chen
 

You can attach a kprobe to 'tcp_conn_request' for inbound connections.

--
forrest0579@...


Re: Tracepoint/Kprobe for tracking inbound connections

Yonghong Song
 

On Tue, Sep 29, 2020 at 4:14 AM Kanthi P <Pavuluri.kanthi@...> wrote:

Hi,

I am looking to track inbound connections on a system using tracepoints/kprobes.

I was checking "trace_inet_sock_set_state", with which we can track the state changes during connection establishment and closure. It seems straightforward to track total connections, but since we only want inbound ones, one way would be to look up the IP addresses/ports this node listens on and, while tracking the state changes, check whether the local address/port matches one of them to decide whether it is an inbound connection. This looks a bit roundabout to me, so I thought of asking for suggestions on a simpler way.

Another way is to store the socket address when the TCP_SYN_RECV to TCP_ESTABLISHED state change happens; during closure we can check whether the event is for that socket, so we know it is an inbound connection. But this would make the map grow too large, as we have about 50k concurrent connections.

Can you suggest a better way to do this?
Maybe you can use sk_local_storage? You can attach a piece of
information to the socket during TCP_SYN_RECV, check that data later
during TCP_ESTABLISHED, and delete it from the socket once you no
longer need it, all in the bpf program.


Thanks,
Kanthi


Tracepoint/Kprobe for tracking inbound connections

Kanthi P
 

Hi,

I am looking to track inbound connections on a system using tracepoints/kprobes.

I was checking "trace_inet_sock_set_state", with which we can track the state changes during connection establishment and closure. It seems straightforward to track total connections, but since we only want inbound ones, one way would be to look up the IP addresses/ports this node listens on and, while tracking the state changes, check whether the local address/port matches one of them to decide whether it is an inbound connection. This looks a bit roundabout to me, so I thought of asking for suggestions on a simpler way.

Another way is to store the socket address when the TCP_SYN_RECV to TCP_ESTABLISHED state change happens; during closure we can check whether the event is for that socket, so we know it is an inbound connection. But this would make the map grow too large, as we have about 50k concurrent connections.

Can you suggest a better way to do this?

Thanks,
Kanthi


Re: Load BPF program at boot-time?

Yonghong Song
 

On Sun, Sep 6, 2020 at 7:55 AM Shung-Hsi Yu <yu@...> wrote:

Hi,

Is it possible to load a BPF program at boot time?
It is possible. See the patch below:
https://lore.kernel.org/bpf/20200819042759.51280-1-alexei.starovoitov@gmail.com/

I tried to load a BPF program and pin it in the bpffs file system. The
system could be extended to load bpf programs, and even attach them
once the other subsystems are ready. But this needs kernel work.

What I'm trying to achieve is to trace every single call to a certain
function since the kernel starts, without missing anything.

More specifically, I'm trying to debug iommu_alloc failures by looking
at the stacktrace to find out which subsystem/driver allocated too
many IOMMU slots on a ppc64le system, which I do not have direct
access to.

I've considered writing a systemd unit file that loads a BPF program
before the sysinit target[1], but I'm not sure if that's early enough.
An alternative seems to be to use boot-time tracing with ftrace[2]
instead (which I ended up doing), but it requires recompiling the
kernel in order to add tracepoints to retrieve the function call
arguments, and there isn't an easy way to stop tracing to prevent the
tracing buffer from overflowing (I ended up writing a systemd unit
file that sets an ftrace event trigger that turns off tracing).
A bpf program seems a good choice here, since it can store arbitrary
data in its maps and, based on the tracing state, it can stop tracing.

There are still some potential issues. Ideally you would not recompile
the kernel at all: changing the bpf programs, recompiling them, and
rebooting should just work, but that is not available today. I guess
this can probably be improved. If you are interested, please take a
look at the above patch; you may be able to improve the kernel to
cover your use case.


Maybe there is a better way to do something like this?


Much thanks,
Shung-Hsi Yu

[1]: https://www.freedesktop.org/software/systemd/man/bootup.html
[2]: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html



Load BPF program at boot-time?

Shung-Hsi Yu
 

Hi,

Is it possible to load a BPF program at boot time?
What I'm trying to achieve is to trace every single call to a certain
function since the kernel starts, without missing anything.

More specifically, I'm trying to debug iommu_alloc failures by looking
at the stacktrace to find out which subsystem/driver allocated too
many IOMMU slots on a ppc64le system, which I do not have direct
access to.

I've considered writing a systemd unit file that loads a BPF program
before the sysinit target[1], but I'm not sure if that's early enough.
An alternative seems to be to use boot-time tracing with ftrace[2]
instead (which I ended up doing), but it requires recompiling the
kernel in order to add tracepoints to retrieve the function call
arguments, and there isn't an easy way to stop tracing to prevent the
tracing buffer from overflowing (I ended up writing a systemd unit
file that sets an ftrace event trigger that turns off tracing).

Maybe there is a better way to do something like this?


Much thanks,
Shung-Hsi Yu

[1]: https://www.freedesktop.org/software/systemd/man/bootup.html
[2]: https://www.kernel.org/doc/html/latest/trace/boottime-trace.html


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Mon, Aug 31, 2020 at 12:03 PM Ian <icampbe14@...> wrote:

Interestingly enough, adding just -g in my Makefile built the BPF programs and allowed the BTF section to be found and properly loaded. My BPF program was loaded and is running properly with my desired functionality. I am confused, though, as to why the -g flag fixed this problem. According to the clang man page:

-g Generate debug information.

Is BTF information considered debug information? Is that in general or in this case? Is this unexpected behavior? Perhaps a bug in how Clang builds BPF binaries without -g? It would seem to me that the BTF information should not be purged from a non -g binary. I am interested to hear your thoughts on this, Andrii!
It's expected right now. BTF started out as purely debug information,
but got elevated into pretty much a mandatory thing for modern BPF
applications. We've talked about making .BTF emitted without -g, but
that hasn't happened in Clang yet (there are some technical
difficulties).

Again, thank you so much for your help. There is no way I would have figured that out on my own.

Ian


Re: Reading Pinned maps in eBPF Programs

Ian
 

Interestingly enough, adding just -g in my Makefile built the BPF programs and allowed the BTF section to be found and properly loaded. My BPF program was loaded and is running properly with my desired functionality. I am confused, though, as to why the -g flag fixed this problem. According to the clang man page:
-g Generate debug information.
Is BTF information considered debug information? Is that in general or in this case? Is this unexpected behavior? Perhaps a bug in how Clang builds BPF binaries without -g? It would seem to me that the BTF information should not be purged from a non -g binary. I am interested to hear your thoughts on this, Andrii!

Again, thank you so much for your help. There is no way I would have figured that out on my own. 

Ian


Re: Reading Pinned maps in eBPF Programs

Andrii Nakryiko
 

On Sun, Aug 30, 2020 at 4:35 PM Ian <icampbe14@...> wrote:

Hello,

Here are the libbpf logs at all levels for the opensnoop program when using the pinned option for a map. This was tested on Linux kernel v5.4 with libbpf 0.0.9, 0.1.0, and the current version. The logs were the same in every case, so I have posted only a single copy here. Let me know what you think and what the next steps might be! I appreciate the help and am having a good time trying to piece this together.
[...]


libbpf: section(14) .rel.eh_frame, size 32, link 15, flags 0, type=9
libbpf: skip relo .rel.eh_frame(14) for section(13)
libbpf: section(15) .symtab, size 408, link 1, flags 0, type=2
libbpf: BTF is required, but is missing or corrupted.
Ok, this is a very different issue from the kernel missing BTF. libbpf
is complaining that your opensnoop.bpf.o itself is missing BTF. And
right, BTF is required to parse map definitions properly, but it
doesn't depend on having kernel support for BTF at all. Make sure you
use a recent enough Clang (v10+) and build your opensnoop.bpf.o with
the -target bpf **and** -g flags to generate debug info (including the
.BTF ELF section).


Ian


Re: Reading Pinned maps in eBPF Programs

Ian
 

Hello, 

Here are the libbpf logs at all levels for the opensnoop program when using the pinned option for a map. This was tested on Linux kernel v5.4 with libbpf 0.0.9, 0.1.0, and the current version. The logs were the same in every case, so I have posted only a single copy here. Let me know what you think and what the next steps might be! I appreciate the help and am having a good time trying to piece this together.

libbpf: loading bpf-library/bpf_objs/opensnoop.bpf.o
libbpf: section(1) .strtab, size 289, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) tracepoint/syscalls/sys_enter_openat, size 1632, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_openat
libbpf: section(4) .reltracepoint/syscalls/sys_enter_openat, size 32, link 15, flags 0, type=9
libbpf: section(5) tracepoint/syscalls/sys_enter_open, size 1368, link 0, flags 6, type=1
libbpf: found program tracepoint/syscalls/sys_enter_open
libbpf: section(6) .reltracepoint/syscalls/sys_enter_open, size 32, link 15, flags 0, type=9
libbpf: section(7) .data, size 4, link 0, flags 3, type=1
libbpf: section(8) maps, size 20, link 0, flags 3, type=1
libbpf: section(9) .rodata.str1.1, size 9, link 0, flags 32, type=1
libbpf: skip section(9) .rodata.str1.1
libbpf: section(10) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of bpf-library/bpf_objs/opensnoop.bpf.o is 50422
libbpf: section(11) license, size 4, link 0, flags 3, type=1
libbpf: license of bpf-library/bpf_objs/opensnoop.bpf.o is GPL
libbpf: section(12) .maps, size 40, link 0, flags 3, type=1
libbpf: section(13) .eh_frame, size 80, link 0, flags 2, type=1
libbpf: skip section(13) .eh_frame
libbpf: section(14) .rel.eh_frame, size 32, link 15, flags 0, type=9
libbpf: skip relo .rel.eh_frame(14) for section(13)
libbpf: section(15) .symtab, size 408, link 1, flags 0, type=2
libbpf: BTF is required, but is missing or corrupted.
 
Ian
