Re: BPF Concurrency
Andrii Nakryiko
On Sun, Jun 14, 2020 at 4:45 PM Kanthi P <Pavuluri.kanthi@gmail.com> wrote:
You should use __sync_fetch_and_add() for both cases, and then yes, you won't lose any update. You probably would want __sync_add_and_fetch() to get the counter value after the update, but that's not supported by BPF yet. You should still get far enough with __sync_fetch_and_add().

Also, if you can use BPF global variables instead of BPF maps directly, you will avoid the map lookup overhead on the BPF side. See the BPF selftests for examples; global vars are used quite extensively there.

BTW, you mentioned that you are going to update the counter on every packet, right? On a 64-core machine, even __sync_fetch_and_add() might be too much overhead. I recommend looking at Paul McKenney's book ([0]); see the chapter on counting. It might give you good ideas on how to scale this further to per-CPU counters, if need be.

[0] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
|
|
Re: BPF Concurrency
Thanks Song and Andrii for the response. The use case is global rate-limiting for incoming TCP connections, and we want to implement the token bucket algorithm using XDP for this purpose. So we are planning to have a map that holds a token counter which gets two kinds of updates:

1. Periodic increments of 'x' tokens per second
2. Decrements as each incoming connection consumes a token

Most of our systems are 64-core machines. Since every core would try to update the counter in parallel as packets arrive, the problem I am imagining is that I might miss a few updates of the counter, as one core's update can overwrite another's.

I guess it is still OK to lose the case 2 type of updates, as that might just allow a small fraction more or fewer connections than what is configured. But I cannot afford to lose case 1 updates, as that could mean I cannot process a bunch of connections until the next second.

So if I use "__sync_fetch_and_add" for incrementing the counter (for case 1), would it guarantee that this update is never missed (even though some other core is trying to update the map at the same time, decrementing the counter to account for an incoming connection)? My understanding is that __sync_fetch_and_add translates to BPF_XADD internally.

And it looks like spin locks are supported from 5.x kernel versions; we are on a lower version, so we can't try that one atm.

Regards,
|
|
Re: Tracing malloc/free calls in a Kubernetes Pod
Lorenzo Fontana
On Sun, 14 Jun 2020 at 20:32 <adelstaging+iovisor@...> wrote: Hey folks,

Replying here again for the record, since you posted the same question on the k8s Slack. kubectl trace replaces $container_pid, so you can access the pid folder in the host /proc; it's not specific to exe only. That means you can instrument anything from that directory using the root symlink inside that pid folder, e.g.:

/proc/$container_pid/root/lib/yourlib.so

Thanks for the PR today,
Lore
|
|
Re: Error loading xdp program that worked with bpf_load
Andrii Nakryiko
On Thu, Jun 11, 2020 at 1:41 PM Elerion <elerion1000@gmail.com> wrote:
Ok, that I can help with, then. What's the kernel version? Where I can find repro? Steps, etc. Basically, a bit more context would help, as I wasn't part of initial discussion.
|
|
Re: Error loading xdp program that worked with bpf_load
Alexei Starovoitov
On Thu, Jun 11, 2020 at 9:35 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote: just running ./test_xdp_veth.sh on the latest bpf-next with the latest clang I see:

BTF debug data section '.BTF' rejected: Invalid argument (22)!
- Length: 514
Verifier analysis:
...
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1
size == 0

BTF debug data section '.BTF' rejected: Invalid argument (22)!
- Length: 494
Verifier analysis:
...
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1
size == 0

BTF debug data section '.BTF' rejected: Invalid argument (22)!
[11] VAR _license type_id=9 linkage=1
[12] DATASEC license size=0 vlen=1
size == 0

PING 10.1.1.33 (10.1.1.33) 56(84) bytes of data.
64 bytes from 10.1.1.33: icmp_seq=1 ttl=64 time=0.042 ms
--- 10.1.1.33 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.042/0.042/0.042/0.000 ms
selftests: xdp_veth [PASS]

Is that just the noise from libbpf probing or what?
|
|
Re: Error loading xdp program that worked with bpf_load
Andrii Nakryiko
On Thu, Jun 11, 2020 at 4:00 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote: This is newer Clang recording that the function is global, not static. libbpf sanitizes BTF to remove this flag if the kernel doesn't support it. But given this is a re-implementation of libbpf, that's probably not happening, right?
|
|
Re: Error loading xdp program that worked with bpf_load
(Cross-posting to iovisor-dev)
Seeking input from BPF LLVM developers. How come Clang/LLVM 10+ generates incompatible BTF info in the ELF file, while downgrading to LLVM 9 fixes the issue?
On Wed, 10 Jun 2020 14:50:27 -0700 Elerion <elerion1000@gmail.com> wrote:
Never mind, I fixed it by downgrading to Clang 9. --
Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
|
|
LPM Trie methods not available in user space program (python)
mdimolianis@...
Hello all,
I am trying to retrieve the keys of an LPM Trie in my user space program (similar to https://github.com/iovisor/bcc/blob/master/examples/networking/xdp/xdp_macswap_count.py); however, I actually get nothing back. The appropriate keys and values are inserted from the kernel space program (I have validated that they are actually inserted by printing, in the kernel space program, the values from the LPM Trie that match my packets).

The weirdest thing is that when I substitute the LPM Trie with a BPF_HASH, I can retrieve the existing keys. According to https://github.com/iovisor/bcc/blob/master/src/python/bcc/table.py, both LPM and BPF_HASH inherit from the same class and share the same methods for manipulating keys and values.

If you have any thoughts or recommendations, please share them. Thank you in advance.

P.S. I am using an Ubuntu 16.04.6 LTS machine with kernel 4.15.0-60-generic and bcc 0.10.0.
|
|
Re: BPF Concurrency
Andrii Nakryiko
On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@gmail.com> wrote:
Stating that spin locks are costly without empirical data seems premature. What's the scenario? What's the number of CPUs? What's the level of contention? Under light contention, spin locks in practice would be almost as fast as atomic increments. Under heavy contention, spin locks would probably be even better than atomics, because they won't waste as much CPU as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably describe to get a better answer), you can either:
- do an atomic increment/decrement if you need to update a counter (see examples in kernel selftests using __sync_fetch_and_add); or
- use a map with bpf_spin_lock (there are also examples in selftests).
|
|
Re: BPF Concurrency
Yonghong Song
On Fri, May 22, 2020 at 1:07 PM Kanthi P <Pavuluri.kanthi@gmail.com> wrote:
BPF_XADD makes the update of a single map element inside the kernel atomic. Could you file an issue with more details? That way, we will have a better record. Not sure what you mean here. Yes, one CPU can update a map element while another modifies it. What kind of primitive do you want? Compare-and-exchange?
|
|
Re: USDT probe to trace based on path to binary
Yonghong Song
On Fri, May 22, 2020 at 7:14 PM Vallish Guru.V. <vallishguru@hotmail.com> wrote:
Could you file a separate issue so it is easy for people to help? Do you have a minimal reproducible test case? That will make it easy to debug. Tracing the application by path, without a pid, should work; a test case will be helpful to investigate.
|
|
USDT probe to trace based on path to binary
Vallish Guru.V.
Hello,
I am trying to introduce USDT probe to an application and my bcc script is failing with following error:
<snip>
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    b = BPF(text=prog, usdt_contexts=[u])
  File "/usr/lib/python3.6/dist-packages/bcc/__init__.py", line 318, in __init__
    "locations")
Exception: can't generate USDT probe arguments; possible cause is missing pid when a probe in a shared object has multiple locations
<end snip>
I am trying to run the bcc script based on the path to the binary, not a pid. The closest discussion I found related to this error is:

https://github.com/iovisor/bcc/issues/1774

The application that I am trying to instrument is not a shared object as discussed in issue #1774; the code that I have instrumented is in the application itself. Since there is more than one instance of the application, tracing through pid becomes messy. Is there something obvious that I have missed in my script? I would appreciate any pointers to unblock me.
Thanks.
-Vallish
|
|
BPF Concurrency
Kanthi P
Hi, I've been reading that a hash map's update_elem is atomic, and also that we can use BPF_XADD to make the map update atomic. But I think that doesn't guarantee that these updates are thread safe, meaning one CPU core can overwrite another core's update. Is there a clean way of keeping them thread safe? Unfortunately I can't use per-CPU maps, as I need global counters. And spin locks sound like a costly operation. Can you please throw some light? Regards, Kanthi
|
|
Re: Building BPF programs and kernel persistence
Andrii Nakryiko
On Mon, May 18, 2020 at 9:23 AM Tristan Mayfield
<mayfieldtristan@gmail.com> wrote:

BTW, everyone seems to be using -O2 for compiling BPF programs. Not sure how well-supported -O3 will be.

[...]

"Classic BPF" is an entirely different thing; don't use that term in this context, it will just confuse people. perf is used as a means to trigger BPF program execution for tracepoints and kprobes. It is, essentially, a BPF hook provider, if you will. For XDP, the BPF hook is provided by the networking layer and drivers. For cgroup BPF programs, hooks are "provided", in a sense, by the cgroup subsystem. So perf is just one of many ways to specify where and when a BPF program is going to be executed, and with what context.

> If that's the case then is the bpf_link object the tool to bridge BPF and perf? I noticed that when

No, bpf_link is a way to marry a BPF hook with a BPF program. It's not specific to perf or XDP or whatever. Actually, right now perf-based BPF hooks (kprobe, tracepoint) do not create a bpf_link under the cover, so you won't be able to pin them.

Awesome, have fun!
|
|
Re: Building BPF programs and kernel persistence
Tristan Mayfield
Thanks for the reply Andrii. Managed to get a build working outside of the kernel tree for BPF programs. The two major things that I learned were, first, that the order in which files are included in the build command is more important than I previously thought, and second, how clang deals with asm differently than gcc. I had to use samples/bpf/asm_goto_workaround.h to fix those errors. The meat of the makefile is as follows:

CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include

INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm

linuxhdrs := /usr/src/linux-headers-$(shell uname -r)

LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
        -include asm_workaround.h \
        -I$(linuxhdrs)/arch/x86/include/ \
        -I$(linuxhdrs)/arch/x86/include/uapi \
        -I$(linuxhdrs)/arch/x86/include/generated \
        -I$(linuxhdrs)/arch/x86/include/generated/uapi \
        -I$(linuxhdrs)/include \
        -I$(linuxhdrs)/include/uapi \
        -I$(linuxhdrs)/include/generated/uapi \

COMPILERFLAGS := -D__KERNEL__ -D__ASM_SYSREG_H \
        -D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \

# Builds all the targets from corresponding .c files
$(BPFOBJDIR)/%.o: $(BPFSRCDIR)/%.c
        $(CC) $(INC_FLAGS) $(COMPILERFLAGS) \
        $(LINUXINCLUDE) $(LIBBPF_HDRS) \
        $(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@

I wanted to include that sample for whatever soul in the future wants to tread the same path with similar systems experience levels.
I still get about 100+ warnings when building that are the same as or similar to:

/usr/src/linux-headers-5.4.0-26-generic/arch/x86/include/asm/atomic.h:194:9: warning: unused variable '__ptr' [-Wunused-variable]
        return arch_cmpxchg(&v->counter, old, new);
               ^
/usr/src/linux-headers-5.4.0-26-generic/arch/x86/include/asm/msr.h:100:26: warning: variable 'low' is uninitialized when used here [-Wuninitialized]
        return EAX_EDX_VAL(val, low, high);
                                ^~~

I suspect that these warnings come from my aggressive warning flags during compilation rather than from actual issues in the kernel.
I have been looking at the commits surrounding the pinning of bpf_link. It looks like it's only working in kernel 5.7? I did actually go through and attempt to attach links for kprobes, tracepoints, and raw_tracepoints in kernel 5.4 but, as you suggested, it seems unsupported. I have yet to try on kernels 5.5-5.7, so I'll take a look this week or next.

As I mentioned before, with basic functionality in place here, I'm interested in working on some sort of BPF tutorial similar to the XDP tutorial (https://github.com/xdp-project/xdp-tutorial), perhaps with a more in-depth look at the technology included as well.

I'm still fuzzy on the relationship between bpf(2) and perf(1). Would it be correct to say that for tracepoints, kprobes, and uprobes BPF leverages perf "under the hood", while for XDP and tc it's more like classic BPF in that its implementation doesn't involve perf? If that's the case, then is the bpf_link object the tool to bridge BPF and perf? I noticed when checking for pinned BPF programs with bpftool in kernel 5.4 that unless a kprobe, tracepoint, or uprobe is listed in "bpftool perf list", the program doesn't seem to be running. Is the use of perf to load BPF programs potentially a way to make them "headless" instead of pinning the bpf_link objects?

Regardless, I'm excited to have a more reliable build system than I have had in the past. I think I'll start looking more into CO-RE and libbpf on kernels 5.5-5.7.

Hope everyone is staying healthy out there,
Tristan

On Thu, May 14, 2020 at 5:51 PM Andrii Nakryiko <andrii.nakryiko@...> wrote:
On Mon, May 11, 2020 at 10:06 AM <mayfieldtristan@...> wrote:
|
|
Re: eBPF map - Control and Data plane concurrency
#bcc
Andrii Nakryiko
On Tue, May 12, 2020 at 2:19 AM <simonemagnani.96@gmail.com> wrote:
No, HASH_OF_MAPS allows arbitrary-sized keys, just like a normal HASHMAP. Libbpf recently got support for nicer map-in-map declaration and initialization; you might want to check it out: [0].

[0] https://patchwork.ozlabs.org/project/netdev/patch/20200428064140.122796-4-andriin@fb.com/
|
|
Re: Building BPF programs and kernel persistence
Andrii Nakryiko
On Mon, May 11, 2020 at 10:06 AM <mayfieldtristan@gmail.com> wrote:
Hi! For the future, I think cc'ing bpf@vger.kernel.org would be a good idea; there are a lot of folks who are probably not watching the iovisor mailing list but could help with issues like this.

I'd start with actually specifying what compilation errors you run into. Also check out https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile to see how BPF programs can be compiled properly outside of the kernel tree. Though that one pretty much assumes vmlinux.h, which simplifies a bunch of compilation issues, probably.

Right, pinning a map or program doesn't ensure that the program is still attached to whatever BPF hook you attached it to. As you mentioned, XDP, tc, and cgroup-bpf programs are persistent. We are actually moving towards the model of auto-detachment for those as well; see the recent activity around bpf_link. The bpf_link solution for making such attachments persistent is pinning the **link** itself, not the program or map. bpf_link is a relatively recent addition, so on older kernels you'd have to make sure you still have some process around that keeps the BPF attachment FD open.
Hope above helped. Please cc bpf@vger.kernel.org (and ideally send plain-text emails, kernel mailing lists don't accept HTML emails).
|
|
Re: eBPF map - Control and Data plane concurrency
#bcc
Simone Magnani
Thanks for the suggestion, now I feel more confident about this solution. However, I still have problems with the map-in-map type: is it possible to use a map whose key is the 4-tuple TCP session identifier {srcIp, dstIp, srcPort, dstPort} and whose value is a BPF_ARRAY holding a list of packet headers belonging to that session?
|
|
Building BPF programs and kernel persistence
Tristan Mayfield
Hi all, hope everyone is staying healthy out there. I've been working on building BPF programs, and have run into a few issues that I think might be clang (vs gcc) based. It seems that either clang isn't the most friendly of compilers when it comes to building Linux-native programs, or my lack of experience makes it seem so. I've been trying to build the simple BPF program below:
#include "bpf_helpers.h"
#include <linux/bpf.h>
#include <linux/version.h>
#include <linux/types.h>
#include <linux/tcp.h>
#include <net/sock.h>

struct inet_sock_set_state_args {
        long long pad;
        const void *skaddr;
        int oldstate;
        int newstate;
        u16 sport;
        u16 dport;
        u16 family;
        u8 protocol;
        u8 saddr[4];
        u8 daddr[4];
        u8 saddr_v6[16];
        u8 daddr_v6[16];
};

SEC("tracepoint/sock/inet_sock_set_state")
int bpf_prog(struct inet_sock_set_state_args *args)
{
        struct sock *sk = (struct sock *)args->skaddr;
        short lport = args->sport;
        char msg[] = "lport: %d\n";
        bpf_trace_printk(msg, sizeof(msg), lport);
        return 0;
}

char _license[] SEC("license") = "GPL";

I've been looking through selftests/bpf/, samples/bpf/, and examples on various blogs and articles.
From this, I've come up with the following makefile:
## Build tools
LLC := llc
CC := clang
HOSTCC := clang
CLANGINC := /usr/lib/llvm-10/lib/clang/10.0.0/include

## Some useful flags
INC_FLAGS := -nostdinc -isystem $(CLANGINC)
EXTRA_FLAGS := -O3 -emit-llvm

## Includes
linuxhdrs := /usr/src/linux-headers-$(shell uname -r)

LINUXINCLUDE := -include $(linuxhdrs)/include/linux/kconfig.h \
        -include /usr/include/linux/bpf.h \
        -I$(linuxhdrs)/arch/x86/include/ \
        -I$(linuxhdrs)/arch/x86/include/uapi \
        -I$(linuxhdrs)/arch/x86/include/generated \
        -I$(linuxhdrs)/arch/x86/include/generated/uapi \
        -I$(linuxhdrs)/include \
        -I$(linuxhdrs)/include/uapi \
        -I$(linuxhdrs)/include/generated/uapi \

LIBBPF := -I/home/vagrant/libbpf/src/

OBJS := tcptest.bpf.o

$(OBJS): %.o:%.c
        $(CC) $(INC_FLAGS) \
        -target bpf -D__KERNEL__ -D __ASM_SYSREG_H \
        -D__BPF_TRACING__ -D__TARGET_ARCH_$(ARCH) \
        -Wno-unused-value -Wno-pointer-sign \
        -Wno-compare-distinct-pointer-types \
        -Wno-gnu-variable-sized-type-not-at-end \
        -Wno-address-of-packed-member \
        -Wno-tautological-compare \
        -Wno-unknown-warning-option \
        -Wall -v \
        $(LINUXINCLUDE) $(LIBBPF) \
        $(EXTRA_FLAGS) -c $< -o - | $(LLC) -march=bpf -filetype obj -o $@

Unfortunately, I keep running into what seems to be asm errors. I've tried reorganizing the list of include statements, taking out "-target bpf", not including some files, including other files, etc.
This stackoverflow post suggests that it's a kconfig.h error, but I seem to be including the file just fine (https://stackoverflow.com/questions/56975861/error-compiling-ebpf-c-code-out-of-kernel-tree/56990939#56990939).
I'm not really sure where to go from here with building BPF programs and including files that have the kernel datatypes. Maybe I'm missing something that's obvious that I'm just ignorant of?
As additional information, and regarding kernel persistence: I am working on a monitoring project that uses BPF programs to continuously monitor the system without the bulky dependencies that BCC includes. I'm concurrently working on a BTF/CO-RE solution, but I'm emphasizing a non-CO-RE approach at the moment. I can load and run BPF programs, but upon termination of my userspace loader the BPF programs themselves also terminate.

I would like the BPF program to persist in the kernel even after the user space loader has completed its execution. I read in various documentation and in a 2015 LWN article that persistent BPF programs can be created by pinning programs and maps to the BPF vfs so as to keep the fds open. I have attempted pinning the entire BPF object, various programs, and various maps, and no matter what I've tried, the kernel BPF program terminates when the userspace process terminates. Using bpftool I have verified that the BPF files are pinned to the location and that the BPF programs themselves all work.

I know that persistent BPF programs are a part of projects like XDP and tc. Is there a way to do this for a generic BPF loader without having to implement customized kernel functions? Below I have included a simplified version of my code, in which I outline the basic steps I take to load the compiled BPF programs and attempt to make persistent instances of them.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <getopt.h>
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>
#include <assert.h>
#include <linux/version.h>
#include "libbpf.h"
#include "bpf.h"
#include "loader_helpers.h"
#include <stdbool.h>
#include <fcntl.h>
#include <poll.h>
#include <linux/perf_event.h>
#include <assert.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>
#include <signal.h>
#include <linux/ptrace.h>
int main(int argc, char **argv) {
struct bpf_object *bpf_obj;
struct bpf_program *bpf_prog;
struct bpf_map *map;
char *license = "GPL";
__u32 kernelvers = LINUX_VERSION_CODE;
struct bpf_link *link;
int err;
int prog_fd;
bpf_obj = bpf_object__open("test_file.bpf.o");
bpf_prog = bpf_program__next(NULL, bpf_obj);
err = bpf_program__set_tracepoint(bpf_prog);
if (err) {
        fprintf(stderr, "ERR couldn't setup program type\n");
        return -1;
}

err = bpf_program__load(bpf_prog, license, kernelvers);
if (err) {
        fprintf(stderr, "ERR couldn't setup program phase\n");
        return -1;
}

prog_fd = bpf_program__fd(bpf_prog);
link = bpf_program__attach_tracepoint(bpf_prog, "syscalls", "sys_enter_openat");
if (!link) {
        fprintf(stderr, "ERROR ATTACHING TRACEPOINT\n");
        return -1;
}
assert(bpf_program__is_tracepoint(bpf_prog));
pin:
err = bpf_program__pin(bpf_prog, "/sys/fs/bpf/tpprogram");
if (err) {
        if (err == -17) {
                printf("Program exists...trying to unpin and retry!\n");
                err = bpf_program__unpin(bpf_prog, "/sys/fs/bpf/tpprogram");
                if (!err)
                        goto pin;
                printf("The pinning already exists but it couldn't be removed...\n");
                return -1;
        }
        printf("We couldn't pin...%d\n", err);
        return -1;
}
printf("Program pinned and working...\n");
        return 0;
}

Thanks for having a look, and I hope these issues can be cleared up. It seems like building is the last major hurdle before I can get rolling with better engineering solutions than manually including structs in my files. Hope everyone stays well!
|
|
Re: eBPF map - Control and Data plane concurrency
#bcc
Yonghong Song
Your approach seems okay. You can use two maps or use map-in-map.
Using the batch operation from user space should speed up the deletion operation.
On Sat, May 9, 2020 at 5:31 AM <simonemagnani.96@gmail.com> wrote:
|
|