Re: #bcc Count map.ringbuf_reserve() failures
#bcc
นิวัฒน์ ไชยจันทร์
On Tue, 2 Nov 2021 at 21:31, Eelco Chaudron <echaudro@...> wrote:
|
|
Re: #bcc Count map.ringbuf_reserve() failures
#bcc
Eelco Chaudron
On 3 Nov 2021, at 6:26, Y Song wrote:
On Tue, Nov 2, 2021 at 7:31 AM Eelco Chaudron <echaudro@...> wrote:
You can check return value of map.ringbuf_reserve(). If the [...]
Thanks, I was looking at BCC to solve this in the wrappers, but you are right, a simple BPF_TABLE() solved it.
//Eelco
|
Re: #bcc Count map.ringbuf_reserve() failures
#bcc
Yonghong Song
On Tue, Nov 2, 2021 at 7:31 AM Eelco Chaudron <echaudro@...> wrote:
You can check the return value of map.ringbuf_reserve(). If the reservation failed, you can notify user space through a map, another side-channel ringbuf, a perf buffer, etc. Depending on your program type and the context it runs in, you might be able to use the bpf_send_signal() helper to send a signal to the *current* process.
|
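The pattern described above (check the reservation result and count failures in a side channel) can be modelled in plain userspace C. This is only a sketch of the control flow, not BCC code: `mock_reserve()`, `rb_slots` and `dropcnt` are hypothetical stand-ins for map.ringbuf_reserve(), the BPF_RINGBUF_OUTPUT buffer and a one-slot counter map that user space would read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ring buffer: a fixed number of slots. */
#define RB_SLOTS 8

struct event {
    uint32_t pid;
    uint32_t len;
};

static struct event rb_slots[RB_SLOTS];
static size_t rb_used;
static uint64_t dropcnt; /* plays the role of a one-slot counter map */

/* Stand-in for map.ringbuf_reserve(): returns NULL once space runs out. */
static struct event *mock_reserve(void)
{
    if (rb_used >= RB_SLOTS)
        return NULL;
    return &rb_slots[rb_used++];
}

/* Check the reservation result and count every failure. */
static int emit_event(uint32_t pid, uint32_t len)
{
    struct event *e = mock_reserve();

    if (!e) {
        dropcnt++; /* user space can poll this to detect missed events */
        return -1;
    }
    e->pid = pid;
    e->len = len;
    return 0;
}
```

In a real BPF program the counter would live in a map so that user space can read it periodically and decide whether the buffer needs to grow.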
Re: #bcc Count map.ringbuf_reserve() failures
#bcc
On Tue, Nov 2, 2021 at 03:24 PM, Eelco Chaudron wrote:
Thought I added the #bcc tag but I no longer see it :( So, just in case it's not clear, this is with BCC.
|
Count map.ringbuf_reserve() failures
Eelco Chaudron
Hi, Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer? This way I can get notified in userspace that I have missed events, and might need to increase the buffer size. Cheers, Eelco |
|
Re: Access packet payload in TC egress programs
Yonghong Song
On Fri, Oct 22, 2021 at 12:31 AM Federico Parola
<federico.parola@...> wrote:

The source code is

    BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
    {
        /* Idea is the following: should the needed direct read/write
         * test fail during runtime, we can pull in more data and redo
         * again, since implicitly, we invalidate previous checks here.
         *
         * Or, since we know how much we need to make read/writeable,
         * this can be done once at the program beginning for direct
         * access case. By this we overcome limitations of only current
         * headroom being accessible.
         */
        return bpf_try_make_writable(skb, len ? : skb_headlen(skb));
    }

So if len is 0, it will only try to make the *existing* linear data writable, so you are right. It seems we are not trying to pull more data in. I will check with other kernel developers later.

With the current behavior, after a data pull you need to reparse the packet. Many helpers fall into this category:

    bool bpf_helper_changes_pkt_data(void *func)
    {
        if (func == bpf_skb_vlan_push ||
            func == bpf_skb_vlan_pop ||
            func == bpf_skb_store_bytes ||
            func == bpf_skb_change_proto ||
            func == bpf_skb_change_head ||
            func == sk_skb_change_head ||
            func == bpf_skb_change_tail ||
            func == sk_skb_change_tail ||
            func == bpf_skb_adjust_room ||
            func == sk_skb_adjust_room ||
            func == bpf_skb_pull_data ||
            func == sk_skb_pull_data ||
            func == bpf_clone_redirect ||
            func == bpf_l3_csum_replace ||
            func == bpf_l4_csum_replace ||
            func == bpf_xdp_adjust_head ||
            func == bpf_xdp_adjust_meta ||
            func == bpf_msg_pull_data ||
            func == bpf_msg_push_data ||
            func == bpf_msg_pop_data ||
            func == bpf_xdp_adjust_tail ||
    #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
            func == bpf_lwt_seg6_store_bytes ||
            func == bpf_lwt_seg6_adjust_srh ||
            func == bpf_lwt_seg6_action ||
    #endif
    #ifdef CONFIG_INET
            func == bpf_sock_ops_store_hdr_opt ||
    #endif
            func == bpf_lwt_in_push_encap ||
            func == bpf_lwt_xmit_push_encap)
            return true;

        return false;
    }

It is possible that we could fine-tune this behavior, as some helpers like bpf_skb_pull_data() may not need to start over again. But I could be missing some conditions.

Could you post your questions at bpf@...? Networking people on that mailing list may give you a better answer on why bpf_skb_pull_data() behaves this way and whether it can be improved.
|
|
Re: Access packet payload in TC egress programs
Federico Parola
One update on point 2.
I found out that every time a pointer to the packet is incremented by a value stored in a variable wider than 1 byte, subsequent checks against packet boundaries become ineffective. In my example, if I change payload_offset from unsigned to u8 the program is accepted, even though my offset can now only be 255 bytes at maximum.

Here is a toy example to test the problem:

    int test(struct __sk_buff *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* Skipping an amount of bytes stored in __u8 works */
        if (data + sizeof(__u8) > data_end)
            return TC_ACT_OK;
        bpf_trace_printk("Skipping %d bytes", *(__u8 *)data);
        data += *(__u8 *)data;

        /* Skipping an amount of bytes stored in __u16 works but... */
        if (data + sizeof(__u16) > data_end)
            return TC_ACT_OK;
        bpf_trace_printk("Skipping %d bytes", *(__u16 *)data);
        data += *(__u16 *)data;

        /* ...this check is not effective and packet access is rejected */
        if (data + sizeof(__u8) > data_end)
            return TC_ACT_OK;
        bpf_trace_printk("Next byte is %x", *(__u8 *)data);

        return TC_ACT_OK;
    }

My practical use case would be skipping variable-size TLS header extensions until I reach the desired one.

On 22/10/21 09:31, Federico Parola wrote:
Thanks for the answer, I wasn't aware of the existence of that helper. |
|
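A workaround sometimes suggested for this kind of rejection is to clamp each variable offset to a small, known bound before adding it to the packet pointer, and then repeat the boundary check. The sketch below is a plain userspace C model of that access pattern, not a BPF program: `MAX_SKIP` and `read_after_skips()` are hypothetical names, and whether a given kernel's verifier accepts the equivalent BPF code is an assumption that has to be tested.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SKIP 0x3fff /* hypothetical cap on a single variable skip */

/* Mirrors the toy example: skip a u8 amount, then a clamped u16 amount,
 * then read one byte. Every step is checked against the end of the buffer.
 * Returns 0 and stores the byte in *out, or -1 if the buffer is too short. */
static int read_after_skips(const uint8_t *data, size_t size, uint8_t *out)
{
    const uint8_t *end = data + size;
    const uint8_t *pos = data;
    uint8_t skip8;
    uint16_t skip16;

    /* Skip an amount stored in one byte. */
    if ((size_t)(end - pos) < sizeof(skip8))
        return -1;
    skip8 = *pos;
    if ((size_t)(end - pos) < skip8)
        return -1;
    pos += skip8;

    /* Skip an amount stored in two bytes (host byte order), clamped. */
    if ((size_t)(end - pos) < sizeof(skip16))
        return -1;
    memcpy(&skip16, pos, sizeof(skip16));
    skip16 &= MAX_SKIP; /* keep the variable offset within a known bound */
    if ((size_t)(end - pos) < skip16)
        return -1;
    pos += skip16;

    /* Read the next byte only after re-checking the boundary. */
    if ((size_t)(end - pos) < sizeof(uint8_t))
        return -1;
    *out = *pos;
    return 0;
}
```

The mask limits how far a single step can move the pointer, which is the property the clamp is meant to make visible to a bounds checker.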
Re: Access packet payload in TC egress programs
Federico Parola
Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:

1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1]. However, with that parameter the length of directly accessible data stays unchanged. I think there is a mismatch between the behavior and the documentation.

2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so, I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers, and add the offset to data. However, the verifier does not accept my accesses to the packet from this point on. Here is some example code:

    unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;

    bpf_skb_pull_data(ctx, ctx->len);
    data = (void *)(long)ctx->data;
    data_end = (void *)(long)ctx->data_end;

    struct tls_record_hdr *rech = data + payload_offset;
    if ((void *)(rech + 1) > data_end)
        return TC_ACT_OK;

    if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
        bpf_trace_printk("It's a handshake");

Running this code gives me the error "R1 offset is outside of the packet" even though I performed the correct check on packet boundaries. If I re-parse all the headers, the code is accepted. Is there a way to solve the problem?

[1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312

On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola |
|
Re: Access packet payload in TC egress programs
Yonghong Song
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:
It could be that the linear data only covers up to the end of the L4 header. In that case, you can use the bpf_skb_pull_data() helper to pull more data into the linear region; after that, ctx->data_end will point much further into the packet data.
|
|
Access packet payload in TC egress programs
Federico Parola
Dear all,
how can I access the payload of the packet in a program attached to the TC egress hook (SCHED_CLS attached to clsact qdisc)? ctx->data_end points to the end of the L4 header, while on the ingress hook it points to the end of the packet (tested on kernel v5.14). Best regards, Federico Parola |
|
Tracing CPU utilisation
Raga lahari
Hello,
Furthermore, I would like to add the following. I have a function test() that takes a string as input, and what it does next depends on the value it receives. I would like to trace the CPU utilization of test() along with the value of b:

    int test(string b) { ...... }

I am expecting output like:

    test(b="abc") - 40%
    test(b="defghijkl") - 60%

perf probe and perf record provide the number of occurrences of test() along with the variable value, but they do not give the CPU cycles taken by each call. It would be great if someone could share some pointers that can help me here.
Regards, Ragalahari
|
Tracing CPU utilisation
Raga lahari
Hi, Can someone please let me know whether there is any way to get a CPU profile for a specific function using eBPF? Thanks & Regards, Ragalahari
|
LPC 2021 Networking and BPF Track CFP (2nd reminder)
Daniel Borkmann
This is a reminder for the Call for Proposals (CFP) for the Networking and BPF track at the 2021 edition of the Linux Plumbers Conference (LPC), which will be held virtually on the wider Internet, on September 20th - 24th, 2021.

This year's Networking and BPF track technical committee is comprised of:

  David S. Miller <davem@...>
  Jakub Kicinski <kuba@...>
  Eric Dumazet <edumazet@...>
  Alexei Starovoitov <ast@...>
  Daniel Borkmann <daniel@...>
  Andrii Nakryiko <andrii@...>

We are seeking proposals of 40 minutes in length (including Q&A discussion), optionally accompanied by papers of 2 to 10 pages in length.

Any kind of advanced Linux networking and/or BPF related topic will be considered.

Please submit your proposals through the official LPC website at:

  https://linuxplumbersconf.org/event/11/abstracts/

Make sure to select "Networking & BPF Summit" in the Track pull-down menu.

Proposals must be submitted by August 13th, and submitters will be notified of acceptance by August 16th. Final slides and papers (as PDF) are due on the first day of the conference.
|
Using XDP in docker swarm to track outgoing traffic
Sebastião Santos Boavida Amaro
Hi everyone,
I am trying to use XDP to track outgoing traffic from docker containers deployed using docker swarm and running in a network that uses the overlay driver. I am using a simple XDP program based on [1], and I run this program in the network namespace of the container using nsenter and attach it to its eth0. However, I am only able to detect the incoming packets and not the outgoing ones. When running tcpdump in the container network namespace I can see both incoming and outgoing packets, so I am a bit confused as to why XDP would not detect the outgoing ones. Does anyone know the reason for this, or have a general idea as to why this might happen?
[1] https://github.com/iovisor/bcc/blob/master/examples/networking/xdp/xdp_drop_count.py
Best Regards, Sebastião Amaro
|
BPF map pinning
Raga lahari
Hi, Can someone please help me pin a BPF map in a custom path for a TC program? My requirement is to pin a map in an interface-specific path (like /sys/fs/bpf/eth0/). Regards,
|
Re: Question about map.increment()
#bcc
Donald Hunter
On Sun, 25 Apr 2021 at 20:18, Y Song <ys114321@...> wrote:
This is a good question. In earlier bpf days, the key MUST be from [...]
I am happy to take a look at the code and see if I can improve it at all.
Thanks, Donald.
|
Re: Question about map.increment()
#bcc
Yonghong Song
On Thu, Apr 22, 2021 at 4:18 AM Donald Hunter <donald.hunter@...> wrote:
This is a good question. In earlier bpf days, the key MUST be from the stack. Otherwise, the verifier will fail. Nowadays, things have become better and keys can come from verifier-recognizable memory regions (stack, key, value, allocated_mem, etc.). I think the rewriter can be made smarter: if the first argument of increment() is actually a variable (instead of an expression), we can directly take its address, since the variable can be allocated on the stack. The relevant code is in b_frontend_action.cc. Do you want to take a look to see whether you could help improve the bcc rewriter for this particular issue?
|
|
Question about map.increment()
#bcc
Donald Hunter
Is there a reason why map.increment() internally copies the key into a stack variable? When building a key inline, it uses double the stack space and incurs the cost of a copy. For u64 keys this is fine but for larger custom keys, e.g. containing a char[] it blows up the stack pretty quickly.
Thanks, Donald. |
|
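The idea of taking the key's address instead of copying it can be illustrated with a small userspace C model. This is not BCC code: `struct key`, `table`, `lookup_or_init()` and `increment()` are hypothetical stand-ins for a BPF hash map and its accessors. In BCC itself, a pointer-based call such as map.lookup_or_try_init(&key, &zero) followed by incrementing the returned value is one way to pass the key by address, but treat that as an assumption to check against your BCC version.

```c
#include <stdint.h>
#include <string.h>

/* A larger custom key, similar to one containing a char[]. */
struct key {
    char comm[32];
    uint32_t pid;
};

struct slot {
    struct key key;
    uint64_t count;
    int used;
};

#define SLOTS 16
static struct slot table[SLOTS];

/* Find the counter for *k, creating it at zero if absent.
 * The key is compared byte-wise, so callers must zero-initialise it. */
static uint64_t *lookup_or_init(const struct key *k)
{
    for (int i = 0; i < SLOTS; i++) {
        if (table[i].used && memcmp(&table[i].key, k, sizeof(*k)) == 0)
            return &table[i].count;
    }
    for (int i = 0; i < SLOTS; i++) {
        if (!table[i].used) {
            table[i].key = *k;
            table[i].count = 0;
            table[i].used = 1;
            return &table[i].count;
        }
    }
    return NULL; /* table full */
}

/* Increment through a pointer: no second copy of the key on the stack. */
static void increment(const struct key *k)
{
    uint64_t *count = lookup_or_init(k);

    if (count)
        (*count)++;
}
```

Because the caller builds the key once and passes its address, the key occupies stack space only once per call site.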
Re: [libbpf] Questions about XDP/TC
Toke Høiland-Jørgensen
chenhengqi@... writes:
1. How do I attach `BPF_PROG_TYPE_SCHED_CLS`/`classifier` BPF programs to specific data path(i.e. ingress or egress) using libbpf ?
libbpf does not yet support attaching to TC hooks, but there is work in progress to add this. See https://lore.kernel.org/bpf/20210325120020.236504-4-memxor@gmail.com/ (an updated version should hopefully show up soon).
I found some comments related in the source:
Just set XDP_FLAGS_SKB_MODE or XDP_FLAGS_DRV_MODE when attaching...
-Toke
|