
Re: #bcc Count map.ringbuf_reserve() failures #bcc

นิวัฒน์ ไชยจันทร์
 


On Tue, Nov 2, 2021 at 21:31, Eelco Chaudron <echaudro@...> wrote:

[Edited Message Follows]

On Tue, Nov 2, 2021 at 03:24 PM, Eelco Chaudron wrote:

Hi,

Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer?

This way I can get notified in userspace that I have missed events, and might need to increase the buffer size.

Cheers,

Eelco

Thought I added the #bcc tag but I no longer see it :( So just in case, it's not clear, this is with BCC.


Re: #bcc Count map.ringbuf_reserve() failures #bcc

Eelco Chaudron
 

On 3 Nov 2021, at 6:26, Y Song wrote:

On Tue, Nov 2, 2021 at 7:31 AM Eelco Chaudron <echaudro@...> wrote:

[Edited Message Follows]

On Tue, Nov 2, 2021 at 03:24 PM, Eelco Chaudron wrote:

Hi,

Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer?

This way I can get notified in userspace that I have missed events, and might need to increase the buffer size.

Cheers,

Eelco

Thought I added the #bcc tag but I no longer see it :( So just in case it's not clear, this is with BCC.

You can check the return value of map.ringbuf_reserve(). If the
reservation fails, you can notify user space through a map, another
side-channel ringbuf, a perf buffer, etc. Depending on your program type
and program running context, you might be able to use the
bpf_send_signal() helper to send a signal to the *current* process.

Thanks, I was looking at BCC to solve this in the wrappers, but you are right, a simple BPF_TABLE() solved it.

//Eelco
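
The approach discussed above boils down to checking the result of the reservation and bumping a counter that user space can read later. The following is only a plain userspace C model of that control flow, not BCC code: `model_ringbuf`, `model_reserve` and `model_emit` are hypothetical stand-ins, and in a real BPF program the `dropped` counter would presumably live in a map (such as the BPF_TABLE() mentioned above) rather than in a struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-size ring buffer (tiny on purpose). */
struct model_ringbuf {
    uint8_t  data[64];   /* backing storage */
    size_t   used;       /* bytes already reserved */
    uint64_t dropped;    /* side-channel counter of failed reservations */
};

/* Return a pointer to `size` reserved bytes, or NULL when they do not fit. */
void *model_reserve(struct model_ringbuf *rb, size_t size)
{
    assert(rb != NULL);
    if (size > sizeof(rb->data) - rb->used)
        return NULL;
    void *slot = &rb->data[rb->used];
    rb->used += size;
    return slot;
}

/* Emit one event; if the reservation fails, count the miss and report it. */
int model_emit(struct model_ringbuf *rb, uint32_t value)
{
    void *slot = model_reserve(rb, sizeof(value));
    if (!slot) {
        rb->dropped++;   /* in BPF this would be an update of a counter map */
        return -1;
    }
    memcpy(slot, &value, sizeof(value));
    return 0;
}
```

User space can then poll the counter periodically and treat any increase as a hint that the buffer may be too small.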


Re: #bcc Count map.ringbuf_reserve() failures #bcc

Yonghong Song
 

On Tue, Nov 2, 2021 at 7:31 AM Eelco Chaudron <echaudro@...> wrote:

[Edited Message Follows]

On Tue, Nov 2, 2021 at 03:24 PM, Eelco Chaudron wrote:

Hi,

Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer?

This way I can get notified in userspace that I have missed events, and might need to increase the buffer size.

Cheers,

Eelco

Thought I added the #bcc tag but I no longer see it :( So just in case it's not clear, this is with BCC.

You can check the return value of map.ringbuf_reserve(). If the
reservation fails, you can notify user space through a map, another
side-channel ringbuf, a perf buffer, etc. Depending on your program type
and program running context, you might be able to use the
bpf_send_signal() helper to send a signal to the *current* process.



Re: #bcc Count map.ringbuf_reserve() failures #bcc

Eelco Chaudron
 
Edited

On Tue, Nov 2, 2021 at 03:24 PM, Eelco Chaudron wrote:

Hi,

Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer?

This way I can get notified in userspace that I have missed events, and might need to increase the buffer size.

Cheers,

Eelco

Thought I added the #bcc tag but I no longer see it :( So just in case, it's not clear, this is with BCC.


Count map.ringbuf_reserve() failures

Eelco Chaudron
 

Hi,

Was wondering if there is a way to count the number of times map.ringbuf_reserve() fails for a BPF_RINGBUF_OUTPUT buffer?

This way I can get notified in userspace that I have missed events, and might need to increase the buffer size.

Cheers,

Eelco


Re: Access packet payload in TC egress programs

Yonghong Song
 

On Fri, Oct 22, 2021 at 12:31 AM Federico Parola
<federico.parola@...> wrote:

Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:

1. The documentation of the helper says that passing a length of zero
should pull the whole length of the packet [1], however with that
parameter the length of direct accessible data stays unchanged. I think
there is a mismatch in the behavior and the documentation.

The source code is

BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
{
    /* Idea is the following: should the needed direct read/write
     * test fail during runtime, we can pull in more data and redo
     * again, since implicitly, we invalidate previous checks here.
     *
     * Or, since we know how much we need to make read/writeable,
     * this can be done once at the program beginning for direct
     * access case. By this we overcome limitations of only current
     * headroom being accessible.
     */
    return bpf_try_make_writable(skb, len ? : skb_headlen(skb));
}

So if len is 0, it will only try to make the *existing* linear data
writable, so you are right. It seems we are not trying to pull more
data in. I will check with other kernel developers later.


2. I'd like to avoid re-parsing all the headers after I have pulled new
data. To do so I save the offset I just reached (the end of the TCP
header), pull data, get the new data and data_end pointers and add the
offset to data. However the verifier does not accept my accesses to the
packet from this point on. Here is some example code:

The current behavior is that after a data pull you need to reparse the packet.
Many helpers fall into this category:

bool bpf_helper_changes_pkt_data(void *func)
{
    if (func == bpf_skb_vlan_push ||
        func == bpf_skb_vlan_pop ||
        func == bpf_skb_store_bytes ||
        func == bpf_skb_change_proto ||
        func == bpf_skb_change_head ||
        func == sk_skb_change_head ||
        func == bpf_skb_change_tail ||
        func == sk_skb_change_tail ||
        func == bpf_skb_adjust_room ||
        func == sk_skb_adjust_room ||
        func == bpf_skb_pull_data ||
        func == sk_skb_pull_data ||
        func == bpf_clone_redirect ||
        func == bpf_l3_csum_replace ||
        func == bpf_l4_csum_replace ||
        func == bpf_xdp_adjust_head ||
        func == bpf_xdp_adjust_meta ||
        func == bpf_msg_pull_data ||
        func == bpf_msg_push_data ||
        func == bpf_msg_pop_data ||
        func == bpf_xdp_adjust_tail ||
#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
        func == bpf_lwt_seg6_store_bytes ||
        func == bpf_lwt_seg6_adjust_srh ||
        func == bpf_lwt_seg6_action ||
#endif
#ifdef CONFIG_INET
        func == bpf_sock_ops_store_hdr_opt ||
#endif
        func == bpf_lwt_in_push_encap ||
        func == bpf_lwt_xmit_push_encap)
        return true;

    return false;
}

It is possible that we could fine-tune this behavior, as some helpers
like bpf_skb_pull_data() may not need to start over again. But I
may be missing some conditions.

Could you post your questions at bpf@...?
Networking people on the mailing list may give you a better
answer as to why bpf_skb_pull_data() behaves this way and whether
it can be improved.


unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;

struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
    return TC_ACT_OK;

if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
    bpf_trace_printk("It's a handshake");

Running this code gives me the error "R1 offset is outside of the
packet" even though I performed the correct check on packet boundaries. If I
re-parse all headers the code is accepted. Is there a way to solve the
problem?

[1]
https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312

On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).

It could be that the linear data only covers up to the end of the
L4 header. In such cases, you can use the bpf_skb_pull_data() helper
to pull more data into the linear region; after that, ctx->data_end
will point much further into the packet.


Best regards,
Federico Parola








Re: Access packet payload in TC egress programs

Federico Parola
 

One update on point 2.
I found out that every time a pointer to the packet is advanced by a variable amount held in a type wider than 1 byte, subsequent checks against packet boundaries become ineffective.
In my example, if I change payload_offset from unsigned to u8 the program is accepted, even though my offset can then be at most 255 bytes.
Here is a toy example to test the problem:

int test(struct __sk_buff *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Skipping an amount of bytes stored in __u8 works */
    if (data + sizeof(__u8) > data_end)
        return TC_ACT_OK;
    bpf_trace_printk("Skipping %d bytes", *(__u8 *)data);
    data += *(__u8 *)data;

    /* Skipping an amount of bytes stored in __u16 works but... */
    if (data + sizeof(__u16) > data_end)
        return TC_ACT_OK;
    bpf_trace_printk("Skipping %d bytes", *(__u16 *)data);
    data += *(__u16 *)data;

    /* ...this check is not effective and packet access is rejected */
    if (data + sizeof(__u8) > data_end)
        return TC_ACT_OK;
    bpf_trace_printk("Next byte is %x", *(__u8 *)data);

    return TC_ACT_OK;
}

My practical use case would be skipping variable-size TLS header extensions until I reach the desired one.
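
One plausible reading of the behavior described above, not confirmed anywhere in this thread, is that the verifier stops tracking the packet range once the accumulated variable offset could grow too large. A workaround often suggested for this kind of problem is to bound every variable offset explicitly before adding it to the packet pointer. The following is only a plain userspace C model of that parsing pattern, not BPF code; `skip_record` and the `MAX_SKIP` cap are hypothetical names and values chosen for illustration, and whether the same shape satisfies a given kernel's verifier is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SKIP 0x3fff  /* assumed cap on a single variable-length skip */

/*
 * Skip one length-prefixed record: read a 16-bit big-endian length at
 * `data`, then step over the length field and that many payload bytes.
 * Returns the new cursor, or NULL if the record is oversized or does
 * not fit between `data` and `data_end`.
 */
const uint8_t *skip_record(const uint8_t *data, const uint8_t *data_end)
{
    assert(data != NULL && data_end != NULL);

    /* Bounds check before reading the 2-byte length field. */
    if (data > data_end || (size_t)(data_end - data) < 2)
        return NULL;

    uint16_t len = (uint16_t)((data[0] << 8) | data[1]);

    /* Reject oversized lengths so the offset has a small known maximum. */
    if (len > MAX_SKIP)
        return NULL;

    /* Re-check the bounds for the payload that is about to be skipped. */
    if ((size_t)(data_end - data) - 2 < len)
        return NULL;

    return data + 2 + len;
}
```

Calling such a function in a loop with a fixed upper bound on iterations would walk a sequence of variable-size records, which is the shape of the TLS extension use case mentioned above.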

On 22/10/21 09:31, Federico Parola wrote:
Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:
1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1], however with that parameter the length of direct accessible data stays unchanged. I think there is a mismatch in the behavior and the documentation.
2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers and add the offset to data. However the verifier does not accept my accesses to the packet from this point on. Here is some example code:
unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;
struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
    return TC_ACT_OK;
if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
    bpf_trace_printk("It's a handshake");
Running this code gives me the error "R1 offset is outside of the packet" even though I performed the correct check on packet boundaries. If I re-parse all headers the code is accepted. Is there a way to solve the problem?
[1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312
On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).

It could be that the linear data only covers up to the end of the
L4 header. In such cases, you can use the bpf_skb_pull_data() helper
to pull more data into the linear region; after that, ctx->data_end
will point much further into the packet.


Best regards,
Federico Parola





Re: Access packet payload in TC egress programs

Federico Parola
 

Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:

1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1]; however, with that parameter the length of directly accessible data stays unchanged. I think there is a mismatch between the behavior and the documentation.

2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers and add the offset to data. However the verifier does not accept my accesses to the packet from this point on. Here is some example code:

unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;

struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
    return TC_ACT_OK;

if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
    bpf_trace_printk("It's a handshake");

Running this code gives me the error "R1 offset is outside of the packet" even though I performed the correct check on packet boundaries. If I re-parse all headers the code is accepted. Is there a way to solve the problem?

[1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312

On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).

It could be that the linear data only covers up to the end of the
L4 header. In such cases, you can use the bpf_skb_pull_data() helper
to pull more data into the linear region; after that, ctx->data_end
will point much further into the packet.


Best regards,
Federico Parola




Re: Access packet payload in TC egress programs

Yonghong Song
 

On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).

It could be that the linear data only covers up to the end of the
L4 header. In such cases, you can use the bpf_skb_pull_data() helper
to pull more data into the linear region; after that, ctx->data_end
will point much further into the packet.


Best regards,
Federico Parola





Access packet payload in TC egress programs

Federico Parola
 

Dear all,
how can I access the payload of the packet in a program attached to the TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress hook it points to the end of the packet (tested on kernel v5.14).

Best regards,
Federico Parola


Tracing CPU utilisation

Raga lahari
 


Hello,


Furthermore, I would like to add the following:


I have a function test() that takes a string as input, and its subsequent behavior depends on the value it receives. I would like to trace the CPU utilization of test() together with the value of b.


int test(string b) {

     ......

}

 

expecting like,

test(b="abc")  -  40%

test(b="defghijkl") - 60%


perf probe and perf record report the number of calls to test() along with the variable value, but they do not give the CPU cycles spent in each call.

It would be great if someone could share some pointers that can help me here.



Regards,

Ragalahari


Tracing CPU utilisation

Raga lahari
 

Hi,

Can someone please let me know whether there is any way to get a CPU profile for a specific function using eBPF?


Thanks & Regards, 

Ragalahari


LPC 2021 Networking and BPF Track CFP (2nd reminder)

Daniel Borkmann
 

This is a reminder for the Call for Proposals (CFP) for the Networking and
BPF track at the 2021 edition of the Linux Plumbers Conference (LPC), which
will be held virtually on the wider Internet, on September 20th - 24th, 2021.

This year's Networking and BPF track technical committee is comprised of:

David S. Miller <davem@...>
Jakub Kicinski <kuba@...>
Eric Dumazet <edumazet@...>
Alexei Starovoitov <ast@...>
Daniel Borkmann <daniel@...>
Andrii Nakryiko <andrii@...>

We are seeking proposals of 40 minutes in length (including Q&A discussion),
optionally accompanied by papers of 2 to 10 pages in length.

Any kind of advanced Linux networking and/or BPF related topic will be considered.

Please submit your proposals through the official LPC website at:

https://linuxplumbersconf.org/event/11/abstracts/

Make sure to select "Networking & BPF Summit" in the Track pull-down menu.

Proposals must be submitted by August 13th, and submitters will be notified of
acceptance by August 16th.

Final slides and papers (as PDF) are due on the first day of the conference.


LPC 2021 Networking and BPF Track CFP (Reminder)

Daniel Borkmann
 

This is a reminder for the Call for Proposals (CFP) for the Networking and
BPF track at the 2021 edition of the Linux Plumbers Conference (LPC), which
will be held virtually on the wider Internet, on September 20th - 24th, 2021.

This year's Networking and BPF track technical committee is comprised of:

David S. Miller <davem@...>
Jakub Kicinski <kuba@...>
Eric Dumazet <edumazet@...>
Alexei Starovoitov <ast@...>
Daniel Borkmann <daniel@...>
Andrii Nakryiko <andrii@...>

We are seeking proposals of 40 minutes in length (including Q&A discussion),
optionally accompanied by papers of 2 to 10 pages in length.

Any kind of advanced Linux networking and/or BPF related topic will be considered.

Please submit your proposals through the official LPC website at:

https://linuxplumbersconf.org/event/11/abstracts/

Make sure to select "Networking & BPF Summit" in the Track pull-down menu.

Proposals must be submitted by August 13th, and submitters will be notified of
acceptance by August 16th.

Final slides and papers (as PDF) are due on the first day of the conference.


Using XDP in docker swarm to track outgoing traffic

Sebastião Santos Boavida Amaro
 

Hi everyone,
I am trying to use XDP to track outgoing traffic from docker containers deployed using docker swarm and running in a network using the driver overlay. I am using a simple xdp program based on [1], and I run this program on the network namespace of the container using nsenter and attach it to its eth0.
However, I am only able to detect the incoming packets and not the outgoing ones. When running tcpdump on the container network namespace I can see both incoming and outgoing packets. So I am a bit confused as to why XDP would not detect the outgoing ones.
Does anyone know the reason for this, or have a general idea as to why this might happen?

[1]https://github.com/iovisor/bcc/blob/master/examples/networking/xdp/xdp_drop_count.py

Best Regards,
Sebastião Amaro


BPF map pinning

Raga lahari
 

Hi,

Can someone please help me pin a BPF map in a custom path for a TC program? My requirement is to pin a map in an interface-specific path (like /sys/fs/bpf/eth0/).


Regards,
Ragalahari


Re: Question about map.increment() #bcc

Donald Hunter
 

On Sun, 25 Apr 2021 at 20:18, Y Song <ys114321@...> wrote:
This is a good question. In the early bpf days, the key MUST be on the
stack; otherwise the verifier would fail. Nowadays things have improved,
and keys can come from verifier-recognizable memory regions
(stack, key, value, allocated_mem, etc.). I think the rewriter can be made
smart enough to check whether the first argument of increment() is actually a
variable (instead of an expression); if so, we can directly take the address
of it, since the variable can be allocated on the stack. The relevant code
is in b_frontend_action.cc. Do you want to take a look to see whether
you could help improve the bcc rewriter for this particular issue?

I am happy to take a look at the code and see if I can improve it.

Thanks,
Donald.


Re: Question about map.increment() #bcc

Yonghong Song
 

On Thu, Apr 22, 2021 at 4:18 AM Donald Hunter <donald.hunter@...> wrote:

Is there a reason why map.increment() internally copies the key into a stack variable? When building a key inline, it uses double the stack space and incurs the cost of a copy. For u64 keys this is fine but for larger custom keys, e.g. containing a char[] it blows up the stack pretty quickly.

This is a good question. In the early bpf days, the key MUST be on the
stack; otherwise the verifier would fail. Nowadays things have improved,
and keys can come from verifier-recognizable memory regions
(stack, key, value, allocated_mem, etc.). I think the rewriter can be made
smart enough to check whether the first argument of increment() is actually a
variable (instead of an expression); if so, we can directly take the address
of it, since the variable can be allocated on the stack. The relevant code
is in b_frontend_action.cc. Do you want to take a look to see whether
you could help improve the bcc rewriter for this particular issue?


Thanks, Donald.


Question about map.increment() #bcc

Donald Hunter
 

Is there a reason why map.increment() internally copies the key into a stack variable? When building a key inline, it uses double the stack space and incurs the cost of a copy. For u64 keys this is fine but for larger custom keys, e.g. containing a char[] it blows up the stack pretty quickly.

Thanks, Donald.


Re: [libbpf] Questions about XDP/TC

Toke Høiland-Jørgensen
 

chenhengqi@... writes:

1. How do I attach `BPF_PROG_TYPE_SCHED_CLS`/`classifier` BPF programs to a specific data path (i.e. ingress or egress) using libbpf?

libbpf does not yet support attaching to TC hooks, but there is work in
progress to add this. See
https://lore.kernel.org/bpf/20210325120020.236504-4-memxor@gmail.com/

(an updated version should hopefully show up soon).

I found some related comments in the source:
```
The **BPF_F_INGRESS** value in *flags* is used to make the distinction (ingress path is selected if the flag is present, egress path otherwise).
```

How can I get that flag? Am I missing something?

2. How do I attach `XDP` BPF programs using a specific mode (i.e.
xdpgeneric/xdpdrv)?

Just set XDP_FLAGS_SKB_MODE or XDP_FLAGS_DRV_MODE when attaching...

-Toke
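
As a small illustration of the reply above, the two mode names from the question map onto the two flag bits as sketched below. This is a minimal sketch: `xdp_mode_to_flags` is a hypothetical helper, and the flag values are copied from include/uapi/linux/if_link.h only so that the example builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as in include/uapi/linux/if_link.h, repeated here for the sketch. */
#ifndef XDP_FLAGS_SKB_MODE
#define XDP_FLAGS_SKB_MODE (1U << 1)   /* generic (skb) mode */
#define XDP_FLAGS_DRV_MODE (1U << 2)   /* native driver mode */
#endif

/*
 * Hypothetical helper: translate an iproute2-style mode name into XDP
 * attach flags. Returns 0 on success and -1 for an unknown mode name.
 */
int xdp_mode_to_flags(const char *mode, uint32_t *flags)
{
    assert(mode != NULL && flags != NULL);

    if (strcmp(mode, "xdpgeneric") == 0)
        *flags = XDP_FLAGS_SKB_MODE;
    else if (strcmp(mode, "xdpdrv") == 0)
        *flags = XDP_FLAGS_DRV_MODE;
    else
        return -1;
    return 0;
}
```

The resulting value would then be passed as the flags argument of the attach call, which in libbpf at the time of this thread was bpf_set_link_xdp_fd(ifindex, prog_fd, flags).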