Date
1 - 5 of 5
Access packet payload in TC egress programs
Federico Parola
Dear all,
how can I access the payload of the packet in a program attached to the TC egress hook (SCHED_CLS attached to clsact qdisc)? ctx->data_end points to the end of the L4 header, while on the ingress hook it points to the end of the packet (tested on kernel v5.14). Best regards, Federico Parola |
|
Yonghong Song
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote: This could be the case that linear data only covers up to the end of L4 header. In such cases, you can use bpf_skb_pull_data() helper to get more data into linear region and after that your ctx->data_end will point to much later packet data.
|
|
Federico Parola
Thanks for the answer, I wasn't aware of the existence of that helper.
toggle quoted message
Show quoted text
I have two additional comments: 1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1], however with that parameter the length of direct accessible data stays unchanged. I think there is a mismatch in the behavior and the documentation. 2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers and add the offset to data. However the verifier does not accept my accesses to the packet from this point on. Here is some example code: unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data; bpf_skb_pull_data(ctx, ctx->len); data = (void *)(long)ctx->data; data_end = (void *)(long)ctx->data_end; struct tls_record_hdr *rech = data + payload_offset; if ((void *)(rech + 1) > data_end) return TC_ACT_OK; if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE) bpf_trace_printk("It's a handshake"); Running this code gives me the error "R1 offset is outside of the packet" even if I performed the correct check on packet boundaries. If I re-parse all header the code is accepted. Is there a way to solve the problem? [1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312 On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola |
|
Federico Parola
One update on point 2.
toggle quoted message
Show quoted text
I found out that every time a pointer to the packet is increased of a variable value stored in a variable greater than 1 byte, subsequent checks against packet boundaries become ineffective. In my example if I change payload_offset from unsigned to u8 the program is accepted, even though my offset can now only be 256 bytes at maximum. Here is a toy example to test the problem: int test(struct __sk_buff *ctx) { void *data = (void *)(long)ctx->data; void *data_end = (void *)(long)ctx->data_end; /* Skipping an amount of bytes stored in __u8 works */ if (data + sizeof(__u8) > data_end) return TC_ACT_OK; bpf_trace_printk("Skipping %d bytes", *(__u8 *)data); data += *(__u8 *)data; /* Skipping an amount of bytes stored in __u16 works but... */ if (data + sizeof(__u16) > data_end) return TC_ACT_OK; bpf_trace_printk("Skipping %d bytes", *(__u16 *)data); data += *(__u16 *)data; /* ...this check is not effective and packet access is rejected */ if (data + sizeof(__u8) > data_end) return TC_ACT_OK; bpf_trace_printk("Next byte is %x", *(__u8 *)data); return TC_ACT_OK; } My practical use case would be skipping varaible-size TLS header extensions until I reach the desired one. On 22/10/21 09:31, Federico Parola wrote:
Thanks for the answer, I wasn't aware of the existence of that helper. |
|
Yonghong Song
On Fri, Oct 22, 2021 at 12:31 AM Federico Parola
<federico.parola@...> wrote: The source code is BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len) { /* Idea is the following: should the needed direct read/write * test fail during runtime, we can pull in more data and redo * again, since implicitly, we invalidate previous checks here. * * Or, since we know how much we need to make read/writeable, * this can be done once at the program beginning for direct * access case. By this we overcome limitations of only current * headroom being accessible. */ return bpf_try_make_writable(skb, len ? : skb_headlen(skb)); } So if len is 0, it will only try to make *existing* linear data to be writable, so you are right. It seems we are not not trying to pull more data in. I will check with other kernel developers later. The current behavior is after data pull, you will need to reparse the packet. There are a lot of helpers fitting in this case: bool bpf_helper_changes_pkt_data(void *func) { if (func == bpf_skb_vlan_push || func == bpf_skb_vlan_pop || func == bpf_skb_store_bytes || func == bpf_skb_change_proto || func == bpf_skb_change_head || func == sk_skb_change_head || func == bpf_skb_change_tail || func == sk_skb_change_tail || func == bpf_skb_adjust_room || func == sk_skb_adjust_room || func == bpf_skb_pull_data || func == sk_skb_pull_data || func == bpf_clone_redirect || func == bpf_l3_csum_replace || func == bpf_l4_csum_replace || func == bpf_xdp_adjust_head || func == bpf_xdp_adjust_meta || func == bpf_msg_pull_data || func == bpf_msg_push_data || func == bpf_msg_pop_data || func == bpf_xdp_adjust_tail || #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) func == bpf_lwt_seg6_store_bytes || func == bpf_lwt_seg6_adjust_srh || func == bpf_lwt_seg6_action || #endif #ifdef CONFIG_INET func == bpf_sock_ops_store_hdr_opt || #endif func == bpf_lwt_in_push_encap || func == bpf_lwt_xmit_push_encap) return true; return false; } It is possible that we could fine tune this behavior as some helpers like bpf_skb_pull_data() may not need to start over again. But I could miss some conditions. Could you post your questions at bpf@...? Networking people in the mailing list may give you a better answer why this behavior for bpf_skb_pull_data() and whether it can be improved.
|
|