Access packet payload in TC egress programs


Federico Parola
 

Dear all,
how can I access the payload of the packet in a program attached to the TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress hook it points to the end of the packet (tested on kernel v5.14).

Best regards,
Federico Parola


Yonghong Song
 

On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).
This could be the case that linear data only covers up to the end of
L4 header. In such cases, you can use bpf_skb_pull_data() helper
to get more data into linear region and after that your ctx->data_end
will point to much later packet data.


Best regards,
Federico Parola





Federico Parola
 

Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:

1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1], however with that parameter the length of direct accessible data stays unchanged. I think there is a mismatch in the behavior and the documentation.

2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers and add the offset to data. However the verifier does not accept my accesses to the packet from this point on. Here is some example code:

unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;

struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
return TC_ACT_OK;

if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
bpf_trace_printk("It's a handshake");

Running this code gives me the error "R1 offset is outside of the packet" even if I performed the correct check on packet boundaries. If I re-parse all header the code is accepted. Is there a way to solve the problem?

[1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312

On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).
This could be the case that linear data only covers up to the end of
L4 header. In such cases, you can use bpf_skb_pull_data() helper
to get more data into linear region and after that your ctx->data_end
will point to much later packet data.


Best regards,
Federico Parola




Federico Parola
 

One update on point 2.
I found out that every time a pointer to the packet is increased of a variable value stored in a variable greater than 1 byte, subsequent checks against packet boundaries become ineffective.
In my example if I change payload_offset from unsigned to u8 the program is accepted, even though my offset can now only be 256 bytes at maximum.
Here is a toy example to test the problem:

int test(struct __sk_buff *ctx) {
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;

/* Skipping an amount of bytes stored in __u8 works */
if (data + sizeof(__u8) > data_end)
return TC_ACT_OK;
bpf_trace_printk("Skipping %d bytes", *(__u8 *)data);
data += *(__u8 *)data;

/* Skipping an amount of bytes stored in __u16 works but... */
if (data + sizeof(__u16) > data_end)
return TC_ACT_OK;
bpf_trace_printk("Skipping %d bytes", *(__u16 *)data);
data += *(__u16 *)data;

/* ...this check is not effective and packet access is rejected */
if (data + sizeof(__u8) > data_end)
return TC_ACT_OK;
bpf_trace_printk("Next byte is %x", *(__u8 *)data);

return TC_ACT_OK;
}

My practical use case would be skipping varaible-size TLS header extensions until I reach the desired one.

On 22/10/21 09:31, Federico Parola wrote:
Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:
1. The documentation of the helper says that passing a length of zero should pull the whole length of the packet [1], however with that parameter the length of direct accessible data stays unchanged. I think there is a mismatch in the behavior and the documentation.
2. I'd like to avoid re-parsing all the headers after I have pulled new data. To do so I save the offset I just reached (the end of the TCP header), pull data, get the new data and data_end pointers and add the offset to data. However the verifier does not accept my accesses to the packet from this point on. Here is some example code:
unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;
struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
    return TC_ACT_OK;
if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
    bpf_trace_printk("It's a handshake");
Running this code gives me the error "R1 offset is outside of the packet" even if I performed the correct check on packet boundaries. If I re-parse all header the code is accepted. Is there a way to solve the problem?
[1] https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312 On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).
This could be the case that linear data only covers up to the end of
L4 header. In such cases, you can use bpf_skb_pull_data() helper
to get more data into linear region and after that your ctx->data_end
will point to much later packet data.


Best regards,
Federico Parola





Yonghong Song
 

On Fri, Oct 22, 2021 at 12:31 AM Federico Parola
<federico.parola@...> wrote:

Thanks for the answer, I wasn't aware of the existence of that helper.
I have two additional comments:

1. The documentation of the helper says that passing a length of zero
should pull the whole length of the packet [1], however with that
parameter the length of direct accessible data stays unchanged. I think
there is a mismatch in the behavior and the documentation.
The source code is
BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
{
/* Idea is the following: should the needed direct read/write
* test fail during runtime, we can pull in more data and redo
* again, since implicitly, we invalidate previous checks here.
*
* Or, since we know how much we need to make read/writeable,
* this can be done once at the program beginning for direct
* access case. By this we overcome limitations of only current
* headroom being accessible.
*/
return bpf_try_make_writable(skb, len ? : skb_headlen(skb));
}

So if len is 0, it will only try to make *existing* linear data to be
writable, so
you are right. It seems we are not not trying to pull more data in. I will
check with other kernel developers later.


2. I'd like to avoid re-parsing all the headers after I have pulled new
data. To do so I save the offset I just reached (the end of the TCP
header), pull data, get the new data and data_end pointers and add the
offset to data. However the verifier does not accept my accesses to the
packet from this point on. Here is some example code:
The current behavior is after data pull, you will need to reparse the packet.
There are a lot of helpers fitting in this case:

bool bpf_helper_changes_pkt_data(void *func)
{
if (func == bpf_skb_vlan_push ||
func == bpf_skb_vlan_pop ||
func == bpf_skb_store_bytes ||
func == bpf_skb_change_proto ||
func == bpf_skb_change_head ||
func == sk_skb_change_head ||
func == bpf_skb_change_tail ||
func == sk_skb_change_tail ||
func == bpf_skb_adjust_room ||
func == sk_skb_adjust_room ||
func == bpf_skb_pull_data ||
func == sk_skb_pull_data ||
func == bpf_clone_redirect ||
func == bpf_l3_csum_replace ||
func == bpf_l4_csum_replace ||
func == bpf_xdp_adjust_head ||
func == bpf_xdp_adjust_meta ||
func == bpf_msg_pull_data ||
func == bpf_msg_push_data ||
func == bpf_msg_pop_data ||
func == bpf_xdp_adjust_tail ||
#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
func == bpf_lwt_seg6_store_bytes ||
func == bpf_lwt_seg6_adjust_srh ||
func == bpf_lwt_seg6_action ||
#endif
#ifdef CONFIG_INET
func == bpf_sock_ops_store_hdr_opt ||
#endif
func == bpf_lwt_in_push_encap ||
func == bpf_lwt_xmit_push_encap)
return true;

return false;
}

It is possible that we could fine tune this behavior as some helpers
like bpf_skb_pull_data() may not need to start over again. But I
could miss some conditions.

Could you post your questions at bpf@...?
Networking people in the mailing list may give you a better
answer why this behavior for bpf_skb_pull_data() and whether
it can be improved.


unsigned payload_offset = (void *)tcph + (tcph->doff << 2) - data;
bpf_skb_pull_data(ctx, ctx->len);
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;

struct tls_record_hdr *rech = data + payload_offset;
if ((void *)(rech + 1) > data_end)
return TC_ACT_OK;

if (rech->type == TLS_CONTENT_TYPE_HANDSAHKE)
bpf_trace_printk("It's a handshake");

Running this code gives me the error "R1 offset is outside of the
packet" even if I performed the correct check on packet boundaries. If I
re-parse all header the code is accepted. Is there a way to solve the
problem?

[1]
https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L2312

On 20/10/21 08:11, Y Song wrote:
On Tue, Oct 19, 2021 at 8:13 AM Federico Parola
<federico.parola@...> wrote:

Dear all,
how can I access the payload of the packet in a program attached to the
TC egress hook (SCHED_CLS attached to clsact qdisc)?
ctx->data_end points to the end of the L4 header, while on the ingress
hook it points to the end of the packet (tested on kernel v5.14).
This could be the case that linear data only covers up to the end of
L4 header. In such cases, you can use bpf_skb_pull_data() helper
to get more data into linear region and after that your ctx->data_end
will point to much later packet data.


Best regards,
Federico Parola