A couple eBPF and cls_act questions
Dan Siemon
Hello,
I listened in on the IOVisor call today but wasn't sure whether user questions and use cases were appropriate there, so I started writing this. There are two things I was going to discuss:

1) We use cls_act and eBPF extensively (https://www.preseem.com) and I'm hoping someone can point me in the right direction. We track metrics per IP address (latency, loss, others) via an eBPF program and also manage traffic via HTB and FQ-CoDel. A given end customer can have several IP addresses assigned, all of which need to go through the same HTB class (and attached FQ-CoDel instance) to enforce the plan rate.

At present, we extract the metrics in cls_act, which is pre-qdisc. The problem with this is that the byte and packet counts are too high because packets can be dropped in the qdiscs. It also skews the loss and latency calculations a bit.

Is there a way to hook post-qdisc? I looked a bit at XDP, but it seems that is Rx only for now?

2) The Cilium BPF docs are great, but one thing I don't have a good handle on is concurrent access to map data. Taking the hash map as an example, does the map update function need to be called within the eBPF program for each update to make it safe against concurrent access via the userspace bpf syscall? At present we do some map data updates just using the pointer from the lookup function.
Daniel Borkmann
On 08/08/2018 09:18 PM, Dan Siemon wrote:
1) There's a tracepoint right before netdev_start_xmit(), called trace_net_dev_start_xmit(). You could combine sch_clsact egress with cls_bpf and a BPF prog on that tracepoint, which fires right before the skb is handed to the driver. The two programs could share a map, for example for the tuple-to-counters mapping, so you would still be able to do the major work in cls_bpf outside the qdisc lock.

2) The latter works with e.g. counters using BPF_XADD instructions, or by switching to a per-CPU map for counters to avoid the atomic op altogether. For a non-per-CPU map with non-XADD updates you might need to use the map update function instead to avoid races.

Thanks,
Daniel
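The two counter options in this reply can be sketched in plain user-space C. This is an illustration under assumptions, not an actual BPF program: `struct counters`, `NCPU`, and the three helper names are hypothetical. The one detail carried over from BPF is `__sync_fetch_and_add`, the builtin that clang compiles to a BPF_XADD instruction when building BPF programs.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4 /* hypothetical CPU count, for the sketch only */

struct counters {
    uint64_t packets;
    uint64_t bytes;
};

/* Option 1: a single shared value, updated in place through the pointer
 * that a map lookup returned. Each field is incremented atomically, so
 * concurrent writers do not lose updates on that field. */
static void count_shared(struct counters *c, uint64_t len)
{
    __sync_fetch_and_add(&c->packets, 1);
    __sync_fetch_and_add(&c->bytes, len);
}

/* Option 2: one slot per CPU. Each CPU writes only its own slot, so a
 * plain increment is enough and no atomic operation is needed. */
static void count_percpu(struct counters slots[NCPU], unsigned int cpu,
                         uint64_t len)
{
    assert(cpu < NCPU);
    slots[cpu].packets += 1;
    slots[cpu].bytes += len;
}

/* Reader side for option 2: the total is the sum over all CPU slots. */
static struct counters sum_percpu(const struct counters slots[NCPU])
{
    struct counters total = {0, 0};
    for (unsigned int i = 0; i < NCPU; i++) {
        total.packets += slots[i].packets;
        total.bytes += slots[i].bytes;
    }
    return total;
}
```

With option 1 the two fields are updated by separate atomic operations, so a reader can briefly see a packet count that is one ahead of the byte count. With option 2 the cost moves to the reader: for a per-CPU BPF map, a user-space lookup returns one value per possible CPU and the caller has to add them up, as `sum_percpu` does here.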
Dan Siemon
On Wed, 2018-08-08 at 21:43 +0200, Daniel Borkmann wrote:
> > Is there a way to hook post qdisc? I looked a bit at XDP, but it seems ...
> There's a tracepoint right before netdev_start_xmit() which is called ...

Thanks. I don't know much about tracepoints but will look into this. I gather these are capable of the same rates as the tc hooks?

Where would the context extracted from the packet in the BPF prog (e.g. the 5-tuple) be stashed so the tracepoint program can get at it without parsing the headers again?

Ideally this context is extracted once at the ingress port and flows with the SKB through to the egress port, so we don't need to parse the headers more than once.

Is the XDP on Tx idea something worth even talking about, or is the tracepoint basically equivalent?

Similarly, does it make any sense to add a post-qdisc tc hook where a clsact could be attached? In this model, the same program could count pre- or post-qdisc just based on where it was attached.

Thanks for the help.
Daniel Borkmann
On 08/08/2018 10:48 PM, Dan Siemon wrote:
> On Wed, 2018-08-08 at 21:43 +0200, Daniel Borkmann wrote:
> > > Is there a way to hook post qdisc? I looked a bit at XDP, but it seems ...
> > There's a tracepoint right before netdev_start_xmit() which is called ...
> Thanks. I don't know much about tracepoints but will look into this. I ...

Depends on what rates you are targeting. You might want to check BPF raw tracepoints to reduce overhead, given this would be in the hot path. Commit f6ef56589374 ("Merge branch 'bpf-raw-tracepoints'"), which tested samples/bpf/test_overhead performance on 1 CPU, says:

tracepoint      base    kprobe+bpf    tracepoint+bpf    raw_tracepoint+bpf
task_rename     1.1M    769K          947K              1.0M
urandom_read    789K    697K          750K              755K

> Where would the context extracted from the packet in the BPF prog (eg ...

Probably makes sense to flatten part of the key and map it into skb->mark, or store it into skb->cb[], or store an offset there that points into the packet.

> Ideally this context is extracted once the ingress port and flows with ...

I think it might be useful; a sch_clsact subhook would avoid having to unclone or linearize the skb. There's also an option to place cls_bpf in direct-action mode into sch_fq_codel, which would come after your htb (see fq_codel_classify()), but I presumed you also want the hook after that.
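One way to read "flatten part of the key and map it into skb->mark" is to pack a small tag and a table index into the 32 bits that the mark offers, so that a later program can find the flow's entry in a shared map without reparsing headers. The sketch below is plain user-space C that shows only the bit packing. The 8/24 split, the meaning of the tag, and all names are assumptions made for illustration; nothing in the thread specifies a layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the 32-bit mark:
 *   bits 31..24  tag (for example a direction or plan class)
 *   bits 23..0   index of the flow or customer entry in a shared map */
#define MARK_IDX_BITS 24u
#define MARK_IDX_MASK ((1u << MARK_IDX_BITS) - 1u)

/* Pack tag and index into one 32-bit value suitable for a mark field. */
static uint32_t mark_pack(uint8_t tag, uint32_t idx)
{
    assert(idx <= MARK_IDX_MASK); /* the index must fit in 24 bits */
    return ((uint32_t)tag << MARK_IDX_BITS) | idx;
}

/* Recover the tag from a packed mark. */
static uint8_t mark_tag(uint32_t mark)
{
    return (uint8_t)(mark >> MARK_IDX_BITS);
}

/* Recover the map index from a packed mark. */
static uint32_t mark_idx(uint32_t mark)
{
    return mark & MARK_IDX_MASK;
}
```

In this reading, the classifier would write the packed value once and the later hook would call the equivalent of `mark_idx` to look up counters. Since skb->mark is also consulted by netfilter and policy routing, a real deployment would have to check that the chosen bits do not collide with other users of the mark.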