Re: XDP seeking input from NIC hardware vendors

Jakub Kicinski

On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
The only distinction between VFs and queue groupings on my side is VFs
provide RSS, whereas queue groupings have to be selected explicitly.
In a programmable NIC world the distinction might be lost if a "RSS"
program can be loaded into the NIC to select queues but for existing
hardware the distinction is there.

To do BPF RSS we need a way to select the queue, which I think is all
Jasper wanted. So we will have to tackle queue selection at some
point. The main obstacle for me is defining what queue selection
means when the program is not offloaded to HW... Implementing
queue selection on the HW side is trivial.
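
For illustration only, here is a minimal sketch of one way an "RSS"
program could map a flow to a queue: hash the flow key, then index an
indirection table. Everything in it is an assumption made for the
example and not something proposed in this thread: the flow_key layout,
the INDIR_SIZE value, the helper names, and the FNV-1a hash, which
stands in for the Toeplitz hash that real NICs typically use. It does
not settle what the returned queue index should mean when the program
runs on the host.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 128 /* hypothetical indirection table size */

/* Hypothetical flow key; a real program would parse it from the packet. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* FNV-1a over the key fields: a simple stand-in for a Toeplitz hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint32_t words[3] = {
		k->saddr,
		k->daddr,
		((uint32_t)k->sport << 16) | k->dport,
	};
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 3; i++) {
		for (int shift = 0; shift < 32; shift += 8) {
			h ^= (words[i] >> shift) & 0xff;
			h *= 16777619u;
		}
	}
	return h;
}

/* Spread the table entries round-robin over the available queues. */
static void indir_init(uint16_t *tbl, uint16_t n_queues)
{
	for (size_t i = 0; i < INDIR_SIZE; i++)
		tbl[i] = (uint16_t)(i % n_queues);
}

/* Queue selection: hash the flow, then look up the indirection table. */
static uint16_t select_queue(const uint16_t *tbl, const struct flow_key *k)
{
	return tbl[flow_hash(k) % INDIR_SIZE];
}
```

The same flow always lands on the same queue, and the result is always
a valid index into the queues the table was initialised with.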

If you demux using an eBPF program or via a filter model like
flow_director or cls_{u32|flower} I think we can support both. And this
just depends on the programmability of the hardware. Note flow_director
and cls_{u32|flower} steering to VFs is already in place.

Yes, for steering to VFs we could potentially reuse a lot of existing

The question I have is: should the "filter" part of the eBPF program
be a separate program from the XDP program and loaded using specific
semantics (e.g. a "load_hardware_demux" ndo op), at the risk of building
an ever-growing set of "ndo" ops? If you are running multiple XDP
programs on the same NIC hardware then I think this actually makes
sense; otherwise, how would the hardware and even software find the
"demux" logic? In this model there is a "demux" program that selects
a queue/VF and a program that runs on the netdev queues.

I don't think we should enforce the separation here. What we may want
to do before forwarding to the VF can be much more complicated than
pure demux/filtering (a simple example: pop VLAN/tunnel). The VF
representative model works well here as a fallback: if the program
cannot be offloaded, it will be run on the host and "trombone" packets
via the VFR into the VF.

If we have a chain of BPF programs we can order them by increasing
complexity/features required and then HW could transparently
offload the first parts - the easier ones - leaving more complex
processing on the host.

This should probably be paired with some sort of "skip-sw" flag to let
user space enforce the HW offload on the fast path part.
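
As a rough sketch of the chain idea, assuming each program carries a
bitmask of the features it needs and an optional "skip-sw" style flag:
the HW takes the longest leading run of programs it can handle, the
rest stay on the host, and the load is rejected if a program flagged
skip-sw would end up on the host. The struct, the feature bits and the
split_chain() helper are all hypothetical names invented for this
example, not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits a program may require from the HW. */
enum {
	FEAT_FILTER   = 1u << 0,
	FEAT_VLAN_POP = 1u << 1,
	FEAT_TUNNEL   = 1u << 2,
};

/* Hypothetical per-program metadata. */
struct prog {
	unsigned int required_feats; /* features this program needs */
	bool skip_sw;                /* must not fall back to the host */
};

/*
 * Returns how many leading programs of the chain can be offloaded,
 * or -1 if a program marked skip_sw would have to run on the host.
 * Only a prefix is offloaded: once one program stays on the host,
 * everything after it must run there too to preserve ordering.
 */
static int split_chain(const struct prog *chain, size_t n,
		       unsigned int hw_feats)
{
	size_t offloaded = 0;

	while (offloaded < n &&
	       (chain[offloaded].required_feats & ~hw_feats) == 0)
		offloaded++;

	for (size_t i = offloaded; i < n; i++)
		if (chain[i].skip_sw)
			return -1;

	return (int)offloaded;
}
```

With the chain ordered from simplest to most demanding, a more capable
device simply offloads a longer prefix, while skip_sw turns a silent
fallback into an explicit error for the fast path part.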
