Re: The page-pool as a component for XDP forwarding


Daniel Borkmann

On 05/06/2016 01:00 AM, Tom Herbert via iovisor-dev wrote:
On Thu, May 5, 2016 at 3:41 PM, Alexei Starovoitov
<alexei.starovoitov@...> wrote:
On Thu, May 05, 2016 at 03:21:24PM -0700, Tom Herbert wrote:
On Thu, May 5, 2016 at 2:44 PM, Alexei Starovoitov
<alexei.starovoitov@...> wrote:
On Thu, May 05, 2016 at 01:19:37PM -0700, Tom Herbert wrote:
I think we're saying the same thing, just using different notation.
The BPF program returns an index which the driver maps to a queue, but
this index is relative to the XDP instance. So if a device offers 3
levels of priority queues then the BPF program can return 0, 1, or 2.
The driver can map this return value to a queue (probably from a set of
three queues dedicated to the XDP instance). What I am saying is that
this driver mapping should be trivial and does not implement any policy
other than restricting the XDP instance to its set-- e.g. the mapping to
the actual queue number could be 3*N+R where N is the instance # of XDP
and R is the return index. Egress on a different interface can work the
same way: index 0 might queue for the local interface, index 1 might
queue for another interface. This simple return-value-to-queue mapping
is a lot easier for crossing devices if they are managed by the same
driver, I think.
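To make the 3*N+R mapping concrete, here is a minimal sketch of what such
a trivial driver-side mapping could look like (the helper name and the
fixed three priority levels are illustrative assumptions only, not an
existing driver API):

/* Hypothetical driver helper: the BPF program returns an index R that is
 * relative to its XDP instance N; the driver turns it into a real HW queue.
 * With three priority levels per instance: queue = 3*N + R.
 */
#define XDP_PRIO_LEVELS 3

static inline unsigned int xdp_index_to_queue(unsigned int instance,
                                              unsigned int ret_index)
{
        /* Clamp the return value to the instance's own set so one XDP
         * instance can never steer packets into another instance's queues.
         */
        if (ret_index >= XDP_PRIO_LEVELS)
                ret_index = XDP_PRIO_LEVELS - 1;

        return XDP_PRIO_LEVELS * instance + ret_index;
}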
+1
we'd need a way to specify priority queue from bpf program.
Probably not as a first step though.
Something like
BPF_XDP_DROP 0
BPF_XDP_PASS 1
BPF_XDP_TX 2
BPF_XDP_TX_PRIO 3 | upper bits used for prio
BPF_XDP_TX_PHYS_IFINDEX 4 | upper bits for ifindex
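A rough sketch of how such return values could be packed (the shift width
and macro names are only assumptions here, nothing agreed upon):

/* Illustrative encoding only: low 8 bits carry the action code, the
 * upper bits carry the action-specific argument (prio or ifindex).
 */
enum bpf_xdp_action {
        BPF_XDP_DROP            = 0,
        BPF_XDP_PASS            = 1,
        BPF_XDP_TX              = 2,
        BPF_XDP_TX_PRIO         = 3,    /* upper bits used for prio */
        BPF_XDP_TX_PHYS_IFINDEX = 4,    /* upper bits for ifindex */
};

#define XDP_ACTION_BITS 8
#define XDP_ACTION_MASK ((1U << XDP_ACTION_BITS) - 1)

/* program side: e.g. return BPF_XDP_TX_PRIO | (prio << XDP_ACTION_BITS); */

/* driver side: split the return value back into code and argument */
static inline unsigned int xdp_ret_code(unsigned int ret)
{
        return ret & XDP_ACTION_MASK;
}

static inline unsigned int xdp_ret_arg(unsigned int ret)
{
        return ret >> XDP_ACTION_BITS;
}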
I think we can simplify these three into just XDP_TX (and without the
BPF tag, to allow for the possibility that some non-BPF entity really
wants to use this interface ;-) ).

Just have XDP_TX with some index. The index maps to a priority, a queue,
another device, whatever. The caller will need to understand what the
different possible indices mean, but this can be negotiated out of band
and up front before programming.
No. See my comment to Jesper and the rant about 'generality vs performance'.
Combining them into one generic TX code is not simpler. Not for
the program and not for the driver side.
Sorry, I'm missing your point. The simple model is that we have three
opcodes and two of them take parameters. The parameters are generic so
they can indicate arbitrary instructions on what to do with the packet
(these can point to a priority queue, a HW rate-limited queue, a tap
queue, whatever). This is a way that drivers can extend the capabilities
of the interface for their own features without requiring any changes to
But that would mean that XDP programs are not portable anymore across
different drivers, no? So they'd have to be rewritten when porting to a
different NIC, or could not be supported there due to missing features.

the interface. A single index allows one array lookup which returns
whatever information is needed for the driver to act on the packet. So
this scheme is both generic and performant-- it allows generality in
that the TX and RX actions can be arbitrarily extended, and it is
performant since all the driver needs to do is look up the index in the
array to complete the action. If we don't do something like this, then
every time someone adds some new functionality we have to add another
action-- that doesn't scale. Priority queues are a perfect example of
this: they are not a commonly supported feature and should not be
exposed as a base action.
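To illustrate the "single index, one array lookup" idea above, a
hypothetical driver-side sketch (the struct and field names are made up
for illustration; no driver implements exactly this):

struct net_device;

/* Hypothetical per-XDP-instance action table: the index returned by the
 * program selects one entry, and the entry carries whatever the driver
 * needs to complete the action (plain TX queue, HW rate limited queue,
 * tap queue, another device, ...). The table contents are negotiated out
 * of band, up front, before the program is attached.
 */
struct xdp_tx_target {
        struct net_device *dev;         /* egress device (may be the RX device) */
        unsigned int txq;               /* HW queue on that device */
        /* further driver-specific fields as needed */
};

struct xdp_instance {
        unsigned int num_targets;
        struct xdp_tx_target targets[];
};

/* Called on a TX action: one bounds check plus one array lookup. */
static inline const struct xdp_tx_target *
xdp_resolve_tx(const struct xdp_instance *xdp, unsigned int index)
{
        if (index >= xdp->num_targets)
                return NULL;            /* unknown index: caller drops the packet */
        return &xdp->targets[index];
}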
For portability of an XDP program across different NICs, this would
either just be a *hint* to the driver, where drivers not supporting it
simply ignore it, or drivers would need to indicate their capabilities
in some way to the verifier, so that the verifier can make sure such an
action is possible at all for the given driver. For the prio queues
case, the first option is probably better.

This also means the return code is simple, just two fields: the opcode
and its parameter. In phase one the parameter would always be zero.
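Sketched, that two-field return could be packed into the single program
return value roughly like this (names and the bit split are again only
assumptions, not a settled ABI):

/* Only three opcodes; two of them take a generic parameter. Low bits
 * carry the opcode, upper bits carry the parameter. In phase one the
 * parameter is always zero.
 */
enum xdp_opcode {
        XDP_DROP = 0,
        XDP_PASS = 1,
        XDP_TX   = 2,
};

#define XDP_OPCODE_BITS 8

static inline unsigned int xdp_ret(enum xdp_opcode op, unsigned int param)
{
        return op | (param << XDP_OPCODE_BITS);
}

/* phase one usage: return xdp_ret(XDP_TX, 0); */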

Tom
_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
