On 05/05/2016 11:44 PM, Alexei Starovoitov via iovisor-dev wrote:
On Thu, May 05, 2016 at 01:19:37PM -0700, Tom Herbert wrote:
I think we're saying the same thing, just using different notation.
+1
The BPF program returns an index which the driver maps to a queue, but
this index is relative to the XDP instance. So if a device offers 3 levels
of priority queues then the BPF program can return 0, 1, or 2. The driver can
map this return value to a queue (probably from a set of three queues
dedicated to the XDP instance). What I am saying is that this driver
mapping should be trivial and does not implement any policy other than
restricting the XDP instance to its set. For example, the mapping to the
actual queue number could be 3*N+R, where N is the XDP instance number and
R is the return index. Egress on a different interface can work the same
way: for instance, index 0 might queue for the local interface and index 1
might queue for another interface. This simple return-value-to-queue mapping
is a lot easier for crossing devices if they are managed by the same driver,
I think.
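
To make the 3*N+R mapping above concrete, here is a minimal sketch in plain C. The macro and function names are hypothetical and not taken from any driver; the only policy is the range check that keeps an XDP instance inside its own set of queues.

```c
#include <stdint.h>

/* Hypothetical: queues the driver dedicates to each XDP instance. */
#define XDP_QUEUES_PER_INSTANCE 3u

/*
 * Map the index R returned by the BPF program (0..2) for XDP instance N
 * to a device queue number using 3*N+R. Returns -1 when R falls outside
 * the instance's set, so a program cannot reach another instance's queues.
 */
static int xdp_index_to_queue(uint32_t instance, uint32_t ret_index)
{
	if (ret_index >= XDP_QUEUES_PER_INSTANCE)
		return -1;
	return (int)(XDP_QUEUES_PER_INSTANCE * instance + ret_index);
}
```

With this sketch, instance 2 returning index 1 lands on queue 7, and any index of 3 or more is rejected.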
We'd need a way to specify the priority queue from the bpf program.
Probably not as a first step though.
BPF_XDP_TX_PRIO 3 | upper bits used for prio
BPF_XDP_TX_PHYS_IFINDEX 4 | upper bits for ifindex
BPF_XDP_RX_NETDEV_IFINDEX 5 | upper bits for ifindex of veth or any netdev
The lower 8 bits should be enough to encode the action.
First merge-able step is to do 0,1,2 in one driver (like mlx4) and
start building it in other drivers.
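
The encoding proposed above (action in the lower 8 bits, prio or ifindex in the upper bits) might look roughly like the sketch below. The three action values are the ones listed in this mail; the helper names and the exact 8/24 bit split are assumptions made only for illustration.

```c
#include <stdint.h>

/* Action codes as listed in the mail (values 0..2 are not shown here). */
#define BPF_XDP_TX_PRIO            3u
#define BPF_XDP_TX_PHYS_IFINDEX    4u
#define BPF_XDP_RX_NETDEV_IFINDEX  5u

/* Assumed layout: action in the low 8 bits, argument in the upper 24. */
#define XDP_ACTION_BITS 8u
#define XDP_ACTION_MASK ((1u << XDP_ACTION_BITS) - 1u)

/* Pack an action and its argument (prio or ifindex) into one return value.
 * Arguments wider than 24 bits are truncated by the shift. */
static inline uint32_t xdp_ret_encode(uint32_t action, uint32_t arg)
{
	return (arg << XDP_ACTION_BITS) | (action & XDP_ACTION_MASK);
}

/* Driver side: extract the action code from the return value. */
static inline uint32_t xdp_ret_action(uint32_t ret)
{
	return ret & XDP_ACTION_MASK;
}

/* Driver side: extract the prio or ifindex carried in the upper bits. */
static inline uint32_t xdp_ret_arg(uint32_t ret)
{
	return ret >> XDP_ACTION_BITS;
}
```

One consequence of this layout is that an ifindex is limited to 24 bits, which is part of why a side channel for extra data may be attractive later.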
Can't this be done in a second step, with some per-cpu scratch data
as we have for redirect? That would seem easier to use to me, and easier
to extend with further data required to tx or rx to stack ... The return
code could have a flag to tell to look at the scratch data, for example.
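
A rough userspace sketch of the scratch-data idea, to show what I mean. Everything here is hypothetical: the flag bit, the struct fields, and a plain array indexed by CPU standing in for real per-cpu data.

```c
#include <stddef.h>
#include <stdint.h>

#define XDP_NR_CPUS     4u          /* hypothetical CPU count for this sketch */
#define XDP_ACTION_MASK 0xffu       /* action stays in the low 8 bits */
#define XDP_F_SCRATCH   (1u << 8)   /* hypothetical "look at scratch data" flag */

/* Extra tx/rx data that does not fit in the return code. */
struct xdp_scratch {
	uint32_t ifindex;
	uint32_t prio;
};

/* Stand-in for per-cpu data: one slot per CPU, no locking needed. */
static struct xdp_scratch xdp_scratch_area[XDP_NR_CPUS];

/* Program side: fill this CPU's slot and return the action with the flag set.
 * The caller must pass cpu < XDP_NR_CPUS. */
static uint32_t xdp_prog_set_tx(uint32_t cpu, uint32_t action,
				uint32_t ifindex, uint32_t prio)
{
	xdp_scratch_area[cpu].ifindex = ifindex;
	xdp_scratch_area[cpu].prio = prio;
	return (action & XDP_ACTION_MASK) | XDP_F_SCRATCH;
}

/* Driver side: return the scratch slot only if the flag asks for it. */
static const struct xdp_scratch *xdp_ret_scratch(uint32_t cpu, uint32_t ret)
{
	if (!(ret & XDP_F_SCRATCH))
		return NULL;
	return &xdp_scratch_area[cpu];
}
```

The struct can grow new fields without touching the return code format, which is the extensibility argument.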