On 05/05/2016 11:44 PM, Alexei Starovoitov via iovisor-dev wrote:
On Thu, May 05, 2016 at 01:19:37PM -0700, Tom Herbert wrote:
I think we're saying the same thing, just using different notation. The BPF program returns an index which the driver maps to a queue, but this index is relative to the XDP instance. So if a device offers 3 levels of priority queues, then the BPF program can return 0, 1, or 2. The driver can map this return value to a queue (probably from a set of three queues dedicated to the XDP instance). What I am saying is that this driver mapping should be trivial and should not implement any policy other than restricting the XDP instance to its own set -- e.g. the mapping to the actual queue number could be 3*N+R, where N is the instance # of the XDP instance and R is the returned index. Egress on a different interface can work the same way: for instance, index 0 might queue for the local interface and index 1 might queue for another interface. This simple return-value-to-queue mapping is a lot easier for crossing devices if they are managed by the same driver, I think.
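For concreteness, here is a minimal sketch of the trivial driver-side mapping described above, assuming three queues per XDP instance; xdp_ret_to_txq, xdp_instance_id and XDP_QUEUES_PER_INSTANCE are illustrative names, not an existing driver interface:

/* Hypothetical sketch: queue = 3*N + R, where N is the XDP instance
 * number and R the index returned by the BPF program.  The modulo keeps
 * the program restricted to its instance's dedicated queue set. */
#define XDP_QUEUES_PER_INSTANCE	3

static inline u16 xdp_ret_to_txq(u16 xdp_instance_id, u32 ret_index)
{
	return xdp_instance_id * XDP_QUEUES_PER_INSTANCE +
	       (ret_index % XDP_QUEUES_PER_INSTANCE);
}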
+1, we'd need a way to specify the priority queue from the BPF program. Probably not as a first step, though. Something like:

BPF_XDP_DROP              0
BPF_XDP_PASS              1
BPF_XDP_TX                2
BPF_XDP_TX_PRIO           3 | upper bits used for prio
BPF_XDP_TX_PHYS_IFINDEX   4 | upper bits for ifindex
BPF_XDP_RX_NETDEV_IFINDEX 5 | upper bits for ifindex of veth or any netdev

Using the lower 8 bits to encode the action should be enough. The first merge-able step is to do 0, 1, 2 in one driver (like mlx4) and start building it in other drivers.
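A rough sketch of that encoding, just to make the bit layout concrete; the action values are the ones listed above, but XDP_ACTION_MASK and the two helpers are my own illustrative names, not merged kernel API:

enum xdp_action {
	BPF_XDP_DROP			= 0,
	BPF_XDP_PASS			= 1,
	BPF_XDP_TX			= 2,
	BPF_XDP_TX_PRIO			= 3,	/* upper bits carry prio */
	BPF_XDP_TX_PHYS_IFINDEX		= 4,	/* upper bits carry ifindex */
	BPF_XDP_RX_NETDEV_IFINDEX	= 5,	/* upper bits carry ifindex of veth or any netdev */
};

#define XDP_ACTION_MASK	0xff	/* lower 8 bits encode the action */

/* Program side: e.g. "transmit on priority queue 2" would be xdp_tx_prio(2). */
static inline u32 xdp_tx_prio(u32 prio)
{
	return BPF_XDP_TX_PRIO | (prio << 8);
}

/* Driver side: split the verdict back into action and argument. */
static inline u8 xdp_decode(u32 ret, u32 *arg)
{
	*arg = ret >> 8;
	return ret & XDP_ACTION_MASK;
}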
Can't this be done in a second step, with some per-cpu scratch data like we have for redirect? That would seem easier to use to me, and easier to extend with further data required to TX, or to RX to the stack ... The return code could have a flag telling the driver to look at the scratch data, for example.
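Something along these lines, as a hedged sketch only; all the names here (struct xdp_scratch, bpf_xdp_tx_queue, XDP_FLAG_SCRATCH) are hypothetical, modelled on how bpf_redirect() stashes its target ifindex in a per-cpu struct today:

/* Hypothetical per-cpu scratch area the program fills via a helper,
 * so the return code itself stays small and extensible. */
struct xdp_scratch {
	u32 ifindex;	/* target netdev, 0 = same device */
	u32 queue;	/* priority / queue index */
};
static DEFINE_PER_CPU(struct xdp_scratch, xdp_scratch);

/* Hypothetical helper the XDP program calls before returning. */
BPF_CALL_2(bpf_xdp_tx_queue, u32, ifindex, u32, queue)
{
	struct xdp_scratch *s = this_cpu_ptr(&xdp_scratch);

	s->ifindex = ifindex;
	s->queue = queue;
	return 0;
}

/* The program would then return something like XDP_TX | XDP_FLAG_SCRATCH,
 * and the driver reads the per-cpu data to pick the queue or target netdev. */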