[PATCH RFC 0/3] AF_XDP support for OVS


William Tu
 

The patch series introduces AF_XDP support for OVS netdev.
AF_XDP is a new address family working together with eBPF.
In short, a socket with AF_XDP family can receive and send
packets from an eBPF/XDP program attached to the netdev.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst.

OVS has a couple of netdev types, e.g., system, tap, or
internal. The patch first adds a new netdev type called
"afxdp" and implements its configuration, packet reception,
and transmit functions. Since the AF_XDP socket (xsk)
operates in userspace, once ovs-vswitchd receives packets
from the xsk, the proposed architecture reuses the existing
userspace dpif-netdev datapath. As a result, most of
the packet processing happens in userspace instead of in the
Linux kernel.

Architecture
============
              _
             |   +-------------------+
             |   |    ovs-vswitchd   |<-->ovsdb-server
             |   +-------------------+
             |   |      ofproto      |<-->OpenFlow controllers
             |   +--------+-+--------+
             |   | netdev | |ofproto-|
   userspace |   +--------+ |  dpif  |
             |   | netdev | +--------+
             |   |provider| |  dpif  |
             |   +---||---+ +--------+
             |       ||     |  dpif- |
             |       ||     | netdev |
             |_      ||     +--------+
                     ||
              _  +---||-----+--------+
             |   | af_xdp prog +     |
      kernel |   |   xsk_map         |
             |_  +--------||---------+
                          ||
                       physical
                          NIC

To get started, create an OVS userspace bridge using dpif-netdev
by setting the datapath_type to netdev:
# ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Then attach a Linux netdev with type afxdp:
# ovs-vsctl add-port br0 afxdp-p0 -- \
set interface afxdp-p0 type="afxdp"

Most of the implementation follows the AF_XDP sample code
in the Linux kernel, under samples/bpf/xdpsock_user.c.

Configuration
=============
When a new afxdp netdev is added to OVS, the patch performs
the following configuration steps:
1) attach the afxdp program and map to the netdev (see bpf/xdp.h)
2) create an AF_XDP socket (XSK) for the netdev
3) allocate a virtually contiguous memory region, called umem, and
register this memory to the XSK
4) set up the rx/tx rings, and umem's fill/completion rings
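
As a rough illustration of steps 3) and 4), the sketch below models a
umem as one contiguous region split into fixed-size frames, and a fill
ring as a power-of-two ring with free-running producer and consumer
indices. It is a simplified userspace model and not the code in this
series: it makes no AF_XDP system calls, and every name and size in it
(umem_model, ring_model, FRAME_SIZE, NUM_FRAMES, RING_SIZE) is
hypothetical. A real umem would typically be page-aligned and
registered with the XSK, which plain calloc() does not do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAME_SIZE 2048u  /* Hypothetical frame size in bytes. */
#define NUM_FRAMES 64u    /* Hypothetical number of frames in the umem. */
#define RING_SIZE  32u    /* Hypothetical ring size; must be a power of 2. */

/* Model of a umem: one contiguous region split into fixed-size frames. */
struct umem_model {
    uint8_t *area;
    uint32_t n_frames;
};

/* Model of a single-producer/single-consumer ring of frame addresses,
 * such as the fill ring.  Indices run freely and are masked on access. */
struct ring_model {
    uint32_t prod;
    uint32_t cons;
    uint64_t addr[RING_SIZE];
};

/* Step 3 analogue: allocate the frame area.  Returns 0 on success. */
static int umem_model_init(struct umem_model *u)
{
    u->area = calloc(NUM_FRAMES, FRAME_SIZE);
    if (!u->area) {
        return -1;
    }
    u->n_frames = NUM_FRAMES;
    return 0;
}

static void umem_model_destroy(struct umem_model *u)
{
    free(u->area);
    u->area = NULL;
    u->n_frames = 0;
}

static uint32_t ring_free_entries(const struct ring_model *r)
{
    return RING_SIZE - (r->prod - r->cons);
}

/* Producer side: post one frame address.  Returns -1 if the ring is full. */
static int ring_produce(struct ring_model *r, uint64_t addr)
{
    if (ring_free_entries(r) == 0) {
        return -1;
    }
    r->addr[r->prod & (RING_SIZE - 1)] = addr;
    r->prod++;
    return 0;
}

/* Consumer side: take one frame address.  Returns -1 if the ring is empty. */
static int ring_consume(struct ring_model *r, uint64_t *addr)
{
    if (r->prod == r->cons) {
        return -1;
    }
    *addr = r->addr[r->cons & (RING_SIZE - 1)];
    r->cons++;
    return 0;
}

/* Step 4 analogue: post as many frames as fit into the fill ring.
 * A frame is identified by its byte offset into the umem area.
 * Returns the number of frames posted. */
static uint32_t fill_ring_populate(struct ring_model *fill,
                                   const struct umem_model *u)
{
    uint32_t i, n = 0;

    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    for (i = 0; i < u->n_frames; i++) {
        if (ring_produce(fill, (uint64_t) i * FRAME_SIZE) != 0) {
            break;
        }
        n++;
    }
    return n;
}
```

In this model the producer stops once RING_SIZE frames are posted, and
the consumer (the kernel, in the real setup) takes frame offsets back in
the order they were posted.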

Packet Flow
===========
Currently, the af_xdp program loaded to the netdev does nothing
but simply forwards the packet to queue id 0.

The patch simplifies the buffer/ring management by copying each
received packet from umem to OVS's internal buffer. Likewise, when
sending a packet out to another netdev, it copies the packet into
that netdev's umem.

An AF_XDP packet forwarded from one netdev (ovs-p0) to another
netdev (ovs-p1) goes through the following path:
1) xdp program at ovs-p0 copies the packet to the kernel (SKB_MODE)
2) kernel maps the packet to userspace umem
3) ovs-vswitchd receives the packet from ovs-p0 and copies it to an
internal packet buffer
4) ovs-vswitchd copies the packet to the umem of ovs-p1, kick_tx
5) kernel copies the packet from umem to the ovs-p1 tx queue
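
The two userspace copies in steps 3) and 4) can be sketched as follows.
This is a simplified model under stated assumptions, not the code in
this series: each port's umem is a plain array of frames, the ring
handling and kick_tx are left out, and all names (port_model, pkt_buf,
rx_copy, tx_copy, forward_model) and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048u  /* Hypothetical frame size in bytes. */
#define NUM_FRAMES 4u     /* Hypothetical number of frames per port. */

/* Model of one port: only its umem frames are represented. */
struct port_model {
    uint8_t umem[NUM_FRAMES][FRAME_SIZE];
};

/* Model of the internal packet buffer used by the datapath. */
struct pkt_buf {
    uint8_t data[FRAME_SIZE];
    size_t len;
};

/* Step 3: copy a received frame out of the rx port's umem.
 * Returns the number of copies made (1), or 0 on bad arguments. */
static int rx_copy(const struct port_model *rx, uint32_t frame, size_t len,
                   struct pkt_buf *pkt)
{
    if (frame >= NUM_FRAMES || len > FRAME_SIZE) {
        return 0;
    }
    memcpy(pkt->data, rx->umem[frame], len);
    pkt->len = len;
    return 1;
}

/* Step 4: copy the packet into a frame of the tx port's umem.
 * Returns the number of copies made (1), or 0 on bad arguments. */
static int tx_copy(struct port_model *tx, uint32_t frame,
                   const struct pkt_buf *pkt)
{
    if (frame >= NUM_FRAMES || pkt->len > FRAME_SIZE) {
        return 0;
    }
    memcpy(tx->umem[frame], pkt->data, pkt->len);
    return 1;
}

/* Forward one packet from p0 to p1 through the internal buffer.
 * Returns the number of userspace copies made. */
static int forward_model(const struct port_model *p0, uint32_t rx_frame,
                         size_t len, struct port_model *p1,
                         uint32_t tx_frame)
{
    struct pkt_buf pkt;
    int copies = rx_copy(p0, rx_frame, len, &pkt);

    if (copies == 0) {
        return 0;
    }
    copies += tx_copy(p1, tx_frame, &pkt);
    return copies;
}
```

In this model a successful forward_model() call reports two copies,
which are the two that happen inside ovs-vswitchd; the kernel-side
copies in steps 1) and 5) are outside its scope.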

This path makes 4 copies between two ports (steps 1, 3, 4, and 5),
so its performance is expected to be poor.
Hopefully, with AF_XDP zero-copy mode, copies 1) and 5) will
be removed, and in ovs-vswitchd it should be possible to combine
3) and 4) into a single copy. So the best case will be
one copy between two netdevs.

Test Framework
==============
# make check-afxdp
runs two end-to-end tests using veth pairs
and namespaces:

AFXDP netdev datapath-sanity
1: datapath - ping between two ports ok
2: datapath - http between two ports ok

The patch series is based on the ovs-ebpf implementation.
A copy is available at: https://github.com/williamtu/ovs-ebpf/
branch afxdp-v1

William Tu (3):
afxdp: add ebpf code for afxdp and xskmap.
netdev-linux: add new netdev type afxdp.
tests: add afxdp test cases.

acinclude.m4 | 1 +
bpf/api.h | 6 +
bpf/helpers.h | 2 +
bpf/maps.h | 12 +
bpf/xdp.h | 34 +-
lib/automake.mk | 3 +-
lib/bpf.c | 41 ++-
lib/bpf.h | 6 +-
lib/dpif-netdev.c | 74 +++-
lib/if_xdp.h | 79 +++++
lib/netdev-dummy.c | 1 +
lib/netdev-linux.c | 741 +++++++++++++++++++++++++++++++++++++++-
lib/netdev-provider.h | 2 +
lib/netdev-vport.c | 4 +
lib/netdev.c | 11 +
lib/netdev.h | 1 +
tests/automake.mk | 17 +
tests/ofproto-macros.at | 1 +
tests/system-afxdp-macros.at | 148 ++++++++
tests/system-afxdp-testsuite.at | 25 ++
tests/system-afxdp-traffic.at | 38 +++
vswitchd/bridge.c | 1 +
22 files changed, 1228 insertions(+), 20 deletions(-)
create mode 100644 lib/if_xdp.h
create mode 100644 tests/system-afxdp-macros.at
create mode 100644 tests/system-afxdp-testsuite.at
create mode 100644 tests/system-afxdp-traffic.at

--
2.7.4