[PATCHv3 RFC 0/3] AF_XDP netdev support for OVS


William Tu
 

This patch series introduces AF_XDP support for OVS netdev.
AF_XDP is a new address family that works together with eBPF.
In short, a socket of the AF_XDP family can receive and send
packets via an eBPF/XDP program attached to the netdev.
For more details about AF_XDP, see the Linux kernel's
Documentation/networking/af_xdp.rst.
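
As a rough illustration of the address family itself (not code from
this series), the sketch below only probes whether the running kernel
lets the caller open an AF_XDP socket. The helper name
afxdp_socket_available() is hypothetical, and the fallback value 44 for
AF_XDP is an assumption for libc headers that predate the definition in
linux/socket.h. A usable xsk additionally needs a registered umem, the
rx/tx/fill/completion rings, and a bind() to a netdev queue, none of
which is shown here.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_XDP
#define AF_XDP 44   /* Assumed fallback; matches linux/socket.h. */
#endif

/* Hypothetical helper: returns 1 if an AF_XDP socket can be opened,
 * 0 otherwise (old kernel, CONFIG_XDP_SOCKETS off, or missing
 * privileges such as CAP_NET_RAW). */
static int
afxdp_socket_available(void)
{
    int fd = socket(AF_XDP, SOCK_RAW, 0);

    if (fd < 0) {
        perror("socket(AF_XDP)");
        return 0;
    }
    close(fd);
    return 1;
}
```

Calling afxdp_socket_available() before configuring an afxdp port is
one way to fail early with a clear message on kernels without AF_XDP.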

OVS has several netdev types, e.g., system, tap, and
internal. The patch first adds a new netdev type called
"afxdp" and implements its configuration, packet reception,
and transmit functions. Since the AF_XDP socket (xsk)
operates in userspace, once ovs-vswitchd receives packets
from the xsk, the proposed architecture reuses the existing
userspace dpif-netdev datapath. As a result, most of
the packet processing happens in userspace instead of in
the Linux kernel.

Architecture
============
              _
             |   +-------------------+
             |   |    ovs-vswitchd   |<-->ovsdb-server
             |   +-------------------+
             |   |      ofproto      |<-->OpenFlow controllers
             |   +--------+-+--------+
             |   | netdev | |ofproto-|
   userspace |   +--------+ |  dpif  |
             |   | netdev | +--------+
             |   |provider| |  dpif  |
             |   +---||---+ +--------+
             |       ||     |  dpif- |
             |       ||     | netdev |
             |_      ||     +--------+
                     ||
              _  +---||-----+--------+
             |   | af_xdp prog +     |
      kernel |   |   xsk_map         |
             |_  +--------||---------+
                          ||
                       physical
                          NIC

To get started, create an OVS userspace bridge using dpif-netdev
by setting the datapath_type to netdev:
  # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Then attach a Linux netdev with type afxdp:
  # ovs-vsctl add-port br0 afxdp-p0 -- \
      set interface afxdp-p0 type="afxdp"

Documentation
=============
Most of the design details are described in the paper presented at
Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open
vSwitch"[1], section 4, and in the slides[2].
This patch uses a not-yet-upstreamed feature called XDP_ATTACH[3],
described in section 3.1, which is a built-in XDP program for AF_XDP.
This greatly simplifies the management of XDP/eBPF programs.

[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf

Test Cases
==========
Test cases are created using namespaces and veth pairs, with an AF_XDP
socket attached to the veth (hence SKB_MODE). Running "make check-afxdp"
reports the following:

AF_XDP netdev datapath-sanity

1: datapath - ping between two ports ok
2: datapath - http between two ports ok
3: datapath - ping between two ports on vlan ok
4: datapath - ping6 between two ports ok
5: datapath - ping6 between two ports on vlan ok
6: datapath - ping over vxlan tunnel ok
7: datapath - ping over vxlan6 tunnel ok
8: datapath - ping over gre tunnel ok
9: datapath - ping over erspan v1 tunnel ok
10: datapath - ping over erspan v2 tunnel ok
11: datapath - ping over ip6erspan v1 tunnel ok
12: datapath - ping over ip6erspan v2 tunnel ok
13: datapath - ping over geneve tunnel ok
14: datapath - ping over geneve6 tunnel ok
15: datapath - clone action ok
16: datapath - mpls actions ok
17: datapath - basic truncate action ok

conntrack

18: conntrack - controller ok
19: conntrack - force commit ok
20: conntrack - ct flush by 5-tuple ok
21: conntrack - IPv4 ping ok
22: conntrack - get_nconns and get/set_maxconns ok
23: conntrack - IPv6 ping ok
24: conntrack - preserve registers ok
25: conntrack - invalid ok
26: conntrack - zones ok
27: conntrack - zones from field ok
28: conntrack - multiple bridges ok
29: conntrack - multiple zones ok
30: conntrack - multiple namespaces, internal ports skipped (system-afxdp-traffic.at:1298)
31: conntrack - ct_mark ok
32: conntrack - ct_mark bit-fiddling ok

system-ovn

36: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
37: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
38: ovn -- multiple gateway routers, SNAT and DNAT ok
39: ovn -- load-balancing ok
40: ovn -- load-balancing - same subnet. ok
41: ovn -- load balancing in gateway router ok
42: ovn -- multiple gateway routers, load-balancing ok
43: ovn -- load balancing in router with gateway router port ok
44: ovn -- DNAT and SNAT on distributed router - N/S ok
45: ovn -- DNAT and SNAT on distributed router - E/W ok

---
v1->v2:
- add a list to maintain unused umem elements
- remove copy from rx umem to ovs internal buffer
- use hugetlb to reduce misses (not much difference)
- use pmd mode netdev in OVS (huge performance improvement)
- remove malloc dp_packet, instead put dp_packet in umem
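
For readers unfamiliar with the idea behind the unused-umem-element
list mentioned above, here is a minimal sketch, assuming a fixed-size
LIFO stack of frame offsets. It is not the data structure from the
patch: struct umem_pool and the umem_pool_*() functions are
hypothetical names, and locking between pmd threads is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical LIFO pool of unused umem frame offsets. */
struct umem_pool {
    uint64_t *addrs;    /* Stack of free frame offsets into the umem. */
    unsigned int size;  /* Capacity of the stack. */
    unsigned int top;   /* Number of free entries currently stored. */
};

/* Fills the pool with 'n_frames' offsets spaced 'frame_size' apart.
 * Returns 0 on success, -1 if allocation fails. */
static int
umem_pool_init(struct umem_pool *p, unsigned int n_frames,
               uint64_t frame_size)
{
    p->addrs = malloc(n_frames * sizeof *p->addrs);
    if (!p->addrs) {
        return -1;
    }
    p->size = n_frames;
    p->top = 0;
    for (unsigned int i = 0; i < n_frames; i++) {
        p->addrs[p->top++] = (uint64_t) i * frame_size;
    }
    return 0;
}

/* Pops one free frame offset.  Returns -1 if the pool is empty. */
static int
umem_pool_get(struct umem_pool *p, uint64_t *addr)
{
    if (p->top == 0) {
        return -1;
    }
    *addr = p->addrs[--p->top];
    return 0;
}

/* Returns a frame offset to the pool.  Returns -1 if the pool is full. */
static int
umem_pool_put(struct umem_pool *p, uint64_t addr)
{
    if (p->top == p->size) {
        return -1;
    }
    p->addrs[p->top++] = addr;
    return 0;
}

static void
umem_pool_destroy(struct umem_pool *p)
{
    free(p->addrs);
    p->addrs = NULL;
    p->size = p->top = 0;
}
```

In this sketch the rx path would call umem_pool_get() to refill the
fill ring, and the tx completion path would call umem_pool_put() once
the kernel is done with a frame. A LIFO order is chosen here only
because recently used frames are more likely to still be in cache.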

v2->v3:
- rebase on the OVS master, 7ab4b0653784
("configure: Check for more specific function to pull in pthread library.")
- remove the dependency on libbpf and dpif-bpf;
instead, use the built-in XDP_ATTACH feature
- data structure optimizations for better performance, see [1]
- support more test cases

William Tu (3):
netdev-afxdp: add new netdev type for AF_XDP
tests: add AF_XDP netdev test cases.
FIXME: work around the failed cases.

acinclude.m4 | 13 +
configure.ac | 1 +
lib/automake.mk | 6 +-
lib/dp-packet.c | 20 +
lib/dp-packet.h | 29 +-
lib/dpif-netdev.c | 2 +-
lib/netdev-afxdp.c | 703 ++++++++++++++++++
lib/netdev-afxdp.h | 41 ++
lib/netdev-linux.c | 72 +-
lib/netdev-provider.h | 1 +
lib/netdev.c | 1 +
lib/xdpsock.c | 171 +++++
lib/xdpsock.h | 144 ++++
tests/automake.mk | 17 +
tests/system-afxdp-macros.at | 155 ++++
tests/system-afxdp-testsuite.at | 26 +
tests/system-afxdp-traffic.at | 1541 +++++++++++++++++++++++++++++++++++++++
17 files changed, 2937 insertions(+), 6 deletions(-)
create mode 100644 lib/netdev-afxdp.c
create mode 100644 lib/netdev-afxdp.h
create mode 100644 lib/xdpsock.c
create mode 100644 lib/xdpsock.h
create mode 100644 tests/system-afxdp-macros.at
create mode 100644 tests/system-afxdp-testsuite.at
create mode 100644 tests/system-afxdp-traffic.at

--
2.7.4