[RFC][Proposal] BPF Control MAP


Saeed Mahameed <saeedm@...>
 

In this proposal I address the lack of a unified user API for accessing
and manipulating BPF system attributes. The proposal is generic and
will work on any BPF subsystem (eBPF attach points), but I will mostly
focus on XDP use cases.

Lately I started working on three different open XDP issues, namely
XDP statistics, XDP redirect and XDP meta-data. The details of these
issues are not really relevant for this proposal, but all of them share
one common problem: the lack of a unified user interface to manipulate
and access their attributes.

Examples:
1. Query XDP statistics.
2. XDP resource management: set up XDP-redirect TX resources.
3. Set up and query XDP-metadata (BTF data structure).

Jesper Brouer explains some of these issues in detail at:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org

Yes, I considered netlink, devlink, ethtool, sysctl, etc., but each one
of them has its own drawbacks: they are networking specific and will
not serve the general BPF purpose.

What we want is for all of the BPF related knobs to be present in the
BPF user tools: bcc, bpftool and libbpf. Ideally we don't want these
tools to integrate with every subsystem's UAPI, especially the wide
variety of networking UAPIs, and imagine what other subsystems are
going to be using ..

So what seems to be the right path here is a unified BPF
control/configuration user interface, which will hook the caller with
the targeted subsystem.

To be aligned with all existing BPF tools I am going to propose the use
of the BPF syscall (no, not a new BPF syscall command, I am not
planning to reinvent the wheel - "again" -).
What I am going to suggest is to use an already existing API which runs
on top of the BPF syscall: the BPF MAPs API, with just a small tweak.
Enter:



BPF control MAP:

A special map type, "BPF_MAP_TYPE_CONTROL". This map will not behave
like other maps in the sense of having a user defined data structure
behind it. We are going to use it just to hook the user with the
targeted underlying subsystem and delegate user commands to it through
map operations (create/update_elem/lookup_elem/etc ...).
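
To make the delegation idea a bit more concrete, here is a rough
user-space mock of how a control map could forward map operations to a
per-subsystem backend. This is only a sketch: the names ctrl_type,
ctrl_ops, ctrl_map and the toy XDP backend are all made up for
illustration and do not exist in the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem selectors (illustration only). */
enum ctrl_type { CTRL_XDP, CTRL_KPROBE, CTRL_MAX };

/* Hypothetical callbacks a control map would delegate to. */
struct ctrl_ops {
    int (*lookup)(const void *key, void *value);
    int (*update)(const void *key, const void *value);
};

/* Toy XDP backend: one "ring count" attribute per ifindex 0..3. */
#define XDP_NDEVS 4
static unsigned int xdp_rings[XDP_NDEVS];

static int xdp_lookup(const void *key, void *value)
{
    unsigned int ifindex = *(const unsigned int *)key;

    if (ifindex >= XDP_NDEVS)
        return -ENOENT;
    memcpy(value, &xdp_rings[ifindex], sizeof(xdp_rings[ifindex]));
    return 0;
}

static int xdp_update(const void *key, const void *value)
{
    unsigned int ifindex = *(const unsigned int *)key;

    if (ifindex >= XDP_NDEVS)
        return -ENOENT;
    memcpy(&xdp_rings[ifindex], value, sizeof(xdp_rings[ifindex]));
    return 0;
}

/* Only XDP registers a backend in this mock; KPROBE stays empty. */
static const struct ctrl_ops ctrl_backends[CTRL_MAX] = {
    [CTRL_XDP] = { .lookup = xdp_lookup, .update = xdp_update },
};

/* The control map holds no data, just a pointer to the backend. */
struct ctrl_map {
    const struct ctrl_ops *ops;
};

int ctrl_map_create(struct ctrl_map *map, enum ctrl_type type)
{
    if ((unsigned int)type >= CTRL_MAX || !ctrl_backends[type].lookup)
        return -EINVAL;
    map->ops = &ctrl_backends[type];
    return 0;
}

int ctrl_map_lookup_elem(const struct ctrl_map *map, const void *key,
                         void *value)
{
    return map->ops->lookup(key, value);
}

int ctrl_map_update_elem(const struct ctrl_map *map, const void *key,
                         const void *value)
{
    return map->ops->update(key, value);
}
```

The point of the sketch is that the map itself stores nothing: create
only selects a backend, and every later operation is handed to the
subsystem, which decides what the key and value mean.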



Requirements and implementation details:

1) Hook the user with the targeted subsystem:
- On map creation, the user selects the BPF_MAP_TYPE_CONTROL map type
and sets map_attr.ctrl_type to the subsystem they want to access and
manipulate (KPROBE/CGROUP/SOCKET_FILTER/XDP/etc..).

2) Set and Get operations of a specific BPF subsystem or an object in
that subsystem (for example a netdev in XDP).
- The user will use the file descriptor retrieved on map creation to
access (Set/Get) the BPF subsystem attributes via the map update_elem
and lookup_elem operations. The key will be the object id (for example
an ifindex, or just the type of configuration to access). Keys and
values are subsystem dependent.

3) Iterate through the different attributes/objects of the subsystem.
Use case: XDP BPF subsystem, get ALL netdevs' XDP attributes/statistics.
This can be easily achieved with bpf_map_get_next_key.
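
For requirement 3, a subsystem would have to provide its own
get_next_key. Below is a minimal user-space sketch of what that could
look like for XDP, assuming the keys are ifindexes and that a NULL key
means "start from the beginning" (as existing maps do). The netdev list
and the function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical netdevs with XDP attributes, sorted by ifindex. */
static const unsigned int xdp_ifindexes[] = { 2, 5, 7 };
#define XDP_NDEVS (sizeof(xdp_ifindexes) / sizeof(xdp_ifindexes[0]))

/*
 * key == NULL: return the first ifindex.
 * Otherwise:   return the smallest ifindex greater than *key.
 * Returns -ENOENT when there is nothing left to iterate.
 */
int xdp_ctrl_get_next_key(const unsigned int *key, unsigned int *next_key)
{
    size_t i;

    for (i = 0; i < XDP_NDEVS; i++) {
        if (!key || xdp_ifindexes[i] > *key) {
            *next_key = xdp_ifindexes[i];
            return 0;
        }
    }
    return -ENOENT;
}

/* Walk every netdev the way a user space tool would. */
size_t xdp_ctrl_count_devs(void)
{
    const unsigned int *prev = NULL;
    unsigned int cur, next;
    size_t n = 0;

    while (xdp_ctrl_get_next_key(prev, &next) == 0) {
        cur = next;     /* keep the current key in stable storage */
        prev = &cur;
        n++;
    }
    return n;
}
```

Returning the smallest ifindex greater than the given key (instead of
requiring the key to exist) is a design choice in this sketch: it lets
the walk continue even if a netdev disappears in the middle of the
iteration.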



Advantages & Motivation:
Why BPF MAP and not just a plain new BPF syscall command or any other
existing UAPI:

0) All BPF users love maps and are used to them. Simply put, everything
is a map: system objects can be keys and their attributes can be
values.

1) **BTF** integration: any map (key, value) pair can be described in
BTF at the kernel level and attached to the map the user creates. This
will be a huge advantage for user forward compatibility and for
development convenience: no need to copy kernel uapi headers on each
attribute set update, and ABI compatibility becomes simpler.
New values or attributes can be dumped/parsed in user space with zero
effort, no need to constantly update user space tools.

2) BPF maps already lay the groundwork for our requirements: the
infrastructure exists and has the semantics we are looking for
(set/get).

3) Already integrated in user-space tools and libraries such as
bcc/libbpf and friends. What is missing is just this small tweak (in
the kernel) to hook one special map type with the underlying BPF
subsystems.
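
To illustrate point 1 above, here is a toy example of why a
self-describing value helps. A simplified field table stands in for BTF
(real BTF is much richer, and this is not the libbpf BTF API); the
generic printer only knows the table, not the struct. The struct
xdp_stats_v2 and its fields are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for BTF member info: name + offset of a u64. */
struct field_desc {
    const char *name;
    size_t offset;
};

/* Hypothetical stats layout, known only to the "kernel" side. */
struct xdp_stats_v2 {
    uint64_t rx_packets;
    uint64_t rx_dropped;
    uint64_t tx_redirected;
};

/* Description shipped together with the map. */
const struct field_desc xdp_stats_desc[] = {
    { "rx_packets",    offsetof(struct xdp_stats_v2, rx_packets) },
    { "rx_dropped",    offsetof(struct xdp_stats_v2, rx_dropped) },
    { "tx_redirected", offsetof(struct xdp_stats_v2, tx_redirected) },
};

/*
 * Generic dumper: formats "name=value " for every described field.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int describe_value(const struct field_desc *desc, size_t nfields,
                   const void *value, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < nfields; i++) {
        uint64_t v;
        int n;

        memcpy(&v, (const char *)value + desc[i].offset, sizeof(v));
        n = snprintf(buf + used, len - used, "%s=%llu ",
                     desc[i].name, (unsigned long long)v);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

If the kernel later adds a fourth counter, only the description table
grows; describe_value() and the tool built on it stay unchanged.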

Thoughts ?


[Some EXTRAs]
Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_STATS);

prev = NULL; /* a NULL key starts from the first netdev */
while (bpf_map_get_next_key(xdp_ctrl, prev, &next_ifindex) == 0) {
    bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
    // we don't even need to know stats format in this case
    btf_pretty_print(xdp_ctrl->btf, &stats);
    ifindex = next_ifindex;
    prev = &ifindex;
}

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP
program).

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

/* xdp_attr is a struct here, so its address can be passed as the value */
xdp_attr.command = SETUP_REDIRECT;
xdp_attr.rings.num = 12;
xdp_attr.rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr, BPF_ANY);

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

xdp_attr.command = SETUP_MD;
xdp_attr.enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr, BPF_ANY);
if (err) {
    printf("XDP meta data is not supported on this netdev\n");
    return;
}
// Query meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;

Thanks,
Saeed.
