Re: [RFC][Proposal] BPF Control MAP

Alexei Starovoitov

On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:

Thoughts ?
It's certainly an interesting idea. I think we need to agree on use cases
and goals first before bikeshedding on the solution.

Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =

while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
        bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
        // we don't even need to know stats format in this case
        btf_pretty_print(xdp_ctrl->btf, &stats);
        ifindex = next_ifindex;
}

This bit showcases the advantage of BTF nicely.
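
For reference, the get_next_key/lookup walk above can be tried out in
plain userspace C. This is only a sketch under loud assumptions:
mock_get_next_key() and mock_lookup_elem() are hypothetical stand-ins
that loosely follow the key/next_key convention of
bpf_map_get_next_key(), struct mock_stats is an invented layout, and
nothing here talks to the kernel or to the proposed control map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-netdev stats. In the proposal the real layout
 * would be described by BTF rather than hard-coded like this. */
struct mock_stats {
	uint64_t rx_packets;
	uint64_t rx_drops;
};

struct mock_entry {
	uint32_t ifindex;
	struct mock_stats stats;
};

/* Mock table standing in for the kernel side, sorted by ifindex. */
static const struct mock_entry mock_table[] = {
	{ 2, { 100, 1 } },
	{ 3, { 250, 0 } },
	{ 7, {  40, 5 } },
};
#define MOCK_N (sizeof(mock_table) / sizeof(mock_table[0]))

/* A NULL key yields the first key. Returns 0 on success and -1 once
 * there are no more keys (simplified compared to the real syscall). */
static int mock_get_next_key(const uint32_t *key, uint32_t *next_key)
{
	assert(next_key != NULL);
	for (size_t i = 0; i < MOCK_N; i++) {
		if (key == NULL || mock_table[i].ifindex > *key) {
			*next_key = mock_table[i].ifindex;
			return 0;
		}
	}
	return -1;
}

/* Copies the stats for *key into *value. Returns -1 if not found. */
static int mock_lookup_elem(const uint32_t *key, struct mock_stats *value)
{
	assert(key != NULL && value != NULL);
	for (size_t i = 0; i < MOCK_N; i++) {
		if (mock_table[i].ifindex == *key) {
			*value = mock_table[i].stats;
			return 0;
		}
	}
	return -1;
}

/* Walks every entry, sums rx_packets into *total and returns the
 * number of netdevs visited. */
static size_t sum_rx_packets(uint64_t *total)
{
	uint32_t ifindex, next_ifindex;
	const uint32_t *cur = NULL;	/* NULL: start from the first key */
	struct mock_stats stats;
	size_t visited = 0;

	assert(total != NULL);
	*total = 0;
	while (mock_get_next_key(cur, &next_ifindex) == 0) {
		if (mock_lookup_elem(&next_ifindex, &stats) == 0) {
			*total += stats.rx_packets;
			visited++;
		}
		ifindex = next_ifindex;
		cur = &ifindex;
	}
	return visited;
}
```

Starting from a NULL key avoids having to pick an initial ifindex that
is guaranteed not to collide with a real one.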

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =

xdp_attr->command = SETUP_REDIRECT;
xdp_attr->rings.num = 12;
xdp_attr->rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
This one is starting to become a bit odd, since input arguments are split
between key and value. ifindex is part of the key whereas command,
rings.num and rings.size are part of the value.

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =

xdp_attr->command = SETUP_MD;
xdp_attr->enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
if (err) {
        printf("XDP meta data is not supported on this netdev");
}
// Query Meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;
Here it gets even weirder, since lookup arguments are also
split between key and value.
ifindex is inside the key while additional info is passed
inside xdp_attr, which is part of the value.

I wish we could do maps where every key would have different layout.
Then such api would be suitable.
The existing maps require all keys and values to be uniform.
I guess one can argue that such 'control map' can have one element
and one value, but then what is the purpose of 'map_create'
and 'map_delete==close(fd)' operations?

I think we need something else here. Using BTF to describe
output stats is nice, but using BTF to describe input query is problematic,
since the user cannot know beforehand what the kernel can and cannot accept.
imo input should be stable uapi with fixed constants whereas
stats-like output can be BTF based and vary from driver to driver
and from one nic version to another.
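
To make that split concrete, here is a rough userspace sketch, not a
proposed uapi. Everything in it is hypothetical: enum ctrl_cmd, struct
ctrl_query, struct field_desc (a very crude stand-in for BTF that only
knows u64 fields), and the made-up driver layout struct drv_a_stats.
The point is only that the input is a fixed struct with fixed
constants, while the output is an opaque blob plus a description that
a generic printer can walk without knowing the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed input side: hypothetical stable command constants. */
enum ctrl_cmd {
	CTRL_CMD_GET_STATS = 1,
};

struct ctrl_query {
	uint32_t ifindex;
	uint32_t cmd;
};

/* Crude stand-in for BTF: names one u64 field inside the output blob. */
struct field_desc {
	const char *name;
	size_t offset;
};

/* Driver-specific output layout. The caller never uses this type. */
struct drv_a_stats {
	uint64_t rx_packets;
	uint64_t xdp_drop;
};

static const struct field_desc drv_a_desc[] = {
	{ "rx_packets", offsetof(struct drv_a_stats, rx_packets) },
	{ "xdp_drop",   offsetof(struct drv_a_stats, xdp_drop) },
};

/* Mock "driver": rejects unknown commands, otherwise fills the opaque
 * blob and hands back its description. ifindex is ignored here. */
static int ctrl_handle(const struct ctrl_query *q, void *out, size_t out_len,
		       const struct field_desc **desc, size_t *n_desc)
{
	struct drv_a_stats s = { .rx_packets = 1000, .xdp_drop = 7 };

	assert(q != NULL && out != NULL && desc != NULL && n_desc != NULL);
	if (q->cmd != CTRL_CMD_GET_STATS || out_len < sizeof(s))
		return -1;
	memcpy(out, &s, sizeof(s));
	*desc = drv_a_desc;
	*n_desc = sizeof(drv_a_desc) / sizeof(drv_a_desc[0]);
	return 0;
}

/* Generic printer: formats the blob as name=value lines using only
 * the description. Returns -1 if buf is too small. */
static int ctrl_format(const void *blob, const struct field_desc *desc,
		       size_t n_desc, char *buf, size_t buf_len)
{
	size_t used = 0;

	assert(blob != NULL && buf != NULL && buf_len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n_desc; i++) {
		uint64_t v;
		int n;

		memcpy(&v, (const char *)blob + desc[i].offset, sizeof(v));
		n = snprintf(buf + used, buf_len - used, "%s=%llu\n",
			     desc[i].name, (unsigned long long)v);
		if (n < 0 || (size_t)n >= buf_len - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

A second driver could return a different struct and description and
ctrl_format() would not change, whereas adding a new input command
means adding a new constant to the fixed enum.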
