[RFC][Proposal] BPF Control MAP
Alexei Starovoitov
On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:
It's certainly an interesting idea. I think we need to agree on use cases and goals first before bikeshedding on the solution.

This bit showcases the advantage of BTF nicely.

> 2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP

This one is starting to become a bit odd, since input arguments are split between key and value. ifindex is part of the key whereas command, rings.num and rings.size are part of the value.

> 3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format

Here it gets even weirder, since lookup arguments are also split between key and value. ifindex is inside the key while additional info is passed inside xdp_attr, which is part of the value.

I wish we could do maps where every key would have a different layout. Then such an api would be suitable. The existing maps require all keys and values to be uniform. I guess one can argue that such a 'control map' can have one element and one value, but then what is the purpose of the 'map_create' and 'map_delete==close(fd)' operations?

I think we need something else here. Using BTF to describe output stats is nice, but using BTF to describe the input query is problematic, since the user cannot know beforehand what the kernel can and cannot accept. imo input should be stable uapi with fixed constants, whereas stats-like output can be BTF based and vary from driver to driver and from one nic version to another.
Alexei Starovoitov
On Tue, Mar 19, 2019 at 06:54:08PM +0000, Saeed Mahameed wrote:
In your examples above, does the netdev with the corresponding ifindex exist before the map is created? Does prog_fd exist? And the socket? In all cases, yes, they do. Hence creation of a 'map' (even a pseudo map) is an unnecessary step.

User space providing BTF for input and output makes little sense to me. What is the kernel supposed to do with it? In case of XDP stats (that could be different between drivers) the driver would provide a BTF to describe the stats it's collecting. So it's kernel-supplied BTF instead of user-supplied.

>> I think we need something else here. Using BTF to describe
> Well, we can decide to use static stable uapi, and still use this

For input queries - yes. See how 'union bpf_attr' works. It can accommodate any number of new commands, each with its own arguments, to query XDP stats. For output the driver can supply BTF back to user space along with a blob of data.

If I got your intent correctly, you want only BPF_MAP_TYPE_CONTROL to be processed by the generic bpf layer and everything else to be done in the driver? All commands, values, input/output to be driver specific?
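As an illustration of the 'union bpf_attr' pattern mentioned above (one union holding a separate argument struct per command, selected by a fixed command constant), here is a minimal user-space sketch. Every name in it (`toy_xdp_cmd`, `toy_xdp_attr`, `toy_netdev`, `toy_xdp_dispatch`) is hypothetical and not part of any kernel uapi; the per-device state is a plain struct standing in for whatever a driver would hold.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical command numbers: fixed constants, as a stable uapi would use. */
enum toy_xdp_cmd {
    TOY_XDP_SETUP_REDIRECT = 1,
    TOY_XDP_SETUP_MD       = 2,
};

/* One union, one argument struct per command (the 'union bpf_attr' style). */
union toy_xdp_attr {
    struct {                /* arguments of TOY_XDP_SETUP_REDIRECT */
        uint32_t ifindex;
        uint32_t rings_num;
        uint32_t rings_size;
    } redirect;
    struct {                /* arguments of TOY_XDP_SETUP_MD */
        uint32_t ifindex;
        uint32_t enable_md;
    } md;
};

/* Toy per-device state, standing in for real driver state. */
struct toy_netdev {
    uint32_t rings_num;
    uint32_t rings_size;
    uint32_t md_enabled;
};

/* Dispatch on the command; unknown commands are rejected with -EINVAL. */
int toy_xdp_dispatch(struct toy_netdev *dev, uint32_t cmd,
                     const union toy_xdp_attr *attr)
{
    assert(dev && attr);

    switch (cmd) {
    case TOY_XDP_SETUP_REDIRECT:
        if (attr->redirect.rings_num == 0)
            return -EINVAL;
        dev->rings_num = attr->redirect.rings_num;
        dev->rings_size = attr->redirect.rings_size;
        return 0;
    case TOY_XDP_SETUP_MD:
        dev->md_enabled = attr->md.enable_md ? 1 : 0;
        return 0;
    default:
        return -EINVAL;
    }
}
```

Adding a command in this style means adding one constant and one struct to the union; existing commands and their layouts stay untouched, which is the stability property argued for above.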
Alexei Starovoitov
On Wed, Mar 20, 2019 at 04:47:11AM +0000, Saeed Mahameed wrote:
Perfect. Let's agree on the use case first...

> $ bpftool xdp set eth0 xsk-ring-size 128

Would be great to have, no doubt.

> what this can eventually do:

and bpf_map_info returns btf_key_type_id and btf_value_type_id. Are you saying btf_key_type_id will return a single u32 with name 'ifindex'? Where do you specify netns_id+dev_id?

> // Find if control map values btf layout include the new member

What about the other fields in this value? If zero, do not update? Shouldn't it be read/modify/write? So lookup everything first, modify only that '*(value + offset)' and do map_update? And the driver receives the full blob of bytes for all fields. What is the driver supposed to do? Rewrite all fields? Or try to be smart and do read+if_not_the_same_write?

Different ifindexes will be different drivers, so a single map providing a single btf_value_type_id for all of them isn't going to work. I hope you see that such an api is falling apart even for the simplest use case.
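The read/modify/write concern above can be made concrete with a small user-space sketch. The "map" here is just a struct holding one value, and all names (`toy_xdp_value`, `toy_map`, `toy_set_u32_blind`, `toy_set_u32_rmw`) are hypothetical; the sketch only contrasts a blind whole-value update with a lookup-modify-update sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value layout for one netdev. */
struct toy_xdp_value {
    uint32_t rings_num;
    uint32_t rings_size;
    uint32_t xsk_ring_size;
};

/* Toy single-element "map": update always replaces the whole value. */
struct toy_map {
    struct toy_xdp_value value;
};

static void toy_lookup(const struct toy_map *m, struct toy_xdp_value *out)
{
    *out = m->value;
}

static void toy_update(struct toy_map *m, const struct toy_xdp_value *in)
{
    m->value = *in;
}

/* Blind update: only the target field is set, everything else is zero. */
void toy_set_u32_blind(struct toy_map *m, size_t offset, uint32_t v)
{
    struct toy_xdp_value value;

    assert(offset + sizeof(v) <= sizeof(value));
    memset(&value, 0, sizeof(value));
    memcpy((char *)&value + offset, &v, sizeof(v));
    toy_update(m, &value);
}

/* Read/modify/write: fetch the current value, patch one field, write back. */
void toy_set_u32_rmw(struct toy_map *m, size_t offset, uint32_t v)
{
    struct toy_xdp_value value;

    assert(offset + sizeof(v) <= sizeof(value));
    toy_lookup(m, &value);
    memcpy((char *)&value + offset, &v, sizeof(v));
    toy_update(m, &value);
}
```

With the blind variant the receiver cannot tell "set rings_num to 0" apart from "field not specified"; the read/modify/write variant avoids that but is not atomic, which is part of the objection raised above.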
Saeed Mahameed <saeedm@...>
On Tue, 2019-03-19 at 08:36 -0700, Alexei Starovoitov wrote:
> On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:
>> Thoughts ?
> It's certainly an interesting idea. I think we need to agree on use

Sure, we will discuss most of the use cases tomorrow in the iovisor meeting. The idea is that we need to keep map semantics (object -> value).

>> Example use cases (XDP only for now):
> this bit show cases advantage of BTF nicely.

In this case (map_attr.ctrl_type = XDP_ATTR) the object (key) is the if_index and the value is the xdp attributes of that if_index, which plays nicely when you want to iterate through all the objects (XDP netdevs). Simply think of ifindex as a key and not an input value. Different ctrl map types can have different keys and values, hence the need for map_create and for passing the map_attr.ctrl_type attribute, which defines what the user is trying to access (which control map) and what the key (object) and value (configuration) pair layout will be.

>> 3) Turn On/Off XDP meta data offloads and retrieve meta data BTF
> here it gets even weirder, since lookup arguments are also

This is exactly the purpose of map_{create/delete}: to define what the key/value format will be (it is not going to be ifindex for all control maps). To make it clear, the key doesn't have to be an ifindex at all; it depends on the map_attr.ctrl_type which the user requests on map creation, so different layouts are already supported by this proposal.

Examples of different map_attr.ctrl_type:

    fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)
    // Key layout == ifindex, value format if_xdp_attributes

    fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = BPF_PROG_STATS)
    // Key layout == prog_fd, value format struct bpf_prog_stats

    fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = BPF_SOCK_ATTR)
    // Key layout == socket_fd, value layout struct bpf_sock_attr

Extending this can be done in any linux BPF subsystem, by introducing new map_attr.ctrl_types and new key/value layouts for that control type.
On map creation we will attach the btf format of the (key, value) pair to the map created for that user.

> I think we need something else here. Using BTF to describe

Well, we can decide to use static stable uapi, and still use this special map to leverage the map API as described in this doc. But we can also allow dynamic value layouts, where any extension must be appended to the end of the value structure, and on lookup we only copy the size that the user already recognizes .. ? Or we can assume/force the user to use the map attributes to figure out format layouts and sizes, while still guaranteeing backward compatibility at the kernel level by keeping the old format the same and allowing extensions only at the end of the value structures.
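The "extend only at the end, copy only the size the user recognizes" rule described above could look roughly like the following user-space sketch. The two struct versions and the helper `toy_copy_value_to_user` are hypothetical names. Zero-filling the tail when the caller's buffer is larger than the kernel's value is an added assumption of this sketch, not something stated in the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old value layout, as known by an older user-space program. */
struct toy_xdp_attr_v1 {
    uint32_t rings_num;
    uint32_t rings_size;
};

/* Newer layout: identical prefix, new member appended at the end only. */
struct toy_xdp_attr_v2 {
    uint32_t rings_num;
    uint32_t rings_size;
    uint32_t xsk_ring_size;
};

/*
 * Copy at most user_size bytes of the kernel-side value into the caller's
 * buffer. If the caller's buffer is larger than the kernel value, the
 * remaining bytes are zeroed (assumption of this sketch).
 * Returns the number of bytes taken from the kernel-side value.
 */
size_t toy_copy_value_to_user(void *user_buf, size_t user_size,
                              const void *kernel_val, size_t kernel_size)
{
    size_t n = user_size < kernel_size ? user_size : kernel_size;

    assert(user_buf && kernel_val);
    memcpy(user_buf, kernel_val, n);
    if (user_size > n)
        memset((char *)user_buf + n, 0, user_size - n);
    return n;
}
```

Because the prefix never changes, an old program built against the v1 layout still reads correct values from a newer kernel; it simply never sees the appended members.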
Saeed Mahameed <saeedm@...>
On Tue, 2019-03-19 at 20:04 -0700, Alexei Starovoitov wrote:
> On Tue, Mar 19, 2019 at 06:54:08PM +0000, Saeed Mahameed wrote:
>> This is exactly the purpose of map_{create/delete} to define what
> In your examples above does netdev with corresponding ifindex

OK, I think I wasn't clear enough, let me retry.

Map creation has nothing to do with ifindex or the object the user is trying to access. On map creation the user defines which map sub type (control type) to create/access, and the kernel's job is to set up the map attributes and connect the map to the subsystem that is going to handle its operations (lookup and update). By subsystem I mean, for example, the XDP subsystem (the net/core layer and not the device driver).

There will be no direct connection to the device driver. Map operations that go directly to device drivers are something we should avoid; the whole idea here is to provide a standard, sustainable uapi and not device specific key/values that will get out of control.

The BTF layout for the key/value pair is going to be defined and assigned by kernel middle layers and NOT by user space nor device drivers. Each subsystem that wants to support new control MAP sub types should pre-define its own BTF layout. On map creation the user can query the BTF key/value layouts and any other map attributes.

For example, to enable the user to access XDP attributes, we will add a kernel BPF layer under net/core/xdp_ctrl.c which will handle all map operations coming in from maps that are created via:

    create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)

Other subsystems/layers (e.g. sockets/kprobe/cgroups/tracepoints) can define their own map_attr.ctrl_type and implement their own mid-layer which will handle control map operations of that sub type.

So you don't create a map to get connected to a specific netdev; you create a map to get a connection to a specific kernel BPF subsystem and query/modify that subsystem's objects (in the XDP case, netdevs).
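A rough user-space sketch of the mid-layer idea described above: the control type chosen at map creation selects a table of operations owned by a subsystem, and later lookup/update calls are delegated to it. All names here (`toy_ctrl_type`, `toy_ctrl_ops`, `toy_ctrl_map_create`, and so on) are hypothetical, keys and values are simplified to `uint32_t`, and the "XDP subsystem" is a small array indexed by ifindex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_ctrl_type {
    TOY_CTRL_XDP_ATTR,
    TOY_CTRL_PROG_STATS,    /* declared, but no handler registered below */
    TOY_CTRL_TYPE_MAX,
};

/* Operations a subsystem mid-layer provides for its control type. */
struct toy_ctrl_ops {
    int (*lookup)(uint32_t key, uint32_t *value);
    int (*update)(uint32_t key, uint32_t value);
};

/* Toy "XDP subsystem": one attribute per ifindex. */
#define TOY_MAX_IFINDEX 4
static uint32_t toy_xsk_ring_size[TOY_MAX_IFINDEX];

static int toy_xdp_lookup(uint32_t ifindex, uint32_t *value)
{
    if (ifindex >= TOY_MAX_IFINDEX)
        return -ENOENT;
    *value = toy_xsk_ring_size[ifindex];
    return 0;
}

static int toy_xdp_update(uint32_t ifindex, uint32_t value)
{
    if (ifindex >= TOY_MAX_IFINDEX)
        return -ENOENT;
    toy_xsk_ring_size[ifindex] = value;
    return 0;
}

static const struct toy_ctrl_ops toy_xdp_ops = {
    .lookup = toy_xdp_lookup,
    .update = toy_xdp_update,
};

/* Registry: control type -> subsystem handler (NULL if unsupported). */
static const struct toy_ctrl_ops *const toy_ops_table[TOY_CTRL_TYPE_MAX] = {
    [TOY_CTRL_XDP_ATTR] = &toy_xdp_ops,
};

struct toy_ctrl_map {
    const struct toy_ctrl_ops *ops;
};

/* "Map creation" only binds the map to a subsystem, not to any object. */
int toy_ctrl_map_create(struct toy_ctrl_map *map, unsigned int type)
{
    assert(map);
    if (type >= TOY_CTRL_TYPE_MAX || !toy_ops_table[type])
        return -EINVAL;
    map->ops = toy_ops_table[type];
    return 0;
}

int toy_ctrl_map_lookup(const struct toy_ctrl_map *map, uint32_t key,
                        uint32_t *value)
{
    return map->ops->lookup(key, value);
}

int toy_ctrl_map_update(const struct toy_ctrl_map *map, uint32_t key,
                        uint32_t value)
{
    return map->ops->update(key, value);
}
```

The sketch binds the map to a subsystem at creation time and resolves the object (here the ifindex) only on each lookup or update, which is the distinction drawn in the paragraphs above.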
The user can create multiple maps with different map_attr.ctrl_type values to access different system attributes.

>>> I think we need something else here. Using BTF to describe
>> Well, we can decide to use static stable uapi, and still use this
> for input queries - yes. See how 'union bpf_attr' works.

Hmm, no, I want the kernel core code to process everything. Different subsystems (not device drivers) can provide their own BPF_MAP_TYPE_CONTROL sub types and provide the map operation implementation. As a backend, such a mid-layer subsystem can use whatever interface it likes to access objects (netdev ndos in our case, and it could be via union bpf_attr); this is implementation specific and transparent to the user.

Bottom line: we don't want drivers/users to create BTF layouts. It is the kernel mid layer's job to connect the two via the control MAPs, and this mid layer will audit drivers and users.

I know we can achieve the same thing with union bpf_attr and use plain ethtool or netlink, but this means that we need to integrate ethtool/netlink APIs into libbpf. The whole beauty of this proposal is that a BPF developer will have to deal only with MAP operations, even if he wants to change system attributes.

One extra benefit of this is forward compatibility: a simple user space program can access new kernel attributes without the need to modify the user space program. Consider the following command line, with a bpftool that was compiled way before the xsk-ring-size attribute was introduced:

    $ bpftool xdp set eth0 xsk-ring-size 128

what this can eventually do:

    fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)

    // Query BTF of the control (XDP) map
    query_map_attributes(fd, &map_attr);

    // Find if control map values btf layout include the new member
    offset = btf_get_offset_of_member_name(map_attr.btf, "xsk-ring-size", BTF_TYPE_INT);
    if (offset < 0)
        return offset; // the new member is not supported in kernel

    *(value + offset) = (int)new_val;
    map_update(fd, <eth0 ifindex>, value);
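The name-to-offset lookup in the pseudocode above can be sketched in user space as follows. This is not the BTF format or any libbpf API; `toy_member`, `toy_value_layout` and `toy_offset_of_member` are hypothetical names, and the layout table is a hand-written stand-in for what a kernel-supplied type description would provide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One described member of a value structure (stand-in for a BTF member). */
struct toy_member {
    const char *name;
    size_t offset;      /* byte offset inside the value */
    size_t size;        /* byte size of the member */
};

/* Description of a whole value layout. */
struct toy_value_layout {
    const struct toy_member *members;
    size_t n_members;
};

/*
 * Find a member by name and expected size.
 * Returns its byte offset, or -ENOENT if no member matches.
 */
long toy_offset_of_member(const struct toy_value_layout *layout,
                          const char *name, size_t size)
{
    size_t i;

    assert(layout && name);
    for (i = 0; i < layout->n_members; i++) {
        const struct toy_member *m = &layout->members[i];

        if (strcmp(m->name, name) == 0 && m->size == size)
            return (long)m->offset;
    }
    return -ENOENT;
}
```

A tool built before a member existed only needs the member's name on its command line; whether the running kernel supports it is decided by whether the lookup succeeds.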
Toke Høiland-Jørgensen <toke@...>
Saeed Mahameed <saeedm@...> writes:
> In this proposal I am going to address the lack of a unified user API

The one concern I have with this is that it makes XDP configuration different from regular networking configuration. One of the compelling features of XDP is that it is less surprising than kernel offloads, because you can interface with it using the regular kernel tooling. This is less the case if we're doing a BPF-specific thing...

Or to put it another way, in my mind XDP is a networking technology that happens to use eBPF, more than it is an eBPF usage that happens to process packets; and I think it would make more sense for the userspace tooling to reflect this.

That being said, I do agree that there are some cool ideas in your example, such as using BTF to express the statistics format, and the automatic enumeration of objects.

-Toke
Saeed Mahameed <saeedm@...>
In this proposal I am going to address the lack of a unified user API for accessing and manipulating BPF system attributes. While this proposal is generic and will work on any BPF subsystem (eBPF attach points), I will mostly focus on XDP use cases.

Lately I started working on three different XDP open issues, namely XDP statistics, XDP redirect and XDP meta-data. While the details of these issues are not really relevant for the sake of this proposal, all of them share one common problem: the lack of a unified user interface to manipulate and access their attributes.

Examples:
1. Query XDP statistics.
2. XDP resource management, Setup XDP-redirect TX resources.
3. Setup and query XDP-metadata - (BTF data structure).

Jesper Brouer explains some of these issues in detail at:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org

Yes, I considered netlink, devlink, ethtool, sysctl, etc., but each one of them has its own drawbacks: they are networking specific and will not serve the general BPF purpose. What we want is for all of the BPF related knobs to be present in the BPF user tools: bcc, bpftool and libbpf. Ideally we don't want these tools to integrate with all the different subsystems' UAPIs, especially the wide variety of networking UAPIs, and imagine what other subsystems are going to be using ..

So what seems to be the right path here is a unified BPF control/configuration user interface, which will hook the caller with the targeted subsystem.

To be aligned with all existing BPF tools I am going to propose the use of the BPF syscall (No, not a new BPF syscall command, I am not planning to reinvent the wheel - "again" -). What I am going to suggest is to use an already existing API which runs on top of the BPF syscall, the BPF MAPs API, with just a small tweak.
Enter: BPF control MAP:
A special type of MAP, "BPF_MAP_TYPE_CONTROL". This map will not behave like other maps in the sense of having a user defined data structure behind it; we are going to use it just to hook the user with the targeted underlying subsystem and delegate user commands to it through map operations (create/update_elem/lookup_elem/etc ...)

Requirements and implementation details:

1) Hook the user with the targeted subsystem:
   - On create map, the user selects the BPF_MAP_TYPE_CONTROL map type and sets map_attr.ctrl_type to be the subsystem he wants to access and manipulate (KPROBE/CGROUP/SOCKET_FILTER/XDP/etc..).

2) Set and Get operations of a specific BPF subsystem or an object in that subsystem (for example a netdev in XDP).
   - The user will use the file descriptor retrieved on map creation to access (Set/Get) the BPF subsystem attributes via the map update_elem and lookup_elem operations. The key will be the object id (example: ifindex, or just the type of configuration to access); keys and values are subsystem dependent.

3) Iterate through the different attributes/objects of the subsystem.
   Use case: XDP BPF subsystem, get ALL netdevs XDP attributes/statistics. This can be easily achieved with bpf_map_get_next_key.

Advantages & Motivation:
Why BPF MAP and not just a plain new BPF syscall command or any other existing UAPI:

0) All BPF users love maps and got used to them, and simply, everything is a map: system objects can be keys and their attributes can be values.

1) **BTF** integration: any map (key, value) pair can be described in BTF at the kernel level and can be attached to the map the user creates. This will be a huge advantage for user forward compatibility, and for development convenience (no need to copy kernel uapi headers on each attribute set update), and it simplifies ABI compatibility. New values or attributes can be dumped/parsed in user space with zero effort, with no need to constantly update user space tools.
2) BPF maps already laid the groundwork for our requirements as the infrastructure, and have the semantics that we are looking for (set/get).

3) Already integrated in user-space tools and libraries such as bcc/libbpf and friends; what is missing is just this small tweak (in the kernel) to hook one special map type with the underlying BPF subsystems.

Thoughts ?

[Some EXTRAs]

Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

    xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_STATS);

    while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
        bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
        // we don't even need to know stats format in this case
        btf_pretty_print(xdp_ctrl->btf, &stats);
        ifindex = next_ifindex;
    }

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP program).

    xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR);

    xdp_attr.command = SETUP_REDIRECT;
    xdp_attr.rings.num = 12;
    xdp_attr.rings.size = 128;
    bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format of specific netdev/hardware:

    xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR);

    xdp_attr.command = SETUP_MD;
    xdp_attr.enable_md = 1;
    err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
    if (err) {
        printf("XDP meta data is not supported on this netdev");
        return;
    }

    // Query Meta data BTF
    bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
    md_btf = xdp_attr.md_btf;

Thanks,
Saeed.