
[RFC][Proposal] BPF Control MAP

Saeed Mahameed <saeedm@...>
 

In this proposal I am going to address the lack of a unified user API
for accessing and manipulating BPF system attributes. While this
proposal is generic and will work on any BPF subsystem (eBPF attach
points), I will mostly focus on XDP use cases.

Lately I started working on three different open XDP issues, namely
XDP statistics, XDP redirect and XDP meta-data. While the details of
these issues are not really relevant for the sake of this proposal, all
of them share one common problem: the lack of a unified user interface to
manipulate and access their attributes.

Examples:
1. Query XDP statistics.
2. XDP resource management, Setup XDP-redirect TX resources.
3. Setup and query XDP-metadata - (BTF data structure).

Jesper Brouer explains some of these issues in detail at:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org

Yes, I considered netlink, devlink, ethtool, sysctl, etc., but each
one of them has its own drawbacks: they are networking specific and
will not serve the general BPF purpose.

What we want is for all of the BPF related knobs to be present in the BPF
user tools: bcc, bpftool and libbpf. Ideally we don't want these tools to
integrate with every subsystem's UAPI, especially the wide
variety of networking UAPIs, and imagine what other subsystems are
going to be using ..

So what seems to be the right path here is a unified BPF
control/configuration user interface, which will hook the caller with
the targeted subsystem.

To be aligned with all existing BPF tools I am going to propose the use
of the BPF syscall (no, not a new BPF syscall command, I am not planning to
reinvent the wheel - "again" -).
What I am going to suggest is to use an already existing API which runs
on top of the BPF syscall, the BPF MAPs API, with just a small tweak. Enter:



BPF control MAP:

A special type of MAP, "BPF_MAP_TYPE_CONTROL". This map will not behave
like other maps in the sense of having a user defined data structure
behind it; we are going to use it just to hook the user with the
targeted underlying subsystem and delegate user commands to it through
map operations (create/update_elem/lookup_elem/etc ...)



Requirements and implementation details:

1) Hook the user with the targeted subsystem:
- On map create, the user selects the BPF_MAP_TYPE_CONTROL map type and
sets map_attr.ctrl_type to the subsystem they want to access and
manipulate (KPROBE/CGROUP/SOCKET_FILTER/XDP/etc..).

2) Set and Get operations of a specific BPF subsystem or an object in
that subsystem (for example a netdev in XDP).
- The user will use the file descriptor retrieved on map creation to access
(Set/Get) the BPF subsystem attributes via map update_elem and
lookup_elem operations. The key will be the object id (example:
ifindex, or just the type of configuration to access); keys and values
are subsystem dependent.

3) Iterate through the different attributes/objects of the subsystem.
Use case: XDP BPF subsystem, get ALL netdevs XDP attributes/statistics.
This can be easily achieved with: bpf_map_get_next_key.
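
To make the three requirements above a bit more concrete, here is a minimal user-space sketch in C of how a control map could delegate operations to a subsystem. It is a mock, not kernel code: enum ctrl_type, struct ctrl_ops, struct xdp_attr, ctrl_map_create() and the two fake netdevs are hypothetical names invented for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical control types; none of these names exist in the kernel UAPI. */
enum ctrl_type { CTRL_XDP_ATTR, CTRL_TYPE_MAX };

/* Per-subsystem handlers that a control map would delegate to. */
struct ctrl_ops {
    size_t key_size;
    size_t value_size;
    int (*lookup)(const void *key, void *value);
    int (*update)(const void *key, const void *value);
};

/* Mock XDP backend: two fake netdevs, reachable as ifindex 1 and 2. */
struct xdp_attr { unsigned int rings_num; unsigned int rings_size; };
static struct xdp_attr xdp_devs[2] = { { 4, 64 }, { 8, 256 } };

static struct xdp_attr *xdp_find(const void *key)
{
    unsigned int ifindex;

    memcpy(&ifindex, key, sizeof(ifindex));
    if (ifindex < 1 || ifindex > 2)
        return NULL;            /* no such netdev in this mock */
    return &xdp_devs[ifindex - 1];
}

static int xdp_lookup(const void *key, void *value)
{
    struct xdp_attr *a = xdp_find(key);

    if (!a)
        return -1;
    memcpy(value, a, sizeof(*a));
    return 0;
}

static int xdp_update(const void *key, const void *value)
{
    struct xdp_attr *a = xdp_find(key);

    if (!a)
        return -1;
    memcpy(a, value, sizeof(*a));
    return 0;
}

/* One entry per control type: key/value sizes plus the handlers. */
static const struct ctrl_ops ctrl_table[CTRL_TYPE_MAX] = {
    [CTRL_XDP_ATTR] = { sizeof(unsigned int), sizeof(struct xdp_attr),
                        xdp_lookup, xdp_update },
};

/* "Map create": select the handlers by ctrl_type, NULL if unknown. */
static const struct ctrl_ops *ctrl_map_create(int type)
{
    if (type < 0 || type >= CTRL_TYPE_MAX)
        return NULL;
    return &ctrl_table[type];
}
```

In this sketch the object returned by ctrl_map_create() plays the role of the map fd: the caller only ever sees lookup and update, and the subsystem behind it decides what a key and a value mean.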



Advantages & Motivation:
Why BPF MAP and not just a plain new BPF syscall command or any other
existing UAPI:

0) All BPF users love maps and got used to them, and simply, everything
is a map, system objects can be keys and their attributes can be
values.

1) **BTF** integration: any map (key, value) pair can be described in
BTF at the kernel level and attached to the map the user creates.
This will be a huge advantage for user forward compatibility, for
development convenience (no need to copy kernel uapi headers on each
attribute set update), and for simpler ABI compatibility.
New values or attributes can be dumped/parsed in user space with zero
effort, no need to constantly update user space tools.

2) BPF maps have already laid the groundwork for our requirements: the
infrastructure exists and has the semantics that we are looking for (set/get).

3) Already integrated in user-space tools and libraries such as
bcc/libbpf and friends; what is missing is just this small tweak (in
the kernel) to hook one special map type with the underlying BPF
subsystems.

Thoughts ?


[Some EXTRAs]
Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_STATS);

while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
        bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
        // we don't even need to know stats format in this case
        btf_pretty_print(xdp_ctrl->btf, &stats);
        ifindex = next_ifindex;
}
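
The loop above relies on the usual get_next_key contract: start from nothing, ask for the next key, stop when the call fails. As a rough illustration, here is a self-contained mock in C. mock_get_next_key(), walk_all() and the ifindex list are made up for this sketch, and the mock hands keys back in ascending order, which real maps do not promise.

```c
#include <stddef.h>

/* Mock set of netdevs with XDP attached, sorted by ifindex (made-up data). */
static const unsigned int xdp_ifindexes[] = { 2, 5, 9 };
#define N_DEVS (sizeof(xdp_ifindexes) / sizeof(xdp_ifindexes[0]))

/*
 * Mimics the get_next_key contract: a NULL key yields the first key,
 * otherwise the smallest key greater than *key.
 * Returns 0 on success and -1 once there are no more keys.
 */
static int mock_get_next_key(const unsigned int *key, unsigned int *next_key)
{
    for (size_t i = 0; i < N_DEVS; i++) {
        if (!key || xdp_ifindexes[i] > *key) {
            *next_key = xdp_ifindexes[i];
            return 0;
        }
    }
    return -1;
}

/* Walk every key, storing at most cap of them; returns how many were seen. */
static size_t walk_all(unsigned int *out, size_t cap)
{
    unsigned int key, next;
    const unsigned int *cur = NULL;     /* NULL: start from the first key */
    size_t n = 0;

    while (n < cap && mock_get_next_key(cur, &next) == 0) {
        out[n++] = next;
        key = next;                     /* advance the cursor */
        cur = &key;
    }
    return n;
}
```

Starting with a NULL cursor avoids having to pick a "before the first" ifindex value, which the pseudo code above leaves unspecified.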

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP
program).

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR);

xdp_attr.command = SETUP_REDIRECT;
xdp_attr.rings.num = 12;
xdp_attr.rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR);

xdp_attr.command = SETUP_MD;
xdp_attr.enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
if (err) {
        printf("XDP meta data is not supported on this netdev\n");
        return;
}
// Query Meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;

Thanks,
Saeed.


Re: [RFC][Proposal] BPF Control MAP

Toke Høiland-Jørgensen <toke@...>
 

Saeed Mahameed <saeedm@...> writes:

In this proposal I am going to address the lack of a unified user API
for accessing and manipulating BPF system attributes, while this
proposal is generic and will work on any BPF subsystem (eBPF attach
points), I will mostly focus on XDP use cases.

So lately I started working on three different XDP open issues, namely
XDP statistic, XDP redirect and XDP meta-data, while the details of
these issues are not really relevant for the sake of this proposal, all
of them share one common problem: the lack of unified user interface to
manipulate and access their attributes.

Examples:
1. Query XDP statistics.
2. XDP resource management, Setup XDP-redirect TX resources.
3. Setup and query XDP-metadata - (BTF data structure).

Jesper Brouer, explains some of these issues in details at:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org

Yes I considered, netlink, devlink, ethtool, sysctrl, etc .. but each
one of them has it's own drawback, they are networking specific and
will not serve the BPF general purpose.

The one concern I have with this is that it makes XDP configuration
different from regular networking configuration. One of the compelling
features of XDP is that it is less surprising than kernel offloads,
because you can interface with it using the regular kernel tooling.
This is less the case if we're doing a BPF-specific thing...

Or to put it another way, in my mind XDP is a networking technology that
happens to use eBPF, more than it is an eBPF usage that happens to
process packets; and I think it would make more sense for the userspace
tooling to reflect this.

That being said, I do agree that there are some cool ideas in your
example, such as using BTF to express the statistics format, and the
automatic enumeration of objects.

-Toke


Re: [RFC][Proposal] BPF Control MAP

Saeed Mahameed <saeedm@...>
 

On Tue, 2019-03-19 at 20:04 -0700, Alexei Starovoitov wrote:
On Tue, Mar 19, 2019 at 06:54:08PM +0000, Saeed Mahameed wrote:
This is exactly the purpose of map_{create/delete}: to define what the
key,value format will be (it is not going to be ifindex for all
control maps). To make it clear, the key doesn't have to be an ifindex
at all; it depends on the map_attr.ctrl_type which the user requests on
map creation, so different layouts are already supported by this
proposal.

examples of different map_attr.ctrl_type:

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)
// Key layout == ifindex, value format if_xdp_attributes

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_PROG_STATS)
// Key layout == prog_fd, value format struct bpf_prog_stats

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_SOCK_ATTR)
// Key layout == socket_fd, value layout struct bpf_sock_attr

extending this can be done in any linux BPF subsystem, by introducing
new map_attr.ctrl_types and new key value layouts of that control type.

on map creation we will attach the btf format of the (key, value) pair
to the map created for that user.
In your examples above does netdev with corresponding ifindex
exist before map is created?
Does prog_fd exist ? and socket?
In all cases yes. they do. Hence creation of 'map' (even pseudo map)
is an unnecessary step.

User space providing BTF for input and output makes little sense to
me.
What kernel suppose to do with it?

In case of XDP stats (that could be different between drivers)
the driver would provide a BTF to describe the stats it's collecting.
So it's kernel supplied BTF instead of user.
OK, I think I wasn't clear enough, let me retry.

So map creation has nothing to do with ifindex or the object the user
is trying to access.

On map creation the user will define what map sub type (control type)
they want to create/access, and the kernel's job will be to set up
this user map's attributes and connect that map to the subsystem that is
going to handle this map's operations (lookup and update). By subsystem I
mean the XDP subsystem for example (net/core layer and not device driver).
There will be no direct connection to the device driver; direct map
operations that go to device drivers are something we should avoid. The
whole idea here is to provide a standard sustainable uapi and not device
specific key values that will get out of control.

The BTF layout for the key value pair is going to be defined and assigned by
kernel middle layers and NOT by user space nor device drivers; each
subsystem that wants to support new control MAP sub types should
predefine its own BTF layout.

on map creation user can query the BTF key value layouts and any other
map attributes.

For example to enable user to access XDP attributes, we will add a
kernel BPF layer under net/core/xdp_ctrl.c which will handle all
map operations coming in from maps that are created via:

create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)

Other subsystems/layers (e.g. sockets/kprobe/cgroups/tracepoints) can
define their own map_attr.ctrl_type and implement their own mid-layer
which will handle control map operations of that sub type.

So you don't create a map to get connected to a specific netdev, you
create a map to get a connection to a specific kernel BPF subsystem and
query or modify that subsystem's objects (in the XDP case, netdevs).

The user can create multiple maps with different map_attr.ctrl_type to
access different system attributes.

I think we need something else here. Using BTF to describe
output stats is nice, but using BTF to describe input query is
problematic,
since user cannot know before hand what kernel can and cannot
accept.
imo input should be stable uapi with fixed constants whereas
stats-like output can be BTF based and vary from driver to driver
and from one nic version to another.
Well, we can decide to use static stable uapi, and still use this
special map to leverage the map API as described in this doc.

but also we can allow dynamic value layouts, but any extension should
be done to the end of any value structure and on lookup we will only
copy the size that the user already recognizes .. ? or we can
assume/force the user to use the map_attributes to figure out format
layouts and sizes, but still we will guarantee backward compatibility
in kernel level by keeping old format the same, and extension is only
allowed to the end of the value structures.
for input queries - yes. See how 'union bpf_attr' works.
It can accommodate any number of new commands with its own arguments
to query XDP stats.
For output the driver can supply BTF back to user space along with
a blob of data.

If I got your intent correctly you want only BPF_MAP_TYPE_CONTROL to
be processed by generic bpf layer and everything else to be done in
the driver? All commands, values, input/output to be driver specific?
Hmm, no, I want the kernel core code to process everything. Different
subsystems (not device drivers) can provide their own
BPF_MAP_TYPE_CONTROL sub types and provide the map operation
implementation. As a backend, such a mid-layer subsystem can use whatever
interface to access objects (netdev ndos in our case, and it could be via
union bpf_attr); this is an implementation detail that is transparent to
the user.

Bottom line, we don't want drivers/users to create BTF layouts. It is the
kernel mid layer's job to connect between the two via the control MAPs,
and this mid layer will audit drivers and users.

I know we can achieve the same thing with union bpf_attr and use plain
ethtool or netlink, but this means that we need to integrate
ethtool/netlink APIs into libbpf. The whole beauty of this proposal
is that a BPF developer will have to deal only with MAP operations even
if they want to change system attributes.

One extra benefit of this is forward compatibility: a simple user
space program can access new kernel attributes without the need
to modify the user space program.

Consider the following command line, with a bpftool that was compiled way
before the xsk-ring-size attribute was introduced.

$ bpftool xdp set eth0 xsk-ring-size 128

what this can eventually do:

fd = create_map(BPF_CONTROL, map_attr.ctrl_type = XDP_ATTR)

//Query BTF of the control (XDP) map
query_map_attributes(fd, &map_attr);

// Find if control map values btf layout include the new member
offset = btf_get_offset_of_member_name(map_attr.btf, "xsk-ring-size",
                                       BTF_TYPE_INT);
if (offset < 0)
        return offset; // the new member is not supported in kernel

*(value + offset) = (int)new_val;
map_update(fd, <eth0 ifindex>, value);
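
The name-to-offset step can be sketched in plain C without any BTF library. In the sketch below a small table of member descriptors stands in for the BTF value type; struct xdp_attr_value, struct member_desc, member_offset() and set_int_at() are hypothetical names for illustration, not libbpf or kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value layout the kernel might export for XDP_ATTR. */
struct xdp_attr_value {
    int rings_num;
    int rings_size;
    int xsk_ring_size;
};

/* Stand-in for BTF member info: a member name and its byte offset. */
struct member_desc {
    const char *name;
    size_t offset;
};

static const struct member_desc xdp_attr_members[] = {
    { "rings-num",     offsetof(struct xdp_attr_value, rings_num) },
    { "rings-size",    offsetof(struct xdp_attr_value, rings_size) },
    { "xsk-ring-size", offsetof(struct xdp_attr_value, xsk_ring_size) },
};

/* Returns the byte offset of the named member, or -1 if it is unknown. */
static long member_offset(const struct member_desc *m, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(m[i].name, name) == 0)
            return (long)m[i].offset;
    return -1;
}

/*
 * Store an int at a byte offset inside an opaque value buffer.
 * Going through unsigned char * and memcpy keeps the offset in bytes,
 * which a bare *(value + offset) would not do for a typed pointer.
 */
static void set_int_at(void *value, long offset, int v)
{
    memcpy((unsigned char *)value + offset, &v, sizeof(v));
}
```

An old tool would call member_offset() with the name given on the command line and bail out on -1, which is the "not supported in kernel" branch of the pseudo code.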


Re: [RFC][Proposal] BPF Control MAP

Saeed Mahameed <saeedm@...>
 

On Tue, 2019-03-19 at 08:36 -0700, Alexei Starovoitov wrote:
On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:
Thoughts ?
It's certainly an interesting idea. I think we need to agree on use cases
and goals first before bikeshedding on the solution.
Sure, will discuss most of the use cases tomorrow in the iovisor
meeting.


Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_STATS);

while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
// we don't even need to know stats format in this case
btf_pretty_print(xdp_ctrl->btf, &stats);
ifindex = next_ifindex;
}
this bit showcases the advantage of BTF nicely.

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP
program).

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

xdp_attr->command = SETUP_REDIRECT;
xdp_attr->rings.num = 12;
xdp_attr->rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
this one is starting to become a bit odd, since input arguments are split
between key and value. ifindex is part of the key whereas command,
rings.num and rings.size are part of the value.
The idea is that we need to keep map semantics (object->value).
In this case (map_attr.ctrl_type = XDP_ATTR) the object (key) is if_index
and the value is the xdp attributes of that if_index, which plays nicely when
you want to iterate through all the objects (XDP netdevs). Simply think
of ifindex as a key and not an input value.

different ctrl map types can have different keys and values hence the
need for map_create and passing map_attr.ctrl_type attribute which will
define what the user is trying to access (which control map) and what
will be the key (object) & value (configuration) pair layout.

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

xdp_attr->command = SETUP_MD;
xdp_attr->enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
if (err) {
printf("XDP meta data is not supported on this netdev");
return;
}
// Query Meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;
here it gets even weirder, since lookup arguments are also
split between key and value.
ifindex is inside the key while additional info is passed
inside xdp_attr which is part of value.

I wish we could do maps where every key would have different layout.
Then such api would be suitable.
The existing maps require all keys and values to be uniform.
I guess one can argue that such 'control map' can have one element
and one value, but then what is the purpose of 'map_create'
and 'map_delete==close(fd)' operations?
This is exactly the purpose of map_{create/delete}: to define what the
key,value format will be (it is not going to be ifindex for all
control maps). To make it clear, the key doesn't have to be an ifindex
at all; it depends on the map_attr.ctrl_type which the user requests on
map creation, so different layouts are already supported by this
proposal.

examples of different map_attr.ctrl_type:

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)
// Key layout == ifindex, value format if_xdp_attributes

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_PROG_STATS)
// Key layout == prog_fd, value format struct bpf_prog_stats

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_SOCK_ATTR)
// Key layout == socket_fd, value layout struct bpf_sock_attr

extending this can be done in any linux BPF subsystem, by introducing
new map_attr.ctrl_types and new key value layouts of that control type.

on map creation we will attach the btf format of the (key, value) pair
to the map created for that user.


I think we need something else here. Using BTF to describe
output stats is nice, but using BTF to describe input query is
problematic,
since user cannot know before hand what kernel can and cannot accept.
imo input should be stable uapi with fixed constants whereas
stats-like output can be BTF based and vary from driver to driver
and from one nic version to another.
Well, we can decide to use a static stable uapi, and still use this
special map to leverage the map API as described in this doc.

But we can also allow dynamic value layouts, as long as any extension is
made at the end of the value structure and on lookup we only
copy the size that the user already recognizes .. ? Or we can
assume/force the user to use the map_attributes to figure out format
layouts and sizes, but still we will guarantee backward compatibility
at the kernel level by keeping the old format the same, with extension
only allowed at the end of the value structures.
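
A minimal sketch of that append-only rule, in plain C. The two struct versions and the helper names copy_out() and copy_in() are hypothetical; the zero-tail check on input is borrowed from the way union bpf_attr is handled and is an assumption about how a control map might do it, not something this proposal specifies.

```c
#include <stddef.h>
#include <string.h>

/* v1 is the layout an old tool knows; v2 appends one field at the end. */
struct attr_v1 { unsigned int rings_num; unsigned int rings_size; };
struct attr_v2 { unsigned int rings_num; unsigned int rings_size;
                 unsigned int xsk_ring_size; };

/* Lookup side: copy only as many bytes as the caller says it understands. */
static size_t copy_out(void *user, size_t user_size,
                       const void *kern, size_t kern_size)
{
    size_t n = user_size < kern_size ? user_size : kern_size;

    memcpy(user, kern, n);
    return n;
}

/*
 * Update side: a caller with a larger (newer) struct is accepted only if
 * the part this kernel does not know about is all zero. A caller with a
 * smaller (older) struct gets the missing tail zero-filled.
 * Returns 0 on success, -1 if unknown fields are set.
 */
static int copy_in(void *kern, size_t kern_size,
                   const void *user, size_t user_size)
{
    const unsigned char *u = user;
    size_t n = user_size < kern_size ? user_size : kern_size;

    for (size_t i = kern_size; i < user_size; i++)
        if (u[i])
            return -1;
    memset(kern, 0, kern_size);
    memcpy(kern, user, n);
    return 0;
}
```

Note that zero-filling the tail for an old caller means "field absent" and "field set to 0" look the same to the kernel side, so an update path built this way still needs a convention for which fields were actually meant to change.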


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Yonghong Song
 

On Tue, Mar 19, 2019 at 9:06 AM Simon <contact@...> wrote:

The compiler is doing optimization which make verifier fail. It is possible an early compiler with less optimizations may work.

Maybe a silly question, but does it make sense to try to change compiler optimization option ? (I tried to play with -O option without success)
Maybe. I have not looked at this yet from compiler side. Sometimes you
won't have an easy compiler option to turn off. Tuning -O may not
help. Lowering -O to -O1/-O0 may help to remove this particular
optimization, but may introduce more spills which verifier will also
reject.


Please keep me informed about your progress about this issue :)
Sure. Will let you know if I have made progress in this.


Thx again.




Re: [RFC][Proposal] BPF Control MAP

Alexei Starovoitov
 

On Wed, Mar 20, 2019 at 04:47:11AM +0000, Saeed Mahameed wrote:

consider the following command line, with bpftool that compiled way
before xsk-ring-size attribute was introduced.
perfect. let's agree on the use case first...

$ bpftool xdp set eth0 xsk-ring-size 128
would be great to have. no doubt.

what this can eventually do:

fd = create_map(BPF_CONTROL, map_attr.ctrl_type = XDP_ATTR)

//Query BTF of the control (XDP) map
query_map_attributes(fd, &map_attr);
and bpf_map_info returns btf_key_type_id and btf_value_type_id
Are you saying btf_key_type_id will return single u32 with name 'ifindex' ?
Where do you specify netns_id+dev_id ?

// Find if control map values btf layout include the new member
offset = btf_get_offset_of_member_name(map_attr.btf, "xsk-ring-size",
BTF_TYPE_INT);
if (offset < 0)
return offset; // the new member is not supported in kernel

*(value + offset) = (int)new_val;
map_update(fd, <eth0 ifindex>, value);
what about other fields in this value ?
If zero do not update ?
Shouldn't it be read/modify/write ?
So lookup everything first, modify only that '*(value + offset)'
and do map_update ?
And the driver receives a full blob of bytes for all fields.
What is the driver supposed to do ? Rewrite all fields ?
Or try to be smart and do read+if_not_the_same_write ?
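
The difference between the two update styles in question is easy to show with a mock. In the C sketch below, dev_state, mock_lookup(), mock_update() and the two setters are hypothetical stand-ins for the map operations; nothing here is an existing API.

```c
#include <string.h>

/* Hypothetical value layout for one netdev. */
struct xdp_attr_value {
    int rings_num;
    int rings_size;
    int xsk_ring_size;
};

/* Mock kernel-side state for a single netdev (made-up values). */
static struct xdp_attr_value dev_state = { 12, 128, 64 };

static void mock_lookup(struct xdp_attr_value *out) { *out = dev_state; }
static void mock_update(const struct xdp_attr_value *in) { dev_state = *in; }

/* Blind write: only the target field is filled in, the rest stays zero. */
static void set_xsk_blind(int v)
{
    struct xdp_attr_value val;

    memset(&val, 0, sizeof(val));
    val.xsk_ring_size = v;
    mock_update(&val);          /* overwrites the other fields with 0 */
}

/* Read/modify/write: fetch the current value, change one field, write back. */
static void set_xsk_rmw(int v)
{
    struct xdp_attr_value val;

    mock_lookup(&val);
    val.xsk_ring_size = v;
    mock_update(&val);          /* other fields keep their old values */
}
```

Even the read/modify/write variant is not atomic: another writer can change the value between the lookup and the update, and the receiving side still cannot tell which field the caller intended to change.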

Different ifindexes will be different drivers,
so single map providing single btf_value_type_id for all of them
isn't going to work.

I hope you see that such api is falling apart even for simplest use case.


Re: [RFC][Proposal] BPF Control MAP

Alexei Starovoitov
 

On Tue, Mar 19, 2019 at 06:54:08PM +0000, Saeed Mahameed wrote:

This is exactly the purpose of map_{create/delete}: to define what the
key,value format will be (it is not going to be ifindex for all
control maps). To make it clear, the key doesn't have to be an ifindex
at all; it depends on the map_attr.ctrl_type which the user requests on
map creation, so different layouts are already supported by this
proposal.

examples of different map_attr.ctrl_type:

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)
// Key layout == ifindex, value format if_xdp_attributes

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_PROG_STATS)
// Key layout == prog_fd, value format struct bpf_prog_stats

fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_SOCK_ATTR)
// Key layout == socket_fd, value layout struct bpf_sock_attr

extending this can be done in any linux BPF subsystem, by introducing
new map_attr.ctrl_types and new key value layouts of that control type.

on map creation we will attach the btf format of the (key, value) pair
to the map created for that user.
In your examples above does the netdev with the corresponding ifindex
exist before the map is created?
Does prog_fd exist ? and socket?
In all cases, yes, they do. Hence creation of a 'map' (even a pseudo map)
is an unnecessary step.

User space providing BTF for input and output makes little sense to me.
What is the kernel supposed to do with it?

In case of XDP stats (that could be different between drivers)
the driver would provide a BTF to describe the stats it's collecting.
So it's kernel supplied BTF instead of user.

I think we need something else here. Using BTF to describe
output stats is nice, but using BTF to describe input query is
problematic,
since user cannot know before hand what kernel can and cannot accept.
imo input should be stable uapi with fixed constants whereas
stats-like output can be BTF based and vary from driver to driver
and from one nic version to another.
Well, we can decide to use static stable uapi, and still use this
special map to leverage the map API as described in this doc.

but also we can allow dynamic value layouts, but any extension should
be done to the end of any value structure and on lookup we will only
copy the size that the user already recognizes .. ? or we can
assume/force the user to use the map_attributes to figure out format
layouts and sizes, but still we will guarantee backward compatibility
in kernel level by keeping old format the same, and extension is only
allowed to the end of the value structures.
for input queries - yes. See how 'union bpf_attr' works.
It can accommodate any number of new commands with its own arguments
to query XDP stats.
For output the driver can supply BTF back to user space along with
a blob of data.

If I got your intent correctly you want only BPF_MAP_TYPE_CONTROL to
be processed by generic bpf layer and everything else to be done in
the driver? All commands, values, input/output to be driver specific?


reminder: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Agenda: Discussion on BPF Control MAP proposal from Saeed

Also, please note the 1 hour time change.

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
- You want to know what is going on in BPF land
- You are doing something interesting yourself with BPF and would like to share
- You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=3&day=20&hour=18&min=0&sec=0&p1=900


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Simon
 

The compiler is doing optimization which make verifier fail. It is possible an early compiler with less optimizations may work.

Maybe a silly question, but does it make sense to try to change compiler optimization option ? (I tried  to play with -O option without success)

Please keep me informed about your progress about this issue :)

Thx again.

 


Re: [RFC][Proposal] BPF Control MAP

Alexei Starovoitov
 

On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:

Thoughts ?
It's certainly an interesting idea. I think we need to agree on use cases
and goals first before bikeshedding on the solution.


Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_STATS);

while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
// we don't even need to know stats format in this case
btf_pretty_print(xdp_ctrl->btf, &stats);
ifindex = next_ifindex;
}
this bit showcases the advantage of BTF nicely.

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP
program).

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

xdp_attr->command = SETUP_REDIRECT;
xdp_attr->rings.num = 12;
xdp_attr->rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
this one is starting to become a bit odd, since input arguments are split
between key and value. ifindex is part of the key whereas command,
rings.num and rings.size are part of the value.

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
XDP_ATTR);

xdp_attr->command = SETUP_MD;
xdp_attr->enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
if (err) {
printf("XDP meta data is not supported on this netdev");
return;
}
// Query Meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;
here it gets even weirder, since lookup arguments are also
split between key and value.
ifindex is inside the key while additional info is passed
inside xdp_attr which is part of value.

I wish we could do maps where every key would have different layout.
Then such api would be suitable.
The existing maps require all keys and values to be uniform.
I guess one can argue that such 'control map' can have one element
and one value, but then what is the purpose of 'map_create'
and 'map_delete==close(fd)' operations?

I think we need something else here. Using BTF to describe
output stats is nice, but using BTF to describe input query is problematic,
since the user cannot know beforehand what the kernel can and cannot accept.
imo input should be stable uapi with fixed constants whereas
stats-like output can be BTF based and vary from driver to driver
and from one nic version to another.


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Yonghong Song
 

On Mon, Mar 18, 2019 at 4:18 AM Simon <contact@...> wrote:

Thx a lot again for your time and your detailed explanation.

About the workaround you proposed, I didn't get where I should repeat __u16 udp_len = bpf_ntohs(udp->len);.. I tried several spot but didn't succeed to make it works.
This is a tough issue. I spent a couple of hours trying various source
workarounds and did not succeed.
To illustrate my experiment, the following is what I tried to do to
move the udp_len calculation and its usage closer to each other
to avoid register spills/refills.

```
-bash-4.4$ diff ulb.c ulb.c.org
61,62c61,62
< static inline int ipv4_l4_csum(struct udphdr *data_start, __u32 data_size,
< __u64 *csum, struct iphdr *iph, void *data_end) {
---
> static inline void ipv4_l4_csum(void *data_start, __u32 data_size,
> __u64 *csum, struct iphdr *iph) {
70,87c70,71
<
< data_start = iph + 1;
< if (data_start + 1 > data_end)
< return XDP_DROP;
< data_size = bpf_ntohs(data_start->len);
< if (data_size < 8)
< return -1;
< if (data_size > 512)
< return -1;
< data_size = data_size & 0x1ff;
< data_start = (void *)data_start + data_size;
< if ((void *) data_start <= data_end) {
< *csum = bpf_csum_diff(0, 0, (void *)data_start - data_size, data_size, *csum);
< *csum = csum_fold_helper(*csum);
< return 0;
< }
<
< return -1;
---
> *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);
> *csum = csum_fold_helper(*csum);
140a125,132
> __u16 udp_len = bpf_ntohs(udp->len);
> if (udp_len < 8)
> return XDP_DROP;
> if (udp_len > 512) // TODO use a more approriate max value
> return XDP_DROP;
> udp_len = udp_len & 0x1ff;
> if ((void *) udp + udp_len > data_end)
> return XDP_DROP;
200,203c192
< __u16 udp_len = 0;
< int ret = ipv4_l4_csum(udp, udp_len, &cs, iph, data_end) ;
< if (ret == -1)
< return XDP_DROP;
---
> ipv4_l4_csum(udp, udp_len, &cs, iph) ;
-bash-4.4$
```

But my attempt above is not working. The main reason is that the compiler
always tries to use the original assignment register:
data_start = iph + 1; /* data_start is in r8 */
/* r1 = r8, and below r1 is used for data_start */
if (data_start + 1 > data_end)
return XDP_DROP;
...
*csum = bpf_csum_diff(0, 0, (void *)data_start - data_size,
data_size, *csum); /* data_start in r8 */

The data_start is refined in r1 while r8 is used in the final argument passing.

I think I need to look at kernel verifier side and compiler side.

By the way repeat it, means probably shifting byte again ... so I pretty sure I didn't get well what you mean :/ ... Did you succeed to make it works with this workaround ?

This is maybe a stupid question, but it seems to me that I'm not doing so exotic code here. I mean calculate a checksum with XDP and I fall in not so easy "issues". Is there something I do which is totally wrong or uncommon ?
You are not. The compiler is doing an optimization which makes the verifier
fail. It is possible that an earlier compiler with fewer optimizations may
work.


I will take a look at SCALAR_VALUE issue later on.But if you or anybody has cycles and want to look into this issue,feel free to do so.

I didn't have enough skill to do that, I'm not even sure to understand what "spill/reload" means ?
Something like:
udp_len = ...
...
... = udp_len

The generated code:
store_to_stack for udp_len;
...
get udp_len value from stack
use udp_len


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Simon
 

Thx a lot again for your time and your detailed explanation.

About the workaround you proposed, I didn't get where I should repeat __u16 udp_len = bpf_ntohs(udp->len);.. I tried several spots but didn't succeed in making it work.
By the way, repeating it probably means shifting bytes again ... so I'm pretty sure I didn't quite get what you mean :/ ... Did you succeed in making it work with this workaround ?

This is maybe a stupid question, but it seems to me that I'm not doing so exotic code here. I mean calculate a checksum with XDP and I fall in not so easy "issues". Is there something I do which is totally wrong or uncommon ?

I will take a look at SCALAR_VALUE issue later on.But if you or anybody has cycles and want to look into this issue,feel free to do so.
I didn't have enough skill to do that, I'm not even sure to understand what "spill/reload" means ?


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Yonghong Song
 

On Mon, Mar 11, 2019 at 11:13 PM Yonghong Song via Lists.Iovisor.Org
<ys114321=gmail.com@...> wrote:

On Mon, Mar 11, 2019 at 4:08 AM Simon <contact@...> wrote:

I tried again to understand this verifier error; my previous post probably did not contain enough information.

I understand that:

293: (67) r0 <<= 32
294: (c7) r0 s>>= 32
295: (b7) r1 = 0
296: (b7) r2 = 0
297: (bf) r3 = r8
298: (79) r4 = *(u64 *)(r10 -40)
299: (bf) r5 = r0
300: (85) call bpf_csum_diff#28
R4 min value is negative, either use unsigned or 'var &= const'

is about this line (in ipv4_l4_csum)

*csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

R1=0,
R2=0,
R3= R8=pkt(id=0,off=34,r=42,imm=0) = data_start = a pointer to struct udphdr *udp
R4= something in the stack = data_size = __u16 udp_len

So I cannot understand how this leads to "R4 min value is negative, either use unsigned or 'var &= const'".

I took a brief look. Indeed, it is very strange. I can see that the proper
value of udp_len is stored into
r10 - 40, but when it is retrieved later, the value has become unknown...

I will try to experiment with this problem later this week.

Could you do me a favor and make it reproducible with python2? My environment
that is flexible enough to rebuild/retry with the kernel is python2 friendly.



298: (79) r4 = *(u64 *)(r10 -40)

As I understand this line, r4 will get a value from the stack (R10=fp0,call_-1 fp-48=pkt) and cast this value to a u64, so unsigned. (min value = 0)

The reason for the failure is that spill/reload does not preserve the
original register state for scalar values.
Look at the kernel function:
static bool is_spillable_regtype(enum bpf_reg_type type)
{
	switch (type) {
	case PTR_TO_MAP_VALUE:
	case PTR_TO_MAP_VALUE_OR_NULL:
	case PTR_TO_STACK:
	case PTR_TO_CTX:
	case PTR_TO_PACKET:
	case PTR_TO_PACKET_META:
	case PTR_TO_PACKET_END:
	case PTR_TO_FLOW_KEYS:
	case CONST_PTR_TO_MAP:
	case PTR_TO_SOCKET:
	case PTR_TO_SOCKET_OR_NULL:
	case PTR_TO_SOCK_COMMON:
	case PTR_TO_SOCK_COMMON_OR_NULL:
	case PTR_TO_TCP_SOCK:
	case PTR_TO_TCP_SOCK_OR_NULL:
		return true;
	default:
		return false;
	}
}

The original variable udp_len is defined as
    __u16 udp_len = bpf_ntohs(udp->len);
    if (udp_len < 8)
        return XDP_DROP;
    if (udp_len > 512) // TODO use a more appropriate max value
        return XDP_DROP;
    udp_len = udp_len & 0x1ff;

and it is saved to the stack, and its register type is SCALAR_VALUE.
But since SCALAR_VALUE is not part of is_spillable_regtype(),
when the value is reloaded from the stack into a register, the original
state is not copied and the worst-case scalar value is assumed,
which causes the error you see.
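
To make the spill/reload effect more concrete, here is a toy userspace model of the behaviour described above. It is only a sketch under stated assumptions, not kernel code: `reg_state`, `stack_slot`, `spill`, `reload` and `helper_accepts_size` are hypothetical names, and only two register types are modelled. A spilled packet pointer keeps its state, while a spilled scalar comes back with worst-case bounds, so a check that needs a non-negative minimum fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: two register types instead of the kernel's full list. */
enum reg_type { SCALAR_VALUE, PTR_TO_PACKET };

struct reg_state {
    enum reg_type type;
    int64_t smin;   /* smallest signed value the register may hold */
    int64_t smax;   /* largest signed value the register may hold */
};

struct stack_slot {
    bool holds_spilled_reg;     /* true only for "spillable" types */
    struct reg_state spilled;   /* valid only if holds_spilled_reg */
};

/* Mirrors the idea of is_spillable_regtype(): scalars are absent. */
static bool is_spillable(enum reg_type t)
{
    return t == PTR_TO_PACKET;
}

/* Store a register to the stack; state is kept only if spillable. */
static void spill(struct stack_slot *slot, const struct reg_state *reg)
{
    slot->holds_spilled_reg = is_spillable(reg->type);
    if (slot->holds_spilled_reg)
        slot->spilled = *reg;
}

/* Load a register from the stack; unknown data becomes a worst-case scalar. */
static struct reg_state reload(const struct stack_slot *slot)
{
    struct reg_state unknown = { SCALAR_VALUE, INT64_MIN, INT64_MAX };
    return slot->holds_spilled_reg ? slot->spilled : unknown;
}

/* Stand-in for the size-argument check that prints "min value is negative". */
static bool helper_accepts_size(const struct reg_state *reg)
{
    return reg->type == SCALAR_VALUE && reg->smin >= 0;
}
```

In this model, a scalar bounded to [8, 511] passes `helper_accepts_size()` before the spill and fails it after `reload()`, which is the same shape as the R4 error in this thread.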

I tried to add SCALAR_VALUE to is_spillable_regtype(),
but it does not work; more changes in the verifier are required.

A trick like `data_size = data_size & 0x1ff` may not work,
as the compiler may optimize it away and the original spill
may still exist.

The workaround could be to repeat
    __u16 udp_len = bpf_ntohs(udp->len);
Here udp is a pointer to the packet; even if it is spilled, its register
state can be restored.

I will take a look at the SCALAR_VALUE issue later on.
But if you or anybody has cycles and wants to look into this issue,
feel free to do so.

Thanks!

Yonghong


(By the way I can not understand why this is a u64 and not a u16 as udp_len variable or u32 as data_size parameter of ipv4_l4_csum function or u32 as tosize from bpf_csum_diff function...)

I tried to use the &= trick, like:

data_size = data_size & 0x1ff;
*csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

Same issue...

Here is a longer trace from the verifier:

R0=inv(id=0,umax_value=4295032831,var_off=(0x0; 0x1ffffffff))
R1=inv(id=0,umax_value=65536,var_off=(0x0; 0x1ffff))
R6=ctx(id=0,off=0,imm=0)
R7=pkt(id=0,off=0,r=42,imm=0)
R8=pkt(id=0,off=34,r=42,imm=0)
R9=pkt(id=0,off=30,r=42,imm=0)
R10=fp0,call_-1 fp-48=pkt
239: (57) r0 &= 65535
240: (0f) r0 += r1
241: (bf) r1 = r0
242: (77) r1 >>= 16
243: (0f) r1 += r0
244: (a7) r1 ^= -1
245: (6b) *(u16 *)(r7 +24) = r1
246: (b7) r1 = 0
247: (6b) *(u16 *)(r7 +40) = r1
248: (b7) r1 = 0
249: (b7) r2 = 0
250: (79) r3 = *(u64 *)(r10 -48)
251: (b7) r4 = 4
252: (b7) r5 = 0
253: (85) call bpf_csum_diff#28
254: (67) r0 <<= 32
255: (c7) r0 s>>= 32
256: (b7) r1 = 0
257: (b7) r2 = 0
258: (bf) r3 = r9
259: (b7) r4 = 4
260: (bf) r5 = r0
261: (85) call bpf_csum_diff#28
262: (71) r1 = *(u8 *)(r7 +23)
263: (dc) r1 = be32 r1
264: (63) *(u32 *)(r10 -24) = r1
265: (67) r0 <<= 32
266: (c7) r0 s>>= 32
267: (bf) r9 = r10
268: (07) r9 += -24
269: (b7) r1 = 0
270: (b7) r2 = 0
271: (bf) r3 = r9
272: (b7) r4 = 4
273: (bf) r5 = r0
274: (85) call bpf_csum_diff#28
275: (79) r1 = *(u64 *)(r10 -40)
276: (dc) r1 = be32 r1
277: (63) *(u32 *)(r10 -24) = r1
278: (67) r0 <<= 32
279: (c7) r0 s>>= 32
280: (b7) r1 = 0
281: (b7) r2 = 0
282: (bf) r3 = r9
283: (b7) r4 = 4
284: (bf) r5 = r0
285: (85) call bpf_csum_diff#28
286: (67) r0 <<= 32
287: (c7) r0 s>>= 32
288: (b7) r1 = 0
289: (b7) r2 = 0
290: (bf) r3 = r8
291: (79) r4 = *(u64 *)(r10 -40)
292: (bf) r5 = r0
293: (85) call bpf_csum_diff#28
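
For readers following the trace, instructions 239 to 244 appear to be the usual one's-complement fold of a 32-bit checksum accumulator into 16 bits before it is stored in the IP header. The sketch below shows that fold in plain userspace C; `csum_fold` is a hypothetical name chosen for illustration and is not taken from the program in this thread.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement sum into 16 bits and invert it.
 * The carry is added back twice because the first addition can
 * itself overflow into bit 16. */
static uint16_t csum_fold(uint32_t csum)
{
    csum = (csum & 0xffff) + (csum >> 16);
    csum = (csum & 0xffff) + (csum >> 16);
    return (uint16_t)~csum;
}
```

For example, an accumulator of 0x12345678 folds to 0x5678 + 0x1234 = 0x68ac, and the stored checksum is its complement, 0x9753.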

I reference the commit instead of the repository to keep the link stable over time: https://github.com/sbernard31/udploadbalancer/tree/5ca93d0893a60bc70a75f30eb5cfde496a9e5d93

Again, do not hesitate to redirect me to a better place if I'm not asking in the right place :)

Thx again for your time.



BCC support for ARM

Jugurtha BELKALEM
 

Hi,

We're trying to integrate bcc as a package in buildroot with the configuration shown below:
  -  Target board : Raspberry PI 3.
  -  Toolchain : Arm ARM 2018.11
  -  LLVM 7.0.1

However, buildroot includes only "LuaJIT 2.0.5" (which does not support ARM64).

So we decided to use ARM32, which bcc does not seem to support.

The build system returned the following errors:

buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:65:14: error: ‘parser’ was not declared in this scope
     REQUIRE(!parser.parse(&arg));
              ^~~~~~
buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:65:14: note: suggested alternative: ‘pause’
buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:65:14: error: ‘parser’ was not declared in this scope
     REQUIRE(!parser.parse(&arg));
              ^~~~~~
buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:65:14: note: suggested alternative: ‘pause’
buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:156:13: error: ‘parser’ was not declared in this scope
     REQUIRE(parser.done());
             ^~~~~~
buildroot/output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:156:13: note: suggested alternative: ‘pause’
output/build/bcc-v0.8.0/tests/cc/test_usdt_args.cc:156:13: error: ‘parser’ was not declared in this scope
     REQUIRE(parser.done());


Taking a closer look at the file "test_usdt_args.cc" shows support for only a few architectures (which do not include ARM32):

#ifdef __aarch64__
    USDT::ArgumentParser_aarch64 parser("4@[x32,200]");
#elif __powerpc64__
    USDT::ArgumentParser_powerpc64 parser("4@-12(42)");
#elif defined(__x86_64__)
    USDT::ArgumentParser_x64 parser("4@i%ra+1r");
#endif
    USDT::Argument arg;
    REQUIRE(!parser.parse(&arg));
    int i;
    for (i = 0; i < 10 && !parser.done(); ++i) {
      parser.parse(&arg);
    }
    // Make sure we reach termination
    REQUIRE(i < 10);
  }
  SECTION("argument examples from the Python implementation") {

I'm wondering if there is an exhaustive list of the architectures supported by bcc, and if there is any ongoing work that aims to add support for ARM32?
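
The "'parser' was not declared" errors follow from the `#ifdef` chain above having no fallback branch: on an architecture that matches none of the conditions, no `parser` object is declared at all. The sketch below illustrates that selection logic in plain C with an explicit fallback. `usdt_parser_for_this_arch` and `usdt_tests_supported` are hypothetical helpers for illustration only and are not part of bcc; only the parser class names come from the test file quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the name of the USDT argument parser the test would pick
 * on the build architecture, or NULL when none matches. */
static const char *usdt_parser_for_this_arch(void)
{
#if defined(__aarch64__)
    return "ArgumentParser_aarch64";
#elif defined(__powerpc64__)
    return "ArgumentParser_powerpc64";
#elif defined(__x86_64__)
    return "ArgumentParser_x64";
#else
    /* e.g. 32-bit ARM: nothing matches, so the tests would need
     * to be skipped instead of referring to a missing parser. */
    return NULL;
#endif
}

/* True when the USDT argument tests can be compiled on this target. */
static bool usdt_tests_supported(void)
{
    return usdt_parser_for_this_arch() != NULL;
}
```

With a fallback like this, an unsupported target shows up as an explicit "not supported" result instead of an undeclared-identifier error deep inside the test body.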


bpf: Failed to load program: Permission denied

Jacob Steadman
 

Hi,

I'm new to BPF. I'm trying to write a program that analyses the structure of DNS requests. I keep getting the following error (below) at a certain point in the code (below).

The error only occurs when I try to "return -1;" (i.e. allow the packet). If I remove this line the program executes as expected.

I wonder if it could be an issue with the kernel version rather than the code? (Ubuntu 16.04.4 LTS, Kernel version 4.4.0-87-generic)

Error***************************************
bpf: Failed to load program: Permission denied
...
...
R2 invalid mem access 'inv'

HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.

Traceback (most recent call last):
  File "dns_matching.py", line 57, in <module>
    function_dns_matching = bpf.load_func("dns_exfil_detection_v2", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 379, in load_func
    (func_name, errstr))
Exception: Failed to load BPF program dns_exfil_detection_v2: Permission denied
********************************************


Code****************************************
        #pragma unroll
        for(i = 0; i<255;i++){
                c = cursor_advance(cursor, 1);

                if (c->c == 0)
                        break;

                key.p[i] = c->c;

                //**ensure this is the correct max length of a subdomain**
                if(c->c < 0x0f){
                        subdomLengths[subdomainCount] = (u16) c->c;
                        subdomainCount = subdomainCount +1;
                }
        }

*** if(subdomLengths[subdomainCount] == 2 && subdomLengths[subdomainCount-1] == 2 && subdomainCount <4 ){
***         return -1;
*** }
*********************************************


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Simon
 

Here is a Python 2 version.


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Yonghong Song
 

On Mon, Mar 11, 2019 at 4:08 AM Simon <contact@...> wrote:

I tried again to understand this verifier error; my previous post probably did not contain enough information.

I understand that:

293: (67) r0 <<= 32
294: (c7) r0 s>>= 32
295: (b7) r1 = 0
296: (b7) r2 = 0
297: (bf) r3 = r8
298: (79) r4 = *(u64 *)(r10 -40)
299: (bf) r5 = r0
300: (85) call bpf_csum_diff#28
R4 min value is negative, either use unsigned or 'var &= const'

is about this line (in ipv4_l4_csum)

*csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

R1=0,
R2=0,
R3= R8=pkt(id=0,off=34,r=42,imm=0) = data_start = a pointer to struct udphdr *udp
R4= something in the stack = data_size = __u16 udp_len

So I cannot understand how this leads to "R4 min value is negative, either use unsigned or 'var &= const'".

I took a brief look. Indeed, it is very strange. I can see that the proper
value of udp_len is stored into
r10 - 40, but when it is retrieved later, the value has become unknown...

I will try to experiment with this problem later this week.

Could you do me a favor and make it reproducible with python2? My environment
that is flexible enough to rebuild/retry with the kernel is python2 friendly.



298: (79) r4 = *(u64 *)(r10 -40)

As I understand this line, r4 will get a value from the stack (R10=fp0,call_-1 fp-48=pkt) and cast this value to a u64, so unsigned. (min value = 0)

(By the way I can not understand why this is a u64 and not a u16 as udp_len variable or u32 as data_size parameter of ipv4_l4_csum function or u32 as tosize from bpf_csum_diff function...)

I tried to use the &= trick, like:

data_size = data_size & 0x1ff;
*csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

Same issue...

Here is a longer trace from the verifier:

R0=inv(id=0,umax_value=4295032831,var_off=(0x0; 0x1ffffffff))
R1=inv(id=0,umax_value=65536,var_off=(0x0; 0x1ffff))
R6=ctx(id=0,off=0,imm=0)
R7=pkt(id=0,off=0,r=42,imm=0)
R8=pkt(id=0,off=34,r=42,imm=0)
R9=pkt(id=0,off=30,r=42,imm=0)
R10=fp0,call_-1 fp-48=pkt
239: (57) r0 &= 65535
240: (0f) r0 += r1
241: (bf) r1 = r0
242: (77) r1 >>= 16
243: (0f) r1 += r0
244: (a7) r1 ^= -1
245: (6b) *(u16 *)(r7 +24) = r1
246: (b7) r1 = 0
247: (6b) *(u16 *)(r7 +40) = r1
248: (b7) r1 = 0
249: (b7) r2 = 0
250: (79) r3 = *(u64 *)(r10 -48)
251: (b7) r4 = 4
252: (b7) r5 = 0
253: (85) call bpf_csum_diff#28
254: (67) r0 <<= 32
255: (c7) r0 s>>= 32
256: (b7) r1 = 0
257: (b7) r2 = 0
258: (bf) r3 = r9
259: (b7) r4 = 4
260: (bf) r5 = r0
261: (85) call bpf_csum_diff#28
262: (71) r1 = *(u8 *)(r7 +23)
263: (dc) r1 = be32 r1
264: (63) *(u32 *)(r10 -24) = r1
265: (67) r0 <<= 32
266: (c7) r0 s>>= 32
267: (bf) r9 = r10
268: (07) r9 += -24
269: (b7) r1 = 0
270: (b7) r2 = 0
271: (bf) r3 = r9
272: (b7) r4 = 4
273: (bf) r5 = r0
274: (85) call bpf_csum_diff#28
275: (79) r1 = *(u64 *)(r10 -40)
276: (dc) r1 = be32 r1
277: (63) *(u32 *)(r10 -24) = r1
278: (67) r0 <<= 32
279: (c7) r0 s>>= 32
280: (b7) r1 = 0
281: (b7) r2 = 0
282: (bf) r3 = r9
283: (b7) r4 = 4
284: (bf) r5 = r0
285: (85) call bpf_csum_diff#28
286: (67) r0 <<= 32
287: (c7) r0 s>>= 32
288: (b7) r1 = 0
289: (b7) r2 = 0
290: (bf) r3 = r8
291: (79) r4 = *(u64 *)(r10 -40)
292: (bf) r5 = r0
293: (85) call bpf_csum_diff#28

I reference the commit instead of the repository to keep the link stable over time: https://github.com/sbernard31/udploadbalancer/tree/5ca93d0893a60bc70a75f30eb5cfde496a9e5d93

Again, do not hesitate to redirect me to a better place if I'm not asking in the right place :)

Thx again for your time.


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Simon
 

I tried again to understand this verifier error; my previous post probably did not contain enough information.

I understand that:

293: (67) r0 <<= 32
294: (c7) r0 s>>= 32
295: (b7) r1 = 0
296: (b7) r2 = 0
297: (bf) r3 = r8
298: (79) r4 = *(u64 *)(r10 -40)
299: (bf) r5 = r0
300: (85) call bpf_csum_diff#28
R4 min value is negative, either use unsigned or 'var &= const'

is about this line  (in ipv4_l4_csum)

  *csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

R1=0,
R2=0,
R3= R8=pkt(id=0,off=34,r=42,imm=0) = data_start =  a pointer to struct udphdr *udp
R4= something in the stack  = data_size = __u16 udp_len

So I cannot understand how this leads to "R4 min value is negative, either use unsigned or 'var &= const'".

298: (79) r4 = *(u64 *)(r10 -40)

As I understand this line, r4 will get a value from the stack (R10=fp0,call_-1 fp-48=pkt) and cast this value to a u64, so unsigned. (min value = 0)

(By the way I can not understand why this is a u64 and not a u16 as udp_len variable or u32 as data_size parameter of ipv4_l4_csum function or u32 as tosize from bpf_csum_diff function...)

I tried to use the &= trick, like:

data_size = data_size & 0x1ff;
*csum = bpf_csum_diff(0, 0, data_start, data_size, *csum);

Same issue...

Here is a longer trace from the verifier:

R0=inv(id=0,umax_value=4295032831,var_off=(0x0; 0x1ffffffff))
R1=inv(id=0,umax_value=65536,var_off=(0x0; 0x1ffff))
R6=ctx(id=0,off=0,imm=0)
R7=pkt(id=0,off=0,r=42,imm=0)
R8=pkt(id=0,off=34,r=42,imm=0)
R9=pkt(id=0,off=30,r=42,imm=0)
R10=fp0,call_-1 fp-48=pkt
239: (57) r0 &= 65535
240: (0f) r0 += r1
241: (bf) r1 = r0
242: (77) r1 >>= 16
243: (0f) r1 += r0
244: (a7) r1 ^= -1
245: (6b) *(u16 *)(r7 +24) = r1
246: (b7) r1 = 0
247: (6b) *(u16 *)(r7 +40) = r1
248: (b7) r1 = 0
249: (b7) r2 = 0
250: (79) r3 = *(u64 *)(r10 -48)
251: (b7) r4 = 4
252: (b7) r5 = 0
253: (85) call bpf_csum_diff#28
254: (67) r0 <<= 32
255: (c7) r0 s>>= 32
256: (b7) r1 = 0
257: (b7) r2 = 0
258: (bf) r3 = r9
259: (b7) r4 = 4
260: (bf) r5 = r0
261: (85) call bpf_csum_diff#28
262: (71) r1 = *(u8 *)(r7 +23)
263: (dc) r1 = be32 r1
264: (63) *(u32 *)(r10 -24) = r1
265: (67) r0 <<= 32
266: (c7) r0 s>>= 32
267: (bf) r9 = r10
268: (07) r9 += -24
269: (b7) r1 = 0
270: (b7) r2 = 0
271: (bf) r3 = r9
272: (b7) r4 = 4
273: (bf) r5 = r0
274: (85) call bpf_csum_diff#28
275: (79) r1 = *(u64 *)(r10 -40)
276: (dc) r1 = be32 r1
277: (63) *(u32 *)(r10 -24) = r1
278: (67) r0 <<= 32
279: (c7) r0 s>>= 32
280: (b7) r1 = 0
281: (b7) r2 = 0
282: (bf) r3 = r9
283: (b7) r4 = 4
284: (bf) r5 = r0
285: (85) call bpf_csum_diff#28
286: (67) r0 <<= 32
287: (c7) r0 s>>= 32
288: (b7) r1 = 0
289: (b7) r2 = 0
290: (bf) r3 = r8
291: (79) r4 = *(u64 *)(r10 -40)
292: (bf) r5 = r0
293: (85) call bpf_csum_diff#28

I reference the commit instead of the repository to keep the link stable over time: https://github.com/sbernard31/udploadbalancer/tree/5ca93d0893a60bc70a75f30eb5cfde496a9e5d93

Again, do not hesitate to redirect me to a better place if I'm not asking in the right place :)

Thx again for your time.


Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Yonghong Song
 

On Fri, Mar 8, 2019 at 9:22 AM Simon <contact@...> wrote:


35: (69) r3 = *(u16 *)(r7 +38)
36: (dc) r3 = be16 r3

r3 gets its value from memory, so its value could be anything permitted
by the type.

Does it mean that r3 is considered as be16? I do not understand why, as I explicitly convert it to u16.

The be16 converts r3 to big-endian encoding. If the host system
is big endian, it does nothing. Otherwise,
it converts from little endian to big endian.
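
As a rough userspace illustration of what a 16-bit big-endian conversion does, the sketch below swaps the two bytes on a little-endian host and leaves the value unchanged on a big-endian one. The helper names `swap16`, `be16_to_host` and `load_be16` are hypothetical and chosen for this example; inside a BPF program the conversion would normally come from `bpf_ntohs()`, as in the code discussed in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Convert a 16-bit value read raw from the wire (big endian)
 * into host byte order. The host order is detected at run time. */
static uint16_t be16_to_host(uint16_t raw)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);          /* first byte is 1 on little endian */
    return first ? swap16(raw) : raw;
}

/* Endianness-independent alternative: assemble the value byte by byte. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For instance, the wire bytes 0x01 0x2c represent a UDP length of 300, whichever byte order the host uses.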


This output language is a readable format of bpf bytecode, right? Is there any documentation to learn/understand it?

Yes. There is no documentation; it is intended to be self-explanatory. I
guess "be16" is special and may need some documentation. Otherwise the
assembly-style code should be easy to understand.


The compiler does the right thing, just verifier is not advanced enough.

Is it worth sharing this issue of verifier.c with the bpf maintainers? The compiler used here is clang, which is called by bcc, right?

I am also a regular kernel/bpf reviewer. The bpf maintainers/community
are aware of this limitation. As you mentioned, the verifier is
already very complex. Implementing complex tracking like that described in
this thread would make the verifier even more complex, hence this has been
delayed. One of the reasons is that we have reasonable, although
unpleasant, workarounds.

Yes, it is compiled with clang.


Yes, you will need some source workaround. You could try below (untested):
+ udp_len = udp_len & 0x1ff;

I tested it and it seems to work. Thx a lot !!

But does that mean I cannot use the u16 max value?

You can. I added that because you have a test to limit the range of the
value to 511.
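
A small userspace sketch of what the `& 0x1ff` mask does to the value range may help here. `UDP_LEN_MASK` and `mask_udp_len` are hypothetical names used only for this illustration. The mask guarantees a result in [0, 511], which is the kind of bound the verifier message asks for with 'var &= const'. One detail worth noting: the original test rejects only values above 512, so a length of exactly 512 passes the test and is then masked to 0.

```c
#include <stdint.h>

#define UDP_LEN_MASK 0x1ff  /* keeps the low 9 bits: results are 0..511 */

/* Apply the mask from the suggested workaround. */
static uint16_t mask_udp_len(uint16_t udp_len)
{
    return udp_len & UDP_LEN_MASK;
}
```

To allow a larger maximum, the range test and the mask would need to be changed together, for example a mask of 0xffff for the full u16 range. Whether a given compiler keeps such a mask in the generated code is not guaranteed, as noted elsewhere in this thread.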



Re: minutes: IO Visor TSC/Dev Meeting

Saeed Mahameed
 

Hi Guys,

My agenda for next meeting:

1) unifying and centralizing XDP statistics accounting [1].
2) XDP resource management, User API [2].
3) XDP meta data via btf (in kernel BTF registration).
4) All of the above issues share one common problem, which is the lack
of a unified user interface;
without it, we really can't make real progress.

I just sent a proposal [3] for a way to achieve the unified interface;
please look it up and let me know your thoughts.
[1] https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#statistics-per-xdp-action
[2] https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#better-ndo_xdp_xmit-resource-management
[3] Subject: "[RFC][Proposal] BPF Control MAP"

Thanks,
Saeed

On Wed, Mar 6, 2019 at 12:51 PM Brenden Blanco <bblanco@...> wrote:

Hi All,

Thank you for joining the call today. Here are my notes from the discussion.

Thanks,
Brenden

=== Discussion ===
Brenden:
* Plan to tag release to coincide with kernel 5.0

Brendan:
* Speaking this weekend at SCaLE in Los Angeles

Yonghong:
* LLVM work
* compile once - run anywhere WIP
* support for static variables

Daniel:
* Global data support work in kernel continues
* Ability to lock maps as read-only
* bugfixes after merge window

Alexei:
* Some thoughts on future work of BPF
* especially with introduction of BTF
* overall needs concerted effort to improve debuggability
* BTF for programs itself with source/type/layout information
* structures for maps and global data
* suggest to always require type information
(already turned on by default in bcc and supported by llvm)
* Some extra hoops to jump through for driver embedded BPF
* to be enabled with a sysctl
* kernel support is ready
* some long tail of support - e.g. systemd has raw assembly BPF
* kconfig option - eventual deprecation
* if kernel is default strict, llvm should automatically emit BTF as well
* memcg accounting patch status?
* Daniel - still being worked on
* proposal to enable the same accounting for verifier memory
* helps to enable verifier multithreading

Jakub:
* question regarding global data atomicity
* Daniel - requires read once / write once instructions to work properly
* some todo work on documentation, interpreter + jit implementations
* depends on architecture (machine word size guarantees only)

Jesper:
* which llvm release supports BTF
* landed in December - will be in 8.0, better in 9.0
* working on tutorial for xdp at netdev
* https://www.netdevconf.org/0x13/session.html?tutorial-XDP-hands-on
* soliciting feedback
* https://github.com/xdp-project/xdp-tutorial/

Saeed:
* request to devote some time in the next meeting to iron out some XDP issues
* please send an agenda in reply to the reminder email before next call
* prepare discussion over email in between time

=== Attendees ===
Alexei Starovoitov
Marco Leogrande
Mauricio Vasquez
Paul Chaignon
Brenden Blanco
Jiong Wang
Yonghong Song
Daniel Borkmann
Jesper Brouer
Quentin Monnet
Dan Siemon
Jakub Kicinski
Saeed
John
Yutaro

