
bpftrace and include search paths?

Richard Elling
 

I have a need to have a bpftrace script #include headers from a project
directory. In cc, this is like adding -I<path>. Am I blind from reading manuals
or is there a clever way to pass that info down through bpftrace into bpf or
is this a new RFE?
-- richard


XDP on Azure vNICs

Kanthi P
 

Hi,

We are seeing an issue with XDP (generic mode) attached to Azure vNICs. When "accelerated networking" is enabled on the Azure vNICs, the XDP program doesn't receive all the packets.

We see all of them in tcpdump though.

Has anyone tried it this way?

Thanks,
Kanthi



minutes: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi All,

Thanks for the good discussion today! Below are my notes.

Thanks,
Brenden

=== Discussion ===

Michael:
* https://github.com/savisko/katran/tree/xdp_off
* Mellanox presentation on XDP + Katran
* offload tc XDP programs to hardware nic
* Example application: Katran from Facebook
* Control is implemented as a C++ library (example is open source)
* Katran DP already implemented in XDP
* Parsing + extract flow ID
* lookup key generation
* counter update
* packet modified to forward to other IP
* Accelerate marking of flows in hardware
* XDP metadata to pass mark field from hardware to xdp program
* struct xdp_md_mark {
__u32 mark;
};
* if (mark_ptr + 1 <= data)
markID = mark_ptr->mark;
* per-CPU XDP map to convert mark -> real flow information
mark == 0 implies new flow
* original XDP slow-path has get_packet_dst() to create LRU mapping
* modified version uses perf event output to notify acceleration helper to
install flow mark in hardware
* perf results:
100 flows: 40+% performance improvements
10k flows: 0-50% performance improvements depending on #rx queues used
Software: 25Mpps
Hardware: 37-39Mpps
* considering changing implementation to mark real server instead of flow
id, to reduce number of entries kept in L1 cache
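
The mark lookup in the notes above can be sketched in plain C as follows. This is only a userspace model, under the assumption that the metadata area sits directly in front of the packet data; the function name read_flow_mark is hypothetical, uint32_t stands in for __u32, and in a real XDP program the two pointers would come from the data_meta and data fields of the XDP context.

```c
#include <stdint.h>

/* Metadata layout from the notes; uint32_t stands in for __u32. */
struct xdp_md_mark {
    uint32_t mark;
};

/*
 * Hypothetical helper: return the hardware mark stored in front of the
 * packet, or 0 when the metadata area is too small to hold one.
 * A mark of 0 is treated as "new flow", as in the notes.
 */
static uint32_t read_flow_mark(const void *data_meta, const void *data)
{
    const struct xdp_md_mark *mark_ptr = data_meta;

    /* Same bounds check as in the notes: the struct must end at or before data. */
    if ((const char *)(mark_ptr + 1) <= (const char *)data)
        return mark_ptr->mark;

    return 0;
}
```

A caller would then use a nonzero mark as the key into the per-CPU map and fall back to the original slow path when the result is 0.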

Yonghong:
* Compile once run anywhere work continues
* bitfield handling bugs in IR/debuginfo

Daniel:
* Global support work continues
* BTF side patches submitted to bpf mailing list
* tests included

Jiong:
* 32 bit patch set
* test methodology improvements
* updated patches later in the week
* some concerns around shifts, to be addressed in later improvements

Andrii:
* BTF and compile-once work integration
to share prototype tool with Saeed

Brendan:
* Is there a tool to measure queue latency in qdisc->netdev layer?
* debian/ubuntu are packaging bpftrace
* except libbcc renamed to libbpf_cc
* some issues with mixing iovisor's libbcc and debian's

Jesper:
* Fedora adding packaging support for libbpf

Alexei:
* systemd also adding support for libbpf - link to be provided?

=== Attendees ===
Brenden Blanco
Andrii Nakryiko
Daniel Borkmann
Jesper Brouer
Jiong Wang
Marco Leogrande
Michael Savisko
Paul Chaignon
Quentin Monnet
Alexei Starovoitov
Saeed
Flavio
Rony
Jonathan Lemon
Brendan Gregg
Dan Siemon
Joe Stringer
John


Re: reminder: IO Visor TSC/Dev Meeting

Michael Savisko
 

Hi,

Please see the attached presentation on XDP acceleration of Katran LB (in PPT and PDF).


Regards,
Michael

On Wednesday, April 3, 2019, 2:58:39 AM GMT+3, Brenden Blanco <bblanco@...> wrote:


Agenda: discussion on XDP acceleration of Katran LB

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
You want to know what is going on in BPF land
You are doing something interesting yourself with BPF and would like to share
You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am  |  Pacific Daylight Time (San Francisco, GMT-07:00)  |  30 min






reminder: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Agenda: discussion on XDP acceleration of Katran LB

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
You want to know what is going on in BPF land
You are doing something interesting yourself with BPF and would like to share
You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=4&day=3&hour=18&min=0&sec=0&p1=900


Re: minutes: IO Visor TSC/Dev Meeting

Brenden Blanco
 

On Tue, Apr 2, 2019 at 10:48 AM Saeed Mahameed
<saeedm@...> wrote:

Hi Brenden,

I am sending this on behalf of Michael Savisko, he is having some
difficulties sending emails to the iovisor list.
Sending to the iovisor-dev mailer is gated by a signup requirement, in
order to reduce spam. The signup process should be pretty painless, I
believe it just requires going through an email validation step:
https://lists.iovisor.org/g/iovisor-dev/join

Michael is working on real world use cases for XDP acceleration.
He would like to present and discuss his work and analysis on
accelerating Katran load balancer [1] via meta data offloads.
He will need 10 minutes and will share some slides, i hope we can push
this to tomorrow's meeting agenda.
Sounds good!

Katran modified code is on Github:
https://github.com/savisko/katran/tree/xdp_off

[1] https://code.fb.com/open-source/open-sourcing-katran-a-scalable-network-load-balancer/
Thanks,
Saeed.

On Wed, Mar 20, 2019 at 3:32 PM Brenden Blanco <bblanco@...> wrote:

Hi all,

Thanks for joining the discussion today. Here are the notes; however, this was
a longer discussion and I'm sure I missed some things.

Cheers,
Brenden

=== Discussion ===

Yonghong:
* Some internal BTF work
* Compiler support for static variables
* Some compile-once-run-everywhere work
* Looking for help with issues regarding libbpf packaging/dependencies
* Issue to continue offline, not concluded on the call
* Issue related to function->function call in bcc and compiler optimizations
* Jiong offers to debug the codegen using 32 bit mode

Saeed:
* XDP driver statistics standardization
* All drivers run same entry point for xdp progs
* Why not account stats here?
* Even though xdp program can implement its own statistics
* Many drivers are already paying stats accounting cost
* Just remove unused stats from driver?
* Stats may be used in debugging, but FB for instance is guarding with
static key, wouldn't want extra stats on by default
* Allocating resources for tx queue/redirect?
* Is there a better way to allocate resources when it isn't known that a
program will need queues
* One approach is to attach dummy bpf program
* Resource allocation point when configuring devmap?
* Seems like a clean enough solution, doesn't solve all cases but moves the
ball forward
* BTF metadata structure registration
* Should be queryable from userspace, don't yet have an API for that
* Netlink vs syscall?
* No silver bullet for all use cases
* Hesitation for creating a new object to describe existing objects (bpf
progs, maps)
* BTF is metadata conceptually different from maps, progs
* ethtool? unlikely due to lack of code ownership
* For buffers, something like devlink is more appropriate
* For BTF, bpf() syscall works
* BTF for statistics description (ethtool replacement?)

Daniel:
* verification of static data is working, patches coming soon

=== Attendees ===
Brenden Blanco
Michael Savisko
Alexei Starovoitov
Daniel Borkmann
Jakub Kicinski
Neerav Parikh
Paul Chaignon
Saeed
Marco Leogrande
Jiong Wang
Andrii Nakryiko
Yonghong Song
William Tu
Joe Stringer
John
Maciej Fijalkowski
Martin Lau
Mauricio Vasquez
Piotr Raczynski
Quillian Rutherford



Re: minutes: IO Visor TSC/Dev Meeting

Saeed Mahameed
 

Hi Brenden,

I am sending this on behalf of Michael Savisko, he is having some
difficulties sending emails to the iovisor list.

Michael is working on real world use cases for XDP acceleration.
He would like to present and discuss his work and analysis on
accelerating Katran load balancer [1] via meta data offloads.
He will need 10 minutes and will share some slides, i hope we can push
this to tomorrow's meeting agenda.

Katran modified code is on Github:
https://github.com/savisko/katran/tree/xdp_off

[1] https://code.fb.com/open-source/open-sourcing-katran-a-scalable-network-load-balancer/
Thanks,
Saeed.

On Wed, Mar 20, 2019 at 3:32 PM Brenden Blanco <bblanco@...> wrote:

Hi all,

Thanks for joining the discussion today. Here are the notes; however, this was
a longer discussion and I'm sure I missed some things.

Cheers,
Brenden

=== Discussion ===

Yonghong:
* Some internal BTF work
* Compiler support for static variables
* Some compile-once-run-everywhere work
* Looking for help with issues regarding libbpf packaging/dependencies
* Issue to continue offline, not concluded on the call
* Issue related to function->function call in bcc and compiler optimizations
* Jiong offers to debug the codegen using 32 bit mode

Saeed:
* XDP driver statistics standardization
* All drivers run same entry point for xdp progs
* Why not account stats here?
* Even though xdp program can implement its own statistics
* Many drivers are already paying stats accounting cost
* Just remove unused stats from driver?
* Stats may be used in debugging, but FB for instance is guarding with
static key, wouldn't want extra stats on by default
* Allocating resources for tx queue/redirect?
* Is there a better way to allocate resources when it isn't known that a
program will need queues
* One approach is to attach dummy bpf program
* Resource allocation point when configuring devmap?
* Seems like a clean enough solution, doesn't solve all cases but moves the
ball forward
* BTF metadata structure registration
* Should be queryable from userspace, don't yet have an API for that
* Netlink vs syscall?
* No silver bullet for all use cases
* Hesitation for creating a new object to describe existing objects (bpf
progs, maps)
* BTF is metadata conceptually different from maps, progs
* ethtool? unlikely due to lack of code ownership
* For buffers, something like devlink is more appropriate
* For BTF, bpf() syscall works
* BTF for statistics description (ethtool replacement?)

Daniel:
* verification of static data is working, patches coming soon

=== Attendees ===
Brenden Blanco
Michael Savisko
Alexei Starovoitov
Daniel Borkmann
Jakub Kicinski
Neerav Parikh
Paul Chaignon
Saeed
Marco Leogrande
Jiong Wang
Andrii Nakryiko
Yonghong Song
William Tu
Joe Stringer
John
Maciej Fijalkowski
Martin Lau
Mauricio Vasquez
Piotr Raczynski
Quillian Rutherford



Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Yonghong Song
 

On Wed, Mar 27, 2019 at 1:48 PM Pablo Alvarez via Lists.Iovisor.Org
<palvarez=akamai.com@...> wrote:

Why is it required that llvm compile the BPF code with -O2? That seems
to be part of what is causing these verifier problems...
-O0 won't work, because a helper call then becomes an indirect call:
static void *(*bpf_map_lookup_elem)(void *map, void *key) =
(void *) BPF_FUNC_map_lookup_elem;
Another reason is value tracking in the verifier, described below.

The verifier does not handle value spills well. It keeps track of map
pointers, packet pointers, etc.
But if a particular value is spilled, then later, when it is reloaded
from the stack, it becomes unknown.
For example,
r1 = 10; /* verifier state: r1 = 10 */
*(r10 - 40) = r1;
r2 = *(r10 - 40); /* verifier state: r2, unknown */

The reason is that tracking such values could increase verification
time quite a bit.
This was measured some time back, but it may be worth measuring again
now to see whether this restriction can be relaxed.

So practically -O0 won't work. -O1 may or may not work, and most people
do not use it.
-O2 is preferred. For performance reasons we also want -O2 to work,
since the BPF JIT inside the kernel does not do any optimization; it is a
simple one-insn-at-a-time translation.


On 3/27/19 4:23 PM, Yonghong Song wrote:
On Wed, Mar 27, 2019 at 10:17 AM Jiong Wang <jiong.wang@...> wrote:

On 27 Mar 2019, at 16:43, Simon <contact@...> wrote:

Thx a lot for your time Jiong.

The more I played with bpf/xdp, the more I understand that the challenge is about making "optimized byte code" compliant for the verifier.

How could I do this kind of checks my self ? I mean looking how llvm optimized my code ? (to be able to do same kind of analyses you do above?)
Just my humble opinion, I would recommend:

1. get used to verifier rejection information, for example:

R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

It tells you the state of each register at the rejection point.
For example, R3 is “inv”, meaning a scalar value (not a pointer)
without a known value range, while R4 has a value range whose maximum
is 504.
If you use the BPF constructor's debug=16 flag, it will print out the
register state for every insn, if you are even more curious.

2. know what the verifier will reject. You could refer to:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/verifier?id=473c5daa86ffe91e937856cc32b4faa61db2e3e3

those are unit examples of what will be rejected, and some of them are with
meaningful test name or comments so could be easy to understand.
To resolve this issue, llvm may need to do more:
- prevent/undo optimizations which may ultimately cause verifier rejections.
- provide hints (e.g., through BTF) to the verifier so that it may
selectively do some analysis or enable some tracking for the cases BTF
instructs it to handle. For example, BTF may tell the verifier that two
registers have the same state at a particular point, so the verifier
only needs to check these two registers with limited range and no
others, etc.

Regards,
Jiong







Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Pablo Alvarez
 

Why is it required that llvm compile the BPF code with -O2? That seems to be part of what is causing these verifier problems...

On 3/27/19 4:23 PM, Yonghong Song wrote:
On Wed, Mar 27, 2019 at 10:17 AM Jiong Wang <jiong.wang@...> wrote:

On 27 Mar 2019, at 16:43, Simon <contact@...> wrote:

Thx a lot for your time Jiong.

The more I played with bpf/xdp, the more I understand that the challenge is about making "optimized byte code" compliant for the verifier.

How could I do this kind of checks my self ? I mean looking how llvm optimized my code ? (to be able to do same kind of analyses you do above?)
Just my humble opinion, I would recommend:

1. get used to verifier rejection information, for example:

R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

It tells you the state of each register at the rejection point.
For example, R3 is “inv”, meaning a scalar value (not a pointer)
without a known value range, while R4 has a value range whose maximum
is 504.
If you use the BPF constructor's debug=16 flag, it will print out the
register state for every insn, if you are even more curious.

2. know what the verifier will reject. You could refer to:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/verifier?id=473c5daa86ffe91e937856cc32b4faa61db2e3e3

those are unit examples of what will be rejected, and some of them are with
meaningful test name or comments so could be easy to understand.
To resolve this issue, llvm may need to do more:
- prevent/undo optimizations which may ultimately cause verifier rejections.
- provide hints (e.g., through BTF) to the verifier so that it may
selectively do some analysis or enable some tracking for the cases BTF
instructs it to handle. For example, BTF may tell the verifier that two
registers have the same state at a particular point, so the verifier
only needs to check these two registers with limited range and no
others, etc.

Regards,
Jiong





Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Yonghong Song
 

On Wed, Mar 27, 2019 at 10:17 AM Jiong Wang <jiong.wang@...> wrote:


On 27 Mar 2019, at 16:43, Simon <contact@...> wrote:

Thx a lot for your time Jiong.

The more I played with bpf/xdp, the more I understand that the challenge is about making "optimized byte code" compliant for the verifier.

How could I do this kind of checks my self ? I mean looking how llvm optimized my code ? (to be able to do same kind of analyses you do above?)
Just my humble opinion, I would recommend:

1. get used to verifier rejection information, for example:

R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

It tells you the state of each register at the rejection point.
For example, R3 is “inv”, meaning a scalar value (not a pointer)
without a known value range, while R4 has a value range whose maximum
is 504.
If you use the BPF constructor's debug=16 flag, it will print out the
register state for every insn, if you are even more curious.

2. know what the verifier will reject. You could refer to:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/verifier?id=473c5daa86ffe91e937856cc32b4faa61db2e3e3

those are unit examples of what will be rejected, and some of them are with
meaningful test name or comments so could be easy to understand.
To resolve this issue, llvm may need to do more:
- prevent/undo optimizations which may ultimately cause verifier rejections.
- provide hints (e.g., through BTF) to the verifier so that it may
selectively do some analysis or enable some tracking for the cases BTF
instructs it to handle. For example, BTF may tell the verifier that two
registers have the same state at a particular point, so the verifier
only needs to check these two registers with limited range and no
others, etc.

Regards,
Jiong







Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Jiong Wang
 

On 27 Mar 2019, at 16:43, Simon <contact@...> wrote:

Thx a lot for your time Jiong.

The more I played with bpf/xdp, the more I understand that the challenge is about making "optimized byte code" compliant for the verifier.

How could I do this kind of checks my self ? I mean looking how llvm optimized my code ? (to be able to do same kind of analyses you do above?)
Just my humble opinion, I would recommend:

1. get used to verifier rejection information, for example:

R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

It tells you the state of each register at the rejection point.
For example, R3 is “inv”, meaning a scalar value (not a pointer)
without a known value range, while R4 has a value range whose maximum
is 504.

2. know what the verifier will reject. You could refer to:

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/verifier?id=473c5daa86ffe91e937856cc32b4faa61db2e3e3

those are unit examples of what will be rejected, and some of them are with
meaningful test name or comments so could be easy to understand.
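
As an aid for reading the register state in point 1, here is a small sketch of how a var_off=(value; mask) pair such as R4's (0x0; 0x1ff) can be interpreted: bits set in the mask are unknown, all other bits equal the value. The struct and function names are hypothetical and only loosely modelled on the kernel's tracked-number representation; note that the umax_value printed in the log (504) comes from the comparison and can be tighter than what var_off alone implies (511).

```c
#include <stdint.h>

/* Hypothetical model of a var_off=(value; mask) pair from the verifier log. */
struct tnum_sketch {
    uint64_t value;  /* bits known to be set */
    uint64_t mask;   /* bits whose state is unknown */
};

/* Smallest value consistent with the known bits. */
static uint64_t tnum_sketch_min(struct tnum_sketch t)
{
    return t.value;
}

/* Largest value consistent with the known bits. */
static uint64_t tnum_sketch_max(struct tnum_sketch t)
{
    return t.value | t.mask;
}

/* 1 if x is a possible register value according to var_off alone. */
static int tnum_sketch_may_contain(struct tnum_sketch t, uint64_t x)
{
    return (x & ~t.mask) == t.value;
}
```

R3 in the log has no var_off printed at all, which is why nothing can be said about its range and the pointer addition is rejected.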

Regards,
Jiong





Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Simon
 

Thx a lot for your time Jiong.

The more I played with bpf/xdp, the more I understand that the challenge is about making "optimized byte code" compliant for the verifier.

How could I do this kind of checks my self ? I mean looking how llvm optimized my code ? (to be able to do same kind of analyses you do above?)


Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Jiong Wang
 

On 27 Mar 2019, at 16:11, Jiong Wang via Lists.Iovisor.Org <jiong.wang=netronome.com@...> wrote:


On 27 Mar 2019, at 14:53, Simon <contact@...> wrote:

Hi Jiong,
I didn't succeed in generating a .i file using bcc, but for several days I have been trying to rewrite my code without bcc (directly with the bpf C API / clang / iproute2).

I haven't finished yet, but I have a reduced version compared to the one I wrote for bcc and I face the same issue, so this time I can get the .i file easily.

So the .i file is as attachment.
The corresponding code is available here : https://github.com/sbernard31/udploadbalancer/blob/44fe1ea549a55ab23c7d1b70e9651df6f61fb865/ulb.c

Hi Simon,

Thanks for the .i. I prototyped a byteswap code-gen change, but it
doesn't seem to help your issue, which can be narrowed down to the following
general code pattern:

unsigned char cal(unsigned int a, unsigned char *b)
{
    if (a < 8 || a > 512)
        return 0;

    return b[a];
}

LLVM is doing some optimisation here: instead of generating two separate
comparisons, a < 8 and a > 512, it combines them, because a negative value
cast to unsigned must be bigger than 504. So the above code turns into

unsigned char cal(unsigned int a, unsigned char *b)
{
    unsigned tmp = a - 8;
    if (tmp > 504)
        return 0;

    return b[a];
}

The consequence of this optimisation is that a new variable “tmp” is used for
the comparison, and the verifier now knows the value range of “tmp” instead of
the original “a”, which is later added to a packet pointer. The unknown value
range of “a” then causes the verifier rejection.

So, I suspect any code using the above C code pattern will likely be rejected, i.e.:

1. combinable comparisons
2. the variable involved in the comparison is used later in places requiring a value range

And in your code, after you inserted those printk calls, the following
two comparisons were no longer combinable, so udp_len is used for the
comparison and gets a correct value range to pass the later pkt pointer
addition check.


if (udp_len < 8) {
return XDP_DROP;
}
if (udp_len > 512) {
return XDP_DROP;
}
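
To check that the two forms of cal() shown earlier really are equivalent at the C level, one could compare them exhaustively over a small range. The sketch below does that; the names cal_split, cal_combined and forms_agree are made up for illustration, and the code says nothing about what the verifier accepts, only that the compiler's rewrite preserves behaviour.

```c
#include <stdint.h>

/* Source form: two explicit range checks on "a". */
static unsigned char cal_split(unsigned int a, const unsigned char *b)
{
    if (a < 8 || a > 512)
        return 0;

    return b[a];
}

/* Combined form: a single unsigned comparison on a temporary. */
static unsigned char cal_combined(unsigned int a, const unsigned char *b)
{
    unsigned int tmp = a - 8;   /* wraps to a huge value when a < 8 */

    if (tmp > 504)
        return 0;

    return b[a];                /* still indexed by "a", not by "tmp" */
}

/* Returns 1 if both forms give the same result for every a in [0, limit). */
static int forms_agree(unsigned int limit, const unsigned char *b)
{
    for (unsigned int a = 0; a < limit; a++)
        if (cal_split(a, b) != cal_combined(a, b))
            return 0;

    return 1;
}
```

The buffer passed in must hold at least 513 bytes, since index 512 is accepted by both forms.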


Regards,
Jiong


The error is still math between pkt pointer and register with unbounded min value is not allowed

The verifier output is pretty much the same :

27: (71) r4 = *(u8 *)(r1 +23)
28: (b7) r0 = 2
29: (55) if r4 != 0x11 goto pc+15
R0=inv2 R1=pkt(id=0,off=0,r=34,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=34,r=34,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
30: (07) r3 += 8
31: (b7) r0 = 1
32: (2d) if r3 > r2 goto pc+12
R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=42,r=42,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
33: (69) r3 = *(u16 *)(r1 +38)
34: (dc) r3 = be16 r3
35: (bf) r4 = r3
36: (07) r4 += -8
37: (57) r4 &= 65535
38: (b7) r0 = 1
39: (25) if r4 > 0x1f8 goto pc+5
R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

I'm pretty sure the bcc version I used before linked statically clang/llvm v7.0.
Here I use a v6.0.

The funny part ... This modification about just adding some logs/printk makes this error disappear : https://github.com/sbernard31/udploadbalancer/commit/0145538c7b35e2a6bb92225f69a45f4bee120a6d

All of those verifier errors make me a bit crazy (╥_╥)

HTH

Simon


<ulb.i>


Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Jiong Wang
 

On 27 Mar 2019, at 14:53, Simon <contact@...> wrote:

Hi Jiong,
I didn't succeed in generating a .i file using bcc, but for several days I have been trying to rewrite my code without bcc (directly with the bpf C API / clang / iproute2).

I haven't finished yet, but I have a reduced version compared to the one I wrote for bcc and I face the same issue, so this time I can get the .i file easily.

So the .i file is as attachment.
The corresponding code is available here : https://github.com/sbernard31/udploadbalancer/blob/44fe1ea549a55ab23c7d1b70e9651df6f61fb865/ulb.c

Hi Simon,

Thanks for the .i. I prototyped a byteswap code-gen change, but it
doesn't seem to help your issue, which can be narrowed down to the following
general code pattern:

unsigned char cal(unsigned int a, unsigned char *b)
{
    if (a < 8 || a > 512)
        return 0;

    return b[a];
}

LLVM is doing some optimisation here: instead of generating two separate
comparisons, a < 8 and a > 512, it combines them, because a negative value
cast to unsigned must be bigger than 504. So the above code turns into

unsigned char cal(unsigned int a, unsigned char *b)
{
    unsigned tmp = a - 8;
    if (tmp > 504)
        return 0;

    return b[a];
}

The consequence of this optimisation is that a new variable “tmp” is used for
the comparison, and the verifier now knows the value range of “tmp” instead of
the original “a”, which is later added to a packet pointer. The unknown value
range of “a” then causes the verifier rejection.

So, I suspect any code using the above C code pattern will likely be rejected, i.e.:

1. combinable comparisons
2. the variable involved in the comparison is used later in places requiring a value range

Regards,
Jiong


The error is still math between pkt pointer and register with unbounded min value is not allowed

The verifier output is pretty much the same :

27: (71) r4 = *(u8 *)(r1 +23)
28: (b7) r0 = 2
29: (55) if r4 != 0x11 goto pc+15
R0=inv2 R1=pkt(id=0,off=0,r=34,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=34,r=34,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
30: (07) r3 += 8
31: (b7) r0 = 1
32: (2d) if r3 > r2 goto pc+12
R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=42,r=42,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
33: (69) r3 = *(u16 *)(r1 +38)
34: (dc) r3 = be16 r3
35: (bf) r4 = r3
36: (07) r4 += -8
37: (57) r4 &= 65535
38: (b7) r0 = 1
39: (25) if r4 > 0x1f8 goto pc+5
R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

I'm pretty sure the bcc version I used before linked statically clang/llvm v7.0.
Here I use a v6.0.

The funny part ... This modification about just adding some logs/printk makes this error disappear : https://github.com/sbernard31/udploadbalancer/commit/0145538c7b35e2a6bb92225f69a45f4bee120a6d

All of those verifier errors make me a bit crazy (╥_╥)

HTH

Simon


<ulb.i>


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Simon
 

Hi,
 I have started to rewrite my code without using bcc. (I only use the bpf C API / clang / iproute2.)

 I now have a reduced version compared to the one I used here. With it I was not able to reproduce the previous error I reported here,
 but I get a new one for exactly the same call (checksum calculation/bpf_csum_diff ...)

64: (57) r0 &= 65535
65: (0f) r0 += r1
66: (bf) r1 = r0
67: (77) r1 >>= 16
68: (15) if r1 == 0x0 goto pc+2

R0=inv(id=0,umax_value=4295032831,var_off=(0x0; 0x1ffffffff)) 
R1=inv(id=0,umax_value=65536,var_off=(0x0; 0x1ffff)) R6=pkt(id=0,off=34,r=42,imm=0) 
R7=inv(id=0,umax_value=511,var_off=(0x0; 0x1ff)) R8=inv0 
R9=pkt(id=0,off=0,r=42,imm=0) R10=fp0,call_-1
69: (57) r0 &= 65535
70: (0f) r0 += r1
71: (bf) r1 = r0
72: (77) r1 >>= 16
73: (0f) r1 += r0
74: (a7) r1 ^= -1
75: (6b) *(u16 *)(r9 +24) = r1
76: (6b) *(u16 *)(r9 +40) = r8
77: (bf) r3 = r9
78: (07) r3 += 26
79: (b7) r1 = 0
80: (b7) r2 = 0
81: (b7) r4 = 4
82: (b7) r5 = 0
83: (85) call bpf_csum_diff#28
84: (bf) r3 = r9
85: (07) r3 += 30
86: (b7) r1 = 0
87: (b7) r2 = 0
88: (b7) r4 = 4
89: (bf) r5 = r0
90: (85) call bpf_csum_diff#28
91: (71) r1 = *(u8 *)(r9 +23)
92: (dc) r1 = be32 r1
93: (63) *(u32 *)(r10 -4) = r1
94: (bf) r8 = r10
95: (07) r8 += -4
96: (b7) r1 = 0
97: (b7) r2 = 0
98: (bf) r3 = r8
99: (b7) r4 = 4
100: (bf) r5 = r0
101: (85) call bpf_csum_diff#28
102: (57) r7 &= 65535
103: (bf) r1 = r7
104: (dc) r1 = be32 r1
105: (63) *(u32 *)(r10 -4) = r1
106: (b7) r1 = 0
107: (b7) r2 = 0
108: (bf) r3 = r8
109: (b7) r4 = 4
110: (bf) r5 = r0
111: (85) call bpf_csum_diff#28
112: (b7) r1 = 0
113: (b7) r2 = 0
114: (bf) r3 = r6
115: (bf) r4 = r7
116: (bf) r5 = r0
117: (85) call bpf_csum_diff#28
invalid access to packet, off=34 size=511, R3(id=0,off=34,r=42)

I think I understand the error.

R7 is my udp_len variable. It is considered an integer with a max value of 511 (the min value should be 8, but I cannot see that in the verifier log).
And R6 is a reference to the packet at offset 34 with a max valid size of 42 (r=42?), and so boom!

But I already checked that this is a valid access before : https://github.com/sbernard31/udploadbalancer/blob/bpf_only_without_logs/ulb.c#L115

Is it another issue ? or pretty much the same ?
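
The arithmetic behind the "invalid access to packet" line can be sketched as below. This is a simplified model, not the verifier's actual code: the function name pkt_access_ok is hypothetical, and it only captures the idea that the access offset plus the largest possible size must stay within the range (r) already proven by a comparison against the packet end.

```c
#include <stdint.h>

/*
 * Hypothetical model of the packet bounds check: an access of "size"
 * bytes starting at offset "off" is acceptable only if it ends within
 * the verified range "range" (the r= value in the register state).
 */
static int pkt_access_ok(uint32_t off, uint32_t size, uint32_t range)
{
    return (uint64_t)off + size <= range;
}
```

With the values from the log, off=34 and size=511 give 545, which exceeds r=42, so the call is rejected; an 8-byte access at the same offset would fit exactly.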

 


Re: math between pkt pointer and register with unbounded min value is not allowed #verifier

Simon
 

Hi Jiong,
   I didn't succeed in generating a .i file using bcc, but for several days I have been trying to rewrite my code without bcc (directly with the bpf C API / clang / iproute2).

   I haven't finished yet, but I have a reduced version compared to the one I wrote for bcc and I face the same issue, so this time I can get the .i file easily.

   So the .i file is as attachment.
   The corresponding code is available here : https://github.com/sbernard31/udploadbalancer/blob/44fe1ea549a55ab23c7d1b70e9651df6f61fb865/ulb.c

   The error is still  math between pkt pointer and register with unbounded min value is not allowed

   The verifier output is pretty much the same :

27: (71) r4 = *(u8 *)(r1 +23)
28: (b7) r0 = 2
29: (55) if r4 != 0x11 goto pc+15
 R0=inv2 R1=pkt(id=0,off=0,r=34,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=34,r=34,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
30: (07) r3 += 8
31: (b7) r0 = 1
32: (2d) if r3 > r2 goto pc+12
 R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=pkt(id=0,off=42,r=42,imm=0) R4=inv17 R5=inv5 R10=fp0,call_-1
33: (69) r3 = *(u16 *)(r1 +38)
34: (dc) r3 = be16 r3
35: (bf) r4 = r3
36: (07) r4 += -8
37: (57) r4 &= 65535
38: (b7) r0 = 1
39: (25) if r4 > 0x1f8 goto pc+5
 R0=inv1 R1=pkt(id=0,off=0,r=42,imm=0) R2=pkt_end(id=0,off=0,imm=0) R3=inv(id=0) R4=inv(id=0,umax_value=504,var_off=(0x0; 0x1ff)) R5=inv5 R10=fp0,call_-1
40: (0f) r1 += r3
math between pkt pointer and register with unbounded min value is not allowed

I'm pretty sure the bcc version I used before linked statically clang/llvm v7.0.
Here I use a v6.0.

The funny part ...  This modification about just adding some logs/printk makes this error disappear : https://github.com/sbernard31/udploadbalancer/commit/0145538c7b35e2a6bb92225f69a45f4bee120a6d

All of those verifier errors make me a bit crazy (╥_╥)

HTH

Simon


[PATCH v3 bpf-next 1/3] BPF: helpers: New helper to obtain namespace data from current task

neirac
 

Hi,

Could you give me a hand with a couple of doubts?

What is the reason for not using the current namespace API instead of directly
  accessing namespaces?

Regarding bpf programs not being preemptible: if we add spin_locks around the
  vfs_getattr call, would that be an acceptable solution?

   spin_lock(&bpf_lock);
   res = vfs_getattr(&kp, &ks);
   spin_unlock(&bpf_lock);

Is there another way to interact with the vfs layer within a bpf helper?

---------- Forwarded message ---------
From: Carlos Antonio Neira Bustos <cneirabustos@...>
Date: Thu, Mar 21, 2019 at 7:40 AM
Subject: Re: [PATCH v3 bpf-next 1/3] BPF: helpers: New helper to obtain namespace data from current task
To: Alexei Starovoitov <alexei.starovoitov@...>
Cc: <netdev@...>, <ys114321@...>


On Wed, Mar 20, 2019 at 06:23:20PM -0700, Alexei Starovoitov wrote:
> On Wed, Mar 20, 2019 at 01:49:22PM -0300, Carlos Antonio Neira Bustos wrote:
> >
> > This is a series of patches to introduce a new helper called bpf_get_current_pidns_info,
> > this change has been split into the following patches:
> >
> > 1- Feature introduction
> > 2- Update tools/.../bpf.h
> > 3- Self tests and samples
> >
> >
> > From 852a65906122b05b4d1a23af868b2c245d240402 Mon Sep 17 00:00:00 2001
> > From: Carlos <cneirabustos@...>
> > Date: Tue, 19 Mar 2019 19:38:48 -0300
> > Subject: [PATCH] [PATCH bpf-next 1/3] BPF: New helper to obtain namespace data
> >  from current task
> >
> > This helper obtains the active namespace from current and returns pid, tgid,
> > device and namespace id as seen from that namespace, allowing to instrument
> > a process inside a container.
> > Device is read from /proc/self/ns/pid, as in the future it's possible that
> > different pid_ns files may belong to different devices, according
> > to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
> > conference.
> > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> > scripts but this helper returns the pid as seen by the root namespace which is
> > fine when a bcc script is not executed inside a container.
> > When the process of interest is inside a container, pid filtering will not work
> > if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
> > returning the pid as it's seen by the current namespace where the script is
> > executing.
> >
> > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> > used to do pid filtering even inside a container.
> >
> > For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):
> >
> >         u32 pid = bpf_get_current_pid_tgid() >> 32;
> >         if (pid != <pid_arg_passed_in>)
> >                 return 0;
> > Could be modified to use bpf_get_current_pidns_info() as follows:
> >
> >         struct bpf_pidns_info pidns;
> >         bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns_info));
> >         u32 pid = pidns.tgid;
> >         u32 nsid = pidns.nsid;
> >         if ((pid != <pid_arg_passed_in>) || (nsid != <nsid_arg_passed_in>))
> >                 return 0;
> >
> > To find out the PID namespace id of a process, you could use this command:
> >
> > $ ps -h -o pidns -p <pid_of_interest>
> >
> > Or this other command:
> >
> > $ ls -Li /proc/<pid_of_interest>/ns/pid
> >
> > Signed-off-by: Carlos Antonio Neira Bustos <cneirabustos@...>
> > -
> > ---
> >  include/linux/bpf.h      |  1 +
> >  include/uapi/linux/bpf.h | 26 ++++++++++++++++++-
> >  kernel/bpf/core.c        |  1 +
> >  kernel/bpf/helpers.c     | 67 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/trace/bpf_trace.c |  2 ++
> >  5 files changed, 96 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a2132e09dc1c..a77f5bd77bd8 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -930,6 +930,7 @@ extern const struct bpf_func_proto bpf_sk_redirect_map_proto;
> >  extern const struct bpf_func_proto bpf_spin_lock_proto;
> >  extern const struct bpf_func_proto bpf_spin_unlock_proto;
> >  extern const struct bpf_func_proto bpf_get_local_storage_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> > 
> >  /* Shared helpers among cBPF and eBPF. */
> >  void bpf_user_rnd_init_once(void);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 3c38ac9a92a7..facc701c7873 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2366,6 +2366,18 @@ union bpf_attr {
> >   *             current value is ect (ECN capable). Works with IPv6 and IPv4.
> >   *     Return
> >   *             1 if set, 0 if not set.
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + * Description
> > + *         Copies into *pidns* pid, namespace id and tgid as seen by the
> > + *         current namespace and also device from /proc/self/ns/pid.
> > + *         *size_of_pidns* must be the size of *pidns*
> > + *
> > + *         This helper is used when pid filtering is needed inside a
> > + *         container as bpf_get_current_tgid() helper returns always the
> > + *         pid id as seen by the root namespace.
> > + * Return
> > + *         0 on success -EINVAL on error.
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)              \
> >     FN(unspec),                     \
> > @@ -2465,7 +2477,8 @@ union bpf_attr {
> >     FN(spin_unlock),                \
> >     FN(sk_fullsock),                \
> >     FN(tcp_sock),                   \
> > -   FN(skb_ecn_set_ce),
> > +   FN(skb_ecn_set_ce),             \
> > +   FN(get_current_pidns_info),
> > 
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > @@ -3152,4 +3165,15 @@ struct bpf_line_info {
> >  struct bpf_spin_lock {
> >     __u32   val;
> >  };
> > +
> > +/* helper bpf_get_current_pidns_info will store the following
> > + * data, dev will contain major/minor from /proc/self/pid.
> > +*/
> > +struct bpf_pidns_info {
> > +   __u32 dev;
> > +   __u32 nsid;
> > +   __u32 tgid;
> > +   __u32 pid;
> > +};
> > +
> >  #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 3f08c257858e..06329fbed95f 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2044,6 +2044,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >  const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >  const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >  const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> > 
> >  const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >  {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index a411fc17d265..95c3780a6ba7 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -18,6 +18,11 @@
> >  #include <linux/sched.h>
> >  #include <linux/uidgid.h>
> >  #include <linux/filter.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > 
> >  /* If kernel subsystem is allowing eBPF programs to call this function,
> >   * inside its own verifier_ops->get_func_proto() callback it should return
> > @@ -364,3 +369,65 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
> >  };
> >  #endif
> >  #endif
> > +
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +    size)
> > +{
> > +   const char *nspid = "/proc/self/ns/pid";
> > +   struct pid_namespace *pidns = NULL;
> > +   struct kstat ks;
> > +   struct path kp;
> > +   pid_t tgid = 0;
> > +   pid_t pid = 0;
> > +   int res = 0;
> > +
> > +   if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +           goto clear;
> > +
> > +   pidns = task_active_pid_ns(current);
> > +
> > +   if (unlikely(!pidns))
> > +           goto clear;
> > +
> > +   pidns_info->nsid =  pidns->ns.inum;
> > +   pid = task_pid_nr_ns(current, pidns);
> > +
> > +   if (unlikely(!pid))
> > +           goto clear;
> > +
> > +   tgid = task_tgid_nr_ns(current, pidns);
> > +
> > +   if (unlikely(!tgid))
> > +           goto clear;
> > +
> > +   pidns_info->tgid = (u32) tgid;
> > +   pidns_info->pid = (u32) pid;
> > +
> > +        kern_path(nspid, 0, &kp);
> > +
> > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,11,0)
> > +    res = vfs_getattr(&kp, &ks, STATX_ALL, 0);
> > +#else
> > +    res = vfs_getattr(&kp, &ks);
> > +#endif
>
> Please access namespaces directly.
> I suspect bpf helpers cannot do vfs_getattr. Something inside might sleep
> while bpf progs are non preemptible.
>
> Also please run ./scripts/checkpatch.pl on your patches
> and read Documentation/bpf/bpf_devel_QA.rst.
>
Hello Alexei,

Thanks for checking this out and also thanks for the documentation link.
At this point I have a couple of questions:

* What is the reason to not use the current namespace api instead of directly
  accessing namespaces?

* Regarding bpf programs not being preemptible. If we add spin_locks to the
  vfs_getattr call, would that be an acceptable solution?

   spin_lock(&bpf_lock);
   res = vfs_getattr(&kp, &ks);
   spin_unlock(&bpf_lock);

Is there another way to interact with the vfs layer within a bpf helper?
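
The filtering idea from the forwarded patch description (skip the event unless both the tgid and the pid namespace id match) and the `ls -Li /proc/<pid_of_interest>/ns/pid` lookup can be sketched in plain userspace C. The struct below only mirrors the field layout of struct bpf_pidns_info from the patch; pidns_should_skip and pidns_ident are hypothetical names, and this is an illustration, not the helper itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Same four fields as struct bpf_pidns_info in the quoted patch. */
struct pidns_info {
    uint32_t dev;
    uint32_t nsid;
    uint32_t tgid;
    uint32_t pid;
};

/* True when the event should be skipped: either field differs from the target. */
static bool pidns_should_skip(const struct pidns_info *cur,
                              uint32_t want_tgid, uint32_t want_nsid)
{
    return cur->tgid != want_tgid || cur->nsid != want_nsid;
}

/*
 * Userspace counterpart of `ls -Li <ns_path>`: stat() follows the symlink
 * and reports the namespace inode number and the device it lives on.
 * Returns 0 on success, -1 if the path cannot be stat'ed.
 */
static int pidns_ident(const char *ns_path, uint64_t *ino, uint64_t *dev)
{
    struct stat st;

    if (stat(ns_path, &st) != 0)
        return -1;
    *ino = (uint64_t)st.st_ino;
    *dev = (uint64_t)st.st_dev;
    return 0;
}
```

On Linux, calling pidns_ident("/proc/self/ns/pid", &ino, &dev) should report the same inode number that the ls command prints for the calling process.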

Bests


Re: R? min value is negative, either use unsigned or 'var &= const' #verifier

Yonghong Song
 

Hi, Jiong,

Thanks for your interest in helping with this issue.
You can reproduce with the code at
https://github.com/sbernard31/udploadbalancer/tree/bf71e99fbd0c3f806a43076fc12a47e966422839
Using command:
sudo python ulb.py lo -vip 10.41.44.13 -rs 00:00:00:00:00:00/127.0.0.1 -p 5683 5684
You need to have bcc installed in the system.

Yonghong


On Tue, Mar 19, 2019 at 11:23 PM Yonghong Song via Lists.Iovisor.Org
<ys114321=gmail.com@...> wrote:

> On Tue, Mar 19, 2019 at 9:06 AM Simon <contact@...> wrote:
> >
> > > The compiler is doing optimization which makes the verifier fail. It is possible an earlier compiler with fewer optimizations may work.
> >
> > Maybe a silly question, but does it make sense to try to change the compiler optimization option? (I tried to play with the -O option without success.)
>
> Maybe. I have not looked at this yet from the compiler side. Sometimes you
> won't have an easy compiler option to turn off. Tuning -O may not
> help. Lowering -O to -O1/-O0 may help to remove this particular
> optimization, but may introduce more spills which the verifier will also
> reject.
>
> > Please keep me informed about your progress on this issue :)
>
> Sure. Will let you know if I have made progress on this.
>
> > Thx again.





minutes: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi all,

Thanks for joining the discussion today. Here are the notes; however, this was
a longer discussion and I'm sure I missed some things.

Cheers,
Brenden

=== Discussion ===

Yonghong:
* Some internal BTF work
* Compiler support for static variables
* Some compile-once-run-everywhere work
* Looking for help with issues regarding libbpf packaging/dependencies
* Issue to continue offline, not concluded on the call
* Issue related to function->function call in bcc and compiler optimizations
* Jiong offers to debug the codegen using 32 bit mode

Saeed:
* XDP driver statistics standardization
* All drivers run same entry point for xdp progs
* Why not account stats here?
* Even though xdp program can implement its own statistics
* Many drivers are already paying stats accounting cost
* Just remove unused stats from driver?
* Stats may be used in debugging, but FB for instance is guarding with static key, wouldn't want extra stats on by default
* Allocating resources for tx queue/redirect?
* Is there a better way to allocate resources when it isn't known that a program will need queues
* One approach is to attach dummy bpf program
* Resource allocation point when configuring devmap?
* Seems like a clean enough solution, doesn't solve all cases but moves the ball forward
* BTF metadata structure registration
* Should be queryable from userspace, don't yet have an API for that
* Netlink vs syscall?
* No silver bullet for all use cases
* Hesitation for creating a new object to describe existing objects (bpf progs, maps)
* BTF is metadata conceptually different from maps, progs
* ethtool? unlikely due to lack of code ownership
* For buffers, something like devlink is more appropriate
* For BTF, bpf() syscall works
* BTF for statistics description (ethtool replacement?)

Daniel:
* verification of static data is working, patches coming soon

=== Attendees ===
Brenden Blanco
Michael Savisko
Alexei Starovoitov
Daniel Borkmann
Jakub Kicinski
Neerav Parikh
Paul Chaignon
Saeed
Marco Leogrande
Jiong Wang
Andrii Nakryiko
Yonghong Song
William Tu
Joe Stringer
John
Maciej Fijalkowski
Martin Lau
Mauricio Vasquez
Piotr Raczynski
Quillian Rutherford


[RFC][Proposal] BPF Control MAP

Saeed Mahameed <saeedm@...>
 

In this proposal I am going to address the lack of a unified user API
for accessing and manipulating BPF system attributes. While this
proposal is generic and will work on any BPF subsystem (eBPF attach
points), I will mostly focus on XDP use cases.

Lately I started working on three different open XDP issues, namely
XDP statistics, XDP redirect and XDP meta-data. While the details of
these issues are not really relevant for the sake of this proposal, all
of them share one common problem: the lack of a unified user interface to
manipulate and access their attributes.

Examples:
1. Query XDP statistics.
2. XDP resource management, Setup XDP-redirect TX resources.
3. Setup and query XDP-metadata - (BTF data structure).

Jesper Brouer explains some of these issues in detail at:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org

Yes, I considered netlink, devlink, ethtool, sysctl, etc., but each
one of them has its own drawbacks: they are networking specific and
will not serve the general BPF purpose.

What we want is for all of the BPF related knobs to be present in the BPF
user tools: bcc, bpftool and libbpf. Ideally we don't want these tools to
integrate with every subsystem's UAPI, especially the wide
variety of networking UAPIs, and imagine what other subsystems are
going to be using ..

So what seems to be the right path here is a unified BPF
control/configuration user interface, which will hook the caller with
the targeted subsystem.

To be aligned with all existing BPF tools I am going to propose the use
of the BPF syscall (no, not a new BPF syscall command, I am not planning to
reinvent the wheel - "again" -).
What I am going to suggest is to use an already existing API which runs
on top of the BPF syscall, the BPF MAPs API, with just a small tweak. Enter:



BPF control MAP:

A special type of MAP, "BPF_MAP_TYPE_CONTROL". This map will not behave
like other maps in the sense of having a user defined data structure
behind it; we are going to use it just to hook the user with the
targeted underlying subsystem and delegate user commands to it through
map operations (create/update_elem/lookup_elem/etc ...)



Requirements and implementation details:

1) Hook the user with the targeted subsystem:
- On map creation, the user selects the BPF_MAP_TYPE_CONTROL map type and
sets map_attr.ctrl_type to the subsystem they want to access and
manipulate (KPROBE/CGROUP/SOCKET_FILTER/XDP/etc..).

2) Set and Get operations on a specific BPF subsystem or an object in
that subsystem (for example a netdev in XDP):
- The user will use the file descriptor retrieved on map creation to access
(Set/Get) the BPF subsystem attributes via the map update_elem and
lookup_elem operations. The key will be the object id (example:
ifindex, or just the type of configuration to access); keys and values
are subsystem dependent.

3) Iterate through the different attributes/objects of the subsystem.
Use case: XDP BPF subsystem, get ALL netdevs' XDP attributes/statistics.
This can be easily achieved with bpf_map_get_next_key.
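
To make requirements 1) and 2) more concrete, here is a small userspace sketch of how a control map could delegate update/lookup to a per-subsystem handler chosen by ctrl_type. Everything in it (the enum values, struct ctrl_ops, the toy XDP handler that stores a ring count per ifindex) is a hypothetical illustration; the proposal does not specify a kernel-side design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subsystem selector, standing in for map_attr.ctrl_type. */
enum ctrl_type { CTRL_TYPE_XDP, CTRL_TYPE_MAX };

/* Per-subsystem handlers that map operations are delegated to. */
struct ctrl_ops {
    int (*update)(uint32_t key, const void *value);
    int (*lookup)(uint32_t key, void *value);
};

/* Toy XDP state: one ring count per ifindex. */
#define TOY_MAX_IFINDEX 8
static uint32_t toy_xdp_rings[TOY_MAX_IFINDEX];

static int toy_xdp_update(uint32_t ifindex, const void *value)
{
    if (ifindex >= TOY_MAX_IFINDEX)
        return -1;
    toy_xdp_rings[ifindex] = *(const uint32_t *)value;
    return 0;
}

static int toy_xdp_lookup(uint32_t ifindex, void *value)
{
    if (ifindex >= TOY_MAX_IFINDEX)
        return -1;
    *(uint32_t *)value = toy_xdp_rings[ifindex];
    return 0;
}

static const struct ctrl_ops ctrl_ops_table[CTRL_TYPE_MAX] = {
    [CTRL_TYPE_XDP] = { toy_xdp_update, toy_xdp_lookup },
};

/* The "map" itself carries no data, only the subsystem it is hooked to. */
struct ctrl_map { enum ctrl_type type; };

static const struct ctrl_ops *ctrl_map_ops(const struct ctrl_map *map)
{
    if ((unsigned int)map->type >= CTRL_TYPE_MAX)
        return NULL;
    return &ctrl_ops_table[map->type];
}

static int ctrl_map_update_elem(const struct ctrl_map *map, uint32_t key,
                                const void *value)
{
    const struct ctrl_ops *ops = ctrl_map_ops(map);

    return (ops && ops->update) ? ops->update(key, value) : -1;
}

static int ctrl_map_lookup_elem(const struct ctrl_map *map, uint32_t key,
                                void *value)
{
    const struct ctrl_ops *ops = ctrl_map_ops(map);

    return (ops && ops->lookup) ? ops->lookup(key, value) : -1;
}
```

The design choice illustrated is that the map holds no storage of its own: the key and value are interpreted entirely by the handler registered for the selected subsystem.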



Advantages & Motivation:
Why BPF MAP and not just a plain new BPF syscall command or any other
existing UAPI:

0) All BPF users love maps and are used to them, and simply, everything
is a map: system objects can be keys and their attributes can be
values.

1) **BTF** integration: any map (key, value) pair can be described in
BTF at the kernel level and attached to the map the user creates.
This will be a huge advantage for user forward compatibility, and for
development convenience, since there is no need to copy kernel uapi headers
on each attribute set update, and it simplifies ABI compatibility.
New values or attributes can be dumped/parsed in user space with zero
effort, no need to constantly update user space tools.

2) BPF maps already lay the groundwork for our requirements as the
infrastructure and have the semantics that we are looking for (set/get).

3) Already integrated in user-space tools and libraries such as
bcc/libbpf and friends; what is missing is just this small tweak (in
the kernel) to hook one special map type with the underlying BPF
subsystems.

Thoughts ?


[Some EXTRAs]
Example use cases (XDP only for now):

1) Query XDP stats of all XDP netdevs:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
                          XDP_STATS);

while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == 0) {
        bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
        // we don't even need to know stats format in this case
        btf_pretty_print(xdp_ctrl->btf, &stats);
        ifindex = next_ifindex;
}
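
The iteration loop in example 1) relies on the bpf_map_get_next_key convention: a NULL or unknown key yields the first key, and the call fails once the last key has been passed. Since BPF_MAP_TYPE_CONTROL does not exist, the sketch below mimics that convention with an in-memory table. The mock_* names, the stats fields and the table contents are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up statistics layout; the proposal leaves the real format to BTF. */
struct mock_stats {
    uint64_t rx_packets;
    uint64_t rx_dropped;
};

struct mock_entry {
    uint32_t ifindex;
    struct mock_stats stats;
};

static const struct mock_entry mock_table[] = {
    { 2, { 100, 1 } },
    { 3, { 250, 0 } },
    { 7, {   5, 5 } },
};
#define MOCK_ENTRIES (sizeof(mock_table) / sizeof(mock_table[0]))

/* NULL or unknown key: return the first key. Last key: return -1. */
static int mock_get_next_key(const uint32_t *key, uint32_t *next_key)
{
    size_t i;

    if (key) {
        for (i = 0; i < MOCK_ENTRIES; i++) {
            if (mock_table[i].ifindex != *key)
                continue;
            if (i + 1 == MOCK_ENTRIES)
                return -1;
            *next_key = mock_table[i + 1].ifindex;
            return 0;
        }
    }
    *next_key = mock_table[0].ifindex;
    return 0;
}

static int mock_lookup_elem(const uint32_t *key, struct mock_stats *value)
{
    size_t i;

    for (i = 0; i < MOCK_ENTRIES; i++) {
        if (mock_table[i].ifindex == *key) {
            *value = mock_table[i].stats;
            return 0;
        }
    }
    return -1;
}

/* Walk every "netdev" and sum one counter, as in the stats query use case. */
static uint64_t total_rx_packets(void)
{
    uint32_t cur = 0, next = 0;
    const uint32_t *prev = NULL; /* NULL on the first call: start from the first key */
    struct mock_stats stats;
    uint64_t total = 0;

    while (mock_get_next_key(prev, &next) == 0) {
        if (mock_lookup_elem(&next, &stats) == 0)
            total += stats.rx_packets;
        cur = next;
        prev = &cur;
    }
    return total;
}
```

Starting from a NULL key avoids depending on the initial value of ifindex, which the loop above leaves unspecified.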

2) Setup XDP tx redirect resources on egress netdev (netdev with no XDP
program).

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
                          XDP_ATTR);

xdp_attr.command = SETUP_REDIRECT;
xdp_attr.rings.num = 12;
xdp_attr.rings.size = 128;
bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);

3) Turn On/Off XDP meta data offloads and retrieve meta data BTF format
of specific netdev/hardware:

xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
                          XDP_ATTR);

xdp_attr.command = SETUP_MD;
xdp_attr.enable_md = 1;
err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
if (err) {
        printf("XDP meta data is not supported on this netdev\n");
        return;
}
// Query Meta data BTF
bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
md_btf = xdp_attr.md_btf;

Thanks,
Saeed.
