
Re: Which is oldest linux kernel version that can support BTF? #bcc

Alison Chaiken
 

bg.salunke09@... asked:
Can I get information about oldest linux kernel version that can support BTF?
The basic support appears to have been added by

commit e83b9f55448afce3fe1abcd1d10db9584f8042a6
Author: Andrii Nakryiko <andriin@...>
Date: Tue Apr 2 09:49:50 2019 -0700
kbuild: add ability to generate BTF type info for vmlinux

Running "git branch --contains e83b9f55448a" will tell you which
of your branches contain this commit.

Hope this helps,
Alison Chaiken
Aurora Innovation


Which is oldest linux kernel version that can support BTF? #bcc

bg.salunke09@...
 

Hi, 

I'm looking into BTF and its use cases. Based on the documentation, I understand that to run BPF programs across different kernel versions, they need to be built with libbpf, which depends on BTF information. 
Now, to have BTF information available on a kernel, the kernel needs to be rebuilt with the "" flag. 

I can see that BTF support was introduced in Linux kernel version 5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf),
however I can still see the BTF information (/sys/kernel/btf/vmlinux) on my 4.18.0-193.28.1.el8_2.x86_64 kernel.

I'm a little confused about how an older kernel can provide BTF info if support was only added recently.

Can I get information about oldest linux kernel version that can support BTF?



Re: BCC Support for BPF Subprograms with Tail Calls (Kernel 5.10 Feature)

Yonghong Song
 

On Wed, Feb 24, 2021 at 12:24 PM <jwkova@...> wrote:

Hello,

I was wondering if BCC implements the new BPF feature (as of kernel 5.10) to allow BPF programs to utilize both BPF tail calls and BPF subprograms. This behavior is described near the end of this section of the BPF reference guide. I am interested in this functionality to extend a BPF program in order to reach the limit of 8KB of stack space.
You can use bpf tail calls today. You can look at
bcc/tests/cc/test_prog_table.cc for an example. bcc does not support
subprograms yet. In the future we do plan to be more libbpf compatible
so we can use those features.
BTW, the stack limit is 512 bytes, not 8KB.
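Since 512 bytes is the whole budget, a compile-time guard can at least catch an on-stack event record that is too large on its own. This is a sketch under assumptions: the `BPF_STACK_LIMIT_BYTES` macro and `struct rst_event` are hypothetical, and because the verifier counts every stack slot the program uses, passing this check is necessary but not sufficient.

```c
#include <stddef.h>
#include <stdint.h>

/* The 512-byte BPF stack limit mentioned above; the macro name is our own. */
#define BPF_STACK_LIMIT_BYTES 512

/* Hypothetical per-event record, loosely modeled on a TCP reset event. */
struct rst_event {
    uint32_t pid;
    uint16_t sport;
    uint16_t dport;
    char comm[16];
    uint8_t saddr_v6[16];
    uint8_t daddr_v6[16];
};

/* Fails the build if this struct alone would exceed the stack limit.
 * Other locals also consume stack, so the real margin is smaller. */
_Static_assert(sizeof(struct rst_event) <= BPF_STACK_LIMIT_BYTES,
               "struct rst_event does not fit on the 512-byte BPF stack");

static size_t rst_event_size(void)
{
    return sizeof(struct rst_event);
}
```

If a record does not fit, a common workaround is to build it in a single-entry per-CPU array map used as scratch space rather than on the stack.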


Thanks,

Jake


Re: __builtin_memcpy behavior

Toke Høiland-Jørgensen
 

"Tristan Mayfield" <mayfieldtristan@...> writes:

Thank you to both Andrii and Toke! It's been extremely helpful to read
your responses. Having conversations like these really helps me when I
go into the source code and try to understand the overall intent of
it. I'm going to try and summarize the conversation to confirm my
understanding.

bpf_probe_read() will read any valid kernel memory (nothing new here).
If the memory is already available to be read in the program (e.g. in
tracepoint args), then __builtin_memcpy can be used and will
potentially throw attach-time errors if reading structs incorrectly
(for some reason I don't think we clarified).
OK, I'll try to explain this one:

Think of __builtin_memcpy() as a macro: it just compiles down to regular
program instructions copying the memory (i.e., these two are roughly
equivalent, modulo any optimisations the compiler might make):

x = y;
__builtin_memcpy(&x, &y, sizeof(x));

The verifier will check the resulting memory access instructions, to
make sure you're not reading or writing out of bounds for whatever
variable you're reading from / writing to. E.g., if you're reading from
a context pointer, the verifier will know the size of the context object
and make sure you only dereference up to the memory address ctx +
sizeof(*ctx).
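The equivalence can be checked in ordinary userspace C, since both GCC and Clang provide `__builtin_memcpy`. A small sketch with a hypothetical `struct sample_args`; it only shows that the two forms produce the same bytes and says nothing about how the BPF verifier bounds-checks them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a tracepoint argument struct (no padding). */
struct sample_args {
    uint16_t sport;
    uint16_t dport;
    uint8_t saddr[4];
};

/* Copy by plain assignment. */
static struct sample_args copy_by_assignment(const struct sample_args *src)
{
    struct sample_args dst;
    dst = *src;
    return dst;
}

/* Copy with __builtin_memcpy; with a constant size the compiler typically
 * lowers this to the same loads and stores as the assignment above. */
static struct sample_args copy_by_builtin(const struct sample_args *src)
{
    struct sample_args dst;
    __builtin_memcpy(&dst, src, sizeof(dst));
    return dst;
}

/* Returns 1 if both copies are byte-for-byte identical, else 0. */
static int copies_match(const struct sample_args *src)
{
    struct sample_args a = copy_by_assignment(src);
    struct sample_args b = copy_by_builtin(src);
    return memcmp(&a, &b, sizeof(a)) == 0;
}
```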

CO-RE can guarantee valid memory reads because of the nature of being
able to check offsets and relocations at load time rather than attach
time or just returning garbage data with no errors.
Yes, that's basically what it boils down to. It works like this: What
CO-RE does (for structs) is add some more information to the compiled
binary so that you can reference struct members by name instead of
memory offset. So, normally if you write:

x = y->z;

that will compile to a load from 'y + offsetof(typeof(y), z)', with the
offset being computed at compile time. When you add the
preserve_access_index attribute, clang will record a relocation that
says you wanted the member named 'z' (and its type) by way of the BTF
information. libbpf will read that at load time, and compute a new
'offsetof(typeof(y), z)' for the struct member as it exists in the
running kernel, so that if the layout has changed, you'll still get the
right offset. The load instruction in the byte code is then rewritten
with this new offset.

This means that by the time the bytecode is loaded into the kernel, it
has already been rewritten, so the kernel bounds check is still the same:
it'll just check that the memory you read is inside the size of the
structure. But because the offsets have been fixed up, the end result is
that you won't get out-of-bounds errors; i.e., you might say that passing the
bounds check is an implicit effect of the CO-RE rewriting.
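The load-time rewrite described above can be modeled in a few lines of plain C. This is a toy model, not libbpf code: the `field_info`/`type_info` tables standing in for the running kernel's BTF, the `field_reloc` record, and the `relocate()` helper are all hypothetical. It only shows the core idea of resolving a member by name and patching the offset the compiler baked in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BTF: one struct's member names and offsets
 * as laid out in the running kernel. */
struct field_info {
    const char *name;
    size_t offset;
};

struct type_info {
    const char *type_name;
    const struct field_info *fields;
    size_t nr_fields;
};

/* Hypothetical relocation record: "this load wants member `field` of
 * `type_name`", plus the offset the compiler originally emitted. */
struct field_reloc {
    const char *type_name;
    const char *field;
    size_t insn_offset;
};

/* Resolve the member by name in the target layout and patch the offset.
 * Returns 0 on success, or -1 (offset left untouched) if not found. */
static int relocate(struct field_reloc *r, const struct type_info *target)
{
    if (strcmp(r->type_name, target->type_name) != 0)
        return -1;
    for (size_t i = 0; i < target->nr_fields; i++) {
        if (strcmp(target->fields[i].name, r->field) == 0) {
            r->insn_offset = target->fields[i].offset;
            return 0;
        }
    }
    return -1;
}
```

libbpf's real matching also compares member types and covers many more relocation kinds; this sketch keeps only the name-to-offset step.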

-Toke


BCC Support for BPF Subprograms with Tail Calls (Kernel 5.10 Feature)

jwkova@...
 

Hello,

I was wondering if BCC implements the new BPF feature (as of kernel 5.10) to allow BPF programs to utilize both BPF tail calls and BPF subprograms. This behavior is described near the end of this section of the BPF reference guide. I am interested in this functionality to extend a BPF program in order to reach the limit of 8KB of stack space.

Thanks,

Jake


Re: __builtin_memcpy behavior

Andrii Nakryiko
 

-- Andrii

On Wed, Feb 24, 2021 at 11:28 AM Tristan Mayfield
<mayfieldtristan@...> wrote:

Thank you to both Andrii and Toke! It's been extremely helpful to read your responses. Having conversations like these really helps me when I go into the source code and try to understand the overall intent of it. I'm going to try and summarize the conversation to confirm my understanding.

bpf_probe_read() will read any valid kernel memory (nothing new here). If the memory is already available to be read in the program (e.g. in tracepoint args), then __builtin_memcpy can be used and will potentially throw attach-time errors if reading structs incorrectly (for some reason I don't think we clarified).

CO-RE can guarantee valid memory reads because of the nature of being able to check offsets and relocations at load time rather than attach time or just returning garbage data with no errors.

To build CO-RE programs you need a vmlinux file (not to be confused with the header, vmlinux.h) which is normally found at /sys/kernel/btf/vmlinux on systems that have been compiled with pahole and CONFIG_DEBUG_INFO_BTF=y. Having the vmlinux.h file is helpful because it replaces kernel headers and makes building a bit nicer, but isn't necessary. Once compiled, CO-RE programs should be able to run on any system that has a vmlinux file in one of the locations listed here: https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583.
vmlinux usually refers to the kernel image binary. /sys/kernel/btf/vmlinux
is not that; it's only the BTF data. So CO-RE needs kernel BTF, not
necessarily the vmlinux kernel image. Just a clarification. But the vmlinux
image (ELF file) itself has a .BTF section, which contains the same data
exposed in /sys/kernel/btf/vmlinux, so libbpf will try to fetch that
data if /sys/kernel/btf/vmlinux is not present. That is necessary for
some older kernel versions, as well as when you "embed" BTF information
manually with `pahole -J`.
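Whether a particular kernel image carries that .BTF section (for example, after running `pahole -J` on it) can be checked by scanning its ELF section headers. Below is a minimal, independent sketch for 64-bit ELF files using glibc's `<elf.h>`; it is not libbpf code, the helper name `elf64_has_section()` is hypothetical, and it ignores edge cases such as extended section numbering.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a section named `want`,
 * 0 if it does not, and -1 on error (unreadable, not ELF64, truncated). */
static int elf64_has_section(const char *path, const char *want)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    int ret = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;

    /* Read and validate the ELF header. */
    if (fread(&eh, sizeof(eh), 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Load the section header table. */
    shdrs = calloc(eh.e_shnum, sizeof(*shdrs));
    if (!shdrs || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof(*shdrs), eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the section-name string table. */
    strtab = &shdrs[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (!names || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    /* Compare each section's name with the one we are looking for. */
    ret = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (shdrs[i].sh_name < strtab->sh_size &&
            strcmp(names + shdrs[i].sh_name, want) == 0) {
            ret = 1;
            break;
        }
    }
out:
    free(names);
    free(shdrs);
    fclose(f);
    return ret;
}
```

Point it at an uncompressed vmlinux ELF image with `want` set to ".BTF"; the compressed vmlinuz that most distros install in /boot is not an ELF file, so the helper will report an error for it.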


For earlier kernels, it's possible to generate a vmlinux file (and this is one of the spots I'm a bit murky on) with pahole -J, but I'm not sure what you are supposed to target when running that? Just the compiled kernel binary? Something else?
Yes, `pahole -J <path-to-kernel-image-vmlinux-binary>`. Pahole is able
to produce BTF from DWARF type information, contained in your vmlinux
kernel image (if you compile it with DWARF, of course). That's what is
happening in newer kernels when you specify CONFIG_DEBUG_INFO_BTF=y
(plus some extra linker steps to make that section "loadable":
available to kernel itself in runtime, but that's not necessary for
CO-RE itself).


BTF is just a type format that can describe C data types. Almost like a meta-language? I haven't looked at the source for BTF yet, but it seems versatile enough to be useful for CO-RE, for describing internal data structures from the kernel, and also for a variety of other things (like map declarations), and it will likely be increasingly relied on in future iterations of BPF, both CO-RE and otherwise. BTF support mainly comes from the compiler (I believe clang 10+ works, just from my experience; I'm primarily using clang 10 right now) and libbpf supporting it, not necessarily the kernel (except for CO-RE with the vmlinux).

Again, appreciate the responses. I've been building with BPF/libbpf about a year now and still feel like I've only scratched the surface. Reading source code is great, but sometimes it just really helps to get high-level ideas as well!
I think you got everything right. BTW, feel free to check my more
recent blog post ([0]), it might help a bit more.

[0] https://nakryiko.com/posts/libbpf-bootstrap/


-Tristan


Re: __builtin_memcpy behavior

Tristan Mayfield
 

Thank you to both Andrii and Toke! It's been extremely helpful to read your responses. Having conversations like these really helps me when I go into the source code and try to understand the overall intent of it. I'm going to try and summarize the conversation to confirm my understanding.

bpf_probe_read() will read any valid kernel memory (nothing new here). If the memory is already available to be read in the program (e.g. in tracepoint args), then __builtin_memcpy can be used and will potentially throw attach-time errors if reading structs incorrectly (for some reason I don't think we clarified).

CO-RE can guarantee valid memory reads because of the nature of being able to check offsets and relocations at load time rather than attach time or just returning garbage data with no errors.

To build CO-RE programs you need a vmlinux file (not to be confused with the header, vmlinux.h) which is normally found at /sys/kernel/btf/vmlinux on systems that have been compiled with pahole and CONFIG_DEBUG_INFO_BTF=y. Having the vmlinux.h file is helpful because it replaces kernel headers and makes building a bit nicer, but isn't necessary. Once compiled, CO-RE programs should be able to run on any system that has a vmlinux file in one of the locations listed here: https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583.

For earlier kernels, it's possible to generate a vmlinux file (and this is one of the spots I'm a bit murky on) with pahole -J, but I'm not sure what you are supposed to target when running that? Just the compiled kernel binary? Something else?

BTF is just a type format that can describe C data types. Almost like a meta-language? I haven't looked at the source for BTF yet, but it seems versatile enough to be useful for CO-RE, for describing internal data structures from the kernel, and also for a variety of other things (like map declarations), and it will likely be increasingly relied on in future iterations of BPF, both CO-RE and otherwise. BTF support mainly comes from the compiler (I believe clang 10+ works, just from my experience; I'm primarily using clang 10 right now) and libbpf supporting it, not necessarily the kernel (except for CO-RE with the vmlinux).

Again, appreciate the responses. I've been building with BPF/libbpf about a year now and still feel like I've only scratched the surface. Reading source code is great, but sometimes it just really helps to get high-level ideas as well!

-Tristan


Re: __builtin_memcpy behavior

Toke Høiland-Jørgensen
 

Am I misunderstanding what BTF is and the role it plays in BPF? Or
maybe has libbpf development moved so far toward CO-RE that non-CO-RE
development gets similar or the same error messages that just aren't
as clear for it?
Hmm, no, CO-RE is the specific feature that does relocations of struct
fields based on member names. This relies on BTF, but it's not the only
thing BTF is used for. The map definition is another, as you discovered,
and there are some program types that cannot work without BTF
information at all. Also, things like bpftool being able to print out
the struct layout of map values is using BTF. So you're certainly right
that the BPF ecosystem in general is moving towards using BTF in more
and more places. And I guess you're also right that this leads to some
cryptic error messages sometimes... :)

CO-RE is more than only field offset relocations, btw, you can detect
type and field existence, get type size, use relocatable enums
(internal kernel enums can get renumbered, so this feature allows you to
accommodate that), etc.
Ah, neat, didn't know that (and I tend to lump all that together
mentally anyway).
Thanks for your reply, Toke. I don't think I added much value here :)
You're welcome, and thanks for confirming my understanding :)

-Toke


Re: __builtin_memcpy behavior

Andrii Nakryiko
 

On Tue, Feb 23, 2021 at 1:12 PM Toke Høiland-Jørgensen <toke@...> wrote:

"Tristan Mayfield" <mayfieldtristan@...> writes:

Toke, thanks for the quick response!

Yes, I was checking the bpf_probe_read return values, and was reading
the number of bytes expected, so nothing wrong there!
Right, in that case that's probably just because the struct in question
is next to some other valid memory (not sure where tracepoints keep
their data, but if it's on the stack, for instance, you'll have no
problem reading past it).

Now that you mention CO-RE, it does actually make sense that these
sorts of errors could be shifted to load time rather than attach time
(is that the right phrase?). I've fiddled with CO-RE a bit but I haven't
adopted it for a few reasons (which I could certainly be mistaken
about).
I'm by no means the leading authority on CO-RE, but I can give answering
a shot; hopefully someone will chime in to correct me if I'm wrong :)

I don't have control over kernel versions or compilation flags for the
kernel on the systems I'm targeting and I've had significant
difficulty trying to compile CO-RE programs (e.g. from the BCC repo's
libbpf-tools) on Linux <5.4 because I've had a hard time getting the
vmlinux. I can't remember if I used bpftool though (this was about a
year ago that I last played with CO-RE), so perhaps I'll give it
another shot.
Yeah, getting all your ducks in a row when compiling can be a bit of an
issue. However, I don't think you need anything special from the kernel
at compile-time if you just compile your own programs with a vmlinux.h
file you generated on a kernel that has been compiled with BTF.
As far as CO-RE BPF program compilation goes, there shouldn't be much
difference between the latest kernel vs some older one. In case of
libbpf-tools, some of the tools might be using some features that are
supported by newer kernels only, but that's a bit different.

BTW, vmlinux.h is a pure convenience, so that you don't have to use
system headers or define your own types with
__attribute__((preserve_access_index)). vmlinux.h is not a
requirement. For libbpf-tools, though, it's pre-packaged to make life
easier (and now we have per-architecture vmlinux.h to facilitate
building libbpf-tools for various target arches).

New enough Clang is a requirement, though. Clang 11+ is preferred, but
I believe Clang 10 should have enough features for a lot of CO-RE
functionality.


I've also been very unclear, and have gotten many different answers
regarding the target systems and whether they need to be custom
compiled with BTF enabled for CO-RE programs to run on them, or if you
can put a CO-RE program onto a generic kernel build and it "just
works?" From your answer, the answer seems to be that
/sys/kernel/btf/vmlinux needs to be on the target system, so it must
have that BTF_ENABLE flag set?
Well, you'll need the BTF information of the running kernel. It doesn't
*have* to come from /sys/kernel/btf/vmlinux, libbpf will look for it in
a few other locations as well:

https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583
Right. For older kernels that don't yet support
/sys/kernel/btf/vmlinux, it's possible to add .BTF data with pahole -J
after the kernel is built. It's also possible to provide just BTF data
separately using bpf_object_open_opts, if it's more convenient.
Certainly an advanced use case, but doable.

But, of course, having kernels built with BTF and exposing it from
/sys/kernel/btf/vmlinux is hands down the most convenient way, which
seems to be increasingly available in popular Linux distros. See
[0] for a list (I think ALT Linux is going to have BTF built-in as
well).

[0] https://github.com/libbpf/libbpf#bpf-co-re-compile-once--run-everywhere


Distros have gotten pretty good about enabling BTF in their kernel
builds, though, so it's getting increasingly feasible to rely on it. It
should certainly be available on RHEL8 (and thus CentOS 8).

If that's set, do you also need a vmlinux.h file as well? A coworker
was recently messing with CO-RE and seemed to think that deploying a
CO-RE program required shipping the vmlinux.h file and I think he
mentioned that file was about 1GB in size, which is certainly a no-go for
our position.
No, you don't need to ship the vmlinux.h file. That's just a regular
header file with an unusually large number of definitions in it, which will be
used at compile time. It can be useful to include a copy of it in your
source code repository, though, as mentioned above. That's what BCC
does, for instance:
https://github.com/iovisor/bcc/tree/master/libbpf-tools/x86

And no, it's not 1GB in size. Maybe that size was from before BTF
de-duplication got implemented? The one linked above is 2.7M.
Maybe if you build allyesconfig it can come closer to 1GB :) But as
Toke said, it's used during compilation only. After that you get a BPF
object file (that .o file), which contains all the necessary
relocation information internally and is very small. Then there is the BPF
skeleton, which can be used to avoid distributing those separate .o files
(and provides a bunch of other convenience features, of course), but
it's not a requirement either.


In addition to that, I've been unclear in the role of BTF in BPF
generally. When I began tinkering with BPF I was under the impression
that BTF was *only* something used for CO-RE programs (something I
actually might've gotten from the article referenced and written by
Andrii), but I've periodically seen errors arise that cite BTF reasons
for erroring.
BTF started out as "just" compact debug info for your BPF programs,
but it quickly grew into much more and is used for many BPF-related
features. CO-RE is one big area, but there are kernel BPF features
that rely on in-kernel BTF heavily as well.

One common cause for this has been when loading 'tc' programs with
iproute2, because the iproute2 loader doesn't understand BTF and will
complain about it. That is usually harmless, though, but I agree it's
quite annoying. Fortunately, iproute2 has recently gained support for
using libbpf for its BPF loading, so hopefully that particular error
should go away before too long.

Unfortunately I haven't saved any of these errors and
can't remember the causes specifically, but something like the
"updated" maps declarations, i.e.

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

which, I've learned, does use BTF?
Yes, the new-style map definitions use BTF. While BTF is ostensibly a
type format (i.e., something that describes C data types), Andrii
figured out that it is also possible to use it as a general purpose
key/value store. You do this by being a bit clever about how you
represent your data, which is what the __uint() macro in the above is
doing (it's encoding the integer value as the size of an array, which
becomes part of the type and thus embedded in the BTF). When loading,
libbpf will parse this data back out of the BTF data and use it when
creating the map. So you'll need BTF support in your compiler and in
libbpf to use this style of map definitions.
Right. Clang 10+ should be enough (but I'm too lazy to check), which
coincides with CO-RE requirements.


Am I misunderstanding what BTF is and the role it plays in BPF? Or
maybe has libbpf development moved so far toward CO-RE that non-CO-RE
development gets similar or the same error messages that just aren't
as clear for it?
Hmm, no, CO-RE is the specific feature that does relocations of struct
fields based on member names. This relies on BTF, but it's not the only
thing BTF is used for. The map definition is another, as you discovered,
and there are some program types that cannot work without BTF
information at all. Also, things like bpftool being able to print out
the struct layout of map values is using BTF. So you're certainly right
that the BPF ecosystem in general is moving towards using BTF in more
and more places. And I guess you're also right that this leads to some
cryptic error messages sometimes... :)

CO-RE is more than only field offset relocations, btw, you can detect
type and field existence, get type size, use relocatable enums
(internal kernel enums can get renumbered, so this feature allows you to
accommodate that), etc.
Thanks for your reply, Toke. I don't think I added much value here :)

-Toke






Re: __builtin_memcpy behavior

Toke Høiland-Jørgensen
 

"Tristan Mayfield" <mayfieldtristan@...> writes:

Toke, thanks for the quick response!

Yes, I was checking the bpf_probe_read return values, and was reading
the number of bytes expected, so nothing wrong there!
Right, in that case that's probably just because the struct in question
is next to some other valid memory (not sure where tracepoints keep
their data, but if it's on the stack, for instance, you'll have no
problem reading past it).

Now that you mention CO-RE, it does actually make sense that these
sorts of errors could be shifted to load time rather than attach time
(is that the right phrase?). I've fiddled with CO-RE a bit but I haven't
adopted it for a few reasons (which I could certainly be mistaken
about).
I'm by no means the leading authority on CO-RE, but I can give answering
a shot; hopefully someone will chime in to correct me if I'm wrong :)

I don't have control over kernel versions or compilation flags for the
kernel on the systems I'm targeting and I've had significant
difficulty trying to compile CO-RE programs (e.g. from the BCC repo's
libbpf-tools) on Linux <5.4 because I've had a hard time getting the
vmlinux. I can't remember if I used bpftool though (this was about a
year ago that I last played with CO-RE), so perhaps I'll give it
another shot.
Yeah, getting all your ducks in a row when compiling can be a bit of an
issue. However, I don't think you need anything special from the kernel
at compile-time if you just compile your own programs with a vmlinux.h
file you generated on a kernel that has been compiled with BTF.

I've also been very unclear, and have gotten many different answers
regarding the target systems and whether they need to be custom
compiled with BTF enabled for CO-RE programs to run on them, or if you
can put a CO-RE program onto a generic kernel build and it "just
works?" From your answer, the answer seems to be that
/sys/kernel/btf/vmlinux needs to be on the target system, so it must
have that BTF_ENABLE flag set?
Well, you'll need the BTF information of the running kernel. It doesn't
*have* to come from /sys/kernel/btf/vmlinux, libbpf will look for it in
a few other locations as well:

https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583
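The lookup amounts to "use the first candidate file that exists". A hedged sketch of that idea: `find_first_btf_file()` and `probe_kernel_btf()` are hypothetical helpers, and the two candidate paths in `probe_kernel_btf()` are illustrative only; the authoritative list and its order live in libbpf's src/btf.c linked above.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Returns the first path in `candidates` that exists as a regular file,
 * or NULL if none does. Finding a file is not the same as validating
 * it: a real loader must still parse the BTF inside. */
static const char *find_first_btf_file(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (stat(candidates[i], &st) == 0 && S_ISREG(st.st_mode))
            return candidates[i];
    }
    return NULL;
}

/* Probe an illustrative subset of locations for the running kernel.
 * `buf` holds the release-specific path, so it must outlive the result. */
static const char *probe_kernel_btf(char *buf, size_t len)
{
    struct utsname u;
    const char *cands[2];

    if (uname(&u) != 0)
        return NULL;
    cands[0] = "/sys/kernel/btf/vmlinux";
    snprintf(buf, len, "/boot/vmlinux-%s", u.release);
    cands[1] = buf;
    return find_first_btf_file(cands, 2);
}
```

On a kernel built with CONFIG_DEBUG_INFO_BTF=y, `probe_kernel_btf()` would normally return the /sys path; on older kernels the release-specific fallback is what matters.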

Distros have gotten pretty good about enabling BTF in their kernel
builds, though, so it's getting increasingly feasible to rely on it. It
should certainly be available on RHEL8 (and thus CentOS 8).

If that's set, do you also need a vmlinux.h file as well? A coworker
was recently messing with CO-RE and seemed to think that deploying a
CO-RE program required shipping the vmlinux.h file and I think he
mentioned that file was about 1GB in size, which is certainly a no-go for
our position.
No, you don't need to ship the vmlinux.h file. That's just a regular
header file with an unusually large number of definitions in it, which will be
used at compile time. It can be useful to include a copy of it in your
source code repository, though, as mentioned above. That's what BCC
does, for instance:
https://github.com/iovisor/bcc/tree/master/libbpf-tools/x86

And no, it's not 1GB in size. Maybe that size was from before BTF
de-duplication got implemented? The one linked above is 2.7M.

In addition to that, I've been unclear in the role of BTF in BPF
generally. When I began tinkering with BPF I was under the impression
that BTF was *only* something used for CO-RE programs (something I
actually might've gotten from the article referenced and written by
Andrii), but I've periodically seen errors arise that cite BTF reasons
for erroring.
One common cause for this has been when loading 'tc' programs with
iproute2, because the iproute2 loader doesn't understand BTF and will
complain about it. That is usually harmless, though, but I agree it's
quite annoying. Fortunately, iproute2 has recently gained support for
using libbpf for its BPF loading, so hopefully that particular error
should go away before too long.

Unfortunately I haven't saved any of these errors and
can't remember the causes specifically, but something like the
"updated" maps declarations, i.e.

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

which, I've learned, does use BTF?
Yes, the new-style map definitions use BTF. While BTF is ostensibly a
type format (i.e., something that describes C data types), Andrii
figured out that it is also possible to use it as a general purpose
key/value store. You do this by being a bit clever about how you
represent your data, which is what the __uint() macro in the above is
doing (it's encoding the integer value as the size of an array, which
becomes part of the type and thus embedded in the BTF). When loading,
libbpf will parse this data back out of the BTF data and use it when
creating the map. So you'll need BTF support in your compiler and in
libbpf to use this style of map definitions.
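The `__uint()` trick can be reproduced in plain C to see how an integer survives as part of a type. This sketch defines a macro with the same shape as the `__uint()` helper in libbpf's bpf_helpers.h, hard-codes 4 as the value of `BPF_MAP_TYPE_PERF_EVENT_ARRAY` (its position in the kernel's `enum bpf_map_type`) to avoid kernel headers, and drops `SEC(".maps")`. `DECODE_UINT()` is a hypothetical helper that reads the value back with `sizeof`, whereas libbpf reads the array length from BTF.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as libbpf's __uint(): the value becomes the element count
 * of an array type, so it lives in the type rather than in data. */
#define __uint(name, val) int (*name)[val]

/* Value of BPF_MAP_TYPE_PERF_EVENT_ARRAY in the uapi enum bpf_map_type,
 * hard-coded so this sketch needs no kernel headers. */
#define MAP_TYPE_PERF_EVENT_ARRAY 4

/* The map definition from the message above, minus SEC(".maps"). */
static struct {
    __uint(type, MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(uint32_t));
    __uint(value_size, sizeof(uint32_t));
} events;

/* Recover an encoded value from the pointed-to array's length. */
#define DECODE_UINT(field) (sizeof(*(field)) / sizeof(int))

static size_t events_map_type(void)   { return DECODE_UINT(events.type); }
static size_t events_key_size(void)   { return DECODE_UINT(events.key_size); }
static size_t events_value_size(void) { return DECODE_UINT(events.value_size); }
```

No map data is ever stored in `events`; every value is recoverable from its type alone, which is why the compiler's BTF output is enough for a loader to create the map.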

Am I misunderstanding what BTF is and the role it plays in BPF? Or
maybe has libbpf development moved so far toward CO-RE that non-CO-RE
development gets similar or the same error messages that just aren't
as clear for it?
Hmm, no, CO-RE is the specific feature that does relocations of struct
fields based on member names. This relies on BTF, but it's not the only
thing BTF is used for. The map definition is another, as you discovered,
and there are some program types that cannot work without BTF
information at all. Also, things like bpftool being able to print out
the struct layout of map values is using BTF. So you're certainly right
that the BPF ecosystem in general is moving towards using BTF in more
and more places. And I guess you're also right that this leads to some
cryptic error messages sometimes... :)

-Toke


Re: __builtin_memcpy behavior

Tristan Mayfield
 

Toke, thanks for the quick response!

Yes, I was checking the bpf_probe_read return values, and was reading the number of bytes expected, so nothing wrong there!

Now that you mention CO-RE, it does actually make sense that these sorts of errors could be shifted to load time rather than attach time (is that the right phrase?). I've fiddled with CO-RE a bit but I haven't adopted it for a few reasons (which I could certainly be mistaken about).
I don't have control over kernel versions or compilation flags for the kernel on the systems I'm targeting and I've had significant difficulty trying to compile CO-RE programs (e.g. from the BCC repo's libbpf-tools) on Linux <5.4 because I've had a hard time getting the vmlinux. I can't remember if I used bpftool though (this was about a year ago that I last played with CO-RE), so perhaps I'll give it another shot.
I've also been very unclear, and have gotten many different answers regarding the target systems and whether they need to be custom compiled with BTF enabled for CO-RE programs to run on them, or if you can put a CO-RE program onto a generic kernel build and it "just works?" From your answer, the answer seems to be that /sys/kernel/btf/vmlinux needs to be on the target system, so it must have that BTF_ENABLE flag set? If that's set, do you also need a vmlinux.h file as well? A coworker was recently messing with CO-RE and seemed to think that deploying a CO-RE program required shipping the vmlinux.h file and I think he mentioned that file was about 1GB in size, which is certainly a no-go for our position.

In addition to that, I've been unclear in the role of BTF in BPF generally. When I began tinkering with BPF I was under the impression that BTF was *only* something used for CO-RE programs (something I actually might've gotten from the article referenced and written by Andrii), but I've periodically seen errors arise that cite BTF reasons for erroring. Unfortunately I haven't saved any of these errors and can't remember the causes specifically, but something like the "updated" maps declarations, i.e.

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

which, I've learned, does use BTF? Am I misunderstanding what BTF is and the role it plays in BPF? Or maybe has libbpf development moved so far toward CO-RE that non-CO-RE development gets similar or the same error messages that just aren't as clear for it?

I have tons of other questions, like the relationship with BPF and perf's utilities, but I think I've probably asked enough for this message!


Re: __builtin_memcpy behavior

Toke Høiland-Jørgensen
 

"Tristan Mayfield" <mayfieldtristan@...> writes:

I'm not sure where the memory it was reading was and if that should be
defined behavior, but I thought I would send this here and see if this
is intended or if I have actually found something unexpected.
bpf_probe_read() will happily read any piece of kernel memory, it
doesn't respect kernel boundaries. So if the call succeeded (you did
check the return code, right?), that just means that the memory it was
reading contained *something*, not that it was actually what you were
*expecting* it to be.

Should __builtin_memcpy be used? Or should bpf_probe_read? If
bpf_probe_read is recommended, is there a way we can verify that we're
not reading garbage data in this context other than having a human
eyeball the data returned? Or is that just a necessary part of BPF
development in this context? Is this issue something that the verifier
can even check at load time? I can provide more information on the
program and/or bug if it's needed, thanks!
In this case, I think what you're after is the BPF CO-RE facility in
general. Have a look at Andrii's excellent introduction post here:
https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

But basically, what it means is that if you add a magic attribute
(preserve_access_index) to your variables, libbpf will notice that and
perform load-time relocations so you get the right offset on the kernel
you're running on. You can do this with your self-defined struct, but
with BTF you don't even have to do that: You can make bpftool spit out a
header file with all the structs defined by the current kernel and just
include that (the struct name in this case would be 'struct
trace_event_raw_tcp_event_sk_skb'). To do this, issue:

bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

and use that as an include file. bpftool will even add a nice pragma
statement at the top of the file so that all structs defined in it will
automatically get the preserve_access_index attribute, which means you
can basically use them like regular struct members (i.e., you can stick
with __builtin_memcpy() and regular assignments) and still get the magic
CO-RE relocations on load.

Note that all this requires that your libbpf and clang versions are
recent enough to understand all this, and libbpf will need the kernel
BTF information to be present at load time (check for the existence of
/sys/kernel/btf/vmlinux on the target system).
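As a quick pre-flight check on a target machine, a loader can test whether the kernel BTF file mentioned above is readable before it tries to load a CO-RE object. The sketch below is a minimal, hypothetical helper (`btf_file_readable` and `kernel_btf_available` are illustrative names, not part of libbpf) built only on POSIX `access()`. Depending on its version, libbpf itself may also look for vmlinux BTF in other locations, so treat this as a rough check rather than a replacement for libbpf's own logic.

```c
#include <stdbool.h>
#include <unistd.h>

/* Location of the running kernel's BTF, as given in the thread. */
#define VMLINUX_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Hypothetical helper: true if `path` exists and is readable by the
 * calling process (checked with POSIX access()). */
static bool btf_file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Hypothetical helper: pre-flight check for kernel BTF before loading a
 * CO-RE object. A false result means CO-RE relocations would need BTF
 * from some other source. */
static bool kernel_btf_available(void)
{
    return btf_file_readable(VMLINUX_BTF_PATH);
}
```

A loader could call `kernel_btf_available()` first and print a clear message, instead of surfacing a less obvious relocation failure later.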

-Toke


__builtin_memcpy behavior

Tristan Mayfield
 

The other day I was in the process of porting a little libbpf application from Ubuntu 20 (Linux 5.4) to CentOS 8 (Linux 4.18). This program uses tracepoint:tcp:tcp_send_reset. Here's the relevant BPF code:

struct tcp_send_rst_args {
    long long pad;
    const void * skbaddr;
    const void * skaddr;
    int state;
    u16 sport;
    u16 dport;
    u8 saddr[4];
    u8 daddr[4];
    u8 saddr_v6[16];
    u8 daddr_v6[16];
};

SEC("tracepoint/tcp/tcp_send_reset")
int tcp_send_reset_prog(struct tcp_send_rst_args * args) {

    struct tcprstsend_data_t data = {};
    data.pid = bpf_get_current_pid_tgid() >> 32;

    data.sport = args->sport;
    data.dport = args->dport;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));

    __builtin_memcpy(&data.saddr, args->saddr, sizeof(data.saddr));
    __builtin_memcpy(&data.daddr, args->daddr, sizeof(data.daddr));

    __builtin_memcpy(&data.saddr_v6, args->saddr_v6, sizeof(data.saddr_v6));
    __builtin_memcpy(&data.daddr_v6, args->daddr_v6, sizeof(data.daddr_v6));

    bpf_perf_event_output(args, &tcprstsend_events, BPF_F_CURRENT_CPU, &data, sizeof(data));
    return 0;
}

What I found was that this code compiles and can be loaded into the kernel, but fails when you attach it to the tracepoint.
It fails with a permission error stating that it can't be attached to the pfd. Here's the actual message:

libbpf: program 'tracepoint/tcp/tcp_send_reset': failed to attach to pfd 92: Permission denied
libbpf: program 'tracepoint/tcp/tcp_send_reset': failed to attach to tracepoint 'tcp/tcp_send_reset': Permission denied

I switched from __builtin_memcpy to bpf_probe_read to see if that would help, and it resolved the permission errors and allowed me to attach to the tracepoint, but I found that the data wasn't read correctly. The "state" member of the tcp_send_rst_args struct that I defined isn't included in CentOS 8/kernel 4.18, so all my reads were off by four bytes on CentOS. It works fine if I redefine that struct to:

struct tcp_send_rst_args {
    long long pad;
    const void * skbaddr;
    const void * skaddr;
#ifndef RHEL_RELEASE_CODE
    int state; // This needs to be removed for CentOS 8/Linux 4.18
#endif
    u16 sport;
    u16 dport;
    u8 saddr[4];
    u8 daddr[4];
    u8 saddr_v6[16];
    u8 daddr_v6[16];
};
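The off-by-four effect can be reproduced in ordinary userspace C with `offsetof`. This is an illustrative sketch only: the two struct definitions copy the field lists from the structs above (with `<stdint.h>` types standing in for `u16`/`u8`), and `sport_shift` and `read_u16_at` are hypothetical helpers, the latter mimicking a program compiled against one layout that reads a record which actually has the other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout with `state`, as in the original struct from the thread. */
struct rst_args_with_state {
    long long pad;
    const void *skbaddr;
    const void *skaddr;
    int state;
    uint16_t sport;
    uint16_t dport;
    uint8_t saddr[4];
    uint8_t daddr[4];
    uint8_t saddr_v6[16];
    uint8_t daddr_v6[16];
};

/* Same record without `state`, as reported for CentOS 8 / 4.18. */
struct rst_args_without_state {
    long long pad;
    const void *skbaddr;
    const void *skaddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t saddr[4];
    uint8_t daddr[4];
    uint8_t saddr_v6[16];
    uint8_t daddr_v6[16];
};

/* How far `sport` (and every later field) moves when `state` exists. */
static size_t sport_shift(void)
{
    return offsetof(struct rst_args_with_state, sport) -
           offsetof(struct rst_args_without_state, sport);
}

/* Read a u16 at a fixed byte offset, the way code compiled against a
 * particular layout would. */
static uint16_t read_u16_at(const void *rec, size_t off)
{
    uint16_t v;
    memcpy(&v, (const unsigned char *)rec + off, sizeof(v));
    return v;
}
```

With `state` present, `sport` sits four bytes later, so code built for that layout ends up reading the start of `saddr` when it is handed the 4.18 record. The tracepoint's format file (e.g. /sys/kernel/debug/tracing/events/tcp/tcp_send_reset/format) remains the authoritative description of the layout on a given kernel.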

Now I'm a bit confused because __builtin_memcpy seemed to fail at attach time rather than load time. However, it did actually fail (albeit with error messages that ended up being really hard to debug; I will never NOT check the tracepoint format file again, though). bpf_probe_read just happily read past the struct.
I'm not sure where the memory it was reading was and if that should be defined behavior, but I thought I would send this here and see if this is intended or if I have actually found something unexpected. Should __builtin_memcpy be used? Or should bpf_probe_read? If bpf_probe_read is recommended, is there a way we can verify that we're not reading garbage data in this context other than having a human eyeball the data returned? Or is that just a necessary part of BPF development in this context? Is this issue something that the verifier can even check at load time? I can provide more information on the program and/or bug if it's needed, thanks!


Re: android adeb KASAN_SHADOW_SCALE_SHIFT

Yonghong Song
 

Unfortunately, the value is defined in the Makefile and passed to the compiler with -D, rather than in a header:

```
ifeq ($(CONFIG_KASAN_SW_TAGS), y)
KASAN_SHADOW_SCALE_SHIFT := 4
else ifeq ($(CONFIG_KASAN_GENERIC), y)
KASAN_SHADOW_SCALE_SHIFT := 3
endif

KBUILD_CFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT)
KBUILD_CPPFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT)
KBUILD_AFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT)
```

We could add something like the following to helpers.h:
```
#if defined(__aarch64__)
#if defined(CONFIG_KASAN_SW_TAGS)
#define KASAN_SHADOW_SCALE_SHIFT 4
#elif defined(CONFIG_KASAN_GENERIC)
#define KASAN_SHADOW_SCALE_SHIFT 3
#endif
#endif
```

You can also add the above code to the tool itself.

On Wed, Feb 10, 2021 at 10:18 AM katrina lulz
<anotherworkqueue@...> wrote:

Hi *,
I managed to set up adeb on a Pixel 4 with a custom kernel compiled as suggested by adeb's README.
The setup works fine for some BCC tools, such as vfsstat, but a few, such as opensnoop and the trace command, return the following error:

In file included from ./arch/arm64/include/asm/thread_info.h:13:
./arch/arm64/include/asm/memory.h:136:24: error: use of undeclared identifier 'KASAN_SHADOW_SCALE_SHIFT'
return kimage_vaddr - KIMAGE_VADDR;

I verified via config.gz on the device that IKHEADERS and the other BPF-related configs are correctly enabled.
Any ideas on how to fix the above error?

thanks,
best.


Re: BPF perf event: runq length

Yonghong Song
 

On Mon, Feb 15, 2021 at 3:45 AM Raga lahari <ragalahari.potti@...> wrote:

Hi,


I am trying to write a BPF perf event program to get CPU runq length. The following is the code snippet. I am observing a big integer (len is 2839296536) as the queue length in the trace output for some instances.


Can someone please let me know whether this approach helps to get the length?
Take a look at the bcc tool runqlen.py. Did you get an abnormal len with runqlen.py?



BPF perf event: runq length

Raga lahari
 

Hi,


I am trying to write a BPF perf event program to get CPU runq length. The following is the code snippet. I am observing a big integer (len is 2839296536) as the queue length in the trace output for some instances.


Can someone please let me know whether this approach helps to get the length?


struct cfs_rq_partial {
    struct load_weight load;
    unsigned long runnable_weight;
    unsigned int nr_running;
    unsigned int h_nr_running;
};

#define _(P) ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;})

SEC("perf_event")
int do_sample(struct bpf_perf_event_data *ctx)
{
        struct cfs_rq_partial *my_q = NULL;
        struct task_struct *task = NULL;
        unsigned int len;

        task = (struct task_struct *)bpf_get_current_task();
        my_q = _(task->se.cfs_rq);
        len = _(my_q->nr_running);
        bpf_printk("len is %u", len);

      …..
}


I have tested with another program and confirmed that cfs_rq has the runnable_weight field.




Regards,
Ragalahari


android adeb KASAN_SHADOW_SCALE_SHIFT

katrina lulz
 

Hi *,
I managed to set up adeb on a Pixel 4 with a custom kernel compiled as suggested by adeb's README.
The setup works fine for some BCC tools, such as vfsstat, but a few, such as opensnoop and the trace command, return the following error:

In file included from ./arch/arm64/include/asm/thread_info.h:13:
./arch/arm64/include/asm/memory.h:136:24: error: use of undeclared identifier 'KASAN_SHADOW_SCALE_SHIFT'
        return kimage_vaddr - KIMAGE_VADDR;

I verified via config.gz on the device that IKHEADERS and the other BPF-related configs are correctly enabled.
Any ideas on how to fix the above error?

thanks,
best.


get function latency using ebpf-uprobe when using coroutine

Forrest Chen
 

BCC has funclatency.py, which supports measuring function latency for user programs, using pid_tgid as the key.
But when the program is written in Go, which uses coroutines (goroutines), it doesn't work anymore.

Is there another way to handle this situation?

Thanks!


Re: Weird behaviour when updating a hash map from userspace

Yonghong Song
 

On Fri, Jan 15, 2021 at 12:42 PM William Findlay
<williamfindlay@...> wrote:

Hi all.

Currently debugging a very strange behaviour with eBPF hash maps and was wondering if anyone else has run into a similar issue? I am using libbpf-rs with BPF CO-RE and my kernel version is 5.9.14.

My setup: I have a map with some compound key and I am updating it once from userspace using libbpf and once (later) from a BPF program, using the same key both times, but with different values.

Here's the weird part: Somehow both key,value pairs are being stored in the map, according to output from bpftool. Even more bizarre, the value provided from userspace is essentially a "ghost value" the entire time -- all map lookups fail until the map has been updated from a BPF program as described above.

To be clear, the weirdness is two-fold:

1. Lookup should not fail after updating the map the first time; and
2. The second value should be overwriting the first one.

After performing both updates, here is the output from bpftool showcasing the weird behaviour:

[{
    "key": {
        "id": 3069983010007500772,
        "device": 48
    },
    "value": 10
},{
    "key": {
        "id": 3069983010007500772,
        "device": 48
    },
    "value": 40
}]

Does your key data structure have padding? Different padding values
will cause different actual keys.
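Padding matters because BPF hash maps hash and compare keys as raw bytes (all `key_size` of them), while bpftool's BTF-based dump prints only the named fields, so two keys that differ only in padding would look identical in output like the above. Below is a userspace sketch, assuming a key with a 64-bit `id` and a 32-bit `device` (the types are a guess from the dump; `fill_key` and `keys_equal_bytewise` are hypothetical helpers). A common remedy is to zero the whole key (`memset` in userspace, `__builtin_memset` in BPF C) on both sides before filling in the fields, or to declare the padding explicitly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: names follow the bpftool dump, types are assumed.
 * On typical 64-bit targets the compiler appends 4 padding bytes after
 * `device` so that sizeof(struct map_key) is a multiple of 8. */
struct map_key {
    uint64_t id;
    uint32_t device;
};

/* Same key with the padding made explicit; every writer must set _pad. */
struct map_key_explicit {
    uint64_t id;
    uint32_t device;
    uint32_t _pad;
};

/* True if struct map_key contains bytes not covered by a named field. */
static bool map_key_has_padding(void)
{
    return sizeof(struct map_key) !=
           offsetof(struct map_key, device) + sizeof(uint32_t);
}

/* Build a key whose padding holds `junk` (standing in for stale stack
 * bytes). memcpy is used for the fields so the junk is left in place. */
static void fill_key(struct map_key *k, uint64_t id, uint32_t device,
                     unsigned char junk)
{
    memset(k, junk, sizeof(*k));
    memcpy((unsigned char *)k + offsetof(struct map_key, id), &id, sizeof(id));
    memcpy((unsigned char *)k + offsetof(struct map_key, device), &device,
           sizeof(device));
}

/* Compare the whole key byte by byte, as a hash map lookup does. */
static bool keys_equal_bytewise(const struct map_key *a,
                                const struct map_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

On a target where `map_key_has_padding()` is true, two keys built with the same `id` and `device` but different junk compare unequal, which would produce two map entries; building both with zeroed padding makes them equal again.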

If padding is not an issue in your case, could you construct a test
case (best if no rust involved) so we can take a deep look?
You can file a task to document this issue if you intend to send a test case.
Thanks!




Weird behaviour when updating a hash map from userspace

williamfindlay@...
 

Hi all.

Currently debugging a very strange behaviour with eBPF hash maps and was wondering if anyone else has run into a similar issue? I am using libbpf-rs with BPF CO-RE and my kernel version is 5.9.14.

My setup: I have a map with some compound key and I am updating it once from userspace using libbpf and once (later) from a BPF program, using the same key both times, but with different values.

Here's the weird part: Somehow both key,value pairs are being stored in the map, according to output from bpftool. Even more bizarre, the value provided from userspace is essentially a "ghost value" the entire time -- all map lookups fail until the map has been updated from a BPF program as described above.

To be clear, the weirdness is two-fold:

  1. Lookup should not fail after updating the map the first time; and
  2. The second value should be overwriting the first one.

After performing both updates, here is the output from bpftool showcasing the weird behaviour:

[{
    "key": {
        "id": 3069983010007500772,
        "device": 48
    },
    "value": 10
},{
    "key": {
        "id": 3069983010007500772,
        "device": 48
    },
    "value": 40
}]

This behaviour also seems to be inconsistent between different maps and yet consistent between different runs. For some maps, I get the expected result and for others I get this weirdness instead.

Is this possibly a bug in the kernel? Any assistance would be greatly appreciated.

Regards,
William
