Re: Which is oldest linux kernel version that can support BTF?
#bcc
Toke Høiland-Jørgensen
"Daniel Xu" <dxu@...> writes:
> On Sun, Feb 28, 2021, at 12:07 PM, bg.salunke09@... wrote:
>> Hi,
> What distro are you using? Your distro probably backported BTF [...]

Yeah, that's a RHEL version number (RHEL 8.2 in this case, as seen by the "el8_2" bit). Which means that as far as features are concerned, the 4.18 version number is basically a complete fiction at this point. For BPF we basically backport everything; IIRC we made it up to upstream kernel 5.4 for RHEL 8.2...

-Toke |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
Daniel Xu
On Sun, Feb 28, 2021, at 12:07 PM, bg.salunke09@... wrote:
> Hi,

What distro are you using? Your distro probably backported BTF support.

Daniel |
Questions about runqlen
Abel Wu
Hi, when I looked into the runqlen script yesterday, I found that, sadly, I had misunderstood the "queue length" all along: not only the "length" part but also the "queue" part.

Queue
=====
Only CFS runqueues are taken into account. This makes sense when the main workloads all run under the CFS scheduler, which is common in cloud scenarios. But what I don't quite follow is that the selected queue is task->se.cfs_rq, which is the task's view, rather than the top-level cfs_rq, which is the CPU's view. I suppose the task view is not enough to draw the whole picture of saturation?

Length
======
Within this scope, length means the number of schedulable entities, that is cfs_rq->nr_running. From a time-sharing point of view that is OK, because it represents how many units are involved in the scheduling of this cfs_rq. But what about the execution point of view, in which the number of tasks (cfs_rq->h_nr_running) would be used?

Besides the above, without the shares information of each entity, how could runqlen help us optimize performance? Maybe we should always focus on occupancy rather than length?

It would be very much appreciated if someone could shed some light.

Thanks & Best regards,
Abel |
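To make the distinction in the question above concrete, here is a toy model in plain C. It is only a sketch: `struct toy_cfs_rq` and the `toy_*` functions are made up for illustration and are not the kernel's data structures. It assumes, as the message describes, that nr_running counts the entities queued directly on one runqueue (tasks plus non-empty group entities), while h_nr_running counts all tasks in the hierarchy below it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a CFS runqueue; NOT the kernel's struct cfs_rq. */
struct toy_cfs_rq {
    const struct toy_cfs_rq *const *groups; /* child group runqueues */
    size_t n_groups;                        /* number of child groups */
    unsigned int n_tasks;                   /* tasks queued directly here */
};

/* Models h_nr_running: every task in this runqueue's hierarchy. */
unsigned int toy_h_nr_running(const struct toy_cfs_rq *rq)
{
    assert(rq != NULL);
    unsigned int n = rq->n_tasks;
    for (size_t i = 0; i < rq->n_groups; i++)
        n += toy_h_nr_running(rq->groups[i]);
    return n;
}

/* Models nr_running: entities queued directly on this runqueue,
 * i.e. its own tasks plus one entity per non-empty child group. */
unsigned int toy_nr_running(const struct toy_cfs_rq *rq)
{
    assert(rq != NULL);
    unsigned int n = rq->n_tasks;
    for (size_t i = 0; i < rq->n_groups; i++)
        if (toy_h_nr_running(rq->groups[i]) > 0)
            n++;
    return n;
}

/* Example hierarchy: the root has 1 task and two groups with 3 and 4 tasks. */
const struct toy_cfs_rq toy_group_a = { NULL, 0, 3 };
const struct toy_cfs_rq toy_group_b = { NULL, 0, 4 };
const struct toy_cfs_rq *const toy_root_groups[] = { &toy_group_a, &toy_group_b };
const struct toy_cfs_rq toy_root = { toy_root_groups, 2, 1 };
```

In this model a task inside group A sees a queue of length 3 (its own group's runqueue), the CPU's top-level runqueue has 3 entities, and 8 tasks are runnable on the CPU in total, which is the gap between the task view and the CPU view that the question is about.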
Re: Which is oldest linux kernel version that can support BTF?
#bcc
bg.salunke09@...
On Tue, Mar 2, 2021 at 08:22 PM, Andrii Nakryiko wrote:
> On Tue, Mar 2, 2021 at 4:42 PM <bg.salunke09@...> wrote:
>> Btw, Is there any document to generate BTF information for a linux kernel? Or Is there a way to generate BTF info for running kernel i.e. at runtime and not at compile time? Thanks!
> Yes, you can, if you have vmlinux image with DWARF information in it.

Thank you for confirming! Got it. I'm following the discussion thread and patch. Thank you so much for your time. |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
Andrii Nakryiko
On Tue, Mar 2, 2021 at 4:42 PM <bg.salunke09@...> wrote:
Yes, correct.

Yes, you can, if you have vmlinux image with DWARF information in it. You can use pahole tool like this to add .BTF section to vmlinux image:

    pahole -J <path-to-vmlinux-image>

You most probably would want to make a local copy of vmlinux image, of course. After that you can pass the path to that vmlinux with embedded .BTF to libbpf to use for CO-RE relocations.

See [0] for recent discussion of the exact same topic. See also patch [1] that was aiming to make this scenario better in libbpf (unfortunately it hasn't landed yet, but it is pretty close to being done, so shouldn't be a problem for you to pick up, if necessary).

This is certainly not the most straightforward and easiest path, but if you want to get CO-RE working with older kernel for which you don't have much control, it is definitely a possible way (as long as you have DWARF, which is used to produce BTF for vmlinux).

[0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@mail.gmail.com/
[1] https://lore.kernel.org/bpf/B8801F77-37E8-4EF8-8994-D366D48169A3@araalinetworks.com/ |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
bg.salunke09@...
Thanks Andrii, for detailed answer.
Yes, you are right, I'm looking for CO-RE. Basically I'm trying to build an eBPF program which can run on any Linux kernel version using libbpf.

What I understood from your blog https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html (thanks for the in-depth blog post, appreciate it) is that for a libbpf-based program to work, the BTF information should be available on the running host. Is my understanding correct?

Btw, is there any document on how to generate BTF information for a Linux kernel? Or is there a way to generate BTF info for the running kernel, i.e. at runtime and not at compile time?

Thanks! |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
Toke Høiland-Jørgensen
"Andrii Nakryiko" <andrii.nakryiko@...> writes:
> On Sun, Feb 28, 2021 at 12:37 PM <bg.salunke09@...> wrote:
> /sys/kernel/btf/vmlinux appeared in 5.4 kernel (upstream version). If [...]

Yeah, that looks like a RHEL/CentOS kernel version number, which means the 4.18 bit is mostly fiction at this point (at least as far as BPF is concerned). IIRC we backported up to upstream kernel 5.4 for RHEL 8.2, which seems to be what you're running (from the el8_2 bit of the version), and I guess that fits with the availability of /sys/kernel/btf/vmlinux.

-Toke |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
Andrii Nakryiko
On Sun, Feb 28, 2021 at 12:37 PM <bg.salunke09@...> wrote:
/sys/kernel/btf/vmlinux appeared in 5.4 kernel (upstream version). If you see it on 4.18, that means someone backported the changes.

But for BPF CO-RE (which I assume is what you are referring to) to work, kernel itself doesn't need to "support BTF", it just needs to have .BTF data built-in inside its vmlinux binary image, and that image needs to be in one of the supported locations (see [0]). Starting from 5.2 kernel, CONFIG_DEBUG_INFO_BTF=y is supported, which adds .BTF section as part of the kernel build process. But one could technically add .BTF by using pahole tool (part of dwarves package) even before that, as long as vmlinux image contains DWARF information.

So in short, the easiest way is to get the latest kernel you can. But with enough persistence and effort you can get kernel BTF embedded for pretty much any kernel version.

[0] https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4589-L4598 |
Re: Which is oldest linux kernel version that can support BTF?
#bcc
bg.salunke09@... asked:
> Can I get information about oldest linux kernel version that can support BTF?

The basic support appears to have been added by

    commit e83b9f55448afce3fe1abcd1d10db9584f8042a6
    Author: Andrii Nakryiko <andriin@...>
    Date: Tue Apr 2 09:49:50 2019 -0700

    kbuild: add ability to generate BTF type info for vmlinux

The inquiry "git branch --contains e83b9f55448a" will tell you which of your branches contains this commit.

Hope this helps,
Alison Chaiken
Aurora Innovation |
Which is oldest linux kernel version that can support BTF?
#bcc
Hi,
I'm looking into BTF and its use cases. Based on the documentation, I understood that to run BPF programs across different kernel versions, they need to be built with libbpf, which depends on the BTF information. Now, to have BTF information on any kernel, the kernel needs to be rebuilt with the "" flag.

I can see that BTF support in Linux was introduced in kernel version 5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf); however, I can still see the BTF information (/sys/kernel/btf/vmlinux) on my 4.18.0-193.28.1.el8_2.x86_64 kernel. I'm a little confused about how an old kernel can generate BTF info if the support was added only recently.

Can I get information about the oldest Linux kernel version that can support BTF? |
Re: BCC Support for BPF Subprograms with Tail Calls (Kernel 5.10 Feature)
Yonghong Song
On Wed, Feb 24, 2021 at 12:24 PM <jwkova@...> wrote:
You can use bpf tail calls today. You can look at bcc/tests/cc/test_prog_table.cc for an example. bcc does not support subprogram yet. In the future we do plan to be more libbpf compatible so we can use those features. BTW, the stack limit is 512 bytes not 8KB.
|
Re: __builtin_memcpy behavior
Toke Høiland-Jørgensen
"Tristan Mayfield" <mayfieldtristan@...> writes:
> Thank you to both Andrii and Toke! It's been extremely helpful to read [...]

OK, I'll try to explain this one:

Think of __builtin_memcpy() as a macro: it just compiles down to regular program instructions copying the memory (i.e., these two are roughly equivalent, modulo any optimisations the compiler might make):

    x = y;
    __builtin_memcpy(&x, &y, sizeof(x));

The verifier will check the resulting memory access instructions, to make sure you're not reading or writing out of bounds for whatever variable you're reading from / writing to. E.g., if you're reading from a context pointer, the verifier will know the size of the context object and make sure you only dereference up to the memory address ctx + sizeof(*ctx).

> CO-RE can guarantee valid memory reads because of the nature of being [...]

Yes, that's basically what it boils down to. It works like this:

What CO-RE does (for structs) is add some more information to the compiled binary so that you can reference struct members by name instead of memory offset. So, normally if you write:

    x = y->z;

that will compile to a load from 'y + offsetof(typeof(*y), z)', with the offset being computed at compile time. When you add the preserve_access_index attribute, clang will record a relocation that says you wanted the member named 'z' (and its type) by way of the BTF information. libbpf will read that at load time, and compute a new 'offsetof(typeof(*y), z)' for the struct member as it exists in the running kernel, so that if the layout has changed, you'll still get the right offset. The load instruction in the byte code is then rewritten with this new offset.

This means that by the time the bytecode is loaded into the kernel, it has already been rewritten, so the kernel bounds check is still the same: it'll just check that the memory you read is inside the size of the structure. But because the offsets have been fixed up, the end result is that you won't get out-of-bounds errors; i.e., you might say that passing the bounds check is an implicit effect of the CO-RE rewriting.

-Toke |
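The load-time rewrite described in the message above can be imitated in ordinary user-space C. The following is a toy sketch only: `struct y_compiled`, `struct y_running`, `toy_load`, `toy_reloc` and all `toy_*` functions are hypothetical names invented for illustration, and the name-to-offset lookup stands in for what libbpf does with the running kernel's BTF. Nothing here is libbpf's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout the program was compiled against (hypothetical). */
struct y_compiled { int a; int z; };

/* Layout on the "running kernel" (hypothetical): a field was added before z. */
struct y_running { int a; long added; int z; };

/* Toy "load instruction": read an int at base + off. */
struct toy_load { size_t off; };

/* Toy relocation record: the member name the load was meant to access. */
struct toy_reloc { const char *member; };

/* Stand-in for a BTF lookup: member name -> offset in the running layout. */
int toy_target_offset(const char *member, size_t *off)
{
    if (strcmp(member, "a") == 0) { *off = offsetof(struct y_running, a); return 0; }
    if (strcmp(member, "z") == 0) { *off = offsetof(struct y_running, z); return 0; }
    return -1; /* member does not exist in the running layout */
}

/* "Load time": patch the instruction's offset using the relocation record. */
int toy_relocate(struct toy_load *insn, const struct toy_reloc *rel)
{
    size_t off;
    if (toy_target_offset(rel->member, &off) != 0)
        return -1;
    insn->off = off;
    return 0;
}

/* "Run time": execute the (possibly patched) load. */
int toy_exec_load(const struct toy_load *insn, const void *base)
{
    int v;
    assert(base != NULL);
    memcpy(&v, (const char *)base + insn->off, sizeof(v));
    return v;
}

/* Compile against the old layout, relocate, then read from the new one. */
int toy_demo(void)
{
    struct y_running y = { .a = 1, .added = 2, .z = 42 };
    struct toy_load insn = { offsetof(struct y_compiled, z) }; /* stale offset */
    struct toy_reloc rel = { "z" };

    if (toy_relocate(&insn, &rel) != 0)
        return -1;
    return toy_exec_load(&insn, &y);
}
```

Without the `toy_relocate()` step the load would use the stale compile-time offset and read the wrong bytes of the structure, which is the "garbage data with no errors" case discussed earlier in the thread.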
BCC Support for BPF Subprograms with Tail Calls (Kernel 5.10 Feature)
jwkova@...
Hello,
I was wondering if BCC implements the new BPF feature (as of kernel 5.10) to allow BPF programs to utilize both BPF tail calls and BPF subprograms. This behavior is described near the end of this section of the BPF reference guide. I am interested in this functionality to extend a BPF program in order to reach the limit of 8KB of stack space. Thanks, Jake |
Re: __builtin_memcpy behavior
Andrii Nakryiko
> On Wed, Feb 24, 2021 at 11:28 AM Tristan Mayfield <mayfieldtristan@...> wrote:

vmlinux usually refers to kernel image binary. /sys/kernel/btf/vmlinux is not that, it's only the BTF data. So CO-RE needs kernel BTF, not necessarily vmlinux kernel image. Just a clarification.

But vmlinux image (ELF file) itself has .BTF section, which has the same data exposed in /sys/kernel/btf/vmlinux, so libbpf will try to fetch that data, if /sys/kernel/btf/vmlinux is not present. That is necessary for some older kernel versions, as well as if you "embed" BTF information manually with `pahole -J`.

Yes, `pahole -J <path-to-kernel-image-vmlinux-binary>`. Pahole is able to produce BTF from DWARF type information, contained in your vmlinux kernel image (if you compile it with DWARF, of course). That's what is happening in newer kernels when you specify CONFIG_DEBUG_INFO_BTF=y (plus some extra linker steps to make that section "loadable": available to kernel itself in runtime, but that's not necessary for CO-RE itself).

I think you got everything right. BTW, feel free to check my more recent blog post ([0]), it might help a bit more.

[0] https://nakryiko.com/posts/libbpf-bootstrap/

-- Andrii
|
Re: __builtin_memcpy behavior
Tristan Mayfield
Thank you to both Andrii and Toke! It's been extremely helpful to read your responses. Having conversations like these really helps me when I go into the source code and try to understand the overall intent of it. I'm going to try and summarize the conversation to confirm my understanding.
bpf_probe_read() will read any valid kernel memory (nothing new here). If the memory is already available to be read in the program (e.g. in tracepoint args), then __builtin_memcpy can be used and will potentially throw attach-time errors if reading structs incorrectly (for some reason I don't think we clarified).

CO-RE can guarantee valid memory reads because of the nature of being able to check offsets and relocations at load time rather than attach time or just returning garbage data with no errors.

To build CO-RE programs you need a vmlinux file (not to be confused with the header, vmlinux.h) which is normally found at /sys/kernel/btf/vmlinux on systems that have been compiled with pahole and CONFIG_DEBUG_INFO_BTF=y. Having the vmlinux.h file is helpful because it replaces kernel headers and makes building a bit nicer, but isn't necessary.

Once compiled, CO-RE programs should be able to run on any system that has a vmlinux file in one of the locations listed here: https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583. For earlier kernels, it's possible to generate a vmlinux file (and this is one of the spots I'm a bit murky on) with pahole -J, but I'm not sure what you are supposed to target when running that? Just the compiled kernel binary? Something else?

BTF is just a type format that can describe C data-types. Almost like a meta-language? I've personally not looked at the source for BTF yet, but it seems to be versatile enough that it's useful for CO-RE for describing internal data structures from the kernel, but it's also useful for a variety of other things (like map declarations) and will likely be increasingly relied on in future iterations of BPF, both CO-RE and otherwise.

BTF support mainly comes from the compiler (which I do believe clang 10+ works, just from my experience. I'm primarily using clang 10 right now) and libbpf supporting it, not necessarily the kernel (except for CO-RE with the vmlinux).

Again, appreciate the responses. I've been building with BPF/libbpf about a year now and still feel like I've only scratched the surface. Reading source code is great, but sometimes it just really helps to get high-level ideas as well!

-Tristan |
Re: __builtin_memcpy behavior
Toke Høiland-Jørgensen
>>> Am I misunderstanding what BTF is and the role it plays in BPF? Or [...]
>> Hmm, no, CO-RE is the specific feature that does relocations of struct [...] thing BTF is used for. The map definition is another, as you discovered,
> CO-RE is more than only field offset relocations, btw, you can detect [...]

Ah, neat, didn't know that (and I tend to lump all that together mentally anyway).

> Thanks for your reply, Toke. I don't think I added much value here :)

You're welcome, and thanks for confirming my understanding :)

-Toke |
Re: __builtin_memcpy behavior
Andrii Nakryiko
On Tue, Feb 23, 2021 at 1:12 PM Toke Høiland-Jørgensen <toke@...> wrote:
As far as CO-RE BPF program compilation goes, there shouldn't be much difference between the latest kernel vs some older one. In case of libbpf-tools, some of the tools might be using some features that are supported by newer kernels only, but that's a bit different.

BTW, vmlinux.h is a pure convenience, so that you don't have to use system headers or define your own types with __attribute__((preserve_access_index)). vmlinux.h is not a requirement. For libbpf-tools, though, it's pre-packaged to make life easier (and now we have per-architecture vmlinux.h to facilitate building libbpf-tools for various target arches).

New enough Clang is a requirement, though. Clang 11+ is preferred, but I believe Clang 10 should have enough features for a lot of CO-RE functionality.

>> I've also been very unclear, and have gotten many different answers [...]
> Well, you'll need the BTF information of the running kernel. It doesn't [...]

Right. For older kernels that don't yet support /sys/kernel/btf/vmlinux, it's possible to add .BTF data with pahole -J after the kernel is built. It's also possible to provide just BTF data separately using bpf_object_open_opts, if it's more convenient. Certainly an advanced use case, but doable.

But, of course, having kernels built with BTF and exposing it from /sys/kernel/btf/vmlinux is hands down the most convenient way, which seems to become more and more an option for popular Linux distros. See [0] for a list (I think ALT Linux is going to have BTF built-in as well).

[0] https://github.com/libbpf/libbpf#bpf-co-re-compile-once--run-everywhere

Maybe if you build allyesconfig it can come closer to 1GB :) But as Toke said, it's used during compilation only. After that you get BPF object file (that .o file), which contains all the necessary relocation information internally and is very small. Then there is BPF skeleton, which can be used to avoid distributing those separate .o (and provides a bunch of other convenience features, of course), but it's not a requirement either.

>> In addition to that, I've been unclear in the role of BTF in BPF [...]

BTF started out as "just" compact debug info for your BPF programs, but it quickly grew into much more and is used for many BPF-related features. CO-RE is one big area, but there are kernel BPF features that rely on in-kernel BTF heavily as well.

> One common cause for this has been when loading 'tc' programs with [...]

Right. Clang 10+ should be enough (but I'm too lazy to check), which coincides with CO-RE requirements.

>> Am I misunderstanding what BTF is and the role it plays in BPF? Or [...]
> Hmm, no, CO-RE is the specific feature that does relocations of struct [...] thing BTF is used for. The map definition is another, as you discovered,

CO-RE is more than only field offset relocations, btw, you can detect type and field existence, get type size, use relocatable enums (internal kernel enums can get renumbered, so this feature allows to accommodate that), etc.

> -Toke

Thanks for your reply, Toke. I don't think I added much value here :) |
Re: __builtin_memcpy behavior
Toke Høiland-Jørgensen
"Tristan Mayfield" <mayfieldtristan@...> writes:
> Toke, thanks for the quick response! [...]

Right, in that case that's probably just because the struct in question is next to some other valid memory (not sure where tracepoints keep their data, but if it's on the stack, for instance, you'll have no problem reading past it).

> Now that you mention CO-RE, it does actually make sense that these [...]

I'm by no means the leading authority on CO-RE, but I can give answering a shot; hopefully someone will chime in to correct me if I'm wrong :)

> I don't have control over kernel versions or compilation flags for the [...]

Yeah, getting all your ducks in a row when compiling can be a bit of an issue. However, I don't think you need anything special from the kernel at compile-time if you just compile your own programs with a vmlinux.h file you generated on a kernel that has been compiled with BTF.

> I've also been very unclear, and have gotten many different answers [...]

Well, you'll need the BTF information of the running kernel. It doesn't *have* to come from /sys/kernel/btf/vmlinux, libbpf will look for it in a few other locations as well:

https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4583

Distros have gotten pretty good about enabling BTF in their kernel builds, though, so it's getting increasingly feasible to rely on it. It should certainly be available on RHEL8 (and thus CentOS 8).

> If that's set, do you also need a vmlinux.h file as well? A coworker [...]

No, you don't need to ship the vmlinux.h file. That's just a regular header file with an unusual amount of definitions in it, that will be used at compile time. It can be useful to include a copy of it in your source code repository, though, as mentioned above. That's what BCC does, for instance:

https://github.com/iovisor/bcc/tree/master/libbpf-tools/x86

And no, it's not 1GB in size. Maybe that size was from before BTF de-duplication got implemented? The one linked above is 2.7M.

> In addition to that, I've been unclear in the role of BTF in BPF [...]

One common cause for this has been when loading 'tc' programs with iproute2, because the iproute2 loader doesn't understand BTF and will complain about it. That is usually harmless, though, but I agree it's quite annoying. Fortunately, iproute2 has recently gained support for using libbpf for its BPF loading, so hopefully that particular error should go away before too long.

> Unfortunately I haven't saved any of these errors and [...]

Yes, the new-style map definitions use BTF. While BTF is ostensibly a type format (i.e., something that describes C data types), Andrii figured out that it is also possible to use it as a general purpose key/value store. You do this by being a bit clever about how you represent your data, which is what the __uint() macro in the above is doing (it's encoding the integer value as the size of an array, which becomes part of the type and thus embedded in the BTF). When loading, libbpf will parse this data back out of the BTF data and use it when creating the map. So you'll need BTF support in your compiler and in libbpf to use this style of map definitions.

> Am I misunderstanding what BTF is and the role it plays in BPF? Or [...]

Hmm, no, CO-RE is the specific feature that does relocations of struct fields based on member names. This relies on BTF, but it's not the only thing BTF is used for. The map definition is another, as you discovered, and there are some program types that cannot work without BTF information at all. Also, things like bpftool being able to print out the struct layout of map values is using BTF.

So you're certainly right that the BPF ecosystem in general is moving towards using BTF in more and more places. And I guess you're also right that this leads to some cryptic error messages sometimes... :)

-Toke |
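The "integer encoded as the size of an array" trick mentioned above can be tried out in plain C, without any BPF toolchain. This is a small sketch under stated assumptions: `DEMO_UINT` has the same shape as libbpf's `__uint()` macro as far as I know (a pointer to an int array of the given length) but is renamed here, and `struct demo_map_def`, `DEMO_UINT_VALUE` and the `demo_*` functions are hypothetical. libbpf itself recovers the number from the BTF type information at load time; `sizeof` is used below only to show that the value really lives in the type.

```c
#include <assert.h>
#include <stddef.h>

/* Declares a pointer-to-array member whose array length carries the value.
 * The pointer is never dereferenced; only its type matters. */
#define DEMO_UINT(name, val) int (*name)[val]

/* Hypothetical map definition in the new BTF style. */
struct demo_map_def {
    DEMO_UINT(type, 2);
    DEMO_UINT(max_entries, 1024);
};

/* Recover the encoded integer from the member's type: sizeof(int[N]) / sizeof(int).
 * The operand of sizeof is not evaluated, so no pointer is dereferenced. */
#define DEMO_UINT_VALUE(def_type, name) \
    (sizeof(*((def_type *)0)->name) / sizeof(int))

size_t demo_map_type(void)
{
    return DEMO_UINT_VALUE(struct demo_map_def, type);
}

size_t demo_max_entries(void)
{
    return DEMO_UINT_VALUE(struct demo_map_def, max_entries);
}

/* Sanity check that the round trip through the type system is lossless. */
void demo_self_check(void)
{
    assert(demo_map_type() == 2);
    assert(demo_max_entries() == 1024);
}
```

The design point is that the struct never needs to hold any data: the values exist purely as part of the member types, which is why they end up in the type description that the compiler emits.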
Re: __builtin_memcpy behavior
Tristan Mayfield
Toke, thanks for the quick response! I have tons of other questions, like the relationship with BPF and perf's utilities, but I think I've probably asked enough for this message! |
Re: __builtin_memcpy behavior
Toke Høiland-Jørgensen
"Tristan Mayfield" <mayfieldtristan@...> writes:
> I'm not sure where the memory it was reading was and if that should be [...]

bpf_probe_read() will happily read any piece of kernel memory, it doesn't respect kernel boundaries. So if the call succeeded (you did check the return code, right?), that just means that the memory it was reading contained *something*, not that it was actually what you were *expecting* it to be.

> Should __builtin_memcpy be used? Or should bpf_probe_read? If [...]

In this case, I think what you're after is the BPF CO-RE facility in general. Have a look at Andrii's excellent introduction post here:

https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

But basically, what it means is that if you add a magic attribute (preserve_access_index) to your variables, libbpf will notice that and perform load-time relocations so you get the right offset on the kernel you're running on.

You can do this with your self-defined struct, but with BTF you don't even have to do that: You can make bpftool spit out a header file with all the structs defined by the current kernel and just include that (the struct name in this case would be 'struct trace_event_raw_tcp_event_sk_skb'). To do this, issue:

    bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

and use that as an include file. bpftool will even add a nice pragma statement at the top of the file so that all structs defined in it will automatically get the preserve_access_index attribute; which means you can basically use them like regular struct members (i.e., you can stick with __builtin_memcpy() and regular assignments) and still get the magic CO-RE relocations on load.

Note that all this requires that your libbpf and clang versions are recent enough to understand all this, and libbpf will need the kernel BTF information to be present at load time (check for the existence of /sys/kernel/btf/vmlinux on the target system).

-Toke |
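The check suggested at the end of the message above (does /sys/kernel/btf/vmlinux exist on the target system?) can be done from a loader program before it tries to load any CO-RE object. A minimal sketch follows, assuming a POSIX system; the function names `btf_file_readable` and `kernel_btf_present` are hypothetical helpers, not part of libbpf, and the check covers only this one path, not the other locations libbpf searches.

```c
#include <assert.h>
#include <unistd.h>

/* The sysfs location named in the thread. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Returns 1 if path exists and is readable by this process, 0 otherwise. */
int btf_file_readable(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* Returns 1 if the running kernel exposes its BTF through sysfs. */
int kernel_btf_present(void)
{
    return btf_file_readable(KERNEL_BTF_PATH);
}
```

A loader could call `kernel_btf_present()` early and print a clear message when it returns 0, rather than letting the user run into a less obvious relocation failure later.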