Date   

Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Apr 1, 2020 at 12:52 PM <mayfieldtristan@...> wrote:

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.

Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
Just answered on another Github issue
(https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793),
please check it there as well. Short answer: no. Unless you can pretty
much guarantee that it will be exactly the same **binary** compiled
version of the kernel (not just same version).

The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.

#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
These should come from kernel-devel packages.

#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is deprecated form of declaring maps, please see kernel
selftests for better examples.



// Struct to pass data via perf buffer
struct data_t {
u32 pid;
u32 tgid;
char program_name[16]; // max comm length is arbitrary
It's not arbitrary, it's set at 16 in kernel.

char file[255];
};

struct sys_enter_openat_args {
// struct fields obtained from tplist.py output
long long pad;
int __syscall_nr;
int dfd;
const char * filename;
int flags;
__mode_t mode; // used __mode_t instead of umode_t
};
I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on 32-bit target
architecture, but assuming you are on x86-64, all switch int to long
and make __mode_t also long.


SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

// debugging
char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), ctx->err);
ctx->err doesn't exist according to definition above?


bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).


return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
_version is not necessary with modern libbpf and kernel.


With the above code, err = -14 and ctx->filename = -100.
This is due to invalid memory layour of struct sys_enter_openat_args,
you are reading wrong pointer. But sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.


I took a look at an article written by Gianluca Borello (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for Sysdig's approach, and thought that using a raw tracepoint would be easier to get the filename arg than the above approach. I tried it out, but couldn't get it to compile.
Here's the new function:

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
unsigned long syscall_id = ctx->args[1];
volatile struct pt_regs *regs;
volatile const char *pathname;

regs = (struct pt_regs *)ctx->args[0];
pathname = (const char *)regs->si;
better include bpf_tracing.h header from libbpf and use
PT_REGS_PARM2_CORE(regs) instead of directly referencing fields of
pt_regs.


struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), syscall_id);

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

With this code I get a compilation error:
file_open_kern.c:77:34: error: no member named 'si' in 'struct pt_regs'
pathname = (const char *)regs->si;
~~~~ ^
This error is strange to me because ptrace.h does list %si as a valid field. Perhaps I'm using the wrong header. Hopefully this is enough information to be clear.
This is due to different definitions of struct pt_regs in user-space
and kernel-space. Using libbpf's bpf_tracing.h header and PT_REGS
macros should eliminate a lot of those. Sticking to vmlinux.h also
helps, but requires BPF CO-RE.

If CO-RE compiled programs can run on non-BTF supported kernels, then I would be more than happy to shift to that approach. Otherwise, it's nice to have non-BTF reliant code.
No, unfortunately, it can't.

As a final note, I was working through some examples for XDP in https://github.com/xdp-project/xdp-tutorial and was thinking that something similar would be helpful for general BPF programming. The API may be too volatile at this point, but if people who have the technical expertise are interested, I'm willing to donate some of my own time to help build something similar. BCC's libbpf-tools has been extremely helpful, but it seems that there's not any resources (I've found) that are as in-depth and cohesive as the tutorial linked above. Again, I don't know if it's completely appropriate at this stage of development, but I know there's a lot of interest out there in using BPF at a more granular level and with less overhead than what is offered with BCC.
I agree that such tutorial is sorely missing. libbpf-tools and kernel
selftests (not so much samples/bpf, though) are probably the best way
to see usage of all the newer features. It would be awesome for
someone to prepare an approachable and comprehensive set of tutorials,
of course. Please do give it a try and community will certainly help
you with answering questions you have!


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.
  • Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
  • The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.
#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

// Struct to pass data via perf buffer
struct data_t {
    u32 pid;
    u32 tgid;
    char program_name[16]; // max comm length is arbitrary
    char file[255];
};

struct sys_enter_openat_args {
    // struct fields obtained from tplist.py output
    long long pad;
    int __syscall_nr;
    int dfd;
    const char * filename;
    int flags;
    __mode_t mode;  // used __mode_t instead of umode_t
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
  struct data_t data = {};

  data.pid = bpf_get_current_pid_tgid() >> 32;
  data.tgid = bpf_get_current_pid_tgid();
  bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

  int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

  // debugging
  char msg[] = "Probe read results: %d\n";
  bpf_trace_printk(msg, sizeof(msg), ctx->err);

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

With the above code, err = -14 and ctx->filename = -100.
I took a look at an article written by


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 11:39 AM <mayfieldtristan@...> wrote:

Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.


Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.
Yes. A lot of newer functionality relies on kernel BTF as well. But to
compile portable BPF program you also need kernel BTF (for BPF CO-RE
stuff).


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.

Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 6:45 AM <mayfieldtristan@...> wrote:

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

I found out that the cloned the kernel tree from the Ubuntu repo (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") for Bionic was the issue. For some reason it doesn't have an up to date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Take a closer look. libbpf-tools do not use bpf_load.h, that one is
deprecated and its use is discouraged. libbpf-tools rely on
code-generated BPF skeleton. But really, get a close look at
libbpf-tools, it has everything you need to get started.


Please keep this discussion on mailing list, though, it might benefit
someone else.

Agreed, the last message I replied to just you accidentally.
Thanks again for the help.


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.
I found out that the cloned the kernel tree from the Ubuntu repo (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") for Bionic was the issue. For some reason it doesn't have an up to date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Please keep this discussion on mailing list, though, it might benefit
someone else.
Agreed, the last message I replied to just you accidentally.
Thanks again for the help.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

Adding back mailing list.

On Mon, Mar 23, 2020 at 12:33 PM <mayfieldtristan@...> wrote:

Thanks for the reply. All of your suggestions make sense. Because I'm targeting a kernel 4.15 for this specific bit of code, I can't use bpf_probe_read_str().. I think? The way I'm developing is that I cloned the kernel src of kernel 4.15 to a certain depth, and then compiled libbpf. Should I use the standalone libbpf repo instead? I've tried that but have struggled to get samples/bpf/ to compile taking that approach.
bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

samples/bpf are part of kernel, so yes, they are using libbpf from
kernel sources. For stand-alone application I'd go with
github.com/libbpf/libbpf


Regardless, I took your advice, and my code now looks like:

struct data_t {
u32 pid;
char program_name[256]; // max comm length is arbitrary
comm is 16 and unlikely to ever change. No need to waste 256 bytes here.

char *file;
};

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};



struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
data has to be initialized here:

struct data_t data = {};

struct sys_enter_openat_args args;

int res = bpf_probe_read(&args, sizeof(args), ctx); // read the ctx into bpf space
if(!res) {
data.file = "couldn't get file";
} else {
data.file = args.filename;
}

data.pid = bpf_get_current_pid_tgid();
bpf_get_current_comm(data.program_name, sizeof(data.program_name));

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

---
With a new error code

bpf_load_program() err=13
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -304
3: (b7) r2 = 32
4: (bf) r3 = r6
5: (85) call bpf_probe_read#4
6: (67) r0 <<= 32
7: (77) r0 >>= 32
8: (55) if r0 != 0x0 goto pc+3
R0=inv0 R6=ctx(id=0,off=0,imm=0) R10=fp0
9: (18) r1 = 0xffff88ac7c953000
11: (05) goto pc+1
13: (7b) *(u64 *)(r10 -8) = r1
14: (85) call bpf_get_current_pid_tgid#14
15: (63) *(u32 *)(r10 -272) = r0
16: (bf) r1 = r10
17: (07) r1 += -268
18: (b7) r2 = 256
19: (85) call bpf_get_current_comm#16
20: (bf) r4 = r10
21: (07) r4 += -272
22: (bf) r1 = r6
23: (18) r2 = 0xffff88ac7c953000
25: (b7) r3 = 0
26: (b7) r5 = 272
27: (85) call bpf_perf_event_output#25
invalid indirect read from stack off -272+260 size 272
The kernel didn't load the BPF program
---

Thanks for the pointer about the libbpf-tools in the BCC repo! I had seen it before but for some reason didn't make any kind of note about it. It's extremely helpful.

I will change the map declaration, just trying to understand getting data from the tracepoint right now.
data is not completely initialized, see above.

Please keep this discussion on mailing list, though, it might benefit
someone else.


Re: Array brace-enclosed initialization

Yonghong Song
 

On Mon, Mar 23, 2020 at 10:05 AM Federico Parola <fede.parola@...> wrote:

Hello everybody,
in my XDP eBPF program I'm trying to initialize an array with a brace-enclosed list, however my code is rejected by the verifier.
Here is a simple piece of code to replicate the problem:

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME) \
__attribute__((section(NAME), used))
#endif

#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...) \
(*NAME)(__VA_ARGS__) = (void *)BPF_FUNC_##NAME
#endif

#ifndef printk
# define printk(fmt, ...) \
({ \
char ____fmt[] = fmt; \
trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
})
#endif

static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);

__section("prog")
int xdp_prog(struct xdp_md *ctx) {
int i;
int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
The init list is too long. The compiler puts {0, 1, 2, ..., 9} in a readonly
section. That is why you got the failure below.

If you use native clang compilation and a recent kernel, libbpf
should be able to automatically create a map for you so your
code will work.


#pragma nounroll
for (i = 0; i < 10; i++) {
printk("%d", array[i]);
}

return XDP_PASS;
}

char __license[] __section("license") = "GPL"

This is the error reported by the verifier:

0: (b7) r6 = 0
1: (b7) r7 = 25637
2: (b7) r8 = 0
3: (73) *(u8 *)(r10 -2) = r6
last_idx 3 first_idx 0
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 25637
regs=40 stack=0 before 0: (b7) r6 = 0
4: (6b) *(u16 *)(r10 -4) = r7
5: (18) r1 = 0x0
7: (0f) r1 += r8
8: (61) r3 = *(u32 *)(r1 +0)
R1 invalid mem access 'inv'
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

I tried compiling with clang-6 and clang-9 with optimizaton set to O1 and O2, but I get this error in all cases.
If I initialize the array in another way (e.g. with a loop) the program works correctly.
The eBPF bytecode generated by clang is the following:

.text
.file "test.c"
.section prog,"ax",@progbits
.globl xdp_prog # -- Begin function xdp_prog
.p2align 3
.type xdp_prog,@function
xdp_prog: # @xdp_prog
# %bb.0:
r6 = 0
r7 = 25637
r8 = 0
LBB0_1: # =>This Inner Loop Header: Depth=1
*(u8 *)(r10 - 2) = r6
*(u16 *)(r10 - 4) = r7
r1 = .L__const.xdp_prog.array ll
r1 += r8
r3 = *(u32 *)(r1 + 0)
r1 = r10
r1 += -4
r2 = 3
call 6
r8 += 4
if r8 != 24 goto LBB0_1
# %bb.2:
r0 = 2
exit
.Lfunc_end0:
.size xdp_prog, .Lfunc_end0-xdp_prog
# -- End function
.type .L__const.xdp_prog.array,@object # @__const.xdp_prog.array
.section .rodata,"a",@progbits
.p2align 2
.L__const.xdp_prog.array:
.long 0 # 0x0
.long 1 # 0x1
.long 2 # 0x2
.long 3 # 0x3
.long 4 # 0x4
.long 5 # 0x5
.long 6 # 0x6
.long 7 # 0x7
.long 8 # 0x8
.long 9 # 0x9
.size .L__const.xdp_prog.array, 40

.type .L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
.section .rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
.asciz "%d"
.size .L__const.xdp_prog.____fmt, 3

.type __license,@object # @__license
.section license,"aw",@progbits
.globl __license
__license:
.asciz "GPL"
.size __license, 4


.addrsig
.addrsig_sym xdp_prog
.addrsig_sym __license

It seems like the array in the stack is not initialized in the code. With some declarations the code works, for example decalring the array in the following way:
int array[10] = {0, 1, 2, 3};
everything works, the generated bytecode is the following:

.text
.file "test.c"
.section prog,"ax",@progbits
.globl xdp_prog # -- Begin function xdp_prog
.p2align 3
.type xdp_prog,@function
xdp_prog: # @xdp_prog
# %bb.0:
r1 = 2
*(u32 *)(r10 - 36) = r1
r6 = 0
*(u32 *)(r10 - 8) = r6
*(u32 *)(r10 - 12) = r6
*(u32 *)(r10 - 16) = r6
*(u32 *)(r10 - 20) = r6
*(u32 *)(r10 - 24) = r6
*(u32 *)(r10 - 28) = r6
*(u32 *)(r10 - 4) = r6
r1 = 3
*(u32 *)(r10 - 32) = r1
r1 = 1
*(u32 *)(r10 - 40) = r1
*(u8 *)(r10 - 42) = r6
r7 = 25637
*(u16 *)(r10 - 44) = r7
r1 = r10
r1 += -44
r2 = 3
r3 = 1
call 6
r8 = 4
LBB0_1: # =>This Inner Loop Header: Depth=1
r1 = r10
r1 += -40
r1 += r8
r3 = *(u32 *)(r1 + 0)
*(u8 *)(r10 - 42) = r6
*(u16 *)(r10 - 44) = r7
r1 = r10
r1 += -44
r2 = 3
call 6
r8 += 4
if r8 != 24 goto LBB0_1
# %bb.2:
r0 = 2
exit
.Lfunc_end0:
.size xdp_prog, .Lfunc_end0-xdp_prog
# -- End function
.type .L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
.section .rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
.asciz "%d"
.size .L__const.xdp_prog.____fmt, 3

.type __license,@object # @__license
.section license,"aw",@progbits
.globl __license
__license:
.asciz "GPL"
.size __license, 4


.addrsig
.addrsig_sym xdp_prog
.addrsig_sym __license

This time the array is correctly initialized into code.
Is this a clang bug?


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Mon, Mar 23, 2020 at 9:38 AM <mayfieldtristan@...> wrote:

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is a legacy syntax of specifying BPF maps, please see [0]
for some newer examples

[0] https://github.com/iovisor/bcc/tree/master/libbpf-tools


struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
struct sys_enter_openat_args *args;

int res = bpf_probe_read(args, sizeof(ctx), ctx);
you don't need to bpf_probe_read() ctx here, you can just access its
members directly.

if(!res) {
data.file = "couldn't get file";
} else {
data.file = args->filename;
But here if you want to read filename contents itself, you'll need to
use bpf_probe_read_str().

Having data_t definition would be also helpful.

}

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
this error from verifier is quite misleading, but what verifier
complains about here is that you try to read uninitialized pointer
(arg) and pass it as a first parameter into bpf_probe_read(). But see
above, you don't need to bpf_probe_read() anything, and even if you
wanted to it would have to be done very differently:

struct sys_enter_openat_args args; /* notice no pointer here */
bpf_probe_read(&args, sizeof(args), ctx); /* taking address of args,
taking size of args, not its pointer */

The kernel didn't load the BPF program

data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provides an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?
It's a mix. If your data should/can be pre-aggregated in kernel, using
map might benefit you in that you will be sending much less data to
user-space. But if you want to send every piece of information than
perf_buffer is faster and more convenient than having user-space query
BPF maps all the time.


Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.
You're welcome. Check libbpf-tools in BCC repo, it should give you
some examples to work off of.


Tristan


Array brace-enclosed initialization

Federico Parola <fede.parola@...>
 

Hello everybody,
in my XDP eBPF program I'm trying to initialize an array with a brace-enclosed list, however my code is rejected by the verifier.
Here is a simple piece of code to replicate the problem:

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME)                  \
   __attribute__((section(NAME), used))
#endif

#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...)              \
   (*NAME)(__VA_ARGS__) = (void *)BPF_FUNC_##NAME
#endif

#ifndef printk
# define printk(fmt, ...)                                      \
    ({                                                         \
        char ____fmt[] = fmt;                                  \
        trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
    })
#endif

static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);

__section("prog")
int xdp_prog(struct xdp_md *ctx) {
  int i;
  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

#pragma nounroll
  for (i = 0; i < 10; i++) {
    printk("%d", array[i]);
  }

  return XDP_PASS;
}

char __license[] __section("license") = "GPL"
This is the error reported by the verifier:
0: (b7) r6 = 0
1: (b7) r7 = 25637
2: (b7) r8 = 0
3: (73) *(u8 *)(r10 -2) = r6
last_idx 3 first_idx 0
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 25637
regs=40 stack=0 before 0: (b7) r6 = 0
4: (6b) *(u16 *)(r10 -4) = r7
5: (18) r1 = 0x0
7: (0f) r1 += r8
8: (61) r3 = *(u32 *)(r1 +0)
R1 invalid mem access 'inv'
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
I tried compiling with clang-6 and clang-9 with optimizaton set to O1 and O2, but I get this error in all cases.
If I initialize the array in another way (e.g. with a loop) the program works correctly.
The eBPF bytecode generated by clang is the following:
	.text
	.file	"test.c"
	.section	prog,"ax",@progbits
	.globl	xdp_prog                # -- Begin function xdp_prog
	.p2align	3
	.type	xdp_prog,@function
xdp_prog:                               # @xdp_prog
# %bb.0:
	r6 = 0
	r7 = 25637
	r8 = 0
LBB0_1:                                 # =>This Inner Loop Header: Depth=1
	*(u8 *)(r10 - 2) = r6
	*(u16 *)(r10 - 4) = r7
	r1 = .L__const.xdp_prog.array ll
	r1 += r8
	r3 = *(u32 *)(r1 + 0)
	r1 = r10
	r1 += -4
	r2 = 3
	call 6
	r8 += 4
	if r8 != 24 goto LBB0_1
# %bb.2:
	r0 = 2
	exit
.Lfunc_end0:
	.size	xdp_prog, .Lfunc_end0-xdp_prog
                                        # -- End function
	.type	.L__const.xdp_prog.array,@object # @__const.xdp_prog.array
	.section	.rodata,"a",@progbits
	.p2align	2
.L__const.xdp_prog.array:
	.long	0                       # 0x0
	.long	1                       # 0x1
	.long	2                       # 0x2
	.long	3                       # 0x3
	.long	4                       # 0x4
	.long	5                       # 0x5
	.long	6                       # 0x6
	.long	7                       # 0x7
	.long	8                       # 0x8
	.long	9                       # 0x9
	.size	.L__const.xdp_prog.array, 40

	.type	.L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
	.section	.rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
	.asciz	"%d"
	.size	.L__const.xdp_prog.____fmt, 3

	.type	__license,@object       # @__license
	.section	license,"aw",@progbits
	.globl	__license
__license:
	.asciz	"GPL"
	.size	__license, 4


	.addrsig
	.addrsig_sym xdp_prog
	.addrsig_sym __license
It seems like the array in the stack is not initialized in the code. With some declarations the code works, for example decalring the array in the following way:
int array[10] = {0, 1, 2, 3};
everything works, the generated bytecode is the following:
	.text
	.file	"test.c"
	.section	prog,"ax",@progbits
	.globl	xdp_prog                # -- Begin function xdp_prog
	.p2align	3
	.type	xdp_prog,@function
xdp_prog:                               # @xdp_prog
# %bb.0:
	r1 = 2
	*(u32 *)(r10 - 36) = r1
	r6 = 0
	*(u32 *)(r10 - 8) = r6
	*(u32 *)(r10 - 12) = r6
	*(u32 *)(r10 - 16) = r6
	*(u32 *)(r10 - 20) = r6
	*(u32 *)(r10 - 24) = r6
	*(u32 *)(r10 - 28) = r6
	*(u32 *)(r10 - 4) = r6
	r1 = 3
	*(u32 *)(r10 - 32) = r1
	r1 = 1
	*(u32 *)(r10 - 40) = r1
	*(u8 *)(r10 - 42) = r6
	r7 = 25637
	*(u16 *)(r10 - 44) = r7
	r1 = r10
	r1 += -44
	r2 = 3
	r3 = 1
	call 6
	r8 = 4
LBB0_1:                                 # =>This Inner Loop Header: Depth=1
	r1 = r10
	r1 += -40
	r1 += r8
	r3 = *(u32 *)(r1 + 0)
	*(u8 *)(r10 - 42) = r6
	*(u16 *)(r10 - 44) = r7
	r1 = r10
	r1 += -44
	r2 = 3
	call 6
	r8 += 4
	if r8 != 24 goto LBB0_1
# %bb.2:
	r0 = 2
	exit
.Lfunc_end0:
	.size	xdp_prog, .Lfunc_end0-xdp_prog
                                        # -- End function
	.type	.L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
	.section	.rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
	.asciz	"%d"
	.size	.L__const.xdp_prog.____fmt, 3

	.type	__license,@object       # @__license
	.section	license,"aw",@progbits
	.globl	__license
__license:
	.asciz	"GPL"
	.size	__license, 4


	.addrsig
	.addrsig_sym xdp_prog
	.addrsig_sym __license
This time the array is correctly initialized into code.
Is this a clang bug?


Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

struct sys_enter_openat_args {
        u16 common_type;
        u8 common_flags;
        u8 common_preempt_count;
        int common_pid;
        int __syscall_nr;
        int dfd;
        char *filename;
        int flags;
        __mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
  struct data_t data;
  struct sys_enter_openat_args *args;

  int res = bpf_probe_read(args, sizeof(ctx), ctx);
  if(!res) {
         data.file = "couldn't get file";
  } else {
         data.file = args->filename;
  }

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
The kernel didn't load the BPF program

  data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
  bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provides an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?

Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.

Tristan


Study on annotation of design and implementation choices, and of technical debt

a.serebrenik@...
 

Dear all,

As software engineering research teams at the University of Sannio (Italy) and Eindhoven University of Technology (The Netherlands) we are interested in investigating the protocol used by developers while they have to annotate implementation and design choices during their normal development activities. More specifically, we are looking at whether, where and what kind of annotations developers usually use trying to be focused more on those annotations mainly aimed at highlighting that the code is not in the right shape (e.g., comments for annotating delayed or intended work activities such as TODO, FIXME, hack, workaround, etc). In the latter case, we are looking at what is the content of the above annotations, as well as how they usually behave while evolving the code that has been previously annotated.

When answering the survey, in case your annotation practices are different in different open source projects you may contribute, please refer to how you behave for the projects where you have been contacted.

Filling out the survey will take about 5 minutes.

Please note that your identity and personal data will not be disclosed, while we plan to use the aggregated results and anonymized responses as part of a scientific publication. 

If you have any questions about the questionnaire or our research, please do not hesitate to contact us.

You can find the survey link here:


Thanks and regards,

Gianmarco Fucci (gianmarcofucci94@...)
Fiorella Zampetti (fzampetti@...)
Alexander Serebrenik (a.serebrenik@...)
Massimiliano Di Penta (dipenta@...)

--


Re: is BCC tools safe to enable root privilegies in production?

Cristian Spinetta
 

Thanks for your fast reply!

In our infrastructure the owners of the app can logging into the production VMs that are running their apps and execute a restricted list of command with sudo (e.g. tcpdump, netstat, ...). The idea is to give root access to each script of bcc tool (all within /usr/share/bcc/tools/*). We are concerned if there are some bcc scripts that can run another command like in the example above or if there are other security concerns to be aware of.

Best,
Cristian Spinetta


On Fri, Mar 13, 2020 at 1:23 PM Brendan Gregg <brendan.d.gregg@...> wrote:
On Fri, Mar 13, 2020 at 7:59 AM Cristian Spinetta <cebspinetta@...> wrote:
>
> Hi all!
>
> I am curious about whether it is safe to enable root access to BCC scripts on production machines.
> In my company, each team has access to their instances via ssh, and we are thinking to allow them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? for example, is there some tool that can receive a command to execute? in that case it would be unsafe because someone could execute any command thought a bcc tool.
>
> e.g.:
> sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc

^^^^

sudo isn't safe. If you remove the BCC tool from this one-liner,
you'll find it still destroys your disk.

In practice the production concern I have is for the overhead of each
tool, hence the overhead section in each tool's man page.

Brendan

>
> Best,
> Cristian Spinetta
>




Re: is BCC tools safe to enable root privilegies in production?

Brendan Gregg
 

On Fri, Mar 13, 2020 at 7:59 AM Cristian Spinetta <cebspinetta@...> wrote:

Hi all!

I am curious about whether it is safe to enable root access to BCC scripts on production machines.
In my company, each team has access to their instances via ssh, and we are thinking to allow them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? for example, is there some tool that can receive a command to execute? in that case it would be unsafe because someone could execute any command thought a bcc tool.

e.g.:
sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc
^^^^

sudo isn't safe. If you remove the BCC tool from this one-liner,
you'll find it still destroys your disk.

In practice the production concern I have is for the overhead of each
tool, hence the overhead section in each tool's man page.

Brendan


Best,
Cristian Spinetta


is BCC tools safe to enable root privilegies in production?

Cristian Spinetta
 

Hi all!

I am curious about whether it is safe to enable root access to BCC scripts on production machines.
In my company, each team has access to their instances via ssh, and we are thinking to allow them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? for example, is there some tool that can receive a command to execute? in that case it would be unsafe because someone could execute any command thought a bcc tool.

e.g.:
sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc

Best,
Cristian Spinetta


Re: clang 10 for BPF CO-RE

Jesper Dangaard Brouer
 

On Wed, 11 Mar 2020 10:36:47 -0700
"Andrii Nakryiko" <andrii.nakryiko@...> wrote:

On Wed, Mar 11, 2020 at 10:33 AM <tmayfield@...> wrote:

Hi all,

Finally found the BPF blog and it's been nice to get more
information on using libbpf directly since I don't have a lot of
systems or kernel experience.
Thanks! Glad it was useful.
I assume this is the blog post[1]:
[1] https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html
Thanks for writing that Andrii!

For using libbpf directly from C, we also have the XDP-tutorial[2], but
doesn't contain BPF CO-RE examples. And it uses the old style map
defines. We are planning to update/fix that, once LLVM 10 gets more
widely available in distros.

[2] https://github.com/xdp-project/xdp-tutorial


Scanning through the "BCC to libbpf" post, I noticed Andrii calls
for using clang 10. I went to look at llvm releases and only saw
clang/llvm 9 (as of September 2019). Is clang 10 just built from
source?
For kernel/libbpf development we do build Clang from sources, but you
can install it from packages as well. See https://apt.llvm.org/, there
are packages for Clang 10 and even Clang 11 and they are updated
frequently.
Let me give you the manual compile recipe (that I got from Eelco):

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir -p llvm/build/install
cd llvm/build
cmake -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
-DLLVM_ENABLE_PROJECTS="clang" \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install ..
ninja && ninja install
export PATH=$PWD/install/bin:$PATH

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: clang 10 for BPF CO-RE

Andrii Nakryiko
 

On Wed, Mar 11, 2020 at 10:33 AM <tmayfield@...> wrote:

Hi all,

Finally found the BPF blog and it's been nice to get more information on using libbpf directly since I don't have a lot of systems or kernel experience.
Thanks! Glad it was useful.


Scanning through the "BCC to libbpf" post, I noticed Andrii calls for using clang 10. I went to look at llvm releases and only saw clang/llvm 9 (as of September 2019).
Is clang 10 just built from source?
For kernel/libbpf development we do build Clang from sources, but you
can install it from packages as well. See https://apt.llvm.org/, there
are packages for Clang 10 and even Clang 11 and they are updated
frequently.


Looking forward to building with CO-RE and move some of my BCC tooling to libbpf.
Great, please do!


-Tristan


clang 10 for BPF CO-RE

Tristan Mayfield
 

Hi all,
 
Finally found the BPF blog and it's been nice to get more information on using libbpf directly since I don't have a lot of systems or kernel experience.
 
Scanning through the "BCC to libbpf" post, I noticed Andrii calls for using clang 10. I went to look at llvm releases and only saw clang/llvm 9 (as of September 2019).
Is clang 10 just built from source?
 
Looking forward to building with CO-RE and move some of my BCC tooling to libbpf.
 
-Tristan


Re: Getting function's address from BPF_TRACE_FENTRY BPF program

Yutaro Hayakawa
 

I see, so this means the fentry program
needs to load and verify the program for
every functions to attach right?

In my (maybe very specific) case, the
tool may attaches programs to more than
1000 functions. So it is important to
reduce the programs to reduce the attach
time.

I will continue to use kprobe. Thank you very
much for your help.

Yutaro

On Mar 8, 2020, at 4:19, Alexei Starovoitov <alexei.starovoitov@...> wrote:

On Fri, Mar 6, 2020 at 11:19 PM Yutaro Hayakawa <yhayakawa3720@...> wrote:

Hello,

Is there any way to get the address of the function in fentry type programs like
kprobe type programs does by PT_REGS_IP(pt_regs)?

I'd like to migrate my kprobe based tool[1] to fentry based one, but only this
feature is missing right now. Since the tool attaches single BPF program to
the multiple kernel functions, it needs to have function's address to identify
which function the trace data comes from.

[1] https://github.com/YutaroHayakawa/ipftrace
I think this approach won't quite work with fentry because
the same fenty type prog cannot be attached to multiple kernel functions.
At load time the kernel verifier needs to hold target kernel function,
check that arguments match, etc. So at that point the target function
address is fixed and when fentry prog is called it will see only one
'faddr' == regs_ip.


Re: Getting function's address from BPF_TRACE_FENTRY BPF program

Alexei Starovoitov
 

On Fri, Mar 6, 2020 at 11:19 PM Yutaro Hayakawa <yhayakawa3720@...> wrote:

Hello,

Is there any way to get the address of the function in fentry type programs like
kprobe type programs does by PT_REGS_IP(pt_regs)?

I'd like to migrate my kprobe based tool[1] to fentry based one, but only this
feature is missing right now. Since the tool attaches single BPF program to
the multiple kernel functions, it needs to have function's address to identify
which function the trace data comes from.

[1] https://github.com/YutaroHayakawa/ipftrace
I think this approach won't quite work with fentry because
the same fenty type prog cannot be attached to multiple kernel functions.
At load time the kernel verifier needs to hold target kernel function,
check that arguments match, etc. So at that point the target function
address is fixed and when fentry prog is called it will see only one
'faddr' == regs_ip.

181 - 200 of 2015