
#bcc - skb_network_header crashes in a BPF Kernel trace function #bcc

vigs.prof@...
 

Hello - I am looking to trace ip_forward_finish. The intent is to measure the latency of all TCP connections going through a Linux-based gateway router, so I thought of tracing the ip_forward_finish kernel function and capturing the timestamps of the SYN, SYN-ACK and ACK messages at the router.
 
The issue is that accessing the iphdr inside the trace function fails with the error below:
 
bpf: Failed to load program: Permission denied
0: (79) r6 = *(u64 *)(r1 +96)
1: (b7) r1 = 0
2: (6b) *(u16 *)(r10 -24) = r1
3: (bf) r3 = r6
4: (07) r3 += 192
5: (bf) r1 = r10
6: (07) r1 += -24
7: (b7) r2 = 2
8: (85) call bpf_probe_read#4
9: (69) r1 = *(u16 *)(r10 -24)
10: (55) if r1 != 0x8 goto pc+7
 R0=inv(id=0) R1=inv8 R6=inv(id=0) R10=fp0
11: (69) r1 = *(u16 *)(r6 +196)
R6 invalid mem access 'inv'
 
HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.
 
The code fragment I originally had is below; the failure occurs when ip_Hdr->protocol is accessed. I also checked that ip_Hdr is not null.
 
int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
 
    if (skb->protocol != htons(ETH_P_IP)) return 0;
 
    struct iphdr* ip_Hdr = (struct iphdr *) skb_network_header(skb);
 
    if (ip_Hdr->protocol != IPPROTO_TCP)
         return 0;
 
 
    /// Other code
 
  }
 
Per the HINT in the message, I tried changing to bpf_probe_read, but the outcome is still the same:
 
int trace_forward_finish(struct pt_regs *ctx,struct net *net, struct sock *sk, struct sk_buff *skb)
{
    if (skb->protocol != htons(ETH_P_IP)) return 0;
 
    struct iphdr ip_Hdr;
    bpf_probe_read(&ip_Hdr, sizeof(ip_Hdr), (void*)ip_hdr(skb)); 
    
    if ( (ip_Hdr.protocol != IPPROTO_TCP))
         return 0;
 
    return 0;
}
 
Any help would be appreciated. 
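
One direction that may be worth trying, sketched here as a plain user-space model rather than a loadable bcc program: skb_network_header() boils down to skb->head + skb->network_header, and the verifier log above stops at a direct load from the skb pointer (offset 196), which suggests that this inlined access was not rewritten into a bpf_probe_read() call. Reading each field explicitly would look roughly like the following. All names prefixed with mock_ are hypothetical stand-ins, and mock_probe_read() only imitates the argument order of bpf_probe_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct iphdr (field order of an IPv4 header). */
struct mock_iphdr {
    uint8_t  ver_ihl;
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/* Hypothetical stand-in for the two sk_buff fields that matter here. */
struct mock_sk_buff {
    unsigned char *head;        /* start of the packet buffer */
    uint16_t network_header;    /* offset of the IP header from head */
};

/* Imitates bpf_probe_read(dst, size, src): copy first, use the copy afterwards. */
static int mock_probe_read(void *dst, size_t size, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, size);
    return 0;
}

/* Returns the IP protocol number, or -1 if any read fails. */
static int mock_ip_protocol(const struct mock_sk_buff *skb)
{
    unsigned char *head;
    uint16_t offset;
    struct mock_iphdr ip;

    if (mock_probe_read(&head, sizeof(head), &skb->head))
        return -1;
    if (mock_probe_read(&offset, sizeof(offset), &skb->network_header))
        return -1;
    /* Only now is the header address known; copy the header itself. */
    if (mock_probe_read(&ip, sizeof(ip), head + offset))
        return -1;
    return ip.protocol;
}
```

In a real probe the three copies would be bpf_probe_read() calls on skb->head, skb->network_header and the computed address, with the protocol compared against IPPROTO_TCP afterwards.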


Seeking candidates for PhD position related to XDP/eBPF

Jesper Dangaard Brouer
 

Hi Potential PhD student,

Reminder: Application deadline 15.May 2020 is really soon for our PhD
position located in Sweden, at Karlstads University. See:

"PhD position in Computer Science, programmable networks"
https://kau.varbi.com/en/what:job/jobID:315513

This PhD position is related to XDP/eBPF. The Red Hat engineers you
will be cooperating with are Toke and I. Red Hat is funding the
position, but employment happens under University terms, with the
exception that the work should be released under an Open Source license.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Thu, Apr 16, 2020 at 8:42 AM <mayfieldtristan@...> wrote:

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?
You are not really accessing CPU registers; you access the values they
held before the program was interrupted. Those values are stored in the
pt_regs struct. It's a technicality in this case, but you can't access
CPU registers directly in BPF.

BTW, raw_tracepoints are completely different, but you should be able
to find examples in selftests for those.

For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and CPU registers, but I believe that would be the %rdi register?
Yes, the rdi register, which is accessed from pt_regs using PT_REGS_PARM1().

With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
    char msg[] = "hello world\n"; // for verification that the bpf program is running at all
    bpf_trace_printk(msg, sizeof(msg));

    struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

    return 0;
}
char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o


Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performance Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?
So <linux/ptrace.h> in your case is taken from UAPI headers, not
kernel internal headers. They have different names for the fields. Drop
the -D__KERNEL__ part and it should work.
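
To illustrate why the define matters, here is a small user-space model of the macro selection. It assumes, based on the compiler output above, that bpf_tracing.h expands PT_REGS_PARM1 to the kernel-internal field name (di) when __KERNEL__ is defined, and that the x86-64 UAPI struct uses the rdi spelling otherwise. The model_ and MODEL_ names are made up for this sketch, and the structs carry only two of the real registers.

```c
#include <assert.h>

/* Hypothetical two-field stand-ins; the real structs hold every register. */
struct model_regs_uapi   { unsigned long rsi; unsigned long rdi; }; /* user-space (UAPI) names */
struct model_regs_kernel { unsigned long si;  unsigned long di;  }; /* kernel-internal names */

/* One macro, two spellings, chosen at compile time like PT_REGS_PARM1. */
#ifdef MODEL_KERNEL_NAMES
typedef struct model_regs_kernel model_regs_t;
#define MODEL_PARM1(x) ((x)->di)
#else
typedef struct model_regs_uapi model_regs_t;
#define MODEL_PARM1(x) ((x)->rdi)
#endif

/* Returns the saved value of the first-argument register. */
static unsigned long model_first_arg(const model_regs_t *regs)
{
    return MODEL_PARM1(regs);
}
```

Building this with -DMODEL_KERNEL_NAMES against a struct that only has rdi would fail in the same way as the "no member named 'di'" error, which is the mismatch that dropping -D__KERNEL__ avoids.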


Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?
kretprobes can only safely access the return value, which you would
get with PT_REGS_RC(ctx). Input arguments are clobbered by the time the
kretprobe fires, so PT_REGS_PARM1(ctx) would return you something, but
most probably not the correct value of the first input argument.

raw_tracepoints are similar to fentry/fexit in that each input
argument is 8 bytes long. See progs/test_vmlinux.c in selftests/bpf for
an example of getting a syscall number on sys_enter. BPF_PROG is a
useful macro for such use cases.


After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've waited to reply, not wanting to clog the mailing list, but I thought it would be beneficial to follow up on the same topic with kprobes in addition to tracepoints. The main issue I had with tracepoints was not understanding the 8-byte alignment in the arguments. Once that was sorted, getting information was actually really simple.

At this point I've moved to kprobes, kretprobes, and raw tracepoints. From what I understand, if not using CO-RE or vmlinux.h, to access data from kprobes or kretprobes you must access the cpu registers in which those values live?
For example, if I'm porting Brendan Gregg's bpftrace tool "elfsnoop" to libbpf, I'd want to trace "load_elf_binary()." load_elf_binary() only has one argument: "struct linux_binprm *bprm." So if I want to read that struct, I'd have to access the register with that argument. I think in bpf_tracing.h that macro would be PT_REGS_PARM1(x). I don't have the greatest understanding of asm and CPU registers, but I believe that would be the %rdi register?
With that in mind, here's my code and build.

#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"
#include <linux/ptrace.h>
#include <linux/types.h>

SEC("kprobe/load_elf_binary")
int trace_entry(struct pt_regs *ctx) {
    char msg[] = "hello world\n"; // for verification that the bpf program is running at all
    bpf_trace_printk(msg, sizeof(msg));

    struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);

    return 0;
}

char _license[] SEC("license") = "GPL";

//// And the build command.
// Target arch and kernel are defined to get the correct macros
// in bpf_tracing.h
$ clang -O3 -Wall -target bpf \
-D__TARGET_ARCH_x86 \
-D__KERNEL__ -c \
elfsnoop.bpf.c \
-I/home/vagrant/libbpf/src/ \
-o elfsnoop.bpf.o

Unfortunately, as Andrii mentioned previously in this topic, I think there are different definitions of pt_regs and my /usr/include/linux/ptrace.h does not have the correct one, as evidenced by the error I get when trying to build.

elfsnoop.bpf.c:89:54: error: no member named 'di' in 'struct pt_regs'
  struct linux_binprm *arg = (struct linux_binprm *) PT_REGS_PARM1(ctx);
                                                     ^~~~~~~~~~~~~~~~~~
/home/vagrant/libbpf/src/bpf_tracing.h:54:32: note: expanded from macro 'PT_REGS_PARM1'
#define PT_REGS_PARM1(x) ((x)->di)

Is this the correct way to access data in kprobes? Most of the information I've found explicitly talking about accessing kprobe data is pretty old (2012-2015). selftests/bpf/ seems to not have examples of accessing kprobe data, and, from my understanding, libbpf-tools is CO-RE dependent which I'm trying to avoid for now just because most default kernels aren't BTF enabled yet (I will definitely be voicing my opinion to distros that this should change since the average user likely isn't keen on recompiling and installing a kernel). I also looked at the brief C Appendix of "BPF Performance Tools" and "Linux Observability with BPF" to try and understand, but I still haven't been able to extract data from the kprobes or raw tracepoints yet.
I think the final question that may (or may not) solve this issue is which pt_regs should be used?

Also, assuming this is the correct way, is this generalizable to raw tracepoints and kretprobes as well?

After I have these things figured out with some working examples, I think I will publish a github repo with a tutorial as discussed with Andrii in a few messages above.
Appreciate any feedback and help.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

adding back mailing list


On Mon, Apr 6, 2020 at 7:58 AM <mayfieldtristan@...> wrote:

Andrii, thanks for the reply!

It's not arbitrary, it's set at 16 in kernel.

ctx->err doesn't exist according to definition above?

Sorry, these were my mistake. I neglected cleaning my code up properly before sending here. I thought I had caught my relic comments and weird experiments, but hadn't.
Really sorry.


I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is a 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on a 32-bit target
architecture, but assuming you are on x86-64, switch all ints to long
and make __mode_t long as well.
Interesting. Here's the tracepoint field order for reference (if nothing else so the information is in one place for people who may read this):

root@ubuntu-focal:~# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 622
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;

I tried matching the struct to the fields listed, but I am on x86_64 so I guess the ints and umode_t should be long.
Notice offsets, they are all (except for first 4 fields which fit in
first 8 bytes) 8-byte aligned. You can do that in your struct
definitions as:

int __syscall_nr __attribute__((aligned(8)));

OR just use long.
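
As a quick way to check either variant, the following user-space sketch (not a BPF program) declares the context struct both ways and compares the offsets with the format file above. The struct names are made up for illustration, syscall_nr stands in for __syscall_nr, and an LP64 target such as x86-64 Linux with GCC or clang is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: keep the natural types, force 8-byte alignment per field. */
struct openat_args_aligned {
    unsigned long long common;                        /* the four common_* fields, 8 bytes */
    int syscall_nr __attribute__((aligned(8)));       /* format offset 8  */
    int dfd __attribute__((aligned(8)));              /* format offset 16 */
    const char *filename;                             /* format offset 24 */
    int flags __attribute__((aligned(8)));            /* format offset 32 */
    unsigned short mode __attribute__((aligned(8)));  /* format offset 40 */
};

/* Variant 2: one long per argument slot (8 bytes each on LP64). */
struct openat_args_long {
    unsigned long long common;
    long syscall_nr;
    long dfd;
    const char *filename;
    long flags;
    long mode;
};

/* Aborts if either variant disagrees with the offsets in the format file. */
static void check_layout(void)
{
    assert(offsetof(struct openat_args_aligned, syscall_nr) == 8);
    assert(offsetof(struct openat_args_aligned, dfd) == 16);
    assert(offsetof(struct openat_args_aligned, filename) == 24);
    assert(offsetof(struct openat_args_aligned, flags) == 32);
    assert(offsetof(struct openat_args_aligned, mode) == 40);

    assert(offsetof(struct openat_args_long, dfd) == 16);
    assert(offsetof(struct openat_args_long, filename) == 24);
    assert(offsetof(struct openat_args_long, mode) == 40);
}
```

Calling check_layout() from a small test program is enough; it returns silently when both declarations line up with the format file.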

The other issue I've been confused about, is __syscall_nr has an offset of 8 and size 4, but dfd has an offset 16 where I'd expect 12.
Does that mean that there's just meaningless data in that area that should be accounted for?
And, if the data are longs, does that mean that the information given in "format" is incorrect?


0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).
Ah, that is really helpful! I think I just took 0 from some code at https://github.com/bpftools/linux-observability-with-bpf
and just hadn't looked into those arguments yet, assuming they were correct!
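
For reference, the way the flags argument selects a slot in the perf event array can be modelled in a few lines of user-space C. The two constants carry the values from the kernel's linux/bpf.h (the MODEL_ prefix is only there to avoid clashing with the real header), and model_output_slot() is a hypothetical helper, not a kernel or libbpf function.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU in linux/bpf.h. */
#define MODEL_BPF_F_INDEX_MASK  0xffffffffULL
#define MODEL_BPF_F_CURRENT_CPU MODEL_BPF_F_INDEX_MASK

/* Which array slot a given flags value selects when the program runs on running_cpu. */
static uint64_t model_output_slot(uint64_t flags, uint32_t running_cpu)
{
    uint64_t index = flags & MODEL_BPF_F_INDEX_MASK;

    /* The special value means "use the slot of the CPU we are running on". */
    if (index == MODEL_BPF_F_CURRENT_CPU)
        index = running_cpu;
    return index;
}
```

With flags set to 0 the slot is always 0, whichever CPU the tracepoint fires on, which matches the observation that data only shows up for CPU #0.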

This is due to the invalid memory layout of struct sys_enter_openat_args;
you are reading the wrong pointer. Sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.
Okay, so fixing the *ctx struct to use longs did, in fact, work! Is there a resource or way that I should have read in order to know that?
I'm actually really excited I can finally read tracepoint data :)
Not sure which part you mean. Field alignment, sizes, and padding
are all part of standard C. As for tracepoints, selftests in the kernel and
various BCC and libbpf examples should be a good starting point.


Since that worked, I'm a little less concerned with the raw tracepoints, but still interested. Here's my modified code for it:

#include "bpf_tracing.h"
#include <linux/bpf.h>
#include "bpf_helpers.h"

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx) {

    volatile struct pt_regs *regs;
    volatile const char *pathname;
    regs = (struct pt_regs *)ctx->args[0];
    pathname = PT_REGS_PARM2_CORE(regs); // instead of (const char *)regs->si;

    char msg[] = "Path: %d\n";
    bpf_trace_printk(msg, sizeof(msg), pathname);

    return 0;
}
char _license[] SEC("license") = "GPL";

With this, I get a compiler error warning that "implicit declaration of function 'PT_REGS_PARM2_CORE' is invalid in C99"
which indicates to me that the defined guards in bpf_tracing.h are keeping me from accessing the macro.
I looked over the bpf_tracing.h file to see if it was an easy error, but it hasn't been obvious to me yet.
I'll keep fiddling with it, and look at selftests, and see if I can get it working.
You can use libbpf-tools/Makefile for inspiration on how to do this:
https://github.com/iovisor/bcc/blob/master/libbpf-tools/Makefile

You might need to define __TARGET_ARCH_x86 and __KERNEL__ explicitly
otherwise. It's easier with vmlinux.h, though.



Finally, I definitely am interested in starting up a tutorial. Right now I can load, attach, and unload BPF programs. Use perf buffers. I'm sure I could use other maps types as they're pretty simple, just haven't dabbled in them yet. I can also read data from tracepoints ;)
I'm going to start on kprobes this week, and hopefully that will be a little more straightforward after doing the work on tracepoints.
That's about what I could start a tutorial with right now. I'll maybe start one this week with some basic "hello world" type stuff, but I'm nervous to get too deep into technical details if the community isn't willing to at least look over it and make sure I'm not steering information the wrong direction. From the sound of it, that's not a huge worry, but a concern of mine nonetheless. There's a lot of deprecated information about BPF out there, and I don't want to make another deprecated resource.
BPF is still rapidly evolving, so yeah, that's a concern. It
definitely requires dedication and time to maintain good up-to-date
documentation. No way around that, unfortunately.


Cheers again for helping me debug my tracepoint code! I'm excited it's working!
Sure, you are welcome.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Apr 1, 2020 at 12:52 PM <mayfieldtristan@...> wrote:

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.

Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
Just answered on another Github issue
(https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793),
please check it there as well. Short answer: no. Unless you can pretty
much guarantee that it will be exactly the same **binary** compiled
version of the kernel (not just same version).

The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.

#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
These should come from kernel-devel packages.

#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is a deprecated form of declaring maps, please see kernel
selftests for better examples.



// Struct to pass data via perf buffer
struct data_t {
u32 pid;
u32 tgid;
char program_name[16]; // max comm length is arbitrary
It's not arbitrary, it's set at 16 in kernel.

char file[255];
};

struct sys_enter_openat_args {
// struct fields obtained from tplist.py output
long long pad;
int __syscall_nr;
int dfd;
const char * filename;
int flags;
__mode_t mode; // used __mode_t instead of umode_t
};
I haven't checked the order of fields, but each field has to be long
in size (so 8 bytes on 64-bit arch). BPF is a 64-bit arch, so long is
64-bit there. I'm not sure how this plays out on a 32-bit target
architecture, but assuming you are on x86-64, switch all ints to long
and make __mode_t long as well.


SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

// debugging
char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), ctx->err);
ctx->err doesn't exist according to definition above?


bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
0 is not right here, use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only on CPU #0 (if you get tracepoint triggered on
that CPU).


return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
_version is not necessary with modern libbpf and kernel.


With the above code, err = -14 and ctx->filename = -100.
This is due to the invalid memory layout of struct sys_enter_openat_args;
you are reading the wrong pointer. Sometimes filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time for sure.
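
The mismatch can be made visible with a user-space sketch that asks the compiler where filename ends up in the struct as posted. The tracepoint format file quoted elsewhere in this thread lists dfd at offset 16 and filename at offset 24. The struct name and the FMT_ constants below are hypothetical, syscall_nr stands in for __syscall_nr, unsigned int stands in for __mode_t, and an LP64 target is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* The context struct as posted, with natural int packing. */
struct posted_openat_args {
    long long pad;
    int syscall_nr;
    int dfd;
    const char *filename;
    int flags;
    unsigned int mode;
};

/* Offsets taken from the sys_enter_openat format file. */
enum { FMT_DFD_OFFSET = 16, FMT_FILENAME_OFFSET = 24 };

/* Returns 1 if the posted struct reads "filename" from the slot that really holds dfd. */
static int filename_reads_dfd_slot(void)
{
    return offsetof(struct posted_openat_args, filename) == FMT_DFD_OFFSET;
}
```

If that is what happens, the value -100 seen in ctx->filename would be consistent with dfd being AT_FDCWD, which is defined as -100 on Linux.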


I took a look at an article written by Gianluca Borello (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for Sysdig's approach, and thought that using a raw tracepoint would be easier to get the filename arg than the above approach. I tried it out, but couldn't get it to compile.
Here's the new function:

SEC("raw_tracepoint/sys_enter")
int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
unsigned long syscall_id = ctx->args[1];
volatile struct pt_regs *regs;
volatile const char *pathname;

regs = (struct pt_regs *)ctx->args[0];
pathname = (const char *)regs->si;
better include bpf_tracing.h header from libbpf and use
PT_REGS_PARM2_CORE(regs) instead of directly referencing fields of
pt_regs.


struct data_t data = {};

data.pid = bpf_get_current_pid_tgid() >> 32;
data.tgid = bpf_get_current_pid_tgid();
bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

char msg[] = "Probe read results: %d\n";
bpf_trace_printk(msg, sizeof(msg), syscall_id);

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

With this code I get a compilation error:
file_open_kern.c:77:34: error: no member named 'si' in 'struct pt_regs'
pathname = (const char *)regs->si;
~~~~ ^
This error is strange to me because ptrace.h does list %si as a valid field. Perhaps I'm using the wrong header. Hopefully this is enough information to be clear.
This is due to different definitions of struct pt_regs in user-space
and kernel-space. Using libbpf's bpf_tracing.h header and PT_REGS
macros should eliminate a lot of those. Sticking to vmlinux.h also
helps, but requires BPF CO-RE.

If CO-RE compiled programs can run on non-BTF supported kernels, then I would be more than happy to shift to that approach. Otherwise, it's nice to have non-BTF reliant code.
No, unfortunately, it can't.

As a final note, I was working through some examples for XDP in https://github.com/xdp-project/xdp-tutorial and was thinking that something similar would be helpful for general BPF programming. The API may be too volatile at this point, but if people who have the technical expertise are interested, I'm willing to donate some of my own time to help build something similar. BCC's libbpf-tools has been extremely helpful, but it seems that there's not any resources (I've found) that are as in-depth and cohesive as the tutorial linked above. Again, I don't know if it's completely appropriate at this stage of development, but I know there's a lot of interest out there in using BPF at a more granular level and with less overhead than what is offered with BCC.
I agree that such a tutorial is sorely missing. libbpf-tools and kernel
selftests (not so much samples/bpf, though) are probably the best way
to see usage of all the newer features. It would be awesome for
someone to prepare an approachable and comprehensive set of tutorials,
of course. Please do give it a try and the community will certainly help
you with answering questions you have!


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've spent a few days trying to solve this issue I've had, and I've learned a lot about both the past BPF APIs, and the new CO-RE API. I do have a couple questions though.
  • Once a CO-RE program is compiled and tested with the verifier, can it be run on a kernel of the same version that isn't compiled with BTF?
  • The CO-RE API is very nice, but in case that ends up only being able to run on kernels with BTF support enabled, I've been trying to solve the original issue found in this topic without the CO-RE approach. I'm still not able to read the arguments from a given tracepoint. I'll put my code below. I'm sure there are still plenty of issues and appreciate any time given to nudge me in the right direction.
#include <linux/bpf.h>
#include "bpf_helpers.h"

// To get kernel datatypes. Haven't figured out how to do this
// without cloning the kernel source tree yet.
#include "/kernel-src/tools/include/linux/types.h"
#include <linux/version.h>
#include <asm/ptrace.h>
#include <unistd.h>
#define MAX_CPUS 4

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

// Struct to pass data via perf buffer
struct data_t {
    u32 pid;
    u32 tgid;
    char program_name[16]; // max comm length is arbitrary
    char file[255];
};

struct sys_enter_openat_args {
    // struct fields obtained from tplist.py output
    long long pad;
    int __syscall_nr;
    int dfd;
    const char * filename;
    int flags;
    __mode_t mode;  // used __mode_t instead of umode_t
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
  struct data_t data = {};

  data.pid = bpf_get_current_pid_tgid() >> 32;
  data.tgid = bpf_get_current_pid_tgid();
  bpf_get_current_comm(&data.program_name, sizeof(data.program_name));

  int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

  // debugging
  char msg[] = "Probe read results: %d\n";
  bpf_trace_printk(msg, sizeof(msg), ctx->err);

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
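
A side note on the two ID fields in the program above: the helper documentation describes the return value of bpf_get_current_pid_tgid() as tgid << 32 | pid, so the upper half is the thread group ID (the process ID as user space sees it) and the lower half is the kernel's per-thread pid. In the posted code the field named pid receives the upper half and tgid the lower half, which looks swapped relative to the kernel's naming. The following user-space model only demonstrates the packing; the model_ helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the two IDs the way the helper documentation describes. */
static uint64_t model_pack(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: thread group ID (user-space "PID"). */
static uint32_t model_tgid(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid (user-space thread ID). */
static uint32_t model_pid(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```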

With the above code, err = -14 and ctx->filename = -100.
I took a look at an article written by


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 11:39 AM <mayfieldtristan@...> wrote:

Take a closer look. libbpf-tools do not use bpf_load.h; that one is
deprecated and its use is discouraged. libbpf-tools rely on a
code-generated BPF skeleton. But really, take a close look at
libbpf-tools, it has everything you need to get started.


Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.
Yes. A lot of newer functionality relies on kernel BTF as well. But to
compile a portable BPF program you also need kernel BTF (for BPF CO-RE
stuff).


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

Take a closer look. libbpf-tools do not use bpf_load.h; that one is
deprecated and its use is discouraged. libbpf-tools rely on a
code-generated BPF skeleton. But really, take a close look at
libbpf-tools, it has everything you need to get started.

Will do. Does this mean that, going forward, BPF development will be encouraged to use kernels compiled with "CONFIG_DEBUG_INFO_BTF=y"? I've been using a default build up to now.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Wed, Mar 25, 2020 at 6:45 AM <mayfieldtristan@...> wrote:

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

I found out that the kernel tree I cloned from the Ubuntu repo for Bionic (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") was the issue. For some reason it doesn't have an up-to-date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Take a closer look. libbpf-tools do not use bpf_load.h; that one is
deprecated and its use is discouraged. libbpf-tools rely on a
code-generated BPF skeleton. But really, take a close look at
libbpf-tools, it has everything you need to get started.


Please keep this discussion on mailing list, though, it might benefit
someone else.

Agreed; I accidentally sent my last message to just you.
Thanks again for the help.


Re: Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.
I found out that the kernel tree I cloned from the Ubuntu repo for Bionic (i.e. "git clone --depth 1 git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git") was the issue. For some reason it doesn't have an up-to-date libbpf library and so doesn't have bpf_probe_read_str(). I think going forward, getting the API from the repo you recommended or from the official kernel source is the way to go.

I appreciate the pointers for my BPF program. If using github.com/libbpf/libbpf, should I just plan on loading and attaching programs manually instead of using bpf_load.h? I've been looking through the bcc/libbpf-tools/ directory and it looks like they're making use of bpf_load.h and BTF/CO-RE. I've tried using bpf_load.h/c with the standalone libbpf, but I've gotten some difficult linking issues I haven't been able to resolve.
Please keep this discussion on mailing list, though, it might benefit
someone else.
Agreed; I accidentally sent my last message to just you.
Thanks again for the help.


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

Adding back mailing list.

On Mon, Mar 23, 2020 at 12:33 PM <mayfieldtristan@...> wrote:

Thanks for the reply. All of your suggestions make sense. Because I'm targeting kernel 4.15 for this specific bit of code, I can't use bpf_probe_read_str(), I think? The way I'm developing is that I cloned the kernel source for 4.15 to a certain depth, and then compiled libbpf. Should I use the standalone libbpf repo instead? I've tried that but have struggled to get samples/bpf/ to compile taking that approach.
bpf_probe_read_str() has been there for a long time, at least 4.12 or
even older.

samples/bpf are part of the kernel, so yes, they are using libbpf from
kernel sources. For a stand-alone application I'd go with
github.com/libbpf/libbpf


Regardless, I took your advice, and my code now looks like:

struct data_t {
u32 pid;
char program_name[256]; // max comm length is arbitrary
comm is 16 and unlikely to ever change. No need to waste 256 bytes here.

char *file;
};

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};



struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
data has to be initialized here:

struct data_t data = {};

struct sys_enter_openat_args args;

int res = bpf_probe_read(&args, sizeof(args), ctx); // read the ctx into bpf space
if(!res) {
data.file = "couldn't get file";
} else {
data.file = args.filename;
}

data.pid = bpf_get_current_pid_tgid();
bpf_get_current_comm(data.program_name, sizeof(data.program_name));

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

---
With a new error code

bpf_load_program() err=13
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -304
3: (b7) r2 = 32
4: (bf) r3 = r6
5: (85) call bpf_probe_read#4
6: (67) r0 <<= 32
7: (77) r0 >>= 32
8: (55) if r0 != 0x0 goto pc+3
R0=inv0 R6=ctx(id=0,off=0,imm=0) R10=fp0
9: (18) r1 = 0xffff88ac7c953000
11: (05) goto pc+1
13: (7b) *(u64 *)(r10 -8) = r1
14: (85) call bpf_get_current_pid_tgid#14
15: (63) *(u32 *)(r10 -272) = r0
16: (bf) r1 = r10
17: (07) r1 += -268
18: (b7) r2 = 256
19: (85) call bpf_get_current_comm#16
20: (bf) r4 = r10
21: (07) r4 += -272
22: (bf) r1 = r6
23: (18) r2 = 0xffff88ac7c953000
25: (b7) r3 = 0
26: (b7) r5 = 272
27: (85) call bpf_perf_event_output#25
invalid indirect read from stack off -272+260 size 272
The kernel didn't load the BPF program
---

Thanks for the pointer about the libbpf-tools in the BCC repo! I had seen it before but for some reason didn't make any kind of note about it. It's extremely helpful.

I will change the map declaration, just trying to understand getting data from the tracepoint right now.
data is not completely initialized, see above.

Please keep this discussion on the mailing list, though, it might benefit
someone else.
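
The offset in "invalid indirect read from stack off -272+260 size 272" can be reproduced in user space. The following is a minimal sketch, not BPF code: it rebuilds data_t with the same field order (uint32_t stands in for u32) and assumes a 64-bit target where pointers are 8 bytes and 8-byte aligned. The helper names padding_start, padding_len and print_layout are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same field order as data_t in the post; uint32_t stands in for u32. */
struct data_t {
    uint32_t pid;
    char program_name[256];
    char *file;
};

/* Offset of the first byte after program_name, i.e. where padding begins. */
size_t padding_start(void) {
    return offsetof(struct data_t, program_name) +
           sizeof(((struct data_t *)0)->program_name);
}

/* Number of padding bytes the compiler inserts before the pointer field. */
size_t padding_len(void) {
    return offsetof(struct data_t, file) - padding_start();
}

/* Print the layout so it can be compared with the verifier message. */
void print_layout(void) {
    printf("pid          at offset %zu\n", offsetof(struct data_t, pid));
    printf("program_name at offset %zu\n", offsetof(struct data_t, program_name));
    printf("padding      at offset %zu, %zu bytes\n", padding_start(), padding_len());
    printf("file         at offset %zu\n", offsetof(struct data_t, file));
    printf("sizeof(struct data_t) = %zu\n", sizeof(struct data_t));
}
```

Under those assumptions the padding starts at offset 260 and the struct is 272 bytes, which matches the numbers in the verifier output: the program stores pid, the comm buffer and the pointer, but never writes the four padding bytes, so bpf_perf_event_output() would read uninitialized stack. Zero-initializing the whole struct, as suggested above, is meant to cover those bytes too.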


Re: Array brace-enclosed initialization

Yonghong Song
 

On Mon, Mar 23, 2020 at 10:05 AM Federico Parola <fede.parola@...> wrote:

Hello everybody,
in my XDP eBPF program I'm trying to initialize an array with a brace-enclosed list, however my code is rejected by the verifier.
Here is a simple piece of code to replicate the problem:

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME) \
__attribute__((section(NAME), used))
#endif

#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...) \
(*NAME)(__VA_ARGS__) = (void *)BPF_FUNC_##NAME
#endif

#ifndef printk
# define printk(fmt, ...) \
({ \
char ____fmt[] = fmt; \
trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
})
#endif

static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);

__section("prog")
int xdp_prog(struct xdp_md *ctx) {
int i;
int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
The init list is too long. The compiler puts {0, 1, 2, ..., 9} in a readonly
section. That is why you got the failure below.

If you use native clang compilation and a recent kernel, libbpf
should be able to automatically create a map for you so your
code will work.


#pragma nounroll
for (i = 0; i < 10; i++) {
printk("%d", array[i]);
}

return XDP_PASS;
}

char __license[] __section("license") = "GPL";

This is the error reported by the verifier:

0: (b7) r6 = 0
1: (b7) r7 = 25637
2: (b7) r8 = 0
3: (73) *(u8 *)(r10 -2) = r6
last_idx 3 first_idx 0
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 25637
regs=40 stack=0 before 0: (b7) r6 = 0
4: (6b) *(u16 *)(r10 -4) = r7
5: (18) r1 = 0x0
7: (0f) r1 += r8
8: (61) r3 = *(u32 *)(r1 +0)
R1 invalid mem access 'inv'
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

I tried compiling with clang-6 and clang-9 with optimization set to O1 and O2, but I get this error in all cases.
If I initialize the array in another way (e.g. with a loop) the program works correctly.
The eBPF bytecode generated by clang is the following:

.text
.file "test.c"
.section prog,"ax",@progbits
.globl xdp_prog # -- Begin function xdp_prog
.p2align 3
.type xdp_prog,@function
xdp_prog: # @xdp_prog
# %bb.0:
r6 = 0
r7 = 25637
r8 = 0
LBB0_1: # =>This Inner Loop Header: Depth=1
*(u8 *)(r10 - 2) = r6
*(u16 *)(r10 - 4) = r7
r1 = .L__const.xdp_prog.array ll
r1 += r8
r3 = *(u32 *)(r1 + 0)
r1 = r10
r1 += -4
r2 = 3
call 6
r8 += 4
if r8 != 24 goto LBB0_1
# %bb.2:
r0 = 2
exit
.Lfunc_end0:
.size xdp_prog, .Lfunc_end0-xdp_prog
# -- End function
.type .L__const.xdp_prog.array,@object # @__const.xdp_prog.array
.section .rodata,"a",@progbits
.p2align 2
.L__const.xdp_prog.array:
.long 0 # 0x0
.long 1 # 0x1
.long 2 # 0x2
.long 3 # 0x3
.long 4 # 0x4
.long 5 # 0x5
.long 6 # 0x6
.long 7 # 0x7
.long 8 # 0x8
.long 9 # 0x9
.size .L__const.xdp_prog.array, 40

.type .L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
.section .rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
.asciz "%d"
.size .L__const.xdp_prog.____fmt, 3

.type __license,@object # @__license
.section license,"aw",@progbits
.globl __license
__license:
.asciz "GPL"
.size __license, 4


.addrsig
.addrsig_sym xdp_prog
.addrsig_sym __license

It seems like the array in the stack is not initialized in the code. With some declarations the code works, for example declaring the array in the following way:
int array[10] = {0, 1, 2, 3};
everything works, the generated bytecode is the following:

.text
.file "test.c"
.section prog,"ax",@progbits
.globl xdp_prog # -- Begin function xdp_prog
.p2align 3
.type xdp_prog,@function
xdp_prog: # @xdp_prog
# %bb.0:
r1 = 2
*(u32 *)(r10 - 36) = r1
r6 = 0
*(u32 *)(r10 - 8) = r6
*(u32 *)(r10 - 12) = r6
*(u32 *)(r10 - 16) = r6
*(u32 *)(r10 - 20) = r6
*(u32 *)(r10 - 24) = r6
*(u32 *)(r10 - 28) = r6
*(u32 *)(r10 - 4) = r6
r1 = 3
*(u32 *)(r10 - 32) = r1
r1 = 1
*(u32 *)(r10 - 40) = r1
*(u8 *)(r10 - 42) = r6
r7 = 25637
*(u16 *)(r10 - 44) = r7
r1 = r10
r1 += -44
r2 = 3
r3 = 1
call 6
r8 = 4
LBB0_1: # =>This Inner Loop Header: Depth=1
r1 = r10
r1 += -40
r1 += r8
r3 = *(u32 *)(r1 + 0)
*(u8 *)(r10 - 42) = r6
*(u16 *)(r10 - 44) = r7
r1 = r10
r1 += -44
r2 = 3
call 6
r8 += 4
if r8 != 24 goto LBB0_1
# %bb.2:
r0 = 2
exit
.Lfunc_end0:
.size xdp_prog, .Lfunc_end0-xdp_prog
# -- End function
.type .L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
.section .rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
.asciz "%d"
.size .L__const.xdp_prog.____fmt, 3

.type __license,@object # @__license
.section license,"aw",@progbits
.globl __license
__license:
.asciz "GPL"
.size __license, 4


.addrsig
.addrsig_sym xdp_prog
.addrsig_sym __license

This time the array is correctly initialized into code.
Is this a clang bug?
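
The workaround mentioned in the post, filling the array with a loop instead of a brace-enclosed list, can be sketched as below. This is plain user-space C meant only to show the shape of the change; fill_sequence and sum_array are hypothetical helper names, and whether a given clang version and optimization level keeps the stores on the stack rather than moving the constants into .rodata is an assumption that has to be checked against the generated bytecode.

```c
#include <assert.h>

#define ARRAY_LEN 10

/* Fill the array with 0..len-1 at run time instead of using an
 * initializer list that the compiler may place in a read-only section. */
void fill_sequence(int *array, int len) {
    for (int i = 0; i < len; i++)
        array[i] = i;
}

/* Consume the array; this stands in for the printk() loop in the post. */
int sum_array(const int *array, int len) {
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += array[i];
    return sum;
}
```

In the XDP program the same loop would run over the local array before the printing loop, so every element the program reads has been written on the BPF stack first.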


Re: Extracting data from tracepoints (and anything else)

Andrii Nakryiko
 

On Mon, Mar 23, 2020 at 9:38 AM <mayfieldtristan@...> wrote:

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = MAX_CPUS,
};
nit: this is a legacy syntax of specifying BPF maps, please see [0]
for some newer examples

[0] https://github.com/iovisor/bcc/tree/master/libbpf-tools


struct sys_enter_openat_args {
u16 common_type;
u8 common_flags;
u8 common_preempt_count;
int common_pid;
int __syscall_nr;
int dfd;
char *filename;
int flags;
__mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
struct data_t data;
struct sys_enter_openat_args *args;

int res = bpf_probe_read(args, sizeof(ctx), ctx);
you don't need to bpf_probe_read() ctx here, you can just access its
members directly.

if(!res) {
data.file = "couldn't get file";
} else {
data.file = args->filename;
But here if you want to read filename contents itself, you'll need to
use bpf_probe_read_str().

Having data_t definition would be also helpful.

}

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
This error from the verifier is quite misleading. What the verifier
complains about here is that you pass an uninitialized pointer (args)
as the first parameter to bpf_probe_read(). But see above, you don't
need to bpf_probe_read() anything, and even if you wanted to, it would
have to be done very differently:

struct sys_enter_openat_args args; /* notice no pointer here */
bpf_probe_read(&args, sizeof(args), ctx); /* taking address of args,
taking size of args, not its pointer */

The kernel didn't load the BPF program

data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provide an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?
It's a mix. If your data should/can be pre-aggregated in kernel, using
map might benefit you in that you will be sending much less data to
user-space. But if you want to send every piece of information then
perf_buffer is faster and more convenient than having user-space query
BPF maps all the time.


Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.
You're welcome. Check libbpf-tools in BCC repo, it should give you
some examples to work off of.


Tristan
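
The two points from the reply (read ctx fields directly, and copy the filename bytes rather than storing the pointer) can be sketched in user space as follows. This is not a loadable BPF program: read_str_stub is a stand-in written for this example that imitates the documented behaviour of bpf_probe_read_str() (copy at most size-1 bytes, always NUL-terminate, return the length including the NUL), and struct event, FILE_LEN and fill_event are hypothetical names. The tracepoint struct is reduced to the fields used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COMM_LEN 16  /* comm length, as noted in the reply */
#define FILE_LEN 64  /* hypothetical cap on the copied filename */

/* Event sent to user space. The filename is stored as bytes, not as a
 * pointer, because user space cannot follow a pointer captured in the kernel. */
struct event {
    uint32_t pid;
    char comm[COMM_LEN];
    char file[FILE_LEN];
};

/* Reduced stand-in for the tracepoint context used in the post. */
struct sys_enter_openat_args {
    int dfd;
    const char *filename;
    int flags;
};

/* Stand-in for bpf_probe_read_str(): bounded copy, always NUL-terminated,
 * returns the number of bytes written including the NUL, or -1 on error. */
long read_str_stub(void *dst, uint32_t size, const char *src) {
    char *d = dst;
    uint32_t i = 0;
    if (size == 0 || src == NULL)
        return -1;
    while (i + 1 < size && src[i] != '\0') {
        d[i] = src[i];
        i++;
    }
    d[i] = '\0';
    return (long)i + 1;
}

/* Mirrors the handler body: zero the event, read ctx members directly,
 * and copy the string contents into the event. */
int fill_event(struct event *ev, const struct sys_enter_openat_args *ctx,
               uint64_t pid_tgid, const char *comm) {
    memset(ev, 0, sizeof(*ev));            /* no uninitialized bytes left */
    ev->pid = (uint32_t)(pid_tgid >> 32);  /* upper half is the process id (tgid) */
    read_str_stub(ev->comm, sizeof(ev->comm), comm);
    if (read_str_stub(ev->file, sizeof(ev->file), ctx->filename) < 0)
        return -1;
    return 0;
}
```

In a real program the stub would be replaced by the kernel helper and the event would be passed to bpf_perf_event_output() with sizeof(struct event); the fixed-size buffers are what make that size meaningful to the reader on the other side.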


Array brace-enclosed initialization

Federico Parola <fede.parola@...>
 

Hello everybody,
in my XDP eBPF program I'm trying to initialize an array with a brace-enclosed list, however my code is rejected by the verifier.
Here is a simple piece of code to replicate the problem:

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME)                  \
   __attribute__((section(NAME), used))
#endif

#ifndef BPF_FUNC
# define BPF_FUNC(NAME, ...)              \
   (*NAME)(__VA_ARGS__) = (void *)BPF_FUNC_##NAME
#endif

#ifndef printk
# define printk(fmt, ...)                                      \
    ({                                                         \
        char ____fmt[] = fmt;                                  \
        trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
    })
#endif

static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);

__section("prog")
int xdp_prog(struct xdp_md *ctx) {
  int i;
  int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

#pragma nounroll
  for (i = 0; i < 10; i++) {
    printk("%d", array[i]);
  }

  return XDP_PASS;
}

char __license[] __section("license") = "GPL";
This is the error reported by the verifier:
0: (b7) r6 = 0
1: (b7) r7 = 25637
2: (b7) r8 = 0
3: (73) *(u8 *)(r10 -2) = r6
last_idx 3 first_idx 0
regs=40 stack=0 before 2: (b7) r8 = 0
regs=40 stack=0 before 1: (b7) r7 = 25637
regs=40 stack=0 before 0: (b7) r6 = 0
4: (6b) *(u16 *)(r10 -4) = r7
5: (18) r1 = 0x0
7: (0f) r1 += r8
8: (61) r3 = *(u32 *)(r1 +0)
R1 invalid mem access 'inv'
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
I tried compiling with clang-6 and clang-9 with optimization set to O1 and O2, but I get this error in all cases.
If I initialize the array in another way (e.g. with a loop) the program works correctly.
The eBPF bytecode generated by clang is the following:
	.text
	.file	"test.c"
	.section	prog,"ax",@progbits
	.globl	xdp_prog                # -- Begin function xdp_prog
	.p2align	3
	.type	xdp_prog,@function
xdp_prog:                               # @xdp_prog
# %bb.0:
	r6 = 0
	r7 = 25637
	r8 = 0
LBB0_1:                                 # =>This Inner Loop Header: Depth=1
	*(u8 *)(r10 - 2) = r6
	*(u16 *)(r10 - 4) = r7
	r1 = .L__const.xdp_prog.array ll
	r1 += r8
	r3 = *(u32 *)(r1 + 0)
	r1 = r10
	r1 += -4
	r2 = 3
	call 6
	r8 += 4
	if r8 != 24 goto LBB0_1
# %bb.2:
	r0 = 2
	exit
.Lfunc_end0:
	.size	xdp_prog, .Lfunc_end0-xdp_prog
                                        # -- End function
	.type	.L__const.xdp_prog.array,@object # @__const.xdp_prog.array
	.section	.rodata,"a",@progbits
	.p2align	2
.L__const.xdp_prog.array:
	.long	0                       # 0x0
	.long	1                       # 0x1
	.long	2                       # 0x2
	.long	3                       # 0x3
	.long	4                       # 0x4
	.long	5                       # 0x5
	.long	6                       # 0x6
	.long	7                       # 0x7
	.long	8                       # 0x8
	.long	9                       # 0x9
	.size	.L__const.xdp_prog.array, 40

	.type	.L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
	.section	.rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
	.asciz	"%d"
	.size	.L__const.xdp_prog.____fmt, 3

	.type	__license,@object       # @__license
	.section	license,"aw",@progbits
	.globl	__license
__license:
	.asciz	"GPL"
	.size	__license, 4


	.addrsig
	.addrsig_sym xdp_prog
	.addrsig_sym __license
It seems like the array in the stack is not initialized in the code. With some declarations the code works, for example declaring the array in the following way:
int array[10] = {0, 1, 2, 3};
everything works, the generated bytecode is the following:
	.text
	.file	"test.c"
	.section	prog,"ax",@progbits
	.globl	xdp_prog                # -- Begin function xdp_prog
	.p2align	3
	.type	xdp_prog,@function
xdp_prog:                               # @xdp_prog
# %bb.0:
	r1 = 2
	*(u32 *)(r10 - 36) = r1
	r6 = 0
	*(u32 *)(r10 - 8) = r6
	*(u32 *)(r10 - 12) = r6
	*(u32 *)(r10 - 16) = r6
	*(u32 *)(r10 - 20) = r6
	*(u32 *)(r10 - 24) = r6
	*(u32 *)(r10 - 28) = r6
	*(u32 *)(r10 - 4) = r6
	r1 = 3
	*(u32 *)(r10 - 32) = r1
	r1 = 1
	*(u32 *)(r10 - 40) = r1
	*(u8 *)(r10 - 42) = r6
	r7 = 25637
	*(u16 *)(r10 - 44) = r7
	r1 = r10
	r1 += -44
	r2 = 3
	r3 = 1
	call 6
	r8 = 4
LBB0_1:                                 # =>This Inner Loop Header: Depth=1
	r1 = r10
	r1 += -40
	r1 += r8
	r3 = *(u32 *)(r1 + 0)
	*(u8 *)(r10 - 42) = r6
	*(u16 *)(r10 - 44) = r7
	r1 = r10
	r1 += -44
	r2 = 3
	call 6
	r8 += 4
	if r8 != 24 goto LBB0_1
# %bb.2:
	r0 = 2
	exit
.Lfunc_end0:
	.size	xdp_prog, .Lfunc_end0-xdp_prog
                                        # -- End function
	.type	.L__const.xdp_prog.____fmt,@object # @__const.xdp_prog.____fmt
	.section	.rodata.str1.1,"aMS",@progbits,1
.L__const.xdp_prog.____fmt:
	.asciz	"%d"
	.size	.L__const.xdp_prog.____fmt, 3

	.type	__license,@object       # @__license
	.section	license,"aw",@progbits
	.globl	__license
__license:
	.asciz	"GPL"
	.size	__license, 4


	.addrsig
	.addrsig_sym xdp_prog
	.addrsig_sym __license
This time the array is correctly initialized into code.
Is this a clang bug?
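
A note for reading the listings above: the constant 25637 is not array data. It is the printk format string "%d" packed into a 16-bit immediate, which the program stores at r10-4 and follows with a zero byte at r10-2, giving the 3-byte buffer passed with r2 = 3. So the format string is built on the stack from immediates, while in the failing version the ten array elements are addressed through .L__const.xdp_prog.array in .rodata, which shows up in the verifier log as r1 = 0x0. The small check below confirms the packing; pack_le16 is a helper written for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two characters the way a little-endian 16-bit store lays them out
 * in memory: the first character ends up in the low byte. */
uint16_t pack_le16(char first, char second) {
    return (uint16_t)((uint8_t)first | ((uint16_t)(uint8_t)second << 8));
}
```

With '%' as 0x25 and 'd' as 0x64 the packed value is 0x6425, which is 25637 in decimal.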


Extracting data from tracepoints (and anything else)

Tristan Mayfield
 

I've been exploring the libbpf library for different versions of the Linux kernel, and trying to rewrite some of the BCC tools. I would like to do more work with CO-RE eventually, but I'm trying to understand the entire model of how BPF programs work and how data flows between the kernel, the VM, and userspace. I just started using perf buffers instead of bpf_trace_printk and came across an issue that has me scratching my head. In the below code, I'm not able to access the const char * arg in the tracepoint sys_enter_openat (kernel 4.15). For some reason the verifier rejects this code. I think it's valid C (although I'm a little bit rusty still) and I think I followed the correct flow where data must be copied from the kernel to the VM before being able to use.

If anyone has insight to share, I'd much appreciate it. Conversely, if anyone can point me in the direction of how to debug BPF programs that would be extremely helpful too. Should I just dig into learning the basics of BPF asm?

Highlights of the code:

struct bpf_map_def SEC("maps") events = {
  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
  .key_size = sizeof(int),
  .value_size = sizeof(u32),
  .max_entries = MAX_CPUS,
};

struct sys_enter_openat_args {
        u16 common_type;
        u8 common_flags;
        u8 common_preempt_count;
        int common_pid;
        int __syscall_nr;
        int dfd;
        char *filename;
        int flags;
        __mode_t mode;
};

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx) {
  struct data_t data;
  struct sys_enter_openat_args *args;

  int res = bpf_probe_read(args, sizeof(ctx), ctx);
  if(!res) {
         data.file = "couldn't get file";
  } else {
         data.file = args->filename;
  }

Error Message:

bpf_load_program() err=13
0: (bf) r6 = r1
1: (b7) r2 = 8
2: (bf) r3 = r6
3: (85) call bpf_probe_read#4
R1 type=ctx expected=fp
The kernel didn't load the BPF program

  data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get pid_tgid
  bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts current comm into char array

  bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

  return 0;
}

If more code would be helpful, I'm happy to share.

I recognize that libbpf and CO-RE in later kernels provide an easier API for dealing with char * (bpf_probe_read_str() I believe) but I'm trying to understand what needs to be done to target different kernels and not just the most cutting edge.

As a second question, how much should I learn about perf(1) and its overlap with BPF?

Finally, for long-term monitoring solutions and passing readable data, do most programs rely on pinning maps to the vfs instead of using perf buffers or passing directly to a userspace process?

Thanks for the patience and goodwill with a new systems dev. I've enjoyed my interactions with the BPF community.

Tristan


Study on annotation of design and implementation choices, and of technical debt

a.serebrenik@...
 

Dear all,

As software engineering research teams at the University of Sannio (Italy) and Eindhoven University of Technology (The Netherlands), we are investigating how developers annotate implementation and design choices during their normal development activities. More specifically, we are looking at whether, where, and what kind of annotations developers use, focusing on annotations that flag code that is not in the right shape (e.g., comments marking delayed or intended work, such as TODO, FIXME, hack, workaround). For those, we are looking at the content of the annotations and at how developers behave when evolving code that has previously been annotated.

When answering the survey, if your annotation practices differ across the open source projects you contribute to, please answer for the project through which you were contacted.

Filling out the survey will take about 5 minutes.

Please note that your identity and personal data will not be disclosed, while we plan to use the aggregated results and anonymized responses as part of a scientific publication. 

If you have any questions about the questionnaire or our research, please do not hesitate to contact us.

You can find the survey link here:


Thanks and regards,

Gianmarco Fucci (gianmarcofucci94@...)
Fiorella Zampetti (fzampetti@...)
Alexander Serebrenik (a.serebrenik@...)
Massimiliano Di Penta (dipenta@...)

--


Re: is BCC tools safe to enable root privilegies in production?

Cristian Spinetta
 

Thanks for your fast reply!

In our infrastructure, the owners of an app can log into the production VMs that run their apps and execute a restricted list of commands with sudo (e.g. tcpdump, netstat, ...). The idea is to give root access to each bcc tool script (everything under /usr/share/bcc/tools/*). We are concerned about whether some bcc scripts can run another command, as in the example above, or whether there are other security concerns to be aware of.

Best,
Cristian Spinetta


On Fri, Mar 13, 2020 at 1:23 PM Brendan Gregg <brendan.d.gregg@...> wrote:
On Fri, Mar 13, 2020 at 7:59 AM Cristian Spinetta <cebspinetta@...> wrote:
>
> Hi all!
>
> I am curious about whether it is safe to enable root access to BCC scripts on production machines.
> In my company, each team has access to their instances via ssh, and we are thinking of allowing them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? For example, is there some tool that can receive a command to execute? In that case it would be unsafe, because someone could execute any command through a bcc tool.
>
> e.g.:
> sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc

^^^^

sudo isn't safe. If you remove the BCC tool from this one-liner,
you'll find it still destroys your disk.

In practice the production concern I have is for the overhead of each
tool, hence the overhead section in each tool's man page.

Brendan

>
> Best,
> Cristian Spinetta
>




Re: is BCC tools safe to enable root privilegies in production?

Brendan Gregg
 

On Fri, Mar 13, 2020 at 7:59 AM Cristian Spinetta <cebspinetta@...> wrote:

Hi all!

I am curious about whether it is safe to enable root access to BCC scripts on production machines.
In my company, each team has access to their instances via ssh, and we are thinking of allowing them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? For example, is there some tool that can receive a command to execute? In that case it would be unsafe, because someone could execute any command through a bcc tool.

e.g.:
sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc
^^^^

sudo isn't safe. If you remove the BCC tool from this one-liner,
you'll find it still destroys your disk.

In practice the production concern I have is for the overhead of each
tool, hence the overhead section in each tool's man page.

Brendan


Best,
Cristian Spinetta


is BCC tools safe to enable root privilegies in production?

Cristian Spinetta
 

Hi all!

I am curious about whether it is safe to enable root access to BCC scripts on production machines.
In my company, each team has access to their instances via ssh, and we are thinking of allowing them to use bcc in production. For this purpose we need to allow root access to any BCC tool. Do you think it would be safe? For example, is there some tool that can receive a command to execute? In that case it would be unsafe, because someone could execute any command through a bcc tool.

e.g.:
sudo /usr/share/bcc/tools/some-great-tool.sh dd if=/dev/zero of=/dev/sda bs=512 count=1 conv=notrunc

Best,
Cristian Spinetta
