bpf_probe_read() split: bpftrace RFC

Brendan Gregg
 

G'Day,

This is the biggest change afoot to the bpftrace API, and I think we
can sort it out quickly without fuss, but it is worth sharing here.
This is from https://github.com/iovisor/bpftrace/issues/614 .

bpftrace currently allows pointer dereferencing via *addr, and
str(addr) for strings. But the future split of bpf_probe_read() into
bpf_probe_read_user() and bpf_probe_read_kernel() (to support SPARC,
etc) may break a lot of bpftrace tools and documentation. Or it may
not, if we are clever about it.

The proposal is this: add the following bpftrace builtins:

- uptr(addr): dereference user address
- ustr(addr): fetch NULL-terminated user string
- kptr(addr): dereference kernel address
- kstr(addr): fetch NULL-terminated kernel string

AND, to introduce a "context" for probe actions -- user or kernel --
where *addr and str(addr) work relative to that context. The context
would be:

- kprobes/kretprobes: kernel
- uprobes/uretprobes: user
- tracepoints: kernel (with the exception of syscall tracepoints: user)
- other probe types: kernel
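
The context mapping above could be sketched as a small lookup in plain C. This is only an illustration of the rule as stated: the enum and function names (addr_space, probe_type, default_addr_space) are hypothetical and are not bpftrace internals.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not bpftrace internals. */
enum addr_space { SPACE_KERNEL, SPACE_USER };

enum probe_type {
    PROBE_KPROBE,
    PROBE_KRETPROBE,
    PROBE_UPROBE,
    PROBE_URETPROBE,
    PROBE_TRACEPOINT,
    PROBE_OTHER
};

/* Return the default context that *addr and str(addr) would use.
 * is_syscall_tp is only consulted for PROBE_TRACEPOINT. */
enum addr_space default_addr_space(enum probe_type type, bool is_syscall_tp)
{
    switch (type) {
    case PROBE_UPROBE:
    case PROBE_URETPROBE:
        return SPACE_USER;
    case PROBE_TRACEPOINT:
        /* Syscall tracepoints are the one exception among tracepoints. */
        return is_syscall_tp ? SPACE_USER : SPACE_KERNEL;
    default:
        /* kprobes, kretprobes and all other probe types. */
        return SPACE_KERNEL;
    }
}
```

Under this sketch, a kprobe always resolves to the kernel context, and the explicit uptr/ustr/kptr/kstr builtins would only be needed where the default is wrong.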

It's possible that this context approach leaves us with zero broken
tools and documentation (i.e., there are zero cases so far where we even
need to use uptr/ustr/kptr/kstr). I'm still checking and looking for
exceptions. Where you can help: can you think of a syscall tracepoint
that has a kernel address as an argument? Or a non-syscall
tracepoint that has a user address as an argument? Or can you think of
any other problem with this plan?

thanks,

Brendan

Matheus Marchini
 

How will bpf_probe_read_user/bpf_probe_read_kernel be enforced in the
kernel? In other words, how will bpf_probe_read_user detect and report
when it gets a kernel address as a parameter, and vice versa? Will it
be done by the verifier (is it even possible to do this reliably with
the verifier?) or only at runtime?

If the kernel only checks this at runtime, and it returns a unique
error code (different from the errors that probe_read can return
today; we might need to create a new error code), we could do the
following for the dereference operators (* and str()):

typedef int (*probe_read_t)(void *dst, int size, void *src);

// Assuming bpf_probe_read_[user,kernel] will return -EINVALADDRSPC
// if the user tries to access an address with the wrong function
int err;

// addr_space_ctx is defined according to Brendan's email
probe_read_t default_probe_read;
probe_read_t fallback_probe_read;
if (addr_space_ctx == KERNEL) {
    default_probe_read = bpf_probe_read_kernel;
    fallback_probe_read = bpf_probe_read_user;
} else {
    default_probe_read = bpf_probe_read_user;
    fallback_probe_read = bpf_probe_read_kernel;
}

err = default_probe_read(dst, size, src);
if (err == -EINVALADDRSPC) {
    // Wrong address space for the default reader: try the other one
    err = fallback_probe_read(dst, size, src);
}
if (err < 0) {
    bpf_trace_printk("Error while reading address %p\n", src);
    return;
}

With this approach we can avoid breaking any scripts. The only
difference is the added overhead when the fallback probe_read is used
(and users affected by this overhead can still use
kptr/uptr/kstr/ustr). We could also print to stdout/syslog when the
fallback is used if bpftrace is running in verbose mode, and provide a
"strict" mode that does not try the fallback probe_read.
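
To make the fallback and "strict" behaviour concrete, here is a userspace mock-up. Everything in it is an assumption for illustration: EINVALADDRSPC is a made-up error code, the two mock readers stand in for bpf_probe_read_kernel/bpf_probe_read_user by each accepting only its own buffer, and read_with_fallback is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error code; nothing like it exists in the kernel today. */
#define EINVALADDRSPC 1000

enum addr_space { SPACE_KERNEL, SPACE_USER };

typedef int (*probe_read_t)(void *dst, size_t size, const void *src);

/* Mock "address spaces": each reader only accepts its own buffer. */
static char mock_kernel_mem[8] = "kernel";
static char mock_user_mem[8] = "user";

/* True if [p, p + n) lies entirely inside buf[0..len). */
static bool within(const void *p, size_t n, const char *buf, size_t len)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)buf;
    return a >= b && n <= len && a - b <= len - n;
}

static int mock_read_kernel(void *dst, size_t size, const void *src)
{
    if (!within(src, size, mock_kernel_mem, sizeof(mock_kernel_mem)))
        return -EINVALADDRSPC;
    memcpy(dst, src, size);
    return 0;
}

static int mock_read_user(void *dst, size_t size, const void *src)
{
    if (!within(src, size, mock_user_mem, sizeof(mock_user_mem)))
        return -EINVALADDRSPC;
    memcpy(dst, src, size);
    return 0;
}

/* Try the reader for the probe's context first. Unless strict is set,
 * retry with the other reader on -EINVALADDRSPC and report that the
 * fallback ran, so a verbose mode could log it. */
int read_with_fallback(enum addr_space ctx, bool strict, void *dst,
                       size_t size, const void *src, bool *used_fallback)
{
    probe_read_t primary =
        (ctx == SPACE_KERNEL) ? mock_read_kernel : mock_read_user;
    probe_read_t fallback =
        (ctx == SPACE_KERNEL) ? mock_read_user : mock_read_kernel;
    int err;

    *used_fallback = false;
    err = primary(dst, size, src);
    if (err == -EINVALADDRSPC && !strict) {
        *used_fallback = true;
        err = fallback(dst, size, src);
    }
    return err;
}
```

In this mock-up, reading a user buffer from a kernel context succeeds through the fallback in the default mode, while in strict mode the same call returns -EINVALADDRSPC and the script author would have to switch to uptr/ustr explicitly.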

On Thu, Jun 13, 2019 at 11:32 AM Brendan Gregg
<brendan.d.gregg@...> wrote:

[...]