bpf_probe_read() split: bpftrace RFC
Matheus Marchini <mat@...>
How will bpf_probe_read_user/bpf_probe_read_kernel be enforced in the kernel? In other words, how will bpf_probe_read_user detect and report when it gets a kernel address as a parameter, and vice versa? Will this be done by the verifier (is it even possible to do this reliably with the verifier?) or only at runtime?

If the kernel only tests it at runtime and returns a unique error code (different from the errors probe_read can return today; we might need to create a new error code), we could do the following for the dereference operators (* and str()):

    typedef int (*probe_read_t)(void *dst, int size, void *src);

    // Assuming bpf_probe_read_[user,kernel] will return EINVALADDRSPC
    // if the user tries to access an address with the wrong function
    int err;
    // addr_space_ctx is defined according to Brendan's email
    probe_read_t default_probe_read;
    probe_read_t fallback_probe_read;

    if (addr_space_ctx == KERNEL) {
      default_probe_read = bpf_probe_read_kernel;
      fallback_probe_read = bpf_probe_read_user;
    } else {
      default_probe_read = bpf_probe_read_user;
      fallback_probe_read = bpf_probe_read_kernel;
    }

    // Assign first, then compare, so err holds the return value
    // rather than the result of the comparison
    err = default_probe_read(dst, size, src);
    if (err == EINVALADDRSPC) {
      err = fallback_probe_read(dst, size, src);
    }
    if (err < 0) {
      bpf_trace_printk("Error while reading address %x\n", src);
      return;
    }

With this approach we can avoid breaking any scripts. The only difference is that it adds overhead when the fallback probe_read is used (and if the user is affected by this overhead, they can still use kptr/uptr/kstr/ustr).

We could also:

- print to stdout/syslog when the fallback method is used, if bpftrace is running in verbose mode, and
- provide a "strict" mode which would not try to run the fallback probe_read.

On Thu, Jun 13, 2019 at 11:32 AM Brendan Gregg <brendan.d.gregg@...> wrote:
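To make the fallback idea in the reply above easier to experiment with, here is a minimal userspace sketch in C. Everything in it is an assumption for illustration: EINVALADDRSPC is the hypothetical error code from the message (no such errno exists today), it is modelled here as a negative return value, and stub_read_kernel/stub_read_user are stand-ins that treat the upper half of a small array as "kernel" memory. It does not call any real BPF helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error code taken from the email; the value is arbitrary. */
#define EINVALADDRSPC 200

enum addr_space { ADDR_KERNEL, ADDR_USER };

typedef int (*probe_read_t)(void *dst, int size, const void *src);

/* Toy memory: bytes [0, 32) model user space, [32, 64) model kernel space. */
static uint8_t fake_memory[64];
#define FAKE_KERNEL_START 32

static int is_fake_kernel_addr(const void *src)
{
    uintptr_t p = (uintptr_t)src;
    uintptr_t base = (uintptr_t)fake_memory;
    return p >= base + FAKE_KERNEL_START && p < base + sizeof(fake_memory);
}

/* Stand-in for bpf_probe_read_kernel: rejects "user" addresses. */
static int stub_read_kernel(void *dst, int size, const void *src)
{
    assert(size >= 0);
    if (!is_fake_kernel_addr(src))
        return -EINVALADDRSPC;
    memcpy(dst, src, (size_t)size);
    return 0;
}

/* Stand-in for bpf_probe_read_user: rejects "kernel" addresses. */
static int stub_read_user(void *dst, int size, const void *src)
{
    assert(size >= 0);
    if (is_fake_kernel_addr(src))
        return -EINVALADDRSPC;
    memcpy(dst, src, (size_t)size);
    return 0;
}

/* Counts how often the slower fallback path was taken. */
static int fallback_calls;

/* Try the reader matching the probe context, then the other one. */
int read_with_fallback(enum addr_space ctx, void *dst, int size,
                       const void *src)
{
    probe_read_t primary  = ctx == ADDR_KERNEL ? stub_read_kernel : stub_read_user;
    probe_read_t fallback = ctx == ADDR_KERNEL ? stub_read_user : stub_read_kernel;

    int err = primary(dst, size, src);
    if (err == -EINVALADDRSPC) {
        fallback_calls++;
        err = fallback(dst, size, src);
    }
    return err;
}
```

A counter like fallback_calls is one way the suggested verbose-mode reporting could be driven; a "strict" mode would simply return the first error instead of retrying.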
Brendan Gregg
G'Day,
This is the biggest change afoot to the bpftrace API, and I think we can sort it out quickly without fuss, but it is worth sharing here. This is from https://github.com/iovisor/bpftrace/issues/614 .

bpftrace currently allows pointer dereferencing via *addr, and str(addr) for strings. But the future split of bpf_probe_read() into bpf_probe_read_user() and bpf_probe_read_kernel() (to support SPARC, etc) may break a lot of bpftrace tools and documentation. Or it may not, if we are clever about it.

The proposal is this: add the following bpftrace builtins:

- uptr(addr): dereference user address
- ustr(addr): fetch NULL-terminated user string
- kptr(addr): dereference kernel address
- kstr(addr): fetch NULL-terminated kernel string

AND, to introduce a "context" for probe actions -- user or kernel -- where *addr and str(addr) work relative to that context. The context would be:

- kprobes/kretprobes: kernel
- uprobes/uretprobes: user
- tracepoints: kernel (with the exception of syscall tracepoints: user)
- other probe types: kernel

It's possible that this context approach leaves us with zero broken tools and documentation (ie, there are zero cases so far where we even need to use uptr/ustr/kptr/kstr). I'm still checking and looking for exceptions.

Where you can help: can you think of a syscall tracepoint that has a kernel address as an argument? Or another non-syscall tracepoint that has a user-address as an argument? Or can you think of any other problem with this plan?

thanks,

Brendan
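The proposed probe-to-context mapping can be written down as a small lookup, which may help when hunting for exceptions. The following C sketch is only an illustration of the rules listed above, not bpftrace's implementation: the function name default_addr_space and the enum are hypothetical, and the matching is a plain prefix test on probe names written in bpftrace's usual "type:target" form.

```c
#include <string.h>

enum addr_space { ADDR_KERNEL, ADDR_USER };

/* Returns non-zero when s begins with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: pick the address space that *addr and str(addr)
 * would use by default for a given probe, following the proposal:
 *   uprobes/uretprobes   -> user
 *   syscall tracepoints  -> user
 *   everything else      -> kernel
 */
enum addr_space default_addr_space(const char *probe)
{
    if (has_prefix(probe, "uprobe:") || has_prefix(probe, "uretprobe:"))
        return ADDR_USER;
    if (has_prefix(probe, "tracepoint:syscalls:"))
        return ADDR_USER;
    return ADDR_KERNEL;
}
```

Under these rules, "tracepoint:syscalls:sys_enter_openat" resolves to the user context while "tracepoint:sched:sched_switch" and "kprobe:vfs_read" resolve to kernel; any probe whose arguments contradict this default is exactly the kind of exception the message asks about, and would need uptr/ustr/kptr/kstr.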