
Re: verifier: variable offset stack access question

Yonghong Song
 

On Wed, Dec 23, 2020 at 2:21 PM Andrei Matei <andreimatei1@...> wrote:

Hello Yonghong, all,

I'm curious about a verifier workaround that Yonghong provided two years ago, in this thread.
Brendan Gregg was asking about accessing stack buffers through a register with a variable offset, and Yonghong suggested a memset as a solution:
"
You can initialize the array with ' ' to workaround the issue:
    struct data_t data;
    uint64_t max = sizeof(data.argv);
    const char *argp = NULL;
    memset(&data, ' ', sizeof(data));
    bpf_probe_read(&argp, sizeof(argp), (void *)&__argv[0]);
    uint64_t len = bpf_probe_read_str(&data.argv, max, argp);
    len &= 0xffffffff; // to avoid: "math between fp pointer and register errs"
    bpf_trace_printk("len: %d\n", len); // sanity check: len is indeed valid
"

My question is - how does the memset help? I sort of understand the trouble with variable stack access (different regions of the stack can hold memory of different types), and I've looked through the verifier's code, but I haven't been able to figure it out.
I cannot remember the details. Here, what memset() did is initialize
the related bytes on the stack. I guess maybe at that point
bpf_probe_read_str() required initialized memory?

Right now, bpf_probe_read_str does not require initialized memory, so
memset may not be necessary.


As far as actually trying the trick, I've had difficulty importing <string.h> in my bpf program. I'm not working in the context of BCC, so maybe that makes the difference. I've tried zeroing out my buffer manually, and it didn't seem to change anything. I've had better success allocating my buffer using map memory rather than stack memory, but I'm still curious what a memset could do for me.
A lot of string.h functions are implemented as external functions in
glibc. This won't work for bpf programs, as the bpf program is not
linked against glibc. The clang compiler will translate the above
memset() into plain stores if the size is small enough. Better yet,
use clang's __builtin_memset() so there is no relation to glibc at
all.


Thanks a lot!

- Andrei


Re: verifier: variable offset stack access question

Andrei Matei
 

For posterity, I think I can now answer my own question. I suspect
things were different in 2018 (because otherwise I don’t see how the
referenced exchange makes sense); here’s my understanding about the
verifier’s rules for stack accesses today:

There are two distinct aspects relevant to the use of variable stack offsets:

1) “Direct” stack access with variable offset. This is simply
forbidden; you can’t read or write from a dynamic offset in the stack
because, in the case of reads, the verifier doesn’t know what type of
memory would be returned (is it “misc” data? Is it a spilled
register?) and, in the case of writes, what stack slot’s memory type
should be updated.
Separately, when reading from the stack with a fixed offset, the
respective memory needs to have been initialized (i.e. written to)
before.

2) Passing pointers to the stack to helper functions which will write
through the pointer (such as bpf_probe_read_user()). Here, if the
stack offset is variable, then all the memory that falls within the
possible bounds has to be initialized.
If the offset is fixed, then the memory doesn’t necessarily need to be
initialized (at least not if the helper’s argument is of type
ARG_PTR_TO_UNINIT_MEM). Why the restriction in the variable offset
case? Because, in that case, it cannot be known what memory the helper
will end up initializing; if the verifier pretended that all the
memory within the offset bounds would be initialized then further
reads could leak uninitialized stack memory.


verifier: variable offset stack access question

Andrei Matei
 

Hello Yonghong, all,

I'm curious about a verifier workaround that Yonghong provided two years ago, in this thread.
Brendan Gregg was asking about accessing stack buffers through a register with a variable offset, and Yonghong suggested a memset as a solution:
"
You can initialize the array with ' ' to workaround the issue:
    struct data_t data;
    uint64_t max = sizeof(data.argv);
    const char *argp = NULL;
    memset(&data, ' ', sizeof(data));
    bpf_probe_read(&argp, sizeof(argp), (void *)&__argv[0]);
    uint64_t len = bpf_probe_read_str(&data.argv, max, argp);
    len &= 0xffffffff; // to avoid: "math between fp pointer and register errs"
    bpf_trace_printk("len: %d\n", len); // sanity check: len is indeed valid
"

My question is - how does the memset help? I sort of understand the trouble with variable stack access (different regions of the stack can hold memory of different types), and I've looked through the verifier's code, but I haven't been able to figure it out.

As far as actually trying the trick, I've had difficulty importing <string.h> in my bpf program. I'm not working in the context of BCC, so maybe that makes the difference. I've tried zeroing out my buffer manually, and it didn't seem to change anything. I've had better success allocating my buffer using map memory rather than stack memory, but I'm still curious what a memset could do for me.

Thanks a lot!

- Andrei


[Warning ⚠] Do you understand how to build bpf.file for snort on fedora?

Dorian ROSSE
 

Hello, 


[Warning ⚠] Do you understand how to build bpf.file for snort on fedora?

Thank you in advance, 

I hope the success, 

Regards. 


Dorian Rosse 
Get Outlook for Android


Re: High volume bpf_perf_output tracing

Daniel Xu
 

Hi,

Ideally you’d want to do as much work in the kernel as possible. Passing that much data to user space is kind of misusing bpf.

What kind of work are you doing that can only be done in user space?

But otherwise, yeah, if you need perf, you might get more power from a lower level language. C/c++ is one option, you could also check out libbpf-rs if you prefer to write in rust.

Daniel

On Thu, Nov 19, 2020, at 5:56 PM, wes.vaske@... wrote:
I'm currently working on a python script to trace the nvme driver. I'm
hitting a performance bottleneck on the event callback in python and am
looking for the best way (or maybe a quick and dirty way) to improve
performance.

Currently I'm attaching to a kprobe and 2 tracepoints and using
perf_submit to pass information back to userspace.

When my callback is:
def count_only(cpu, data, size):
    global event_count  # without this, += raises UnboundLocalError
    event_count += 1

My throughput is ~2,000,000 events per second

When my callback is my full event processing the throughput drops to
~40,000 events per second.

My first idea was to put the event_data in a Queue and have multiple
worker processes handle the parsing. Unfortunately the bcc.Table
classes aren't pickleable. As soon as we start parsing data to put in
the queue we drop down to 150k events per second without even touching
the Queue, just converting data types.

My next idea was to just store the data in memory and process after the
fact (for this use case, I effectively have "unlimited" memory for the
trace). This ranges from 100k to 450k events per second. (I think
python has issues allocating memory quickly with list.append() and
with tuning I should be able to get 450k sustained). This isn't
terrible but I'd like to be above 1,000,000 events per second.

My next idea was to see if I can attach multiple reader processes to
the same BPF map. This is where I hit the wall and came here. It looks
like there isn't a way to do this with the Python API; at least not
easily.

With that context, I have 2 questions:
1. Is there a way I can attach multiple python processes to the same
BPF map to poll in parallel? Event ordering doesn't matter, I'll just
post process it all anyway. This doesn't need to be a final solution,
just something to get me through the next month
2. What is the "right" way to do this? My primary concern is
increasing the rate at which I can move data from the BPF_PERF_OUTPUT
map to userspace. It looks like the Python API is being deprecated in
favor of libbpf. So I'm assuming a C++ version of this script would be
the "right" way? (I've never touched C/C++ outside the BPF C code so
this would need to be a future project for me)


Thanks!


Re: BPF Maps with wildcards

Yonghong Song
 

On Thu, Nov 19, 2020 at 9:57 AM Marinos Dimolianis
<dimolianis.marinos@...> wrote:

Thanks for the response.
LPM is actually the closest solution; however, I wanted a structure closer to the way TCAMs operate, in which you can have wildcards in intermediate bits as well.
I believe that something like that does not exist and I need to implement it using available structures in
eBPF/XDP.

Right, BPF does not have TCAM style maps. If you organize data
structure properly, you may be able to use LPM.


Στις Πέμ, 19 Νοε 2020 στις 5:27 π.μ., ο/η Y Song <ys114321@...> έγραψε:

On Wed, Nov 18, 2020 at 6:20 AM <dimolianis.marinos@...> wrote:

Hi all, I am trying to find a way to represent wildcards in BPF Map Keys?
I could not find anything relevant to that, does anyone know anything further.
Are there any efforts towards that functionality?
The closest map is lpm (trie) map. You may want to take a look.


High volume bpf_perf_output tracing

wes.vaske@...
 

I'm currently working on a python script to trace the nvme driver. I'm hitting a performance bottleneck on the event callback in python and am looking for the best way (or maybe a quick and dirty way) to improve performance.

Currently I'm attaching to a kprobe and 2 tracepoints and using perf_submit to pass information back to userspace.

When my callback is:
def count_only(cpu, data, size):
    global event_count  # without this, += raises UnboundLocalError
    event_count += 1

My throughput is ~2,000,000 events per second

When my callback is my full event processing the throughput drops to ~40,000 events per second.

My first idea was to put the event_data in a Queue and have multiple worker processes handle the parsing. Unfortunately the bcc.Table classes aren't pickleable. As soon as we start parsing data to put in the queue we drop down to 150k events per second without even touching the Queue, just converting data types.

My next idea was to just store the data in memory and process after the fact (for this use case, I effectively have "unlimited" memory for the trace). This ranges from 100k to 450k events per second. (I think python has issues allocating memory quickly with list.append() and with tuning I should be able to get 450k sustained). This isn't terrible but I'd like to be above 1,000,000 events per second.

My next idea was to see if I can attach multiple reader processes to the same BPF map. This is where I hit the wall and came here. It looks like there isn't a way to do this with the Python API; at least not easily.

With that context, I have 2 questions:
  1. Is there a way I can attach multiple python processes to the same BPF map to poll in parallel? Event ordering doesn't matter, I'll just post process it all anyway. This doesn't need to be a final solution, just something to get me through the next month
  2. What is the "right" way to do this? My primary concern is increasing the rate at which I can move data from the BPF_PERF_OUTPUT map to userspace. It looks like the Python API is being deprecated in favor of libbpf. So I'm assuming a C++ version of this script would be the "right" way? (I've never touched C/C++ outside the BPF C code so this would need to be a future project for me)


Thanks!


Re: BPF Maps with wildcards

Marinos Dimolianis
 

Thanks for the response.
LPM is actually the closest solution; however, I wanted a structure closer to the way TCAMs operate, in which you can have wildcards in intermediate bits as well.
I believe that something like that does not exist and I need to implement it using available structures in eBPF/XDP.

Στις Πέμ, 19 Νοε 2020 στις 5:27 π.μ., ο/η Y Song <ys114321@...> έγραψε:

On Wed, Nov 18, 2020 at 6:20 AM <dimolianis.marinos@...> wrote:
>
> Hi all, I am trying to find a way to represent wildcards in BPF Map Keys?
> I could not find anything relevant to that, does anyone know anything further.
> Are there any efforts towards that functionality?

The closest map is lpm (trie) map. You may want to take a look.


Re: BPF Maps with wildcards

Yonghong Song
 

On Wed, Nov 18, 2020 at 6:20 AM <dimolianis.marinos@...> wrote:

Hi all, I am trying to find a way to represent wildcards in BPF Map Keys?
I could not find anything relevant to that, does anyone know anything further.
Are there any efforts towards that functionality?
The closest map is lpm (trie) map. You may want to take a look.


BPF Maps with wildcards

Marinos Dimolianis
 

Hi all, I am trying to find a way to represent wildcards in BPF Map Keys?
I could not find anything relevant to that, does anyone know anything further.
Are there any efforts towards that functionality?
Regards,
Marinos


Attaching dynamic uprobe to C++ library/application #bcc

harnan@...
 

Hi all,

I am learning about ebpf and the bcc tools/library. I have a question about dynamic uprobe of C++ code. I have been able to attach a uprobe successfully by looking up the mangled symbol name. However, I am curious how the bpf program will access the parameters or arguments of a function I am probing. For a C++ object, do I just create an equivalent C struct that represents the application's C++ object/class, and then typecast the argument (from PT_REGS_PARM[x](ctx)) ?

Thanks!
Siva


Re: Future of BCC Python tools

Alexei Starovoitov
 

On Mon, Oct 26, 2020 at 3:34 PM Brendan Gregg <brendan.d.gregg@...> wrote:

G'Day all,

I have colleagues working on BCC Python tools (e.g., the recent
enhancement of tcpconnect.py) and I'm wondering, given libbpf tools,
what our advice should be.

- Should we keep both Python and libbpf tools in sync?
- Should we focus on libbpf only, and leave Python versions for legacy systems?
bcc python is still used by many where they need on the fly compilation.
Such cases still exist. One example is USDT support.
The libbpf and CO-RE support for USDT is still wip.
So such cases have to continue using bcc style with llvm.
The number of such cases is gradually reducing.
I think right now anyone who starts with bpf should be all set with
libbpf, BTF and CO-RE. It's much better suited for embedded setups too.
So I think bcc as a go-to place is still a great framework, but adding
a new python based tool is probably not the best investment of time
for the noobs. Experienced folks who already learned py-bcc will
keep hacking their scripts in python. That's all great.
noobs should probably learn bpftrace for quick experiments
and libbpf-tools for standalone long-term tried-and-true tools.

Should we keep libbpf-tools and py-bcc tools in sync?
I think py tools where libbpf-tools replacement is complete could be
moved into 'deprecated' directory and not installed by default.
All major distros are built with CONFIG_DEBUG_INFO_BTF=y
so the users won't be surprised. Their favorite tools will keep
working as-is. The underlying implementation of them will quietly change.
We can document it of course, but who reads docs.


Future of BCC Python tools

Brendan Gregg
 

G'Day all,

I have colleagues working on BCC Python tools (e.g., the recent
enhancement of tcpconnect.py) and I'm wondering, given libbpf tools,
what our advice should be.

- Should we keep both Python and libbpf tools in sync?
- Should we focus on libbpf only, and leave Python versions for legacy systems?

I like the tweak-ability of the Python tools: sometimes I'm on a
production instance and I'll copy a tool and edit it on the fly. That
won't work with libbpf. Although, we also install all the bpftrace
tools on our prod instances [0], and if I'm editing tools I start with
them.

However, the llvm dependency of the Python tools is a pain, and an
obstacle for making bcc tools a default install with different
distros. I could imagine having a selection of the top 10 libbpf tools
as a package (bcc-essentials), which would be about 1.5 Mbytes (last
time I did libbpf tool it was 150 Kbytes stripped), and getting that
installed by default by different distros. (Ultimately, I want a
lightweight bpftrace installed by default as well.)

So, I guess I'm not certain about the future of the BCC Python tools.
What do people think? If we agree that the Python tools are legacy, we
should update the README to let everyone know.

Note: I'm just talking about the tools (tools/*.py)! I imagine BCC
Python is currently used for many other BPF things, and I'm not
suggesting that go away.

Brendan

[0] https://github.com/Netflix-Skunkworks/bpftoolkit


execveat tracepoints issues

alessandro.gario@...
 

Hello everyone!

I am experiencing some issues with the execveat tracepoints, and was wondering if others could reproduce it or help me understand what I am doing wrong.

On Arch Linux (kernel 5.9.1, perf 5.7.g3d77e6a8804a), both sys_enter_execveat and sys_exit_execveat never seem to report any event.

On Ubuntu 20.04 (kernel 5.4.0, perf 5.4.65), sys_enter_execveat will work provided there is no one else making use of that tracepoint, while sys_exit_execveat is always completely silent.

I traced the program I am using to test this with strace and verified that execveat is being called correctly. The following is the source code for that program:

---
#include <unistd.h>
#include <linux/fcntl.h>
#include <linux/unistd.h>

int main() {
    syscall(__NR_execveat, AT_FDCWD,
            "/usr/bin/bash", nullptr,
            nullptr, 0);

    return 0;
}
---

Here's a recording of what I'm experiencing on Ubuntu: https://asciinema.org/a/6EiDfoOpK1AYcDm7aPftrYqdo

Thanks for your help!

Alessandro Gario


Re: Minimum LLVM version for bcc

Yonghong Song
 

On Wed, Oct 21, 2020 at 8:57 AM Dale Hamel <daleha@...> wrote:

Does the LLVM version used by bcc matter, for packaging purposes?
This is a good question. For packaging purposes, no, it does not
matter much. The people who build packages can choose whatever is
available to them. bcc is supposed to work with all major llvm
releases since llvm 3.7.


I assume bcc includes some static libraries from LLVM, so I'm curious if the older versions are acceptable. For instance, on ubuntu 16.04, we use LLVM 3.7, but on ubuntu 18.04 and 20.04 it uses LLVM 6.0, based on the current debian control file.
This is probably due to historical reasons.


Are there features of newer LLVM releases that we need? For example, does BTF require a specific minimum version of LLVM? If this is the case, perhaps we should update the dependency descriptions in the debian control file to reflect this.
For BTF support, best is >= llvm10. For testing purposes, we may still
want to keep an option to build with older llvm versions.


Minimum LLVM version for bcc

Dale Hamel
 

Does the LLVM version used by bcc matter, for packaging purposes?

I assume bcc includes some static libraries from LLVM, so I'm curious if the older versions are acceptable. For instance, on ubuntu 16.04, we use LLVM 3.7, but on ubuntu 18.04 and 20.04 it uses LLVM 6.0, based on the current debian control file.

Are there features of newer LLVM releases that we need? For example, does BTF require a specific minimum version of LLVM? If this is the case, perhaps we should update the dependency descriptions in the debian control file to reflect this.

-Dale


Re: [Ext] Re: [iovisor-dev] Questions about current eBPF usages

Jiada Tu
 

Thank you very much, Yonghong! Those are very helpful.


Re: [Ext] Re: [iovisor-dev] Questions about current eBPF usages

Yonghong Song
 

On Thu, Oct 15, 2020 at 11:03 PM Jiada Tu <jtu3@...> wrote:

Thanks a lot, Yonghong. From your response:

In your case, the bpf program is to influence io scheduling decisions.
You could implement it so that the kernel data-structure write is done
in the kernel, but with a hook to a bpf program that makes the
decision; based on the bpf program's return value, the kernel can
decide what to schedule.

1. How can I make a kernel function use the return value of a eBPF program/function?
e.g., in kernel/events/core.c, for perf event overflow handler, we have

    rcu_read_lock();
    ret = BPF_PROG_RUN(event->prog, &ctx);
    rcu_read_unlock();
out:
    __this_cpu_dec(bpf_prog_active);
    if (!ret)
        return;

    event->orig_overflow_handler(event, data, regs);

The above `ret` is the return value from the bpf program.


2. A KProbes-related question: from an old article https://lwn.net/Articles/132196/ which was written in 2005, it said:
```
The current KProbes implementation, however, introduces some latency of its own in handling probes. The cause behind this latency is the single kprobe_lock which serializes the execution of probes across all CPUs on a SMP machine.
```
I read it as "functions (e.g., eBPF functions) attached to KProbes are executed in serial, i.e., the same eBPF function can not be run by multiple threads at the same time". As eBPF programs frequently use KProbes to hook to kernel functions, do you know if it's true currently that the calling of a eBPF function/program is single-threaded?
The article is from 2005; I am not sure whether this serialization of
kprobes across all CPUs is still true or not. The bpf subsystem won't
prevent execution on all cpus in parallel if the kprobe subsystem
allows it.

We recently added kfunc-based probing; it is trampoline based, much
faster, and does not have this restriction.


Re: [Ext] Re: [iovisor-dev] Questions about current eBPF usages

Jiada Tu
 

Thanks a lot, Yonghong. From your response:

In your case, the bpf program is to influence io scheduling decisions.
You could implement it so that the kernel data-structure write is done
in the kernel, but with a hook to a bpf program that makes the
decision; based on the bpf program's return value, the kernel can
decide what to schedule.

1. How can I make a kernel function use the return value of a eBPF program/function?

2. A KProbes-related question: from an old article https://lwn.net/Articles/132196/ which was written in 2005, it said:
```
The current KProbes implementation, however, introduces some latency of its own in handling probes. The cause behind this latency is the single kprobe_lock which serializes the execution of probes across all CPUs on a SMP machine.
```
I read it as "functions (e.g., eBPF functions) attached to KProbes are executed in serial, i.e., the same eBPF function can not be run by multiple threads at the same time". As eBPF programs frequently use KProbes to hook to kernel functions, do you know if it's true currently that the calling of a eBPF function/program is single-threaded?

 


Re: Questions about current eBPF usages

Yonghong Song
 

On Thu, Oct 15, 2020 at 4:06 PM Jiada Tu via lists.iovisor.org
<jtu3=hawk.iit.edu@...> wrote:

Hello BPF community,

I am looking for a way to move a user space program's disk I/O scheduling related logic down to kernel space, and then have the new kernel logic communicate with the user space program to make better I/O scheduling decisions. The reason that the user space program itself has I/O scheduling logic is because it needs to prioritize certain read or write requests.

I started looking at eBPF for that purpose. After doing some research, I learned that eBPF is very good at kernel profiling and tracing, but I didn't find much information about modifying kernel functions / data-structure using eBPF.

I am wondering:

1. Instead of calling eBPF function before / after calling a kernel function and then returning back to that kernel function, is it possible for eBPF programs to totally replace a kernel function or module logic?
Currently, no. The kernel has support for replacing a bpf program, but
not a kernel function. Replacing kernel functions could easily cause
the kernel to misbehave. There are some proposals to explicitly
specify functions which can be replaced; this work is not done yet.


2. Is it possible for eBPF programs to tamper the parameter and return value of a kernel function, or eBPF program can only read kernel data-structure but can not modify them? (some search indicates that it can not few years ago, but I am not sure if it is changed recently)
No for input parameters.
Yes for return values in certain cases. For any kernel functions
annotated with ALLOW_ERROR_INJECTION, you can attach a bpf program to
that function to change its return values.

All tracing programs can read kernel data structures as of today with
bpf_probe_read(), or with direct memory access (similar to
bpf_probe_read) in later kernels.
Writing to kernel data structures has to be done extremely carefully,
as it can easily crash the kernel or cause it to misbehave. This has
to be done in a controlled way, e.g., in networking, through specific
helpers.

In your case, the bpf program is to influence io scheduling decisions.
You could implement it so that the kernel data-structure write is done
in the kernel, but with a hook to a bpf program that makes the
decision; based on the bpf program's return value, the kernel can
decide what to schedule.



Thank you!
Jiada
