Re: uprobe speedups?

Yonghong Song


The prohibitive uprobe overhead you are seeing (even for sampling) has
been discussed on this list before. The idea is to adopt a mechanism
similar to dyninst or kerninst. We (Brenden, myself, and maybe others)
are starting to explore this space.


On Mon, Jul 17, 2017 at 4:37 PM, Frederik Deweerdt via iovisor-dev
<iovisor-dev@...> wrote:

I'm trying to use uprobes in order to be able to keep track of how
long it takes to execute a given function in a shared library used by
tens of thousands of threads on a 32-core machine. Unfortunately, I'm
seeing a 3x slowdown when setting the uprobes (even ones that only do
a `return 0;` in the body).
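
For reference, the kind of measurement described above can be sketched
with BCC roughly as follows. This is a minimal illustration, not the
script used in this thread: LIB_PATH, SYMBOL, the map names and the
run() helper are all hypothetical placeholders, and it assumes the bcc
Python bindings are installed and the script runs as root.

```python
# Sketch only: time one function with a uprobe/uretprobe pair via BCC.
# LIB_PATH and SYMBOL are hypothetical placeholders, not names from this thread.

BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);   // thread id -> entry timestamp (ns)
BPF_HISTOGRAM(dist);         // log2 histogram of call durations (ns)

int on_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();   // low 32 bits hold the thread id
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int on_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                            // entry was not recorded
    u64 delta = bpf_ktime_get_ns() - *tsp;
    dist.increment(bpf_log2l(delta));
    start.delete(&tid);
    return 0;
}
"""

LIB_PATH = "/path/to/libexample.so"   # hypothetical shared library
SYMBOL = "example_function"           # hypothetical function to time


def run(lib_path=LIB_PATH, symbol=SYMBOL, seconds=10):
    """Attach both probes, collect for `seconds`, then print a histogram."""
    # Imported lazily so the program text above can be inspected without bcc.
    import time
    from bcc import BPF

    b = BPF(text=BPF_TEXT)
    b.attach_uprobe(name=lib_path, sym=symbol, fn_name="on_entry")
    b.attach_uretprobe(name=lib_path, sym=symbol, fn_name="on_return")
    time.sleep(seconds)
    b["dist"].print_log2_hist("nsecs")
```

Calling run() as root attaches the probes. Every call to the traced
function still traps into the kernel twice (entry and return), so a
sketch like this does nothing to avoid the overhead described here; it
only shows what is being attached.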

Looking at a perf record, it looks like lock contention is the
culprit: queued_spin_lock_slowpath accounts for more than 20% of the

I'm wondering if there's a way around that. For my use case, sampling
would be an option, but I haven't found a way to do so without
actually entering the probe (which has a prohibitive cost by itself).

Any thoughts?