Fixing stack trace function names by argument introspection


marko@kevac.org
 

Hi!

Imagine I have an interpreter that runs a program written in a custom language. If I were to take a stack trace, it would look like this:

sys_read() [k]
read()
execute_fn()
execute_fn()
execute_fn()
execute_fn()
main()

Each execute_fn() call executes a function defined in my custom language. Such a stack trace is not very helpful.

But I know that I can get the real function name from execute_fn()'s arguments. Imagine it is as simple as execute_fn(char *real_fn_name).
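
For illustration, here is a minimal sketch of such an interpreter. It assumes a toy program in which every language-level function calls at most one other function by name. `struct fn_def`, `program`, `lookup()` and the function names in the table are hypothetical; only `execute_fn(char *real_fn_name)` comes from the description. Each language-level call re-enters execute_fn(), so the native stack gets one execute_fn() frame per language frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy program: each function calls at most one other
 * function, identified by name (NULL callee = leaf). */
struct fn_def {
    const char *name;
    const char *callee;
};

static const struct fn_def program[] = {
    { "main_loop",   "parse_input" },
    { "parse_input", "read_line" },
    { "read_line",   "fill_buffer" },
    { "fill_buffer", NULL },  /* leaf: would end up in read()/sys_read() */
};

static const struct fn_def *lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(program) / sizeof(program[0]); i++)
        if (strcmp(program[i].name, name) == 0)
            return &program[i];
    return NULL;
}

/* Every language-level call recurses into execute_fn(), so the real
 * name exists only in the real_fn_name argument. Returns nesting depth. */
int execute_fn(char *real_fn_name)
{
    const struct fn_def *fn = lookup(real_fn_name);
    if (fn == NULL || fn->callee == NULL)
        return 1;
    return 1 + execute_fn((char *)fn->callee);
}
```

With this table, execute_fn("main_loop") nests four deep, which matches the four execute_fn() frames in the trace above.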

I know I can trace execute_fn() invocations and get this function name through BCC/eBPF. But I would like to have a tool similar to profile.py so that I can profile programs written in my custom language.

So I need to capture stack traces periodically (say, at 49 Hz) and replace each execute_fn() frame name with the real function name taken from its arguments.
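
The substitution step itself could look like the sketch below. It assumes the sampler already has two lists, both ordered innermost first: the native frame names, and the interpreter-level names read from execute_fn()'s arguments. `merge_stacks()` is a hypothetical helper. Matching frames purely by order is an assumption that only holds if both lists were captured at the same instant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace each "execute_fn" entry of a native stack (innermost first)
 * with the next interpreter-level name (also innermost first). Frames
 * are paired purely by order. Returns the number of substitutions. */
size_t merge_stacks(const char *const native[], size_t n_native,
                    const char *const interp[], size_t n_interp,
                    const char *out[])
{
    size_t j = 0;
    for (size_t i = 0; i < n_native; i++) {
        if (strcmp(native[i], "execute_fn") == 0 && j < n_interp)
            out[i] = interp[j++];   /* substitute the real name */
        else
            out[i] = native[i];     /* keep native frames as-is */
    }
    return j;
}
```

If there are fewer interpreter names than execute_fn() frames, the leftover frames keep the name execute_fn rather than being dropped.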

Can you give me some pointers on how to do that, or tell me whether it is possible at all?
I couldn't find any example that walks a stack trace; all of the examples just record them.

Thanks!
Marko.


Yonghong Song
 

On Sun, Jul 22, 2018 at 3:00 PM, marko@... <marko@...> wrote:
> Can you give me some pointers on how to do that, or tell me whether it is possible at all?
> I couldn't find any example that walks a stack trace; all of the examples just record them.

We do not have such an example in BCC. At Facebook, we have a BPF program that captures stack traces for Python programs; it is very similar to what you want to achieve.
Basically, you need to walk the stack yourself. Since the verifier does not support unbounded loops, you need a fully unrollable loop with #pragma unroll.
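
Below is a minimal sketch of such a bounded walk over a frame-pointer chain. It is written as plain userspace C so it can run anywhere; it is not a BPF program. It assumes the x86-64 layout produced when code is built with frame pointers: the saved caller frame pointer at [rbp] and the return address at [rbp + 8]. `struct fp_frame`, `walk_frames()` and `MAX_DEPTH` are hypothetical names. In a real BPF program each dereference would go through bpf_probe_read() rather than a plain load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 16  /* compile-time bound, so the loop can be fully unrolled */

/* Assumed x86-64 frame layout with frame pointers enabled. */
struct fp_frame {
    const struct fp_frame *prev;  /* saved caller frame pointer, at [rbp] */
    uint64_t ret_addr;            /* return address, at [rbp + 8] */
};

/* Collect up to MAX_DEPTH return addresses, innermost first. */
int walk_frames(const struct fp_frame *fp, uint64_t out[MAX_DEPTH])
{
    int depth = 0;
#pragma unroll  /* clang honors this; GCC ignores the unknown pragma */
    for (int i = 0; i < MAX_DEPTH; i++) {
        if (fp == NULL)
            break;
        out[depth++] = fp->ret_addr;
        fp = fp->prev;
    }
    return depth;
}
```

The fixed MAX_DEPTH is what makes full unrolling possible. It also means that a corrupted chain that loops back on itself stops after MAX_DEPTH iterations.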

During each loop iteration, you can access the frame pointer. You need some mechanism to get the real function name based on that level's frame pointer, and then you move on to the next level. In a BPF program, you can access the current task structure, which contains some TLS-related data that the BPF program could use.
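
The sketch below shows one way to read the TLS idea, under stated assumptions. The interpreter itself keeps a linked list of its own frames, each recording the real function name, plus a thread-local pointer to the innermost one. The sampler then walks that list instead of the native stack. `struct interp_frame`, `current_frame`, `interp_push()`, `interp_pop()` and `collect_names()` are all hypothetical. A BPF program could not read `current_frame` directly; it would have to locate the thread's TLS area via the task structure and read the pointer at an offset known in advance.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16

/* Hypothetical interpreter bookkeeping: execute_fn() would push one of
 * these on entry and pop it on return. */
struct interp_frame {
    const char *name;                 /* the real_fn_name argument */
    const struct interp_frame *prev;  /* caller's frame, NULL at the top */
};

/* Per-thread root of the chain; this is what a profiler would locate. */
static _Thread_local const struct interp_frame *current_frame;

void interp_push(struct interp_frame *f, const char *name)
{
    f->name = name;
    f->prev = current_frame;
    current_frame = f;
}

void interp_pop(void)
{
    if (current_frame != NULL)
        current_frame = current_frame->prev;
}

/* Sampler side: start at the thread-local root and follow prev pointers
 * with a bounded loop. Returns the number of names, innermost first. */
int collect_names(const char *out[MAX_DEPTH])
{
    const struct interp_frame *f = current_frame;
    int depth = 0;
#pragma unroll  /* clang honors this; GCC ignores the unknown pragma */
    for (int i = 0; i < MAX_DEPTH; i++) {
        if (f == NULL)
            break;
        out[depth++] = f->name;
        f = f->prev;
    }
    return depth;
}
```

The design choice here is that the interpreter, not the native stack, becomes the source of truth for names. That sidesteps the question of where `real_fn_name` lives on the native stack.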




marko@kevac.org
 



On Mon, Jul 23, 2018 at 7:48 AM, Y Song <ys114321@...> wrote:

> We do not have such an example in BCC. At Facebook, we have a BPF program that captures stack traces for Python programs; it is very similar to what you want to achieve.

Can you share it with me? Maybe I can use it as an example.
 
> Basically, you need to walk the stack yourself. Since the verifier does not support unbounded loops, you need a fully unrollable loop with #pragma unroll.

I have never used pragma unroll before, but I understand what it is supposed to do.
A quick search turns up usages for CUDA and a few little-known examples for gcc/clang.

 
> During each loop iteration, you can access the frame pointer. You need some mechanism to get the real function name based on that level's frame pointer, and then you move on to the next level. In a BPF program, you can access the current task structure, which contains some TLS-related data that the BPF program could use.


As far as I understand, this would work if my program is built with frame pointers. In that case, walking the stack should be straightforward. I have never done it before, though :-)
But programs are usually built without frame pointers. In that case you need additional info from DWARF, and the code is much more complex. Right?
Are you suggesting implementing all of this?

Sorry for the newbie questions :-)


Yonghong Song
 

On Tue, Jul 24, 2018 at 9:06 AM, marko@... <marko@...> wrote:


> Can you share it with me? Maybe I can use it as an example.
It is a complicated piece of software; let me see how much I can do.



> I have never used pragma unroll before, but I understand what it is supposed to do.
> A quick search turns up usages for CUDA and a few little-known examples for gcc/clang.
> Are you talking about them?
> https://stackoverflow.com/questions/4071690/tell-gcc-to-specifically-unroll-a-loop
Yes, it is `#pragma unroll`.
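
For reference, the spelling differs between compilers. `#pragma unroll` is clang's form, and clang is also what BCC uses to compile BPF programs. GCC 8 and later use `#pragma GCC unroll N`. The small sketch below picks the right form with the preprocessor; `SUM_N` and `sum_fixed()` are illustrative names. Full unrolling needs a trip count known at compile time.

```c
#include <assert.h>

#define SUM_N 8  /* compile-time trip count, required for full unrolling */

int sum_fixed(const int v[SUM_N])
{
    int total = 0;
#if defined(__clang__)
#pragma unroll            /* clang spelling, as used for BPF programs */
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 8      /* GCC 8+ spelling */
#endif
    for (int i = 0; i < SUM_N; i++)
        total += v[i];
    return total;
}
```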



> As far as I understand, this would work if my program is built with frame pointers. In that case, walking the stack should be straightforward. I have never done it before, though :-)
I am not 100% sure whether frame pointers alone are enough for you.
Remember, the stack typically holds only the frame pointer (if available), the function return address, register spills, and the 7th and later arguments.
It is very likely that `fn_name` in `execute_fn(fn_name)` is not on the stack, and you will need to find a different way to access it.
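
The calling-convention point can be made concrete. Under the System V AMD64 ABI used on x86-64 Linux, the first six integer or pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9). Only the 7th and later arguments go on the stack, so `fn_name` arrives in rdi. A uprobe at function entry could read it from the registers. A timer-driven sample taken deeper in the call chain, however, cannot recover it for outer frames unless the callee happened to spill it somewhere known. `int_arg_location()` is a hypothetical helper that just encodes this rule.

```c
#include <assert.h>
#include <string.h>

/* Where the n-th (1-based) integer/pointer argument is passed under the
 * System V AMD64 ABI: one of six registers, otherwise the stack. */
const char *int_arg_location(int n)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    if (n >= 1 && n <= 6)
        return regs[n - 1];
    return "stack";
}
```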

> But programs are usually built without frame pointers. In that case you need additional info from DWARF, and the code is much more complex. Right?
> Are you suggesting implementing all of this?
We are talking about walking the stack inside a BPF program, so the DWARF option is certainly out of the question. In that case, you may need to use perf record dwarf...


> Sorry for the newbie questions :-)