callchains & args


Brendan Gregg
 

G'Day,

I wrote a long script that, as a basic example, tries to do the following in C:

int trace(struct pt_regs *ctx) {
        bpf_trace_printk("%d", ctx->ax);
        return 1;
}

Then consumes both the printk output, and the callchain, in Python. But these currently seem mutually exclusive:

- tests/cc/test_callchain.py uses a callback and a b.kprobe_poll() loop to fetch the callchain.
- Many other examples use a b.trace_fields() loop to fetch the printk output.

I haven't found a way to do both at the same time. I'd like the callchain with the output of printk together.

Thoughts? Is this just another example of pushing bpf_trace_printk() too far?

Could the callback arguments be extended to be more than "pid, callchain"?

If the "return 1" and callback method is reading the raw perf_event, is there a way to read the fmt string? (which is usually set to something useful for tracepoints). Could there be a bpf_trace_fmt(), to customize such a string for kprobes?

thanks,

Brendan


Brenden Blanco <bblanco@...>
 

On Mon, Oct 05, 2015 at 01:32:14PM -0700, Brendan Gregg via iovisor-dev wrote:
> G'Day,
>
> I wrote a long script that, as a basic example, tries to do the following
> in C:
>
> int trace(struct pt_regs *ctx) {
>         bpf_trace_printk("%d", ctx->ax);
>         return 1;
> }
>
> Then consumes both the printk output, and the callchain, in Python. But
> these currently seem mutually exclusive:
>
> - tests/cc/test_callchain.py uses a callback and a b.kprobe_poll() loop to
>   fetch the callchain.
> - Many other examples use a b.trace_fields() loop to fetch the printk
>   output.
>
> I haven't found a way to do both at the same time. I'd like the callchain
> with the output of printk together.

There can always be a python thread for this, but I don't believe there
would be a way to correlate the two events, since they'll come in at
different times and with different chances of being rate limited.


> Thoughts? Is this just another example of pushing bpf_trace_printk() too
> far?

In a nutshell: yes.

Where I'd like to eventually get is one poll loop, which is the
kprobe_poll/perf one. The trace_pipe based approach should eventually
die, but if we do it right the tools/examples won't notice (we can
migrate trace_fields() to map onto kprobe_poll). Just the underlying
implementation will change. The information in the printk can in theory
be returned over the ring buffer, but there is still some kernel work to
be done. Multiple people have asked for this, so I would place it high
on the priority list with a good chance of it making it into a
near-future kernel.


> Could the callback arguments be extended to be more than "pid, callchain"?

Yes, take a look at what is available in linux/perf_event.h:

enum perf_event_sample_format {
        PERF_SAMPLE_* ...
}

If you have something specific in there that would be useful, we can add
code for it. I don't have time to implement the whole list at the
moment.
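
As a rough illustration of that list, the sketch below mirrors the first few PERF_SAMPLE_* bits from linux/perf_event.h in Python and composes a sample_type mask from them. The bit values follow the kernel header; the SampleFormat class and the describe() helper are hypothetical names used only for this example and are not part of bcc.

```python
from enum import IntFlag


class SampleFormat(IntFlag):
    """Subset of enum perf_event_sample_format from linux/perf_event.h."""
    IP = 1 << 0
    TID = 1 << 1
    TIME = 1 << 2
    ADDR = 1 << 3
    READ = 1 << 4
    CALLCHAIN = 1 << 5
    ID = 1 << 6
    CPU = 1 << 7
    PERIOD = 1 << 8
    STREAM_ID = 1 << 9
    RAW = 1 << 10


def describe(sample_type):
    """Return the names of the fields selected by a sample_type mask."""
    return [flag.name for flag in SampleFormat if sample_type & flag]


# Today's callback gets "pid, callchain"; asking for more would mean
# setting more bits in the mask and decoding the extra fields.
wanted = (SampleFormat.TID | SampleFormat.CALLCHAIN |
          SampleFormat.TIME | SampleFormat.CPU)
print(describe(wanted))
```

Each extra bit adds a field to the sample record, so the callback signature would need to grow in step with the mask.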


> If the "return 1" and callback method is reading the raw perf_event, is
> there a way to read the fmt string? (which is usually set to something
> useful for tracepoints). Could there be a bpf_trace_fmt(), to customize
> such a string for kprobes?

This is the TBD kernel work. I want this too :)




Brendan Gregg
 

On Mon, Oct 5, 2015 at 1:53 PM, Brenden Blanco <bblanco@...> wrote:
> > Could the callback arguments be extended to be more than "pid, callchain"?
>
> Yes, take a look at what is available in linux/perf_event.h:
>
> enum perf_event_sample_format {
>         PERF_SAMPLE_* ...
> }
>
> If you have something specific in there that would be useful, we can add
> code for it. I don't have time to implement the whole list at the
> moment.

I dug through the list and realized that the actual integer I want to
emit in this case -- which is not ctx->ax like my simple example, but
is the thread blocked time -- can already be exposed as
PERF_SAMPLE_PERIOD. I blogged about it previously:
http://www.brendangregg.com/blog/2015-02-26/linux-perf-off-cpu-flame-graph.html

So maybe my particular case can be served with PERF_SAMPLE_PERIOD
support. Although there would need to be a way to set it from
bcc/eBPF.
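
To make that concrete, here is a minimal sketch of what the Python side might do if each sample carried the blocked time as its period. It assumes a hypothetical sample shape of (callchain, period_us) tuples with the callchain listed leaf-first; fold_by_period() and folded_lines() are made-up names, and nothing here is an existing bcc interface.

```python
from collections import defaultdict


def fold_by_period(samples):
    """Sum the period (blocked time) of every sample per unique callchain."""
    totals = defaultdict(int)
    for callchain, period_us in samples:
        totals[tuple(callchain)] += period_us
    return dict(totals)


def folded_lines(totals):
    """Render totals as 'root;...;leaf value' lines, sorted for stable output."""
    return sorted(
        "%s %d" % (";".join(reversed(chain)), total)
        for chain, total in totals.items()
    )


# Hypothetical samples: leaf-first callchains with blocked time in microseconds.
samples = [
    (["schedule", "do_nanosleep", "sys_nanosleep"], 1000),
    (["schedule", "do_nanosleep", "sys_nanosleep"], 500),
    (["schedule", "pipe_wait"], 200),
]
for line in folded_lines(fold_by_period(samples)):
    print(line)
```

Weighting by period rather than counting samples is what turns the output into time spent blocked per stack.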

Alternately, I could roughly associate kprobe_poll() and
trace_fields() given PERF_SAMPLE_TIME, and maybe PERF_SAMPLE_CPU
(along with the PID).
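
A rough sketch of that second option follows. It assumes both streams can be reduced to dicts with hypothetical keys pid, cpu and ts (in seconds), and pairs each callchain event with the nearest printk event on the same PID and CPU. The 1 ms tolerance is arbitrary, and since either stream may drop events the matching is best-effort only.

```python
from bisect import bisect_left
from collections import defaultdict


def correlate(chain_events, printk_events, tolerance=0.001):
    """Pair each callchain event with the nearest printk event that shares
    its (pid, cpu) and lies within `tolerance` seconds; otherwise None."""
    by_key = defaultdict(list)
    for ev in printk_events:
        by_key[(ev["pid"], ev["cpu"])].append(ev)
    for evs in by_key.values():
        evs.sort(key=lambda e: e["ts"])

    pairs = []
    for chain in chain_events:
        candidates = by_key.get((chain["pid"], chain["cpu"]), [])
        times = [e["ts"] for e in candidates]
        i = bisect_left(times, chain["ts"])
        best = None
        # Only the neighbours on either side of the insertion point can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(candidates):
                cand = candidates[j]
                if best is None or (abs(cand["ts"] - chain["ts"])
                                    < abs(best["ts"] - chain["ts"])):
                    best = cand
        if best is not None and abs(best["ts"] - chain["ts"]) > tolerance:
            best = None
        # Note: one printk event may be matched to several callchain events.
        pairs.append((chain, best))
    return pairs
```

A callchain event whose partner was rate limited away simply comes back paired with None, which is the failure mode to expect from this approach.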

Brendan