On Mon, Oct 5, 2015 at 1:53 PM, Brenden Blanco <bblanco@...> wrote:
On Mon, Oct 05, 2015 at 01:32:14PM -0700, Brendan Gregg via iovisor-dev wrote:
G'Day,
I wrote a long script that, as a basic example, tries to do the following
in C:
int trace(struct pt_regs *ctx) {
    bpf_trace_printk("%d", ctx->ax);
    return 1;
}
It then consumes both the printk output and the callchain in Python. But
these currently seem mutually exclusive:
- tests/cc/test_callchain.py uses a callback and a b.kprobe_poll() loop to
fetch the callchain.
- Many other examples use a b.trace_fields() loop to fetch the printk
output.
I haven't found a way to do both at the same time. I'd like the callchain
with the output of printk together.
There can always be a python thread for this, but I don't believe there
would be a way to correlate the two events, since they'll come in at
different times and with different chances of being rate limited.
Thoughts? Is this just another example of pushing bpf_trace_printk() too
far?
In a nutshell: yes.
Where I'd like to eventually get is one poll loop, which is the
kprobe_poll/perf one. The trace_pipe based approach should eventually
die, but if we do it right the tools/examples won't notice (we can
migrate trace_fields() to map onto kprobe_poll). Just the underlying
implementation will change. The information in the printk can in theory
be returned over the ring buffer, but there is still some kernel work to
be done. Multiple people have asked for this, so I would place it high
on the priority list with a good chance of it making it into a
near-future kernel.
Could the callback arguments be extended to be more than "pid, callchain"?
Yes, take a look at what is available in linux/perf_event.h:
enum perf_event_sample_format {
    PERF_SAMPLE_* ...
}
If you have something specific in there that would be useful, we can add
code for it. I don't have time to implement the whole list at the
moment.
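
For reference, the sample fields discussed in this thread are bit flags that get OR'ed together into a single sample_type mask. Here is a rough Python sketch of that. The bit positions are the ones from linux/perf_event.h, but the PerfSample class and the sample_type() helper are made-up names for illustration only and are not part of bcc:

```python
from enum import IntFlag


class PerfSample(IntFlag):
    """Subset of enum perf_event_sample_format (linux/perf_event.h)."""
    IP = 1 << 0
    TID = 1 << 1
    TIME = 1 << 2
    ADDR = 1 << 3
    READ = 1 << 4
    CALLCHAIN = 1 << 5
    ID = 1 << 6
    CPU = 1 << 7
    PERIOD = 1 << 8


def sample_type(*fields):
    """Hypothetical helper: OR the requested fields into one mask."""
    mask = PerfSample(0)
    for field in fields:
        mask |= field
    return mask


# The fields mentioned in this thread: pid/tid, callchain, time, cpu, period.
wanted = sample_type(
    PerfSample.TID,
    PerfSample.CALLCHAIN,
    PerfSample.TIME,
    PerfSample.CPU,
    PerfSample.PERIOD,
)
print(hex(wanted))
```

Whether bcc would expose a mask like this to the callback is an open question here. The sketch only shows how the flags combine.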
I dug through the list and realized that the actual integer I want to
emit in this case -- which is not ctx->ax like my simple example, but
is the thread blocked time -- can already be exposed as
PERF_SAMPLE_PERIOD. I blogged about it previously:
http://www.brendangregg.com/blog/2015-02-26/linux-perf-off-cpu-flame-graph.html
So maybe my particular case can be served with PERF_SAMPLE_PERIOD
support. Although there would need to be a way to set it from
bcc/eBPF.
Alternatively, I could roughly associate kprobe_poll() and
trace_fields() events using PERF_SAMPLE_TIME, and maybe
PERF_SAMPLE_CPU (along with the PID).
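
A minimal sketch of that rough association in plain Python, assuming both event streams have already been collected into lists of dicts with "pid", "cpu", and "ts" keys, and that the timestamps have been normalized to the same clock and unit (nanoseconds here). The correlate() function, the dict layout, and the max_delta_ns tolerance are all hypothetical and not anything bcc provides today:

```python
from bisect import bisect_left
from collections import defaultdict


def correlate(poll_events, printk_events, max_delta_ns=1_000_000):
    """Pair each poll event with the nearest-in-time printk event
    that has the same (pid, cpu), or None if none is close enough."""
    by_key = defaultdict(list)
    for ev in printk_events:
        by_key[(ev["pid"], ev["cpu"])].append(ev)
    for evs in by_key.values():
        evs.sort(key=lambda e: e["ts"])

    pairs = []
    for pe in poll_events:
        cands = by_key.get((pe["pid"], pe["cpu"]), [])
        stamps = [e["ts"] for e in cands]
        i = bisect_left(stamps, pe["ts"])
        best = None
        # Only the neighbours on either side of the insertion point
        # can be the closest match.
        for j in (i - 1, i):
            if 0 <= j < len(cands):
                delta = abs(cands[j]["ts"] - pe["ts"])
                if delta <= max_delta_ns and (best is None or delta < best[0]):
                    best = (delta, cands[j])
        pairs.append((pe, best[1] if best else None))
    return pairs


polls = [
    {"pid": 100, "cpu": 0, "ts": 5_000, "callchain": [0x1, 0x2]},
    {"pid": 200, "cpu": 1, "ts": 9_000_000, "callchain": [0x3]},
]
printks = [
    {"pid": 100, "cpu": 0, "ts": 5_400, "msg": "42"},
    {"pid": 100, "cpu": 0, "ts": 900_000, "msg": "7"},
    {"pid": 200, "cpu": 1, "ts": 1_000, "msg": "13"},
]
matched = correlate(polls, printks)
```

This does not consume matches, so one printk line could pair with two poll events, and a rate-limited (dropped) event on either side just yields None. It is an approximation, not a real join key.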
Brendan