High volume bpf_perf_output tracing


I'm currently working on a Python script to trace the NVMe driver. I'm hitting a performance bottleneck in the Python event callback and am looking for the best way (or maybe a quick and dirty way) to improve performance.

Currently I'm attaching to a kprobe and 2 tracepoints and using perf_submit to pass information back to userspace.

When my callback is:
def count_only(cpu, data, size):
    global event_count  # module-level counter
    event_count += 1

My throughput is ~2,000,000 events per second.

When my callback is my full event processing the throughput drops to ~40,000 events per second.

My first idea was to put the event_data in a Queue and have multiple worker processes handle the parsing. Unfortunately the bcc.Table classes aren't picklable. As soon as I start parsing the data into something I could put in the queue, throughput drops to 150k events per second without even touching the Queue, just from converting data types.

My next idea was to just store the data in memory and process it after the fact (for this use case, I effectively have "unlimited" memory for the trace). This ranges from 100k to 450k events per second. (I think Python has issues allocating memory quickly with list.append(), and with tuning I should be able to get 450k sustained.) This isn't terrible, but I'd like to be above 1,000,000 events per second.
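For reference, the store-now-parse-later approach looks roughly like the sketch below. The event layout (EVENT_FMT, a u64 timestamp plus two u32 fields) and the names store_raw and parse_all are made up for illustration and are not my actual struct. The callback only copies the raw bytes with ctypes.string_at, and all decoding happens after the trace stops. The demo at the bottom fakes one event with a ctypes buffer so it runs without bcc.

```python
import ctypes
import struct

# Hypothetical event layout: u64 timestamp, u32 pid, u32 length.
EVENT_FMT = "<QII"
EVENT_SIZE = struct.calcsize(EVENT_FMT)

raw_events = []
_append = raw_events.append  # bind once to skip the attribute lookup per event


def store_raw(cpu, data, size):
    """Perf buffer callback: copy the raw bytes and defer all parsing."""
    _append(ctypes.string_at(data, size))


def parse_all(events):
    """Post-processing step, run after tracing has finished."""
    return [struct.unpack_from(EVENT_FMT, raw) for raw in events]


if __name__ == "__main__":
    # Simulate one event without bcc by handing the callback a buffer address.
    payload = struct.pack(EVENT_FMT, 123456789, 42, 4096)
    buf = (ctypes.c_char * len(payload)).from_buffer_copy(payload)
    store_raw(0, ctypes.addressof(buf), len(payload))
    print(parse_all(raw_events))
```

The stored items are plain bytes objects, so unlike the bcc.Table classes they can be pickled if they need to go to another process later.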

My next idea was to see if I can attach multiple reader processes to the same BPF map. This is where I hit the wall and came here. It looks like there isn't a way to do this with the Python API; at least not easily.

With that context, I have 2 questions:
  1. Is there a way I can attach multiple Python processes to the same BPF map to poll in parallel? Event ordering doesn't matter; I'll post-process it all anyway. This doesn't need to be a final solution, just something to get me through the next month.
  2. What is the "right" way to do this? My primary concern is increasing the rate at which I can move data from the BPF_PERF_OUTPUT map to userspace. It looks like the Python API is being deprecated in favor of libbpf. So I'm assuming a C++ version of this script would be the "right" way? (I've never touched C/C++ outside the BPF C code so this would need to be a future project for me)

