Re: Polling multiple BPF_MAP_TYPE_PERF_EVENT_ARRAY causing dropped events
Andrii Nakryiko
On Wed, Aug 12, 2020 at 5:38 AM Ian <icampbe14@...> wrote:
No, perf buffer is just fine to pass data from the BPF program in the
kernel to the user-space part for post-processing. It's hard to give
you any definitive answer; it all depends. But think about this. Perf
buffer is a queue. Let's say that your per-CPU buffer size is 1MB and
each of your samples is, say, 1KB. What does that mean? It means that
at any given time you can have at most 1024 samples enqueued. So, if
your BPF program in the kernel generates those 1024 samples faster
than the user-space side consumes them, then you'll have drops. So you
have many ways to reduce drops:

1. Generate events at a lower rate. E.g., add sampling, filter out
   events you don't need, etc. This will give the user-space side time
   to consume.

2. Speed up user-space. Many things can influence this. You can do
   less work per item. You can ensure you start reacting to items
   sooner by increasing the priority of your consumer thread and/or
   pinning it to a dedicated CPU, etc.

3. Reduce the size of the event. If you can reduce the sample size
   from 1KB to 512B by more effective data encoding or by dropping
   unnecessary data, you will suddenly be able to produce up to 2048
   events before running out of space. That will give your user-space
   side more time to consume the data.

4. Increase the per-CPU buffer size. Going from 1MB to 2MB will have
   the same effect as reducing the sample size from 1KB to 512B:
   again, it increases the capacity of your buffer and thus gives the
   consumer more time to consume the data.

Hope that makes sense and helps show that I can't answer your
questions for you; you'll need to do the analysis on your own based on
your specific implementation and problem domain.

> Some of the event loss might also be attributed to the
> inefficiencies of my looping mechanism. Although I think the
> feedback loop might be the bigger culprit. I am thinking about
> following the Sysdig approach, which is to have a single perf buffer
> that is used by all my BPF programs (16 in total). This would remove
> the loop and eliminate all but 1 perf buffer. I would think that
> would be more efficient because I am removing 15 perf buffers and
> their epoll_waits. Then I would use an ID member of each passed data
> structure to properly read the data.

Yes, that would be a good approach. It's better to have a single 16x
bigger perf buffer shared across all BPF programs than 16 separate
smaller perf buffers, because you can absorb event spikes more
effectively (see the sketch at the bottom of this email).

One way I can help you, if you do need to have multiple
PERF_EVENT_ARRAY maps that you need to consume, is to add perf_buffer
APIs similar to ring_buffer's that would allow epolling all of them
simultaneously. Let me know if you are interested. That would
effectively eliminate your outer loop (LIST_FOREACH(evt, &event_head,
list)); you'd just be doing while (true) perf_buffer__poll() across
all perf buffers simultaneously. But a single perf_buffer allows you
to do the same, effectively.
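To make the shared-buffer idea concrete, here is a minimal sketch of
the BPF side. All names (event_hdr, EVENT_EXEC, exec_event,
handle_exec) are made up for illustration, and it assumes BTF-style
map definitions from bpf_helpers.h. Each of your 16 programs would
fill in its own payload but submit through the same map:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Common header at the start of every sample; 'type' is the ID member
 * that tells user space which program produced the sample
 * (hypothetical layout). */
struct event_hdr {
        __u32 type;
};

enum { EVENT_EXEC = 1, EVENT_OPEN = 2 /* ..., one per program */ };

/* Hypothetical payload for one of the programs. */
struct exec_event {
        struct event_hdr hdr;
        __u32 pid;
        char comm[16];
};

/* The single shared perf event array, used by all 16 programs. */
struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("tracepoint/sched/sched_process_exec")
int handle_exec(void *ctx)
{
        struct exec_event e = {};

        e.hdr.type = EVENT_EXEC;
        e.pid = bpf_get_current_pid_tgid() >> 32;
        bpf_get_current_comm(&e.comm, sizeof(e.comm));

        /* every program submits into the same 'events' map */
        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                              &e, sizeof(e));
        return 0;
}

char LICENSE[] SEC("license") = "GPL";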
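And the user-space side collapses to a single poll loop. Again just a
sketch: handle_sample/handle_lost/consume_events are made-up names,
the 256-pages-per-CPU sizing is arbitrary, and it assumes the
perf_buffer__new() variant that takes callbacks via struct
perf_buffer_opts:

#include <stdio.h>
#include <bpf/libbpf.h>

/* must match the header the BPF side writes */
struct event_hdr {
        __u32 type;
};

enum { EVENT_EXEC = 1, EVENT_OPEN = 2 /* ... */ };

static void handle_sample(void *ctx, int cpu, void *data, __u32 size)
{
        const struct event_hdr *hdr = data;

        /* dispatch on the ID instead of on which perf buffer fired */
        switch (hdr->type) {
        case EVENT_EXEC:
                /* cast to the exec-specific struct and process it */
                break;
        case EVENT_OPEN:
                /* ... */
                break;
        default:
                break;
        }
}

static void handle_lost(void *ctx, int cpu, __u64 cnt)
{
        /* this is where you see drops, per CPU */
        fprintf(stderr, "lost %llu samples on CPU %d\n", cnt, cpu);
}

/* map_fd is the fd of the single shared PERF_EVENT_ARRAY map */
int consume_events(int map_fd)
{
        struct perf_buffer_opts opts = {
                .sample_cb = handle_sample,
                .lost_cb = handle_lost,
        };
        struct perf_buffer *pb;
        int err;

        /* 256 pages/CPU == 1MB with 4KB pages; size for your event rate */
        pb = perf_buffer__new(map_fd, 256, &opts);
        err = libbpf_get_error(pb);
        if (err)
                return err;

        /* the whole consumer: one poll loop, no per-program epoll_waits */
        while ((err = perf_buffer__poll(pb, 100 /* ms */)) >= 0)
                ;

        perf_buffer__free(pb);
        return err;
}

With a layout like this, the per-CPU sizing trade-offs from points 3
and 4 above apply to the one shared buffer, and a burst from any single
program can use headroom that the quieter programs leave free.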