Polling multiple BPF_MAP_TYPE_PERF_EVENT_ARRAY maps causes dropped events


Ian
 

The project I am working on generically loads BPF object files, pins their respective maps, and then uses perf_buffer__poll from libbpf to poll the maps. After loading and setting everything else up, I currently poll the multiple maps like this:

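        while (true) {
            /* Visit every loaded event and poll its perf buffer in turn. */
            LIST_FOREACH(evt, &event_head, list) {
                if (evt->map_loaded == 1) {
                    /* Waits up to 10 ms on this buffer if it has no data. */
                    err = perf_buffer__poll(evt->pb, 10);
                    if (err < 0) {
                        /* Leaves LIST_FOREACH only, not while (true). */
                        break;
                    }
                }
            }
        }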

Where an evt is a structure that looks like:

struct evt_struct {
    char * map_name;
    FILE * fp;
    int map_loaded;
    ...<some elements removed for clarity>...
    struct perf_buffer * pb;
    LIST_ENTRY(evt_struct) list;
};

Essentially, each event (evt) in this program corresponds to a BPF program. I am looping through the events and calling perf_buffer__poll for each of them. This doesn't seem efficient: looping through the events beforehand makes the epoll_wait that perf_buffer__poll calls lose any of its efficiency. Inside perf_buffer__poll, epoll is used to poll each CPU. Is there a more efficient way to poll multiple maps like this? Does it involve dropping perf? I don't like that I have to make a separate epoll context for each BPF program I am going to poll, one that just checks the CPUs. It would be better if I had just two sets for epoll to monitor, but then I would lose the built-in perf functionality.

More important than efficiency, my current polling implementation drops a significant number of events (i.e., the lost-event callback in the perf options is called). This is the issue that really must be fixed. I have some ideas that might be worth trying, but I wanted to gather more information before doing any substantial refactoring:

1) I was thinking about dropping perf and using another BPF map type (hash, array) to pass elements back to user space, then using a standard epoll context to monitor all the map FDs. I wouldn't lose any events that way (or if I did, I would never know). But I have read in various books that perf maps are the ideal way to send data to user space...

2) Do perf maps or their buffer pages (for the mmap ring buffer) get cleaned up automatically? When do analyzed entries get removed? I tried increasing the page count of my perf buffer, and it just took longer before I started getting lost events, which almost suggests I am leaking memory. Am I using perf incorrectly? Each perf buffer is created by:

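pb_opts.sample_cb = handle_events;     // called for every sample read from the buffer
pb_opts.lost_cb = handle_lost_events;  // called when the kernel reports lost samples
// map_fd is received from a bpf_object_get call; 16 is the page count of each per-CPU buffer
evt->pb = perf_buffer__new(map_fd, 16, &pb_opts);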
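
For context on question 2: as far as I understand the perf mmap ring, there is no separate cleanup step. The kernel advances a head position as it writes and the reader advances a tail position after consuming records, which is what frees the space. The toy model below is only a sketch of that head/tail accounting (it is not kernel or libbpf code, and the names ring, ring_produce, ring_consume and ticks_until_loss are made up for illustration). It shows why a larger buffer only delays the first lost event when the reader drains more slowly than the producer fills.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a ring with a producer head and a consumer tail.
 * Hypothetical names; this is not the kernel's data structure. */
struct ring {
    size_t cap;   /* capacity in records */
    size_t head;  /* total records written by the producer */
    size_t tail;  /* total records consumed by the reader */
    size_t lost;  /* records dropped because the ring was full */
};

/* Producer side: drop the record (and count it) when the ring is full. */
int ring_produce(struct ring *r)
{
    if (r->head - r->tail == r->cap) {
        r->lost++;
        return -1;
    }
    r->head++;
    return 0;
}

/* Consumer side: advancing tail is what makes space available again. */
size_t ring_consume(struct ring *r, size_t max)
{
    size_t n = r->head - r->tail;

    if (n > max)
        n = max;
    r->tail += n;
    return n;
}

/* Each tick the producer writes `in` records, then the reader drains up
 * to `out`. Returns the 0-based tick of the first loss, or `ticks` if
 * nothing was lost within the simulated period. */
size_t ticks_until_loss(size_t cap, size_t in, size_t out, size_t ticks)
{
    struct ring r = { .cap = cap };

    assert(cap > 0);
    for (size_t t = 0; t < ticks; t++) {
        for (size_t i = 0; i < in; i++)
            ring_produce(&r);
        if (r.lost > 0)
            return t;
        ring_consume(&r, out);
    }
    return ticks;
}
```

With 3 records in and 2 out per tick, a 16-record ring first drops at tick 14 and a 64-record ring at tick 62. With 2 in and 2 out, neither ever drops. That matches the symptom of losses merely arriving later after enlarging the buffer, and points at the drain rate rather than at a leak.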

Any help or advice would be appreciated!
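
To make the "single epoll set" idea from above concrete, here is a minimal sketch of nesting epoll sets: one outer epoll FD watches several inner epoll FDs, so a single epoll_wait reports which source has data. Everything here is a stand-in. struct source, source_init, outer_add and wait_ready are hypothetical names, and an eventfd plays the role of the per-CPU perf FDs. I am assuming that newer libbpf versions expose the inner FD via perf_buffer__epoll_fd() and allow draining without waiting via perf_buffer__consume(); whether those are available depends on the libbpf version in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for one perf buffer: an inner epoll set that
 * watches one eventfd (playing the role of the per-CPU perf FDs). */
struct source {
    int evfd;  /* stand-in for a per-CPU perf event FD */
    int epfd;  /* inner epoll set, one per "perf buffer" */
};

/* Create the eventfd and the inner epoll set that watches it. */
int source_init(struct source *s)
{
    struct epoll_event ev = { .events = EPOLLIN };

    assert(s != NULL);
    s->evfd = eventfd(0, EFD_NONBLOCK);
    if (s->evfd < 0)
        return -1;
    s->epfd = epoll_create1(0);
    if (s->epfd < 0) {
        close(s->evfd);
        return -1;
    }
    ev.data.fd = s->evfd;
    if (epoll_ctl(s->epfd, EPOLL_CTL_ADD, s->evfd, &ev) < 0) {
        close(s->epfd);
        close(s->evfd);
        return -1;
    }
    return 0;
}

/* Register a source's inner epoll FD with the outer set, tagged with
 * an index so the caller can tell which source became ready. */
int outer_add(int outer, uint32_t idx, const struct source *s)
{
    struct epoll_event ev = { .events = EPOLLIN };

    ev.data.u32 = idx;
    return epoll_ctl(outer, EPOLL_CTL_ADD, s->epfd, &ev);
}

/* Wait once on the outer set. Returns the index of one ready source,
 * or -1 on timeout or error. */
int wait_ready(int outer, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(outer, &ev, 1, timeout_ms);

    return n == 1 ? (int)ev.data.u32 : -1;
}
```

In the real program the loop would presumably call wait_ready() once per iteration and then drain only the perf buffer whose index came back, instead of spending up to 10 ms in perf_buffer__poll on every idle buffer while a busy one fills up.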

- Ian
 
