Why is BPF_PERF_OUTPUT max_entries set to total processor count?


Hayden Livingston
 

I'm very confused about why BCC sizes the perf_events output map to
the number of processors.

I can imagine it being 1, since all it does is act as an intermediary
between kernel and user mode, assuming the code cannot be preempted.

Or, if it can be preempted, then I can imagine that the processor count
is the maximum depth one has to worry about, since there can't be more
concurrent writers than processors.

Is my thinking flawed? Or maybe there is a completely different reason?


Yonghong Song
 

The PERF_EVENT_OUTPUT map holds the per-CPU ring buffers created by
perf_event_open.
That is why its typical size is the number of CPUs on the host.
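
A rough sketch of the user-space side of this, for illustration only: one perf event is opened per CPU, and each one gets its own ring buffer. The helper names (init_bpf_output_attr, open_bpf_output_event, configured_cpu_count) are hypothetical, the attribute values follow common BPF sample code rather than BCC's exact settings, and the open call needs sufficient privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: attributes for a software event that carries
 * raw samples emitted by a BPF program. */
void init_bpf_output_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_BPF_OUTPUT;
    attr->sample_type = PERF_SAMPLE_RAW;
    attr->sample_period = 1;
    attr->wakeup_events = 1;
}

/* Hypothetical helper: open the event bound to one CPU.
 * Returns a file descriptor, or -1 with errno set. */
int open_bpf_output_event(int cpu)
{
    struct perf_event_attr attr;

    assert(cpu >= 0);
    init_bpf_output_attr(&attr);
    /* pid = -1 with cpu >= 0 means "all tasks, this CPU only". */
    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1,
                        PERF_FLAG_FD_CLOEXEC);
}

/* Configured CPU count, used here as an approximation of the number
 * of possible CPUs. */
long configured_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}
```

A caller would loop over CPU indices 0 through configured_cpu_count() - 1, call open_bpf_output_event for each, mmap each returned fd to reach its ring buffer, and store the fd in the map slot for that CPU.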

In reply to Hayden Livingston <halivingston@...>, Sun, Feb 16, 2020 at 1:52 AM.



Hayden Livingston
 

Thanks. I had to re-read your reply and the kernel code multiple
times, but I think I get it now. Please confirm.

This call is made by user-mode code:

fd = bpf_create_map(BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                    /*key_size*/ sizeof(int),
                    /*value_size*/ sizeof(int),
                    /*max_entries*/ NUM_POSSIBLE_CPUS,
                    /*map_flags*/ 0);

The key is the CPU index (smp_processor_id) and the value is a
perf_events fd. This is why both the key and the value of the map are
integers.
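
To make the key/value layout concrete, here is a minimal sketch that builds the arguments for the raw bpf(2) syscall instead of going through a library wrapper. The helper names (fill_create_attr, fill_update_attr, sys_bpf) are hypothetical and error handling is left out; this only illustrates the "CPU index to perf fd" mapping and is not how BCC itself is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: describe a perf event array with one slot per CPU. */
void fill_create_attr(union bpf_attr *attr, unsigned int ncpus)
{
    assert(ncpus > 0);
    memset(attr, 0, sizeof(*attr));
    attr->map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
    attr->key_size = sizeof(int);   /* key: CPU index */
    attr->value_size = sizeof(int); /* value: perf event fd */
    attr->max_entries = ncpus;
}

/* Hypothetical helper: describe "map[*cpu] = *perf_fd". Both pointers
 * must stay valid until the syscall has returned. */
void fill_update_attr(union bpf_attr *attr, int map_fd,
                      const int *cpu, const int *perf_fd)
{
    memset(attr, 0, sizeof(*attr));
    attr->map_fd = (uint32_t)map_fd;
    attr->key = (uint64_t)(uintptr_t)cpu;
    attr->value = (uint64_t)(uintptr_t)perf_fd;
    attr->flags = BPF_ANY;
}

/* Thin wrapper around bpf(2); returns -1 with errno set on failure. */
long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(SYS_bpf, cmd, attr, (unsigned int)sizeof(*attr));
}
```

With these, sys_bpf(BPF_MAP_CREATE, &attr) would return the map fd, and one sys_bpf(BPF_MAP_UPDATE_ELEM, &attr) call per CPU would install that CPU's perf event fd in its slot.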

Why so many indirections? Is it to support pinning, where a user program
can use different ring buffers?

To me it seems the kernel code that uses the CPU index to look into the
array could just be told the fd directly.

In reply to Y Song <ys114321@...>, Sun, Feb 16, 2020 at 1:50 PM.



Yonghong Song
 

On Sun, Feb 16, 2020 at 5:09 PM Hayden Livingston
<halivingston@...> wrote:

> Why so many indirections? Is it to support pinning, where a user program
> can use different ring buffers?

The perf event ring buffer is per-CPU.

> To me it seems the kernel code that uses the CPU index to look into the
> array could just be told the fd directly.

Yes, that is what the kernel does. Each array element holds one ring buffer.
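
The lookup can be modelled in plain user-space C, which also shows why a single-entry map is not enough. This is a simplified model, not kernel code: every name below is hypothetical, MODEL_CURRENT_CPU stands in for BPF_F_CURRENT_CPU, and the error codes follow my reading of the kernel helper and may differ between versions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_CURRENT_CPU 0xffffffffULL /* stand-in for BPF_F_CURRENT_CPU */
#define MODEL_MAX_CPUS 8
#define MODEL_RING_SLOTS 4

struct model_ring {
    int open;           /* has user space installed an event here? */
    unsigned int count; /* records written so far */
    int records[MODEL_RING_SLOTS];
};

struct model_perf_array {
    unsigned int max_entries;
    struct model_ring rings[MODEL_MAX_CPUS];
};

/* Create a map with max_entries slots; slot i is owned by CPU i. */
void model_init(struct model_perf_array *map, unsigned int max_entries)
{
    unsigned int i;

    assert(max_entries <= MODEL_MAX_CPUS);
    memset(map, 0, sizeof(*map));
    map->max_entries = max_entries;
    for (i = 0; i < max_entries; i++)
        map->rings[i].open = 1;
}

/* Resolve the index, bounds-check it, then append to that CPU's ring. */
int model_output(struct model_perf_array *map, unsigned int running_cpu,
                 unsigned long long index, int record)
{
    struct model_ring *ring;

    if (index == MODEL_CURRENT_CPU)
        index = running_cpu;
    if (index >= map->max_entries)
        return -E2BIG;      /* the map has no slot for this CPU */
    ring = &map->rings[index];
    if (!ring->open)
        return -ENOENT;     /* slot exists but holds no event */
    if (index != running_cpu)
        return -EOPNOTSUPP; /* the event belongs to another CPU */
    if (ring->count == MODEL_RING_SLOTS)
        return -ENOSPC;     /* model only: the ring is full */
    ring->records[ring->count++] = record;
    return 0;
}
```

In this model, a map created with model_init(&map, 1) accepts output only from CPU 0, and a program running on CPU 3 gets -E2BIG. Sizing the map to the CPU count gives every CPU its own slot.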

