read perf counters


riya
 

Hi,

I'm trying to read perf counters using BPF. However, adding
BPF_PERF_ARRAY reports an error:

bpf: Invalid argument
unrecognized bpf_ld_imm64 inns

Is there an example/sample to read perf counters that I can follow?
The code below is what I'm trying to execute.

Thanks,
Riya

# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_PERF_ARRAY(my_map, 32);

int start_counting(struct pt_regs *ctx) {
    if (!PT_REGS_PARM1(ctx))
        return 0;

    u64 count;
    u32 key = bpf_get_smp_processor_id();

    count = bpf_perf_event_read(&my_map, key);
    bpf_trace_printk("CPU-%d %llu", key, count);

    return 0;
}
"""


riya
 

So I fixed the error above by using "count = my_map.perf_read(key);"
as opposed to "count = bpf_perf_event_read(&my_map, key);". However,
how do I selectively enable counters (e.g. instructions, cache misses,
etc.)?

Thanks,
Riya
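
For reference, a minimal sketch of the corrected program described above, with perf_read called on the map instead of bpf_perf_event_read. The load_program helper and the lazy bcc import are assumptions added for illustration. Loading needs root and a bcc build that supports BPF_PERF_ARRAY, and the kprobe attachment is left out because the thread does not name the probed function.

```python
# Corrected BPF program text: the counter is read through the bcc map
# method my_map.perf_read(key) instead of calling the helper directly
# with a pointer to the map.
bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_PERF_ARRAY(my_map, 32);

int start_counting(struct pt_regs *ctx) {
    if (!PT_REGS_PARM1(ctx))
        return 0;

    u32 key = bpf_get_smp_processor_id();
    u64 count = my_map.perf_read(key);

    bpf_trace_printk("CPU-%d %llu\n", key, count);
    return 0;
}
"""


def load_program(text=bpf_text):
    """Compile and load the program (hypothetical helper; needs bcc and root)."""
    # Imported lazily so the sketch can be read and checked without bcc installed.
    from bcc import BPF
    return BPF(text=text)
```

Note that a counter still has to be opened and stored in my_map from the Python side before perf_read returns anything useful, which is the part that needed bcc support at the time of this thread.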



Brenden Blanco <bblanco@...>
 

This needs support in bcc.

I had a patch lying around that I never finished; you can find the partial support here:
https://github.com/iovisor/bcc/tree/perf-counter

It shouldn't be too hard to finalize that; let me see what I can do.

_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


riya
 

Thanks Brenden!

I will try with your changes. Meanwhile, please let me know if you add
the missing functionality.

On Mon, Jul 25, 2016 at 8:14 PM, Brenden Blanco <bblanco@...> wrote:
This needs support in bcc.

I had a patch laying around that I never finished, you can find the partial
support here:
https://github.com/iovisor/bcc/tree/perf-counter

It shouldn't be too hard to finalize that, let me see what I can do.



riya
 

From your patches I see that perf support is enabled per-cpu. Could
this be extended to enabling all or a group of perf counters on all
CPU cores similar to what perf_event_open provides (with args -1)?



riya
 

I'm testing perf counters on an 8-core machine.

Since BPF_PERF_ARRAY.perf_read(cpu) reads from the local CPU, I'm
aggregating counters across all CPUs by doing:

BPF_PERF_ARRAY(counter, 32);

for (key = 0; key < 8; key++)
    counter.perf_read(key);

However, this reports an error:

bpf: Invalid argument
back-edge from insn 69 to 17

If I loop from 0 to 4, it works. The code below works:

for (key = 0; key < 4; key++)
    counter.perf_read(key);

What could be wrong here?



Brenden Blanco <bblanco@...>
 

On Fri, Jul 29, 2016 at 10:21 AM, riya khanna <riyakhanna1983@...> wrote:
> I'm testing perf counters on a 8-core machine.
>
> since BPF_PERF_ARRAY.perf_read(cpu) reads from local CPU, I'm
> aggregating counters across all cpus by doing:
>
> BPF_PERF_ARRAY(counter, 32);
>
> for (key = 0; key < 8; key++)
>     counter.perf_read(key);

I think it would make more sense to only read the counter on the CPU where the event is taking place. So:

u64 key = cycles.perf_read(bpf_get_smp_processor_id());

And then aggregate counters in userspace.

I have spent some time over the past couple of days cleaning up the code in that private branch, but have been distracted a bit, so I haven't finalized it. Hopefully a PR will come soon.

> However, this reports error:
>
> bpf: Invalid argument
> back-edge from insn 69 to 17
>
> If I loop from 0-4, it works. The code below works:
> for (key = 0; key < 4; key++)
>     counter.perf_read(key);
>
> What could be wrong here?

The kernel verifier won't allow loops (i.e. back edges). Depending on the loop-unroll optimization decision made by LLVM, the shorter loop may have been automatically unrolled, which would explain why it loads. Still, the solution should be to remove the loop and just read the local CPU's perf counter as mentioned above.
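
The userspace aggregation suggested above could be sketched as follows. This is only an illustration under stated assumptions: the function name aggregate_latest is hypothetical, and the (cpu, value) pairs are assumed to have been collected already, for example by parsing the "CPU-%d %llu" trace lines (parsing not shown).

```python
def aggregate_latest(readings):
    """Sum the most recent counter value seen on each CPU.

    `readings` is an iterable of (cpu, value) pairs in arrival order.
    """
    latest = {}
    for cpu, value in readings:
        # A later reading on the same CPU replaces the earlier one.
        latest[cpu] = value
    return sum(latest.values())


if __name__ == "__main__":
    sample = [(0, 10), (1, 5), (0, 12)]
    # CPU 0 ends at 12 and CPU 1 at 5, so the total is 17.
    print(aggregate_latest(sample))
```

Keeping the BPF side to a single local-CPU read avoids the loop entirely, so there is no back edge for the verifier to reject.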



riya
 

Thanks Brenden!

I'm working with your branch for now. Additionally, I'm unable to
create software events (see exception below). Just wanted to bring
this to your attention.

Traceback (most recent call last):
  File "./test_bpf.py", line 176, in <module>
    sw_clock.open_perf_event(1, 0)
  File "/usr/lib/python2.7/dist-packages/bcc/table.py", line 410, in open_perf_event
    fd = self._open_perf_event(typ, config, i)
  File "/usr/lib/python2.7/dist-packages/bcc/table.py", line 416, in _open_perf_event
    self[self.Key(cpu)] = self.Leaf(fd)
  File "/usr/lib/python2.7/dist-packages/bcc/table.py", line 320, in __setitem__
    super(ArrayBase, self).__setitem__(key, leaf)
  File "/usr/lib/python2.7/dist-packages/bcc/table.py", line 169, in __setitem__
    raise Exception("Could not update table")
Exception: Could not update table



riya
 

Hi Brenden,

I saw test_perf_event.py in your branch. It creates and enables perf
counters once at startup. Is it also possible to
enable/disable/reset counters on the fly? Perhaps we need a kernel
patch for this?

Thanks,
Riya



Brenden Blanco <bblanco@...>
 

On Tue, Aug 9, 2016 at 8:54 AM, riya khanna <riyakhanna1983@...> wrote:
> Hi Brenden,
>
> Saw test_perf_event.py in your branch. Its creates and enables per
> counters once during start. Is it also possible to
> enable/disable/reset counters on the fly? Perhaps we need a kernel
> patch for this?

It doesn't "create" counters; it just attaches to the already available counters provided by the hardware or OS. Any type of "reset" infrastructure would adversely impact other users of those same counters (perf). I consider it the job of userspace or the program to compute deltas or other types of history.

> Thanks,
> Riya
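
Computing deltas in userspace, as suggested above, might look like the sketch below. The DeltaTracker class is hypothetical and not part of bcc. It assumes the readings are 64-bit unsigned values (the programs in this thread store them in a u64), and it masks the subtraction because Python integers do not wrap on their own.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


class DeltaTracker:
    """Remember the last reading per CPU and report the change since then."""

    def __init__(self):
        self._last = {}

    def update(self, cpu, value):
        """Record `value` for `cpu` and return the delta from the previous reading.

        The first reading for a CPU has nothing to compare against, so it
        returns 0.
        """
        prev = self._last.get(cpu)
        self._last[cpu] = value
        if prev is None:
            return 0
        # Masking keeps the result correct even if the counter wrapped.
        return (value - prev) & MASK64


if __name__ == "__main__":
    tracker = DeltaTracker()
    tracker.update(0, 100)
    print(tracker.update(0, 150))  # 50 events since the previous reading
```

Because the counter itself is never reset, other users of the same counter are unaffected.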



riya
 

On Tue, Aug 9, 2016 at 12:05 PM, Brenden Blanco <bblanco@...> wrote:
> On Tue, Aug 9, 2016 at 8:54 AM, riya khanna <riyakhanna1983@...>
> wrote:
>>
>> Hi Brenden,
>>
>> Saw test_perf_event.py in your branch. Its creates and enables per
>> counters once during start. Is it also possible to
>> enable/disable/reset counters on the fly? Perhaps we need a kernel
>> patch for this?
>
> It doesn't "create" counters, it just attaches to the already available
> counters provided by the hardware or OS.

Yes, it enables monitoring when attached.

> Any type of "reset" infrastructure
> would adversely impact other users of those same counters (perf). I consider
> it the job of userspace or the program to compute deltas or other types of
> history.

Well, there are limited counters. How can I multiplex from userspace on
the fly (e.g. monitoring one set of events first, followed by a
different set)? Also, is it possible to handle counter overflow?

>> Thanks,
>> Riya



Brenden Blanco <bblanco@...>
 



On Tue, Aug 9, 2016 at 9:16 AM, riya khanna <riyakhanna1983@...> wrote:
> On Tue, Aug 9, 2016 at 12:05 PM, Brenden Blanco <bblanco@...> wrote:
>> On Tue, Aug 9, 2016 at 8:54 AM, riya khanna <riyakhanna1983@...>
>> wrote:
>>>
>>> Hi Brenden,
>>>
>>> Saw test_perf_event.py in your branch. It creates and enables perf
>>> counters once during start. Is it also possible to
>>> enable/disable/reset counters on the fly? Perhaps we need a kernel
>>> patch for this?
>>
>> It doesn't "create" counters, it just attaches to the already available
>> counters provided by the hardware or OS.
>
> Yes, it enables monitoring when attached.
>
>> Any type of "reset" infrastructure
>> would adversely impact other users of those same counters (perf). I consider
>> it the job of userspace or the program to compute deltas or other types of
>> history.
>
> Well, there are limited counters. How to multiplex from userspace on
> the fly (e.g. monitoring a set of events first, followed by a
> different set)?

I would just create a different BPF_PERF_ARRAY for each different one.

> Also, is it possible to handle counter overflow?

What does it mean to "handle"? If computing deltas, for instance, the
subtraction will just underflow and wrap around to the correct value,
assuming the values are both unsigned.
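
To make the wrap-around point concrete, here is a minimal sketch in Python. Python integers do not wrap on their own, so the 64-bit unsigned arithmetic is emulated with a mask. The helper name counter_delta and the sample values are made up for illustration and are not part of bcc.

```python
# Mask that keeps only the low 64 bits, emulating u64 arithmetic.
U64_MASK = (1 << 64) - 1


def counter_delta(prev, curr):
    """Return (curr - prev) as an unsigned 64-bit value.

    If the counter wrapped between the two reads, the masked
    subtraction still yields the number of events in between.
    """
    return (curr - prev) & U64_MASK


# Normal case: the counter simply advanced.
print(counter_delta(100, 150))

# Wrapped case: prev was 10 below 2**64, curr restarted near zero.
print(counter_delta(U64_MASK - 9, 5))
```

In C with u64 operands the mask is unnecessary, since the subtraction already wraps modulo 2**64.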



riya
 

On Tue, Aug 9, 2016 at 12:24 PM, Brenden Blanco <bblanco@...> wrote:


> On Tue, Aug 9, 2016 at 9:16 AM, riya khanna <riyakhanna1983@...>
> wrote:
>>
>> On Tue, Aug 9, 2016 at 12:05 PM, Brenden Blanco <bblanco@...>
>> wrote:
>> > On Tue, Aug 9, 2016 at 8:54 AM, riya khanna <riyakhanna1983@...>
>> > wrote:
>> >>
>> >> Hi Brenden,
>> >>
>> >> Saw test_perf_event.py in your branch. It creates and enables perf
>> >> counters once during start. Is it also possible to
>> >> enable/disable/reset counters on the fly? Perhaps we need a kernel
>> >> patch for this?
>> >
>> > It doesn't "create" counters, it just attaches to the already available
>> > counters provided by the hardware or OS.
>>
>> Yes, it enables monitoring when attached.
>>
>> > Any type of "reset" infrastructure
>> > would adversely impact other users of those same counters (perf). I
>> > consider it the job of userspace or the program to compute deltas or
>> > other types of history.
>>
>> Well, there are limited counters. How to multiplex from userspace on
>> the fly (e.g. monitoring a set of events first, followed by a
>> different set)?
>
> I would just create a different BPF_PERF_ARRAY for each different one.

Yes, but if you create more than the available number of hardware counters
(i.e. try to monitor more events concurrently than the hardware allows),
the counter value is reported as '0'. I verified this with a test
program (perf.c, attached) that uses the perf_event_open() syscall.
Depending on the number of counters available on your platform,
change NUM_REQ_HW_CNTRS to verify the behavior.

A way to enable/disable events at runtime would help userspace
multiplex over the available hardware counters and monitor more
events (similar to the perf stat tool).

>> Also, is it possible to handle counter overflow?
>
> What does it mean to "handle"? If computing deltas, for instance, the
> subtraction will just underflow and wrap around to the correct value,
> assuming the values are both unsigned.
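
For context on the perf stat comparison: when events are time-multiplexed over too few hardware counters, a raw count covers only the time the event was actually scheduled. perf_event_open() can report time_enabled and time_running alongside the value (read_format flags PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING), and the usual estimate scales the raw count by their ratio. The sketch below shows only that arithmetic; scale_count and the sample numbers are hypothetical and are not taken from perf.c or bcc.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate the count over the whole enabled interval.

    raw          -- counter value read from the event
    time_enabled -- time the event was enabled
    time_running -- time the event was actually on a hardware counter

    Returns None when the event never ran, because no estimate is possible.
    """
    if time_running == 0:
        return None
    return raw * time_enabled // time_running


# Event ran for half of the enabled time, so the estimate doubles the raw count.
print(scale_count(1000, 200, 100))

# Event was never scheduled onto a counter.
print(scale_count(0, 200, 0))
```

An event that never got a counter shows a raw value of 0 with time_running equal to 0, which may be what the '0' readings above correspond to, though that is an assumption about perf.c rather than something confirmed in this thread.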






Brenden Blanco <bblanco@...>
 



On Tue, Aug 9, 2016 at 9:59 AM, riya khanna <riyakhanna1983@...> wrote:
> On Tue, Aug 9, 2016 at 12:24 PM, Brenden Blanco <bblanco@...> wrote:
>>
>> On Tue, Aug 9, 2016 at 9:16 AM, riya khanna <riyakhanna1983@...>
>> wrote:
>>>
>>> On Tue, Aug 9, 2016 at 12:05 PM, Brenden Blanco <bblanco@...>
>>> wrote:
>>> > On Tue, Aug 9, 2016 at 8:54 AM, riya khanna <riyakhanna1983@...>
>>> > wrote:
>>> >>
>>> >> Hi Brenden,
>>> >>
>>> >> Saw test_perf_event.py in your branch. It creates and enables perf
>>> >> counters once during start. Is it also possible to
>>> >> enable/disable/reset counters on the fly? Perhaps we need a kernel
>>> >> patch for this?
>>> >
>>> > It doesn't "create" counters, it just attaches to the already available
>>> > counters provided by the hardware or OS.
>>>
>>> Yes, it enables monitoring when attached.
>>>
>>> > Any type of "reset" infrastructure
>>> > would adversely impact other users of those same counters (perf). I
>>> > consider it the job of userspace or the program to compute deltas or
>>> > other types of history.
>>>
>>> Well, there are limited counters. How to multiplex from userspace on
>>> the fly (e.g. monitoring a set of events first, followed by a
>>> different set)?
>>
>> I would just create a different BPF_PERF_ARRAY for each different one.
>
> Yes, but if you create more than the available number of hardware counters
> (i.e. try to monitor more events concurrently than the hardware allows),
> the counter value is reported as '0'. I verified this with a test
> program (perf.c, attached) that uses the perf_event_open() syscall.
> Depending on the number of counters available on your platform,
> change NUM_REQ_HW_CNTRS to verify the behavior.
>
> A way to enable/disable events at runtime would help userspace
> multiplex over the available hardware counters and monitor more
> events (similar to the perf stat tool).

If you just call table.open_perf_event(NEW_TYPE) multiple times, it should
bind a new counter to the same table entry, allowing you to change the
monitored event over time. Have you tried that?



riya
 

How do I do that at runtime conditionally (e.g. upon execution of
particular kernel/user code)?

Maybe this example will help present my case:

I'm trying to monitor different sets of events (e.g. set 1 consists of
CPU cycles, cache misses, and cache refs, and set 2 consists of
instruction count, branch refs, and branch misses) at different times
(e.g. set 1 should be enabled upon execution of a kernel function, i.e.
a kprobe, and set 2 should be enabled for a particular user function,
i.e. a uprobe).

I can call table.open_perf_event() for each event I'm trying to
monitor. This will call ioctl(PERF_EVENT_IOC_ENABLE) for each event.
However, if the number of events exceeds the number of available
counters, there will be a problem, as demonstrated by my perf.c example
(attached to my last email).

Let me know if I'm missing something.
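
For reference, one reading of the open_perf_event() suggestion applied to the two sets above is sketched here in Python. The call shape open_perf_event(typ, config) is taken from the traceback earlier in the thread (sw_clock.open_perf_event(1, 0)); the numeric constants are the values from linux/perf_event.h. Everything else is hypothetical: activate, EVENT_SETS and FakeTable are made-up names, and FakeTable only stands in for a BPF_PERF_ARRAY table so the sketch runs without bcc. Whether rebinding releases the previously bound hardware counter is an assumption not confirmed in this thread.

```python
# Values from linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
PERF_COUNT_HW_CACHE_REFERENCES = 2
PERF_COUNT_HW_CACHE_MISSES = 3
PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4
PERF_COUNT_HW_BRANCH_MISSES = 5

# The two event sets described in the message above.
EVENT_SETS = {
    "set1": [PERF_COUNT_HW_CPU_CYCLES,
             PERF_COUNT_HW_CACHE_MISSES,
             PERF_COUNT_HW_CACHE_REFERENCES],
    "set2": [PERF_COUNT_HW_INSTRUCTIONS,
             PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
             PERF_COUNT_HW_BRANCH_MISSES],
}


def activate(tables, set_name):
    """Rebind each table to one event of the chosen set.

    Only len(tables) events are bound at any time, so the number of
    tables should not exceed the number of hardware counters.
    """
    configs = EVENT_SETS[set_name]
    if len(configs) > len(tables):
        raise ValueError("more events in the set than tables to hold them")
    for table, config in zip(tables, configs):
        table.open_perf_event(PERF_TYPE_HARDWARE, config)


class FakeTable:
    """Stand-in for a bcc perf array table; records the last binding."""

    def __init__(self):
        self.bound = None

    def open_perf_event(self, typ, config):
        self.bound = (typ, config)


tables = [FakeTable() for _ in range(3)]
activate(tables, "set1")   # e.g. while the kprobe is of interest
activate(tables, "set2")   # later, switch to the uprobe's set
print([t.bound for t in tables])
```

This switches sets from userspace only. It does not answer the question of triggering the switch from inside a kprobe or uprobe handler.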

On Wed, Aug 10, 2016 at 7:36 PM, Brenden Blanco <bblanco@...> wrote:


> If you just call table.open_perf_event(NEW_TYPE) multiple times, it should
> bind a new counter to the same table entry, allowing you to change the
> monitored event over time. Have you tried that?


