
Re: Overly brief stack traces for Java/linux ?

Yonghong Song
 

On Mon, Apr 5, 2021 at 10:08 PM Bradley Schatz
<bradley@...> wrote:

Thanks for the suggestion. I found a tunable to keep the JNI shared library in memory after loading. As you can see below, it is no longer showing as deleted.

13238272 bytes in 404 allocations from stack
[unknown] [jna2576903844543447777.tmp]
[unknown] [perf-18047.map]

I have no experience with perf-map-agent, so the following is only a guess:
[perf-18047.map] is the file used to map addresses to symbols.
What does '[unknown] [perf-18047.map]' mean? Does it mean that
perf-18047.map was not found? If the perf-<pid>.map file cannot be found,
symbolization won't be possible. Could you double-check this?


No improvement in granularity though.

In the VM I'm using -XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints. In perf-map-agent, I'm using "unfoldall".

Any other suggestions?

Thanks!




On 3/4/21, 2:42 am, "Y Song" <ys114321@...> wrote:

On Wed, Mar 31, 2021 at 11:25 PM Bradley Schatz
<bradley@...> wrote:
>
> Hi,
>
>
>
> I’m just starting to come to grips with bcc & perf-map-agent for introspecting java on linux, with the goal of identifying what appears to be an off-heap memory leak (using memleak).
>
>
>
> I appear to be getting reliable stack decoding for jvm library code and for jit’ed java methods (see below for an example of the former). However I am seeing some very short stack traces which don’t seem to decode (the latter three stacks of below).
>
>
>
> It’s looking to me like the frame starting with “jna…” is likely the native JNI shared library for the FFI library “JNA”.
>
>
>
> Any suggestions as to why these latter three are so brief and/or how I can increase the resolution?

I can see the file has been marked as deleted.

34603008 bytes in 33 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

96468992 bytes in 92 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

So the file has been removed in user space, and current bcc won't be
able to parse it because it takes the file name to be
"jna9005484735610534564.tmp (deleted)".
The file name is actually taken from /proc/<pid>/maps.

I am not sure whether you can hack bcc to parse "jna9005484735610534564.tmp"
instead. But I would consider that unsafe: the original file's data may
exist only in the kernel, which still holds a reference to it. From user
space, the file is either gone or could have been replaced by something
else. So the safest approach is to symbolize before the file is removed,
or to keep the tmp file around a little longer.
>
>
>
> Apologies if this is the wrong place for such a question. Thank you for your help.
>
>
>
> Kind regards,
>
> Bradley
>
>
>
>
>
>
>
>
>
> 119408 bytes in 71 allocations from stack
>
> os::malloc(unsigned long, MemoryType, NativeCallStack const&)+0xb5 [libjvm.so]
>
> CodeBlob::set_oop_maps(OopMapSet*) [clone .part.5]+0x75 [libjvm.so]
>
> CodeBlob::CodeBlob(char const*, CodeBuffer*, int, int, int, int, OopMapSet*)+0xe3 [libjvm.so]
>
> nmethod::nmethod(Method*, int, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x4d [libjvm.so]
>
> nmethod::new_nmethod(methodHandle, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x219 [libjvm.so]
>
> ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, bool, bool, RTMState)+0x1b1 [libjvm.so]
>
> Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xe60 [libjvm.so]
>
> C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0xa3 [libjvm.so]
>
> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x808 [libjvm.so]
>
> CompileBroker::compiler_thread_loop()+0x6d8 [libjvm.so]
>
> JavaThread::thread_main_inner()+0x1c7 [libjvm.so]
>
> JavaThread::run()+0x2fa [libjvm.so]
>
> java_start(Thread*)+0x102 [libjvm.so]
>
> start_thread+0xf3 [libpthread-2.28.so]
>
> 34603008 bytes in 33 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
> 96468992 bytes in 92 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
> 295698432 bytes in 282 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
>
>
>


LPC 2021 Networking and BPF Track CFP

Daniel Borkmann
 

We are pleased to announce the Call for Proposals (CFP) for the Networking and
BPF track at the 2021 edition of the Linux Plumbers Conference (LPC), which is
planned to be held in Dublin, Ireland, on September 27th - 29th, 2021.

Note that if an in-person conference should prove to be impossible due to the
circumstances at that time, Linux Plumbers will switch to a virtual-only
conference. CFP submitters should ideally be able to give their presentation
in person, if circumstances permit, although presenting remotely will always
be possible.

This year's Networking and BPF track technical committee is comprised of:

David S. Miller <davem@...>
Jakub Kicinski <kuba@...>
Eric Dumazet <edumazet@...>
Alexei Starovoitov <ast@...>
Daniel Borkmann <daniel@...>
Andrii Nakryiko <andrii@...>

We are seeking proposals of 40 minutes in length (including Q&A discussion),
optionally accompanied by papers of 2 to 10 pages in length.

Any kind of advanced Linux networking and/or BPF related topic will be considered.

Please submit your proposals through the official LPC website at:

https://linuxplumbersconf.org/event/11/abstracts/

Make sure to select "Networking & BPF Summit" in the Track pull-down menu.

Proposals must be submitted by August 13th, and submitters will be notified of
acceptance by August 16th.

Final slides and papers (as PDF) are due on the first day of the conference.


Re: Overly brief stack traces for Java/linux ?

Bradley Schatz
 

Thanks for the suggestion. I found a tunable to keep the JNI shared library in memory after loading. As you can see below, it is no longer showing as deleted.

13238272 bytes in 404 allocations from stack
[unknown] [jna2576903844543447777.tmp]
[unknown] [perf-18047.map]

No improvement in granularity though.

In the VM I'm using -XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints. In perf-map-agent, I'm using "unfoldall".

Any other suggestions?

Thanks!




On 3/4/21, 2:42 am, "Y Song" <ys114321@...> wrote:

On Wed, Mar 31, 2021 at 11:25 PM Bradley Schatz
<bradley@...> wrote:
>
> Hi,
>
>
>
> I’m just starting to come to grips with bcc & perf-map-agent for introspecting java on linux, with the goal of identifying what appears to be an off-heap memory leak (using memleak).
>
>
>
> I appear to be getting reliable stack decoding for jvm library code and for jit’ed java methods (see below for an example of the former). However I am seeing some very short stack traces which don’t seem to decode (the latter three stacks of below).
>
>
>
> It’s looking to me like the frame starting with “jna…” is likely the native JNI shared library for the FFI library “JNA”.
>
>
>
> Any suggestions as to why these latter three are so brief and/or how I can increase the resolution?

I can see the file has been marked as deleted.

34603008 bytes in 33 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

96468992 bytes in 92 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

So the file has been removed in user space, and current bcc won't be
able to parse it because it takes the file name to be
"jna9005484735610534564.tmp (deleted)".
The file name is actually taken from /proc/<pid>/maps.

I am not sure whether you can hack bcc to parse "jna9005484735610534564.tmp"
instead. But I would consider that unsafe: the original file's data may
exist only in the kernel, which still holds a reference to it. From user
space, the file is either gone or could have been replaced by something
else. So the safest approach is to symbolize before the file is removed,
or to keep the tmp file around a little longer.
>
>
>
> Apologies if this is the wrong place for such a question. Thank you for your help.
>
>
>
> Kind regards,
>
> Bradley
>
>
>
>
>
>
>
>
>
> 119408 bytes in 71 allocations from stack
>
> os::malloc(unsigned long, MemoryType, NativeCallStack const&)+0xb5 [libjvm.so]
>
> CodeBlob::set_oop_maps(OopMapSet*) [clone .part.5]+0x75 [libjvm.so]
>
> CodeBlob::CodeBlob(char const*, CodeBuffer*, int, int, int, int, OopMapSet*)+0xe3 [libjvm.so]
>
> nmethod::nmethod(Method*, int, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x4d [libjvm.so]
>
> nmethod::new_nmethod(methodHandle, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x219 [libjvm.so]
>
> ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, bool, bool, RTMState)+0x1b1 [libjvm.so]
>
> Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xe60 [libjvm.so]
>
> C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0xa3 [libjvm.so]
>
> CompileBroker::invoke_compiler_on_method(CompileTask*)+0x808 [libjvm.so]
>
> CompileBroker::compiler_thread_loop()+0x6d8 [libjvm.so]
>
> JavaThread::thread_main_inner()+0x1c7 [libjvm.so]
>
> JavaThread::run()+0x2fa [libjvm.so]
>
> java_start(Thread*)+0x102 [libjvm.so]
>
> start_thread+0xf3 [libpthread-2.28.so]
>
> 34603008 bytes in 33 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
> 96468992 bytes in 92 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
> 295698432 bytes in 282 allocations from stack
>
> [unknown] [jna9005484735610534564.tmp (deleted)]
>
> [unknown] [perf-31566.map]
>
>
>
>


Re: Overly brief stack traces for Java/linux ?

Yonghong Song
 

On Wed, Mar 31, 2021 at 11:25 PM Bradley Schatz
<bradley@...> wrote:

Hi,



I’m just starting to come to grips with bcc & perf-map-agent for introspecting java on linux, with the goal of identifying what appears to be an off-heap memory leak (using memleak).



I appear to be getting reliable stack decoding for jvm library code and for jit’ed java methods (see below for an example of the former). However I am seeing some very short stack traces which don’t seem to decode (the latter three stacks of below).



It’s looking to me like the frame starting with “jna…” is likely the native JNI shared library for the FFI library “JNA”.



Any suggestions as to why these latter three are so brief and/or how I can increase the resolution?
I can see the file has been marked as deleted.

34603008 bytes in 33 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

96468992 bytes in 92 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

So the file has been removed in user space, and current bcc won't be
able to parse it because it takes the file name to be
"jna9005484735610534564.tmp (deleted)".
The file name is actually taken from /proc/<pid>/maps.

I am not sure whether you can hack bcc to parse "jna9005484735610534564.tmp"
instead. But I would consider that unsafe: the original file's data may
exist only in the kernel, which still holds a reference to it. From user
space, the file is either gone or could have been replaced by something
else. So the safest approach is to symbolize before the file is removed,
or to keep the tmp file around a little longer.
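
As a rough illustration of where the " (deleted)" suffix comes from, the sketch below pulls the pathname field out of one /proc/<pid>/maps line and reports whether the kernel has tagged it as deleted. This is not bcc's actual code; the helper name maps_path and the sample line in the usage note are made up for the example. Note that the suffix is ambiguous in principle, since a file can legitimately be named "foo (deleted)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DELETED_SUFFIX " (deleted)"

/* Hypothetical helper: copy the pathname field of one /proc/<pid>/maps
 * line into `path`, with any trailing " (deleted)" removed, and set
 * *deleted accordingly. Returns false for lines without a pathname
 * (anonymous mappings) or when `path` is too small. */
bool maps_path(const char *line, char *path, size_t cap, bool *deleted)
{
    assert(line && path && deleted && cap > 0);

    /* Skip the five leading fields: address range, perms, offset,
     * dev and inode. */
    const char *p = line;
    for (int field = 0; field < 5; field++) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n')
            return false;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }
    while (*p == ' ' || *p == '\t')
        p++;

    /* Everything up to the newline is the pathname (it may contain spaces). */
    size_t len = strcspn(p, "\n");
    if (len == 0 || len >= cap)
        return false;
    memcpy(path, p, len);
    path[len] = '\0';

    size_t slen = strlen(DELETED_SUFFIX);
    *deleted = len >= slen && strcmp(path + len - slen, DELETED_SUFFIX) == 0;
    if (*deleted)
        path[len - slen] = '\0';
    return true;
}
```

Fed a line such as "7f1c2a400000-7f1c2a41d000 r-xp 00000000 fd:01 1835070 /tmp/jna9005484735610534564.tmp (deleted)", the helper yields the path without the suffix and sets the deleted flag. Even then the stripped path should not be trusted to name the same file, for the reasons given above.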



Apologies if this is the wrong place for such a question. Thank you for your help.



Kind regards,

Bradley









119408 bytes in 71 allocations from stack

os::malloc(unsigned long, MemoryType, NativeCallStack const&)+0xb5 [libjvm.so]

CodeBlob::set_oop_maps(OopMapSet*) [clone .part.5]+0x75 [libjvm.so]

CodeBlob::CodeBlob(char const*, CodeBuffer*, int, int, int, int, OopMapSet*)+0xe3 [libjvm.so]

nmethod::nmethod(Method*, int, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x4d [libjvm.so]

nmethod::new_nmethod(methodHandle, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x219 [libjvm.so]

ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, bool, bool, RTMState)+0x1b1 [libjvm.so]

Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xe60 [libjvm.so]

C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0xa3 [libjvm.so]

CompileBroker::invoke_compiler_on_method(CompileTask*)+0x808 [libjvm.so]

CompileBroker::compiler_thread_loop()+0x6d8 [libjvm.so]

JavaThread::thread_main_inner()+0x1c7 [libjvm.so]

JavaThread::run()+0x2fa [libjvm.so]

java_start(Thread*)+0x102 [libjvm.so]

start_thread+0xf3 [libpthread-2.28.so]

34603008 bytes in 33 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

96468992 bytes in 92 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]

295698432 bytes in 282 allocations from stack

[unknown] [jna9005484735610534564.tmp (deleted)]

[unknown] [perf-31566.map]




Overly brief stack traces for Java/linux ?

Bradley Schatz
 

Hi,

 

I’m just starting to come to grips with bcc & perf-map-agent for introspecting java on linux, with the goal of identifying what appears to be an off-heap memory leak (using memleak).

 

I appear to be getting reliable stack decoding for jvm library code and for jit’ed java methods (see below for an example of the former). However I am seeing some very short stack traces which don’t seem to decode (the latter three stacks of below).

 

It’s looking to me like the frame starting with “jna…” is likely the native JNI shared library for the FFI library “JNA”.

 

Any suggestions as to why these latter three are so brief and/or how I can increase the resolution?

 

Apologies if this is the wrong place for such a question. Thank you for your help.

 

Kind regards,

Bradley

 

 

 

 

       119408 bytes in 71 allocations from stack

              os::malloc(unsigned long, MemoryType, NativeCallStack const&)+0xb5 [libjvm.so]

              CodeBlob::set_oop_maps(OopMapSet*) [clone .part.5]+0x75 [libjvm.so]

              CodeBlob::CodeBlob(char const*, CodeBuffer*, int, int, int, int, OopMapSet*)+0xe3 [libjvm.so]

              nmethod::nmethod(Method*, int, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x4d [libjvm.so]

              nmethod::new_nmethod(methodHandle, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int)+0x219 [libjvm.so]

              ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, int, bool, bool, RTMState)+0x1b1 [libjvm.so]

              Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xe60 [libjvm.so]

              C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0xa3 [libjvm.so]

              CompileBroker::invoke_compiler_on_method(CompileTask*)+0x808 [libjvm.so]

              CompileBroker::compiler_thread_loop()+0x6d8 [libjvm.so]

              JavaThread::thread_main_inner()+0x1c7 [libjvm.so]

              JavaThread::run()+0x2fa [libjvm.so]

              java_start(Thread*)+0x102 [libjvm.so]

              start_thread+0xf3 [libpthread-2.28.so]

       34603008 bytes in 33 allocations from stack

              [unknown] [jna9005484735610534564.tmp (deleted)]

              [unknown] [perf-31566.map]

       96468992 bytes in 92 allocations from stack

              [unknown] [jna9005484735610534564.tmp (deleted)]

              [unknown] [perf-31566.map]

       295698432 bytes in 282 allocations from stack

              [unknown] [jna9005484735610534564.tmp (deleted)]

              [unknown] [perf-31566.map]

 


Re: Questions about runqlen

Abel Wu
 

Hi Y Song,

On 3/21/21 1:38 AM, Y Song wrote:
On Tue, Mar 16, 2021 at 4:00 AM Abel Wu <wuyun.abel@...> wrote:

Hi, when I looked into the runqlen script yesterday, I found that,
sadly, I had misunderstood "queue length" all along: not only the
"length" part but also the "queue" part.

Could you file an "issue" for the question? That way, the
questions and answers can be easily tracked.


Queue
=====
Only CFS runqueues are taken into account. This makes sense when the
main workloads all run under the CFS scheduler, which is common in
cloud scenarios. But what I don't quite follow is that the selected
queue is task->se.cfs_rq, which is the task's view, rather than the
top-level cfs_rq, which is the CPU's view. I suppose the task view is
not enough to draw the whole picture of saturation?

Length
======
Within this scope, length means the number of schedulable entities,
that is, cfs_rq->nr_running. From a time-sharing point of view this is
OK, because it represents how many units are involved in scheduling on
this cfs_rq. But what about the execution point of view, in which the
number of tasks (cfs_rq->h_nr_running) would be used?

And besides the above, without the shares information of each entity,
how could runqlen help us optimize performance? Maybe we should
always focus on occupancy rather than length?
There are some answers in this issue:
https://github.com/iovisor/bcc/issues/3093
To be accurate for cgroup/task-group environments, you may
need to traverse to the root. Could you check and experiment
whether this can solve your issue? if this is the case, we may
need to enhance runqlen.py. Maybe you could help provide
a pull request? Thanks!

Loops are forbidden in BPF programs (bounded loops are supported
since Linux 5.3, but walking up until se->parent is NULL is
unbounded). Maybe it's worth trying to get the definition of
struct rq instead? I will send a PR if I make some progress.

Thanks,
Abel
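
One way to picture the "traverse to the root" idea together with the bounded-loop constraint is to cap the walk at a fixed depth. The sketch below is plain user-space C over mock structures that carry only the fields named in this thread; MAX_DEPTH and root_cfs_rq are assumptions made for illustration. A real BPF program would have to read these kernel fields through the usual probe-read helpers, and se->parent is only present when the kernel is built with CONFIG_FAIR_GROUP_SCHED.

```c
#include <assert.h>
#include <stddef.h>

/* Mock structures: only the fields discussed in the thread. */
struct cfs_rq {
    unsigned int nr_running;    /* schedulable entities on this queue */
    unsigned int h_nr_running;  /* tasks in this queue's whole hierarchy */
};

struct sched_entity {
    struct sched_entity *parent; /* NULL for a top-level entity */
    struct cfs_rq *cfs_rq;       /* queue this entity is enqueued on */
};

#define MAX_DEPTH 8 /* assumed upper bound on task-group nesting */

/* Walk up from a task's entity to the top-level entity and return the
 * queue it sits on (the per-CPU root cfs_rq). The loop has a constant
 * bound; NULL is returned if the hierarchy is deeper than MAX_DEPTH. */
struct cfs_rq *root_cfs_rq(struct sched_entity *se)
{
    assert(se != NULL);
    for (int depth = 0; depth < MAX_DEPTH; depth++) {
        if (se->parent == NULL)
            return se->cfs_rq;
        se = se->parent;
    }
    return NULL;
}
```

With the root queue in hand, h_nr_running gives the number of runnable tasks on the CPU, while nr_running counts only the top-level entities.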


It would be very much appreciated if someone can shed some light.

Thanks & Best regards,
Abel




Re: Questions about runqlen

Yonghong Song
 

On Tue, Mar 16, 2021 at 4:00 AM Abel Wu <wuyun.abel@...> wrote:

Hi, when I looked into the runqlen script yesterday, I found that,
sadly, I had misunderstood "queue length" all along: not only the
"length" part but also the "queue" part.

Could you file an "issue" for the question? That way, the
questions and answers can be easily tracked.


Queue
=====
Only CFS runqueues are taken into account. This makes sense when the
main workloads all run under the CFS scheduler, which is common in
cloud scenarios. But what I don't quite follow is that the selected
queue is task->se.cfs_rq, which is the task's view, rather than the
top-level cfs_rq, which is the CPU's view. I suppose the task view is
not enough to draw the whole picture of saturation?

Length
======
Within this scope, length means the number of schedulable entities,
that is, cfs_rq->nr_running. From a time-sharing point of view this is
OK, because it represents how many units are involved in scheduling on
this cfs_rq. But what about the execution point of view, in which the
number of tasks (cfs_rq->h_nr_running) would be used?

And besides the above, without the shares information of each entity,
how could runqlen help us optimize performance? Maybe we should
always focus on occupancy rather than length?
There are some answers in this issue:
https://github.com/iovisor/bcc/issues/3093

To be accurate for cgroup/task-group environments, you may
need to traverse to the root. Could you check and experiment
whether this can solve your issue? if this is the case, we may
need to enhance runqlen.py. Maybe you could help provide
a pull request? Thanks!


It would be very much appreciated if someone can shed some light.

Thanks & Best regards,
Abel





Re: BCC and passing packet from XDP to user-mode app #bcc

Yonghong Song
 

On Thu, Mar 18, 2021 at 4:49 AM Federico Parola
<federico.parola@...> wrote:

Hi,
the virtual function you are looking for is perf_submit_skb():

https://github.com/iovisor/bcc/blob/c8de00e1746e242cdcd68b4673a083bb467cd35e/src/cc/export/helpers.h#L193

Strangely it is not documented in the reference guide.

Thanks, Federico and others. Maybe one of you could add it to
reference_guide.md? We do have events.perf_submit there. Thanks!


Best regards,
Federico Parola

On 18/03/21 10:29, v.a.bonert@... wrote:
Hi!
Is it possible to pass full ethernet packet from XDP to user-mode app
using BCC?
I wrote C code like this:
BPF_PERF_OUTPUT(captured_data);
int capture(struct xdp_md *ctx)
{
    captured_data.perf_submit(ctx, ...);
    return XDP_PASS;
}
But there is no flags argument in perf_submit function (but
bpf_perf_event_output has such argument).
Without BCC I can write such code to pass full packet to user-mode:
struct packet_info
{
    uint32_t packet_len;
    uint32_t iface_id;
};
struct bpf_map_def SEC("maps") captured_data =
{
    .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size    = sizeof(u32),
    .value_size  = sizeof(u32),
    .max_entries = MAX_CPUS
};
SEC("xdp")
int capture_kern(struct xdp_md *ctx)
{
    u32 len = ctx->data_end - ctx->data;
    u64 flags = BPF_F_CURRENT_CPU;
    flags |= (u64)len << 32;
    struct packet_info info = {len, ctx->ingress_ifindex};
    bpf_perf_event_output(ctx, &captured_data, flags, &info, sizeof(info));
    return XDP_PASS;
}
How can I do the same when using BCC?




Re: BCC and passing packet from XDP to user-mode app #bcc

Federico Parola
 

Hi,
the virtual function you are looking for is perf_submit_skb():

https://github.com/iovisor/bcc/blob/c8de00e1746e242cdcd68b4673a083bb467cd35e/src/cc/export/helpers.h#L193

Strangely it is not documented in the reference guide.

Best regards,
Federico Parola

On 18/03/21 10:29, v.a.bonert@... wrote:
Hi!
Is it possible to pass full ethernet packet from XDP to user-mode app using BCC?
I wrote C code like this:
BPF_PERF_OUTPUT(captured_data);
int capture(struct xdp_md *ctx)
{
    captured_data.perf_submit(ctx, ...);
    return XDP_PASS;
}
But there is no flags argument in perf_submit function (but bpf_perf_event_output has such argument).
Without BCC I can write such code to pass full packet to user-mode:
struct packet_info
{
    uint32_t packet_len;
    uint32_t iface_id;
};
struct bpf_map_def SEC("maps") captured_data =
{
    .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size    = sizeof(u32),
    .value_size  = sizeof(u32),
    .max_entries = MAX_CPUS
};
SEC("xdp")
int capture_kern(struct xdp_md *ctx)
{
    u32 len = ctx->data_end - ctx->data;
    u64 flags = BPF_F_CURRENT_CPU;
    flags |= (u64)len << 32;
    struct packet_info info = {len, ctx->ingress_ifindex};
    bpf_perf_event_output(ctx, &captured_data, flags, &info, sizeof(info));
    return XDP_PASS;
}
How can I do the same when using BCC?
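
The flags arithmetic in the non-BCC program can be checked in ordinary user-space C. In this sketch the three constants are redefined locally with the values they have in the kernel's uapi linux/bpf.h so that it builds without kernel headers; capture_flags and flags_ctx_len are hypothetical helper names. The low 32 bits select the perf buffer index (here "current CPU"), and the upper bits tell the kernel how many bytes of packet data to append after the custom struct.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the uapi constants from linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK
#define BPF_F_CTXLEN_MASK (0xfffffULL << 32)

/* Build the flags word passed to bpf_perf_event_output(): current CPU
 * in the low 32 bits, packet length to append in bits 32 and up. */
uint64_t capture_flags(uint32_t pkt_len)
{
    /* Lengths that do not fit in the context-length field are invalid. */
    assert(pkt_len <= (BPF_F_CTXLEN_MASK >> 32));
    return BPF_F_CURRENT_CPU | ((uint64_t)pkt_len << 32);
}

/* Recover the appended-packet length from a flags word. */
uint32_t flags_ctx_len(uint64_t flags)
{
    return (uint32_t)((flags & BPF_F_CTXLEN_MASK) >> 32);
}
```

BCC's perf_submit_skb(), mentioned in the reply above, appears to exist to fill in these upper bits for you from a length argument; the linked helpers.h is the place to confirm its exact signature.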


BCC and passing packet from XDP to user-mode app #bcc

v.a.bonert@...
 

Hi!
 
Is it possible to pass full ethernet packet from XDP to user-mode app using BCC?
I wrote C code like this:
 
BPF_PERF_OUTPUT(captured_data);
int capture(struct xdp_md *ctx)
{
    captured_data.perf_submit(ctx, ...);
    return XDP_PASS;
}
 
But there is no flags argument in perf_submit function (but bpf_perf_event_output has such argument).
 
Without BCC I can write such code to pass full packet to user-mode:
 
struct packet_info
{
    uint32_t packet_len;
    uint32_t iface_id;
};
 
struct bpf_map_def SEC("maps") captured_data =
{
    .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size    = sizeof(u32),
    .value_size  = sizeof(u32),
    .max_entries = MAX_CPUS
};
 
SEC("xdp")
int capture_kern(struct xdp_md *ctx)
{
    u32 len = ctx->data_end - ctx->data;
    u64 flags = BPF_F_CURRENT_CPU;
    flags |= (u64)len << 32;
    struct packet_info info = {len, ctx->ingress_ifindex};
    bpf_perf_event_output(ctx, &captured_data, flags, &info, sizeof(info));
 
    return XDP_PASS;
}
 
How can I do the same when using BCC?


Re: Which file should I include for KERNEL_VERSION macro ?

Andrii Nakryiko
 

On Wed, Mar 17, 2021 at 5:10 AM <chenhengqi@...> wrote:

I've read this blog post

https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

And want to apply this technique to my program:

extern u32 LINUX_KERNEL_VERSION __kconfig;
extern u32 CONFIG_HZ __kconfig;

u64 utime_ns;

if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(4, 11, 0))
    utime_ns = BPF_CORE_READ(task, utime);
else
    /* convert jiffies to nanoseconds */
    utime_ns = BPF_CORE_READ(task, utime) * (1000000000UL / CONFIG_HZ);

The KERNEL_VERSION macro will soon be part of bpf_helpers.h, but
meanwhile just copy/paste its definition into your code. See
https://patchwork.kernel.org/project/netdevbpf/patch/20210317200510.1354627-2-andrii@kernel.org/
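
For reference, the macro being copied is the classic one-liner from the kernel's linux/version.h. The sketch below defines it (guarded, in case a header already provides it) and mirrors the version check from the question as a plain user-space function, so the arithmetic can be tried outside BPF. The function utime_to_ns and its parameters are made up for illustration; in the real program the version and HZ come from the __kconfig externs.

```c
#include <assert.h>
#include <stdint.h>

#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: per the snippet in the question, a kernel at or
 * above 4.11 stores utime in nanoseconds; older ones store jiffies,
 * which are converted using the kernel's HZ. */
uint64_t utime_to_ns(uint32_t kernel_version, uint32_t hz, uint64_t raw_utime)
{
    assert(hz > 0);
    if (kernel_version >= (uint32_t)KERNEL_VERSION(4, 11, 0))
        return raw_utime;
    return raw_utime * (NSEC_PER_SEC / hz);
}
```

For example, with HZ=250 one jiffy is 4,000,000 ns, so 5 jiffies on a 4.10 kernel convert to 20,000,000 ns, while the same raw value on a 5.4 kernel is returned unchanged.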


Which file should I include for KERNEL_VERSION macro ?

chenhengqi@...
 

I've read this blog post

https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html

And want to apply this technique to my program:

extern u32 LINUX_KERNEL_VERSION __kconfig;
extern u32 CONFIG_HZ __kconfig;

u64 utime_ns;

if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(4, 11, 0))
    utime_ns = BPF_CORE_READ(task, utime);
else
    /* convert jiffies to nanoseconds */
    utime_ns = BPF_CORE_READ(task, utime) * (1000000000UL / CONFIG_HZ);


Re: Which is oldest linux kernel version that can support BTF? #bcc

Toke Høiland-Jørgensen
 

"Daniel Xu" <dxu@...> writes:

On Sun, Feb 28, 2021, at 12:07 PM, bg.salunke09@... wrote:
Hi,

I'm looking into BTF and its use cases. Based on the documentation, I
understand that to run BPF programs across different kernel versions,
they need to be built with libbpf, which depends on BTF information.
Now, to have BTF information on any kernel, the kernel needs
to be rebuilt with the "" flag.

I can see that BTF support in Linux was introduced in kernel version
5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf);
however, I can still see BTF information (/sys/kernel/btf/vmlinux) on
my 4.18.0-193.28.1.el8_2.x86_64 kernel.
What distro are you using? Your distro probably backported BTF
support.
Yeah, that's a RHEL version number (RHEL8.2 in this case, as seen by the
"el8_2" bit). Which means that as far as features are concerned, the
4.18 version number is basically a complete fiction at this point. For
BPF we basically backport everything, IIRC we made it up to upstream
kernel 5.4 for RHEL8.2...

-Toke
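
Since the upstream version number says little about features on such kernels, a feature probe is more telling than a version compare. The sketch below shows both sides in user-space C: a check for the /sys/kernel/btf/vmlinux file mentioned in the question, and a small parser that recognises the "elX_Y" tag in a release string. Both function names are invented for this example, and the parser only handles the tag format seen in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Probe for kernel BTF directly instead of guessing from the version. */
bool has_vmlinux_btf(void)
{
    return access("/sys/kernel/btf/vmlinux", R_OK) == 0;
}

/* Hypothetical helper: find an ".el<major>[_<minor>]" tag in a uname
 * release string such as "4.18.0-193.28.1.el8_2.x86_64". *minor is set
 * to -1 when the tag carries no minor number. */
bool parse_el_tag(const char *release, int *major, int *minor)
{
    assert(release && major && minor);

    const char *tag = strstr(release, ".el");
    if (tag == NULL)
        return false;

    int maj = 0, min = 0;
    int n = sscanf(tag + 3, "%d_%d", &maj, &min);
    if (n < 1)
        return false;

    *major = maj;
    *minor = (n == 2) ? min : -1;
    return true;
}
```

A tool could call has_vmlinux_btf() first and fall back to other strategies only when it returns false, using parse_el_tag() purely for diagnostics.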


Re: Which is oldest linux kernel version that can support BTF? #bcc

Daniel Xu
 

On Sun, Feb 28, 2021, at 12:07 PM, bg.salunke09@... wrote:
Hi,

I'm looking into BTF and its use cases. Based on the documentation, I
understand that to run BPF programs across different kernel versions,
they need to be built with libbpf, which depends on BTF information.
Now, to have BTF information on any kernel, the kernel needs
to be rebuilt with the "" flag.

I can see that BTF support in Linux was introduced in kernel version
5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf);
however, I can still see BTF information (/sys/kernel/btf/vmlinux) on
my 4.18.0-193.28.1.el8_2.x86_64 kernel.
What distro are you using? Your distro probably backported BTF support.

Daniel


Questions about runqlen

Abel Wu
 

Hi, when I looked into the runqlen script yesterday, I found that,
sadly, I had misunderstood "queue length" all along: not only the
"length" part but also the "queue" part.

Queue
=====
Only CFS runqueues are taken into account. This makes sense when the
main workloads all run under the CFS scheduler, which is common in
cloud scenarios. But what I don't quite follow is that the selected
queue is task->se.cfs_rq, which is the task's view, rather than the
top-level cfs_rq, which is the CPU's view. I suppose the task view is
not enough to draw the whole picture of saturation?

Length
======
Within this scope, length means the number of schedulable entities,
that is, cfs_rq->nr_running. From a time-sharing point of view this is
OK, because it represents how many units are involved in scheduling on
this cfs_rq. But what about the execution point of view, in which the
number of tasks (cfs_rq->h_nr_running) would be used?

And besides the above, without the shares information of each entity,
how could runqlen help us optimize performance? Maybe we should
always focus on occupancy rather than length?

It would be very much appreciated if someone can shed some light.

Thanks & Best regards,
Abel


Re: Which is oldest linux kernel version that can support BTF? #bcc

bg.salunke09@...
 

On Tue, Mar 2, 2021 at 08:22 PM, Andrii Nakryiko wrote:
On Tue, Mar 2, 2021 at 4:42 PM <bg.salunke09@...> wrote:
Thanks, Andrii, for the detailed answer.
Yes, you are right, I'm looking for CO-RE. Basically, I'm trying to build an eBPF program that can run on any Linux kernel version using libbpf.

What I understood from your blog post https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html (thanks for the in-depth post, I appreciate it) is that, for a libbpf-based program to work, BTF information should be available on the running host. Is my understanding correct?

Yes, correct.

Thank you for confirming!

By the way, is there any document on generating BTF information for a Linux kernel? Or is there a way to generate BTF info for the running kernel, i.e. at runtime rather than at compile time? Thanks!

Yes, you can, if you have vmlinux image with DWARF information in it.
You can use pahole tool like this to add .BTF section to vmlinux
image:

pahole -J <path-to-vmlinux-image>

You most probably would want to make a local copy of vmlinux image, of
course. After that you can pass the path to that vmlinux with embedded
.BTF to libbpf to use for CO-RE relocations. See [0] for recent
discussion of the exact same topic. See also patch [1] that was aiming
to make this scenario better in libbpf (unfortunately it hasn't landed
yet, but it is pretty close to being done, so shouldn't be a problem
for you to pick up, if necessary).

This is certainly not the most straightforward and easiest path, but
if you want to get CO-RE working with older kernel for which you don't
have much control, it is definitely a possible way (as long as you
have DWARF, which is used to produce BTF for vmlinux).

[0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@.../
[1] https://lore.kernel.org/bpf/B8801F77-37E8-4EF8-8994-D366D48169A3@.../

Got it. I'm following the discussion thread and patch. Thank you so much for your time.


Re: Which is oldest linux kernel version that can support BTF? #bcc

Andrii Nakryiko
 

On Tue, Mar 2, 2021 at 4:42 PM <bg.salunke09@...> wrote:

Thanks, Andrii, for the detailed answer.
Yes, you are right, I'm looking for CO-RE. Basically, I'm trying to build an eBPF program that can run on any Linux kernel version using libbpf.

What I understood from your blog post https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html (thanks for the in-depth post, I appreciate it) is that, for a libbpf-based program to work, BTF information should be available on the running host. Is my understanding correct?

Yes, correct.

By the way, is there any document on generating BTF information for a Linux kernel? Or is there a way to generate BTF info for the running kernel, i.e. at runtime rather than at compile time? Thanks!

Yes, you can, if you have vmlinux image with DWARF information in it.
You can use pahole tool like this to add .BTF section to vmlinux
image:

pahole -J <path-to-vmlinux-image>

You most probably would want to make a local copy of vmlinux image, of
course. After that you can pass the path to that vmlinux with embedded
.BTF to libbpf to use for CO-RE relocations. See [0] for recent
discussion of the exact same topic. See also patch [1] that was aiming
to make this scenario better in libbpf (unfortunately it hasn't landed
yet, but it is pretty close to being done, so shouldn't be a problem
for you to pick up, if necessary).

This is certainly not the most straightforward and easiest path, but
if you want to get CO-RE working with older kernel for which you don't
have much control, it is definitely a possible way (as long as you
have DWARF, which is used to produce BTF for vmlinux).

[0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@mail.gmail.com/
[1] https://lore.kernel.org/bpf/B8801F77-37E8-4EF8-8994-D366D48169A3@araalinetworks.com/
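
For illustration, the copy-then-pahole step described above could be wrapped roughly as follows. This is only a sketch: the function name embed_btf_in_copy and the PAHOLE override variable are made up for this example, and it assumes the source vmlinux still carries DWARF.

```shell
#!/bin/sh
# Hypothetical helper (not part of pahole or libbpf): copy a vmlinux image
# and add a .BTF section to the copy, leaving the original untouched.
# PAHOLE is an assumed override; by default the pahole found on PATH is used.
embed_btf_in_copy() {
    src="$1"   # vmlinux image that contains DWARF
    dst="$2"   # where the BTF-augmented copy should be written
    cp "$src" "$dst" || return 1
    # -J asks pahole to encode BTF from DWARF and store it in the given file
    "${PAHOLE:-pahole}" -J "$dst"
}
```

It would be called as `embed_btf_in_copy <path-to-vmlinux-image> ./vmlinux.btf`; afterwards `readelf -S ./vmlinux.btf` should list a .BTF section if the encoding succeeded.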



Re: Which is oldest linux kernel version that can support BTF? #bcc

bg.salunke09@...
 

Thanks, Andrii, for the detailed answer.
Yes, you are right, I'm looking for CO-RE. Basically, I'm trying to build an eBPF program that can run on any Linux kernel version using libbpf.

What I understood from your blog https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html (thanks for the in-depth blog post, I appreciate it) is that for a libbpf-based program to work,
the BTF information should be available on the running host. Is my understanding correct?

Btw, Is there any document to generate BTF information for a linux kernel?  Or Is there a way to generate BTF info for running kernel i.e. at runtime and not at compile time? Thanks! 


Re: Which is oldest linux kernel version that can support BTF? #bcc

Toke Høiland-Jørgensen
 

"Andrii Nakryiko" <andrii.nakryiko@...> writes:

On Sun, Feb 28, 2021 at 12:37 PM <bg.salunke09@...> wrote:

[Edited Message Follows]

Hi,

I'm looking into BTF and its use case. Based on the documentation, I understood that to run BPF programs across different kernel versions, they need to be built with libbpf, which depends on the BTF information.
Now to enable/to have BTF information on any Kernel, the kernel needs to be re-build with "" flag.

I can see the BTF support in Linux introduced from kernel version 5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf)
however I can still see the BTF information(/sys/kernel/btf/vmlinux) on my 4.18.0-193.28.1.el8_2.x86_64 kernel.

I'm a little confused here about how an old kernel can generate BTF info if the support was added only recently.

Can I get information about oldest linux kernel version that can support BTF?
/sys/kernel/btf/vmlinux appeared in 5.4 kernel (upstream version). If
you see it on 4.18, that means someone backported the changes.
Yeah, that looks like a RHEL/CentOS kernel version number, which means
the 4.18 bit is mostly fiction at this point (at least as far as BPF is
concerned). IIRC we backported up to upstream kernel 5.4 for RHEL 8.2,
which seems to be what you're running (from the el8_2 bit of the
version), and I guess that fits with the availability of
/sys/kernel/btf/vmlinux

-Toke
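
Since the version string says little on distro kernels with backports, it is probably more reliable to test for the BTF file itself than to parse `uname -r`. A minimal sketch, assuming a POSIX shell; the function name has_kernel_btf is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given BTF file exists, is readable,
# and is non-empty. Defaults to the sysfs location discussed above.
has_kernel_btf() {
    btf_path="${1:-/sys/kernel/btf/vmlinux}"
    [ -r "$btf_path" ] && [ -s "$btf_path" ]
}

# Report on the running kernel; the release string is shown for context only.
if has_kernel_btf; then
    echo "kernel BTF present ($(uname -r))"
else
    echo "no kernel BTF exposed in sysfs ($(uname -r))"
fi
```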


Re: Which is oldest linux kernel version that can support BTF? #bcc

Andrii Nakryiko
 

On Sun, Feb 28, 2021 at 12:37 PM <bg.salunke09@...> wrote:

[Edited Message Follows]

Hi,

I'm looking into BTF and its use case. Based on the documentation, I understood that to run BPF programs across different kernel versions, they need to be built with libbpf, which depends on the BTF information.
Now to enable/to have BTF information on any Kernel, the kernel needs to be re-build with "" flag.

I can see the BTF support in Linux introduced from kernel version 5.1.0 (https://www.kernel.org/doc/html/v5.1/bpf/btf.html?highlight=btf)
however I can still see the BTF information(/sys/kernel/btf/vmlinux) on my 4.18.0-193.28.1.el8_2.x86_64 kernel.

I'm a little confused here about how an old kernel can generate BTF info if the support was added only recently.

Can I get information about oldest linux kernel version that can support BTF?
/sys/kernel/btf/vmlinux appeared in 5.4 kernel (upstream version). If
you see it on 4.18, that means someone backported the changes. But for
BPF CO-RE (which I assume is what you are referring to) to work,
kernel itself doesn't need to "support BTF", it just needs to have
.BTF data built-in inside its vmlinux binary image, and that image
needs to be in one of the supported locations (see [0]). Starting from
5.2 kernel, CONFIG_DEBUG_INFO_BTF=y is supported, which adds a .BTF section
as part of the kernel build process.

But one could technically add .BTF by using pahole tool (part of
dwarves package) even before that, as long as vmlinux image contains
DWARF information.

So in short, the easiest way is to get the latest kernel you can. But
with enough persistence and effort you can get kernel BTF embedded for
pretty much any kernel version.


[0] https://github.com/libbpf/libbpf/blob/master/src/btf.c#L4589-L4598
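
To see which of those locations actually exist on a given host, something like the sketch below might help. The path list is an assumption written from memory of libbpf's search order and may not match every libbpf version, so the source linked in [0] remains the reference; the function name list_btf_candidates and its optional root-prefix argument are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print each candidate vmlinux/BTF path that is readable.
#   $1  optional root prefix (e.g. a mounted image); empty means "/"
#   $2  optional kernel release; defaults to the running kernel's
# The list below is assumed to mirror libbpf's search order; verify against [0].
list_btf_candidates() {
    root="${1:-}"
    release="${2:-$(uname -r)}"
    for p in \
        "/sys/kernel/btf/vmlinux" \
        "/boot/vmlinux-$release" \
        "/lib/modules/$release/vmlinux-$release" \
        "/lib/modules/$release/build/vmlinux" \
        "/usr/lib/modules/$release/kernel/vmlinux" \
        "/usr/lib/debug/boot/vmlinux-$release" \
        "/usr/lib/debug/boot/vmlinux-$release.debug" \
        "/usr/lib/debug/lib/modules/$release/vmlinux"
    do
        if [ -r "$root$p" ]; then
            echo "$root$p"
        fi
    done
}

# With no arguments, check the running system.
list_btf_candidates
```

An empty result suggests that no image is in a place libbpf would look by default, in which case the path to a BTF-carrying vmlinux would have to be supplied explicitly.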


