Re: reminder: IO Visor TSC and Dev Members Call

O Mahony, Billy <billy.o.mahony@...>
 

Hi Brenden,

Does this mean the time is 12 midday Santa Clara time? (I won't even venture am/pm !)

Thanks,
/Billy.

-----Original Message-----
From: iovisor-dev-bounces@... [mailto:iovisor-dev-bounces@...] On Behalf Of Brenden Blanco via iovisor-dev
Sent: Tuesday, March 29, 2016 6:50 PM
To: iovisor-dev@...
Subject: [iovisor-dev] reminder: IO Visor TSC and Dev Members Call

Hi All,

Please join us for our bi-weekly call. This meeting is open to everybody and
completely optional.

Topics for this week:
1. Discuss ongoing work around tracing:
- tracepoints update
- USDT
2. New binding development
- Lua!
3. Documentation
4. Open /issues


http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=3&day=30&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
https://plumgrid.webex.com/plumgrid/j.php?MTID=m67436ba0408d6bad48acd69138c03aea
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:
https://plumgrid.webex.com/plumgrid/globalcallin.php?serviceType=MC&ED=44474908&tollFree=0
_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: reminder: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

What I actually meant was (11 am pacific)

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=3&day=30&hour=18&min=0&sec=0&p1=886

Blame daylight "saving" time, which saves nothing imho.
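
The one-hour slip is easier to see as arithmetic. The sketch below assumes that the hour= parameter in the two timeanddate links is expressed in UTC (the thread does not say so), and it uses fixed offsets for Pacific standard time (UTC-8) and Pacific daylight time (UTC-7, in effect on March 30, 2016). The helper name pacific_wall_clock is made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, so no time zone database is needed for the illustration.
PST = timezone(timedelta(hours=-8), "PST")  # winter offset
PDT = timezone(timedelta(hours=-7), "PDT")  # offset in effect on 2016-03-30


def pacific_wall_clock(utc_hour, tz):
    """Render the 2016-03-30 meeting, given as a UTC hour, in the zone tz."""
    meeting = datetime(2016, 3, 30, utc_hour, 0, tzinfo=timezone.utc)
    return meeting.astimezone(tz).strftime("%H:%M %Z")


announced = pacific_wall_clock(19, PDT)  # hour=19 from the reminder
intended = pacific_wall_clock(18, PDT)   # hour=18 from the correction
winter = pacific_wall_clock(19, PST)     # what hour=19 meant before the DST change

print(announced, "|", intended, "|", winter)
```

Under these assumptions hour=19 lands on midday Pacific once daylight time is in effect, although the same UTC hour was 11 am during the winter, which matches the confusion in the thread.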

On Mar 30, 2016 7:39 AM, "O Mahony, Billy" <billy.o.mahony@...> wrote:
> Does this mean the time is 12 midday Santa Clara time? (I won't even venture am/pm !)


Re: reminder: IO Visor TSC and Dev Members Call

O Mahony, Billy <billy.o.mahony@...>
 

Perfect!

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Wednesday, March 30, 2016 3:50 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: RE: [iovisor-dev] reminder: IO Visor TSC and Dev Members Call

 

What I actually meant was (11 am pacific)

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=3&day=30&hour=18&min=0&sec=0&p1=886

Blame daylight "saving" time, which saves nothing imho.



ietf drafts

Alexei Starovoitov
 


minutes: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

Hi All,

Thanks for attending the meeting today. Here are the things I took note of:

(Alexei)
Tracepoint patches are almost ready, to be posted when net-next opens
- Can be faster than kprobes when the tracepoint doesn't copy much
data, and generally faster when kprobes has to do a lot of
bpf_probe_read traversing
- introduces new program type, fairly straightforward
New API feature idea to expose the hash of a loaded program (minus the
fd values) is being thrown around.
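
The notes give no design for this hash, so the following is only a sketch of how a program hash could ignore fd values. It assumes the usual 8-byte eBPF instruction layout (opcode, register byte, 16-bit offset, 32-bit immediate, little-endian), assumes that a map fd sits in the immediate of a 16-byte load whose source register field is 1, and picks SHA-1 arbitrarily. The names encode, sample and program_hash are hypothetical.

```python
import hashlib
import struct

INSN = struct.Struct("<BBhi")  # code, regs (dst low nibble, src high nibble), off, imm
LD_IMM64 = 0x18                # BPF_LD | BPF_IMM | BPF_DW, a 16-byte (two-slot) load
PSEUDO_MAP_FD = 1              # src_reg value meaning "imm holds a map fd"


def encode(code, dst, src, off, imm):
    """Pack one 8-byte instruction slot."""
    return INSN.pack(code, (src << 4) | dst, off, imm)


def program_hash(insns):
    """Hash the instruction stream with map fd immediates blanked out."""
    out = bytearray()
    blank_next = False
    for pos in range(0, len(insns), INSN.size):
        code, regs, off, imm = INSN.unpack_from(insns, pos)
        if blank_next:
            # Second slot of the 16-byte load: blank its immediate as well.
            imm, blank_next = 0, False
        elif code == LD_IMM64 and (regs >> 4) == PSEUDO_MAP_FD:
            imm, blank_next = 0, True
        out += INSN.pack(code, regs, off, imm)
    return hashlib.sha1(out).hexdigest()


def sample(fd, ret=0):
    """Tiny made-up program: load a map fd into r1, set r0 = ret, exit."""
    return b"".join([
        encode(LD_IMM64, 1, PSEUDO_MAP_FD, 0, fd),
        encode(0x00, 0, 0, 0, 0),    # second half of the 16-byte load
        encode(0xB7, 0, 0, 0, ret),  # mov64 r0, ret
        encode(0x95, 0, 0, 0, 0),    # exit
    ])


same = program_hash(sample(3)) == program_hash(sample(7))
differs = program_hash(sample(3)) != program_hash(sample(3, ret=1))
print(same, differs)
```

Two loads of the same program then hash identically even though each load sees different fd numbers, while a real change to the instructions still changes the hash.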

(Sasha)
USDT in progress, already some patches have merged
- Brendan and Alexei have been testing

(Vicent)
Lua frontend support has just been added, complete with initial tests and readme
Alexei especially liked the possibility of statically linking some of
the tools, which we can't do in python.

(Daniel)
Working on verifier to reduce complexity of instructions generated (at
least I think that's what it was)

(John)
Working on various BPF use cases, which are still a few months out
- Conntrack in BPF
- LLVM P4 frontend, with bpf-like (but not quite) backend that maps
to HW. Direct to BPF has issues, but through various layerings we
might be able to mix the two worlds.
Also there are some patches to further reduce locking in qdisc, to be
proposed soon when net-next opens

Other items:
A patch in net-next came through for influencing queue mapping/xps,
exposing this to bpf would be a nice-to-have.

At NSDI, Alexei met Simon, a student working on an FPGA with support
for BPF, which now supports some of the extended functionality.
Hopefully the RTL will be made available open-source. Alexei suggested
that he try a Mellanox NIC with FPGA as a next step.

Next week Facebook will be presenting ILA at IETF. They plan to use
BPF/XDP to accelerate it, and will continue to keep us posted. Alexei
has shared the links to the various drafts, please take a look and
give feedback soon.

And last but not least
Documentation needs work in several areas!
- bcc: C api
  Brendan created a bunch of /issues to capture low-hanging fruit for
  new recruits
- kernel: manpages for helper functions


Thanks,
Brenden Blanco


Attendees:
Uri Elzur
Prem Jonnalagadda
Pere Monclus
John Fastabend
Daniel Borkmann
Brenden Blanco
Brendan Gregg
Billy O'Mahony
Alexei Starovoitov


"invalid mem access" when inspecting packets from probe

Alfonso Acosta <fons@...>
 

Hi,

I want to efficiently obtain incoming http request rates by process. 

For that I am writing a proof of concept [0] based on https://github.com/iovisor/bcc/blob/master/examples/networking/http_filter/http-parse-complete.c which uses a probe to obtain the PID of the process receiving the request.

Specifically, and for code-stability, I am using the new tracepoint code [1] to run a probe at trace_skb_copy_datagram_iovec [2].

However, the verifier doesn't let me access the socket buffer data 

bpf_probe_read(&data, sizeof(data), skb->data); // [3]

...
51: (63) *(u32 *)(r10 -60) = r7
52: (79) r3 = *(u64 *)(r8 +216)
R8 invalid mem access 'inv' 

[4]

Is there a way around this? 

Also, I think it would be useful to be able to use the packet inspection features in probes (e.g. cursor_advance()) by specifying the socket buffer (sk_buff) they should be applied to (I am new to eBPF, so tell me if this is plain stupid).

Thanks,

Fons



Re: "invalid mem access" when inspecting packets from probe

Alfonso Acosta <fons@...>
 

PS: I am running kernel 4.4.2 and the nightly bcc.

On Wed, Apr 6, 2016 at 5:13 PM, Alfonso Acosta <fons@...> wrote:
> I want to efficiently obtain incoming http request rates by process.




Re: "invalid mem access" when inspecting packets from probe

Alfonso Acosta <fons@...>
 

I wasn't thinking. I first need to read the pointer field itself and then the data it points to.

u8 data[4] = {};
void* datap = 0;
bpf_probe_read(&datap, sizeof(datap), &skb->data);
bpf_probe_read(&data, sizeof(data), datap);

https://github.com/weaveworks/scope/commit/ac7980e5979a28528df08d9aacd540f56d704000
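
The same two-step read can be mimicked in user space to show why one copy is not enough. This is only an illustration with ctypes: FakeSkb and probe_read are stand-ins invented here, not the kernel's struct sk_buff or the real bpf_probe_read helper, and the payload is made-up data.

```python
import ctypes


class FakeSkb(ctypes.Structure):
    """Hypothetical stand-in for a structure that holds a data pointer."""
    _fields_ = [("length", ctypes.c_uint32), ("data", ctypes.c_void_p)]


def probe_read(dst, size, src_addr):
    """Stand-in for bpf_probe_read: copy size bytes from src_addr into dst."""
    ctypes.memmove(dst, src_addr, size)


payload = ctypes.create_string_buffer(b"GET / HTTP/1.1")
skb = FakeSkb(length=14, data=ctypes.addressof(payload))

# Step 1: copy the pointer value stored in the field (the &skb->data read).
datap = ctypes.c_void_p()
field_addr = ctypes.addressof(skb) + FakeSkb.data.offset
probe_read(ctypes.byref(datap), ctypes.sizeof(datap), field_addr)

# Step 2: copy the bytes that the pointer refers to.
data = (ctypes.c_uint8 * 4)()
probe_read(data, ctypes.sizeof(data), datap.value)

first_bytes = bytes(data)
print(first_bytes)
```

The first copy only yields an address; the packet bytes become available after the second copy, which mirrors the two bpf_probe_read calls in the fix above.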

On Wed, Apr 6, 2016 at 5:15 PM, Alfonso Acosta <fons@...> wrote:
PS: I am running kernel 4.4.2 and the nightly bcc.






Why do some bcc scripts run smoothly while some not?

Nan Xiao <xiaonan830818@...>
 

Hi all,

My OS is RHEL 7.2 (kernel version is 3.10). To use bcc/bpf, I downloaded and installed the newest 4.5 kernel.

After installing bcc, I find some scripts run smoothly:

# ./dcstat
TIME         REFS/s   SLOW/s   MISS/s     HIT%
19:12:08:        78       42       42    46.15
19:12:09:         7        0        0   100.00
19:12:10:         7        0        0   100.00

some run with warning:

# ./bitesize
In file included from /virtual/main.c:3:
In file included from include/linux/blkdev.h:14:
In file included from include/linux/pagemap.h:7:
In file included from include/linux/mm.h:347:
include/linux/huge_mm.h:133:10: warning: expression which evaluates to zero treated as a null pointer constant of type 'spinlock_t *'
      (aka 'struct spinlock *') [-Wnon-literal-null-conversion]
                return false;
                       ^~~~~
1 warning generated.
Tracing... Hit Ctrl-C to end.
^C
Process Name = 'rhsmcertd-worke'
     Kbytes              : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 3        |****************************************|

While some can't start:

# ./memleak
/virtual/main.c:12:1: error: could not open bpf map: Invalid argument
BPF_STACK_TRACE(stack_traces, 1024)
^
/virtual/include/bcc/helpers.h:119:3: note: expanded from macro 'BPF_STACK_TRACE'
  BPF_TABLE("stacktrace", int, struct bpf_stacktrace, _name, _max_entries);
  ^
/virtual/include/bcc/helpers.h:53:4: note: expanded from macro 'BPF_TABLE'
}; \
   ^
/virtual/main.c:46:25: error: expected initialized handle for bpf_table
        info.stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID);
                        ^
2 errors generated.
Traceback (most recent call last):
  File "./memleak", line 250, in <module>
    bpf_program = BPF(text=bpf_source)
  File "/usr/lib/python2.7/site-packages/bcc/__init__.py", line 165, in __init__
    raise Exception("Failed to compile BPF module %s" % src_file)
Exception: Failed to compile BPF module

So is there something wrong with how I built the kernel or installed bcc? Could someone give some suggestions?

Thanks very much in advance!

Best Regards
Nan Xiao


Re: Why do some bcc scripts run smoothly while some not?

Brenden Blanco <bblanco@...>
 

Hello Nan Xiao,
Please see responses inline. Hope this helps.

On Fri, Apr 8, 2016 at 2:18 AM, Nan Xiao via iovisor-dev <iovisor-dev@...> wrote:
> Hi all,
>
> My OS is RHEL 7.2 (kernel version is 3.10). To use bcc/bpf, I downloaded and installed the newest 4.5 kernel.
>
> After installing bcc, I find some scripts run smoothly:
>
> # ./dcstat
> TIME         REFS/s   SLOW/s   MISS/s     HIT%
> 19:12:08:        78       42       42    46.15
> 19:12:09:         7        0        0   100.00
> 19:12:10:         7        0        0   100.00
>
> some run with warning:

This occurs because the kernel headers (the kernel is generally built with gcc) are now being compiled with clang, and they generate warnings. The only real solution is to submit a patch upstream. The warning can be suppressed by passing BPF(..., cflags=["-w"]).

> # ./bitesize
> In file included from /virtual/main.c:3:
> In file included from include/linux/blkdev.h:14:
> In file included from include/linux/pagemap.h:7:
> In file included from include/linux/mm.h:347:
> include/linux/huge_mm.h:133:10: warning: expression which evaluates to zero treated as a null pointer constant of type 'spinlock_t *'
>       (aka 'struct spinlock *') [-Wnon-literal-null-conversion]
>                 return false;
>                        ^~~~~
> 1 warning generated.
> Tracing... Hit Ctrl-C to end.
> ^C
> Process Name = 'rhsmcertd-worke'
>      Kbytes              : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 3        |****************************************|
>
> While some can't start:

The stack trace map type is available in 4.6 onwards. We should improve the error message to be clearer.

> # ./memleak
> /virtual/main.c:12:1: error: could not open bpf map: Invalid argument
> BPF_STACK_TRACE(stack_traces, 1024)
> ^
> /virtual/include/bcc/helpers.h:119:3: note: expanded from macro 'BPF_STACK_TRACE'
>   BPF_TABLE("stacktrace", int, struct bpf_stacktrace, _name, _max_entries);
>   ^
> /virtual/include/bcc/helpers.h:53:4: note: expanded from macro 'BPF_TABLE'
> }; \
>    ^
> /virtual/main.c:46:25: error: expected initialized handle for bpf_table
>         info.stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID);
>                         ^
> 2 errors generated.
> Traceback (most recent call last):
>   File "./memleak", line 250, in <module>
>     bpf_program = BPF(text=bpf_source)
>   File "/usr/lib/python2.7/site-packages/bcc/__init__.py", line 165, in __init__
>     raise Exception("Failed to compile BPF module %s" % src_file)
> Exception: Failed to compile BPF module
>
> So is there something wrong with how I built the kernel or installed bcc? Could someone give some suggestions?

Everything seems to be installed correctly. You can try upgrading your kernel to use the memleak script, but other scripts that depend only on older kernel features should continue to work.

> Thanks very much in advance!
>
> Best Regards
> Nan Xiao
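
The two points in the reply can be turned into a small pre-flight helper. This is a sketch, not part of bcc: the function names are invented here, the 4.6 threshold is taken from the reply above, and only the cflags=["-w"] argument to BPF() comes from the thread. bcc is imported lazily so the version check can be used on machines without it.

```python
import platform
import re

STACK_TRACE_MAP_MIN = (4, 6)  # kernel version quoted in the reply above


def kernel_version(release=None):
    """Return (major, minor) parsed from a release string such as '4.5.0'."""
    release = platform.release() if release is None else release
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def supports_stack_trace_map(release=None):
    """True if the kernel is new enough for tools that use BPF_STACK_TRACE."""
    return kernel_version(release) >= STACK_TRACE_MAP_MIN


def load_quietly(text):
    """Compile a BPF program with clang warnings suppressed (needs bcc and root)."""
    from bcc import BPF  # imported here so the checks above work without bcc
    return BPF(text=text, cflags=["-w"])


print(supports_stack_trace_map("4.5.0"))  # the kernel from the original report
```

A script such as memleak could call supports_stack_trace_map() first and print a clear message instead of the compiler error shown above.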




Re: Why do some bcc scripts run smoothly while some not?

Nan Xiao <xiaonan830818@...>
 

Hi Brenden,

Thanks very much for your detailed explanations!

Best Regards
Nan Xiao





Can we use IOVISOR for virtual machine introspection.

Tahir Ahmed <tahirahmed1044@...>
 

Hi
I am working with IOVISOR. I have set up the IOVISOR tools and got output from them, but my purpose is different: I am working on libvmi. I have a virtual machine on the KVM hypervisor running on my Ubuntu machine. Is it possible to get information about the machine running on KVM from my Ubuntu machine, where I have IOVISOR? Kindly guide me in this regard.
Thanks
Tahir Ahmed


reminder: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

Hi all,

Please join us for our bi-weekly call. This meeting is open to everybody and completely optional.


autofdo video

Hannes Frederic Sowa
 

FYI, as discussed in the conf call:
https://www.youtube.com/watch?v=26SrOC6MXWg


bpf version of this sample

Hannes Frederic Sowa
 

Hello,

if a bpf counterpart of this example would be possible, it would be
really cool:
http://lxr.free-electrons.com/source/samples/hw_breakpoint/data_breakpoint.c

Bye,
Hannes


Re: bpf version of this sample

Hannes Frederic Sowa
 

Hello,

On 13.04.2016 20:53, Hannes Frederic Sowa wrote:
> if a bpf counterpart of this example would be possible, it would be
> really cool:
> http://lxr.free-electrons.com/source/samples/hw_breakpoint/data_breakpoint.c

Care must be taken, as the code path in this example as-is is only valid from process context, as mutexes will be taken during the installation of the watch point. This actually renders it a little bit moot from the usage in the networking stack.

I once had some netfilter code which reinserted the skb from a workqueue, but it was rather horrible to deal with.

Bye,
Hannes


Re: bpf version of this sample

Alexei Starovoitov
 

On Wed, Apr 13, 2016 at 12:43 PM, Hannes Frederic Sowa via iovisor-dev
<iovisor-dev@...> wrote:
> Hello,
>
> On 13.04.2016 20:53, Hannes Frederic Sowa wrote:
>> if a bpf counterpart of this example would be possible, it would be
>> really cool:
>> http://lxr.free-electrons.com/source/samples/hw_breakpoint/data_breakpoint.c
>
> Care must be taken, as the code path in this example as-is is only valid
> from process context, as mutexes will be taken during the installation of
> the watch point. This actually renders it a little bit moot from the usage
> in the networking stack.
>
> I once had some netfilter code which reinserted the skb from a workqueue, but
> it was rather horrible to deal with.

do you propose to let programs register such hw breakpoints to point to
other programs?
That will be hard to achieve as you said due to mutexes...
but we can probably extend perf_event_open() api to allow such hw breakpoints
and attach bpf progs to it.


Re: bpf version of this sample

Daniel Borkmann
 

On 04/13/2016 10:23 PM, Alexei Starovoitov via iovisor-dev wrote:
> do you propose to let programs register such hw breakpoints to point to
> other programs?
> That will be hard to achieve as you said due to mutexes...
> but we can probably extend perf_event_open() api to allow such hw breakpoints
> and attach bpf progs to it.

Actually for network debugging it would be nice if we could classify an skb
e.g. from XDP (when passed to stack) or cls_bpf ingress, and tell the kernel
that if this particular skb changes e.g. its protocol or priority field, set
a breakpoint, so I can debug the issue in relation to it. This means that
setting the breakpoint would be a root-only helper function, and the callback
it triggers would need to be a new BPF program type that has helpers such as
'throw stack trace' etc, and that can inspect struct members. With the mutex,
yeah, it's an issue, maybe it can be worked around ...


Re: bpf version of this sample

Hannes Frederic Sowa
 

On 13.04.2016 22:23, Alexei Starovoitov wrote:
> do you propose to let programs register such hw breakpoints to point to
> other programs?

Yes, that an eBPF program can register such a hw breakpoint within the kernel after checking for some conditions. The CPU has only a few debug registers, thus we probably can't use more than 4 registrations at one time.

> That will be hard to achieve as you said due to mutexes...

That is true, unfortunately.

> but we can probably extend perf_event_open() api to allow such hw breakpoints
> and attach bpf progs to it.

That is not actually that useful IMHO. I used this feature with systemtap from time to time when I knew beforehand the absolute pointer of the memory location I wanted to trace, but there was only one situation where this actually made sense. Dynamically registering those trace points would be a huge improvement.

In the case of skb tracing, I actually tried the approach of allocating the skbs beforehand, setting up the watchpoints from user space on the interesting fields and, when I finally hit the skb to be traced, copying over the content of the skb and reinjecting it into the stack. But as said, those are horrible hacks...

Bye,
Hannes


minutes: IO Visor TSC and Dev members call

Brenden Blanco <bblanco@...>
 

Thanks all for another good discussion.

There are two major updates this week.

== Tracepoints

First, from Alexei, the infrastructure to attach bpf programs to tracepoints
has been merged in net-next. This will bring a stable kernel ABI to some
critical kernel events, rather than relying on kprobes (which can break as the
kernel internals change).

To start, tcp will likely be the first subsystem to define such events.
 - proposed events: retrans, rx estab, v4/v6 send rst, destroy sock
 - other ideas: passive open, active open, tcp state change

== XDP Early drop

I have also been working to prototype a new bpf hook early in the packet path
(driver rx), with the hopes of improving throughput for some use cases. The
first simple use case is for programmable early drop. The code is in RFC
format, see the discussion at [1] and an LWN article at [2].
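
To make "programmable early drop" concrete, here is a user-space sketch of the kind of decision such a hook would make on a raw frame. It is not the RFC code: the verdict names DROP and PASS, the function early_drop and the policy (drop IPv4 UDP, pass everything else) are all made up for this illustration.

```python
import struct

DROP, PASS = "drop", "pass"  # hypothetical verdict names
ETH_HLEN = 14                # Ethernet header length
ETH_P_IP = 0x0800            # EtherType for IPv4
IPPROTO_UDP = 17


def early_drop(frame):
    """Decide on a raw Ethernet frame before any further processing."""
    # Bounds check first, much as a BPF program must before touching packet data.
    if len(frame) < ETH_HLEN + 20:
        return PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return PASS
    protocol = frame[ETH_HLEN + 9]  # protocol field of the IPv4 header
    return DROP if protocol == IPPROTO_UDP else PASS


def make_frame(protocol):
    """Build a minimal Ethernet + IPv4 header pair for the demonstration."""
    eth = bytes(12) + struct.pack("!H", ETH_P_IP)
    ip = bytearray(20)
    ip[0] = 0x45  # version 4, header length 5 words
    ip[9] = protocol
    return eth + bytes(ip)


print(early_drop(make_frame(IPPROTO_UDP)), early_drop(make_frame(6)))
```

The point of running such a check in the driver rx path is that a dropped frame never needs an skb allocated for it.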

The next step is to show that we can extend this infrastructure to include
forwarding, in addition to drop.


For xmit, defining the forward action is tricky, and we discussed some
possibilities. There was agreement that a complicated return code should be
avoided, instead model it like bpf_redirect().
 - Specify device rx queue
 - Specify ifindex - has the problem of allocating skb to cross nic boundary
 - Hardcode (ethtool) which rx queue maps to tx queue when fwd is picked
 - Batching should also be designed in

Hannes also mentioned a possible way to efficiently forward from phys_dev to a
namespace/socket. This mechanism would be cleaner and lighter weight than
ipvlan.

== Misc

Brendan mentioned that Xenial will be released soon, and we should prepare a
Xenial package for folks to download.

Brendan mentioned that from SREcon he found someone from a "major tech company"
that is interested in bpf as non-root for tracing.
This will be tricky since kprobes look well into privileged kernel structures,
but perhaps the tracepoint infrastructure will allow some use cases. Let's keep
this in the back of our minds for the future.

Alexei is working on infra to simplify packet read/write to reduce overhead. So
far, a new instruction with smarter bounds checking, as well as a smarter
verifier that lets packet access look more like regular pointer arithmetic are
being prototyped.

Daniel is working on arbitrary push/pop header. There is working code, maybe
soon we will see some patches.

Alexei is also working on the following enhancements, some of which have been
mentioned before:
 - mmap array
 - inline array map access

Also, someone raised the idea of having something like a static key for bpf
programs, which could be used to enable debug paths without reloading the
program.

Attendees:
Alexei Starovoitov
Brendan Gregg
Daniel Borkmann
Prem Jonnalagadda
Shehzad Ismail
Alex Bagehot
Alex Reece
Hannes Frederic Sowa
Uri Elzur
John Fastabend
Billy O Mahony
