Re: [agenda] IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi all,

Let's skip the meeting this week, since there isn't a specific agenda
to discuss. Going forward, I'll send out a call for agenda reminder 2
days in advance so the meeting isn't decided last minute.

Thanks,
Brenden

On Wed, Jun 26, 2019 at 9:27 AM Brenden Blanco <bblanco@...> wrote:
> On Tue, Jun 25, 2019 at 1:39 PM Brenden Blanco <bblanco@...> wrote:
>> Hi All,
>>
>> As per the discussion from last meeting, this week's meeting will be
>> provisional on having a proposed agenda rather than free-form. Therefore,
>> please reply if there is a topic that you would like to discuss live with the
>> other BPF developers.
>
> I have only received the following agenda from Brendan so far:
>
>> I have a meeting clash so I can only join for a bit; my agenda items are:
>> - BPF tracing book
>> - ply
>
> Are there any further agenda items?



=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/


Re: [agenda] IO Visor TSC/Dev Meeting

Brenden Blanco
 

On Tue, Jun 25, 2019 at 1:39 PM Brenden Blanco <bblanco@...> wrote:
> Hi All,
>
> As per the discussion from last meeting, this week's meeting will be
> provisional on having a proposed agenda rather than free-form. Therefore,
> please reply if there is a topic that you would like to discuss live with the
> other BPF developers.

I have only received the following agenda from Brendan so far:

> I have a meeting clash so I can only join for a bit; my agenda items are:
> - BPF tracing book
> - ply

Are there any further agenda items?



=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/


[agenda] IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi All,

As per the discussion from last meeting, this week's meeting will be
provisional on having a proposed agenda rather than free-form. Therefore,
please reply if there is a topic that you would like to discuss live with the
other BPF developers.


=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/


bpftrace v0.9.1

Matheus Marchini <mat@...>
 

We just released bpftrace v0.9.1. This turned into a pretty big
release, with almost 200 commits and 3 months since v0.9.0. We'll try
to have smaller development cycles for the next releases.

Some highlights of this version:

- Compound assignment operators (+= and friends)
- Support arrays and IPv6 in ntop
- Add basic support to enums
- Add basic macro definition support
- Allow comparison of two string variables
- Add pre and post behavior to ++ and -- operators
- Ban kprobes that cause CPU deadlocks
- Add unsafe-mode and make default execution mode safe-mode

Full release notes can be found at:
https://github.com/iovisor/bpftrace/releases/tag/v0.9.1

Hope y'all enjoy this version! Let us know if you find any issues.
Cheers,


Re: Performance of veth XDP

Forrest Chen
 

On Wed, Jun 12, 2019 at 09:31 PM, Toshiaki Makita wrote:
> I cannot reproduce it.
> Is it an XDP-related problem? What happens if you use a bridge in place of XDP_REDIRECT?
> Did you collect tcpdump results on the server side?
> Also, how about checking "ethtool -S" and "nstat" (not netstat) periodically?

Thanks. I have tested in non-XDP mode and the problem also happens. It may be a netperf bug...


Re: bpf_probe_read() split: bpftrace RFC

Matheus Marchini <mat@...>
 

How will bpf_probe_read_user/bpf_probe_read_kernel be enforced in the
kernel? In other words, how will bpf_probe_read_user detect and report
when it gets a kernel address as a parameter, and vice versa? Will it
be accomplished by the verifier (is it even possible to do this
reliably with the verifier) or only at runtime?

If the kernel only tests it at runtime, and it returns a
unique error code (different from the errors that probe_read can return
today; we might need to create a new error code), we could do the
following for the dereference operations (* and str()):

// Function-pointer type shared by both read helpers
typedef int (*probe_read_t)(void *dst, int size, void *src);

// Assuming bpf_probe_read_[user,kernel] will return EINVALADDRSPC
// if the user tries to access an address with the wrong function
int err;

// addr_space_ctx is defined according to Brendan's email
probe_read_t default_probe_read;
probe_read_t fallback_probe_read;
if (addr_space_ctx == KERNEL) {
    default_probe_read = bpf_probe_read_kernel;
    fallback_probe_read = bpf_probe_read_user;
} else {
    default_probe_read = bpf_probe_read_user;
    fallback_probe_read = bpf_probe_read_kernel;
}

// Assign first, then compare, so err holds the helper's return value
err = default_probe_read(dst, size, src);
if (err == EINVALADDRSPC) {
    err = fallback_probe_read(dst, size, src);
}
if (err < 0) {
    bpf_trace_printk("Error while reading address %p\n", src);
    return;
}

With this approach we can avoid breaking any scripts. The only
difference is that it will add more overhead when the fallback
probe_read is used (and if the user is affected by this overhead, they
can still use kptr/uptr/kstr/ustr). We could also: print to
stdout/syslog when the fallback method is used if bpftrace is running
in verbose mode, and provide a "strict" mode which would not try to
run the fallback probe_read.
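
To make the strict and verbose ideas concrete, here is a minimal userspace sketch of the same fallback logic. Everything in it is an assumption for illustration: EINVALADDRSPC does not exist today, the stub_* readers stand in for the proposed helpers, and the tagged_mem struct fakes an address space so the code can run outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error code from the proposal; the value is arbitrary. */
#define EINVALADDRSPC (-1000)

enum addr_space { ADDR_KERNEL, ADDR_USER };

typedef int (*probe_read_t)(void *dst, int size, const void *src);

/* Fake memory that records which address space it "lives" in. */
struct tagged_mem {
    enum addr_space space;
    char data[16];
};

/* Shared stub body: refuse the read if the address space does not match. */
int stub_read(enum addr_space want, void *dst, int size, const void *src)
{
    const struct tagged_mem *m = src;

    assert(size > 0 && (size_t)size <= sizeof m->data);
    if (m->space != want)
        return EINVALADDRSPC;
    memcpy(dst, m->data, (size_t)size);
    return 0;
}

int stub_probe_read_kernel(void *dst, int size, const void *src)
{
    return stub_read(ADDR_KERNEL, dst, size, src);
}

int stub_probe_read_user(void *dst, int size, const void *src)
{
    return stub_read(ADDR_USER, dst, size, src);
}

/* Try the reader that matches the probe context first. Unless strict is set,
 * retry with the other reader when the first reports the wrong address space.
 * *used_fallback lets a verbose mode report that the retry happened. */
int read_with_fallback(enum addr_space ctx, int strict, void *dst, int size,
                       const void *src, int *used_fallback)
{
    probe_read_t first = ctx == ADDR_KERNEL ? stub_probe_read_kernel
                                            : stub_probe_read_user;
    probe_read_t second = ctx == ADDR_KERNEL ? stub_probe_read_user
                                             : stub_probe_read_kernel;
    int err;

    *used_fallback = 0;
    err = first(dst, size, src);
    if (err == EINVALADDRSPC && !strict) {
        *used_fallback = 1;
        err = second(dst, size, src);
    }
    return err;
}
```

With strict set, a mismatched read surfaces the error instead of silently paying for a second read, which is the trade-off described above.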

On Thu, Jun 13, 2019 at 11:32 AM Brendan Gregg
<brendan.d.gregg@...> wrote:
> G'Day,
>
> This is the biggest change afoot to the bpftrace API, and I think we
> can sort it out quickly without fuss, but it is worth sharing here.
> This is from https://github.com/iovisor/bpftrace/issues/614 .
>
> bpftrace currently allows pointer dereferencing via *addr, and
> str(addr) for strings. But the future split of bpf_probe_read() into
> bpf_probe_read_user() and bpf_probe_read_kernel() (to support SPARC,
> etc) may break a lot of bpftrace tools and documentation. Or it may
> not, if we are clever about it.
>
> The proposal is this: add the following bpftrace builtins:
>
> - uptr(addr): dereference user address
> - ustr(addr): fetch NULL-terminated user string
> - kptr(addr): dereference kernel address
> - kstr(addr): fetch NULL-terminated kernel string
>
> AND, to introduce a "context" for probe actions -- user or kernel --
> where *addr and str(addr) work relative to that context. The context
> would be:
>
> - kprobes/kretprobes: kernel
> - uprobes/uretprobes: user
> - tracepoints: kernel (with the exception of syscall tracepoints: user)
> - other probe types: kernel
>
> It's possible that this context approach leaves us with zero broken
> tools and documentation (ie, there are zero cases so far where we even
> need to use uptr/ustr/kptr/kstr). I'm still checking and looking for
> exceptions. Where you can help: can you think of a syscall tracepoint
> that has a kernel address as an argument? Or another non-syscall
> tracepoint that has a user-address as an argument? Or can you think of
> any other problem with this plan?
>
> thanks,
>
> Brendan



bpf_probe_read() split: bpftrace RFC

Brendan Gregg
 

G'Day,

This is the biggest change afoot to the bpftrace API, and I think we
can sort it out quickly without fuss, but it is worth sharing here.
This is from https://github.com/iovisor/bpftrace/issues/614 .

bpftrace currently allows pointer dereferencing via *addr, and
str(addr) for strings. But the future split of bpf_probe_read() into
bpf_probe_read_user() and bpf_probe_read_kernel() (to support SPARC,
etc) may break a lot of bpftrace tools and documentation. Or it may
not, if we are clever about it.

The proposal is this: add the following bpftrace builtins:

- uptr(addr): dereference user address
- ustr(addr): fetch NULL-terminated user string
- kptr(addr): dereference kernel address
- kstr(addr): fetch NULL-terminated kernel string

AND, to introduce a "context" for probe actions -- user or kernel --
where *addr and str(addr) work relative to that context. The context
would be:

- kprobes/kretprobes: kernel
- uprobes/uretprobes: user
- tracepoints: kernel (with the exception of syscall tracepoints: user)
- other probe types: kernel
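
The context rules listed above could be sketched as a small lookup. This is only an illustration: the function name, the enum, and the use of plain strings for probe types are assumptions, not bpftrace internals. The "syscalls:" prefix is how syscall tracepoints are named in bpftrace (for example tracepoint:syscalls:sys_enter_openat).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum addr_space { ADDR_KERNEL, ADDR_USER };

/* Pick the default address space for *addr and str(addr) in a probe action.
 * probe_type is e.g. "kprobe" or "tracepoint"; target is the part after the
 * first colon, e.g. "syscalls:sys_enter_openat" (may be NULL). */
enum addr_space default_addr_space(const char *probe_type, const char *target)
{
    assert(probe_type != NULL);

    /* uprobes/uretprobes: user */
    if (strcmp(probe_type, "uprobe") == 0 ||
        strcmp(probe_type, "uretprobe") == 0)
        return ADDR_USER;

    /* tracepoints: kernel, with the exception of syscall tracepoints */
    if (strcmp(probe_type, "tracepoint") == 0 && target != NULL &&
        strncmp(target, "syscalls:", 9) == 0)
        return ADDR_USER;

    /* kprobes/kretprobes and all other probe types: kernel */
    return ADDR_KERNEL;
}
```

The explicit uptr/ustr/kptr/kstr builtins would then only be needed where an argument does not follow its probe's default.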

It's possible that this context approach leaves us with zero broken
tools and documentation (ie, there are zero cases so far where we even
need to use uptr/ustr/kptr/kstr). I'm still checking and looking for
exceptions. Where you can help: can you think of a syscall tracepoint
that has a kernel address as an argument? Or another non-syscall
tracepoint that has a user-address as an argument? Or can you think of
any other problem with this plan?

thanks,

Brendan


Re: BPF Virtual Machine Runtime

Panagiotis Moustafellos
 



On Thu, Jun 13, 2019 at 10:37 AM Fulvio Risso <fulvio.risso@...> wrote:
> Just a personal comment.
>
> Talking about "BPF VM" to students raises a lot of confusion, as they
> expect a full-fledged VM and do not understand what "VM" means in this
> case. Comparing to Java does not help, as most people think about Java
> as a language, not as a VM.

FWIW - I don't share that view. I'd be inclined to say that clear documentation helps educate folks, whereas inventing new terminology just because there's a wall of confusion would only lead to a greater wall in the long term.

That said, we can be a bit more explicit. Would framing it as "the BPF VM is a specification of an in-kernel virtual machine that runs BPF instructions" help here? The extra "in-kernel" qualifier would make people understand there is some distinction here, while still being very true to reality.

> So, in my classes I started to present BPF as a virtual CPU, which is
> not that far from the reality; this improves the way people quickly
> understand the concept.
>
> Cheers,
>
>         fulvio


On 12/06/2019 23:36, Daniel Borkmann wrote:
> On 06/12/2019 09:52 PM, Brendan Gregg wrote:
>> Following on from the call... Does this sound even better? (mapping
>> from the JVM for comparison):
>>
>> The JVM is a specification of a virtual machine that runs Java
>> bytecode. It is implemented by a Java Runtime Environment, such as
>> OpenJDK, which includes an interpreter and a JIT compiler.
>>
>> The BPF VM (BVM?) is a specification of a virtual machine that runs
>> BPF instructions (defined in filter.h, etc). It is implemented by the
>> Linux kernel BPF runtime, which includes an interpreter and a JIT
>> compiler. Most of the work for the past 5 years has been developing
>> the BPF runtime.
>
> I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
> I think it's fine.
>
> Thanks,
> Daniel
>
>
>





--
Panagiotis Moustafellos
SRE Tech Lead @ Elastic


Re: BPF Virtual Machine Runtime

Fulvio Risso
 

Just a personal comment.

Talking about "BPF VM" to students raises a lot of confusion, as they expect a full-fledged VM and do not understand what "VM" means in this case. Comparing to Java does not help, as most people think about Java as a language, not as a VM.

So, in my classes I started to present BPF as a virtual CPU, which is not that far from the reality; this improves the way people quickly understand the concept.

Cheers,

fulvio

On 12/06/2019 23:36, Daniel Borkmann wrote:
> On 06/12/2019 09:52 PM, Brendan Gregg wrote:
>> Following on from the call... Does this sound even better? (mapping
>> from the JVM for comparison):
>>
>> The JVM is a specification of a virtual machine that runs Java
>> bytecode. It is implemented by a Java Runtime Environment, such as
>> OpenJDK, which includes an interpreter and a JIT compiler.
>>
>> The BPF VM (BVM?) is a specification of a virtual machine that runs
>> BPF instructions (defined in filter.h, etc). It is implemented by the
>> Linux kernel BPF runtime, which includes an interpreter and a JIT
>> compiler. Most of the work for the past 5 years has been developing
>> the BPF runtime.
>
> I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
> I think it's fine.
>
> Thanks,
> Daniel


Re: Performance of veth XDP

Toshiaki Makita
 

On 2019/06/12 17:38, forrest0579@... wrote:
> In https://lists.iovisor.org/g/iovisor-dev/topic/how_to_make_redirect_map/31867035 I have built up an environment to make veth+XDP work.
> I have some questions from my performance tests.
> 1.
> When I ran a performance test using iperf, I found that the result with XDP is nearly the same as without XDP. I guess it is because with XDP I have to turn off the tx offload. So my question is: why does XDP affect the veth tx offload?

Because XDP core does not support checksum (or any other) offload. Any information
necessary for offloading is discarded when converting an skb into an xdp_frame.
Basically veth XDP is not so fast when you only use XDP_PASS in containers.

> 2.
> When I test using netperf with the TCP_CRR type, I find that after some connections the test blocks. After debugging with tcpdump and netstat, I find that the last connection on the client side enters the FIN_WAIT2 state. The tcpdump results for normal and abnormal connections are shown in the gist <https://gist.github.com/ChenLingPeng/50f02a1c6f6e4e5a195206f60baece14>. Every normal connection has 10 records, but the blocked abnormal connection has only 8 records, and the sequence of the first 8 records is different. I have no idea why this would happen, since all I do is redirect packets. Does anyone have any ideas?

I cannot reproduce it.
Is it an XDP-related problem? What happens if you use a bridge in place of XDP_REDIRECT?
Did you collect tcpdump results on the server side?
Also, how about checking "ethtool -S" and "nstat" (not netstat) periodically?

Toshiaki Makita


minutes: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi all,

Thanks for dialing in to the meeting this week.

Going forward, in order to conserve everyone's time, we will be polling ahead
of time for an agenda, and foregoing the online meeting in favor of voluntary
email updates if no agenda is suggested. I will send out a reminder for an
agenda the day before this scheduled meeting if one hasn't been suggested.

Cheers,
Brenden

=== Discussion ===

This meeting has become somewhat repetitive, and often lacks an agenda.
The discussion that focuses around status isn't too useful.
The goal of the meeting should be to discuss specific issues.
Please send issues to discuss prior to the meeting, otherwise send regular
updates (if useful) in email format.

Brendan:
Book is not visible online yet, waiting on publisher
Reviewers are still looking at pre-copy-edit versions

Question for kernel devs: how does the verifier handle divide by zero?
Should the book call BPF a VM or a runtime? Answer: runtime
It is also an instruction set.
Any mention of Turing completeness?
... waiting for bpf runtime written in bpf :)


=== Attendees ===
Brenden Blanco
Michael Savisko
Bjorn Topel
Jesper Brouer
Jakub Kicinski
Daniel Borkmann
Jiong Wang
Alexei Starovoitov
Joe Stringer
Marco Leogrande
Martin Lau
Maciej Fijalkowski
Brendan Gregg
Dan Siemon
John F
Quentin Monnet
Richard Elling


Re: BPF Virtual Machine Runtime

Daniel Borkmann
 

On 06/12/2019 09:52 PM, Brendan Gregg wrote:
> Following on from the call... Does this sound even better? (mapping
> from the JVM for comparison):
>
> The JVM is a specification of a virtual machine that runs Java
> bytecode. It is implemented by a Java Runtime Environment, such as
> OpenJDK, which includes an interpreter and a JIT compiler.
>
> The BPF VM (BVM?) is a specification of a virtual machine that runs
> BPF instructions (defined in filter.h, etc). It is implemented by the
> Linux kernel BPF runtime, which includes an interpreter and a JIT
> compiler. Most of the work for the past 5 years has been developing
> the BPF runtime.

I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
I think it's fine.

Thanks,
Daniel


BPF Virtual Machine Runtime

Brendan Gregg
 

Following on from the call... Does this sound even better? (mapping
from the JVM for comparison):

The JVM is a specification of a virtual machine that runs Java
bytecode. It is implemented by a Java Runtime Environment, such as
OpenJDK, which includes an interpreter and a JIT compiler.

The BPF VM (BVM?) is a specification of a virtual machine that runs
BPF instructions (defined in filter.h, etc). It is implemented by the
Linux kernel BPF runtime, which includes an interpreter and a JIT
compiler. Most of the work for the past 5 years has been developing
the BPF runtime.

Brendan


Performance of veth XDP

Forrest Chen
 

In https://lists.iovisor.org/g/iovisor-dev/topic/how_to_make_redirect_map/31867035 I have built up an environment to make veth+XDP work.
I have some questions from my performance tests.

1.
When I ran a performance test using iperf, I found that the result with XDP is nearly the same as without XDP. I guess it is because with XDP I have to turn off the tx offload. So my question is: why does XDP affect the veth tx offload?

2.
When I test using netperf with the TCP_CRR type, I find that after some connections the test blocks. After debugging with tcpdump and netstat, I find that the last connection on the client side enters the FIN_WAIT2 state. The tcpdump results for normal and abnormal connections are shown in the gist. Every normal connection has 10 records, but the blocked abnormal connection has only 8 records, and the sequence of the first 8 records is different. I have no idea why this would happen, since all I do is redirect packets. Does anyone have any ideas?


Re: reminder: IO Visor TSC/Dev Meeting

Alexei Starovoitov
 

Brenden,
thanks for the bi-weekly reminders!

All,
if you have any topics to discuss, please email them to the list,
so folks have a better idea what to expect tomorrow.

Thanks!

On Tue, Jun 11, 2019 at 4:33 PM Brenden Blanco <bblanco@...> wrote:
> Please join us tomorrow for our bi-weekly call. As usual, this meeting is
> open to everybody and completely optional.
> You might be interested to join if:
> - You want to know what is going on in BPF land
> - You are doing something interesting yourself with BPF and would like to share
> - You want to know what the heck BPF is
>
> === IO Visor Dev/TSC Meeting ===
>
> Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
> 11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min
>
> https://bluejeans.com/568677804/
>
> https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=6&day=12&hour=18&min=0&sec=0&p1=900



reminder: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
- You want to know what is going on in BPF land
- You are doing something interesting yourself with BPF and would like to share
- You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=6&day=12&hour=18&min=0&sec=0&p1=900


Re: Headers Parsing with fields of variable length

Raymond
 

On 2019-06-10 12:35 p.m., mdimolianis@... wrote:
> I am trying to create a header for the DNS protocol and parse DNS queries; however, I cannot parse headers with variable size, e.g. the domain name (due to the looping constraints of XDP). Is there a way to handle cases like that?

Wouldn't you punt that to a userland XDP listener for action? DNS packets are complicated.
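
For reference, one common way to cope with a variable-length field under a loop restriction is to walk it with a fixed upper bound on iterations. The sketch below does that for a DNS QNAME in plain userspace C. MAX_LABELS and the function name are assumptions made for illustration. In an actual XDP program every read would also need an explicit check against data_end, and whether the verifier accepts the loop depends on the kernel version and on the compiler unrolling it, so treat this as the shape of the idea rather than a drop-in program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative cap on the number of labels walked; not from the thread. */
#define MAX_LABELS 16

/* Walk a DNS QNAME that starts at data[off]. Returns the offset just past
 * the terminating zero-length label, or -1 if the name is truncated, uses
 * compression, or does not end within MAX_LABELS labels. */
int dns_skip_qname(const uint8_t *data, size_t len, size_t off)
{
    assert(data != NULL);

    for (int i = 0; i < MAX_LABELS; i++) {
        uint8_t label_len;

        if (off >= len)            /* ran off the end of the packet */
            return -1;
        label_len = data[off];
        if (label_len == 0)        /* root label: end of name */
            return (int)(off + 1);
        if (label_len > 63)        /* compression pointer or reserved bits */
            return -1;
        off += 1 + (size_t)label_len;
    }
    return -1;                     /* too many labels for the fixed bound */
}
```

Names that fail the walk (too long, compressed, truncated) are natural candidates to pass up to userspace, in line with the suggestion above.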


Headers Parsing with fields of variable length

mdimolianis@...
 

Hi all,
I am trying to create a header for the DNS protocol and parse DNS queries; however, I cannot parse headers with variable size, e.g. the domain name (due to the looping constraints of XDP). Is there a way to handle cases like that?
Thank you in advance!


Re: how to make redirect_map work?

Forrest Chen
 

On Tue, Jun 4, 2019 at 12:36 PM, Mauricio Vasquez wrote:
> I am sorry, I was not clear enough. If you attach the program in SKB mode, you won't need to attach any XDP program on vbox1 and vbox2. On the other hand, if you use DRV mode, you need to have an XDP pass program attached to vbox1 and vbox2 (as indicated by Toshiaki Makita).

I'm sorry, it's my fault. I've re-tested using SKB mode and it works now. I think the reason it failed before was that I didn't change the dst MAC address, so the kernel dropped the packets.

Forrest
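
For anyone hitting the same drop, the dst MAC fix amounts to overwriting the first six bytes of the Ethernet header before redirecting. A minimal userspace sketch follows; the function name and constants are made up for illustration, and an XDP program would do the same write on the packet buffer after checking the header fits before data_end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_BYTES 6       /* length of an Ethernet address */
#define ETH_HDR_BYTES 14  /* dst MAC + src MAC + EtherType */

/* Overwrite the destination MAC of an Ethernet frame in place.
 * Returns 0 on success, -1 if the buffer is too short to hold a header. */
int eth_set_dst(uint8_t *frame, size_t len, const uint8_t *new_dst)
{
    assert(frame != NULL && new_dst != NULL);

    if (len < ETH_HDR_BYTES)
        return -1;
    memcpy(frame, new_dst, MAC_BYTES);  /* destination MAC is bytes 0..5 */
    return 0;
}
```

The new address would be the MAC of the interface that should accept the packet after the redirect.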


Re: how to make redirect_map work?

Mauricio Vasquez
 


On 5/30/19 9:25 PM, forrest0579@... wrote:
> On Thu, May 30, 2019 at 05:40 AM, Mauricio Vasquez wrote:
>> You're using veth interfaces; in this case you have to attach the program in SKB mode. To do so, set flags = 1 << 1.
>
> Why should I attach XDP in SKB mode when using a veth interface? Are there any docs for that? Is it because I use DEVMAP?
> In my test, I can attach my XDP program in driver mode using veth, and it works as I expect when I just return XDP_DROP or XDP_PASS.
> My kernel version is "5.0.0-15" (ubuntu/disco64), which supports veth XDP in driver mode. https://github.com/xdp-project/xdp-project/issues/23
>
> And when I test my program in SKB mode, the connection also can't be established.

I am sorry, I was not clear enough. If you attach the program in SKB mode, you won't need to attach any XDP program on vbox1 and vbox2. On the other hand, if you use DRV mode, you need to have an XDP pass program attached to vbox1 and vbox2 (as indicated by Toshiaki Makita).

Mauricio.
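
As a side note on the "flags = 1 << 1" value mentioned in this thread: it corresponds to the SKB-mode bit among the XDP attach flags in linux/if_link.h. The sketch below copies those bit values under local, made-up names (SK_XDP_*) so it builds without kernel headers; xdp_mode_name is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the XDP_FLAGS_* bit values from linux/if_link.h,
 * renamed so the sketch does not depend on kernel headers. */
#define SK_XDP_UPDATE_IF_NOEXIST (1U << 0)
#define SK_XDP_SKB_MODE          (1U << 1)
#define SK_XDP_DRV_MODE          (1U << 2)
#define SK_XDP_HW_MODE           (1U << 3)
#define SK_XDP_MODE_MASK \
    (SK_XDP_SKB_MODE | SK_XDP_DRV_MODE | SK_XDP_HW_MODE)

/* Name the attach mode selected by a flags word. */
const char *xdp_mode_name(uint32_t flags)
{
    switch (flags & SK_XDP_MODE_MASK) {
    case 0:               return "default";
    case SK_XDP_SKB_MODE: return "skb";
    case SK_XDP_DRV_MODE: return "drv";
    case SK_XDP_HW_MODE:  return "hw";
    default:              return "invalid"; /* more than one mode bit set */
    }
}

/* Usage check: the "flags = 1 << 1" from the thread selects SKB mode. */
void xdp_flags_demo(void)
{
    assert(strcmp(xdp_mode_name(1U << 1), "skb") == 0);
    assert(strcmp(xdp_mode_name(SK_XDP_SKB_MODE | SK_XDP_DRV_MODE),
                  "invalid") == 0);
}
```

Using the named constant instead of a bare shift makes it easier to see which mode an attach call is asking for.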

