Re: BPF Virtual Machine Runtime

Panagiotis Moustafellos
 



On Thu, Jun 13, 2019 at 10:37 AM Fulvio Risso <fulvio.risso@...> wrote:
> Just a personal comment.
>
> Talking about "BPF VM" to students raises a lot of confusion, as they
> expect a full-fledged VM and do not understand what "VM" means in this
> case. Comparing to Java does not help, as most people think about Java
> as a language, not as a VM.

FWIW, I don't share that concern. I'd be inclined to say that clear documentation helps educate folks, whereas inventing new terminology just because there's a wall of confusion would only lead to a greater wall in the long term.

That said, we can be a bit more explicit. Would framing it as "the BPF VM is a specification of an in-kernel virtual machine that runs BPF instructions" help here? The extra "in-kernel" qualifier would make it clear that there is a distinction, while still being very true to reality.

> So, in my classes I started to present BPF as a virtual CPU, which is
> not that far from reality; this improves how quickly people
> understand the concept.
>
> Cheers,
>
>         fulvio
>
> On 12/06/2019 23:36, Daniel Borkmann wrote:
> > On 06/12/2019 09:52 PM, Brendan Gregg wrote:
> >> Following on from the call... Does this sound even better? (mapping
> >> from the JVM for comparison):
> >>
> >> The JVM is a specification of a virtual machine that runs Java
> >> bytecode. It is implemented by a Java Runtime Environment, such as
> >> OpenJDK, which includes an interpreter and a JIT compiler.
> >>
> >> The BPF VM (BVM?) is a specification of a virtual machine that runs
> >> BPF instructions (defined in filter.h, etc). It is implemented by the
> >> Linux kernel BPF runtime, which includes an interpreter and a JIT
> >> compiler. Most of the work for the past 5 years has been developing
> >> the BPF runtime.
> >
> > I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
> > I think it's fine.
> >
> > Thanks,
> > Daniel

--
Panagiotis Moustafellos
SRE Tech Lead @ Elastic


Re: BPF Virtual Machine Runtime

Fulvio Risso
 

Just a personal comment.

Talking about "BPF VM" to students raises a lot of confusion, as they expect a full-fledged VM and do not understand what "VM" means in this case. Comparing to Java does not help, as most people think about Java as a language, not as a VM.

So, in my classes I started to present BPF as a virtual CPU, which is not that far from reality; this improves how quickly people understand the concept.

Cheers,

fulvio

On 12/06/2019 23:36, Daniel Borkmann wrote:
> On 06/12/2019 09:52 PM, Brendan Gregg wrote:
>> Following on from the call... Does this sound even better? (mapping
>> from the JVM for comparison):
>>
>> The JVM is a specification of a virtual machine that runs Java
>> bytecode. It is implemented by a Java Runtime Environment, such as
>> OpenJDK, which includes an interpreter and a JIT compiler.
>>
>> The BPF VM (BVM?) is a specification of a virtual machine that runs
>> BPF instructions (defined in filter.h, etc). It is implemented by the
>> Linux kernel BPF runtime, which includes an interpreter and a JIT
>> compiler. Most of the work for the past 5 years has been developing
>> the BPF runtime.
>
> I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
> I think it's fine.
>
> Thanks,
> Daniel


Re: Performance of veth XDP

Toshiaki Makita
 

On 2019/06/12 17:38, forrest0579@... wrote:
> In https://lists.iovisor.org/g/iovisor-dev/topic/how_to_make_redirect_map/31867035 I have built up an environment to make veth+XDP work.
> I have some questions from my performance tests.
>
> 1.
> When I do a performance test using iperf, the result with XDP is nearly the same as without XDP. I guess this is because with XDP I have to turn off the tx offload. So my question is: why would XDP affect the veth tx offload?

Because the XDP core does not support checksum (or any other) offload. Any information
needed for offloading is discarded when an skb is converted into an xdp_frame.
Basically, veth XDP is not so fast when you only use XDP_PASS in containers.

> 2.
> When I test using netperf with the TCP_CRR test type, the test blocks after some connections. After debugging with tcpdump and netstat, I found that the last connection on the client side enters the FIN_WAIT2 state; the tcpdump results for a normal and an abnormal connection are shown in this gist: <https://gist.github.com/ChenLingPeng/50f02a1c6f6e4e5a195206f60baece14>. Every normal connection has 10 records, but the blocked abnormal connection has only 8 records, and the sequence of the first 8 records is different. I have no idea why this would happen, since all I do is redirect packets. Does anyone have any ideas?

I cannot reproduce it.
Is it an XDP-related problem? What happens if you use a bridge in place of XDP_REDIRECT?
Did you collect tcpdump results on the server side?
Also, how about checking "ethtool -S" and "nstat" (not netstat) periodically?

Toshiaki Makita


minutes: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Hi all,

Thanks for dialing in to the meeting this week.

Going forward, in order to conserve everyone's time, we will be polling ahead
of time for an agenda, and foregoing the online meeting in favor of voluntary
email updates if no agenda is suggested. I will send out a reminder for an
agenda the day before this scheduled meeting if one hasn't been suggested.

Cheers,
Brenden

=== Discussion ===

This meeting has become somewhat repetitive, and often lacks an agenda.
The discussion that focuses around status isn't too useful.
The goal of the meeting should be to discuss specific issues.
Please send issues to discuss prior to the meeting, otherwise send regular
updates (if useful) in email format.

Brendan:
Book is not visible online yet, waiting on publisher
Reviewers are still looking at pre-copy-edit versions

Question for kernel devs: how does the verifier handle divide by zero?
Should the book call BPF a VM or a runtime? Answer: runtime
It is also an instruction set.
Any mention of Turing completeness?
... waiting for a BPF runtime written in BPF :)


=== Attendees ===
Brenden Blanco
Michael Savisko
Bjorn Topel
Jesper Brouer
Jakub Kicinski
Daniel Borkmann
Jiong Wang
Alexei Starovoitov
Joe Stringer
Marco Leogrande
Martin Lau
Maciej Fijalkowski
Brendan Gregg
Dan Siemon
John F
Quentin Monnet
Richard Elling


Re: BPF Virtual Machine Runtime

Daniel Borkmann
 

On 06/12/2019 09:52 PM, Brendan Gregg wrote:
> Following on from the call... Does this sound even better? (mapping
> from the JVM for comparison):
>
> The JVM is a specification of a virtual machine that runs Java
> bytecode. It is implemented by a Java Runtime Environment, such as
> OpenJDK, which includes an interpreter and a JIT compiler.
>
> The BPF VM (BVM?) is a specification of a virtual machine that runs
> BPF instructions (defined in filter.h, etc). It is implemented by the
> Linux kernel BPF runtime, which includes an interpreter and a JIT
> compiler. Most of the work for the past 5 years has been developing
> the BPF runtime.

I'd probably drop the '(defined in filter.h, etc)' part, but otherwise
I think it's fine.

Thanks,
Daniel


BPF Virtual Machine Runtime

Brendan Gregg
 

Following on from the call... Does this sound even better? (mapping
from the JVM for comparison):

The JVM is a specification of a virtual machine that runs Java
bytecode. It is implemented by a Java Runtime Environment, such as
OpenJDK, which includes an interpreter and a JIT compiler.

The BPF VM (BVM?) is a specification of a virtual machine that runs
BPF instructions (defined in filter.h, etc). It is implemented by the
Linux kernel BPF runtime, which includes an interpreter and a JIT
compiler. Most of the work for the past 5 years has been developing
the BPF runtime.

Brendan
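
To make the "virtual machine that runs BPF instructions" wording concrete, here is a minimal sketch of an interpreter loop in plain C. It is an illustration only, not the kernel's implementation: the names `toy_insn`, `toy_run`, and the `TOY_*` constants are hypothetical, the instruction layout is simplified (the real encoding packs both register numbers into one byte), only four opcodes are handled, and there is no verifier or JIT. The four opcode values are assumed to match the eBPF encoding for 64-bit add/move and exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical instruction layout (not the kernel struct). */
struct toy_insn {
    uint8_t code; /* opcode */
    uint8_t dst;  /* destination register number */
    uint8_t src;  /* source register number */
    int16_t off;  /* signed offset (unused in this sketch) */
    int32_t imm;  /* signed 32-bit immediate */
};

enum {
    TOY_ADD64_IMM = 0x07, /* dst += imm */
    TOY_ADD64_REG = 0x0f, /* dst += src */
    TOY_MOV64_IMM = 0xb7, /* dst = imm  */
    TOY_EXIT      = 0x95  /* return r0  */
};

#define TOY_NREGS 11 /* r0..r10 */

/*
 * Interpret a program. Returns 0 and stores r0 in *result when an exit
 * instruction is reached; returns -1 on an unknown opcode, a bad register
 * number, or when execution runs off the end of the program.
 */
int toy_run(const struct toy_insn *prog, size_t len, uint64_t *result)
{
    uint64_t reg[TOY_NREGS] = {0};

    assert(prog != NULL && result != NULL);
    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];

        if (in->dst >= TOY_NREGS || in->src >= TOY_NREGS)
            return -1;
        switch (in->code) {
        case TOY_MOV64_IMM:
            reg[in->dst] = (uint64_t)(int64_t)in->imm; /* sign-extend */
            break;
        case TOY_ADD64_IMM:
            reg[in->dst] += (uint64_t)(int64_t)in->imm;
            break;
        case TOY_ADD64_REG:
            reg[in->dst] += reg[in->src];
            break;
        case TOY_EXIT:
            *result = reg[0];
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

For example, a four-instruction program that moves 40 into r0, moves 2 into r1, adds r1 to r0, and exits leaves 42 in `*result`. A runtime in the sense discussed here would add verification and JIT compilation around a loop like this one.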


Performance of veth XDP

Forrest Chen
 

In https://lists.iovisor.org/g/iovisor-dev/topic/how_to_make_redirect_map/31867035 I have built up an environment to make veth+XDP work.
I have some questions from my performance tests.

1. 
When I do a performance test using iperf, the result with XDP is nearly the same as without XDP. I guess this is because with XDP I have to turn off the tx offload. So my question is: why would XDP affect the veth tx offload?

2. 
When I test using netperf with the TCP_CRR test type, the test blocks after some connections. After debugging with tcpdump and netstat, I found that the last connection on the client side enters the FIN_WAIT2 state; the tcpdump results for a normal and an abnormal connection are shown in this gist: <https://gist.github.com/ChenLingPeng/50f02a1c6f6e4e5a195206f60baece14>. Every normal connection has 10 records, but the blocked abnormal connection has only 8 records, and the sequence of the first 8 records is different. I have no idea why this would happen, since all I do is redirect packets. Does anyone have any ideas?


Re: reminder: IO Visor TSC/Dev Meeting

Alexei Starovoitov
 

Brenden,
thanks for the bi-weekly reminders!

All,
if you have any topics to discuss, please email them to the list,
so folks have better idea what to expect tomorrow.

Thanks!

On Tue, Jun 11, 2019 at 4:33 PM Brenden Blanco <bblanco@...> wrote:

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
- You want to know what is going on in BPF land
- You are doing something interesting yourself with BPF and would like to share
- You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=6&day=12&hour=18&min=0&sec=0&p1=900



reminder: IO Visor TSC/Dev Meeting

Brenden Blanco
 

Please join us tomorrow for our bi-weekly call. As usual, this meeting is
open to everybody and completely optional.
You might be interested to join if:
- You want to know what is going on in BPF land
- You are doing something interesting yourself with BPF and would like to share
- You want to know what the heck BPF is

=== IO Visor Dev/TSC Meeting ===

Every 2 weeks on Wednesday, from Wednesday, January 25, 2017, to no end date
11:00 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 30 min

https://bluejeans.com/568677804/

https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=6&day=12&hour=18&min=0&sec=0&p1=900


Re: Headers Parsing with fields of variable length

Raymond
 

On 2019-06-10 12:35 p.m., mdimolianis@... wrote:
> I am trying to create a header for the DNS protocol and parse DNS queries; however, I cannot parse headers with variable size, e.g. the domain name (due to the looping constraints of XDP). Is there a way to handle cases like that?

Wouldn't you punt that to a userland XDP listener for action? DNS packets are complicated.


Headers Parsing with fields of variable length

mdimolianis@...
 

Hi all,
I am trying to create a header for the DNS protocol and parse DNS queries; however, I cannot parse headers with variable size, e.g. the domain name (due to the looping constraints of XDP). Is there a way to handle cases like that?
Thank you in advance!
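
One approach that is often used for variable-length fields under loop restrictions is a loop with a fixed compile-time upper bound, with every read checked against the end of the packet. The following is a sketch of that idea in plain user-space C, not a tested XDP program: `toy_skip_dns_name` and `TOY_MAX_LABELS` are hypothetical names, `cur` and `end` stand in for pointers derived from the packet's data and data_end, and compression pointers are rejected rather than followed. Whether a given kernel's verifier accepts such a loop is an assumption that depends on the kernel version and on whether the compiler unrolls it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LABELS 32 /* hypothetical cap on labels per name */

/*
 * Walk a DNS name (a sequence of length-prefixed labels ending in a
 * zero-length label) using at most TOY_MAX_LABELS iterations.
 * Returns a pointer to the first byte after the name, or NULL if the
 * name is truncated, uses compression, or has too many labels.
 */
const uint8_t *toy_skip_dns_name(const uint8_t *cur, const uint8_t *end)
{
    assert(cur != NULL && end != NULL && cur <= end);

    for (int i = 0; i < TOY_MAX_LABELS; i++) {
        size_t left = (size_t)(end - cur);
        uint8_t len;

        if (left < 1)
            return NULL;        /* length byte is out of bounds */
        len = *cur;
        if (len == 0)
            return cur + 1;     /* root label terminates the name */
        if (len > 63)
            return NULL;        /* compression pointer or invalid length */
        if (left < (size_t)len + 1)
            return NULL;        /* label body is out of bounds */
        cur += (size_t)len + 1; /* skip length byte and label */
    }
    return NULL;                /* more labels than the fixed bound allows */
}
```

The fixed bound trades completeness for predictability: names with more labels than the cap are treated as unparseable, and such packets could be passed up to user space, as suggested in the reply to this message.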


Re: how to make redirect_map work?

Forrest Chen
 

On Tue, Jun 4, 2019 at 12:36 PM, Mauricio Vasquez wrote:
> I am sorry, I was not clear enough. If you attach the program in SKB mode, you won't need to attach any XDP program on vbox1 and vbox2; on the other hand, if you use DRV mode, you need to have an XDP pass program attached to vbox1 and vbox2 (as indicated by Toshiaki Makita).

I'm sorry, it was my fault. I've re-tested using SKB mode and it works now. I think the reason it failed before was that I didn't change the dst MAC address, so the kernel dropped the packets.

Forrest


Re: how to make redirect_map work?

Mauricio Vasquez
 


On 5/30/19 9:25 PM, forrest0579@... wrote:
> On Thu, May 30, 2019 at 05:40 AM, Mauricio Vasquez wrote:
>
>> You're using veth interfaces; in this case you have to attach the program in SKB mode. To do that, set flags = 1 << 1.
>
> Why should I attach XDP in SKB mode when using a veth interface? Are there any docs for that? Is it because I use DEVMAP?
> In my test, I can attach my XDP program in driver mode using veth, and it works as I expect when I just return XDP_DROP or XDP_PASS.
> My kernel version is "5.0.0-15" (ubuntu/disco64), which supports veth XDP in driver mode. https://github.com/xdp-project/xdp-project/issues/23
>
> And when I test my program in SKB mode, the connection can't be established either.

I am sorry, I was not clear enough. If you attach the program in SKB mode, you won't need to attach any XDP program on vbox1 and vbox2; on the other hand, if you use DRV mode, you need to have an XDP pass program attached to vbox1 and vbox2 (as indicated by Toshiaki Makita).

Mauricio.
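
For reference, the "flags = 1 << 1" value mentioned in this thread is the SKB-mode bit of the XDP attach flags. The sketch below redefines the mode bits locally so that it stays self-contained; the `TOY_*` names and the helper `toy_xdp_mode_name` are hypothetical, and the values are assumed to mirror the XDP_FLAGS_* definitions in <linux/if_link.h>.

```c
#include <assert.h>

/* Local copies of the attach-mode bits (assumed to match <linux/if_link.h>). */
#define TOY_XDP_FLAGS_SKB_MODE (1U << 1) /* generic (SKB) mode */
#define TOY_XDP_FLAGS_DRV_MODE (1U << 2) /* native driver mode */
#define TOY_XDP_FLAGS_HW_MODE  (1U << 3) /* offloaded to the NIC */
#define TOY_XDP_FLAGS_MODES \
    (TOY_XDP_FLAGS_SKB_MODE | TOY_XDP_FLAGS_DRV_MODE | TOY_XDP_FLAGS_HW_MODE)

static_assert(TOY_XDP_FLAGS_SKB_MODE == 2, "1 << 1 is the SKB-mode bit");

/* Describe which attach mode a flags value requests. */
const char *toy_xdp_mode_name(unsigned int flags)
{
    switch (flags & TOY_XDP_FLAGS_MODES) {
    case 0:
        return "unspecified";   /* no mode bit set */
    case TOY_XDP_FLAGS_SKB_MODE:
        return "skb";
    case TOY_XDP_FLAGS_DRV_MODE:
        return "drv";
    case TOY_XDP_FLAGS_HW_MODE:
        return "hw";
    default:
        return "invalid";       /* more than one mode bit set */
    }
}
```

With these definitions, `toy_xdp_mode_name(1 << 1)` reports SKB mode and `toy_xdp_mode_name(1 << 2)` reports driver mode, the two modes discussed above.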



Re: how to make redirect_map work?

Forrest Chen
 

On Mon, Jun 3, 2019 at 02:53 AM, Toshiaki Makita wrote:
> You should not need SKB mode on kernel 5.0.
> Do you attach any XDP program on vbox1 and vbox2? If not, redirected packets will be dropped.
> Please refer to the slides below for details.
> https://netdevconf.org/0x13/session.html?talk-veth-xdp

Thanks for the material. It is really helpful.
After attaching an XDP_PASS program on vbox1 and vbox2 and setting the right dst MAC address in the XDP redirect program, I can now ping successfully from ns1 to ns2 (192.168.1.2->192.168.2.2) :)


How to trigger BPF program execution from user space

tranviethoang.vn@...
 

Hi all,

I have a use case in which a user-space daemon needs to trigger the execution of a TCP-BPF program (which was already loaded by the same daemon).
This may be a trivial question but after searching around, I could not find a good answer yet.

Thank you in advance,
Hoang


BCC integration into Buildroot

Jugurtha BELKALEM
 

Hi,

I've been doing Linux debugging for a year now, and I've used BCC to solve multiple issues (such as writing a DDoS detector: https://github.com/iovisor/bcc/blob/master/examples/tracing/dddos.py). I have also written an article: http://www.linuxembedded.fr/2019/03/les-secrets-du-traceur-ebpf/ (to present BCC to the French community).

Because my job focuses mainly on embedded systems, my colleague Romain Naour and I ported BCC to the Buildroot project, and tests have already been successful on ARM64 (Raspberry Pi 3), as described in this article: http://www.linuxembedded.fr/2019/05/bcc-integration-into-buildroot/.

BCC is such a great tool and I'd love to know what you think about running it on tiny devices.

Note: sorry if you have received this mail twice; I've just added the mailing list.
Regards.

--

Jugurtha.


--
SMILE 

32 boulevard Vincent Gâche
44200 NANTES

Jugurtha BELKALEM
Design and Development Engineer 1




Re: how to make redirect_map work?

Toshiaki Makita
 

On 2019/05/31 11:25, forrest0579@... wrote:
> On Thu, May 30, 2019 at 05:40 AM, Mauricio Vasquez wrote:
>> You're using veth interfaces; in this case you have to attach the
>> program in SKB mode. To do that, set flags = 1 << 1.
> Why should I attach XDP in SKB mode when using a veth interface? Are there any docs for that? Is it because I use DEVMAP?

You should not need SKB mode on kernel 5.0.
Do you attach any XDP program on vbox1 and vbox2? If not, redirected packets will be dropped.
Please refer to the slides below for details.
https://netdevconf.org/0x13/session.html?talk-veth-xdp

Toshiaki Makita

> In my test, I can attach my XDP program in driver mode using veth, and it works as I expect when I just return XDP_DROP or XDP_PASS.
> My kernel version is "5.0.0-15" (ubuntu/disco64), which supports veth XDP in driver mode. https://github.com/xdp-project/xdp-project/issues/23
> And when I test my program in SKB mode, the connection can't be established either.


Packet replication using EBPF, not scaling beyond 28kpps, pls let me know any optimization if possible in the code.

Prashanth Fernando
 

Hi, 

I am implementing an eBPF-based packet replicator that runs as part of a TC classifier.
The problem is that I am not able to scale beyond 28 kpps: once the packet rate goes beyond 28 kpps I see traffic loss, even though the CPU usage stays well within 5%.

Commands used to load the eBPF code in TC:
sudo tc qdisc add dev $(DEVICE) ingress handle ffff:
sudo tc filter add dev $(DEVICE) parent ffff: bpf obj replicator.o classifier flowid ffff:1 

Attaching the lscpu output and the code snippet.
Please do let me know if there are any optimizations possible to handle more packets.

Thank You,
Prashanth




Re: how to make redirect_map work?

Forrest Chen
 

On Thu, May 30, 2019 at 05:40 AM, Mauricio Vasquez wrote:

> You're using veth interfaces; in this case you have to attach the program in SKB mode. To do that, set flags = 1 << 1.

Why should I attach XDP in SKB mode when using a veth interface? Are there any docs for that? Is it because I use DEVMAP?
In my test, I can attach my XDP program in driver mode using veth, and it works as I expect when I just return XDP_DROP or XDP_PASS.
My kernel version is "5.0.0-15" (ubuntu/disco64), which supports veth XDP in driver mode. https://github.com/xdp-project/xdp-project/issues/23

And when I test my program in SKB mode, the connection can't be established either.


Re: Behavior of bpf_obj_get

Yonghong Song
 

On Thu, May 30, 2019 at 11:26 AM Adam Drescher
<adam.r.drescher@...> wrote:
>
> Thank you for your answer, that makes sense.
>
> This leads me to a follow-up question: is there a standard way to see
> if a pinned map has been reloaded, without closing and re-opening the
> map file every time to check? From your answer, now I know we cannot

If you take a reference count for the pinned map, the map cannot be
removed.
If you are talking about a content change in a pinned map, the application
has to check the map keys/values to find it, just like shared memory.

> do this by monitoring a change in the map's file descriptor. We cannot
> use the map ID from bpf_obj_get_info_by_fd, as the info still
> corresponds to the old map. As far as I can tell, we don't have access
> to the BPF object or program pointers in a standalone userspace
> daemon.

You got a map fd; you should use that map fd to do standard map lookup/insert/delete
operations.


> On Thu, May 30, 2019 at 12:00 PM Y Song <ys114321@...> wrote:
>>
>> On Thu, May 30, 2019 at 8:40 AM Adam Drescher <adam.r.drescher@...> wrote:
>>>
>>> I am seeing unexpected behavior from bpf_obj_get, although this is
>>> likely due to my inexperience with BPF.
>>>
>>> In a loader program, I create a pinned map at
>>> "/sys/fs/bpf/test/xdp_stats_map". In a separate statistics program, I
>>> access the pinned map via file descriptor -- I got the file descriptor
>>> from a call to bpf_obj_get and provided the pathname above.
>>>
>>> In a polling loop, I call bpf_obj_get on the same pathname and compare
>>> this value to the original. However, instead of getting the same file
>>> descriptor, the file descriptor returned by bpf_obj_get increments by
>>> 1 each invocation (so it returns 4, 5, 6, 7, ...). I am not doing
>>> anything externally to reload the map or change the file descriptor.
>>> Why is this happening? Looking at samples/bpf/fds_example.c as example
>>> usage of bpf_obj_get, I would expect to get the same file descriptor
>>> back. Any ideas?
>>
>> No, you won't get the same file descriptor, although they point to
>> the same map.
>> - the original map fd holds a reference count in the kernel.
>> - bpf_obj_get returns a new fd, and the map reference count is increased
>>   by 1 as well
>>
>> You can close the original map fd and the fd returned by bpf_obj_get() would
>> still be valid, and vice versa.


>>> Relevant code (filename has already been populated elsewhere):
>>>
>>> #ifndef PATH_MAX
>>> #define PATH_MAX 4096
>>> #endif
>>> char filename[PATH_MAX];
>>>
>>> static void stats_poll(int map_fd, __u32 map_type, int interval)
>>> {
>>>     int fd;
>>>
>>>     while (1) {
>>>         /* Each call returns a new fd; it is never closed here,
>>>          * so the returned number grows on every iteration. */
>>>         fd = bpf_obj_get(filename);
>>>         printf("filename: %s\n", filename);
>>>         printf("bpf_obj_get: %d - map_fd: %d\n", fd, map_fd);
>>>         sleep(interval);
>>>     }
>>> }
>>>
>>> Relevant output:
>>> - BPF map (bpf_map_type:6) id:42 name:xdp_stats_map key_size:4
>>> value_size:16 max_entries:5
>>> filename: /sys/fs/bpf/test/xdp_stats_map
>>> bpf_obj_get: 4 - map_fd: 3
>>>
>>> filename: /sys/fs/bpf/test/xdp_stats_map
>>> bpf_obj_get: 5 - map_fd: 3
>>>
>>> filename: /sys/fs/bpf/test/xdp_stats_map
>>> bpf_obj_get: 6 - map_fd: 3
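
The descriptor behavior described in this thread resembles that of ordinary files: each open of the same path yields a new descriptor number referring to the same underlying object, and closing one descriptor leaves the others valid. The sketch below shows this with POSIX open() as a stand-in; it is an analogy only and does not call bpf_obj_get(). The function names are hypothetical, and "/dev/null" in the usage note is just a convenient path that is assumed to exist.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the same path twice. Returns 1 if the two descriptors differ and
 * the second is still valid after the first has been closed, else 0.
 */
int toy_second_fd_survives_close(const char *path)
{
    int a, b, ok = 0;

    assert(path != NULL);
    a = open(path, O_RDONLY);
    b = open(path, O_RDONLY);
    if (a >= 0 && b >= 0 && a != b) {
        close(a);
        a = -1;
        ok = (fcntl(b, F_GETFD) != -1); /* b must still be open */
    }
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return ok;
}

/*
 * Open and close the path n times. Returns 1 if every open returned the
 * same descriptor number, i.e. nothing leaked between iterations.
 */
int toy_fd_is_reused_when_closed(const char *path, int n)
{
    int first = -1;

    assert(path != NULL);
    for (int i = 0; i < n; i++) {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return 0;
        close(fd); /* releasing the fd lets the next open reuse it */
        if (first < 0)
            first = fd;
        else if (fd != first)
            return 0;
    }
    return 1;
}
```

Calling both functions with "/dev/null" is expected to return 1 in a single-threaded program. Applied to the polling loop quoted above, the same reasoning suggests either obtaining the descriptor once outside the loop or closing it at the end of each iteration.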

