Re: A slightly different use of the BCC tools

Rich Lane <rich.lane@...>
 

You can also use objcopy:

    objcopy -I elf64-little -O binary -j.text foo.elf foo.bin

The output file will contain just the eBPF instructions.
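Each instruction in that .bin is a fixed 8-byte record (opcode byte, register nibbles, 16-bit offset, 32-bit immediate), so the output is easy to sanity-check. A small Python sketch; the example bytes are the encoding of `r2 = 5` as it appears in verifier dumps:

```python
import struct

def decode_insn(raw):
    """Decode one 8-byte eBPF instruction (little-endian layout,
    matching struct bpf_insn in the kernel UAPI headers)."""
    opcode, regs, offset, imm = struct.unpack("<BBhi", raw)
    return {
        "opcode": opcode,
        "dst": regs & 0x0F,         # low nibble: destination register
        "src": (regs & 0xF0) >> 4,  # high nibble: source register
        "offset": offset,
        "imm": imm,
    }

# BPF_MOV64_IMM(r2, 5), shown as "(b7) r2 = 5" in verifier output
insn = decode_insn(b"\xb7\x02\x00\x00\x05\x00\x00\x00")
```

Iterating over the file in 8-byte chunks gives a quick disassembly-lite for checking what objcopy extracted.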

On Tue, Feb 16, 2016 at 11:51 AM, Alexei Starovoitov <alexei.starovoitov@...> wrote:
Take a look at the native C++ API of bcc in bpf_module.h:
with load_string() you can pass C code as a string,
and function_start() + function_size() will give you the raw BPF instructions.
Note that the compiler doesn't guarantee safety.
One can write for(;;); and LLVM will generate BPF code containing this infinite loop.
The in-kernel verifier checks for safety.


On Tue, Feb 16, 2016 at 8:32 PM, Gerard via iovisor-dev
<iovisor-dev@...> wrote:
> Hi Rich,
>
> Thanks for answering. I'm loading eBPF from ELF, but the idea is to load it
> from raw instructions once I'm able to build just the instructions using
> clang rather than a complete ELF object. Right now my Clang backend is
> something like this: http://stackoverflow.com/a/34966966/1132943 - that's why
> I wanted to use bcc as a backend, as it seems to generate just the raw
> instructions and not an ELF object.
>
> Another thing: I didn't realize that char foo[] = "abc"; is
> different from char *foo = "abc"; that's why I wasn't able to generate the
> code that copies the string onto the stack... Thanks for pointing this out.
>
> Now I can focus on understanding how bcc uses Clang/LLVM to generate the
> eBPF instructions and adapt it.
>
> Gerard
>
> On Tue, Feb 16, 2016 at 19:23, Rich Lane (<rich.lane@...>)
> wrote:
>>
>> Hi Gerard,
>>
>> Are you loading eBPF from ELF or from raw instructions? If it's ELF I
>> could add support for those relocations and load rodata.
>>
>> Otherwise, you could try copying the string to the stack manually before
>> the function call. The tc samples do this.
>>
>>     char foo[] = "abc";
>>
>> The compiler turns this into a sequence of load-immediate/store
>> instructions.
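As a rough illustration of that lowering (not the compiler's exact output), the string bytes end up packed into little-endian immediates that are then stored to the stack:

```python
import struct

def string_to_stack_words(s, word_size=4):
    """Split a NUL-terminated string into little-endian immediates,
    roughly as a compiler might when materializing char foo[] on the
    stack. Illustrative only; real codegen may pick other word sizes."""
    data = s.encode() + b"\x00"
    data += b"\x00" * (-len(data) % word_size)  # pad to word boundary
    return [struct.unpack("<I", data[i:i + word_size])[0]
            for i in range(0, len(data), word_size)]

# "abc" plus the NUL terminator fits in one 32-bit immediate
words = string_to_stack_words("abc")
```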
>>
>> Thanks,
>> Rich
>>
>> On Tue, Feb 16, 2016 at 4:30 AM, Gerard via iovisor-dev
>> <iovisor-dev@...> wrote:
>>>
>>> Hello,
>>>
>>> I'd like to explain a different use for eBPF in which I'm currently
>>> working on and ask for a little bit of help.
>>>
>>> I'm part of a research group which works on opportunistic networks, e.g.
>>> sensor networks. We propose an approach where the messages themselves carry
>>> their forwarding protocol. This way different applications, with different
>>> forwarding protocols, can use the same network without having to install
>>> these protocols on each of the nodes, which is quite difficult in a network
>>> where nodes are completely disconnected from each other for extended periods
>>> of time.
>>>
>>> This is where eBPF comes into play. The forwarding code needs to be
>>> executed quickly and securely; therefore, eBPF seems like a good
>>> fit.
>>>
>>> Right now I have implemented a proof of concept using libclang to compile
>>> the C code to eBPF, and then I use ubpf (https://github.com/rlane/ubpf) to
>>> load and execute the resulting objects. The problem is that this way I can't
>>> use external functions that have arguments other than integers.
>>>
>>> I have noticed that bcc is capable of having strings (char *) as function
>>> parameters because they are written onto the stack beforehand, and I'm trying
>>> to understand how this code is generated.
>>>
>>> If I'm not mistaken, bcc uses libclang to generate an LLVM IR module (in
>>> frontends/clang/loader.cc), but I can't tell whether it implements its own
>>> eBPF code generator or modifies the one implemented in LLVM. What I'm
>>> trying to do is register some external functions with the code generator so
>>> it doesn't complain when I use unknown functions, or just leave the
>>> functions unlinked so I can link them using ubpf.
>>>
>>> The goal is to provide a library that user-space applications could use
>>> to execute eBPF code for their own purposes.
>>>
>>> Any help will be highly appreciated.
>>>
>>> Thanks!
>>>
>>> Gerard
>>>
>>>
>>>
>>> _______________________________________________
>>> iovisor-dev mailing list
>>> iovisor-dev@...
>>> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
>>>
>>
>
>


reminder: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

Hi,

Please feel free to join us today at 11am PST (1900 UTC) for another
round of IOVisor developer updates and discussions.

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=2&day=17&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
https://plumgrid.webex.com/plumgrid/j.php?MTID=m67436ba0408d6bad48acd69138c03aea
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:
https://plumgrid.webex.com/plumgrid/globalcallin.php?serviceType=MC&ED=44474908&tollFree=0



Can't join the meeting? Contact support here:
https://plumgrid.webex.com/plumgrid/mc


IMPORTANT NOTICE: Please note that this WebEx service allows audio and
other information sent during the session to be recorded, which may be
discoverable in a legal matter. By joining this session, you
automatically consent to such recordings. If you do not consent to
being recorded, discuss your concerns with the host or do not join the
session.


Re: [diamon-discuss] Diamon Meeting on Tuesday February 9th, 2016, at 11h EDT (15h UTC)

Alexei Starovoitov
 

On Thu, Feb 18, 2016 at 2:44 PM, Mathieu Desnoyers
<mathieu.desnoyers@...> wrote:
Adding Alexei and Brendan in CC, they will likely be interested
in the eBPF discussion.
yep :)
please cc iovisor-dev as well for ideas you guys have around
making use of bpf kernel bits with 'latency-tracker'.

----- On Feb 17, 2016, at 11:41 PM, Julien Desfossez jdesfossez@... wrote:

Hi,

We are currently setting up a place on the diamon website so we can
upload the slides, I will send the link soon.

Regarding the mainlining, we intend to push this project upstream. Before
that, we are working on more measurements/benchmarks, stability testing
and usability improvements.

Also, one thing that we would like to investigate before sending the
patches is the possibility of using eBPF/bcc to handle the probes while
keeping the latency-tracker as the backend to keep the state and all the
more advanced interactions with the kernel. That way, we could combine
the flexibility/safety of eBPF with the efficiency of the
latency-tracker.

Thanks for your feedback, I will keep this list updated of the progress
we make on that front.

Julien

On 14-Feb-2016 11:37:44 AM, Al Grant wrote:
Hi,

I have read the blog and this looks like a very useful technique, but unfortunately I
couldn't make the call. Are slides/minutes available?

My main question is not actually technical - it's about the prospects for
getting this into mainline Linux.

Thanks,

Al


-----Original Message-----
From: diamon-discuss-bounces@... [mailto:diamon-
discuss-bounces@...] On Behalf Of Julien Desfossez
Sent: 08 February 2016 19:14
To: Mathieu Desnoyers
Cc: diamon-discuss@...
Subject: Re: [diamon-discuss] Diamon Meeting on Tuesday February 9th, 2016,
at 11h EDT (15h UTC)

Hi,

Here is the information for tomorrow's call:
February 9th, 2016, at 11h EDT (15h UTC)

Conference: 415-906-5657 PIN: 88326
URL for the screen sharing: www.uberconference.com/mdolan

Local International phone numbers are available at:
https://www.uberconference.com/international

Please note anyone dialing using the international numbers need to dial the
local international number for the country they are in, then enter the US
conference number (4159065657), and then enter the PIN (88326).

Julien

On 28-Jan-2016 08:08:45 PM, Mathieu Desnoyers wrote:
Hi,

Following the blog post published two weeks ago [1], we would like to
propose organizing a phone meeting with all interested members of this
workgroup on February 9th to gather feedback and ideas for improvement
on the subject of measuring and detecting high response time.

At EfficiOS, we have developed a kernel module for monitoring at
run-time the delay between the moment the kernel starts processing an
interrupt
(do_IRQ) and the moment the target task gets scheduled in or has
finished processing the data.

When a high latency is detected, it emits a tracepoint event and can
wakeup a user-space script to take arbitrary actions as soon as
possible.

The main intent is to provide an entry point in a kernel trace. After
that, everyone has their own methodology to process the trace. The
blog post illustrates what we can do with LTTng as an example but the
detection and triggers are not coupled with any tracer.

The proposed agenda is a discussion around these points:
- presentation of the scope of the problem
- limitation of the current tools
- overview of the latency_tracker module applied for this use-case
-- current state
-- use-cases
-- future plans
- from the audience: comments, ideas, other approaches, etc.

If you have other points you would like to discuss around this
subject, please let me know and I will add them.

Also, if you wish to attend but can't make it at the proposed date and
time, let us know.

The details for the conference call will be sent soon.

Thanks,

Julien & Mathieu

[1] https://lttng.org/blog/2016/01/06/monitoring-realtime-latencies/

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
diamon-discuss mailing list
diamon-discuss@...
https://lists.linuxfoundation.org/mailman/listinfo/diamon-discuss


unknown func 13

O Mahony, Billy <billy.o.mahony@...>
 

Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two NICs - eth1 and eth3 on my machine - using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
    bpf_clone_redirect(skb, 5, 0);
    return -1;
}

I've hard-coded the egress ifindex based on output from '$ ip a'.

bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum, but why is it reported as 'unknown'?
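(For reference, the "(85) call 13" line in the verifier dump is a BPF_CALL instruction whose immediate is the helper id. A small sketch mapping it back, using a hand-written subset of the bpf_func_id enum as it stood around kernel 4.3 - treat the table as illustrative, not authoritative:)

```python
import struct

# Partial helper-id table (subset of the 4.3-era bpf_func_id enum)
HELPERS = {12: "bpf_tail_call", 13: "bpf_clone_redirect"}

def describe_call(raw):
    """Name the helper called by a BPF_CALL (opcode 0x85) instruction."""
    opcode, _, _, imm = struct.unpack("<BBhi", raw)
    assert opcode == 0x85, "not a BPF_CALL instruction"
    return HELPERS.get(imm, "unknown func %d" % imm)

# "2: (85) call 13" from the verifier output above
print(describe_call(b"\x85\x00\x00\x00\x0d\x00\x00\x00"))  # prints: bpf_clone_redirect
```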

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.


Re: unknown func 13

Brenden Blanco <bblanco@...>
 


On Tue, Mar 1, 2016 at 9:52 AM, O Mahony, Billy via iovisor-dev <iovisor-dev@...> wrote:
Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two nics - eth1 and eth3 on my machine.  Using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

You need to use BPF.SCHED_ACT or SCHED_CLS to use these forwarding functions. They are disallowed by the verifier for SOCKET_FILTER programs. The simplest usage (focus on the pyroute2 bits) is probably in tests/cc/test_xlate1.py.
 

The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
      bpf_clone_redirect(skb, 5, 0);
      return -1;
  }

I've hard coded the egress ifindex based on output from '$ipi a'

Bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum but why is it reported as 'unknown' ?

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.


reminder: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

Hi,

Please feel free to join us tomorrow at 11am PST (1900 UTC) for another
round of IOVisor developer updates and discussions.

Special on the agenda this week is a half hour or so discussion led by Alexei on the topic of accelerating packet processing in kernel drivers using BPF.

--------------------------------------------------------------------------

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=3&day=2&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
https://plumgrid.webex.com/plumgrid/j.php?MTID=m67436ba0408d6bad48acd69138c03aea
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:
https://plumgrid.webex.com/plumgrid/globalcallin.php?serviceType=MC&ED=44474908&tollFree=0



Can't join the meeting? Contact support here:
https://plumgrid.webex.com/plumgrid/mc




Re: reminder: IO Visor TSC and Dev Members Call

Uri Elzur
 

Sorry, I'm interested, but it keeps getting scheduled on top of other meetings I can't step out of.

 

Thx

 

Uri (“Oo-Ree”)

C: 949-378-7568

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Tuesday, March 1, 2016 2:34 PM
To: iovisor-dev@...
Subject: reminder: IO Visor TSC and Dev Members Call

 

Hi,

Please feel free to join us tomorrow at 11am PST (1900 UTC) for another
round of IOVisor developer updates and discussions.

 

Special on the agenda this week is a half hour or so discussion led by Alexei on the topic of accelerating packet processing in kernel drivers using BPF.

 

--------------------------------------------------------------------------

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=3&day=2&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
https://plumgrid.webex.com/plumgrid/j.php?MTID=m67436ba0408d6bad48acd69138c03aea
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:
https://plumgrid.webex.com/plumgrid/globalcallin.php?serviceType=MC&ED=44474908&tollFree=0



Can't join the meeting? Contact support here:
https://plumgrid.webex.com/plumgrid/mc




Re: unknown func 13

O Mahony, Billy <billy.o.mahony@...>
 

Hi Brendan,

 

Thanks, that’s great. I’ll have a look at that example you mention too.

 

Whereabouts in the code should I look to see which functions are allowable in which bpf program types?

 

I haven’t looked again recently but iirc in some header or man page there is a TODO for description of the various program types.

 

If you point me to the relevant area of the code I can do a patch to fill out some of that documentation. I won’t be able to figure out chapter and verse on it but some basic info would be better than none!

 

Cheers,

/Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Tuesday, March 1, 2016 6:07 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

 

On Tue, Mar 1, 2016 at 9:52 AM, O Mahony, Billy via iovisor-dev <iovisor-dev@...> wrote:

Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two nics - eth1 and eth3 on my machine.  Using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

 

You need to use BPF.SCHED_ACT or SCHED_CLS to use these forwarding functions. They are disallowed by the verifier for SOCKET_FILTER programs. The simplest usage (focus on the pyroute2 bits) is probably in tests/cc/test_xlate1.py.

 


The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
      bpf_clone_redirect(skb, 5, 0);
      return -1;
  }

I've hard coded the egress ifindex based on output from '$ipi a'

Bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum but why is it reported as 'unknown' ?

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.

 


Re: unknown func 13

Brenden Blanco <bblanco@...>
 

The best source of truth is in the kernel code.

The whitelist of functions for networking is:

net/core/filter.c:
sk_filter_func_proto()
tc_cls_act_func_proto()

other whitelists are available around various pieces where you see BPF_FUNC_*
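The effect of those per-program-type whitelists can be pictured with a toy model (illustrative only; the real logic lives in the *_func_proto() callbacks, and helper ids here follow the 4.3-era bpf_func_id enum):

```python
# Toy model of the per-program-type helper whitelists that
# sk_filter_func_proto() / tc_cls_act_func_proto() implement.
BPF_FUNC_clone_redirect = 13

COMMON = {1, 2, 3, 5, 6, 7, 8}  # maps, ktime, trace_printk, ... (illustrative)
WHITELIST = {
    "SOCKET_FILTER": COMMON,
    "SCHED_CLS":     COMMON | {9, 10, 11, BPF_FUNC_clone_redirect},
    "SCHED_ACT":     COMMON | {9, 10, 11, BPF_FUNC_clone_redirect},
}

def check_helper(prog_type, func_id):
    """Mimic the verifier's answer when a program calls a helper."""
    if func_id not in WHITELIST[prog_type]:
        return "unknown func %d" % func_id
    return "ok"

# A SOCKET_FILTER program calling bpf_clone_redirect is rejected:
print(check_helper("SOCKET_FILTER", BPF_FUNC_clone_redirect))  # prints: unknown func 13
```

This is exactly the error seen earlier in the thread: the helper exists, but not for that program type.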


On Wed, Mar 2, 2016 at 1:48 AM, O Mahony, Billy <billy.o.mahony@...> wrote:

Hi Brendan,

 

Thanks, that’s great. I’ll have a look at that example you mention too.

 

Whereabouts in the code should I look to see which functions are allowable in which bpf program types?

 

I haven’t looked again recently but iirc in some header or man page there is a TODO for description of the various program types.

 

If you point me to the relevant area of the code I can do a patch to fill out some of that documentation. I won’t be able to figure out chapter and verse on it but some basic info would be better than none!

 

Cheers,

/Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Tuesday, March 1, 2016 6:07 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

 

On Tue, Mar 1, 2016 at 9:52 AM, O Mahony, Billy via iovisor-dev <iovisor-dev@...> wrote:

Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two nics - eth1 and eth3 on my machine.  Using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

 

You need to use BPF.SCHED_ACT or SCHED_CLS to use these forwarding functions. They are disallowed by the verifier for SOCKET_FILTER programs. The simplest usage (focus on the pyroute2 bits) is probably in tests/cc/test_xlate1.py.

 


The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
      bpf_clone_redirect(skb, 5, 0);
      return -1;
  }

I've hard coded the egress ifindex based on output from '$ipi a'

Bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum but why is it reported as 'unknown' ?

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.

 



minutes: IO VIsor TSC and Dev members call

Brenden Blanco <bblanco@...>
 

Thanks all for joining today,

We had a very interesting session focused entirely on XDP (express data path), a new initiative to improve packet processing performance in the Linux kernel. The details are best covered by the slides, which I'll be sure to bug Tom for a copy of to share, so I won't make the situation worse by sharing my possibly erroneous notes.

In a nutshell, the goal is to give the low-level driver architecture of the kernel some TLC, improving PPS and BPFifying it.

Some early prototypes are already in the works!

The performance goals are:
20M pps per-cpu drop rate
14M pps per-cpu forwarding rate
100Gbps per-cpu GRO

There were some opinions on the initial use cases that some of us would like to apply to this to:
- Drop (DDOS mitigation)
- Passthrough / Forwarding
- Delivery to socket
- Delivery to VM

Thanks!

Attendees:
Alexei Starovoitov
Alex Reece
Brendan Gregg
Brenden Blanco
Daniel Borkmann
Deepa Kalani
Jesper Brouer
Jianwen Pi
John Fastabend
Mihai Budiu
Pere Monclus
Prem Jonnalagadda
Thomas Graf
Tom Herbert
Yunsong Lu


Re: minutes: IO VIsor TSC and Dev members call

Jesper Dangaard Brouer
 

On Wed, 2 Mar 2016 22:28:42 -0800
Brenden Blanco <bblanco@...> wrote:

Thanks all for joining today,

We had a very interesting session focused entirely on XDP (express data
path), a new initiative to improve packet processing performance in the
linux kernel. The details will best be covered by the slides, which I'll be
sure to bug Tom to get a copy of to share, so I won't make the situation
worse by sharing my possibly erroneous notes.

In a nutshell, the goal is to give the low level driver architecture of the
kernel some TLC, improving PPS and BPFifying it.

Some early prototypes are already in the works!
I'm doing my usual benchmark-driven development, which means I'm
currently benchmarking the lowest RX layer of the drivers and just
dropping packets inside the driver.

Current results from driver:mlx4 (40Gbits/s) indicate that interacting
with the page-allocator is costing us 30% overhead.

The performance goals are:
20M pps per-cpu drop rate
14M pps per-cpu forwarding rate
100Gbps per-cpu GRO
Driver: mlx4 early drop tests
- 6 Mpps => SKB drop (just calling dev_kfree_skb)
* (main overhead is first cache-miss on pkt-data hdr)
- 14.5 Mpps => Driver drop before SKB alloc, no-pkt-data touched
* main overhead 30% is page-allocator related
- 20 Mpps => MAX bound, if removing all 30% page-alloc overhead
 * this is just the upper possible bound... stop tuning when getting close to this
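The 20 Mpps ceiling follows from the measured numbers: if 30% of the per-packet cost at 14.5 Mpps is page-allocator overhead, removing it scales the rate by 1/0.7:

```python
# Upper bound on the drop rate if the 30% page-allocator overhead vanished.
measured_mpps = 14.5
overhead_fraction = 0.30

bound_mpps = measured_mpps / (1 - overhead_fraction)
print(round(bound_mpps, 1))  # prints: 20.7, i.e. the ~20 Mpps ceiling quoted
```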

The mlx4 driver already implements its own page-allocator-cache, but
does not do proper recycling. I want us to implement a more generic
page-allocator-cache that drivers can use, and that supports recycling.
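The idea of a per-RX-queue cache with recycling can be sketched abstractly (a toy model only, not a proposed kernel design):

```python
from collections import deque

class PageCache:
    """Per-RX-queue page cache: allocations hit a local recycled-page
    pool first and only fall back to the (expensive) global page
    allocator when the pool is empty. Toy model of the idea."""
    def __init__(self, capacity):
        self.pool = deque()
        self.capacity = capacity
        self.slow_allocs = 0

    def alloc(self):
        if self.pool:
            return self.pool.popleft()   # cheap: recycled page
        self.slow_allocs += 1            # expensive: global allocator
        return object()                  # stand-in for a fresh page

    def recycle(self, page):
        if len(self.pool) < self.capacity:  # bounded, to cap memory use
            self.pool.append(page)

cache = PageCache(capacity=256)
pages = [cache.alloc() for _ in range(4)]  # 4 slow allocations
for p in pages:
    cache.recycle(p)
reused = cache.alloc()  # served from the pool, no slow allocation
```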


There were some opinions on the initial use cases that some of us would
like to apply to this to:
- Drop (DDOS mitigation)
I see DDoS as project goal #1

- Passthrough / Forwarding
I see forward as proj goal #2

- Delivery to socket
I would actually say, we don't want to deliver these kinds of RX frames
into sockets. The primary reason is memory consumption: the amount of
time a packet can stay on a socket is unbounded.

For small packets, we might even consider doing a copy, if the dest is a
local socket. This is what drivers already do for small packets, but I
would like for this "copy-break" to be pushed "up-a-level".

In the future we can consider zero-copy RX socket delivery, but the
socket would likely need to opt-in, with a setsockopt, and the
userspace API programming model also needs some changes.


- Delivery to VM
Delivery into a VM is a very interesting feature. I actually see this as
goal #3, even though it is actually fairly complicated.

I tried to explain on the call what my VM design plan was, maybe it's
easier over email:

Once we have our own page-allocator-cache in place, we assign a
separate page-allocator-cache to each HW RX ring queue (primarily for
performance reasons in the normal use case).

For VM delivery, we create a new RX ring queue, and use ntuple HW
filters in the NIC to direct packets to the VM-specific-RXQ. (I see
this as HW based early demux)

The page-allocator-cache assigned to this VM-specific-RXQ is configured
to be VM specific. That is, kernel pages are (pre-mapped) memory-mapped
and shared with the VM process. Thus, the DMA RX engine will deliver
packet-data into pages which are available in the VM's memory space,
thus avail as zero-copy RX. (The API for delivering and returning
pages, also needs some careful considerations, e.g. designing in bulk
from the start. More work required here)


Thanks!

Attendees:
Alexei Starovoitov
Alex Reece
Brendan Gregg
Brenden Blanco
Daniel Borkmann
Deepa Kalani
Jesper Brouer
Jianwen Pi
John Fastabend
Mihai Budiu
Pere Monclus
Prem Jonnalagadda
Thomas Graf
Tom Herbert
Yunsong Lu


--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer


Re: unknown func 13

O Mahony, Billy <billy.o.mahony@...>
 

Hi Brendan,

 

After looking at the xlate tests, I *think* I have managed to get my bpf programs attached to my NICs (or rather attached to a qdisc attached to my NICs - as I now understand it).

 

However I still can't forward traffic ingressing one nic to egress the other.

 

My python program looks like this.

 

  bpf = BPF(src_file = "p2p.c", debug = 0)

  function_eth1_ic = bpf.load_func("eth1_ic", BPF.SCHED_ACT)
  function_eth3_ic = bpf.load_func("eth3_ic", BPF.SCHED_ACT)

  ip = IPRoute()
  ifindex_eth1 = ip.link_lookup(ifname="eth1")[0]
  ifindex_eth3 = ip.link_lookup(ifname="eth3")[0]

  # Add an ingress qdisc to both NICs
  ip.tc("add", "ingress", ifindex_eth1, "ffff:")
  ip.tc("add", "ingress", ifindex_eth3, "ffff:")

  # Create actions based on the BPF functions
  action_eth1 = {"kind": "bpf", "fd": function_eth1_ic.fd,
      "name": function_eth1_ic.name, "action": "ok"}
  action_eth3 = {"kind": "bpf", "fd": function_eth3_ic.fd,
      "name": function_eth3_ic.name, "action": "ok"}

  # Add a filter to accept all eth frames. Attach the bpf action functions to the filter?
  # I have no idea what the classid, target and keys parameters mean!
  ip.tc("add-filter", "u32", ifindex_eth1, ":1", parent="ffff:",
      action=[action_eth1],
      protocol=protocols.ETH_P_ALL, classid=1,
      target=0x10002, keys=['0x0/0x0+0'])
  ip.tc("add-filter", "u32", ifindex_eth3, ":1", parent="ffff:",
      action=[action_eth3],
      protocol=protocols.ETH_P_ALL, classid=1,
      target=0x10002, keys=['0x0/0x0+0'])

  while (True): pass

 

A couple of things: I can't see any mention of the filter or action in the output of some tc show commands:

 

  [13:32 GD-WCP bpf]$ tc qdisc show dev eth3

  qdisc mq 0: root

  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

  ...                                                                                                                                               

  qdisc pfifo_fast 0: parent :38 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

  qdisc ingress ffff: parent ffff:fff1 ----------------

 

  [13:32 GD-WCP bpf]$ tc class show dev eth3

  class mq :1 root             

  ...

  class mq :40 root

 

  [13:32 GD-WCP bpf]$ tc filter show dev eth3

  <nothing>

  

I know my query mainly relates to tc, but after looking at a lot of documentation on it I'm not much the wiser about how a packet actually traverses tc.

 

The BPF programs remain the same as earlier, with the hard-coded ifindex value in bpf_clone_redirect.

 

Thanks,

Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Wednesday, March 2, 2016 6:55 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

The best source of truth is in the kernel code.

 

The whitelist of functions for networking is:

 

net/core/filter.c:

sk_filter_func_proto()

tc_cls_act_func_proto()

 

other whitelists are available around various pieces where you see BPF_FUNC_*

 

 

On Wed, Mar 2, 2016 at 1:48 AM, O Mahony, Billy <billy.o.mahony@...> wrote:

Hi Brendan,

 

Thanks, that’s great. I’ll have a look at that example you mention too.

 

Whereabouts in the code should I look to see which functions are allowable in which bpf program types?

 

I haven’t looked again recently but iirc in some header or man page there is a TODO for description of the various program types.

 

If you point me to the relevant area of the code I can do a patch to fill out some of that documentation. I won’t be able to figure out chapter and verse on it but some basic info would be better than none!

 

Cheers,

/Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Tuesday, March 1, 2016 6:07 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

 

On Tue, Mar 1, 2016 at 9:52 AM, O Mahony, Billy via iovisor-dev <iovisor-dev@...> wrote:

Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two nics - eth1 and eth3 on my machine.  Using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

 

You need to use BPF.SCHED_ACT or SCHED_CLS to use these forwarding functions. They are disallowed by the verifier for SOCKET_FILTER programs. The simplest usage (focus on the pyroute2 bits) is probably in tests/cc/test_xlate1.py.

 


The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
      bpf_clone_redirect(skb, 5, 0);
      return -1;
  }

I've hard coded the egress ifindex based on output from '$ipi a'

Bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum but why is it reported as 'unknown' ?

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.

 

 


Re: minutes: IO VIsor TSC and Dev members call

Alexei Starovoitov
 

On Thu, Mar 3, 2016 at 1:57 AM, Jesper Dangaard Brouer via iovisor-dev
<iovisor-dev@...> wrote:
On Wed, 2 Mar 2016 22:28:42 -0800
Brenden Blanco <bblanco@...> wrote:

Thanks all for joining today,

We had a very interesting session focused entirely on XDP (express data
path), a new initiative to improve packet processing performance in the
linux kernel. The details will best be covered by the slides, which I'll be
sure to bug Tom to get a copy of to share, so I won't make the situation
worse by sharing my possibly erroneous notes.

In a nutshell, the goal is to give the low level driver architecture of the
kernel some TLC, improving PPS and BPFifying it.

Some early prototypes are already in the works!
I'm doing my usual benchmark driven development. Which means I'm
currently benchmarking the lowest RX layer of the drivers and just
dropping packets inside the driver.

Current results from driver:mlx4 (40Gbits/s) indicate that interacting
with the page-allocator is costing us 30% overhead.

The performance goals are:
20M pps per-cpu drop rate
14M pps per-cpu forwarding rate
100Gbps per-cpu GRO
Driver: mlx4 early drop tests
- 6 Mpps => SKB drop (just calling dev_kfree_skb)
* (main overhead is first cache-miss on pkt-data hdr)
- 14.5 Mpps => Driver drop before SKB alloc, no-pkt-data touched
* main overhead 30% is page-allocator related
awesome. that's a great baseline.

- 20 Mpps => MAX bound, if removing all 30% page-alloc overhead
* this is just the upper possible bound... stop tuning when getting close to this

The mlx4 driver already implements its own page-allocator-cache, but
does not do proper recycling. I want us to implement a more generic
page-allocator-cache that drivers can use, and that supports recycling.
I think the next step here is to make mlx4 recycle
pages and rx descriptors on its own. Later we can generalize it
into something that other drivers can use. Right now
I'd try to get to the maximum possible drop rate with
minimal changes.

Or, we're talking about benchmarking MLX4_EN_FLAG_RX_FILTER_NEEDED
That's the place where we plan to add XDP hook.

Jesper, can you share 'perf report' ?

John, if you can share similar numbers for ixgbe or i40e
that would be great, so we can have some driver competition :)
Also it will help us to see how different drivers can
recycle pages. imo only then we can generalize it into
common page-allocator-cache-with-recycle infra.

There were some opinions on the initial use cases that some of us would
like to apply this to:
- Drop (DDOS mitigation)
I see DDoS as project goal #1
+1

- Passthrough / Forwarding
I see forward as proj goal #2
+1

- Delivery to socket
- Delivery to VM
Delivery into VM is a very interesting feature. I actually see this as
goal #3. Even though this is actually fairly complicated.
yeah, let's worry about it later. We need to walk before we can fly.

I think in parallel the mellanox folks need to fix the mlx5 driver
to allocate the skb only after the packet has arrived (similar to mlx4).


Re: unknown func 13

Brenden Blanco <bblanco@...>
 

I will heartily acknowledge that the mechanism to attach bpf programs to an interface is non-intuitive. This has been a source of pain for many. The code in https://github.com/iovisor/iomodules is an attempt to wrap some of this in a rest API, but the target audience for that is other automation tools (container plugins, etc.), so as a learning tool it will be a step back rather than forward. It's also not mature yet, so I'm hesitant to mention it.

See other answers inline.

On Thu, Mar 3, 2016 at 5:52 AM, O Mahony, Billy <billy.o.mahony@...> wrote:

Hi Brenden,

 

After looking at the xlate tests, I *think* I have managed to get my bpf programs attached to my NICs (or rather attached to a qdisc attached to my NICs, as I now understand it).

 

However I still can't forward traffic ingressing one nic to egress the other.

 

My python program looks like this.

 

  bpf = BPF(src_file = "p2p.c",debug = 0)

 

  function_eth1_ic = bpf.load_func("eth1_ic", BPF.SCHED_ACT)

  function_eth3_ic = bpf.load_func("eth3_ic", BPF.SCHED_ACT)

 

  ip = IPRoute()

  ifindex_eth1 = ip.link_lookup(ifname="eth1")[0]

  ifindex_eth3 = ip.link_lookup(ifname="eth3")[0]

 

  #Add an ingress qdisc to both NICs

  ip.tc("add", "ingress", ifindex_eth1, "ffff:")

  ip.tc("add", "ingress", ifindex_eth3, "ffff:")

 

  #create actions based on the BPF functions

  action_eth1 = {"kind": "bpf", "fd": function_eth1_ic.fd,

      "name": function_eth1_ic.name, "action": "ok"}

  action_eth3 = {"kind": "bpf", "fd": function_eth3_ic.fd,

      "name": function_eth3_ic.name, "action": "ok"}

 

  #add a filter to accept all eth frame. Attach the bpf action? functions to the filter?

  #I have no idea what the classid, target and keys parameters mean!

  ip.tc("add-filter", "u32", ifindex_eth1, ":1", parent="ffff:",

      action=[action_eth1],

      protocol=protocols.ETH_P_ALL, classid=1,

      target=0x10002, keys=['0x0/0x0+0'])

  ip.tc("add-filter", "u32", ifindex_eth3, ":1", parent="ffff:",

      action=[action_eth3],

      protocol=protocols.ETH_P_ALL, classid=1,

      target=0x10002, keys=['0x0/0x0+0'])


Yeah, it's a bit of black magic. It's basically creating a match-all rule (match 0 bytes at offset 0 == true), with the matching action being the bpf program. The classid and target aren't really meaningful with just one action and filter, so it's ok to ignore them for now.

The clsact qdisc added in 4.5 by Daniel Borkmann is a much better abstraction for this, and I'll try to upstream some pyroute2 code to support this.
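To make the "black magic" concrete: the u32 'keys' string packs a match as value/mask+offset, and a zero mask matches anything. A small illustrative parser (the helper names here are mine, not pyroute2's) showing why '0x0/0x0+0' is a match-all:

```python
# Illustrative only: the pyroute2-style u32 key "value/mask+offset" compares
# (packet_word & mask) == value at the given byte offset. A zero mask throws
# every bit away, so any packet matches.

def parse_u32_key(key):
    """Split a u32 key like '0x0/0x0+0' into (value, mask, offset)."""
    value_mask, offset = key.split("+")
    value, mask = value_mask.split("/")
    return int(value, 16), int(mask, 16), int(offset)

def matches(packet_word, key):
    """Apply the key's mask-and-compare to a packet word."""
    value, mask, _offset = parse_u32_key(key)
    return (packet_word & mask) == value

# '0x0/0x0+0' masks everything away, so any word compares equal to 0:
assert matches(0xdeadbeef, "0x0/0x0+0")
```

A non-trivial key such as '0x800/0xffff+12' would instead match only frames whose EtherType word equals 0x0800.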

     

  while (True): pass

 

A couple of things: I can't see any mention of the filter or action in the output of some tc show commands:

 

  [13:32 GD-WCP bpf]$ tc qdisc show dev eth3

  qdisc mq 0: root

  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

  ...                                                                                                                                               

  qdisc pfifo_fast 0: parent :38 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

  qdisc ingress ffff: parent ffff:fff1 ----------------


Try `tc filter show dev eth3 parent ffff:` 

 

  [13:32 GD-WCP bpf]$ tc class show dev eth3

  class mq :1 root             

  ...

  class mq :40 root

 

  [13:32 GD-WCP bpf]$ tc filter show dev eth3

  <nothing>

  

I know my query mainly relates to tc, but after looking at a lot of documentation on it I'm not much the wiser about how a packet actually traverses tc.

 

The BPF programs remain the same as earlier, with a hard-coded ifindex value in bpf_clone_redirect.


Try using bpf_redirect() (available in 4.4) instead, and return an action of TC_ACT_REDIRECT. Also, use `tc -s filter show ...` to see pass/drop statistics.
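This suggestion might look something like the following in bcc; a sketch only, not a tested implementation. The ifindex 5 is the thread's hard-coded value, and TC_ACT_REDIRECT is 7 per <linux/pkt_cls.h>; actually loading it needs bcc, root, and a >= 4.4 kernel, so the load step is left in an uncalled main().

```python
# Sketch of the suggested change: bpf_redirect() plus a TC_ACT_REDIRECT
# return code instead of bpf_clone_redirect(). The BPF C program is kept
# as a string and handed to bcc.

prog_src = r"""
int eth1_ic(struct __sk_buff *skb) {
    bpf_redirect(5, 0);   /* hard-coded egress ifindex, flags = 0 */
    return 7;             /* TC_ACT_REDIRECT from <linux/pkt_cls.h> */
}
"""

def main():
    # Requires bcc installed, root privileges, and a >= 4.4 kernel.
    from bcc import BPF
    bpf = BPF(text=prog_src)
    fn = bpf.load_func("eth1_ic", BPF.SCHED_CLS)
    # ... attach fn.fd to the ingress qdisc via pyroute2, as in the thread ...
    return fn
```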
 

 

Thanks,

Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Wednesday, March 2, 2016 6:55 PM


To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

The best source of truth is in the kernel code.

 

The whitelist of functions for networking is:

 

net/core/filter.c:

sk_filter_func_proto()

tc_cls_act_func_proto()

 

other whitelists are available around various pieces where you see BPF_FUNC_*
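As for why the log shows only a number: the verifier's `call 13` indexes the bpf_func_id enum from uapi/linux/bpf.h, and the thread confirms entry 13 is bpf_clone_redirect. A small decoding sketch; the other ids in the table are my assumption from the 4.4-era header, so treat them as illustrative:

```python
# Partial bpf_func_id table (assumed ordering of the 4.4-era
# uapi/linux/bpf.h; the thread itself confirms entry 13).
BPF_FUNC_ID = {
    12: "bpf_tail_call",
    13: "bpf_clone_redirect",
    23: "bpf_redirect",
}

def decode_call(log_line):
    """Turn a verifier line like '2: (85) call 13' into a helper name."""
    func_id = int(log_line.rsplit("call", 1)[1])
    return BPF_FUNC_ID.get(func_id, "unknown func %d" % func_id)

assert decode_call("2: (85) call 13") == "bpf_clone_redirect"
```

Note that the kernel prints "unknown func" both for ids it has never heard of and, as here, for helpers excluded from the current program type's whitelist.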

 

 

On Wed, Mar 2, 2016 at 1:48 AM, O Mahony, Billy <billy.o.mahony@...> wrote:

Hi Brenden,

 

Thanks, that’s great. I’ll have a look at that example you mention too.

 

Whereabouts in the code should I look to see which functions are allowable in which bpf program types?

 

I haven’t looked again recently but iirc in some header or man page there is a TODO for description of the various program types.

 

If you point me to the relevant area of the code I can do a patch to fill out some of that documentation. I won’t be able to figure out chapter and verse on it but some basic info would be better than none!

 

Cheers,

/Billy.

 

From: Brenden Blanco [mailto:bblanco@...]
Sent: Tuesday, March 1, 2016 6:07 PM
To: O Mahony, Billy <billy.o.mahony@...>
Cc: iovisor-dev@...
Subject: Re: [iovisor-dev] unknown func 13

 

 

On Tue, Mar 1, 2016 at 9:52 AM, O Mahony, Billy via iovisor-dev <iovisor-dev@...> wrote:

Hi All,

I'm doing my version of hello world for eBPF by forwarding eth frames between two NICs on my machine (eth1 and eth3), using bpf_clone_redirect(). (cc/export/helpers.h says bpf_redirect is not available until a later kernel.)

When I run the python program it generates:

[17:42 GD-WCP my_bpf]$ sudo python p2p.py
bpf: Invalid argument
0: (b7) r2 = 5
1: (b7) r3 = 0
2: (85) call 13
unknown func 13

Traceback (most recent call last):
  File "p2p.py", line 86, in <module>
    function_eth1_ic = bpf.load_func("eth1_ic", BPF.SOCKET_FILTER)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 165, in load_func
    raise Exception("Failed to load BPF program %s" % func_name)
Exception: Failed to load BPF program eth1_ic

 

You need to use BPF.SCHED_ACT or SCHED_CLS to use these forwarding functions. They are disallowed by the verifier for SOCKET_FILTER programs. The simplest usage (focus on the pyroute2 bits) is probably in tests/cc/test_xlate1.py.

 


The bpf program is:

int eth1_ic(struct __sk_buff *skb) {
      bpf_clone_redirect(skb, 5, 0);
      return -1;
  }

I've hard-coded the egress ifindex based on the output of '$ ip a'

bpf_clone_redirect is indeed the 13th entry in the bpf_func_id enum, so why is it reported as 'unknown'?

I'm running Ubuntu 14.04 with the kernel upgraded using binary packages to 4.3.0-040300-generic from Mon Nov 2 2015.

Thanks,
Billy.
_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev

 

 



Re: unknown func 13

Daniel Borkmann
 

On 03/03/2016 08:19 PM, Brenden Blanco via iovisor-dev wrote:
[...]
On Thu, Mar 3, 2016 at 5:52 AM, O Mahony, Billy <billy.o.mahony@...>
[...]
#add a filter to accept all eth frame. Attach the bpf action? functions
to the filter?

#I have no idea what the classid, target and keys parameters mean!

ip.tc("add-filter", "u32", ifindex_eth1, ":1", parent="ffff:",

action=[action_eth1],

protocol=protocols.ETH_P_ALL, classid=1,

target=0x10002, keys=['0x0/0x0+0'])

ip.tc("add-filter", "u32", ifindex_eth3, ":1", parent="ffff:",

action=[action_eth3],

protocol=protocols.ETH_P_ALL, classid=1,

target=0x10002, keys=['0x0/0x0+0'])
Yeah, it's a bit of black magic. it's basically creating a match-all rule
(match 0 bytes and offset 0 == true), with the matching action being the
bpf program. The classid and target aren't really meaningful with just one
action and filter, so it's ok to ignore for now.

The clsact qdisc added in 4.5 by Daniel Borkmann is a much better
abstraction for this, and I'll try to upstream some pyroute2 code to
support this.
Plus you may also want to try out cls_bpf with da (direct-action) mode.
That will save you the u32 match-all classifier config and is faster.
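The direct-action mode can be driven from the tc CLI; a sketch under the assumptions of an iproute2 new enough for eBPF object loading and a >= 4.4 kernel for da mode, with hypothetical object and interface names:

```python
# Sketch: attach a cls_bpf classifier in direct-action (da) mode via the
# tc CLI. "prog.o" and "eth1" are placeholders; the command is only built
# here, and attach() is where it would actually run (requires root and an
# ingress qdisc already added with: tc qdisc add dev <dev> ingress).
import subprocess

def bpf_da_filter_cmd(dev, obj="prog.o", section="classifier"):
    """Build the tc command line for a da-mode cls_bpf filter on ingress."""
    return ["tc", "filter", "add", "dev", dev, "parent", "ffff:",
            "bpf", "da", "obj", obj, "sec", section]

def attach(dev):
    subprocess.check_call(bpf_da_filter_cmd(dev))
```

In da mode the BPF program's return value is the tc action itself (e.g. TC_ACT_OK, TC_ACT_SHOT), so no separate act_bpf action object is needed.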


Re: minutes: IO VIsor TSC and Dev members call

Jesper Dangaard Brouer
 

On Thu, 3 Mar 2016 10:12:27 -0800
Alexei Starovoitov <alexei.starovoitov@...> wrote:

On Thu, Mar 3, 2016 at 1:57 AM, Jesper Dangaard Brouer via iovisor-dev
<iovisor-dev@...> wrote:
On Wed, 2 Mar 2016 22:28:42 -0800
Brenden Blanco <bblanco@...> wrote:

Thanks all for joining today,

We had a very interesting session focused entirely on XDP (express data
path), a new initiative to improve packet processing performance in the
linux kernel. The details will best be covered by the slides, which I'll be
sure to bug Tom to get a copy of to share, so I won't make the situation
worse by sharing my possibly erroneous notes.

In a nutshell, the goal is to give the low level driver architecture of the
kernel some TLC, improving PPS and BPFifying it.

Some early prototypes are already in the works!
I'm doing my usual benchmark driven development. Which means I'm
currently benchmarking the lowest RX layer of the drivers and just
dropping packets inside the driver.

Current results from driver:mlx4 (40Gbits/s) indicate that interacting
with the page-allocator is costing us 30% overhead.

The performance goals are:
20M pps per-cpu drop rate
14M pps per-cpu forwarding rate
100Gbps per-cpu GRO
Driver: mlx4 early drop tests
- 6 Mpps => SKB drop (just calling dev_kfree_skb)
* (main overhead is first cache-miss on pkt-data hdr)
- 14.5 Mpps => Driver drop before SKB alloc, no-pkt-data touched
* main overhead 30% is page-allocator related
awesome. that's a great baseline.

- 20 Mpps => MAX bound, if removing all 30% page-alloc overhead
* this is just the upper possible bound... stop tuning when getting close to this

The mlx4 driver already implements its own page-allocator-cache, but
does not do proper recycling. I want us to implement a more generic
page-allocator-cache that drivers can use, and that supports recycling.
I think the next step here is to make mlx4 recycle
pages and rx descriptors on its own. Later we can generalize it
into something that other drivers can use. Right now
I'd try to get to the maximum possible drop rate with
minimal changes.
Yes, but we might as well start by making the allocator hacks in
mlx4 more generic when adding recycling, while still keeping them local
to that file.


Or, we're talking about benchmarking MLX4_EN_FLAG_RX_FILTER_NEEDED
That's the place where we plan to add XDP hook.
Almost.
I had some issue with benchmarking just before MLX4_EN_FLAG_RX_FILTER_NEEDED.

I added a drop (via goto next;) just inside the statement:

    if (dev->features & NETIF_F_GRO) {
            goto next;
    }

That allowed me some flexibility to enable/disable it easily.


Jesper, can you share 'perf report' ?
It's easiest to see with a FlameGraph:
http://people.netfilter.org/hawk/FlameGraph/mlx4_rx_drop_NAPI_force_on.svg

The problem with normal perf report output is that there are so many
page functions that each looks small percentage-wise, but once you add
them up, they account for a lot.

The "NAPI_force_on" in the name was a hack where I forced it to never
exit softirq, but the performance didn't improve. It did make the perf
record more focused on what we want to look at.


John, if you can share similar numbers for ixgbe or i40e
that would be great, so we can have some driver competition :)
Also it will help us to see how different drivers can
recycle pages. imo only then we can generalize it into
common page-allocator-cache-with-recycle infra.

There were some opinions on the initial use cases that some of us would
like to apply this to:
- Drop (DDOS mitigation)
I see DDoS as project goal #1
+1

- Passthrough / Forwarding
I see forward as proj goal #2
+1

- Delivery to socket
- Delivery to VM
Delivery into VM is a very interesting feature. I actually see this as
goal #3. Even though this is actually fairly complicated.
yeah, let's worry about it later. We need to walk before we can fly.
For practical implementations yes.

But I need/want to work a bit in this area... because I'm attending
MM-summit, and I want to present on this idea there. Getting something
like this integrated into the MM-area is going to take time, and we
need to present our ideas in this area as early as possible to the
MM-people. At least I'm hoping to get some MM-feedback on things I
should not do ;-)


I think in parallel the mellanox folks need to fix the mlx5 driver
to allocate the skb only after the packet has arrived (similar to mlx4).
Yes, I already told them to do so...


--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer


Re: unknown func 13

Brenden Blanco <bblanco@...>
 



On Thu, Mar 3, 2016 at 11:31 AM, Daniel Borkmann <daniel@...> wrote:
On 03/03/2016 08:19 PM, Brenden Blanco via iovisor-dev wrote:
[...]
On Thu, Mar 3, 2016 at 5:52 AM, O Mahony, Billy <billy.o.mahony@...>
[...]
   #add a filter to accept all eth frame. Attach the bpf action? functions
to the filter?

   #I have no idea what the classid, target and keys parameters mean!

   ip.tc("add-filter", "u32", ifindex_eth1, ":1", parent="ffff:",

       action=[action_eth1],

       protocol=protocols.ETH_P_ALL, classid=1,

       target=0x10002, keys=['0x0/0x0+0'])

   ip.tc("add-filter", "u32", ifindex_eth3, ":1", parent="ffff:",

       action=[action_eth3],

       protocol=protocols.ETH_P_ALL, classid=1,

       target=0x10002, keys=['0x0/0x0+0'])


Yeah, it's a bit of black magic.  it's basically creating a match-all rule
(match 0 bytes and offset 0 == true), with the matching action being the
bpf program. The classid and target aren't really meaningful with just one
action and filter, so it's ok to ignore for now.

The clsact qdisc added in 4.5 by Daniel Borkmann is a much better
abstraction for this, and I'll try to upstream some pyroute2 code to
support this.

Plus you may also want to try out cls_bpf with da (direct-action) mode.
That will save you the u32 match all classifier config and is faster.

Speaking of which, I just opened https://github.com/svinota/pyroute2/pull/223 to add support in pyroute2 upstream. The docstring in that pull request shows an example usage, which I'll also add to a testcase in bcc once it merges upstream.

I've done similar additions for go in https://github.com/vishvananda/netlink/pull/94, if go is your thing.


Re: unknown func 13

Daniel Borkmann
 

On 03/03/2016 10:01 PM, Brenden Blanco wrote:
On Thu, Mar 3, 2016 at 11:31 AM, Daniel Borkmann <daniel@...>
wrote:

On 03/03/2016 08:19 PM, Brenden Blanco via iovisor-dev wrote:
[...]

On Thu, Mar 3, 2016 at 5:52 AM, O Mahony, Billy <billy.o.mahony@...
[...]

#add a filter to accept all eth frame. Attach the bpf action? functions
to the filter?

#I have no idea what the classid, target and keys parameters mean!

ip.tc("add-filter", "u32", ifindex_eth1, ":1", parent="ffff:",

action=[action_eth1],

protocol=protocols.ETH_P_ALL, classid=1,

target=0x10002, keys=['0x0/0x0+0'])

ip.tc("add-filter", "u32", ifindex_eth3, ":1", parent="ffff:",

action=[action_eth3],

protocol=protocols.ETH_P_ALL, classid=1,

target=0x10002, keys=['0x0/0x0+0'])

Yeah, it's a bit of black magic. it's basically creating a match-all rule
(match 0 bytes and offset 0 == true), with the matching action being the
bpf program. The classid and target aren't really meaningful with just one
action and filter, so it's ok to ignore for now.

The clsact qdisc added in 4.5 by Daniel Borkmann is a much better
abstraction for this, and I'll try to upstream some pyroute2 code to
support this.
Plus you may also want to try out cls_bpf with da (direct-action) mode.
That will save you the u32 match all classifier config and is faster.
Speaking of which, I just opened
https://github.com/svinota/pyroute2/pull/223 to add support in pyroute2
upstream. The docstring in that pull request shows an example usage, which
I'll also add to a testcase in bcc once it merges upstream.

I've done similar additions for go in
https://github.com/vishvananda/netlink/pull/94, if go is your thing.
That's awesome, thanks Brenden!


Re: minutes: IO VIsor TSC and Dev members call

John Fastabend
 

On 16-03-03 12:48 PM, Jesper Dangaard Brouer wrote:
On Thu, 3 Mar 2016 10:12:27 -0800
Alexei Starovoitov <alexei.starovoitov@...> wrote:

On Thu, Mar 3, 2016 at 1:57 AM, Jesper Dangaard Brouer via iovisor-dev
<iovisor-dev@...> wrote:
On Wed, 2 Mar 2016 22:28:42 -0800
Brenden Blanco <bblanco@...> wrote:

Thanks all for joining today,

We had a very interesting session focused entirely on XDP (express data
path), a new initiative to improve packet processing performance in the
linux kernel. The details will best be covered by the slides, which I'll be
sure to bug Tom to get a copy of to share, so I won't make the situation
worse by sharing my possibly erroneous notes.

In a nutshell, the goal is to give the low level driver architecture of the
kernel some TLC, improving PPS and BPFifying it.

Some early prototypes are already in the works!
I'm doing my usual benchmark driven development. Which means I'm
currently benchmarking the lowest RX layer of the drivers and just
dropping packets inside the driver.

Current results from driver:mlx4 (40Gbits/s) indicate that interacting
with the page-allocator is costing us 30% overhead.

The performance goals are:
20M pps per-cpu drop rate
14M pps per-cpu forwarding rate
100Gbps per-cpu GRO
Driver: mlx4 early drop tests
- 6 Mpps => SKB drop (just calling dev_kfree_skb)
* (main overhead is first cache-miss on pkt-data hdr)
- 14.5 Mpps => Driver drop before SKB alloc, no-pkt-data touched
* main overhead 30% is page-allocator related
awesome. that's a great baseline.

- 20 Mpps => MAX bound, if removing all 30% page-alloc overhead
* this is just the upper possible bound... stop tuning when getting close to this

The mlx4 driver already implements its own page-allocator-cache, but
does not do proper recycling. I want us to implement a more generic
page-allocator-cache that drivers can use, and that supports recycling.
I think the next step here is to make mlx4 recycle
pages and rx descriptors on its own. Later we can generalize it
into something that other drivers can use. Right now
I'd try to get to the maximum possible drop rate with
minimal changes.
Yes, but we might as well start by making the allocator hacks in
mlx4 more generic when adding recycling, while still keeping them local
to that file.


Or, we're talking about benchmarking MLX4_EN_FLAG_RX_FILTER_NEEDED
That's the place where we plan to add XDP hook.
Almost.
I had some issue with benchmarking just before MLX4_EN_FLAG_RX_FILTER_NEEDED.

I added a drop (via goto next;) just inside the statement:

    if (dev->features & NETIF_F_GRO) {
            goto next;
    }

That allowed me some flexibility to enable/disable it easily.


Jesper, can you share 'perf report' ?
It's easiest to see with a FlameGraph:
http://people.netfilter.org/hawk/FlameGraph/mlx4_rx_drop_NAPI_force_on.svg

The problem with normal perf report output is that there are so many
page functions that each looks small percentage-wise, but once you add
them up, they account for a lot.

The "NAPI_force_on" in the name was a hack where I forced it to never
exit softirq, but the performance didn't improve. It did make the perf
record more focused on what we want to look at.


John, if you can share similar numbers for ixgbe or i40e
that would be great, so we can have some driver competition :)
Also it will help us to see how different drivers can
recycle pages. imo only then we can generalize it into
common page-allocator-cache-with-recycle infra.

There were some opinions on the initial use cases that some of us would
like to apply this to:
- Drop (DDOS mitigation)
I see DDoS as project goal #1
+1

- Passthrough / Forwarding
I see forward as proj goal #2
+1

- Delivery to socket
- Delivery to VM
Delivery into VM is a very interesting feature. I actually see this as
goal #3. Even though this is actually fairly complicated.
yeah, let's worry about it later. We need to walk before we can fly.
For practical implementations yes.

But I need/want to work a bit in this area... because I'm attending
MM-summit, and I want to present on this idea there. Getting something
like this integrated into the MM-area is going to take time, and we
need to present our ideas in this area as early as possible to the
MM-people. At least I'm hoping to get some MM-feedback on things I
should not do ;-)
If you are looking at this you might want to check out what we have
today, or perhaps you're already aware of it.

For steering flows to specific queues we have the ethtool interface, and
soon 'tc' will support this as well via u32, and eventually eBPF
programs:

https://patchwork.ozlabs.org/patch/476511/

I guess I never added explicit ethtool support for this, as customers
have custom software running in the control plane that manages it.
Then VFs map onto a userspace dataplane or a VM or whatever. I think your
team wrote this:

http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/

If you want something that doesn't require SR-IOV, we have these patches
that were rejected due to security concerns:

https://lwn.net/Articles/615046/

Of course my colleagues run DPDK on top of these queues, but there
are two alternatives to doing this that we started working on and never
got very far with. You can hook this directly into qemu so you have a
direct queue into a VM; I think this is the best bet, although you
will need to wait until we get hardware support to protect the queue-pair
DMA ring if you push that all the way up into userspace for best
performance. A middle ground is to sanitize the DMA addresses in
the driver by kicking it with a system call or something else. I
measured this at about 15% overhead in 2015, but I am told system
calls are getting better as time goes on. We also had some crazy
schemes where the in-kernel driver polled on some shared mmap bit,
but this never worked very well and, worse, it burned a core. Neil
Horman had some other ideas around catching TLB misses or something,
but I forget exactly what he was up to.

.John


I think in parallel the mellanox folks need to fix the mlx5 driver
to allocate the skb only after the packet has arrived (similar to mlx4).
Yes, I already told them to do so...


Re: minutes: IO VIsor TSC and Dev members call

Brenden Blanco <bblanco@...>
 

The details will best be covered by the slides, which I'll be sure to bug Tom to get a copy of to share, so I won't make the situation worse by sharing my possibly erroneous notes.
As promised, the slides are now available here:
https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf
