Re: Using IOVisor to resize IP packets

Brenden Blanco <bblanco@...>
 

Hi,

Doing raw resizing of a packet from eBPF is not yet possible, with the
exception of push/pop vlan header. This is a feature on my own
wishlist as well.

For encapsulation, it is possible to forward a packet to a tunnel
device (vxlan, gre, ipsec, etc.) based on criteria of your choosing. You
would end up using bpf_[clone_]redirect to pick the egress ifindex,
similar to what is done in
https://github.com/iovisor/bcc/blob/master/examples/networking/distributed_bridge/tunnel_mesh.c.

In the future hopefully this resizing will be possible.

Thanks,
Brenden

On Mon, Dec 21, 2015 at 5:36 AM, Mat and Helen via iovisor-dev
<iovisor-dev@...> wrote:
I’m interested in the possibility of using ebpf / IOVisor to add
encapsulation to IP packets, or to translate IPv4 packets to IPv6. Either
of these would require me to increase the size of the packet header.


Can you tell me whether this is possible please? Or is it outside of the
scope of what can be done with ebpf?


Many thanks!


_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


No TSC + Dev members call this week

Brenden Blanco <bblanco@...>
 

Hi all,

There will be no call this Wednesday due to the holidays. Take care
and see you all again when we resume on Jan. 6, 2016!

Cheers,
Brenden


IOVisor 2015 summary and 2016 goals discussion

Pere Monclus
 

Dear IOVisor TSC members and developers,

Happy New Year 2016! We are collecting a high-level set of goals and efforts targeted for 2016. This is in anticipation of the discussion in the IOVisor Board meeting on what efforts to plan and target for 2016 and how to allocate the budget.

Understanding that this is a very high-level overview and that the individual work of members and contributors will shape 2016, it would be good to capture some of the efforts and requirements you are driving within your community or organizations and see what we should actively track and pursue in 2016.

Would you be so kind as to take a look at the presentation and give your feedback?

Thanks,
Pere


IOVisor TSC/Dev Meeting

Brenden Blanco <bblanco@...>
 

Happy New Year developers!

Tomorrow we should resume the biweekly dev call, and for this week I was thinking of two main topics.

Firstly, there has been some progress on the GBP implementation, both on the dataplane side (https://github.com/iovisor/iomodules) and on the renderer side (or so I hear from Keith).

Secondly, we should discuss the slide deck that Pere sent out earlier today regarding the goals for 2016.

See you all tomorrow.
-Brenden

P.S. If you are receiving this email, please feel free to join the call; lurkers are welcome.
 
IOVisor TSC/Dev Meeting
Wednesday, January 6, 2016
11:00 am  |  Pacific Standard Time (San Francisco, GMT-08:00)  |  1 hr
 
Join WebEx meeting
Meeting number: 804 574 668
Meeting password: iovisor
 
Join by phone
+1-415-655-0003 US TOLL
Access code: 804 574 668
Global call-in numbers


[Update] IOvisor TSC & Dev Members call

Pere Monclus
 

please use this to join the meeting

https://plumgrid.webex.com/plumgrid/j.php?MTID=m28fc8e8e64eec19da0043aca2445de5b

IOvisor TSC & Dev Members call

Agenda :

1) Update on IOVisor Module design and progress on top of eBPF engine
2) discussion about YAML model for Control Plane of IOVisor Modules and their REST APIs
3) Brainstorm on how to integrate IOVisor Module abstraction in DPDK


details for the call:

IOvisor TSC & Dev Members call
Wednesday, October 14, 2015
11:00 am | Eastern Daylight Time (New York, GMT-04:00) | 1 hr 30 mins

Join WebEx meeting
https://plumgrid.webex.com/plumgrid/j.php?MTID=mad00b9597a507885b5e2b3ee0c4166af
Meeting number: 809 028 432
Meeting password: iovisor

Join by phone
+1-415-655-0003 US TOLL
Access code: 809 028 432
Global call-in numbers

When
Wed Jan 6, 2016 11am – 12pm Pacific Time
Where
webex, link attached (map)
Video call
https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Who
Pere Monclus - organizer
Prasun Kapoor
mc3124@...
Sushil Singh
John Zannos
Brenden Blanco
Ed Doe
developer@...
Rich Lane
Neela Jacques
bkanekar@...
David Duffey
uri.elzur@...
aclark@...
iovisor-dev@...
yunsong.lu@...
christopher.price@...
wardd@...
john fastabend
Affan Ahmed Syed
Alexei Starovoitov
krb@...
mbudiu@...
prem@...
jianwen.pi@...
Bhushan Kanekar


Updated Invitation: IOvisor TSC & Dev Members call @ Wed Jan 6, 2016 11am - 12pm (pmonclus@plumgrid.com)

Pere Monclus
 

This event has been changed.

IOvisor TSC & Dev Members call

Changed: Agenda :

1) Update on IOVisor Module design and progress on top of eBPF engine
2) discussion about YAML model for Control Plane of IOVisor Modules and their REST APIs
3) Brainstorm on how to integrate IOVisor Module abstraction in DPDK


details for the call:

https://plumgrid.webex.com/plumgrid/j.php?MTID=m28fc8e8e64eec19da0043aca2445de5b


When
Wed Jan 6, 2016 11am – 12pm Pacific Time
Where
webex, link attached (map)
Video call
https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Calendar
pmonclus@...
Who
Pere Monclus - organizer
Prasun Kapoor
mc3124@...
Sushil Singh
John Zannos
Brenden Blanco
Ed Doe
developer@...
Rich Lane
Neela Jacques
bkanekar@...
David Duffey
uri.elzur@...
aclark@...
iovisor-dev@...
yunsong.lu@...
christopher.price@...
wardd@...
john fastabend
Affan Ahmed Syed
Alexei Starovoitov
krb@...
mbudiu@...
prem@...
jianwen.pi@...
Bhushan Kanekar

Going?   Yes - Maybe - No    more options »

Invitation from Google Calendar

You are receiving this courtesy email at the account iovisor-dev@... because you are an attendee of this event.

To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar.

Forwarding this invitation could allow any recipient to modify your RSVP response. Learn More.


Re: Updated Invitation: IOvisor TSC & Dev Members call @ Wed Jan 6, 2016 11am - 12pm (pmonclus@plumgrid.com)

Jianwen Pi <Jianwen.Pi@...>
 

Is this a past event?
From: Pere Monclus
To: Prasun Kapoor; mc3124@...; Sushil Singh; John Zannos; Brenden Blanco; Ed Doe; developer@...; Rich Lane; Neela Jacques; bkanekar@...; David Duffey; uri.elzur@...; aclark@...; iovisor-dev@...; Yunsong Lu; christopher.price@...; wardd@...; john fastabend; Affan Ahmed Syed; Alexei Starovoitov; krb@...; mbudiu@...; prem@...; Jianwen Pi; Bhushan Kanekar;
Subject: Updated Invitation: IOvisor TSC & Dev Members call @ Wed Jan 6, 2016 11am - 12pm (pmonclus@...)

Time: 2016-01-06 11:17:40
This event has been changed.
IOvisor TSC & Dev Members call
Changed: Agenda :
1) Update on IOVisor Module design and progress on top of eBPF engine
2) discussion about YAML model for Control Plane of IOVisor Modules and their REST APIs
3) Brainstorm on how to integrate IOVisor Module abstraction in DPDK

details for the call:

 
 
 
When Wed Jan 6, 2016 11am – 12pm Pacific Time
Where webex, link attached (map)
Video call https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Calendar pmonclus@...
Who
Pere Monclus - organizer
Prasun Kapoor
mc3124@...
Sushil Singh
John Zannos
Brenden Blanco
Ed Doe
developer@...
Rich Lane
Neela Jacques
bkanekar@...
David Duffey
uri.elzur@...
aclark@...
iovisor-dev@...
yunsong.lu@...
christopher.price@...
wardd@...
john fastabend
Affan Ahmed Syed
Alexei Starovoitov
krb@...
mbudiu@...
prem@...
jianwen.pi@...
Bhushan Kanekar
 


tracing track at linux plumbers

Alexei Starovoitov
 

Hi All,

we're organizing a tracing track at Linux Plumbers:
http://wiki.linuxplumbersconf.org/2016:tracing

please add your topics/proposals to the wiki.
It will also help us to reserve appropriate rooms/time slots.

November is still far away, but registration is already open:
https://www.linuxplumbersconf.org/2016/attend/

In past years the conference filled up very early,
so don't wait till Sep to register.

Thanks


Name proposal for iomodule manager?

Brenden Blanco <bblanco@...>
 

While I've been working on the iomodule manager (working name so far is hive), not much has been coming to mind as a better name. However, one popped into my head just now, and I thought I'd throw it out there to get feedback.
"Hover"
I don't know why I picked two things with H, but I kind of like it. hover/hoverd watches over your iomodules from above.

hover/hoverd as package names don't seem to be taken, although there is a libhover as part of eclipse.


Honorable mention also goes to "colony", which Keith proposed, but we both kind of -1'd that after a little discussion.


Re: Name proposal for iomodule manager?

Affan Ahmed Syed <asyed@...>
 

How about "Cerebrum" -- that is the brain behind the iomodules ?

On Tue, Jan 19, 2016 at 11:27 AM, Brenden Blanco via iovisor-dev <iovisor-dev@...> wrote:
While I've been working on the iomodule manager (working name so far is hive), not much has been coming to mind as a better name. However, one popped into my head just now, and I thought I'd throw it out there to get feedback.
"Hover"
I don't know why I picked two things with H, but I kind of like it. hover/hoverd watches over your iomodules from above.

hover/hoverd as package names don't seem to be taken, although there is a libhover as part of eclipse.


Honorable mention also goes to "colony", which Keith proposed, but we both kind of -1'd that after a little discussion.




Updated Invitation: IOvisor TSC & Dev Members call @ Wed Jan 20, 2016 11am - 12pm (pmonclus@plumgrid.com)

Pere Monclus
 

This event has been changed.

IOvisor TSC & Dev Members call

Changed: IOVisor TSC/Dev Meeting
Every 2 weeks on Wednesday, from Wednesday, January 20, 2016, to no end date
11:00 am  |  Pacific Standard Time (San Francisco, GMT-08:00)  |  1 hr
Join WebEx meeting
Meeting number: 283 885 640
Meeting password: iovisor
Join by phone
+1-415-655-0003 US TOLL
Access code: 283 885 640
Global call-in numbers
When
Wed Jan 20, 2016 11am – 12pm Pacific Time
Where
webex, link attached (map)
Video call
https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Calendar
pmonclus@...
Who
Pere Monclus - organizer
aclark@...
John Zannos
mbudiu@...
Bhushan Kanekar
Prasun Kapoor
David Duffey
john fastabend
iovisor-dev@...
jianwen.pi@...
Affan Ahmed Syed
yunsong.lu@...
prem@...
Rich Lane
christopher.price@...
developer@...
Neela Jacques
bkanekar@...
uri.elzur@...
mc3124@...
wardd@...
Alexei Starovoitov
krb@...
Brenden Blanco
Ed Doe
Sushil Singh



Re: Updated Invitation: IOvisor TSC & Dev Members call @ Wed Jan 20, 2016 11am - 12pm (bblanco@plumgrid.com)

Brenden Blanco <bblanco@...>
 

Looks like Pere's email client mangled the webex link. The meeting is joinable by phone #, or use this link: https://plumgrid.webex.com/plumgrid/j.php?MTID=m38483459873b0f92c0533e82e955739a

On Wed, Jan 20, 2016 at 6:38 AM, Pere Monclus <pmonclus@...> wrote:

This event has been changed.

IOvisor TSC & Dev Members call

Changed: IOVisor TSC/Dev Meeting
Every 2 weeks on Wednesday, from Wednesday, January 20, 2016, to no end date
11:00 am  |  Pacific Standard Time (San Francisco, GMT-08:00)  |  1 hr
Join WebEx meeting
Meeting number: 283 885 640
Meeting password: iovisor
Join by phone
+1-415-655-0003 US TOLL
Access code: 283 885 640
Global call-in numbers
When
Wed Jan 20, 2016 11am – 12pm Pacific Time
Where
webex, link attached (map)
Video call
https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Calendar
bblanco@...
Who
Pere Monclus - organizer
john fastabend
Brenden Blanco
Bhushan Kanekar
Prasun Kapoor
Alexei Starovoitov
John Zannos
Sushil Singh
Ed Doe
Neela Jacques
Rich Lane
Affan Ahmed Syed
David Duffey




IO Visor TSC/Dev Meeting Minutes

Brenden Blanco <bblanco@...>
 

Hi All,

Today we had our biweekly sync-up call. It was a lively discussion, and I
captured some but not all of the notes below.

One administrative note is that a new wiki project has been created on
github.com/iovisor/wiki. It should be open for anyone with a github account to
make edits, so please feel free to add your wishlist, things that are being
worked on, etc. If there is something that is relevant to a particular project
(e.g. bcc), please track it there separately/in addition to, perhaps with a
hyperlink to the issue #.

Attendees:

Rich Lane
Prem Jonnalagadda
Mihai Budiu
Keith Burns
John Fastabend
Jianwen Pi
Deepa Kalani
Brenden Blanco
Brendan Gregg
Billy O'Mahony
Alexei Starovoitov

Here is a summary of the discussions during the meeting:

Brendan G. recently added support for stack trace / callchain magic; see
http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html for
an example. He has more things in the works, for instance what he called
'wake-up chains'. Stay tuned!

Alexei S. mentioned mmap-ing arrays as something that might add nice
performance characteristics to the tools that Brendan is working on. Alexei,
please feel free to add your thoughts to the wiki.

Tracepoints continue to be a requested feature. Alexei says this still needs
some work; there have been several iterations, and the discussion should be
resurrected.
* One fear from the kernel community is that all arguments to tracepoints will
become fixed ABI if exposed in the wrong way.

On the networking side, some folks are working to sprinkle tracepoints in the
tcp stack.

Brendan mentioned that the dtrace book has some relevant use cases that would
be nice to share with the folks working on adding new tracepoints. Brendan, can
you provide us with the link to that?

Also mentioned was tcpdive (https://github.com/fastos/tcpdive), based on
systemtap, which looks like an interesting tool to compare to.

Alexei mentions that Facebook is having a kernel meetup/miniconference in the
Bay Area in March. Ping Alexei for details.

John F. has been having a hardware/kernel offload discussion in the background:
* Working on Intel drivers to bring switch/nic hardware mapping support into tc
* Haven't tackled bpf implementation yet
* Alexei asking for pointers on what bpf support might be needed
* Working backwards from bpf seems tricky
* Already did a p4 translation to bpf with load to tc ingress
* Requirement for tcam-like map support
* Upcall would be nice (for trying to implement ovs with bpf)
  - Currently forwarding to a tap device
  - Alexei: something like ipvlan, where something can just be "put" into a
    device without veth overhead

* DPDK + iomodule support is still on the radar: no new updates besides that

Here are also some miscellaneous notes/ideas. Alexei, can you put these
on the wiki perhaps?

Idea: maybe include socket information along with skb, to build stateful
programs more easily

Idea: protocol gro with eBPF

Idea: may be able to push/pop arbitrary headers

llvm-dev: someone looked into adding vector support to bpf backend


Finally, I showed progress that has been made on the GBP iomodule
implementation. That continues to be captured at
https://github.com/iovisor/iomodules.

Keith will be potentially looking to include that in a demo format at an
upcoming Cisco event.


Re: Changing packet fields before redirect

Ashhad Sheikh <ashhadsheikh394@...>
 

I've tried using bpf_skb_store_bytes() but no luck so far. Actually, I'm trying to implement a router function, but when the first packet passes through a Linux bridge, it broadcasts an ARP request (since no path exists yet). I guess what is stopping me from changing the fields is that the ARP request packet is cloned.

-Ashhad 


On Sun, Dec 20, 2015 at 4:37 AM Alexei Starovoitov <alexei.starovoitov@...> wrote:
> On Sat, Dec 19, 2015 at 11:20 AM, Ashhad Sheikh via iovisor-dev
> <iovisor-dev@...> wrote:
>> Hello,
>> Is there any way to change the packet fields in a BPF program before
>> redirecting it to another ifindex?
>> I want to make changes in packet fields before redirecting it to the next
>> ifindex,
>> i.e. I want to change an ARP request packet into a response packet by making
>> changes in its fields.
>> arp->oper=0x0002
>>
>> bpf_clone_redirect won't work in this scenario, tried bpf_redirect too but
>> no luck so far.
>
> have you tried bpf_skb_store_bytes() to write into packet data ?
> meta-data fields of skb are mostly read-only with few exceptions.
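
Editor's sketch, not part of the thread: to make the request-to-reply rewrite discussed above concrete, here is a minimal userspace C model of which bytes change in an Ethernet/IPv4 ARP frame. It is not eBPF code. The function name `arp_request_to_reply`, the offset names, and the `my_mac` parameter are assumptions made for illustration. In an actual tc program, each write into the frame would presumably go through bpf_skb_store_bytes() at the same offset rather than memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets in an Ethernet frame carrying ARP for IPv4 over Ethernet. */
enum {
    ETH_DST = 0,               /* destination MAC (6 bytes) */
    ETH_SRC = 6,               /* source MAC (6 bytes) */
    ETH_HLEN = 14,
    ARP_OPER = ETH_HLEN + 6,   /* 16-bit opcode, network byte order */
    ARP_SHA  = ETH_HLEN + 8,   /* sender hardware address (6 bytes) */
    ARP_SPA  = ETH_HLEN + 14,  /* sender protocol address (4 bytes) */
    ARP_THA  = ETH_HLEN + 18,  /* target hardware address (6 bytes) */
    ARP_TPA  = ETH_HLEN + 24,  /* target protocol address (4 bytes) */
    ARP_FRAME_LEN = ETH_HLEN + 28
};

/*
 * Hypothetical helper: rewrite an ARP request in place into a reply that
 * answers with my_mac. Returns 0 on success, -1 if the frame is too short
 * or is not an ARP request.
 */
int arp_request_to_reply(uint8_t *frame, size_t len, const uint8_t my_mac[6])
{
    uint8_t asked_ip[4];

    assert(frame != NULL && my_mac != NULL);
    if (len < ARP_FRAME_LEN)
        return -1;
    if (frame[ARP_OPER] != 0x00 || frame[ARP_OPER + 1] != 0x01)
        return -1;

    /* oper = 0x0002 (reply), as in the question above. */
    frame[ARP_OPER + 1] = 0x02;

    /* Remember the address that was asked about before overwriting it. */
    memcpy(asked_ip, frame + ARP_TPA, 4);

    /* The original sender becomes the target of the reply. */
    memcpy(frame + ARP_THA, frame + ARP_SHA, 6);
    memcpy(frame + ARP_TPA, frame + ARP_SPA, 4);

    /* We answer on behalf of the address that was asked about. */
    memcpy(frame + ARP_SHA, my_mac, 6);
    memcpy(frame + ARP_SPA, asked_ip, 4);

    /* Ethernet header: send the frame back to the requester. */
    memcpy(frame + ETH_DST, frame + ETH_SRC, 6);
    memcpy(frame + ETH_SRC, my_mac, 6);
    return 0;
}
```

ARP carries no checksum, so no checksum fix-up is modelled here; the cloned-skb problem discussed in the replies is a separate kernel-side concern that this sketch does not address.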


Re: Changing packet fields before redirect

Daniel Borkmann
 

On 01/27/2016 02:56 PM, Ashhad Sheikh via iovisor-dev wrote:
> I've tried using bpf_skb_store_bytes() but no luck so far. Actually I'm
> trying to implement a router function but when the first packet passes
> through a linux bridge it broadcasts an ARP request(since there isn't any
> path that exists). I guess that's stopping me to change fields since the
> ARP request packet is cloned.

Do you get -EFAULT error from bpf_skb_store_bytes()? What buffer size
do you pass into the function (R4)?

Would something like the below help? Just compile-tested for now, still
needs an audit whether we are always safe with regards to shared skbs
where skb is cloned. (Looks like pskb_expand_head() is called from
__skb_vlan_pop() as well, though.)

Cheers,
Daniel

From 0928ab81f893caf582c20d8efbd0d983f99f7ea5 Mon Sep 17 00:00:00 2001
Message-Id: <0928ab81f893caf582c20d8efbd0d983f99f7ea5.1453906418.git.daniel@...>
From: Daniel Borkmann <daniel@...>
Date: Wed, 27 Jan 2016 11:55:42 +0100
Subject: [PATCH] bpf: try harder on clones when writing into skb

When we're dealing with clones and the area is not writeable, try harder
and get a copy via pskb_expand_head().

Signed-off-by: Daniel Borkmann <daniel@...>
---
 include/linux/skbuff.h |  7 +++++++
 net/core/filter.c      | 21 +++++++++++----------
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 11f935c..0eb5dd7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2621,6 +2621,13 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
 	       skb_headroom(skb) + len <= skb->hdr_len;
 }

+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+					int len)
+{
+	return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
+	       pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
 static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
 			    int cloned)
 {
diff --git a/net/core/filter.c b/net/core/filter.c
index 6e6bbac..2f5ce50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1361,9 +1361,7 @@ static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
 	 */
 	if (unlikely((u32) offset > 0xffff || len > sizeof(sp->buff)))
 		return -EFAULT;
-
-	if (unlikely(skb_cloned(skb) &&
-		     !skb_clone_writable(skb, offset + len)))
+	if (unlikely(skb_try_make_writable(skb, offset, len)))
 		return -EFAULT;

 	ptr = skb_header_pointer(skb, offset, len, sp->buff);
@@ -1436,10 +1434,8 @@ static u64 bpf_l3_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
 		return -EINVAL;
 	if (unlikely((u32) offset > 0xffff))
 		return -EFAULT;
-
-	if (unlikely(skb_cloned(skb) &&
-		     !skb_clone_writable(skb, offset + sizeof(sum))))
-		return -EFAULT;
+	if (unlikely(skb_try_make_writable(skb, offset, sizeof(sum))))
+		return -EFAULT;

 	ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
 	if (unlikely(!ptr))
@@ -1485,9 +1481,7 @@ static u64 bpf_l4_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
 		return -EINVAL;
 	if (unlikely((u32) offset > 0xffff))
 		return -EFAULT;
-
-	if (unlikely(skb_cloned(skb) &&
-		     !skb_clone_writable(skb, offset + sizeof(sum))))
+	if (unlikely(skb_try_make_writable(skb, offset, sizeof(sum))))
 		return -EFAULT;

 	ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
 		return true;
 	if (func == bpf_skb_vlan_pop)
 		return true;
+	if (func == bpf_skb_store_bytes)
+		return true;
+	if (func == bpf_l3_csum_replace)
+		return true;
+	if (func == bpf_l4_csum_replace)
+		return true;
+
 	return false;
 }

--
1.9.3


Re: Changing packet fields before redirect

Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...> wrote:

> +static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
> + int len)

I would keep single 'offset' or 'len' argument here.
Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

> +{
> + return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
> + pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
> +}
> +
> ..
> @@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
> return true;
> if (func == bpf_skb_vlan_pop)
> return true;
> + if (func == bpf_skb_store_bytes)
> + return true;
> + if (func == bpf_l3_csum_replace)
> + return true;
> + if (func == bpf_l4_csum_replace)
> + return true;

yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!
Thanks
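
Editor's sketch, not part of the thread: the short-circuit logic of the proposed helper can be modelled in plain userspace C to show when a private copy is attempted and what the return value means. Everything named `fake_*` is a hypothetical stand-in, not a kernel API. The model uses the single write-length argument suggested in this reply, and assumes (as in the patch) that the expand call returns 0 on success and a negative value on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff properties the check looks at. */
struct fake_skb {
    bool cloned;            /* models skb_cloned(skb) */
    unsigned int writable;  /* bytes writable without a copy; models skb_clone_writable() */
    bool alloc_fails;       /* forces the modelled pskb_expand_head() to fail */
    int expand_calls;       /* counts attempts to obtain a private copy */
};

/* Models pskb_expand_head(): 0 on success, negative on failure. */
int fake_expand_head(struct fake_skb *skb)
{
    assert(skb != NULL);
    skb->expand_calls++;
    if (skb->alloc_fails)
        return -12;         /* stands in for -ENOMEM */
    skb->cloned = false;    /* the header is now privately owned */
    return 0;
}

/*
 * Same short-circuit shape as the proposed helper. A nonzero result means
 * "still not writable"; the expensive copy is attempted only for a clone
 * whose writable area is shorter than write_len.
 */
int fake_try_make_writable(struct fake_skb *skb, unsigned int write_len)
{
    return skb->cloned && write_len > skb->writable &&
           fake_expand_head(skb);
}
```

In this model a caller would pass `offset + len` as `write_len` and return -EFAULT on a nonzero result, mirroring the call sites in the patch.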


Re: Changing packet fields before redirect

Daniel Borkmann
 

On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:
> On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...> wrote:
>
>> +static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
>> + int len)
>
> I would keep single 'offset' or 'len' argument here.

Sure, that's fine, can do that.

> Let the caller do the math, since it's faster and
> better matches meaning of single arg as
> 'length up to which to write'.
>
>> +{
>> + return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
>> + pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>> +}
>> +
>> ..
>> @@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
>> return true;
>> if (func == bpf_skb_vlan_pop)
>> return true;
>> + if (func == bpf_skb_store_bytes)
>> + return true;
>> + if (func == bpf_l3_csum_replace)
>> + return true;
>> + if (func == bpf_l4_csum_replace)
>> + return true;
>
> yep. was thinking to do the same, since
> bpf_helper_changes_skb_data() landed.
> That should be a nice addition!

Yeah, I think reloading pointers via JIT should not take too many cycles.

> Thanks

Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet
code in tpacket_rcv() (RX_RING), I see we clone the shared skb that came
in via deliver_skb() on ingress, and for egress which should only have
a cloned skb, we make the clone shared via skb_get() for TP_STATUS_COPY
case. packet_rcv() is only so long a shared skb until we lazy clone it
after a BPF filter was running that decided to keep the skb. So there,
since it's clone the users are reset to 1, but that should happen
sequentially wrt ingress qdisc invocation.

Thanks,
Daniel


Re: Changing packet fields before redirect

Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 11:26 AM, Daniel Borkmann <daniel@...> wrote:
> On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:
>> On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...> wrote:
>>
>>> +static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
>>> + int len)
>>
>> I would keep single 'offset' or 'len' argument here.
>
> Sure, that's fine, can do that.
>
>> Let the caller do the math, since it's faster and
>> better matches meaning of single arg as
>> 'length up to which to write'.
>>
>>> +{
>>> + return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
>>> + pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>>> +}
>>> +
>>> ..
>>> @@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
>>> return true;
>>> if (func == bpf_skb_vlan_pop)
>>> return true;
>>> + if (func == bpf_skb_store_bytes)
>>> + return true;
>>> + if (func == bpf_l3_csum_replace)
>>> + return true;
>>> + if (func == bpf_l4_csum_replace)
>>> + return true;
>>
>> yep. was thinking to do the same, since
>> bpf_helper_changes_skb_data() landed.
>> That should be a nice addition!
>
> Yeah, I think reloading pointers via JIT should not take too many cycles.
>
>> Thanks
>
> Was still thinking if we better should extend this (rather slow-path)
> test into handling it more gracefully:
>
> !skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>
> Shared skbs should be rather rare. But, there seem to be tricky things
> with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
> when calling into pskb_expand_head() in our path. Taking pktgen aside,
> I remember from an old netdev discussion, that with taps shared skbs
> and pskb_expand_head() should cause issues. Going through the pf_packet

I don't think so. only pktgen does ugly things.
it's a requirement of the IP stack to have users == 1.
we had this discussion before. I will try to dig out my old email.

btw, there is skb_make_writable() that is used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand
is faster and probably cleaner.
