Changing packet fields before redirect


Ashhad Sheikh <ashhadsheikh394@...>
 

Hello, 
Is there any way to change packet fields in a BPF program before redirecting the packet to another ifindex?
I want to make changes to packet fields before redirecting to the next ifindex,
i.e. I want to change an ARP request packet into a response packet by changing its fields:
arp->oper=0x0002 

bpf_clone_redirect won't work in this scenario; I tried bpf_redirect too, but no luck so far.

Regards
Ashhad 
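[Editor's note: for readers following along, turning an ARP request into a reply means rewriting the opcode at a fixed offset past the Ethernet header and swapping the sender/target address pairs. The sketch below is a user-space model of the byte layout only (assuming an untagged Ethernet frame carrying IPv4 ARP); it is not from the thread, and a real responder would also fill in its own MAC and swap the Ethernet source/destination.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN     14              /* untagged Ethernet header */
#define ARP_OPER_OFF (ETH_HLEN + 6)  /* after htype(2) ptype(2) hlen(1) plen(1) */
#define ARP_SHA_OFF  (ETH_HLEN + 8)  /* sender MAC, 6 bytes */
#define ARP_SPA_OFF  (ETH_HLEN + 14) /* sender IPv4, 4 bytes */
#define ARP_THA_OFF  (ETH_HLEN + 18) /* target MAC, 6 bytes */
#define ARP_TPA_OFF  (ETH_HLEN + 24) /* target IPv4, 4 bytes */

/* Rewrite an ARP request frame into a reply in place:
 * set oper = 2 (reply, big-endian) and swap sender/target addresses. */
static void arp_request_to_reply(uint8_t *pkt)
{
    uint8_t mac[6], ip[4];

    pkt[ARP_OPER_OFF]     = 0x00;  /* oper = htons(2) */
    pkt[ARP_OPER_OFF + 1] = 0x02;

    memcpy(mac, pkt + ARP_SHA_OFF, 6);
    memcpy(pkt + ARP_SHA_OFF, pkt + ARP_THA_OFF, 6);
    memcpy(pkt + ARP_THA_OFF, mac, 6);

    memcpy(ip, pkt + ARP_SPA_OFF, 4);
    memcpy(pkt + ARP_SPA_OFF, pkt + ARP_TPA_OFF, 4);
    memcpy(pkt + ARP_TPA_OFF, ip, 4);
}
```

In a tc BPF program, the equivalent writes would go through a helper such as bpf_skb_store_bytes() at these same offsets, as discussed below.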
 


Alexei Starovoitov
 

On Sat, Dec 19, 2015 at 11:20 AM, Ashhad Sheikh via iovisor-dev
<iovisor-dev@...> wrote:
> Hello,
> Is there any way to change packet fields in a BPF program before
> redirecting the packet to another ifindex?
> I want to make changes to packet fields before redirecting to the next
> ifindex, i.e. I want to change an ARP request packet into a response
> packet by changing its fields:
> arp->oper=0x0002
>
> bpf_clone_redirect won't work in this scenario; I tried bpf_redirect
> too, but no luck so far.
Have you tried bpf_skb_store_bytes() to write into the packet data?
The metadata fields of the skb are mostly read-only, with few exceptions.
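[Editor's note: a hedged illustration of what the helper's checks imply for callers. This is a user-space model mirroring the bounds tests visible in the net/core/filter.c excerpts later in the thread, not the kernel implementation; SCRATCH_MAX is a stand-in for the helper's internal scratch-buffer size.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_MAX 16  /* stand-in for the helper's scratch buffer size */

/* User-space model of bpf_skb_store_bytes(): bounds-check, then copy
 * 'from' into the packet at 'offset'.  Returns 0 or -EFAULT, so the
 * caller can (and should) detect failed writes. */
static int model_store_bytes(uint8_t *pkt, size_t pkt_len,
                             uint32_t offset, const void *from, size_t len)
{
    if (offset > 0xffff || len > SCRATCH_MAX)
        return -EFAULT;
    if ((size_t)offset + len > pkt_len)  /* write must stay in the frame */
        return -EFAULT;
    memcpy(pkt + offset, from, len);
    return 0;
}
```

A BPF program setting arp->oper would pass offset 20 (14-byte Ethernet header plus 6 bytes of ARP header) and a 2-byte big-endian value, and treat a nonzero return as failure.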


Ashhad Sheikh <ashhadsheikh394@...>
 

I've tried using bpf_skb_store_bytes(), but no luck so far. Actually I'm trying to implement a router function, but when the first packet passes through a Linux bridge, the bridge broadcasts an ARP request (since no path exists yet). I guess that's what is stopping me from changing fields, since the ARP request packet is cloned.

-Ashhad 


On Sun, Dec 20, 2015 at 4:37 AM Alexei Starovoitov <alexei.starovoitov@...> wrote:
On Sat, Dec 19, 2015 at 11:20 AM, Ashhad Sheikh via iovisor-dev
<iovisor-dev@...> wrote:
> Hello,
> Is there any way to change packet fields in a BPF program before
> redirecting the packet to another ifindex?
> I want to make changes to packet fields before redirecting to the next
> ifindex, i.e. I want to change an ARP request packet into a response
> packet by changing its fields:
> arp->oper=0x0002
>
> bpf_clone_redirect won't work in this scenario; I tried bpf_redirect
> too, but no luck so far.

Have you tried bpf_skb_store_bytes() to write into the packet data?
The metadata fields of the skb are mostly read-only, with few exceptions.


Daniel Borkmann
 

On 01/27/2016 02:56 PM, Ashhad Sheikh via iovisor-dev wrote:
I've tried using bpf_skb_store_bytes(), but no luck so far. Actually I'm
trying to implement a router function, but when the first packet passes
through a Linux bridge, the bridge broadcasts an ARP request (since no
path exists yet). I guess that's what is stopping me from changing
fields, since the ARP request packet is cloned.
Do you get an -EFAULT error from bpf_skb_store_bytes()? What buffer size
do you pass into the function (R4)?

Would something like the below help? Just compile-tested for now; it
still needs an audit of whether we are always safe with regard to shared
skbs where the skb is cloned. (Looks like pskb_expand_head() is called
from __skb_vlan_pop() as well, though.)

Cheers,
Daniel

From 0928ab81f893caf582c20d8efbd0d983f99f7ea5 Mon Sep 17 00:00:00 2001
Message-Id: <0928ab81f893caf582c20d8efbd0d983f99f7ea5.1453906418.git.daniel@...>
From: Daniel Borkmann <daniel@...>
Date: Wed, 27 Jan 2016 11:55:42 +0100
Subject: [PATCH] bpf: try harder on clones when writing into skb

When we're dealing with clones and the area is not writeable, try harder
and get a copy via pskb_expand_head().

Signed-off-by: Daniel Borkmann <daniel@...>
---
include/linux/skbuff.h | 7 +++++++
net/core/filter.c | 21 +++++++++++----------
2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 11f935c..0eb5dd7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2621,6 +2621,13 @@ static inline int skb_clone_writable(const struct sk_buff *skb, unsigned int len
skb_headroom(skb) + len <= skb->hdr_len;
}

+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)
+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
static inline int __skb_cow(struct sk_buff *skb, unsigned int headroom,
int cloned)
{
diff --git a/net/core/filter.c b/net/core/filter.c
index 6e6bbac..2f5ce50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1361,9 +1361,7 @@ static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
*/
if (unlikely((u32) offset > 0xffff || len > sizeof(sp->buff)))
return -EFAULT;
-
- if (unlikely(skb_cloned(skb) &&
- !skb_clone_writable(skb, offset + len)))
+ if (unlikely(skb_try_make_writable(skb, offset, len)))
return -EFAULT;

ptr = skb_header_pointer(skb, offset, len, sp->buff);
@@ -1436,10 +1434,8 @@ static u64 bpf_l3_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
return -EINVAL;
if (unlikely((u32) offset > 0xffff))
return -EFAULT;
-
- if (unlikely(skb_cloned(skb) &&
- !skb_clone_writable(skb, offset + sizeof(sum))))
- return -EFAULT;
+ if (unlikely(skb_try_make_writable(skb, offset, sizeof(sum))))
+ return -EFAULT;

ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
if (unlikely(!ptr))
@@ -1485,9 +1481,7 @@ static u64 bpf_l4_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
return -EINVAL;
if (unlikely((u32) offset > 0xffff))
return -EFAULT;
-
- if (unlikely(skb_cloned(skb) &&
- !skb_clone_writable(skb, offset + sizeof(sum))))
+ if (unlikely(skb_try_make_writable(skb, offset, sizeof(sum))))
return -EFAULT;

ptr = skb_header_pointer(skb, offset, sizeof(sum), &sum);
@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;
+
return false;
}

--
1.9.3
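[Editor's note: the helper's && chain can be easier to read when laid out as a truth table. The sketch below is a user-space model with stubbed predicates, purely illustrative; the real skb_cloned()/skb_clone_writable()/pskb_expand_head() live in the kernel.]

```c
#include <assert.h>

/* Stubbed skb: just the three facts the helper consults. */
struct skb_model {
    int cloned;          /* skb_cloned() */
    int clone_writable;  /* skb_clone_writable(skb, offset + len) */
    int expand_fails;    /* pskb_expand_head() returns nonzero on failure */
};

/* Same short-circuit structure as the proposed skb_try_make_writable():
 * nonzero means "could not make the skb writable". */
static int try_make_writable(const struct skb_model *skb)
{
    return skb->cloned && !skb->clone_writable && skb->expand_fails;
}
```

Note the only failing case: a cloned skb whose region is not clone-writable and whose pskb_expand_head() call fails. An uncloned skb never even reaches pskb_expand_head(), which is why this check stays cheap on the fast path.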


Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...> wrote:

+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)
I would keep a single 'offset' or 'len' argument here.
Let the caller do the math, since it's faster and
better matches the meaning of a single arg as
'length up to which to write'.

+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
..
@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;
Yep, I was thinking of doing the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!
Thanks


Daniel Borkmann
 

On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:
On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...> wrote:

+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)
I would keep single 'offset' or 'len' argument here.
Sure, that's fine, can do that.

Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
..
@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;
yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!
Yeah, I think reloading pointers via JIT should not take too many cycles.

Thanks
I was still thinking about whether we should extend this (rather slow-path)
test to handle it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion that shared skbs from taps
plus pskb_expand_head() could cause issues. Going through the pf_packet
code in tpacket_rcv() (RX_RING), I see we clone the shared skb that came
in via deliver_skb() on ingress, and for egress, which should only have
a cloned skb, we make the clone shared via skb_get() for the
TP_STATUS_COPY case. In packet_rcv() the skb is only shared until we
lazily clone it after a BPF filter has run and decided to keep the skb.
So there, since it's a clone, the users count is reset to 1, but that
should happen sequentially wrt ingress qdisc invocation.

Thanks,
Daniel


Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 11:26 AM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:

On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...>
wrote:


+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)

I would keep single 'offset' or 'len' argument here.

Sure, that's fine, can do that.

Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset + len)
&&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
..

@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;

yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!

Yeah, I think reloading pointers via JIT should not take too many cycles.

Thanks

Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet
I don't think so; only pktgen does ugly things.
It's a requirement of the IP stack to have users == 1.
We had this discussion before; I will try to dig out my old email.

BTW, there is skb_make_writable(), which is used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand_head()
is faster and probably cleaner.


Daniel Borkmann
 

On 01/27/2016 08:42 PM, Alexei Starovoitov wrote:
On Wed, Jan 27, 2016 at 11:26 AM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:

On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...>
wrote:


+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)
I would keep single 'offset' or 'len' argument here.
Sure, that's fine, can do that.

Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset + len)
&&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
..

@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;
yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!
Yeah, I think reloading pointers via JIT should not take too many cycles.

Thanks
Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet
I don't think so. only pktgen does ugly things.
it's a requirement of the IP stack to have users == 1.
we had this discussion before. I will try to dig out my old email.
I believe from the old nft ingress discussion. ;) But maybe it was just
a different issue with other actions (or it's no longer up to date).

btw, there is skb_make_writable() that used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand
is faster and probably cleaner.
There's also skb_ensure_writable(), but it seems to do more than we
actually need.

Thanks,
Daniel


Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 12:17 PM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 08:42 PM, Alexei Starovoitov wrote:

On Wed, Jan 27, 2016 at 11:26 AM, Daniel Borkmann <daniel@...>
wrote:

On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:


On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...>
wrote:



+static inline int skb_try_make_writable(struct sk_buff *skb, int
offset,
+ int len)

I would keep single 'offset' or 'len' argument here.

Sure, that's fine, can do that.

Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

+{
+ return skb_cloned(skb) && !skb_clone_writable(skb, offset +
len)
&&
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+

..


@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;

yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!

Yeah, I think reloading pointers via JIT should not take too many cycles.

Thanks

Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet

I don't think so. only pktgen does ugly things.
it's a requirement of the IP stack to have users == 1.
we had this discussion before. I will try to dig out my old email.

I believe from the old nft ingress discussion. ;) But maybe it was just
a different issue w/ other actions (or not up to date fact anymore).

btw, there is skb_make_writable() that used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand
is faster and probably cleaner.

There's also skb_ensure_writable(), but seems to be doing more than we
actually need.
Yep, it's an OVS gimmick; pskb_may_pull() is useless for us.


Daniel Borkmann
 

On 01/27/2016 10:07 PM, Alexei Starovoitov wrote:
On Wed, Jan 27, 2016 at 12:17 PM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 08:42 PM, Alexei Starovoitov wrote:
[...]
Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet
I don't think so. only pktgen does ugly things.
it's a requirement of the IP stack to have users == 1.
we had this discussion before. I will try to dig out my old email.
I believe from the old nft ingress discussion. ;) But maybe it was just
a different issue w/ other actions (or not up to date fact anymore).

btw, there is skb_make_writable() that used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand
is faster and probably cleaner.
There's also skb_ensure_writable(), but seems to be doing more than we
actually need.
yep. it's an ovs gimmick. pskb_may_pull() is useless for us.
Yeah, so I'll do some more testing and get it out for -net-next when it
opens up in the next few days, hopefully, also keeping Ashhad in the loop
(meanwhile you can use the patch).

Cheers,
Daniel