Re: Changing packet fields before redirect

Daniel Borkmann
 

On 01/27/2016 08:42 PM, Alexei Starovoitov wrote:
On Wed, Jan 27, 2016 at 11:26 AM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 06:18 PM, Alexei Starovoitov wrote:

On Wed, Jan 27, 2016 at 7:03 AM, Daniel Borkmann <daniel@...>
wrote:


+static inline int skb_try_make_writable(struct sk_buff *skb, int offset,
+ int len)
I would keep single 'offset' or 'len' argument here.
Sure, that's fine, can do that.

Let the caller do the math, since it's faster and
better matches meaning of single arg as
'length up to which to write'.

+{
+	return skb_cloned(skb) && !skb_clone_writable(skb, offset + len) &&
+	       pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+
+
..

@@ -1737,6 +1731,13 @@ bool bpf_helper_changes_skb_data(void *func)
return true;
if (func == bpf_skb_vlan_pop)
return true;
+ if (func == bpf_skb_store_bytes)
+ return true;
+ if (func == bpf_l3_csum_replace)
+ return true;
+ if (func == bpf_l4_csum_replace)
+ return true;
yep. was thinking to do the same, since
bpf_helper_changes_skb_data() landed.
That should be a nice addition!
Yeah, I think reloading pointers via JIT should not take too many cycles.

Thanks
Was still thinking if we better should extend this (rather slow-path)
test into handling it more gracefully:

!skb_shared(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC);

Shared skbs should be rather rare. But, there seem to be tricky things
with skb_get() or raw atomic_inc(&skb->users) that /could/ cause a BUG
when calling into pskb_expand_head() in our path. Taking pktgen aside,
I remember from an old netdev discussion, that with taps shared skbs
and pskb_expand_head() should cause issues. Going through the pf_packet
I don't think so. only pktgen does ugly things.
it's a requirement of the IP stack to have users == 1.
we had this discussion before. I will try to dig out my old email.
I believe from the old nft ingress discussion. ;) But maybe it was just
a different issue w/ other actions (or not up to date fact anymore).

btw, there is skb_make_writable() that is used by netfilter,
but doing skb_cloned(skb) && !skb_clone_writable() && pskb_expand
is faster and probably cleaner.
There's also skb_ensure_writable(), but seems to be doing more than we
actually need.

Thanks,
Daniel
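
To make the single-argument variant discussed above concrete, here is a small user-space sketch of the check. It is not kernel code: `struct model_skb` and every `model_*` name are hypothetical stand-ins, the `write_len` parameter name is an assumption taken from the "length up to which to write" wording, and the writability test is simplified to a single length comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ENOMEM 12

/* Hypothetical stand-in for struct sk_buff; only the fields this sketch needs. */
struct model_skb {
	bool cloned;              /* data is shared with a clone */
	unsigned int private_len; /* leading bytes that are writable even when cloned */
	bool expand_fails;        /* simulate an allocation failure */
	int expand_calls;         /* how often the slow path ran */
};

/* Simplified stand-in for skb_clone_writable(): is [0, len) privately owned? */
static bool model_clone_writable(const struct model_skb *skb, unsigned int len)
{
	return len <= skb->private_len;
}

/* Stand-in for pskb_expand_head(skb, 0, 0, GFP_ATOMIC): unshare the data. */
static int model_expand_head(struct model_skb *skb)
{
	skb->expand_calls++;
	if (skb->expand_fails)
		return -MODEL_ENOMEM;
	skb->cloned = false;
	return 0;
}

/*
 * Single-argument form: the caller passes the length up to which it wants
 * to write. Returns 0 if the data is writable, nonzero on failure.
 */
static int model_try_make_writable(struct model_skb *skb, unsigned int write_len)
{
	assert(skb != NULL);
	return skb->cloned && !model_clone_writable(skb, write_len) &&
	       model_expand_head(skb);
}
```

With this shape the caller does the math, for example `model_try_make_writable(skb, offset + len)`, and the slow path only runs for a cloned buffer whose private area is too short.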


Re: Changing packet fields before redirect

Alexei Starovoitov
 

On Wed, Jan 27, 2016 at 12:17 PM, Daniel Borkmann <daniel@...> wrote:
> There's also skb_ensure_writable(), but seems to be doing more than we
> actually need.

yep. it's an ovs gimmick. pskb_may_pull() is useless for us.


Re: Changing packet fields before redirect

Daniel Borkmann
 

On 01/27/2016 10:07 PM, Alexei Starovoitov wrote:
On Wed, Jan 27, 2016 at 12:17 PM, Daniel Borkmann <daniel@...> wrote:
On 01/27/2016 08:42 PM, Alexei Starovoitov wrote:
[...]
>> There's also skb_ensure_writable(), but seems to be doing more than we
>> actually need.
>
> yep. it's an ovs gimmick. pskb_may_pull() is useless for us.

Yeah, so I'll do some more testing and get it out for -net-next when it
opens up, hopefully in the next few days, while also keeping Ashhad in the
loop (meanwhile you can use the patch).

Cheers,
Daniel


reminder: IO Visor TSC and Dev Members Call

Brenden Blanco <bblanco@...>
 

Hi,

Please feel free to join us today at 11am PST (1900 UTC) for another round of IOVisor developer updates and discussions. 

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=2&day=3&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:



Can't join the meeting? Contact support here:


IMPORTANT NOTICE: Please note that this WebEx service allows audio and other information sent during the session to be recorded, which may be discoverable in a legal matter. By joining this session, you automatically consent to such recordings. If you do not consent to being recorded, discuss your concerns with the host or do not join the session.


Updated Invitation: IOvisor TSC & Dev Members call @ Every 2 weeks from 11am to 12pm on Wednesday (pmonclus@plumgrid.com)

Pere Monclus
 

This event has been changed.

IOvisor TSC & Dev Members call

Changed: Hi,

Please feel free to join us today at 11am PST (1900 UTC) for another round of IOVisor developer updates and discussions.

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=2&day=3&hour=19&min=0&sec=0&p1=886

The meeting is open to join:

JOIN WEBEX MEETING
https://plumgrid.webex.com/plumgrid/j.php?MTID=m67436ba0408d6bad48acd69138c03aea
Meeting number: 283 885 640
Meeting password: iovisor


JOIN BY PHONE
+1-415-655-0003 US TOLL
Access code: 283 885 640

Global call-in numbers:
https://plumgrid.webex.com/plumgrid/globalcallin.php?serviceType=MC&ED=44474908&tollFree=0

Can't join the meeting? Contact support here:
https://plumgrid.webex.com/plumgrid/mc


IMPORTANT NOTICE: Please note that this WebEx service allows audio and other information sent during the session to be recorded, which may be discoverable in a legal matter. By joining this session, you automatically consent to such recordings. If you do not consent to being recorded, discuss your concerns with the host or do not join the session.


_______________________________________________
iovisor-dev mailing list
iovisor-dev@...
https://lists.iovisor.org/mailman/listinfo/iovisor-dev

When
Every 2 weeks from 11am to 12pm on Wednesday Pacific Time
Where
Changed: webex updated (map)
Video call
https://plus.google.com/hangouts/_/plumgrid.com/iovisor
Calendar
pmonclus@...
Who
Pere Monclus - organizer
John Zannos
uri.elzur@...
prem@...
Prasun Kapoor
Alexei Starovoitov
wardd@...
yunsong.lu@...
aclark@...
bkanekar@...
Ed Doe
Brenden Blanco
developer@...
mc3124@...
krb@...
christopher.price@...
Neela Jacques
Sushil Singh
jianwen.pi@...
Bhushan Kanekar
mbudiu@...
David Duffey
Affan Ahmed Syed
john fastabend
iovisor-dev@...
Rich Lane




Re: continuous profiling

Alexei Starovoitov
 

The paper Mihai mentioned during the meeting:

"Continuous Profiling: Where Have All the Cycles Gone?"
1997
http://www-plan.cs.colorado.edu/diwan/7135/p357-anderson.pdf


IO Visor TSC/Dev Meeting Minutes

Brenden Blanco <bblanco@...>
 

Attendees:

Alexei Starovoitov
Benjamin Poirier
Brendan Gregg
Brenden Blanco
Daniel Borkmann
Deepa Kalani
Luis Rodriguez
Mihai Budiu
Prem Jonnalagadda
Rich Lane

Here is an overview of work from the previous two weeks:

Brenden B. and Brendan G. presented at SCaLE 14x conference, giving a general
overview of IOVisor concepts. The talk was well received with good Q+A from the
audience. Slides are available at
https://www.socallinuxexpo.org/scale/14x/presentations/kernel-low-latency-tracing-and-networking.

Alexei submitted for review some patches on per-cpu hash and array bpf maps.
The API was structured with python (and other) binding friendliness in mind.
When that lands, BCC userspace support will follow.

Many upcoming work items were also discussed:

Alexei is working on native stack trace functionality support for bpf tracing
use cases. The API has been fleshed out and the implementation is currently
being massaged from the already existing alternative implementations from perf
and/or ftrace.

One big nice-to-have is support for bpf+tracepoints. The FS folks seem to be
onboard with the approach, but there is currently some resistance on the
networking side.

Folks are raising concerns about the way that complex map key/values in C are
accessed in the python and other APIs. Need to have a single source of truth
going forward. Consider generating a #include that can be used from C or
Python.

Alexei would like to remove the dependency on the .h files in /usr/share.
Brenden to statically include that file in libbcc. Alexei, please raise a
github issue for this.

On the tracing side, we discussed the use case and how user-space symbols could
work.
* For now, even a best-effort approach would be beneficial.
* Perhaps to start, only support binaries, not .so/dlopen symbols
* At runtime, will need to do some pid -> comm translation
* See systemtap/perf for hints on how it can work

Alexei also discussed a concept for ring/overwrite buffer for tracing. I didn't
catch all the details of how this would work, perhaps others can elaborate on
the mailing list.

Also desired would be an mmap of bpf arrays, for userspace sharing.
* May not work seamlessly with per-cpu

Daniel recently gave a talk at FOSDEM on eBPF+tc. The slides can be seen here:
https://fosdem.org/2016/schedule/event/ebpf/

A user on IRC raised a question about whether one could get an inotify for bpf
map changes. So far this is just food for future thought.

Next week, several of us will be attending netdev 1.1 conference in Europe.
There is a BoF on Thursday where Prem, John Fastabend, and Brenden will cover
"Describing datapath processing from userspace policy".

Finally, Luis talked about a problem set that he is currently interested in,
where IOVisor may play an interesting role. The topic, loosely oversimplified,
is how to avoid dead runtime code in the kernel. Some background on his
thoughts is available here:
* http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
* http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html
Hopefully

Thank you all!


eBPF list feature

Affan Ahmed Syed <asyed@...>
 

Hi,

I was thinking that as we increase the number of ways and modes in which ebpf/iomodules are being created, something akin to "docker ps" [1] would be really useful. Do we have something like this currently supported?


Affan


bcc, bpf_perf_event_output(), and the upcoming Ubuntu 16.04 LTS

Brendan Gregg
 

I'm considering creating a directory in bcc called /oldtools, for some older versions that work on the 4.4 kernel.

With bpf_perf_event_output() in 4.5, there are many tools that I (or someone) should change to ditch bpf_trace_printk() and use bpf_perf_event_output() instead, which is more efficient and allows multi-user access. Great!

However...

Ubuntu 16.04 LTS (Xenial Xerus) will be out soon, and likely on the 4.4 kernel. Ubuntu is not the only distribution out there, but a widely used one, including by Netflix. And, given this is an LTS release, I'd expect us to see it in use for a year or more. That gives me hesitation to break these tools for a wide audience, especially one who may be experiencing their first impression of bcc & eBPF.

So I'm considering putting some older versions of tools (like execsnoop, opensnoop) in a temporary /oldtools directory, as I bpf_perf_event_output() all the things. In the distant future, we can delete /oldtools. Sound ok?

Brendan


Re: bcc, bpf_perf_event_output(), and the upcoming Ubuntu 16.04 LTS

Alexei Starovoitov
 

makes sense to me.
maybe a new subdir under tools/ ?
since we have a few links on the web pointing to github/iovisor/bcc/tools/



Re: bcc, bpf_perf_event_output(), and the upcoming Ubuntu 16.04 LTS

Allan McAleavy
 

Hi Folks,

Has there been a consensus on the way forward with the subdir under tools? I have updated biosnoop and bashreadline to make use of bpf_perf_event_output and was looking to create a PR. Should I include a move of the current tools to a subdir? If moving, I would look to update each of the appropriate man pages with additional info under the "see also" or "stability" section, along the lines of "For some older kernel revisions bpf_perf_event_output may be unavailable, please check for this tool in tools/xyz_dir/".

Thanks
Allan 



Re: bcc, bpf_perf_event_output(), and the upcoming Ubuntu 16.04 LTS

Brendan Gregg
 



On Fri, Feb 12, 2016 at 10:13 AM, Allan McAleavy <allan.mcaleavy@...> wrote:
> Hi Folks,
>
> Has there been a consensus on the way forward with the subdir under tools?

Yes, let's have tools/old, to keep it simple.

> I have updated biosnoop and bashreadline to make use of bpf_perf_event_output and was looking to create a PR.

Great! I've been using bpf_perf_event_output() too (see my PR for ext4slower/xfslower), and had a few nits I need to file tickets on (I labeled some things in the code as "workaround", like it not liking u32's in the data_t). So if you ran into similar issues, you weren't alone.

> Should I include a move of the current tools to a subdir? If moving, I would look to update each of the appropriate man pages with additional info under the "see also" or "stability" section, along the lines of "For some older kernel revisions bpf_perf_event_output may be unavailable, please check for this tool in tools/xyz_dir/".

Sounds good. Just put the older version of the tool into tools/old, and add "This makes use of a Linux 4.5 feature (bpf_perf_event_output()); for kernels older than 4.5, see the version under tools/old, which uses an older mechanism."

Brendan
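
As an illustration of the "kernels older than 4.5" rule above, the following sketch picks a tool directory from the running kernel's release string. The function names `release_at_least()` and `tool_dir_for_running_kernel()` are hypothetical and not part of bcc, and a plain version comparison is only an approximation, since a distribution kernel may backport features to an older version number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: true if a release string such as "4.4.0-21-generic"
 * is at least major.minor. Unparseable strings count as "too old".
 */
static bool release_at_least(const char *release, int major, int minor)
{
	int maj = 0, min = 0;

	assert(release != NULL);
	if (sscanf(release, "%d.%d", &maj, &min) != 2)
		return false;
	return maj > major || (maj == major && min >= minor);
}

/*
 * Hypothetical helper: choose between the current tools and the copies kept
 * for older kernels, based on uname(2). Falls back to tools/old when the
 * kernel release cannot be determined.
 */
static const char *tool_dir_for_running_kernel(void)
{
	struct utsname u;

	if (uname(&u) != 0)
		return "tools/old";
	return release_at_least(u.release, 4, 5) ? "tools" : "tools/old";
}
```

A wrapper script could use the same comparison to print the man page hint instead of failing when bpf_perf_event_output() is missing.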





Tools type sub directories

Allan McAleavy
 

Hello

As the number of tools starts to grow, should we be looking to create subdirs per tool type to make it easier for other users? Looking at what we have, I would categorise the tools as follows.

Disk              biolatency.py biosnoop.py biotop.py bitesize.py ext4dist.py ext4slower.py mdflush.py
Net               gethostlatency.py opensnoop.py tcpaccept.py tcpconnect.py
Cpu               hardirqs.py offcputime.py offwaketime.py runqlat.py softirqs.py wakeuptime.py
FS                dcsnoop.py dcstat.py statsnoop.py vfscount.py vfsstat.py xfsdist.py xfsslower.py
File              filelife.py fileslower.py filetop.py
Memory            cachestat.py memleak.py oomkill.py
User              bashreadline.py pidpersec.py
Kernel / Tracing  argdist.py execsnoop.py funccount.py funclatency.py killsnoop.py stackcount.py stacksnoop.py

However, some tools such as pidpersec / execsnoop / memleak etc. could live in a tracing context as well. Any thoughts on this, or is everyone happy with the current setup?


Al






Re: Tools type sub directories

Alexei Starovoitov
 

On Sun, Feb 14, 2016 at 1:13 PM, allan mcaleavy via iovisor-dev
<iovisor-dev@...> wrote:
> However some tools such as pidpersec / execsnoop / memleak etc could live in
> a tracing context as well. Any thoughts on this or is everyone happy with
> the current setup?

makes sense, but then the install dir also needs to be adjusted
and it will lead to users not having all the tools in their PATH,
so not good.
But changing only the repo and keeping the install in one place
will make it inconsistent. So I would keep it as-is.


Re: Tools type sub directories

Brendan Gregg
 



On Sun, Feb 14, 2016 at 4:13 AM, allan mcaleavy via iovisor-dev <iovisor-dev@...> wrote:
> Hello
>
> As the amount of tools start to grow should we be looking to create subdirs per tool type to make it easier for other users? Looking at what we have I would categorise the tools as follows.
>
> Disk              biolatency.py biosnoop.py biotop.py bitesize.py ext4dist.py ext4slower.py mdflush.py
> Net               gethostlatency.py opensnoop.py tcpaccept.py tcpconnect.py
> Cpu               hardirqs.py offcputime.py offwaketime.py runqlat.py softirqs.py wakeuptime.py
> FS                dcsnoop.py dcstat.py statsnoop.py vfscount.py vfsstat.py xfsdist.py xfsslower.py
> File              filelife.py fileslower.py filetop.py
> Memory            cachestat.py memleak.py oomkill.py
> User              bashreadline.py pidpersec.py
> Kernel / Tracing  argdist.py execsnoop.py funccount.py funclatency.py killsnoop.py stackcount.py stacksnoop.py

Not quite there yet, but another ten tools or so and it's going to get onerous to pick through a long listing. I'll take a swing at this myself -- I've already had directories in mind for these (as with other toolkits). And as with other toolkits, I've found it handy to have a "bin" directory of symlinks, where one can go for grepping every script.

Brendan

 
However some tools such as pidpersec / execsnoop / memleak etc could live in a tracing context as well. Any thoughts on this or is everyone happy with the current setup? 


Al


On 12 Feb 2016, at 18:41, Brendan Gregg <brendan.d.gregg@...> wrote:



On Fri, Feb 12, 2016 at 10:13 AM, Allan McAleavy <allan.mcaleavy@...> wrote:
Hi Folks,

Has there been a consensus on the way forward with the subdir under tools?

Yes, lets have tools/old, to keep it simple.
 
I have updated biosnoop and bashreadline to use make use of bpf_perf_event_output and was looking to create a PR.

Great! I've been using bpf_perf_event_output() too (see my PR for ext4slower/xfslower), and had a few nits I need to file tickets on (I labeled some things in the code as "workaround", like it not liking u32's in the data_t). So if you ran into similar issues, you weren't alone.

Should I include a move of the current tools to a subdir. If moving I would look to update each of the appropriate man pages with additional info under see also or stability section along the lines of "For some older kernel revisions bpf_perf_event_output may be unavailable, please check for this tool in tools/xyz_dir/"

Sounds good. Just put the tool into tools/old, and add "This makes use of a Linux 4.5 feature (bpf_perf_event_output()); for kernels older than 4.5, see the version under tools/old, which uses an older mechanism."

Brendan


Thanks
Allan 

On Thu, Feb 11, 2016 at 9:12 AM, Alexei Starovoitov via iovisor-dev <iovisor-dev@...> wrote:
makes sense to me.
may be new subdir under tools/ ?
Since we have few links on the web pointing to github/iovisor/bcc/tools/

On Thu, Feb 11, 2016 at 2:53 AM, Brendan Gregg via iovisor-dev
<iovisor-dev@...> wrote:
> I'm considering creating a directory in bcc called /oldtools, for some older
> versions that work on the 4.4 kernel.
>
> With bpf_perf_event_output() in 4.5, there are many tools that I (or
> someone) should change to ditch bpf_trace_printk() and use
> bpf_perf_event_output() instead, which is more efficient and allows
> multi-user access. Great!
>
> However...
>
> Ubuntu 16.04 LTS (Xenial Xerus) will be out soon, and likely on the 4.4
> kernel. Ubuntu is not the only distribution out there, but a widely used
> one, including by Netflix. And, given this is an LTS release, I'd expect us
> to see it in use for a year or more. That gives me hesitation to break these
> tools for a wide audience, especially one who may be experiencing their
> first impression of bcc & eBPF.
>
> So I'm considering putting some older versions of tools (like execsnoop,
> opensnoop) in a temporary /oldtools directory, as I bpf_perf_event_output()
> all the things. In the distant future, we can delete /oldtools. Sound ok?
>
> Brendan
>
>
> _______________________________________________
> iovisor-dev mailing list
> iovisor-dev@...
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
>



A slightly different use of the BCC tools

Gerard <ggarcia@...>
 

Hello,

I'd like to describe a somewhat different use of eBPF that I'm currently working on, and to ask for a little help.

I'm part of a research group that works on opportunistic networks, e.g. sensor networks. We propose an approach in which the messages themselves carry their forwarding protocol. This way, different applications with different forwarding protocols can use the same network without having to install these protocols on each node, which is quite difficult in a network where nodes are completely disconnected from each other for extended periods of time.

This is where eBPF comes into play. The forwarding code needs to be executed quickly and securely, so eBPF seems like a good fit.

Right now I have implemented a proof of concept that uses libclang to compile the C code to eBPF and then uses ubpf (https://github.com/rlane/ubpf) to load and execute the resulting objects. The problem is that this way I can't use external functions that take arguments other than integers.

I have noticed that bcc can pass strings (char *) as function parameters because they are first written to the stack, and I'm trying to understand how this code is generated.

If I'm not mistaken, bcc uses libclang to generate an LLVM IR module (in frontends/clang/loader.cc), but I can't tell whether it then implements its own eBPF code generator or modifies the one implemented in LLVM. What I'm trying to do is either register some external functions with the code generator, so that it doesn't complain when I use unknown functions, or leave those functions unlinked so that I can link them using ubpf.

The goal is to provide a library that user space applications could use to execute eBPF code for their own purposes.

Any help will be highly appreciated.

Thanks!

Gerard



Re: A slightly different use of the BCC tools

Rich Lane <rich.lane@...>
 

Hi Gerard,

Are you loading eBPF from ELF or from raw instructions? If it's ELF I could add support for those relocations and load rodata.

Otherwise, you could try copying the string to the stack manually before the function call. The tc samples do this.

    char foo[] = "abc";

The compiler turns this into a sequence of load-immediate/store instructions.
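
A minimal sketch of the pattern, to make the difference concrete. helper_strlen() is a hypothetical stand-in for whatever external function the VM would expose, and call_with_stack_string() is just an illustrative name. The comments describe what the BPF backend is said to do above; a native compiler is free to lower the array initialization differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical external helper: stands in for a function that the
 * eBPF program calls with a pointer into its own stack. */
static size_t helper_strlen(const char *s)
{
    return strlen(s);
}

static size_t call_with_stack_string(void)
{
    /* Array form: the four bytes 'a', 'b', 'c', '\0' become a local
     * array in this function's stack frame, so the call below passes
     * a stack address and needs no data-section reference. */
    char foo[] = "abc";

    /* Pointer form, for contrast (not used here):
     *     char *bar = "abc";
     * stores only an address; the bytes live in a read-only data
     * section, which the loader would have to place and relocate. */
    return helper_strlen(foo);
}
```

With the array form, the only thing the loader has to resolve is the call to the helper itself.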

Thanks,
Rich

On Tue, Feb 16, 2016 at 4:30 AM, Gerard via iovisor-dev <iovisor-dev@...> wrote:
Hello,

I'd like to describe a somewhat different use of eBPF that I'm currently working on, and to ask for a little help.

I'm part of a research group that works on opportunistic networks, e.g. sensor networks. We propose an approach in which the messages themselves carry their forwarding protocol. This way, different applications with different forwarding protocols can use the same network without having to install these protocols on each node, which is quite difficult in a network where nodes are completely disconnected from each other for extended periods of time.

This is where eBPF comes into play. The forwarding code needs to be executed quickly and securely, so eBPF seems like a good fit.

Right now I have implemented a proof of concept that uses libclang to compile the C code to eBPF and then uses ubpf (https://github.com/rlane/ubpf) to load and execute the resulting objects. The problem is that this way I can't use external functions that take arguments other than integers.

I have noticed that bcc can pass strings (char *) as function parameters because they are first written to the stack, and I'm trying to understand how this code is generated.

If I'm not mistaken, bcc uses libclang to generate an LLVM IR module (in frontends/clang/loader.cc), but I can't tell whether it then implements its own eBPF code generator or modifies the one implemented in LLVM. What I'm trying to do is either register some external functions with the code generator, so that it doesn't complain when I use unknown functions, or leave those functions unlinked so that I can link them using ubpf.

The goal is to provide a library that user space applications could use to execute eBPF code for their own purposes.

Any help will be highly appreciated.

Thanks!

Gerard






Re: A slightly different use of the BCC tools

Gerard <ggarcia@...>
 

Hi Rich,

Thanks for answering. I'm loading eBPF from ELF, but the idea is to load it from raw instructions once I'm able to build just the instructions with clang rather than the complete ELF object. Right now my Clang backend is something like this: http://stackoverflow.com/a/34966966/1132943. That's why I wanted to use bcc as the backend, as it seems to generate just the raw instructions and not an ELF object.

Also, I hadn't realized that char foo[] = "abc"; is different from char *foo = "abc"; which is why I wasn't able to generate the code that copies the string onto the stack. Thanks for pointing this out.

Now I can focus on understanding how bcc uses Clang/LLVM to generate the eBPF instructions and adapt it.

Gerard

On Tue, Feb 16, 2016 at 19:23, Rich Lane (<rich.lane@...>) wrote:
Hi Gerard,

Are you loading eBPF from ELF or from raw instructions? If it's ELF I could add support for those relocations and load rodata.

Otherwise, you could try copying the string to the stack manually before the function call. The tc samples do this.

    char foo[] = "abc";

The compiler turns this into a sequence of load-immediate/store instructions.

Thanks,
Rich




Re: A slightly different use of the BCC tools

Alexei Starovoitov
 

Take a look at the native C++ API of bcc in bpf_module.h.
With load_string() you can pass C code as a string,
and function_start() + function_size() will give you the raw bpf instructions.
Note that the compiler doesn't guarantee safety.
One can write for(;;); and llvm will generate bpf code with this infinite loop.
It is the in-kernel verifier that checks for safety.
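
Since a user-space VM does not get the in-kernel verifier for free, here is a toy sketch of one such check: scanning raw instructions for jumps that go backwards, which is how for(;;); shows up in the generated code. This is only an illustration, not the kernel's algorithm, and it is nowhere near a complete safety check. The struct mirrors the 8-byte eBPF instruction layout; the names insn and has_backward_jump() are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One eBPF instruction (8 bytes), same layout as struct bpf_insn. */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t regs;   /* dst_reg in the low 4 bits, src_reg in the high 4 */
    int16_t off;    /* signed jump or memory offset */
    int32_t imm;    /* signed immediate */
};

#define INSN_CLASS(code) ((code) & 0x07)
#define INSN_OP(code)    ((code) & 0xf0)
#define CLASS_JMP        0x05   /* BPF_JMP */
#define OP_CALL          0x80   /* BPF_CALL: off is not a jump offset */
#define OP_EXIT          0x90   /* BPF_EXIT: off is not a jump offset */

/* Return 1 if any jump targets itself or an earlier instruction.
 * A jump at index i lands on i + 1 + off, so off < 0 means the
 * target is at or before i. */
static int has_backward_jump(const struct insn *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t code = prog[i].code;

        if (INSN_CLASS(code) != CLASS_JMP)
            continue;
        if (INSN_OP(code) == OP_CALL || INSN_OP(code) == OP_EXIT)
            continue;
        if (prog[i].off < 0)
            return 1;
    }
    return 0;
}
```

A loader could run this over the bytes returned by function_start() (function_size() / 8 instructions) and refuse the program if it returns 1. Bounds on jump targets, memory accesses and helper calls would still need their own checks.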


On Tue, Feb 16, 2016 at 8:32 PM, Gerard via iovisor-dev
<iovisor-dev@...> wrote:
Hi Rich,

Thanks for answering. I'm loading eBPF from ELF, but the idea is to load it
from raw instructions once I'm able to build just the instructions with
clang rather than the complete ELF object. Right now my Clang backend is
something like this: http://stackoverflow.com/a/34966966/1132943. That's why
I wanted to use bcc as the backend, as it seems to generate just the raw
instructions and not an ELF object.

Also, I hadn't realized that char foo[] = "abc"; is different from
char *foo = "abc"; which is why I wasn't able to generate the code that
copies the string onto the stack. Thanks for pointing this out.

Now I can focus on understanding how bcc uses Clang/LLVM to generate the
eBPF instructions and adapt it.

Gerard

