ANNOUNCE: bpfd - a remote proxy daemon for executing bpf code (with corres. bcc changes)
Josef Bacik <josef@...>
On Wed, Jan 24, 2018 at 08:29:39PM -0800, Joel Fernandes wrote:
Hi Guys,

I added a bunch of comments on github, I hope that's fine, otherwise I can dig them out and post them in here. All in all everything looks good, I just have some concerns about how the remote stuff is bolted into the python bindings.

Thanks for doing this, it's a great start,

Josef
Joel Fernandes <joelaf@...>
Sigh, Correcting Brenden's email address. Apologies.
On Wed, Jan 24, 2018 at 8:29 PM, Joel Fernandes <joelaf@...> wrote:
Hi Guys,
Joel Fernandes <joelaf@...>
Hi Guys,
Just providing an update: I've made lots of progress over the last few weeks, and all the issues mentioned below (in the last post) are resolved. I published an LWN article explaining the design of BPFd and the BCC-side changes:
https://lwn.net/SubscriberLink/744522/ba023b555957408e/

There are still a few things that don't work well, like symbol resolution, and I am working on them. The effect is just that tools which need it won't work; there aren't that many from what I see, but the goal is ultimately to get everything working.

My next steps are to post the BCC-side changes on github as a pull request (once I get a chance to rebase and send along with any clean ups). I also request you to take a look at the top 7 patches on this branch and provide any early feedback to me:
https://github.com/joelagnel/bcc/commits/bcc-bpfd

Based on activity on twitter and folks pinging me, it seems there is a LOT of interest in this both within Google and outside, so it's quite exciting. I strongly believe this will expand the use of BCC in the community. I look forward to presenting some demos at SCALE showing it in action.

I am looking forward to your comments. Thanks.

Regards,

- Joel

On Fri, Dec 29, 2017 at 12:58 AM, Joel Fernandes <joelaf@...> wrote:
Hi Guys,
Joel Fernandes <joelaf@...>
Correcting Brenden's email address.
On Fri, Dec 29, 2017 at 12:58 AM, Joel Fernandes <joelaf@...> wrote:
Hi Guys,
Joel Fernandes <joelaf@...>
Hi Guys,
I've been working on an idea I discussed with other kernel developers (Alexei, Josef etc) at the last LPC about how to make it easier to run bcc tools on remote systems.

Use case
========
Run bcc tools on a remotely connected system without having to load the entire LLVM infrastructure onto the remote target and without having to sync the kernel sources with it. On architectures such as ARM64 especially, it's a bit more work to run the tools directly on the target itself (local to the target) because LLVM and Python have to be cross-compiled for it (along with syncing of kernel sources, which takes up space and needs to be kept in sync for correct operation). I believe Facebook also has some use cases where they want to run bcc tools on remote instances. Lastly, this is also the way arm64 development normally happens: you cross build for it, and typically the ARM64 embedded systems may not have much space for kernel sources and clang, so it's sometimes better if the tools are remote. All our kernel development for android is cross developed with the cross-toolchain running remotely as well.

I am looking forward to collaborating with interested developers on this idea and getting more feedback about the design etc. I am also planning to talk about it next year during SCALE and OSPM.

Implementation
==============
To facilitate this, I started working on a daemon called bpfd which is executed on the remote target and listens for commands:
https://github.com/joelagnel/bpfd

The daemon does more than proxy the bpf syscall; there are several things, like registering a kprobe with perf, and perf callbacks, that need to be replicated. All this infrastructure is pretty much code complete in bpfd.

Sample commands sent to bpfd are as follows:
https://github.com/joelagnel/bpfd/blob/master/tests/TESTS
------------------------
; Program opensnoop
BPF_CREATE_MAP 1 8 40 10240 0
BPF_CREATE_MAP 4 4 4 2 0
BPF_PROG_LOAD 2 248 GPL 264721 eRdwAAAAAAC3AQAAAAAAAHsa+P8AA[...]
BPF_PROG_LOAD 2 664 GPL 264721 vxYAAAAAAACFAAAADgAAAHsK+P8AA[...]
------------------------

Binary streams are communicated using base64, which keeps interaction with binary data simple.

Several patches are written on the bcc side to be able to send these commands using a "remotes infrastructure", available in the branch at:
https://github.com/joelagnel/bcc/commits/bcc-bpfd

My idea was to keep the remote infrastructure as generic/plug-and-play as possible, so that in the future it's easy to add other remotes like networking. Currently I have an adb (android bridge) remote and a shell remote:
https://github.com/joelagnel/bcc/tree/bcc-bpfd/src/python/bcc/remote

The shell remote is more of a "test" remote that simply forks bpfd and communicates with it over stdio. This makes the development quite easy.

Status
======
What's working:
- executing several bcc tools across a process boundary using the "shell" remote (bcc tools and bpfd both running on a local x86 machine).
- communication with a remote arm64 android target using the "adb remote". But there are several issues to do with arm64 and bcc tools that I'm ironing out. Since my arm64 bcc hackery is a bit recent, I created a separate WIP branch here: https://github.com/joelagnel/bcc/tree/bcc-bpfd-arm64. I don't expect these to be a major issue since I noticed some folks have been using bcc tools on arm64 already.

Issues:
- Since bcc is built with clang on x86, the eBPF backend code is generated for x86. Although it loads fine on arm64, there seem to be several issues, such as the kprobe handler not seeing arguments or return codes correctly in opensnoop. This is (probably) easy to fix by just having the user tell bcc we're building for a certain architecture - but that would mean we carry code for each arch when building the bcc libraries and dynamically select the code path to run, rather than building for the C++ compiler's target architecture.
- Some operations are quite slow, such as stackcount when there are a lot of stack traces. Each stack trace is a key, and every key iterated comes at a cost, which adds up. Maybe we can batch these up so that they're faster instead of making each key iteration a separate remote command/response?
- Some tools read the ps table on the local host. This needs to be remotely proxied.
- Provide a mechanism to make bcc/clang build eBPF for arm64 (using a command line switch)?
- Design a generic parser mechanism to be added to all bcc tools to be able to pass which remote method to use, what the remote architecture is and what the path to the kernel sources is (for kprobes to work).

Thanks a lot to Alexei for discussing ideas at the conference and for all the great advice and help.

Regards,

- Joel

PS: We also have some use cases where our Android networking daemon has hardcoded eBPF asm and our teams want to write them in C and load the binary stream. It seems bpfd can be a good fit here as well.
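
To make the base64 framing in the sample commands concrete, here is a minimal Python sketch of how a BPF_PROG_LOAD line could be built and parsed. It is not taken from bpfd or the bcc branch: the helper names are hypothetical, and the meaning of the fields (program type, payload length in bytes, license, kernel version code) is an assumption inferred from the sample lines, not from the bpfd sources.

```python
import base64


def encode_prog_load(prog_type, license_str, kern_version, insns):
    """Build one text command line carrying a binary program (hypothetical helper).

    Assumed field order, inferred from the sample commands:
    BPF_PROG_LOAD <prog_type> <byte_length> <license> <kern_version> <base64>
    """
    payload = base64.b64encode(insns).decode("ascii")
    return f"BPF_PROG_LOAD {prog_type} {len(insns)} {license_str} {kern_version} {payload}"


def decode_prog_load(line):
    """Parse a command line back into its fields and raw bytes (hypothetical helper)."""
    name, prog_type, size, license_str, kern_version, payload = line.split(" ", 5)
    if name != "BPF_PROG_LOAD":
        raise ValueError(f"unexpected command: {name}")
    insns = base64.b64decode(payload)
    # Assumption: the length field counts decoded bytes.
    if len(insns) != int(size):
        raise ValueError("length field does not match decoded payload")
    return int(prog_type), license_str, int(kern_version), insns


# Round trip with dummy bytes standing in for real eBPF instructions.
dummy = bytes(range(16))
line = encode_prog_load(2, "GPL", 264721, dummy)
assert decode_prog_load(line) == (2, "GPL", 264721, dummy)
```

Because every command is a single line of printable text, the same framing can travel over stdio (as with the shell remote) or any other line-oriented transport without special handling for binary data.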