Re: Failing bcc tests on xenial with 4.13 kernel


Yonghong Song
 

Your workaround looks good. Basically, on 4.13 you can apply the same
workaround to any tool that references the task_struct structure.

The bcc issue https://github.com/iovisor/bcc/issues/1492 has more details.
The issue affects 4.13. It also affects the original 4.14 release, but the
fix has been backported to the stable tree, so later 4.14 updates should
contain it. 4.15 and later contain the fix.
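
For tools that need it, the workaround could be gated on the running kernel
version instead of being applied unconditionally. Below is a minimal sketch,
not an existing bcc API: the helper names (kernel_major_minor,
add_randstruct_workaround) and the `affected` parameter are hypothetical, and
the default only covers 4.13 because the exact 4.14 stable update that carries
the fix is not stated here.

```python
import platform
import re

# The two #define lines come from the bcc reference guide; the #undef lines
# come from the patch in this thread and only silence redefinition warnings.
RANDSTRUCT_WORKAROUND = """\
#undef randomized_struct_fields_start
#undef randomized_struct_fields_end
#define randomized_struct_fields_start struct {
#define randomized_struct_fields_end };
"""


def kernel_major_minor(release):
    """Parse a release string such as '4.13.0-36-generic' into (4, 13).

    Returns None when the string does not start with 'major.minor'.
    """
    m = re.match(r"(\d+)\.(\d+)", release)
    return (int(m.group(1)), int(m.group(2))) if m else None


def add_randstruct_workaround(bpf_text, release=None, affected=((4, 13),)):
    """Prepend the workaround to bpf_text if the kernel is in `affected`.

    `affected` is an assumption: extend it (for example with (4, 14)) if the
    target kernels are known to lack the backported fix.
    """
    if release is None:
        release = platform.release()  # release string of the running kernel
    if kernel_major_minor(release) in affected:
        return RANDSTRUCT_WORKAROUND + bpf_text
    return bpf_text
```

A tool would then call bpf_text = add_randstruct_workaround(bpf_text) before
passing the text to BPF(text=bpf_text). The release string in the docstring is
only an example of the format.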

On Wed, Mar 7, 2018 at 10:34 PM, Mike Percy <mpercy@...> wrote:
Y Song, thank you very much for your detailed feedback.

Let me focus on the specific wakeuptime.py failure since that seems to be my
biggest blocker at the moment:

File "../../tools/wakeuptime.py", line 216, in <module>
print(" %-16s %s" % ("target:", k.target.decode()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0:
ordinal not in range(128)
Maybe this is a real issue related to python 2/3 compatibility, could
you help take a look?

It looks like this is a known issue. According to
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#7-bpf_get_current_task
:

With Linux 4.13, due to issues with field randomization, you may need two
#define directives before the includes:
#define randomized_struct_fields_start struct {
#define randomized_struct_fields_end };
#include <linux/sched.h>
int do_trace(void *ctx) {
struct task_struct *t = (struct task_struct *)bpf_get_current_task();
[...]

And indeed that fixed the wakeuptime tool. For now, this patch seems to do
the trick on this kernel (the undefs just get rid of some warnings):

diff --git a/tools/wakeuptime.py b/tools/wakeuptime.py
index cf0ca7d..3f6fd67 100755
--- a/tools/wakeuptime.py
+++ b/tools/wakeuptime.py
@@ -86,6 +86,10 @@ def signal_ignore(signal, frame):

# define BPF program
bpf_text = """
+#undef randomized_struct_fields_start
+#undef randomized_struct_fields_end
+#define randomized_struct_fields_start struct {
+#define randomized_struct_fields_end };
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

It appears that the same workaround is also needed for many other tools in
the repo to work correctly on this kernel.

I was looking around at the kernel 4.13 release notes and I found the
RANDSTRUCT patch
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=313dd1b629219db50cad532dba6a3b3b22ffe622>
but I'm not sure that is what I'm hitting here since I didn't find a
reference to it in my kernel config. Any ideas about the root cause for this
workaround and if this is a permanent issue with kernels after 4.13? If so,
then I suppose we should find a more maintainable way of injecting that
workaround automatically in the BPF class, or find a more permanent
workaround for bcc.

Thoughts?

Thanks again,
Mike
