Date
1 - 5 of 5
Failing bcc tests on xenial with 4.13 kernel
Mike Percy <mpercy@...>
Hi bcc devs, I am trying to get the bcc tests to pass on the master branch on top of Ubuntu 16.04 running the HWE kernel, which is version 4.13.0-36-generic. I'm compiling bcc using LLVM 6.0 (yes, it's now available) from the llvm.org APT repo. Has anyone gotten the bcc tests to pass on this kernel? If not, is there a recommended platform? Is it possible that the tests could be currently broken on master? (I'm not sure if bcc is currently set up to use a CI system.) The following tests are failing for me: * test_libbcc: test listing all USDT probes in Ruby/MRI * py_test_brb2: TestBPFSocket * py_test_tools_smoke: test_bpflist * py_test_tools_smoke: test_ucalls * py_test_tools_smoke: test_wakeuptime Since this is kernel version 4.13, I would expect wakeuptime to work. I am particularly interested in that and related tools for some performance analysis I want to do. Below, I've added some relevant details about the kernel I'm running and the output of the specific test failures. Thanks in advance for any advice or pointers! Best regards, Mike Some relevant info on my kernel: $ uname -a Linux mpercy-T460p 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux It looks like all the required options are enabled: $ egrep 'BPF|NET_SCH_SFQ|NET_ACT|CONFIG_DUMMY|CONFIG_VXLAN' /boot/config-4.13.0-36-generic | sort CONFIG_BPF_EVENTS=y CONFIG_BPF_JIT=y CONFIG_BPF_SYSCALL=y CONFIG_BPF=y CONFIG_DUMMY=m CONFIG_HAVE_EBPF_JIT=y CONFIG_NET_ACT_BPF=m CONFIG_NET_ACT_GACT=m CONFIG_NET_ACT_POLICE=m CONFIG_NET_CLS_BPF=m CONFIG_NET_SCH_SFQ=m CONFIG_VXLAN=m # n.b.: options not appearing in INSTALLING.md were elided Test failure output: 3/35 Testing: test_libbcc 3/35 Test: test_libbcc Command: "/home/mpercy/src/bcc/build/tests/wrapper.sh" "c_test_all" "sudo" "/home/mpercy/src/bcc/build/tests/cc/test_libbcc" Directory: /home/mpercy/src/bcc/build/tests/cc "test_libbcc" start time: Mar 06 22:11 PST Output: ---------------------------------------------------------- Parse error: 4@i%ra+1r -------^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test_libbcc is a Catch v1.4.0 host application. Run with -? for options ------------------------------------------------------------------------------- test listing all USDT probes in Ruby/MRI without a running Ruby process ------------------------------------------------------------------------------- ../tests/cc/test_usdt_probes.cc:118 ............................................................................... ../tests/cc/test_usdt_probes.cc:127: FAILED: REQUIRE( ctx.num_probes() > 10 ) with expansion: 0 > 10 =============================================================================== test cases: 22 | 21 passed | 1 failed assertions: 433 | 432 passed | 1 failed Failed <end of output> Test time = 9.37 sec ---------------------------------------------------------- Test Failed. "test_libbcc" end time: Mar 06 22:11 PST "test_libbcc" time elapsed: 00:00:09 ---------------------------------------------------------- 16/35 Testing: py_test_brb2 16/35 Test: py_test_brb2 Command: "/home/mpercy/src/bcc/build/tests/wrapper.sh" "py_brb2_c" "sudo" "/home/mpercy/src/bcc/tests/python/test_brb2.py" "test_brb2.c" Directory: /home/mpercy/src/bcc/tests/python "py_test_brb2" start time: Mar 06 22:11 PST Output: ---------------------------------------------------------- net.ipv4.ip_forward = 1 PING 200.1.1.1 (200.1.1.1) 56(84) bytes of data. --- 200.1.1.1 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1026ms F ====================================================================== FAIL: test_brb2 (__main__.TestBPFSocket) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mpercy/src/bcc/tests/python/test_brb2.py", line 176, in test_brb2 self.assertEqual(self.pem_stats[c_uint(0)].value, 12) AssertionError: 4L != 12 ---------------------------------------------------------------------- Ran 1 test in 12.734s FAILED (failures=1) Failed <end of output> Test time = 12.90 sec ---------------------------------------------------------- Test Failed. "py_test_brb2" end time: Mar 06 22:12 PST "py_test_brb2" time elapsed: 00:00:12 ---------------------------------------------------------- 27/35 Testing: py_test_tools_smoke 27/35 Test: py_test_tools_smoke Command: "/home/mpercy/src/bcc/build/tests/wrapper.sh" "py_test_tools_smoke" "sudo" "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py" Directory: /home/mpercy/src/bcc/tests/python "py_test_tools_smoke" start time: Mar 06 22:12 PST Output: ---------------------------------------------------------- ......Traceback (most recent call last): File "../../tools/bpflist.py", line 75, in <module> find_bpf_fds(int(pdir)) File "../../tools/bpflist.py", line 63, in find_bpf_fds for fd in os.listdir(root): OSError: [Errno 2] No such file or directory: '/proc/18787/fd' F.......................Killed ....................Traceback (most recent call last): File "../../tools/stackcount.py", line 370, in <module> Tool().run() File "../../tools/stackcount.py", line 352, in run for addr in kernel_stack: KeyboardInterrupt .../bin/sh: 1: ausyscall: not found .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. .In file included from /virtual/main.c:3: In file included from include/net/sock.h:58: include/linux/memcontrol.h:580:31: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:593:29: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(pn->memcg, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:605:38: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] __mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:618:36: warning: implicit conversion from enumeration type 'enum node_stat_item' to different enumeration type 'enum memcg_stat_item' [-Wenum-conversion] mod_memcg_state(page->mem_cgroup, idx, val); ~~~~~~~~~~~~~~~ ^~~ include/linux/memcontrol.h:639:40: warning: implicit conversion from enumeration type 'enum memcg_stat_item' to different enumeration type 'enum vm_event_item' [-Wenum-conversion] count_memcg_events(page->mem_cgroup, idx, 1); ~~~~~~~~~~~~~~~~~~ ^~~ 5 warnings generated. ....Killed F.....check dmesg output for possible cause check dmesg output for possible cause ..Traceback (most recent call last): File "../../tools/wakeuptime.py", line 216, in <module> print(" %-16s %s" % ("target:", k.target.decode())) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128) F.... ====================================================================== FAIL: test_bpflist (__main__.SmokeTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 89, in test_bpflist self.run_with_duration("bpflist.py") File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 32, in run_with_duration (timeout, full_command), shell=True)) AssertionError: 0 != 1 ====================================================================== FAIL: test_ucalls (__main__.SmokeTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 336, in test_ucalls timeout=60, kill_timeout=60) File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 52, in run_with_int or (rc == 137 and kill), "rc was %d" % rc) AssertionError: rc was 137 ====================================================================== FAIL: test_wakeuptime (__main__.SmokeTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 371, in test_wakeuptime self.run_with_duration("wakeuptime.py 1") File "/home/mpercy/src/bcc/tests/python/test_tools_smoke.py", line 32, in run_with_duration (timeout, full_command), shell=True)) AssertionError: 0 != 1 ---------------------------------------------------------------------- Ran 76 tests in 368.071s FAILED (failures=3) Failed <end of output> Test time = 368.11 sec ---------------------------------------------------------- Test Failed. "py_test_tools_smoke" end time: Mar 06 22:18 PST "py_test_tools_smoke" time elapsed: 00:06:08 ---------------------------------------------------------- |
|
Yonghong Song
On Tue, Mar 6, 2018 at 11:12 PM, Mike Percy via iovisor-dev
<iovisor-dev@...> wrote: Hi bcc devs,Maybe some dependencies are not satisfied. I will put some comments below and you may try to install additional packages and run again. Maybe 'ruby' is not installed? Not sure about this one. bpflist.py cannot find one process which just went disappearing after it ID is recorded. This is a known issue. We probably need to add try/except block here. File "../../tools/stackcount.py", line 370, in <module>Maybe this is a real issue related to python 2/3 compatibility, could you help take a look? F....timeout handling issue? 43 # timeout returns 124 if the program did not terminate prematurely, 44 # and returns 137 if we used KILL instead of INT. So there are three 45 # sensible scenarios: 46 # 1. The script is allowed to return early, and it did, with a 47 # success return code. 48 # 2. The script timed out and was killed by the SIGINT signal. 49 # 3. The script timed out and was killed by the SIGKILL signal, and 50 # this was what we asked for using kill=True. 51 self.assertTrue((rc == 0 and allow_early) or rc == 124 52 or (rc == 137 and kill), "rc was %d" % rc)
|
|
Mike Percy <mpercy@...>
Y Song, thank you very much for your detailed feedback. Let me focus on the specific wakeuptime.py failure since that seems to be my biggest blocker at the moment:
It looks like this is a known issue. According to https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#7-bpf_get_current_task : With Linux 4.13, due to issues with field randomization, you may need two #define directives before the includes: And indeed that fixed the wakeuptime tool. For now, this patch seems to do the trick on this kernel (the undefs just get rid of some warnings): diff --git a/tools/wakeuptime.py b/tools/wakeuptime.py index cf0ca7d..3f6fd67 100755 --- a/tools/wakeuptime.py +++ b/tools/wakeuptime.py @@ -86,6 +86,10 @@ def signal_ignore(signal, frame): # define BPF program bpf_text = """ +#undef randomized_struct_fields_start +#undef randomized_struct_fields_end +#define randomized_struct_fields_start struct { +#define randomized_struct_fields_end }; #include <uapi/linux/ptrace.h> #include <linux/sched.h> It appears that same workaround is also needed for many other tools in the repo to work correctly on this kernel. I was looking around at the kernel 4.13 release notes and I found the RANDSTRUCT patch <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=313dd1b629219db50cad532dba6a3b3b22ffe622> but I'm not sure that is what I'm hitting here since I didn't find a reference to it in my kernel config. Any ideas about the root cause for this workaround and if this is a permanent issue with kernels after 4.13? If so, then I suppose we should find a more maintainable way of injecting that workaround automatically in the BPF class, or find a more permanent workaround for bcc. Thoughts? Thanks again, Mike |
|
Yonghong Song
Your workaround looks good. Basically, you can put the workaround in
toggle quoted message
Show quoted text
for 4.13 for any tools which references task_struct construct. The bcc issue https://github.com/iovisor/bcc/issues/1492 has more details. It is an issue for 4.13. it is an issue for 4.14 original release, but the patch is ported to stable release, so later 4.14 update should contain the fix. 4.15 and later contains the fix. On Wed, Mar 7, 2018 at 10:34 PM, Mike Percy <mpercy@...> wrote:
Y Song, thank you very much for your detailed feedback. |
|
Mike Percy <mpercy@...>
Great! Thanks for the info! Mike On Wed, Mar 7, 2018 at 10:47 PM, Y Song <ys114321@...> wrote: Your workaround looks good. Basically, you can put the workaround in |
|