Hashmap OOPS


Dan Siemon
 

Anyone seen this before? 4.17.6

I've seen this on a few boxes now.

(If there is a better place to post this (LKML?) please let me know.)


Oct 08 21:21:02 kernel: BUG: unable to handle kernel paging request at
ffffc48f1f42d500
Oct 08 21:21:02 kernel: PGD 0 P4D 0
Oct 08 21:21:02 kernel: Oops: 0000 [#1] SMP PTI
Oct 08 21:21:02 kernel: Modules linked in: sch_htb bpwd_drv(OE)
lanner_drv(OE) cls_bpf sch_ingress ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nf>
Oct 08 21:21:02 kernel: crc32c_intel igb i2c_algo_bit dca
Oct 08 21:21:02 kernel: CPU: 5 PID: 27247 Comm: preseem-netdev-
Tainted: G OE 4.17.6-200.fc28.x86_64 #1
Oct 08 21:21:02 kernel: Hardware name: Default string Default
string/SKYBAY, BIOS 5.12 08/15/2017
Oct 08 21:21:02 kernel: RIP: 0010:memcpy_erms+0x6/0x10
Oct 08 21:21:02 kernel: RSP: 0018:ffffbc8ec1f37e40 EFLAGS: 00010286
Oct 08 21:21:02 kernel: RAX: ffff9eca25ae0ab0 RBX: ffff9eca25ae0ab0
RCX: 0000000000000010
Oct 08 21:21:02 kernel: RDX: 0000000000000010 RSI: ffffc48f1f42d500
RDI: ffff9eca25ae0ab0
Oct 08 21:21:02 kernel: RBP: 0000000000000010 R08: 0000000000400000
R09: 0000000000000001
Oct 08 21:21:02 kernel: R10: 000000002e523fcb R11: 0000000000000000
R12: ffff9ec9eb450600
Oct 08 21:21:02 kernel: R13: 0000000000141ca6 R14: ffff9eca464a3700
R15: ffff9eca25ae0ed0
Oct 08 21:21:02 kernel: FS: 00007f3262ffd700(0000)
GS:ffff9eca9ed40000(0000) knlGS:0000000000000000
Oct 08 21:21:02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500 CR3: 0000000422b82003
CR4: 00000000003606e0
Oct 08 21:21:02 kernel: DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
Oct 08 21:21:02 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
Oct 08 21:21:02 kernel: Call Trace:
Oct 08 21:21:02 kernel: htab_map_get_next_key+0x237/0x280
Oct 08 21:21:02 kernel: map_get_next_key+0xed/0x1f0
Oct 08 21:21:02 kernel: __x64_sys_bpf+0x16b/0x260
Oct 08 21:21:02 kernel: do_syscall_64+0x5b/0x160
Oct 08 21:21:02 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 08 21:21:02 kernel: RIP: 0033:0x47fa54
Oct 08 21:21:02 kernel: RSP: 002b:000000c428409278 EFLAGS: 00000246
ORIG_RAX: 0000000000000141
Oct 08 21:21:02 kernel: RAX: ffffffffffffffda RBX: 0000000000000000
RCX: 000000000047fa54
Oct 08 21:21:02 kernel: RDX: 0000000000000020 RSI: 000000c428409300
RDI: 0000000000000004
Oct 08 21:21:02 kernel: RBP: 000000c4284093a8 R08: 0000000000000000
R09: 0000000000000000
Oct 08 21:21:02 kernel: R10: 0000000000000000 R11: 0000000000000246
R12: 0000000000001e00
Oct 08 21:21:02 kernel: R13: 0000000000000000 R14: 000000000045d560
R15: 0000000000000000
Oct 08 21:21:02 kernel: Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8
48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00
48 89 f8 48 89 d1 <f3>
Oct 08 21:21:02 kernel: RIP: memcpy_erms+0x6/0x10 RSP:
ffffbc8ec1f37e40
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500
Oct 08 21:21:02 kernel: ---[ end trace ebebe46b9c6091a1 ]---
-- Reboot --


Daniel Borkmann
 

Hi Dan,

On 10/17/2018 03:08 AM, Dan Siemon wrote:
Anyone seen this before? 4.17.6

I've seen this on a few boxes now.

(If there is a better place to post this (LKML?) please let me know.)

Oct 08 21:21:02 kernel: BUG: unable to handle kernel paging request at
ffffc48f1f42d500
Oct 08 21:21:02 kernel: PGD 0 P4D 0
Oct 08 21:21:02 kernel: Oops: 0000 [#1] SMP PTI
Oct 08 21:21:02 kernel: Modules linked in: sch_htb bpwd_drv(OE)
lanner_drv(OE) cls_bpf sch_ingress ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nf>
Oct 08 21:21:02 kernel: crc32c_intel igb i2c_algo_bit dca
Oct 08 21:21:02 kernel: CPU: 5 PID: 27247 Comm: preseem-netdev-
Tainted: G OE 4.17.6-200.fc28.x86_64 #1
Oct 08 21:21:02 kernel: Hardware name: Default string Default
string/SKYBAY, BIOS 5.12 08/15/2017
Oct 08 21:21:02 kernel: RIP: 0010:memcpy_erms+0x6/0x10
Oct 08 21:21:02 kernel: RSP: 0018:ffffbc8ec1f37e40 EFLAGS: 00010286
Oct 08 21:21:02 kernel: RAX: ffff9eca25ae0ab0 RBX: ffff9eca25ae0ab0
RCX: 0000000000000010
Oct 08 21:21:02 kernel: RDX: 0000000000000010 RSI: ffffc48f1f42d500
RDI: ffff9eca25ae0ab0
Oct 08 21:21:02 kernel: RBP: 0000000000000010 R08: 0000000000400000
R09: 0000000000000001
Oct 08 21:21:02 kernel: R10: 000000002e523fcb R11: 0000000000000000
R12: ffff9ec9eb450600
Oct 08 21:21:02 kernel: R13: 0000000000141ca6 R14: ffff9eca464a3700
R15: ffff9eca25ae0ed0
Oct 08 21:21:02 kernel: FS: 00007f3262ffd700(0000)
GS:ffff9eca9ed40000(0000) knlGS:0000000000000000
Oct 08 21:21:02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500 CR3: 0000000422b82003
CR4: 00000000003606e0
Oct 08 21:21:02 kernel: DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
Oct 08 21:21:02 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
Oct 08 21:21:02 kernel: Call Trace:
Oct 08 21:21:02 kernel: htab_map_get_next_key+0x237/0x280
Oct 08 21:21:02 kernel: map_get_next_key+0xed/0x1f0
Oct 08 21:21:02 kernel: __x64_sys_bpf+0x16b/0x260
Oct 08 21:21:02 kernel: do_syscall_64+0x5b/0x160
Oct 08 21:21:02 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 08 21:21:02 kernel: RIP: 0033:0x47fa54
Oct 08 21:21:02 kernel: RSP: 002b:000000c428409278 EFLAGS: 00000246
ORIG_RAX: 0000000000000141
Oct 08 21:21:02 kernel: RAX: ffffffffffffffda RBX: 0000000000000000
RCX: 000000000047fa54
Oct 08 21:21:02 kernel: RDX: 0000000000000020 RSI: 000000c428409300
RDI: 0000000000000004
Oct 08 21:21:02 kernel: RBP: 000000c4284093a8 R08: 0000000000000000
R09: 0000000000000000
Oct 08 21:21:02 kernel: R10: 0000000000000000 R11: 0000000000000246
R12: 0000000000001e00
Oct 08 21:21:02 kernel: R13: 0000000000000000 R14: 000000000045d560
R15: 0000000000000000
Oct 08 21:21:02 kernel: Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8
48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00
48 89 f8 48 89 d1 <f3>
Oct 08 21:21:02 kernel: RIP: memcpy_erms+0x6/0x10 RSP:
ffffbc8ec1f37e40
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500
Oct 08 21:21:02 kernel: ---[ end trace ebebe46b9c6091a1 ]---
-- Reboot --
Thanks a lot for the report! Do you happen to have a reproducer? Are you
running into this with concurrent i) updates, ii) deletes, iii) both or
iv) while destroying the whole htab? If i-iii) is it out of BPF datapath
or from syscall path? Is your hashtab preallocated? Any more elaborate info
would help.

Thanks,
Daniel


Dan Siemon
 

On Wed, 2018-10-17 at 14:08 +0200, Daniel Borkmann wrote:
Hi Dan,

On 10/17/2018 03:08 AM, Dan Siemon wrote:
Anyone seen this before? 4.17.6

I've seen this on a few boxes now.

(If there is a better place to post this (LKML?) please let me
know.)

Oct 08 21:21:02 kernel: BUG: unable to handle kernel paging
request at
ffffc48f1f42d500
Oct 08 21:21:02 kernel: PGD 0 P4D 0
Oct 08 21:21:02 kernel: Oops: 0000 [#1] SMP PTI
Oct 08 21:21:02 kernel: Modules linked in: sch_htb bpwd_drv(OE)
lanner_drv(OE) cls_bpf sch_ingress ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nf>
Oct 08 21:21:02 kernel: crc32c_intel igb i2c_algo_bit dca
Oct 08 21:21:02 kernel: CPU: 5 PID: 27247 Comm: preseem-netdev-
Tainted: G OE 4.17.6-200.fc28.x86_64 #1
Oct 08 21:21:02 kernel: Hardware name: Default string Default
string/SKYBAY, BIOS 5.12 08/15/2017
Oct 08 21:21:02 kernel: RIP: 0010:memcpy_erms+0x6/0x10
Oct 08 21:21:02 kernel: RSP: 0018:ffffbc8ec1f37e40 EFLAGS:
00010286
Oct 08 21:21:02 kernel: RAX: ffff9eca25ae0ab0 RBX:
ffff9eca25ae0ab0
RCX: 0000000000000010
Oct 08 21:21:02 kernel: RDX: 0000000000000010 RSI:
ffffc48f1f42d500
RDI: ffff9eca25ae0ab0
Oct 08 21:21:02 kernel: RBP: 0000000000000010 R08:
0000000000400000
R09: 0000000000000001
Oct 08 21:21:02 kernel: R10: 000000002e523fcb R11:
0000000000000000
R12: ffff9ec9eb450600
Oct 08 21:21:02 kernel: R13: 0000000000141ca6 R14:
ffff9eca464a3700
R15: ffff9eca25ae0ed0
Oct 08 21:21:02 kernel: FS: 00007f3262ffd700(0000)
GS:ffff9eca9ed40000(0000) knlGS:0000000000000000
Oct 08 21:21:02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500 CR3:
0000000422b82003
CR4: 00000000003606e0
Oct 08 21:21:02 kernel: DR0: 0000000000000000 DR1:
0000000000000000
DR2: 0000000000000000
Oct 08 21:21:02 kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0
DR7: 0000000000000400
Oct 08 21:21:02 kernel: Call Trace:
Oct 08 21:21:02 kernel: htab_map_get_next_key+0x237/0x280
Oct 08 21:21:02 kernel: map_get_next_key+0xed/0x1f0
Oct 08 21:21:02 kernel: __x64_sys_bpf+0x16b/0x260
Oct 08 21:21:02 kernel: do_syscall_64+0x5b/0x160
Oct 08 21:21:02 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 08 21:21:02 kernel: RIP: 0033:0x47fa54
Oct 08 21:21:02 kernel: RSP: 002b:000000c428409278 EFLAGS:
00000246
ORIG_RAX: 0000000000000141
Oct 08 21:21:02 kernel: RAX: ffffffffffffffda RBX:
0000000000000000
RCX: 000000000047fa54
Oct 08 21:21:02 kernel: RDX: 0000000000000020 RSI:
000000c428409300
RDI: 0000000000000004
Oct 08 21:21:02 kernel: RBP: 000000c4284093a8 R08:
0000000000000000
R09: 0000000000000000
Oct 08 21:21:02 kernel: R10: 0000000000000000 R11:
0000000000000246
R12: 0000000000001e00
Oct 08 21:21:02 kernel: R13: 0000000000000000 R14:
000000000045d560
R15: 0000000000000000
Oct 08 21:21:02 kernel: Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89
f8
48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44
00 00
48 89 f8 48 89 d1 <f3>
Oct 08 21:21:02 kernel: RIP: memcpy_erms+0x6/0x10 RSP:
ffffbc8ec1f37e40
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500
Oct 08 21:21:02 kernel: ---[ end trace ebebe46b9c6091a1 ]---
-- Reboot --
Thanks a lot for the report! Do you happen to have a reproducer? Are
you
running into this with concurrent i) updates, ii) deletes, iii) both
or
iv) while destroying the whole htab? If i-iii) is it out of BPF
datapath
or from syscall path? Is your hashtab preallocated? Any more
elaborate info
would help.
Sadly I don't have a reproducer other than I've seen it happen on
several distinct boxes and deployments.

We use several hash maps and I don't know how to tell from the OOPs
which one caused this but the two largest (100-600k entries) work like:

- eBPF program creates entries as it sees flows on the network and
counts metrics (bytes, packets, etc) into each entry and updates a
timestamp on each update
- User space iterates over the map every 10s and pulls out the metrics
for further processing. If the timestamp on the entry is over a
threshold, the user space process deletes the entry (kernel side never
deletes)

bpftool map output:

22: hash flags 0x0
key 12B value 8B max_entries 1213 memlock 122880B
23: hash flags 0x0
key 16B value 256B max_entries 2102717 memlock 739983360B
24: hash flags 0x0
key 12B value 2B max_entries 2102717 memlock 218505216B
25: hash flags 0x0
key 20B value 48B max_entries 2102717 memlock 319438848B


Daniel Borkmann
 

On 10/17/2018 03:48 PM, Dan Siemon wrote:
On Wed, 2018-10-17 at 14:08 +0200, Daniel Borkmann wrote:
Hi Dan,

On 10/17/2018 03:08 AM, Dan Siemon wrote:
Anyone seen this before? 4.17.6

I've seen this on a few boxes now.

(If there is a better place to post this (LKML?) please let me
know.)

Oct 08 21:21:02 kernel: BUG: unable to handle kernel paging
request at
ffffc48f1f42d500
Oct 08 21:21:02 kernel: PGD 0 P4D 0
Oct 08 21:21:02 kernel: Oops: 0000 [#1] SMP PTI
Oct 08 21:21:02 kernel: Modules linked in: sch_htb bpwd_drv(OE)
lanner_drv(OE) cls_bpf sch_ingress ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nf>
Oct 08 21:21:02 kernel: crc32c_intel igb i2c_algo_bit dca
Oct 08 21:21:02 kernel: CPU: 5 PID: 27247 Comm: preseem-netdev-
Tainted: G OE 4.17.6-200.fc28.x86_64 #1
Oct 08 21:21:02 kernel: Hardware name: Default string Default
string/SKYBAY, BIOS 5.12 08/15/2017
Oct 08 21:21:02 kernel: RIP: 0010:memcpy_erms+0x6/0x10
Oct 08 21:21:02 kernel: RSP: 0018:ffffbc8ec1f37e40 EFLAGS:
00010286
Oct 08 21:21:02 kernel: RAX: ffff9eca25ae0ab0 RBX:
ffff9eca25ae0ab0
RCX: 0000000000000010
Oct 08 21:21:02 kernel: RDX: 0000000000000010 RSI:
ffffc48f1f42d500
RDI: ffff9eca25ae0ab0
Oct 08 21:21:02 kernel: RBP: 0000000000000010 R08:
0000000000400000
R09: 0000000000000001
Oct 08 21:21:02 kernel: R10: 000000002e523fcb R11:
0000000000000000
R12: ffff9ec9eb450600
Oct 08 21:21:02 kernel: R13: 0000000000141ca6 R14:
ffff9eca464a3700
R15: ffff9eca25ae0ed0
Oct 08 21:21:02 kernel: FS: 00007f3262ffd700(0000)
GS:ffff9eca9ed40000(0000) knlGS:0000000000000000
Oct 08 21:21:02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500 CR3:
0000000422b82003
CR4: 00000000003606e0
Oct 08 21:21:02 kernel: DR0: 0000000000000000 DR1:
0000000000000000
DR2: 0000000000000000
Oct 08 21:21:02 kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0
DR7: 0000000000000400
Oct 08 21:21:02 kernel: Call Trace:
Oct 08 21:21:02 kernel: htab_map_get_next_key+0x237/0x280
Oct 08 21:21:02 kernel: map_get_next_key+0xed/0x1f0
Oct 08 21:21:02 kernel: __x64_sys_bpf+0x16b/0x260
Oct 08 21:21:02 kernel: do_syscall_64+0x5b/0x160
Oct 08 21:21:02 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 08 21:21:02 kernel: RIP: 0033:0x47fa54
Oct 08 21:21:02 kernel: RSP: 002b:000000c428409278 EFLAGS:
00000246
ORIG_RAX: 0000000000000141
Oct 08 21:21:02 kernel: RAX: ffffffffffffffda RBX:
0000000000000000
RCX: 000000000047fa54
Oct 08 21:21:02 kernel: RDX: 0000000000000020 RSI:
000000c428409300
RDI: 0000000000000004
Oct 08 21:21:02 kernel: RBP: 000000c4284093a8 R08:
0000000000000000
R09: 0000000000000000
Oct 08 21:21:02 kernel: R10: 0000000000000000 R11:
0000000000000246
R12: 0000000000001e00
Oct 08 21:21:02 kernel: R13: 0000000000000000 R14:
000000000045d560
R15: 0000000000000000
Oct 08 21:21:02 kernel: Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89
f8
48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44
00 00
48 89 f8 48 89 d1 <f3>
Oct 08 21:21:02 kernel: RIP: memcpy_erms+0x6/0x10 RSP:
ffffbc8ec1f37e40
Oct 08 21:21:02 kernel: CR2: ffffc48f1f42d500
Oct 08 21:21:02 kernel: ---[ end trace ebebe46b9c6091a1 ]---
-- Reboot --
Thanks a lot for the report! Do you happen to have a reproducer? Are
you
running into this with concurrent i) updates, ii) deletes, iii) both
or
iv) while destroying the whole htab? If i-iii) is it out of BPF
datapath
or from syscall path? Is your hashtab preallocated? Any more
elaborate info
would help.
Sadly I don't have a reproducer other than I've seen it happen on
several distinct boxes and deployments.

We use several hash maps and I don't know how to tell from the OOPs
which one caused this but the two largest (100-600k entries) work like:

- eBPF program creates entries as it sees flows on the network and
counts metrics (bytes, packets, etc) into each entry and updates a
timestamp on each update
Does that happen via map update helper from kernel side in your prog?

- User space iterates over the map every 10s and pulls out the metrics
for further processing. If the timestamp on the entry is over a
threshold, the user space process deletes the entry (kernel side never
deletes)
And the map iterate + delete, are there multiple threads walking it
at the same time or just single one (just making sure ...)?

bpftool map output:

22: hash flags 0x0
Ok, you are using map prealloc here. How often does the BUG trigger
in general on your machines? Was there an older kernel where you
haven't been able to trigger it?

key 12B value 8B max_entries 1213 memlock 122880B
23: hash flags 0x0
key 16B value 256B max_entries 2102717 memlock 739983360B
24: hash flags 0x0
key 12B value 2B max_entries 2102717 memlock 218505216B
25: hash flags 0x0
key 20B value 48B max_entries 2102717 memlock 319438848B



Dan Siemon
 

On Wed, 2018-10-17 at 16:18 +0200, Daniel Borkmann wrote:
Sadly I don't have a reproducer other than I've seen it happen on
several distinct boxes and deployments.

We use several hash maps and I don't know how to tell from the OOPs
which one caused this but the two largest (100-600k entries) work
like:

- eBPF program creates entries as it sees flows on the network and
counts metrics (bytes, packets, etc) into each entry and updates a
timestamp on each update
Does that happen via map update helper from kernel side in your prog?
Yes, when a new entry is added there is a map_update_elem() call.
However when an entry is updated, we update the struct returned by
map_lookup_elem directly without another call to map_update_elem().

We had another thread a while back on whether or not map_update_elem()
needs to be called every time and I'm still not really clear on the
requirements for safe concurrency in this area.

- User space iterates over the map every 10s and pulls out the
metrics
for further processing. If the timestamp on the entry is over a
threshold, the user space process deletes the entry (kernel side
never
deletes)
And the map iterate + delete, are there multiple threads walking it
at the same time or just single one (just making sure ...)?
There is one Goroutine which does this work which should map to one
thread (we use https://github.com/aterlo/puregobpf).

bpftool map output:

22: hash flags 0x0
Ok, you are using map prealloc here. How often does the BUG trigger
in general on your machines? Was there an older kernel where you
haven't been able to trigger it?
We've seen these for some time going back several kernel versions. I'm
pretty sure I've also seen it on 4.18 but don't have an example handy.
I can't say definitively how often per box but we probably see one of
these every week or so across many boxes. I initially thought it was
related to bad hardware but now doubt that.