Re: [PATCHv5 RFC 1/3] BPF: New helper to obtainnamespace data from current task.


Yonghong Song
 

On Mon, Aug 5, 2019 at 2:19 PM carlos antonio neira bustos
<cneirabustos@...> wrote:

Hi Yonghong,



I had time yesterday to finish the code to avoid using getname_kernel , self-tests run ok with expected results.
looks good. A couple of comments below.


Here is the current code to avoid the getname_kernel call.



diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c

index 5e28718928ca..bfb2bbf37a58 100644

--- a/kernel/bpf/helpers.c

+++ b/kernel/bpf/helpers.c

@@ -11,6 +11,12 @@

#include <linux/uidgid.h>

#include <linux/filter.h>

#include <linux/ctype.h>

+#include <linux/pid_namespace.h>

+#include <linux/major.h>

+#include <linux/stat.h>

+#include <linux/namei.h>

+#include <linux/version.h>

+



#include "../../lib/kstrtox.h"

@@ -312,6 +318,81 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,

preempt_enable();

}

+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,

+ size)

+{

+ const char* name = "/proc/self/ns/pid";

+ struct pid_namespace *pidns = NULL;

+ int len = strlen(name) + 1;

+ struct filename *tmp = NULL;
reverse Christmas order.


+ struct inode *inode;

+ struct path kp;

+ pid_t tgid = 0;

+ pid_t pid = 0;

+ int ret;
Looks like -EINVAL is returned for all error cases.
This is suboptimal. There are cases where other error
code should be returned.

+

+

+ if (unlikely(size != sizeof(struct bpf_pidns_info)))

+ goto clear;
directly return -EINVAL.

+

+ pidns = task_active_pid_ns(current);

+

+ if (unlikely(!pidns))

+ goto clear;

+

+ pidns_info->nsid = pidns->ns.inum;

+ pid = task_pid_nr_ns(current, pidns);

+

+ if (unlikely(!pid))

+ goto clear;

+

+ tgid = task_tgid_nr_ns(current, pidns);

+

+ if (unlikely(!tgid))

+ goto clear;
Not 100% sure whether pidns/pid/tgid can go wrong. During process exit?
You can keep it now. Need more study or other people can comment on this.


+

+ pidns_info->tgid = (u32) tgid;

+ pidns_info->pid = (u32) pid;

+

+ if(!names_cachep)

+ goto clear;

+

+ tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);

+ if (unlikely(!tmp)) {

+ goto clear;

+ }

+

+ memcpy((char *)tmp->name, name, len);

+ tmp->uptr = NULL;

+ tmp->aname = NULL;

+ tmp->refcnt = 1;

+

+ ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);

+

+ if (ret)

+ goto clear;
We should return the error code "ret" instead of -EINVAL.


+

+

+ inode = d_backing_inode(kp.dentry);

+ pidns_info->dev = inode->i_sb->s_dev;

+

+ return 0;

+

+ clear:

+ if (pidns_info)

+ memset((void *)pidns, 0, (size_t) size);
pidns_info is not NULL, guaranteed by the verifier, so "if
(pidns_info)" can be removed.
It should be
memset((void *)pidns_info, ...)
Accidentally zeroing pidns may cause kernel errors (not sure whether
it is related below fault or not).

After making the above suggested changes, you can directly send the
patch set to the upstream.
Please cc me at yhs@....


+

+ return -EINVAL;

+}

+

+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {

+ .func = bpf_get_current_pidns_info,

+ .gpl_only = false,

+ .ret_type = RET_INTEGER,

+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,

+ .arg2_type = ARG_CONST_SIZE,

+};

+

#ifdef CONFIG_CGROUPS

BPF_CALL_0(bpf_get_current_cgroup_id)

{





Now the problem I’m facing is that after running the sample code to call the helper, I’m seeing this once but then it’s never repeated running in later runs of the same sample code.



[ 75.097213] CPU: 0 PID: 1382 Comm: bash Not tainted 5.3.0-rc1patch12-00120-g5d8525007580-dirty #7

[ 75.097213] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018

[ 75.097215] RIP: 0010:idr_alloc_u32+0xa8/0xc0

[ 75.097216] Code: 31 d2 48 89 e6 48 89 ef e8 e5 54 00 00 31 c0 48 8b 7c 24 20 65 48 33 3c 25 28 00 00 00 75 1a 48 83 c4 28 5b 5d 41 5c 41 5d c3 <0f> 0b 0d 04 00 80 00 89 47 04 e9 78 ff ff ff e8 14 ed 93 ff 0f 1f

[ 75.097216] RSP: 0018:ffffac39007c7d08 EFLAGS: 00010046

[ 75.097217] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000001ffff

[ 75.097218] RDX: ffffac39007c7d5c RSI: 0000000000000d99 RDI: ffffffffbea46b88

[ 75.097218] RBP: ffffffffbea46b88 R08: 0000000000000a20 R09: 0000000000000000

[ 75.097218] R10: 0000000001200000 R11: 0000000000000000 R12: 0000000000000000

[ 75.097219] R13: ffffac39007c7d5c R14: 000000000000012c R15: ffffffffbea46b80

[ 75.097219] FS: 00007f7ba7691b40(0000) GS:ffffa0fef5c00000(0000) knlGS:0000000000000000

[ 75.097220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[ 75.097220] CR2: 00000000022eea64 CR3: 000000022f498004 CR4: 00000000003606f0

[ 75.097246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[ 75.097246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[ 75.097247] Call Trace:

[ 75.097251] ? kmem_cache_alloc+0x161/0x600

[ 75.097252] idr_alloc_cyclic+0x53/0xb0

[ 75.097254] alloc_pid+0xc1/0x2a0

[ 75.097255] copy_process+0xdc4/0x1c20

[ 75.097256] _do_fork+0x74/0x360

[ 75.097257] ? recalc_sigpending+0x17/0x50

[ 75.097258] __x64_sys_clone+0x7f/0xa0

[ 75.097260] do_syscall_64+0x55/0x130

[ 75.097261] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 75.097263] RIP: 0033:0x7f7ba6d6538b

[ 75.097264] Code: db 45 85 f6 0f 85 95 01 00 00 64 4c 8b 04 25 10 00 00 00 31 d2 4d 8d 90 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 de 00 00 00 85 c0 41 89 c5 0f 85 e5 00 00

[ 75.097265] RSP: 002b:00007ffc9a6cf180 EFLAGS: 00000246 ORIG_RAX: 0000000000000038

[ 75.097265] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba6d6538b

[ 75.097266] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011

[ 75.097266] RBP: 00007ffc9a6cf1b0 R08: 00007f7ba7691b40 R09: 0000000000000002

[ 75.097266] R10: 00007f7ba7691e10 R11: 0000000000000246 R12: 0000000000000000

[ 75.097267] R13: 00000000004cd15e R14: 0000000000000000 R15: 0000000000000000

[ 75.097268] ---[ end trace 2937ef4d051e39cd ]---



I need a couple of days to debug this and then submit the patch if nobody finds anything odd in the code.

I hope tomorrow I’ll have good news on this if nobody finds anything odd in the code first.



Bests



From: Y Song
Sent: 05 August 2019 16:16
To: carlos antonio neira bustos
Cc: Alexei Starovoitov; Tom Herbert via iovisor-dev; Eric Biederman
Subject: Re: [iovisor-dev][PATCHv5 RFC 1/3] BPF: New helper to obtainnamespace data from current task.



Carlos,



Any update on this? It would be good to submit the patch earlier for a

release cycle

so we got a chance it may be accepted in the current release cycle.



On Fri, Jul 26, 2019 at 3:41 AM carlos antonio neira bustos

<cneirabustos@...> wrote:

Thanks Yonghong!.
Yes, I was looking at filename_lookup today on how to do it, I'll work on that.
Thank you very much.
El vie., 26 de jul. de 2019 01:18, Yonghong Song <ys114321@...> escribió:
On Thu, Jul 25, 2019 at 12:11 PM Alexei Starovoitov
<alexei.starovoitov@...> wrote:
On Thu, Jul 25, 2019 at 11:53 AM neirac <cneirabustos@...> wrote:
Hey Yonghong,
I have changed getname_kernel interface to specify the allocation type,
all previous callers preserve GFP_KERNEL as the allocation type, after this change
getname_kernel could be used with GFP_ATOMIC in ebpf helpers.
If this change goes in, I could complete bpf_get_current_pidns_info helper.
Let me know your thoughts.
please figure out how to get pidns_info w/o calling into getname_kernel().
Looks like we can call filename_lookup() directly. We do not need to
call getname_kernel().
We know the exact path name. We can do a atomic allocation inside the helper.
Carlos, could you take a look along this line? In this way, we do not
need to change
any files under linux/fs.

Join iovisor-dev@lists.iovisor.org to automatically receive all group messages.