Re: Accessing user memory and minor page faults

Gianluca Borello <g.borello@...>

On Sun, Oct 1, 2017 at 7:00 PM, Alexei Starovoitov
<alexei.starovoitov@...> wrote:

my understanding of speculative page fault patch is that the whole
get_user_pages will operate under srcu, so we can call it
if necessary (not only find_vma will be safe to call).
Hi Alexei,

My understanding, and please correct me if I'm wrong, is that yes,
get_user_pages() can definitely work under srcu, but if the PTE is not
found (like after fork), then get_user_pages will call some operations
in vma->vm_ops that may cause the caller to sleep and there's no way
or flag to prevent this behavior (except perhaps using
__get_user_pages_fast). While this isn't a problem in the places where
get_user_pages is typically called (e.g. access_process_vm), we can't
do it from a BPF helper because we run it with preempt_disable()
regardless of the srcu (at least the BPF programs from kprobes and
tracepoints, which I use the most).

all correct. It's not simpler than what you described.
I thought access() will be available in the situation we care about.
The only bit I'm missing is why to trace exec() args it will be going
all the way to filemap-backed pages of the executable ?
The strings are in the heap/stack, no?
Well, in my experiments done on normal workloads I can see many
constant strings that come directly from the mapped executable and are
passed as arguments to system calls. As a rule of thumb, running an
"apt-get install xxx" on any system will generate several dozens of
them, I analyzed a few and they are all instances of fork() +

we can try to implement filemap_access and it probably won't
need to lock_page to access it, since readahead does it without locking.
Implementing filemap_access sounds like a feasible strategy, I wasn't
sure about that option since it would effectively mean patching all
the vm_ops definitions under fs/. I haven't read the implementation of
filemap yet but I'll take a better look as soon as I have some time.

At the same time, for anonymous VMAs it would be useful to have some
sort of BPF helper working like __get_user_pages_fast which would
somehow ignore the PROT_NONE set on the PTEs during the NUMA balancing
(not even sure that's actually possible, will have to experiment).

By the way, none of this is urgent as I have my workarounds to get
around that (I mostly parse arguments at the exit of the system call
tracepoint), but it'd be nice to have a few more powerful helpers that
can dig into all sorts of memory situations :-)


Join to automatically receive all group messages.