verifier: variable offset stack access question


Andrei Matei
 

Hello Yonghong, all,

I'm curious about a verifier workaround that Yonghong provided two years ago, in this thread.
Brendan Gregg was asking about accessing stack buffers through a register with a variable offset, and Yonghong suggested a memset as a solution:
"
You can initialize the array with ' ' to workaround the issue:
    struct data_t data;
    uint64_t max = sizeof(data.argv);
    const char *argp = NULL;
    memset(&data, ' ', sizeof(data));
    bpf_probe_read(&argp, sizeof(argp), (void *)&__argv[0]);
    uint64_t len = bpf_probe_read_str(&data.argv, max, argp);
    len &= 0xffffffff; // to avoid: "math between fp pointer and register errs"
    bpf_trace_printk("len: %d\\n", len); // sanity check: len is indeed valid
"

My question is - how does the memset help? I sort of understand the trouble with variable stack access (different regions of the stack can hold memory of different types), and I've looked through the verifier's code but I've failed to get a clue.

As far as actually trying the trick, I've had difficulty importing <string.h> in my bpf program. I'm not working in the context of BCC, so maybe that makes the difference. I've tried zero-ing out my buffer manually, and it didn't seem to change anything. I've had better success allocating my buffer using map memory rather than stack memory, but I'm still curious what a memset could do for me.

Thanks a lot!

- Andrei


Andrei Matei
 

For posterity, I think I can now answer my own question. I suspect
things were different in 2018 (because otherwise I don’t see how the
referenced exchange makes sense); here’s my understanding about the
verifier’s rules for stack accesses today:

There’s two distinct aspects relevant to the use of variable stack offsets:

1) “Direct” stack access with variable offset. This is simply
forbidden; you can’t read or write from a dynamic offset in the stack
because, in the case of reads, the verifier doesn’t know what type of
memory would be returned (is it “misc” data? Is it a spilled
register?) and, in the case of writes, what stack slot’s memory type
should be updated.
Separately, when reading from the stack with a fixed offset, the
respective memory needs to have been initialized (i.e. written to)
before.

2) Passing pointers to the stack to helper functions which will write
through the pointer (such as bpf_probe_read_user()). Here, if the
stack offset is variable, then all the memory that falls within the
possible bounds has to be initialized.
If the offset is fixed, then the memory doesn’t necessarily need to be
initialized (at least not if the helper’s argument is of type
ARG_PTR_TO_UNINIT_MEM). Why the restriction in the variable offset
case? Because, in that case, it cannot be known what memory the helper
will end up initializing; if the verifier pretended that all the
memory within the offset bounds would be initialized then further
reads could leak uninitialized stack memory.


Yonghong Song
 

On Wed, Dec 23, 2020 at 2:21 PM Andrei Matei <andreimatei1@...> wrote:

Hello Yonghong, all,

I'm curious about a verifier workaround that Yonghong provided two years ago, in this thread.
Brendan Gregg was asking about accessing stack buffers through a register with a variable offset, and Yonghong suggested a memset as a solution:
"
You can initialize the array with ' ' to workaround the issue:
struct data_t data;
uint64_t max = sizeof(data.argv);
const char *argp = NULL;
memset(&data, ' ', sizeof(data));
bpf_probe_read(&argp, sizeof(argp), (void *)&__argv[0]);
uint64_t len = bpf_probe_read_str(&data.argv, max, argp);
len &= 0xffffffff; // to avoid: "math between fp pointer and register errs"
bpf_trace_printk("len: %d\\n", len); // sanity check: len is indeed valid
"

My question is - how does the memset help? I sort of understand the trouble with variable stack access (different regions of the stack can hold memory of different types), and I've looked through the verifier's code but I've failed to get a clue.
I cannot remember details. Here, what "memset" did is to initialize
related bytes in stack to 0. I guess maybe at that point
bpf_probe_read_str requires an initialized memory?

Right now, bpf_probe_read_str does not require initialized memory, so
memset may not be necessary.


As far as actually trying the trick, I've had difficulty importing <string.h> in my bpf program. I'm not working in the context of BCC, so maybe that makes the difference. I've tried zero-ing out my buffer manually, and it didn't seem to change anything. I've had better success allocating my buffer using map memory rather than stack memory, but I'm still curious what a memset could do for me.
A lot of string.h functions are implemented as external functions in
glibc. This won't work for bpf programs as the bpf program is not
linked against glibc. The clang compiler will translate the above
memset to some stores if memset() size is not big enough. Better,
using clang __builtin_memset() so it won't have any relation with
glibc.


Thanks a lot!

- Andrei


Yonghong Song
 

On Fri, Dec 25, 2020 at 5:41 PM Andrei Matei <andreimatei1@...> wrote:

For posterity, I think I can now answer my own question. I suspect
things were different in 2018 (because otherwise I don’t see how the
referenced exchange makes sense); here’s my understanding about the
verifier’s rules for stack accesses today:

There’s two distinct aspects relevant to the use of variable stack offsets:

1) “Direct” stack access with variable offset. This is simply
forbidden; you can’t read or write from a dynamic offset in the stack
because, in the case of reads, the verifier doesn’t know what type of
memory would be returned (is it “misc” data? Is it a spilled
register?) and, in the case of writes, what stack slot’s memory type
should be updated.
Separately, when reading from the stack with a fixed offset, the
respective memory needs to have been initialized (i.e. written to)
before.

2) Passing pointers to the stack to helper functions which will write
through the pointer (such as bpf_probe_read_user()). Here, if the
stack offset is variable, then all the memory that falls within the
possible bounds has to be initialized.
If the offset is fixed, then the memory doesn’t necessarily need to be
initialized (at least not if the helper’s argument is of type
ARG_PTR_TO_UNINIT_MEM). Why the restriction in the variable offset
case? Because, in that case, it cannot be known what memory the helper
will end up initializing; if the verifier pretended that all the
memory within the offset bounds would be initialized then further
reads could leak uninitialized stack memory.
I think your above assessment is kind of correct. For any read/write
to stack in bpf programs, the stack offset must be known so the
verifier knows exactly what the program tries to do. For helpers,
variable length of stack is permitted and the verifier will do
analysis to ensure the stack meets the memory (esp. initialized
memory) requirement as stated in helper proto definition.