Re: Documentation on eBPF map types?


Jesper Dangaard Brouer
 

On Thu, 2 Feb 2017 11:56:19 +0100
Jesper Dangaard Brouer <brouer@...> wrote:

On Tue, 31 Jan 2017 20:54:10 -0800
Alexei Starovoitov <alexei.starovoitov@...> wrote:

On Tue, Jan 31, 2017 at 9:54 AM, Jesper Dangaard Brouer via
iovisor-dev <iovisor-dev@...> wrote:

On Sat, 28 Jan 2017 10:14:58 +1100 Brendan Gregg <brendan.d.gregg@...> wrote:

I did some in the bcc ref guide, but it's incomplete, and the bcc versions:
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
Thanks! - this seem rather BCC specific syntax, and I'm looking for
documentation close for the kernel (samples/bpf/).

The best documentation I found was the man-page for the syscall bpf(2):
http://man7.org/linux/man-pages/man2/bpf.2.html

In lack of a better place, I've started documenting eBPF here:
https://prototype-kernel.readthedocs.io/en/latest/bpf/index.html

This doc is compatible with the kernels doc format, and I hope we can
push this into the kernel tree, if it turns out to be valuable?
(https://www.kernel.org/doc/html/latest/)
yeah. definitely would be great to add map descriptions to the kernel docs.
So far most of it is in commit logs.
git log kernel/bpf/arraymap.c|tail -33
git log kernel/bpf/hashtab.c|tail -33
will give an overview of key hash and array map principles.
Thanks, I'm using that to write some doc.
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html
Gotten to BPF_MAP_TYPE_ARRAY
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#bpf-map-type-array


Can you explain the difference between the kernel and userspace side of
the call bpf_map_lookup_elem() ?

Kernel side:
long *value = bpf_map_lookup_elem(&my_map, &key);

Userspace side:
long long value;
bpf_map_lookup_elem(map_fd[0], &key, &value)

Looks like userspace gets a copy of the memory...
If so, how can userspace then increment the value safely?
I documented this myself, please correct me.
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#creating-a-map
http://prototype-kernel.readthedocs.io/en/latest/bpf/ebpf_maps.html#interacting-with-maps

Commit:
https://github.com/netoptimizer/prototype-kernel/commit/ffbcdc453f3c

Inlined below:
-------------

[PATCH] doc: how interacting with eBPF maps works

Documented by reading the code.

I hope someone more knowledgeable will review this and
correct me where I misunderstood things.

Signed-off-by: Jesper Dangaard Brouer <brouer@...>
---
kernel/Documentation/bpf/ebpf_maps.rst | 126 ++++++++++++++++++++++++++++++--
1 file changed, 120 insertions(+), 6 deletions(-)

diff --git a/kernel/Documentation/bpf/ebpf_maps.rst b/kernel/Documentation/bpf/ebpf_maps.rst
index 562edd566e0b..55068c7f3dab 100644
--- a/kernel/Documentation/bpf/ebpf_maps.rst
+++ b/kernel/Documentation/bpf/ebpf_maps.rst
@@ -23,13 +23,128 @@ and accessed by multiple programs (from man-page `bpf(2)`_):
up to the user process and eBPF program to decide what they store
inside maps.

+Creating a map
+==============
+
+A maps is created based on a request from userspace, via the `bpf`_
+syscall (`bpf_cmd`_ BPF_MAP_CREATE), and returns a new file descriptor
+that refers to the map. These are the setup arguments when creating a
+map.
+
+.. code-block:: c
+
+ struct { /* anonymous struct used by BPF_MAP_CREATE command */
+ __u32 map_type; /* one of enum bpf_map_type */
+ __u32 key_size; /* size of key in bytes */
+ __u32 value_size; /* size of value in bytes */
+ __u32 max_entries; /* max number of entries in a map */
+ __u32 map_flags; /* prealloc or not */
+ };
+
+For programs under samples/bpf/ the ``load_bpf_file()`` call (from
+`samples/bpf/bpf_load`_) takes care of parsing elf file compiled by
+LLVM, pickup 'maps' section and creates maps via BPF syscall. This is
+done by defining a ``struct bpf_map_def`` with an elf section
+__attribute__ ``SEC("maps")``, in the xxx_kern.c file. The maps file
+descriptor is available in the userspace xxx_user.c file, via global
+array variable ``map_fd[]``, and the array map index correspons to the
+order the maps sections were defined in elf file of xxx_kern.c file.
+
+.. code-block:: c
+
+ struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+ unsigned int map_flags;
+ };
+
+ struct bpf_map_def SEC("maps") my_map = {
+ .type = BPF_MAP_TYPE_XXX,
+ .key_size = sizeof(u32),
+ .value_size = sizeof(u64),
+ .max_entries = 42,
+ .map_flags = 0
+ };
+
+.. _samples/bpf/bpf_load:
+ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/samples/bpf/bpf_load.c
+
+
+Interacting with maps
+=====================
+
+Interacting with an eBPF maps from **userspace**, happens through the
+`bpf`_ syscall and a file descriptor. The kernel
+`tools/lib/bpf/bpf.h`_ define some ``bpf_map_*()`` helper functions
+for wrapping the `bpf_cmd`_ relating to manipulating the map elements.
+
+.. code-block:: c
+
+ enum bpf_cmd {
+ [...]
+ BPF_MAP_LOOKUP_ELEM,
+ BPF_MAP_UPDATE_ELEM,
+ BPF_MAP_DELETE_ELEM,
+ BPF_MAP_GET_NEXT_KEY,
+ [...]
+ };
+ /* Corresponding helper functions */
+ int bpf_map_lookup_elem(int fd, void *key, void *value);
+ int bpf_map_update_elem(int fd, void *key, void *value, __u64 flags);
+ int bpf_map_delete_elem(int fd, void *key);
+ int bpf_map_get_next_key(int fd, void *key, void *next_key);
+
+Notice from userspace, there is no call to atomically increment or
+decrement the value 'in-place'. The bpf_map_update_elem() call will
+overwrite the existing value. The flags argument allows
+bpf_map_update_elem() define semantics on weather the element exist:
+
+.. code-block:: c
+
+ /* File: include/uapi/linux/bpf.h */
+ /* flags for BPF_MAP_UPDATE_ELEM command */
+ #define BPF_ANY 0 /* create new element or update existing */
+ #define BPF_NOEXIST 1 /* create new element if it didn't exist */
+ #define BPF_EXIST 2 /* update existing element */
+
+The eBPF-program running "kernel-side" have almost the same primitives
+(lookup/update/delete) for interacting with the map, but it interact
+more directly with the map data structures. For example the call
+``bpf_map_lookup_elem()`` returns a direct pointer to the 'value'
+memory-element inside the kernel (while userspace gets a copy). This
+allows the eBPF-program to atomically increment or decrement the value
+'in-place', by using appropiate compiler primitives like
+``__sync_fetch_and_add()``, which is understood by LLVM when
+generating eBPF instructions.
+
+On the kernel side, implementing a map type requires defining some
+function (pointers) via `struct bpf_map_ops`_. And eBPF programs have
+access to ``map_lookup_elem``, ``map_update_elem`` and
+``map_delete_elem``, which gets invoked from eBPF via bpf-helpers in
+`kernel/bpf/helpers.c`_.
+
+.. section links
+
+.. _tools/lib/bpf/bpf.h:
+ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/bpf.h
+
+.. _bpf_cmd: http://lxr.free-electrons.com/ident?i=bpf_cmd
+
+.. _struct bpf_map_ops: http://lxr.free-electrons.com/ident?i=bpf_map_ops
+
+.. _kernel/bpf/helpers.c:
+ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/helpers.c
+
+
=============
Types of maps
=============

-There are diffent types of maps available. The defines needed when
-creating the maps are defined in include/uapi/linux/bpf.h as
-``enum bpf_map_type``.
+There are diffent types of maps available. The type definitions
+needed when creating the maps are defined in include/uapi/linux/bpf.h
+as ``enum bpf_map_type``.

Example of `bpf_map_type`_ from kernel 4.9, but remember to `lookup
latest`_ available maps in the source code ::
@@ -48,9 +163,6 @@ latest`_ available maps in the source code ::
BPF_MAP_TYPE_LRU_PERCPU_HASH,
};

-
-.. TODO:: documentation how I interact with these maps
-
BPF_MAP_TYPE_ARRAY
==================

@@ -112,6 +224,8 @@ when updating the value in-place.

.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html

+.. _bpf: http://man7.org/linux/man-pages/man2/bpf.2.html
+
.. _bpf_map_type:
http://lxr.free-electrons.com/source/tools/include/uapi/linux/bpf.h?v=4.9#L78


--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

Join iovisor-dev@lists.iovisor.org to automatically receive all group messages.