Re: BPF Concurrency

Quentin Monnet

2020-06-16 14:18 UTC+0200 ~ Jesper Dangaard Brouer <brouer@...>
Hi Kanthi and Quentin,

You mention a token counter as the use-case, for rate limiting of some
sort (TCP connections in your case?).

That reminds me that Quentin implemented[1] a two-color token-bucket
rate limiter in BPF. (Notice, the code is 3 years old, and e.g. uses the
old syntax for maps.) There is a really good README[1] with illustrations.


What I found interesting is that the code doesn't actually implement
this via a token counter, but instead uses a sliding time window, based
on timestamps from bpf_ktime_get_ns()[2]. (P.S. as we have discussed on
the xdp-newbies list, getting this timestamp also has overhead, but we
can mitigate that overhead by only doing it once per NAPI cycle.)
I would like to hear from Quentin why he avoided maintaining the token counter.


Hi Jesper, Kanthi,

This token bucket implementation was realised in the context of a
research project (BEBA), for which the OPP interface, based on “extended
finite state machines”, was developed. This abstraction interface can be
used to implement various programs, including the token bucket
application, on several platforms (we had proof-of-concept
implementations on FPGAs and in software switches, for example). Since
the model is strictly event-based, it does not really allow for
“external” interactions such as map updates from user space, which would
be necessary to increment the token counter periodically. More information
on BEBA and OPP is available from the links on the GitHub README.

My objective at the time was simply to translate this token bucket
example into an eBPF program, in order to show that eBPF could be a
target for OPP, so I stuck to the OPP-based algorithm we had. It should
behave like a regular token bucket implementation; the sliding window is
just another way to represent the maximum number of packets to accept in
a given period of time, without having to periodically increment the
counter. For example, a long shift of the sliding window (case 2 on the
diagram) equates to refilling the bucket entirely.

We may have run some benchmarks at the time, but we did not compare it
to a more standard implementation, so I could not really tell which is
best in terms of performance.

The code is from 2017, but if you want to test it, I would still expect
it to load (I haven't checked recently, though). Jesper, the syntax for
maps is still the one used today when loading programs with iproute2, I
believe? Anyway, I'm not sure how I can help further, but let me know if
you have questions.

Best regards,
