Re: BPF Concurrency


Jesper Dangaard Brouer
 

Hi Kanthi and Quentin,

You mention token counter as the use-case, for somehow ratelimiting
(TCP connections in your case?).

That reminds me that Quentin implemented[1] a two-color token-bucket
ratelimiter in BPF. (Notice, code is 3 years old, and e.g. use the old
syntax for maps). There is a really good README[1] with illustrations:

[1] https://github.com/qmonnet/tbpoc-bpf

What I found interesting, is that the code actually doesn't implement
this via token counter, but instead uses a sliding time window. Based
on getting timestamping via bpf_ktime_get_ns() code[2]. (p.s. as we have
discussed on xdp-newbies list, getting this timestamp also have
overhead, but we can mitigate overhead by only doing it once per NAPI
cycle).

I would like to hear Quentin why he avoided maintaining the token counter?

-Jesper

[2] https://github.com/qmonnet/tbpoc-bpf/blob/master/tokenbucket.c#L98

On Mon, 15 Jun 2020 05:07:25 +0530
"Kanthi P" <Pavuluri.kanthi@...> wrote:

[Edited Message Follows]

Thanks Song and Andrii for the response.

Use-case is global rate-limiting for incoming TCP connections. And we
want to implement the token bucket algorithm using XDP for this
purpose.

So we are planning to have a map that holds a token counter which
gets two kinds of updates:

1. Periodic increments with 'x' number of tokens per second
2. Decrements as and when we get a new TCP connection request.

Most of our systems are 64 core machines. Since every core would try
to update the counter in parallel as the packets arrive each of them,
the problem I am imagining is that I might miss few updates of the
counter as one core update can overwrite other’s.

I guess it is still ok to lose the case 2 type of updates as that
might just allow a small fraction of more or less connections than
what is configured.

But I cannot afford to lose case 1 kind of updates as that could mean
that I cannot process bunch of connections until the next second.

So if I use "__sync_fetch_and_add" for incrementing the counter (for
case 1), would it guarantee that this update is never missed(though
some other core is trying to update the map to decrement the counter
to account the incoming connection at the same time)?

My understanding is that __sync_fetch_and_add translates to BPF_XADD
internally.  And it looks like spin locks are being supported from
5.x kernel versions, we are on lower version, so can’t try this one
atm.

Regards,
Kanthi

P.S there has been some problem sending the reply, which resulted in
multiple edits and deletes, please bear with me



On Wed, May 27, 2020 at 1:29 AM Andrii Nakryiko < andrii.nakryiko@...
wrote:
On Fri, May 22, 2020 at 1:07 PM Kanthi P < Pavuluri.kanthi@... >
wrote:

Hi,


I’ve been reading that hash map’s update element is atomic and also that
we can use BPF_XADD to make the entire map update atomically.


But I think that doesn’t guarantee that these updates are thread safe,
meaning one cpu core can overwrite other core’s update.


Is there a clean way of keeping them thread safe. Unfortunately I can’t
use per-cpu maps as I need global counters.


And spin locks sounds a costly operation. Can you please throw some
light?
Stating that spin locks are costly without empirical data seems
premature. What's the scenario? What's the number of CPUs? What's the
level of contention? Under light contention, spin locks in practice
would be almost as fast as atomic increments. Under heavy contention,
spin locks would probably be even better than atomics because they
will not waste as much CPU, as a typical atomic retry loop would.

But basically, depending on your use case (which you should probably
describe to get a better answer), you can either:
- do atomic increment/decrement if you need to update a counter (see
examples in kernel selftests using __sync_fetch_and_add);
- use map with bpf_spin_lock (there are also examples in selftests).


--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

Join iovisor-dev@lists.iovisor.org to automatically receive all group messages.