sockmap redirect doesn't work all the time

forrest0579@...
 

Hi all,

Recently I've been testing with eBPF sockmap. I know that after setting a sock fd in a sockmap, data sent to that socket can be redirected to another socket registered in the sockmap.

I rewrote an example based on https://github.com/dippynark/bpf-sockmap and found something unexpected. In my example, my program (an LB instance that uses sockmap to redirect data from the client to the backend real server) accepts a connection from the client and builds a new connection to the real server. Then it sets both sock fds in the sockmap, and the verdict program redirects packets from one socket to the other.
I find that some packets are not handled by the sockmap if they arrive before I set the sock fd, so I have to read those packets in my program and send them to the real server myself.
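For context, the verdict side of such a setup looks roughly like this (a hedged sketch, not my actual code: the map name, key layout, and the peer-selection logic are all illustrative):

```c
// Minimal sk_skb stream verdict sketch: redirect each skb to the peer
// socket stored in the other sockmap slot. Slot assignment (0 = client,
// 1 = backend) is an assumption for illustration.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 2);
	__type(key, __u32);
	__type(value, __u64);
} sock_map SEC(".maps");

SEC("sk_skb/stream_verdict")
int bpf_redir(struct __sk_buff *skb)
{
	/* Illustrative peer selection: traffic arriving on the LB's
	 * listening port goes to the backend slot, and vice versa. */
	__u32 peer_idx = (skb->local_port == 8080) ? 1 : 0;

	return bpf_sk_redirect_map(skb, &sock_map, peer_idx, 0);
}

char _license[] SEC("license") = "GPL";
```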

My expectation was that after setting the sock fds in the sockmap, all packets would be handled by the parser/verdict eBPF functions and no data would need to be read from userspace, even packets received before the sock fd was set. But it seems I'm wrong...

If the behavior described above is expected, in what scenario can I use sockmap, since it doesn't guarantee that all packets are handled by the eBPF redirect?
Or if I am wrong, or am using sockmap in the wrong way/scenario, please point that out.

Thanks,

Forrest Chen

Ferenc Fejes
 

Hi!

Are you sure about the problem? Try to log the incoming packets in the BPF program (with bpf_printk, for example) to check whether they are really missing. I also encountered your problem in my experiments, but it turned out the packets were dropped because of insufficient receive memory on the peer socket. I recommend trying it with small messages (around 10 kbytes, so you can count all the packets and check whether they arrive properly, based on the BPF parser log).
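For example, a logging line like this in the stream parser makes every incoming skb visible in /sys/kernel/debug/tracing/trace_pipe (a sketch only; the program shape is illustrative, not your actual code):

```c
// Hedged sketch: log every skb the sockmap stream parser sees, so
// "missing" packets can be counted against the sender side.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("sk_skb/stream_parser")
int bpf_parser(struct __sk_buff *skb)
{
	bpf_printk("parser: len=%u port=%u\n", skb->len, skb->local_port);
	return skb->len; /* hand the whole skb to the verdict program */
}

char _license[] SEC("license") = "GPL";
```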

Thanks,
Ferenc

forrest0579@...
 

Hi Ferenc:

The packets are really small, I just send a curl get request. Sometimes I can receive the packets from my program, I think it is because the packets arrive before I set the socket desc so the bpf program doesn't work for that and I have to handle these packets in userspace.

Ferenc Fejes
 

Hi!

Well, it seems like a different problem from what I first thought. I have a few ideas; maybe some of them help:
1. If you are able to modify the source code of the userspace receiver program, try to avoid reading in userspace right after the connection is established. This is ugly, but I think it would solve the problem: the TCP backlog will hold your packets until the sockmap code redirects them to the right socket.
2. You can use a BPF sock_ops program to place new sockets into the sockmap before they receive their first packet. The problem with that is you will need to put your application into a cgroup v2 to catch the TCP state events. Also, on Ubuntu, you should disable net_cls,net_prio, because that would overwrite the cgroup BPF program of the sockets (see: https://stackoverflow.com/questions/55646983/why-does-my-bpf-prog-type-cgroup-skb-program-not-work-in-a-container)
3. I encountered a very similar problem in my sockmap-accelerated shadowsocks fork (https://github.com/SPYFF/shadowsocks-libev-nocrypto/tree/ebpf, which attaches the sockmap right after the connection is established but before the first packet), and as far as I remember I was still able to count all the packets in the BPF program, so in my case the issue was different. Do all of your curl requests successfully establish their TCP connections? If there are connection reset failures, maybe you could increase net.somaxconn and the backlog size of the listener socket.
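To illustrate idea 2, a sock_ops program can insert each socket into the map as soon as the connection is established, before any payload arrives (a hedged sketch: the map name and key scheme are my own, and the program must be attached to your application's cgroup v2, e.g. with bpftool):

```c
// Hedged sketch of the sock_ops approach: on both active (connect) and
// passive (accept) TCP establishment, add the socket to a sockhash so
// the verdict program can redirect from the very first payload byte.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKHASH);
	__uint(max_entries, 65535);
	__type(key, __u32);
	__type(value, __u64);
} sock_hash SEC(".maps");

SEC("sockops")
int bpf_sockops(struct bpf_sock_ops *ops)
{
	switch (ops->op) {
	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:  /* outgoing connect() done */
	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: /* incoming accept() done */
	{
		__u32 key = ops->local_port; /* illustrative key choice */

		bpf_sock_hash_update(ops, &sock_hash, &key, BPF_ANY);
		break;
	}
	default:
		break;
	}
	return 0;
}

char _license[] SEC("license") = "GPL";
```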

I hope some of these help.

Good luck,
Ferenc 

forrest0579@...
 


Hi Ferenc,

I think sock_ops is a good option for me; it seems the Cilium project also uses this to accelerate networking.

> Do all of your curl requests successfully establish their TCP connections? If there are connection reset failures, maybe you could increase net.somaxconn and the backlog size of the listener socket.
Yes, all TCP connections are established. There is no connection pressure, and I send the curl requests manually.

Thanks,
Forrest Chen