We also care about icache pressure, and GRO/TSO already provide
bundling where it is applicable, without adding insane complexity in
the stacks.
Sorry, I cannot resist. The GRO code is really bad regarding icache
pressure/usage, because everything is function pointers calling
function pointers, even though the common case calls a function
defined right next to it in the same C file (where a direct call would
usually be inlined). I can easily get 10% more performance for UDP use-cases by
simply disabling the GRO code, and I measure a significant drop in

Edward's solution should lower icache pressure.
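
To illustrate the function-pointer point above, here is a minimal sketch
in plain C. It is not the actual GRO code: the struct, the handler names
and the "+ 8" header size are all hypothetical, made up for illustration.
It only shows the general idea that a call through a pointer usually
cannot be inlined, while comparing the pointer against the expected
common-case handler and then calling that handler directly gives the
compiler a chance to inline it.

```c
#include <stddef.h>

/* Hypothetical packet and handler types, for illustration only. */
struct pkt {
	unsigned int len;
};

typedef unsigned int (*recv_fn)(const struct pkt *p);

/* Assumed common-case handler, defined in the same file as its callers. */
unsigned int udp_recv(const struct pkt *p)
{
	return p->len + 8u;	/* pretend we account for an 8-byte header */
}

/* Assumed rare handler, used only to exercise the fallback path. */
unsigned int other_recv(const struct pkt *p)
{
	return p->len;
}

/* Fully indirect dispatch: in general the compiler cannot inline
 * through fn, so every packet pays for an indirect call. */
unsigned int dispatch_indirect(recv_fn fn, const struct pkt *p)
{
	return fn(p);
}

/* Test for the expected target first. The direct call is a candidate
 * for inlining; anything else falls back to the indirect call. */
unsigned int dispatch_likely(recv_fn fn, const struct pkt *p)
{
	if (fn == udp_recv)
		return udp_recv(p);
	return fn(p);
}
```

Both dispatch functions return the same result for any handler; the only
difference is whether the common case goes through a pointer. Whether
this actually helps on a given CPU and workload has to be measured.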

