Generic Receive Offload
Generic Receive Offload (GRO) is a technique for optimising the performance of computer networks that seeks to reduce CPU workload by aggregating incoming packets (where possible) into one larger packet. This larger packet has a single header, and its payload is the combined payloads of the smaller packets. Because GRO converts multiple packets into one, fewer packets have to traverse the networking stack, so less per-packet processing is needed; hence the performance improvement. GRO is typically implemented in kernelspace as part of the system's networking stack. It does not require hardware support, although some Network Interface Cards (NICs) can perform the aggregation in hardware[7][8].
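The aggregation described above can be illustrated with a toy model. This is only a sketch, not how any real kernel implements GRO: the `Packet` class, the `coalesce` function and the `max_payload` cap are hypothetical names and values chosen for illustration, and the input is assumed to be an already-ordered run of packets from a single flow.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    header: str      # stand-in for the real protocol headers
    payload: bytes


def coalesce(packets, max_payload=65535):
    """Merge a run of same-flow packets into as few larger packets as possible.

    max_payload is an illustrative cap on the aggregated payload size
    (an assumption of this sketch, not a value taken from the article).
    """
    merged = []
    for pkt in packets:
        last = merged[-1] if merged else None
        if last is not None and len(last.payload) + len(pkt.payload) <= max_payload:
            # Keep the single existing header and append the payload.
            last.payload += pkt.payload
        else:
            # Start a new aggregate; copy so the input list is not modified.
            merged.append(Packet(pkt.header, pkt.payload))
    return merged


# Ten small segments from one flow become a single large packet.
segments = [Packet("flow-A", bytes(1448)) for _ in range(10)]
result = coalesce(segments)
print(len(segments), "packets in,", len(result), "packet(s) out")
# prints "10 packets in, 1 packet(s) out"
```

The rest of the stack then handles one header instead of ten, which is where the saving comes from in this model.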
GRO can operate on multiple protocols, but is typically applied to Transmission Control Protocol (TCP) packets. The User Datagram Protocol (UDP) is also supported.[1][2] GRO enjoys good kernelspace support, with Linux receiving full GRO support in 2.6.29.[3]
For it to be legal to aggregate packets, certain fields in their headers must be identical. Specifically:
- Source address
- Destination address
- (Layer 4) protocol
- Source port
- Destination port
Additionally, the sequence numbers must be in ascending order (although not necessarily contiguous).
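The conditions listed above can be written as a small predicate. This is a minimal sketch under stated assumptions: `Segment`, `flow_key` and `can_merge` are hypothetical names, sequence-number wraparound is ignored, and real networking stacks apply further checks that are not covered here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    src_addr: str
    dst_addr: str
    protocol: str    # layer 4 protocol, e.g. "TCP"
    src_port: int
    dst_port: int
    seq: int         # sequence number (wraparound ignored in this sketch)


def flow_key(segment):
    """Return the five header fields that must be identical for a merge."""
    return segment[:5]


def can_merge(held, incoming):
    """True if `incoming` may be aggregated after `held` under the listed rules."""
    same_flow = flow_key(held) == flow_key(incoming)
    ascending = incoming.seq > held.seq
    return same_flow and ascending


first = Segment("10.0.0.1", "10.0.0.2", "TCP", 40000, 443, seq=1000)
second = first._replace(seq=2448)
other_flow = second._replace(dst_port=80)

print(can_merge(first, second))      # same flow, ascending sequence numbers
print(can_merge(second, first))      # sequence numbers out of order
print(can_merge(first, other_flow))  # destination port differs
```

Comparing the five fields as one tuple keeps the check in one place, so adding a further condition only requires extending `can_merge`.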
Recent work[7] has suggested that most modern implementations of the QUIC protocol may incur significant performance penalties. Discussions[2] around this work have suggested that extending GRO to QUIC stacks may help to alleviate these penalties.
References
- [1] P. Abeni, “udp: implement GRO support,” LWN.net, Oct. 19, 2018. https://lwn.net/Articles/768995 (accessed Sep. 10, 2024).
- [2] J. Whited, “Increasing QUIC and UDP throughput,” Tailscale Blog, Nov. 16, 2023. https://tailscale.com/blog/quic-udp-throughput (accessed Sep. 10, 2024).
- [3] J. Corbet, “JLS2009: Generic receive offload,” LWN.net, Oct. 07, 2009. https://lwn.net/Articles/358910 (accessed Sep. 10, 2024).
- [4] J. B. Schmitt, Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance: 16th International GI/ITG Conference, MMB & DFT 2012, Kaiserslautern, Germany, March 19-21, 2012, Proceedings. Kaiserslautern, Germany: Springer, 2012.
- [5] Stack Overflow, “Why is GRO more efficient?,” Stack Overflow, Nov. 16, 2017. https://stackoverflow.com/questions/47332232/why-is-gro-more-efficient (accessed Sep. 10, 2024).
- [6] Server Fault, “How GRO (generic receive offload) works on more advanced NICs?,” Server Fault, Feb. 03, 2011. https://serverfault.com/questions/230804/how-gro-generic-receive-offload-works-on-more-advanced-nics (accessed Sep. 10, 2024).
- [7] X. Zhang et al., “QUIC is not Quick Enough over Fast Internet,” in WWW ’24: Proceedings of the ACM Web Conference 2024, New York: Association for Computing Machinery, May 2024, pp. 2713–2722. Accessed: Sep. 10, 2024. [Online]. Available: https://dl.acm.org/doi/10.1145/3589334.3645323
- [8] M. Chan, “Hardware GRO,” Broadcom Limited, 2017. Accessed: Sep. 10, 2024. [Online]. Available: http://vger.kernel.org/netconf2017_files/hardware_gro.pdf
Acknowledgements
Many thanks to @jxs for important corrections.
Note: The Linux 2.6.29 release was called "Tasmanian Devil" and debuted at linux.conf.au in 2009!