..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017 Intel Corporation.

Generic Receive Offload (GRO) Library
=====================================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique that reduces per-packet processing overhead. By reassembling
small packets into larger ones, GRO lets applications process fewer,
larger packets. To benefit DPDK-based applications such as Open vSwitch,
DPDK provides its own GRO implementation. In DPDK, GRO is implemented
as a standalone library, and applications call it explicitly to
reassemble packets.

Overview
--------

The GRO library defines several GRO types, one per packet type. Each GRO
type is in charge of processing one kind of packet. For example,
TCP/IPv4 GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure to reassemble packets. Input packets are assigned to
the corresponding GRO functions according to MBUF->packet_type.

The GRO library doesn't check whether input packets have correct checksums
and doesn't recalculate checksums for merged packets. When IP
fragmentation is possible (i.e., DF==0), the GRO library assumes the
packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.

Currently, the GRO library supports TCP/IPv4 and UDP/IPv4 packets, as well
as VxLAN packets which contain an outer IPv4 header and an inner TCP/IPv4
or UDP/IPv4 packet.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
One is the lightweight mode API, which enables applications to
merge a small number of packets rapidly; the other is the
heavyweight mode API, which gives applications fine-grained control
and supports merging a large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Using the lightweight mode API to
merge packets is simple: calling ``rte_gro_reassemble_burst()`` is
enough. The GROed packets are returned to the application as soon as the
function finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different GRO
types are allocated on the stack. This design simplifies application
code. However, because the stack size is limited, the number of
packets that ``rte_gro_reassemble_burst()`` processes in one invocation
must be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
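
The per-invocation limit means an application holding more packets than the
limit has to split them across several calls. The following is a minimal
sketch of that splitting in plain C, not DPDK code: ``MAX_BURST_ITEM_NUM``
(assumed here to be 128), the ``burst_fn`` callback type and the
``keep_every_other()`` stub are all hypothetical stand-ins for
``RTE_GRO_MAX_BURST_ITEM_NUM`` and a lightweight-mode call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-call limit; a stand-in for RTE_GRO_MAX_BURST_ITEM_NUM. */
#define MAX_BURST_ITEM_NUM 128u

/* Hypothetical callback standing in for one lightweight-mode call:
 * takes n items (n <= MAX_BURST_ITEM_NUM), compacts the survivors to the
 * front of the array, and returns how many remain. */
typedef uint16_t (*burst_fn)(void **items, uint16_t n);

/* Illustrative stub: pretends every pair of items merges into one. */
static uint16_t
keep_every_other(void **items, uint16_t n)
{
    uint16_t kept = 0;

    for (uint16_t i = 0; i < n; i += 2)
        items[kept++] = items[i];
    return kept;
}

/* Process `total` items in chunks no larger than the per-call limit.
 * Survivors of each chunk are packed toward the front of `items`.
 * Returns the number of items left after all chunks. */
static size_t
process_in_chunks(void **items, size_t total, burst_fn fn)
{
    size_t in = 0, out = 0;

    assert(items != NULL && fn != NULL);
    while (in < total) {
        size_t left = total - in;
        uint16_t n = (uint16_t)(left < MAX_BURST_ITEM_NUM ?
                                left : MAX_BURST_ITEM_NUM);
        uint16_t kept = fn(&items[in], n);

        /* out <= in always holds, so this forward copy never
         * overwrites an item that has not been read yet. */
        for (uint16_t i = 0; i < kept; i++)
            items[out + i] = items[in + i];
        out += kept;
        in += n;
    }
    return out;
}
```

In this sketch each chunk is handled independently, so items that land in
different chunks are never merged with each other. When merging across
bursts matters, the heavyweight mode is the better fit.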

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, the heavyweight mode API is
relatively complex. First, the application creates a GRO context
with ``rte_gro_ctx_create()``, which allocates the table
structures on the heap and stores pointers to them in the GRO context.
Second, the application uses ``rte_gro_reassemble()`` to merge packets.
If input packets have invalid parameters, ``rte_gro_reassemble()``
returns them to the application. For example, packets of unsupported GRO
types and TCP SYN packets are returned. Otherwise, the input packets are
either merged with existing packets in the tables or inserted into the
tables. Finally, the application uses ``rte_gro_timeout_flush()`` to flush
packets from the tables when it wants to get the GROed packets.

Note that update and lookup operations on the GRO context are not thread
safe. If different processes or threads need to access the same
context object simultaneously, an external synchronization mechanism must
be used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. This
section introduces the algorithm used by TCP/IPv4 GRO
and VxLAN GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm or implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require the algorithm to be:

- lightweight enough to keep up with high network speeds

- capable of handling packet reordering

In DPDK GRO, we use a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (we call them the "key"). To process an input packet, the algorithm
first searches for a matching "flow" (i.e., one with the same key value),
then checks all packets in the "flow" and tries to find a
"neighbor" for it. If a "neighbor" is found, the two packets are merged.
If no "neighbor" is found, the packet is stored in its "flow". If no
matching "flow" is found, a new "flow" is inserted and the packet is stored
in it.

.. note::
        When packets in the same "flow" can't be merged, the cause is
        always packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" to accelerate packet aggregation is
  simple (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP-IPv4/IPv6 GRO
-----------------

The table structure used by TCP-IPv4/IPv6 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the item
array keeps packet information.
The flow array is different for IPv4 and IPv6, while the item array is the same.

Header fields used to define a TCP-IPv4/IPv6 flow include:

- common TCP key fields: Ethernet address, TCP port, TCP acknowledgment number
- version-specific IP address
- IPv6 flow label for an IPv6 flow

TCP packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.
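
The flow definition and the flag filter above can be sketched in plain C as
follows. This is an illustration under stated assumptions, not the library's
internal layout: ``struct tcp4_key``, ``struct flow_table``, ``MAX_FLOWS``
and the three functions are hypothetical names, and only the IPv4 case is
modeled. The linear search stands in for the "find a matching flow or insert
a new one" step of the key-based algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TCP flag bits as laid out in the flags byte of the TCP header. */
#define TCP_FIN 0x01
#define TCP_SYN 0x02
#define TCP_RST 0x04
#define TCP_PSH 0x08
#define TCP_ACK 0x10
#define TCP_URG 0x20
#define TCP_ECE 0x40
#define TCP_CWR 0x80

/* Hypothetical IPv4 flow key built from the fields listed above. */
struct tcp4_key {
    uint8_t  eth_src[6], eth_dst[6];
    uint32_t ip_src, ip_dst;
    uint16_t port_src, port_dst;
    uint32_t tcp_ack;
};

#define MAX_FLOWS 4 /* illustrative table size */

struct flow_table {
    struct tcp4_key keys[MAX_FLOWS];
    int n; /* number of flows in use */
};

/* A packet is a merge candidate only if none of the listed flags is set,
 * which leaves ACK as the only bit that may be present. */
static bool
tcp_flags_mergeable(uint8_t flags)
{
    const uint8_t blocked = TCP_FIN | TCP_SYN | TCP_RST | TCP_URG |
                            TCP_PSH | TCP_ECE | TCP_CWR;

    return (flags & blocked) == 0;
}

/* Compare field by field so the result never depends on struct padding. */
static bool
key_equal(const struct tcp4_key *a, const struct tcp4_key *b)
{
    return memcmp(a->eth_src, b->eth_src, 6) == 0 &&
           memcmp(a->eth_dst, b->eth_dst, 6) == 0 &&
           a->ip_src == b->ip_src && a->ip_dst == b->ip_dst &&
           a->port_src == b->port_src && a->port_dst == b->port_dst &&
           a->tcp_ack == b->tcp_ack;
}

/* Return the index of the flow matching `key`, inserting a new flow when
 * none matches. Returns -1 when the table is full. */
static int
flow_find_or_insert(struct flow_table *t, const struct tcp4_key *key)
{
    assert(t != NULL && key != NULL);
    for (int i = 0; i < t->n; i++)
        if (key_equal(&t->keys[i], key))
            return i;
    if (t->n == MAX_FLOWS)
        return -1;
    t->keys[t->n] = *key;
    return t->n++;
}
```

Because the acknowledgment number is part of the key in this sketch, two
segments of the same connection with different acknowledgment numbers land
in different flows.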

Header fields deciding if two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID must increase
  by 1 from one packet to the next. This applies to IPv4 only.

VxLAN GRO
---------

The table structure used by VxLAN GRO, which is in charge of processing
VxLAN packets with an outer IPv4 header and an inner TCP/IPv4 packet, is
similar to that of TCP/IPv4 GRO. The difference is that the header fields
used to define a VxLAN flow include:

- outer source and destination: Ethernet and IP address, UDP port

- VxLAN header (VNI and flag)

- inner source and destination: Ethernet and IP address, TCP port

Header fields deciding if packets are neighbors include:

- outer IPv4 ID. For packets whose DF bit in the outer IPv4 header is 0,
  the outer IPv4 ID must increase by 1 from one packet to the next.

- inner TCP sequence number

- inner IPv4 ID. For packets whose DF bit in the inner IPv4 header is 0,
  the inner IPv4 ID must increase by 1 from one packet to the next.

.. note::
        We comply with RFC 6864 when processing the IPv4 ID field.
        Specifically, we check IPv4 ID fields for the packets whose DF bit
        is 0 and ignore IPv4 ID fields for the packets whose DF bit is 1.
        Additionally, packets which have different values of the DF bit
        can't be merged.

GRO Library Limitations
-----------------------

- GRO library uses MBUF->l2_len/l3_len/l4_len/outer_l2_len/
  outer_l3_len/packet_type to get protocol headers for the
  input packet, rather than parsing the packet header. Therefore,
  before calling GRO APIs to merge packets, user applications
  must set MBUF->l2_len/l3_len/l4_len/outer_l2_len/outer_l3_len/
  packet_type to the same values as the protocol headers of the
  packet.

- GRO library doesn't support processing packets with IPv4
  Options or VLAN tags.

- GRO library only supports packets organized
  in a single MBUF. If the input packet consists of multiple
  MBUFs (i.e. chained MBUFs), GRO reassembly behavior is
  undefined.
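
As an illustration of the first two limitations, the sketch below derives the
three header lengths from a raw, untagged Ethernet/IPv4/TCP frame and rejects
frames with a VLAN tag or IPv4 options. It is plain C rather than DPDK code:
``struct hdr_lens`` and ``tcp4_hdr_lens()`` are hypothetical names, and an
application would still have to copy the results into the corresponding MBUF
fields and set the packet type itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the length fields named in the first limitation. */
struct hdr_lens {
    uint16_t l2_len, l3_len, l4_len;
};

/* Derive header lengths from an untagged Ethernet/IPv4/TCP frame.
 * Returns false for anything this sketch does not handle, including the
 * cases listed as unsupported: VLAN tags and IPv4 options. */
static bool
tcp4_hdr_lens(const uint8_t *frame, size_t len, struct hdr_lens *out)
{
    assert(frame != NULL && out != NULL);

    /* Minimum: 14-byte Ethernet + 20-byte IPv4 + 20-byte TCP header. */
    if (len < 14u + 20u + 20u)
        return false;

    /* EtherType is at bytes 12-13: 0x0800 is IPv4; a VLAN tag (0x8100)
     * or any other value is rejected. */
    uint16_t ether_type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ether_type != 0x0800)
        return false;

    const uint8_t *ip = frame + 14;
    if ((ip[0] >> 4) != 4)       /* IP version */
        return false;
    if ((ip[0] & 0x0f) != 5)     /* IHL != 5 means options (or garbage) */
        return false;
    if (ip[9] != 6)              /* protocol field: 6 is TCP */
        return false;

    /* TCP data offset is the high nibble of byte 12, in 32-bit words. */
    const uint8_t *tcp = ip + 20;
    uint8_t doff = (uint8_t)(tcp[12] >> 4);
    if (doff < 5 || len < 14u + 20u + doff * 4u)
        return false;

    out->l2_len = 14;
    out->l3_len = 20;
    out->l4_len = (uint16_t)(doff * 4u);
    return true;
}
```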