.. SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017 Intel Corporation.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overheads. By reassembling
small packets into larger ones, GRO enables applications to process
fewer large packets directly, thus reducing the number of packets to
be processed. To benefit DPDK-based applications, such as Open vSwitch,
DPDK also provides its own GRO implementation. In DPDK, GRO is
implemented as a standalone library. Applications explicitly use the
GRO library to reassemble packets.

Overview
--------

In the GRO library, there are many GRO types, each defined by a packet
type. Each GRO type is in charge of processing one kind of packet. For
example, TCP/IPv4 GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own
algorithm and table structure to reassemble packets. We assign input
packets to the corresponding GRO functions based on MBUF->packet_type.

The GRO library doesn't check whether input packets have correct
checksums and doesn't re-calculate checksums for merged packets. The
GRO library assumes the packets are complete
(i.e., MF==0 && frag_off==0) when IP fragmentation is possible
(i.e., DF==0). Additionally, it complies with RFC 6864 when processing
the IPv4 ID field.
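
The IPv4 ID handling mentioned above (detailed in the RFC 6864 note
later in this document) can be summarized as a small predicate. Below is
a minimal sketch in plain C, not code taken from the GRO library:
``struct ipv4_id_info`` and ``ipv4_id_is_neighbor()`` are hypothetical
names, the sketch only checks whether one packet directly follows
another, and treating the 16-bit ID as wrapping around at 2^16 is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two IPv4 header fields the
 * RFC 6864 rule looks at (illustrative only, not a DPDK type). */
struct ipv4_id_info {
    uint16_t id; /* IPv4 Identification field, host byte order */
    bool df;     /* Don't Fragment bit */
};

/*
 * Returns true if `next` may directly follow `prev` as far as the IPv4 ID
 * is concerned:
 *  - packets with different DF values are never merged;
 *  - when DF == 1, the ID is ignored;
 *  - when DF == 0, the ID must increase by exactly 1 (wrapping at 2^16,
 *    an assumption of this sketch).
 */
static bool ipv4_id_is_neighbor(const struct ipv4_id_info *prev,
                                const struct ipv4_id_info *next)
{
    if (prev->df != next->df)
        return false;
    if (next->df)
        return true;
    return (uint16_t)(prev->id + 1) == next->id;
}
```

A complete neighbor check would combine this predicate with the other
fields listed for each GRO type, such as the TCP sequence number.
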

Currently, the GRO library provides GRO support for TCP/IPv4 packets
and for VxLAN packets that contain an outer IPv4 header and an inner
TCP/IPv4 packet.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of
APIs. One is called the lightweight mode API, which enables
applications to merge a small number of packets rapidly; the other is
called the heavyweight mode API, which provides fine-grained control to
applications and supports merging a large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function,
``rte_gro_reassemble_burst()``, which processes N packets at a time.
Using the lightweight mode API to merge packets is very simple: calling
``rte_gro_reassemble_burst()`` is enough. The GROed packets are
returned to applications as soon as the call finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the
different GRO types are allocated on the stack. This design simplifies
applications' operations. However, because the stack size is limited,
the maximum number of packets that ``rte_gro_reassemble_burst()`` can
process in one invocation should be less than or equal to
``RTE_GRO_MAX_BURST_ITEM_NUM``.

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, using the heavyweight mode API is
relatively complex.
First, applications need to create a GRO context by calling
``rte_gro_ctx_create()``, which allocates the table structures on the
heap and stores their pointers in the GRO context. Second, applications
use ``rte_gro_reassemble()`` to merge packets. If input packets have
invalid parameters, ``rte_gro_reassemble()`` returns them to
applications; for example, packets of unsupported GRO types and TCP SYN
packets are returned. Otherwise, the input packets are either merged
with existing packets in the tables or inserted into the tables.
Finally, when applications want to get the GROed packets, they use
``rte_gro_timeout_flush()`` to flush packets from the tables.

Note that none of the update/lookup operations on the GRO context is
thread safe. If different processes or threads want to access the same
context object simultaneously, an external synchronization mechanism
must be used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used to reassemble packets. In the GRO
library, different GRO types can use different algorithms. In this
section, we introduce an algorithm that is used by both TCP/IPv4 GRO
and VxLAN GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm/implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

The above two challenges require our algorithm to be:

- lightweight enough to scale to fast networking speeds

- capable of handling packet reordering

In DPDK GRO, we use a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (we call them the "key"). To process an input packet, the
algorithm first searches for a matching "flow" (i.e., one with the same
key value), then checks all packets in that "flow" and tries to find a
"neighbor" for the input packet. If a "neighbor" is found, the two
packets are merged. If no "neighbor" is found, the packet is stored in
its "flow". If no matching "flow" is found, a new "flow" is inserted
and the packet is stored in it.

.. note::
    Packets in the same "flow" that can't be merged are always the
    result of packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" to accelerate packet aggregation is
  simple (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.

Header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP address, TCP port

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.

Header fields deciding if two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID should increase
  by 1 from one packet to the next.

VxLAN GRO
---------

The table structure used by VxLAN GRO, which is in charge of processing
VxLAN packets with an outer IPv4 header and an inner TCP/IPv4 packet,
is similar to that of TCP/IPv4 GRO.
However, the header fields used to define a VxLAN flow include:

- outer source and destination: Ethernet and IP address, UDP port

- VxLAN header (VNI and flag)

- inner source and destination: Ethernet and IP address, TCP port

Header fields deciding if packets are neighbors include:

- outer IPv4 ID. For packets whose DF bit in the outer IPv4 header is 0,
  the outer IPv4 ID should increase by 1 from one packet to the next.

- inner TCP sequence number

- inner IPv4 ID. For packets whose DF bit in the inner IPv4 header is 0,
  the inner IPv4 ID should increase by 1 from one packet to the next.

.. note::
    We comply with RFC 6864 when processing the IPv4 ID field.
    Specifically, we check the IPv4 ID fields of packets whose DF bit is
    0 and ignore the IPv4 ID fields of packets whose DF bit is 1.
    Additionally, packets with different DF bit values can't be merged.

GRO Library Limitations
-----------------------

- The GRO library uses
  MBUF->l2_len/l3_len/l4_len/outer_l2_len/outer_l3_len/packet_type to
  locate the protocol headers of the input packet, rather than parsing
  the packet headers. Therefore, before calling GRO APIs to merge
  packets, user applications must set
  MBUF->l2_len/l3_len/l4_len/outer_l2_len/outer_l3_len/packet_type to
  values that match the packet's actual protocol headers.

- The GRO library doesn't support processing packets with IPv4 options
  or VLAN tags.

- The GRO library only supports processing packets contained in a
  single MBUF. If an input packet consists of multiple MBUFs (i.e.,
  chained MBUFs), GRO reassembly behavior is undefined.
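
Because the GRO library trusts the mbuf metadata instead of parsing
headers, an application may want to filter out packets that fall
outside these limitations before calling the GRO APIs. Below is a
minimal sketch in plain C of such a check for plain TCP/IPv4 packets.
It is not part of the GRO library: ``struct pkt_meta`` is a
hypothetical stand-in whose fields only echo the mbuf fields named
above, and ``gro_precheck_tcp4()`` and the header-length constants are
illustrative assumptions. The check cannot verify that the lengths
match the real headers; the application must fill them from its own
parsing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf metadata an application fills in
 * before calling the GRO APIs (illustrative only, not struct rte_mbuf). */
struct pkt_meta {
    uint16_t nb_segs; /* number of chained segments holding the packet */
    uint8_t  l2_len;  /* Ethernet header length in bytes */
    uint8_t  l3_len;  /* IPv4 header length in bytes */
    uint8_t  l4_len;  /* TCP header length in bytes */
    bool     vlan;    /* true if the frame carries a VLAN tag */
};

#define ETH_HDR_LEN     14 /* untagged Ethernet header */
#define IPV4_NO_OPT_LEN 20 /* IPv4 header without options (IHL == 5) */
#define TCP_MIN_HDR_LEN 20 /* TCP header without options */
#define TCP_MAX_HDR_LEN 60 /* largest TCP header (data offset == 15) */

/* Returns true if the packet stays within the documented limitations:
 * a single segment, no VLAN tag, no IPv4 options, and a plausible TCP
 * header length set by the application. */
static bool gro_precheck_tcp4(const struct pkt_meta *m)
{
    if (m->nb_segs != 1)                        /* chained MBUFs */
        return false;
    if (m->vlan || m->l2_len != ETH_HDR_LEN)    /* VLAN-tagged frame */
        return false;
    if (m->l3_len != IPV4_NO_OPT_LEN)           /* IPv4 options present */
        return false;
    if (m->l4_len < TCP_MIN_HDR_LEN || m->l4_len > TCP_MAX_HDR_LEN)
        return false;                           /* l4_len unset or invalid */
    return true;
}
```

Packets rejected by such a check would simply bypass GRO; VxLAN packets
would additionally need the outer header lengths to be checked.
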