..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used SW-based offloading
technique to reduce per-packet processing overheads. By reassembling
small packets into larger ones, GRO enables applications to process
fewer large packets directly, thus reducing the number of packets to
be processed.
To benefit DPDK-based applications, like Open vSwitch, DPDK also provides
its own GRO implementation. In DPDK, GRO is implemented as a standalone
library, and applications explicitly call the GRO library to reassemble
packets.

Overview
--------

The GRO library contains many GRO types, which are defined by packet
types. Each GRO type is in charge of processing one kind of packet. For
example, TCP/IPv4 GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure to reassemble packets. Input packets are assigned to
the corresponding GRO functions according to ``MBUF->packet_type``.

The GRO library doesn't check whether input packets have correct
checksums, and it doesn't recalculate checksums for merged packets. When
IP fragmentation is possible (i.e., DF==0), the GRO library assumes that
the packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.

Currently, the GRO library provides GRO support for TCP/IPv4 packets and
for VxLAN packets that contain an outer IPv4 header and an inner TCP/IPv4
packet.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
The first, called the lightweight mode API, enables applications to merge
a small number of packets rapidly; the second, called the heavyweight
mode API, provides fine-grained control to applications and supports
merging a large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Using the lightweight mode API to
merge packets is very simple: calling ``rte_gro_reassemble_burst()`` is
enough. The GROed packets are returned to applications as soon as the
function finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different
GRO types are allocated on the stack.
This design simplifies applications' operations. However, because the
stack size is limited, the maximum number of packets that
``rte_gro_reassemble_burst()`` can process in one invocation should be
less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, using the heavyweight mode API is
relatively complex. First, applications need to create a GRO context
with ``rte_gro_ctx_create()``, which allocates the table structures in
the heap and stores their pointers in the GRO context. Second,
applications use ``rte_gro_reassemble()`` to merge packets. If input
packets have invalid parameters, ``rte_gro_reassemble()`` returns them to
applications; for example, packets of unsupported GRO types or TCP SYN
packets are returned. Otherwise, the input packets are either merged with
the existing packets in the tables or inserted into the tables. Finally,
when applications want to get the GROed packets, they use
``rte_gro_timeout_flush()`` to flush packets from the tables.

Note that none of the update/lookup operations on the GRO context are
thread safe. So if different processes or threads want to access the
same context object simultaneously, an external synchronization
mechanism must be used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. In this
section, we introduce an algorithm that is used by both TCP/IPv4 GRO and
VxLAN GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm/implementation would cause packet dropping in a
  high-speed network.

- packet reordering makes it hard to merge packets.
  For example, Linux GRO fails to merge packets when it encounters packet
  reordering.

The above two challenges require our algorithm to be:

- lightweight enough to scale to fast networking speeds

- capable of handling packet reordering

In DPDK GRO, we use a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (we call them the "key"). To process an input packet, the
algorithm first searches for a matched "flow" (i.e., one with the same
key value) for the packet, then checks all packets in that "flow" and
tries to find a "neighbor" for it. If a "neighbor" is found, the two
packets are merged together. If no "neighbor" is found, the packet is
stored in its "flow". If no matched "flow" is found, a new "flow" is
inserted and the packet is stored in it.

.. note::
   If packets in the same "flow" can't be merged, this is always caused
   by packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" to accelerate packet aggregation is
  simple (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.
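
The key-based algorithm described above and this two-array table can be
illustrated together with the following minimal C sketch. It is a
hypothetical, simplified model, not the DPDK implementation: the names
(``struct table``, ``table_insert()``, ``MAX_FLOWS`` and so on), the fixed
array sizes and the reduced flow key (IP addresses, TCP ports and ACK
number, without Ethernet addresses) are all assumptions. Each packet is
represented only by its TCP sequence number and payload length, and
merging is modeled by extending the stored item's sequence range.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical, simplified layout; not the DPDK data structures. */
   #define MAX_FLOWS   8
   #define MAX_ITEMS   32
   #define INVALID_IDX UINT32_MAX

   struct flow_key {              /* assumed reduced TCP/IPv4 flow key */
       uint32_t src_ip, dst_ip;
       uint16_t src_port, dst_port;
       uint32_t ack;
   };

   struct item {                  /* one stored (possibly merged) packet */
       uint32_t seq;              /* sequence number of first payload byte */
       uint32_t len;              /* TCP payload length in bytes */
       uint32_t next;             /* next item of the same flow */
   };

   struct flow {
       struct flow_key key;
       uint32_t first;            /* first item; INVALID_IDX if slot is free */
   };

   struct table {
       struct flow flows[MAX_FLOWS];  /* flow array: flow information */
       struct item items[MAX_ITEMS];  /* item array: packet information */
       uint32_t nb_items;             /* used entries in the item array */
   };

   void table_init(struct table *t)
   {
       for (int i = 0; i < MAX_FLOWS; i++)
           t->flows[i].first = INVALID_IDX;
       t->nb_items = 0;
   }

   static bool key_equal(const struct flow_key *a, const struct flow_key *b)
   {
       return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
              a->src_port == b->src_port && a->dst_port == b->dst_port &&
              a->ack == b->ack;
   }

   /* Store a packet as a new item at the head of the flow's item list. */
   static int store_item(struct table *t, struct flow *f,
                         uint32_t seq, uint32_t len)
   {
       if (t->nb_items == MAX_ITEMS)
           return -1;                         /* item array is full */
       uint32_t n = t->nb_items++;
       t->items[n] = (struct item){ .seq = seq, .len = len, .next = f->first };
       f->first = n;
       return 0;
   }

   /* Returns 1 if merged with a neighbor, 0 if stored, -1 if the table is full. */
   int table_insert(struct table *t, const struct flow_key *k,
                    uint32_t seq, uint32_t len)
   {
       struct flow *free_slot = NULL;

       assert(t != NULL && k != NULL);
       for (int i = 0; i < MAX_FLOWS; i++) {
           struct flow *f = &t->flows[i];
           if (f->first == INVALID_IDX) {
               if (free_slot == NULL)
                   free_slot = f;             /* remember a free flow slot */
               continue;
           }
           if (!key_equal(&f->key, k))
               continue;
           /* Matched flow: look for a neighbor among its items. */
           for (uint32_t j = f->first; j != INVALID_IDX; j = t->items[j].next) {
               struct item *it = &t->items[j];
               if (it->seq + it->len == seq) {   /* packet follows item */
                   it->len += len;
                   return 1;
               }
               if (seq + len == it->seq) {       /* packet precedes item */
                   it->seq = seq;
                   it->len += len;
                   return 1;
               }
           }
           return store_item(t, f, seq, len);    /* no neighbor: keep it */
       }
       if (free_slot == NULL)
           return -1;                            /* flow array is full */
       free_slot->key = *k;                      /* no matched flow: new flow */
       return store_item(t, free_slot, seq, len);
   }

For example, inserting three 100-byte packets of one flow with sequence
numbers 1000, 1200 and 1100 stores the first two (the second arrives out
of order) and then merges the third with the stored packet that starts at
1200. For brevity, the sketch neither coalesces stored items that become
contiguous nor frees or flushes items.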

Header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP address, TCP port

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.

Header fields that decide whether two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID of a packet should
  be one greater than that of the packet it follows.

VxLAN GRO
---------

The table structure used by VxLAN GRO, which is in charge of processing
VxLAN packets with an outer IPv4 header and an inner TCP/IPv4 packet, is
similar to that of TCP/IPv4 GRO. The difference is that the header fields
used to define a VxLAN flow include:

- outer source and destination: Ethernet and IP address, UDP port

- VxLAN header (VNI and flag)

- inner source and destination: Ethernet and IP address, TCP port

Header fields that decide whether packets are neighbors include:

- outer IPv4 ID. For packets whose DF bit in the outer IPv4 header is 0,
  the outer IPv4 ID of a packet should be one greater than that of the
  packet it follows.

- inner TCP sequence number

- inner IPv4 ID. For packets whose DF bit in the inner IPv4 header is 0,
  the inner IPv4 ID of a packet should be one greater than that of the
  packet it follows.

.. note::
   We comply with RFC 6864 when processing the IPv4 ID field.
   Specifically, we check the IPv4 ID fields of packets whose DF bit is 0
   and ignore the IPv4 ID fields of packets whose DF bit is 1.
   Additionally, packets that have different values of the DF bit can't
   be merged.
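
The neighbor rules for TCP/IPv4 and VxLAN GRO, including the RFC 6864
handling of the IPv4 ID, can be summarized with the following minimal C
sketch. It is hypothetical and simplified, not the DPDK code: the
structure and function names are assumptions, both packets are assumed to
already belong to the same flow, each structure keeps only the fields the
check needs, and only the append case (a new packet directly following a
single stored packet) is shown. When a stored packet is itself the result
of earlier merges, the expected IPv4 ID also has to account for the
packets already merged into it.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical per-packet summaries: only the fields the checks need. */
   struct ipv4_info {
       uint16_t id;               /* IPv4 identification field */
       bool df;                   /* IPv4 Don't Fragment bit */
   };

   struct tcp4_pkt {
       struct ipv4_info ip;
       uint32_t seq;              /* TCP sequence number */
       uint32_t payload_len;      /* TCP payload length in bytes */
   };

   struct vxlan_pkt {
       struct ipv4_info outer_ip; /* outer IPv4 header */
       struct tcp4_pkt inner;     /* inner TCP/IPv4 packet */
   };

   /* RFC 6864 style IPv4 ID check between consecutive packets. */
   static bool ip_id_follows(const struct ipv4_info *prev,
                             const struct ipv4_info *next)
   {
       if (prev->df != next->df)
           return false;          /* different DF values: never merged */
       if (prev->df)
           return true;           /* DF == 1: the IPv4 ID is ignored */
       /* DF == 0: the ID must increase by 1 (wrapping modulo 2^16). */
       return (uint16_t)(prev->id + 1) == next->id;
   }

   /* True if 'next' directly follows 'prev' in a TCP/IPv4 flow. */
   bool tcp4_is_next(const struct tcp4_pkt *prev, const struct tcp4_pkt *next)
   {
       assert(prev != NULL && next != NULL);
       return prev->seq + prev->payload_len == next->seq &&
              ip_id_follows(&prev->ip, &next->ip);
   }

   /* True if 'next' directly follows 'prev' in a VxLAN flow. */
   bool vxlan_is_next(const struct vxlan_pkt *prev,
                      const struct vxlan_pkt *next)
   {
       assert(prev != NULL && next != NULL);
       return ip_id_follows(&prev->outer_ip, &next->outer_ip) &&
              tcp4_is_next(&prev->inner, &next->inner);
   }

For instance, two unmerged packets with DF set to 0, IPv4 IDs 100 and 101,
and contiguous sequence numbers are neighbors, while an ID jump from 100
to 200 prevents merging; when DF is set to 1 on both packets, the IDs are
not compared at all.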