..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique that reduces per-packet processing overhead. By reassembling
small packets into larger ones, GRO lets applications process fewer,
larger packets, which reduces the number of packets to be processed. To
benefit DPDK-based applications, such as Open vSwitch, DPDK provides its
own GRO implementation. In DPDK, GRO is implemented as a standalone
library, and applications explicitly call the GRO library to reassemble
packets.

Overview
--------

The GRO library provides several GRO types, which are defined by packet
type. Each GRO type is in charge of processing one kind of packet. For
example, TCP/IPv4 GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure for reassembling packets. Input packets are assigned
to the corresponding GRO function according to ``MBUF->packet_type``.

The GRO library doesn't check whether input packets have correct
checksums, and it doesn't recalculate checksums for merged packets. When
IP fragmentation is possible (i.e., DF==0), the GRO library assumes that
the packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.
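
Because the library assumes completeness rather than verifying it, an
application may want to filter out IP fragments itself before calling GRO.
The following is a minimal sketch of such a check, assuming the 16-bit
"flags + fragment offset" field of the IPv4 header has already been
converted to host byte order. The helper ``ipv4_is_complete()`` and the
macro names are hypothetical and are not part of the GRO library.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Masks for the 16-bit IPv4 "flags + fragment offset" field
     * (RFC 791), in host byte order. Names are illustrative only. */
    #define IPV4_DF_FLAG     0x4000u  /* Don't Fragment */
    #define IPV4_MF_FLAG     0x2000u  /* More Fragments */
    #define IPV4_OFFSET_MASK 0x1fffu  /* fragment offset, 8-byte units */

    /* Hypothetical helper: true when the packet is not an IP fragment,
     * i.e. MF == 0 and the fragment offset is 0. The DF bit does not
     * affect the result. */
    static bool
    ipv4_is_complete(uint16_t frag_field)
    {
        return (frag_field & (IPV4_MF_FLAG | IPV4_OFFSET_MASK)) == 0;
    }

A packet with only DF set (``0x4000``) or with no flags set (``0x0000``)
passes this check; a first fragment (``0x2000``) or any packet with a
non-zero offset does not.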

Currently, the GRO library provides GRO support for TCP/IPv4 packets and
for VxLAN packets that contain an outer IPv4 header and an inner TCP/IPv4
packet.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
One is the lightweight mode API, which enables applications to merge a
small number of packets rapidly; the other is the heavyweight mode API,
which gives applications fine-grained control and supports merging a
large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Merging packets with the lightweight
mode API is simple: calling ``rte_gro_reassemble_burst()`` is enough. The
GROed packets are returned to the application as soon as the function
finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different
GRO types are allocated on the stack. This design simplifies applications'
operations. However, because the stack size is limited, the number of
packets that ``rte_gro_reassemble_burst()`` processes in one invocation
should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
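
One way to respect this limit is to split a larger batch into bursts
before each call. The sketch below is self-contained and does not call the
GRO library: ``MAX_BURST_ITEM_NUM`` is an assumed stand-in for
``RTE_GRO_MAX_BURST_ITEM_NUM`` (take the real value from ``rte_gro.h``),
and ``for_each_burst()``, ``burst_fn`` and ``count_pkts()`` are
hypothetical names used only for illustration.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed stand-in for RTE_GRO_MAX_BURST_ITEM_NUM. */
    #define MAX_BURST_ITEM_NUM 128u

    /* Hypothetical callback type: processes one burst of packets. */
    typedef void (*burst_fn)(void **pkts, uint16_t nb_pkts, void *arg);

    /* Split pkts[0..nb_pkts) into bursts of at most MAX_BURST_ITEM_NUM
     * packets and hand each burst to fn. Returns the number of bursts. */
    static unsigned int
    for_each_burst(void **pkts, size_t nb_pkts, burst_fn fn, void *arg)
    {
        unsigned int nb_bursts = 0;
        size_t done = 0;

        while (done < nb_pkts) {
            size_t n = nb_pkts - done;

            if (n > MAX_BURST_ITEM_NUM)
                n = MAX_BURST_ITEM_NUM;
            fn(pkts + done, (uint16_t)n, arg);
            done += n;
            nb_bursts++;
        }
        return nb_bursts;
    }

    /* Example callback: adds the burst size to the counter in arg. */
    static void
    count_pkts(void **pkts, uint16_t nb_pkts, void *arg)
    {
        (void)pkts;
        *(size_t *)arg += nb_pkts;
    }

In a real application the callback would be the place to invoke the
lightweight mode function. Since the tables are allocated per call,
packets that end up in different bursts would presumably not be merged
with each other; the heavyweight mode is intended for larger batches.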

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, the heavyweight mode API is
relatively complex to use. First, the application creates a GRO context
with ``rte_gro_ctx_create()``, which allocates the table structures on
the heap and stores pointers to them in the GRO context. Second, the
application uses ``rte_gro_reassemble()`` to merge packets. If input
packets have invalid parameters, ``rte_gro_reassemble()`` returns them to
the application. For example, packets of unsupported GRO types and TCP
SYN packets are returned. Otherwise, each input packet is either merged
with an existing packet in the tables or inserted into the tables.
Finally, the application uses ``rte_gro_timeout_flush()`` to flush
packets from the tables when it wants to retrieve the GROed packets.

Note that update and lookup operations on the GRO context are not thread
safe. If different processes or threads need to access the same context
object simultaneously, an external synchronization mechanism must be
used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. This section
introduces the algorithm used by TCP/IPv4 GRO and VxLAN GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm or implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require the algorithm to be:

- lightweight enough to scale with fast network speeds

- capable of handling packet reordering

DPDK GRO uses a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (called the "key"). To process an input packet, the algorithm
first searches for a matching "flow" (i.e., one with the same key value),
then checks all packets in that "flow" and tries to find a "neighbor" for
the input packet. If a "neighbor" is found, the two packets are merged.
If no "neighbor" is found, the packet is stored in its "flow". If no
matching "flow" is found, a new "flow" is inserted and the packet is
stored in it.
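
The procedure can be sketched in plain C as follows. This is a simplified
model and not the library's implementation: the key is reduced to a single
integer, a packet is reduced to a byte range ``[seq, seq + len)``, and all
type, constant and function names are hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_FLOWS 4   /* illustrative table limits */
    #define MAX_ITEMS 8

    /* One stored (possibly merged) packet covering [seq, seq + len). */
    struct item {
        uint32_t seq;
        uint32_t len;
    };

    struct flow {
        uint32_t key;            /* stand-in for the header-field key */
        unsigned int nb_items;
        struct item items[MAX_ITEMS];
    };

    struct table {
        unsigned int nb_flows;
        struct flow flows[MAX_FLOWS];
    };

    enum result { MERGED, STORED, NEW_FLOW, TABLE_FULL };

    static enum result
    key_based_insert(struct table *t, uint32_t key, uint32_t seq,
            uint32_t len)
    {
        struct flow *f = NULL;
        unsigned int i;

        /* 1. Search for a flow with the same key. */
        for (i = 0; i < t->nb_flows; i++) {
            if (t->flows[i].key == key) {
                f = &t->flows[i];
                break;
            }
        }

        /* 2. No matching flow: insert a new flow, store the packet. */
        if (f == NULL) {
            if (t->nb_flows == MAX_FLOWS)
                return TABLE_FULL;
            f = &t->flows[t->nb_flows++];
            f->key = key;
            f->nb_items = 1;
            f->items[0].seq = seq;
            f->items[0].len = len;
            return NEW_FLOW;
        }

        /* 3. Look for a neighbor among the packets of the flow. */
        for (i = 0; i < f->nb_items; i++) {
            struct item *it = &f->items[i];

            if (it->seq + it->len == seq) {  /* packet follows item */
                it->len += len;
                return MERGED;
            }
            if (seq + len == it->seq) {      /* packet precedes item */
                it->seq = seq;
                it->len += len;
                return MERGED;
            }
        }

        /* 4. No neighbor: keep the packet for a possible later merge. */
        if (f->nb_items == MAX_ITEMS)
            return TABLE_FULL;
        f->items[f->nb_items].seq = seq;
        f->items[f->nb_items].len = len;
        f->nb_items++;
        return STORED;
    }

For brevity, the sketch does not join two stored items when a newly merged
packet closes the gap between them, and it uses linear searches where a
real implementation would need something faster.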

.. note::
        Packets in the same "flow" that can't be merged are always the
        result of packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" is a simple way to accelerate packet
  aggregation (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.

Header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP address, TCP port

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.
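
As an illustration, the flow key and the flag filter described above could
be modeled as follows. The structure layout and all names are assumptions
made for this sketch; they do not reproduce the library's internal
definitions.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* TCP flag bits (RFC 793 and RFC 3168). */
    #define TCP_FIN 0x01u
    #define TCP_SYN 0x02u
    #define TCP_RST 0x04u
    #define TCP_PSH 0x08u
    #define TCP_ACK 0x10u
    #define TCP_URG 0x20u
    #define TCP_ECE 0x40u
    #define TCP_CWR 0x80u

    /* Hypothetical flow key holding the header fields listed above. */
    struct tcp4_key {
        uint8_t  eth_src[6];
        uint8_t  eth_dst[6];
        uint32_t ip_src;
        uint32_t ip_dst;
        uint16_t tcp_src_port;
        uint16_t tcp_dst_port;
        uint32_t tcp_ack;
    };

    /* Two packets belong to the same flow when every key field matches.
     * Fields are compared one by one so padding never matters. */
    static bool
    tcp4_key_equal(const struct tcp4_key *a, const struct tcp4_key *b)
    {
        return memcmp(a->eth_src, b->eth_src, sizeof(a->eth_src)) == 0 &&
            memcmp(a->eth_dst, b->eth_dst, sizeof(a->eth_dst)) == 0 &&
            a->ip_src == b->ip_src &&
            a->ip_dst == b->ip_dst &&
            a->tcp_src_port == b->tcp_src_port &&
            a->tcp_dst_port == b->tcp_dst_port &&
            a->tcp_ack == b->tcp_ack;
    }

    /* A packet is a GRO candidate only if none of the listed flag bits
     * is set. */
    static bool
    tcp4_flags_eligible(uint8_t flags)
    {
        const uint8_t skip = TCP_FIN | TCP_SYN | TCP_RST | TCP_URG |
            TCP_PSH | TCP_ECE | TCP_CWR;

        return (flags & skip) == 0;
    }

With this filter, a plain ACK segment is eligible, while a segment with
``TCP_ACK | TCP_PSH`` set is not.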

Header fields deciding if two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID must increase by 1
  from one packet to the next.

VxLAN GRO
---------

The table structure used by VxLAN GRO, which is in charge of processing
VxLAN packets with an outer IPv4 header and an inner TCP/IPv4 packet, is
similar to that of TCP/IPv4 GRO. The difference is that the header fields
used to define a VxLAN flow include:

- outer source and destination: Ethernet and IP address, UDP port

- VxLAN header (VNI and flag)

- inner source and destination: Ethernet and IP address, TCP port

Header fields deciding if packets are neighbors include:

- outer IPv4 ID. For packets whose DF bit in the outer IPv4 header is 0,
  the outer IPv4 ID must increase by 1 from one packet to the next.

- inner TCP sequence number

- inner IPv4 ID. For packets whose DF bit in the inner IPv4 header is 0,
  the inner IPv4 ID must increase by 1 from one packet to the next.

.. note::
        We comply with RFC 6864 when processing the IPv4 ID field.
        Specifically, we check the IPv4 ID fields of packets whose DF bit
        is 0 and ignore the IPv4 ID fields of packets whose DF bit is 1.
        Additionally, packets that have different DF bit values can't be
        merged.
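
For the TCP/IPv4 case, the neighbor rules and the DF handling described
above can be summarized in a small predicate. This is a sketch for two
individual, unmerged packets, under the assumptions that the IPv4 ID wraps
around at 16 bits and that the TCP sequence number wraps at 32 bits;
``struct pkt_info`` and ``is_next_neighbor()`` are hypothetical names.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-packet summary of the fields used by the check. */
    struct pkt_info {
        uint32_t tcp_seq;      /* sequence number of first payload byte */
        uint16_t payload_len;  /* TCP payload length in bytes */
        uint16_t ip_id;        /* IPv4 ID */
        bool     df;           /* IPv4 DF bit */
    };

    /* True when next can be appended directly after prev. */
    static bool
    is_next_neighbor(const struct pkt_info *prev,
            const struct pkt_info *next)
    {
        /* Packets with different DF values can't be merged. */
        if (prev->df != next->df)
            return false;

        /* The payloads must be contiguous in TCP sequence space. */
        if ((uint32_t)(prev->tcp_seq + prev->payload_len) !=
                next->tcp_seq)
            return false;

        /* DF == 1: the IPv4 ID is ignored. */
        if (prev->df)
            return true;

        /* DF == 0: the IPv4 ID must have increased by 1. */
        return (uint16_t)(prev->ip_id + 1) == next->ip_id;
    }

A VxLAN variant would apply the same IPv4 ID rule to the outer and the
inner header separately, each with its own DF bit.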