..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique to reduce per-packet processing overhead. By reassembling
small packets into larger ones, GRO enables applications to process
fewer, larger packets directly, thus reducing the number of packets to
be processed. To benefit DPDK-based applications such as Open vSwitch,
DPDK also provides its own GRO implementation. In DPDK, GRO is
implemented as a standalone library, and applications explicitly call
the GRO library to reassemble packets.

Overview
--------

The GRO library supports several GRO types, each defined by a packet
type. Each GRO type is in charge of processing one kind of packet. For
example, TCP/IPv4 GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure to reassemble packets. Input packets are assigned to
the corresponding GRO functions according to ``MBUF->packet_type``.
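
The dispatch step can be pictured as a table of per-type reassembly
functions indexed by packet type. The following is a minimal standalone
sketch, not the library's code: ``enum pkt_type``, ``struct pkt``,
``tcp4_reassemble()`` and ``dispatch()`` are hypothetical names, and the
packet type is a plain enum rather than the value carried in
``MBUF->packet_type``.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical packet types; the library uses MBUF->packet_type instead. */
    enum pkt_type { PKT_TCP_IPV4, PKT_UDP_IPV4, PKT_TYPE_MAX };

    struct pkt {
        enum pkt_type type;
        int merged;                 /* set by a handler in this toy model */
    };

    /* Reassembly function of one GRO type; returns 1 if it took the packet. */
    typedef int (*reassemble_fn)(struct pkt *p);

    static int tcp4_reassemble(struct pkt *p)
    {
        p->merged = 1;              /* placeholder for real TCP/IPv4 GRO work */
        return 1;
    }

    /* One entry per packet type; NULL means no GRO type handles that type. */
    static const reassemble_fn handlers[PKT_TYPE_MAX] = {
        [PKT_TCP_IPV4] = tcp4_reassemble,
        [PKT_UDP_IPV4] = NULL,
    };

    /* Route each packet to the reassembly function of its GRO type and
     * return how many packets no GRO type processed. */
    static size_t dispatch(struct pkt *pkts, size_t n)
    {
        size_t unprocessed = 0;

        for (size_t i = 0; i < n; i++) {
            assert(pkts[i].type < PKT_TYPE_MAX);
            reassemble_fn fn = handlers[pkts[i].type];
            if (fn == NULL || !fn(&pkts[i]))
                unprocessed++;
        }
        return unprocessed;
    }

Packets whose type has no handler are left untouched and only counted as
unprocessed.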

The GRO library doesn't check whether input packets have correct
checksums, and it doesn't recalculate checksums for merged packets. When
IP fragmentation is possible (i.e., DF==0), the GRO library assumes the
packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
complies with RFC 6864 when processing the IPv4 ID field.
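
Because the library relies on this assumption instead of checking it, an
application that may receive IP fragments could filter them before
calling GRO. The sketch below is a hypothetical application-side helper
(``ipv4_is_complete()`` and ``safe_to_pass_to_gro()`` are not library
functions); it decodes the IPv4 flags/fragment-offset field as defined in
RFC 791, assuming the value has already been converted to host byte
order.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Bits of the IPv4 "flags + fragment offset" field (RFC 791),
     * in host byte order. */
    #define IPV4_DF_FLAG     0x4000u
    #define IPV4_MF_FLAG     0x2000u
    #define IPV4_OFFSET_MASK 0x1FFFu

    /* True if the packet is not a fragment: MF == 0 and offset == 0. */
    static bool ipv4_is_complete(uint16_t frag_field)
    {
        return (frag_field & IPV4_MF_FLAG) == 0 &&
               (frag_field & IPV4_OFFSET_MASK) == 0;
    }

    /* Hypothetical pre-filter: pass a packet to GRO only if DF is set
     * (fragmentation not possible) or it is a complete datagram. */
    static bool safe_to_pass_to_gro(uint16_t frag_field)
    {
        if (frag_field & IPV4_DF_FLAG)
            return true;
        return ipv4_is_complete(frag_field);
    }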

Currently, the GRO library provides GRO support for TCP/IPv4 packets.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
The first, called the lightweight mode API, enables applications to
merge a small number of packets rapidly; the second, called the
heavyweight mode API, provides fine-grained control to applications and
supports merging a large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Using the lightweight mode API to
merge packets is very simple: calling ``rte_gro_reassemble_burst()`` is
enough. The GROed packets are returned to applications as soon as the
function finishes.

In ``rte_gro_reassemble_burst()``, the table structures of the different
GRO types are allocated on the stack. This design simplifies the
application's work. However, because the stack size is limited, the
number of packets that ``rte_gro_reassemble_burst()`` processes in one
invocation should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
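
A burst larger than this limit therefore has to be split across several
calls. The sketch below shows one way to do that under stated
assumptions: ``MAX_BURST_ITEMS`` is a hypothetical stand-in for
``RTE_GRO_MAX_BURST_ITEM_NUM`` with a deliberately small value, packets
are reduced to flow IDs, and ``toy_reassemble_burst()`` is a toy that
merges adjacent packets of the same flow instead of calling
``rte_gro_reassemble_burst()``. In this sketch each call writes its
output back to the front of the slice it was given.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical per-call limit (the real one is defined by the library). */
    #define MAX_BURST_ITEMS 4

    /* Toy burst call: merges adjacent packets of the same flow in place
     * and returns how many packets remain. */
    static size_t toy_reassemble_burst(int *pkts, size_t n)
    {
        assert(n <= MAX_BURST_ITEMS);   /* the per-call limit */
        size_t out = 0;
        for (size_t i = 0; i < n; i++) {
            if (out > 0 && pkts[out - 1] == pkts[i])
                continue;               /* merged into the previous packet */
            pkts[out++] = pkts[i];
        }
        return out;
    }

    /* Split an arbitrarily large burst into chunks within the limit and
     * compact every chunk's output to the front of the array. */
    static size_t reassemble_in_chunks(int *pkts, size_t n)
    {
        size_t total_out = 0;
        for (size_t off = 0; off < n; off += MAX_BURST_ITEMS) {
            size_t len = n - off < MAX_BURST_ITEMS ? n - off : MAX_BURST_ITEMS;
            size_t got = toy_reassemble_burst(pkts + off, len);
            memmove(pkts + total_out, pkts + off, got * sizeof(*pkts));
            total_out += got;
        }
        return total_out;
    }

For example, with the flow IDs ``1 1 2 2 2 3 3 3 3 4`` and a limit of 4,
the result is ``1 2 2 3 3 4``: packets that fall into different chunks are
never merged with each other.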

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, using the heavyweight mode API is
relatively complex. Firstly, applications need to create a GRO context
with ``rte_gro_ctx_create()``, which allocates table structures on the
heap and stores their pointers in the GRO context. Secondly,
applications use ``rte_gro_reassemble()`` to merge packets. If input
packets have invalid parameters, ``rte_gro_reassemble()`` returns them
to applications; for example, packets of unsupported GRO types and TCP
SYN packets are returned. Otherwise, the input packets are either
merged with existing packets in the tables or inserted into the tables.
Finally, when applications want to get the GROed packets, they use
``rte_gro_timeout_flush()`` to flush packets from the tables.
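
The create / reassemble / flush sequence can be modeled in a few lines.
This is a toy model, not the library's API: ``struct toy_pkt``,
``toy_ctx_create()``, ``toy_reassemble()``, ``toy_flush()`` and
``toy_ctx_destroy()`` are hypothetical, packets are reduced to a flow ID,
a length and a SYN flag, merging simply adds lengths, and the flush
ignores timeouts.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    struct toy_pkt { int flow; int len; bool syn; };

    /* Toy context: a heap-allocated table with one merged packet per flow. */
    struct toy_ctx { struct toy_pkt *tbl; size_t cap, used; };

    static struct toy_ctx *toy_ctx_create(size_t cap)
    {
        struct toy_ctx *ctx = malloc(sizeof(*ctx));
        if (ctx == NULL)
            return NULL;
        ctx->tbl = calloc(cap, sizeof(*ctx->tbl));
        if (ctx->tbl == NULL) {
            free(ctx);
            return NULL;
        }
        ctx->cap = cap;
        ctx->used = 0;
        return ctx;
    }

    static void toy_ctx_destroy(struct toy_ctx *ctx)
    {
        if (ctx != NULL) {
            free(ctx->tbl);
            free(ctx);
        }
    }

    /* Merge or insert each packet; packets that cannot be handled (SYN,
     * or a full table) are moved to the front of pkts and counted. */
    static size_t toy_reassemble(struct toy_pkt *pkts, size_t n,
                                 struct toy_ctx *ctx)
    {
        size_t returned = 0;
        for (size_t i = 0; i < n; i++) {
            struct toy_pkt p = pkts[i];
            bool done = false;
            for (size_t j = 0; !p.syn && j < ctx->used; j++) {
                if (ctx->tbl[j].flow == p.flow) {
                    ctx->tbl[j].len += p.len;   /* merge with stored packet */
                    done = true;
                    break;
                }
            }
            if (!done && !p.syn && ctx->used < ctx->cap) {
                ctx->tbl[ctx->used++] = p;      /* insert into the table */
                done = true;
            }
            if (!done)
                pkts[returned++] = p;           /* hand back to the caller */
        }
        return returned;
    }

    /* Move up to max stored packets to out and return how many moved. */
    static size_t toy_flush(struct toy_ctx *ctx, struct toy_pkt *out, size_t max)
    {
        size_t n = ctx->used < max ? ctx->used : max;
        for (size_t i = 0; i < n; i++)
            out[i] = ctx->tbl[i];
        for (size_t i = n; i < ctx->used; i++)  /* keep the rest, compacted */
            ctx->tbl[i - n] = ctx->tbl[i];
        ctx->used -= n;
        return n;
    }

For example, feeding ``(flow 1, 100)``, ``(flow 1, 200)``, a SYN packet on
flow 2 and ``(flow 3, 10)`` returns the SYN packet immediately, and a
later flush yields ``(flow 1, 300)`` and ``(flow 3, 10)``.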

Note that none of the update/lookup operations on the GRO context are
thread safe. So if different processes or threads want to access the
same context object simultaneously, some external synchronization
mechanism must be used.
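
One common way to add such synchronization between the threads of one
process is to pair the context with a mutex that every access takes. The
sketch below uses POSIX ``pthread_mutex_t``; ``struct shared_state`` is a
hypothetical placeholder for a GRO context, and ``locked_add()`` and
``locked_count()`` stand in for reassembly and lookup calls. Sharing a
context between processes would additionally need a lock placed in
shared memory (for example, a mutex with the ``PTHREAD_PROCESS_SHARED``
attribute), which this sketch does not cover.

.. code-block:: c

    #include <assert.h>
    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical placeholder for a GRO context that is not thread safe. */
    struct shared_state { size_t pkt_count; };

    /* The context paired with the lock that guards every access to it. */
    struct locked_ctx {
        pthread_mutex_t lock;
        struct shared_state state;
    };

    static int locked_ctx_init(struct locked_ctx *l)
    {
        l->state.pkt_count = 0;
        return pthread_mutex_init(&l->lock, NULL);
    }

    /* Updates (standing in for a reassembly call) run with the lock held. */
    static void locked_add(struct locked_ctx *l, size_t n)
    {
        pthread_mutex_lock(&l->lock);
        l->state.pkt_count += n;
        pthread_mutex_unlock(&l->lock);
    }

    /* Lookups (standing in for a count or flush call) take the same lock. */
    static size_t locked_count(struct locked_ctx *l)
    {
        pthread_mutex_lock(&l->lock);
        size_t n = l->state.pkt_count;
        pthread_mutex_unlock(&l->lock);
        return n;
    }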

Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. This section
introduces the algorithm used by TCP/IPv4 GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm/implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require our algorithm to be:

- lightweight enough to scale to fast networking speeds

- capable of handling packet reordering

In DPDK GRO, we use a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (we call them the "key"). To process an input packet, the
algorithm first searches for a matching "flow" (i.e., one with the same
key value), then checks all packets in that "flow" and tries to find a
"neighbor" for the input packet. If a "neighbor" is found, the two
packets are merged together. If no "neighbor" is found, the packet is
stored in its "flow". If no matching "flow" is found, a new "flow" is
inserted and the packet is stored in it.

.. note::
        If packets in the same "flow" can't be merged, the cause is
        always packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" to accelerate packet aggregation is
  simple (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.
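
One possible way to connect the two arrays is to let each flow entry
hold the index of its first item and each item hold the index of the
next item in the same flow, so that a flow's packets form a linked list
inside the item array. The sketch below illustrates that layout as an
assumption; ``struct tcp4_table``, ``table_add()`` and ``flow_len()`` are
hypothetical names that need not match the library's internal
structures.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define MAX_FLOWS 4
    #define MAX_ITEMS 16
    #define INVALID_IDX UINT32_MAX

    /* Flow array entry: the key and the index of the flow's first item. */
    struct flow_entry { uint32_t key; uint32_t start_index; };

    /* Item array entry: packet data and the index of the next item of
     * the same flow (INVALID_IDX ends the chain). */
    struct item_entry { uint32_t seq; uint32_t next_idx; };

    struct tcp4_table {
        struct flow_entry flows[MAX_FLOWS];
        struct item_entry items[MAX_ITEMS];
        uint32_t nb_flows, nb_items;
    };

    /* Append a packet to the flow with this key, creating the flow if
     * needed; returns the item index or INVALID_IDX if an array is full. */
    static uint32_t table_add(struct tcp4_table *t, uint32_t key, uint32_t seq)
    {
        if (t->nb_items == MAX_ITEMS)
            return INVALID_IDX;
        uint32_t idx = t->nb_items++;
        t->items[idx] = (struct item_entry){ seq, INVALID_IDX };

        for (uint32_t f = 0; f < t->nb_flows; f++) {
            if (t->flows[f].key != key)
                continue;
            uint32_t cur = t->flows[f].start_index;
            while (t->items[cur].next_idx != INVALID_IDX)
                cur = t->items[cur].next_idx;
            t->items[cur].next_idx = idx;   /* link at the flow's tail */
            return idx;
        }
        if (t->nb_flows == MAX_FLOWS) {
            t->nb_items--;                  /* undo the item allocation */
            return INVALID_IDX;
        }
        t->flows[t->nb_flows++] = (struct flow_entry){ key, idx };
        return idx;
    }

    /* Count the items of flow f by walking its chain. */
    static uint32_t flow_len(const struct tcp4_table *t, uint32_t f)
    {
        assert(f < t->nb_flows);
        uint32_t n = 0;
        for (uint32_t cur = t->flows[f].start_index; cur != INVALID_IDX;
             cur = t->items[cur].next_idx)
            n++;
        return n;
    }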

Header fields used to define a TCP/IPv4 flow include:

- source and destination Ethernet addresses, IP addresses and TCP ports

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.
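
The flow key comparison and the flag filter can be sketched as follows.
``struct tcp4_flow_key``, ``same_flow()`` and ``tcp_flags_allow_gro()`` are
hypothetical helpers; the flag values are the standard TCP header bits
from RFC 793 and RFC 3168, and with the rule above ACK is the only flag
that may be set on a packet that is processed.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* TCP header flag bits (RFC 793 and RFC 3168). */
    enum {
        TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04, TCP_PSH = 0x08,
        TCP_ACK = 0x10, TCP_URG = 0x20, TCP_ECE = 0x40, TCP_CWR = 0x80,
    };

    /* Hypothetical flow key holding the header fields listed above. */
    struct tcp4_flow_key {
        uint8_t  eth_src[6], eth_dst[6];
        uint32_t ip_src, ip_dst;
        uint16_t port_src, port_dst;
        uint32_t ack;
    };

    /* Compare field by field (memcmp on the whole struct could read padding). */
    static bool same_flow(const struct tcp4_flow_key *a,
                          const struct tcp4_flow_key *b)
    {
        return memcmp(a->eth_src, b->eth_src, 6) == 0 &&
               memcmp(a->eth_dst, b->eth_dst, 6) == 0 &&
               a->ip_src == b->ip_src && a->ip_dst == b->ip_dst &&
               a->port_src == b->port_src && a->port_dst == b->port_dst &&
               a->ack == b->ack;
    }

    /* False if any of FIN, SYN, RST, URG, PSH, ECE or CWR is set. */
    static bool tcp_flags_allow_gro(uint8_t flags)
    {
        const uint8_t excluded = TCP_FIN | TCP_SYN | TCP_RST | TCP_URG |
                                 TCP_PSH | TCP_ECE | TCP_CWR;
        return (flags & excluded) == 0;
    }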

Header fields deciding if two packets are neighbors include:

- TCP sequence number

- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID of the later
  packet should be the IPv4 ID of the earlier packet increased by 1.

.. note::
        We comply with RFC 6864 when processing the IPv4 ID field.
        Specifically, we check the IPv4 ID fields of packets whose DF
        bit is 0 and ignore the IPv4 ID fields of packets whose DF bit
        is 1. Additionally, packets which have different values of the
        DF bit can't be merged.
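
Putting the neighbor rules together, a check for whether packet ``b`` can
be appended right after packet ``a`` might look like the sketch below.
``struct tcp4_pkt_info`` and ``can_append()`` are hypothetical; the sketch
only covers the append direction, treats ``a`` as a single packet, and
lets the 16-bit IPv4 ID wrap from 65535 to 0.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-packet fields the neighbor check needs. */
    struct tcp4_pkt_info {
        uint32_t seq;       /* TCP sequence number */
        uint16_t payload;   /* TCP payload length */
        uint16_t ip_id;     /* IPv4 ID */
        bool     df;        /* IPv4 DF bit */
    };

    /* True if b can be appended directly after a. */
    static bool can_append(const struct tcp4_pkt_info *a,
                           const struct tcp4_pkt_info *b)
    {
        if (a->df != b->df)
            return false;               /* different DF values never merge */
        if (a->seq + a->payload != b->seq)
            return false;               /* payloads must be contiguous */
        if (!a->df && (uint16_t)(a->ip_id + 1) != b->ip_id)
            return false;               /* DF == 0: ID must increase by 1 */
        return true;                    /* DF == 1: ID is ignored */
    }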