..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Generic Receive Offload Library
===============================

Generic Receive Offload (GRO) is a widely used software-based offloading
technique that reduces per-packet processing overhead. By reassembling
small packets into larger ones, GRO lets applications process fewer,
larger packets. To benefit DPDK-based applications, such as Open vSwitch,
DPDK provides its own GRO implementation as a standalone library.
Applications call the GRO library explicitly to reassemble packets.

Overview
--------

The GRO library can contain multiple GRO types, each defined by a packet
type. Each GRO type processes one kind of packet. For example, TCP/IPv4
GRO processes TCP/IPv4 packets.

Each GRO type has a reassembly function, which defines its own algorithm
and table structure to reassemble packets. Input packets are assigned to
the corresponding GRO functions according to ``MBUF->packet_type``.

The GRO library doesn't check whether input packets have correct
checksums and doesn't recalculate checksums for merged packets. When IP
fragmentation is possible (i.e., DF==0), the GRO library assumes that the
packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
requires the IPv4 ID to increase by one from one packet to the next.
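
The two IPv4 assumptions above can be written as simple header checks. The
following is a minimal sketch and not part of the GRO library: the helper
names ``ipv4_is_complete()`` and ``ipv4_id_is_next()`` and the mask macros
are hypothetical, the flags/fragment-offset field is assumed to be already
converted to host byte order, and the 16-bit wrap-around of the IPv4 ID is
an assumption of this sketch.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Layout of the 16-bit IPv4 flags/fragment-offset field (host order). */
    #define IPV4_FLAG_DF     0x4000u /* Don't Fragment */
    #define IPV4_FLAG_MF     0x2000u /* More Fragments */
    #define IPV4_OFFSET_MASK 0x1fffu /* 13-bit fragment offset */

    /* Hypothetical helper: true if MF==0 and frag_off==0. */
    static bool
    ipv4_is_complete(uint16_t frag_field)
    {
        return (frag_field & (IPV4_FLAG_MF | IPV4_OFFSET_MASK)) == 0;
    }

    /* Hypothetical helper: true if next_id is prev_id + 1 (mod 2^16). */
    static bool
    ipv4_id_is_next(uint16_t prev_id, uint16_t next_id)
    {
        return (uint16_t)(prev_id + 1) == next_id;
    }

An application that cannot guarantee these properties for its traffic could
apply such checks before handing packets to the GRO library.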

Currently, the GRO library provides GRO support for TCP/IPv4 packets.

Two Sets of API
---------------

For different usage scenarios, the GRO library provides two sets of API.
One is the lightweight mode API, which enables applications to merge a
small number of packets rapidly. The other is the heavyweight mode API,
which gives applications fine-grained control and supports merging a
large number of packets.

Lightweight Mode API
~~~~~~~~~~~~~~~~~~~~

The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
which processes N packets at a time. Merging packets with the lightweight
mode API is simple: calling ``rte_gro_reassemble_burst()`` is enough. The
GROed packets are returned to the application as soon as the function
finishes.

In ``rte_gro_reassemble_burst()``, table structures of different GRO
types are allocated on the stack. This design simplifies applications'
operations. However, because the stack size is limited, the number of
packets that ``rte_gro_reassemble_burst()`` processes in one invocation
should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.

Heavyweight Mode API
~~~~~~~~~~~~~~~~~~~~

Compared with the lightweight mode, the heavyweight mode API is
relatively complex to use. First, applications create a GRO context
with ``rte_gro_ctx_create()``, which allocates table structures on the
heap and stores their pointers in the GRO context. Second, applications
use ``rte_gro_reassemble()`` to merge packets. If input packets have
invalid parameters, ``rte_gro_reassemble()`` returns them to the
application. For example, packets of unsupported GRO types and TCP SYN
packets are returned. Otherwise, the input packets are either merged
with existing packets in the tables or inserted into the tables.
Finally, applications use ``rte_gro_timeout_flush()`` to flush packets
from the tables when they want to get the GROed packets.

Note that update and lookup operations on the GRO context are not
thread safe. If different processes or threads access the same context
object simultaneously, an external synchronization mechanism must be
used.

Reassembly Algorithm
--------------------

The reassembly algorithm is used for reassembling packets. In the GRO
library, different GRO types can use different algorithms. This section
introduces the algorithm used by TCP/IPv4 GRO.

Challenges
~~~~~~~~~~

The reassembly algorithm determines the efficiency of GRO. There are two
challenges in the algorithm design:

- a high-cost algorithm or implementation would cause packet drops in a
  high-speed network.

- packet reordering makes it hard to merge packets. For example, Linux
  GRO fails to merge packets when it encounters packet reordering.

These two challenges require the algorithm to be:

- lightweight enough to scale with fast networking speeds

- capable of handling packet reordering

DPDK GRO uses a key-based algorithm to address the two challenges.

Key-based Reassembly Algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:numref:`figure_gro-key-algorithm` illustrates the procedure of the
key-based algorithm. Packets are classified into "flows" by some header
fields (called the "key"). To process an input packet, the algorithm
first searches for a matching "flow" (i.e., one with the same key value),
then checks all packets in the "flow" and tries to find a "neighbor" for
the input packet. If a "neighbor" is found, the two packets are merged.
If no "neighbor" is found, the packet is stored in its "flow". If no
matching "flow" is found, a new "flow" is inserted and the packet is
stored in it.

.. note::
        Packets in the same "flow" that can't be merged are always the
        result of packet reordering.

The key-based algorithm has two characteristics:

- classifying packets into "flows" is a simple way to accelerate packet
  aggregation (addresses challenge 1).

- storing out-of-order packets makes it possible to merge them later
  (addresses challenge 2).

.. _figure_gro-key-algorithm:

.. figure:: img/gro-key-algorithm.*
   :align: center

   Key-based Reassembly Algorithm
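
The procedure described in this section can be sketched in plain C. This is
an illustrative model under simplifying assumptions, not the library's
implementation: a packet is reduced to a 32-bit key and a byte range, the
table sizes are arbitrary, and all names (``struct pkt``, ``struct flow``,
``struct table``, ``try_merge()``, ``gro_sketch_process()``) are
hypothetical.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_FLOWS 4 /* arbitrary sizes for the sketch */
    #define MAX_ITEMS 8

    /* Simplified packet: a flow key plus the byte range [seq, seq + len). */
    struct pkt {
        uint32_t key;
        uint32_t seq;
        uint32_t len;
    };

    struct flow {
        uint32_t key;
        int nb_items;
        struct pkt items[MAX_ITEMS];
    };

    struct table {
        int nb_flows;
        struct flow flows[MAX_FLOWS];
    };

    /* Look for a stored neighbor of p in flow f and merge p into it. */
    static bool
    try_merge(struct flow *f, const struct pkt *p)
    {
        for (int i = 0; i < f->nb_items; i++) {
            struct pkt *it = &f->items[i];

            if (it->seq + it->len == p->seq) { /* p directly follows it */
                it->len += p->len;
                return true;
            }
            if (p->seq + p->len == it->seq) { /* p directly precedes it */
                it->seq = p->seq;
                it->len += p->len;
                return true;
            }
        }
        return false;
    }

    /* Returns 1 if p was merged, 0 if it was stored, -1 if the table is full. */
    static int
    gro_sketch_process(struct table *t, const struct pkt *p)
    {
        struct flow *f = NULL;

        /* Step 1: search for a flow with the same key. */
        for (int i = 0; i < t->nb_flows; i++) {
            if (t->flows[i].key == p->key) {
                f = &t->flows[i];
                break;
            }
        }
        if (f == NULL) {
            /* No matching flow: insert a new one. */
            if (t->nb_flows == MAX_FLOWS)
                return -1;
            f = &t->flows[t->nb_flows++];
            f->key = p->key;
            f->nb_items = 0;
        } else if (try_merge(f, p)) {
            /* Step 2: a neighbor was found and merged. */
            return 1;
        }
        /* Step 3: no neighbor, so store the packet in its flow. */
        if (f->nb_items == MAX_ITEMS)
            return -1;
        f->items[f->nb_items++] = *p;
        return 0;
    }

In this sketch an out-of-order packet is stored and can absorb a later
packet that fills the gap, but two stored items that become adjacent after
a merge are not coalesced with each other.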

TCP/IPv4 GRO
------------

The table structure used by TCP/IPv4 GRO contains two arrays: a flow
array and an item array. The flow array keeps flow information, and the
item array keeps packet information.

The header fields used to define a TCP/IPv4 flow include:

- source and destination: Ethernet and IP addresses, TCP ports

- TCP acknowledgment number

TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
won't be processed.
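
The flag rule above amounts to a single mask test on the TCP flags byte.
The following is a minimal sketch; the macro names and the helper
``tcp_flags_allow_gro()`` are hypothetical, while the bit values are the
standard TCP header flag positions.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Standard TCP header flag bits. */
    #define TCP_FLAG_FIN 0x01u
    #define TCP_FLAG_SYN 0x02u
    #define TCP_FLAG_RST 0x04u
    #define TCP_FLAG_PSH 0x08u
    #define TCP_FLAG_ACK 0x10u
    #define TCP_FLAG_URG 0x20u
    #define TCP_FLAG_ECE 0x40u
    #define TCP_FLAG_CWR 0x80u

    /* Flags that exclude a packet from TCP/IPv4 GRO (everything but ACK). */
    #define TCP_GRO_EXCLUDED_FLAGS \
        (TCP_FLAG_FIN | TCP_FLAG_SYN | TCP_FLAG_RST | TCP_FLAG_PSH | \
         TCP_FLAG_URG | TCP_FLAG_ECE | TCP_FLAG_CWR)

    /* Hypothetical helper: true if none of the excluded flags is set. */
    static bool
    tcp_flags_allow_gro(uint8_t tcp_flags)
    {
        return (tcp_flags & TCP_GRO_EXCLUDED_FLAGS) == 0;
    }

With this test, a pure ACK segment (flags ``0x10``) is eligible, while a
SYN-ACK (``0x12``) or a PSH-ACK (``0x18``) is not.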

The header fields that decide whether two packets are neighbors include:

- TCP sequence number

- IPv4 ID. The IPv4 ID of each packet should be one greater than that of
  the preceding packet.
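
The flow definition and the neighbor test can be illustrated with two small
structures. This is a sketch under assumptions, not the library's actual
data layout: the names ``struct tcp4_flow_key``, ``struct tcp4_item``,
``key_equal()`` and ``neighbor_check()`` are hypothetical, all fields are
assumed to be in host byte order, each item is assumed to describe a single
unmerged packet, and the IPv4 ID is assumed to wrap around at 16 bits.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Header fields that define a TCP/IPv4 flow. */
    struct tcp4_flow_key {
        uint8_t eth_src[6];
        uint8_t eth_dst[6];
        uint32_t ip_src;
        uint32_t ip_dst;
        uint16_t port_src;
        uint16_t port_dst;
        uint32_t tcp_ack;
    };

    /* Per-packet fields used to decide whether two packets are neighbors. */
    struct tcp4_item {
        uint32_t seq;         /* TCP sequence number */
        uint16_t payload_len; /* TCP payload length in bytes */
        uint16_t ip_id;       /* IPv4 ID */
    };

    /* Compare field by field so struct padding never affects the result. */
    static bool
    key_equal(const struct tcp4_flow_key *a, const struct tcp4_flow_key *b)
    {
        return memcmp(a->eth_src, b->eth_src, sizeof(a->eth_src)) == 0 &&
               memcmp(a->eth_dst, b->eth_dst, sizeof(a->eth_dst)) == 0 &&
               a->ip_src == b->ip_src && a->ip_dst == b->ip_dst &&
               a->port_src == b->port_src && a->port_dst == b->port_dst &&
               a->tcp_ack == b->tcp_ack;
    }

    /*
     * Returns 1 if pkt directly follows item, -1 if pkt directly precedes
     * item, and 0 if the two packets are not neighbors.
     */
    static int
    neighbor_check(const struct tcp4_item *item, const struct tcp4_item *pkt)
    {
        if (pkt->seq == item->seq + item->payload_len &&
            pkt->ip_id == (uint16_t)(item->ip_id + 1))
            return 1;
        if (item->seq == pkt->seq + pkt->payload_len &&
            item->ip_id == (uint16_t)(pkt->ip_id + 1))
            return -1;
        return 0;
    }

Two packets are treated as neighbors only when both the sequence numbers
and the IPv4 IDs are consecutive; a matching sequence number with a gap in
the IPv4 ID yields 0.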