..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017 Intel Corporation.

Generic Segmentation Offload Library
====================================

Overview
--------
Generic Segmentation Offload (GSO) is a widely used software implementation of
TCP Segmentation Offload (TSO), which reduces per-packet processing overhead.
Much like TSO, GSO gains performance by enabling upper layer applications to
process a small number of large packets (e.g. MTU size of 64KB) instead of a
large number of small packets (e.g. MTU size of 1500B).

For example, GSO allows guest kernel stacks to transmit over-sized TCP segments
that far exceed the kernel interface's MTU; this eliminates the need to segment
packets within the guest, and improves the data-to-overhead ratio of both the
guest-host link and the PCI bus. The expectation of the guest network stack in
this scenario is that segmentation of egress frames will take place in the NIC
HW or, where that hardware capability is unavailable, in the host application
or network stack.

Bearing that in mind, the GSO library enables DPDK applications to segment
packets in software. Note, however, that GSO is implemented as a standalone
library, and not via a 'fallback' mechanism (i.e. for when TSO is unsupported
in the underlying hardware); that is, applications must explicitly invoke the
GSO library to segment packets. The size of GSO segments ``(segsz)`` is
configurable by the application.

Limitations
-----------

#. The GSO library doesn't check if input packets have correct checksums.

#. In addition, the GSO library doesn't re-calculate checksums for segmented
   packets (that task is left to the application).

#. IP fragments are unsupported by the GSO library.

#. The egress interface's driver must support multi-segment packets.

#. Currently, the GSO library supports the following IPv4 packet types:

   - TCP
   - VxLAN
   - GRE

   See `Supported GSO Packet Types`_ for further details.

Packet Segmentation
-------------------

The ``rte_gso_segment()`` function is the GSO library's primary
segmentation API.

Before performing segmentation, an application must create a GSO context object
``(struct rte_gso_ctx)``, which provides the library with some of the
information required to understand how the packet should be segmented. Refer to
`How to Segment a Packet`_ for further details. Once the GSO context has been
created and populated, the application can use the ``rte_gso_segment()``
function to segment packets.

The GSO library typically stores each segment that it creates in two parts: the
first part contains a copy of the original packet's headers, while the second
part contains a pointer to an offset within the original packet. This mechanism
is explained in more detail in `GSO Output Segment Format`_.

The GSO library supports both single- and multi-segment input mbufs.

GSO Output Segment Format
~~~~~~~~~~~~~~~~~~~~~~~~~
To reduce the number of expensive memcpy operations required when segmenting a
packet, the GSO library typically stores each segment that it creates as a
two-part mbuf (technically, this is termed a 'two-segment' mbuf; however, since
the elements produced by the API are also called 'segments', for clarity the
term 'part' is used here instead).

The first part of each output segment is a direct mbuf that contains a copy of
the original packet's headers; these headers must be prepended to every output
segment.

The second part of each output segment represents a section of data from the
original packet, i.e. a data segment. Rather than copy the data directly from
the original packet into the output segment (which would impact performance
considerably), the second part of each output segment is an indirect mbuf,
which contains no actual data, but simply points to an offset within the
original packet.

The combination of the 'header' segment and the 'data' segment constitutes a
single logical output GSO segment of the original packet. This is illustrated
in :numref:`figure_gso-output-segment-format`.

.. _figure_gso-output-segment-format:

.. figure:: img/gso-output-segment-format.*
   :align: center

   Two-part GSO output segment

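As a rough illustration of the two-part layout, the following C sketch models
an output segment as a private copy of the headers plus a pointer into the
original packet. It is not library code: ``struct sketch_out_seg``,
``build_out_seg()`` and ``SKETCH_MAX_HDR`` are hypothetical names, the input
packet is assumed to be one contiguous buffer, and ``segsz`` is treated here as
the number of payload bytes per segment. Real output segments are mbuf chains
rather than plain structs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_HDR 128 /* arbitrary bound chosen for this sketch */

/* Hypothetical model of a two-part output segment. */
struct sketch_out_seg {
	uint8_t hdr[SKETCH_MAX_HDR]; /* first part: copied headers */
	uint16_t hdr_len;
	const uint8_t *data;         /* second part: points into the original */
	uint16_t data_len;
};

/*
 * Describe output segment number idx of a contiguous packet.
 * Returns 0 on success, or -1 if idx lies past the end of the payload.
 */
static int
build_out_seg(const uint8_t *pkt, uint32_t pkt_len, uint16_t hdr_len,
	      uint16_t segsz, uint32_t idx, struct sketch_out_seg *out)
{
	assert(hdr_len <= SKETCH_MAX_HDR && hdr_len <= pkt_len && segsz > 0);

	uint32_t payload_len = pkt_len - hdr_len;
	uint64_t off = (uint64_t)idx * segsz;

	if (off >= payload_len)
		return -1;

	uint32_t left = payload_len - (uint32_t)off;

	/* Only the headers are copied; the payload is referenced in place. */
	memcpy(out->hdr, pkt, hdr_len);
	out->hdr_len = hdr_len;
	out->data = pkt + hdr_len + off;
	out->data_len = (uint16_t)(left < segsz ? left : segsz);
	return 0;
}
```

The only copy performed per segment is that of the headers, which is the point
of the two-part design: the cost of producing a segment does not grow with the
amount of payload it carries.
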
In one situation, the output segment may contain additional 'data' segments.
This only occurs when both of the following conditions hold:

- the input packet on which GSO is to be performed is represented by a
  multi-segment mbuf.

- the output segment is required to contain data that spans the boundaries
  between segments of the input multi-segment mbuf.

The GSO library traverses each segment of the input packet, and produces
numerous output segments; for optimal performance, the number of output
segments is kept to a minimum. Consequently, the GSO library maximizes the
amount of data contained within each output segment; i.e. each output segment
contains ``segsz`` bytes of data. The only exception to this is the very final
output segment; if ``pkt_len`` % ``segsz`` is non-zero, then the final segment
is smaller than the rest.

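The arithmetic behind this rule can be sketched in a few lines of C. The
helper below is hypothetical (``plan_segments()`` and ``struct seg_plan`` are
not part of the GSO library); it takes the number of data bytes to be
segmented and ``segsz`` as used in the paragraph above, and reports how many
output segments result and how much data the last one carries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the GSO library. */
struct seg_plan {
	uint32_t nb_segs;      /* number of output segments */
	uint32_t last_seg_len; /* data bytes in the final output segment */
};

static struct seg_plan
plan_segments(uint32_t data_len, uint16_t segsz)
{
	struct seg_plan plan;

	assert(segsz > 0);

	uint32_t rem = data_len % segsz;

	/* Every segment is full except possibly the last one. */
	plan.nb_segs = data_len / segsz + (rem != 0);
	if (rem != 0)
		plan.last_seg_len = rem;
	else
		plan.last_seg_len = (data_len != 0) ? segsz : 0;
	return plan;
}
```

For instance, 4000 bytes of data with a ``segsz`` of 1460 yield three output
segments, the last of which carries the remaining 1080 bytes.
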
In order for an output segment to meet its MSS, it may need to include data from
multiple input segments. Due to the nature of indirect mbufs (each indirect mbuf
can point to only one direct mbuf), the solution here is to add another indirect
mbuf to the output segment; this additional segment then points to the next
input segment. If necessary, this chaining process is repeated, until the sum of
all of the data 'contained' in the output segment reaches ``segsz``. This
ensures that the amount of data contained within each output segment is uniform,
with the possible exception of the last segment, as previously described.

:numref:`figure_gso-three-seg-mbuf` illustrates an example of a three-part
output segment. In this example, the output segment needs to include data from
the end of one input segment, and the beginning of another. To achieve this,
an additional indirect mbuf is chained to the second part of the output segment,
and is attached to the next input segment (i.e. it points to the data in the
next input segment).

.. _figure_gso-three-seg-mbuf:

.. figure:: img/gso-three-seg-mbuf.*
   :align: center

   Three-part GSO output segment

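The chaining rule can be modelled without any mbufs at all. The following C
sketch is a simplified model under stated assumptions, not the library's
implementation: ``count_indirect_parts()`` is a hypothetical function,
``in_len[i]`` is assumed to be the number of payload bytes held by the i-th
segment of the input mbuf chain, and headers are ignored. For each output
segment it counts how many indirect mbufs would be needed, i.e. how many input
segments that output segment draws data from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: fill parts[k] with the number of indirect mbufs
 * needed by output segment k. Returns the number of output segments,
 * or -1 if parts[] (max_out entries) is too small.
 */
static int
count_indirect_parts(const uint16_t *in_len, size_t nb_in, uint16_t segsz,
		     uint16_t *parts, size_t max_out)
{
	size_t out = 0;      /* output segment currently being filled */
	uint32_t filled = 0; /* data bytes already placed in it */

	assert(segsz > 0);

	for (size_t i = 0; i < nb_in; i++) {
		uint32_t left = in_len[i]; /* unconsumed bytes of input i */

		while (left > 0) {
			if (out == max_out)
				return -1;
			if (filled == 0)
				parts[out] = 0;

			uint32_t room = (uint32_t)segsz - filled;
			uint32_t take = left < room ? left : room;

			/* One indirect mbuf per input segment touched. */
			parts[out]++;
			filled += take;
			left -= take;

			if (filled == segsz) {
				out++;
				filled = 0;
			}
		}
	}
	if (filled > 0)
		out++; /* final, shorter output segment */
	return (int)out;
}
```

With two input segments of 2000 bytes each and a ``segsz`` of 1500, the middle
output segment draws 500 bytes from the first input segment and 1000 bytes
from the second, so it needs two indirect mbufs. Together with its header mbuf
that makes a three-part output segment of the kind shown in the figure.
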
Supported GSO Packet Types
--------------------------

TCP/IPv4 GSO
~~~~~~~~~~~~
TCP/IPv4 GSO supports segmentation of suitably large TCP/IPv4 packets, which
may also contain an optional VLAN tag.

VxLAN GSO
~~~~~~~~~
VxLAN GSO supports segmentation of suitably large VxLAN packets,
which contain an outer IPv4 header, inner TCP/IPv4 headers, and optional
inner and/or outer VLAN tag(s).

GRE GSO
~~~~~~~
GRE GSO supports segmentation of suitably large GRE packets, which contain
an outer IPv4 header, inner TCP/IPv4 headers, and an optional VLAN tag.

How to Segment a Packet
-----------------------

To segment an outgoing packet, an application must:

#. First create a GSO context ``(struct rte_gso_ctx)``; this contains:

   - a pointer to the mbuf pool for allocating the direct buffers, which are
     used to store the GSO segments' packet headers.

   - a pointer to the mbuf pool for allocating indirect buffers, which are
     used to locate GSO segments' packet payloads.

     .. note::

       An application may use the same pool for both direct and indirect
       buffers. However, since indirect mbufs simply store a pointer, the
       application may reduce its memory consumption by creating a separate memory
       pool, containing smaller elements, for the indirect pool.

   - the size of each output segment, including packet headers and payload,
     measured in bytes.

   - the bit mask of required GSO types. The GSO library uses the same macros as
     those that describe a physical device's TX offloading capabilities (i.e.
     ``DEV_TX_OFFLOAD_*_TSO``) for gso_types. For example, if an application
     wants to segment TCP/IPv4 packets, it should set gso_types to
     ``DEV_TX_OFFLOAD_TCP_TSO``. The only other values currently supported
     for gso_types are ``DEV_TX_OFFLOAD_VXLAN_TNL_TSO`` and
     ``DEV_TX_OFFLOAD_GRE_TNL_TSO``; a combination of these macros is also
     allowed.

   - a flag that indicates whether the IPv4 headers of output segments should
     contain fixed or incremental ID values.

#. Set the appropriate ol_flags in the mbuf.

   - The GSO library uses the value of an mbuf's ``ol_flags`` attribute
     to determine how a packet should be segmented. It is the application's
     responsibility to ensure that these flags are set.

   - For example, in order to segment TCP/IPv4 packets, the application should
     add the ``PKT_TX_IPV4`` and ``PKT_TX_TCP_SEG`` flags to the mbuf's
     ol_flags.

   - If checksum calculation in hardware is required, the application should
     also add the ``PKT_TX_TCP_CKSUM`` and ``PKT_TX_IP_CKSUM`` flags.

#. Check if the packet should be processed. Packets with one of the
   following properties are not processed and are returned immediately:

   - Packet length is less than ``segsz`` (i.e. GSO is not required).

   - Packet type is not supported by GSO library (see
     `Supported GSO Packet Types`_).

   - Application has not enabled GSO support for the packet type.

   - Packet's ol_flags have been incorrectly set.

#. Allocate space in which to store the output GSO segments. If the amount of
   space allocated by the application is insufficient, segmentation will fail.

#. Invoke the GSO segmentation API, ``rte_gso_segment()``.

#. If required, update the L3 and L4 checksums of the newly-created segments.
   For tunneled packets, the outer IPv4 headers' checksums should also be
   updated. Alternatively, the application may offload checksum calculation
   to HW.

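The check described in the list above (whether a packet is processed or
returned untouched) can be modelled as follows for the TCP/IPv4 case. This is
a sketch only: the struct and macro names are placeholders invented for
illustration, standing in for ``struct rte_gso_ctx``, the mbuf, and the
``DEV_TX_OFFLOAD_*_TSO`` and ``PKT_TX_*`` macros, whose real definitions and
values come from the DPDK headers. Tunnel types and inspection of the packet
headers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only, not the DPDK values. */
#define SKETCH_GSO_TCP    (1u << 0)   /* stands in for DEV_TX_OFFLOAD_TCP_TSO */
#define SKETCH_GSO_VXLAN  (1u << 1)   /* ... DEV_TX_OFFLOAD_VXLAN_TNL_TSO */
#define SKETCH_GSO_GRE    (1u << 2)   /* ... DEV_TX_OFFLOAD_GRE_TNL_TSO */
#define SKETCH_TX_IPV4    (1ull << 0) /* stands in for PKT_TX_IPV4 */
#define SKETCH_TX_TCP_SEG (1ull << 1) /* stands in for PKT_TX_TCP_SEG */

/* Hypothetical stand-in for the relevant GSO context fields. */
struct sketch_gso_ctx {
	uint32_t gso_types; /* bit mask of GSO types the application enabled */
	uint16_t segsz;     /* output segment size in bytes, headers included */
};

/* Hypothetical stand-in for the relevant mbuf fields. */
struct sketch_pkt {
	uint64_t ol_flags;
	uint32_t pkt_len;
};

/* Returns true if the packet would be segmented as TCP/IPv4. */
static bool
needs_tcp4_gso(const struct sketch_pkt *pkt, const struct sketch_gso_ctx *ctx)
{
	const uint64_t required = SKETCH_TX_IPV4 | SKETCH_TX_TCP_SEG;

	if (pkt->pkt_len < ctx->segsz)
		return false; /* small enough already, GSO not required */
	if ((ctx->gso_types & SKETCH_GSO_TCP) == 0)
		return false; /* application did not enable this GSO type */
	/* Both flags must be present for the packet to be segmented. */
	return (pkt->ol_flags & required) == required;
}
```

A packet that fails any of these tests is simply passed through, which is why
an application can hand every outgoing packet to the segmentation API without
filtering first, provided the context and the ``ol_flags`` are set up as
described.
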