..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Packet (Mbuf) Library
=====================

The Packet (Mbuf) library provides the ability to allocate and free buffers (mbufs)
that may be used by the DPDK application to store message buffers.
The message buffers are stored in a mempool, using the :doc:`mempool_lib`.

An rte_mbuf struct generally carries network packet buffers, but it can actually
hold any data (control data, events, ...).
The rte_mbuf header structure is kept as small as possible and currently uses
just two cache lines, with the most frequently used fields being on the first
of the two cache lines.

Design of Packet Buffers
------------------------

For the storage of the packet data (including protocol headers), two approaches were considered:

#.  Embed the metadata structure within a single memory buffer, followed by a fixed-size area for the packet data.

#.  Use separate memory buffers for the metadata structure and for the packet data.

The advantage of the first method is that it only needs one operation to allocate/free the whole memory representation of a packet.
On the other hand, the second method is more flexible and allows
the complete separation of the allocation of metadata structures from the allocation of packet data buffers.

The first method was chosen for the DPDK.
The metadata contains control information such as message type, length,
offset to the start of the data and a pointer for additional mbuf structures allowing buffer chaining.

Message buffers that are used to carry network packets can handle buffer chaining
where multiple buffers are required to hold the complete packet.
This is the case for jumbo frames that are composed of many mbufs linked together through their next field.

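
As a rough illustration of buffer chaining, the following stand-alone sketch models a packet
spread over several segments linked through a ``next`` pointer.
The ``toy_seg`` structure and the ``toy_chain_len()`` and ``toy_chain_nb_segs()`` helpers are
hypothetical names used only for this illustration; they are a simplified model,
not the actual rte_mbuf layout or a DPDK API.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical, simplified stand-in for one packet segment (not rte_mbuf). */
   struct toy_seg {
       struct toy_seg *next; /* next segment of the packet, NULL for the last one */
       uint16_t data_len;    /* bytes of packet data held in this segment */
   };

   /* Walk the chain from its first segment and return the total packet length. */
   uint32_t
   toy_chain_len(const struct toy_seg *s)
   {
       uint32_t total = 0;

       for (; s != NULL; s = s->next)
           total += s->data_len;
       return total;
   }

   /* Count the segments that make up the packet. */
   unsigned int
   toy_chain_nb_segs(const struct toy_seg *s)
   {
       unsigned int n = 0;

       for (; s != NULL; s = s->next)
           n++;
       return n;
   }

In this model, a jumbo frame held in two full segments and one partial segment is simply
three ``toy_seg`` structures whose ``next`` pointers form a NULL-terminated list.
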

For a newly allocated mbuf, the area at which the data begins in the message buffer is
RTE_PKTMBUF_HEADROOM bytes after the beginning of the buffer, which is cache aligned.
Message buffers may be used to carry control information, packets, events,
and so on between different entities in the system.
Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.

:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.

.. _figure_mbuf1:

.. figure:: img/mbuf1.*

   An mbuf with One Segment


.. _figure_mbuf2:

.. figure:: img/mbuf2.*

   An mbuf with Three Segments


The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

Buffers Stored in Memory Pools
------------------------------

The Buffer Manager uses the :doc:`mempool_lib` to allocate buffers.
Therefore, it ensures that the packet header is interleaved optimally across the channels and ranks for L3 processing.
An mbuf contains a field indicating the pool that it originated from.
When calling rte_pktmbuf_free(m), the mbuf returns to its original pool.

Constructors
------------

Packet mbuf constructors are provided by the API.
The rte_pktmbuf_init() function initializes some fields in the mbuf structure that
are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
This function is given as a callback function to the rte_mempool_create() function at pool creation time.

Allocating and Freeing mbufs
----------------------------

Allocating a new mbuf requires the user to specify the mempool from which the mbuf should be taken.
A newly allocated mbuf contains one segment, with a length of 0.
The offset to data is initialized to have some bytes of headroom in the buffer (RTE_PKTMBUF_HEADROOM).

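
The initial state described above can be sketched with a small stand-alone model.
``toy_mbuf``, ``TOY_HEADROOM``, ``TOY_BUF_LEN`` and the helper functions are hypothetical names
chosen for this illustration, and the sizes are assumed values;
the library itself uses RTE_PKTMBUF_HEADROOM and its own structure layout.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define TOY_HEADROOM 128  /* assumed value, standing in for RTE_PKTMBUF_HEADROOM */
   #define TOY_BUF_LEN  2048 /* assumed size of the data buffer */

   /* Hypothetical, simplified model of an mbuf (not the real rte_mbuf). */
   struct toy_mbuf {
       uint8_t buf[TOY_BUF_LEN]; /* data buffer that follows the metadata */
       uint16_t data_off;        /* offset of the first data byte in buf */
       uint16_t data_len;        /* bytes of data in this segment */
       uint16_t nb_segs;         /* number of segments in the chain */
       struct toy_mbuf *next;    /* next segment, NULL if none */
   };

   /* Put the mbuf in the state a newly allocated mbuf is described to have. */
   void
   toy_mbuf_reset(struct toy_mbuf *m)
   {
       m->data_off = TOY_HEADROOM; /* leave room to prepend headers later */
       m->data_len = 0;            /* no data yet */
       m->nb_segs = 1;             /* a single segment */
       m->next = NULL;
   }

   /* Pointer to the start of the data area. */
   uint8_t *
   toy_mbuf_data(struct toy_mbuf *m)
   {
       return m->buf + m->data_off;
   }

   /* Free bytes in front of the data. */
   uint16_t
   toy_mbuf_headroom(const struct toy_mbuf *m)
   {
       return m->data_off;
   }

   /* Free bytes behind the data. */
   uint16_t
   toy_mbuf_tailroom(const struct toy_mbuf *m)
   {
       return (uint16_t)(TOY_BUF_LEN - m->data_off - m->data_len);
   }

With these assumed sizes, a freshly reset buffer has 128 bytes of headroom and 1920 bytes of tailroom.
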

Freeing an mbuf means returning it to its original mempool.
The content of an mbuf is not modified when it is stored in a pool (as a free mbuf).
Fields initialized by the constructor do not need to be re-initialized at mbuf allocation.

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.

Manipulating mbufs
------------------

This library provides some functions for manipulating the data in a packet mbuf. For instance:

    *  Get data length

    *  Get a pointer to the start of data

    *  Prepend data before the existing data

    *  Append data after the existing data

    *  Remove data at the beginning of the buffer (rte_pktmbuf_adj())

    *  Remove data at the end of the buffer (rte_pktmbuf_trim())

Refer to the *DPDK API Reference* for details.

.. _mbuf_meta:

Meta Information
----------------

Some information is retrieved by the network driver and stored in an mbuf to make processing easier.
For instance, the VLAN, the RSS hash result
and a flag indicating that the checksum was computed by hardware.

An mbuf also contains the input port (where it comes from), and the number of segment mbufs in the chain.

For chained buffers, only the first mbuf of the chain stores this meta information.

For instance, this is the case on the RX side for the IEEE1588 packet
timestamp mechanism, the VLAN tagging and the IP checksum computation.

On the TX side, it is also possible for an application to delegate some
processing to the hardware if it supports it. For instance, the
RTE_MBUF_F_TX_IP_CKSUM flag allows the application to offload the computation of the IPv4
checksum.


The following examples explain how to configure different TX offloads on
a vxlan-encapsulated tcp packet:
``out_eth/out_ip/out_udp/vxlan/in_eth/in_ip/in_tcp/payload``

- calculate checksum of out_ip::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
    set out_ip checksum to 0 in the packet

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM.

- calculate checksum of out_ip and out_udp::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM
    set out_ip checksum to 0 in the packet
    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM
  and RTE_ETH_TX_OFFLOAD_UDP_CKSUM.

- calculate checksum of in_ip::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
    set in_ip checksum to 0 in the packet

  This is similar to case 1), but l2_len is different. It is supported
  on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of in_ip and in_tcp::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is similar to case 2), but l2_len is different. It is supported
  on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM and
  RTE_ETH_TX_OFFLOAD_TCP_CKSUM.
  Note that it can only work if outer L4 checksum is 0.


- segment inner TCP::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->l4_len = len(in_tcp)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM |
      RTE_MBUF_F_TX_TCP_SEG
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header without including the IP
      payload length using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_TCP_TSO.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of out_ip, in_ip, in_tcp::

    mb->outer_l2_len = len(out_eth)
    mb->outer_l3_len = len(out_ip)
    mb->l2_len = len(out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IP_CKSUM |
      RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM
    set out_ip checksum to 0 in the packet
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM,
  RTE_ETH_TX_OFFLOAD_UDP_CKSUM and RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM.

The list of flags and their precise meanings are described in the mbuf API
documentation (rte_mbuf.h). Also refer to the testpmd source code
(specifically the csumonly.c file) for details.

Dynamic fields and flags
~~~~~~~~~~~~~~~~~~~~~~~~

The size of the mbuf is constrained,
while the amount of metadata that could be saved for each packet is practically unlimited.
The most basic networking information already has its place
in the existing mbuf fields and flags.

If new features need to be added, the new fields and flags should fit
in the "dynamic space", by registering some room in the mbuf structure:

dynamic field
  named area in the mbuf structure,
  with a given size (at least 1 byte) and alignment constraint.


dynamic flag
  named bit in the mbuf structure,
  stored in the field ``ol_flags``.

The dynamic fields and flags are managed with the functions ``rte_mbuf_dyn*``.

It is not possible to unregister fields or flags.

.. _direct_indirect_buffer:

Direct and Indirect Buffers
---------------------------

A direct buffer is a buffer that is completely separate and self-contained.
An indirect buffer behaves like a direct buffer, except that the buffer pointer and
data offset in it refer to data in another direct buffer.
This is useful in situations where packets need to be duplicated or fragmented,
since indirect buffers provide the means to reuse the same packet data across multiple buffers.

A buffer becomes indirect when it is "attached" to a direct buffer using the rte_pktmbuf_attach() function.
Each buffer has a reference counter field and whenever an indirect buffer is attached to the direct buffer,
the reference counter on the direct buffer is incremented.
Similarly, whenever the indirect buffer is detached, the reference counter on the direct buffer is decremented.
If the resulting reference counter is equal to 0, the direct buffer is freed since it is no longer in use.

There are a few things to remember when dealing with indirect buffers.
First of all, an indirect buffer is never attached to another indirect buffer.
Attempting to attach buffer A to an indirect buffer B that is attached to C makes rte_pktmbuf_attach()
automatically attach A to C, effectively cloning B.
Secondly, for a buffer to become indirect, its reference counter must be equal to 1,
that is, it must not be already referenced by another indirect buffer.
Finally, it is not possible to reattach an indirect buffer to the direct buffer (unless it is detached first).

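
The attach and detach rules above can be modelled roughly as follows.
This is a stand-alone sketch under simplifying assumptions: ``toy_buf``, ``toy_init()``,
``toy_attach()`` and ``toy_detach()`` are hypothetical names, and returning -1 on a rule violation
is a choice made for this illustration only.
It is not the implementation or the signature of rte_pktmbuf_attach() or rte_pktmbuf_detach().

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical, simplified model of a buffer (not the real rte_mbuf). */
   struct toy_buf {
       uint8_t *buf_addr;        /* start of the data this buffer refers to */
       uint16_t refcnt;          /* reference counter */
       struct toy_buf *attached; /* direct buffer referred to, NULL if direct */
       uint8_t storage[64];      /* own data area, used only while direct */
   };

   /* Make 'b' a direct buffer referenced only by itself. */
   void
   toy_init(struct toy_buf *b)
   {
       b->buf_addr = b->storage;
       b->refcnt = 1;
       b->attached = NULL;
   }

   /*
    * Attach 'mi' to 'm'. If 'm' is itself indirect, 'mi' is attached to the
    * direct buffer behind 'm', so an indirect buffer never refers to another
    * indirect buffer. Returns 0 on success, -1 if 'mi' may not become indirect.
    */
   int
   toy_attach(struct toy_buf *mi, struct toy_buf *m)
   {
       struct toy_buf *md = (m->attached != NULL) ? m->attached : m;

       /* 'mi' must be direct, unreferenced by others, and distinct from 'md'. */
       if (mi->attached != NULL || mi->refcnt != 1 || mi == md)
           return -1;

       md->refcnt++;
       mi->buf_addr = md->buf_addr;
       mi->attached = md;
       return 0;
   }

   /*
    * Detach 'mi' from its direct buffer and return the direct buffer's new
    * reference count (0 means it is no longer in use), or -1 if 'mi' is direct.
    */
   int
   toy_detach(struct toy_buf *mi)
   {
       struct toy_buf *md = mi->attached;

       if (md == NULL)
           return -1;

       mi->buf_addr = mi->storage;
       mi->attached = NULL;
       md->refcnt--;
       return md->refcnt;
   }

In this model, attaching A to B while B is attached to C leaves A referring to C and raises the
counter of C only, which mirrors the "effectively cloning B" behaviour described above.
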

While the attach/detach operations can be invoked directly using the rte_pktmbuf_attach() and rte_pktmbuf_detach() functions,
it is suggested to use the higher-level rte_pktmbuf_clone() function,
which takes care of the correct initialization of an indirect buffer and can clone buffers with multiple segments.

Since indirect buffers are not supposed to actually hold any data,
the memory pool for indirect buffers should be configured to reflect the reduced memory consumption.
Examples of the initialization of a memory pool for indirect buffers (as well as use case examples for indirect buffers)
can be found in several of the sample applications, for example, the IPv4 Multicast sample application.

Debug
-----

In debug mode, the functions of the mbuf library perform sanity checks before any operation
(such as checks for buffer corruption, bad type, and so on).

Use Cases
---------

All networking applications should use mbufs to transport network packets.