..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _Mbuf_Library:

Mbuf Library
============

The mbuf library provides the ability to allocate and free buffers (mbufs)
that may be used by the DPDK application to store message buffers.
The message buffers are stored in a mempool, using the :ref:`Mempool Library <Mempool_Library>`.
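
The allocate/free cycle described above can be pictured with a much simpler
structure than the real mempool. The following C sketch is a hypothetical
model, not DPDK code: struct toy_pool, struct toy_mbuf and the toy_* functions
are invented for illustration, the sizes are arbitrary, and the actual Mempool
Library is built on a ring with optional per-lcore caches rather than on a
single linked free list.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define TOY_POOL_SIZE 4     /* hypothetical pool capacity */
    #define TOY_BUF_SIZE  2048  /* hypothetical data room per buffer */

    struct toy_pool;

    /* Hypothetical, heavily simplified stand-in for an mbuf. */
    struct toy_mbuf {
        struct toy_pool *pool;       /* pool this buffer originated from */
        struct toy_mbuf *next_free;  /* free-list link, used only while free */
        unsigned char data[TOY_BUF_SIZE];
    };

    /* Hypothetical fixed-size pool: every buffer is preallocated. */
    struct toy_pool {
        struct toy_mbuf elems[TOY_POOL_SIZE];
        struct toy_mbuf *free_list;
        unsigned int avail;
    };

    /* Build the free list and record each buffer's origin pool once,
     * similar in spirit to a constructor run at pool creation time. */
    static void toy_pool_init(struct toy_pool *p)
    {
        p->free_list = NULL;
        p->avail = 0;
        for (int i = TOY_POOL_SIZE - 1; i >= 0; i--) {
            p->elems[i].pool = p;
            p->elems[i].next_free = p->free_list;
            p->free_list = &p->elems[i];
            p->avail++;
        }
    }

    /* Take a buffer from the given pool; NULL if the pool is empty. */
    static struct toy_mbuf *toy_alloc(struct toy_pool *p)
    {
        struct toy_mbuf *m = p->free_list;

        if (m == NULL)
            return NULL;
        p->free_list = m->next_free;
        m->next_free = NULL;
        p->avail--;
        return m;
    }

    /* Return a buffer to the pool recorded in the buffer itself. */
    static void toy_free(struct toy_mbuf *m)
    {
        struct toy_pool *p = m->pool;

        assert(p != NULL);
        m->next_free = p->free_list;
        p->free_list = m;
        p->avail++;
    }

Because each buffer records the pool it came from, toy_free() takes only the
buffer, in the same way that rte_pktmbuf_free(m) needs no pool argument.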

An rte_mbuf struct can carry network packet buffers
or generic control buffers (indicated by the CTRL_MBUF_FLAG).
This can be extended to other types.
The rte_mbuf header structure is kept as small as possible and currently uses
just two cache lines, with the most frequently used fields on the first
of the two cache lines.

Design of Packet Buffers
------------------------

For the storage of the packet data (including protocol headers), two approaches were considered:

#.  Embed metadata within a single memory buffer: the metadata structure followed by a fixed-size area for the packet data.

#.  Use separate memory buffers for the metadata structure and for the packet data.

The advantage of the first method is that it only needs one operation to allocate/free the whole memory representation of a packet.
On the other hand, the second method is more flexible and allows
the complete separation of the allocation of metadata structures from the allocation of packet data buffers.

The first method was chosen for the DPDK.
The metadata contains control information such as message type, length,
offset to the start of the data and a pointer for additional mbuf structures allowing buffer chaining.

Message buffers that are used to carry network packets can handle buffer chaining
where multiple buffers are required to hold the complete packet.
This is the case for jumbo frames that are composed of many mbufs linked together through their next field.

For a newly allocated mbuf, the data begins RTE_PKTMBUF_HEADROOM bytes
after the beginning of the buffer, which is cache aligned.
Message buffers may be used to carry control information, packets, events,
and so on between different entities in the system.
Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.

Figure 8 and Figure 9 show some of these scenarios.

.. _pg_figure_8:

**Figure 8. An mbuf with One Segment**

|mbuf1|

.. _pg_figure_9:

**Figure 9. An mbuf with Three Segments**

|mbuf2|

The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

Buffers Stored in Memory Pools
------------------------------

The Buffer Manager uses the :ref:`Mempool Library <Mempool_Library>` to allocate buffers.
This ensures that packet headers are interleaved optimally across memory channels and ranks for L3 processing.
An mbuf contains a field indicating the pool that it originated from.
When calling rte_ctrlmbuf_free(m) or rte_pktmbuf_free(m), the mbuf returns to its original pool.

Constructors
------------

Packet and control mbuf constructors are provided by the API.
The rte_pktmbuf_init() and rte_ctrlmbuf_init() functions initialize some fields in the mbuf structure that
are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
These functions are given as callbacks to the rte_mempool_create() function at pool creation time.

Allocating and Freeing mbufs
----------------------------

Allocating a new mbuf requires the user to specify the mempool from which the mbuf should be taken.
A newly allocated mbuf contains one segment, with a length of 0.
The offset to data is initialized so that the buffer has some bytes of headroom (RTE_PKTMBUF_HEADROOM).

Freeing an mbuf means returning it to its original mempool.
The content of an mbuf is not modified when it is stored in a pool (as a free mbuf),
so fields initialized by the constructor do not need to be re-initialized at mbuf allocation.
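
The allocation state described above (one segment, a length of 0, data
starting after the headroom) can be illustrated with a single-segment model of
the data area. This is a hedged sketch, not DPDK code: struct toy_seg,
TOY_HEADROOM (an arbitrary stand-in for RTE_PKTMBUF_HEADROOM) and the toy_*
functions are hypothetical, and they only loosely mirror the single-segment
behaviour of rte_pktmbuf_prepend(), rte_pktmbuf_append(), rte_pktmbuf_adj()
and rte_pktmbuf_trim().

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TOY_BUF_LEN  2048  /* hypothetical total buffer size */
    #define TOY_HEADROOM 128   /* hypothetical stand-in for RTE_PKTMBUF_HEADROOM */

    /* Hypothetical single-segment view of a packet buffer. */
    struct toy_seg {
        unsigned char buf[TOY_BUF_LEN];  /* whole buffer, headroom included */
        uint16_t data_off;               /* offset of the first data byte */
        uint16_t data_len;               /* number of data bytes */
        uint16_t nb_segs;                /* segments in the chain (1 here) */
    };

    /* State of a freshly allocated buffer: one segment, no data,
     * data starting right after the headroom. */
    static void toy_reset(struct toy_seg *s)
    {
        s->nb_segs = 1;
        s->data_len = 0;
        s->data_off = TOY_HEADROOM;
    }

    static uint16_t toy_headroom(const struct toy_seg *s)
    {
        return s->data_off;
    }

    static uint16_t toy_tailroom(const struct toy_seg *s)
    {
        assert(s->data_off + s->data_len <= TOY_BUF_LEN);
        return (uint16_t)(TOY_BUF_LEN - s->data_off - s->data_len);
    }

    /* Grow the data area towards the front (e.g. to add a header).
     * Returns the new start of data, or NULL if headroom is too small. */
    static unsigned char *toy_prepend(struct toy_seg *s, uint16_t len)
    {
        if (len > toy_headroom(s))
            return NULL;
        s->data_off -= len;
        s->data_len += len;
        return s->buf + s->data_off;
    }

    /* Grow the data area at the end. Returns a pointer to the appended
     * bytes, or NULL if tailroom is too small. */
    static unsigned char *toy_append(struct toy_seg *s, uint16_t len)
    {
        if (len > toy_tailroom(s))
            return NULL;
        unsigned char *tail = s->buf + s->data_off + s->data_len;
        s->data_len += len;
        return tail;
    }

    /* Drop len bytes from the front (e.g. strip a header). */
    static unsigned char *toy_adj(struct toy_seg *s, uint16_t len)
    {
        if (len > s->data_len)
            return NULL;
        s->data_off += len;
        s->data_len -= len;
        return s->buf + s->data_off;
    }

    /* Drop len bytes from the end. Returns 0 on success, -1 on error. */
    static int toy_trim(struct toy_seg *s, uint16_t len)
    {
        if (len > s->data_len)
            return -1;
        s->data_len -= len;
        return 0;
    }

Reserving headroom at allocation is what lets a header be added with
toy_prepend() by moving data_off backwards, without copying the payload.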

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.

Manipulating mbufs
------------------

This library provides some functions for manipulating the data in a packet mbuf. For instance:

*   Get data length (rte_pktmbuf_data_len())

*   Get a pointer to the start of data (rte_pktmbuf_mtod())

*   Prepend data before the existing data (rte_pktmbuf_prepend())

*   Append data after the existing data (rte_pktmbuf_append())

*   Remove data at the beginning of the buffer (rte_pktmbuf_adj())

*   Remove data at the end of the buffer (rte_pktmbuf_trim())

Refer to the *DPDK API Reference* for details.

Meta Information
----------------

Some information is retrieved by the network driver and stored in an mbuf to make processing easier.
This includes, for instance, the VLAN tag, the RSS hash result (see :ref:`Poll Mode Driver <Poll_Mode_Driver>`)
and a flag indicating that the checksum was computed by hardware.

An mbuf also contains the input port (where it comes from) and the number of segment mbufs in the chain.

For chained buffers, only the first mbuf of the chain stores this meta information.

On the RX side, for instance, this applies to information related to the IEEE1588 packet
timestamp mechanism, VLAN tagging and IP checksum computation.

On the TX side, it is also possible for an application to delegate some
processing to the hardware if it supports it. For instance, the
PKT_TX_IP_CKSUM flag allows offloading the computation of the IPv4
checksum.

The following examples explain how to configure different TX offloads on
a VXLAN-encapsulated TCP packet:
``out_eth/out_ip/out_udp/vxlan/in_eth/in_ip/in_tcp/payload``

- calculate checksum of out_ip::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set out_ip checksum to 0 in the packet

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.

- calculate checksum of out_ip and out_udp::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
    set out_ip checksum to 0 in the packet
    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM
  and DEV_TX_OFFLOAD_UDP_CKSUM.

- calculate checksum of in_ip::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set in_ip checksum to 0 in the packet

  This is similar to the first case, but l2_len is different. It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.
  Note that it can only work if the outer L4 checksum is 0.

- calculate checksum of in_ip and in_tcp::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is similar to the second case, but l2_len is different. It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
  DEV_TX_OFFLOAD_TCP_CKSUM.
  Note that it can only work if the outer L4 checksum is 0.

- segment inner TCP::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->l4_len = len(in_tcp)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
      PKT_TX_TCP_SEG
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header without including the IP
      payload length using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_TCP_TSO.
  Note that it can only work if the outer L4 checksum is 0.

- calculate checksum of out_ip, in_ip, in_tcp::

    mb->outer_l2_len = len(out_eth)
    mb->outer_l3_len = len(out_ip)
    mb->l2_len = len(out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM |
      PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
    set out_ip checksum to 0 in the packet
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM,
  DEV_TX_OFFLOAD_UDP_CKSUM and DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM.

The list of flags and their precise meaning is described in the mbuf API
documentation (rte_mbuf.h). Also refer to the testpmd source code
(specifically the csumonly.c file) for details.

Direct and Indirect Buffers
---------------------------

A direct buffer is a buffer that is completely separate and self-contained.
An indirect buffer behaves like a direct buffer, except that its buffer pointer and
data offset refer to data in another direct buffer.
This is useful in situations where packets need to be duplicated or fragmented,
since indirect buffers provide the means to reuse the same packet data across multiple buffers.

A buffer becomes indirect when it is "attached" to a direct buffer using the rte_pktmbuf_attach() function.
Each buffer has a reference counter field, and whenever an indirect buffer is attached to the direct buffer,
the reference counter on the direct buffer is incremented.
Similarly, whenever the indirect buffer is detached, the reference counter on the direct buffer is decremented.
If the resulting reference counter is equal to 0, the direct buffer is freed since it is no longer in use.

There are a few things to remember when dealing with indirect buffers.
First of all, it is not possible to attach an indirect buffer to another indirect buffer.
Secondly, for a buffer to become indirect, its reference counter must be equal to 1,
that is, it must not be already referenced by another indirect buffer.
Finally, an indirect buffer cannot be reattached to a direct buffer unless it is detached first.

While the attach/detach operations can be invoked directly using the rte_pktmbuf_attach() and rte_pktmbuf_detach() functions,
it is recommended to use the higher-level rte_pktmbuf_clone() function,
which takes care of the correct initialization of an indirect buffer and can clone buffers with multiple segments.

Since indirect buffers are not supposed to actually hold any data,
the memory pool for indirect buffers should be configured to reflect the reduced memory consumption.
Examples of the initialization of a memory pool for indirect buffers (as well as use case examples for indirect buffers)
can be found in several of the sample applications, for example, the IPv4 Multicast sample application.

Debug
-----

In debug mode (CONFIG_RTE_MBUF_DEBUG is enabled),
the functions of the mbuf library perform sanity checks before any operation (for example, checks for buffer corruption or a bad type).

Use Cases
---------

All networking applications should use mbufs to transport network packets.

.. |mbuf1| image:: img/mbuf1.*

.. |mbuf2| image:: img/mbuf2.*