..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _Mbuf_Library:

Mbuf Library
============

The mbuf library provides the ability to allocate and free buffers (mbufs)
that may be used by the DPDK application to store message buffers.
The message buffers are stored in a mempool, using the :ref:`Mempool Library <Mempool_Library>`.

A rte_mbuf struct can carry network packet buffers
or generic control buffers (indicated by the CTRL_MBUF_FLAG).
This can be extended to other types.
The rte_mbuf header structure is kept as small as possible and currently uses
just two cache lines, with the most frequently used fields being on the first
of the two cache lines.

Design of Packet Buffers
------------------------

For the storage of the packet data (including protocol headers), two approaches were considered:

#.  Embed metadata within a single memory buffer: the metadata structure is followed by a fixed-size area for the packet data.

#.  Use separate memory buffers for the metadata structure and for the packet data.

The advantage of the first method is that it only needs one operation to allocate/free the whole memory representation of a packet.
On the other hand, the second method is more flexible and allows
the complete separation of the allocation of metadata structures from the allocation of packet data buffers.

The first method was chosen for the DPDK.
The metadata contains control information such as message type, length,
offset to the start of the data and a pointer for additional mbuf structures allowing buffer chaining.

Message buffers that are used to carry network packets can handle buffer chaining
where multiple buffers are required to hold the complete packet.
This is the case for jumbo frames that are composed of many mbufs linked together through their next field.

For a newly allocated mbuf, the area at which the data begins in the message buffer is
RTE_PKTMBUF_HEADROOM bytes after the beginning of the buffer, which is cache aligned.
Message buffers may be used to carry control information, packets, events,
and so on between different entities in the system.
Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.

:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.

.. _figure_mbuf1:

.. figure:: img/mbuf1.*

   An mbuf with One Segment


.. _figure_mbuf2:

.. figure:: img/mbuf2.*

   An mbuf with Three Segments


The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

Buffers Stored in Memory Pools
------------------------------

The Buffer Manager uses the :ref:`Mempool Library <Mempool_Library>` to allocate buffers.
Therefore, it ensures that the packet header is interleaved optimally across the channels and ranks for L3 processing.
An mbuf contains a field indicating the pool that it originated from.
When calling rte_ctrlmbuf_free(m) or rte_pktmbuf_free(m), the mbuf returns to its original pool.
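
The origin-pool field can be pictured with a small standalone model.
The sketch below is not the DPDK implementation and uses no DPDK calls:
the ``toy_*`` names, the array-based free list and the ``TOY_HEADROOM`` value
(standing in for RTE_PKTMBUF_HEADROOM) are all assumptions made for illustration.
It shows metadata and data sharing one object, allocation resetting only the per-use fields,
and a chained free returning each segment to the pool recorded in its own header.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TOY_HEADROOM  128   /* stand-in for RTE_PKTMBUF_HEADROOM (value assumed) */
    #define TOY_BUF_LEN   2048  /* data area that follows the metadata */
    #define TOY_POOL_SIZE 4     /* number of buffers per pool in this model */

    struct toy_pool;

    /* Metadata and packet data live in a single object (the first design approach). */
    struct toy_mbuf {
        struct toy_pool *pool;  /* pool this buffer originated from */
        struct toy_mbuf *next;  /* next segment of a chained packet, or NULL */
        uint16_t data_off;      /* offset of the first data byte within buf[] */
        uint16_t data_len;      /* number of data bytes held in this segment */
        uint8_t buf[TOY_BUF_LEN];
    };

    struct toy_pool {
        struct toy_mbuf objs[TOY_POOL_SIZE];
        struct toy_mbuf *free_list[TOY_POOL_SIZE];
        size_t n_free;
    };

    /* Plays the role of the constructor: the origin pool is written once. */
    static void toy_pool_init(struct toy_pool *p)
    {
        p->n_free = 0;
        for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
            p->objs[i].pool = p;
            p->free_list[p->n_free++] = &p->objs[i];
        }
    }

    /* A fresh buffer has one segment, zero length, and headroom before the data. */
    static struct toy_mbuf *toy_alloc(struct toy_pool *p)
    {
        if (p->n_free == 0)
            return NULL;
        struct toy_mbuf *m = p->free_list[--p->n_free];
        m->next = NULL;
        m->data_off = TOY_HEADROOM;
        m->data_len = 0;
        return m;
    }

    /* Free every segment of a chain; each one goes back to its own origin pool. */
    static void toy_free(struct toy_mbuf *m)
    {
        while (m != NULL) {
            struct toy_mbuf *next = m->next;
            struct toy_pool *p = m->pool;
            assert(p->n_free < TOY_POOL_SIZE);
            p->free_list[p->n_free++] = m;
            m = next;
        }
    }

Because the pool pointer travels with the buffer, the caller of ``toy_free()`` never names a pool,
which mirrors how the free functions above take only the mbuf as an argument.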

Constructors
------------

Packet and control mbuf constructors are provided by the API.
The rte_pktmbuf_init() and rte_ctrlmbuf_init() functions initialize some fields in the mbuf structure that
are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
The chosen constructor is given as a callback function to the rte_mempool_create() function at pool creation time.

Allocating and Freeing mbufs
----------------------------

Allocating a new mbuf requires the user to specify the mempool from which the mbuf should be taken.
Any newly allocated mbuf contains one segment, with a length of 0.
The offset to data is initialized to have some bytes of headroom in the buffer (RTE_PKTMBUF_HEADROOM).

Freeing an mbuf means returning it to its original mempool.
The content of an mbuf is not modified when it is stored in a pool (as a free mbuf).
Fields initialized by the constructor do not need to be re-initialized at mbuf allocation.

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.

Manipulating mbufs
------------------

This library provides some functions for manipulating the data in a packet mbuf.
For instance:

    *  Get data length

    *  Get a pointer to the start of data

    *  Prepend data before the existing data

    *  Append data after the existing data

    *  Remove data at the beginning of the buffer (rte_pktmbuf_adj())

    *  Remove data at the end of the buffer (rte_pktmbuf_trim())

Refer to the *DPDK API Reference* for details.

Meta Information
----------------

Some information is retrieved by the network driver and stored in an mbuf to make processing easier.
For instance, this includes the VLAN, the RSS hash result (see :ref:`Poll Mode Driver <Poll_Mode_Driver>`)
and a flag indicating that the checksum was computed by hardware.

An mbuf also contains the input port (where it comes from), and the number of segment mbufs in the chain.

For chained buffers, only the first mbuf of the chain stores this meta information.

For instance, this is the case on RX side for the IEEE1588 packet
timestamp mechanism, the VLAN tagging and the IP checksum computation.

On TX side, it is also possible for an application to delegate some
processing to the hardware if it supports it.
For instance, the
PKT_TX_IP_CKSUM flag allows the application to offload the computation of the IPv4
checksum.

The following examples explain how to configure different TX offloads on
a VXLAN-encapsulated TCP packet:
``out_eth/out_ip/out_udp/vxlan/in_eth/in_ip/in_tcp/payload``

- calculate checksum of out_ip::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set out_ip checksum to 0 in the packet

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.

- calculate checksum of out_ip and out_udp::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
    set out_ip checksum to 0 in the packet
    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM
  and DEV_TX_OFFLOAD_UDP_CKSUM.

- calculate checksum of in_ip::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
    set in_ip checksum to 0 in the packet

  This is similar to the first case, but l2_len is different.
  It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM.
  Note that this only works if the outer L4 checksum is 0.

- calculate checksum of in_ip and in_tcp::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is similar to the second case, but l2_len is different. It is supported
  on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
  DEV_TX_OFFLOAD_TCP_CKSUM.
  Note that this only works if the outer L4 checksum is 0.

- segment inner TCP::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->l4_len = len(in_tcp)
    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | \
      PKT_TX_TCP_SEG
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header without including the IP
      payload length using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_TCP_TSO.
  Note that this only works if the outer L4 checksum is 0.

- calculate checksum of out_ip, in_ip, in_tcp::

    mb->outer_l2_len = len(out_eth)
    mb->outer_l3_len = len(out_ip)
    mb->l2_len = len(out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
      PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
    set out_ip checksum to 0 in the packet
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM,
  DEV_TX_OFFLOAD_TCP_CKSUM and DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM.

The list of flags and their precise meanings are described in the mbuf API
documentation (rte_mbuf.h). Also refer to the testpmd source code
(specifically the csumonly.c file) for details.

.. _direct_indirect_buffer:

Direct and Indirect Buffers
---------------------------

A direct buffer is a buffer that is completely separate and self-contained.
An indirect buffer behaves like a direct buffer except that the buffer pointer and
data offset in it refer to data in another direct buffer.
This is useful in situations where packets need to be duplicated or fragmented,
since indirect buffers provide the means to reuse the same packet data across multiple buffers.

A buffer becomes indirect when it is "attached" to a direct buffer using the rte_pktmbuf_attach() function.
Each buffer has a reference counter field and whenever an indirect buffer is attached to the direct buffer,
the reference counter on the direct buffer is incremented.
Similarly, whenever the indirect buffer is detached, the reference counter on the direct buffer is decremented.
If the resulting reference counter is equal to 0, the direct buffer is freed since it is no longer in use.

There are a few things to remember when dealing with indirect buffers.
First of all, it is not possible to attach an indirect buffer to another indirect buffer.
Secondly, for a buffer to become indirect, its reference counter must be equal to 1,
that is, it must not be already referenced by another indirect buffer.
Finally, it is not possible to reattach an indirect buffer to the direct buffer (unless it is detached first).

While the attach/detach operations can be invoked directly using the rte_pktmbuf_attach() and rte_pktmbuf_detach() functions,
it is recommended to use the higher-level rte_pktmbuf_clone() function,
which takes care of the correct initialization of an indirect buffer and can clone buffers with multiple segments.
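
The reference-counting rules above can be traced with a small standalone model.
This is a sketch under stated assumptions, not the library code:
the ``toy_*`` names, the ``in_use`` flag (standing in for "returned to the pool")
and the 64-byte ``storage`` area are hypothetical, and no DPDK function is called.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct toy_buf {
        uint8_t *buf_addr;            /* data area this buffer currently points at */
        uint16_t data_off;            /* offset of the data within that area */
        uint16_t refcnt;              /* number of users of this buffer's data */
        struct toy_buf *attached_to;  /* direct buffer being borrowed from, or NULL */
        bool in_use;                  /* false once the buffer has been "freed" */
        uint8_t storage[64];          /* own data area (unused while indirect) */
    };

    static void toy_buf_init(struct toy_buf *b)
    {
        b->buf_addr = b->storage;
        b->data_off = 0;
        b->refcnt = 1;
        b->attached_to = NULL;
        b->in_use = true;
    }

    /* Drop one reference; the buffer is freed when the count reaches 0. */
    static void toy_release(struct toy_buf *b)
    {
        assert(b->refcnt > 0);
        if (--b->refcnt == 0)
            b->in_use = false;
    }

    /* Make ind indirect on dir. Returns 0 on success, -1 if a rule is broken. */
    static int toy_attach(struct toy_buf *ind, struct toy_buf *dir)
    {
        if (dir->attached_to != NULL)  /* cannot attach to an indirect buffer */
            return -1;
        if (ind->refcnt != 1)          /* ind must not be referenced by others */
            return -1;
        if (ind->attached_to != NULL)  /* must be detached before reattaching */
            return -1;
        dir->refcnt++;
        ind->attached_to = dir;
        ind->buf_addr = dir->buf_addr; /* share the data instead of copying it */
        ind->data_off = dir->data_off;
        return 0;
    }

    /* Undo an attach: point back at our own storage and release the direct buffer. */
    static void toy_detach(struct toy_buf *ind)
    {
        struct toy_buf *dir = ind->attached_to;
        if (dir == NULL)
            return;
        ind->attached_to = NULL;
        ind->buf_addr = ind->storage;
        ind->data_off = 0;
        toy_release(dir);
    }

In this model, attaching two buffers to one direct buffer raises its counter from 1 to 3;
the direct buffer is only marked free after its owner has released it and both indirect buffers have been detached.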

Since indirect buffers are not supposed to actually hold any data,
the memory pool for indirect buffers should be configured to indicate the reduced memory consumption.
Examples of the initialization of a memory pool for indirect buffers (as well as use case examples for indirect buffers)
can be found in several of the sample applications, for example, the IPv4 Multicast sample application.

Debug
-----

In debug mode (CONFIG_RTE_MBUF_DEBUG is enabled),
the functions of the mbuf library perform sanity checks before any operation (checking for buffer corruption, bad type, and so on).

Use Cases
---------

All networking applications should use mbufs to transport network packets.