..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Packet (Mbuf) Library
=====================

The Packet (Mbuf) library provides the ability to allocate and free buffers (mbufs)
that may be used by the DPDK application to store message buffers.
The message buffers are stored in a mempool, using the :doc:`mempool_lib`.

A rte_mbuf struct generally carries network packet buffers, but it can actually
be any data (control data, events, ...).
The rte_mbuf header structure is kept as small as possible and currently uses
just two cache lines, with the most frequently used fields being on the first
of the two cache lines.

Design of Packet Buffers
------------------------

For the storage of the packet data (including protocol headers), two approaches were considered:

#.  Embed the metadata structure within a single memory buffer, followed by a fixed-size area for the packet data.

#.  Use separate memory buffers for the metadata structure and for the packet data.

The advantage of the first method is that it only needs one operation to allocate/free the whole memory representation of a packet.
On the other hand, the second method is more flexible and allows
the complete separation of the allocation of metadata structures from the allocation of packet data buffers.

The first method was chosen for the DPDK.
The metadata contains control information such as message type, length,
offset to the start of the data and a pointer for additional mbuf structures allowing buffer chaining.

Message buffers that are used to carry network packets can handle buffer chaining
where multiple buffers are required to hold the complete packet.
This is the case for jumbo frames that are composed of many mbufs linked together through their next field.

For a newly allocated mbuf, the area at which the data begins in the message buffer is
RTE_PKTMBUF_HEADROOM bytes after the beginning of the buffer, which is cache aligned.
Message buffers may be used to carry control information, packets, events,
and so on between different entities in the system.
Message buffers may also use their buffer pointers to point to other message buffer data sections or other structures.

:numref:`figure_mbuf1` and :numref:`figure_mbuf2` show some of these scenarios.

.. _figure_mbuf1:

.. figure:: img/mbuf1.*

   An mbuf with One Segment


.. _figure_mbuf2:

.. figure:: img/mbuf2.*

   An mbuf with Three Segments


The Buffer Manager implements a fairly standard set of buffer access functions to manipulate network packets.

Buffers Stored in Memory Pools
------------------------------

The Buffer Manager uses the :doc:`mempool_lib` to allocate buffers.
Therefore, it ensures that the packet header is interleaved optimally across the channels and ranks for L3 processing.
An mbuf contains a field indicating the pool that it originated from.
When calling rte_pktmbuf_free(m), the mbuf returns to its original pool.

Constructors
------------

Packet mbuf constructors are provided by the API.
The rte_pktmbuf_init() function initializes some fields in the mbuf structure that
are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
This function is given as a callback function to the rte_mempool_create() function at pool creation time.

Allocating and Freeing mbufs
----------------------------

Allocating a new mbuf requires the user to specify the mempool from which the mbuf should be taken.
A newly allocated mbuf contains one segment, with a length of 0.
The offset to data is initialized to have some bytes of headroom in the buffer (RTE_PKTMBUF_HEADROOM).

Freeing an mbuf means returning it to its original mempool.
The content of an mbuf is not modified when it is stored in a pool (as a free mbuf).
Fields initialized by the constructor do not need to be re-initialized at mbuf allocation.

When freeing a packet mbuf that contains several segments, all of them are freed and returned to their original mempool.

Manipulating mbufs
------------------

This library provides some functions for manipulating the data in a packet mbuf. For instance:

*   Get data length

*   Get a pointer to the start of data

*   Prepend data before the existing data

*   Append data after the existing data

*   Remove data at the beginning of the buffer (rte_pktmbuf_adj())

*   Remove data at the end of the buffer (rte_pktmbuf_trim())

Refer to the *DPDK API Reference* for details.

.. _mbuf_meta:

Meta Information
----------------

Some information is retrieved by the network driver and stored in an mbuf to make processing easier.
For instance, the VLAN, the RSS hash result
and a flag indicating that the checksum was computed by hardware.

An mbuf also contains the input port (where it comes from), and the number of segment mbufs in the chain.

For chained buffers, only the first mbuf of the chain stores this meta information.

For instance, this is the case on the RX side for the IEEE1588 packet
timestamp mechanism, the VLAN tagging and the IP checksum computation.

On the TX side, it is also possible for an application to delegate some
processing to the hardware if it supports it. For instance, the
RTE_MBUF_F_TX_IP_CKSUM flag allows the application to offload the computation
of the IPv4 checksum.

The following examples explain how to configure different TX offloads on
a VXLAN-encapsulated TCP packet:
``out_eth/out_ip/out_udp/vxlan/in_eth/in_ip/in_tcp/payload``

- calculate checksum of out_ip::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
    set out_ip checksum to 0 in the packet

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM.

- calculate checksum of out_ip and out_udp::

    mb->l2_len = len(out_eth)
    mb->l3_len = len(out_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM
    set out_ip checksum to 0 in the packet
    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM
  and RTE_ETH_TX_OFFLOAD_UDP_CKSUM.

- calculate checksum of in_ip::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
    set in_ip checksum to 0 in the packet

  This is similar to case 1), but l2_len is different. It is supported
  on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of in_ip and in_tcp::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is similar to case 2), but l2_len is different.
  It is supported
  on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM and
  RTE_ETH_TX_OFFLOAD_TCP_CKSUM.
  Note that it can only work if outer L4 checksum is 0.

- segment inner TCP::

    mb->l2_len = len(out_eth + out_ip + out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->l4_len = len(in_tcp)
    mb->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM |
      RTE_MBUF_F_TX_TCP_SEG;
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header without including the IP
      payload length using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_TCP_TSO.
  Note that it can only work if outer L4 checksum is 0.

- calculate checksum of out_ip, in_ip, in_tcp::

    mb->outer_l2_len = len(out_eth)
    mb->outer_l3_len = len(out_ip)
    mb->l2_len = len(out_udp + vxlan + in_eth)
    mb->l3_len = len(in_ip)
    mb->ol_flags |= RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IP_CKSUM  | \
      RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM;
    set out_ip checksum to 0 in the packet
    set in_ip checksum to 0 in the packet
    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()

  This is supported on hardware advertising RTE_ETH_TX_OFFLOAD_IPV4_CKSUM,
  RTE_ETH_TX_OFFLOAD_TCP_CKSUM and RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM.

The list of flags and their precise meaning is described in the mbuf API
documentation (rte_mbuf.h). Also refer to the testpmd source code
(specifically the csumonly.c file) for details.

Dynamic fields and flags
~~~~~~~~~~~~~~~~~~~~~~~~

The size of the mbuf is constrained,
while the amount of metadata to save for each packet is potentially unlimited.
The most basic networking information already has its place
in the existing mbuf fields and flags.

If new features need to be added, the new fields and flags should fit
in the "dynamic space", by registering some room in the mbuf structure:

dynamic field
   named area in the mbuf structure,
   with a given size (at least 1 byte) and alignment constraint.

dynamic flag
   named bit in the mbuf structure,
   stored in the field ``ol_flags``.

The dynamic fields and flags are managed with the functions ``rte_mbuf_dyn*``.

It is not possible to unregister fields or flags.

.. _direct_indirect_buffer:

Direct and Indirect Buffers
---------------------------

A direct buffer is a buffer that is completely separate and self-contained.
An indirect buffer behaves like a direct buffer but for the fact that the buffer pointer and
data offset in it refer to data in another direct buffer.
This is useful in situations where packets need to be duplicated or fragmented,
since indirect buffers provide the means to reuse the same packet data across multiple buffers.

A buffer becomes indirect when it is "attached" to a direct buffer using the rte_pktmbuf_attach() function.
Each buffer has a reference counter field and whenever an indirect buffer is attached to the direct buffer,
the reference counter on the direct buffer is incremented.
Similarly, whenever the indirect buffer is detached, the reference counter on the direct buffer is decremented.
If the resulting reference counter is equal to 0, the direct buffer is freed since it is no longer in use.

There are a few things to remember when dealing with indirect buffers.
First of all, an indirect buffer is never attached to another indirect buffer.
Attempting to attach buffer A to indirect buffer B that is attached to C makes rte_pktmbuf_attach() automatically attach A to C, effectively cloning B.
Secondly, for a buffer to become indirect, its reference counter must be equal to 1,
that is, it must not be already referenced by another indirect buffer.
Finally, it is not possible to reattach an indirect buffer to the direct buffer (unless it is detached first).

While the attach/detach operations can be invoked directly using the rte_pktmbuf_attach() and rte_pktmbuf_detach() functions,
it is recommended to use the higher-level rte_pktmbuf_clone() function,
which takes care of the correct initialization of an indirect buffer and can clone buffers with multiple segments.

Since indirect buffers are not supposed to actually hold any data,
the memory pool for indirect buffers should be configured to indicate the reduced memory consumption.
Examples of the initialization of a memory pool for indirect buffers (as well as use case examples for indirect buffers)
can be found in several of the sample applications, for example, the IPv4 Multicast sample application.

Debug
-----

In debug mode, the functions of the mbuf library perform sanity checks before any operation
(such as checks for buffer corruption, bad type, and so on).

Use Cases
---------

All networking applications should use mbufs to transport network packets.
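
As a rough illustration of the headroom and data manipulation behaviour described in the earlier sections,
the following stand-alone C sketch models a single-segment buffer.
It is not DPDK code: ``toy_mbuf``, ``TOY_HEADROOM``, ``TOY_BUF_LEN`` and the ``toy_*`` functions are hypothetical names
invented for this example, and the sizes are arbitrary assumptions.
The real ``struct rte_mbuf`` and the ``rte_pktmbuf_*()`` functions also deal with chaining, reference counts and pools,
which this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only; they are not taken from DPDK. */
#define TOY_HEADROOM 128u
#define TOY_BUF_LEN  2048u

/* Simplified single-segment buffer: metadata followed by the data area. */
struct toy_mbuf {
    uint16_t data_off;          /* offset of the first data byte in buf */
    uint16_t data_len;          /* number of data bytes currently stored */
    uint8_t  buf[TOY_BUF_LEN];  /* fixed-size data area */
};

/* Put the buffer in its "newly allocated" state: no data, full headroom. */
void toy_reset(struct toy_mbuf *m)
{
    m->data_off = TOY_HEADROOM;
    m->data_len = 0;
}

/* Free bytes before the data. */
uint16_t toy_headroom(const struct toy_mbuf *m)
{
    return m->data_off;
}

/* Free bytes after the data. */
uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    return (uint16_t)(TOY_BUF_LEN - m->data_off - m->data_len);
}

/* Append len bytes after the data; returns the start of the new area or NULL. */
uint8_t *toy_append(struct toy_mbuf *m, uint16_t len)
{
    if (len > toy_tailroom(m))
        return NULL;
    uint8_t *tail = m->buf + m->data_off + m->data_len;
    m->data_len = (uint16_t)(m->data_len + len);
    return tail;
}

/* Prepend len bytes before the data; returns the new data start or NULL. */
uint8_t *toy_prepend(struct toy_mbuf *m, uint16_t len)
{
    if (len > toy_headroom(m))
        return NULL;
    m->data_off = (uint16_t)(m->data_off - len);
    m->data_len = (uint16_t)(m->data_len + len);
    return m->buf + m->data_off;
}

/* Remove len bytes at the beginning; returns the new data start or NULL. */
uint8_t *toy_adj(struct toy_mbuf *m, uint16_t len)
{
    if (len > m->data_len)
        return NULL;
    m->data_off = (uint16_t)(m->data_off + len);
    m->data_len = (uint16_t)(m->data_len - len);
    return m->buf + m->data_off;
}

/* Remove len bytes at the end; returns 0 on success, -1 if len is too large. */
int toy_trim(struct toy_mbuf *m, uint16_t len)
{
    if (len > m->data_len)
        return -1;
    m->data_len = (uint16_t)(m->data_len - len);
    return 0;
}
```

With these definitions, resetting a buffer and appending 100 bytes leaves the data starting at offset ``TOY_HEADROOM``;
a following 14-byte prepend (for example, to add an Ethernet header) moves the data start to offset 114
and reduces the remaining headroom by the same amount.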