..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. _Mempool_Library:

Mempool Library
===============

A memory pool is an allocator of fixed-size objects.
In the DPDK, it is identified by name and uses a mempool handler to store free objects.
The default mempool handler is ring based.
It provides some other optional services such as a per-core object cache and
an alignment helper to ensure that objects are padded to spread them equally on all DRAM or DDR3 channels.

This library is used by the :ref:`Mbuf Library <Mbuf_Library>`.

Cookies
-------

In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled), cookies are added at the beginning and end of allocated blocks.
The allocated objects then contain overwrite protection fields to help debug buffer overflows.

Stats
-----

In debug mode (CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG is enabled),
statistics about gets from and puts to the pool are stored in the mempool structure.
Statistics are per-lcore to avoid concurrent access to statistics counters.
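
The per-lcore layout of the statistics can be pictured with a small standalone model.
This is only a sketch under assumptions: ``MAX_LCORES``, ``struct pool``, ``stat_get()``, ``stat_put()`` and ``stat_total()`` are hypothetical names invented for illustration and are not part of the mempool API.
It shows each core updating only its own slot, with a reader summing the slots when it needs pool-wide totals.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LCORES 4 /* hypothetical; stands in for the real lcore limit */

/* One counter set per lcore, so a core only ever writes its own slot. */
struct pool_stats {
    uint64_t put_objs;
    uint64_t get_objs;
};

/* Hypothetical pool structure holding only the statistics. */
struct pool {
    struct pool_stats stats[MAX_LCORES];
};

/* Record a get of n objects performed by the given lcore. */
static void stat_get(struct pool *p, unsigned int lcore, unsigned int n)
{
    assert(lcore < MAX_LCORES);
    p->stats[lcore].get_objs += n;
}

/* Record a put of n objects performed by the given lcore. */
static void stat_put(struct pool *p, unsigned int lcore, unsigned int n)
{
    assert(lcore < MAX_LCORES);
    p->stats[lcore].put_objs += n;
}

/* Sum the per-lcore slots to obtain pool-wide totals. */
static struct pool_stats stat_total(const struct pool *p)
{
    struct pool_stats total = { 0, 0 };

    for (unsigned int i = 0; i < MAX_LCORES; i++) {
        total.put_objs += p->stats[i].put_objs;
        total.get_objs += p->stats[i].get_objs;
    }
    return total;
}
```

Because no two cores write the same slot, the update path in this model needs no atomic operations; only the reader has to walk all slots.
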

Memory Alignment Constraints on x86 architecture
------------------------------------------------

Depending on hardware memory configuration on x86 architecture, performance can be greatly improved by adding a specific padding between objects.
The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.

This is particularly true for packet buffers when doing L3 forwarding or flow classification.
Only the first 64 bytes are accessed, so performance can be increased by spreading the start addresses of objects among the different channels.

The number of ranks on any DIMM is the number of independent sets of DRAMs that can be accessed for the full data bit-width of the DIMM.
The ranks cannot be accessed simultaneously since they share the same data path.
The physical layout of the DRAM chips on the DIMM itself does not necessarily relate to the number of ranks.

When running an application, the EAL command line options provide the ability to specify the number of memory channels and ranks.

.. note::

    The command line must always have the number of memory channels specified for the processor.

Examples of alignment for different DIMM architectures are shown in
:numref:`figure_memory-management` and :numref:`figure_memory-management2`.

.. _figure_memory-management:

.. figure:: img/memory-management.*

   Two Channels and Quad-ranked DIMM Example


In this case, the assumption is that a packet is 16 blocks of 64 bytes, which is not true.

The Intel® 5520 chipset has three channels, so in most cases,
no padding is required between objects (except for objects whose size is n x 3 x 64 bytes).

.. _figure_memory-management2:

.. figure:: img/memory-management2.*

   Three Channels and Two Dual-ranked DIMM Example


When creating a new pool, the user can specify whether to use this feature.

.. _mempool_local_cache:

Local Cache
-----------

In terms of CPU usage, the cost of multiple cores accessing a memory pool's ring of free buffers may be high
since each access requires a compare-and-set (CAS) operation.
To avoid having too many access requests to the memory pool's ring,
the memory pool allocator can maintain a per-core cache and do bulk requests to the memory pool's ring,
via the cache with many fewer locks on the actual memory pool structure.
In this way, each core has full access to its own cache (without locks) of free objects and
only when the cache fills does the core need to shuffle some of the free objects back to the pool's ring or
obtain more objects when the cache is empty.

While this may mean a number of buffers may sit idle on some core's cache,
the speed at which a core can access its own cache for a specific memory pool without locks provides performance gains.

The cache is composed of a small, per-core table of pointers and its length (used as a stack).
This internal cache can be enabled or disabled at creation of the pool.

The maximum size of the cache is static and is defined at compilation time (CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE).

:numref:`figure_mempool` shows a cache in operation.

.. _figure_mempool:

.. figure:: img/mempool.*

   A mempool in Memory with its Associated Ring

As an alternative to the internal default per-lcore local cache, an application can create and manage external caches through the ``rte_mempool_cache_create()``, ``rte_mempool_cache_free()`` and ``rte_mempool_cache_flush()`` calls.
These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()``.
The ``rte_mempool_default_cache()`` call returns the default internal cache if any.
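
The effect of a per-core cache on traffic to the shared structure can be illustrated with a simplified standalone model.
It is a sketch under assumptions, not the library's implementation: every name and size here (``struct shared_pool``, ``struct local_cache``, ``cache_get()``, ``cache_put()``, ``POOL_SIZE``, ``CACHE_SIZE``, ``BULK``) is hypothetical, and the shared pool is a plain array rather than a multi-producer/multi-consumer ring, so the CAS operation itself is not modelled.
The ``accesses`` counter stands in for the number of times a core would have had to touch the shared ring.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE  64 /* hypothetical sizes chosen for the sketch */
#define CACHE_SIZE 8
#define BULK       4  /* objects moved per access to the shared pool */

/* Stand-in for the shared ring of free objects. */
struct shared_pool {
    void *objs[POOL_SIZE];
    size_t count;
    unsigned int accesses; /* times the shared structure was touched */
};

/* Per-core cache: a small table of pointers and its length, used as a stack. */
struct local_cache {
    void *objs[CACHE_SIZE];
    size_t len;
};

/* Take n objects from the shared pool in one access; returns n or 0. */
static size_t pool_get_bulk(struct shared_pool *p, void **out, size_t n)
{
    p->accesses++;
    if (p->count < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = p->objs[--p->count];
    return n;
}

/* Return n objects to the shared pool in one access. */
static void pool_put_bulk(struct shared_pool *p, void **in, size_t n)
{
    p->accesses++;
    assert(p->count + n <= POOL_SIZE);
    for (size_t i = 0; i < n; i++)
        p->objs[p->count++] = in[i];
}

/* Get one object: served from the cache, refilled in bulk when empty. */
static void *cache_get(struct local_cache *c, struct shared_pool *p)
{
    if (c->len == 0) {
        c->len = pool_get_bulk(p, c->objs, BULK);
        if (c->len == 0)
            return NULL;
    }
    return c->objs[--c->len];
}

/* Put one object: pushed on the cache, flushed in bulk when full. */
static void cache_put(struct local_cache *c, struct shared_pool *p, void *obj)
{
    if (c->len == CACHE_SIZE) {
        c->len -= BULK;
        pool_put_bulk(p, &c->objs[c->len], BULK);
    }
    c->objs[c->len++] = obj;
}
```

In this model, eight consecutive gets touch the shared pool only twice, and putting those eight objects back does not touch it at all because they fit in the cache.
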
In contrast to the default caches, user-owned caches can be used by unregistered non-EAL threads too.

Mempool Handlers
----------------

Mempool handlers allow external memory subsystems, such as external hardware memory
management systems and software based memory allocators, to be used with DPDK.

There are two aspects to a mempool handler.

* Adding the code for your new mempool operations (ops). This is achieved by
  adding the new mempool ops code, and using the ``MEMPOOL_REGISTER_OPS`` macro.

* Using the new API calls ``rte_mempool_create_empty()`` and
  ``rte_mempool_set_ops_byname()`` to create a new mempool and specify which
  ops to use.

Several different mempool handlers may be used in the same application. A new
mempool can be created by using the ``rte_mempool_create_empty()`` function,
then using ``rte_mempool_set_ops_byname()`` to point the mempool to the
relevant mempool handler callback (ops) structure.

Legacy applications may continue to use the old ``rte_mempool_create()`` API
call, which uses a ring based mempool handler by default. These applications
must be modified if they are to use a different mempool handler.

For applications that use ``rte_pktmbuf_pool_create()``, there is a config setting
(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of
an alternative mempool handler.
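
The two aspects described above, registering a set of callbacks under a name and then selecting them by name for a given pool, can be sketched with a small standalone model.
Everything in it is an assumption made for illustration: ``struct demo_ops``, ``demo_register_ops()``, ``demo_set_ops_byname()``, ``struct demo_pool`` and the one-slot handler are hypothetical and deliberately much simpler than the real ops structure and API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 8 /* hypothetical registry capacity */

/* Callbacks a handler supplies; a real ops structure has more entries. */
struct demo_ops {
    const char *name;
    int (*enqueue)(void *priv, void *obj);
    void *(*dequeue)(void *priv);
};

static struct demo_ops ops_table[MAX_OPS];
static int ops_count;

/* Step 1: register callbacks under a unique name; returns index or -1. */
static int demo_register_ops(const struct demo_ops *ops)
{
    if (ops_count == MAX_OPS)
        return -1;
    for (int i = 0; i < ops_count; i++)
        if (strcmp(ops_table[i].name, ops->name) == 0)
            return -1; /* name already taken */
    ops_table[ops_count] = *ops;
    return ops_count++;
}

/* Hypothetical pool: remembers which table entry it uses. */
struct demo_pool {
    int ops_index; /* -1 until a handler is selected */
    void *priv;    /* handler-private storage */
};

/* Step 2: point an empty pool at a registered handler by name. */
static int demo_set_ops_byname(struct demo_pool *mp, const char *name)
{
    for (int i = 0; i < ops_count; i++) {
        if (strcmp(ops_table[i].name, name) == 0) {
            mp->ops_index = i;
            return 0;
        }
    }
    return -1; /* no such handler */
}

/* A trivial one-slot handler used to exercise the registry. */
static int slot_enqueue(void *priv, void *obj)
{
    void **slot = priv;

    if (*slot != NULL)
        return -1; /* slot already occupied */
    *slot = obj;
    return 0;
}

static void *slot_dequeue(void *priv)
{
    void **slot = priv;
    void *obj = *slot;

    *slot = NULL;
    return obj;
}
```

The model stores a table index in the pool rather than a pointer to the callbacks. That is one way to picture why the order of registration can matter when several processes must agree on the contents of the table.
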

.. note::

    When running a DPDK application with shared libraries, mempool handler
    shared objects specified with the ``-d`` EAL command-line parameter are
    dynamically loaded. When running a multi-process application with shared
    libraries, the ``-d`` arguments for mempool handlers *must be specified in the
    same order for all processes* to ensure correct operation.


Use Cases
---------

All allocations that require a high level of performance should use a pool-based memory allocator.
Below are some examples:

* :ref:`Mbuf Library <Mbuf_Library>`

* :ref:`Environment Abstraction Layer <Environment_Abstraction_Layer>`, for logging service

* Any application that needs to allocate fixed-sized objects in the data plane and that will be continuously utilized by the system.