..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

Memory Pool Library
===================

A memory pool is an allocator of fixed-sized objects.
In the DPDK, it is identified by name and uses a mempool handler to store free objects.
The default mempool handler is ring based.
It provides some other optional services such as a per-core object cache and
an alignment helper to ensure that objects are padded to spread them equally across all DRAM or DDR3 channels.

This library is used by the :doc:`mbuf_lib`.

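For illustration, a minimal sketch of creating a pool with the default
ring-based handler and then allocating and freeing an object
(the pool name, sizes and counts below are arbitrary):

.. code-block:: c

    #include <rte_mempool.h>
    #include <rte_lcore.h>

    #define NUM_OBJS 8192   /* illustrative number of objects */
    #define OBJ_SIZE 2048   /* illustrative object size in bytes */

    static void
    mempool_example(void)
    {
        struct rte_mempool *mp;
        void *obj;

        /* Create a pool of fixed-size objects with a 256-object
         * per-lcore cache, backed by the default ring handler. */
        mp = rte_mempool_create("example_pool", NUM_OBJS, OBJ_SIZE,
                                256, 0, NULL, NULL, NULL, NULL,
                                rte_socket_id(), 0);
        if (mp == NULL)
            return;

        /* Take one free object from the pool, then return it. */
        if (rte_mempool_get(mp, &obj) == 0)
            rte_mempool_put(mp, obj);
    }
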
Cookies
-------

In debug mode, cookies are added at the beginning and end of allocated blocks.
The allocated objects then contain overwrite protection fields to help debug buffer overflows.

Debug mode is disabled by default,
but can be enabled by setting ``RTE_LIBRTE_MEMPOOL_DEBUG`` in ``config/rte_config.h``.

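For example (assuming the usual ``#ifdef`` convention for this build option),
enabling debug mode means adding the following line to ``config/rte_config.h``:

.. code-block:: c

    #define RTE_LIBRTE_MEMPOOL_DEBUG 1
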
Stats
-----

In stats mode, statistics about gets from and puts to the pool are stored in the mempool structure.
Statistics are per-lcore to avoid concurrent access to statistics counters.

Stats mode is disabled by default,
but can be enabled by setting ``RTE_LIBRTE_MEMPOOL_STATS`` in ``config/rte_config.h``.

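Once enabled, the counters can be inspected with ``rte_mempool_dump()``,
which prints the state of a pool, including its per-lcore statistics, to a file.
A minimal sketch:

.. code-block:: c

    #include <stdio.h>
    #include <rte_mempool.h>

    /* Dump the pool state to stdout; when RTE_LIBRTE_MEMPOOL_STATS
     * is enabled at build time, this includes the statistics. */
    static void
    show_pool(struct rte_mempool *mp)
    {
        rte_mempool_dump(stdout, mp);
    }
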
Memory Alignment Constraints on x86 Architecture
------------------------------------------------

Depending on hardware memory configuration on x86 architecture, performance can be greatly improved by adding a specific padding between objects.
The objective is to ensure that the beginning of each object starts on a different channel and rank in memory so that all channels are equally loaded.

This is particularly true for packet buffers when doing L3 forwarding or flow classification.
Only the first 64 bytes are accessed, so performance can be increased by spreading the start addresses of objects among the different channels.

The number of ranks on any DIMM is the number of independent sets of DRAMs that can be accessed for the full data bit-width of the DIMM.
The ranks cannot be accessed simultaneously since they share the same data path.
The physical layout of the DRAM chips on the DIMM itself does not necessarily relate to the number of ranks.

When running an application, the EAL command line options provide the ability to add the number of memory channels and ranks.

.. note::

    The command line must always have the number of memory channels specified for the processor.

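For example, on a system with four memory channels and two ranks per DIMM,
an application might be launched as follows
(the binary name and core list are illustrative):

.. code-block:: console

    ./dpdk-app -l 0-3 -n 4 -r 2
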
Examples of alignment for different DIMM architectures are shown in
:numref:`figure_memory-management` and :numref:`figure_memory-management2`.

.. _figure_memory-management:

.. figure:: img/memory-management.*

   Two Channels and Quad-ranked DIMM Example


In this case, the assumption is that a packet is 16 blocks of 64 bytes, which is not true in general.

The Intel® 5520 chipset has three channels, so in most cases,
no padding is required between objects (except for objects whose size is n x 3 x 64 byte blocks).

.. _figure_memory-management2:

.. figure:: img/memory-management2.*

   Three Channels and Two Dual-ranked DIMM Example


When creating a new pool, the user can specify whether to use this feature.

.. note::

   This feature is not present for Arm systems.
   Modern Arm Interconnects choose the SN-F (memory channel)
   using a hash of memory address bits.
   As a result, the load is distributed evenly in all cases,
   including the above described, rendering this feature unnecessary.


.. _mempool_local_cache:

Local Cache
-----------

In terms of CPU usage, the cost of multiple cores accessing a memory pool's ring of free buffers may be high
since each access requires a compare-and-set (CAS) operation.
To avoid having too many access requests to the memory pool's ring,
the memory pool allocator can maintain a per-core cache and do bulk requests to the memory pool's ring,
via the cache with many fewer locks on the actual memory pool structure.
In this way, each core has full access to its own cache (without locks) of free objects and
only when the cache fills does the core need to shuffle some of the free objects back to the pool's ring or
obtain more objects when the cache is empty.

While this may mean a number of buffers may sit idle in some cores' caches,
the speed at which a core can access its own cache for a specific memory pool without locks provides performance gains.

The cache is composed of a small, per-core table of pointers and its length (used as a stack).
This internal cache can be enabled or disabled at creation of the pool.

The maximum size of the cache is static and is defined at compilation time (``RTE_MEMPOOL_CACHE_MAX_SIZE``).

:numref:`figure_mempool` shows a cache in operation.

.. _figure_mempool:

.. figure:: img/mempool.*

   A mempool in Memory with its Associated Ring

As an alternative to the internal default per-lcore local cache, an application can create and manage
external caches through the ``rte_mempool_cache_create()``, ``rte_mempool_cache_free()`` and
``rte_mempool_cache_flush()`` calls.
These user-owned caches can be explicitly passed to ``rte_mempool_generic_put()`` and ``rte_mempool_generic_get()``.
The ``rte_mempool_default_cache()`` call returns the default internal cache, if any.
In contrast to the default caches, user-owned caches can also be used by unregistered non-EAL threads.

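A minimal sketch of the user-owned cache flow
(the cache size and error handling are illustrative):

.. code-block:: c

    #include <rte_mempool.h>
    #include <rte_lcore.h>

    static int
    use_user_owned_cache(struct rte_mempool *mp)
    {
        struct rte_mempool_cache *cache;
        void *obj;

        /* Create a user-owned cache of up to 256 objects. */
        cache = rte_mempool_cache_create(256, rte_socket_id());
        if (cache == NULL)
            return -1;

        /* Gets and puts go through the user-owned cache explicitly. */
        if (rte_mempool_generic_get(mp, &obj, 1, cache) == 0)
            rte_mempool_generic_put(mp, &obj, 1, cache);

        /* Return cached objects to the pool before freeing the cache. */
        rte_mempool_cache_flush(cache, mp);
        rte_mempool_cache_free(cache);
        return 0;
    }
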
.. _Mempool_Handlers:

Mempool Handlers
----------------

This allows external memory subsystems, such as external hardware memory
management systems and software-based memory allocators, to be used with DPDK.

There are two aspects to a mempool handler.

* Adding the code for your new mempool operations (ops). This is achieved by
  adding a new mempool ops code, and using the ``RTE_MEMPOOL_REGISTER_OPS`` macro
  (a skeleton is sketched after this list).

* Using the new API to call ``rte_mempool_create_empty()`` and
  ``rte_mempool_set_ops_byname()`` to create a new mempool and specifying which
  ops to use.

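A hypothetical handler skeleton (the ``my_`` names are placeholders;
a real handler must implement the pool storage behind the stubbed callbacks):

.. code-block:: c

    #include <errno.h>
    #include <rte_common.h>
    #include <rte_mempool.h>

    /* Stubbed callbacks: a real handler implements its store here. */
    static int
    my_alloc(struct rte_mempool *mp)
    {
        mp->pool_data = NULL;   /* set up the object store */
        return 0;
    }

    static void
    my_free(struct rte_mempool *mp)
    {
        RTE_SET_USED(mp);       /* tear down the object store */
    }

    static int
    my_enqueue(struct rte_mempool *mp, void * const *obj_table,
               unsigned int n)
    {
        RTE_SET_USED(mp);
        RTE_SET_USED(obj_table);
        RTE_SET_USED(n);
        return -ENOBUFS;        /* store n free objects */
    }

    static int
    my_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n)
    {
        RTE_SET_USED(mp);
        RTE_SET_USED(obj_table);
        RTE_SET_USED(n);
        return -ENOENT;         /* hand out n free objects */
    }

    static unsigned int
    my_get_count(const struct rte_mempool *mp)
    {
        RTE_SET_USED(mp);
        return 0;               /* objects currently stored */
    }

    static const struct rte_mempool_ops my_ops = {
        .name = "my_handler",
        .alloc = my_alloc,
        .free = my_free,
        .enqueue = my_enqueue,
        .dequeue = my_dequeue,
        .get_count = my_get_count,
    };

    RTE_MEMPOOL_REGISTER_OPS(my_ops);
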
Several different mempool handlers may be used in the same application. A new
mempool can be created by using the ``rte_mempool_create_empty()`` function,
then using ``rte_mempool_set_ops_byname()`` to point the mempool to the
relevant mempool handler callback (ops) structure.

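A minimal sketch of this flow, using the hypothetical ``my_handler`` ops
registered above (sizes are illustrative;
``rte_mempool_populate_default()`` then allocates the pool memory through the chosen ops):

.. code-block:: c

    #include <rte_lcore.h>
    #include <rte_mempool.h>

    static struct rte_mempool *
    create_pool_with_handler(void)
    {
        struct rte_mempool *mp;

        /* Create an empty pool of 8192 objects of 2048 bytes each. */
        mp = rte_mempool_create_empty("example_pool", 8192, 2048,
                                      256, 0, rte_socket_id(), 0);
        if (mp == NULL)
            return NULL;

        /* Select the handler before the pool memory is populated. */
        if (rte_mempool_set_ops_byname(mp, "my_handler", NULL) != 0) {
            rte_mempool_free(mp);
            return NULL;
        }

        /* Allocate the pool's backing memory through the chosen ops. */
        if (rte_mempool_populate_default(mp) < 0) {
            rte_mempool_free(mp);
            return NULL;
        }

        return mp;
    }
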
Legacy applications may continue to use the old ``rte_mempool_create()`` API
call, which uses a ring based mempool handler by default. These applications
will need to be modified to use a new mempool handler.

For applications that use ``rte_pktmbuf_pool_create()``, there is a config setting
(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of
an alternative mempool handler.

.. note::

    When running a DPDK application with shared libraries, mempool handler
    shared objects specified with the ``-d`` EAL command-line parameter are
    dynamically loaded. When running a multi-process application with shared
    libraries, the ``-d`` arguments for mempool handlers *must be specified in the
    same order for all processes* to ensure correct operation.


Use Cases
---------

All allocations that require a high level of performance should use a pool-based memory allocator.
Below are some examples:

*   :doc:`mbuf_lib`
*   Any application that needs to allocate fixed-sized objects in the data plane that are continuously used by the system.
167