.. SPDX-License-Identifier: BSD-3-Clause
   Copyright(c) 2024 Ericsson AB

Lcore Variables
===============

The ``rte_lcore_var.h`` API provides a mechanism to allocate and
access per-lcore id variables in a space- and cycle-efficient manner.


Lcore Variables API
-------------------

A per-lcore id variable (or lcore variable for short)
holds a separate value for each EAL thread and registered non-EAL thread.
Thus, there is one distinct value for each past, current and future
lcore id-equipped thread, with a total of ``RTE_MAX_LCORE`` instances.

The value of the lcore variable for one lcore id is independent of the
values associated with other lcore ids within the same variable.

For detailed information on the lcore variables API,
please refer to the ``rte_lcore_var.h`` API documentation.


Lcore Variable Handle
~~~~~~~~~~~~~~~~~~~~~

To allocate and access an lcore variable's values, a *handle* is used.
The handle is represented by an opaque pointer,
only to be dereferenced using the appropriate ``<rte_lcore_var.h>`` macros.

The handle is a pointer to the value's type
(e.g., for a ``uint32_t`` lcore variable, the handle is a ``uint32_t *``).

The reason the handle is typed (i.e., it's not a void pointer or an integer)
is to enable type checking when accessing values of the lcore variable.

A handle may be passed between modules and threads
just like any other pointer.

A valid (i.e., allocated) handle never has the value NULL.
Thus, a handle set to NULL may be used
to signify that allocation has not yet been done.


Lcore Variable Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~

An lcore variable is created in two steps:

1. Define an lcore variable handle by using ``RTE_LCORE_VAR_HANDLE``.
2. Allocate lcore variable storage and initialize the handle
   by using ``RTE_LCORE_VAR_ALLOC`` or ``RTE_LCORE_VAR_INIT``.
   Allocation generally occurs at the time of module initialization,
   but may be done at any time.

The lifetime of an lcore variable is not tied to the thread that created it.

Each lcore variable has ``RTE_MAX_LCORE`` values,
one for each possible lcore id.
All of an lcore variable's values may be accessed
from the moment the lcore variable is created,
throughout the lifetime of the EAL (i.e., until ``rte_eal_cleanup()``).

Lcore variables do not need to be freed and cannot be freed.


Access
~~~~~~

The value of any lcore variable for any lcore id
may be accessed from any thread (including unregistered threads),
but it should only be *frequently* read from or written to by the *owner*.
A thread is considered the owner of a particular lcore variable value instance
if it has the lcore id associated with that instance.

Non-owner accesses result in *false sharing*.
As long as non-owner accesses are rare,
they will have only a very slight effect on performance.
This property of the lcore variable memory organization is intentional.
See the implementation section for more information.

Values of the same lcore variable associated with different lcore ids
may be frequently read or written
by their respective owners without risking false sharing.

An appropriate synchronization mechanism,
such as atomic loads and stores,
should be employed to prevent data races between the owning thread
and any other thread accessing the same value instance.
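
As a rough sketch of such synchronization, the following plain C11 code
uses relaxed atomic loads and stores on a per-lcore id counter.
It is a model only: the array ``model_values`` merely stands in for an lcore variable,
and ``MODEL_MAX_LCORE``, ``struct model_stats`` and both functions
are hypothetical names, not part of DPDK.

.. code-block:: c

   #include <stdatomic.h>
   #include <stdint.h>

   #define MODEL_MAX_LCORE 4 /* hypothetical stand-in for RTE_MAX_LCORE */

   struct model_stats {
           _Atomic uint64_t rx_pkts;
   };

   /* Stand-in for the per-lcore id value instances of one lcore variable. */
   static struct model_stats model_values[MODEL_MAX_LCORE];

   /* To be called only by the owner of 'lcore_id' (the single writer). */
   static void
   model_stats_add(unsigned int lcore_id, uint64_t n)
   {
           struct model_stats *stats = &model_values[lcore_id];
           uint64_t old;

           old = atomic_load_explicit(&stats->rx_pkts, memory_order_relaxed);
           atomic_store_explicit(&stats->rx_pkts, old + n, memory_order_relaxed);
   }

   /* May be called, preferably rarely, by any thread. */
   static uint64_t
   model_stats_total(void)
   {
           uint64_t total = 0;
           unsigned int lcore_id;

           for (lcore_id = 0; lcore_id < MODEL_MAX_LCORE; lcore_id++)
                   total += atomic_load_explicit(&model_values[lcore_id].rx_pkts,
                                                 memory_order_relaxed);

           return total;
   }

Since only the owner writes a given instance, a load followed by a store suffices here;
no atomic read-modify-write operation is needed.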

The value of the lcore variable for a particular lcore id
is accessed via ``RTE_LCORE_VAR_LCORE``.

A common pattern is for an EAL thread or a registered non-EAL thread
to access its own lcore variable value.
For this purpose, a shorthand exists as ``RTE_LCORE_VAR``.

``RTE_LCORE_VAR_FOREACH`` may be used to iterate
over all values of a particular lcore variable.

The handle, defined by ``RTE_LCORE_VAR_HANDLE``,
is a pointer of the same type as the value,
but it must be treated as an opaque identifier
and cannot be directly dereferenced.

Lcore variable handles and value pointers may be freely passed
between different threads.


Storage
~~~~~~~

An lcore variable's values may be of a primitive type like ``int``,
but are typically of a ``struct`` type.

The lcore variable handle introduces a per-variable
(not per-value/per-lcore id) overhead of ``sizeof(void *)`` bytes,
so there are some memory footprint gains to be made by organizing
all per-lcore id data for a particular module as one lcore variable
(e.g., as a struct).

An application may define an lcore variable handle
without ever allocating the lcore variable.

The size of an lcore variable's value cannot exceed
the DPDK build-time constant ``RTE_MAX_LCORE_VAR``.
An lcore variable's size is the size of one of its value instances,
not the aggregate of all its ``RTE_MAX_LCORE`` instances.

Lcore variables should generally *not* be ``__rte_cache_aligned``
and need *not* include a ``RTE_CACHE_GUARD`` field,
since these constructs are designed to avoid false sharing.
With lcore variables, false sharing is largely avoided by other means.
In the case of an lcore variable instance,
the thread most recently accessing nearby data structures
should almost always be the lcore variable's owner.
Adding padding (e.g., with ``RTE_CACHE_GUARD``)
will increase the effective memory working set size,
potentially reducing performance.

Lcore variable values are initialized to zero by default.

Lcore variables are not stored in huge page memory.


Example
~~~~~~~

Below is an example of the use of an lcore variable:

.. code-block:: c

   #include <rte_common.h>
   #include <rte_lcore_var.h>

   struct foo_lcore_state {
           int a;
           long b;
   };

   /* Define the handle; storage is allocated in rte_foo_init(). */
   static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);

   long foo_get_a_plus_b(void)
   {
           /* Value instance belonging to the calling thread's lcore id. */
           const struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);

           return state->a + state->b;
   }

   RTE_INIT(rte_foo_init)
   {
           RTE_LCORE_VAR_ALLOC(lcore_states);

           unsigned int lcore_id;
           struct foo_lcore_state *state;
           RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
                   /* initialize state */
           }

           /* other initialization */
   }


Implementation
--------------

This section gives an overview of the implementation of lcore variables,
and some background to its design.


Lcore Variable Buffers
~~~~~~~~~~~~~~~~~~~~~~

Lcore variable values are kept in a set of ``lcore_var_buffer`` structs.

.. literalinclude:: ../../../lib/eal/common/eal_common_lcore_var.c
   :language: c
   :start-after: base unit
   :end-before: last allocated unit

An lcore var buffer stores at a minimum one, but usually many, lcore variables.

The value instances for all lcore ids are stored in the same buffer.
However, each lcore id has its own slice of the ``data`` array.
Such a slice is ``RTE_MAX_LCORE_VAR`` bytes in size.

In this way, the values associated with a particular lcore id
are grouped spatially close (in memory).
No padding is required to prevent false sharing.

.. literalinclude:: ../../../lib/eal/common/eal_common_lcore_var.c
   :language: c
   :start-after: last allocated unit
   :end-before: >8 end of documented variables

The implementation maintains a current ``lcore_var_buffer`` and an ``offset``,
where the latter tracks how many bytes of this current buffer have been allocated.

The ``offset`` is progressively incremented
(by the size of the just-allocated lcore variable),
as lcore variables are being allocated.

If the allocation of a variable would result in an ``offset`` larger
than ``RTE_MAX_LCORE_VAR`` (i.e., the slice size), the buffer is full.
In that case, a new buffer is allocated off the heap, and the ``offset`` is reset.
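
The offset bookkeeping described above can be modeled in a few lines of plain C.
This is a simplified sketch, not the actual DPDK code:
all names are hypothetical, alignment is ignored, no memory is actually allocated,
and the slice size is scaled down to 64 bytes.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   #define MODEL_MAX_LCORE_VAR 64 /* slice size; hypothetical, scaled down */

   struct model_allocator {
           size_t offset;        /* bytes used in each slice of the current buffer */
           unsigned int buffers; /* number of buffers started so far */
   };

   /*
    * Reserve 'size' bytes (at most MODEL_MAX_LCORE_VAR) in every per-lcore id
    * slice and return the offset of the new variable within the slice.
    * '*new_buffer' is set if a fresh buffer had to be started.
    */
   static size_t
   model_alloc(struct model_allocator *a, size_t size, bool *new_buffer)
   {
           size_t var_offset;

           *new_buffer = a->buffers == 0 ||
                   a->offset + size > MODEL_MAX_LCORE_VAR;

           if (*new_buffer) {
                   /* A real implementation would obtain heap memory here. */
                   a->buffers++;
                   a->offset = 0;
           }

           var_offset = a->offset;
           a->offset += size;

           return var_offset;
   }

In this model, a variable that exactly fills the remainder of a slice still fits;
only an allocation that would push the offset past the slice size starts a new buffer.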

The lcore var buffers are arranged in a linked list,
to allow freeing them at the point of ``rte_eal_cleanup()``.

The lcore variable buffers are allocated off the regular C heap.
There are a number of reasons for not using ``<rte_malloc.h>``
and huge pages for lcore variables:

- The libc heap is available at any time,
  including early in the DPDK initialization.
- The amount of data kept in lcore variables is projected to be small,
  and thus is unlikely to induce translation lookaside buffer (TLB) misses.
- The last (and potentially only) lcore buffer in the chain
  will likely only partially be in use.
  Huge pages of the sort used by DPDK are always resident in memory,
  and their use would result in a significant amount of memory going to waste.
  An example: ~256 kB worth of lcore variables are allocated
  by DPDK libraries, PMDs and the application.
  ``RTE_MAX_LCORE_VAR`` is set to 128 kB and ``RTE_MAX_LCORE`` to 128.
  With 4 kB OS pages, only the first ~64 pages of each of the 128 per-lcore id slices
  in the (only) ``lcore_var_buffer`` will actually be resident (paged in).
  Here, demand paging saves ~98 MB of memory.

.. note::

   Not residing in huge pages, lcore variables cannot be accessed from secondary processes.

Heap allocation failures are treated as fatal.
The reason for this unorthodox design is that a majority of the allocations
are deemed to happen at initialization.
An early heap allocation failure for a fixed amount of data is a situation
not unlike one where there is not enough memory available for static variables
(i.e., the BSS or data sections).

Provided these assumptions hold true, it's deemed acceptable
to leave the application out of handling memory allocation failures.

The upside of this approach is that no error handling code is required
on the API user side.


Lcore Variable Handles
~~~~~~~~~~~~~~~~~~~~~~

Upon lcore variable allocation, the lcore variables API returns
an opaque *handle* in the form of a pointer.
The value of the pointer is ``buffer->data + offset``.

Translating a handle base pointer to a pointer to a value
associated with a particular lcore id is straightforward:

.. literalinclude:: ../../../lib/eal/include/rte_lcore_var.h
   :language: c
   :start-after: access function 8<
   :end-before: >8 end of access function

``RTE_MAX_LCORE_VAR`` is a public macro to allow the compiler
to optimize the ``lcore_id * RTE_MAX_LCORE_VAR`` expression,
and replace the multiplication with a less expensive arithmetic operation.
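
The translation amounts to a single multiply-and-add,
which the following stand-alone model illustrates.
It is not the DPDK function itself:
``model_data``, ``model_lcore_value()`` and the scaled-down ``MODEL_*`` constants
are hypothetical and only mirror the ``buffer->data + offset``
and ``lcore_id * RTE_MAX_LCORE_VAR`` expressions mentioned in this section.

.. code-block:: c

   #include <stddef.h>

   #define MODEL_MAX_LCORE 2      /* hypothetical, scaled down */
   #define MODEL_MAX_LCORE_VAR 64 /* slice size; a compile-time constant */

   /* One buffer: MODEL_MAX_LCORE slices of MODEL_MAX_LCORE_VAR bytes each. */
   static unsigned char model_data[MODEL_MAX_LCORE * MODEL_MAX_LCORE_VAR];

   /*
    * Translate a handle (a pointer into the slice of lcore id 0) into
    * a pointer to the value instance belonging to 'lcore_id'.
    */
   static void *
   model_lcore_value(unsigned int lcore_id, void *handle)
   {
           return (unsigned char *)handle +
                   lcore_id * (size_t)MODEL_MAX_LCORE_VAR;
   }

With a handle at offset 8 into the buffer,
the value for lcore id 1 is located at byte 8 + 64 = 72.
Because the slice size is a constant known to the compiler,
the multiplication may be compiled into, for example, a shift.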

To maintain type safety, the ``RTE_LCORE_VAR*()`` macros should be used,
instead of directly invoking ``rte_lcore_var_lcore()``.
The macros return a pointer of the same type as the handle
(i.e., a pointer to the value's type).


Memory Layout
~~~~~~~~~~~~~

This section describes how lcore variables are organized in memory.

As an illustration, two example modules are used,
``rte_x`` and ``rte_y``, both maintaining per-lcore id state
as a part of their implementation.

Two different methods will be used to maintain such state:
lcore variables and, to serve as a reference, lcore id-indexed static arrays.

Certain parameters are scaled down to make graphical depictions more practical.

For the purpose of this exercise, a ``RTE_MAX_LCORE`` of 2 is assumed.
In a real-world configuration, the maximum number of
EAL threads and registered threads will be much greater (e.g., 128).

The lcore variables example assumes a ``RTE_MAX_LCORE_VAR`` of 64.
In a real-world configuration (as controlled by ``rte_config.h``),
the value of this compile-time constant will be much greater (e.g., 1048576).

The per-lcore id state is also smaller than what most real-world modules would have.

Lcore Variables Example
^^^^^^^^^^^^^^^^^^^^^^^

When lcore variables are used, the parts of ``rte_x`` and ``rte_y``
that deal with the declaration and allocation of per-lcore id data
may look something like below.

.. code-block:: c

   /* -- Lcore variables -- */

   /* rte_x.c */

   struct x_lcore
   {
       int a;
       char b;
   };

   static RTE_LCORE_VAR_HANDLE(struct x_lcore, x_lcores);
   RTE_LCORE_VAR_INIT(x_lcores);

   /* ... */

   /* rte_y.c */

   struct y_lcore
   {
       long c;
       long d;
   };

   static RTE_LCORE_VAR_HANDLE(struct y_lcore, y_lcores);
   RTE_LCORE_VAR_INIT(y_lcores);

   /* ... */

The resulting memory layout will look something like the following:

.. figure:: img/lcore_var_mem_layout.*

The above figure assumes that ``x_lcores`` is allocated prior to ``y_lcores``.
``RTE_LCORE_VAR_INIT()`` relies on constructors, run prior to ``main()`` in an undefined order.

The use of lcore variables ensures that per-lcore id data is kept in close proximity,
within a designated region of memory.
This proximity enhances data locality and can improve performance.

Lcore Id Indexed Static Array Example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Below is an example of the struct declarations,
array definitions and the resulting organization in memory
when an lcore id-indexed static array of cache-line aligned,
``RTE_CACHE_GUARD``-ed structs is used to maintain per-lcore id state.

This is a common pattern in DPDK, which lcore variables attempt to replace.

.. code-block:: c

   /* -- Cache-aligned static arrays -- */

   /* rte_x.c */

   struct __rte_cache_aligned x_lcore
   {
       int a;
       char b;
       RTE_CACHE_GUARD;
   };

   static struct x_lcore x_lcores[RTE_MAX_LCORE];

   /* ... */

   /* rte_y.c */

   struct __rte_cache_aligned y_lcore
   {
       long c;
       long d;
       RTE_CACHE_GUARD;
   };

   static struct y_lcore y_lcores[RTE_MAX_LCORE];

   /* ... */

In this approach, accessing the state for a particular lcore id is merely
a matter of retrieving the lcore id and looking up the correct struct instance.

.. code-block:: c

   struct x_lcore *my_lcore_state = &x_lcores[rte_lcore_id()];

The address "0" at the top of the left-most column in the figure below
represents the base address for the ``x_lcores`` array
(in the BSS segment in memory).

The figure only includes the memory layout for the ``rte_x`` example module.
``rte_y`` would look very similar, with ``y_lcores`` being located
at some other address in the BSS section.

.. figure:: img/static_array_mem_layout.*

The static array approach results in the per-lcore id data
being organized around modules, not lcore ids.
To avoid false sharing, an extensive use of padding is employed,
causing cache fragmentation.

Because the padding is interspersed with the data,
demand paging is unlikely to reduce the actual resident DRAM memory footprint.
This is because the padding is smaller
than a typical operating system memory page (usually 4 kB).
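
The amount of padding can be estimated with a small stand-alone model.
The sketch below assumes a 64-byte cache line and a single guard line,
and uses C11 ``_Alignas`` as a rough, hypothetical equivalent
of ``__rte_cache_aligned`` and ``RTE_CACHE_GUARD``;
the ``model_*`` and ``MODEL_*`` names are not part of DPDK.

.. code-block:: c

   #include <stddef.h>

   #define MODEL_CACHE_LINE 64 /* assumed cache line size, in bytes */

   /* Unpadded struct, as it would be used for an lcore variable value. */
   struct x_lcore_plain {
           int a;
           char b;
   };

   /* Rough model of a cache-aligned struct followed by one guard line. */
   struct x_lcore_padded {
           _Alignas(MODEL_CACHE_LINE) int a;
           char b;
           _Alignas(MODEL_CACHE_LINE) char guard[MODEL_CACHE_LINE];
   };

   /* Bytes of padding and guard added to each element of a static array. */
   static size_t
   model_padding_per_element(void)
   {
           return sizeof(struct x_lcore_padded) -
                   sizeof(struct x_lcore_plain);
   }

Under these assumptions, each padded array element occupies two cache lines (128 bytes),
of which only a few bytes hold data.
The padding per element is far smaller than a 4 kB page,
so every page of the array still contains live data and must remain resident.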


Performance
~~~~~~~~~~~

One of the goals of lcore variables is to improve performance.
This is achieved by packing often-used data in fewer cache lines,
which reduces fragmentation in CPU caches
and thus somewhat improves the effective cache size and cache hit rates.

The application-level gains depend largely on how much data is kept in lcore variables,
how often it is accessed,
and how much pressure the application exerts on the CPU caches
(i.e., how much other memory it accesses).

The ``lcore_var_perf_autotest`` is an attempt at exploring
the performance benefits (or drawbacks) of lcore variables
compared to their alternatives.
Being a micro benchmark, it needs to be taken with a grain of salt.

Generally, one shouldn't expect more than some very modest gains in performance
after a switch from lcore id indexed arrays to lcore variables.

An additional benefit of the use of lcore variables is that it avoids
certain tricky issues related to CPU core hardware prefetching
(e.g., next-N-lines prefetching) that may cause false sharing
even when data used by two cores do not reside on the same cache line.
Hardware prefetch behavior is generally not publicly documented
and varies across CPU vendors, CPU generations and BIOS (or similar) configurations.
For applications aiming to be portable, this may cause issues.
Often, CPU hardware prefetch-induced issues are non-existent,
except in some particular circumstances, where their adverse effects may be significant.


Alternatives
------------

Lcore Id Indexed Static Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lcore variables are designed to replace a pattern exemplified below:

.. code-block:: c

   struct __rte_cache_aligned foo_lcore_state {
           int a;
           long b;
           RTE_CACHE_GUARD;
   };

   static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];

This scheme is simple and effective, but has one drawback:
the data is organized so that objects related to all lcores for a particular module
are kept close in memory.
At a bare minimum, this requires sizing data structures
(e.g., using ``__rte_cache_aligned``) to a whole number of cache lines
and ensuring that allocations of such objects
are cache line aligned to avoid false sharing.
With CPU hardware prefetching and memory loads resulting from speculative execution
(functions which seemingly are getting more eager faster
than they are getting more intelligent),
one or more "guard" cache lines may be required
to separate one lcore's data from another's and prevent false sharing.

Lcore variables offer the advantage of working with,
rather than against, the CPU's assumptions.
A next-line hardware prefetcher, for example, may function as intended
(i.e., to the benefit, not detriment, of system performance).


Thread Local Storage
~~~~~~~~~~~~~~~~~~~~

An alternative to ``rte_lcore_var.h`` is the ``rte_per_lcore.h`` API,
which makes use of thread-local storage
(TLS, e.g., GCC ``__thread`` or C11 ``_Thread_local``).

There are a number of differences between using TLS
and the use of lcore variables.

The lifecycle of a thread-local variable instance is tied to that of the thread.
The data cannot be accessed before the thread has been created,
nor after it has terminated.
As a result, thread-local variables must be initialized in a "lazy" manner
(e.g., at the point of thread creation).
Lcore variables may be accessed immediately after having been allocated
(which may occur before any thread beyond the main thread is running).
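
One possible form of such lazy initialization,
on first use rather than at thread creation, is sketched below in plain C11.
``struct model_state``, ``model_tls_state()`` and the initial values
are hypothetical and serve only as an illustration.

.. code-block:: c

   #include <stdbool.h>

   struct model_state {
           int a;
           long b;
   };

   /* One instance of each object exists per thread. */
   static _Thread_local struct model_state tls_state;
   static _Thread_local bool tls_state_initialized;

   /* Return the calling thread's state, initializing it on first use. */
   static struct model_state *
   model_tls_state(void)
   {
           if (!tls_state_initialized) {
                   tls_state.a = 1;
                   tls_state.b = 2;
                   tls_state_initialized = true;
           }

           return &tls_state;
   }

Every access pays for the initialization check.
With an lcore variable, the corresponding initialization could instead be performed once,
for all lcore ids, right after allocation.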

A thread-local variable is duplicated across all threads in the process,
including unregistered non-EAL threads (i.e., "regular" threads).
For DPDK applications heavily relying on multi-threading
(in conjunction with DPDK's "one thread per core" pattern),
either by having many concurrent threads or creating/destroying threads at a high rate,
an excessive use of thread-local variables may cause inefficiencies
(e.g., increased thread creation overhead due to thread-local storage initialization
or increased memory footprint).
Lcore variables *only* exist for threads with an lcore id.

Whether data in thread-local storage can be shared between threads
(i.e., whether a pointer to a thread-local variable can be passed to
and successfully dereferenced by a non-owning thread)
depends on the specifics of the TLS implementation.
With GCC ``__thread`` and GCC ``_Thread_local``,
data sharing between threads is supported.
In the C11 standard, accessing another thread's ``_Thread_local`` object
is implementation-defined.
Lcore variable instances may be accessed reliably by any thread.

Lcore variables also rely on TLS to retrieve the thread's lcore id.
However, the rest of the per-thread data is not kept in TLS.

From a memory layout perspective, TLS is similar to lcore variables,
and thus per-thread data structures need not be padded.

In case the above-mentioned drawbacks of the use of TLS are of no significance
to a particular application, TLS is a good alternative to lcore variables.