.. SPDX-License-Identifier: BSD-3-Clause
   Copyright(c) 2024 Ericsson AB

Lcore Variables
===============

The ``rte_lcore_var.h`` API provides a mechanism to allocate and
access per-lcore id variables in a space- and cycle-efficient manner.


Lcore Variables API
-------------------

A per-lcore id variable (or lcore variable for short)
holds a unique value for each EAL thread and registered non-EAL thread.
Thus, there is one distinct value for each past, current and future
lcore id-equipped thread, with a total of ``RTE_MAX_LCORE`` instances.

The value of the lcore variable for one lcore id is independent of the
values associated with other lcore ids within the same variable.

For detailed information on the lcore variables API,
please refer to the ``rte_lcore_var.h`` API documentation.


Lcore Variable Handle
~~~~~~~~~~~~~~~~~~~~~

To allocate and access an lcore variable's values, a *handle* is used.
The handle is represented by an opaque pointer,
only to be dereferenced using the appropriate ``<rte_lcore_var.h>`` macros.

The handle is a pointer to the value's type
(e.g., for a ``uint32_t`` lcore variable, the handle is a ``uint32_t *``).

The reason the handle is typed (i.e., it's not a void pointer or an integer)
is to enable type checking when accessing values of the lcore variable.

A handle may be passed between modules and threads
just like any other pointer.

A valid (i.e., allocated) handle never has the value NULL.
Thus, a handle set to NULL may be used
to signify that allocation has not yet been done.


Lcore Variable Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~

An lcore variable is created in two steps:

1. Define an lcore variable handle by using ``RTE_LCORE_VAR_HANDLE``.
2. Allocate lcore variable storage and initialize the handle
   by using ``RTE_LCORE_VAR_ALLOC`` or ``RTE_LCORE_VAR_INIT``.
   Allocation generally occurs at the time of module initialization,
   but may be done at any time.

The lifetime of an lcore variable is not tied to the thread that created it.

Each lcore variable has ``RTE_MAX_LCORE`` values,
one for each possible lcore id.
All of an lcore variable's values may be accessed
from the moment the lcore variable is created,
throughout the lifetime of the EAL (i.e., until ``rte_eal_cleanup()``).

Lcore variables do not need to be freed and cannot be freed.


Access
~~~~~~

The value of any lcore variable for any lcore id
may be accessed from any thread (including unregistered threads),
but it should only be *frequently* read from or written to by the *owner*.
A thread is considered the owner of a particular lcore variable value instance
if it has the lcore id associated with that instance.

Non-owner accesses result in *false sharing*.
As long as non-owner accesses are rare,
they will have only a very slight effect on performance.
This property of the lcore variable memory organization is intentional.
See the implementation section for more information.

Values of the same lcore variable
associated with different lcore ids may be frequently read or written
by their respective owners without risking false sharing.
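
As an illustration of the ownership rule, the sketch below models
a per-lcore id packet counter in plain C11, without any DPDK dependency.
The ``toy_`` names and ``TOY_MAX_LCORE`` are hypothetical stand-ins;
a real module would obtain each value via the ``rte_lcore_var.h`` macros
rather than index a static array.
The owner updates its own value on the frequent path,
while any thread may occasionally sum all values.

.. code-block:: c

   #include <assert.h>
   #include <stdatomic.h>
   #include <stdint.h>

   /* Hypothetical stand-in for RTE_MAX_LCORE. */
   #define TOY_MAX_LCORE 4

   struct toy_stats {
       _Atomic uint64_t rx_packets;
   };

   /* Static storage, thus zero-initialized; one value per lcore id. */
   static struct toy_stats toy_stats[TOY_MAX_LCORE];

   /* Frequent path: to be called only by the thread owning owner_id. */
   static void toy_count_rx(unsigned int owner_id, uint64_t n)
   {
       struct toy_stats *s;
       uint64_t cur;

       assert(owner_id < TOY_MAX_LCORE);
       s = &toy_stats[owner_id];

       /* Single writer: a relaxed load and store pair is sufficient,
        * no atomic read-modify-write operation is needed. */
       cur = atomic_load_explicit(&s->rx_packets, memory_order_relaxed);
       atomic_store_explicit(&s->rx_packets, cur + n, memory_order_relaxed);
   }

   /* Rare path: may be called by any thread, owner or not. */
   static uint64_t toy_total_rx(void)
   {
       uint64_t sum = 0;
       unsigned int i;

       for (i = 0; i < TOY_MAX_LCORE; i++)
           sum += atomic_load_explicit(&toy_stats[i].rx_packets,
                                       memory_order_relaxed);

       return sum;
   }

Relaxed ordering is adequate in this sketch only because each value has a single writer
and the reader tolerates a slightly stale total;
other kinds of data may require stronger ordering or a lock.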

An appropriate synchronization mechanism,
such as atomic loads and stores,
should be employed to prevent data races between the owning thread
and any other thread accessing the same value instance.

The value of the lcore variable for a particular lcore id
is accessed via ``RTE_LCORE_VAR_LCORE``.

A common pattern is for an EAL thread or a registered non-EAL thread
to access its own lcore variable value.
For this purpose, a shorthand exists as ``RTE_LCORE_VAR``.

``RTE_LCORE_VAR_FOREACH`` may be used to iterate
over all values of a particular lcore variable.

The handle, defined by ``RTE_LCORE_VAR_HANDLE``,
is a pointer of the same type as the value,
but it must be treated as an opaque identifier
and cannot be directly dereferenced.

Lcore variable handles and value pointers may be freely passed
between different threads.


Storage
~~~~~~~

An lcore variable's values may be of a primitive type like ``int``,
but are typically of a ``struct`` type.

The lcore variable handle introduces a per-variable
(not per-value/per-lcore id) overhead of ``sizeof(void *)`` bytes,
so there are some memory footprint gains to be made by organizing
all per-lcore id data for a particular module as one lcore variable
(e.g., as a struct).

An application may define an lcore variable handle
without ever allocating the lcore variable.

The size of an lcore variable's value cannot exceed
the DPDK build-time constant ``RTE_MAX_LCORE_VAR``.
An lcore variable's size is the size of one of its value instances,
not the aggregate of all its ``RTE_MAX_LCORE`` instances.

Lcore variables should generally *not* be ``__rte_cache_aligned``
and need *not* include a ``RTE_CACHE_GUARD`` field,
since these constructs are designed to avoid false sharing.
With lcore variables, false sharing is largely avoided by other means.
In the case of an lcore variable instance,
the thread most recently accessing nearby data structures
should almost always be the lcore variable's owner.
Adding padding (e.g., with ``RTE_CACHE_GUARD``)
will increase the effective memory working set size,
potentially reducing performance.

Lcore variable values are initialized to zero by default.
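
The cost of the alignment and guard padding discussed above
can be made concrete with a small, DPDK-independent sketch.
It assumes a 64-byte cache line and 128 lcore ids;
``TOY_CACHE_LINE``, ``TOY_MAX_LCORE`` and the ``toy_`` structs are hypothetical,
with C11 ``_Alignas`` standing in for ``__rte_cache_aligned``
and a one-line array standing in for ``RTE_CACHE_GUARD``.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define TOY_CACHE_LINE 64 /* assumed cache line size, in bytes */
   #define TOY_MAX_LCORE 128 /* assumed number of lcore ids */

   /* Static array style: cache-line aligned, followed by a guard line. */
   struct toy_padded_state {
       _Alignas(TOY_CACHE_LINE) uint32_t a;
       uint32_t b;
       _Alignas(TOY_CACHE_LINE) char guard[TOY_CACHE_LINE];
   };

   /* Lcore variable style: no alignment attribute, no guard. */
   struct toy_plain_state {
       uint32_t a;
       uint32_t b;
   };

   /* Bytes occupied by the values of all lcore ids, padded layout. */
   static size_t toy_padded_total(void)
   {
       return sizeof(struct toy_padded_state) * TOY_MAX_LCORE;
   }

   /* Bytes occupied by the values of all lcore ids, plain layout. */
   static size_t toy_plain_total(void)
   {
       return sizeof(struct toy_plain_state) * TOY_MAX_LCORE;
   }

In this sketch, eight bytes of payload per lcore id occupy two full cache lines
in the padded layout, a sixteen-fold increase over the plain layout.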

Lcore variables are not stored in huge page memory.


Example
~~~~~~~

Below is an example of the use of an lcore variable:

.. code-block:: c

   #include <rte_common.h>
   #include <rte_lcore_var.h>

   struct foo_lcore_state {
       int a;
       long b;
   };

   static RTE_LCORE_VAR_HANDLE(struct foo_lcore_state, lcore_states);

   long foo_get_a_plus_b(void)
   {
       /* Value belonging to the calling thread's lcore id. */
       const struct foo_lcore_state *state = RTE_LCORE_VAR(lcore_states);

       return state->a + state->b;
   }

   RTE_INIT(rte_foo_init)
   {
       RTE_LCORE_VAR_ALLOC(lcore_states);

       unsigned int lcore_id;
       struct foo_lcore_state *state;
       RTE_LCORE_VAR_FOREACH(lcore_id, state, lcore_states) {
           /* initialize state */
       }

       /* other initialization */
   }


Implementation
--------------

This section gives an overview of the implementation of lcore variables,
and some background to its
design.


Lcore Variable Buffers
~~~~~~~~~~~~~~~~~~~~~~

Lcore variable values are kept in a set of ``lcore_var_buffer`` structs.

.. literalinclude:: ../../../lib/eal/common/eal_common_lcore_var.c
   :language: c
   :start-after: base unit
   :end-before: last allocated unit

An lcore variable buffer stores at least one, but usually many, lcore variables.

The value instances for all lcore ids are stored in the same buffer.
However, each lcore id has its own slice of the ``data`` array.
Such a slice is ``RTE_MAX_LCORE_VAR`` bytes in size.

In this way, the values associated with a particular lcore id
are grouped spatially close (in memory).
No padding is required to prevent false sharing.

.. literalinclude:: ../../../lib/eal/common/eal_common_lcore_var.c
   :language: c
   :start-after: last allocated unit
   :end-before: >8 end of documented variables

The implementation maintains a current ``lcore_var_buffer`` and an ``offset``,
where the latter tracks how many bytes of this current buffer have been allocated.

The ``offset`` is progressively incremented
(by the size of the just-allocated lcore variable),
as lcore variables are being allocated.

If the allocation of a variable would result in an ``offset`` larger
than ``RTE_MAX_LCORE_VAR`` (i.e., the slice size), the buffer is full.
In that case, a new buffer is allocated off the heap, and the ``offset`` is reset.

The lcore variable buffers are arranged in a linked list,
to allow freeing them at the point of ``rte_eal_cleanup()``.

The lcore variable buffers are allocated off the regular C heap.
There are a number of reasons for not using ``<rte_malloc.h>``
and huge pages for lcore variables:

- The libc heap is available at any time,
  including early in the DPDK initialization.
- The amount of data kept in lcore variables is projected to be small,
  and thus is unlikely to induce translation lookaside buffer (TLB) misses.
- The last (and potentially only) lcore buffer in the chain
  will likely only partially be in use.
  Huge pages of the sort used by DPDK are always resident in memory,
  and their use would result in a significant amount of memory going to waste.
  An example: ~256 kB worth of lcore variables are allocated
  by DPDK libraries, PMDs and the application.
  ``RTE_MAX_LCORE_VAR`` is set to 1 MB and ``RTE_MAX_LCORE`` to 128.
  With 4 kB OS pages, only the first ~64 pages (256 kB) of each of the 128 per-lcore id slices
  in the (only) ``lcore_var_buffer`` will actually be resident (paged in).
  Here, demand paging saves ~96 MB of memory
  (128 slices, each with ~768 kB never touched).

.. note::

   Not residing in huge pages, lcore variables cannot be accessed from secondary processes.

Heap allocation failures are treated as fatal.
The reason for this unorthodox design is that a majority of the allocations
are deemed to happen at initialization.
An early heap allocation failure for a fixed amount of data is a situation
not unlike one where there is not enough memory available for static variables
(i.e., the BSS or data sections).

Provided these assumptions hold true, it's deemed acceptable
to leave the application out of handling memory allocation failures.

The upside of this approach is that no error handling code is required
on the API user side.


Lcore Variable Handles
~~~~~~~~~~~~~~~~~~~~~~

Upon lcore variable allocation, the lcore variables API returns
an opaque *handle* in the form of a pointer.
The value of the pointer is ``buffer->data + offset``.
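
The following toy model, in plain C and independent of DPDK,
sketches how a current buffer and an offset could produce such handle values.
It is not the actual EAL code: the ``toy_`` names are hypothetical,
the constants are scaled down, and details such as thread safety are left out.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdlib.h>

   #define TOY_MAX_LCORE 2      /* scaled-down stand-in for RTE_MAX_LCORE */
   #define TOY_MAX_LCORE_VAR 64 /* scaled-down stand-in for RTE_MAX_LCORE_VAR */

   struct toy_buffer {
       /* One TOY_MAX_LCORE_VAR-sized slice per lcore id. */
       char data[TOY_MAX_LCORE * TOY_MAX_LCORE_VAR];
       struct toy_buffer *prev; /* linked list, to allow freeing at cleanup */
   };

   static struct toy_buffer *toy_current;
   /* Starting at the slice size forces a buffer allocation on first use. */
   static size_t toy_offset = TOY_MAX_LCORE_VAR;

   /*
    * Returns the handle, which is the address of the value for lcore id 0.
    * The align argument must be a power of two.
    */
   static void *toy_alloc(size_t size, size_t align)
   {
       size_t start;

       assert(size > 0 && size <= TOY_MAX_LCORE_VAR);

       /* Round the current offset up to the requested alignment. */
       start = (toy_offset + align - 1) & ~(align - 1);

       if (start + size > TOY_MAX_LCORE_VAR) {
           /* The current buffer is full (or missing); chain a new one. */
           struct toy_buffer *buffer = calloc(1, sizeof(*buffer));

           if (buffer == NULL)
               abort(); /* allocation failures are treated as fatal */

           buffer->prev = toy_current;
           toy_current = buffer;
           start = 0;
       }

       toy_offset = start + size;

       return toy_current->data + start;
   }

Using ``calloc()`` in the sketch also gives zero-initialized values,
and the ``prev`` pointer is what would allow a cleanup routine to walk and free the chain.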

Translating a handle base pointer to a pointer to a value
associated with a particular lcore id is straightforward:

.. literalinclude:: ../../../lib/eal/include/rte_lcore_var.h
   :language: c
   :start-after: access function 8<
   :end-before: >8 end of access function

``RTE_MAX_LCORE_VAR`` is a public macro to allow the compiler
to optimize the ``lcore_id * RTE_MAX_LCORE_VAR`` expression,
and replace the multiplication with a less expensive arithmetic operation.

To maintain type safety, the ``RTE_LCORE_VAR*()`` macros should be used,
instead of directly invoking ``rte_lcore_var_lcore()``.
The macros return a pointer of the same type as the handle
(i.e., a pointer to the value's type).


Memory Layout
~~~~~~~~~~~~~

This section describes how lcore variables are organized in memory.

As an illustration, two example modules are used,
``rte_x`` and ``rte_y``, both maintaining per-lcore id state
as a part of their implementation.

Two different methods will be used to maintain such state:
lcore variables and, to serve as a reference, lcore id-indexed static arrays.

Certain parameters are scaled down to make graphical depictions more practical.

For the purpose of this exercise, a ``RTE_MAX_LCORE`` of 2 is assumed.
In a real-world configuration, the maximum number of
EAL threads and registered threads will be much greater (e.g., 128).

The lcore variables example assumes a ``RTE_MAX_LCORE_VAR`` of 64.
In a real-world configuration (as controlled by ``rte_config.h``),
the value of this compile-time constant will be much greater (e.g., 1048576).

The per-lcore id state is also smaller than what most real-world modules would have.

Lcore Variables Example
^^^^^^^^^^^^^^^^^^^^^^^

When lcore variables are used, the parts of ``rte_x`` and ``rte_y``
that deal with the declaration and allocation of per-lcore id data
may look something like below.

.. code-block:: c

   /* -- Lcore variables -- */

   /* rte_x.c */

   struct x_lcore
   {
       int a;
       char b;
   };

   static RTE_LCORE_VAR_HANDLE(struct x_lcore, x_lcores);
   RTE_LCORE_VAR_INIT(x_lcores);

   /* ... */

   /* rte_y.c */

   struct y_lcore
   {
       long c;
       long d;
   };

   static RTE_LCORE_VAR_HANDLE(struct y_lcore, y_lcores);
   RTE_LCORE_VAR_INIT(y_lcores);

   /* ... */

The resulting memory layout will look something like the following:

.. figure:: img/lcore_var_mem_layout.*

The above figure assumes that ``x_lcores`` is allocated prior to ``y_lcores``.
``RTE_LCORE_VAR_INIT()`` relies on constructors, which are run prior to ``main()`` in an undefined order.

The use of lcore variables ensures that per-lcore id data is kept in close proximity,
within a designated region of memory.
This proximity enhances data locality and can improve performance.
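
The positions of the individual values in this scaled-down configuration
can be worked out with a few lines of plain C.
The sketch below assumes that ``x_lcores`` is allocated first,
that ``y_lcores`` is placed at the next suitably aligned offset,
and that ``int`` is four bytes wide;
the ``toy_`` helpers are hypothetical and not part of any DPDK API.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   #define TOY_MAX_LCORE_VAR 64 /* slice size assumed in this section */

   struct x_lcore {
       int a;
       char b;
   };

   struct y_lcore {
       long c;
       long d;
   };

   /* Rounds offset up to the next multiple of align (a power of two). */
   static size_t toy_align_up(size_t offset, size_t align)
   {
       return (offset + align - 1) & ~(align - 1);
   }

   /* Offset of x_lcores within each slice: the first allocation. */
   static size_t toy_x_offset(void)
   {
       return 0;
   }

   /* Offset of y_lcores within each slice: right after x_lcores. */
   static size_t toy_y_offset(void)
   {
       return toy_align_up(toy_x_offset() + sizeof(struct x_lcore),
                           _Alignof(struct y_lcore));
   }

   /*
    * Position, counted from the start of the buffer's data array,
    * of the value at in-slice offset var_offset for lcore_id.
    */
   static size_t toy_position(size_t var_offset, unsigned int lcore_id)
   {
       return (size_t)lcore_id * TOY_MAX_LCORE_VAR + var_offset;
   }

Under these assumptions, the values for lcore id 0 sit at positions 0 and 8,
and those for lcore id 1 at positions 64 and 72:
the data of both modules is adjacent for each lcore id.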

Lcore Id Index Static Array Example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Below is an example of the struct and array declarations,
and the resulting organization in memory,
in case lcore id-indexed static arrays of cache-line aligned,
``RTE_CACHE_GUARD``-padded structs are used to maintain per-lcore id state.

This is a common pattern in DPDK, which lcore variables attempt to replace.

.. code-block:: c

   /* -- Cache-aligned static arrays -- */

   /* rte_x.c */

   struct __rte_cache_aligned x_lcore
   {
       int a;
       char b;
       RTE_CACHE_GUARD;
   };

   static struct x_lcore x_lcores[RTE_MAX_LCORE];

   /* ... */

   /* rte_y.c */

   struct __rte_cache_aligned y_lcore
   {
       long c;
       long d;
       RTE_CACHE_GUARD;
   };

   static struct y_lcore y_lcores[RTE_MAX_LCORE];

   /* ... */

In this approach, accessing the state for a particular lcore id is merely
a matter of retrieving the lcore id and looking up the correct struct instance.

.. code-block:: c

   struct x_lcore *my_lcore_state = &x_lcores[rte_lcore_id()];

The address "0" at the top of the left-most column in the figure
represents the base address for the ``x_lcores`` array
(in the BSS segment in memory).

The figure only includes the memory layout for the ``rte_x`` example module.
``rte_y`` would look very similar, with ``y_lcores`` being located
at some other address in the BSS section.

.. figure:: img/static_array_mem_layout.*

The static array approach results in the per-lcore id data
being organized around modules, not lcore ids.
To avoid false sharing, an extensive use of padding is employed,
causing cache fragmentation.

Because the padding is interspersed with the data,
demand paging is unlikely to reduce the actual resident DRAM memory footprint.
This is because the padding is smaller
than a typical operating system memory page (usually 4 kB).


Performance
~~~~~~~~~~~

One of the goals of lcore variables is to improve performance.
This is achieved by packing often-used data in fewer cache lines,
thereby reducing fragmentation in CPU caches
and somewhat improving the effective cache size and cache hit rates.

The application-level gains depend much on how much data is kept in lcore variables,
how often it is accessed,
and how much pressure the application exerts on the CPU caches
(i.e., how much other memory it accesses).

The ``lcore_var_perf_autotest`` is an attempt at exploring
the performance benefits (or drawbacks) of lcore variables
compared to their alternatives.
Being a micro benchmark, it needs to be taken with a grain of salt.

Generally, one shouldn't expect more than some very modest gains in performance
after a switch from lcore id indexed arrays to lcore variables.

An additional benefit of the use of lcore variables is that it avoids
certain tricky issues related to CPU core hardware prefetching
(e.g., next-N-lines prefetching) that may cause false sharing
even when data used by two cores do not reside on the same cache line.
Hardware prefetch behavior is generally not publicly documented
and varies across CPU vendors, CPU generations and BIOS (or similar) configurations.
For applications aiming to be portable, this may cause issues.
Often, CPU hardware prefetch-induced issues are non-existent,
except in some particular circumstances, where their adverse effects may be significant.


Alternatives
------------

Lcore Id Indexed Static Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lcore variables are designed to replace a pattern exemplified below:

.. code-block:: c

   struct __rte_cache_aligned foo_lcore_state {
       int a;
       long b;
       RTE_CACHE_GUARD;
   };

   static struct foo_lcore_state lcore_states[RTE_MAX_LCORE];

This scheme is simple and effective, but has one drawback:
the data is organized so that objects related to all lcores for a particular module
are kept close in memory.
At a bare minimum, this requires sizing data structures
(e.g., using ``__rte_cache_aligned``) to a whole number of cache lines
and ensuring that the allocation of such objects
is cache line aligned to avoid false sharing.
With CPU hardware prefetching and memory loads resulting from speculative execution
(functions which seemingly are getting more eager faster
than they are getting more intelligent),
one or more "guard" cache lines may be required
to separate one lcore's data from another's and prevent false sharing.

Lcore variables offer the advantage of working with,
rather than against, the CPU's assumptions.
A next-line hardware prefetcher, for example, may function as intended
(i.e., to the benefit, not detriment, of system performance).


Thread Local Storage
~~~~~~~~~~~~~~~~~~~~

An alternative to ``rte_lcore_var.h`` is the ``rte_per_lcore.h`` API,
which makes use of thread-local storage
(TLS, e.g., GCC ``__thread`` or C11 ``_Thread_local``).

There are a number of differences between using TLS
and the use of lcore variables.

The lifecycle of a thread-local variable instance is tied to that of the thread.
The data cannot be accessed before the thread has been created,
nor after it has terminated.
As a result, thread-local variables must be initialized in a "lazy" manner
(e.g., at the point of thread creation).
Lcore variables may be accessed immediately after having been allocated
(which may occur before any thread beyond the main thread is running).

A thread-local variable is duplicated across all threads in the process,
including unregistered non-EAL threads (i.e., "regular" threads).
For DPDK applications heavily relying on multi-threading
(in conjunction with DPDK's "one thread per core" pattern),
either by having many concurrent threads or creating/destroying threads at a high rate,
an excessive use of thread-local variables may cause inefficiencies
(e.g., increased thread creation overhead due to thread-local storage initialization
or increased memory footprint).
Lcore variables *only* exist for threads with an lcore id.

Whether data in thread-local storage can be shared between threads
(i.e., whether a pointer to a thread-local variable can be passed to
and successfully dereferenced by a non-owning thread)
depends on the specifics of the TLS implementation.
With GCC ``__thread`` and GCC ``_Thread_local``,
data sharing between threads is supported.
In the C11 standard, accessing another thread's ``_Thread_local`` object
is implementation-defined.
Lcore variable instances may be accessed reliably by any thread.

Lcore variables also rely on TLS to retrieve the thread's lcore id.
However, the rest of the per-thread data is not kept in TLS.

From a memory layout perspective, TLS is similar to lcore variables,
and thus per-thread data structures need not be padded.

In case the above-mentioned drawbacks of the use of TLS are of no significance
to a particular application, TLS is a good alternative to lcore variables.