xref: /dpdk/doc/guides/prog_guide/rcu_lib.rst (revision 41dd9a6bc2d9c6e20e139ad713cc9d172572dd43)
1d67e6ecfSHonnappa Nagarahalli..  SPDX-License-Identifier: BSD-3-Clause
2d67e6ecfSHonnappa Nagarahalli    Copyright(c) 2019 Arm Limited.
3d67e6ecfSHonnappa Nagarahalli
4*41dd9a6bSDavid YoungRead-Copy-Update (RCU) Library
5*41dd9a6bSDavid Young==============================
6d67e6ecfSHonnappa Nagarahalli
7d67e6ecfSHonnappa NagarahalliLockless data structures provide scalability and determinism.
8d67e6ecfSHonnappa NagarahalliThey enable use cases where locking may not be allowed
9d67e6ecfSHonnappa Nagarahalli(for example real-time applications).
10d67e6ecfSHonnappa Nagarahalli
11d67e6ecfSHonnappa NagarahalliIn the following sections, the term "memory" refers to memory allocated
12d67e6ecfSHonnappa Nagarahalliby typical APIs like malloc() or anything that is representative of
13d67e6ecfSHonnappa Nagarahallimemory, for example an index of a free element array.
14d67e6ecfSHonnappa Nagarahalli
15d67e6ecfSHonnappa NagarahalliSince these data structures are lockless, the writers and readers
16d67e6ecfSHonnappa Nagarahalliare accessing the data structures concurrently. Hence, while removing
17d67e6ecfSHonnappa Nagarahallian element from a data structure, the writers cannot return the memory
18d67e6ecfSHonnappa Nagarahallito the allocator, without knowing that the readers are not
19d67e6ecfSHonnappa Nagarahallireferencing that element/memory anymore. Hence, it is required to
20d67e6ecfSHonnappa Nagarahalliseparate the operation of removing an element into two steps:
21d67e6ecfSHonnappa Nagarahalli
22d67e6ecfSHonnappa Nagarahalli#. Delete: in this step, the writer removes the reference to the element from
23d67e6ecfSHonnappa Nagarahalli   the data structure but does not return the associated memory to the
24d67e6ecfSHonnappa Nagarahalli   allocator. This will ensure that new readers will not get a reference to
25d67e6ecfSHonnappa Nagarahalli   the removed element. Removing the reference is an atomic operation.
26d67e6ecfSHonnappa Nagarahalli
27d67e6ecfSHonnappa Nagarahalli#. Free (Reclaim): in this step, the writer returns the memory to the
28d67e6ecfSHonnappa Nagarahalli   memory allocator only after knowing that all the readers have stopped
29d67e6ecfSHonnappa Nagarahalli   referencing the deleted element.
30d67e6ecfSHonnappa Nagarahalli
31d67e6ecfSHonnappa NagarahalliThis library helps the writer determine when it is safe to free the
32d67e6ecfSHonnappa Nagarahallimemory by making use of thread Quiescent State (QS).
33d67e6ecfSHonnappa Nagarahalli
34d67e6ecfSHonnappa NagarahalliWhat is Quiescent State
35d67e6ecfSHonnappa Nagarahalli-----------------------
36d67e6ecfSHonnappa Nagarahalli
37d67e6ecfSHonnappa NagarahalliQuiescent State can be defined as "any point in the thread execution where the
384831115fSHonnappa Nagarahallithread does not hold a reference to shared memory". It is the responsibility of
394831115fSHonnappa Nagarahallithe application to determine its quiescent state.
40d67e6ecfSHonnappa Nagarahalli
41d67e6ecfSHonnappa NagarahalliLet us consider the following diagram:
42d67e6ecfSHonnappa Nagarahalli
43d67e6ecfSHonnappa Nagarahalli.. _figure_quiescent_state:
44d67e6ecfSHonnappa Nagarahalli
45d67e6ecfSHonnappa Nagarahalli.. figure:: img/rcu_general_info.*
46d67e6ecfSHonnappa Nagarahalli
47d67e6ecfSHonnappa Nagarahalli   Phases in the Quiescent State model.
48d67e6ecfSHonnappa Nagarahalli
49d67e6ecfSHonnappa Nagarahalli
50d67e6ecfSHonnappa NagarahalliAs shown in :numref:`figure_quiescent_state`, reader thread 1 accesses data
51d67e6ecfSHonnappa Nagarahallistructures D1 and D2. When it is accessing D1, if the writer has to remove an
52d67e6ecfSHonnappa Nagarahallielement from D1, the writer cannot free the memory associated with that
53d67e6ecfSHonnappa Nagarahallielement immediately. The writer can return the memory to the allocator only
54d67e6ecfSHonnappa Nagarahalliafter the reader stops referencing D1. In other words, reader thread RT1 has
55d67e6ecfSHonnappa Nagarahallito enter a quiescent state.
56d67e6ecfSHonnappa Nagarahalli
57d67e6ecfSHonnappa NagarahalliSimilarly, since reader thread 2 is also accessing D1, the writer has to
58d67e6ecfSHonnappa Nagarahalliwait till thread 2 enters quiescent state as well.
59d67e6ecfSHonnappa Nagarahalli
60d67e6ecfSHonnappa NagarahalliHowever, the writer does not need to wait for reader thread 3 to enter
61d67e6ecfSHonnappa Nagarahalliquiescent state. Reader thread 3 was not accessing D1 when the delete
6207d311ccSPrateek Agarwaloperation happened. So, reader thread 3 will not have a reference to the
63d67e6ecfSHonnappa Nagarahallideleted entry.
64d67e6ecfSHonnappa Nagarahalli
65d67e6ecfSHonnappa NagarahalliIt can be noted that, the critical sections for D2 is a quiescent state
66d67e6ecfSHonnappa Nagarahallifor D1. i.e. for a given data structure Dx, any point in the thread execution
67d67e6ecfSHonnappa Nagarahallithat does not reference Dx is a quiescent state.
68d67e6ecfSHonnappa Nagarahalli
69d67e6ecfSHonnappa NagarahalliSince memory is not freed immediately, there might be a need for
70d67e6ecfSHonnappa Nagarahalliprovisioning of additional memory, depending on the application requirements.
71d67e6ecfSHonnappa Nagarahalli
72d67e6ecfSHonnappa NagarahalliFactors affecting the RCU mechanism
73d67e6ecfSHonnappa Nagarahalli-----------------------------------
74d67e6ecfSHonnappa Nagarahalli
75d67e6ecfSHonnappa NagarahalliIt is important to make sure that this library keeps the overhead of
76d67e6ecfSHonnappa Nagarahalliidentifying the end of grace period and subsequent freeing of memory,
774831115fSHonnappa Nagarahallito a minimum. The following paras explain how grace period and critical
78d67e6ecfSHonnappa Nagarahallisection affect this overhead.
79d67e6ecfSHonnappa Nagarahalli
80d67e6ecfSHonnappa NagarahalliThe writer has to poll the readers to identify the end of grace period.
81d67e6ecfSHonnappa NagarahalliPolling introduces memory accesses and wastes CPU cycles. The memory
82d67e6ecfSHonnappa Nagarahalliis not available for reuse during the grace period. Longer grace periods
83d67e6ecfSHonnappa Nagarahalliexasperate these conditions.
84d67e6ecfSHonnappa Nagarahalli
85d67e6ecfSHonnappa NagarahalliThe length of the critical section and the number of reader threads
86d67e6ecfSHonnappa Nagarahalliis proportional to the duration of the grace period. Keeping the critical
87d67e6ecfSHonnappa Nagarahallisections smaller will keep the grace period smaller. However, keeping the
88d67e6ecfSHonnappa Nagarahallicritical sections smaller requires additional CPU cycles (due to additional
89d67e6ecfSHonnappa Nagarahallireporting) in the readers.
90d67e6ecfSHonnappa Nagarahalli
91d67e6ecfSHonnappa NagarahalliHence, we need the characteristics of a small grace period and large critical
924831115fSHonnappa Nagarahallisection. This library addresses these characteristics by allowing the writer
934831115fSHonnappa Nagarahallito do other work without having to block until the readers report their
944831115fSHonnappa Nagarahalliquiescent state.
95d67e6ecfSHonnappa Nagarahalli
96d67e6ecfSHonnappa NagarahalliRCU in DPDK
97d67e6ecfSHonnappa Nagarahalli-----------
98d67e6ecfSHonnappa Nagarahalli
994831115fSHonnappa NagarahalliFor DPDK applications, the beginning and end of a ``while(1)`` loop (where no
100d67e6ecfSHonnappa Nagarahallireferences to shared data structures are kept) act as perfect quiescent
101d67e6ecfSHonnappa Nagarahallistates. This will combine all the shared data structure accesses into a
102d67e6ecfSHonnappa Nagarahallisingle, large critical section which helps keep the overhead on the
103d67e6ecfSHonnappa Nagarahallireader side to a minimum.
104d67e6ecfSHonnappa Nagarahalli
105d67e6ecfSHonnappa NagarahalliDPDK supports a pipeline model of packet processing and service cores.
106d67e6ecfSHonnappa NagarahalliIn these use cases, a given data structure may not be used by all the
1074831115fSHonnappa Nagarahalliworkers in the application. The writer has to wait only for the workers that
1084831115fSHonnappa Nagarahalliuse the data structure to report their quiescent state. To provide the required
1094831115fSHonnappa Nagarahalliflexibility, this library has a concept of a QS variable. If required, the
1104831115fSHonnappa Nagarahalliapplication can create one QS variable per data structure to help it track the
1114831115fSHonnappa Nagarahalliend of grace period for each data structure. This helps keep the length of grace
112d67e6ecfSHonnappa Nagarahalliperiod to a minimum.
113d67e6ecfSHonnappa Nagarahalli
114d67e6ecfSHonnappa NagarahalliHow to use this library
115d67e6ecfSHonnappa Nagarahalli-----------------------
116d67e6ecfSHonnappa Nagarahalli
117d67e6ecfSHonnappa NagarahalliThe application must allocate memory and initialize a QS variable.
118d67e6ecfSHonnappa Nagarahalli
119d67e6ecfSHonnappa NagarahalliApplications can call ``rte_rcu_qsbr_get_memsize()`` to calculate the size
120d67e6ecfSHonnappa Nagarahalliof memory to allocate. This API takes a maximum number of reader threads,
121f50d9aadSHonnappa Nagarahalliusing this variable, as a parameter.
122d67e6ecfSHonnappa Nagarahalli
123d67e6ecfSHonnappa NagarahalliFurther, the application can initialize a QS variable using the API
124d67e6ecfSHonnappa Nagarahalli``rte_rcu_qsbr_init()``.
125d67e6ecfSHonnappa Nagarahalli
126d67e6ecfSHonnappa NagarahalliEach reader thread is assumed to have a unique thread ID. Currently, the
127d67e6ecfSHonnappa Nagarahallimanagement of the thread ID (for example allocation/free) is left to the
128d67e6ecfSHonnappa Nagarahalliapplication. The thread ID should be in the range of 0 to
129d67e6ecfSHonnappa Nagarahallimaximum number of threads provided while creating the QS variable.
130d67e6ecfSHonnappa NagarahalliThe application could also use ``lcore_id`` as the thread ID where applicable.
131d67e6ecfSHonnappa Nagarahalli
132d67e6ecfSHonnappa NagarahalliThe ``rte_rcu_qsbr_thread_register()`` API will register a reader thread
133d67e6ecfSHonnappa Nagarahallito report its quiescent state. This can be called from a reader thread.
134d67e6ecfSHonnappa NagarahalliA control plane thread can also call this on behalf of a reader thread.
135d67e6ecfSHonnappa NagarahalliThe reader thread must call ``rte_rcu_qsbr_thread_online()`` API to start
136d67e6ecfSHonnappa Nagarahallireporting its quiescent state.
137d67e6ecfSHonnappa Nagarahalli
138d67e6ecfSHonnappa NagarahalliSome of the use cases might require the reader threads to make blocking API
139d67e6ecfSHonnappa Nagarahallicalls (for example while using eventdev APIs). The writer thread should not
140d67e6ecfSHonnappa Nagarahalliwait for such reader threads to enter quiescent state.  The reader thread must
141d67e6ecfSHonnappa Nagarahallicall ``rte_rcu_qsbr_thread_offline()`` API, before calling blocking APIs. It
142d67e6ecfSHonnappa Nagarahallican call ``rte_rcu_qsbr_thread_online()`` API once the blocking API call
143d67e6ecfSHonnappa Nagarahallireturns.
144d67e6ecfSHonnappa Nagarahalli
145d67e6ecfSHonnappa NagarahalliThe writer thread can trigger the reader threads to report their quiescent
146d67e6ecfSHonnappa Nagarahallistate by calling the API ``rte_rcu_qsbr_start()``. It is possible for multiple
147d67e6ecfSHonnappa Nagarahalliwriter threads to query the quiescent state status simultaneously. Hence,
148d67e6ecfSHonnappa Nagarahalli``rte_rcu_qsbr_start()`` returns a token to each caller.
149d67e6ecfSHonnappa Nagarahalli
150d67e6ecfSHonnappa NagarahalliThe writer thread must call ``rte_rcu_qsbr_check()`` API with the token to
151d67e6ecfSHonnappa Nagarahalliget the current quiescent state status. Option to block till all the reader
152d67e6ecfSHonnappa Nagarahallithreads enter the quiescent state is provided. If this API indicates that
153d67e6ecfSHonnappa Nagarahalliall the reader threads have entered the quiescent state, the application
154d67e6ecfSHonnappa Nagarahallican free the deleted entry.
155d67e6ecfSHonnappa Nagarahalli
156d67e6ecfSHonnappa NagarahalliThe APIs ``rte_rcu_qsbr_start()`` and ``rte_rcu_qsbr_check()`` are lock free.
157d67e6ecfSHonnappa NagarahalliHence, they can be called concurrently from multiple writers even while
158d67e6ecfSHonnappa Nagarahallirunning as worker threads.
159d67e6ecfSHonnappa Nagarahalli
160d67e6ecfSHonnappa NagarahalliThe separation of triggering the reporting from querying the status provides
161d67e6ecfSHonnappa Nagarahallithe writer threads flexibility to do useful work instead of blocking for the
162d67e6ecfSHonnappa Nagarahallireader threads to enter the quiescent state or go offline. This reduces the
1630e9d3de6SHonnappa Nagarahallimemory accesses due to continuous polling for the status. But, since the
1640e9d3de6SHonnappa Nagarahalliresource is freed at a later time, the token and the reference to the deleted
1650e9d3de6SHonnappa Nagarahalliresource need to be stored for later queries.
166d67e6ecfSHonnappa Nagarahalli
167d67e6ecfSHonnappa NagarahalliThe ``rte_rcu_qsbr_synchronize()`` API combines the functionality of
168d67e6ecfSHonnappa Nagarahalli``rte_rcu_qsbr_start()`` and blocking ``rte_rcu_qsbr_check()`` into a single
169d67e6ecfSHonnappa NagarahalliAPI. This API triggers the reader threads to report their quiescent state and
170d67e6ecfSHonnappa Nagarahallipolls till all the readers enter the quiescent state or go offline. This API
171d67e6ecfSHonnappa Nagarahallidoes not allow the writer to do useful work while waiting and introduces
1720e9d3de6SHonnappa Nagarahalliadditional memory accesses due to continuous polling. However, the application
1730e9d3de6SHonnappa Nagarahallidoes not have to store the token or the reference to the deleted resource. The
1740e9d3de6SHonnappa Nagarahalliresource can be freed immediately after ``rte_rcu_qsbr_synchronize()`` API
1750e9d3de6SHonnappa Nagarahallireturns.
176d67e6ecfSHonnappa Nagarahalli
177d67e6ecfSHonnappa NagarahalliThe reader thread must call ``rte_rcu_qsbr_thread_offline()`` and
178d67e6ecfSHonnappa Nagarahalli``rte_rcu_qsbr_thread_unregister()`` APIs to remove itself from reporting its
179d67e6ecfSHonnappa Nagarahalliquiescent state. The ``rte_rcu_qsbr_check()`` API will not wait for this reader
180d67e6ecfSHonnappa Nagarahallithread to report the quiescent state status anymore.
181d67e6ecfSHonnappa Nagarahalli
182d67e6ecfSHonnappa NagarahalliThe reader threads should call ``rte_rcu_qsbr_quiescent()`` API to indicate that
183d67e6ecfSHonnappa Nagarahallithey entered a quiescent state. This API checks if a writer has triggered a
184d67e6ecfSHonnappa Nagarahalliquiescent state query and update the state accordingly.
185d67e6ecfSHonnappa Nagarahalli
186d67e6ecfSHonnappa NagarahalliThe ``rte_rcu_qsbr_lock()`` and ``rte_rcu_qsbr_unlock()`` are empty functions.
18789c67ae2SCiara PowerHowever, these APIs can aid in debugging issues. One can mark the access to
18889c67ae2SCiara Powershared data structures on the reader side using these APIs. The
18989c67ae2SCiara Power``rte_rcu_qsbr_quiescent()`` will check if all the locks are unlocked.
1901d301a8aSRuifeng Wang
1911d301a8aSRuifeng WangResource reclamation framework for DPDK
1921d301a8aSRuifeng Wang---------------------------------------
1931d301a8aSRuifeng Wang
1941d301a8aSRuifeng WangLock-free algorithms place additional burden of resource reclamation on
1951d301a8aSRuifeng Wangthe application. When a writer deletes an entry from a data structure, the writer:
1961d301a8aSRuifeng Wang
1971d301a8aSRuifeng Wang#. Has to start the grace period
1981d301a8aSRuifeng Wang#. Has to store a reference to the deleted resources in a FIFO
1991d301a8aSRuifeng Wang#. Should check if the readers have completed a grace period and free the resources.
2001d301a8aSRuifeng Wang
2011d301a8aSRuifeng WangThere are several APIs provided to help with this process. The writer
2021d301a8aSRuifeng Wangcan create a FIFO to store the references to deleted resources using ``rte_rcu_qsbr_dq_create()``.
2031d301a8aSRuifeng WangThe resources can be enqueued to this FIFO using ``rte_rcu_qsbr_dq_enqueue()``.
2041d301a8aSRuifeng WangIf the FIFO is full, ``rte_rcu_qsbr_dq_enqueue`` will reclaim the resources before enqueuing. It will also reclaim resources on regular basis to keep the FIFO from growing too large. If the writer runs out of resources, the writer can call ``rte_rcu_qsbr_dq_reclaim`` API to reclaim resources. ``rte_rcu_qsbr_dq_delete`` is provided to reclaim any remaining resources and free the FIFO while shutting down.
2051d301a8aSRuifeng Wang
2061d301a8aSRuifeng WangHowever, if this resource reclamation process were to be integrated in lock-free data structure libraries, it
2071d301a8aSRuifeng Wanghides this complexity from the application and makes it easier for the application to adopt lock-free algorithms. The following paragraphs discuss how the reclamation process can be integrated in DPDK libraries.
2081d301a8aSRuifeng Wang
2091d301a8aSRuifeng WangIn any DPDK application, the resource reclamation process using QSBR can be split into 4 parts:
2101d301a8aSRuifeng Wang
2111d301a8aSRuifeng Wang#. Initialization
2121d301a8aSRuifeng Wang#. Quiescent State Reporting
2131d301a8aSRuifeng Wang#. Reclaiming Resources
2141d301a8aSRuifeng Wang#. Shutdown
2151d301a8aSRuifeng Wang
2161d301a8aSRuifeng WangThe design proposed here assigns different parts of this process to client libraries and applications. The term 'client library' refers to lock-free data structure libraries such at rte_hash, rte_lpm etc. in DPDK or similar libraries outside of DPDK. The term 'application' refers to the packet processing application that makes use of DPDK such as L3 Forwarding example application, OVS, VPP etc..
2171d301a8aSRuifeng Wang
2181d301a8aSRuifeng WangThe application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
2191d301a8aSRuifeng Wang
2201d301a8aSRuifeng Wang* the application has to create the RCU variable and register the reader threads to report their quiescent state.
2211d301a8aSRuifeng Wang* the application has to register the same RCU variable with the client library.
2221d301a8aSRuifeng Wang* reader threads in the application have to report the quiescent state. This allows for the application to control the length of the critical section/how frequently the application wants to report the quiescent state.
2231d301a8aSRuifeng Wang
2241d301a8aSRuifeng WangThe client library will handle 'Reclaiming Resources' part of the process. The
2251d301a8aSRuifeng Wangclient libraries will make use of the writer thread context to execute the memory
2261d301a8aSRuifeng Wangreclamation algorithm. So,
2271d301a8aSRuifeng Wang
2281d301a8aSRuifeng Wang* client library should provide an API to register a RCU variable that it will use. It should call ``rte_rcu_qsbr_dq_create()`` to create the FIFO to store the references to deleted entries.
2291d301a8aSRuifeng Wang* client library should use ``rte_rcu_qsbr_dq_enqueue`` to enqueue the deleted resources on the FIFO and start the grace period.
2301d301a8aSRuifeng Wang* if the library runs out of resources while adding entries, it should call ``rte_rcu_qsbr_dq_reclaim`` to reclaim the resources and try the resource allocation again.
2311d301a8aSRuifeng Wang
2321d301a8aSRuifeng WangThe 'Shutdown' process needs to be shared between the application and the
2331d301a8aSRuifeng Wangclient library.
2341d301a8aSRuifeng Wang
2351d301a8aSRuifeng Wang* the application should make sure that the reader threads are not using the shared data structure, unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
2361d301a8aSRuifeng Wang
2371d301a8aSRuifeng Wang* client library should call ``rte_rcu_qsbr_dq_delete`` to reclaim any remaining resources and free the FIFO.
2381d301a8aSRuifeng Wang
2391d301a8aSRuifeng WangIntegrating the resource reclamation with client libraries removes the burden from
2401d301a8aSRuifeng Wangthe application and makes it easy to use lock-free algorithms.
2411d301a8aSRuifeng Wang
2421d301a8aSRuifeng WangThis design has several advantages over currently known methods.
2431d301a8aSRuifeng Wang
2441d301a8aSRuifeng Wang#. Application does not need a dedicated thread to reclaim resources. Memory
2451d301a8aSRuifeng Wang   reclamation happens as part of the writer thread with little impact on
2461d301a8aSRuifeng Wang   performance.
2471d301a8aSRuifeng Wang#. The client library has better control over the resources. For example: the client
2481d301a8aSRuifeng Wang   library can attempt to reclaim when it has run out of resources.
249