xref: /dpdk/doc/guides/prog_guide/rcu_lib.rst (revision f399b0171e6e64c8bbce42599afa35591a9d28f1)
1..  SPDX-License-Identifier: BSD-3-Clause
2    Copyright(c) 2019 Arm Limited.
3
4.. _RCU_Library:
5
6RCU Library
7============
8
9Lockless data structures provide scalability and determinism.
10They enable use cases where locking may not be allowed
11(for example real-time applications).
12
13In the following sections, the term "memory" refers to memory allocated
14by typical APIs like malloc() or anything that is representative of
15memory, for example an index of a free element array.
16
17Since these data structures are lockless, the writers and readers
18are accessing the data structures concurrently. Hence, while removing
19an element from a data structure, the writers cannot return the memory
20to the allocator, without knowing that the readers are not
21referencing that element/memory anymore. Hence, it is required to
22separate the operation of removing an element into two steps:
23
24#. Delete: in this step, the writer removes the reference to the element from
25   the data structure but does not return the associated memory to the
26   allocator. This will ensure that new readers will not get a reference to
27   the removed element. Removing the reference is an atomic operation.
28
29#. Free (Reclaim): in this step, the writer returns the memory to the
30   memory allocator only after knowing that all the readers have stopped
31   referencing the deleted element.
32
33This library helps the writer determine when it is safe to free the
34memory by making use of thread Quiescent State (QS).
35
36What is Quiescent State
37-----------------------
38
39Quiescent State can be defined as "any point in the thread execution where the
40thread does not hold a reference to shared memory". It is the responsibility of
41the application to determine its quiescent state.
42
43Let us consider the following diagram:
44
45.. _figure_quiescent_state:
46
47.. figure:: img/rcu_general_info.*
48
49   Phases in the Quiescent State model.
50
51
52As shown in :numref:`figure_quiescent_state`, reader thread 1 accesses data
53structures D1 and D2. When it is accessing D1, if the writer has to remove an
54element from D1, the writer cannot free the memory associated with that
55element immediately. The writer can return the memory to the allocator only
56after the reader stops referencing D1. In other words, reader thread RT1 has
57to enter a quiescent state.
58
59Similarly, since reader thread 2 is also accessing D1, the writer has to
60wait till thread 2 enters quiescent state as well.
61
62However, the writer does not need to wait for reader thread 3 to enter
63quiescent state. Reader thread 3 was not accessing D1 when the delete
64operation happened. So, reader thread 3 will not have a reference to the
65deleted entry.
66
67It can be noted that, the critical sections for D2 is a quiescent state
68for D1. i.e. for a given data structure Dx, any point in the thread execution
69that does not reference Dx is a quiescent state.
70
71Since memory is not freed immediately, there might be a need for
72provisioning of additional memory, depending on the application requirements.
73
74Factors affecting the RCU mechanism
75-----------------------------------
76
77It is important to make sure that this library keeps the overhead of
78identifying the end of grace period and subsequent freeing of memory,
79to a minimum. The following paras explain how grace period and critical
80section affect this overhead.
81
82The writer has to poll the readers to identify the end of grace period.
83Polling introduces memory accesses and wastes CPU cycles. The memory
84is not available for reuse during the grace period. Longer grace periods
85exasperate these conditions.
86
87The length of the critical section and the number of reader threads
88is proportional to the duration of the grace period. Keeping the critical
89sections smaller will keep the grace period smaller. However, keeping the
90critical sections smaller requires additional CPU cycles (due to additional
91reporting) in the readers.
92
93Hence, we need the characteristics of a small grace period and large critical
94section. This library addresses these characteristics by allowing the writer
95to do other work without having to block until the readers report their
96quiescent state.
97
98RCU in DPDK
99-----------
100
101For DPDK applications, the beginning and end of a ``while(1)`` loop (where no
102references to shared data structures are kept) act as perfect quiescent
103states. This will combine all the shared data structure accesses into a
104single, large critical section which helps keep the overhead on the
105reader side to a minimum.
106
107DPDK supports a pipeline model of packet processing and service cores.
108In these use cases, a given data structure may not be used by all the
109workers in the application. The writer has to wait only for the workers that
110use the data structure to report their quiescent state. To provide the required
111flexibility, this library has a concept of a QS variable. If required, the
112application can create one QS variable per data structure to help it track the
113end of grace period for each data structure. This helps keep the length of grace
114period to a minimum.
115
116How to use this library
117-----------------------
118
119The application must allocate memory and initialize a QS variable.
120
121Applications can call ``rte_rcu_qsbr_get_memsize()`` to calculate the size
122of memory to allocate. This API takes a maximum number of reader threads,
123using this variable, as a parameter.
124
125Further, the application can initialize a QS variable using the API
126``rte_rcu_qsbr_init()``.
127
128Each reader thread is assumed to have a unique thread ID. Currently, the
129management of the thread ID (for example allocation/free) is left to the
130application. The thread ID should be in the range of 0 to
131maximum number of threads provided while creating the QS variable.
132The application could also use ``lcore_id`` as the thread ID where applicable.
133
134The ``rte_rcu_qsbr_thread_register()`` API will register a reader thread
135to report its quiescent state. This can be called from a reader thread.
136A control plane thread can also call this on behalf of a reader thread.
137The reader thread must call ``rte_rcu_qsbr_thread_online()`` API to start
138reporting its quiescent state.
139
140Some of the use cases might require the reader threads to make blocking API
141calls (for example while using eventdev APIs). The writer thread should not
142wait for such reader threads to enter quiescent state.  The reader thread must
143call ``rte_rcu_qsbr_thread_offline()`` API, before calling blocking APIs. It
144can call ``rte_rcu_qsbr_thread_online()`` API once the blocking API call
145returns.
146
147The writer thread can trigger the reader threads to report their quiescent
148state by calling the API ``rte_rcu_qsbr_start()``. It is possible for multiple
149writer threads to query the quiescent state status simultaneously. Hence,
150``rte_rcu_qsbr_start()`` returns a token to each caller.
151
152The writer thread must call ``rte_rcu_qsbr_check()`` API with the token to
153get the current quiescent state status. Option to block till all the reader
154threads enter the quiescent state is provided. If this API indicates that
155all the reader threads have entered the quiescent state, the application
156can free the deleted entry.
157
158The APIs ``rte_rcu_qsbr_start()`` and ``rte_rcu_qsbr_check()`` are lock free.
159Hence, they can be called concurrently from multiple writers even while
160running as worker threads.
161
162The separation of triggering the reporting from querying the status provides
163the writer threads flexibility to do useful work instead of blocking for the
164reader threads to enter the quiescent state or go offline. This reduces the
165memory accesses due to continuous polling for the status. But, since the
166resource is freed at a later time, the token and the reference to the deleted
167resource need to be stored for later queries.
168
169The ``rte_rcu_qsbr_synchronize()`` API combines the functionality of
170``rte_rcu_qsbr_start()`` and blocking ``rte_rcu_qsbr_check()`` into a single
171API. This API triggers the reader threads to report their quiescent state and
172polls till all the readers enter the quiescent state or go offline. This API
173does not allow the writer to do useful work while waiting and introduces
174additional memory accesses due to continuous polling. However, the application
175does not have to store the token or the reference to the deleted resource. The
176resource can be freed immediately after ``rte_rcu_qsbr_synchronize()`` API
177returns.
178
179The reader thread must call ``rte_rcu_qsbr_thread_offline()`` and
180``rte_rcu_qsbr_thread_unregister()`` APIs to remove itself from reporting its
181quiescent state. The ``rte_rcu_qsbr_check()`` API will not wait for this reader
182thread to report the quiescent state status anymore.
183
184The reader threads should call ``rte_rcu_qsbr_quiescent()`` API to indicate that
185they entered a quiescent state. This API checks if a writer has triggered a
186quiescent state query and update the state accordingly.
187
188The ``rte_rcu_qsbr_lock()`` and ``rte_rcu_qsbr_unlock()`` are empty functions.
189However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid
190in debugging issues. One can mark the access to shared data structures on the
191reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
192all the locks are unlocked.
193
194Resource reclamation framework for DPDK
195---------------------------------------
196
197Lock-free algorithms place additional burden of resource reclamation on
198the application. When a writer deletes an entry from a data structure, the writer:
199
200#. Has to start the grace period
201#. Has to store a reference to the deleted resources in a FIFO
202#. Should check if the readers have completed a grace period and free the resources.
203
204There are several APIs provided to help with this process. The writer
205can create a FIFO to store the references to deleted resources using ``rte_rcu_qsbr_dq_create()``.
206The resources can be enqueued to this FIFO using ``rte_rcu_qsbr_dq_enqueue()``.
207If the FIFO is full, ``rte_rcu_qsbr_dq_enqueue`` will reclaim the resources before enqueuing. It will also reclaim resources on regular basis to keep the FIFO from growing too large. If the writer runs out of resources, the writer can call ``rte_rcu_qsbr_dq_reclaim`` API to reclaim resources. ``rte_rcu_qsbr_dq_delete`` is provided to reclaim any remaining resources and free the FIFO while shutting down.
208
209However, if this resource reclamation process were to be integrated in lock-free data structure libraries, it
210hides this complexity from the application and makes it easier for the application to adopt lock-free algorithms. The following paragraphs discuss how the reclamation process can be integrated in DPDK libraries.
211
212In any DPDK application, the resource reclamation process using QSBR can be split into 4 parts:
213
214#. Initialization
215#. Quiescent State Reporting
216#. Reclaiming Resources
217#. Shutdown
218
219The design proposed here assigns different parts of this process to client libraries and applications. The term 'client library' refers to lock-free data structure libraries such at rte_hash, rte_lpm etc. in DPDK or similar libraries outside of DPDK. The term 'application' refers to the packet processing application that makes use of DPDK such as L3 Forwarding example application, OVS, VPP etc..
220
221The application has to handle 'Initialization' and 'Quiescent State Reporting'. So,
222
223* the application has to create the RCU variable and register the reader threads to report their quiescent state.
224* the application has to register the same RCU variable with the client library.
225* reader threads in the application have to report the quiescent state. This allows for the application to control the length of the critical section/how frequently the application wants to report the quiescent state.
226
227The client library will handle 'Reclaiming Resources' part of the process. The
228client libraries will make use of the writer thread context to execute the memory
229reclamation algorithm. So,
230
231* client library should provide an API to register a RCU variable that it will use. It should call ``rte_rcu_qsbr_dq_create()`` to create the FIFO to store the references to deleted entries.
232* client library should use ``rte_rcu_qsbr_dq_enqueue`` to enqueue the deleted resources on the FIFO and start the grace period.
233* if the library runs out of resources while adding entries, it should call ``rte_rcu_qsbr_dq_reclaim`` to reclaim the resources and try the resource allocation again.
234
235The 'Shutdown' process needs to be shared between the application and the
236client library.
237
238* the application should make sure that the reader threads are not using the shared data structure, unregister the reader threads from the QSBR variable before calling the client library's shutdown function.
239
240* client library should call ``rte_rcu_qsbr_dq_delete`` to reclaim any remaining resources and free the FIFO.
241
242Integrating the resource reclamation with client libraries removes the burden from
243the application and makes it easy to use lock-free algorithms.
244
245This design has several advantages over currently known methods.
246
247#. Application does not need a dedicated thread to reclaim resources. Memory
248   reclamation happens as part of the writer thread with little impact on
249   performance.
250#. The client library has better control over the resources. For example: the client
251   library can attempt to reclaim when it has run out of resources.
252