xref: /dpdk/doc/guides/prog_guide/ring_lib.rst (revision b5458e2cc48349b314c7354e4ddfd2100bd55c29)
15630257fSFerruh Yigit..  SPDX-License-Identifier: BSD-3-Clause
25630257fSFerruh Yigit    Copyright(c) 2010-2014 Intel Corporation.
3fc1f2750SBernard Iremonger
4fc1f2750SBernard IremongerRing Library
5fc1f2750SBernard Iremonger============
6fc1f2750SBernard Iremonger
7fc1f2750SBernard IremongerThe ring allows the management of queues.
8fc1f2750SBernard IremongerInstead of having a linked list of infinite size, the rte_ring has the following properties:
9fc1f2750SBernard Iremonger
10fc1f2750SBernard Iremonger*   FIFO
11fc1f2750SBernard Iremonger
12e89680d0SHonnappa Nagarahalli*   Maximum size is fixed, the objects are stored in a table
13e89680d0SHonnappa Nagarahalli
14e89680d0SHonnappa Nagarahalli*   Objects can be pointers or elements of multiple of 4 byte size
15fc1f2750SBernard Iremonger
16fc1f2750SBernard Iremonger*   Lockless implementation
17fc1f2750SBernard Iremonger
18fc1f2750SBernard Iremonger*   Multi-consumer or single-consumer dequeue
19fc1f2750SBernard Iremonger
20fc1f2750SBernard Iremonger*   Multi-producer or single-producer enqueue
21fc1f2750SBernard Iremonger
22fc1f2750SBernard Iremonger*   Bulk dequeue - Dequeues the specified count of objects if successful; otherwise fails
23fc1f2750SBernard Iremonger
24fc1f2750SBernard Iremonger*   Bulk enqueue - Enqueues the specified count of objects if successful; otherwise fails
25fc1f2750SBernard Iremonger
26fc1f2750SBernard Iremonger*   Burst dequeue - Dequeue the maximum available objects if the specified count cannot be fulfilled
27fc1f2750SBernard Iremonger
28fc1f2750SBernard Iremonger*   Burst enqueue - Enqueue the maximum available objects if the specified count cannot be fulfilled
29fc1f2750SBernard Iremonger
30fc1f2750SBernard IremongerThe advantages of this data structure over a linked list queue are as follows:
31fc1f2750SBernard Iremonger
32e89680d0SHonnappa Nagarahalli*   Faster; only requires a single 32 bit Compare-And-Swap instruction instead of several pointer size Compare-And-Swap instructions.
33fc1f2750SBernard Iremonger
34fc1f2750SBernard Iremonger*   Simpler than a full lockless queue.
35fc1f2750SBernard Iremonger
36fc1f2750SBernard Iremonger*   Adapted to bulk enqueue/dequeue operations.
37e89680d0SHonnappa Nagarahalli    As objects are stored in a table, a dequeue of several objects will not produce as many cache misses as in a linked queue.
38fc1f2750SBernard Iremonger    Also, a bulk dequeue of many objects does not cost more than a dequeue of a simple object.
39fc1f2750SBernard Iremonger
40fc1f2750SBernard IremongerThe disadvantages:
41fc1f2750SBernard Iremonger
42fc1f2750SBernard Iremonger*   Size is fixed
43fc1f2750SBernard Iremonger
44e89680d0SHonnappa Nagarahalli*   Having many rings costs more in terms of memory than a linked list queue. An empty ring contains at least N objects.
45fc1f2750SBernard Iremonger
46fc1f2750SBernard IremongerA simplified representation of a Ring is shown in with consumer and producer head and tail pointers to objects stored in the data structure.
47fc1f2750SBernard Iremonger
484a22e6eeSJohn McNamara.. _figure_ring1:
49fc1f2750SBernard Iremonger
504a22e6eeSJohn McNamara.. figure:: img/ring1.*
51fc1f2750SBernard Iremonger
524a22e6eeSJohn McNamara   Ring Structure
53fc1f2750SBernard Iremonger
54fc1f2750SBernard Iremonger
55fc1f2750SBernard IremongerReferences for Ring Implementation in FreeBSD*
56fc1f2750SBernard Iremonger----------------------------------------------
57fc1f2750SBernard Iremonger
58fc1f2750SBernard IremongerThe following code was added in FreeBSD 8.0, and is used in some network device drivers (at least in Intel drivers):
59fc1f2750SBernard Iremonger
60fc1f2750SBernard Iremonger    * `bufring.h in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/sys/buf_ring.h?revision=199625&amp;view=markup>`_
61fc1f2750SBernard Iremonger
62fc1f2750SBernard Iremonger    * `bufring.c in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/kern/subr_bufring.c?revision=199625&amp;view=markup>`_
63fc1f2750SBernard Iremonger
64fc1f2750SBernard IremongerLockless Ring Buffer in Linux*
65fc1f2750SBernard Iremonger------------------------------
66fc1f2750SBernard Iremonger
67fc1f2750SBernard IremongerThe following is a link describing the `Linux Lockless Ring Buffer Design <http://lwn.net/Articles/340400/>`_.
68fc1f2750SBernard Iremonger
69fc1f2750SBernard IremongerAdditional Features
70fc1f2750SBernard Iremonger-------------------
71fc1f2750SBernard Iremonger
72fc1f2750SBernard IremongerName
73fc1f2750SBernard Iremonger~~~~
74fc1f2750SBernard Iremonger
75fc1f2750SBernard IremongerA ring is identified by a unique name.
76fc1f2750SBernard IremongerIt is not possible to create two rings with the same name (rte_ring_create() returns NULL if this is attempted).
77fc1f2750SBernard Iremonger
78fc1f2750SBernard IremongerUse Cases
79fc1f2750SBernard Iremonger---------
80fc1f2750SBernard Iremonger
81fc1f2750SBernard IremongerUse cases for the Ring library include:
82fc1f2750SBernard Iremonger
8348624fd9SSiobhan Butler    *  Communication between applications in the DPDK
84fc1f2750SBernard Iremonger
85fc1f2750SBernard Iremonger    *  Used by memory pool allocator
86fc1f2750SBernard Iremonger
87fc1f2750SBernard IremongerAnatomy of a Ring Buffer
88fc1f2750SBernard Iremonger------------------------
89fc1f2750SBernard Iremonger
90fc1f2750SBernard IremongerThis section explains how a ring buffer operates.
91fc1f2750SBernard IremongerThe ring structure is composed of two head and tail couples; one is used by producers and one is used by the consumers.
92fc1f2750SBernard IremongerThe figures of the following sections refer to them as prod_head, prod_tail, cons_head and cons_tail.
93fc1f2750SBernard Iremonger
94fc1f2750SBernard IremongerEach figure represents a simplified state of the ring, which is a circular buffer.
95fc1f2750SBernard IremongerThe content of the function local variables is represented on the top of the figure,
96fc1f2750SBernard Iremongerand the content of ring structure is represented on the bottom of the figure.
97fc1f2750SBernard Iremonger
98fc1f2750SBernard IremongerSingle Producer Enqueue
99fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~
100fc1f2750SBernard Iremonger
101fc1f2750SBernard IremongerThis section explains what occurs when a producer adds an object to the ring.
102fc1f2750SBernard IremongerIn this example, only the producer head and tail (prod_head and prod_tail) are modified,
103fc1f2750SBernard Iremongerand there is only one producer.
104fc1f2750SBernard Iremonger
105fc1f2750SBernard IremongerThe initial state is to have a prod_head and prod_tail pointing at the same location.
106fc1f2750SBernard Iremonger
107fc1f2750SBernard IremongerEnqueue First Step
108fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^
109fc1f2750SBernard Iremonger
110fc1f2750SBernard IremongerFirst, *ring->prod_head* and ring->cons_tail are copied in local variables.
111fc1f2750SBernard IremongerThe prod_next local variable points to the next element of the table, or several elements after in case of bulk enqueue.
112fc1f2750SBernard Iremonger
113fc1f2750SBernard IremongerIf there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.
114fc1f2750SBernard Iremonger
115fc1f2750SBernard Iremonger
1164a22e6eeSJohn McNamara.. _figure_ring-enqueue1:
1174a22e6eeSJohn McNamara
1184a22e6eeSJohn McNamara.. figure:: img/ring-enqueue1.*
1194a22e6eeSJohn McNamara
1204a22e6eeSJohn McNamara   Enqueue first step
1214a22e6eeSJohn McNamara
122fc1f2750SBernard Iremonger
123fc1f2750SBernard IremongerEnqueue Second Step
124fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^
125fc1f2750SBernard Iremonger
126fc1f2750SBernard IremongerThe second step is to modify *ring->prod_head* in ring structure to point to the same location as prod_next.
127fc1f2750SBernard Iremonger
128e89680d0SHonnappa NagarahalliThe added object is copied in the ring (obj4).
129fc1f2750SBernard Iremonger
130fc1f2750SBernard Iremonger
1314a22e6eeSJohn McNamara.. _figure_ring-enqueue2:
1324a22e6eeSJohn McNamara
1334a22e6eeSJohn McNamara.. figure:: img/ring-enqueue2.*
1344a22e6eeSJohn McNamara
1354a22e6eeSJohn McNamara   Enqueue second step
1364a22e6eeSJohn McNamara
137fc1f2750SBernard Iremonger
138fc1f2750SBernard IremongerEnqueue Last Step
139fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^
140fc1f2750SBernard Iremonger
141fc1f2750SBernard IremongerOnce the object is added in the ring, ring->prod_tail in the ring structure is modified to point to the same location as *ring->prod_head*.
142fc1f2750SBernard IremongerThe enqueue operation is finished.
143fc1f2750SBernard Iremonger
144fc1f2750SBernard Iremonger
1454a22e6eeSJohn McNamara.. _figure_ring-enqueue3:
1464a22e6eeSJohn McNamara
1474a22e6eeSJohn McNamara.. figure:: img/ring-enqueue3.*
1484a22e6eeSJohn McNamara
1494a22e6eeSJohn McNamara   Enqueue last step
1504a22e6eeSJohn McNamara
151fc1f2750SBernard Iremonger
152fc1f2750SBernard IremongerSingle Consumer Dequeue
153fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~
154fc1f2750SBernard Iremonger
155fc1f2750SBernard IremongerThis section explains what occurs when a consumer dequeues an object from the ring.
156fc1f2750SBernard IremongerIn this example, only the consumer head and tail (cons_head and cons_tail) are modified and there is only one consumer.
157fc1f2750SBernard Iremonger
158fc1f2750SBernard IremongerThe initial state is to have a cons_head and cons_tail pointing at the same location.
159fc1f2750SBernard Iremonger
160fc1f2750SBernard IremongerDequeue First Step
161fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^
162fc1f2750SBernard Iremonger
163fc1f2750SBernard IremongerFirst, ring->cons_head and ring->prod_tail are copied in local variables.
164fc1f2750SBernard IremongerThe cons_next local variable points to the next element of the table, or several elements after in the case of bulk dequeue.
165fc1f2750SBernard Iremonger
166fc1f2750SBernard IremongerIf there are not enough objects in the ring (this is detected by checking prod_tail), it returns an error.
167fc1f2750SBernard Iremonger
168fc1f2750SBernard Iremonger
1694a22e6eeSJohn McNamara.. _figure_ring-dequeue1:
1704a22e6eeSJohn McNamara
1714a22e6eeSJohn McNamara.. figure:: img/ring-dequeue1.*
1724a22e6eeSJohn McNamara
173a2248c87SHaiyue Wang   Dequeue first step
1744a22e6eeSJohn McNamara
175fc1f2750SBernard Iremonger
176fc1f2750SBernard IremongerDequeue Second Step
177fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^
178fc1f2750SBernard Iremonger
179fc1f2750SBernard IremongerThe second step is to modify ring->cons_head in the ring structure to point to the same location as cons_next.
180fc1f2750SBernard Iremonger
181e89680d0SHonnappa NagarahalliThe dequeued object (obj1) is copied in the pointer given by the user.
182fc1f2750SBernard Iremonger
183fc1f2750SBernard Iremonger
1844a22e6eeSJohn McNamara.. _figure_ring-dequeue2:
1854a22e6eeSJohn McNamara
1864a22e6eeSJohn McNamara.. figure:: img/ring-dequeue2.*
1874a22e6eeSJohn McNamara
1884a22e6eeSJohn McNamara   Dequeue second step
1894a22e6eeSJohn McNamara
190fc1f2750SBernard Iremonger
191fc1f2750SBernard IremongerDequeue Last Step
192fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^
193fc1f2750SBernard Iremonger
194fc1f2750SBernard IremongerFinally, ring->cons_tail in the ring structure is modified to point to the same location as ring->cons_head.
195fc1f2750SBernard IremongerThe dequeue operation is finished.
196fc1f2750SBernard Iremonger
197fc1f2750SBernard Iremonger
1984a22e6eeSJohn McNamara.. _figure_ring-dequeue3:
1994a22e6eeSJohn McNamara
2004a22e6eeSJohn McNamara.. figure:: img/ring-dequeue3.*
2014a22e6eeSJohn McNamara
2024a22e6eeSJohn McNamara   Dequeue last step
2034a22e6eeSJohn McNamara
204fc1f2750SBernard Iremonger
205fc1f2750SBernard IremongerMultiple Producers Enqueue
206fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~~~~
207fc1f2750SBernard Iremonger
208fc1f2750SBernard IremongerThis section explains what occurs when two producers concurrently add an object to the ring.
209fc1f2750SBernard IremongerIn this example, only the producer head and tail (prod_head and prod_tail) are modified.
210fc1f2750SBernard Iremonger
211fc1f2750SBernard IremongerThe initial state is to have a prod_head and prod_tail pointing at the same location.
212fc1f2750SBernard Iremonger
213cdf5a9f3SShreyansh JainMultiple Producers Enqueue First Step
214cdf5a9f3SShreyansh Jain^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
215fc1f2750SBernard Iremonger
216fc1f2750SBernard IremongerOn both cores, *ring->prod_head* and ring->cons_tail are copied in local variables.
217fc1f2750SBernard IremongerThe prod_next local variable points to the next element of the table,
218fc1f2750SBernard Iremongeror several elements after in the case of bulk enqueue.
219fc1f2750SBernard Iremonger
2200058f08dSPablo de LaraIf there is not enough room in the ring (this is detected by checking cons_tail), it returns an error.
221fc1f2750SBernard Iremonger
222fc1f2750SBernard Iremonger
2234a22e6eeSJohn McNamara.. _figure_ring-mp-enqueue1:
224fc1f2750SBernard Iremonger
2254a22e6eeSJohn McNamara.. figure:: img/ring-mp-enqueue1.*
2264a22e6eeSJohn McNamara
227cdf5a9f3SShreyansh Jain   Multiple producer enqueue first step
2284a22e6eeSJohn McNamara
2294a22e6eeSJohn McNamara
230cdf5a9f3SShreyansh JainMultiple Producers Enqueue Second Step
231cdf5a9f3SShreyansh Jain^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
232fc1f2750SBernard Iremonger
233fc1f2750SBernard IremongerThe second step is to modify ring->prod_head in the ring structure to point to the same location as prod_next.
234fc1f2750SBernard IremongerThis operation is done using a Compare And Swap (CAS) instruction, which does the following operations atomically:
235fc1f2750SBernard Iremonger
236fc1f2750SBernard Iremonger*   If ring->prod_head is different to local variable prod_head,
237fc1f2750SBernard Iremonger    the CAS operation fails, and the code restarts at first step.
238fc1f2750SBernard Iremonger
239fc1f2750SBernard Iremonger*   Otherwise, ring->prod_head is set to local prod_next,
240fc1f2750SBernard Iremonger    the CAS operation is successful, and processing continues.
241fc1f2750SBernard Iremonger
242fc1f2750SBernard IremongerIn the figure, the operation succeeded on core 1, and step one restarted on core 2.
243fc1f2750SBernard Iremonger
244fc1f2750SBernard Iremonger
2454a22e6eeSJohn McNamara.. _figure_ring-mp-enqueue2:
246fc1f2750SBernard Iremonger
2474a22e6eeSJohn McNamara.. figure:: img/ring-mp-enqueue2.*
2484a22e6eeSJohn McNamara
249cdf5a9f3SShreyansh Jain   Multiple producer enqueue second step
2504a22e6eeSJohn McNamara
2514a22e6eeSJohn McNamara
252cdf5a9f3SShreyansh JainMultiple Producers Enqueue Third Step
253cdf5a9f3SShreyansh Jain^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
254fc1f2750SBernard Iremonger
255fc1f2750SBernard IremongerThe CAS operation is retried on core 2 with success.
256fc1f2750SBernard Iremonger
257fc1f2750SBernard IremongerThe core 1 updates one element of the ring(obj4), and the core 2 updates another one (obj5).
258fc1f2750SBernard Iremonger
259fc1f2750SBernard Iremonger
2604a22e6eeSJohn McNamara.. _figure_ring-mp-enqueue3:
261fc1f2750SBernard Iremonger
2624a22e6eeSJohn McNamara.. figure:: img/ring-mp-enqueue3.*
2634a22e6eeSJohn McNamara
264cdf5a9f3SShreyansh Jain   Multiple producer enqueue third step
2654a22e6eeSJohn McNamara
2664a22e6eeSJohn McNamara
267cdf5a9f3SShreyansh JainMultiple Producers Enqueue Fourth Step
268cdf5a9f3SShreyansh Jain^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
269fc1f2750SBernard Iremonger
270fc1f2750SBernard IremongerEach core now wants to update ring->prod_tail.
271fc1f2750SBernard IremongerA core can only update it if ring->prod_tail is equal to the prod_head local variable.
272fc1f2750SBernard IremongerThis is only true on core 1. The operation is finished on core 1.
273fc1f2750SBernard Iremonger
274fc1f2750SBernard Iremonger
2754a22e6eeSJohn McNamara.. _figure_ring-mp-enqueue4:
276fc1f2750SBernard Iremonger
2774a22e6eeSJohn McNamara.. figure:: img/ring-mp-enqueue4.*
2784a22e6eeSJohn McNamara
279cdf5a9f3SShreyansh Jain   Multiple producer enqueue fourth step
2804a22e6eeSJohn McNamara
2814a22e6eeSJohn McNamara
282cdf5a9f3SShreyansh JainMultiple Producers Enqueue Last Step
283cdf5a9f3SShreyansh Jain^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
284fc1f2750SBernard Iremonger
285fc1f2750SBernard IremongerOnce ring->prod_tail is updated by core 1, core 2 is allowed to update it too.
286fc1f2750SBernard IremongerThe operation is also finished on core 2.
287fc1f2750SBernard Iremonger
288fc1f2750SBernard Iremonger
2894a22e6eeSJohn McNamara.. _figure_ring-mp-enqueue5:
2904a22e6eeSJohn McNamara
2914a22e6eeSJohn McNamara.. figure:: img/ring-mp-enqueue5.*
2924a22e6eeSJohn McNamara
293cdf5a9f3SShreyansh Jain   Multiple producer enqueue last step
2944a22e6eeSJohn McNamara
295fc1f2750SBernard Iremonger
296fc1f2750SBernard IremongerModulo 32-bit Indexes
297fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~
298fc1f2750SBernard Iremonger
299fc1f2750SBernard IremongerIn the preceding figures, the prod_head, prod_tail, cons_head and cons_tail indexes are represented by arrows.
300fc1f2750SBernard IremongerIn the actual implementation, these values are not between 0 and size(ring)-1 as would be assumed.
301e89680d0SHonnappa NagarahalliThe indexes are between 0 and 2^32 -1, and we mask their value when we access the object table (the ring itself).
302fc1f2750SBernard Iremonger32-bit modulo also implies that operations on indexes (such as, add/subtract) will automatically do 2^32 modulo
303fc1f2750SBernard Iremongerif the result overflows the 32-bit number range.
304fc1f2750SBernard Iremonger
305fc1f2750SBernard IremongerThe following are two examples that help to explain how indexes are used in a ring.
306fc1f2750SBernard Iremonger
307fc1f2750SBernard Iremonger.. note::
308fc1f2750SBernard Iremonger
309fc1f2750SBernard Iremonger    To simplify the explanation, operations with modulo 16-bit are used instead of modulo 32-bit.
310fc1f2750SBernard Iremonger    In addition, the four indexes are defined as unsigned 16-bit integers,
311fc1f2750SBernard Iremonger    as opposed to unsigned 32-bit integers in the more realistic case.
312fc1f2750SBernard Iremonger
313fc1f2750SBernard Iremonger
3144a22e6eeSJohn McNamara.. _figure_ring-modulo1:
3154a22e6eeSJohn McNamara
3164a22e6eeSJohn McNamara.. figure:: img/ring-modulo1.*
3174a22e6eeSJohn McNamara
3184a22e6eeSJohn McNamara   Modulo 32-bit indexes - Example 1
3194a22e6eeSJohn McNamara
320fc1f2750SBernard Iremonger
321fc1f2750SBernard IremongerThis ring contains 11000 entries.
322fc1f2750SBernard Iremonger
323fc1f2750SBernard Iremonger
3244a22e6eeSJohn McNamara.. _figure_ring-modulo2:
3254a22e6eeSJohn McNamara
3264a22e6eeSJohn McNamara.. figure:: img/ring-modulo2.*
3274a22e6eeSJohn McNamara
3284a22e6eeSJohn McNamara      Modulo 32-bit indexes - Example 2
3294a22e6eeSJohn McNamara
330fc1f2750SBernard Iremonger
331fc1f2750SBernard IremongerThis ring contains 12536 entries.
332fc1f2750SBernard Iremonger
333fc1f2750SBernard Iremonger.. note::
334fc1f2750SBernard Iremonger
335fc1f2750SBernard Iremonger    For ease of understanding, we use modulo 65536 operations in the above examples.
336fc1f2750SBernard Iremonger    In real execution cases, this is redundant for low efficiency, but is done automatically when the result overflows.
337fc1f2750SBernard Iremonger
338fc1f2750SBernard IremongerThe code always maintains a distance between producer and consumer between 0 and size(ring)-1.
339fc1f2750SBernard IremongerThanks to this property, we can do subtractions between 2 index values in a modulo-32bit base:
340fc1f2750SBernard Iremongerthat's why the overflow of the indexes is not a problem.
341fc1f2750SBernard Iremonger
342fc1f2750SBernard IremongerAt any time, entries and free_entries are between 0 and size(ring)-1,
343fc1f2750SBernard Iremongereven if only the first term of subtraction has overflowed:
344fc1f2750SBernard Iremonger
345fc1f2750SBernard Iremonger.. code-block:: c
346fc1f2750SBernard Iremonger
347fc1f2750SBernard Iremonger    uint32_t entries = (prod_tail - cons_head);
348fc1f2750SBernard Iremonger    uint32_t free_entries = (mask + cons_tail -prod_head);
349fc1f2750SBernard Iremonger
350ebff988dSKonstantin AnanyevProducer/consumer synchronization modes
351ebff988dSKonstantin Ananyev---------------------------------------
352ebff988dSKonstantin Ananyev
353ebff988dSKonstantin Ananyevrte_ring supports different synchronization modes for producers and consumers.
354ebff988dSKonstantin AnanyevThese modes can be specified at ring creation/init time via ``flags``
355ebff988dSKonstantin Ananyevparameter.
356ebff988dSKonstantin AnanyevThat should help users to configure ring in the most suitable way for his
357ebff988dSKonstantin Ananyevspecific usage scenarios.
358ebff988dSKonstantin AnanyevCurrently supported modes:
359ebff988dSKonstantin Ananyev
360e1187407SKonstantin Ananyev.. _Ring_Library_MPMC_Mode:
361e1187407SKonstantin Ananyev
362ebff988dSKonstantin AnanyevMP/MC (default one)
363ebff988dSKonstantin Ananyev~~~~~~~~~~~~~~~~~~~
364ebff988dSKonstantin Ananyev
365ebff988dSKonstantin AnanyevMulti-producer (/multi-consumer) mode. This is a default enqueue (/dequeue)
366ebff988dSKonstantin Ananyevmode for the ring. In this mode multiple threads can enqueue (/dequeue)
367ebff988dSKonstantin Ananyevobjects to (/from) the ring. For 'classic' DPDK deployments (with one thread
368ebff988dSKonstantin Ananyevper core) this is usually the most suitable and fastest synchronization mode.
369ebff988dSKonstantin AnanyevAs a well known limitation - it can perform quite pure on some overcommitted
370ebff988dSKonstantin Ananyevscenarios.
371ebff988dSKonstantin Ananyev
372e1187407SKonstantin Ananyev.. _Ring_Library_SPSC_Mode:
373e1187407SKonstantin Ananyev
374ebff988dSKonstantin AnanyevSP/SC
375ebff988dSKonstantin Ananyev~~~~~
376ebff988dSKonstantin AnanyevSingle-producer (/single-consumer) mode. In this mode only one thread at a time
377ebff988dSKonstantin Ananyevis allowed to enqueue (/dequeue) objects to (/from) the ring.
378ebff988dSKonstantin Ananyev
379e1187407SKonstantin Ananyev.. _Ring_Library_MT_RTS_Mode:
380e1187407SKonstantin Ananyev
381e6ba4731SKonstantin AnanyevMP_RTS/MC_RTS
382e6ba4731SKonstantin Ananyev~~~~~~~~~~~~~
383e6ba4731SKonstantin Ananyev
384e6ba4731SKonstantin AnanyevMulti-producer (/multi-consumer) with Relaxed Tail Sync (RTS) mode.
385e6ba4731SKonstantin AnanyevThe main difference from the original MP/MC algorithm is that
386e6ba4731SKonstantin Ananyevtail value is increased not by every thread that finished enqueue/dequeue,
387e6ba4731SKonstantin Ananyevbut only by the last one.
388e6ba4731SKonstantin AnanyevThat allows threads to avoid spinning on ring tail value,
389e6ba4731SKonstantin Ananyevleaving actual tail value change to the last thread at a given instance.
390e6ba4731SKonstantin AnanyevThat technique helps to avoid the Lock-Waiter-Preemption (LWP) problem on tail
391e6ba4731SKonstantin Ananyevupdate and improves average enqueue/dequeue times on overcommitted systems.
392e6ba4731SKonstantin AnanyevTo achieve that RTS requires 2 64-bit CAS for each enqueue(/dequeue) operation:
393e6ba4731SKonstantin Ananyevone for head update, second for tail update.
394e6ba4731SKonstantin AnanyevIn comparison the original MP/MC algorithm requires one 32-bit CAS
395e6ba4731SKonstantin Ananyevfor head update and waiting/spinning on tail value.
396e6ba4731SKonstantin Ananyev
397e1187407SKonstantin Ananyev.. _Ring_Library_MT_HTS_Mode:
398e1187407SKonstantin Ananyev
3991cc363b8SKonstantin AnanyevMP_HTS/MC_HTS
4001cc363b8SKonstantin Ananyev~~~~~~~~~~~~~
4011cc363b8SKonstantin Ananyev
4021cc363b8SKonstantin AnanyevMulti-producer (/multi-consumer) with Head/Tail Sync (HTS) mode.
4031cc363b8SKonstantin AnanyevIn that mode enqueue/dequeue operation is fully serialized:
4041cc363b8SKonstantin Ananyevat any given moment only one enqueue/dequeue operation can proceed.
4051cc363b8SKonstantin AnanyevThis is achieved by allowing a thread to proceed with changing ``head.value``
4061cc363b8SKonstantin Ananyevonly when ``head.value == tail.value``.
4071cc363b8SKonstantin AnanyevBoth head and tail values are updated atomically (as one 64-bit value).
4081cc363b8SKonstantin AnanyevTo achieve that 64-bit CAS is used by head update routine.
4091cc363b8SKonstantin AnanyevThat technique also avoids the Lock-Waiter-Preemption (LWP) problem on tail
4101cc363b8SKonstantin Ananyevupdate and helps to improve ring enqueue/dequeue behavior in overcommitted
4111cc363b8SKonstantin Ananyevscenarios. Another advantage of fully serialized producer/consumer -
4121cc363b8SKonstantin Ananyevit provides the ability to implement MT safe peek API for rte_ring.
4131cc363b8SKonstantin Ananyev
414664ff4b1SKonstantin AnanyevRing Peek API
415664ff4b1SKonstantin Ananyev-------------
416664ff4b1SKonstantin Ananyev
417664ff4b1SKonstantin AnanyevFor ring with serialized producer/consumer (HTS sync mode) it is possible
418664ff4b1SKonstantin Ananyevto split public enqueue/dequeue API into two phases:
419664ff4b1SKonstantin Ananyev
420664ff4b1SKonstantin Ananyev*   enqueue/dequeue start
421664ff4b1SKonstantin Ananyev
422664ff4b1SKonstantin Ananyev*   enqueue/dequeue finish
423664ff4b1SKonstantin Ananyev
424664ff4b1SKonstantin AnanyevThat allows user to inspect objects in the ring without removing them
425664ff4b1SKonstantin Ananyevfrom it (aka MT safe peek) and reserve space for the objects in the ring
426664ff4b1SKonstantin Ananyevbefore actual enqueue.
427664ff4b1SKonstantin AnanyevNote that this API is available only for two sync modes:
428664ff4b1SKonstantin Ananyev
429664ff4b1SKonstantin Ananyev*   Single Producer/Single Consumer (SP/SC)
430664ff4b1SKonstantin Ananyev
431664ff4b1SKonstantin Ananyev*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
432664ff4b1SKonstantin Ananyev
433664ff4b1SKonstantin AnanyevIt is a user responsibility to create/init ring with appropriate sync modes
434664ff4b1SKonstantin Ananyevselected. As an example of usage:
435664ff4b1SKonstantin Ananyev
436664ff4b1SKonstantin Ananyev.. code-block:: c
437664ff4b1SKonstantin Ananyev
438664ff4b1SKonstantin Ananyev    /* read 1 elem from the ring: */
439664ff4b1SKonstantin Ananyev    uint32_t n = rte_ring_dequeue_bulk_start(ring, &obj, 1, NULL);
440664ff4b1SKonstantin Ananyev    if (n != 0) {
441664ff4b1SKonstantin Ananyev        /* examine object */
442664ff4b1SKonstantin Ananyev        if (object_examine(obj) == KEEP)
443664ff4b1SKonstantin Ananyev            /* decided to keep it in the ring. */
444664ff4b1SKonstantin Ananyev            rte_ring_dequeue_finish(ring, 0);
445664ff4b1SKonstantin Ananyev        else
446664ff4b1SKonstantin Ananyev            /* decided to remove it from the ring. */
447664ff4b1SKonstantin Ananyev            rte_ring_dequeue_finish(ring, n);
448664ff4b1SKonstantin Ananyev    }
449664ff4b1SKonstantin Ananyev
450664ff4b1SKonstantin AnanyevNote that between ``_start_`` and ``_finish_`` none other thread can proceed
451664ff4b1SKonstantin Ananyevwith enqueue(/dequeue) operation till ``_finish_`` completes.
452664ff4b1SKonstantin Ananyev
45347bec9a5SHonnappa NagarahalliRing Peek Zero Copy API
45447bec9a5SHonnappa Nagarahalli-----------------------
45547bec9a5SHonnappa Nagarahalli
45647bec9a5SHonnappa NagarahalliAlong with the advantages of the peek APIs, zero copy APIs provide the ability
45747bec9a5SHonnappa Nagarahallito copy the data to the ring memory directly without the need for temporary
45847bec9a5SHonnappa Nagarahallistorage (for ex: array of mbufs on the stack).
45947bec9a5SHonnappa Nagarahalli
46047bec9a5SHonnappa NagarahalliThese APIs make it possible to split public enqueue/dequeue API into 3 phases:
46147bec9a5SHonnappa Nagarahalli
46247bec9a5SHonnappa Nagarahalli* enqueue/dequeue start
46347bec9a5SHonnappa Nagarahalli
46447bec9a5SHonnappa Nagarahalli* copy data to/from the ring
46547bec9a5SHonnappa Nagarahalli
46647bec9a5SHonnappa Nagarahalli* enqueue/dequeue finish
46747bec9a5SHonnappa Nagarahalli
46847bec9a5SHonnappa NagarahalliNote that this API is available only for two sync modes:
46947bec9a5SHonnappa Nagarahalli
47047bec9a5SHonnappa Nagarahalli*   Single Producer/Single Consumer (SP/SC)
47147bec9a5SHonnappa Nagarahalli
47247bec9a5SHonnappa Nagarahalli*   Multi-producer/Multi-consumer with Head/Tail Sync (HTS)
47347bec9a5SHonnappa Nagarahalli
47447bec9a5SHonnappa NagarahalliIt is a user responsibility to create/init ring with appropriate sync modes.
47547bec9a5SHonnappa NagarahalliFollowing is an example of usage:
47647bec9a5SHonnappa Nagarahalli
47747bec9a5SHonnappa Nagarahalli.. code-block:: c
47847bec9a5SHonnappa Nagarahalli
47947bec9a5SHonnappa Nagarahalli    /* Reserve space on the ring */
48047bec9a5SHonnappa Nagarahalli    n = rte_ring_enqueue_zc_burst_start(r, 32, &zcd, NULL);
48147bec9a5SHonnappa Nagarahalli    /* Pkt I/O core polls packets from the NIC */
48247bec9a5SHonnappa Nagarahalli    if (n != 0) {
48347bec9a5SHonnappa Nagarahalli        nb_rx = rte_eth_rx_burst(portid, queueid, zcd->ptr1, zcd->n1);
48447bec9a5SHonnappa Nagarahalli        if (nb_rx == zcd->n1 && n != zcd->n1)
48547bec9a5SHonnappa Nagarahalli            nb_rx += rte_eth_rx_burst(portid, queueid, zcd->ptr2,
48647bec9a5SHonnappa Nagarahalli							n - zcd->n1);
48747bec9a5SHonnappa Nagarahalli        /* Provide packets to the packet processing cores */
48847bec9a5SHonnappa Nagarahalli        rte_ring_enqueue_zc_finish(r, nb_rx);
48947bec9a5SHonnappa Nagarahalli    }
49047bec9a5SHonnappa Nagarahalli
49147bec9a5SHonnappa NagarahalliNote that between ``_start_`` and ``_finish_`` no other thread can proceed
49247bec9a5SHonnappa Nagarahalliwith enqueue(/dequeue) operation till ``_finish_`` completes.
49347bec9a5SHonnappa Nagarahalli
494*b5458e2cSKonstantin Ananyev
495*b5458e2cSKonstantin AnanyevStaged Ordered Ring API
496*b5458e2cSKonstantin Ananyev-----------------------
497*b5458e2cSKonstantin Ananyev
498*b5458e2cSKonstantin AnanyevStaged-Ordered-Ring (SORING) API provides a SW abstraction for *ordered* queues
499*b5458e2cSKonstantin Ananyevwith multiple processing *stages*.
500*b5458e2cSKonstantin AnanyevIt is based on conventional DPDK ``rte_ring`` API,
501*b5458e2cSKonstantin Ananyevre-uses many of its concepts, and even substantial part of its code.
502*b5458e2cSKonstantin AnanyevIt can be viewed as an "extension" of ``rte_ring`` functionality.
503*b5458e2cSKonstantin Ananyev
504*b5458e2cSKonstantin AnanyevIn particular, main SORING properties:
505*b5458e2cSKonstantin Ananyev
506*b5458e2cSKonstantin Ananyev* circular ring buffer with fixed size objects and related metadata.
507*b5458e2cSKonstantin Ananyev* producer, consumer plus multiple processing stages in between.
508*b5458e2cSKonstantin Ananyev* allows to split objects processing into multiple stages.
509*b5458e2cSKonstantin Ananyev* objects remain in the same ring while moving from one stage to the other,
510*b5458e2cSKonstantin Ananyev  initial order is preserved, no extra copying needed.
511*b5458e2cSKonstantin Ananyev* preserves the ingress order of objects within the queue across multiple stages.
512*b5458e2cSKonstantin Ananyev* each stage (and producer/consumer) can be served by single and/or multiple threads.
513*b5458e2cSKonstantin Ananyev* number of stages, size and number of objects and their metadata in the ring
514*b5458e2cSKonstantin Ananyev  are configurable at ring initialization time.
515*b5458e2cSKonstantin Ananyev
516*b5458e2cSKonstantin AnanyevData-Path API
517*b5458e2cSKonstantin Ananyev~~~~~~~~~~~~~
518*b5458e2cSKonstantin Ananyev
519*b5458e2cSKonstantin AnanyevSORING data-path API provided four main operations:
520*b5458e2cSKonstantin Ananyev
521*b5458e2cSKonstantin Ananyev* ``enqueue``/``dequeue`` works in the same manner as for conventional ``rte_ring``,
522*b5458e2cSKonstantin Ananyev  all ``rte_ring`` synchronization types are supported.
523*b5458e2cSKonstantin Ananyev
524*b5458e2cSKonstantin Ananyev* ``acquire``/``release`` - for each stage there is an ``acquire`` (start)
525*b5458e2cSKonstantin Ananyev  and ``release`` (finish) operation.
526*b5458e2cSKonstantin Ananyev  After some objects are ``acquired`` - given thread can safely assume that
527*b5458e2cSKonstantin Ananyev  it has exclusive possession of these objects till ``release`` for them is invoked.
528*b5458e2cSKonstantin Ananyev  Note that right now user has to release exactly the same number of objects
529*b5458e2cSKonstantin Ananyev  that was acquired before.
530*b5458e2cSKonstantin Ananyev  After objects are ``released``, given thread loses its possession on them,
531*b5458e2cSKonstantin Ananyev  and they can be either acquired by next stage or dequeued
532*b5458e2cSKonstantin Ananyev  by the consumer (in case of last stage).
533*b5458e2cSKonstantin Ananyev
534*b5458e2cSKonstantin AnanyevA simplified representation of a SORING with two stages is shown below.
535*b5458e2cSKonstantin AnanyevOn that picture ``obj5`` and ``obj4`` elements are acquired by stage 0,
536*b5458e2cSKonstantin Ananyev``obj2`` and ``obj3`` are acquired by stage 1,
537*b5458e2cSKonstantin Ananyevwhile ``obj1`` was already released by stage 1 and is ready to be consumed.
538*b5458e2cSKonstantin Ananyev
539*b5458e2cSKonstantin Ananyev.. _figure_soring1:
540*b5458e2cSKonstantin Ananyev
541*b5458e2cSKonstantin Ananyev.. figure:: img/soring-pic1.*
542*b5458e2cSKonstantin Ananyev
543*b5458e2cSKonstantin AnanyevAlong with traditional flavor there are enhanced versions for
544*b5458e2cSKonstantin Ananyevall these data-path operations: ``enqueux``/``dequeux``/``acquirx``/``releasx``.
545*b5458e2cSKonstantin AnanyevAll enhanced versions take as extra parameter a pointer to an array of metadata values.
546*b5458e2cSKonstantin AnanyevAt initialization user can request within the ``soring`` supplementary
547*b5458e2cSKonstantin Ananyevand optional array of metadata associated with each object in the ``soring``.
548*b5458e2cSKonstantin AnanyevWhile ``soring`` element size is configurable and user can specify it big enough
549*b5458e2cSKonstantin Ananyevto hold both object and its metadata together,
550*b5458e2cSKonstantin Ananyevfor performance reasons it might be plausible to access them as separate arrays.
551*b5458e2cSKonstantin AnanyevNote that users are free to mix and match both versions of data-path API
552*b5458e2cSKonstantin Ananyevin a way they like.
553*b5458e2cSKonstantin AnanyevAs an example, possible usage scenario when such separation helps:
554*b5458e2cSKonstantin Ananyev
555*b5458e2cSKonstantin Ananyev.. code-block:: c
556*b5458e2cSKonstantin Ananyev
557*b5458e2cSKonstantin Ananyev   /*
558*b5458e2cSKonstantin Ananyev    * use pointer to mbuf as soring element, while tx_state as a metadata.
559*b5458e2cSKonstantin Ananyev    * In this example we use a soring with just one stage.
560*b5458e2cSKonstantin Ananyev    */
561*b5458e2cSKonstantin Ananyev    union tx_state {
562*b5458e2cSKonstantin Ananyev        /* negative values for error */
563*b5458e2cSKonstantin Ananyev        int32_t rc;
564*b5458e2cSKonstantin Ananyev        /* otherwise contain valid Tx port and queue IDs*/
565*b5458e2cSKonstantin Ananyev        struct {
566*b5458e2cSKonstantin Ananyev            uint16_t port_id;
567*b5458e2cSKonstantin Ananyev            uint16_t queue_id;
568*b5458e2cSKonstantin Ananyev        } tx;
569*b5458e2cSKonstantin Ananyev    };
570*b5458e2cSKonstantin Ananyev    struct rte_soring *soring;
571*b5458e2cSKonstantin Ananyev
572*b5458e2cSKonstantin Ananyevproducer/consumer part:
573*b5458e2cSKonstantin Ananyev
574*b5458e2cSKonstantin Ananyev.. code-block:: c
575*b5458e2cSKonstantin Ananyev
576*b5458e2cSKonstantin Ananyev   struct rte_mbuf *pkts[MAX_PKT_BURST];
577*b5458e2cSKonstantin Ananyev   union tx_state txst[MAX_PKT_BURST];
578*b5458e2cSKonstantin Ananyev   ...
579*b5458e2cSKonstantin Ananyev   /* enqueue - writes to soring objects array no need to update metadata */
580*b5458e2cSKonstantin Ananyev   uint32_t num = MAX_PKT_BURST;
581*b5458e2cSKonstantin Ananyev   num = rte_soring_enqueue_burst(soring, pkts, num, NULL);
582*b5458e2cSKonstantin Ananyev   ....
583*b5458e2cSKonstantin Ananyev   /* dequeux - reads both packets and related tx_state */
584*b5458e2cSKonstantin Ananyev   uint32_t num = MAX_PKT_BURST;
585*b5458e2cSKonstantin Ananyev   num = rte_soring_dequeux_burst(soring, pkts, txst, num, NULL);
586*b5458e2cSKonstantin Ananyev
587*b5458e2cSKonstantin Ananyev   /*
588*b5458e2cSKonstantin Ananyev    * Tx packets out, or drop in case of error.
589*b5458e2cSKonstantin Ananyev    * Note that we don't need to dereference the soring objects itself
590*b5458e2cSKonstantin Ananyev    * to make a decision.
591*b5458e2cSKonstantin Ananyev    */
592*b5458e2cSKonstantin Ananyev   uint32_t i, j, k, n;
593*b5458e2cSKonstantin Ananyev   struct rte_mbuf *dr[MAX_PKT_BURST];
594*b5458e2cSKonstantin Ananyev
595*b5458e2cSKonstantin Ananyev   k = 0;
596*b5458e2cSKonstantin Ananyev   for (i = 0; i != num; i++) {
597*b5458e2cSKonstantin Ananyev       /* packet processing reports an error */
598*b5458e2cSKonstantin Ananyev       if (txst[i].rc < 0)
599*b5458e2cSKonstantin Ananyev           dr[k++] = pkts[i];
600*b5458e2cSKonstantin Ananyev       /* valid packet, send it out */
601*b5458e2cSKonstantin Ananyev       else {
602*b5458e2cSKonstantin Ananyev           /* group consecutive packets with the same port and queue IDs */
603*b5458e2cSKonstantin Ananyev           for (j = i + 1; j < num; j++)
604*b5458e2cSKonstantin Ananyev               if (txst[j].rc != txst[i].rc)
605*b5458e2cSKonstantin Ananyev                   break;
606*b5458e2cSKonstantin Ananyev
607*b5458e2cSKonstantin Ananyev           n = rte_eth_tx_burst(txst[i].tx.port_id, txst[i].tx.queue_id,
608*b5458e2cSKonstantin Ananyev                                pkts + i, j - i);
609*b5458e2cSKonstantin Ananyev           if (i + n != j) {
610*b5458e2cSKonstantin Ananyev               /* decide with unsent packets if any */
611*b5458e2cSKonstantin Ananyev           }
612*b5458e2cSKonstantin Ananyev       }
613*b5458e2cSKonstantin Ananyev   }
614*b5458e2cSKonstantin Ananyev   /* drop erroneous packets */
615*b5458e2cSKonstantin Ananyev   if (k != 0)
616*b5458e2cSKonstantin Ananyev       rte_pktmbuf_free_bulk(dr, k);
617*b5458e2cSKonstantin Ananyev
618*b5458e2cSKonstantin Ananyevacquire/release part:
619*b5458e2cSKonstantin Ananyev
620*b5458e2cSKonstantin Ananyev.. code-block:: c
621*b5458e2cSKonstantin Ananyev
622*b5458e2cSKonstantin Ananyev   uint32_t ftoken;
623*b5458e2cSKonstantin Ananyev   struct rte_mbuf *pkts[MAX_PKT_BURST];
624*b5458e2cSKonstantin Ananyev   union tx_state txst[MAX_PKT_BURST];
625*b5458e2cSKonstantin Ananyev   ...
626*b5458e2cSKonstantin Ananyev   /* acquire - grab some packets to process */
627*b5458e2cSKonstantin Ananyev   uint32_t num = MAX_PKT_BURST;
628*b5458e2cSKonstantin Ananyev   num = rte_soring_acquire_burst(soring, pkts, 0, num, &ftoken, NULL);
629*b5458e2cSKonstantin Ananyev
630*b5458e2cSKonstantin Ananyev   /* process packets, fill txst[] for each */
631*b5458e2cSKonstantin Ananyev   do_process_packets(pkts, txst, num);
632*b5458e2cSKonstantin Ananyev
633*b5458e2cSKonstantin Ananyev   /*
634*b5458e2cSKonstantin Ananyev    * release - assuming that do_process_packets() didn't change
635*b5458e2cSKonstantin Ananyev    * contents of pkts[], we need to update soring metadata array only.
636*b5458e2cSKonstantin Ananyev    */
637*b5458e2cSKonstantin Ananyev   rte_soring_releasx(soring, NULL, txst, 0, num, ftoken);
638*b5458e2cSKonstantin Ananyev
639*b5458e2cSKonstantin AnanyevUse Cases
640*b5458e2cSKonstantin Ananyev~~~~~~~~~
641*b5458e2cSKonstantin Ananyev
642*b5458e2cSKonstantin AnanyevExpected use-cases include applications that use pipeline model
643*b5458e2cSKonstantin Ananyev(probably with multiple stages) for packet processing,
644*b5458e2cSKonstantin Ananyevwhen preserving incoming packet order is important.
645*b5458e2cSKonstantin AnanyevI.E.: IPsec processing, etc.
646*b5458e2cSKonstantin Ananyev
647*b5458e2cSKonstantin AnanyevSORING internals
648*b5458e2cSKonstantin Ananyev~~~~~~~~~~~~~~~~
649*b5458e2cSKonstantin Ananyev
650*b5458e2cSKonstantin Ananyev* In addition to accessible by the user array of objects (and metadata),
651*b5458e2cSKonstantin Ananyev  ``soring`` also contains an internal array of states.
652*b5458e2cSKonstantin Ananyev  Each ``state[]`` corresponds to exactly one object within the soring.
653*b5458e2cSKonstantin Ananyev  That ``state[]`` array is used by ``acquire``/``release``/``dequeue`` operations
654*b5458e2cSKonstantin Ananyev  to store internal information and should not be accessed by the user directly.
655*b5458e2cSKonstantin Ananyev
656*b5458e2cSKonstantin Ananyev* At ``acquire``, soring  moves stage's head
657*b5458e2cSKonstantin Ananyev  (in a same way as ``rte_ring`` ``move_head`` does),
658*b5458e2cSKonstantin Ananyev  plus it saves in ``state[stage.old_head]`` information
659*b5458e2cSKonstantin Ananyev  about how many elements were acquired, acquired head position,
660*b5458e2cSKonstantin Ananyev  and special flag value to indicate that given elements are acquired
661*b5458e2cSKonstantin Ananyev  (``SORING_ST_START``).
662*b5458e2cSKonstantin Ananyev  Note that ``acquire`` returns an opaque ``ftoken`` value
663*b5458e2cSKonstantin Ananyev  that user has to provide for ``release`` function.
664*b5458e2cSKonstantin Ananyev
665*b5458e2cSKonstantin Ananyev* ``release`` extracts old head value from provided by user ``ftoken``
666*b5458e2cSKonstantin Ananyev  and checks that corresponding ``state[]`` entry contains expected values
667*b5458e2cSKonstantin Ananyev  (mostly for sanity purposes).
668*b5458e2cSKonstantin Ananyev  Then it marks this ``state[]`` entry with ``SORING_ST_FINISH`` flag
669*b5458e2cSKonstantin Ananyev  to indicate that given subset of objects was released.
670*b5458e2cSKonstantin Ananyev  After that, it checks does stage's old ``head`` value
671*b5458e2cSKonstantin Ananyev  equals to its current ``tail`` value.
672*b5458e2cSKonstantin Ananyev  If so, then it performs ``finalize`` operation,
673*b5458e2cSKonstantin Ananyev  otherwise ``release`` just returns.
674*b5458e2cSKonstantin Ananyev
675*b5458e2cSKonstantin Ananyev* As ``state[]`` is shared by all threads,
676*b5458e2cSKonstantin Ananyev  some other thread can perform ``finalize`` operation for given stage.
677*b5458e2cSKonstantin Ananyev  That allows ``release`` to avoid excessive waits on the ``tail`` value.
678*b5458e2cSKonstantin Ananyev  Main purpose of ``finalize`` operation is to walk through ``state[]`` array
679*b5458e2cSKonstantin Ananyev  from current stage's ``tail`` position up to its ``head``,
680*b5458e2cSKonstantin Ananyev  check ``state[]`` and move stage ``tail`` through elements
681*b5458e2cSKonstantin Ananyev  that are already released (in ``SORING_ST_FINISH`` state).
682*b5458e2cSKonstantin Ananyev  Along with that, corresponding ``state[]`` entries are reset back to zero.
683*b5458e2cSKonstantin Ananyev  Note that ``finalize`` for given stage can be called from multiple places:
684*b5458e2cSKonstantin Ananyev  from ``release`` for that stage or from ``acquire`` for next stage,
685*b5458e2cSKonstantin Ananyev  or even from consumer's ``dequeue`` - in case given stage is the last one.
686*b5458e2cSKonstantin Ananyev  So ``finalize`` has to be MT-safe and inside it we have to guarantee that
687*b5458e2cSKonstantin Ananyev  at any given moment only one thread can update stage's ``tail``
688*b5458e2cSKonstantin Ananyev  and reset corresponding ``state[]`` entries.
689*b5458e2cSKonstantin Ananyev
690*b5458e2cSKonstantin Ananyev
691fc1f2750SBernard IremongerReferences
692fc1f2750SBernard Iremonger----------
693fc1f2750SBernard Iremonger
694fc1f2750SBernard Iremonger    *   `bufring.h in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/sys/buf_ring.h?revision=199625&amp;view=markup>`_ (version 8)
695fc1f2750SBernard Iremonger
696fc1f2750SBernard Iremonger    *   `bufring.c in FreeBSD <http://svn.freebsd.org/viewvc/base/release/8.0.0/sys/kern/subr_bufring.c?revision=199625&amp;view=markup>`_ (version 8)
697fc1f2750SBernard Iremonger
698fc1f2750SBernard Iremonger    *   `Linux Lockless Ring Buffer Design <http://lwn.net/Articles/340400/>`_
699