..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2017 Intel Corporation.

Membership Library
==================

Introduction
------------

The DPDK Membership Library provides an API for DPDK applications to insert a
new member, delete an existing member, or query the existence of a member in a
given set or a group of sets. For a group of sets, the library reports not only
whether the element has been inserted into one of the sets but also which set
it belongs to. The Membership Library is an extension and generalization of a
traditional filter structure (for example Bloom Filter [Member-bloom]) that has
many usages in a wide variety of workloads and applications. In general, the
Membership Library is a data structure that provides a "set-summary" on whether
a member belongs to a set. As discussed in detail later, a set-summary has two
advantages over operating on a complete list of elements: first, it requires
much less storage than the list of elements itself, and second, checking an
element's membership (or performing other operations) in the set-summary is
much faster than checking it against the complete list.

We use the term "Set-Summary" in this guide to refer to the space-efficient,
probabilistic membership data structure that is provided by the library. A
membership test for an element returns either the set the element belongs to or
"not-found", and the answer is correct with very high probability. Set-summary
is a fundamental data aggregation component that can be used in many network
(and other) applications. It is a crucial structure to address performance and
scalability issues of diverse network applications including overlay networks,
data-centric networks, flow table summaries, network statistics and
traffic monitoring. A set-summary is useful for applications that need to
keep a list of elements when a complete list requires too much space
and/or too much processing cost. In these situations, the set-summary works as
a lossy hash-based representation of a set of members. It can dramatically
reduce the space requirement and significantly improve the performance of set
membership queries at the cost of introducing a very small membership test error
probability.

.. _figure_membership1:
.. figure:: img/member_i1.*

  Example Usages of Membership Library

There are various usages for a Membership Library in a very
large set of applications and workloads. Interested readers can refer to
[Member-survey] for a survey of possible networking usages. The above figure
provides a small set of examples of using the Membership Library:

* Sub-figure (a)
  depicts a distributed web cache architecture where a collection of proxies
  attempt to share their web caches (cached from a set of back-end web servers) to
  provide faster responses to clients, and the proxies use the Membership
  Library to share summaries of what web pages/objects they are caching. With the
  Membership Library, a proxy receiving an HTTP request queries the
  set-summary to locate the requested page and quickly determine whether to
  retrieve it from a nearby proxy or from a back-end web server.

* Sub-figure (b) depicts another example of using the Membership Library to
  prevent routing loops, which is typically done using slow TTL countdown and
  dropping packets when the TTL expires. As shown in Sub-figure (b), an embedded
  set-summary in the packet header itself can be used to summarize the set of
  nodes a packet has gone through, and each node upon receiving a packet can check
  whether its ID is a member of the set of visited nodes, and if it is, then a
  routing loop is detected.

* Sub-figure (c) presents another usage of the Membership
  Library to load-balance flows to worker threads with in-order guarantee where a
  set-summary is used to query if a packet belongs to an existing flow or a new
  flow. Packets belonging to a new flow are forwarded to the current least loaded
  worker thread, while those belonging to an existing flow are forwarded to the
  pre-assigned thread to guarantee in-order processing.

* Sub-figure (d) highlights
  yet another usage example in the database domain where a set-summary is used to
  determine joins between sets. Instead of creating a join by comparing each
  element of a set against the elements of a different set, the join is done
  on the summaries since they can efficiently encode members of a given set.

Membership Library is a configurable library that is optimized to cover set
membership functionality for both single-set and multi-set scenarios. Two set-summary
schemes are presented: (a) a vector of Bloom Filters and (b) Hash-Table based
set-summary schemes with and without false negative probability.
This guide first briefly describes these different types of set-summaries and
usage examples for each, and then highlights the Membership Library API.

Vector of Bloom Filters
-----------------------

Bloom Filter (BF) [Member-bloom] is a well-known space-efficient
probabilistic data structure that answers set membership queries (test whether
an element is a member of a set) with some probability of false positives and
zero false negatives; a query for an element returns either "possibly in
a set" (with very high probability) or "definitely not in a set".

The BF is a method for representing a set of ``n`` elements (for example flow keys
in network applications domain) to support membership queries. The idea of BF is
to allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0. Then
it chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` whose hash values range from
``0`` to ``m-1`` to perform hashing calculations on each element to be inserted. Each time an
element ``X`` is inserted into the set, the bits at positions ``h1(X)``, ``h2(X)``, ...
``hk(X)`` in ``v`` are set to 1 (any particular bit might be set to 1 multiple times
for multiple different inserted elements). Given a query for any element ``Y``, the
bits at positions ``h1(Y)``, ``h2(Y)``, ... ``hk(Y)`` are checked. If any of them is 0,
then ``Y`` is definitely not in the set. Otherwise there is a high probability that
``Y`` is a member of the set, with a certain false positive probability. As shown in
the next equation, the false positive probability can be made arbitrarily small
by changing the number of hash functions (``k``) and the vector length (``m``).
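
The insert and query steps described above can be sketched in a few lines of C.
This is only an illustration and not the library's implementation: the names
``toy_bf``, ``toy_hash`` and ``toy_bf_pos``, the sizes ``BF_BITS`` and
``BF_HASHES``, and the use of seeded FNV-1a with double hashing to derive the
``k`` positions are all assumptions made for the example.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define BF_BITS   1024u /* m: bit-vector length (illustrative value) */
   #define BF_HASHES 4u    /* k: number of hash functions (illustrative value) */

   struct toy_bf {
       uint8_t v[BF_BITS / 8]; /* the bit-vector v */
   };

   /* Seeded FNV-1a; a stand-in for the hash functions h1..hk (assumption). */
   static uint32_t
   toy_hash(const void *key, size_t len, uint32_t seed)
   {
       const uint8_t *p = key;
       uint32_t h = 2166136261u ^ seed;
       size_t i;

       for (i = 0; i < len; i++) {
           h ^= p[i];
           h *= 16777619u;
       }
       return h;
   }

   /* Position of the i-th hash, derived by double hashing: (a + i * b) mod m. */
   static uint32_t
   toy_bf_pos(const void *key, size_t len, uint32_t i)
   {
       uint32_t a = toy_hash(key, len, 0x1234u);
       uint32_t b = toy_hash(key, len, 0xabcdu) | 1u;

       return (a + i * b) % BF_BITS;
   }

   static void
   toy_bf_init(struct toy_bf *bf)
   {
       memset(bf->v, 0, sizeof(bf->v));
   }

   /* Insert: set the k addressed bits to 1. */
   static void
   toy_bf_add(struct toy_bf *bf, const void *key, size_t len)
   {
       uint32_t i;

       assert(bf != NULL && key != NULL);
       for (i = 0; i < BF_HASHES; i++) {
           uint32_t pos = toy_bf_pos(key, len, i);

           bf->v[pos / 8] |= (uint8_t)(1u << (pos % 8));
       }
   }

   /* Query: 0 means "definitely not in the set", 1 means "possibly in the set". */
   static int
   toy_bf_query(const struct toy_bf *bf, const void *key, size_t len)
   {
       uint32_t i;

       for (i = 0; i < BF_HASHES; i++) {
           uint32_t pos = toy_bf_pos(key, len, i);

           if ((bf->v[pos / 8] & (1u << (pos % 8))) == 0)
               return 0;
       }
       return 1;
   }

A query on a freshly initialized filter always returns 0, and a query for a key
that was added always returns 1; only keys that were never added can produce a
false positive.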

.. _figure_membership2:
.. figure:: img/member_i2.*

  Bloom Filter False Positive Probability
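
For readers without access to the figure, the textbook form of the false
positive probability ``p`` for a BF with ``m`` bits, ``k`` hash functions and
``n`` inserted elements is given below. It assumes independent, uniformly
distributed hash functions, and it is assumed here that the figure shows an
equivalent expression.

.. math::

   p = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
     \approx \left(1 - e^{-kn/m}\right)^{k}

For fixed ``m`` and ``n``, this approximation is minimized when ``k`` is close
to ``(m/n) ln 2``.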

Without BF, accurate membership testing could involve a costly hash table
lookup and full element comparison. The advantage of using a BF is to simplify
the membership test into a series of hash calculations and memory accesses for a
small bit-vector, which can be easily optimized. Hence the lookup throughput
(set membership test) can be significantly higher than that of a normal hash table
lookup with element comparison.

.. _figure_membership3:
.. figure:: img/member_i3.*

  Detecting Routing Loops Using BF

BF is used for applications that need only one set, and the
membership of elements is checked against the BF. The example discussed
in the above figure is one such application: it uses a single
set to capture the node IDs that have been visited so far by the packet. Each
node then checks this embedded BF in the packet header for its own ID. If the
BF indicates that the current node is definitely not in the set, the packet has
not visited this node before and the route is loop-free so far; otherwise a
routing loop is suspected.
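
A minimal sketch of this per-node check is shown below, assuming the packet
header carries a 64-bit BF and that each node ID maps to two bit positions.
The names ``toy_node_bits`` and ``toy_visit``, the header width and the
multiplicative hash are hypothetical choices for the example, not something
defined by the library.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Map a node ID to (up to) two bits of a 64-bit in-header filter. */
   static uint64_t
   toy_node_bits(uint32_t node_id)
   {
       uint32_t h = node_id * 2654435761u; /* multiplicative hash (assumption) */

       return (UINT64_C(1) << (h >> 26)) |
              (UINT64_C(1) << ((h >> 20) & 63u));
   }

   /*
    * Called by a node when it receives a packet.
    * Returns 1 if the node is possibly already in the visited set (suspected
    * loop), or 0 if it is definitely not; in the latter case the node records
    * itself in the header filter before forwarding.
    */
   static int
   toy_visit(uint64_t *hdr_filter, uint32_t node_id)
   {
       uint64_t bits = toy_node_bits(node_id);

       assert(hdr_filter != 0);
       if ((*hdr_filter & bits) == bits)
           return 1;
       *hdr_filter |= bits;
       return 0;
   }

Because a BF has no false negatives, a packet that returns to a node it has
already visited is always flagged; a false positive only causes an occasional
unnecessary drop or fall-back check.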


.. _figure_membership4:
.. figure:: img/member_i4.*

  Vector Bloom Filter (vBF) Overview

To support membership tests for both multiple sets and a single set,
the library implements a Vector Bloom Filter (vBF) scheme.
vBF composes multiple bloom filters into a vector of bloom filters.
The membership test is conducted on all of the
bloom filters concurrently to determine which set(s) the element belongs to, if
any. The basic idea of vBF is shown in the above figure where an element is
used to address multiple bloom filters concurrently and the bloom filter
index(es) with a hit is returned.
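
One way to test all bloom filters at once is to interleave them, so that one
machine word holds the same bit position of every filter. The sketch below
illustrates that idea for up to 32 sets. It is an assumption-laden toy rather
than the library's code: ``toy_vbf``, ``toy_vbf_hash``, ``toy_vbf_pos`` and the
``VBF_*`` sizes are hypothetical names and values.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define VBF_BITS   512u /* bits per bloom filter (illustrative value) */
   #define VBF_HASHES 3u   /* hash functions per filter (illustrative value) */
   #define VBF_SETS   32u  /* one bloom filter per set */

   /* table[pos] holds bit `pos` of all 32 filters: bit s belongs to set s. */
   struct toy_vbf {
       uint32_t table[VBF_BITS];
   };

   /* Seeded FNV-1a; a stand-in for the real hash functions (assumption). */
   static uint32_t
   toy_vbf_hash(const void *key, size_t len, uint32_t seed)
   {
       const uint8_t *p = key;
       uint32_t h = 2166136261u ^ seed;
       size_t i;

       for (i = 0; i < len; i++) {
           h ^= p[i];
           h *= 16777619u;
       }
       return h;
   }

   static uint32_t
   toy_vbf_pos(const void *key, size_t len, uint32_t i)
   {
       uint32_t a = toy_vbf_hash(key, len, 0x1234u);
       uint32_t b = toy_vbf_hash(key, len, 0xabcdu) | 1u;

       return (a + i * b) % VBF_BITS;
   }

   /* Insert the key into the bloom filter of one set. */
   static void
   toy_vbf_add(struct toy_vbf *vbf, const void *key, size_t len, uint32_t set)
   {
       uint32_t i;

       assert(set < VBF_SETS);
       for (i = 0; i < VBF_HASHES; i++)
           vbf->table[toy_vbf_pos(key, len, i)] |= UINT32_C(1) << set;
   }

   /*
    * Test the key against all filters at once. Bit s of the result is 1 if
    * the key is possibly in set s; a result of 0 means it is in none of them.
    */
   static uint32_t
   toy_vbf_lookup(const struct toy_vbf *vbf, const void *key, size_t len)
   {
       uint32_t mask = UINT32_MAX;
       uint32_t i;

       for (i = 0; i < VBF_HASHES; i++)
           mask &= vbf->table[toy_vbf_pos(key, len, i)];
       return mask;
   }

With this layout a lookup costs the same number of memory accesses regardless
of how many of the 32 filters are in use, which is the kind of parallelism the
vBF scheme aims for.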

.. _figure_membership5:
.. figure:: img/member_i5.*

  vBF for Flow Scheduling to Worker Thread

As previously mentioned, there are many usages of such structures. vBF is used
for applications that need to check membership against multiple sets
simultaneously. The example shown in the above figure uses one set per worker
thread to capture all flows assigned for processing at that thread. Upon receiving
a packet, the vBF is used to quickly determine whether the packet belongs to a new flow,
in which case it is forwarded to the current least loaded worker thread, or to an
existing flow, in which case it is queued for the thread already handling that flow
to guarantee in-order processing (i.e.
the property of vBF to indicate right away whether a given flow is a new one or
not is critical to minimize response latency).

It should be noted that vBF can be implemented using a set of single bloom
filters with sequential lookup of each BF. However, being able to concurrently
search all set-summaries is a big throughput advantage. In the library, a degree of
parallelism is achieved by checking all bloom filters
together.


Hash-Table based Set-Summaries
------------------------------

Hash-table based set-summary (HTSS) is another scheme in the membership library.
Cuckoo filter [Member-cfilter] is an example of HTSS.
HTSS supports multi-set membership testing like
vBF does. However, while vBF is better for a small number of targets, HTSS is more suitable
and can easily outperform vBF when the number of sets is
large, since HTSS uses a single hash table for membership testing while vBF
requires testing a series of Bloom Filters each corresponding to one set.
As a result, generally speaking vBF is better suited to a small, limited number of sets
while HTSS should be used with a larger number of sets.

.. _figure_membership6:
.. figure:: img/member_i6.*

  Using HTSS for Attack Signature Matching

As shown in the above figure, attack signature matching, where each set
represents a certain signature length in the payload (for correctness of this
example, an attack signature should not be a subset of another one), is a good
example for using HTSS with 0% false negative (i.e., when a lookup returns not
found, it is 100% certain that the element is not a member of any set). The packet
inspection application benefits from knowing right away that the current payload
does not match any attack signature in the database, which establishes its
legitimacy; otherwise a deep inspection of the packet is needed.

HTSS employs a data structure similar to, but simpler than, a traditional hash table.
The major difference is that HTSS stores only the signatures and not the
full keys/elements, which can significantly reduce the footprint of the table.
Along with the signature, HTSS also stores a value to indicate the target set.
When looking up an element, the element is hashed and the HTSS is addressed
to retrieve the signature stored. If the signature matches then the value is
retrieved corresponding to the index of the target set which the element belongs
to. Because signatures can collide, HTSS can still have a false positive
probability. Furthermore, if elements are allowed to be
overwritten or evicted when the hash table becomes full, it will also have a
false negative probability. We discuss this case in the next section.
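
The signature-plus-set-id idea can be illustrated with a small toy table. This
is a sketch under simplifying assumptions and not the library's data
structure: it uses a single candidate bucket per key, FNV-1a as the hash, a
16-bit signature taken from the upper hash bits, and set id 0 reserved to mark
a free entry. All ``toy_*`` and ``HT_*`` names are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define HT_BUCKETS 64u /* number of buckets (illustrative value) */
   #define HT_ENTRIES 4u  /* entries per bucket (illustrative value) */
   #define HT_EMPTY   0u  /* set id 0 marks a free entry (assumption) */

   struct toy_entry {
       uint16_t sig;    /* short signature instead of the full key */
       uint16_t set_id; /* target set the element belongs to */
   };

   struct toy_htss {
       struct toy_entry b[HT_BUCKETS][HT_ENTRIES];
   };

   /* FNV-1a; a stand-in for the real hash function (assumption). */
   static uint32_t
   toy_ht_hash(const void *key, size_t len)
   {
       const uint8_t *p = key;
       uint32_t h = 2166136261u;
       size_t i;

       for (i = 0; i < len; i++) {
           h ^= p[i];
           h *= 16777619u;
       }
       return h;
   }

   /* Returns 0 on success, or -ENOSPC if the bucket is full (no eviction). */
   static int
   toy_htss_add(struct toy_htss *t, const void *key, size_t len, uint16_t set_id)
   {
       uint32_t h = toy_ht_hash(key, len);
       struct toy_entry *bkt = t->b[h % HT_BUCKETS];
       uint16_t sig = (uint16_t)(h >> 16);
       uint32_t i;

       assert(set_id != HT_EMPTY);
       for (i = 0; i < HT_ENTRIES; i++) {
           if (bkt[i].set_id == HT_EMPTY) {
               bkt[i].sig = sig;
               bkt[i].set_id = set_id;
               return 0;
           }
       }
       return -ENOSPC;
   }

   /* Returns 1 and fills *set_id on a signature match, 0 otherwise. */
   static int
   toy_htss_lookup(const struct toy_htss *t, const void *key, size_t len,
                   uint16_t *set_id)
   {
       uint32_t h = toy_ht_hash(key, len);
       const struct toy_entry *bkt = t->b[h % HT_BUCKETS];
       uint16_t sig = (uint16_t)(h >> 16);
       uint32_t i;

       for (i = 0; i < HT_ENTRIES; i++) {
           if (bkt[i].set_id != HT_EMPTY && bkt[i].sig == sig) {
               *set_id = bkt[i].set_id;
               return 1;
           }
       }
       *set_id = HT_EMPTY;
       return 0;
   }

Two different keys that hash to the same bucket and signature are
indistinguishable to the lookup, which is where the false positive probability
comes from. Since nothing is ever evicted in this variant, a key that was
successfully added is always found.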

Set-Summaries with False Negative Probability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As previously mentioned, traditional set-summaries (e.g. Bloom Filters) do not
have a false negative probability, i.e., when a lookup reports that an element
is not present in a given set, that answer is 100% certain. However, the Membership Library
also supports a set-summary probabilistic data structure based on HTSS which
allows for false negative probability.

In HTSS, when the hash table becomes full, keys/elements will fail to be added
into the table and the hash table has to be resized to accommodate these new
elements, which can be expensive. However, if we allow new elements to overwrite
or evict existing elements (as a cache typically does), then the resulting
set-summary will begin to have false negative probability. This is because the
element that was evicted from the set-summary may still be present in the target
set. For subsequent queries the set-summary will falsely report that the element
is not in the set, hence the false negative probability.
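
The effect of eviction can be seen in a toy direct-mapped cache. The sketch
below is hypothetical and much simpler than the library's cache mode: it takes
an already computed 32-bit key hash, keeps one entry per slot, and reserves set
id 0 for a free slot. The names ``toy_cache``, ``toy_cache_add`` and
``toy_cache_lookup`` are invented for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define CACHE_SLOTS 16u /* number of slots (illustrative value) */
   #define CACHE_EMPTY 0u  /* set id 0 marks a free slot (assumption) */

   struct toy_cache {
       uint16_t sig[CACHE_SLOTS];
       uint16_t set_id[CACHE_SLOTS];
   };

   /*
    * Insert a key given its 32-bit hash. Returns 0 if a free slot (or the
    * slot already holding the same signature) was used, or 1 if an older
    * entry with a different signature was evicted.
    */
   static int
   toy_cache_add(struct toy_cache *c, uint32_t hash, uint16_t set_id)
   {
       uint32_t slot = hash % CACHE_SLOTS;
       uint16_t sig = (uint16_t)(hash >> 16);
       int evicted;

       assert(set_id != CACHE_EMPTY);
       evicted = c->set_id[slot] != CACHE_EMPTY && c->sig[slot] != sig;
       c->sig[slot] = sig;
       c->set_id[slot] = set_id;
       return evicted;
   }

   /* Returns 1 and fills *set_id on a signature match, 0 otherwise. */
   static int
   toy_cache_lookup(const struct toy_cache *c, uint32_t hash, uint16_t *set_id)
   {
       uint32_t slot = hash % CACHE_SLOTS;
       uint16_t sig = (uint16_t)(hash >> 16);

       if (c->set_id[slot] != CACHE_EMPTY && c->sig[slot] == sig) {
           *set_id = c->set_id[slot];
           return 1;
       }
       *set_id = CACHE_EMPTY;
       return 0;
   }

For example, hashes ``0x00010003`` and ``0x00020003`` map to the same slot with
different signatures. Adding the second one evicts the first, and a later
lookup of the first reports a miss even though the element may still belong to
its target set, which is exactly a false negative.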

The major usage of HTSS with false negative is to use it as a cache for
distributing elements to different target sets. By allowing HTSS to evict old
elements, the set-summary can keep track of the most recent (i.e. active)
elements as a cache typically does. Old inactive elements (infrequently
used elements) will automatically and eventually get evicted from the
set-summary. It is worth noting that the set-summary still has false positive
probability, which means the application must either tolerate a certain rate of
false positives or have a fall-back path for when a false positive happens.

.. _figure_membership7:
.. figure:: img/member_i7.*

  Using HTSS with False Negatives for Wild Card Classification

HTSS with false negative (i.e. a cache) also has a wide set of applications.
Wild card flow classification (e.g. ACL rules), highlighted in the
above figure, is one example. In that case each target set
represents a sub-table with rules defined by a certain flow mask. The flow masks
are non-overlapping, and for flows matching more than one rule only the highest
priority one is inserted in the corresponding sub-table (interested readers can
refer to the Open vSwitch (OvS) design of Mega Flow Cache (MFC) [Member-OvS]
for further details). Typically the rules will have a large number of distinct
masks and hence a large number of target sets, each corresponding to one
mask. Because the active set of flows varies widely based on the network
traffic, HTSS with false negative will act as a cache of <flowid, target ACL
sub-table> pairs for the current active set of flows. When a miss occurs (as
shown in red in the above figure) the sub-tables will be searched sequentially
one by one for a possible match, and when found the flow key and target
sub-table will be inserted into the set-summary (i.e. cache insertion) so
subsequent packets from the same flow do not incur the overhead of the
sequential search of sub-tables.

Library API Overview
--------------------

The design goal of the Membership Library API is to be as generic as possible to
support all the different types of set-summaries we discussed in previous
sections and beyond. Fundamentally, the API needs to include creation,
insertion, deletion, and lookup.


Set-summary Create
~~~~~~~~~~~~~~~~~~

The ``rte_member_create()`` function is used to create a set-summary structure. The input parameter
is a struct that passes in the parameters needed to initialize the set-summary, and the function returns a
pointer to the created set-summary or ``NULL`` if the creation failed.

The general input arguments used when creating the set-summary should include ``name``
which is the name of the created set-summary, ``type`` which is one of the types
supported by the library (e.g. ``RTE_MEMBER_TYPE_HT`` for HTSS or ``RTE_MEMBER_TYPE_VBF`` for vBF), and ``key_len``
which is the length of the element/key. Other parameters
are only used for certain types of set-summary, or have a slightly different meaning for different types of set-summary.
For example, the ``num_keys`` parameter means the maximum number of entries for a hash-table based set-summary.
However, for bloom filter, this value means the expected number of keys that could be
inserted into the bloom filter(s). The value is used to calculate the size of each
bloom filter.

We also pass two seeds: ``prim_hash_seed`` and
``sec_hash_seed`` for the primary and secondary hash functions to calculate two independent hash values.
The ``socket_id`` parameter is the NUMA socket ID for the memory used to create the
set-summary. For HTSS, another parameter ``is_cache`` is used to indicate
if this set-summary is a cache (i.e. with false negative probability) or not.
For vBF, extra parameters are needed. For example, ``num_set`` is the number of
sets needed to initialize the vector bloom filters. This number is equal to the
number of bloom filters that will be created.
``false_positive_rate`` is the false positive rate. ``num_keys`` and ``false_positive_rate`` are used to determine
the number of hash functions and the bloom filter size.


Set-summary Element Insertion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``rte_member_add()`` function is used to insert an element/key into a set-summary structure. If it fails, an
error is returned. On success, the returned value depends on the
set-summary mode and provides extra information to the user. For vBF
mode, a return value of 0 means a successful insert. For HTSS mode without false negative, the insert
could fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache mode),
an insert that does not cause any eviction (i.e. no overwriting happens to an
existing entry) returns 0. An insert that causes an eviction returns
1 to indicate that situation, which is not an error.

The input arguments for the function should include the ``key`` which is a pointer to the element/key that needs to
be added to the set-summary, and ``set_id`` which is the set id associated
with the key that needs to be added.


Set-summary Element Lookup
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``rte_member_lookup()`` function looks up a single key/element in the set-summary structure. It
returns as soon as the first match is found. The return value is 1 if a
match is found and 0 otherwise. The arguments for the function include ``key`` which is a pointer to the
element/key that needs to be looked up, and ``set_id`` which is used to return the
first target set id where the key has matched, if any.

The ``rte_member_lookup_bulk()`` function is used to look up a bulk of keys/elements in the
set-summary structure for their first match. Each key lookup returns as soon as the first match is found. The
return value is the number of keys that find a match. The arguments of the function include ``keys``
which is a pointer to a bulk of keys that are to be looked up,
``num_keys`` which is the number
of keys that will be looked up, and ``set_ids`` which returns the target set
ids for the first match found for each of the input keys. ``set_ids`` is an array
that needs to be sized according to ``num_keys``. If there is no match, the set id
for that key will be set to ``RTE_MEMBER_NO_MATCH``.

The ``rte_member_lookup_multi()`` function looks up a single key/element in the
set-summary structure for multiple matches. It
returns ALL the matches (possibly more than one) found for this key when it
is matched against all target sets (it is worth noting that for cache mode HTSS,
the current implementation matches at most one target set). The return value is
the number of matches
that were found for this key (for cache mode HTSS the return value
should be at most 1). The arguments for the function include ``key`` which is a pointer to the
element/key that needs to be looked up, ``max_match_per_key`` which indicates the maximum number of matches
the user expects to find for each key, and ``set_id`` which is used to return all
target set ids where the key has matched, if any. The ``set_id`` array should be sized
according to ``max_match_per_key``. For vBF, the maximum number of matches per key is equal
to the number of sets. For HTSS, the maximum number of matches per key is equal to two times
the entry count per bucket. ``max_match_per_key`` should be equal to or smaller than the maximum number of
possible matches.

The ``rte_member_lookup_multi_bulk()`` function looks up a bulk of keys/elements in the
set-summary structure for multiple matches. Each key lookup returns ALL the matches (possibly more
than one) found for this key when it is matched against all target sets (cache mode HTSS
matches at most one target set). The
return value is the number of keys that find one or more matches in the
set-summary structure. The arguments of the
function include ``keys`` which is
a pointer to a bulk of keys that are to be looked up, ``num_keys`` which is the number
of keys that will be looked up, ``max_match_per_key`` which is the possible
maximum number of matches for each key, ``match_count`` which is the returned number
of matches for each key, and ``set_ids`` which returns the target set
ids for all matches found for each key. ``set_ids`` is a 2-D array
containing a 1-D array for each key (the size of the 1-D array per key should be set by the user according to ``max_match_per_key``).
``max_match_per_key`` should be equal to or smaller than the maximum number of
possible matches, similar to ``rte_member_lookup_multi()``.
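
The sketch below shows one way an application might walk the ``match_count``
and ``set_ids`` outputs after such a call. It assumes the 2-D ``set_ids`` array
is laid out row by row, with ``max_match_per_key`` slots reserved per key; that
layout, the ``toy_set_t`` typedef and the ``toy_count_matches`` helper are
assumptions made for illustration and are not part of the library API.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   typedef uint16_t toy_set_t; /* stand-in for the library's set id type */

   /* Count, across all looked-up keys, how many matches refer to `target`. */
   static uint32_t
   toy_count_matches(const uint32_t *match_count, const toy_set_t *set_ids,
                     uint32_t num_keys, uint32_t max_match_per_key,
                     toy_set_t target)
   {
       uint32_t k, m, hits = 0;

       for (k = 0; k < num_keys; k++) {
           /* Row k holds the matches of key k; only the first
            * match_count[k] slots of the row are meaningful. */
           const toy_set_t *row = &set_ids[k * max_match_per_key];

           assert(match_count[k] <= max_match_per_key);
           for (m = 0; m < match_count[k]; m++)
               if (row[m] == target)
                   hits++;
       }
       return hits;
   }

Slots beyond ``match_count[k]`` in a row are ignored, so the caller does not
need to clear ``set_ids`` between lookups.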


Set-summary Element Delete
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``rte_member_delete()`` function deletes an element/key from a set-summary structure. If it fails,
an error is returned. The input arguments should include ``key`` which is a pointer to the
element/key that needs to be deleted from the set-summary, and ``set_id``
which is the set id associated with the key to delete. It is worth noting that the current
implementation of vBF does not support deletion [1]_; in that case the error code ``-EINVAL`` is returned.

.. [1] Traditional bloom filter does not support proactive deletion. Supporting proactive deletion requires additional implementation and performance overhead.

References
----------

[Member-bloom] B H Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, 1970.

[Member-survey] A Broder and M Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Internet Mathematics, 2005.

[Member-cfilter] B Fan, D G Andersen and M Kaminsky, "Cuckoo Filter: Practically Better Than Bloom," in Conference on emerging Networking Experiments and Technologies, 2014.

[Member-OvS] B Pfaff, "The Design and Implementation of Open vSwitch," in NSDI, 2015.
