xref: /dpdk/doc/guides/prog_guide/member_lib.rst (revision 55694b2a9f64a860ce71f63a44b4c6efa2e09452)
1*55694b2aSYipeng Wang..  BSD LICENSE
2*55694b2aSYipeng Wang    Copyright(c) 2017 Intel Corporation. All rights reserved.
3*55694b2aSYipeng Wang    All rights reserved.
4*55694b2aSYipeng Wang
5*55694b2aSYipeng Wang    Redistribution and use in source and binary forms, with or without
6*55694b2aSYipeng Wang    modification, are permitted provided that the following conditions
7*55694b2aSYipeng Wang    are met:
8*55694b2aSYipeng Wang
9*55694b2aSYipeng Wang    * Redistributions of source code must retain the above copyright
10*55694b2aSYipeng Wang    notice, this list of conditions and the following disclaimer.
11*55694b2aSYipeng Wang    * Redistributions in binary form must reproduce the above copyright
12*55694b2aSYipeng Wang    notice, this list of conditions and the following disclaimer in
13*55694b2aSYipeng Wang    the documentation and/or other materials provided with the
14*55694b2aSYipeng Wang    distribution.
15*55694b2aSYipeng Wang    * Neither the name of Intel Corporation nor the names of its
16*55694b2aSYipeng Wang    contributors may be used to endorse or promote products derived
17*55694b2aSYipeng Wang    from this software without specific prior written permission.
18*55694b2aSYipeng Wang
19*55694b2aSYipeng Wang    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20*55694b2aSYipeng Wang    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21*55694b2aSYipeng Wang    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22*55694b2aSYipeng Wang    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23*55694b2aSYipeng Wang    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24*55694b2aSYipeng Wang    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25*55694b2aSYipeng Wang    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26*55694b2aSYipeng Wang    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27*55694b2aSYipeng Wang    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28*55694b2aSYipeng Wang    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29*55694b2aSYipeng Wang    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30*55694b2aSYipeng Wang
31*55694b2aSYipeng Wang
32*55694b2aSYipeng Wang.. _member_library:
33*55694b2aSYipeng Wang
34*55694b2aSYipeng WangMembership Library
35*55694b2aSYipeng Wang==================
36*55694b2aSYipeng Wang
37*55694b2aSYipeng WangIntroduction
38*55694b2aSYipeng Wang------------
39*55694b2aSYipeng Wang
40*55694b2aSYipeng WangThe DPDK Membership Library provides an API for DPDK applications to insert a
41*55694b2aSYipeng Wangnew member, delete an existing member, or query the existence of a member in a
42*55694b2aSYipeng Wanggiven set, or a group of sets. For the case of a group of sets, the library
43*55694b2aSYipeng Wangwill return not only whether the element has been inserted before in one of
44*55694b2aSYipeng Wangthe sets but also which set it belongs to.  The Membership Library is an
45*55694b2aSYipeng Wangextension and generalization of a traditional filter structure (for example
46*55694b2aSYipeng WangBloom Filter [Member-bloom]) that has multiple usages in a wide variety of
47*55694b2aSYipeng Wangworkloads and applications. In general, the Membership Library is a data
48*55694b2aSYipeng Wangstructure that provides a "set-summary" on whether a member belongs to a set,
49*55694b2aSYipeng Wangand as discussed in detail later, there are two advantages of using such a
50*55694b2aSYipeng Wangset-summary rather than operating on a "full-blown" complete list of elements:
51*55694b2aSYipeng Wangfirst, it has a much smaller storage requirement than storing the whole list of
52*55694b2aSYipeng Wangelements themselves, and secondly checking an element membership (or other
53*55694b2aSYipeng Wangoperations) in this set-summary is much faster than checking it for the
54*55694b2aSYipeng Wangoriginal full-blown complete list of elements.
55*55694b2aSYipeng Wang
56*55694b2aSYipeng WangWe use the term "Set-Summary" in this guide to refer to the space-efficient,
57*55694b2aSYipeng Wangprobabilistic membership data structure that is provided by the library. A
58*55694b2aSYipeng Wangmembership test for an element will return the set this element belongs to or
59*55694b2aSYipeng Wangthat the element is "not-found" with very high probability of accuracy. Set-summary
60*55694b2aSYipeng Wangis a fundamental data aggregation component that can be used in many network
61*55694b2aSYipeng Wang(and other) applications. It is a crucial structure to address performance and
62*55694b2aSYipeng Wangscalability issues of diverse network applications including overlay networks,
63*55694b2aSYipeng Wangdata-centric networks, flow table summaries, network statistics and
64*55694b2aSYipeng Wangtraffic monitoring. A set-summary is useful for applications who need to
65*55694b2aSYipeng Wanginclude a list of elements while a complete list requires too much space
66*55694b2aSYipeng Wangand/or too much processing cost. In these situations, the set-summary works as
67*55694b2aSYipeng Wanga lossy hash-based representation of a set of members. It can dramatically
68*55694b2aSYipeng Wangreduce space requirement and significantly improve the performance of set
69*55694b2aSYipeng Wangmembership queries at the cost of introducing a very small membership test error
70*55694b2aSYipeng Wangprobability.
71*55694b2aSYipeng Wang
72*55694b2aSYipeng Wang.. _figure_membership1:
73*55694b2aSYipeng Wang.. figure:: img/member_i1.*
74*55694b2aSYipeng Wang
75*55694b2aSYipeng Wang  Example Usages of Membership Library
76*55694b2aSYipeng Wang
77*55694b2aSYipeng WangThere are various usages for a Membership Library in a very
78*55694b2aSYipeng Wanglarge set of applications and workloads. Interested readers can refer to
79*55694b2aSYipeng Wang[Member-survey] for a survey of possible networking usages. The above figure
80*55694b2aSYipeng Wangprovide a small set of examples of using the Membership Library:
81*55694b2aSYipeng Wang
82*55694b2aSYipeng Wang* Sub-figure (a)
83*55694b2aSYipeng Wang  depicts a distributed web cache architecture where a collection of proxies
84*55694b2aSYipeng Wang  attempt to share their web caches (cached from a set of back-end web servers) to
85*55694b2aSYipeng Wang  provide faster responses to clients, and the proxies use the Membership
86*55694b2aSYipeng Wang  Library to share summaries of what web pages/objects they are caching. With the
87*55694b2aSYipeng Wang  Membership Library, a proxy receiving an http request will inquire the
88*55694b2aSYipeng Wang  set-summary to find its location and quickly determine whether to retrieve the
89*55694b2aSYipeng Wang  requested web page from a nearby proxy or from a back-end web server.
90*55694b2aSYipeng Wang
91*55694b2aSYipeng Wang* Sub-figure (b) depicts another example for using the Membership Library to
92*55694b2aSYipeng Wang  prevent routing loops which is typically done using slow TTL countdown and
93*55694b2aSYipeng Wang  dropping packets when TTL expires. As shown in Sub-figure (b), an embedded
94*55694b2aSYipeng Wang  set-summary in the packet header itself can be used to summarize the set of
95*55694b2aSYipeng Wang  nodes a packet has gone through, and each node upon receiving a packet can check
96*55694b2aSYipeng Wang  whether its id is a member of the set of visited nodes, and if it is, then a
97*55694b2aSYipeng Wang  routing loop is detected.
98*55694b2aSYipeng Wang
99*55694b2aSYipeng Wang* Sub-Figure (c) presents another usage of the Membership
100*55694b2aSYipeng Wang  Library to load-balance flows to worker threads with in-order guarantee where a
101*55694b2aSYipeng Wang  set-summary is used to query if a packet belongs to an existing flow or a new
102*55694b2aSYipeng Wang  flow. Packets belonging to a new flow are forwarded to the current least loaded
103*55694b2aSYipeng Wang  worker thread, while those belonging to an existing flow are forwarded to the
104*55694b2aSYipeng Wang  pre-assigned thread to guarantee in-order processing.
105*55694b2aSYipeng Wang
106*55694b2aSYipeng Wang* Sub-figure (d) highlights
107*55694b2aSYipeng Wang  yet another usage example in the database domain where a set-summary is used to
108*55694b2aSYipeng Wang  determine joins between sets instead of creating a join by comparing each
109*55694b2aSYipeng Wang  element of a set against the other elements in a different set, a join is done
110*55694b2aSYipeng Wang  on the summaries since they can efficiently encode members of a given set.
111*55694b2aSYipeng Wang
112*55694b2aSYipeng WangMembership Library is a configurable library that is optimized to cover set
113*55694b2aSYipeng Wangmembership functionality for both a single set and multi-set scenarios. Two set-summary
114*55694b2aSYipeng Wangschemes are presented including (a) vector of Bloom Filters and (b) Hash-Table based
115*55694b2aSYipeng Wangset-summary schemes with and without false negative probability.
116*55694b2aSYipeng WangThis guide first briefly describes these different types of set-summaries, usage examples for each,
117*55694b2aSYipeng Wangand then it highlights the Membership Library API.
118*55694b2aSYipeng Wang
119*55694b2aSYipeng WangVector of Bloom Filters
120*55694b2aSYipeng Wang-----------------------
121*55694b2aSYipeng Wang
122*55694b2aSYipeng WangBloom Filter (BF) [Member-bloom] is a well-known space-efficient
123*55694b2aSYipeng Wangprobabilistic data structure that answers set membership queries (test whether
124*55694b2aSYipeng Wangan element is a member of a set) with some probability of false positives and
125*55694b2aSYipeng Wangzero false negatives; a query for an element returns either it is "possibly in
126*55694b2aSYipeng Wanga set" (with very high probability) or "definitely not in a set".
127*55694b2aSYipeng Wang
128*55694b2aSYipeng WangThe BF is a method for representing a set of ``n`` elements (for example flow keys
129*55694b2aSYipeng Wangin network applications domain) to support membership queries. The idea of BF is
130*55694b2aSYipeng Wangto allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0. Then
131*55694b2aSYipeng Wangit chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with hash values range from
132*55694b2aSYipeng Wang``0`` to ``m-1`` to perform hashing calculations on each element to be inserted. Every time when an
133*55694b2aSYipeng Wangelement ``X`` being inserted into the set, the bits at positions ``h1(X)``, ``h2(X)``, ...
134*55694b2aSYipeng Wang``hk(X)`` in ``v`` are set to 1 (any particular bit might be set to 1 multiple times
135*55694b2aSYipeng Wangfor multiple different inserted elements). Given a query for any element ``Y``, the
136*55694b2aSYipeng Wangbits at positions ``h1(Y)``, ``h2(Y)``, ... ``hk(Y)`` are checked. If any of them is 0,
137*55694b2aSYipeng Wangthen Y is definitely not in the set. Otherwise there is a high probability that
138*55694b2aSYipeng WangY is a member of the set with certain false positive probability. As shown in
139*55694b2aSYipeng Wangthe next equation, the false positive probability can be made arbitrarily small
140*55694b2aSYipeng Wangby changing the number of hash functions (``k``) and the vector length (``m``).
141*55694b2aSYipeng Wang
142*55694b2aSYipeng Wang.. _figure_membership2:
143*55694b2aSYipeng Wang.. figure:: img/member_i2.*
144*55694b2aSYipeng Wang
145*55694b2aSYipeng Wang  Bloom Filter False Positive Probability
146*55694b2aSYipeng Wang
147*55694b2aSYipeng WangWithout BF, an accurate membership testing could involve a costly hash table
148*55694b2aSYipeng Wanglookup and full element comparison. The advantage of using a BF is to simplify
149*55694b2aSYipeng Wangthe membership test into a series of hash calculations and memory accesses for a
150*55694b2aSYipeng Wangsmall bit-vector, which can be easily optimized. Hence the lookup throughput
151*55694b2aSYipeng Wang(set membership test) can be significantly faster than a normal hash table
152*55694b2aSYipeng Wanglookup with element comparison.
153*55694b2aSYipeng Wang
154*55694b2aSYipeng Wang.. _figure_membership3:
155*55694b2aSYipeng Wang.. figure:: img/member_i3.*
156*55694b2aSYipeng Wang
157*55694b2aSYipeng Wang  Detecting Routing Loops Using BF
158*55694b2aSYipeng Wang
159*55694b2aSYipeng WangBF is used for applications that need only one set, and the
160*55694b2aSYipeng Wangmembership of elements is checked against the BF. The example discussed
161*55694b2aSYipeng Wangin the above figure is one example of potential applications that uses only one
162*55694b2aSYipeng Wangset to capture the node IDs that have been visited so far by the packet. Each
163*55694b2aSYipeng Wangnode will then check this embedded BF in the packet header for its own id, and
164*55694b2aSYipeng Wangif the BF indicates that the current node is definitely not in the set then a
165*55694b2aSYipeng Wangloop-free route is guaranteed.
166*55694b2aSYipeng Wang
167*55694b2aSYipeng Wang
168*55694b2aSYipeng Wang.. _figure_membership4:
169*55694b2aSYipeng Wang.. figure:: img/member_i4.*
170*55694b2aSYipeng Wang
171*55694b2aSYipeng Wang  Vector Bloom Filter (vBF) Overview
172*55694b2aSYipeng Wang
173*55694b2aSYipeng WangTo support membership test for both multiple sets and a single set,
174*55694b2aSYipeng Wangthe library implements a Vector Bloom Filter (vBF) scheme.
175*55694b2aSYipeng WangvBF basically composes multiple bloom filters into a vector of bloom filers.
176*55694b2aSYipeng WangThe membership test is conducted on all of the
177*55694b2aSYipeng Wangbloom filters concurrently to determine which set(s) it belongs to or none of
178*55694b2aSYipeng Wangthem. The basic idea of vBF is shown in the above figure where an element is
179*55694b2aSYipeng Wangused to address multiple bloom filters concurrently and the bloom filter
180*55694b2aSYipeng Wangindex(es) with a hit is returned.
181*55694b2aSYipeng Wang
182*55694b2aSYipeng Wang.. _figure_membership5:
183*55694b2aSYipeng Wang.. figure:: img/member_i5.*
184*55694b2aSYipeng Wang
185*55694b2aSYipeng Wang  vBF for Flow Scheduling to Worker Thread
186*55694b2aSYipeng Wang
187*55694b2aSYipeng WangAs previously mentioned, there are many usages of such structures. vBF is used
188*55694b2aSYipeng Wangfor applications that need to check membership against multiple sets
189*55694b2aSYipeng Wangsimultaneously. The example shown in the above figure uses a set to capture
190*55694b2aSYipeng Wangall flows being assigned for processing at a given worker thread. Upon receiving
191*55694b2aSYipeng Wanga packet the vBF is used to quickly figure out if this packet belongs to a new flow
192*55694b2aSYipeng Wangso as to be forwarded to the current least loaded worker thread, or otherwise it
193*55694b2aSYipeng Wangshould be queued for an existing thread to guarantee in-order processing (i.e.
194*55694b2aSYipeng Wangthe property of vBF to indicate right away that a given flow is a new one or
195*55694b2aSYipeng Wangnot is critical to minimize response time latency).
196*55694b2aSYipeng Wang
197*55694b2aSYipeng WangIt should be noted that vBF can be implemented using a set of single bloom
198*55694b2aSYipeng Wangfilters with sequential lookup of each BF. However, being able to concurrently
199*55694b2aSYipeng Wangsearch all set-summaries is a big throughput advantage. In the library, certain
200*55694b2aSYipeng Wangparallelism is realized by the implementation of checking all bloom filters
201*55694b2aSYipeng Wangtogether.
202*55694b2aSYipeng Wang
203*55694b2aSYipeng Wang
204*55694b2aSYipeng WangHash-Table based Set-Summaries
205*55694b2aSYipeng Wang------------------------------
206*55694b2aSYipeng Wang
207*55694b2aSYipeng WangHash-table based set-summary (HTSS) is another scheme in the membership library.
208*55694b2aSYipeng WangCuckoo filter [Member-cfilter] is an example of HTSS.
209*55694b2aSYipeng WangHTSS supports multi-set membership testing like
210*55694b2aSYipeng WangvBF does. However, while vBF is better for a small number of targets, HTSS is more suitable
211*55694b2aSYipeng Wangand can easily outperform vBF when the number of sets is
212*55694b2aSYipeng Wanglarge, since HTSS uses a single hash table for membership testing while vBF
213*55694b2aSYipeng Wangrequires testing a series of Bloom Filters each corresponding to one set.
214*55694b2aSYipeng WangAs a result, generally speaking vBF is more adequate for the case of a small limited number of sets
215*55694b2aSYipeng Wangwhile HTSS should be used with a larger number of sets.
216*55694b2aSYipeng Wang
217*55694b2aSYipeng Wang.. _figure_membership6:
218*55694b2aSYipeng Wang.. figure:: img/member_i6.*
219*55694b2aSYipeng Wang
220*55694b2aSYipeng Wang  Using HTSS for Attack Signature Matching
221*55694b2aSYipeng Wang
222*55694b2aSYipeng WangAs shown in the above figure, attack signature matching where each set
223*55694b2aSYipeng Wangrepresents a certain signature length (for correctness of this example, an
224*55694b2aSYipeng Wangattack signature should not be a subset of another one) in the payload is a good
225*55694b2aSYipeng Wangexample for using HTSS with 0% false negative (i.e., when an element returns not
226*55694b2aSYipeng Wangfound, it has a 100% certainty that it is not a member of any set).  The packet
227*55694b2aSYipeng Wanginspection application benefits from knowing right away that the current payload
228*55694b2aSYipeng Wangdoes not match any attack signatures in the database to establish its
229*55694b2aSYipeng Wanglegitimacy, otherwise a deep inspection of the packet is needed.
230*55694b2aSYipeng Wang
231*55694b2aSYipeng WangHTSS employs a similar but simpler data structure to a traditional hash table,
232*55694b2aSYipeng Wangand the major difference is that HTSS stores only the signatures but not the
233*55694b2aSYipeng Wangfull keys/elements which can significantly reduce the footprint of the table.
234*55694b2aSYipeng WangAlong with the signature, HTSS also stores a value to indicate the target set.
235*55694b2aSYipeng WangWhen looking up an element, the element is hashed and the HTSS is addressed
236*55694b2aSYipeng Wangto retrieve the signature stored. If the signature matches then the value is
237*55694b2aSYipeng Wangretrieved corresponding to the index of the target set which the element belongs
238*55694b2aSYipeng Wangto. Because signatures can collide, HTSS can still has false positive
239*55694b2aSYipeng Wangprobability. Furthermore, if elements are allowed to be
240*55694b2aSYipeng Wangoverwritten or evicted when the hash table becomes full, it will also have a
241*55694b2aSYipeng Wangfalse negative probability. We discuss this case in the next section.
242*55694b2aSYipeng Wang
243*55694b2aSYipeng WangSet-Summaries with False Negative Probability
244*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
245*55694b2aSYipeng Wang
246*55694b2aSYipeng WangAs previously mentioned, traditional set-summaries (e.g. Bloom Filters) do not
247*55694b2aSYipeng Wanghave a false negative probability, i.e., it is 100% certain when an element
248*55694b2aSYipeng Wangreturns "not to be present" for a given set. However, the Membership Library
249*55694b2aSYipeng Wangalso supports a set-summary probabilistic data structure based on HTSS which
250*55694b2aSYipeng Wangallows for false negative probability.
251*55694b2aSYipeng Wang
252*55694b2aSYipeng WangIn HTSS, when the hash table becomes full, keys/elements will fail to be added
253*55694b2aSYipeng Wanginto the table and the hash table has to be resized to accommodate for these new
254*55694b2aSYipeng Wangelements, which can be expensive. However, if we allow new elements to overwrite
255*55694b2aSYipeng Wangor evict existing elements (as a cache typically does), then the resulting
256*55694b2aSYipeng Wangset-summary will begin to have false negative probability. This is because the
257*55694b2aSYipeng Wangelement that was evicted from the set-summary may still be present in the target
258*55694b2aSYipeng Wangset. For subsequent inquiries the set-summary will falsely report the element
259*55694b2aSYipeng Wangnot being in the set, hence having a false negative probability.
260*55694b2aSYipeng Wang
261*55694b2aSYipeng WangThe major usage of HTSS with false negative is to use it as a cache for
262*55694b2aSYipeng Wangdistributing elements to different target sets. By allowing HTSS to evict old
263*55694b2aSYipeng Wangelements, the set-summary can keep track of the most recent elements
264*55694b2aSYipeng Wang(i.e. active) as a cache typically does. Old inactive elements (infrequently
265*55694b2aSYipeng Wangused elements) will automatically and eventually get evicted from the
266*55694b2aSYipeng Wangset-summary. It is worth noting that the set-summary still has false positive
267*55694b2aSYipeng Wangprobability, which means the application either can tolerate certain false positive
268*55694b2aSYipeng Wangor it has fall-back path when false positive happens.
269*55694b2aSYipeng Wang
270*55694b2aSYipeng Wang.. _figure_membership7:
271*55694b2aSYipeng Wang.. figure:: img/member_i7.*
272*55694b2aSYipeng Wang
273*55694b2aSYipeng Wang  Using HTSS with False Negatives for Wild Card Classification
274*55694b2aSYipeng Wang
275*55694b2aSYipeng WangHTSS with false negative (i.e. a cache) also has its wide set of applications.
276*55694b2aSYipeng WangFor example wild card flow classification (e.g. ACL rules) highlighted in the
277*55694b2aSYipeng Wangabove figure is an example of such application. In that case each target set
278*55694b2aSYipeng Wangrepresents a sub-table with rules defined by a certain flow mask. The flow masks
279*55694b2aSYipeng Wangare non-overlapping, and for flows matching more than one rule only the highest
280*55694b2aSYipeng Wangpriority one is inserted in the corresponding sub-table (interested readers can
281*55694b2aSYipeng Wangrefer to the Open vSwitch (OvS) design of Mega Flow Cache (MFC) [Member-OvS]
282*55694b2aSYipeng Wangfor further details). Typically the rules will have a large number of distinct
283*55694b2aSYipeng Wangunique masks and hence, a large number of target sets each corresponding to one
284*55694b2aSYipeng Wangmask. Because the active set of flows varies widely based on the network
285*55694b2aSYipeng Wangtraffic, HTSS with false negative will act as a cache for <flowid, target ACL
286*55694b2aSYipeng Wangsub-table> pair for the current active set of flows. When a miss occurs (as
287*55694b2aSYipeng Wangshown in red in the above figure) the sub-tables will be searched sequentially
288*55694b2aSYipeng Wangone by one for a possible match, and when found the flow key and target
289*55694b2aSYipeng Wangsub-table will be inserted into the set-summary (i.e. cache insertion) so
290*55694b2aSYipeng Wangsubsequent packets from the same flow don’t incur the overhead of the
291*55694b2aSYipeng Wangsequential search of sub-tables.
292*55694b2aSYipeng Wang
293*55694b2aSYipeng WangLibrary API Overview
294*55694b2aSYipeng Wang--------------------
295*55694b2aSYipeng Wang
296*55694b2aSYipeng WangThe design goal of the Membership Library API is to be as generic as possible to
297*55694b2aSYipeng Wangsupport all the different types of set-summaries we discussed in previous
298*55694b2aSYipeng Wangsections and beyond. Fundamentally, the APIs need to include creation,
299*55694b2aSYipeng Wanginsertion, deletion, and lookup.
300*55694b2aSYipeng Wang
301*55694b2aSYipeng Wang
302*55694b2aSYipeng WangSet-summary Create
303*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~
304*55694b2aSYipeng Wang
305*55694b2aSYipeng WangThe ``rte_member_create()`` function is used to create a set-summary structure, the input parameter
306*55694b2aSYipeng Wangis a struct to pass in parameters that needed to initialize the set-summary, while the function returns the
307*55694b2aSYipeng Wangpointer to the created set-summary or ``NULL`` if the creation failed.
308*55694b2aSYipeng Wang
309*55694b2aSYipeng WangThe general input arguments used when creating the set-summary should include ``name``
310*55694b2aSYipeng Wangwhich is the name of the created set-summary, *type* which is one of the types
311*55694b2aSYipeng Wangsupported by the library (e.g. ``RTE_MEMBER_TYPE_HT`` for HTSS or ``RTE_MEMBER_TYPE_VBF`` for vBF), and ``key_len``
312*55694b2aSYipeng Wangwhich is the length of the element/key. There are other parameters
313*55694b2aSYipeng Wangare only used for certain type of set-summary, or which have a slightly different meaning for different types of set-summary.
314*55694b2aSYipeng WangFor example, ``num_keys`` parameter means the maximum number of entries for Hash table based set-summary.
315*55694b2aSYipeng WangHowever, for bloom filter, this value means the expected number of keys that could be
316*55694b2aSYipeng Wanginserted into the bloom filter(s). The value is used to calculate the size of each
317*55694b2aSYipeng Wangbloom filter.
318*55694b2aSYipeng Wang
319*55694b2aSYipeng WangWe also pass two seeds: ``prim_hash_seed`` and
320*55694b2aSYipeng Wang``sec_hash_seed`` for the primary and secondary hash functions to calculate two independent hash values.
321*55694b2aSYipeng Wang``socket_id`` parameter is the NUMA socket ID for the memory used to create the
322*55694b2aSYipeng Wangset-summary. For HTSS, another parameter ``is_cache`` is used to indicate
323*55694b2aSYipeng Wangif this set-summary is a cache (i.e. with false negative probability) or not.
324*55694b2aSYipeng WangFor vBF, extra parameters are needed. For example, ``num_set`` is the number of
325*55694b2aSYipeng Wangsets needed to initialize the vector bloom filters. This number is equal to the
326*55694b2aSYipeng Wangnumber of bloom filters will be created.
327*55694b2aSYipeng Wang``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate will be used to determine
328*55694b2aSYipeng Wangthe number of hash functions and the bloom filter size.
329*55694b2aSYipeng Wang
330*55694b2aSYipeng Wang
331*55694b2aSYipeng WangSet-summary Element Insertion
332*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
333*55694b2aSYipeng Wang
334*55694b2aSYipeng WangThe ``rte_member_add()`` function is used to insert an element/key into a set-summary structure. If it fails an
335*55694b2aSYipeng Wangerror is returned. For success the returned value is dependent on the
336*55694b2aSYipeng Wangset-summary mode to provide extra information for the users. For vBF
337*55694b2aSYipeng Wangmode, a return value of 0 means a successful insert. For HTSS mode without false negative, the insert
338*55694b2aSYipeng Wangcould fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache mode),
339*55694b2aSYipeng Wangfor insert that does not cause any eviction (i.e. no overwriting happens to an
340*55694b2aSYipeng Wangexisting entry) the return value is 0. For insertion that causes eviction, the return
341*55694b2aSYipeng Wangvalue is 1 to indicate such situation, but it is not an error.
342*55694b2aSYipeng Wang
343*55694b2aSYipeng WangThe input arguments for the function should include the ``key`` which is a pointer to the element/key that needs to
344*55694b2aSYipeng Wangbe added to the set-summary, and ``set_id`` which is the set id associated
345*55694b2aSYipeng Wangwith the key that needs to be added.
346*55694b2aSYipeng Wang
347*55694b2aSYipeng Wang
348*55694b2aSYipeng WangSet-summary Element Lookup
349*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~
350*55694b2aSYipeng Wang
351*55694b2aSYipeng WangThe ``rte_member_lookup()`` function looks up a single key/element in the set-summary structure. It
352*55694b2aSYipeng Wangreturns as soon as the first match is found. The return value is 1 if a
353*55694b2aSYipeng Wangmatch is found and 0 otherwise. The arguments for the function include ``key`` which is a pointer to the
354*55694b2aSYipeng Wangelement/key that needs to be looked up, and ``set_id`` which is used to return the
355*55694b2aSYipeng Wangfirst target set id where the key has matched, if any.
356*55694b2aSYipeng Wang
357*55694b2aSYipeng WangThe ``rte_member_lookup_bulk()`` function is used to look up a bulk of keys/elements in the
358*55694b2aSYipeng Wangset-summary structure for their first match. Each key lookup returns as soon as the first match is found. The
359*55694b2aSYipeng Wangreturn value is the number of keys that find a match. The arguments of the function include ``keys``
360*55694b2aSYipeng Wangwhich is a pointer to a bulk of keys that are to be looked up,
361*55694b2aSYipeng Wang``num_keys`` is the number
362*55694b2aSYipeng Wangof keys that will be looked up, and ``set_ids`` are the return target set
363*55694b2aSYipeng Wangids for the first match found for each of the input keys. ``set_ids`` is an array
364*55694b2aSYipeng Wangneeds to be sized according to the ``num_keys``. If there is no match, the set id
365*55694b2aSYipeng Wangfor that key will be set to RTE_MEMBER_NO_MATCH.
366*55694b2aSYipeng Wang
367*55694b2aSYipeng WangThe ``rte_member_lookup_multi()`` function looks up a single key/element in the
368*55694b2aSYipeng Wangset-summary structure for multiple matches. It
369*55694b2aSYipeng Wangreturns ALL the matches (possibly more than one) found for this key when it
370*55694b2aSYipeng Wangis matched against all target sets (it is worth noting that for cache mode HTSS,
371*55694b2aSYipeng Wangthe current implementation matches at most one target set). The return value is
372*55694b2aSYipeng Wangthe number of matches
373*55694b2aSYipeng Wangthat was found for this key (for cache mode HTSS the return value
374*55694b2aSYipeng Wangshould be at most 1). The arguments for the function include ``key`` which is a pointer to the
375*55694b2aSYipeng Wangelement/key that needs to be looked up, ``max_match_per_key`` which is to indicate the maximum number of matches
376*55694b2aSYipeng Wangthe user expects to find for each key, and ``set_id`` which is used to return all
377*55694b2aSYipeng Wangtarget set ids where the key has matched, if any. The ``set_id`` array should be sized
378*55694b2aSYipeng Wangaccording to ``max_match_per_key``. For vBF, the maximum number of matches per key is equal
379*55694b2aSYipeng Wangto the number of sets. For HTSS, the maximum number of matches per key is equal to two time
380*55694b2aSYipeng Wangentry count per bucket. ``max_match_per_key`` should be equal or smaller than the maximum number of
381*55694b2aSYipeng Wangpossible matches.
382*55694b2aSYipeng Wang
383*55694b2aSYipeng WangThe ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of keys/elements in the
384*55694b2aSYipeng Wangset-summary structure for multiple matches, each key lookup returns ALL the matches (possibly more
385*55694b2aSYipeng Wangthan one) found for this key when it is matched against all target sets (cache mode HTSS
386*55694b2aSYipeng Wangmatches at most one target set). The
387*55694b2aSYipeng Wangreturn value is the number of keys that find one or more matches in the
388*55694b2aSYipeng Wangset-summary structure. The arguments of the
389*55694b2aSYipeng Wangfunction include ``keys`` which is
390*55694b2aSYipeng Wanga pointer to a bulk of keys that are to be looked up, ``num_keys`` is the number
391*55694b2aSYipeng Wangof keys that will be looked up, ``max_match_per_key`` is the possible
392*55694b2aSYipeng Wangmaximum number of matches for each key, ``match_count`` which is the returned number
393*55694b2aSYipeng Wangof matches for each key, and ``set_ids`` are the returned target set
394*55694b2aSYipeng Wangids for all matches found for each keys. ``set_ids`` is 2-D array
395*55694b2aSYipeng Wangcontaining a 1-D array for each key (the size of 1-D array per key should be set by the user according to ``max_match_per_key``).
396*55694b2aSYipeng Wang``max_match_per_key`` should be equal or smaller than the maximum number of
397*55694b2aSYipeng Wangpossible matches, similar to ``rte_member_lookup_multi``.
398*55694b2aSYipeng Wang
399*55694b2aSYipeng Wang
400*55694b2aSYipeng WangSet-summary Element Delete
401*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~
402*55694b2aSYipeng Wang
403*55694b2aSYipeng WangThe ``rte_membership_delete()`` function deletes an element/key from a set-summary structure, if it fails
404*55694b2aSYipeng Wangan error is returned. The input arguments should include ``key`` which is a pointer to the
405*55694b2aSYipeng Wangelement/key that needs to be deleted from the set-summary, and ``set_id``
406*55694b2aSYipeng Wangwhich is the set id associated with the key to delete. It is worth noting that current
407*55694b2aSYipeng Wangimplementation of vBF does not support deletion [1]_. An error code ``-EINVAL`` will be returned.
408*55694b2aSYipeng Wang
409*55694b2aSYipeng Wang.. [1] Traditional bloom filter does not support proactive deletion. Supporting proactive deletion require additional implementation and performance overhead.
410*55694b2aSYipeng Wang
411*55694b2aSYipeng WangReferences
412*55694b2aSYipeng Wang-----------
413*55694b2aSYipeng Wang
414*55694b2aSYipeng Wang[Member-bloom] B H Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, 1970.
415*55694b2aSYipeng Wang
416*55694b2aSYipeng Wang[Member-survey] A Broder and M Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Internet Mathematics, 2005.
417*55694b2aSYipeng Wang
418*55694b2aSYipeng Wang[Member-cfilter] B Fan, D G Andersen and M Kaminsky, "Cuckoo Filter: Practically Better Than Bloom," in Conference on emerging Networking Experiments and Technologies, 2014.
419*55694b2aSYipeng Wang
420*55694b2aSYipeng Wang[Member-OvS] B Pfaff, "The Design and Implementation of Open vSwitch," in NSDI, 2015.
421