1*55694b2aSYipeng Wang.. BSD LICENSE 2*55694b2aSYipeng Wang Copyright(c) 2017 Intel Corporation. All rights reserved. 3*55694b2aSYipeng Wang All rights reserved. 4*55694b2aSYipeng Wang 5*55694b2aSYipeng Wang Redistribution and use in source and binary forms, with or without 6*55694b2aSYipeng Wang modification, are permitted provided that the following conditions 7*55694b2aSYipeng Wang are met: 8*55694b2aSYipeng Wang 9*55694b2aSYipeng Wang * Redistributions of source code must retain the above copyright 10*55694b2aSYipeng Wang notice, this list of conditions and the following disclaimer. 11*55694b2aSYipeng Wang * Redistributions in binary form must reproduce the above copyright 12*55694b2aSYipeng Wang notice, this list of conditions and the following disclaimer in 13*55694b2aSYipeng Wang the documentation and/or other materials provided with the 14*55694b2aSYipeng Wang distribution. 15*55694b2aSYipeng Wang * Neither the name of Intel Corporation nor the names of its 16*55694b2aSYipeng Wang contributors may be used to endorse or promote products derived 17*55694b2aSYipeng Wang from this software without specific prior written permission. 18*55694b2aSYipeng Wang 19*55694b2aSYipeng Wang THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20*55694b2aSYipeng Wang "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21*55694b2aSYipeng Wang LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 22*55694b2aSYipeng Wang A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 23*55694b2aSYipeng Wang OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 24*55694b2aSYipeng Wang SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 25*55694b2aSYipeng Wang LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 26*55694b2aSYipeng Wang DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 27*55694b2aSYipeng Wang THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 28*55694b2aSYipeng Wang (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29*55694b2aSYipeng Wang OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30*55694b2aSYipeng Wang 31*55694b2aSYipeng Wang 32*55694b2aSYipeng Wang.. _member_library: 33*55694b2aSYipeng Wang 34*55694b2aSYipeng WangMembership Library 35*55694b2aSYipeng Wang================== 36*55694b2aSYipeng Wang 37*55694b2aSYipeng WangIntroduction 38*55694b2aSYipeng Wang------------ 39*55694b2aSYipeng Wang 40*55694b2aSYipeng WangThe DPDK Membership Library provides an API for DPDK applications to insert a 41*55694b2aSYipeng Wangnew member, delete an existing member, or query the existence of a member in a 42*55694b2aSYipeng Wanggiven set, or a group of sets. For the case of a group of sets, the library 43*55694b2aSYipeng Wangwill return not only whether the element has been inserted before in one of 44*55694b2aSYipeng Wangthe sets but also which set it belongs to. The Membership Library is an 45*55694b2aSYipeng Wangextension and generalization of a traditional filter structure (for example 46*55694b2aSYipeng WangBloom Filter [Member-bloom]) that has multiple usages in a wide variety of 47*55694b2aSYipeng Wangworkloads and applications. In general, the Membership Library is a data 48*55694b2aSYipeng Wangstructure that provides a "set-summary" on whether a member belongs to a set, 49*55694b2aSYipeng Wangand as discussed in detail later, there are two advantages of using such a 50*55694b2aSYipeng Wangset-summary rather than operating on a "full-blown" complete list of elements: 51*55694b2aSYipeng Wangfirst, it has a much smaller storage requirement than storing the whole list of 52*55694b2aSYipeng Wangelements themselves, and secondly checking an element membership (or other 53*55694b2aSYipeng Wangoperations) in this set-summary is much faster than checking it for the 54*55694b2aSYipeng Wangoriginal full-blown complete list of elements. 55*55694b2aSYipeng Wang 56*55694b2aSYipeng WangWe use the term "Set-Summary" in this guide to refer to the space-efficient, 57*55694b2aSYipeng Wangprobabilistic membership data structure that is provided by the library. A 58*55694b2aSYipeng Wangmembership test for an element will return the set this element belongs to or 59*55694b2aSYipeng Wangthat the element is "not-found" with very high probability of accuracy. Set-summary 60*55694b2aSYipeng Wangis a fundamental data aggregation component that can be used in many network 61*55694b2aSYipeng Wang(and other) applications. It is a crucial structure to address performance and 62*55694b2aSYipeng Wangscalability issues of diverse network applications including overlay networks, 63*55694b2aSYipeng Wangdata-centric networks, flow table summaries, network statistics and 64*55694b2aSYipeng Wangtraffic monitoring. A set-summary is useful for applications who need to 65*55694b2aSYipeng Wanginclude a list of elements while a complete list requires too much space 66*55694b2aSYipeng Wangand/or too much processing cost. In these situations, the set-summary works as 67*55694b2aSYipeng Wanga lossy hash-based representation of a set of members. It can dramatically 68*55694b2aSYipeng Wangreduce space requirement and significantly improve the performance of set 69*55694b2aSYipeng Wangmembership queries at the cost of introducing a very small membership test error 70*55694b2aSYipeng Wangprobability. 71*55694b2aSYipeng Wang 72*55694b2aSYipeng Wang.. _figure_membership1: 73*55694b2aSYipeng Wang.. figure:: img/member_i1.* 74*55694b2aSYipeng Wang 75*55694b2aSYipeng Wang Example Usages of Membership Library 76*55694b2aSYipeng Wang 77*55694b2aSYipeng WangThere are various usages for a Membership Library in a very 78*55694b2aSYipeng Wanglarge set of applications and workloads. Interested readers can refer to 79*55694b2aSYipeng Wang[Member-survey] for a survey of possible networking usages. The above figure 80*55694b2aSYipeng Wangprovide a small set of examples of using the Membership Library: 81*55694b2aSYipeng Wang 82*55694b2aSYipeng Wang* Sub-figure (a) 83*55694b2aSYipeng Wang depicts a distributed web cache architecture where a collection of proxies 84*55694b2aSYipeng Wang attempt to share their web caches (cached from a set of back-end web servers) to 85*55694b2aSYipeng Wang provide faster responses to clients, and the proxies use the Membership 86*55694b2aSYipeng Wang Library to share summaries of what web pages/objects they are caching. With the 87*55694b2aSYipeng Wang Membership Library, a proxy receiving an http request will inquire the 88*55694b2aSYipeng Wang set-summary to find its location and quickly determine whether to retrieve the 89*55694b2aSYipeng Wang requested web page from a nearby proxy or from a back-end web server. 90*55694b2aSYipeng Wang 91*55694b2aSYipeng Wang* Sub-figure (b) depicts another example for using the Membership Library to 92*55694b2aSYipeng Wang prevent routing loops which is typically done using slow TTL countdown and 93*55694b2aSYipeng Wang dropping packets when TTL expires. As shown in Sub-figure (b), an embedded 94*55694b2aSYipeng Wang set-summary in the packet header itself can be used to summarize the set of 95*55694b2aSYipeng Wang nodes a packet has gone through, and each node upon receiving a packet can check 96*55694b2aSYipeng Wang whether its id is a member of the set of visited nodes, and if it is, then a 97*55694b2aSYipeng Wang routing loop is detected. 98*55694b2aSYipeng Wang 99*55694b2aSYipeng Wang* Sub-Figure (c) presents another usage of the Membership 100*55694b2aSYipeng Wang Library to load-balance flows to worker threads with in-order guarantee where a 101*55694b2aSYipeng Wang set-summary is used to query if a packet belongs to an existing flow or a new 102*55694b2aSYipeng Wang flow. Packets belonging to a new flow are forwarded to the current least loaded 103*55694b2aSYipeng Wang worker thread, while those belonging to an existing flow are forwarded to the 104*55694b2aSYipeng Wang pre-assigned thread to guarantee in-order processing. 105*55694b2aSYipeng Wang 106*55694b2aSYipeng Wang* Sub-figure (d) highlights 107*55694b2aSYipeng Wang yet another usage example in the database domain where a set-summary is used to 108*55694b2aSYipeng Wang determine joins between sets instead of creating a join by comparing each 109*55694b2aSYipeng Wang element of a set against the other elements in a different set, a join is done 110*55694b2aSYipeng Wang on the summaries since they can efficiently encode members of a given set. 111*55694b2aSYipeng Wang 112*55694b2aSYipeng WangMembership Library is a configurable library that is optimized to cover set 113*55694b2aSYipeng Wangmembership functionality for both a single set and multi-set scenarios. Two set-summary 114*55694b2aSYipeng Wangschemes are presented including (a) vector of Bloom Filters and (b) Hash-Table based 115*55694b2aSYipeng Wangset-summary schemes with and without false negative probability. 116*55694b2aSYipeng WangThis guide first briefly describes these different types of set-summaries, usage examples for each, 117*55694b2aSYipeng Wangand then it highlights the Membership Library API. 118*55694b2aSYipeng Wang 119*55694b2aSYipeng WangVector of Bloom Filters 120*55694b2aSYipeng Wang----------------------- 121*55694b2aSYipeng Wang 122*55694b2aSYipeng WangBloom Filter (BF) [Member-bloom] is a well-known space-efficient 123*55694b2aSYipeng Wangprobabilistic data structure that answers set membership queries (test whether 124*55694b2aSYipeng Wangan element is a member of a set) with some probability of false positives and 125*55694b2aSYipeng Wangzero false negatives; a query for an element returns either it is "possibly in 126*55694b2aSYipeng Wanga set" (with very high probability) or "definitely not in a set". 127*55694b2aSYipeng Wang 128*55694b2aSYipeng WangThe BF is a method for representing a set of ``n`` elements (for example flow keys 129*55694b2aSYipeng Wangin network applications domain) to support membership queries. The idea of BF is 130*55694b2aSYipeng Wangto allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0. Then 131*55694b2aSYipeng Wangit chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with hash values range from 132*55694b2aSYipeng Wang``0`` to ``m-1`` to perform hashing calculations on each element to be inserted. Every time when an 133*55694b2aSYipeng Wangelement ``X`` being inserted into the set, the bits at positions ``h1(X)``, ``h2(X)``, ... 134*55694b2aSYipeng Wang``hk(X)`` in ``v`` are set to 1 (any particular bit might be set to 1 multiple times 135*55694b2aSYipeng Wangfor multiple different inserted elements). Given a query for any element ``Y``, the 136*55694b2aSYipeng Wangbits at positions ``h1(Y)``, ``h2(Y)``, ... ``hk(Y)`` are checked. If any of them is 0, 137*55694b2aSYipeng Wangthen Y is definitely not in the set. Otherwise there is a high probability that 138*55694b2aSYipeng WangY is a member of the set with certain false positive probability. As shown in 139*55694b2aSYipeng Wangthe next equation, the false positive probability can be made arbitrarily small 140*55694b2aSYipeng Wangby changing the number of hash functions (``k``) and the vector length (``m``). 141*55694b2aSYipeng Wang 142*55694b2aSYipeng Wang.. _figure_membership2: 143*55694b2aSYipeng Wang.. figure:: img/member_i2.* 144*55694b2aSYipeng Wang 145*55694b2aSYipeng Wang Bloom Filter False Positive Probability 146*55694b2aSYipeng Wang 147*55694b2aSYipeng WangWithout BF, an accurate membership testing could involve a costly hash table 148*55694b2aSYipeng Wanglookup and full element comparison. The advantage of using a BF is to simplify 149*55694b2aSYipeng Wangthe membership test into a series of hash calculations and memory accesses for a 150*55694b2aSYipeng Wangsmall bit-vector, which can be easily optimized. Hence the lookup throughput 151*55694b2aSYipeng Wang(set membership test) can be significantly faster than a normal hash table 152*55694b2aSYipeng Wanglookup with element comparison. 153*55694b2aSYipeng Wang 154*55694b2aSYipeng Wang.. _figure_membership3: 155*55694b2aSYipeng Wang.. figure:: img/member_i3.* 156*55694b2aSYipeng Wang 157*55694b2aSYipeng Wang Detecting Routing Loops Using BF 158*55694b2aSYipeng Wang 159*55694b2aSYipeng WangBF is used for applications that need only one set, and the 160*55694b2aSYipeng Wangmembership of elements is checked against the BF. The example discussed 161*55694b2aSYipeng Wangin the above figure is one example of potential applications that uses only one 162*55694b2aSYipeng Wangset to capture the node IDs that have been visited so far by the packet. Each 163*55694b2aSYipeng Wangnode will then check this embedded BF in the packet header for its own id, and 164*55694b2aSYipeng Wangif the BF indicates that the current node is definitely not in the set then a 165*55694b2aSYipeng Wangloop-free route is guaranteed. 166*55694b2aSYipeng Wang 167*55694b2aSYipeng Wang 168*55694b2aSYipeng Wang.. _figure_membership4: 169*55694b2aSYipeng Wang.. figure:: img/member_i4.* 170*55694b2aSYipeng Wang 171*55694b2aSYipeng Wang Vector Bloom Filter (vBF) Overview 172*55694b2aSYipeng Wang 173*55694b2aSYipeng WangTo support membership test for both multiple sets and a single set, 174*55694b2aSYipeng Wangthe library implements a Vector Bloom Filter (vBF) scheme. 175*55694b2aSYipeng WangvBF basically composes multiple bloom filters into a vector of bloom filers. 176*55694b2aSYipeng WangThe membership test is conducted on all of the 177*55694b2aSYipeng Wangbloom filters concurrently to determine which set(s) it belongs to or none of 178*55694b2aSYipeng Wangthem. The basic idea of vBF is shown in the above figure where an element is 179*55694b2aSYipeng Wangused to address multiple bloom filters concurrently and the bloom filter 180*55694b2aSYipeng Wangindex(es) with a hit is returned. 181*55694b2aSYipeng Wang 182*55694b2aSYipeng Wang.. _figure_membership5: 183*55694b2aSYipeng Wang.. figure:: img/member_i5.* 184*55694b2aSYipeng Wang 185*55694b2aSYipeng Wang vBF for Flow Scheduling to Worker Thread 186*55694b2aSYipeng Wang 187*55694b2aSYipeng WangAs previously mentioned, there are many usages of such structures. vBF is used 188*55694b2aSYipeng Wangfor applications that need to check membership against multiple sets 189*55694b2aSYipeng Wangsimultaneously. The example shown in the above figure uses a set to capture 190*55694b2aSYipeng Wangall flows being assigned for processing at a given worker thread. Upon receiving 191*55694b2aSYipeng Wanga packet the vBF is used to quickly figure out if this packet belongs to a new flow 192*55694b2aSYipeng Wangso as to be forwarded to the current least loaded worker thread, or otherwise it 193*55694b2aSYipeng Wangshould be queued for an existing thread to guarantee in-order processing (i.e. 194*55694b2aSYipeng Wangthe property of vBF to indicate right away that a given flow is a new one or 195*55694b2aSYipeng Wangnot is critical to minimize response time latency). 196*55694b2aSYipeng Wang 197*55694b2aSYipeng WangIt should be noted that vBF can be implemented using a set of single bloom 198*55694b2aSYipeng Wangfilters with sequential lookup of each BF. However, being able to concurrently 199*55694b2aSYipeng Wangsearch all set-summaries is a big throughput advantage. In the library, certain 200*55694b2aSYipeng Wangparallelism is realized by the implementation of checking all bloom filters 201*55694b2aSYipeng Wangtogether. 202*55694b2aSYipeng Wang 203*55694b2aSYipeng Wang 204*55694b2aSYipeng WangHash-Table based Set-Summaries 205*55694b2aSYipeng Wang------------------------------ 206*55694b2aSYipeng Wang 207*55694b2aSYipeng WangHash-table based set-summary (HTSS) is another scheme in the membership library. 208*55694b2aSYipeng WangCuckoo filter [Member-cfilter] is an example of HTSS. 209*55694b2aSYipeng WangHTSS supports multi-set membership testing like 210*55694b2aSYipeng WangvBF does. However, while vBF is better for a small number of targets, HTSS is more suitable 211*55694b2aSYipeng Wangand can easily outperform vBF when the number of sets is 212*55694b2aSYipeng Wanglarge, since HTSS uses a single hash table for membership testing while vBF 213*55694b2aSYipeng Wangrequires testing a series of Bloom Filters each corresponding to one set. 214*55694b2aSYipeng WangAs a result, generally speaking vBF is more adequate for the case of a small limited number of sets 215*55694b2aSYipeng Wangwhile HTSS should be used with a larger number of sets. 216*55694b2aSYipeng Wang 217*55694b2aSYipeng Wang.. _figure_membership6: 218*55694b2aSYipeng Wang.. figure:: img/member_i6.* 219*55694b2aSYipeng Wang 220*55694b2aSYipeng Wang Using HTSS for Attack Signature Matching 221*55694b2aSYipeng Wang 222*55694b2aSYipeng WangAs shown in the above figure, attack signature matching where each set 223*55694b2aSYipeng Wangrepresents a certain signature length (for correctness of this example, an 224*55694b2aSYipeng Wangattack signature should not be a subset of another one) in the payload is a good 225*55694b2aSYipeng Wangexample for using HTSS with 0% false negative (i.e., when an element returns not 226*55694b2aSYipeng Wangfound, it has a 100% certainty that it is not a member of any set). The packet 227*55694b2aSYipeng Wanginspection application benefits from knowing right away that the current payload 228*55694b2aSYipeng Wangdoes not match any attack signatures in the database to establish its 229*55694b2aSYipeng Wanglegitimacy, otherwise a deep inspection of the packet is needed. 230*55694b2aSYipeng Wang 231*55694b2aSYipeng WangHTSS employs a similar but simpler data structure to a traditional hash table, 232*55694b2aSYipeng Wangand the major difference is that HTSS stores only the signatures but not the 233*55694b2aSYipeng Wangfull keys/elements which can significantly reduce the footprint of the table. 234*55694b2aSYipeng WangAlong with the signature, HTSS also stores a value to indicate the target set. 235*55694b2aSYipeng WangWhen looking up an element, the element is hashed and the HTSS is addressed 236*55694b2aSYipeng Wangto retrieve the signature stored. If the signature matches then the value is 237*55694b2aSYipeng Wangretrieved corresponding to the index of the target set which the element belongs 238*55694b2aSYipeng Wangto. Because signatures can collide, HTSS can still has false positive 239*55694b2aSYipeng Wangprobability. Furthermore, if elements are allowed to be 240*55694b2aSYipeng Wangoverwritten or evicted when the hash table becomes full, it will also have a 241*55694b2aSYipeng Wangfalse negative probability. We discuss this case in the next section. 242*55694b2aSYipeng Wang 243*55694b2aSYipeng WangSet-Summaries with False Negative Probability 244*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 245*55694b2aSYipeng Wang 246*55694b2aSYipeng WangAs previously mentioned, traditional set-summaries (e.g. Bloom Filters) do not 247*55694b2aSYipeng Wanghave a false negative probability, i.e., it is 100% certain when an element 248*55694b2aSYipeng Wangreturns "not to be present" for a given set. However, the Membership Library 249*55694b2aSYipeng Wangalso supports a set-summary probabilistic data structure based on HTSS which 250*55694b2aSYipeng Wangallows for false negative probability. 251*55694b2aSYipeng Wang 252*55694b2aSYipeng WangIn HTSS, when the hash table becomes full, keys/elements will fail to be added 253*55694b2aSYipeng Wanginto the table and the hash table has to be resized to accommodate for these new 254*55694b2aSYipeng Wangelements, which can be expensive. However, if we allow new elements to overwrite 255*55694b2aSYipeng Wangor evict existing elements (as a cache typically does), then the resulting 256*55694b2aSYipeng Wangset-summary will begin to have false negative probability. This is because the 257*55694b2aSYipeng Wangelement that was evicted from the set-summary may still be present in the target 258*55694b2aSYipeng Wangset. For subsequent inquiries the set-summary will falsely report the element 259*55694b2aSYipeng Wangnot being in the set, hence having a false negative probability. 260*55694b2aSYipeng Wang 261*55694b2aSYipeng WangThe major usage of HTSS with false negative is to use it as a cache for 262*55694b2aSYipeng Wangdistributing elements to different target sets. By allowing HTSS to evict old 263*55694b2aSYipeng Wangelements, the set-summary can keep track of the most recent elements 264*55694b2aSYipeng Wang(i.e. active) as a cache typically does. Old inactive elements (infrequently 265*55694b2aSYipeng Wangused elements) will automatically and eventually get evicted from the 266*55694b2aSYipeng Wangset-summary. It is worth noting that the set-summary still has false positive 267*55694b2aSYipeng Wangprobability, which means the application either can tolerate certain false positive 268*55694b2aSYipeng Wangor it has fall-back path when false positive happens. 269*55694b2aSYipeng Wang 270*55694b2aSYipeng Wang.. _figure_membership7: 271*55694b2aSYipeng Wang.. figure:: img/member_i7.* 272*55694b2aSYipeng Wang 273*55694b2aSYipeng Wang Using HTSS with False Negatives for Wild Card Classification 274*55694b2aSYipeng Wang 275*55694b2aSYipeng WangHTSS with false negative (i.e. a cache) also has its wide set of applications. 276*55694b2aSYipeng WangFor example wild card flow classification (e.g. ACL rules) highlighted in the 277*55694b2aSYipeng Wangabove figure is an example of such application. In that case each target set 278*55694b2aSYipeng Wangrepresents a sub-table with rules defined by a certain flow mask. The flow masks 279*55694b2aSYipeng Wangare non-overlapping, and for flows matching more than one rule only the highest 280*55694b2aSYipeng Wangpriority one is inserted in the corresponding sub-table (interested readers can 281*55694b2aSYipeng Wangrefer to the Open vSwitch (OvS) design of Mega Flow Cache (MFC) [Member-OvS] 282*55694b2aSYipeng Wangfor further details). Typically the rules will have a large number of distinct 283*55694b2aSYipeng Wangunique masks and hence, a large number of target sets each corresponding to one 284*55694b2aSYipeng Wangmask. Because the active set of flows varies widely based on the network 285*55694b2aSYipeng Wangtraffic, HTSS with false negative will act as a cache for <flowid, target ACL 286*55694b2aSYipeng Wangsub-table> pair for the current active set of flows. When a miss occurs (as 287*55694b2aSYipeng Wangshown in red in the above figure) the sub-tables will be searched sequentially 288*55694b2aSYipeng Wangone by one for a possible match, and when found the flow key and target 289*55694b2aSYipeng Wangsub-table will be inserted into the set-summary (i.e. cache insertion) so 290*55694b2aSYipeng Wangsubsequent packets from the same flow don’t incur the overhead of the 291*55694b2aSYipeng Wangsequential search of sub-tables. 292*55694b2aSYipeng Wang 293*55694b2aSYipeng WangLibrary API Overview 294*55694b2aSYipeng Wang-------------------- 295*55694b2aSYipeng Wang 296*55694b2aSYipeng WangThe design goal of the Membership Library API is to be as generic as possible to 297*55694b2aSYipeng Wangsupport all the different types of set-summaries we discussed in previous 298*55694b2aSYipeng Wangsections and beyond. Fundamentally, the APIs need to include creation, 299*55694b2aSYipeng Wanginsertion, deletion, and lookup. 300*55694b2aSYipeng Wang 301*55694b2aSYipeng Wang 302*55694b2aSYipeng WangSet-summary Create 303*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~ 304*55694b2aSYipeng Wang 305*55694b2aSYipeng WangThe ``rte_member_create()`` function is used to create a set-summary structure, the input parameter 306*55694b2aSYipeng Wangis a struct to pass in parameters that needed to initialize the set-summary, while the function returns the 307*55694b2aSYipeng Wangpointer to the created set-summary or ``NULL`` if the creation failed. 308*55694b2aSYipeng Wang 309*55694b2aSYipeng WangThe general input arguments used when creating the set-summary should include ``name`` 310*55694b2aSYipeng Wangwhich is the name of the created set-summary, *type* which is one of the types 311*55694b2aSYipeng Wangsupported by the library (e.g. ``RTE_MEMBER_TYPE_HT`` for HTSS or ``RTE_MEMBER_TYPE_VBF`` for vBF), and ``key_len`` 312*55694b2aSYipeng Wangwhich is the length of the element/key. There are other parameters 313*55694b2aSYipeng Wangare only used for certain type of set-summary, or which have a slightly different meaning for different types of set-summary. 314*55694b2aSYipeng WangFor example, ``num_keys`` parameter means the maximum number of entries for Hash table based set-summary. 315*55694b2aSYipeng WangHowever, for bloom filter, this value means the expected number of keys that could be 316*55694b2aSYipeng Wanginserted into the bloom filter(s). The value is used to calculate the size of each 317*55694b2aSYipeng Wangbloom filter. 318*55694b2aSYipeng Wang 319*55694b2aSYipeng WangWe also pass two seeds: ``prim_hash_seed`` and 320*55694b2aSYipeng Wang``sec_hash_seed`` for the primary and secondary hash functions to calculate two independent hash values. 321*55694b2aSYipeng Wang``socket_id`` parameter is the NUMA socket ID for the memory used to create the 322*55694b2aSYipeng Wangset-summary. For HTSS, another parameter ``is_cache`` is used to indicate 323*55694b2aSYipeng Wangif this set-summary is a cache (i.e. with false negative probability) or not. 324*55694b2aSYipeng WangFor vBF, extra parameters are needed. For example, ``num_set`` is the number of 325*55694b2aSYipeng Wangsets needed to initialize the vector bloom filters. This number is equal to the 326*55694b2aSYipeng Wangnumber of bloom filters will be created. 327*55694b2aSYipeng Wang``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate will be used to determine 328*55694b2aSYipeng Wangthe number of hash functions and the bloom filter size. 329*55694b2aSYipeng Wang 330*55694b2aSYipeng Wang 331*55694b2aSYipeng WangSet-summary Element Insertion 332*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333*55694b2aSYipeng Wang 334*55694b2aSYipeng WangThe ``rte_member_add()`` function is used to insert an element/key into a set-summary structure. If it fails an 335*55694b2aSYipeng Wangerror is returned. For success the returned value is dependent on the 336*55694b2aSYipeng Wangset-summary mode to provide extra information for the users. For vBF 337*55694b2aSYipeng Wangmode, a return value of 0 means a successful insert. For HTSS mode without false negative, the insert 338*55694b2aSYipeng Wangcould fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache mode), 339*55694b2aSYipeng Wangfor insert that does not cause any eviction (i.e. no overwriting happens to an 340*55694b2aSYipeng Wangexisting entry) the return value is 0. For insertion that causes eviction, the return 341*55694b2aSYipeng Wangvalue is 1 to indicate such situation, but it is not an error. 342*55694b2aSYipeng Wang 343*55694b2aSYipeng WangThe input arguments for the function should include the ``key`` which is a pointer to the element/key that needs to 344*55694b2aSYipeng Wangbe added to the set-summary, and ``set_id`` which is the set id associated 345*55694b2aSYipeng Wangwith the key that needs to be added. 346*55694b2aSYipeng Wang 347*55694b2aSYipeng Wang 348*55694b2aSYipeng WangSet-summary Element Lookup 349*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~ 350*55694b2aSYipeng Wang 351*55694b2aSYipeng WangThe ``rte_member_lookup()`` function looks up a single key/element in the set-summary structure. It 352*55694b2aSYipeng Wangreturns as soon as the first match is found. The return value is 1 if a 353*55694b2aSYipeng Wangmatch is found and 0 otherwise. The arguments for the function include ``key`` which is a pointer to the 354*55694b2aSYipeng Wangelement/key that needs to be looked up, and ``set_id`` which is used to return the 355*55694b2aSYipeng Wangfirst target set id where the key has matched, if any. 356*55694b2aSYipeng Wang 357*55694b2aSYipeng WangThe ``rte_member_lookup_bulk()`` function is used to look up a bulk of keys/elements in the 358*55694b2aSYipeng Wangset-summary structure for their first match. Each key lookup returns as soon as the first match is found. The 359*55694b2aSYipeng Wangreturn value is the number of keys that find a match. The arguments of the function include ``keys`` 360*55694b2aSYipeng Wangwhich is a pointer to a bulk of keys that are to be looked up, 361*55694b2aSYipeng Wang``num_keys`` is the number 362*55694b2aSYipeng Wangof keys that will be looked up, and ``set_ids`` are the return target set 363*55694b2aSYipeng Wangids for the first match found for each of the input keys. ``set_ids`` is an array 364*55694b2aSYipeng Wangneeds to be sized according to the ``num_keys``. If there is no match, the set id 365*55694b2aSYipeng Wangfor that key will be set to RTE_MEMBER_NO_MATCH. 366*55694b2aSYipeng Wang 367*55694b2aSYipeng WangThe ``rte_member_lookup_multi()`` function looks up a single key/element in the 368*55694b2aSYipeng Wangset-summary structure for multiple matches. It 369*55694b2aSYipeng Wangreturns ALL the matches (possibly more than one) found for this key when it 370*55694b2aSYipeng Wangis matched against all target sets (it is worth noting that for cache mode HTSS, 371*55694b2aSYipeng Wangthe current implementation matches at most one target set). The return value is 372*55694b2aSYipeng Wangthe number of matches 373*55694b2aSYipeng Wangthat was found for this key (for cache mode HTSS the return value 374*55694b2aSYipeng Wangshould be at most 1). The arguments for the function include ``key`` which is a pointer to the 375*55694b2aSYipeng Wangelement/key that needs to be looked up, ``max_match_per_key`` which is to indicate the maximum number of matches 376*55694b2aSYipeng Wangthe user expects to find for each key, and ``set_id`` which is used to return all 377*55694b2aSYipeng Wangtarget set ids where the key has matched, if any. The ``set_id`` array should be sized 378*55694b2aSYipeng Wangaccording to ``max_match_per_key``. For vBF, the maximum number of matches per key is equal 379*55694b2aSYipeng Wangto the number of sets. For HTSS, the maximum number of matches per key is equal to two time 380*55694b2aSYipeng Wangentry count per bucket. ``max_match_per_key`` should be equal or smaller than the maximum number of 381*55694b2aSYipeng Wangpossible matches. 382*55694b2aSYipeng Wang 383*55694b2aSYipeng WangThe ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of keys/elements in the 384*55694b2aSYipeng Wangset-summary structure for multiple matches, each key lookup returns ALL the matches (possibly more 385*55694b2aSYipeng Wangthan one) found for this key when it is matched against all target sets (cache mode HTSS 386*55694b2aSYipeng Wangmatches at most one target set). The 387*55694b2aSYipeng Wangreturn value is the number of keys that find one or more matches in the 388*55694b2aSYipeng Wangset-summary structure. The arguments of the 389*55694b2aSYipeng Wangfunction include ``keys`` which is 390*55694b2aSYipeng Wanga pointer to a bulk of keys that are to be looked up, ``num_keys`` is the number 391*55694b2aSYipeng Wangof keys that will be looked up, ``max_match_per_key`` is the possible 392*55694b2aSYipeng Wangmaximum number of matches for each key, ``match_count`` which is the returned number 393*55694b2aSYipeng Wangof matches for each key, and ``set_ids`` are the returned target set 394*55694b2aSYipeng Wangids for all matches found for each keys. ``set_ids`` is 2-D array 395*55694b2aSYipeng Wangcontaining a 1-D array for each key (the size of 1-D array per key should be set by the user according to ``max_match_per_key``). 396*55694b2aSYipeng Wang``max_match_per_key`` should be equal or smaller than the maximum number of 397*55694b2aSYipeng Wangpossible matches, similar to ``rte_member_lookup_multi``. 398*55694b2aSYipeng Wang 399*55694b2aSYipeng Wang 400*55694b2aSYipeng WangSet-summary Element Delete 401*55694b2aSYipeng Wang~~~~~~~~~~~~~~~~~~~~~~~~~~ 402*55694b2aSYipeng Wang 403*55694b2aSYipeng WangThe ``rte_membership_delete()`` function deletes an element/key from a set-summary structure, if it fails 404*55694b2aSYipeng Wangan error is returned. The input arguments should include ``key`` which is a pointer to the 405*55694b2aSYipeng Wangelement/key that needs to be deleted from the set-summary, and ``set_id`` 406*55694b2aSYipeng Wangwhich is the set id associated with the key to delete. It is worth noting that current 407*55694b2aSYipeng Wangimplementation of vBF does not support deletion [1]_. An error code ``-EINVAL`` will be returned. 408*55694b2aSYipeng Wang 409*55694b2aSYipeng Wang.. [1] Traditional bloom filter does not support proactive deletion. Supporting proactive deletion require additional implementation and performance overhead. 410*55694b2aSYipeng Wang 411*55694b2aSYipeng WangReferences 412*55694b2aSYipeng Wang----------- 413*55694b2aSYipeng Wang 414*55694b2aSYipeng Wang[Member-bloom] B H Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, 1970. 415*55694b2aSYipeng Wang 416*55694b2aSYipeng Wang[Member-survey] A Broder and M Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Internet Mathematics, 2005. 417*55694b2aSYipeng Wang 418*55694b2aSYipeng Wang[Member-cfilter] B Fan, D G Andersen and M Kaminsky, "Cuckoo Filter: Practically Better Than Bloom," in Conference on emerging Networking Experiments and Technologies, 2014. 419*55694b2aSYipeng Wang 420*55694b2aSYipeng Wang[Member-OvS] B Pfaff, "The Design and Implementation of Open vSwitch," in NSDI, 2015. 421