1.. BSD LICENSE 2 Copyright(c) 2017 Intel Corporation. All rights reserved. 3 All rights reserved. 4 5 Redistribution and use in source and binary forms, with or without 6 modification, are permitted provided that the following conditions 7 are met: 8 9 * Redistributions of source code must retain the above copyright 10 notice, this list of conditions and the following disclaimer. 11 * Redistributions in binary form must reproduce the above copyright 12 notice, this list of conditions and the following disclaimer in 13 the documentation and/or other materials provided with the 14 distribution. 15 * Neither the name of Intel Corporation nor the names of its 16 contributors may be used to endorse or promote products derived 17 from this software without specific prior written permission. 18 19 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 22 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 23 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 24 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 25 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 26 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 27 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 28 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 31 32.. _member_library: 33 34Membership Library 35================== 36 37Introduction 38------------ 39 40The DPDK Membership Library provides an API for DPDK applications to insert a 41new member, delete an existing member, or query the existence of a member in a 42given set, or a group of sets. For the case of a group of sets, the library 43will return not only whether the element has been inserted before in one of 44the sets but also which set it belongs to. The Membership Library is an 45extension and generalization of a traditional filter structure (for example 46Bloom Filter [Member-bloom]) that has multiple usages in a wide variety of 47workloads and applications. In general, the Membership Library is a data 48structure that provides a "set-summary" on whether a member belongs to a set, 49and as discussed in detail later, there are two advantages of using such a 50set-summary rather than operating on a "full-blown" complete list of elements: 51first, it has a much smaller storage requirement than storing the whole list of 52elements themselves, and secondly checking an element membership (or other 53operations) in this set-summary is much faster than checking it for the 54original full-blown complete list of elements. 55 56We use the term "Set-Summary" in this guide to refer to the space-efficient, 57probabilistic membership data structure that is provided by the library. A 58membership test for an element will return the set this element belongs to or 59that the element is "not-found" with very high probability of accuracy. Set-summary 60is a fundamental data aggregation component that can be used in many network 61(and other) applications. It is a crucial structure to address performance and 62scalability issues of diverse network applications including overlay networks, 63data-centric networks, flow table summaries, network statistics and 64traffic monitoring. A set-summary is useful for applications who need to 65include a list of elements while a complete list requires too much space 66and/or too much processing cost. In these situations, the set-summary works as 67a lossy hash-based representation of a set of members. It can dramatically 68reduce space requirement and significantly improve the performance of set 69membership queries at the cost of introducing a very small membership test error 70probability. 71 72.. _figure_membership1: 73.. figure:: img/member_i1.* 74 75 Example Usages of Membership Library 76 77There are various usages for a Membership Library in a very 78large set of applications and workloads. Interested readers can refer to 79[Member-survey] for a survey of possible networking usages. The above figure 80provide a small set of examples of using the Membership Library: 81 82* Sub-figure (a) 83 depicts a distributed web cache architecture where a collection of proxies 84 attempt to share their web caches (cached from a set of back-end web servers) to 85 provide faster responses to clients, and the proxies use the Membership 86 Library to share summaries of what web pages/objects they are caching. With the 87 Membership Library, a proxy receiving an http request will inquire the 88 set-summary to find its location and quickly determine whether to retrieve the 89 requested web page from a nearby proxy or from a back-end web server. 90 91* Sub-figure (b) depicts another example for using the Membership Library to 92 prevent routing loops which is typically done using slow TTL countdown and 93 dropping packets when TTL expires. As shown in Sub-figure (b), an embedded 94 set-summary in the packet header itself can be used to summarize the set of 95 nodes a packet has gone through, and each node upon receiving a packet can check 96 whether its id is a member of the set of visited nodes, and if it is, then a 97 routing loop is detected. 98 99* Sub-Figure (c) presents another usage of the Membership 100 Library to load-balance flows to worker threads with in-order guarantee where a 101 set-summary is used to query if a packet belongs to an existing flow or a new 102 flow. Packets belonging to a new flow are forwarded to the current least loaded 103 worker thread, while those belonging to an existing flow are forwarded to the 104 pre-assigned thread to guarantee in-order processing. 105 106* Sub-figure (d) highlights 107 yet another usage example in the database domain where a set-summary is used to 108 determine joins between sets instead of creating a join by comparing each 109 element of a set against the other elements in a different set, a join is done 110 on the summaries since they can efficiently encode members of a given set. 111 112Membership Library is a configurable library that is optimized to cover set 113membership functionality for both a single set and multi-set scenarios. Two set-summary 114schemes are presented including (a) vector of Bloom Filters and (b) Hash-Table based 115set-summary schemes with and without false negative probability. 116This guide first briefly describes these different types of set-summaries, usage examples for each, 117and then it highlights the Membership Library API. 118 119Vector of Bloom Filters 120----------------------- 121 122Bloom Filter (BF) [Member-bloom] is a well-known space-efficient 123probabilistic data structure that answers set membership queries (test whether 124an element is a member of a set) with some probability of false positives and 125zero false negatives; a query for an element returns either it is "possibly in 126a set" (with very high probability) or "definitely not in a set". 127 128The BF is a method for representing a set of ``n`` elements (for example flow keys 129in network applications domain) to support membership queries. The idea of BF is 130to allocate a bit-vector ``v`` with ``m`` bits, which are initially all set to 0. Then 131it chooses ``k`` independent hash functions ``h1``, ``h2``, ... ``hk`` with hash values range from 132``0`` to ``m-1`` to perform hashing calculations on each element to be inserted. Every time when an 133element ``X`` being inserted into the set, the bits at positions ``h1(X)``, ``h2(X)``, ... 134``hk(X)`` in ``v`` are set to 1 (any particular bit might be set to 1 multiple times 135for multiple different inserted elements). Given a query for any element ``Y``, the 136bits at positions ``h1(Y)``, ``h2(Y)``, ... ``hk(Y)`` are checked. If any of them is 0, 137then Y is definitely not in the set. Otherwise there is a high probability that 138Y is a member of the set with certain false positive probability. As shown in 139the next equation, the false positive probability can be made arbitrarily small 140by changing the number of hash functions (``k``) and the vector length (``m``). 141 142.. _figure_membership2: 143.. figure:: img/member_i2.* 144 145 Bloom Filter False Positive Probability 146 147Without BF, an accurate membership testing could involve a costly hash table 148lookup and full element comparison. The advantage of using a BF is to simplify 149the membership test into a series of hash calculations and memory accesses for a 150small bit-vector, which can be easily optimized. Hence the lookup throughput 151(set membership test) can be significantly faster than a normal hash table 152lookup with element comparison. 153 154.. _figure_membership3: 155.. figure:: img/member_i3.* 156 157 Detecting Routing Loops Using BF 158 159BF is used for applications that need only one set, and the 160membership of elements is checked against the BF. The example discussed 161in the above figure is one example of potential applications that uses only one 162set to capture the node IDs that have been visited so far by the packet. Each 163node will then check this embedded BF in the packet header for its own id, and 164if the BF indicates that the current node is definitely not in the set then a 165loop-free route is guaranteed. 166 167 168.. _figure_membership4: 169.. figure:: img/member_i4.* 170 171 Vector Bloom Filter (vBF) Overview 172 173To support membership test for both multiple sets and a single set, 174the library implements a Vector Bloom Filter (vBF) scheme. 175vBF basically composes multiple bloom filters into a vector of bloom filers. 176The membership test is conducted on all of the 177bloom filters concurrently to determine which set(s) it belongs to or none of 178them. The basic idea of vBF is shown in the above figure where an element is 179used to address multiple bloom filters concurrently and the bloom filter 180index(es) with a hit is returned. 181 182.. _figure_membership5: 183.. figure:: img/member_i5.* 184 185 vBF for Flow Scheduling to Worker Thread 186 187As previously mentioned, there are many usages of such structures. vBF is used 188for applications that need to check membership against multiple sets 189simultaneously. The example shown in the above figure uses a set to capture 190all flows being assigned for processing at a given worker thread. Upon receiving 191a packet the vBF is used to quickly figure out if this packet belongs to a new flow 192so as to be forwarded to the current least loaded worker thread, or otherwise it 193should be queued for an existing thread to guarantee in-order processing (i.e. 194the property of vBF to indicate right away that a given flow is a new one or 195not is critical to minimize response time latency). 196 197It should be noted that vBF can be implemented using a set of single bloom 198filters with sequential lookup of each BF. However, being able to concurrently 199search all set-summaries is a big throughput advantage. In the library, certain 200parallelism is realized by the implementation of checking all bloom filters 201together. 202 203 204Hash-Table based Set-Summaries 205------------------------------ 206 207Hash-table based set-summary (HTSS) is another scheme in the membership library. 208Cuckoo filter [Member-cfilter] is an example of HTSS. 209HTSS supports multi-set membership testing like 210vBF does. However, while vBF is better for a small number of targets, HTSS is more suitable 211and can easily outperform vBF when the number of sets is 212large, since HTSS uses a single hash table for membership testing while vBF 213requires testing a series of Bloom Filters each corresponding to one set. 214As a result, generally speaking vBF is more adequate for the case of a small limited number of sets 215while HTSS should be used with a larger number of sets. 216 217.. _figure_membership6: 218.. figure:: img/member_i6.* 219 220 Using HTSS for Attack Signature Matching 221 222As shown in the above figure, attack signature matching where each set 223represents a certain signature length (for correctness of this example, an 224attack signature should not be a subset of another one) in the payload is a good 225example for using HTSS with 0% false negative (i.e., when an element returns not 226found, it has a 100% certainty that it is not a member of any set). The packet 227inspection application benefits from knowing right away that the current payload 228does not match any attack signatures in the database to establish its 229legitimacy, otherwise a deep inspection of the packet is needed. 230 231HTSS employs a similar but simpler data structure to a traditional hash table, 232and the major difference is that HTSS stores only the signatures but not the 233full keys/elements which can significantly reduce the footprint of the table. 234Along with the signature, HTSS also stores a value to indicate the target set. 235When looking up an element, the element is hashed and the HTSS is addressed 236to retrieve the signature stored. If the signature matches then the value is 237retrieved corresponding to the index of the target set which the element belongs 238to. Because signatures can collide, HTSS can still has false positive 239probability. Furthermore, if elements are allowed to be 240overwritten or evicted when the hash table becomes full, it will also have a 241false negative probability. We discuss this case in the next section. 242 243Set-Summaries with False Negative Probability 244~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 245 246As previously mentioned, traditional set-summaries (e.g. Bloom Filters) do not 247have a false negative probability, i.e., it is 100% certain when an element 248returns "not to be present" for a given set. However, the Membership Library 249also supports a set-summary probabilistic data structure based on HTSS which 250allows for false negative probability. 251 252In HTSS, when the hash table becomes full, keys/elements will fail to be added 253into the table and the hash table has to be resized to accommodate for these new 254elements, which can be expensive. However, if we allow new elements to overwrite 255or evict existing elements (as a cache typically does), then the resulting 256set-summary will begin to have false negative probability. This is because the 257element that was evicted from the set-summary may still be present in the target 258set. For subsequent inquiries the set-summary will falsely report the element 259not being in the set, hence having a false negative probability. 260 261The major usage of HTSS with false negative is to use it as a cache for 262distributing elements to different target sets. By allowing HTSS to evict old 263elements, the set-summary can keep track of the most recent elements 264(i.e. active) as a cache typically does. Old inactive elements (infrequently 265used elements) will automatically and eventually get evicted from the 266set-summary. It is worth noting that the set-summary still has false positive 267probability, which means the application either can tolerate certain false positive 268or it has fall-back path when false positive happens. 269 270.. _figure_membership7: 271.. figure:: img/member_i7.* 272 273 Using HTSS with False Negatives for Wild Card Classification 274 275HTSS with false negative (i.e. a cache) also has its wide set of applications. 276For example wild card flow classification (e.g. ACL rules) highlighted in the 277above figure is an example of such application. In that case each target set 278represents a sub-table with rules defined by a certain flow mask. The flow masks 279are non-overlapping, and for flows matching more than one rule only the highest 280priority one is inserted in the corresponding sub-table (interested readers can 281refer to the Open vSwitch (OvS) design of Mega Flow Cache (MFC) [Member-OvS] 282for further details). Typically the rules will have a large number of distinct 283unique masks and hence, a large number of target sets each corresponding to one 284mask. Because the active set of flows varies widely based on the network 285traffic, HTSS with false negative will act as a cache for <flowid, target ACL 286sub-table> pair for the current active set of flows. When a miss occurs (as 287shown in red in the above figure) the sub-tables will be searched sequentially 288one by one for a possible match, and when found the flow key and target 289sub-table will be inserted into the set-summary (i.e. cache insertion) so 290subsequent packets from the same flow don’t incur the overhead of the 291sequential search of sub-tables. 292 293Library API Overview 294-------------------- 295 296The design goal of the Membership Library API is to be as generic as possible to 297support all the different types of set-summaries we discussed in previous 298sections and beyond. Fundamentally, the APIs need to include creation, 299insertion, deletion, and lookup. 300 301 302Set-summary Create 303~~~~~~~~~~~~~~~~~~ 304 305The ``rte_member_create()`` function is used to create a set-summary structure, the input parameter 306is a struct to pass in parameters that needed to initialize the set-summary, while the function returns the 307pointer to the created set-summary or ``NULL`` if the creation failed. 308 309The general input arguments used when creating the set-summary should include ``name`` 310which is the name of the created set-summary, *type* which is one of the types 311supported by the library (e.g. ``RTE_MEMBER_TYPE_HT`` for HTSS or ``RTE_MEMBER_TYPE_VBF`` for vBF), and ``key_len`` 312which is the length of the element/key. There are other parameters 313are only used for certain type of set-summary, or which have a slightly different meaning for different types of set-summary. 314For example, ``num_keys`` parameter means the maximum number of entries for Hash table based set-summary. 315However, for bloom filter, this value means the expected number of keys that could be 316inserted into the bloom filter(s). The value is used to calculate the size of each 317bloom filter. 318 319We also pass two seeds: ``prim_hash_seed`` and 320``sec_hash_seed`` for the primary and secondary hash functions to calculate two independent hash values. 321``socket_id`` parameter is the NUMA socket ID for the memory used to create the 322set-summary. For HTSS, another parameter ``is_cache`` is used to indicate 323if this set-summary is a cache (i.e. with false negative probability) or not. 324For vBF, extra parameters are needed. For example, ``num_set`` is the number of 325sets needed to initialize the vector bloom filters. This number is equal to the 326number of bloom filters will be created. 327``false_pos_rate`` is the false positive rate. num_keys and false_pos_rate will be used to determine 328the number of hash functions and the bloom filter size. 329 330 331Set-summary Element Insertion 332~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 333 334The ``rte_member_add()`` function is used to insert an element/key into a set-summary structure. If it fails an 335error is returned. For success the returned value is dependent on the 336set-summary mode to provide extra information for the users. For vBF 337mode, a return value of 0 means a successful insert. For HTSS mode without false negative, the insert 338could fail with ``-ENOSPC`` if the table is full. With false negative (i.e. cache mode), 339for insert that does not cause any eviction (i.e. no overwriting happens to an 340existing entry) the return value is 0. For insertion that causes eviction, the return 341value is 1 to indicate such situation, but it is not an error. 342 343The input arguments for the function should include the ``key`` which is a pointer to the element/key that needs to 344be added to the set-summary, and ``set_id`` which is the set id associated 345with the key that needs to be added. 346 347 348Set-summary Element Lookup 349~~~~~~~~~~~~~~~~~~~~~~~~~~ 350 351The ``rte_member_lookup()`` function looks up a single key/element in the set-summary structure. It 352returns as soon as the first match is found. The return value is 1 if a 353match is found and 0 otherwise. The arguments for the function include ``key`` which is a pointer to the 354element/key that needs to be looked up, and ``set_id`` which is used to return the 355first target set id where the key has matched, if any. 356 357The ``rte_member_lookup_bulk()`` function is used to look up a bulk of keys/elements in the 358set-summary structure for their first match. Each key lookup returns as soon as the first match is found. The 359return value is the number of keys that find a match. The arguments of the function include ``keys`` 360which is a pointer to a bulk of keys that are to be looked up, 361``num_keys`` is the number 362of keys that will be looked up, and ``set_ids`` are the return target set 363ids for the first match found for each of the input keys. ``set_ids`` is an array 364needs to be sized according to the ``num_keys``. If there is no match, the set id 365for that key will be set to RTE_MEMBER_NO_MATCH. 366 367The ``rte_member_lookup_multi()`` function looks up a single key/element in the 368set-summary structure for multiple matches. It 369returns ALL the matches (possibly more than one) found for this key when it 370is matched against all target sets (it is worth noting that for cache mode HTSS, 371the current implementation matches at most one target set). The return value is 372the number of matches 373that was found for this key (for cache mode HTSS the return value 374should be at most 1). The arguments for the function include ``key`` which is a pointer to the 375element/key that needs to be looked up, ``max_match_per_key`` which is to indicate the maximum number of matches 376the user expects to find for each key, and ``set_id`` which is used to return all 377target set ids where the key has matched, if any. The ``set_id`` array should be sized 378according to ``max_match_per_key``. For vBF, the maximum number of matches per key is equal 379to the number of sets. For HTSS, the maximum number of matches per key is equal to two time 380entry count per bucket. ``max_match_per_key`` should be equal or smaller than the maximum number of 381possible matches. 382 383The ``rte_membership_lookup_multi_bulk()`` function looks up a bulk of keys/elements in the 384set-summary structure for multiple matches, each key lookup returns ALL the matches (possibly more 385than one) found for this key when it is matched against all target sets (cache mode HTSS 386matches at most one target set). The 387return value is the number of keys that find one or more matches in the 388set-summary structure. The arguments of the 389function include ``keys`` which is 390a pointer to a bulk of keys that are to be looked up, ``num_keys`` is the number 391of keys that will be looked up, ``max_match_per_key`` is the possible 392maximum number of matches for each key, ``match_count`` which is the returned number 393of matches for each key, and ``set_ids`` are the returned target set 394ids for all matches found for each keys. ``set_ids`` is 2-D array 395containing a 1-D array for each key (the size of 1-D array per key should be set by the user according to ``max_match_per_key``). 396``max_match_per_key`` should be equal or smaller than the maximum number of 397possible matches, similar to ``rte_member_lookup_multi``. 398 399 400Set-summary Element Delete 401~~~~~~~~~~~~~~~~~~~~~~~~~~ 402 403The ``rte_membership_delete()`` function deletes an element/key from a set-summary structure, if it fails 404an error is returned. The input arguments should include ``key`` which is a pointer to the 405element/key that needs to be deleted from the set-summary, and ``set_id`` 406which is the set id associated with the key to delete. It is worth noting that current 407implementation of vBF does not support deletion [1]_. An error code ``-EINVAL`` will be returned. 408 409.. [1] Traditional bloom filter does not support proactive deletion. Supporting proactive deletion require additional implementation and performance overhead. 410 411References 412----------- 413 414[Member-bloom] B H Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," Communications of the ACM, 1970. 415 416[Member-survey] A Broder and M Mitzenmacher, "Network Applications of Bloom Filters: A Survey," in Internet Mathematics, 2005. 417 418[Member-cfilter] B Fan, D G Andersen and M Kaminsky, "Cuckoo Filter: Practically Better Than Bloom," in Conference on emerging Networking Experiments and Technologies, 2014. 419 420[Member-OvS] B Pfaff, "The Design and Implementation of Open vSwitch," in NSDI, 2015. 421