10dd62a01SPablo de Lara.. BSD LICENSE 20dd62a01SPablo de Lara Copyright(c) 2016-2017 Intel Corporation. All rights reserved. 30dd62a01SPablo de Lara All rights reserved. 40dd62a01SPablo de Lara 50dd62a01SPablo de Lara Redistribution and use in source and binary forms, with or without 60dd62a01SPablo de Lara modification, are permitted provided that the following conditions 70dd62a01SPablo de Lara are met: 80dd62a01SPablo de Lara 90dd62a01SPablo de Lara * Redistributions of source code must retain the above copyright 100dd62a01SPablo de Lara notice, this list of conditions and the following disclaimer. 110dd62a01SPablo de Lara * Redistributions in binary form must reproduce the above copyright 120dd62a01SPablo de Lara notice, this list of conditions and the following disclaimer in 130dd62a01SPablo de Lara the documentation and/or other materials provided with the 140dd62a01SPablo de Lara distribution. 150dd62a01SPablo de Lara * Neither the name of Intel Corporation nor the names of its 160dd62a01SPablo de Lara contributors may be used to endorse or promote products derived 170dd62a01SPablo de Lara from this software without specific prior written permission. 180dd62a01SPablo de Lara 190dd62a01SPablo de Lara THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 200dd62a01SPablo de Lara "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 210dd62a01SPablo de Lara LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 220dd62a01SPablo de Lara A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 230dd62a01SPablo de Lara OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 240dd62a01SPablo de Lara SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 250dd62a01SPablo de Lara LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 260dd62a01SPablo de Lara DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 270dd62a01SPablo de Lara THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 280dd62a01SPablo de Lara (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 290dd62a01SPablo de Lara OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 300dd62a01SPablo de Lara 310dd62a01SPablo de Lara.. _Efd_Library: 320dd62a01SPablo de Lara 330dd62a01SPablo de LaraElastic Flow Distributor Library 340dd62a01SPablo de Lara================================ 350dd62a01SPablo de Lara 360dd62a01SPablo de LaraIntroduction 370dd62a01SPablo de Lara------------ 380dd62a01SPablo de Lara 390dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads 400dd62a01SPablo de Larais a very common task. Many workloads require a deterministic 410dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a 420dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to 430dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage 440dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end 450dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming 460dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their 470dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on. 480dd62a01SPablo de Lara 490dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a 500dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following 510dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the 520dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key 530dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the 540dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and 550dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage 560dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better 570dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally, 580dd62a01SPablo de Larawith the current optimized library implementation, performance is fully 590dd62a01SPablo de Larascalable with any number of CPU cores. 600dd62a01SPablo de Lara 610dd62a01SPablo de LaraFlow Based Distribution 620dd62a01SPablo de Lara----------------------- 630dd62a01SPablo de Lara 640dd62a01SPablo de LaraComputation Based Schemes 650dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 660dd62a01SPablo de Lara 670dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a 680dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple 690dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash 700dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on 710dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the 720dd62a01SPablo de Laraflow key and n is the number of possible targets. 730dd62a01SPablo de Lara 740dd62a01SPablo de Lara.. _figure_efd1: 750dd62a01SPablo de Lara 760dd62a01SPablo de Lara.. figure:: img/efd_i1.* 770dd62a01SPablo de Lara 780dd62a01SPablo de Lara Load Balancing Using Front End Node 790dd62a01SPablo de Lara 800dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer 810dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where 820dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple 830dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence, 840dd62a01SPablo de Larastorage requirements are minimum. 850dd62a01SPablo de Lara 860dd62a01SPablo de Lara.. _figure_efd2: 870dd62a01SPablo de Lara 880dd62a01SPablo de Lara.. figure:: img/efd_i2.* 890dd62a01SPablo de Lara 900dd62a01SPablo de Lara Consistent Hashing 910dd62a01SPablo de Lara 920dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of 930dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`. 940dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow 950dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise 960dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing 970dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of 980dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based 990dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be 1000dd62a01SPablo de Lararemapped. 1010dd62a01SPablo de Lara 1020dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little 1030dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system 1040dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific 1050dd62a01SPablo de Larakey, as this is dictated by the hash function. 1060dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize 1070dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions, 1080dd62a01SPablo de Laratarget load, etc.) is simply not possible. 1090dd62a01SPablo de Lara 1100dd62a01SPablo de LaraFlow-Table Based Schemes 1110dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~ 1120dd62a01SPablo de Lara 1130dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load 1140dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer 1150dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given 1160dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store 1170dd62a01SPablo de Laraboth the flow key and the target value. 1180dd62a01SPablo de Lara 1190dd62a01SPablo de Lara.. _figure_efd3: 1200dd62a01SPablo de Lara 1210dd62a01SPablo de Lara.. figure:: img/efd_i3.* 1220dd62a01SPablo de Lara 1230dd62a01SPablo de Lara Table Based Flow Distribution 1240dd62a01SPablo de Lara 1250dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table 1260dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible, 1270dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values 1280dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key 1290dd62a01SPablo de Laraand if there is a match the value (target id) is returned. 1300dd62a01SPablo de Lara 1310dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing 1320dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys, 1330dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to 1340dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in 1350dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of 1360dd62a01SPablo de Larathe latency to access the main memory. 1370dd62a01SPablo de Lara 1380dd62a01SPablo de LaraEFD Based Scheme 1390dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 1400dd62a01SPablo de Lara 1410dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based 1420dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for 1430dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained 1440dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key. 1450dd62a01SPablo de Lara 1460dd62a01SPablo de Lara.. _figure_efd4: 1470dd62a01SPablo de Lara 1480dd62a01SPablo de Lara.. figure:: img/efd_i4.* 1490dd62a01SPablo de Lara 1500dd62a01SPablo de Lara Searching for Perfect Hash Function 1510dd62a01SPablo de Lara 1520dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of 1530dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the 1540dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`. 1550dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values, 1560dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and 1570dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables. 1580dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based 1590dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing 1600dd62a01SPablo de Larathat key with the correct hash function. 1610dd62a01SPablo de Lara 1620dd62a01SPablo de Lara.. _figure_efd5: 1630dd62a01SPablo de Lara 1640dd62a01SPablo de Lara.. figure:: img/efd_i5.* 1650dd62a01SPablo de Lara 1660dd62a01SPablo de Lara Divide and Conquer for Millions of Keys 1670dd62a01SPablo de Lara 1680dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number 1690dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively 1700dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`, 1710dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer). 1720dd62a01SPablo de LaraEFD divides the entire input key set into many small groups. 1730dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter 1740dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find 1750dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group. 1760dd62a01SPablo de Lara 1770dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD 1780dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent 1790dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost 1800dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly 1810dd62a01SPablo de Laradesirable feature especially for longer keys. 1820dd62a01SPablo de Lara 1830dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of 1840dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself 1850dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key. 1860dd62a01SPablo de Lara 1870dd62a01SPablo de Lara.. _Efd_example: 1880dd62a01SPablo de Lara 1890dd62a01SPablo de LaraExample of EFD Library Usage 1900dd62a01SPablo de Lara---------------------------- 1910dd62a01SPablo de Lara 1920dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes. 1930dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for 1940dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc. 1950dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load 1960dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded 1970dd62a01SPablo de Larato the target back end server for processing. The system designer would 1980dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server 1990dd62a01SPablo de Larainteraction. 2000dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located 2010dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers). 2020dd62a01SPablo de Lara 2030dd62a01SPablo de Lara.. _figure_efd6: 2040dd62a01SPablo de Lara 2050dd62a01SPablo de Lara.. figure:: img/efd_i6.* 2060dd62a01SPablo de Lara 2070dd62a01SPablo de Lara EFD as a Flow-Level Load Balancer 2080dd62a01SPablo de Lara 2090dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that 2100dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the 2110dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since 2120dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N 2130dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X 2140dd62a01SPablo de Larapossible targets). 2150dd62a01SPablo de Lara 2160dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using 2170dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the 2180dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index 2190dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result 2200dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be 2210dd62a01SPablo de Laraprocessed. 2220dd62a01SPablo de Lara 2230dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but 2240dd62a01SPablo de Lararather distributing the flows to a target back end node based on the 2250dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before 2260dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores 2270dd62a01SPablo de Larathe flows served at each node is used and is 2280dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before 2290dd62a01SPablo de Laraflows. 2300dd62a01SPablo de Lara 2310dd62a01SPablo de Lara.. _Efd_api: 2320dd62a01SPablo de Lara 2330dd62a01SPablo de LaraLibrary API Overview 2340dd62a01SPablo de Lara-------------------- 2350dd62a01SPablo de Lara 2360dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a 2370dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a 2380dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key 2390dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve 2400dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys. 2410dd62a01SPablo de Lara 2420dd62a01SPablo de LaraEFD Table Create 2430dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 2440dd62a01SPablo de Lara 2450dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer 2460dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key. 2470dd62a01SPablo de LaraThe online version of the EFD table (the one that does 2480dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and 2490dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the 2500dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that 2510dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the 2520dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket 2530dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for 2540dd62a01SPablo de Larahighest performance the socket id should match that where the thread is 2550dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same 2560dd62a01SPablo de Larasocket as where the lookup thread is running. 2570dd62a01SPablo de Lara 2580dd62a01SPablo de LaraEFD Insert and Update 2590dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~ 2600dd62a01SPablo de Lara 2610dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is 2620dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to 2630dd62a01SPablo de Laraa new value (target) if the key has already been inserted 2640dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted 2650dd62a01SPablo de Larabefore. It will return 0 upon success. It will return 2660dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the 2670dd62a01SPablo de Laralast available space in the key's group was just used. It will return 2680dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it 2690dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function 2700dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD 2710dd62a01SPablo de Laratable (i.e, same value already exists). 2720dd62a01SPablo de Lara 273*b09efeb9SPablo de Lara.. Note:: 274*b09efeb9SPablo de Lara 275*b09efeb9SPablo de Lara This function is not multi-thread safe and should only be called 276*b09efeb9SPablo de Lara from one thread. 277*b09efeb9SPablo de Lara 2780dd62a01SPablo de LaraEFD Lookup 2790dd62a01SPablo de Lara~~~~~~~~~~ 2800dd62a01SPablo de Lara 2810dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()`` 2820dd62a01SPablo de Larais used to return the value associated with single key. 2830dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value 2840dd62a01SPablo de Larainserted is returned, if the key has not been inserted before, 2850dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned. 2860dd62a01SPablo de LaraFor better performance and to decrease the overhead of 2870dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup 2880dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key 2890dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function, 2900dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the 2910dd62a01SPablo de Laracorresponding return values will be returned in the value_list. 2920dd62a01SPablo de Lara 293*b09efeb9SPablo de Lara.. Note:: 294*b09efeb9SPablo de Lara 295*b09efeb9SPablo de Lara This function is multi-thread safe, but there should not be other threads 296*b09efeb9SPablo de Lara writing in the EFD table, unless locks are used. 297*b09efeb9SPablo de Lara 2980dd62a01SPablo de LaraEFD Delete 2990dd62a01SPablo de Lara~~~~~~~~~~ 3000dd62a01SPablo de Lara 3010dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function 3020dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success 3030dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to 3040dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id. 3050dd62a01SPablo de LaraThe previous value associated with this key will be returned 3060dd62a01SPablo de Larain the prev_value argument. 3070dd62a01SPablo de Lara 308*b09efeb9SPablo de Lara.. Note:: 309*b09efeb9SPablo de Lara 310*b09efeb9SPablo de Lara This function is not multi-thread safe and should only be called 311*b09efeb9SPablo de Lara from one thread. 312*b09efeb9SPablo de Lara 3130dd62a01SPablo de Lara.. _Efd_internals: 3140dd62a01SPablo de Lara 3150dd62a01SPablo de LaraLibrary Internals 3160dd62a01SPablo de Lara----------------- 3170dd62a01SPablo de Lara 3180dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview 3190dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this 3200dd62a01SPablo de Larasection is to explain to readers the high-level implementation of 3210dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library. 3220dd62a01SPablo de Lara 3230dd62a01SPablo de LaraInsert Function Internals 3240dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 3250dd62a01SPablo de Lara 3260dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into 3270dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the 3280dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD 3290dd62a01SPablo de Larastores two version of the <key,value> table: 3300dd62a01SPablo de Lara 3310dd62a01SPablo de Lara- Offline Version (in memory): Only used for the insertion/update 3320dd62a01SPablo de Lara operation, which is less frequent than the lookup operation. In the 3330dd62a01SPablo de Lara offline version the exact keys for each group is stored. When a new 3340dd62a01SPablo de Lara key is added, the hash function is updated that will satisfy the 3350dd62a01SPablo de Lara value for the new key together with the all old keys already inserted 3360dd62a01SPablo de Lara in this group. 3370dd62a01SPablo de Lara 3380dd62a01SPablo de Lara- Online Version (in cache): Used for the frequent lookup operation. In 3390dd62a01SPablo de Lara the online version, as previously mentioned, the keys are not stored 3400dd62a01SPablo de Lara but rather only the hash index for each group. 3410dd62a01SPablo de Lara 3420dd62a01SPablo de Lara.. _figure_efd7: 3430dd62a01SPablo de Lara 3440dd62a01SPablo de Lara.. figure:: img/efd_i7.* 3450dd62a01SPablo de Lara 3460dd62a01SPablo de Lara Group Assignment 3470dd62a01SPablo de Lara 3480dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example. 3490dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is 3500dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be 3510dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below). 3520dd62a01SPablo de Lara 3530dd62a01SPablo de Lara.. _figure_efd8: 3540dd62a01SPablo de Lara 3550dd62a01SPablo de Lara.. figure:: img/efd_i8.* 3560dd62a01SPablo de Lara 3570dd62a01SPablo de Lara Perfect Hash Search - Assigned Keys & Target Value 3580dd62a01SPablo de Lara 3590dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search 3600dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target 3610dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD 3620dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group 3630dd62a01SPablo de Laraper value bit. 3640dd62a01SPablo de Lara 3650dd62a01SPablo de Lara.. _figure_efd9: 3660dd62a01SPablo de Lara 3670dd62a01SPablo de Lara.. figure:: img/efd_i9.* 3680dd62a01SPablo de Lara 3690dd62a01SPablo de Lara Perfect Hash Search - Satisfy Target Values 3700dd62a01SPablo de Lara 3710dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))`` 3720dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value, 3730dd62a01SPablo de Laraas shown in :numref:`figure_efd9`. 3740dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the 3750dd62a01SPablo de Larahash index until a non conflicting lookup_table is found. 3760dd62a01SPablo de Lara 3770dd62a01SPablo de Lara.. _figure_efd10: 3780dd62a01SPablo de Lara 3790dd62a01SPablo de Lara.. figure:: img/efd_i10.* 3800dd62a01SPablo de Lara 3810dd62a01SPablo de Lara Finding Hash Index for Conflict Free lookup_table 3820dd62a01SPablo de Lara 3830dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it 3840dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the 3850dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps 3860dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table, 3870dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same. 3880dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no 3890dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is 3900dd62a01SPablo de Lararepeated for each bit of target value. 3910dd62a01SPablo de Lara 3920dd62a01SPablo de LaraLookup Function Internals 3930dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 3940dd62a01SPablo de Lara 3950dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than 3960dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are 3970dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are 3980dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed). 3990dd62a01SPablo de Lara 4000dd62a01SPablo de Lara.. _figure_efd11: 4010dd62a01SPablo de Lara 4020dd62a01SPablo de Lara.. figure:: img/efd_i11.* 4030dd62a01SPablo de Lara 4040dd62a01SPablo de Lara EFD Lookup Operation 4050dd62a01SPablo de Lara 4060dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key, 4070dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this 4080dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index, 4090dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will 4100dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this 4110dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each 4120dd62a01SPablo de Larabit of the target value. 4130dd62a01SPablo de Lara 4140dd62a01SPablo de LaraGroup Rebalancing Function Internals 4150dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4160dd62a01SPablo de Lara 4170dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by 4180dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However, 4190dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a 4200dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced 4210dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups. 4220dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys, 4230dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so 4240dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash 4250dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during 4260dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less 4270dd62a01SPablo de Laracrowded group as the more keys are inserted. 4280dd62a01SPablo de Lara 4290dd62a01SPablo de Lara.. _figure_efd12: 4300dd62a01SPablo de Lara 4310dd62a01SPablo de Lara.. figure:: img/efd_i12.* 4320dd62a01SPablo de Lara 4330dd62a01SPablo de Lara Runtime Group Rebalancing 4340dd62a01SPablo de Lara 4350dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an 4360dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit 4370dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any 4380dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the 4390dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2, 4400dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage), 4410dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a 4420dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result 4430dd62a01SPablo de Larais stored in these two bits. 4440dd62a01SPablo de Lara 4450dd62a01SPablo de Lara 4460dd62a01SPablo de Lara.. _Efd_references: 4470dd62a01SPablo de Lara 4480dd62a01SPablo de LaraReferences 4490dd62a01SPablo de Lara----------- 4500dd62a01SPablo de Lara 4510dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and 4520dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper 4530dd62a01SPablo de Lara“Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al. 4540dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`) 4550dd62a01SPablo de Larafor more information. 456