15630257fSFerruh Yigit.. SPDX-License-Identifier: BSD-3-Clause 25630257fSFerruh Yigit Copyright(c) 2016-2017 Intel Corporation. 30dd62a01SPablo de Lara 4*41dd9a6bSDavid YoungElastic Flow Distributor (EFD) Library 5*41dd9a6bSDavid Young====================================== 60dd62a01SPablo de Lara 70dd62a01SPablo de LaraIntroduction 80dd62a01SPablo de Lara------------ 90dd62a01SPablo de Lara 100dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads 110dd62a01SPablo de Larais a very common task. Many workloads require a deterministic 120dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a 130dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to 140dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage 150dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end 160dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming 170dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their 180dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on. 190dd62a01SPablo de Lara 200dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a 210dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following 220dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the 230dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key 240dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the 250dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and 260dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage 270dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better 280dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally, 290dd62a01SPablo de Larawith the current optimized library implementation, performance is fully 300dd62a01SPablo de Larascalable with any number of CPU cores. 310dd62a01SPablo de Lara 320dd62a01SPablo de LaraFlow Based Distribution 330dd62a01SPablo de Lara----------------------- 340dd62a01SPablo de Lara 350dd62a01SPablo de LaraComputation Based Schemes 360dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 370dd62a01SPablo de Lara 380dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a 390dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple 400dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash 410dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on 420dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the 430dd62a01SPablo de Laraflow key and n is the number of possible targets. 440dd62a01SPablo de Lara 450dd62a01SPablo de Lara.. _figure_efd1: 460dd62a01SPablo de Lara 470dd62a01SPablo de Lara.. figure:: img/efd_i1.* 480dd62a01SPablo de Lara 490dd62a01SPablo de Lara Load Balancing Using Front End Node 500dd62a01SPablo de Lara 510dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer 520dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where 530dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple 540dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence, 550dd62a01SPablo de Larastorage requirements are minimum. 560dd62a01SPablo de Lara 570dd62a01SPablo de Lara.. _figure_efd2: 580dd62a01SPablo de Lara 590dd62a01SPablo de Lara.. figure:: img/efd_i2.* 600dd62a01SPablo de Lara 610dd62a01SPablo de Lara Consistent Hashing 620dd62a01SPablo de Lara 630dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of 640dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`. 650dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow 660dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise 670dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing 680dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of 690dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based 700dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be 710dd62a01SPablo de Lararemapped. 720dd62a01SPablo de Lara 730dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little 740dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system 750dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific 760dd62a01SPablo de Larakey, as this is dictated by the hash function. 770dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize 780dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions, 790dd62a01SPablo de Laratarget load, etc.) is simply not possible. 800dd62a01SPablo de Lara 810dd62a01SPablo de LaraFlow-Table Based Schemes 820dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~ 830dd62a01SPablo de Lara 840dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load 850dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer 860dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given 870dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store 880dd62a01SPablo de Laraboth the flow key and the target value. 890dd62a01SPablo de Lara 900dd62a01SPablo de Lara.. _figure_efd3: 910dd62a01SPablo de Lara 920dd62a01SPablo de Lara.. figure:: img/efd_i3.* 930dd62a01SPablo de Lara 940dd62a01SPablo de Lara Table Based Flow Distribution 950dd62a01SPablo de Lara 960dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table 970dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible, 980dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values 990dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key 1000dd62a01SPablo de Laraand if there is a match the value (target id) is returned. 1010dd62a01SPablo de Lara 1020dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing 1030dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys, 1040dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to 1050dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in 1060dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of 1070dd62a01SPablo de Larathe latency to access the main memory. 1080dd62a01SPablo de Lara 1090dd62a01SPablo de LaraEFD Based Scheme 1100dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 1110dd62a01SPablo de Lara 1120dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based 1130dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for 1140dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained 1150dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key. 1160dd62a01SPablo de Lara 1170dd62a01SPablo de Lara.. _figure_efd4: 1180dd62a01SPablo de Lara 1190dd62a01SPablo de Lara.. figure:: img/efd_i4.* 1200dd62a01SPablo de Lara 1210dd62a01SPablo de Lara Searching for Perfect Hash Function 1220dd62a01SPablo de Lara 1230dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of 1240dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the 1250dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`. 1260dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values, 1270dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and 1280dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables. 1290dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based 1300dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing 1310dd62a01SPablo de Larathat key with the correct hash function. 1320dd62a01SPablo de Lara 1330dd62a01SPablo de Lara.. _figure_efd5: 1340dd62a01SPablo de Lara 1350dd62a01SPablo de Lara.. figure:: img/efd_i5.* 1360dd62a01SPablo de Lara 1370dd62a01SPablo de Lara Divide and Conquer for Millions of Keys 1380dd62a01SPablo de Lara 1390dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number 1400dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively 1410dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`, 1420dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer). 1430dd62a01SPablo de LaraEFD divides the entire input key set into many small groups. 1440dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter 1450dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find 1460dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group. 1470dd62a01SPablo de Lara 1480dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD 1490dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent 1500dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost 1510dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly 1520dd62a01SPablo de Laradesirable feature especially for longer keys. 1530dd62a01SPablo de Lara 1540dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of 1550dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself 1560dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key. 1570dd62a01SPablo de Lara 1580dd62a01SPablo de Lara.. _Efd_example: 1590dd62a01SPablo de Lara 1600dd62a01SPablo de LaraExample of EFD Library Usage 1610dd62a01SPablo de Lara---------------------------- 1620dd62a01SPablo de Lara 1630dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes. 1640dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for 1650dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc. 1660dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load 1670dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded 1680dd62a01SPablo de Larato the target back end server for processing. The system designer would 1690dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server 1700dd62a01SPablo de Larainteraction. 1710dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located 1720dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers). 1730dd62a01SPablo de Lara 1740dd62a01SPablo de Lara.. _figure_efd6: 1750dd62a01SPablo de Lara 1760dd62a01SPablo de Lara.. figure:: img/efd_i6.* 1770dd62a01SPablo de Lara 1780dd62a01SPablo de Lara EFD as a Flow-Level Load Balancer 1790dd62a01SPablo de Lara 1800dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that 1810dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the 1820dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since 1830dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N 1840dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X 1850dd62a01SPablo de Larapossible targets). 1860dd62a01SPablo de Lara 1870dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using 1880dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the 1890dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index 1900dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result 1910dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be 1920dd62a01SPablo de Laraprocessed. 1930dd62a01SPablo de Lara 1940dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but 1950dd62a01SPablo de Lararather distributing the flows to a target back end node based on the 1960dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before 1970dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores 1980dd62a01SPablo de Larathe flows served at each node is used and is 1990dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before 2000dd62a01SPablo de Laraflows. 2010dd62a01SPablo de Lara 2020dd62a01SPablo de Lara.. _Efd_api: 2030dd62a01SPablo de Lara 2040dd62a01SPablo de LaraLibrary API Overview 2050dd62a01SPablo de Lara-------------------- 2060dd62a01SPablo de Lara 2070dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a 2080dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a 2090dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key 2100dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve 2110dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys. 2120dd62a01SPablo de Lara 2130dd62a01SPablo de LaraEFD Table Create 2140dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 2150dd62a01SPablo de Lara 2160dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer 2170dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key. 2180dd62a01SPablo de LaraThe online version of the EFD table (the one that does 2190dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and 2200dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the 2210dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that 2220dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the 2230dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket 2240dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for 2250dd62a01SPablo de Larahighest performance the socket id should match that where the thread is 2260dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same 2270dd62a01SPablo de Larasocket as where the lookup thread is running. 2280dd62a01SPablo de Lara 2290dd62a01SPablo de LaraEFD Insert and Update 2300dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~ 2310dd62a01SPablo de Lara 2320dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is 2330dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to 2340dd62a01SPablo de Laraa new value (target) if the key has already been inserted 2350dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted 2360dd62a01SPablo de Larabefore. It will return 0 upon success. It will return 2370dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the 2380dd62a01SPablo de Laralast available space in the key's group was just used. It will return 2390dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it 2400dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function 2410dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD 2420dd62a01SPablo de Laratable (i.e, same value already exists). 2430dd62a01SPablo de Lara 244b09efeb9SPablo de Lara.. Note:: 245b09efeb9SPablo de Lara 246b09efeb9SPablo de Lara This function is not multi-thread safe and should only be called 247b09efeb9SPablo de Lara from one thread. 248b09efeb9SPablo de Lara 2490dd62a01SPablo de LaraEFD Lookup 2500dd62a01SPablo de Lara~~~~~~~~~~ 2510dd62a01SPablo de Lara 2520dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()`` 2530dd62a01SPablo de Larais used to return the value associated with single key. 2540dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value 2550dd62a01SPablo de Larainserted is returned, if the key has not been inserted before, 2560dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned. 2570dd62a01SPablo de LaraFor better performance and to decrease the overhead of 2580dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup 2590dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key 2600dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function, 2610dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the 2620dd62a01SPablo de Laracorresponding return values will be returned in the value_list. 2630dd62a01SPablo de Lara 264b09efeb9SPablo de Lara.. Note:: 265b09efeb9SPablo de Lara 266b09efeb9SPablo de Lara This function is multi-thread safe, but there should not be other threads 267b09efeb9SPablo de Lara writing in the EFD table, unless locks are used. 268b09efeb9SPablo de Lara 2690dd62a01SPablo de LaraEFD Delete 2700dd62a01SPablo de Lara~~~~~~~~~~ 2710dd62a01SPablo de Lara 2720dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function 2730dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success 2740dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to 2750dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id. 2760dd62a01SPablo de LaraThe previous value associated with this key will be returned 2770dd62a01SPablo de Larain the prev_value argument. 2780dd62a01SPablo de Lara 279b09efeb9SPablo de Lara.. Note:: 280b09efeb9SPablo de Lara 281b09efeb9SPablo de Lara This function is not multi-thread safe and should only be called 282b09efeb9SPablo de Lara from one thread. 283b09efeb9SPablo de Lara 2840dd62a01SPablo de Lara.. _Efd_internals: 2850dd62a01SPablo de Lara 2860dd62a01SPablo de LaraLibrary Internals 2870dd62a01SPablo de Lara----------------- 2880dd62a01SPablo de Lara 2890dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview 2900dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this 2910dd62a01SPablo de Larasection is to explain to readers the high-level implementation of 2920dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library. 2930dd62a01SPablo de Lara 2940dd62a01SPablo de LaraInsert Function Internals 2950dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 2960dd62a01SPablo de Lara 2970dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into 2980dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the 2990dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD 3000dd62a01SPablo de Larastores two version of the <key,value> table: 3010dd62a01SPablo de Lara 3020dd62a01SPablo de Lara- Offline Version (in memory): Only used for the insertion/update 3030dd62a01SPablo de Lara operation, which is less frequent than the lookup operation. In the 3040dd62a01SPablo de Lara offline version the exact keys for each group is stored. When a new 3050dd62a01SPablo de Lara key is added, the hash function is updated that will satisfy the 3060dd62a01SPablo de Lara value for the new key together with the all old keys already inserted 3070dd62a01SPablo de Lara in this group. 3080dd62a01SPablo de Lara 3090dd62a01SPablo de Lara- Online Version (in cache): Used for the frequent lookup operation. In 3100dd62a01SPablo de Lara the online version, as previously mentioned, the keys are not stored 3110dd62a01SPablo de Lara but rather only the hash index for each group. 3120dd62a01SPablo de Lara 3130dd62a01SPablo de Lara.. _figure_efd7: 3140dd62a01SPablo de Lara 3150dd62a01SPablo de Lara.. figure:: img/efd_i7.* 3160dd62a01SPablo de Lara 3170dd62a01SPablo de Lara Group Assignment 3180dd62a01SPablo de Lara 3190dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example. 3200dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is 3210dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be 3220dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below). 3230dd62a01SPablo de Lara 3240dd62a01SPablo de Lara.. _figure_efd8: 3250dd62a01SPablo de Lara 3260dd62a01SPablo de Lara.. figure:: img/efd_i8.* 3270dd62a01SPablo de Lara 3280dd62a01SPablo de Lara Perfect Hash Search - Assigned Keys & Target Value 3290dd62a01SPablo de Lara 3300dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search 3310dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target 3320dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD 3330dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group 3340dd62a01SPablo de Laraper value bit. 3350dd62a01SPablo de Lara 3360dd62a01SPablo de Lara.. _figure_efd9: 3370dd62a01SPablo de Lara 3380dd62a01SPablo de Lara.. figure:: img/efd_i9.* 3390dd62a01SPablo de Lara 3400dd62a01SPablo de Lara Perfect Hash Search - Satisfy Target Values 3410dd62a01SPablo de Lara 3420dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))`` 3430dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value, 3440dd62a01SPablo de Laraas shown in :numref:`figure_efd9`. 3450dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the 3460dd62a01SPablo de Larahash index until a non conflicting lookup_table is found. 3470dd62a01SPablo de Lara 3480dd62a01SPablo de Lara.. _figure_efd10: 3490dd62a01SPablo de Lara 3500dd62a01SPablo de Lara.. figure:: img/efd_i10.* 3510dd62a01SPablo de Lara 3520dd62a01SPablo de Lara Finding Hash Index for Conflict Free lookup_table 3530dd62a01SPablo de Lara 3540dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it 3550dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the 3560dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps 3570dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table, 3580dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same. 3590dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no 3600dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is 3610dd62a01SPablo de Lararepeated for each bit of target value. 3620dd62a01SPablo de Lara 3630dd62a01SPablo de LaraLookup Function Internals 3640dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 3650dd62a01SPablo de Lara 3660dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than 3670dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are 3680dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are 3690dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed). 3700dd62a01SPablo de Lara 3710dd62a01SPablo de Lara.. _figure_efd11: 3720dd62a01SPablo de Lara 3730dd62a01SPablo de Lara.. figure:: img/efd_i11.* 3740dd62a01SPablo de Lara 3750dd62a01SPablo de Lara EFD Lookup Operation 3760dd62a01SPablo de Lara 3770dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key, 3780dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this 3790dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index, 3800dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will 3810dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this 3820dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each 3830dd62a01SPablo de Larabit of the target value. 3840dd62a01SPablo de Lara 3850dd62a01SPablo de LaraGroup Rebalancing Function Internals 3860dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3870dd62a01SPablo de Lara 3880dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by 3890dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However, 3900dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a 3910dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced 3920dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups. 3930dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys, 3940dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so 3950dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash 3960dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during 3970dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less 3980dd62a01SPablo de Laracrowded group as the more keys are inserted. 3990dd62a01SPablo de Lara 4000dd62a01SPablo de Lara.. _figure_efd12: 4010dd62a01SPablo de Lara 4020dd62a01SPablo de Lara.. figure:: img/efd_i12.* 4030dd62a01SPablo de Lara 4040dd62a01SPablo de Lara Runtime Group Rebalancing 4050dd62a01SPablo de Lara 4060dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an 4070dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit 4080dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any 4090dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the 4100dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2, 4110dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage), 4120dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a 4130dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result 4140dd62a01SPablo de Larais stored in these two bits. 4150dd62a01SPablo de Lara 4160dd62a01SPablo de Lara 4170dd62a01SPablo de Lara.. _Efd_references: 4180dd62a01SPablo de Lara 4190dd62a01SPablo de LaraReferences 4200dd62a01SPablo de Lara----------- 4210dd62a01SPablo de Lara 4220dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and 4230dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper 424d629b7b5SJohn McNamara"Scaling Up Clustered Network Appliances with ScaleBricks" Dong Zhou et al. 4250dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`) 4260dd62a01SPablo de Larafor more information. 427