xref: /dpdk/doc/guides/prog_guide/efd_lib.rst (revision 41dd9a6bc2d9c6e20e139ad713cc9d172572dd43)
15630257fSFerruh Yigit..  SPDX-License-Identifier: BSD-3-Clause
25630257fSFerruh Yigit    Copyright(c) 2016-2017 Intel Corporation.
30dd62a01SPablo de Lara
4*41dd9a6bSDavid YoungElastic Flow Distributor (EFD) Library
5*41dd9a6bSDavid Young======================================
60dd62a01SPablo de Lara
70dd62a01SPablo de LaraIntroduction
80dd62a01SPablo de Lara------------
90dd62a01SPablo de Lara
100dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads
110dd62a01SPablo de Larais a very common task. Many workloads require a deterministic
120dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a
130dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to
140dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage
150dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end
160dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming
170dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their
180dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on.
190dd62a01SPablo de Lara
200dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a
210dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following
220dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the
230dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key
240dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the
250dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and
260dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage
270dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better
280dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally,
290dd62a01SPablo de Larawith the current optimized library implementation, performance is fully
300dd62a01SPablo de Larascalable with any number of CPU cores.
310dd62a01SPablo de Lara
320dd62a01SPablo de LaraFlow Based Distribution
330dd62a01SPablo de Lara-----------------------
340dd62a01SPablo de Lara
350dd62a01SPablo de LaraComputation Based Schemes
360dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
370dd62a01SPablo de Lara
380dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a
390dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple
400dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash
410dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on
420dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the
430dd62a01SPablo de Laraflow key and n is the number of possible targets.
440dd62a01SPablo de Lara
450dd62a01SPablo de Lara.. _figure_efd1:
460dd62a01SPablo de Lara
470dd62a01SPablo de Lara.. figure:: img/efd_i1.*
480dd62a01SPablo de Lara
490dd62a01SPablo de Lara  Load Balancing Using Front End Node
500dd62a01SPablo de Lara
510dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
520dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where
530dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple
540dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence,
550dd62a01SPablo de Larastorage requirements are minimum.
560dd62a01SPablo de Lara
570dd62a01SPablo de Lara.. _figure_efd2:
580dd62a01SPablo de Lara
590dd62a01SPablo de Lara.. figure:: img/efd_i2.*
600dd62a01SPablo de Lara
610dd62a01SPablo de Lara  Consistent Hashing
620dd62a01SPablo de Lara
630dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of
640dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
650dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow
660dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise
670dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing
680dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of
690dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based
700dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be
710dd62a01SPablo de Lararemapped.
720dd62a01SPablo de Lara
730dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little
740dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system
750dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific
760dd62a01SPablo de Larakey, as this is dictated by the hash function.
770dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize
780dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions,
790dd62a01SPablo de Laratarget load, etc.) is simply not possible.
800dd62a01SPablo de Lara
810dd62a01SPablo de LaraFlow-Table Based Schemes
820dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~
830dd62a01SPablo de Lara
840dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load
850dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer
860dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given
870dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store
880dd62a01SPablo de Laraboth the flow key and the target value.
890dd62a01SPablo de Lara
900dd62a01SPablo de Lara.. _figure_efd3:
910dd62a01SPablo de Lara
920dd62a01SPablo de Lara.. figure:: img/efd_i3.*
930dd62a01SPablo de Lara
940dd62a01SPablo de Lara  Table Based Flow Distribution
950dd62a01SPablo de Lara
960dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table
970dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible,
980dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values
990dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key
1000dd62a01SPablo de Laraand if there is a match the value (target id) is returned.
1010dd62a01SPablo de Lara
1020dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing
1030dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys,
1040dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to
1050dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in
1060dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of
1070dd62a01SPablo de Larathe latency to access the main memory.
1080dd62a01SPablo de Lara
1090dd62a01SPablo de LaraEFD Based Scheme
1100dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
1110dd62a01SPablo de Lara
1120dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based
1130dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for
1140dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained
1150dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key.
1160dd62a01SPablo de Lara
1170dd62a01SPablo de Lara.. _figure_efd4:
1180dd62a01SPablo de Lara
1190dd62a01SPablo de Lara.. figure:: img/efd_i4.*
1200dd62a01SPablo de Lara
1210dd62a01SPablo de Lara  Searching for Perfect Hash Function
1220dd62a01SPablo de Lara
1230dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of
1240dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the
1250dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`.
1260dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values,
1270dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and
1280dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables.
1290dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based
1300dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing
1310dd62a01SPablo de Larathat key with the correct hash function.
1320dd62a01SPablo de Lara
1330dd62a01SPablo de Lara.. _figure_efd5:
1340dd62a01SPablo de Lara
1350dd62a01SPablo de Lara.. figure:: img/efd_i5.*
1360dd62a01SPablo de Lara
1370dd62a01SPablo de Lara  Divide and Conquer for Millions of Keys
1380dd62a01SPablo de Lara
1390dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number
1400dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively
1410dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`,
1420dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer).
1430dd62a01SPablo de LaraEFD divides the entire input key set into many small groups.
1440dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter
1450dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find
1460dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group.
1470dd62a01SPablo de Lara
1480dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD
1490dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent
1500dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost
1510dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly
1520dd62a01SPablo de Laradesirable feature especially for longer keys.
1530dd62a01SPablo de Lara
1540dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of
1550dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself
1560dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key.
1570dd62a01SPablo de Lara
1580dd62a01SPablo de Lara.. _Efd_example:
1590dd62a01SPablo de Lara
1600dd62a01SPablo de LaraExample of EFD Library Usage
1610dd62a01SPablo de Lara----------------------------
1620dd62a01SPablo de Lara
1630dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes.
1640dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for
1650dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
1660dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load
1670dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded
1680dd62a01SPablo de Larato the target back end server for processing. The system designer would
1690dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server
1700dd62a01SPablo de Larainteraction.
1710dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located
1720dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers).
1730dd62a01SPablo de Lara
1740dd62a01SPablo de Lara.. _figure_efd6:
1750dd62a01SPablo de Lara
1760dd62a01SPablo de Lara.. figure:: img/efd_i6.*
1770dd62a01SPablo de Lara
1780dd62a01SPablo de Lara  EFD as a Flow-Level Load Balancer
1790dd62a01SPablo de Lara
1800dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that
1810dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the
1820dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since
1830dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N
1840dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X
1850dd62a01SPablo de Larapossible targets).
1860dd62a01SPablo de Lara
1870dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using
1880dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the
1890dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index
1900dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result
1910dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be
1920dd62a01SPablo de Laraprocessed.
1930dd62a01SPablo de Lara
1940dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but
1950dd62a01SPablo de Lararather distributing the flows to a target back end node based on the
1960dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before
1970dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores
1980dd62a01SPablo de Larathe flows served at each node is used and is
1990dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before
2000dd62a01SPablo de Laraflows.
2010dd62a01SPablo de Lara
2020dd62a01SPablo de Lara.. _Efd_api:
2030dd62a01SPablo de Lara
2040dd62a01SPablo de LaraLibrary API Overview
2050dd62a01SPablo de Lara--------------------
2060dd62a01SPablo de Lara
2070dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a
2080dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a
2090dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key
2100dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve
2110dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys.
2120dd62a01SPablo de Lara
2130dd62a01SPablo de LaraEFD Table Create
2140dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
2150dd62a01SPablo de Lara
2160dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer
2170dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key.
2180dd62a01SPablo de LaraThe online version of the EFD table (the one that does
2190dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and
2200dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the
2210dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that
2220dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the
2230dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket
2240dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for
2250dd62a01SPablo de Larahighest performance the socket id should match that where the thread is
2260dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same
2270dd62a01SPablo de Larasocket as where the lookup thread is running.
2280dd62a01SPablo de Lara
2290dd62a01SPablo de LaraEFD Insert and Update
2300dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~
2310dd62a01SPablo de Lara
2320dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is
2330dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to
2340dd62a01SPablo de Laraa new value (target) if the key has already been inserted
2350dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted
2360dd62a01SPablo de Larabefore. It will return 0 upon success. It will return
2370dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
2380dd62a01SPablo de Laralast available space in the key's group was just used. It will return
2390dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
2400dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function
2410dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
2420dd62a01SPablo de Laratable (i.e, same value already exists).
2430dd62a01SPablo de Lara
244b09efeb9SPablo de Lara.. Note::
245b09efeb9SPablo de Lara
246b09efeb9SPablo de Lara   This function is not multi-thread safe and should only be called
247b09efeb9SPablo de Lara   from one thread.
248b09efeb9SPablo de Lara
2490dd62a01SPablo de LaraEFD Lookup
2500dd62a01SPablo de Lara~~~~~~~~~~
2510dd62a01SPablo de Lara
2520dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()``
2530dd62a01SPablo de Larais used to return the value associated with single key.
2540dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value
2550dd62a01SPablo de Larainserted is returned, if the key has not been inserted before,
2560dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned.
2570dd62a01SPablo de LaraFor better performance and to decrease the overhead of
2580dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup
2590dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key
2600dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function,
2610dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the
2620dd62a01SPablo de Laracorresponding return values will be returned in the value_list.
2630dd62a01SPablo de Lara
264b09efeb9SPablo de Lara.. Note::
265b09efeb9SPablo de Lara
266b09efeb9SPablo de Lara   This function is multi-thread safe, but there should not be other threads
267b09efeb9SPablo de Lara   writing in the EFD table, unless locks are used.
268b09efeb9SPablo de Lara
2690dd62a01SPablo de LaraEFD Delete
2700dd62a01SPablo de Lara~~~~~~~~~~
2710dd62a01SPablo de Lara
2720dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function
2730dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success
2740dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to
2750dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id.
2760dd62a01SPablo de LaraThe previous value associated with this key will be returned
2770dd62a01SPablo de Larain the prev_value argument.
2780dd62a01SPablo de Lara
279b09efeb9SPablo de Lara.. Note::
280b09efeb9SPablo de Lara
281b09efeb9SPablo de Lara   This function is not multi-thread safe and should only be called
282b09efeb9SPablo de Lara   from one thread.
283b09efeb9SPablo de Lara
2840dd62a01SPablo de Lara.. _Efd_internals:
2850dd62a01SPablo de Lara
2860dd62a01SPablo de LaraLibrary Internals
2870dd62a01SPablo de Lara-----------------
2880dd62a01SPablo de Lara
2890dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview
2900dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this
2910dd62a01SPablo de Larasection is to explain to readers the high-level implementation of
2920dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library.
2930dd62a01SPablo de Lara
2940dd62a01SPablo de LaraInsert Function Internals
2950dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
2960dd62a01SPablo de Lara
2970dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into
2980dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the
2990dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD
3000dd62a01SPablo de Larastores two version of the <key,value> table:
3010dd62a01SPablo de Lara
3020dd62a01SPablo de Lara-  Offline Version (in memory): Only used for the insertion/update
3030dd62a01SPablo de Lara   operation, which is less frequent than the lookup operation. In the
3040dd62a01SPablo de Lara   offline version the exact keys for each group is stored. When a new
3050dd62a01SPablo de Lara   key is added, the hash function is updated that will satisfy the
3060dd62a01SPablo de Lara   value for the new key together with the all old keys already inserted
3070dd62a01SPablo de Lara   in this group.
3080dd62a01SPablo de Lara
3090dd62a01SPablo de Lara-  Online Version (in cache): Used for the frequent lookup operation. In
3100dd62a01SPablo de Lara   the online version, as previously mentioned, the keys are not stored
3110dd62a01SPablo de Lara   but rather only the hash index for each group.
3120dd62a01SPablo de Lara
3130dd62a01SPablo de Lara.. _figure_efd7:
3140dd62a01SPablo de Lara
3150dd62a01SPablo de Lara.. figure:: img/efd_i7.*
3160dd62a01SPablo de Lara
3170dd62a01SPablo de Lara  Group Assignment
3180dd62a01SPablo de Lara
3190dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
3200dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is
3210dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be
3220dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below).
3230dd62a01SPablo de Lara
3240dd62a01SPablo de Lara.. _figure_efd8:
3250dd62a01SPablo de Lara
3260dd62a01SPablo de Lara.. figure:: img/efd_i8.*
3270dd62a01SPablo de Lara
3280dd62a01SPablo de Lara  Perfect Hash Search - Assigned Keys & Target Value
3290dd62a01SPablo de Lara
3300dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search
3310dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target
3320dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD
3330dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group
3340dd62a01SPablo de Laraper value bit.
3350dd62a01SPablo de Lara
3360dd62a01SPablo de Lara.. _figure_efd9:
3370dd62a01SPablo de Lara
3380dd62a01SPablo de Lara.. figure:: img/efd_i9.*
3390dd62a01SPablo de Lara
3400dd62a01SPablo de Lara  Perfect Hash Search - Satisfy Target Values
3410dd62a01SPablo de Lara
3420dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
3430dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value,
3440dd62a01SPablo de Laraas shown in :numref:`figure_efd9`.
3450dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the
3460dd62a01SPablo de Larahash index until a non conflicting lookup_table is found.
3470dd62a01SPablo de Lara
3480dd62a01SPablo de Lara.. _figure_efd10:
3490dd62a01SPablo de Lara
3500dd62a01SPablo de Lara.. figure:: img/efd_i10.*
3510dd62a01SPablo de Lara
3520dd62a01SPablo de Lara  Finding Hash Index for Conflict Free lookup_table
3530dd62a01SPablo de Lara
3540dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it
3550dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the
3560dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps
3570dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table,
3580dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same.
3590dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no
3600dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is
3610dd62a01SPablo de Lararepeated for each bit of target value.
3620dd62a01SPablo de Lara
3630dd62a01SPablo de LaraLookup Function Internals
3640dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
3650dd62a01SPablo de Lara
3660dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than
3670dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are
3680dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are
3690dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed).
3700dd62a01SPablo de Lara
3710dd62a01SPablo de Lara.. _figure_efd11:
3720dd62a01SPablo de Lara
3730dd62a01SPablo de Lara.. figure:: img/efd_i11.*
3740dd62a01SPablo de Lara
3750dd62a01SPablo de Lara  EFD Lookup Operation
3760dd62a01SPablo de Lara
3770dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
3780dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this
3790dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index,
3800dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will
3810dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this
3820dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each
3830dd62a01SPablo de Larabit of the target value.
3840dd62a01SPablo de Lara
3850dd62a01SPablo de LaraGroup Rebalancing Function Internals
3860dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3870dd62a01SPablo de Lara
3880dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by
3890dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However,
3900dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a
3910dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced
3920dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups.
3930dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys,
3940dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so
3950dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash
3960dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during
3970dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less
3980dd62a01SPablo de Laracrowded group as the more keys are inserted.
3990dd62a01SPablo de Lara
4000dd62a01SPablo de Lara.. _figure_efd12:
4010dd62a01SPablo de Lara
4020dd62a01SPablo de Lara.. figure:: img/efd_i12.*
4030dd62a01SPablo de Lara
4040dd62a01SPablo de Lara  Runtime Group Rebalancing
4050dd62a01SPablo de Lara
4060dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an
4070dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit
4080dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any
4090dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the
4100dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2,
4110dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage),
4120dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a
4130dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result
4140dd62a01SPablo de Larais stored in these two bits.
4150dd62a01SPablo de Lara
4160dd62a01SPablo de Lara
4170dd62a01SPablo de Lara.. _Efd_references:
4180dd62a01SPablo de Lara
4190dd62a01SPablo de LaraReferences
4200dd62a01SPablo de Lara-----------
4210dd62a01SPablo de Lara
4220dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and
4230dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper
424d629b7b5SJohn McNamara"Scaling Up Clustered Network Appliances with ScaleBricks" Dong Zhou et al.
4250dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`)
4260dd62a01SPablo de Larafor more information.
427