xref: /dpdk/doc/guides/prog_guide/efd_lib.rst (revision b09efeb9d56659b4f4966d693916d8d547397a73)
10dd62a01SPablo de Lara..  BSD LICENSE
20dd62a01SPablo de Lara    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
30dd62a01SPablo de Lara    All rights reserved.
40dd62a01SPablo de Lara
50dd62a01SPablo de Lara    Redistribution and use in source and binary forms, with or without
60dd62a01SPablo de Lara    modification, are permitted provided that the following conditions
70dd62a01SPablo de Lara    are met:
80dd62a01SPablo de Lara
90dd62a01SPablo de Lara    * Redistributions of source code must retain the above copyright
100dd62a01SPablo de Lara    notice, this list of conditions and the following disclaimer.
110dd62a01SPablo de Lara    * Redistributions in binary form must reproduce the above copyright
120dd62a01SPablo de Lara    notice, this list of conditions and the following disclaimer in
130dd62a01SPablo de Lara    the documentation and/or other materials provided with the
140dd62a01SPablo de Lara    distribution.
150dd62a01SPablo de Lara    * Neither the name of Intel Corporation nor the names of its
160dd62a01SPablo de Lara    contributors may be used to endorse or promote products derived
170dd62a01SPablo de Lara    from this software without specific prior written permission.
180dd62a01SPablo de Lara
190dd62a01SPablo de Lara    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
200dd62a01SPablo de Lara    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
210dd62a01SPablo de Lara    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
220dd62a01SPablo de Lara    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
230dd62a01SPablo de Lara    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
240dd62a01SPablo de Lara    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
250dd62a01SPablo de Lara    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
260dd62a01SPablo de Lara    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
270dd62a01SPablo de Lara    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
280dd62a01SPablo de Lara    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
290dd62a01SPablo de Lara    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
300dd62a01SPablo de Lara
310dd62a01SPablo de Lara.. _Efd_Library:
320dd62a01SPablo de Lara
330dd62a01SPablo de LaraElastic Flow Distributor Library
340dd62a01SPablo de Lara================================
350dd62a01SPablo de Lara
360dd62a01SPablo de LaraIntroduction
370dd62a01SPablo de Lara------------
380dd62a01SPablo de Lara
390dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads
400dd62a01SPablo de Larais a very common task. Many workloads require a deterministic
410dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a
420dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to
430dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage
440dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end
450dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming
460dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their
470dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on.
480dd62a01SPablo de Lara
490dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a
500dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following
510dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the
520dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key
530dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the
540dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and
550dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage
560dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better
570dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally,
580dd62a01SPablo de Larawith the current optimized library implementation, performance is fully
590dd62a01SPablo de Larascalable with any number of CPU cores.
600dd62a01SPablo de Lara
610dd62a01SPablo de LaraFlow Based Distribution
620dd62a01SPablo de Lara-----------------------
630dd62a01SPablo de Lara
640dd62a01SPablo de LaraComputation Based Schemes
650dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
660dd62a01SPablo de Lara
670dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a
680dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple
690dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash
700dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on
710dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the
720dd62a01SPablo de Laraflow key and n is the number of possible targets.
730dd62a01SPablo de Lara
740dd62a01SPablo de Lara.. _figure_efd1:
750dd62a01SPablo de Lara
760dd62a01SPablo de Lara.. figure:: img/efd_i1.*
770dd62a01SPablo de Lara
780dd62a01SPablo de Lara  Load Balancing Using Front End Node
790dd62a01SPablo de Lara
800dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
810dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where
820dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple
830dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence,
840dd62a01SPablo de Larastorage requirements are minimum.
850dd62a01SPablo de Lara
860dd62a01SPablo de Lara.. _figure_efd2:
870dd62a01SPablo de Lara
880dd62a01SPablo de Lara.. figure:: img/efd_i2.*
890dd62a01SPablo de Lara
900dd62a01SPablo de Lara  Consistent Hashing
910dd62a01SPablo de Lara
920dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of
930dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
940dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow
950dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise
960dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing
970dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of
980dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based
990dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be
1000dd62a01SPablo de Lararemapped.
1010dd62a01SPablo de Lara
1020dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little
1030dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system
1040dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific
1050dd62a01SPablo de Larakey, as this is dictated by the hash function.
1060dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize
1070dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions,
1080dd62a01SPablo de Laratarget load, etc.) is simply not possible.
1090dd62a01SPablo de Lara
1100dd62a01SPablo de LaraFlow-Table Based Schemes
1110dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~
1120dd62a01SPablo de Lara
1130dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load
1140dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer
1150dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given
1160dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store
1170dd62a01SPablo de Laraboth the flow key and the target value.
1180dd62a01SPablo de Lara
1190dd62a01SPablo de Lara.. _figure_efd3:
1200dd62a01SPablo de Lara
1210dd62a01SPablo de Lara.. figure:: img/efd_i3.*
1220dd62a01SPablo de Lara
1230dd62a01SPablo de Lara  Table Based Flow Distribution
1240dd62a01SPablo de Lara
1250dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table
1260dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible,
1270dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values
1280dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key
1290dd62a01SPablo de Laraand if there is a match the value (target id) is returned.
1300dd62a01SPablo de Lara
1310dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing
1320dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys,
1330dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to
1340dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in
1350dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of
1360dd62a01SPablo de Larathe latency to access the main memory.
1370dd62a01SPablo de Lara
1380dd62a01SPablo de LaraEFD Based Scheme
1390dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
1400dd62a01SPablo de Lara
1410dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based
1420dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for
1430dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained
1440dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key.
1450dd62a01SPablo de Lara
1460dd62a01SPablo de Lara.. _figure_efd4:
1470dd62a01SPablo de Lara
1480dd62a01SPablo de Lara.. figure:: img/efd_i4.*
1490dd62a01SPablo de Lara
1500dd62a01SPablo de Lara  Searching for Perfect Hash Function
1510dd62a01SPablo de Lara
1520dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of
1530dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the
1540dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`.
1550dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values,
1560dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and
1570dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables.
1580dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based
1590dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing
1600dd62a01SPablo de Larathat key with the correct hash function.
1610dd62a01SPablo de Lara
1620dd62a01SPablo de Lara.. _figure_efd5:
1630dd62a01SPablo de Lara
1640dd62a01SPablo de Lara.. figure:: img/efd_i5.*
1650dd62a01SPablo de Lara
1660dd62a01SPablo de Lara  Divide and Conquer for Millions of Keys
1670dd62a01SPablo de Lara
1680dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number
1690dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively
1700dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`,
1710dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer).
1720dd62a01SPablo de LaraEFD divides the entire input key set into many small groups.
1730dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter
1740dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find
1750dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group.
1760dd62a01SPablo de Lara
1770dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD
1780dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent
1790dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost
1800dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly
1810dd62a01SPablo de Laradesirable feature especially for longer keys.
1820dd62a01SPablo de Lara
1830dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of
1840dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself
1850dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key.
1860dd62a01SPablo de Lara
1870dd62a01SPablo de Lara.. _Efd_example:
1880dd62a01SPablo de Lara
1890dd62a01SPablo de LaraExample of EFD Library Usage
1900dd62a01SPablo de Lara----------------------------
1910dd62a01SPablo de Lara
1920dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes.
1930dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for
1940dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
1950dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load
1960dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded
1970dd62a01SPablo de Larato the target back end server for processing. The system designer would
1980dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server
1990dd62a01SPablo de Larainteraction.
2000dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located
2010dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers).
2020dd62a01SPablo de Lara
2030dd62a01SPablo de Lara.. _figure_efd6:
2040dd62a01SPablo de Lara
2050dd62a01SPablo de Lara.. figure:: img/efd_i6.*
2060dd62a01SPablo de Lara
2070dd62a01SPablo de Lara  EFD as a Flow-Level Load Balancer
2080dd62a01SPablo de Lara
2090dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that
2100dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the
2110dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since
2120dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N
2130dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X
2140dd62a01SPablo de Larapossible targets).
2150dd62a01SPablo de Lara
2160dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using
2170dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the
2180dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index
2190dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result
2200dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be
2210dd62a01SPablo de Laraprocessed.
2220dd62a01SPablo de Lara
2230dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but
2240dd62a01SPablo de Lararather distributing the flows to a target back end node based on the
2250dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before
2260dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores
2270dd62a01SPablo de Larathe flows served at each node is used and is
2280dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before
2290dd62a01SPablo de Laraflows.
2300dd62a01SPablo de Lara
2310dd62a01SPablo de Lara.. _Efd_api:
2320dd62a01SPablo de Lara
2330dd62a01SPablo de LaraLibrary API Overview
2340dd62a01SPablo de Lara--------------------
2350dd62a01SPablo de Lara
2360dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a
2370dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a
2380dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key
2390dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve
2400dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys.
2410dd62a01SPablo de Lara
2420dd62a01SPablo de LaraEFD Table Create
2430dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
2440dd62a01SPablo de Lara
2450dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer
2460dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key.
2470dd62a01SPablo de LaraThe online version of the EFD table (the one that does
2480dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and
2490dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the
2500dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that
2510dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the
2520dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket
2530dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for
2540dd62a01SPablo de Larahighest performance the socket id should match that where the thread is
2550dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same
2560dd62a01SPablo de Larasocket as where the lookup thread is running.
2570dd62a01SPablo de Lara
2580dd62a01SPablo de LaraEFD Insert and Update
2590dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~
2600dd62a01SPablo de Lara
2610dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is
2620dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to
2630dd62a01SPablo de Laraa new value (target) if the key has already been inserted
2640dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted
2650dd62a01SPablo de Larabefore. It will return 0 upon success. It will return
2660dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
2670dd62a01SPablo de Laralast available space in the key's group was just used. It will return
2680dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
2690dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function
2700dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
2710dd62a01SPablo de Laratable (i.e, same value already exists).
2720dd62a01SPablo de Lara
273*b09efeb9SPablo de Lara.. Note::
274*b09efeb9SPablo de Lara
275*b09efeb9SPablo de Lara   This function is not multi-thread safe and should only be called
276*b09efeb9SPablo de Lara   from one thread.
277*b09efeb9SPablo de Lara
2780dd62a01SPablo de LaraEFD Lookup
2790dd62a01SPablo de Lara~~~~~~~~~~
2800dd62a01SPablo de Lara
2810dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()``
2820dd62a01SPablo de Larais used to return the value associated with single key.
2830dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value
2840dd62a01SPablo de Larainserted is returned, if the key has not been inserted before,
2850dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned.
2860dd62a01SPablo de LaraFor better performance and to decrease the overhead of
2870dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup
2880dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key
2890dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function,
2900dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the
2910dd62a01SPablo de Laracorresponding return values will be returned in the value_list.
2920dd62a01SPablo de Lara
293*b09efeb9SPablo de Lara.. Note::
294*b09efeb9SPablo de Lara
295*b09efeb9SPablo de Lara   This function is multi-thread safe, but there should not be other threads
296*b09efeb9SPablo de Lara   writing in the EFD table, unless locks are used.
297*b09efeb9SPablo de Lara
2980dd62a01SPablo de LaraEFD Delete
2990dd62a01SPablo de Lara~~~~~~~~~~
3000dd62a01SPablo de Lara
3010dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function
3020dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success
3030dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to
3040dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id.
3050dd62a01SPablo de LaraThe previous value associated with this key will be returned
3060dd62a01SPablo de Larain the prev_value argument.
3070dd62a01SPablo de Lara
308*b09efeb9SPablo de Lara.. Note::
309*b09efeb9SPablo de Lara
310*b09efeb9SPablo de Lara   This function is not multi-thread safe and should only be called
311*b09efeb9SPablo de Lara   from one thread.
312*b09efeb9SPablo de Lara
3130dd62a01SPablo de Lara.. _Efd_internals:
3140dd62a01SPablo de Lara
3150dd62a01SPablo de LaraLibrary Internals
3160dd62a01SPablo de Lara-----------------
3170dd62a01SPablo de Lara
3180dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview
3190dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this
3200dd62a01SPablo de Larasection is to explain to readers the high-level implementation of
3210dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library.
3220dd62a01SPablo de Lara
3230dd62a01SPablo de LaraInsert Function Internals
3240dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
3250dd62a01SPablo de Lara
3260dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into
3270dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the
3280dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD
3290dd62a01SPablo de Larastores two version of the <key,value> table:
3300dd62a01SPablo de Lara
3310dd62a01SPablo de Lara-  Offline Version (in memory): Only used for the insertion/update
3320dd62a01SPablo de Lara   operation, which is less frequent than the lookup operation. In the
3330dd62a01SPablo de Lara   offline version the exact keys for each group is stored. When a new
3340dd62a01SPablo de Lara   key is added, the hash function is updated that will satisfy the
3350dd62a01SPablo de Lara   value for the new key together with the all old keys already inserted
3360dd62a01SPablo de Lara   in this group.
3370dd62a01SPablo de Lara
3380dd62a01SPablo de Lara-  Online Version (in cache): Used for the frequent lookup operation. In
3390dd62a01SPablo de Lara   the online version, as previously mentioned, the keys are not stored
3400dd62a01SPablo de Lara   but rather only the hash index for each group.
3410dd62a01SPablo de Lara
3420dd62a01SPablo de Lara.. _figure_efd7:
3430dd62a01SPablo de Lara
3440dd62a01SPablo de Lara.. figure:: img/efd_i7.*
3450dd62a01SPablo de Lara
3460dd62a01SPablo de Lara  Group Assignment
3470dd62a01SPablo de Lara
3480dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
3490dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is
3500dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be
3510dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below).
3520dd62a01SPablo de Lara
3530dd62a01SPablo de Lara.. _figure_efd8:
3540dd62a01SPablo de Lara
3550dd62a01SPablo de Lara.. figure:: img/efd_i8.*
3560dd62a01SPablo de Lara
3570dd62a01SPablo de Lara  Perfect Hash Search - Assigned Keys & Target Value
3580dd62a01SPablo de Lara
3590dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search
3600dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target
3610dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD
3620dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group
3630dd62a01SPablo de Laraper value bit.
3640dd62a01SPablo de Lara
3650dd62a01SPablo de Lara.. _figure_efd9:
3660dd62a01SPablo de Lara
3670dd62a01SPablo de Lara.. figure:: img/efd_i9.*
3680dd62a01SPablo de Lara
3690dd62a01SPablo de Lara  Perfect Hash Search - Satisfy Target Values
3700dd62a01SPablo de Lara
3710dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
3720dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value,
3730dd62a01SPablo de Laraas shown in :numref:`figure_efd9`.
3740dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the
3750dd62a01SPablo de Larahash index until a non conflicting lookup_table is found.
3760dd62a01SPablo de Lara
3770dd62a01SPablo de Lara.. _figure_efd10:
3780dd62a01SPablo de Lara
3790dd62a01SPablo de Lara.. figure:: img/efd_i10.*
3800dd62a01SPablo de Lara
3810dd62a01SPablo de Lara  Finding Hash Index for Conflict Free lookup_table
3820dd62a01SPablo de Lara
3830dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it
3840dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the
3850dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps
3860dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table,
3870dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same.
3880dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no
3890dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is
3900dd62a01SPablo de Lararepeated for each bit of target value.
3910dd62a01SPablo de Lara
3920dd62a01SPablo de LaraLookup Function Internals
3930dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
3940dd62a01SPablo de Lara
3950dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than
3960dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are
3970dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are
3980dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed).
3990dd62a01SPablo de Lara
4000dd62a01SPablo de Lara.. _figure_efd11:
4010dd62a01SPablo de Lara
4020dd62a01SPablo de Lara.. figure:: img/efd_i11.*
4030dd62a01SPablo de Lara
4040dd62a01SPablo de Lara  EFD Lookup Operation
4050dd62a01SPablo de Lara
4060dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
4070dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this
4080dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index,
4090dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will
4100dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this
4110dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each
4120dd62a01SPablo de Larabit of the target value.
4130dd62a01SPablo de Lara
4140dd62a01SPablo de LaraGroup Rebalancing Function Internals
4150dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4160dd62a01SPablo de Lara
4170dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by
4180dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However,
4190dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a
4200dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced
4210dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups.
4220dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys,
4230dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so
4240dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash
4250dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during
4260dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less
4270dd62a01SPablo de Laracrowded group as the more keys are inserted.
4280dd62a01SPablo de Lara
4290dd62a01SPablo de Lara.. _figure_efd12:
4300dd62a01SPablo de Lara
4310dd62a01SPablo de Lara.. figure:: img/efd_i12.*
4320dd62a01SPablo de Lara
4330dd62a01SPablo de Lara  Runtime Group Rebalancing
4340dd62a01SPablo de Lara
4350dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an
4360dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit
4370dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any
4380dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the
4390dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2,
4400dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage),
4410dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a
4420dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result
4430dd62a01SPablo de Larais stored in these two bits.
4440dd62a01SPablo de Lara
4450dd62a01SPablo de Lara
4460dd62a01SPablo de Lara.. _Efd_references:
4470dd62a01SPablo de Lara
4480dd62a01SPablo de LaraReferences
4490dd62a01SPablo de Lara-----------
4500dd62a01SPablo de Lara
4510dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and
4520dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper
4530dd62a01SPablo de Lara“Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al.
4540dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`)
4550dd62a01SPablo de Larafor more information.
456