xref: /dpdk/doc/guides/prog_guide/efd_lib.rst (revision 0dd62a01874a5ac7becbfe18c0a8d0dc2483ec77)
1*0dd62a01SPablo de Lara..  BSD LICENSE
2*0dd62a01SPablo de Lara    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
3*0dd62a01SPablo de Lara    All rights reserved.
4*0dd62a01SPablo de Lara
5*0dd62a01SPablo de Lara    Redistribution and use in source and binary forms, with or without
6*0dd62a01SPablo de Lara    modification, are permitted provided that the following conditions
7*0dd62a01SPablo de Lara    are met:
8*0dd62a01SPablo de Lara
9*0dd62a01SPablo de Lara    * Redistributions of source code must retain the above copyright
10*0dd62a01SPablo de Lara    notice, this list of conditions and the following disclaimer.
11*0dd62a01SPablo de Lara    * Redistributions in binary form must reproduce the above copyright
12*0dd62a01SPablo de Lara    notice, this list of conditions and the following disclaimer in
13*0dd62a01SPablo de Lara    the documentation and/or other materials provided with the
14*0dd62a01SPablo de Lara    distribution.
15*0dd62a01SPablo de Lara    * Neither the name of Intel Corporation nor the names of its
16*0dd62a01SPablo de Lara    contributors may be used to endorse or promote products derived
17*0dd62a01SPablo de Lara    from this software without specific prior written permission.
18*0dd62a01SPablo de Lara
19*0dd62a01SPablo de Lara    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20*0dd62a01SPablo de Lara    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21*0dd62a01SPablo de Lara    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22*0dd62a01SPablo de Lara    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23*0dd62a01SPablo de Lara    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24*0dd62a01SPablo de Lara    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25*0dd62a01SPablo de Lara    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26*0dd62a01SPablo de Lara    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27*0dd62a01SPablo de Lara    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28*0dd62a01SPablo de Lara    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29*0dd62a01SPablo de Lara    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30*0dd62a01SPablo de Lara
31*0dd62a01SPablo de Lara.. _Efd_Library:
32*0dd62a01SPablo de Lara
33*0dd62a01SPablo de LaraElastic Flow Distributor Library
34*0dd62a01SPablo de Lara================================
35*0dd62a01SPablo de Lara
36*0dd62a01SPablo de LaraIntroduction
37*0dd62a01SPablo de Lara------------
38*0dd62a01SPablo de Lara
39*0dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads
40*0dd62a01SPablo de Larais a very common task. Many workloads require a deterministic
41*0dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a
42*0dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to
43*0dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage
44*0dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end
45*0dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming
46*0dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their
47*0dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on.
48*0dd62a01SPablo de Lara
49*0dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a
50*0dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following
51*0dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the
52*0dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key
53*0dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the
54*0dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and
55*0dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage
56*0dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better
57*0dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally,
58*0dd62a01SPablo de Larawith the current optimized library implementation, performance is fully
59*0dd62a01SPablo de Larascalable with any number of CPU cores.
60*0dd62a01SPablo de Lara
61*0dd62a01SPablo de LaraFlow Based Distribution
62*0dd62a01SPablo de Lara-----------------------
63*0dd62a01SPablo de Lara
64*0dd62a01SPablo de LaraComputation Based Schemes
65*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
66*0dd62a01SPablo de Lara
67*0dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a
68*0dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple
69*0dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash
70*0dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on
71*0dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the
72*0dd62a01SPablo de Laraflow key and n is the number of possible targets.
73*0dd62a01SPablo de Lara
74*0dd62a01SPablo de Lara.. _figure_efd1:
75*0dd62a01SPablo de Lara
76*0dd62a01SPablo de Lara.. figure:: img/efd_i1.*
77*0dd62a01SPablo de Lara
78*0dd62a01SPablo de Lara  Load Balancing Using Front End Node
79*0dd62a01SPablo de Lara
80*0dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
81*0dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where
82*0dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple
83*0dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence,
84*0dd62a01SPablo de Larastorage requirements are minimum.
85*0dd62a01SPablo de Lara
86*0dd62a01SPablo de Lara.. _figure_efd2:
87*0dd62a01SPablo de Lara
88*0dd62a01SPablo de Lara.. figure:: img/efd_i2.*
89*0dd62a01SPablo de Lara
90*0dd62a01SPablo de Lara  Consistent Hashing
91*0dd62a01SPablo de Lara
92*0dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of
93*0dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
94*0dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow
95*0dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise
96*0dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing
97*0dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of
98*0dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based
99*0dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be
100*0dd62a01SPablo de Lararemapped.
101*0dd62a01SPablo de Lara
102*0dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little
103*0dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system
104*0dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific
105*0dd62a01SPablo de Larakey, as this is dictated by the hash function.
106*0dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize
107*0dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions,
108*0dd62a01SPablo de Laratarget load, etc.) is simply not possible.
109*0dd62a01SPablo de Lara
110*0dd62a01SPablo de LaraFlow-Table Based Schemes
111*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~
112*0dd62a01SPablo de Lara
113*0dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load
114*0dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer
115*0dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given
116*0dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store
117*0dd62a01SPablo de Laraboth the flow key and the target value.
118*0dd62a01SPablo de Lara
119*0dd62a01SPablo de Lara.. _figure_efd3:
120*0dd62a01SPablo de Lara
121*0dd62a01SPablo de Lara.. figure:: img/efd_i3.*
122*0dd62a01SPablo de Lara
123*0dd62a01SPablo de Lara  Table Based Flow Distribution
124*0dd62a01SPablo de Lara
125*0dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table
126*0dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible,
127*0dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values
128*0dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key
129*0dd62a01SPablo de Laraand if there is a match the value (target id) is returned.
130*0dd62a01SPablo de Lara
131*0dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing
132*0dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys,
133*0dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to
134*0dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in
135*0dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of
136*0dd62a01SPablo de Larathe latency to access the main memory.
137*0dd62a01SPablo de Lara
138*0dd62a01SPablo de LaraEFD Based Scheme
139*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
140*0dd62a01SPablo de Lara
141*0dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based
142*0dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for
143*0dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained
144*0dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key.
145*0dd62a01SPablo de Lara
146*0dd62a01SPablo de Lara.. _figure_efd4:
147*0dd62a01SPablo de Lara
148*0dd62a01SPablo de Lara.. figure:: img/efd_i4.*
149*0dd62a01SPablo de Lara
150*0dd62a01SPablo de Lara  Searching for Perfect Hash Function
151*0dd62a01SPablo de Lara
152*0dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of
153*0dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the
154*0dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`.
155*0dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values,
156*0dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and
157*0dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables.
158*0dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based
159*0dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing
160*0dd62a01SPablo de Larathat key with the correct hash function.
161*0dd62a01SPablo de Lara
162*0dd62a01SPablo de Lara.. _figure_efd5:
163*0dd62a01SPablo de Lara
164*0dd62a01SPablo de Lara.. figure:: img/efd_i5.*
165*0dd62a01SPablo de Lara
166*0dd62a01SPablo de Lara  Divide and Conquer for Millions of Keys
167*0dd62a01SPablo de Lara
168*0dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number
169*0dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively
170*0dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`,
171*0dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer).
172*0dd62a01SPablo de LaraEFD divides the entire input key set into many small groups.
173*0dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter
174*0dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find
175*0dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group.
176*0dd62a01SPablo de Lara
177*0dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD
178*0dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent
179*0dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost
180*0dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly
181*0dd62a01SPablo de Laradesirable feature especially for longer keys.
182*0dd62a01SPablo de Lara
183*0dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of
184*0dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself
185*0dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key.
186*0dd62a01SPablo de Lara
187*0dd62a01SPablo de Lara.. _Efd_example:
188*0dd62a01SPablo de Lara
189*0dd62a01SPablo de LaraExample of EFD Library Usage
190*0dd62a01SPablo de Lara----------------------------
191*0dd62a01SPablo de Lara
192*0dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes.
193*0dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for
194*0dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
195*0dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load
196*0dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded
197*0dd62a01SPablo de Larato the target back end server for processing. The system designer would
198*0dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server
199*0dd62a01SPablo de Larainteraction.
200*0dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located
201*0dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers).
202*0dd62a01SPablo de Lara
203*0dd62a01SPablo de Lara.. _figure_efd6:
204*0dd62a01SPablo de Lara
205*0dd62a01SPablo de Lara.. figure:: img/efd_i6.*
206*0dd62a01SPablo de Lara
207*0dd62a01SPablo de Lara  EFD as a Flow-Level Load Balancer
208*0dd62a01SPablo de Lara
209*0dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that
210*0dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the
211*0dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since
212*0dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N
213*0dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X
214*0dd62a01SPablo de Larapossible targets).
215*0dd62a01SPablo de Lara
216*0dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using
217*0dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the
218*0dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index
219*0dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result
220*0dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be
221*0dd62a01SPablo de Laraprocessed.
222*0dd62a01SPablo de Lara
223*0dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but
224*0dd62a01SPablo de Lararather distributing the flows to a target back end node based on the
225*0dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before
226*0dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores
227*0dd62a01SPablo de Larathe flows served at each node is used and is
228*0dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before
229*0dd62a01SPablo de Laraflows.
230*0dd62a01SPablo de Lara
231*0dd62a01SPablo de Lara.. _Efd_api:
232*0dd62a01SPablo de Lara
233*0dd62a01SPablo de LaraLibrary API Overview
234*0dd62a01SPablo de Lara--------------------
235*0dd62a01SPablo de Lara
236*0dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a
237*0dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a
238*0dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key
239*0dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve
240*0dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys.
241*0dd62a01SPablo de Lara
242*0dd62a01SPablo de LaraEFD Table Create
243*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~
244*0dd62a01SPablo de Lara
245*0dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer
246*0dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key.
247*0dd62a01SPablo de LaraThe online version of the EFD table (the one that does
248*0dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and
249*0dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the
250*0dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that
251*0dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the
252*0dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket
253*0dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for
254*0dd62a01SPablo de Larahighest performance the socket id should match that where the thread is
255*0dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same
256*0dd62a01SPablo de Larasocket as where the lookup thread is running.
257*0dd62a01SPablo de Lara
258*0dd62a01SPablo de LaraEFD Insert and Update
259*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~
260*0dd62a01SPablo de Lara
261*0dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is
262*0dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to
263*0dd62a01SPablo de Laraa new value (target) if the key has already been inserted
264*0dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted
265*0dd62a01SPablo de Larabefore. It will return 0 upon success. It will return
266*0dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
267*0dd62a01SPablo de Laralast available space in the key's group was just used. It will return
268*0dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
269*0dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function
270*0dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
271*0dd62a01SPablo de Laratable (i.e, same value already exists).
272*0dd62a01SPablo de Lara
273*0dd62a01SPablo de LaraEFD Lookup
274*0dd62a01SPablo de Lara~~~~~~~~~~
275*0dd62a01SPablo de Lara
276*0dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()``
277*0dd62a01SPablo de Larais used to return the value associated with single key.
278*0dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value
279*0dd62a01SPablo de Larainserted is returned, if the key has not been inserted before,
280*0dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned.
281*0dd62a01SPablo de LaraFor better performance and to decrease the overhead of
282*0dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup
283*0dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key
284*0dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function,
285*0dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the
286*0dd62a01SPablo de Laracorresponding return values will be returned in the value_list.
287*0dd62a01SPablo de Lara
288*0dd62a01SPablo de LaraEFD Delete
289*0dd62a01SPablo de Lara~~~~~~~~~~
290*0dd62a01SPablo de Lara
291*0dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function
292*0dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success
293*0dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to
294*0dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id.
295*0dd62a01SPablo de LaraThe previous value associated with this key will be returned
296*0dd62a01SPablo de Larain the prev_value argument.
297*0dd62a01SPablo de Lara
298*0dd62a01SPablo de Lara.. _Efd_internals:
299*0dd62a01SPablo de Lara
300*0dd62a01SPablo de LaraLibrary Internals
301*0dd62a01SPablo de Lara-----------------
302*0dd62a01SPablo de Lara
303*0dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview
304*0dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this
305*0dd62a01SPablo de Larasection is to explain to readers the high-level implementation of
306*0dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library.
307*0dd62a01SPablo de Lara
308*0dd62a01SPablo de LaraInsert Function Internals
309*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
310*0dd62a01SPablo de Lara
311*0dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into
312*0dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the
313*0dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD
314*0dd62a01SPablo de Larastores two version of the <key,value> table:
315*0dd62a01SPablo de Lara
316*0dd62a01SPablo de Lara-  Offline Version (in memory): Only used for the insertion/update
317*0dd62a01SPablo de Lara   operation, which is less frequent than the lookup operation. In the
318*0dd62a01SPablo de Lara   offline version the exact keys for each group is stored. When a new
319*0dd62a01SPablo de Lara   key is added, the hash function is updated that will satisfy the
320*0dd62a01SPablo de Lara   value for the new key together with the all old keys already inserted
321*0dd62a01SPablo de Lara   in this group.
322*0dd62a01SPablo de Lara
323*0dd62a01SPablo de Lara-  Online Version (in cache): Used for the frequent lookup operation. In
324*0dd62a01SPablo de Lara   the online version, as previously mentioned, the keys are not stored
325*0dd62a01SPablo de Lara   but rather only the hash index for each group.
326*0dd62a01SPablo de Lara
327*0dd62a01SPablo de Lara.. _figure_efd7:
328*0dd62a01SPablo de Lara
329*0dd62a01SPablo de Lara.. figure:: img/efd_i7.*
330*0dd62a01SPablo de Lara
331*0dd62a01SPablo de Lara  Group Assignment
332*0dd62a01SPablo de Lara
333*0dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
334*0dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is
335*0dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be
336*0dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below).
337*0dd62a01SPablo de Lara
338*0dd62a01SPablo de Lara.. _figure_efd8:
339*0dd62a01SPablo de Lara
340*0dd62a01SPablo de Lara.. figure:: img/efd_i8.*
341*0dd62a01SPablo de Lara
342*0dd62a01SPablo de Lara  Perfect Hash Search - Assigned Keys & Target Value
343*0dd62a01SPablo de Lara
344*0dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search
345*0dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target
346*0dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD
347*0dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group
348*0dd62a01SPablo de Laraper value bit.
349*0dd62a01SPablo de Lara
350*0dd62a01SPablo de Lara.. _figure_efd9:
351*0dd62a01SPablo de Lara
352*0dd62a01SPablo de Lara.. figure:: img/efd_i9.*
353*0dd62a01SPablo de Lara
354*0dd62a01SPablo de Lara  Perfect Hash Search - Satisfy Target Values
355*0dd62a01SPablo de Lara
356*0dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
357*0dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value,
358*0dd62a01SPablo de Laraas shown in :numref:`figure_efd9`.
359*0dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the
360*0dd62a01SPablo de Larahash index until a non conflicting lookup_table is found.
361*0dd62a01SPablo de Lara
362*0dd62a01SPablo de Lara.. _figure_efd10:
363*0dd62a01SPablo de Lara
364*0dd62a01SPablo de Lara.. figure:: img/efd_i10.*
365*0dd62a01SPablo de Lara
366*0dd62a01SPablo de Lara  Finding Hash Index for Conflict Free lookup_table
367*0dd62a01SPablo de Lara
368*0dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it
369*0dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the
370*0dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps
371*0dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table,
372*0dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same.
373*0dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no
374*0dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is
375*0dd62a01SPablo de Lararepeated for each bit of target value.
376*0dd62a01SPablo de Lara
377*0dd62a01SPablo de LaraLookup Function Internals
378*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~
379*0dd62a01SPablo de Lara
380*0dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than
381*0dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are
382*0dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are
383*0dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed).
384*0dd62a01SPablo de Lara
385*0dd62a01SPablo de Lara.. _figure_efd11:
386*0dd62a01SPablo de Lara
387*0dd62a01SPablo de Lara.. figure:: img/efd_i11.*
388*0dd62a01SPablo de Lara
389*0dd62a01SPablo de Lara  EFD Lookup Operation
390*0dd62a01SPablo de Lara
391*0dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
392*0dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this
393*0dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index,
394*0dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will
395*0dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this
396*0dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each
397*0dd62a01SPablo de Larabit of the target value.
398*0dd62a01SPablo de Lara
399*0dd62a01SPablo de LaraGroup Rebalancing Function Internals
400*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
401*0dd62a01SPablo de Lara
402*0dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by
403*0dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However,
404*0dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a
405*0dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced
406*0dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups.
407*0dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys,
408*0dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so
409*0dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash
410*0dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during
411*0dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less
412*0dd62a01SPablo de Laracrowded group as the more keys are inserted.
413*0dd62a01SPablo de Lara
414*0dd62a01SPablo de Lara.. _figure_efd12:
415*0dd62a01SPablo de Lara
416*0dd62a01SPablo de Lara.. figure:: img/efd_i12.*
417*0dd62a01SPablo de Lara
418*0dd62a01SPablo de Lara  Runtime Group Rebalancing
419*0dd62a01SPablo de Lara
420*0dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an
421*0dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit
422*0dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any
423*0dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the
424*0dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2,
425*0dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage),
426*0dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a
427*0dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result
428*0dd62a01SPablo de Larais stored in these two bits.
429*0dd62a01SPablo de Lara
430*0dd62a01SPablo de Lara
431*0dd62a01SPablo de Lara.. _Efd_references:
432*0dd62a01SPablo de Lara
433*0dd62a01SPablo de LaraReferences
434*0dd62a01SPablo de Lara-----------
435*0dd62a01SPablo de Lara
436*0dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and
437*0dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper
438*0dd62a01SPablo de Lara“Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al.
439*0dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`)
440*0dd62a01SPablo de Larafor more information.
441