1*0dd62a01SPablo de Lara.. BSD LICENSE 2*0dd62a01SPablo de Lara Copyright(c) 2016-2017 Intel Corporation. All rights reserved. 3*0dd62a01SPablo de Lara All rights reserved. 4*0dd62a01SPablo de Lara 5*0dd62a01SPablo de Lara Redistribution and use in source and binary forms, with or without 6*0dd62a01SPablo de Lara modification, are permitted provided that the following conditions 7*0dd62a01SPablo de Lara are met: 8*0dd62a01SPablo de Lara 9*0dd62a01SPablo de Lara * Redistributions of source code must retain the above copyright 10*0dd62a01SPablo de Lara notice, this list of conditions and the following disclaimer. 11*0dd62a01SPablo de Lara * Redistributions in binary form must reproduce the above copyright 12*0dd62a01SPablo de Lara notice, this list of conditions and the following disclaimer in 13*0dd62a01SPablo de Lara the documentation and/or other materials provided with the 14*0dd62a01SPablo de Lara distribution. 15*0dd62a01SPablo de Lara * Neither the name of Intel Corporation nor the names of its 16*0dd62a01SPablo de Lara contributors may be used to endorse or promote products derived 17*0dd62a01SPablo de Lara from this software without specific prior written permission. 18*0dd62a01SPablo de Lara 19*0dd62a01SPablo de Lara THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20*0dd62a01SPablo de Lara "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21*0dd62a01SPablo de Lara LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 22*0dd62a01SPablo de Lara A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 23*0dd62a01SPablo de Lara OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 24*0dd62a01SPablo de Lara SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 25*0dd62a01SPablo de Lara LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 26*0dd62a01SPablo de Lara DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 27*0dd62a01SPablo de Lara THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 28*0dd62a01SPablo de Lara (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29*0dd62a01SPablo de Lara OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30*0dd62a01SPablo de Lara 31*0dd62a01SPablo de Lara.. _Efd_Library: 32*0dd62a01SPablo de Lara 33*0dd62a01SPablo de LaraElastic Flow Distributor Library 34*0dd62a01SPablo de Lara================================ 35*0dd62a01SPablo de Lara 36*0dd62a01SPablo de LaraIntroduction 37*0dd62a01SPablo de Lara------------ 38*0dd62a01SPablo de Lara 39*0dd62a01SPablo de LaraIn Data Centers today, clustering and scheduling of distributed workloads 40*0dd62a01SPablo de Larais a very common task. Many workloads require a deterministic 41*0dd62a01SPablo de Larapartitioning of a flat key space among a cluster of machines. When a 42*0dd62a01SPablo de Larapacket enters the cluster, the ingress node will direct the packet to 43*0dd62a01SPablo de Laraits handling node. For example, data-centers with disaggregated storage 44*0dd62a01SPablo de Larause storage metadata tables to forward I/O requests to the correct back end 45*0dd62a01SPablo de Larastorage cluster, stateful packet inspection will use match incoming 46*0dd62a01SPablo de Laraflows to signatures in flow tables to send incoming packets to their 47*0dd62a01SPablo de Laraintended deep packet inspection (DPI) devices, and so on. 48*0dd62a01SPablo de Lara 49*0dd62a01SPablo de LaraEFD is a distributor library that uses perfect hashing to determine a 50*0dd62a01SPablo de Laratarget/value for a given incoming flow key. It has the following 51*0dd62a01SPablo de Laraadvantages: first, because it uses perfect hashing it does not store the 52*0dd62a01SPablo de Larakey itself and hence lookup performance is not dependent on the key 53*0dd62a01SPablo de Larasize. Second, the target/value can be any arbitrary value hence the 54*0dd62a01SPablo de Larasystem designer and/or operator can better optimize service rates and 55*0dd62a01SPablo de Larainter-cluster network traffic locating. Third, since the storage 56*0dd62a01SPablo de Lararequirement is much smaller than a hash-based flow table (i.e. better 57*0dd62a01SPablo de Larafit for CPU cache), EFD can scale to millions of flow keys. Finally, 58*0dd62a01SPablo de Larawith the current optimized library implementation, performance is fully 59*0dd62a01SPablo de Larascalable with any number of CPU cores. 60*0dd62a01SPablo de Lara 61*0dd62a01SPablo de LaraFlow Based Distribution 62*0dd62a01SPablo de Lara----------------------- 63*0dd62a01SPablo de Lara 64*0dd62a01SPablo de LaraComputation Based Schemes 65*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 66*0dd62a01SPablo de Lara 67*0dd62a01SPablo de LaraFlow distribution and/or load balancing can be simply done using a 68*0dd62a01SPablo de Larastateless computation, for instance using round-robin or a simple 69*0dd62a01SPablo de Laracomputation based on the flow key as an input. For example, a hash 70*0dd62a01SPablo de Larafunction can be used to direct a certain flow to a target based on 71*0dd62a01SPablo de Larathe flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the 72*0dd62a01SPablo de Laraflow key and n is the number of possible targets. 73*0dd62a01SPablo de Lara 74*0dd62a01SPablo de Lara.. _figure_efd1: 75*0dd62a01SPablo de Lara 76*0dd62a01SPablo de Lara.. figure:: img/efd_i1.* 77*0dd62a01SPablo de Lara 78*0dd62a01SPablo de Lara Load Balancing Using Front End Node 79*0dd62a01SPablo de Lara 80*0dd62a01SPablo de LaraIn this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer 81*0dd62a01SPablo de Laraextracts the flow key from the input packet and applies a computation to determine where 82*0dd62a01SPablo de Larathis flow should be directed. Intuitively, this scheme is very simple 83*0dd62a01SPablo de Laraand requires no state to be kept at the front end node, and hence, 84*0dd62a01SPablo de Larastorage requirements are minimum. 85*0dd62a01SPablo de Lara 86*0dd62a01SPablo de Lara.. _figure_efd2: 87*0dd62a01SPablo de Lara 88*0dd62a01SPablo de Lara.. figure:: img/efd_i2.* 89*0dd62a01SPablo de Lara 90*0dd62a01SPablo de Lara Consistent Hashing 91*0dd62a01SPablo de Lara 92*0dd62a01SPablo de LaraA widely used flow distributor that belongs to the same category of 93*0dd62a01SPablo de Laracomputation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`. 94*0dd62a01SPablo de LaraTarget destinations (shown in red) are hashed into the same space as the flow 95*0dd62a01SPablo de Larakeys (shown in blue), and keys are mapped to the nearest target in a clockwise 96*0dd62a01SPablo de Larafashion. Dynamically adding and removing targets with consistent hashing 97*0dd62a01SPablo de Lararequires only K/n keys to be remapped on average, where K is the number of 98*0dd62a01SPablo de Larakeys, and n is the number of targets. In contrast, in a traditional hash-based 99*0dd62a01SPablo de Larascheme, a change in the number of targets causes nearly all keys to be 100*0dd62a01SPablo de Lararemapped. 101*0dd62a01SPablo de Lara 102*0dd62a01SPablo de LaraAlthough computation-based schemes are simple and need very little 103*0dd62a01SPablo de Larastorage requirement, they suffer from the drawback that the system 104*0dd62a01SPablo de Laradesigner/operator can’t fully control the target to assign a specific 105*0dd62a01SPablo de Larakey, as this is dictated by the hash function. 106*0dd62a01SPablo de LaraDeterministically co-locating of keys together (for example, to minimize 107*0dd62a01SPablo de Larainter-server traffic or to optimize for network traffic conditions, 108*0dd62a01SPablo de Laratarget load, etc.) is simply not possible. 109*0dd62a01SPablo de Lara 110*0dd62a01SPablo de LaraFlow-Table Based Schemes 111*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~ 112*0dd62a01SPablo de Lara 113*0dd62a01SPablo de LaraWhen using a Flow-Table based scheme to handle flow distribution/load 114*0dd62a01SPablo de Larabalancing, in contrast with computation-based schemes, the system designer 115*0dd62a01SPablo de Larahas the flexibility of assigning a given flow to any given 116*0dd62a01SPablo de Laratarget. The flow table (e.g. DPDK RTE Hash Library) will simply store 117*0dd62a01SPablo de Laraboth the flow key and the target value. 118*0dd62a01SPablo de Lara 119*0dd62a01SPablo de Lara.. _figure_efd3: 120*0dd62a01SPablo de Lara 121*0dd62a01SPablo de Lara.. figure:: img/efd_i3.* 122*0dd62a01SPablo de Lara 123*0dd62a01SPablo de Lara Table Based Flow Distribution 124*0dd62a01SPablo de Lara 125*0dd62a01SPablo de LaraAs shown in :numref:`figure_efd3`, when doing a lookup, the flow-table 126*0dd62a01SPablo de Larais indexed with the hash of the flow key and the keys (more than one is possible, 127*0dd62a01SPablo de Larabecause of hash collision) stored in this index and corresponding values 128*0dd62a01SPablo de Laraare retrieved. The retrieved key(s) is matched with the input flow key 129*0dd62a01SPablo de Laraand if there is a match the value (target id) is returned. 130*0dd62a01SPablo de Lara 131*0dd62a01SPablo de LaraThe drawback of using a hash table for flow distribution/load balancing 132*0dd62a01SPablo de Larais the storage requirement, since the flow table need to store keys, 133*0dd62a01SPablo de Larasignatures and target values. This doesn't allow this scheme to scale to 134*0dd62a01SPablo de Laramillions of flow keys. Large tables will usually not fit in 135*0dd62a01SPablo de Larathe CPU cache, and hence, the lookup performance is degraded because of 136*0dd62a01SPablo de Larathe latency to access the main memory. 137*0dd62a01SPablo de Lara 138*0dd62a01SPablo de LaraEFD Based Scheme 139*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 140*0dd62a01SPablo de Lara 141*0dd62a01SPablo de LaraEFD combines the advantages of both flow-table based and computation-based 142*0dd62a01SPablo de Laraschemes. It doesn't require the large storage necessary for 143*0dd62a01SPablo de Laraflow-table based schemes (because EFD doesn't store the key as explained 144*0dd62a01SPablo de Larabelow), and it supports any arbitrary value for any given key. 145*0dd62a01SPablo de Lara 146*0dd62a01SPablo de Lara.. _figure_efd4: 147*0dd62a01SPablo de Lara 148*0dd62a01SPablo de Lara.. figure:: img/efd_i4.* 149*0dd62a01SPablo de Lara 150*0dd62a01SPablo de Lara Searching for Perfect Hash Function 151*0dd62a01SPablo de Lara 152*0dd62a01SPablo de LaraThe basic idea of EFD is when a given key is to be inserted, a family of 153*0dd62a01SPablo de Larahash functions is searched until the correct hash function that maps the 154*0dd62a01SPablo de Larainput key to the correct value is found, as shown in :numref:`figure_efd4`. 155*0dd62a01SPablo de LaraHowever, rather than explicitly storing all keys and their associated values, 156*0dd62a01SPablo de LaraEFD stores only indices of hash functions that map keys to values, and 157*0dd62a01SPablo de Larathereby consumes much less space than conventional flow-based tables. 158*0dd62a01SPablo de LaraThe lookup operation is very simple, similar to a computational-based 159*0dd62a01SPablo de Larascheme: given an input key the lookup operation is reduced to hashing 160*0dd62a01SPablo de Larathat key with the correct hash function. 161*0dd62a01SPablo de Lara 162*0dd62a01SPablo de Lara.. _figure_efd5: 163*0dd62a01SPablo de Lara 164*0dd62a01SPablo de Lara.. figure:: img/efd_i5.* 165*0dd62a01SPablo de Lara 166*0dd62a01SPablo de Lara Divide and Conquer for Millions of Keys 167*0dd62a01SPablo de Lara 168*0dd62a01SPablo de LaraIntuitively, finding a hash function that maps each of a large number 169*0dd62a01SPablo de Lara(millions) of input keys to the correct output value is effectively 170*0dd62a01SPablo de Laraimpossible, as a result EFD, as shown in :numref:`figure_efd5`, 171*0dd62a01SPablo de Larabreaks the problem into smaller pieces (divide and conquer). 172*0dd62a01SPablo de LaraEFD divides the entire input key set into many small groups. 173*0dd62a01SPablo de LaraEach group consists of approximately 20-28 keys (a configurable parameter 174*0dd62a01SPablo de Larafor the library), then, for each small group, a brute force search to find 175*0dd62a01SPablo de Laraa hash function that produces the correct outputs for each key in the group. 176*0dd62a01SPablo de Lara 177*0dd62a01SPablo de LaraIt should be mentioned that, since the online lookup table for EFD 178*0dd62a01SPablo de Laradoesn't store the key itself, the size of the EFD table is independent 179*0dd62a01SPablo de Laraof the key size and hence EFD lookup performance which is almost 180*0dd62a01SPablo de Laraconstant irrespective of the length of the key which is a highly 181*0dd62a01SPablo de Laradesirable feature especially for longer keys. 182*0dd62a01SPablo de Lara 183*0dd62a01SPablo de LaraIn summary, EFD is a set separation data structure that supports millions of 184*0dd62a01SPablo de Larakeys. It is used to distribute a given key to an intended target. By itself 185*0dd62a01SPablo de LaraEFD is not a FIB data structure with an exact match the input flow key. 186*0dd62a01SPablo de Lara 187*0dd62a01SPablo de Lara.. _Efd_example: 188*0dd62a01SPablo de Lara 189*0dd62a01SPablo de LaraExample of EFD Library Usage 190*0dd62a01SPablo de Lara---------------------------- 191*0dd62a01SPablo de Lara 192*0dd62a01SPablo de LaraEFD can be used along the data path of many network functions and middleboxes. 193*0dd62a01SPablo de LaraAs previously mentioned, it can used as an index table for 194*0dd62a01SPablo de Lara<key,value> pairs, meta-data for objects, a flow-level load balancer, etc. 195*0dd62a01SPablo de Lara:numref:`figure_efd6` shows an example of using EFD as a flow-level load 196*0dd62a01SPablo de Larabalancer, where flows are received at a front end server before being forwarded 197*0dd62a01SPablo de Larato the target back end server for processing. The system designer would 198*0dd62a01SPablo de Laradeterministically co-locate flows together in order to minimize cross-server 199*0dd62a01SPablo de Larainteraction. 200*0dd62a01SPablo de Lara(For example, flows requesting certain webpage objects are co-located 201*0dd62a01SPablo de Laratogether, to minimize forwarding of common objects across servers). 202*0dd62a01SPablo de Lara 203*0dd62a01SPablo de Lara.. _figure_efd6: 204*0dd62a01SPablo de Lara 205*0dd62a01SPablo de Lara.. figure:: img/efd_i6.* 206*0dd62a01SPablo de Lara 207*0dd62a01SPablo de Lara EFD as a Flow-Level Load Balancer 208*0dd62a01SPablo de Lara 209*0dd62a01SPablo de LaraAs shown in :numref:`figure_efd6`, the front end server will have an EFD table that 210*0dd62a01SPablo de Larastores for each group what is the perfect hash index that satisfies the 211*0dd62a01SPablo de Laracorrect output. Because the table size is small and fits in cache (since 212*0dd62a01SPablo de Larakeys are not stored), it sustains a large number of flows (N*X, where N 213*0dd62a01SPablo de Larais the maximum number of flows served by each back end server of the X 214*0dd62a01SPablo de Larapossible targets). 215*0dd62a01SPablo de Lara 216*0dd62a01SPablo de LaraWith an input flow key, the group id is computed (for example, using 217*0dd62a01SPablo de Laralast few bits of CRC hash) and then the EFD table is indexed with the 218*0dd62a01SPablo de Laragroup id to retrieve the corresponding hash index to use. Once the index 219*0dd62a01SPablo de Larais retrieved the key is hashed using this hash function and the result 220*0dd62a01SPablo de Larawill be the intended correct target where this flow is supposed to be 221*0dd62a01SPablo de Laraprocessed. 222*0dd62a01SPablo de Lara 223*0dd62a01SPablo de LaraIt should be noted that as a result of EFD not matching the exact key but 224*0dd62a01SPablo de Lararather distributing the flows to a target back end node based on the 225*0dd62a01SPablo de Laraperfect hash index, a key that has not been inserted before 226*0dd62a01SPablo de Larawill be distributed to a valid target. Hence, a local table which stores 227*0dd62a01SPablo de Larathe flows served at each node is used and is 228*0dd62a01SPablo de Laraexact matched with the input key to rule out new never seen before 229*0dd62a01SPablo de Laraflows. 230*0dd62a01SPablo de Lara 231*0dd62a01SPablo de Lara.. _Efd_api: 232*0dd62a01SPablo de Lara 233*0dd62a01SPablo de LaraLibrary API Overview 234*0dd62a01SPablo de Lara-------------------- 235*0dd62a01SPablo de Lara 236*0dd62a01SPablo de LaraThe EFD library API is created with a very similar semantics of a 237*0dd62a01SPablo de Larahash-index or a flow table. The application creates an EFD table for a 238*0dd62a01SPablo de Laragiven maximum number of flows, a function is called to insert a flow key 239*0dd62a01SPablo de Larawith a specific target value, and another function is used to retrieve 240*0dd62a01SPablo de Laratarget values for a given individual flow key or a bulk of keys. 241*0dd62a01SPablo de Lara 242*0dd62a01SPablo de LaraEFD Table Create 243*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~ 244*0dd62a01SPablo de Lara 245*0dd62a01SPablo de LaraThe function ``rte_efd_create()`` is used to create and return a pointer 246*0dd62a01SPablo de Larato an EFD table that is sized to hold up to num_flows key. 247*0dd62a01SPablo de LaraThe online version of the EFD table (the one that does 248*0dd62a01SPablo de Laranot store the keys and is used for lookups) will be allocated and 249*0dd62a01SPablo de Laracreated in the last level cache (LLC) of the socket defined by the 250*0dd62a01SPablo de Laraonline_socket_bitmask, while the offline EFD table (the one that 251*0dd62a01SPablo de Larastores the keys and is used for key inserts and for computing the 252*0dd62a01SPablo de Laraperfect hashing) is allocated and created in the LLC of the socket 253*0dd62a01SPablo de Laradefined by offline_socket_bitmask. It should be noted, that for 254*0dd62a01SPablo de Larahighest performance the socket id should match that where the thread is 255*0dd62a01SPablo de Lararunning, i.e. the online EFD lookup table should be created on the same 256*0dd62a01SPablo de Larasocket as where the lookup thread is running. 257*0dd62a01SPablo de Lara 258*0dd62a01SPablo de LaraEFD Insert and Update 259*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~ 260*0dd62a01SPablo de Lara 261*0dd62a01SPablo de LaraThe EFD function to insert a key or update a key to a new value is 262*0dd62a01SPablo de Lara``rte_efd_update()``. This function will update an existing key to 263*0dd62a01SPablo de Laraa new value (target) if the key has already been inserted 264*0dd62a01SPablo de Larabefore, or will insert the <key,value> pair if this key has not been inserted 265*0dd62a01SPablo de Larabefore. It will return 0 upon success. It will return 266*0dd62a01SPablo de Lara``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the 267*0dd62a01SPablo de Laralast available space in the key's group was just used. It will return 268*0dd62a01SPablo de Lara``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it 269*0dd62a01SPablo de Larafailed to find a suitable perfect hash or the group was full). The function 270*0dd62a01SPablo de Larawill return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD 271*0dd62a01SPablo de Laratable (i.e, same value already exists). 272*0dd62a01SPablo de Lara 273*0dd62a01SPablo de LaraEFD Lookup 274*0dd62a01SPablo de Lara~~~~~~~~~~ 275*0dd62a01SPablo de Lara 276*0dd62a01SPablo de LaraTo lookup a certain key in an EFD table, the function ``rte_efd_lookup()`` 277*0dd62a01SPablo de Larais used to return the value associated with single key. 278*0dd62a01SPablo de LaraAs previously mentioned, if the key has been inserted, the correct value 279*0dd62a01SPablo de Larainserted is returned, if the key has not been inserted before, 280*0dd62a01SPablo de Laraa ‘random’ value (based on hashing of the key) is returned. 281*0dd62a01SPablo de LaraFor better performance and to decrease the overhead of 282*0dd62a01SPablo de Larafunction calls per key, it is always recommended to use a bulk lookup 283*0dd62a01SPablo de Larafunction (simultaneous lookup of multiple keys) instead of a single key 284*0dd62a01SPablo de Laralookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function, 285*0dd62a01SPablo de Larathat looks up num_keys simultaneously stored in the key_list and the 286*0dd62a01SPablo de Laracorresponding return values will be returned in the value_list. 287*0dd62a01SPablo de Lara 288*0dd62a01SPablo de LaraEFD Delete 289*0dd62a01SPablo de Lara~~~~~~~~~~ 290*0dd62a01SPablo de Lara 291*0dd62a01SPablo de LaraTo delete a certain key in an EFD table, the function 292*0dd62a01SPablo de Lara``rte_efd_delete()`` can be used. The function returns zero upon success 293*0dd62a01SPablo de Larawhen the key has been found and deleted. Socket_id is the parameter to 294*0dd62a01SPablo de Larause to lookup the existing value, which is ideally the caller's socket id. 295*0dd62a01SPablo de LaraThe previous value associated with this key will be returned 296*0dd62a01SPablo de Larain the prev_value argument. 297*0dd62a01SPablo de Lara 298*0dd62a01SPablo de Lara.. _Efd_internals: 299*0dd62a01SPablo de Lara 300*0dd62a01SPablo de LaraLibrary Internals 301*0dd62a01SPablo de Lara----------------- 302*0dd62a01SPablo de Lara 303*0dd62a01SPablo de LaraThis section provides the brief high-level idea and an overview 304*0dd62a01SPablo de Laraof the library internals to accompany the RFC. The intent of this 305*0dd62a01SPablo de Larasection is to explain to readers the high-level implementation of 306*0dd62a01SPablo de Larainsert, lookup and group rebalancing in the EFD library. 307*0dd62a01SPablo de Lara 308*0dd62a01SPablo de LaraInsert Function Internals 309*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 310*0dd62a01SPablo de Lara 311*0dd62a01SPablo de LaraAs previously mentioned the EFD divides the whole set of keys into 312*0dd62a01SPablo de Laragroups of a manageable size (e.g. 28 keys) and then searches for the 313*0dd62a01SPablo de Laraperfect hash that satisfies the intended target value for each key. EFD 314*0dd62a01SPablo de Larastores two version of the <key,value> table: 315*0dd62a01SPablo de Lara 316*0dd62a01SPablo de Lara- Offline Version (in memory): Only used for the insertion/update 317*0dd62a01SPablo de Lara operation, which is less frequent than the lookup operation. In the 318*0dd62a01SPablo de Lara offline version the exact keys for each group is stored. When a new 319*0dd62a01SPablo de Lara key is added, the hash function is updated that will satisfy the 320*0dd62a01SPablo de Lara value for the new key together with the all old keys already inserted 321*0dd62a01SPablo de Lara in this group. 322*0dd62a01SPablo de Lara 323*0dd62a01SPablo de Lara- Online Version (in cache): Used for the frequent lookup operation. In 324*0dd62a01SPablo de Lara the online version, as previously mentioned, the keys are not stored 325*0dd62a01SPablo de Lara but rather only the hash index for each group. 326*0dd62a01SPablo de Lara 327*0dd62a01SPablo de Lara.. _figure_efd7: 328*0dd62a01SPablo de Lara 329*0dd62a01SPablo de Lara.. figure:: img/efd_i7.* 330*0dd62a01SPablo de Lara 331*0dd62a01SPablo de Lara Group Assignment 332*0dd62a01SPablo de Lara 333*0dd62a01SPablo de Lara:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example. 334*0dd62a01SPablo de LaraGiven a flow key, a hash function (in our implementation CRC hash) is 335*0dd62a01SPablo de Laraused to get the group id. As shown in the figure, the groups can be 336*0dd62a01SPablo de Laraunbalanced. (We highlight group rebalancing further below). 337*0dd62a01SPablo de Lara 338*0dd62a01SPablo de Lara.. _figure_efd8: 339*0dd62a01SPablo de Lara 340*0dd62a01SPablo de Lara.. figure:: img/efd_i8.* 341*0dd62a01SPablo de Lara 342*0dd62a01SPablo de Lara Perfect Hash Search - Assigned Keys & Target Value 343*0dd62a01SPablo de Lara 344*0dd62a01SPablo de LaraFocusing on one group that has four keys, :numref:`figure_efd8` depicts the search 345*0dd62a01SPablo de Laraalgorithm to find the perfect hash function. Assuming that the target 346*0dd62a01SPablo de Laravalue bit for the keys is as shown in the figure, then the online EFD 347*0dd62a01SPablo de Laratable will store a 16 bit hash index and 16 bit lookup table per group 348*0dd62a01SPablo de Laraper value bit. 349*0dd62a01SPablo de Lara 350*0dd62a01SPablo de Lara.. _figure_efd9: 351*0dd62a01SPablo de Lara 352*0dd62a01SPablo de Lara.. figure:: img/efd_i9.* 353*0dd62a01SPablo de Lara 354*0dd62a01SPablo de Lara Perfect Hash Search - Satisfy Target Values 355*0dd62a01SPablo de Lara 356*0dd62a01SPablo de LaraFor a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))`` 357*0dd62a01SPablo de Larais used to point to certain bit index in the 16bit lookup_table value, 358*0dd62a01SPablo de Laraas shown in :numref:`figure_efd9`. 359*0dd62a01SPablo de LaraThe insert function will brute force search for all possible values for the 360*0dd62a01SPablo de Larahash index until a non conflicting lookup_table is found. 361*0dd62a01SPablo de Lara 362*0dd62a01SPablo de Lara.. _figure_efd10: 363*0dd62a01SPablo de Lara 364*0dd62a01SPablo de Lara.. figure:: img/efd_i10.* 365*0dd62a01SPablo de Lara 366*0dd62a01SPablo de Lara Finding Hash Index for Conflict Free lookup_table 367*0dd62a01SPablo de Lara 368*0dd62a01SPablo de LaraFor example, since both key3 and key7 have a target bit value of 1, it 369*0dd62a01SPablo de Larais okay if the hash function of both keys point to the same bit in the 370*0dd62a01SPablo de Laralookup table. A conflict will occur if a hash index is used that maps 371*0dd62a01SPablo de Laraboth Key4 and Key7 to the same index in the lookup_table, 372*0dd62a01SPablo de Laraas shown in :numref:`figure_efd10`, since their target value bit are not the same. 373*0dd62a01SPablo de LaraOnce a hash index is found that produces a lookup_table with no 374*0dd62a01SPablo de Laracontradictions, this index is stored for this group. This procedure is 375*0dd62a01SPablo de Lararepeated for each bit of target value. 376*0dd62a01SPablo de Lara 377*0dd62a01SPablo de LaraLookup Function Internals 378*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~ 379*0dd62a01SPablo de Lara 380*0dd62a01SPablo de LaraThe design principle of EFD is that lookups are much more frequent than 381*0dd62a01SPablo de Larainserts, and hence, EFD's design optimizes for the lookups which are 382*0dd62a01SPablo de Larafaster and much simpler than the slower insert procedure (inserts are 383*0dd62a01SPablo de Laraslow, because of perfect hash search as previously discussed). 384*0dd62a01SPablo de Lara 385*0dd62a01SPablo de Lara.. _figure_efd11: 386*0dd62a01SPablo de Lara 387*0dd62a01SPablo de Lara.. figure:: img/efd_i11.* 388*0dd62a01SPablo de Lara 389*0dd62a01SPablo de Lara EFD Lookup Operation 390*0dd62a01SPablo de Lara 391*0dd62a01SPablo de Lara:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key, 392*0dd62a01SPablo de Larathe group id is computed (using CRC hash) and then the hash index for this 393*0dd62a01SPablo de Laragroup is retrieved from the EFD table. Using the retrieved hash index, 394*0dd62a01SPablo de Larathe hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will 395*0dd62a01SPablo de Lararesult in an index in the lookup_table, the bit corresponding to this 396*0dd62a01SPablo de Laraindex will be the target value bit. This procedure is repeated for each 397*0dd62a01SPablo de Larabit of the target value. 398*0dd62a01SPablo de Lara 399*0dd62a01SPablo de LaraGroup Rebalancing Function Internals 400*0dd62a01SPablo de Lara~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 401*0dd62a01SPablo de Lara 402*0dd62a01SPablo de LaraWhen discussing EFD inserts and lookups, the discussion is simplified by 403*0dd62a01SPablo de Laraassuming that a group id is simply a result of hash function. However, 404*0dd62a01SPablo de Larasince hashing in general is not perfect and will not always produce a 405*0dd62a01SPablo de Larauniform output, this simplified assumption will lead to unbalanced 406*0dd62a01SPablo de Laragroups, i.e., some group will have more keys than other groups. 407*0dd62a01SPablo de LaraTypically, and to minimize insert time with an increasing number of keys, 408*0dd62a01SPablo de Larait is preferable that all groups will have a balanced number of keys, so 409*0dd62a01SPablo de Larathe brute force search for the perfect hash terminates with a valid hash 410*0dd62a01SPablo de Laraindex. In order to achieve this target, groups are rebalanced during 411*0dd62a01SPablo de Lararuntime inserts, and keys are moved around from a busy group to a less 412*0dd62a01SPablo de Laracrowded group as the more keys are inserted. 413*0dd62a01SPablo de Lara 414*0dd62a01SPablo de Lara.. _figure_efd12: 415*0dd62a01SPablo de Lara 416*0dd62a01SPablo de Lara.. figure:: img/efd_i12.* 417*0dd62a01SPablo de Lara 418*0dd62a01SPablo de Lara Runtime Group Rebalancing 419*0dd62a01SPablo de Lara 420*0dd62a01SPablo de Lara:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an 421*0dd62a01SPablo de Larainput key the hash result is split into two parts a chunk id and 8-bit 422*0dd62a01SPablo de Larabin id. A chunk contains 64 different groups and 256 bins (i.e. for any 423*0dd62a01SPablo de Laragiven bin it can map to 4 distinct groups). When a key is inserted, the 424*0dd62a01SPablo de Larabin id is computed, for example in :numref:`figure_efd12` bin_id=2, 425*0dd62a01SPablo de Laraand since each bin can be mapped to one of four different groups (2 bit storage), 426*0dd62a01SPablo de Larathe four possible mappings are evaluated and the one that will result in a 427*0dd62a01SPablo de Larabalanced key distribution across these four is selected the mapping result 428*0dd62a01SPablo de Larais stored in these two bits. 429*0dd62a01SPablo de Lara 430*0dd62a01SPablo de Lara 431*0dd62a01SPablo de Lara.. _Efd_references: 432*0dd62a01SPablo de Lara 433*0dd62a01SPablo de LaraReferences 434*0dd62a01SPablo de Lara----------- 435*0dd62a01SPablo de Lara 436*0dd62a01SPablo de Lara1- EFD is based on collaborative research work between Intel and 437*0dd62a01SPablo de LaraCarnegie Mellon University (CMU), interested readers can refer to the paper 438*0dd62a01SPablo de Lara“Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al. 439*0dd62a01SPablo de Laraat SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`) 440*0dd62a01SPablo de Larafor more information. 441