..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016-2017 Intel Corporation.

.. _Efd_Library:

Elastic Flow Distributor Library
================================

Introduction
------------

In Data Centers today, clustering and scheduling of distributed workloads
is a very common task. Many workloads require a deterministic
partitioning of a flat key space among a cluster of machines. When a
packet enters the cluster, the ingress node will direct the packet to
its handling node. For example, data centers with disaggregated storage
use storage metadata tables to forward I/O requests to the correct back end
storage cluster, stateful packet inspection will match incoming
flows against signatures in flow tables to send incoming packets to their
intended deep packet inspection (DPI) devices, and so on.

EFD is a distributor library that uses perfect hashing to determine a
target/value for a given incoming flow key. It has the following
advantages: first, because it uses perfect hashing it does not store the
key itself and hence lookup performance is not dependent on the key
size. Second, the target/value can be any arbitrary value, hence the
system designer and/or operator can better optimize service rates and
inter-cluster network traffic locality. Third, since the storage
requirement is much smaller than a hash-based flow table (i.e. a better
fit for the CPU cache), EFD can scale to millions of flow keys. Finally,
with the current optimized library implementation, performance is fully
scalable with any number of CPU cores.

Flow Based Distribution
-----------------------

Computation Based Schemes
~~~~~~~~~~~~~~~~~~~~~~~~~

Flow distribution and/or load balancing can be done simply using a
stateless computation, for instance using round-robin or a simple
computation based on the flow key as an input.
For example, a hash
function can be used to direct a certain flow to a target based on
the flow key (e.g. ``h(key) mod n``), where ``h(key)`` is the hash value of the
flow key and ``n`` is the number of possible targets.

.. _figure_efd1:

.. figure:: img/efd_i1.*

  Load Balancing Using Front End Node

In this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
extracts the flow key from the input packet and applies a computation to determine where
this flow should be directed. Intuitively, this scheme is very simple
and requires no state to be kept at the front end node, and hence,
storage requirements are minimal.

.. _figure_efd2:

.. figure:: img/efd_i2.*

  Consistent Hashing

A widely used flow distributor that belongs to the same category of
computation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
Target destinations (shown in red) are hashed into the same space as the flow
keys (shown in blue), and keys are mapped to the nearest target in a clockwise
fashion. Dynamically adding and removing targets with consistent hashing
requires only K/n keys to be remapped on average, where K is the number of
keys, and n is the number of targets. In contrast, in a traditional hash-based
scheme, a change in the number of targets causes nearly all keys to be
remapped.

Although computation-based schemes are simple and have very small
storage requirements, they suffer from the drawback that the system
designer/operator cannot fully control which target a specific key is
assigned to, as this is dictated by the hash function.
Deterministically co-locating keys together (for example, to minimize
inter-server traffic or to optimize for network traffic conditions,
target load, etc.) is simply not possible.
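As a concrete illustration of the stateless computation described above, the following sketch implements ``h(key) mod n`` in plain C. The hash function here is a toy FNV-1a hash chosen for brevity; it is an assumption for illustration only and is not the hash used by the library:

.. code-block:: c

   #include <stdint.h>
   #include <stddef.h>

   /* Toy 32-bit FNV-1a hash standing in for h(key); for illustration
    * only, not the hash function used by any particular library. */
   static uint32_t toy_hash(const uint8_t *key, size_t len)
   {
       uint32_t h = 2166136261u;
       for (size_t i = 0; i < len; i++) {
           h ^= key[i];
           h *= 16777619u;
       }
       return h;
   }

   /* Stateless distribution: target = h(key) mod n.
    * Deterministic for a given key, but the operator cannot choose
    * which target a specific key lands on. */
   static uint32_t pick_target(const uint8_t *key, size_t len, uint32_t n)
   {
       return toy_hash(key, len) % n;
   }

Note that the same key always maps to the same target, which is what makes the scheme stateless, but changing ``n`` remaps nearly all keys, which is the weakness consistent hashing addresses.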
Flow-Table Based Schemes
~~~~~~~~~~~~~~~~~~~~~~~~

When using a flow-table based scheme to handle flow distribution/load
balancing, in contrast with computation-based schemes, the system designer
has the flexibility of assigning a given flow to any given
target. The flow table (e.g. DPDK RTE Hash Library) will simply store
both the flow key and the target value.

.. _figure_efd3:

.. figure:: img/efd_i3.*

  Table Based Flow Distribution

As shown in :numref:`figure_efd3`, when doing a lookup, the flow table
is indexed with the hash of the flow key, and the keys stored at this index
(more than one is possible, because of hash collisions) and their corresponding
values are retrieved. The retrieved key(s) is matched against the input flow key,
and if there is a match the value (target id) is returned.

The drawback of using a hash table for flow distribution/load balancing
is the storage requirement, since the flow table needs to store the keys,
signatures and target values. This prevents the scheme from scaling to
millions of flow keys. Large tables will usually not fit in
the CPU cache, and hence, lookup performance is degraded by
the latency of accessing main memory.

EFD Based Scheme
~~~~~~~~~~~~~~~~

EFD combines the advantages of both flow-table based and computation-based
schemes. It doesn't require the large storage necessary for
flow-table based schemes (because EFD doesn't store the key, as explained
below), and it supports any arbitrary value for any given key.

.. _figure_efd4:

.. figure:: img/efd_i4.*

  Searching for Perfect Hash Function

The basic idea of EFD is that when a given key is to be inserted, a family of
hash functions is searched until the correct hash function that maps the
input key to the correct value is found, as shown in :numref:`figure_efd4`.
However, rather than explicitly storing all keys and their associated values,
EFD stores only indices of hash functions that map keys to values, and
thereby consumes much less space than conventional flow-based tables.
The lookup operation is very simple, similar to a computation-based
scheme: given an input key, the lookup operation is reduced to hashing
that key with the correct hash function.

.. _figure_efd5:

.. figure:: img/efd_i5.*

  Divide and Conquer for Millions of Keys

Intuitively, finding a hash function that maps each of a large number
(millions) of input keys to the correct output value is effectively
impossible. As a result, EFD, as shown in :numref:`figure_efd5`,
breaks the problem into smaller pieces (divide and conquer).
EFD divides the entire input key set into many small groups, each
consisting of approximately 20-28 keys (a configurable parameter
for the library). Then, for each small group, it performs a brute force search to find
a hash function that produces the correct output for each key in the group.

It should be mentioned that, since the online lookup table for EFD
doesn't store the key itself, the size of the EFD table is independent
of the key size. Hence, EFD lookup performance is almost
constant irrespective of the length of the key, which is a highly
desirable feature, especially for longer keys.

In summary, EFD is a set separation data structure that supports millions of
keys. It is used to distribute a given key to an intended target. By itself
EFD is not a FIB data structure with an exact match of the input flow key.

.. _Efd_example:

Example of EFD Library Usage
----------------------------

EFD can be used along the data path of many network functions and middleboxes.
As previously mentioned, it can be used as an index table for
<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
:numref:`figure_efd6` shows an example of using EFD as a flow-level load
balancer, where flows are received at a front end server before being forwarded
to the target back end server for processing. The system designer would
deterministically co-locate flows together in order to minimize cross-server
interaction.
(For example, flows requesting certain webpage objects are co-located
together, to minimize forwarding of common objects across servers.)

.. _figure_efd6:

.. figure:: img/efd_i6.*

  EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd6`, the front end server will have an EFD table that
stores, for each group, the perfect hash index that produces the
correct output. Because the table size is small and fits in cache (since
keys are not stored), it sustains a large number of flows (N*X, where N
is the maximum number of flows served by each back end server of the X
possible targets).

With an input flow key, the group id is computed (for example, using the
last few bits of a CRC hash) and then the EFD table is indexed with the
group id to retrieve the corresponding hash index to use. Once the index
is retrieved, the key is hashed using this hash function and the result
will be the intended correct target where this flow is supposed to be
processed.

It should be noted that, since EFD does not match the exact key but
rather distributes flows to a target back end node based on the
perfect hash index, a key that has not been inserted before
will still be distributed to a valid target. Hence, a local table which stores
the flows served at each node is used, and is
exact matched with the input key to rule out new, never seen before
flows.

.. _Efd_api:

Library API Overview
--------------------

The EFD library API is created with very similar semantics to those of a
hash-index or a flow table.
The application creates an EFD table for a
given maximum number of flows, a function is called to insert a flow key
with a specific target value, and another function is used to retrieve
target values for a given individual flow key or a bulk of keys.

EFD Table Create
~~~~~~~~~~~~~~~~

The function ``rte_efd_create()`` is used to create and return a pointer
to an EFD table that is sized to hold up to num_flows keys.
The online version of the EFD table (the one that does
not store the keys and is used for lookups) will be allocated and
created in the last level cache (LLC) of the socket defined by the
online_socket_bitmask, while the offline EFD table (the one that
stores the keys and is used for key inserts and for computing the
perfect hashing) is allocated and created in the LLC of the socket
defined by offline_socket_bitmask. It should be noted that for
highest performance the socket id should match that where the thread is
running, i.e. the online EFD lookup table should be created on the same
socket as where the lookup thread is running.

EFD Insert and Update
~~~~~~~~~~~~~~~~~~~~~

The EFD function to insert a key or update a key to a new value is
``rte_efd_update()``. This function will update an existing key to
a new value (target) if the key has already been inserted
before, or will insert the <key,value> pair if this key has not been inserted
before. It will return 0 upon success. It will return
``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
last available space in the key's group was just used. It will return
``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
failed to find a suitable perfect hash or the group was full). The function
will return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
table (i.e. the same value already exists).

.. Note::

   This function is not multi-thread safe and should only be called
   from one thread.

EFD Lookup
~~~~~~~~~~

To look up a certain key in an EFD table, the function ``rte_efd_lookup()``
is used to return the value associated with a single key.
As previously mentioned, if the key has been inserted, the correct value
inserted is returned; if the key has not been inserted before,
a 'random' value (based on hashing of the key) is returned.
For better performance and to decrease the overhead of
function calls per key, it is always recommended to use a bulk lookup
function (simultaneous lookup of multiple keys) instead of a single key
lookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function
that looks up num_keys simultaneously stored in the key_list, and the
corresponding return values will be returned in the value_list.

.. Note::

   This function is multi-thread safe, but there should not be other threads
   writing in the EFD table, unless locks are used.

EFD Delete
~~~~~~~~~~

To delete a certain key in an EFD table, the function
``rte_efd_delete()`` can be used. The function returns zero upon success
when the key has been found and deleted. Socket_id is the parameter to
use to lookup the existing value, which is ideally the caller's socket id.
The previous value associated with this key will be returned
in the prev_value argument.

.. Note::

   This function is not multi-thread safe and should only be called
   from one thread.

.. _Efd_internals:

Library Internals
-----------------

This section provides a brief high-level idea and an overview
of the library internals to accompany the RFC. The intent of this
section is to explain to readers the high-level implementation of
insert, lookup and group rebalancing in the EFD library.
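The per-group mechanics described in the following subsections can be sketched in plain C. This is a minimal, self-contained model of the idea only, not the DPDK implementation: the two hash functions ``h1``/``h2`` are arbitrary toy hashes standing in for ``h(key, seed1)`` and ``h(key, seed2)``, while the 16-bit hash index and 16-bit lookup table per group follow the description below:

.. code-block:: c

   #include <stdint.h>
   #include <stddef.h>

   #define LOOKUP_BITS 16

   /* Toy hash functions standing in for h(key, seed1) / h(key, seed2);
    * the real library uses CRC-based hashing. */
   static uint32_t h1(uint32_t key) { return key * 2654435761u; }
   static uint32_t h2(uint32_t key)
   {
       return ((key ^ 0xdeadbeefu) * 40503u) | 1u;
   }

   /* Bit position of a key in the 16-bit lookup table for a hash index. */
   static unsigned bit_pos(uint32_t key, uint16_t index)
   {
       return (h1(key) + (uint32_t)index * h2(key)) % LOOKUP_BITS;
   }

   /* Insert side: brute-force search over hash indices for one group.
    * An index is rejected if two keys with different target bits land
    * on the same lookup-table bit. On success the index and resulting
    * lookup table are stored and 0 is returned; -1 if none works. */
   static int find_hash_index(const uint32_t *keys, const uint8_t *target_bits,
                              size_t n, uint16_t *out_index, uint16_t *out_table)
   {
       for (uint32_t index = 0; index <= UINT16_MAX; index++) {
           uint16_t table = 0, used = 0;
           int ok = 1;
           for (size_t i = 0; i < n && ok; i++) {
               unsigned pos = bit_pos(keys[i], (uint16_t)index);
               uint16_t mask = (uint16_t)(1u << pos);
               if ((used & mask) && ((table >> pos) & 1u) != target_bits[i])
                   ok = 0;  /* conflicting target bits on the same position */
               used |= mask;
               if (target_bits[i])
                   table |= mask;
           }
           if (ok) {
               *out_index = (uint16_t)index;
               *out_table = table;
               return 0;
           }
       }
       return -1;
   }

   /* Lookup side: hash the key with the stored index, read the bit. */
   static uint8_t lookup_bit(uint32_t key, uint16_t index, uint16_t table)
   {
       return (uint8_t)((table >> bit_pos(key, index)) & 1u);
   }

Note that the lookup never examines the keys themselves, only the stored hash index and lookup table, which is why the online table size is independent of key length; in a multi-bit target value this search is repeated once per value bit.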
Insert Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~

As previously mentioned, EFD divides the whole set of keys into
groups of a manageable size (e.g. 28 keys) and then searches for the
perfect hash that satisfies the intended target value for each key. EFD
stores two versions of the <key,value> table:

- Offline Version (in memory): Only used for the insertion/update
  operation, which is less frequent than the lookup operation. In the
  offline version the exact keys for each group are stored. When a new
  key is added, the hash function is updated so that it will satisfy the
  value for the new key together with all the old keys already inserted
  in this group.

- Online Version (in cache): Used for the frequent lookup operation. In
  the online version, as previously mentioned, the keys are not stored
  but rather only the hash index for each group.

.. _figure_efd7:

.. figure:: img/efd_i7.*

  Group Assignment

:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
Given a flow key, a hash function (in our implementation CRC hash) is
used to get the group id. As shown in the figure, the groups can be
unbalanced. (We highlight group rebalancing further below.)

.. _figure_efd8:

.. figure:: img/efd_i8.*

  Perfect Hash Search - Assigned Keys & Target Value

Focusing on one group that has four keys, :numref:`figure_efd8` depicts the search
algorithm to find the perfect hash function. Assuming that the target
value bit for the keys is as shown in the figure, then the online EFD
table will store a 16-bit hash index and a 16-bit lookup table per group
per value bit.

.. _figure_efd9:

.. figure:: img/efd_i9.*

  Perfect Hash Search - Satisfy Target Values

For a given keyX, the hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
is used to point to a certain bit index in the 16-bit lookup_table value,
as shown in :numref:`figure_efd9`.
The insert function will brute force search through all possible values of the
hash index until a non-conflicting lookup_table is found.

.. _figure_efd10:

.. figure:: img/efd_i10.*

  Finding Hash Index for Conflict Free lookup_table

For example, since both key3 and key7 have a target bit value of 1, it
is okay if the hash functions of both keys point to the same bit in the
lookup table. A conflict occurs if a hash index is used that maps
both Key4 and Key7 to the same index in the lookup_table,
as shown in :numref:`figure_efd10`, since their target value bits are not the same.
Once a hash index is found that produces a lookup_table with no
contradictions, this index is stored for this group. This procedure is
repeated for each bit of the target value.

Lookup Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~

The design principle of EFD is that lookups are much more frequent than
inserts, and hence, EFD's design optimizes for lookups, which are
faster and much simpler than the slower insert procedure (inserts are
slow because of the perfect hash search, as previously discussed).

.. _figure_efd11:

.. figure:: img/efd_i11.*

  EFD Lookup Operation

:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
the group id is computed (using CRC hash) and then the hash index for this
group is retrieved from the EFD table. Using the retrieved hash index,
the hash function ``h(key, seed1) + index * h(key, seed2)`` is used, which will
result in an index into the lookup_table; the bit corresponding to this
index will be the target value bit.
This procedure is repeated for each
bit of the target value.

Group Rebalancing Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When discussing EFD inserts and lookups, the discussion was simplified by
assuming that a group id is simply the result of a hash function. However,
since hashing in general is not perfect and will not always produce a
uniform output, this simplified assumption leads to unbalanced
groups, i.e., some groups will have more keys than others.
Typically, and to minimize insert time with an increasing number of keys,
it is preferable that all groups have a balanced number of keys, so that
the brute force search for the perfect hash terminates with a valid hash
index. In order to achieve this target, groups are rebalanced during
runtime inserts, and keys are moved from a busy group to a less
crowded group as more keys are inserted.

.. _figure_efd12:

.. figure:: img/efd_i12.*

  Runtime Group Rebalancing

:numref:`figure_efd12` depicts the high level idea of group rebalancing. Given an
input key, the hash result is split into two parts: a chunk id and an 8-bit
bin id. A chunk contains 64 different groups and 256 bins (i.e. any
given bin can map to 4 distinct groups). When a key is inserted, its
bin id is computed, for example in :numref:`figure_efd12` bin_id=2,
and since each bin can be mapped to one of four different groups (2 bits of storage),
the four possible mappings are evaluated, the one that results in the most
balanced key distribution across these four groups is selected, and the mapping
result is stored in these two bits.


.. _Efd_references:

References
----------

1- EFD is based on collaborative research work between Intel and
Carnegie Mellon University (CMU); interested readers can refer to the paper
"Scaling Up Clustered Network Appliances with ScaleBricks" by Dong Zhou et al.
at SIGCOMM 2015 (http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf)
for more information.