..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016-2017 Intel Corporation.

Elastic Flow Distributor (EFD) Library
======================================

Introduction
------------

In data centers today, clustering and scheduling of distributed workloads
is a very common task. Many workloads require a deterministic
partitioning of a flat key space among a cluster of machines. When a
packet enters the cluster, the ingress node directs the packet to
its handling node. For example, data centers with disaggregated storage
use storage metadata tables to forward I/O requests to the correct back-end
storage cluster, stateful packet inspection matches incoming
flows against signatures in flow tables to send packets to their
intended deep packet inspection (DPI) devices, and so on.

EFD is a distributor library that uses perfect hashing to determine a
target/value for a given incoming flow key. It has the following
advantages: first, because it uses perfect hashing it does not store the
key itself, so lookup performance is independent of the key
size. Second, the target/value can be any arbitrary value, so the
system designer and/or operator can better optimize service rates and
the placement of inter-cluster network traffic. Third, since the storage
requirement is much smaller than that of a hash-based flow table (i.e. a better
fit for the CPU cache), EFD can scale to millions of flow keys. Finally,
with the current optimized library implementation, performance is fully
scalable with any number of CPU cores.

Flow Based Distribution
-----------------------

Computation Based Schemes
~~~~~~~~~~~~~~~~~~~~~~~~~

Flow distribution and/or load balancing can be done with a simple
stateless computation, for instance round-robin or a computation
based on the flow key as an input. For example, a hash
function can be used to direct a certain flow to a target based on
the flow key (e.g. ``h(key) mod n``), where h(key) is the hash value of the
flow key and n is the number of possible targets.

.. _figure_efd1:

.. figure:: img/efd_i1.*

  Load Balancing Using Front End Node

In this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
extracts the flow key from the input packet and applies a computation to determine where
this flow should be directed. Intuitively, this scheme is very simple
and requires no state to be kept at the front end node; hence,
storage requirements are minimal.

.. _figure_efd2:

.. figure:: img/efd_i2.*

  Consistent Hashing

A widely used flow distributor that belongs to the same category of
computation-based schemes is *consistent hashing*, shown in :numref:`figure_efd2`.
Target destinations (shown in red) are hashed into the same space as the flow
keys (shown in blue), and keys are mapped to the nearest target in a clockwise
fashion. Dynamically adding and removing targets with consistent hashing
requires only K/n keys to be remapped on average, where K is the number of
keys, and n is the number of targets. In contrast, in a traditional hash-based
scheme, a change in the number of targets causes nearly all keys to be
remapped.

Although computation-based schemes are simple and have very low
storage requirements, they suffer from the drawback that the system
designer/operator cannot fully control which target a specific
key is assigned to, as this is dictated by the hash function.
Deterministically co-locating keys together (for example, to minimize
inter-server traffic or to optimize for network traffic conditions,
target load, etc.) is simply not possible.

Flow-Table Based Schemes
~~~~~~~~~~~~~~~~~~~~~~~~

When using a flow-table based scheme to handle flow distribution/load
balancing, in contrast with computation-based schemes, the system designer
has the flexibility of assigning a given flow to any given
target. The flow table (e.g. the DPDK RTE Hash Library) simply stores
both the flow key and the target value.

.. _figure_efd3:

.. figure:: img/efd_i3.*

  Table Based Flow Distribution

As shown in :numref:`figure_efd3`, when doing a lookup, the flow table
is indexed with the hash of the flow key, and the keys (more than one is
possible, because of hash collisions) stored at this index and the
corresponding values are retrieved. The retrieved key(s) are matched against
the input flow key, and if there is a match the value (target id) is returned.

The drawback of using a hash table for flow distribution/load balancing
is the storage requirement, since the flow table needs to store keys,
signatures and target values. This prevents the scheme from scaling to
millions of flow keys. Large tables will usually not fit in
the CPU cache, and hence, lookup performance is degraded by
the latency of accessing main memory.

EFD Based Scheme
~~~~~~~~~~~~~~~~

EFD combines the advantages of both flow-table based and computation-based
schemes. It does not require the large storage necessary for
flow-table based schemes (because EFD does not store the key, as explained
below), and it supports any arbitrary value for any given key.

.. _figure_efd4:

.. figure:: img/efd_i4.*

  Searching for Perfect Hash Function

The basic idea of EFD is that when a given key is to be inserted, a family of
hash functions is searched until the correct hash function that maps the
input key to the correct value is found, as shown in :numref:`figure_efd4`.
However, rather than explicitly storing all keys and their associated values,
EFD stores only the indices of hash functions that map keys to values, and
thereby consumes much less space than conventional flow-based tables.
The lookup operation is very simple, similar to a computation-based
scheme: given an input key, the lookup operation is reduced to hashing
that key with the correct hash function.

.. _figure_efd5:

.. figure:: img/efd_i5.*

  Divide and Conquer for Millions of Keys

Intuitively, finding a hash function that maps each of a large number
(millions) of input keys to the correct output value is effectively
impossible. As a result, EFD, as shown in :numref:`figure_efd5`,
breaks the problem into smaller pieces (divide and conquer).
EFD divides the entire input key set into many small groups.
Each group consists of approximately 20-28 keys (a configurable parameter
for the library); then, for each small group, a brute force search is
performed to find a hash function that produces the correct output for
each key in the group.

Since the online lookup table for EFD
does not store the key itself, the size of the EFD table is independent
of the key size, and hence EFD lookup performance is almost
constant irrespective of the length of the key, which is a highly
desirable feature, especially for longer keys.

In summary, EFD is a set separation data structure that supports millions of
keys. It is used to distribute a given key to an intended target. By itself
EFD is not a FIB data structure and does not perform an exact match on the
input flow key.

.. _Efd_example:

Example of EFD Library Usage
----------------------------

EFD can be used along the data path of many network functions and middleboxes.
As previously mentioned, it can be used as an index table for
<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
:numref:`figure_efd6` shows an example of using EFD as a flow-level load
balancer, where flows are received at a front end server before being forwarded
to the target back end server for processing. The system designer would
deterministically co-locate flows together in order to minimize cross-server
interaction.
(For example, flows requesting certain webpage objects are co-located
together, to minimize forwarding of common objects across servers.)

.. _figure_efd6:

.. figure:: img/efd_i6.*

  EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd6`, the front end server has an EFD table that
stores, for each group, the perfect hash index that produces the
correct output. Because the table size is small and fits in cache (since
keys are not stored), it sustains a large number of flows (N*X, where N
is the maximum number of flows served by each back end server of the X
possible targets).

With an input flow key, the group id is computed (for example, using the
last few bits of the CRC hash) and then the EFD table is indexed with the
group id to retrieve the corresponding hash index to use. Once the index
is retrieved, the key is hashed using this hash function and the result
is the intended target where this flow is supposed to be
processed.

It should be noted that, because EFD does not match the exact key but
rather distributes flows to a target back end node based on the
perfect hash index, a key that has not been inserted before
will still be distributed to a valid target. Hence, a local table storing
the flows served at each node is exact-matched against the input key
to rule out new, never-before-seen flows.

.. _Efd_api:

Library API Overview
--------------------

The EFD library API is designed with semantics very similar to those of a
hash index or a flow table. The application creates an EFD table for a
given maximum number of flows, a function is called to insert a flow key
with a specific target value, and another function is used to retrieve
target values for a given individual flow key or a bulk of keys.

EFD Table Create
~~~~~~~~~~~~~~~~

The function ``rte_efd_create()`` is used to create and return a pointer
to an EFD table that is sized to hold up to ``num_flows`` keys.
The online version of the EFD table (the one that does
not store the keys and is used for lookups) will be allocated and
created in the last level cache (LLC) of the socket defined by the
``online_socket_bitmask``, while the offline EFD table (the one that
stores the keys and is used for key inserts and for computing the
perfect hashing) is allocated and created in the LLC of the socket
defined by the ``offline_socket_bitmask``. It should be noted that, for
highest performance, the socket id should match that where the thread is
running, i.e. the online EFD lookup table should be created on the same
socket as where the lookup thread is running.

EFD Insert and Update
~~~~~~~~~~~~~~~~~~~~~

The EFD function to insert a key, or update a key to a new value, is
``rte_efd_update()``. This function will update an existing key to
a new value (target) if the key has already been inserted
before, or will insert the <key,value> pair if the key has not been inserted
before. It will return 0 upon success. It will return
``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is an insert and the
last available space in the key's group was just used. It will return
``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
failed to find a suitable perfect hash or the group was full). The function
will return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
table (i.e., the same value already exists).

.. Note::

   This function is not multi-thread safe and should only be called
   from one thread.

EFD Lookup
~~~~~~~~~~

To look up a certain key in an EFD table, the function ``rte_efd_lookup()``
is used to return the value associated with a single key.
As previously mentioned, if the key has been inserted, the correct value
inserted is returned; if the key has not been inserted before,
a 'random' value (based on hashing of the key) is returned.
For better performance and to decrease the overhead of
function calls per key, it is always recommended to use a bulk lookup
function (simultaneous lookup of multiple keys) instead of a single key
lookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function
that looks up ``num_keys`` keys simultaneously, stored in ``key_list``,
with the corresponding return values returned in ``value_list``.

.. Note::

   This function is multi-thread safe, but there should not be other threads
   writing in the EFD table, unless locks are used.

EFD Delete
~~~~~~~~~~

To delete a certain key in an EFD table, the function
``rte_efd_delete()`` can be used. The function returns zero upon success
when the key has been found and deleted. The ``socket_id`` parameter is
used to look up the existing value, and is ideally the caller's socket id.
The previous value associated with this key will be returned
in the ``prev_value`` argument.

.. Note::

   This function is not multi-thread safe and should only be called
   from one thread.

.. _Efd_internals:

Library Internals
-----------------

This section provides a brief, high-level overview of the library
internals. The intent of this section is to explain to readers the
high-level implementation of insert, lookup and group rebalancing in
the EFD library.

Insert Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~

As previously mentioned, EFD divides the whole set of keys into
groups of a manageable size (e.g. 28 keys) and then searches for the
perfect hash that satisfies the intended target value for each key. EFD
stores two versions of the <key,value> table:

-  Offline Version (in memory): Only used for the insertion/update
   operation, which is less frequent than the lookup operation. In the
   offline version the exact keys for each group are stored. When a new
   key is added, the hash function is updated so that it satisfies the
   value for the new key together with all the old keys already inserted
   in this group.

-  Online Version (in cache): Used for the frequent lookup operation. In
   the online version, as previously mentioned, the keys are not stored,
   but rather only the hash index for each group.

.. _figure_efd7:

.. figure:: img/efd_i7.*

  Group Assignment

:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
Given a flow key, a hash function (in our implementation, CRC hash) is
used to get the group id. As shown in the figure, the groups can be
unbalanced. (We highlight group rebalancing further below.)

.. _figure_efd8:

.. figure:: img/efd_i8.*

  Perfect Hash Search - Assigned Keys & Target Value

Focusing on one group that has four keys, :numref:`figure_efd8` depicts the search
algorithm used to find the perfect hash function. Assuming that the target
value bits for the keys are as shown in the figure, the online EFD
table will store a 16-bit hash index and a 16-bit lookup table per group
per value bit.

.. _figure_efd9:

.. figure:: img/efd_i9.*

  Perfect Hash Search - Satisfy Target Values

For a given key (keyX), a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
is used to point to a certain bit index in the 16-bit lookup_table value,
as shown in :numref:`figure_efd9`.
The insert function will brute force search through all possible values of
the hash index until a non-conflicting lookup_table is found.

.. _figure_efd10:

.. figure:: img/efd_i10.*

  Finding Hash Index for Conflict Free lookup_table

For example, since both key3 and key7 have a target bit value of 1, it
is okay if the hash functions of both keys point to the same bit in the
lookup table. A conflict occurs if a hash index is used that maps
both Key4 and Key7 to the same index in the lookup_table,
as shown in :numref:`figure_efd10`, since their target value bits are not
the same. Once a hash index is found that produces a lookup_table with no
contradictions, this index is stored for this group. This procedure is
repeated for each bit of the target value.

Lookup Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~

The design principle of EFD is that lookups are much more frequent than
inserts, and hence EFD's design optimizes for lookups, which are
faster and much simpler than the slower insert procedure (inserts are
slow because of the perfect hash search, as previously discussed).

.. _figure_efd11:

.. figure:: img/efd_i11.*

  EFD Lookup Operation

:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
the group id is computed (using CRC hash) and then the hash index for this
group is retrieved from the EFD table. Using the retrieved hash index,
the hash function ``h(key, seed1) + index * h(key, seed2)`` is used, which
results in an index into the lookup_table; the bit corresponding to this
index is the target value bit. This procedure is repeated for each
bit of the target value.

Group Rebalancing Function Internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When discussing EFD inserts and lookups, the discussion is simplified by
assuming that a group id is simply the result of a hash function. However,
since hashing in general is not perfect and will not always produce a
uniform output, this simplified assumption will lead to unbalanced
groups, i.e., some groups will have more keys than others.
Typically, and to minimize insert time with an increasing number of keys,
it is preferable that all groups have a balanced number of keys, so that
the brute force search for the perfect hash terminates with a valid hash
index. In order to achieve this target, groups are rebalanced during
runtime inserts, and keys are moved from a busy group to a less
crowded group as more keys are inserted.

.. _figure_efd12:

.. figure:: img/efd_i12.*

  Runtime Group Rebalancing

:numref:`figure_efd12` depicts the high-level idea of group rebalancing. Given an
input key, the hash result is split into two parts: a chunk id and an 8-bit
bin id. A chunk contains 64 different groups and 256 bins (i.e. any
given bin can map to 4 distinct groups). When a key is inserted, the
bin id is computed, for example in :numref:`figure_efd12` bin_id=2,
and since each bin can be mapped to one of four different groups (2 bits of
storage), the four possible mappings are evaluated and the one that results
in the most balanced key distribution across these four is selected; the
mapping result is stored in these two bits.


.. _Efd_references:

References
----------

1. EFD is based on collaborative research work between Intel and
   Carnegie Mellon University (CMU). Interested readers can refer to the paper
   "Scaling Up Clustered Network Appliances with ScaleBricks" by Dong Zhou et al.,
   SIGCOMM 2015 (http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf)
   for more information.
427