xref: /dpdk/doc/guides/prog_guide/efd_lib.rst (revision aa8128b1d830dbfcd92636a5fb6038a3b877ffc1)
1..  BSD LICENSE
2    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
3    All rights reserved.
4
5    Redistribution and use in source and binary forms, with or without
6    modification, are permitted provided that the following conditions
7    are met:
8
9    * Redistributions of source code must retain the above copyright
10    notice, this list of conditions and the following disclaimer.
11    * Redistributions in binary form must reproduce the above copyright
12    notice, this list of conditions and the following disclaimer in
13    the documentation and/or other materials provided with the
14    distribution.
15    * Neither the name of Intel Corporation nor the names of its
16    contributors may be used to endorse or promote products derived
17    from this software without specific prior written permission.
18
19    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30
31.. _Efd_Library:
32
33Elastic Flow Distributor Library
34================================
35
36Introduction
37------------
38
39In Data Centers today, clustering and scheduling of distributed workloads
40is a very common task. Many workloads require a deterministic
41partitioning of a flat key space among a cluster of machines. When a
42packet enters the cluster, the ingress node will direct the packet to
43its handling node. For example, data-centers with disaggregated storage
44use storage metadata tables to forward I/O requests to the correct back end
45storage cluster, stateful packet inspection will use match incoming
46flows to signatures in flow tables to send incoming packets to their
47intended deep packet inspection (DPI) devices, and so on.
48
49EFD is a distributor library that uses perfect hashing to determine a
50target/value for a given incoming flow key. It has the following
51advantages: first, because it uses perfect hashing it does not store the
52key itself and hence lookup performance is not dependent on the key
53size. Second, the target/value can be any arbitrary value hence the
54system designer and/or operator can better optimize service rates and
55inter-cluster network traffic locating. Third, since the storage
56requirement is much smaller than a hash-based flow table (i.e. better
57fit for CPU cache), EFD can scale to millions of flow keys. Finally,
58with the current optimized library implementation, performance is fully
59scalable with any number of CPU cores.
60
61Flow Based Distribution
62-----------------------
63
64Computation Based Schemes
65~~~~~~~~~~~~~~~~~~~~~~~~~
66
67Flow distribution and/or load balancing can be simply done using a
68stateless computation, for instance using round-robin or a simple
69computation based on the flow key as an input. For example, a hash
70function can be used to direct a certain flow to a target based on
71the flow key (e.g. ``h(key) mod n``) where h(key) is the hash value of the
72flow key and n is the number of possible targets.
73
74.. _figure_efd1:
75
76.. figure:: img/efd_i1.*
77
78  Load Balancing Using Front End Node
79
80In this scheme (:numref:`figure_efd1`), the front end server/distributor/load balancer
81extracts the flow key from the input packet and applies a computation to determine where
82this flow should be directed. Intuitively, this scheme is very simple
83and requires no state to be kept at the front end node, and hence,
84storage requirements are minimum.
85
86.. _figure_efd2:
87
88.. figure:: img/efd_i2.*
89
90  Consistent Hashing
91
92A widely used flow distributor that belongs to the same category of
93computation-based schemes is ``consistent hashing``, shown in :numref:`figure_efd2`.
94Target destinations (shown in red) are hashed into the same space as the flow
95keys (shown in blue), and keys are mapped to the nearest target in a clockwise
96fashion. Dynamically adding and removing targets with consistent hashing
97requires only K/n keys to be remapped on average, where K is the number of
98keys, and n is the number of targets. In contrast, in a traditional hash-based
99scheme, a change in the number of targets causes nearly all keys to be
100remapped.
101
102Although computation-based schemes are simple and need very little
103storage requirement, they suffer from the drawback that the system
104designer/operator can’t fully control the target to assign a specific
105key, as this is dictated by the hash function.
106Deterministically co-locating of keys together (for example, to minimize
107inter-server traffic or to optimize for network traffic conditions,
108target load, etc.) is simply not possible.
109
110Flow-Table Based Schemes
111~~~~~~~~~~~~~~~~~~~~~~~~
112
113When using a Flow-Table based scheme to handle flow distribution/load
114balancing, in contrast with computation-based schemes, the system designer
115has the flexibility of assigning a given flow to any given
116target. The flow table (e.g. DPDK RTE Hash Library) will simply store
117both the flow key and the target value.
118
119.. _figure_efd3:
120
121.. figure:: img/efd_i3.*
122
123  Table Based Flow Distribution
124
125As shown in :numref:`figure_efd3`, when doing a lookup, the flow-table
126is indexed with the hash of the flow key and the keys (more than one is possible,
127because of hash collision) stored in this index and corresponding values
128are retrieved. The retrieved key(s) is matched with the input flow key
129and if there is a match the value (target id) is returned.
130
131The drawback of using a hash table for flow distribution/load balancing
132is the storage requirement, since the flow table need to store keys,
133signatures and target values. This doesn't allow this scheme to scale to
134millions of flow keys. Large tables will usually not fit in
135the CPU cache, and hence, the lookup performance is degraded because of
136the latency to access the main memory.
137
138EFD Based Scheme
139~~~~~~~~~~~~~~~~
140
141EFD combines the advantages of both flow-table based and computation-based
142schemes. It doesn't require the large storage necessary for
143flow-table based schemes (because EFD doesn't store the key as explained
144below), and it supports any arbitrary value for any given key.
145
146.. _figure_efd4:
147
148.. figure:: img/efd_i4.*
149
150  Searching for Perfect Hash Function
151
152The basic idea of EFD is when a given key is to be inserted, a family of
153hash functions is searched until the correct hash function that maps the
154input key to the correct value is found, as shown in :numref:`figure_efd4`.
155However, rather than explicitly storing all keys and their associated values,
156EFD stores only indices of hash functions that map keys to values, and
157thereby consumes much less space than conventional flow-based tables.
158The lookup operation is very simple, similar to a computational-based
159scheme: given an input key the lookup operation is reduced to hashing
160that key with the correct hash function.
161
162.. _figure_efd5:
163
164.. figure:: img/efd_i5.*
165
166  Divide and Conquer for Millions of Keys
167
168Intuitively, finding a hash function that maps each of a large number
169(millions) of input keys to the correct output value is effectively
170impossible, as a result EFD, as shown in :numref:`figure_efd5`,
171breaks the problem into smaller pieces (divide and conquer).
172EFD divides the entire input key set into many small groups.
173Each group consists of approximately 20-28 keys (a configurable parameter
174for the library), then, for each small group, a brute force search to find
175a hash function that produces the correct outputs for each key in the group.
176
177It should be mentioned that, since the online lookup table for EFD
178doesn't store the key itself, the size of the EFD table is independent
179of the key size and hence EFD lookup performance which is almost
180constant irrespective of the length of the key which is a highly
181desirable feature especially for longer keys.
182
183In summary, EFD is a set separation data structure that supports millions of
184keys. It is used to distribute a given key to an intended target. By itself
185EFD is not a FIB data structure with an exact match the input flow key.
186
187.. _Efd_example:
188
189Example of EFD Library Usage
190----------------------------
191
192EFD can be used along the data path of many network functions and middleboxes.
193As previously mentioned, it can used as an index table for
194<key,value> pairs, meta-data for objects, a flow-level load balancer, etc.
195:numref:`figure_efd6` shows an example of using EFD as a flow-level load
196balancer, where flows are received at a front end server before being forwarded
197to the target back end server for processing. The system designer would
198deterministically co-locate flows together in order to minimize cross-server
199interaction.
200(For example, flows requesting certain webpage objects are co-located
201together, to minimize forwarding of common objects across servers).
202
203.. _figure_efd6:
204
205.. figure:: img/efd_i6.*
206
207  EFD as a Flow-Level Load Balancer
208
209As shown in :numref:`figure_efd6`, the front end server will have an EFD table that
210stores for each group what is the perfect hash index that satisfies the
211correct output. Because the table size is small and fits in cache (since
212keys are not stored), it sustains a large number of flows (N*X, where N
213is the maximum number of flows served by each back end server of the X
214possible targets).
215
216With an input flow key, the group id is computed (for example, using
217last few bits of CRC hash) and then the EFD table is indexed with the
218group id to retrieve the corresponding hash index to use. Once the index
219is retrieved the key is hashed using this hash function and the result
220will be the intended correct target where this flow is supposed to be
221processed.
222
223It should be noted that as a result of EFD not matching the exact key but
224rather distributing the flows to a target back end node based on the
225perfect hash index, a key that has not been inserted before
226will be distributed to a valid target. Hence, a local table which stores
227the flows served at each node is used and is
228exact matched with the input key to rule out new never seen before
229flows.
230
231.. _Efd_api:
232
233Library API Overview
234--------------------
235
236The EFD library API is created with a very similar semantics of a
237hash-index or a flow table. The application creates an EFD table for a
238given maximum number of flows, a function is called to insert a flow key
239with a specific target value, and another function is used to retrieve
240target values for a given individual flow key or a bulk of keys.
241
242EFD Table Create
243~~~~~~~~~~~~~~~~
244
245The function ``rte_efd_create()`` is used to create and return a pointer
246to an EFD table that is sized to hold up to num_flows key.
247The online version of the EFD table (the one that does
248not store the keys and is used for lookups) will be allocated and
249created in the last level cache (LLC) of the socket defined by the
250online_socket_bitmask, while the offline EFD table (the one that
251stores the keys and is used for key inserts and for computing the
252perfect hashing) is allocated and created in the LLC of the socket
253defined by offline_socket_bitmask. It should be noted, that for
254highest performance the socket id should match that where the thread is
255running, i.e. the online EFD lookup table should be created on the same
256socket as where the lookup thread is running.
257
258EFD Insert and Update
259~~~~~~~~~~~~~~~~~~~~~
260
261The EFD function to insert a key or update a key to a new value is
262``rte_efd_update()``. This function will update an existing key to
263a new value (target) if the key has already been inserted
264before, or will insert the <key,value> pair if this key has not been inserted
265before. It will return 0 upon success. It will return
266``EFD_UPDATE_WARN_GROUP_FULL (1)`` if the operation is insert, and the
267last available space in the key's group was just used. It will return
268``EFD_UPDATE_FAILED (2)`` when the insertion or update has failed (either it
269failed to find a suitable perfect hash or the group was full). The function
270will return ``EFD_UPDATE_NO_CHANGE (3)`` if there is no change to the EFD
271table (i.e, same value already exists).
272
273EFD Lookup
274~~~~~~~~~~
275
276To lookup a certain key in an EFD table, the function ``rte_efd_lookup()``
277is used to return the value associated with single key.
278As previously mentioned, if the key has been inserted, the correct value
279inserted is returned, if the key has not been inserted before,
280a ‘random’ value (based on hashing of the key) is returned.
281For better performance and to decrease the overhead of
282function calls per key, it is always recommended to use a bulk lookup
283function (simultaneous lookup of multiple keys) instead of a single key
284lookup function. ``rte_efd_lookup_bulk()`` is the bulk lookup function,
285that looks up num_keys simultaneously stored in the key_list and the
286corresponding return values will be returned in the value_list.
287
288EFD Delete
289~~~~~~~~~~
290
291To delete a certain key in an EFD table, the function
292``rte_efd_delete()`` can be used. The function returns zero upon success
293when the key has been found and deleted. Socket_id is the parameter to
294use to lookup the existing value, which is ideally the caller's socket id.
295The previous value associated with this key will be returned
296in the prev_value argument.
297
298.. _Efd_internals:
299
300Library Internals
301-----------------
302
303This section provides the brief high-level idea and an overview
304of the library internals to accompany the RFC. The intent of this
305section is to explain to readers the high-level implementation of
306insert, lookup and group rebalancing in the EFD library.
307
308Insert Function Internals
309~~~~~~~~~~~~~~~~~~~~~~~~~
310
311As previously mentioned the EFD divides the whole set of keys into
312groups of a manageable size (e.g. 28 keys) and then searches for the
313perfect hash that satisfies the intended target value for each key. EFD
314stores two version of the <key,value> table:
315
316-  Offline Version (in memory): Only used for the insertion/update
317   operation, which is less frequent than the lookup operation. In the
318   offline version the exact keys for each group is stored. When a new
319   key is added, the hash function is updated that will satisfy the
320   value for the new key together with the all old keys already inserted
321   in this group.
322
323-  Online Version (in cache): Used for the frequent lookup operation. In
324   the online version, as previously mentioned, the keys are not stored
325   but rather only the hash index for each group.
326
327.. _figure_efd7:
328
329.. figure:: img/efd_i7.*
330
331  Group Assignment
332
333:numref:`figure_efd7` depicts the group assignment for 7 flow keys as an example.
334Given a flow key, a hash function (in our implementation CRC hash) is
335used to get the group id. As shown in the figure, the groups can be
336unbalanced. (We highlight group rebalancing further below).
337
338.. _figure_efd8:
339
340.. figure:: img/efd_i8.*
341
342  Perfect Hash Search - Assigned Keys & Target Value
343
344Focusing on one group that has four keys, :numref:`figure_efd8` depicts the search
345algorithm to find the perfect hash function. Assuming that the target
346value bit for the keys is as shown in the figure, then the online EFD
347table will store a 16 bit hash index and 16 bit lookup table per group
348per value bit.
349
350.. _figure_efd9:
351
352.. figure:: img/efd_i9.*
353
354  Perfect Hash Search - Satisfy Target Values
355
356For a given keyX, a hash function ``(h(keyX, seed1) + index * h(keyX, seed2))``
357is used to point to certain bit index in the 16bit lookup_table value,
358as shown in :numref:`figure_efd9`.
359The insert function will brute force search for all possible values for the
360hash index until a non conflicting lookup_table is found.
361
362.. _figure_efd10:
363
364.. figure:: img/efd_i10.*
365
366  Finding Hash Index for Conflict Free lookup_table
367
368For example, since both key3 and key7 have a target bit value of 1, it
369is okay if the hash function of both keys point to the same bit in the
370lookup table. A conflict will occur if a hash index is used that maps
371both Key4 and Key7 to the same index in the lookup_table,
372as shown in :numref:`figure_efd10`, since their target value bit are not the same.
373Once a hash index is found that produces a lookup_table with no
374contradictions, this index is stored for this group. This procedure is
375repeated for each bit of target value.
376
377Lookup Function Internals
378~~~~~~~~~~~~~~~~~~~~~~~~~
379
380The design principle of EFD is that lookups are much more frequent than
381inserts, and hence, EFD's design optimizes for the lookups which are
382faster and much simpler than the slower insert procedure (inserts are
383slow, because of perfect hash search as previously discussed).
384
385.. _figure_efd11:
386
387.. figure:: img/efd_i11.*
388
389  EFD Lookup Operation
390
391:numref:`figure_efd11` depicts the lookup operation for EFD. Given an input key,
392the group id is computed (using CRC hash) and then the hash index for this
393group is retrieved from the EFD table. Using the retrieved hash index,
394the hash function ``h(key, seed1) + index *h(key, seed2)`` is used which will
395result in an index in the lookup_table, the bit corresponding to this
396index will be the target value bit. This procedure is repeated for each
397bit of the target value.
398
399Group Rebalancing Function Internals
400~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
401
402When discussing EFD inserts and lookups, the discussion is simplified by
403assuming that a group id is simply a result of hash function. However,
404since hashing in general is not perfect and will not always produce a
405uniform output, this simplified assumption will lead to unbalanced
406groups, i.e., some group will have more keys than other groups.
407Typically, and to minimize insert time with an increasing number of keys,
408it is preferable that all groups will have a balanced number of keys, so
409the brute force search for the perfect hash terminates with a valid hash
410index. In order to achieve this target, groups are rebalanced during
411runtime inserts, and keys are moved around from a busy group to a less
412crowded group as the more keys are inserted.
413
414.. _figure_efd12:
415
416.. figure:: img/efd_i12.*
417
418  Runtime Group Rebalancing
419
420:numref:`figure_efd12` depicts the high level idea of group rebalancing, given an
421input key the hash result is split into two parts a chunk id and 8-bit
422bin id. A chunk contains 64 different groups and 256 bins (i.e. for any
423given bin it can map to 4 distinct groups). When a key is inserted, the
424bin id is computed, for example in :numref:`figure_efd12` bin_id=2,
425and since each bin can be mapped to one of four different groups (2 bit storage),
426the four possible mappings are evaluated and the one that will result in a
427balanced key distribution across these four is selected the mapping result
428is stored in these two bits.
429
430
431.. _Efd_references:
432
433References
434-----------
435
4361- EFD is based on collaborative research work between Intel and
437Carnegie Mellon University (CMU), interested readers can refer to the paper
438“Scaling Up Clustered Network Appliances with ScaleBricks;” Dong Zhou et al.
439at SIGCOMM 2015 (`http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p241.pdf`)
440for more information.
441