..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.
    Copyright(c) 2018 Arm Limited.

Hash Library
============

The DPDK provides a Hash Library for creating hash tables for fast lookup.
The hash table is a data structure optimized for searching through a set of entries that are each identified by a unique key.
For increased performance, the DPDK Hash requires that all the keys have the same number of bytes, which is set at hash creation time.

Hash API Overview
-----------------

The main configuration parameters for the hash table are:

*   Total number of hash entries in the table

*   Size of the key in bytes

*   An extra flag to describe additional settings, for example the multithreading mode of operation and extendable bucket functionality (as will be described later)

The hash table also allows the configuration of some low-level implementation related parameters such as:

*   Hash function to translate the key into a hash value

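As an illustration, below is a minimal sketch of creating a table with these
parameters. The key type ``struct ipv4_5tuple`` and the table name are
placeholders chosen for this example, not values mandated by the library:

.. code-block:: c

   #include <stdint.h>
   #include <rte_hash.h>
   #include <rte_jhash.h>
   #include <rte_lcore.h>

   /* Example fixed-size key; any fixed-size type can be used. */
   struct ipv4_5tuple {
       uint32_t src_ip;
       uint32_t dst_ip;
       uint16_t src_port;
       uint16_t dst_port;
       uint8_t  proto;
   };

   static struct rte_hash *
   create_example_hash(void)
   {
       struct rte_hash_parameters params = {
           .name = "example_hash",                /* unique table name */
           .entries = 1024,                       /* total number of hash entries */
           .key_len = sizeof(struct ipv4_5tuple), /* key size in bytes */
           .hash_func = rte_jhash,                /* configurable hash function */
           .hash_func_init_val = 0,
           .socket_id = (int)rte_socket_id(),     /* NUMA socket for allocations */
           .extra_flag = 0,                       /* extra flags, described later */
       };

       return rte_hash_create(&params);           /* NULL on failure */
   }
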
The main methods exported by the Hash Library are:

*   Add entry with key: The key is provided as input. If the new entry is successfully added to the hash table for the specified key,
    or there is already an entry in the hash table for the specified key, then the position of the entry is returned.
    If the operation was not successful, for example due to lack of free entries in the hash table, then a negative value is returned.

*   Delete entry with key: The key is provided as input. If an entry with the specified key is found in the hash table,
    then the entry is removed from the hash table and the position where the entry was found in the hash table is returned.
    If no entry with the specified key exists in the hash table, then a negative value is returned.

*   Lookup for entry with key: The key is provided as input. If an entry with the specified key is found in the hash table (i.e., lookup hit),
    then the position of the entry is returned, otherwise (i.e., lookup miss) a negative value is returned.

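A sketch of these three operations, assuming the handle and key type from the
creation example above:

.. code-block:: c

   #include <rte_hash.h>

   /* Each call returns the entry's position in the table on success,
    * or a negative error value on failure. */
   static void
   basic_ops(struct rte_hash *handle, const struct ipv4_5tuple *key)
   {
       int32_t pos;

       pos = rte_hash_add_key(handle, key);  /* add, or position of existing entry */
       if (pos < 0)
           return;                           /* e.g. -ENOSPC: no free entries */

       pos = rte_hash_lookup(handle, key);   /* lookup hit: pos >= 0 */
       if (pos < 0)
           return;                           /* -ENOENT: lookup miss */

       pos = rte_hash_del_key(handle, key);  /* delete; returns the position */
   }
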
Apart from the basic methods explained above, the Hash Library API provides a few more advanced methods to query and update the hash table:

*   Add / lookup / delete entry with key and precomputed hash: Both the key and its precomputed hash are provided as input. This allows
    the user to perform these operations faster, as the hash value is already computed.

*   Add / lookup entry with key and data: Data is provided as input for add. This allows the user to store
    not only the key, but also its associated data, which may be either an 8-byte integer or a pointer to external data (if the data size is more than 8 bytes).

*   Combination of the two options above: The user can provide the key, the precomputed hash, and the data.

*   Ability to not free the position of the entry in the hash table upon calling delete. This is useful for multi-threaded scenarios where
    readers continue to use the position even after the entry is deleted.

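A sketch combining the precomputed-hash and key-plus-data variants, again
assuming the handle and key type from the creation example:

.. code-block:: c

   #include <rte_hash.h>

   static int
   hash_and_data_ops(struct rte_hash *handle, const struct ipv4_5tuple *key,
                     void *flow_data)
   {
       /* Compute the hash once and reuse it across operations. */
       hash_sig_t sig = rte_hash_hash(handle, key);

       /* Store an 8-byte value (here a pointer) together with the key;
        * returns 0 on success, negative on failure. */
       if (rte_hash_add_key_with_hash_data(handle, key, sig, flow_data) < 0)
           return -1;

       /* Retrieve the stored data; returns the entry position on a hit. */
       void *data;
       if (rte_hash_lookup_with_hash_data(handle, key, sig, &data) < 0)
           return -1;

       /* Delete using the same precomputed hash. */
       return rte_hash_del_key_with_hash(handle, key, sig);
   }
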
Also, the API contains a method to allow the user to look up entries in batches, achieving higher performance
than looking up individual entries, as the function prefetches the next entries while it operates
on the current ones, which significantly reduces the overhead of the necessary memory accesses.

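A sketch of a batched lookup using this method:

.. code-block:: c

   #include <rte_hash.h>

   #define BURST 32

   /* Look up a burst of keys in one call; positions[i] holds the entry
    * position for keys[i], or a negative value on a lookup miss. */
   static void
   bulk_ops(struct rte_hash *handle, const void *keys[BURST])
   {
       int32_t positions[BURST];
       unsigned int i;

       if (rte_hash_lookup_bulk(handle, keys, BURST, positions) != 0)
           return;                   /* invalid parameters */

       for (i = 0; i < BURST; i++) {
           if (positions[i] >= 0) {
               /* hit: use positions[i] to index application data */
           }
       }
   }
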

The actual data associated with each key can be either managed by the user using a separate table that
mirrors the hash in terms of number of entries and position of each entry,
as shown in the Flow Classification use case described in the following sections,
or stored in the hash table itself.

The example hash tables in the L2/L3 Forwarding sample applications define which port to forward a packet to based on a packet flow identified by the five-tuple lookup.
However, this table could also be used for more sophisticated features and provide many other functions and actions that could be performed on the packets and flows.

Multi-process support
---------------------

The hash library can be used in a multi-process environment.
The only function that is limited to single-process mode is rte_hash_set_cmp_func(), which sets up
a custom key compare function; since the function is assigned to a function pointer, it is not valid
across processes.


Multi-thread support
--------------------

The hash library supports multithreading, and the user specifies the needed mode of operation at the creation time of the hash table
by appropriately setting the flag. In all modes of operation, lookups are thread-safe, meaning lookups can be called from multiple
threads concurrently.

For concurrent writes, and concurrent reads and writes, the following flag values define the corresponding modes of operation:

*  If the multi-writer flag (RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) is set, multiple threads are allowed to write to the table.
   Key add, delete, and table reset are protected from other writer threads. With only this flag set, readers are not protected from ongoing writes.

*  If the read/write concurrency flag (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) is set, multithread read/write operation is safe
   (i.e., the application does not need to stop the readers from accessing the hash table until writers finish their updates; readers and writers can operate on the table concurrently).
   The library uses a reader-writer lock to provide the concurrency.

*  In addition to these two flag values, if the transactional memory flag (RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT) is also set,
   the reader-writer lock will use hardware transactional memory (e.g., Intel® TSX), if supported, to guarantee thread safety.
   If the platform supports Intel® TSX, it is advised to set the transactional memory flag, as this will speed up concurrent table operations.
   Otherwise, concurrent operations will be slower because of the overhead associated with the software locking mechanisms.

*  If the lock-free read/write concurrency flag (RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) is set, read/write concurrency is provided without using a reader-writer lock.
   For platforms (e.g., current ARM based platforms) that do not support transactional memory, it is advised to set this flag to achieve greater scalability in performance.
   If this flag is set, the RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL flag is set by default.

*  If the 'do not free on delete' flag (RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL) is set, the position of the entry in the hash table is not freed upon calling delete(). This flag is enabled
   by default when the lock-free read/write concurrency flag is set. The application should free the position after all the readers have stopped referencing the position.
   Where required, the application can make use of RCU mechanisms to determine when the readers have stopped referencing the position.
   An RCU QSBR process is integrated within the Hash Library for safe freeing of the position. The application has certain responsibilities while using this feature.
   For example, rte_hash_rcu_qsbr_add() needs to be called to use the integrated RCU mechanism.
   Please refer to the resource reclamation framework of :doc:`rcu_lib` for more details.

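A sketch of how these flags combine at creation time and, for the lock-free
mode, how the integrated RCU QSBR mechanism can be registered. The QSBR
variable is assumed to have been allocated and initialized beforehand with
rte_rcu_qsbr_init(), and the defer-queue mode shown is one of the available
reclamation modes:

.. code-block:: c

   #include <rte_hash.h>
   #include <rte_rcu_qsbr.h>

   static struct rte_hash *
   create_lock_free_hash(struct rte_hash_parameters *params,
                         struct rte_rcu_qsbr *qsbr)
   {
       /* Lock-free read/write concurrency; this also implies
        * RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL. */
       params->extra_flag = RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF;

       struct rte_hash *handle = rte_hash_create(params);
       if (handle == NULL)
           return NULL;

       /* Register the RCU QSBR variable so deleted entries are
        * reclaimed only after readers have quiesced. */
       struct rte_hash_rcu_config rcu_cfg = {
           .v = qsbr,                     /* RCU QSBR variable */
           .mode = RTE_HASH_QSBR_MODE_DQ, /* defer reclamation to a queue */
       };
       if (rte_hash_rcu_qsbr_add(handle, &rcu_cfg) != 0) {
           rte_hash_free(handle);
           return NULL;
       }
       return handle;
   }
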

Extendable Bucket Functionality support
----------------------------------------

An extra flag is used to enable this functionality (the flag is not set by default). When RTE_HASH_EXTRA_FLAGS_EXT_TABLE is set and,
in the very unlikely case, a key fails to be inserted due to excessive hash collisions, the hash table bucket is extended with a linked
list to insert these failed keys. This feature is important for workloads (e.g. telco workloads) that need to insert up to 100% of the
hash table size and cannot tolerate any key insertion failure (even if very few).
Please note that with the 'lock-free read/write concurrency' flag enabled, users need to call the 'rte_hash_free_key_with_position' API or configure the integrated RCU QSBR
(or use external RCU mechanisms) in order to free the empty buckets and deleted keys, to maintain the 100% capacity guarantee.

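A sketch of the explicit free that this requires when the integrated RCU QSBR
is not used (the wait step is application specific):

.. code-block:: c

   #include <rte_hash.h>

   /* With 'no free on delete' in effect, delete only invalidates the
    * entry; the slot must be released explicitly once no reader can
    * still be referencing it. */
   static int
   delete_and_free(struct rte_hash *handle, const struct ipv4_5tuple *key)
   {
       int32_t pos = rte_hash_del_key(handle, key);
       if (pos < 0)
           return pos;        /* key not found */

       /* ... wait until all readers stop referencing 'pos',
        * e.g. via an external RCU mechanism ... */

       return rte_hash_free_key_with_position(handle, pos);
   }
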
Implementation Details (non Extendable Bucket Case)
---------------------------------------------------

The hash table has two main tables:

* The first table is an array of buckets, each of which consists of multiple entries.
  Each entry contains the signature
  of a given key (explained below) and an index into the second table.

* The second table is an array of all the keys stored in the hash table and the data associated with each key.

The hash library uses the Cuckoo Hash algorithm to resolve collisions.
For any input key, there are two possible buckets (primary and secondary/alternative location)
where that key can be stored in the hash table, therefore only the entries within those two buckets need to be examined
when the key is looked up.
The Hash Library uses a hash function (configurable) to translate the input key into a 4-byte hash value.
The bucket index and a 2-byte signature are derived from the hash value using partial-key hashing [partial-key].

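The sketch below illustrates one way such a derivation can work; the exact bit
selection and the multiplication constant are illustrative, not a specification
of the library's internals:

.. code-block:: c

   #include <stdint.h>

   /* Upper 16 bits of the hash value form the short signature. */
   static inline uint16_t
   short_sig(uint32_t hash)
   {
       return hash >> 16;
   }

   /* Lower bits of the hash value select the primary bucket. */
   static inline uint32_t
   prim_bucket(uint32_t hash, uint32_t bucket_mask)
   {
       return hash & bucket_mask;
   }

   /* The alternative bucket is computable from the current bucket index
    * and the signature alone, so the full key is not needed in order to
    * displace an entry. XOR makes the mapping symmetric: applying it
    * twice returns to the original bucket. */
   static inline uint32_t
   alt_bucket(uint32_t cur_bkt, uint16_t sig, uint32_t bucket_mask)
   {
       return (cur_bkt ^ ((uint32_t)sig * 0x5bd1e995)) & bucket_mask;
   }
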
Once the buckets are identified, the scope of the key add,
delete, and lookup operations is reduced to the entries in those buckets (it is very likely that entries are in the primary bucket).

To speed up the search logic within the bucket, each hash entry stores the 2-byte key signature together with the full key for each hash table entry.
For large key sizes, comparing the input key against a key from the bucket can take significantly more time than
comparing the 2-byte signature of the input key against the signature of a key from the bucket.
Therefore, the signature comparison is done first and the full key comparison is done only when the signatures match.
The full key comparison is still necessary, as two input keys from the same bucket can still potentially have the same 2-byte signature,
although this event is relatively rare for hash functions providing good uniform distributions for the set of input keys.

Example of lookup:

First of all, the primary bucket is identified, as the entry is most likely to be stored there.
If a matching signature is stored there, its key is compared against the one provided, and the position
where it was stored and/or the data associated with that key is returned if there is a match.
If the signature is not in the primary bucket, the secondary bucket is looked up, where the same procedure
is carried out. If there is no match there either, the key is not in the table and a negative value is returned.

Example of addition:

As with lookup, the primary and secondary buckets are identified. If there is an empty entry in
the primary bucket, the signature is stored in that entry, the key and data (if any) are added to
the second table, and the index in the second table is stored in the entry of the first table.
If there is no space in the primary bucket, one of the entries in that bucket is pushed to its alternative location,
and the key to be added is inserted in its place.
To know where the alternative bucket of the evicted entry is, a mechanism called partial-key hashing [partial-key] is used.
If there is room in the alternative bucket, the evicted entry
is stored in it. If not, the same process is repeated (one of the entries gets pushed) until an empty entry is found.
Notice that despite all the entry movement in the first table, the second table is not touched, as touching it would
greatly impact performance.

In the very unlikely event that an empty entry cannot be found after a certain number of displacements,
the key is considered not able to be added (unless the extendable bucket flag is set, in which case the bucket is extended to insert the key, as will be explained later).
With random keys, this method allows the user to get more than 90% table utilization, without
having to drop any stored entry (e.g. using an LRU replacement policy) or allocate more memory (extendable buckets or rehashing).

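A simplified, single-threaded illustration of this displacement loop follows;
it reuses the hypothetical alt_bucket() helper from the partial-key sketch
above and is not the library's actual code:

.. code-block:: c

   #include <stdint.h>

   #define ENTRIES_PER_BUCKET 8
   #define MAX_DISPLACEMENTS  100

   struct slot { uint16_t sig; uint32_t key_idx; };  /* key_idx == 0: empty */
   struct bucket { struct slot slots[ENTRIES_PER_BUCKET]; };

   static int
   cuckoo_insert(struct bucket *buckets, uint32_t bucket_mask,
                 uint32_t prim, uint16_t sig, uint32_t key_idx)
   {
       uint32_t cur = prim;
       int n, i;

       for (n = 0; n < MAX_DISPLACEMENTS; n++) {
           struct bucket *bkt = &buckets[cur];

           /* Use an empty slot in the current bucket if one exists. */
           for (i = 0; i < ENTRIES_PER_BUCKET; i++) {
               if (bkt->slots[i].key_idx == 0) {
                   bkt->slots[i].sig = sig;
                   bkt->slots[i].key_idx = key_idx;
                   return 0;
               }
           }

           /* Otherwise evict one entry, take its slot, and retry the
            * evicted entry in its alternative bucket. Only the first
            * table is touched; the key store is left alone. */
           struct slot victim = bkt->slots[0];
           bkt->slots[0].sig = sig;
           bkt->slots[0].key_idx = key_idx;
           sig = victim.sig;
           key_idx = victim.key_idx;
           cur = alt_bucket(cur, sig, bucket_mask);
       }
       return -1;   /* no empty entry found: the addition fails */
   }
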

Example of deletion:

Similar to lookup, the key is searched for in its primary and secondary buckets. If the key is found, the
entry is marked as empty. If the hash table was configured with 'no free on delete' or 'lock-free read/write concurrency',
the position of the key is not freed. It is the responsibility of the user to free the position after
readers are no longer referencing the position. The user can configure the integrated RCU QSBR or use external RCU mechanisms to safely free the position on delete.


Implementation Details (with Extendable Bucket)
-------------------------------------------------

When the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag is set, the hash table implementation still uses the same Cuckoo Hash algorithm to store the keys into
the first and second tables. However, in the very unlikely event that a key can't be inserted after a certain number of Cuckoo displacements is
reached, the secondary bucket of this key is extended
with a linked list of extra buckets and the key is stored in this linked list.

In case of a lookup for a certain key, as before, the primary bucket is searched for a match and then the secondary bucket is looked up.
If there is no match there either, the extendable buckets (linked list of extra buckets) are searched one by one for a possible match, and if there is no match
the key is considered not to be in the table.

The deletion is the same as in the case when the RTE_HASH_EXTRA_FLAGS_EXT_TABLE flag is not set, with one exception: if a key is deleted from any bucket
and an empty location is created, the last entry from the extendable buckets associated with this bucket is displaced into
this empty location to possibly shorten the linked list.


Entry distribution in hash table
--------------------------------

As mentioned above, the Cuckoo hash implementation pushes elements out of their bucket
when a new entry whose primary location coincides with their current bucket is added,
and the evicted elements are pushed to their alternative location.
Therefore, as the user adds more entries to the hash table, the distribution of the hash values
in the buckets will change, with most of them in their primary location and a few in
their secondary location, the latter share increasing as the table gets busier.
This information is quite useful, as performance may be lower as more entries
are evicted to their secondary location.

See the tables below showing example entry distribution as table utilization increases.

.. _table_hash_lib_1:

.. table:: Entry distribution measured with an example table with 1024 random entries using jhash algorithm

   +--------------+-----------------------+-------------------------+
   | % Table used | % In Primary location | % In Secondary location |
   +==============+=======================+=========================+
   |      25      |         100           |           0             |
   +--------------+-----------------------+-------------------------+
   |      50      |         96.1          |           3.9           |
   +--------------+-----------------------+-------------------------+
   |      75      |         88.2          |           11.8          |
   +--------------+-----------------------+-------------------------+
   |      80      |         86.3          |           13.7          |
   +--------------+-----------------------+-------------------------+
   |      85      |         83.1          |           16.9          |
   +--------------+-----------------------+-------------------------+
   |      90      |         77.3          |           22.7          |
   +--------------+-----------------------+-------------------------+
   |      95.8    |         64.5          |           35.5          |
   +--------------+-----------------------+-------------------------+

|

.. _table_hash_lib_2:

.. table:: Entry distribution measured with an example table with 1 million random entries using jhash algorithm

   +--------------+-----------------------+-------------------------+
   | % Table used | % In Primary location | % In Secondary location |
   +==============+=======================+=========================+
   |      50      |         96            |           4             |
   +--------------+-----------------------+-------------------------+
   |      75      |         86.9          |           13.1          |
   +--------------+-----------------------+-------------------------+
   |      80      |         83.9          |           16.1          |
   +--------------+-----------------------+-------------------------+
   |      85      |         80.1          |           19.9          |
   +--------------+-----------------------+-------------------------+
   |      90      |         74.8          |           25.2          |
   +--------------+-----------------------+-------------------------+
   |      94.5    |         67.4          |           32.6          |
   +--------------+-----------------------+-------------------------+

.. note::

   The last values in the tables above are the average maximum table
   utilization with random keys, using the Jenkins hash function.

Use Case: Flow Classification
-----------------------------

Flow classification is used to map each input packet to the connection/flow it belongs to.
This operation is necessary as the processing of each input packet is usually done in the context of its connection,
so the same set of operations is applied to all the packets from the same flow.

Applications using flow classification typically have a flow table to manage, with each separate flow having an entry associated with it in this table.
The size of the flow table entry is application specific, with typical values of 4, 16, 32 or 64 bytes.

Each application using flow classification typically has a mechanism defined to uniquely identify a flow based on
a number of fields read from the input packet that make up the flow key.
One example is to use the DiffServ 5-tuple made up of the following fields of the IP and transport layer packet headers:
Source IP Address, Destination IP Address, Protocol, Source Port, Destination Port.

The DPDK hash provides a generic method to implement an application specific flow classification mechanism.
Given a flow table implemented as an array, the application should create a hash object with the same number of entries as the flow table and
with the hash key size set to the number of bytes in the selected flow key.

The flow table operations on the application side are described below:

*   Add flow: Add the flow key to the hash.
    If the returned position is valid, use it to access the flow entry in the flow table for adding a new flow or
    updating the information associated with an existing flow.
    Otherwise, the flow addition failed, for example due to lack of free entries for storing new flows.

*   Delete flow: Delete the flow key from the hash. If the returned position is valid,
    use it to access the flow entry in the flow table to invalidate the information associated with the flow.

*   Free flow: Free the flow key position. If the 'no free on delete' or 'lock-free read/write concurrency' flags are set,
    wait until the readers are no longer referencing the position returned during add/delete flow, and then free the position.
    RCU mechanisms can be used to find out when the readers are no longer referencing the position.
    An RCU QSBR process is integrated within the Hash Library for safe freeing of the position. The application has certain responsibilities while using this feature.
    Please refer to the resource reclamation framework of :doc:`rcu_lib` for more details.

*   Lookup flow: Look up the flow key in the hash.
    If the returned position is valid (flow lookup hit), use the returned position to access the flow entry in the flow table.
    Otherwise (flow lookup miss) there is no flow registered for the current packet.

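A sketch of the lookup step of this scheme, using the placeholder key type from
the earlier examples and a hypothetical flow table sized to match the hash:

.. code-block:: c

   #include <rte_hash.h>

   #define MAX_FLOWS 1024   /* must match the 'entries' used at creation */

   /* Application-defined flow state; mirrors the hash table, with one
    * entry per hash position. */
   struct flow_entry {
       uint8_t out_port;
       /* ... more per-flow state ... */
   };
   static struct flow_entry flow_table[MAX_FLOWS];

   /* Map a packet's 5-tuple to its flow entry, or NULL on a lookup
    * miss (no flow registered for this packet). */
   static struct flow_entry *
   classify(struct rte_hash *handle, const struct ipv4_5tuple *key)
   {
       int32_t pos = rte_hash_lookup(handle, key);
       if (pos < 0)
           return NULL;
       return &flow_table[pos];
   }
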
References
----------

*   Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition), 1998, Addison-Wesley Professional

*   [partial-key] Bin Fan, David G. Andersen, and Michael Kaminsky, MemC3: compact and concurrent MemCache with dumber caching and smarter hashing, 2013, NSDI