xref: /dpdk/doc/guides/prog_guide/hash_lib.rst (revision 27bd48ffe9ad13b0e20ac5bb4162b83b15305606)
1fc1f2750SBernard Iremonger..  BSD LICENSE
2*27bd48ffSPablo de Lara    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
3fc1f2750SBernard Iremonger    All rights reserved.
4fc1f2750SBernard Iremonger
5fc1f2750SBernard Iremonger    Redistribution and use in source and binary forms, with or without
6fc1f2750SBernard Iremonger    modification, are permitted provided that the following conditions
7fc1f2750SBernard Iremonger    are met:
8fc1f2750SBernard Iremonger
9fc1f2750SBernard Iremonger    * Redistributions of source code must retain the above copyright
10fc1f2750SBernard Iremonger    notice, this list of conditions and the following disclaimer.
11fc1f2750SBernard Iremonger    * Redistributions in binary form must reproduce the above copyright
12fc1f2750SBernard Iremonger    notice, this list of conditions and the following disclaimer in
13fc1f2750SBernard Iremonger    the documentation and/or other materials provided with the
14fc1f2750SBernard Iremonger    distribution.
15fc1f2750SBernard Iremonger    * Neither the name of Intel Corporation nor the names of its
16fc1f2750SBernard Iremonger    contributors may be used to endorse or promote products derived
17fc1f2750SBernard Iremonger    from this software without specific prior written permission.
18fc1f2750SBernard Iremonger
19fc1f2750SBernard Iremonger    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20fc1f2750SBernard Iremonger    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21fc1f2750SBernard Iremonger    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22fc1f2750SBernard Iremonger    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23fc1f2750SBernard Iremonger    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24fc1f2750SBernard Iremonger    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25fc1f2750SBernard Iremonger    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26fc1f2750SBernard Iremonger    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27fc1f2750SBernard Iremonger    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28fc1f2750SBernard Iremonger    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29fc1f2750SBernard Iremonger    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30fc1f2750SBernard Iremonger
31fc1f2750SBernard Iremonger.. _Hash_Library:
32fc1f2750SBernard Iremonger
33fc1f2750SBernard IremongerHash Library
34fc1f2750SBernard Iremonger============
35fc1f2750SBernard Iremonger
3648624fd9SSiobhan ButlerThe DPDK provides a Hash Library for creating hash table for fast lookup.
37fc1f2750SBernard IremongerThe hash table is a data structure optimized for searching through a set of entries that are each identified by a unique key.
3848624fd9SSiobhan ButlerFor increased performance the DPDK Hash requires that all the keys have the same number of bytes which is set at the hash creation time.
39fc1f2750SBernard Iremonger
40fc1f2750SBernard IremongerHash API Overview
41fc1f2750SBernard Iremonger-----------------
42fc1f2750SBernard Iremonger
43fc1f2750SBernard IremongerThe main configuration parameters for the hash are:
44fc1f2750SBernard Iremonger
45fc1f2750SBernard Iremonger*   Total number of hash entries
46fc1f2750SBernard Iremonger
47fc1f2750SBernard Iremonger*   Size of the key in bytes
48fc1f2750SBernard Iremonger
49fc1f2750SBernard IremongerThe hash also allows the configuration of some low-level implementation related parameters such as:
50fc1f2750SBernard Iremonger
51fc1f2750SBernard Iremonger*   Hash function to translate the key into a bucket index
52fc1f2750SBernard Iremonger
53fc1f2750SBernard IremongerThe main methods exported by the hash are:
54fc1f2750SBernard Iremonger
55fc1f2750SBernard Iremonger*   Add entry with key: The key is provided as input. If a new entry is successfully added to the hash for the specified key,
56fc1f2750SBernard Iremonger    or there is already an entry in the hash for the specified key, then the position of the entry is returned.
57fc1f2750SBernard Iremonger    If the operation was not successful, for example due to lack of free entries in the hash, then a negative value is returned;
58fc1f2750SBernard Iremonger
59fc1f2750SBernard Iremonger*   Delete entry with key: The key is provided as input. If an entry with the specified key is found in the hash,
60fc1f2750SBernard Iremonger    then the entry is removed from the hash and the position where the entry was found in the hash is returned.
61fc1f2750SBernard Iremonger    If no entry with the specified key exists in the hash, then a negative value is returned
62fc1f2750SBernard Iremonger
63fc1f2750SBernard Iremonger*   Lookup for entry with key: The key is provided as input. If an entry with the specified key is found in the hash (lookup hit),
64fc1f2750SBernard Iremonger    then the position of the entry is returned, otherwise (lookup miss) a negative value is returned.
65fc1f2750SBernard Iremonger
66*27bd48ffSPablo de LaraApart from these method explained above, the API allows the user three more options:
67*27bd48ffSPablo de Lara
68*27bd48ffSPablo de Lara*   Add / lookup / delete with key and precomputed hash: Both the key and its precomputed hash are provided as input. This allows
69*27bd48ffSPablo de Lara    the user to perform these operations faster, as hash is already computed.
70*27bd48ffSPablo de Lara
71*27bd48ffSPablo de Lara*   Add / lookup with key and data: A pair of key-value is provided as input. This allows the user to store
72*27bd48ffSPablo de Lara    not only the key, but also data which may be either a 8-byte integer or a pointer to external data (if data size is more than 8 bytes).
73*27bd48ffSPablo de Lara
74*27bd48ffSPablo de Lara*   Combination of the two options above: User can provide key, precomputed hash and data.
75*27bd48ffSPablo de Lara
76*27bd48ffSPablo de LaraAlso, the API contains a method to allow the user to look up entries in bursts, achieving higher performance
77*27bd48ffSPablo de Larathan looking up individual entries, as the function prefetches next entries at the time it is operating
78*27bd48ffSPablo de Larawith the first ones, which reduces significantly the impact of the necessary memory accesses.
79*27bd48ffSPablo de LaraNotice that this method uses a pipeline of 8 entries (4 stages of 2 entries), so it is highly recommended
80*27bd48ffSPablo de Larato use at least 8 entries per burst.
81*27bd48ffSPablo de Lara
82*27bd48ffSPablo de LaraThe actual data associated with each key can be either managed by the user using a separate table that
83fc1f2750SBernard Iremongermirrors the hash in terms of number of entries and position of each entry,
84*27bd48ffSPablo de Laraas shown in the Flow Classification use case describes in the following sections,
85*27bd48ffSPablo de Laraor stored in the hash table itself.
86fc1f2750SBernard Iremonger
87fc1f2750SBernard IremongerThe example hash tables in the L2/L3 Forwarding sample applications defines which port to forward a packet to based on a packet flow identified by the five-tuple lookup.
88fc1f2750SBernard IremongerHowever, this table could also be used for more sophisticated features and provide many other functions and actions that could be performed on the packets and flows.
89fc1f2750SBernard Iremonger
90fc1f2750SBernard IremongerImplementation Details
91fc1f2750SBernard Iremonger----------------------
92fc1f2750SBernard Iremonger
93*27bd48ffSPablo de LaraThe hash table has two main tables:
94*27bd48ffSPablo de Lara
95*27bd48ffSPablo de Lara* First table is an array of entries which is further divided into buckets,
96*27bd48ffSPablo de Lara  with the same number of consecutive array entries in each bucket. Each entry contains the computed primary
97*27bd48ffSPablo de Lara  and secondary hashes of a given key (explained below), and an index to the second table.
98*27bd48ffSPablo de Lara
99*27bd48ffSPablo de Lara* The second table is an array of all the keys stored in the hash table and its data associated to each key.
100*27bd48ffSPablo de Lara
101*27bd48ffSPablo de LaraThe hash library uses the cuckoo hash method to resolve collisions.
102*27bd48ffSPablo de LaraFor any input key, there are two possible buckets (primary and secondary/alternative location)
103*27bd48ffSPablo de Larawhere that key can be stored in the hash, therefore only the entries within those bucket need to be examined
104*27bd48ffSPablo de Larawhen the key is looked up.
105fc1f2750SBernard IremongerThe lookup speed is achieved by reducing the number of entries to be scanned from the total
106*27bd48ffSPablo de Laranumber of hash entries down to the number of entries in the two hash buckets,
107fc1f2750SBernard Iremongeras opposed to the basic method of linearly scanning all the entries in the array.
108fc1f2750SBernard IremongerThe hash uses a hash function (configurable) to translate the input key into a 4-byte key signature.
109fc1f2750SBernard IremongerThe bucket index is the key signature modulo the number of hash buckets.
110*27bd48ffSPablo de Lara
111*27bd48ffSPablo de LaraOnce the buckets are identified, the scope of the hash add,
112*27bd48ffSPablo de Laradelete and lookup operations is reduced to the entries in those buckets (it is very likely that entries are in the primary bucket).
113fc1f2750SBernard Iremonger
114fc1f2750SBernard IremongerTo speed up the search logic within the bucket, each hash entry stores the 4-byte key signature together with the full key for each hash entry.
115fc1f2750SBernard IremongerFor large key sizes, comparing the input key against a key from the bucket can take significantly more time than
116fc1f2750SBernard Iremongercomparing the 4-byte signature of the input key against the signature of a key from the bucket.
117fc1f2750SBernard IremongerTherefore, the signature comparison is done first and the full key comparison done only when the signatures matches.
118fc1f2750SBernard IremongerThe full key comparison is still necessary, as two input keys from the same bucket can still potentially have the same 4-byte hash signature,
119fc1f2750SBernard Iremongeralthough this event is relatively rare for hash functions providing good uniform distributions for the set of input keys.
120fc1f2750SBernard Iremonger
121*27bd48ffSPablo de LaraExample of lookup:
122*27bd48ffSPablo de Lara
123*27bd48ffSPablo de LaraFirst of all, the primary bucket is identified and entry is likely to be stored there.
124*27bd48ffSPablo de LaraIf signature was stored there, we compare its key against the one provided and return the position
125*27bd48ffSPablo de Larawhere it was stored and/or the data associated to that key if there is a match.
126*27bd48ffSPablo de LaraIf signature is not in the primary bucket, the secondary bucket is looked up, where same procedure
127*27bd48ffSPablo de Larais carried out. If there is no match there either, key is considered not to be in the table.
128*27bd48ffSPablo de Lara
129*27bd48ffSPablo de LaraExample of addition:
130*27bd48ffSPablo de Lara
131*27bd48ffSPablo de LaraLike lookup, the primary and secondary buckets are indentified. If there is an empty slot in
132*27bd48ffSPablo de Larathe primary bucket, primary and secondary signatures are stored in that slot, key and data (if any) are added to
133*27bd48ffSPablo de Larathe second table and an index to the position in the second table is stored in the slot of the first table.
134*27bd48ffSPablo de LaraIf there is no space in the primary bucket, one of the entries on that bucket is pushed to its alternative location,
135*27bd48ffSPablo de Laraand the key to be added is inserted in its position.
136*27bd48ffSPablo de LaraTo know where the alternative bucket of the evicted entry is, the secondary signature is looked up and alternative bucket index
137*27bd48ffSPablo de Larais calculated from doing the modulo, as seen above. If there is room in the alternative bucket, the evicted entry
138*27bd48ffSPablo de Larais stored in it. If not, same process is repeated (one of the entries gets pushed) until a non full bucket is found.
139*27bd48ffSPablo de LaraNotice that despite all the entry movement in the first table, the second table is not touched, which would impact
140*27bd48ffSPablo de Laragreatly in performance.
141*27bd48ffSPablo de Lara
142*27bd48ffSPablo de LaraIn the very unlikely event that table enters in a loop where same entries are being evicted indefinitely,
143*27bd48ffSPablo de Larakey is considered not able to be stored.
144*27bd48ffSPablo de LaraWith random keys, this method allows the user to get around 90% of the table utilization, without
145*27bd48ffSPablo de Larahaving to drop any stored entry (LRU) or allocate more memory (extended buckets).
146*27bd48ffSPablo de Lara
147*27bd48ffSPablo de LaraEntry distribution in hash table
148*27bd48ffSPablo de Lara--------------------------------
149*27bd48ffSPablo de Lara
150*27bd48ffSPablo de LaraAs mentioned above, Cuckoo hash implementation pushes elements out of their bucket,
151*27bd48ffSPablo de Laraif there is a new entry to be added which primary location coincides with their current bucket,
152*27bd48ffSPablo de Larabeing pushed to their alternative location.
153*27bd48ffSPablo de LaraTherefore, as user adds more entries to the hash table, distribution of the hash values
154*27bd48ffSPablo de Larain the buckets will change, being most of them in their primary location and a few in
155*27bd48ffSPablo de Laratheir secondary location, which the later will increase, as table gets busier.
156*27bd48ffSPablo de LaraThis information is quite useful, as performance may be lower as more entries
157*27bd48ffSPablo de Laraare evicted to their secondary location.
158*27bd48ffSPablo de Lara
159*27bd48ffSPablo de LaraSee the tables below showing example entry distribution as table utilization increases.
160*27bd48ffSPablo de Lara
161*27bd48ffSPablo de Lara.. _table_hash_lib_1:
162*27bd48ffSPablo de Lara
163*27bd48ffSPablo de Lara.. table:: Entry distribution measured with an example table with 1024 random entries using jhash algorithm
164*27bd48ffSPablo de Lara
165*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
166*27bd48ffSPablo de Lara   | % Table used | % In Primary location | % In Secondary location |
167*27bd48ffSPablo de Lara   +==============+=======================+=========================+
168*27bd48ffSPablo de Lara   |      25      |         100           |           0             |
169*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
170*27bd48ffSPablo de Lara   |      50      |         96.1          |           3.9           |
171*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
172*27bd48ffSPablo de Lara   |      75      |         88.2          |           11.8          |
173*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
174*27bd48ffSPablo de Lara   |      80      |         86.3          |           13.7          |
175*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
176*27bd48ffSPablo de Lara   |      85      |         83.1          |           16.9          |
177*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
178*27bd48ffSPablo de Lara   |      90      |         77.3          |           22.7          |
179*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
180*27bd48ffSPablo de Lara   |      95.8    |         64.5          |           35.5          |
181*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
182*27bd48ffSPablo de Lara
183*27bd48ffSPablo de Lara|
184*27bd48ffSPablo de Lara
185*27bd48ffSPablo de Lara.. _table_hash_lib_2:
186*27bd48ffSPablo de Lara
187*27bd48ffSPablo de Lara.. table:: Entry distribution measured with an example table with 1 million random entries using jhash algorithm
188*27bd48ffSPablo de Lara
189*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
190*27bd48ffSPablo de Lara   | % Table used | % In Primary location | % In Secondary location |
191*27bd48ffSPablo de Lara   +==============+=======================+=========================+
192*27bd48ffSPablo de Lara   |      50      |         96            |           4             |
193*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
194*27bd48ffSPablo de Lara   |      75      |         86.9          |           13.1          |
195*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
196*27bd48ffSPablo de Lara   |      80      |         83.9          |           16.1          |
197*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
198*27bd48ffSPablo de Lara   |      85      |         80.1          |           19.9          |
199*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
200*27bd48ffSPablo de Lara   |      90      |         74.8          |           25.2          |
201*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
202*27bd48ffSPablo de Lara   |      94.5    |         67.4          |           32.6          |
203*27bd48ffSPablo de Lara   +--------------+-----------------------+-------------------------+
204*27bd48ffSPablo de Lara
205*27bd48ffSPablo de Lara.. note::
206*27bd48ffSPablo de Lara
207*27bd48ffSPablo de Lara   Last values on the tables above are the average maximum table
208*27bd48ffSPablo de Lara   utilization with random keys and using Jenkins hash function.
209*27bd48ffSPablo de Lara
210fc1f2750SBernard IremongerUse Case: Flow Classification
211fc1f2750SBernard Iremonger-----------------------------
212fc1f2750SBernard Iremonger
213fc1f2750SBernard IremongerFlow classification is used to map each input packet to the connection/flow it belongs to.
214fc1f2750SBernard IremongerThis operation is necessary as the processing of each input packet is usually done in the context of their connection,
215fc1f2750SBernard Iremongerso the same set of operations is applied to all the packets from the same flow.
216fc1f2750SBernard Iremonger
217fc1f2750SBernard IremongerApplications using flow classification typically have a flow table to manage, with each separate flow having an entry associated with it in this table.
218fc1f2750SBernard IremongerThe size of the flow table entry is application specific, with typical values of 4, 16, 32 or 64 bytes.
219fc1f2750SBernard Iremonger
220fc1f2750SBernard IremongerEach application using flow classification typically has a mechanism defined to uniquely identify a flow based on
221fc1f2750SBernard Iremongera number of fields read from the input packet that make up the flow key.
222fc1f2750SBernard IremongerOne example is to use the DiffServ 5-tuple made up of the following fields of the IP and transport layer packet headers:
223fc1f2750SBernard IremongerSource IP Address, Destination IP Address, Protocol, Source Port, Destination Port.
224fc1f2750SBernard Iremonger
22548624fd9SSiobhan ButlerThe DPDK hash provides a generic method to implement an application specific flow classification mechanism.
226fc1f2750SBernard IremongerGiven a flow table implemented as an array, the application should create a hash object with the same number of entries as the flow table and
227fc1f2750SBernard Iremongerwith the hash key size set to the number of bytes in the selected flow key.
228fc1f2750SBernard Iremonger
229fc1f2750SBernard IremongerThe flow table operations on the application side are described below:
230fc1f2750SBernard Iremonger
231fc1f2750SBernard Iremonger*   Add flow: Add the flow key to hash.
232fc1f2750SBernard Iremonger    If the returned position is valid, use it to access the flow entry in the flow table for adding a new flow or
233fc1f2750SBernard Iremonger    updating the information associated with an existing flow.
234fc1f2750SBernard Iremonger    Otherwise, the flow addition failed, for example due to lack of free entries for storing new flows.
235fc1f2750SBernard Iremonger
236fc1f2750SBernard Iremonger*   Delete flow: Delete the flow key from the hash. If the returned position is valid,
237fc1f2750SBernard Iremonger    use it to access the flow entry in the flow table to invalidate the information associated with the flow.
238fc1f2750SBernard Iremonger
239fc1f2750SBernard Iremonger*   Lookup flow: Lookup for the flow key in the hash.
240fc1f2750SBernard Iremonger    If the returned position is valid (flow lookup hit), use the returned position to access the flow entry in the flow table.
241fc1f2750SBernard Iremonger    Otherwise (flow lookup miss) there is no flow registered for the current packet.
242fc1f2750SBernard Iremonger
243fc1f2750SBernard IremongerReferences
244fc1f2750SBernard Iremonger----------
245fc1f2750SBernard Iremonger
246fc1f2750SBernard Iremonger*   Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition), 1998, Addison-Wesley Professional
247