xref: /dpdk/doc/guides/prog_guide/toeplitz_hash_lib.rst (revision 31d7c069478dd94f40c7e4360b2a0953313ffe0e)
1534fe5f3SVladimir Medvedkin..  SPDX-License-Identifier: BSD-3-Clause
2534fe5f3SVladimir Medvedkin    Copyright(c) 2021 Intel Corporation.
3534fe5f3SVladimir Medvedkin
4534fe5f3SVladimir MedvedkinToeplitz Hash Library
5534fe5f3SVladimir Medvedkin=====================
6534fe5f3SVladimir Medvedkin
7534fe5f3SVladimir MedvedkinDPDK provides a Toeplitz Hash Library
8534fe5f3SVladimir Medvedkinto calculate the Toeplitz hash function and to use its properties.
9534fe5f3SVladimir MedvedkinThe Toeplitz hash function is commonly used in a wide range of NICs
10534fe5f3SVladimir Medvedkinto calculate the RSS hash sum to spread the traffic among the queues.
11534fe5f3SVladimir Medvedkin
12534fe5f3SVladimir Medvedkin.. _figure_rss_queue_assign:
13534fe5f3SVladimir Medvedkin
14534fe5f3SVladimir Medvedkin.. figure:: img/rss_queue_assign.*
15534fe5f3SVladimir Medvedkin
16534fe5f3SVladimir Medvedkin   RSS queue assignment example
17534fe5f3SVladimir Medvedkin
18534fe5f3SVladimir Medvedkin
19534fe5f3SVladimir MedvedkinToeplitz hash function API
20534fe5f3SVladimir Medvedkin--------------------------
21534fe5f3SVladimir Medvedkin
22*31d7c069SVladimir MedvedkinThere are four functions that provide calculation of the Toeplitz hash sum:
23534fe5f3SVladimir Medvedkin
24534fe5f3SVladimir Medvedkin* ``rte_softrss()``
25534fe5f3SVladimir Medvedkin* ``rte_softrss_be()``
264fd8c4cbSVladimir Medvedkin* ``rte_thash_gfni()``
27*31d7c069SVladimir Medvedkin* ``rte_thash_gfni_bulk()``
28534fe5f3SVladimir Medvedkin
294fd8c4cbSVladimir MedvedkinFirst two functions are scalar implementation and take the parameters:
30534fe5f3SVladimir Medvedkin
31534fe5f3SVladimir Medvedkin* A pointer to the tuple, containing fields extracted from the packet.
32534fe5f3SVladimir Medvedkin* A length of this tuple counted in double words.
33534fe5f3SVladimir Medvedkin* A pointer to the RSS hash key corresponding to the one installed on the NIC.
34534fe5f3SVladimir Medvedkin
354fd8c4cbSVladimir MedvedkinBoth of above mentioned _softrss_ functions expect the tuple to be in
364fd8c4cbSVladimir Medvedkin"host" byte order and a multiple of 4 bytes in length.
37534fe5f3SVladimir MedvedkinThe ``rte_softrss()`` function expects the ``rss_key``
38534fe5f3SVladimir Medvedkinto be exactly the same as the one installed on the NIC.
39534fe5f3SVladimir MedvedkinThe ``rte_softrss_be`` function is a faster implementation,
40534fe5f3SVladimir Medvedkinbut it expects ``rss_key`` to be converted to the host byte order.
4128ebff11SVladimir Medvedkin
42*31d7c069SVladimir MedvedkinThe last two functions are vectorized implementations using
43*31d7c069SVladimir MedvedkinGalois Fields New Instructions. Could be used if ``rte_thash_gfni_supported`` is true.
44*31d7c069SVladimir MedvedkinThey expect the tuple to be in network byte order.
454fd8c4cbSVladimir Medvedkin
46*31d7c069SVladimir Medvedkin``rte_thash_gfni()`` calculates the hash value for a single tuple, and
47*31d7c069SVladimir Medvedkin``rte_thash_gfni_bulk()`` bulk implementation of the rte_thash_gfni().
484fd8c4cbSVladimir Medvedkin
494fd8c4cbSVladimir Medvedkin``rte_thash_gfni()`` takes the parameters:
504fd8c4cbSVladimir Medvedkin
514fd8c4cbSVladimir Medvedkin* A pointer to the matrices derived from the RSS hash key using ``rte_thash_complete_matrix()``.
524fd8c4cbSVladimir Medvedkin* A pointer to the tuple.
534fd8c4cbSVladimir Medvedkin* A length of the tuple in bytes.
544fd8c4cbSVladimir Medvedkin
55*31d7c069SVladimir Medvedkin``rte_thash_gfni_bulk()`` takes the parameters:
56*31d7c069SVladimir Medvedkin
57*31d7c069SVladimir Medvedkin* A pointer to the matrices derived from the RSS hash key using ``rte_thash_complete_matrix()``.
58*31d7c069SVladimir Medvedkin* A length of the longest tuple in bytes.
59*31d7c069SVladimir Medvedkin* Array of the pointers on data to be hashed.
60*31d7c069SVladimir Medvedkin* Array of ``uint32_t`` where to put calculated Toeplitz hash values
61*31d7c069SVladimir Medvedkin* Number of tuples in a bulk.
62*31d7c069SVladimir Medvedkin
634fd8c4cbSVladimir Medvedkin``rte_thash_complete_matrix()`` is a function that calculates matrices required by
644fd8c4cbSVladimir MedvedkinGFNI implementations from the RSS hash key. It takes the parameters:
654fd8c4cbSVladimir Medvedkin
664fd8c4cbSVladimir Medvedkin* A pointer to the memory where the matrices will be written.
674fd8c4cbSVladimir Medvedkin* A pointer to the RSS hash key.
684fd8c4cbSVladimir Medvedkin* Length of the RSS hash key in bytes.
694fd8c4cbSVladimir Medvedkin
7028ebff11SVladimir Medvedkin
7128ebff11SVladimir MedvedkinPredictable RSS
7228ebff11SVladimir Medvedkin---------------
7328ebff11SVladimir Medvedkin
7428ebff11SVladimir MedvedkinIn some use cases it is useful to have a way to find partial collisions of the
7528ebff11SVladimir MedvedkinToeplitz hash function. In figure :numref:`figure_rss_queue_assign` only a few
7628ebff11SVladimir Medvedkinof the least significant bits (LSB) of the hash value are used to indicate an
7728ebff11SVladimir Medvedkinentry in the RSS Redirection Table (ReTa) and thus the index of the queue. So,
7828ebff11SVladimir Medvedkinin this case it would be useful to find another tuple whose hash has the same
7928ebff11SVladimir MedvedkinLSB's as the hash from the original tuple.
8028ebff11SVladimir Medvedkin
8128ebff11SVladimir MedvedkinFor example:
8228ebff11SVladimir Medvedkin
8328ebff11SVladimir Medvedkin- In the case of SNAT (Source Network Address Translation) it is possible to
8428ebff11SVladimir Medvedkin  find a special source port number on translation so that the hash of
8528ebff11SVladimir Medvedkin  returning packets, of the given connection, will have desired LSB's.
8628ebff11SVladimir Medvedkin- In the case of MPLS (Multiprotocol Label Switching), if the MPLS tag is used
8728ebff11SVladimir Medvedkin  in the hash calculation, the Label Switching router can allocate a special
8828ebff11SVladimir Medvedkin  MPLS tag to bind an LSP (Label Switching Path) to a given queue. This method
8928ebff11SVladimir Medvedkin  can be used with the allocation of IPSec SPI, VXLan VNI, etc., to bind the
9028ebff11SVladimir Medvedkin  tunnel to the desired queue.
9128ebff11SVladimir Medvedkin- In the case of a TCP stack, a special source port could be chosen for
9228ebff11SVladimir Medvedkin  outgoing connections, such that the response packets will be assigned to the
9328ebff11SVladimir Medvedkin  desired queue.
9428ebff11SVladimir Medvedkin
9528ebff11SVladimir MedvedkinThis functionality is provided by the API shown below.
9628ebff11SVladimir MedvedkinThe API consists of 3 parts:
9728ebff11SVladimir Medvedkin
9828ebff11SVladimir Medvedkin* Create the thash context.
9928ebff11SVladimir Medvedkin
10028ebff11SVladimir Medvedkin* Create the thash helper, associated with a context.
10128ebff11SVladimir Medvedkin
10228ebff11SVladimir Medvedkin* Use the helper run time to calculate the adjustable bits of the tuple to
10328ebff11SVladimir Medvedkin  ensure a collision.
10428ebff11SVladimir Medvedkin
10528ebff11SVladimir Medvedkin
10628ebff11SVladimir MedvedkinThash context
10728ebff11SVladimir Medvedkin~~~~~~~~~~~~~
10828ebff11SVladimir Medvedkin
10928ebff11SVladimir MedvedkinThe function ``rte_thash_init_ctx()`` initializes the context struct
11028ebff11SVladimir Medvedkinassociated with a particular NIC or a set of NICs. It expects:
11128ebff11SVladimir Medvedkin
11228ebff11SVladimir Medvedkin* The log2 value of the size of the RSS redirection table for the
11328ebff11SVladimir Medvedkin  corresponding NIC. It reflects the number of least significant bits of the
11428ebff11SVladimir Medvedkin  hash value to produce a collision for.
11528ebff11SVladimir Medvedkin
11628ebff11SVladimir Medvedkin* A predefined RSS hash key. This is optional, if ``NULL`` then a random key
11728ebff11SVladimir Medvedkin  will be initialized.
11828ebff11SVladimir Medvedkin
11928ebff11SVladimir Medvedkin* The length of the RSS hash key. This value is usually hardware/driver
12028ebff11SVladimir Medvedkin  specific and can be found in the NIC datasheet.
12128ebff11SVladimir Medvedkin
12228ebff11SVladimir Medvedkin* Optional flags, as shown below.
12328ebff11SVladimir Medvedkin
12428ebff11SVladimir MedvedkinSupported flags:
12528ebff11SVladimir Medvedkin
12628ebff11SVladimir Medvedkin* ``RTE_THASH_IGNORE_PERIOD_OVERFLOW`` - By default, and for security reasons,
12728ebff11SVladimir Medvedkin  the library prohibits generating a repeatable sequence in the hash key. This
12828ebff11SVladimir Medvedkin  flag disables such checking. The flag is mainly used for testing in the lab
12928ebff11SVladimir Medvedkin  to generate an RSS hash key with a uniform hash distribution, if the input
13028ebff11SVladimir Medvedkin  traffic also has a uniform distribution.
13128ebff11SVladimir Medvedkin
13228ebff11SVladimir Medvedkin* ``RTE_THASH_MINIMAL_SEQ`` - By default, the library generates a special bit
13328ebff11SVladimir Medvedkin  sequence in the hash key for all the bits of the subtuple. However, the
13428ebff11SVladimir Medvedkin  collision generation task requires only the ``log2(RETA_SZ)`` bits in the
13528ebff11SVladimir Medvedkin  subtuple. This flag forces the minimum bit sequence in the hash key to be
13628ebff11SVladimir Medvedkin  generated for the required ``log2(RETA_SZ)`` least significant bits of the
13728ebff11SVladimir Medvedkin  subtuple. The flag can be used in the case of a relatively large number of
13828ebff11SVladimir Medvedkin  helpers that may overlap with their corresponding bit sequences of RSS hash
13928ebff11SVladimir Medvedkin  keys.
14028ebff11SVladimir Medvedkin
14128ebff11SVladimir Medvedkin
14228ebff11SVladimir MedvedkinThash helper
14328ebff11SVladimir Medvedkin~~~~~~~~~~~~
14428ebff11SVladimir Medvedkin
14528ebff11SVladimir MedvedkinThe function ``rte_thash_add_helper()`` initializes the helper struct
14628ebff11SVladimir Medvedkinassociated with a given context and a part of a target tuple of interest which
14728ebff11SVladimir Medvedkincould be altered to produce a hash collision. On success it writes a specially
14828ebff11SVladimir Medvedkincalculated bit sequence into the RSS hash key which is stored in the context
14928ebff11SVladimir Medvedkinand calculates a table with values to be XORed with a subtuple.
15028ebff11SVladimir Medvedkin
15128ebff11SVladimir MedvedkinIt expects:
15228ebff11SVladimir Medvedkin
15328ebff11SVladimir Medvedkin* A pointer to the Thash context to be associated with.
15428ebff11SVladimir Medvedkin
15528ebff11SVladimir Medvedkin* A length of the subtuple to be modified. The length is counted in bits.
15628ebff11SVladimir Medvedkin
15728ebff11SVladimir Medvedkin* An offset of the subtuple to be modified from the beginning of the tuple. It
15828ebff11SVladimir Medvedkin  is also counted in bits.
15928ebff11SVladimir Medvedkin
16028ebff11SVladimir Medvedkin.. note::
16128ebff11SVladimir Medvedkin
16228ebff11SVladimir Medvedkin   Adding a helper changes the key stored in the corresponding context. So the
16328ebff11SVladimir Medvedkin   updated RSS hash key must be uploaded into the NIC after creating all the
16428ebff11SVladimir Medvedkin   required helpers.
16528ebff11SVladimir Medvedkin
16628ebff11SVladimir Medvedkin
16728ebff11SVladimir MedvedkinCalculation of the complementary bits to adjust the subtuple
16828ebff11SVladimir Medvedkin~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16928ebff11SVladimir Medvedkin
17028ebff11SVladimir MedvedkinThe ``rte_thash_get_complement()`` function returns a special bit sequence
17128ebff11SVladimir Medvedkinwith length ``N = log2(rss_reta_sz)`` (for the ``rss_reta_sz`` provided at
17228ebff11SVladimir Medvedkincontext initialization) to be xored with N least significant bits of the
17328ebff11SVladimir Medvedkinsubtuple.
17428ebff11SVladimir Medvedkin
17528ebff11SVladimir MedvedkinIt expects:
17628ebff11SVladimir Medvedkin
17728ebff11SVladimir Medvedkin* A corresponding helper created for a given subtuple of the tuple.
17828ebff11SVladimir Medvedkin
17928ebff11SVladimir Medvedkin* A hash value of the tuple we want to alter.
18028ebff11SVladimir Medvedkin
18128ebff11SVladimir Medvedkin* The desired LSB's of the hash value the user expects to have.
18228ebff11SVladimir Medvedkin
18328ebff11SVladimir MedvedkinAfter the returned bit sequence has been XORed with the subtuple, the resulted
18428ebff11SVladimir MedvedkinLSB's of the new hash value, calculated from the altered tuple, will be the
18528ebff11SVladimir Medvedkinsame as in ``desired_hash``.
18628ebff11SVladimir Medvedkin
18728ebff11SVladimir Medvedkin
18828ebff11SVladimir MedvedkinAdjust tuple API
18928ebff11SVladimir Medvedkin~~~~~~~~~~~~~~~~~
19028ebff11SVladimir Medvedkin
19128ebff11SVladimir MedvedkinThe ``rte_thash_get_complement()`` function is a user-friendly wrapper around
19228ebff11SVladimir Medvedkina number of other functions. It alters a passed tuple to meet the above
19328ebff11SVladimir Medvedkinmentioned requirements around the desired hash LSB's.
19428ebff11SVladimir Medvedkin
19528ebff11SVladimir MedvedkinIt expects:
19628ebff11SVladimir Medvedkin
19728ebff11SVladimir Medvedkin* A Thash context and helper.
19828ebff11SVladimir Medvedkin
19928ebff11SVladimir Medvedkin* A pointer to the tuple to be changed.
20028ebff11SVladimir Medvedkin
20128ebff11SVladimir Medvedkin* The length of the tuple.
20228ebff11SVladimir Medvedkin
20328ebff11SVladimir Medvedkin* A callback function and its userdata to check the tuple after it has been
20428ebff11SVladimir Medvedkin  changed.
20528ebff11SVladimir Medvedkin
20628ebff11SVladimir Medvedkin* The number of attempts to change the tuple. Basically, it makes sense if
20728ebff11SVladimir Medvedkin  there is a callback and a limit on the number of attempts to change the
20828ebff11SVladimir Medvedkin  tuple, if the callback function returns an error.
20928ebff11SVladimir Medvedkin
21028ebff11SVladimir Medvedkin
21128ebff11SVladimir MedvedkinUse case example
2129c30a6f3SHenry Nadeau----------------
21328ebff11SVladimir Medvedkin
21428ebff11SVladimir MedvedkinThere could be a number of different use cases, such as NAT, TCP stack, MPLS
21528ebff11SVladimir Medvedkintag allocation, etc. In the following we will consider a SNAT application.
21628ebff11SVladimir Medvedkin
21728ebff11SVladimir MedvedkinPackets of a single bidirectional flow belonging to different directions can
21828ebff11SVladimir Medvedkinend up being assigned to different queues and thus processed by different
21928ebff11SVladimir Medvedkinlcores, as shown in :numref:`figure_predictable_snat_1`:
22028ebff11SVladimir Medvedkin
22128ebff11SVladimir Medvedkin.. _figure_predictable_snat_1:
22228ebff11SVladimir Medvedkin
22328ebff11SVladimir Medvedkin.. figure:: img/predictable_snat_1.*
22428ebff11SVladimir Medvedkin
22528ebff11SVladimir Medvedkin   Bidirectional flow packets distribution in general
22628ebff11SVladimir Medvedkin
22728ebff11SVladimir MedvedkinThat leads to a situation where the same packet flow can be shared between two
22828ebff11SVladimir Medvedkincores. Such a situation is not ideal from a performance perspective and
22928ebff11SVladimir Medvedkinrequires extra synchronization efforts that might lead to various performance
23028ebff11SVladimir Medvedkinpenalties, for example:
23128ebff11SVladimir Medvedkin
23228ebff11SVladimir Medvedkin* The connections table is global so locking/RCU on the flow insertion/removal
23328ebff11SVladimir Medvedkin  is required.
23428ebff11SVladimir Medvedkin
23528ebff11SVladimir Medvedkin* Connection metadata must be protected to avoid race conditions.
23628ebff11SVladimir Medvedkin
23728ebff11SVladimir Medvedkin* More cache pressure if a single connection metadata is kept in different
23828ebff11SVladimir Medvedkin  L1/L2 caches of a different CPU core.
23928ebff11SVladimir Medvedkin
24028ebff11SVladimir Medvedkin* Cache pressure/less cache locality on packet handover to the different cores.
24128ebff11SVladimir Medvedkin
24228ebff11SVladimir MedvedkinWe can avoid all these penalties if it can be guaranteed that packets
24328ebff11SVladimir Medvedkinbelonging to one bidirectional flow will be assigned to the same queue, as
24428ebff11SVladimir Medvedkinshown in :numref:`figure_predictable_snat_2`:
24528ebff11SVladimir Medvedkin
24628ebff11SVladimir Medvedkin.. _figure_predictable_snat_2:
24728ebff11SVladimir Medvedkin
24828ebff11SVladimir Medvedkin.. figure:: img/predictable_snat_2.*
24928ebff11SVladimir Medvedkin
25028ebff11SVladimir Medvedkin   Bidirectional flow packets distribution with predictable RSS
25128ebff11SVladimir Medvedkin
25228ebff11SVladimir Medvedkin
25328ebff11SVladimir MedvedkinTo achieve this in a SNAT scenario it is possible to choose a source port not
25428ebff11SVladimir Medvedkinrandomly, but using the predictable RSS library to produce a partial hash
25528ebff11SVladimir Medvedkincollision. This is shown in the code below.
25628ebff11SVladimir Medvedkin
25728ebff11SVladimir Medvedkin.. code-block:: c
25828ebff11SVladimir Medvedkin
25928ebff11SVladimir Medvedkin   int key_len = 40; /* The default Niantic RSS key length. */
26028ebff11SVladimir Medvedkin
26128ebff11SVladimir Medvedkin   /** The default Niantic RSS reta size = 2^7 entries, LSBs of hash value are
26228ebff11SVladimir Medvedkin    *  used as an indexes in RSS ReTa. */
26328ebff11SVladimir Medvedkin   int reta_sz = 7;
26428ebff11SVladimir Medvedkin   int ret;
26528ebff11SVladimir Medvedkin   struct rte_thash_ctx *ctx;
26628ebff11SVladimir Medvedkin
26728ebff11SVladimir Medvedkin   uint8_t initial_key[key_len] = {0}; /* Default empty key. */
26828ebff11SVladimir Medvedkin
26928ebff11SVladimir Medvedkin   /* Create and initialize a new thash context. */
27028ebff11SVladimir Medvedkin   ctx = rte_thash_init_ctx("SNAT", key_len, reta_sz, initial_key, 0);
27128ebff11SVladimir Medvedkin
27228ebff11SVladimir Medvedkin   /** Add a helper and specify the variable tuple part and its length. In the
27328ebff11SVladimir Medvedkin    *  SNAT case we want to choose a new source port on SNAT translation in a
27428ebff11SVladimir Medvedkin    *  way that the reverse tuple will have the same LSBs as the original
27528ebff11SVladimir Medvedkin    *  direction tuple so that the selected source port will be the
27628ebff11SVladimir Medvedkin    *  destination port on reply.
27728ebff11SVladimir Medvedkin    */
27828ebff11SVladimir Medvedkin   ret = rte_thash_add_helper(ctx, "snat", sizeof(uint16_t) * 8,
27928ebff11SVladimir Medvedkin                              offsetof(union rte_thash_tuple, v4.dport) * 8);
28028ebff11SVladimir Medvedkin
28128ebff11SVladimir Medvedkin   if (ret != 0)
28228ebff11SVladimir Medvedkin       return ret;
28328ebff11SVladimir Medvedkin
28428ebff11SVladimir Medvedkin   /* Get handler of the required helper. */
28528ebff11SVladimir Medvedkin   struct rte_thash_subtuple_helper *h = rte_thash_get_helper(ctx, "snat");
28628ebff11SVladimir Medvedkin
28728ebff11SVladimir Medvedkin   /** After calling rte_thash_add_helper() the initial_key passed on ctx
28828ebff11SVladimir Medvedkin    *  creation has been changed so we get the new one.
28928ebff11SVladimir Medvedkin    */
29028ebff11SVladimir Medvedkin   uint8_t *new_key = rte_thash_get_key(ctx);
29128ebff11SVladimir Medvedkin
29228ebff11SVladimir Medvedkin   union rte_thash_tuple tuple, rev_tuple;
29328ebff11SVladimir Medvedkin
29428ebff11SVladimir Medvedkin   /* A complete tuple from the packet. */
29528ebff11SVladimir Medvedkin   complete_tuple(mbuf, &tuple);
29628ebff11SVladimir Medvedkin
29728ebff11SVladimir Medvedkin   /* Calculate the RSS hash or get it from mbuf->hash.rss. */
29828ebff11SVladimir Medvedkin   uint32_t orig_hash = rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L4_LEN, new_key);
29928ebff11SVladimir Medvedkin
30028ebff11SVladimir Medvedkin   /** Complete the reverse tuple by translating the SRC address and swapping
30128ebff11SVladimir Medvedkin    *  src and dst addresses and ports.
30228ebff11SVladimir Medvedkin    */
30328ebff11SVladimir Medvedkin   get_rev_tuple(&rev_tuple, &tuple, new_ip);
30428ebff11SVladimir Medvedkin
30528ebff11SVladimir Medvedkin   /* Calculate the expected rss hash for the reverse tuple. */
30628ebff11SVladimir Medvedkin   uint32_t rev_hash = rte_softrss((uint32_t *)&rev_tuple, RTE_THASH_V4_L4_LEN, new_key);
30728ebff11SVladimir Medvedkin
30828ebff11SVladimir Medvedkin   /* Get the adjustment bits for the src port to get a new port. */
30928ebff11SVladimir Medvedkin   uint32_t adj = rte_thash_get_compliment(h, rev_hash, orig_hash);
31028ebff11SVladimir Medvedkin
31128ebff11SVladimir Medvedkin   /* Adjust the source port bits. */
31228ebff11SVladimir Medvedkin   uint16_t new_sport = tuple.v4.sport ^ adj;
31328ebff11SVladimir Medvedkin
31428ebff11SVladimir Medvedkin   /* Make an actual packet translation. */
31528ebff11SVladimir Medvedkin   do_snat(mbuf, new_ip, new_sport);
316