1534fe5f3SVladimir Medvedkin.. SPDX-License-Identifier: BSD-3-Clause 2534fe5f3SVladimir Medvedkin Copyright(c) 2021 Intel Corporation. 3534fe5f3SVladimir Medvedkin 4534fe5f3SVladimir MedvedkinToeplitz Hash Library 5534fe5f3SVladimir Medvedkin===================== 6534fe5f3SVladimir Medvedkin 7534fe5f3SVladimir MedvedkinDPDK provides a Toeplitz Hash Library 8534fe5f3SVladimir Medvedkinto calculate the Toeplitz hash function and to use its properties. 9534fe5f3SVladimir MedvedkinThe Toeplitz hash function is commonly used in a wide range of NICs 10534fe5f3SVladimir Medvedkinto calculate the RSS hash sum to spread the traffic among the queues. 11534fe5f3SVladimir Medvedkin 12534fe5f3SVladimir Medvedkin.. _figure_rss_queue_assign: 13534fe5f3SVladimir Medvedkin 14534fe5f3SVladimir Medvedkin.. figure:: img/rss_queue_assign.* 15534fe5f3SVladimir Medvedkin 16534fe5f3SVladimir Medvedkin RSS queue assignment example 17534fe5f3SVladimir Medvedkin 18534fe5f3SVladimir Medvedkin 19534fe5f3SVladimir MedvedkinToeplitz hash function API 20534fe5f3SVladimir Medvedkin-------------------------- 21534fe5f3SVladimir Medvedkin 22*4fd8c4cbSVladimir MedvedkinThere are three functions that provide calculation of the Toeplitz hash sum: 23534fe5f3SVladimir Medvedkin 24534fe5f3SVladimir Medvedkin* ``rte_softrss()`` 25534fe5f3SVladimir Medvedkin* ``rte_softrss_be()`` 26*4fd8c4cbSVladimir Medvedkin* ``rte_thash_gfni()`` 27534fe5f3SVladimir Medvedkin 28*4fd8c4cbSVladimir MedvedkinFirst two functions are scalar implementation and take the parameters: 29534fe5f3SVladimir Medvedkin 30534fe5f3SVladimir Medvedkin* A pointer to the tuple, containing fields extracted from the packet. 31534fe5f3SVladimir Medvedkin* A length of this tuple counted in double words. 32534fe5f3SVladimir Medvedkin* A pointer to the RSS hash key corresponding to the one installed on the NIC. 33534fe5f3SVladimir Medvedkin 34*4fd8c4cbSVladimir MedvedkinBoth of above mentioned _softrss_ functions expect the tuple to be in 35*4fd8c4cbSVladimir Medvedkin"host" byte order and a multiple of 4 bytes in length. 36534fe5f3SVladimir MedvedkinThe ``rte_softrss()`` function expects the ``rss_key`` 37534fe5f3SVladimir Medvedkinto be exactly the same as the one installed on the NIC. 38534fe5f3SVladimir MedvedkinThe ``rte_softrss_be`` function is a faster implementation, 39534fe5f3SVladimir Medvedkinbut it expects ``rss_key`` to be converted to the host byte order. 4028ebff11SVladimir Medvedkin 41*4fd8c4cbSVladimir MedvedkinThe last function is vectorized implementation using 42*4fd8c4cbSVladimir MedvedkinGalois Fields New Instructions. Could be used if ``rte_thash_gfni_supported`` returns true. 43*4fd8c4cbSVladimir MedvedkinIt expects the tuple to be in network byte order. 44*4fd8c4cbSVladimir Medvedkin 45*4fd8c4cbSVladimir Medvedkin``rte_thash_gfni()`` calculates the hash value for a single tuple 46*4fd8c4cbSVladimir Medvedkin 47*4fd8c4cbSVladimir Medvedkin``rte_thash_gfni()`` takes the parameters: 48*4fd8c4cbSVladimir Medvedkin 49*4fd8c4cbSVladimir Medvedkin* A pointer to the matrices derived from the RSS hash key using ``rte_thash_complete_matrix()``. 50*4fd8c4cbSVladimir Medvedkin* A pointer to the tuple. 51*4fd8c4cbSVladimir Medvedkin* A length of the tuple in bytes. 52*4fd8c4cbSVladimir Medvedkin 53*4fd8c4cbSVladimir Medvedkin``rte_thash_complete_matrix()`` is a function that calculates matrices required by 54*4fd8c4cbSVladimir MedvedkinGFNI implementations from the RSS hash key. It takes the parameters: 55*4fd8c4cbSVladimir Medvedkin 56*4fd8c4cbSVladimir Medvedkin* A pointer to the memory where the matrices will be written. 57*4fd8c4cbSVladimir Medvedkin* A pointer to the RSS hash key. 58*4fd8c4cbSVladimir Medvedkin* Length of the RSS hash key in bytes. 59*4fd8c4cbSVladimir Medvedkin 6028ebff11SVladimir Medvedkin 6128ebff11SVladimir MedvedkinPredictable RSS 6228ebff11SVladimir Medvedkin--------------- 6328ebff11SVladimir Medvedkin 6428ebff11SVladimir MedvedkinIn some use cases it is useful to have a way to find partial collisions of the 6528ebff11SVladimir MedvedkinToeplitz hash function. In figure :numref:`figure_rss_queue_assign` only a few 6628ebff11SVladimir Medvedkinof the least significant bits (LSB) of the hash value are used to indicate an 6728ebff11SVladimir Medvedkinentry in the RSS Redirection Table (ReTa) and thus the index of the queue. So, 6828ebff11SVladimir Medvedkinin this case it would be useful to find another tuple whose hash has the same 6928ebff11SVladimir MedvedkinLSB's as the hash from the original tuple. 7028ebff11SVladimir Medvedkin 7128ebff11SVladimir MedvedkinFor example: 7228ebff11SVladimir Medvedkin 7328ebff11SVladimir Medvedkin- In the case of SNAT (Source Network Address Translation) it is possible to 7428ebff11SVladimir Medvedkin find a special source port number on translation so that the hash of 7528ebff11SVladimir Medvedkin returning packets, of the given connection, will have desired LSB's. 7628ebff11SVladimir Medvedkin- In the case of MPLS (Multiprotocol Label Switching), if the MPLS tag is used 7728ebff11SVladimir Medvedkin in the hash calculation, the Label Switching router can allocate a special 7828ebff11SVladimir Medvedkin MPLS tag to bind an LSP (Label Switching Path) to a given queue. This method 7928ebff11SVladimir Medvedkin can be used with the allocation of IPSec SPI, VXLan VNI, etc., to bind the 8028ebff11SVladimir Medvedkin tunnel to the desired queue. 8128ebff11SVladimir Medvedkin- In the case of a TCP stack, a special source port could be chosen for 8228ebff11SVladimir Medvedkin outgoing connections, such that the response packets will be assigned to the 8328ebff11SVladimir Medvedkin desired queue. 8428ebff11SVladimir Medvedkin 8528ebff11SVladimir MedvedkinThis functionality is provided by the API shown below. 8628ebff11SVladimir MedvedkinThe API consists of 3 parts: 8728ebff11SVladimir Medvedkin 8828ebff11SVladimir Medvedkin* Create the thash context. 8928ebff11SVladimir Medvedkin 9028ebff11SVladimir Medvedkin* Create the thash helper, associated with a context. 9128ebff11SVladimir Medvedkin 9228ebff11SVladimir Medvedkin* Use the helper run time to calculate the adjustable bits of the tuple to 9328ebff11SVladimir Medvedkin ensure a collision. 9428ebff11SVladimir Medvedkin 9528ebff11SVladimir Medvedkin 9628ebff11SVladimir MedvedkinThash context 9728ebff11SVladimir Medvedkin~~~~~~~~~~~~~ 9828ebff11SVladimir Medvedkin 9928ebff11SVladimir MedvedkinThe function ``rte_thash_init_ctx()`` initializes the context struct 10028ebff11SVladimir Medvedkinassociated with a particular NIC or a set of NICs. It expects: 10128ebff11SVladimir Medvedkin 10228ebff11SVladimir Medvedkin* The log2 value of the size of the RSS redirection table for the 10328ebff11SVladimir Medvedkin corresponding NIC. It reflects the number of least significant bits of the 10428ebff11SVladimir Medvedkin hash value to produce a collision for. 10528ebff11SVladimir Medvedkin 10628ebff11SVladimir Medvedkin* A predefined RSS hash key. This is optional, if ``NULL`` then a random key 10728ebff11SVladimir Medvedkin will be initialized. 10828ebff11SVladimir Medvedkin 10928ebff11SVladimir Medvedkin* The length of the RSS hash key. This value is usually hardware/driver 11028ebff11SVladimir Medvedkin specific and can be found in the NIC datasheet. 11128ebff11SVladimir Medvedkin 11228ebff11SVladimir Medvedkin* Optional flags, as shown below. 11328ebff11SVladimir Medvedkin 11428ebff11SVladimir MedvedkinSupported flags: 11528ebff11SVladimir Medvedkin 11628ebff11SVladimir Medvedkin* ``RTE_THASH_IGNORE_PERIOD_OVERFLOW`` - By default, and for security reasons, 11728ebff11SVladimir Medvedkin the library prohibits generating a repeatable sequence in the hash key. This 11828ebff11SVladimir Medvedkin flag disables such checking. The flag is mainly used for testing in the lab 11928ebff11SVladimir Medvedkin to generate an RSS hash key with a uniform hash distribution, if the input 12028ebff11SVladimir Medvedkin traffic also has a uniform distribution. 12128ebff11SVladimir Medvedkin 12228ebff11SVladimir Medvedkin* ``RTE_THASH_MINIMAL_SEQ`` - By default, the library generates a special bit 12328ebff11SVladimir Medvedkin sequence in the hash key for all the bits of the subtuple. However, the 12428ebff11SVladimir Medvedkin collision generation task requires only the ``log2(RETA_SZ)`` bits in the 12528ebff11SVladimir Medvedkin subtuple. This flag forces the minimum bit sequence in the hash key to be 12628ebff11SVladimir Medvedkin generated for the required ``log2(RETA_SZ)`` least significant bits of the 12728ebff11SVladimir Medvedkin subtuple. The flag can be used in the case of a relatively large number of 12828ebff11SVladimir Medvedkin helpers that may overlap with their corresponding bit sequences of RSS hash 12928ebff11SVladimir Medvedkin keys. 13028ebff11SVladimir Medvedkin 13128ebff11SVladimir Medvedkin 13228ebff11SVladimir MedvedkinThash helper 13328ebff11SVladimir Medvedkin~~~~~~~~~~~~ 13428ebff11SVladimir Medvedkin 13528ebff11SVladimir MedvedkinThe function ``rte_thash_add_helper()`` initializes the helper struct 13628ebff11SVladimir Medvedkinassociated with a given context and a part of a target tuple of interest which 13728ebff11SVladimir Medvedkincould be altered to produce a hash collision. On success it writes a specially 13828ebff11SVladimir Medvedkincalculated bit sequence into the RSS hash key which is stored in the context 13928ebff11SVladimir Medvedkinand calculates a table with values to be XORed with a subtuple. 14028ebff11SVladimir Medvedkin 14128ebff11SVladimir MedvedkinIt expects: 14228ebff11SVladimir Medvedkin 14328ebff11SVladimir Medvedkin* A pointer to the Thash context to be associated with. 14428ebff11SVladimir Medvedkin 14528ebff11SVladimir Medvedkin* A length of the subtuple to be modified. The length is counted in bits. 14628ebff11SVladimir Medvedkin 14728ebff11SVladimir Medvedkin* An offset of the subtuple to be modified from the beginning of the tuple. It 14828ebff11SVladimir Medvedkin is also counted in bits. 14928ebff11SVladimir Medvedkin 15028ebff11SVladimir Medvedkin.. note:: 15128ebff11SVladimir Medvedkin 15228ebff11SVladimir Medvedkin Adding a helper changes the key stored in the corresponding context. So the 15328ebff11SVladimir Medvedkin updated RSS hash key must be uploaded into the NIC after creating all the 15428ebff11SVladimir Medvedkin required helpers. 15528ebff11SVladimir Medvedkin 15628ebff11SVladimir Medvedkin 15728ebff11SVladimir MedvedkinCalculation of the complementary bits to adjust the subtuple 15828ebff11SVladimir Medvedkin~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 15928ebff11SVladimir Medvedkin 16028ebff11SVladimir MedvedkinThe ``rte_thash_get_complement()`` function returns a special bit sequence 16128ebff11SVladimir Medvedkinwith length ``N = log2(rss_reta_sz)`` (for the ``rss_reta_sz`` provided at 16228ebff11SVladimir Medvedkincontext initialization) to be xored with N least significant bits of the 16328ebff11SVladimir Medvedkinsubtuple. 16428ebff11SVladimir Medvedkin 16528ebff11SVladimir MedvedkinIt expects: 16628ebff11SVladimir Medvedkin 16728ebff11SVladimir Medvedkin* A corresponding helper created for a given subtuple of the tuple. 16828ebff11SVladimir Medvedkin 16928ebff11SVladimir Medvedkin* A hash value of the tuple we want to alter. 17028ebff11SVladimir Medvedkin 17128ebff11SVladimir Medvedkin* The desired LSB's of the hash value the user expects to have. 17228ebff11SVladimir Medvedkin 17328ebff11SVladimir MedvedkinAfter the returned bit sequence has been XORed with the subtuple, the resulted 17428ebff11SVladimir MedvedkinLSB's of the new hash value, calculated from the altered tuple, will be the 17528ebff11SVladimir Medvedkinsame as in ``desired_hash``. 17628ebff11SVladimir Medvedkin 17728ebff11SVladimir Medvedkin 17828ebff11SVladimir MedvedkinAdjust tuple API 17928ebff11SVladimir Medvedkin~~~~~~~~~~~~~~~~~ 18028ebff11SVladimir Medvedkin 18128ebff11SVladimir MedvedkinThe ``rte_thash_get_complement()`` function is a user-friendly wrapper around 18228ebff11SVladimir Medvedkina number of other functions. It alters a passed tuple to meet the above 18328ebff11SVladimir Medvedkinmentioned requirements around the desired hash LSB's. 18428ebff11SVladimir Medvedkin 18528ebff11SVladimir MedvedkinIt expects: 18628ebff11SVladimir Medvedkin 18728ebff11SVladimir Medvedkin* A Thash context and helper. 18828ebff11SVladimir Medvedkin 18928ebff11SVladimir Medvedkin* A pointer to the tuple to be changed. 19028ebff11SVladimir Medvedkin 19128ebff11SVladimir Medvedkin* The length of the tuple. 19228ebff11SVladimir Medvedkin 19328ebff11SVladimir Medvedkin* A callback function and its userdata to check the tuple after it has been 19428ebff11SVladimir Medvedkin changed. 19528ebff11SVladimir Medvedkin 19628ebff11SVladimir Medvedkin* The number of attempts to change the tuple. Basically, it makes sense if 19728ebff11SVladimir Medvedkin there is a callback and a limit on the number of attempts to change the 19828ebff11SVladimir Medvedkin tuple, if the callback function returns an error. 19928ebff11SVladimir Medvedkin 20028ebff11SVladimir Medvedkin 20128ebff11SVladimir MedvedkinUse case example 2029c30a6f3SHenry Nadeau---------------- 20328ebff11SVladimir Medvedkin 20428ebff11SVladimir MedvedkinThere could be a number of different use cases, such as NAT, TCP stack, MPLS 20528ebff11SVladimir Medvedkintag allocation, etc. In the following we will consider a SNAT application. 20628ebff11SVladimir Medvedkin 20728ebff11SVladimir MedvedkinPackets of a single bidirectional flow belonging to different directions can 20828ebff11SVladimir Medvedkinend up being assigned to different queues and thus processed by different 20928ebff11SVladimir Medvedkinlcores, as shown in :numref:`figure_predictable_snat_1`: 21028ebff11SVladimir Medvedkin 21128ebff11SVladimir Medvedkin.. _figure_predictable_snat_1: 21228ebff11SVladimir Medvedkin 21328ebff11SVladimir Medvedkin.. figure:: img/predictable_snat_1.* 21428ebff11SVladimir Medvedkin 21528ebff11SVladimir Medvedkin Bidirectional flow packets distribution in general 21628ebff11SVladimir Medvedkin 21728ebff11SVladimir MedvedkinThat leads to a situation where the same packet flow can be shared between two 21828ebff11SVladimir Medvedkincores. Such a situation is not ideal from a performance perspective and 21928ebff11SVladimir Medvedkinrequires extra synchronization efforts that might lead to various performance 22028ebff11SVladimir Medvedkinpenalties, for example: 22128ebff11SVladimir Medvedkin 22228ebff11SVladimir Medvedkin* The connections table is global so locking/RCU on the flow insertion/removal 22328ebff11SVladimir Medvedkin is required. 22428ebff11SVladimir Medvedkin 22528ebff11SVladimir Medvedkin* Connection metadata must be protected to avoid race conditions. 22628ebff11SVladimir Medvedkin 22728ebff11SVladimir Medvedkin* More cache pressure if a single connection metadata is kept in different 22828ebff11SVladimir Medvedkin L1/L2 caches of a different CPU core. 22928ebff11SVladimir Medvedkin 23028ebff11SVladimir Medvedkin* Cache pressure/less cache locality on packet handover to the different cores. 23128ebff11SVladimir Medvedkin 23228ebff11SVladimir MedvedkinWe can avoid all these penalties if it can be guaranteed that packets 23328ebff11SVladimir Medvedkinbelonging to one bidirectional flow will be assigned to the same queue, as 23428ebff11SVladimir Medvedkinshown in :numref:`figure_predictable_snat_2`: 23528ebff11SVladimir Medvedkin 23628ebff11SVladimir Medvedkin.. _figure_predictable_snat_2: 23728ebff11SVladimir Medvedkin 23828ebff11SVladimir Medvedkin.. figure:: img/predictable_snat_2.* 23928ebff11SVladimir Medvedkin 24028ebff11SVladimir Medvedkin Bidirectional flow packets distribution with predictable RSS 24128ebff11SVladimir Medvedkin 24228ebff11SVladimir Medvedkin 24328ebff11SVladimir MedvedkinTo achieve this in a SNAT scenario it is possible to choose a source port not 24428ebff11SVladimir Medvedkinrandomly, but using the predictable RSS library to produce a partial hash 24528ebff11SVladimir Medvedkincollision. This is shown in the code below. 24628ebff11SVladimir Medvedkin 24728ebff11SVladimir Medvedkin.. code-block:: c 24828ebff11SVladimir Medvedkin 24928ebff11SVladimir Medvedkin int key_len = 40; /* The default Niantic RSS key length. */ 25028ebff11SVladimir Medvedkin 25128ebff11SVladimir Medvedkin /** The default Niantic RSS reta size = 2^7 entries, LSBs of hash value are 25228ebff11SVladimir Medvedkin * used as an indexes in RSS ReTa. */ 25328ebff11SVladimir Medvedkin int reta_sz = 7; 25428ebff11SVladimir Medvedkin int ret; 25528ebff11SVladimir Medvedkin struct rte_thash_ctx *ctx; 25628ebff11SVladimir Medvedkin 25728ebff11SVladimir Medvedkin uint8_t initial_key[key_len] = {0}; /* Default empty key. */ 25828ebff11SVladimir Medvedkin 25928ebff11SVladimir Medvedkin /* Create and initialize a new thash context. */ 26028ebff11SVladimir Medvedkin ctx = rte_thash_init_ctx("SNAT", key_len, reta_sz, initial_key, 0); 26128ebff11SVladimir Medvedkin 26228ebff11SVladimir Medvedkin /** Add a helper and specify the variable tuple part and its length. In the 26328ebff11SVladimir Medvedkin * SNAT case we want to choose a new source port on SNAT translation in a 26428ebff11SVladimir Medvedkin * way that the reverse tuple will have the same LSBs as the original 26528ebff11SVladimir Medvedkin * direction tuple so that the selected source port will be the 26628ebff11SVladimir Medvedkin * destination port on reply. 26728ebff11SVladimir Medvedkin */ 26828ebff11SVladimir Medvedkin ret = rte_thash_add_helper(ctx, "snat", sizeof(uint16_t) * 8, 26928ebff11SVladimir Medvedkin offsetof(union rte_thash_tuple, v4.dport) * 8); 27028ebff11SVladimir Medvedkin 27128ebff11SVladimir Medvedkin if (ret != 0) 27228ebff11SVladimir Medvedkin return ret; 27328ebff11SVladimir Medvedkin 27428ebff11SVladimir Medvedkin /* Get handler of the required helper. */ 27528ebff11SVladimir Medvedkin struct rte_thash_subtuple_helper *h = rte_thash_get_helper(ctx, "snat"); 27628ebff11SVladimir Medvedkin 27728ebff11SVladimir Medvedkin /** After calling rte_thash_add_helper() the initial_key passed on ctx 27828ebff11SVladimir Medvedkin * creation has been changed so we get the new one. 27928ebff11SVladimir Medvedkin */ 28028ebff11SVladimir Medvedkin uint8_t *new_key = rte_thash_get_key(ctx); 28128ebff11SVladimir Medvedkin 28228ebff11SVladimir Medvedkin union rte_thash_tuple tuple, rev_tuple; 28328ebff11SVladimir Medvedkin 28428ebff11SVladimir Medvedkin /* A complete tuple from the packet. */ 28528ebff11SVladimir Medvedkin complete_tuple(mbuf, &tuple); 28628ebff11SVladimir Medvedkin 28728ebff11SVladimir Medvedkin /* Calculate the RSS hash or get it from mbuf->hash.rss. */ 28828ebff11SVladimir Medvedkin uint32_t orig_hash = rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L4_LEN, new_key); 28928ebff11SVladimir Medvedkin 29028ebff11SVladimir Medvedkin /** Complete the reverse tuple by translating the SRC address and swapping 29128ebff11SVladimir Medvedkin * src and dst addresses and ports. 29228ebff11SVladimir Medvedkin */ 29328ebff11SVladimir Medvedkin get_rev_tuple(&rev_tuple, &tuple, new_ip); 29428ebff11SVladimir Medvedkin 29528ebff11SVladimir Medvedkin /* Calculate the expected rss hash for the reverse tuple. */ 29628ebff11SVladimir Medvedkin uint32_t rev_hash = rte_softrss((uint32_t *)&rev_tuple, RTE_THASH_V4_L4_LEN, new_key); 29728ebff11SVladimir Medvedkin 29828ebff11SVladimir Medvedkin /* Get the adjustment bits for the src port to get a new port. */ 29928ebff11SVladimir Medvedkin uint32_t adj = rte_thash_get_compliment(h, rev_hash, orig_hash); 30028ebff11SVladimir Medvedkin 30128ebff11SVladimir Medvedkin /* Adjust the source port bits. */ 30228ebff11SVladimir Medvedkin uint16_t new_sport = tuple.v4.sport ^ adj; 30328ebff11SVladimir Medvedkin 30428ebff11SVladimir Medvedkin /* Make an actual packet translation. */ 30528ebff11SVladimir Medvedkin do_snat(mbuf, new_ip, new_sport); 306