xref: /dpdk/doc/guides/prog_guide/packet_framework.rst (revision 41dd9a6bc2d9c6e20e139ad713cc9d172572dd43)
15630257fSFerruh Yigit..  SPDX-License-Identifier: BSD-3-Clause
25630257fSFerruh Yigit    Copyright(c) 2010-2014 Intel Corporation.
3fc1f2750SBernard Iremonger
4*41dd9a6bSDavid YoungPacket Framework Library
5*41dd9a6bSDavid Young========================
6fc1f2750SBernard Iremonger
7fc1f2750SBernard IremongerDesign Objectives
8fc1f2750SBernard Iremonger-----------------
9fc1f2750SBernard Iremonger
1048624fd9SSiobhan ButlerThe main design objectives for the DPDK Packet Framework are:
11fc1f2750SBernard Iremonger
12fc1f2750SBernard Iremonger*   Provide standard methodology to build complex packet processing pipelines.
13fc1f2750SBernard Iremonger    Provide reusable and extensible templates for the commonly used pipeline functional blocks;
14fc1f2750SBernard Iremonger
15fc1f2750SBernard Iremonger*   Provide capability to switch between pure software and hardware-accelerated implementations for the same pipeline functional block;
16fc1f2750SBernard Iremonger
17fc1f2750SBernard Iremonger*   Provide the best trade-off between flexibility and performance.
18fc1f2750SBernard Iremonger    Hardcoded pipelines usually provide the best performance, but are not flexible,
19fc1f2750SBernard Iremonger    while developing flexible frameworks is never a problem, but performance is usually low;
20fc1f2750SBernard Iremonger
21fc1f2750SBernard Iremonger*   Provide a framework that is logically similar to Open Flow.
22fc1f2750SBernard Iremonger
23fc1f2750SBernard IremongerOverview
24fc1f2750SBernard Iremonger--------
25fc1f2750SBernard Iremonger
26fc1f2750SBernard IremongerPacket processing applications are frequently structured as pipelines of multiple stages,
27fc1f2750SBernard Iremongerwith the logic of each stage glued around a lookup table.
28fc1f2750SBernard IremongerFor each incoming packet, the table defines the set of actions to be applied to the packet,
29fc1f2750SBernard Iremongeras well as the next stage to send the packet to.
30fc1f2750SBernard Iremonger
3148624fd9SSiobhan ButlerThe DPDK Packet Framework minimizes the development effort required to build packet processing pipelines
32fc1f2750SBernard Iremongerby defining a standard methodology for pipeline development,
33fc1f2750SBernard Iremongeras well as providing libraries of reusable templates for the commonly used pipeline blocks.
34fc1f2750SBernard Iremonger
35fc1f2750SBernard IremongerThe pipeline is constructed by connecting the set of input ports with the set of output ports
36fc1f2750SBernard Iremongerthrough the set of tables in a tree-like topology.
37fc1f2750SBernard IremongerAs result of lookup operation for the current packet in the current table,
38fc1f2750SBernard Iremongerone of the table entries (on lookup hit) or the default table entry (on lookup miss)
39fc1f2750SBernard Iremongerprovides the set of actions to be applied on the current packet,
40fc1f2750SBernard Iremongeras well as the next hop for the packet, which can be either another table, an output port or packet drop.
41fc1f2750SBernard Iremonger
424a22e6eeSJohn McNamaraAn example of packet processing pipeline is presented in :numref:`figure_figure32`:
43fc1f2750SBernard Iremonger
444a22e6eeSJohn McNamara.. _figure_figure32:
45fc1f2750SBernard Iremonger
464a22e6eeSJohn McNamara.. figure:: img/figure32.*
47fc1f2750SBernard Iremonger
484a22e6eeSJohn McNamara   Example of Packet Processing Pipeline where Input Ports 0 and 1
494a22e6eeSJohn McNamara   are Connected with Output Ports 0, 1 and 2 through Tables 0 and 1
50fc1f2750SBernard Iremonger
51fc1f2750SBernard Iremonger
52fc1f2750SBernard IremongerPort Library Design
53fc1f2750SBernard Iremonger-------------------
54fc1f2750SBernard Iremonger
55fc1f2750SBernard IremongerPort Types
56fc1f2750SBernard Iremonger~~~~~~~~~~
57fc1f2750SBernard Iremonger
588c9a3374SJohn McNamara:numref:`table_qos_19` is a non-exhaustive list of ports that can be implemented with the Packet Framework.
59fc1f2750SBernard Iremonger
608c9a3374SJohn McNamara.. _table_qos_19:
61fc1f2750SBernard Iremonger
628c9a3374SJohn McNamara.. table:: Port Types
63fc1f2750SBernard Iremonger
64fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
65fc1f2750SBernard Iremonger   | # | Port type        | Description                                                                           |
66fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
67fc1f2750SBernard Iremonger   +===+==================+=======================================================================================+
68fc1f2750SBernard Iremonger   | 1 | SW ring          | SW circular buffer used for message passing between the application threads. Uses     |
6948624fd9SSiobhan Butler   |   |                  | the DPDK rte_ring primitive. Expected to be the most commonly used type of            |
70fc1f2750SBernard Iremonger   |   |                  | port.                                                                                 |
71fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
72fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
73fc1f2750SBernard Iremonger   | 2 | HW ring          | Queue of buffer descriptors used to interact with NIC, switch or accelerator ports.   |
7448624fd9SSiobhan Butler   |   |                  | For NIC ports, it uses the DPDK rte_eth_rx_queue or rte_eth_tx_queue                  |
75fc1f2750SBernard Iremonger   |   |                  | primitives.                                                                           |
76fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
77fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
78fc1f2750SBernard Iremonger   | 3 | IP reassembly    | Input packets are either IP fragments or complete IP datagrams. Output packets are    |
79fc1f2750SBernard Iremonger   |   |                  | complete IP datagrams.                                                                |
80fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
81fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
82fc1f2750SBernard Iremonger   | 4 | IP fragmentation | Input packets are jumbo (IP datagrams with length bigger than MTU) or non-jumbo       |
83fc1f2750SBernard Iremonger   |   |                  | packets. Output packets are non-jumbo packets.                                        |
84fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
85fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
86fc1f2750SBernard Iremonger   | 5 | Traffic manager  | Traffic manager attached to a specific NIC output port, performing congestion         |
87fc1f2750SBernard Iremonger   |   |                  | management and hierarchical scheduling according to pre-defined SLAs.                 |
88fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
89fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
90f78c100bSStephen Hemminger   | 6 | Source           | Input port used as packet generator. Similar to Linux kernel /dev/zero character      |
91fc1f2750SBernard Iremonger   |   |                  | device.                                                                               |
92fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
93fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
94f78c100bSStephen Hemminger   | 7 | Sink             | Output port used to drop all input packets. Similar to Linux kernel /dev/null         |
95fc1f2750SBernard Iremonger   |   |                  | character device.                                                                     |
96fc1f2750SBernard Iremonger   |   |                  |                                                                                       |
97fc1f2750SBernard Iremonger   +---+------------------+---------------------------------------------------------------------------------------+
98f78c100bSStephen Hemminger   | 8 | Sym_crypto       | Output port used to extract DPDK Cryptodev operations from a fixed offset of the      |
99cc85c078SFan Zhang   |   |                  | packet and then enqueue to the Cryptodev PMD. Input port used to dequeue the          |
100cc85c078SFan Zhang   |   |                  | Cryptodev operations from the Cryptodev PMD and then retrieve the packets from them.  |
101cc85c078SFan Zhang   +---+------------------+---------------------------------------------------------------------------------------+
102fc1f2750SBernard Iremonger
103fc1f2750SBernard IremongerPort Interface
104fc1f2750SBernard Iremonger~~~~~~~~~~~~~~
105fc1f2750SBernard Iremonger
106fc1f2750SBernard IremongerEach port is unidirectional, i.e. either input port or output port.
107fc1f2750SBernard IremongerEach input/output port is required to implement an abstract interface that
108fc1f2750SBernard Iremongerdefines the initialization and run-time operation of the port.
109fc1f2750SBernard IremongerThe port abstract interface is described in.
110fc1f2750SBernard Iremonger
1118c9a3374SJohn McNamara.. _table_qos_20:
112fc1f2750SBernard Iremonger
1138c9a3374SJohn McNamara.. table:: 20 Port Abstract Interface
114fc1f2750SBernard Iremonger
115fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
116fc1f2750SBernard Iremonger   | # | Port Operation | Description                                                                             |
117fc1f2750SBernard Iremonger   |   |                |                                                                                         |
118fc1f2750SBernard Iremonger   +===+================+=========================================================================================+
119fc1f2750SBernard Iremonger   | 1 | Create         | Create the low-level port object (e.g. queue). Can internally allocate memory.          |
120fc1f2750SBernard Iremonger   |   |                |                                                                                         |
121fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
122fc1f2750SBernard Iremonger   | 2 | Free           | Free the resources (e.g. memory) used by the low-level port object.                     |
123fc1f2750SBernard Iremonger   |   |                |                                                                                         |
124fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
125fc1f2750SBernard Iremonger   | 3 | RX             | Read a burst of input packets. Non-blocking operation. Only defined for input ports.    |
126fc1f2750SBernard Iremonger   |   |                |                                                                                         |
127fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
128fc1f2750SBernard Iremonger   | 4 | TX             | Write a burst of input packets. Non-blocking operation. Only defined for output ports.  |
129fc1f2750SBernard Iremonger   |   |                |                                                                                         |
130fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
131fc1f2750SBernard Iremonger   | 5 | Flush          | Flush the output buffer. Only defined for output ports.                                 |
132fc1f2750SBernard Iremonger   |   |                |                                                                                         |
133fc1f2750SBernard Iremonger   +---+----------------+-----------------------------------------------------------------------------------------+
134fc1f2750SBernard Iremonger
135fc1f2750SBernard IremongerTable Library Design
136fc1f2750SBernard Iremonger--------------------
137fc1f2750SBernard Iremonger
138fc1f2750SBernard IremongerTable Types
139fc1f2750SBernard Iremonger~~~~~~~~~~~
140fc1f2750SBernard Iremonger
1418c9a3374SJohn McNamara:numref:`table_qos_21` is a non-exhaustive list of types of tables that can be implemented with the Packet Framework.
142fc1f2750SBernard Iremonger
1438c9a3374SJohn McNamara.. _table_qos_21:
144fc1f2750SBernard Iremonger
1458c9a3374SJohn McNamara.. table:: Table Types
146fc1f2750SBernard Iremonger
147fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
148fc1f2750SBernard Iremonger   | # | Table Type                 | Description                                                                 |
149fc1f2750SBernard Iremonger   |   |                            |                                                                             |
150fc1f2750SBernard Iremonger   +===+============================+=============================================================================+
151fc1f2750SBernard Iremonger   | 1 | Hash table                 | Lookup key is n-tuple based.                                                |
152fc1f2750SBernard Iremonger   |   |                            |                                                                             |
153fc1f2750SBernard Iremonger   |   |                            | Typically, the lookup key is hashed to produce a signature that is used to  |
154fc1f2750SBernard Iremonger   |   |                            | identify a bucket of entries where the lookup key is searched next.         |
155fc1f2750SBernard Iremonger   |   |                            |                                                                             |
156fc1f2750SBernard Iremonger   |   |                            | The signature associated with the lookup key of each input packet is either |
157fc1f2750SBernard Iremonger   |   |                            | read from the packet descriptor (pre-computed signature) or computed at     |
158fc1f2750SBernard Iremonger   |   |                            | table lookup time.                                                          |
159fc1f2750SBernard Iremonger   |   |                            |                                                                             |
160fc1f2750SBernard Iremonger   |   |                            | The table lookup, add entry and delete entry operations, as well as any     |
161fc1f2750SBernard Iremonger   |   |                            | other pipeline block that pre-computes the signature all have to use the    |
162fc1f2750SBernard Iremonger   |   |                            | same hashing algorithm to generate the signature.                           |
163fc1f2750SBernard Iremonger   |   |                            |                                                                             |
164fc1f2750SBernard Iremonger   |   |                            | Typically used to implement flow classification tables, ARP caches, routing |
165fc1f2750SBernard Iremonger   |   |                            | table for tunnelling protocols, etc.                                        |
166fc1f2750SBernard Iremonger   |   |                            |                                                                             |
167fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
168fc1f2750SBernard Iremonger   | 2 | Longest Prefix Match (LPM) | Lookup key is the IP address.                                               |
169fc1f2750SBernard Iremonger   |   |                            |                                                                             |
170fc1f2750SBernard Iremonger   |   |                            | Each table entries has an associated IP prefix (IP and depth).              |
171fc1f2750SBernard Iremonger   |   |                            |                                                                             |
172fc1f2750SBernard Iremonger   |   |                            | The table lookup operation selects the IP prefix that is matched by the     |
173fc1f2750SBernard Iremonger   |   |                            | lookup key; in case of multiple matches, the entry with the longest prefix  |
174fc1f2750SBernard Iremonger   |   |                            | depth wins.                                                                 |
175fc1f2750SBernard Iremonger   |   |                            |                                                                             |
176fc1f2750SBernard Iremonger   |   |                            | Typically used to implement IP routing tables.                              |
177fc1f2750SBernard Iremonger   |   |                            |                                                                             |
178fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
179fc1f2750SBernard Iremonger   | 3 | Access Control List (ACLs) | Lookup key is 7-tuple of two VLAN/MPLS labels, IP destination address,      |
180fc1f2750SBernard Iremonger   |   |                            | IP source addresses, L4 protocol, L4 destination port, L4 source port.      |
181fc1f2750SBernard Iremonger   |   |                            |                                                                             |
182fc1f2750SBernard Iremonger   |   |                            | Each table entry has an associated ACL and priority. The ACL contains bit   |
183fc1f2750SBernard Iremonger   |   |                            | masks for the VLAN/MPLS labels, IP prefix for IP destination address, IP    |
184fc1f2750SBernard Iremonger   |   |                            | prefix for IP source addresses, L4 protocol and bitmask, L4 destination     |
185fc1f2750SBernard Iremonger   |   |                            | port and bit mask, L4 source port and bit mask.                             |
186fc1f2750SBernard Iremonger   |   |                            |                                                                             |
187fc1f2750SBernard Iremonger   |   |                            | The table lookup operation selects the ACL that is matched by the lookup    |
188fc1f2750SBernard Iremonger   |   |                            | key; in case of multiple matches, the entry with the highest priority wins. |
189fc1f2750SBernard Iremonger   |   |                            |                                                                             |
190fc1f2750SBernard Iremonger   |   |                            | Typically used to implement rule databases for firewalls, etc.              |
191fc1f2750SBernard Iremonger   |   |                            |                                                                             |
192fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
193fc1f2750SBernard Iremonger   | 4 | Pattern matching search    | Lookup key is the packet payload.                                           |
194fc1f2750SBernard Iremonger   |   |                            |                                                                             |
195fc1f2750SBernard Iremonger   |   |                            | Table is a database of patterns, with each pattern having a priority        |
196fc1f2750SBernard Iremonger   |   |                            | assigned.                                                                   |
197fc1f2750SBernard Iremonger   |   |                            |                                                                             |
198fc1f2750SBernard Iremonger   |   |                            | The table lookup operation selects the patterns that is matched by the      |
199fc1f2750SBernard Iremonger   |   |                            | input packet; in case of multiple matches, the matching pattern with the    |
200fc1f2750SBernard Iremonger   |   |                            | highest priority wins.                                                      |
201fc1f2750SBernard Iremonger   |   |                            |                                                                             |
202fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
203fc1f2750SBernard Iremonger   | 5 | Array                      | Lookup key is the table entry index itself.                                 |
204fc1f2750SBernard Iremonger   |   |                            |                                                                             |
205fc1f2750SBernard Iremonger   +---+----------------------------+-----------------------------------------------------------------------------+
206fc1f2750SBernard Iremonger
207fc1f2750SBernard IremongerTable Interface
208fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~
209fc1f2750SBernard Iremonger
210fc1f2750SBernard IremongerEach table is required to implement an abstract interface that defines the initialization
211fc1f2750SBernard Iremongerand run-time operation of the table.
2128c9a3374SJohn McNamaraThe table abstract interface is described in :numref:`table_qos_29_1`.
213fc1f2750SBernard Iremonger
2148c9a3374SJohn McNamara.. _table_qos_29_1:
215fc1f2750SBernard Iremonger
2168c9a3374SJohn McNamara.. table:: Table Abstract Interface
217fc1f2750SBernard Iremonger
218fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
219fc1f2750SBernard Iremonger   | # | Table operation | Description                                                                            |
220fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
221fc1f2750SBernard Iremonger   +===+=================+========================================================================================+
222fc1f2750SBernard Iremonger   | 1 | Create          | Create the low-level data structures of the lookup table. Can internally allocate      |
223fc1f2750SBernard Iremonger   |   |                 | memory.                                                                                |
224fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
225fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
226fc1f2750SBernard Iremonger   | 2 | Free            | Free up all the resources used by the lookup table.                                    |
227fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
228fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
229fc1f2750SBernard Iremonger   | 3 | Add entry       | Add new entry to the lookup table.                                                     |
230fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
231fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
232fc1f2750SBernard Iremonger   | 4 | Delete entry    | Delete specific entry from the lookup table.                                           |
233fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
234fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
235fc1f2750SBernard Iremonger   | 5 | Lookup          | Look up a burst of input packets and return a bit mask specifying the result of the    |
236fc1f2750SBernard Iremonger   |   |                 | lookup operation for each packet: a set bit signifies lookup hit for the corresponding |
237fc1f2750SBernard Iremonger   |   |                 | packet, while a cleared bit a lookup miss.                                             |
238fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
239fc1f2750SBernard Iremonger   |   |                 | For each lookup hit packet, the lookup operation also returns a pointer to the table   |
240fc1f2750SBernard Iremonger   |   |                 | entry that was hit, which contains the actions to be applied on the packet and any     |
241fc1f2750SBernard Iremonger   |   |                 | associated metadata.                                                                   |
242fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
243fc1f2750SBernard Iremonger   |   |                 | For each lookup miss packet, the actions to be applied on the packet and any           |
244fc1f2750SBernard Iremonger   |   |                 | associated metadata are specified by the default table entry preconfigured for lookup  |
245fc1f2750SBernard Iremonger   |   |                 | miss.                                                                                  |
246fc1f2750SBernard Iremonger   |   |                 |                                                                                        |
247fc1f2750SBernard Iremonger   +---+-----------------+----------------------------------------------------------------------------------------+
248fc1f2750SBernard Iremonger
249fc1f2750SBernard Iremonger
250fc1f2750SBernard IremongerHash Table Design
251fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~
252fc1f2750SBernard Iremonger
253fc1f2750SBernard IremongerHash Table Overview
254fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^
255fc1f2750SBernard Iremonger
256fc1f2750SBernard IremongerHash tables are important because the key lookup operation is optimized for speed:
257fc1f2750SBernard Iremongerinstead of having to linearly search the lookup key through all the keys in the table,
258fc1f2750SBernard Iremongerthe search is limited to only the keys stored in a single table bucket.
259fc1f2750SBernard Iremonger
260fc1f2750SBernard Iremonger**Associative Arrays**
261fc1f2750SBernard Iremonger
262fc1f2750SBernard IremongerAn associative array is a function that can be specified as a set of (key, value) pairs,
263fc1f2750SBernard Iremongerwith each key from the possible set of input keys present at most once.
264fc1f2750SBernard IremongerFor a given associative array, the possible operations are:
265fc1f2750SBernard Iremonger
266fc1f2750SBernard Iremonger#.  *add (key, value)*: When no value is currently associated with *key*, then the (key, *value* ) association is created.
267fc1f2750SBernard Iremonger    When *key* is already associated value *value0*, then the association (*key*, *value0*) is removed
268fc1f2750SBernard Iremonger    and association *(key, value)* is created;
269fc1f2750SBernard Iremonger
270fc1f2750SBernard Iremonger#.  *delete key*: When no value is currently associated with *key*, this operation has no effect.
271fc1f2750SBernard Iremonger    When *key* is already associated  *value*, then association  *(key, value)* is removed;
272fc1f2750SBernard Iremonger
273fc1f2750SBernard Iremonger#.  *lookup key*: When no value is currently associated with  *key*, then this operation returns void value (lookup miss).
274fc1f2750SBernard Iremonger    When *key* is associated with *value*, then this operation returns *value*.
275fc1f2750SBernard Iremonger    The *(key, value)* association is not changed.
276fc1f2750SBernard Iremonger
277fc1f2750SBernard IremongerThe matching criterion used to compare the input key against the keys in the associative array is *exact match*,
278fc1f2750SBernard Iremongeras the key size (number of bytes) and the key value (array of bytes) have to match exactly for the two keys under comparison.
279fc1f2750SBernard Iremonger
280fc1f2750SBernard Iremonger**Hash Function**
281fc1f2750SBernard Iremonger
282fc1f2750SBernard IremongerA hash function deterministically maps data of variable length (key) to data of fixed size (hash value or key signature).
283fc1f2750SBernard IremongerTypically, the size of the key is bigger than the size of the key signature.
284fc1f2750SBernard IremongerThe hash function basically compresses a long key into a short signature.
285fc1f2750SBernard IremongerSeveral keys can share the same signature (collisions).
286fc1f2750SBernard Iremonger
287fc1f2750SBernard IremongerHigh quality hash functions have uniform distribution.
288fc1f2750SBernard IremongerFor large number of keys, when dividing the space of signature values into a fixed number of equal intervals (buckets),
289fc1f2750SBernard Iremongerit is desirable to have the key signatures evenly distributed across these intervals (uniform distribution),
290fc1f2750SBernard Iremongeras opposed to most of the signatures going into only a few of the intervals
291fc1f2750SBernard Iremongerand the rest of the intervals being largely unused (non-uniform distribution).
292fc1f2750SBernard Iremonger
293fc1f2750SBernard Iremonger**Hash Table**
294fc1f2750SBernard Iremonger
295fc1f2750SBernard IremongerA hash table is an associative array that uses a hash function for its operation.
296fc1f2750SBernard IremongerThe reason for using a hash function is to optimize the performance of the lookup operation
297fc1f2750SBernard Iremongerby minimizing the number of table keys that have to be compared against the input key.
298fc1f2750SBernard Iremonger
299fc1f2750SBernard IremongerInstead of storing the (key, value) pairs in a single list, the hash table maintains multiple lists (buckets).
300fc1f2750SBernard IremongerFor any given key, there is a single bucket where that key might exist, and this bucket is uniquely identified based on the key signature.
301fc1f2750SBernard IremongerOnce the key signature is computed and the hash table bucket identified,
302fc1f2750SBernard Iremongerthe key is either located in this bucket or it is not present in the hash table at all,
303fc1f2750SBernard Iremongerso the key search can be narrowed down from the full set of keys currently in the table
304fc1f2750SBernard Iremongerto just the set of keys currently in the identified table bucket.
305fc1f2750SBernard Iremonger
306fc1f2750SBernard IremongerThe performance of the hash table lookup operation is greatly improved,
307fea1d908SJohn McNamaraprovided that the table keys are evenly distributed among the hash table buckets,
308fc1f2750SBernard Iremongerwhich can be achieved by using a hash function with uniform distribution.
309fc1f2750SBernard IremongerThe rule to map a key to its bucket can simply be to use the key signature (modulo the number of table buckets) as the table bucket ID:
310fc1f2750SBernard Iremonger
311fc1f2750SBernard Iremonger    *bucket_id = f_hash(key) % n_buckets;*
312fc1f2750SBernard Iremonger
313fc1f2750SBernard IremongerBy selecting the number of buckets to be a power of two, the modulo operator can be replaced by a bitwise AND logical operation:
314fc1f2750SBernard Iremonger
315fc1f2750SBernard Iremonger    *bucket_id = f_hash(key) & (n_buckets - 1);*
316fc1f2750SBernard Iremonger
317fc1f2750SBernard Iremongerconsidering *n_bits* as the number of bits set in *bucket_mask = n_buckets - 1*,
318fc1f2750SBernard Iremongerthis means that all the keys that end up in the same hash table bucket have the lower *n_bits* of their signature identical.
319fc1f2750SBernard IremongerIn order to reduce the number of keys in the same bucket (collisions), the number of hash table buckets needs to be increased.
320fc1f2750SBernard Iremonger
3214a22e6eeSJohn McNamaraIn packet processing context, the sequence of operations involved in hash table operations is described in :numref:`figure_figure33`:
322fc1f2750SBernard Iremonger
3234a22e6eeSJohn McNamara.. _figure_figure33:
324fc1f2750SBernard Iremonger
3254a22e6eeSJohn McNamara.. figure:: img/figure33.*
326fc1f2750SBernard Iremonger
3274a22e6eeSJohn McNamara   Sequence of Steps for Hash Table Operations in a Packet Processing Context
3284a22e6eeSJohn McNamara
329fc1f2750SBernard Iremonger
330fc1f2750SBernard Iremonger
331fc1f2750SBernard IremongerHash Table Use Cases
332fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^^
333fc1f2750SBernard Iremonger
334fc1f2750SBernard Iremonger**Flow Classification**
335fc1f2750SBernard Iremonger
336fc1f2750SBernard Iremonger*Description:* The flow classification is executed at least once for each input packet.
337fc1f2750SBernard IremongerThis operation maps each incoming packet against one of the known traffic flows in the flow database that typically contains millions of flows.
338fc1f2750SBernard Iremonger
339fc1f2750SBernard Iremonger*Hash table name:* Flow classification table
340fc1f2750SBernard Iremonger
341fc1f2750SBernard Iremonger*Number of keys:* Millions
342fc1f2750SBernard Iremonger
343fc1f2750SBernard Iremonger*Key format:* n-tuple of packet fields that uniquely identify a traffic flow/connection.
344fc1f2750SBernard IremongerExample: DiffServ 5-tuple of (Source IP address, Destination IP address, L4 protocol, L4 protocol source port, L4 protocol destination port).
345fc1f2750SBernard IremongerFor IPv4 protocol and L4 protocols like TCP, UDP or SCTP, the size of the DiffServ 5-tuple is 13 bytes, while for IPv6 it is 37 bytes.
346fc1f2750SBernard Iremonger
347fc1f2750SBernard Iremonger*Key value (key data):* actions and action meta-data describing what processing to be applied for the packets of the current flow.
348fc1f2750SBernard IremongerThe size of the data associated with each traffic flow can vary from 8 bytes to kilobytes.
349fc1f2750SBernard Iremonger
350fc1f2750SBernard Iremonger**Address Resolution Protocol (ARP)**
351fc1f2750SBernard Iremonger
352fc1f2750SBernard Iremonger*Description:* Once a route has been identified for an IP packet (so the output interface and the IP address of the next hop station are known),
353fc1f2750SBernard Iremongerthe MAC address of the next hop station is needed in order to send this packet onto the next leg of the journey
354fc1f2750SBernard Iremongertowards its destination (as identified by its destination IP address).
355fc1f2750SBernard IremongerThe MAC address of the next hop station becomes the destination MAC address of the outgoing Ethernet frame.
356fc1f2750SBernard Iremonger
357fc1f2750SBernard Iremonger*Hash table name:* ARP table
358fc1f2750SBernard Iremonger
359fc1f2750SBernard Iremonger*Number of keys:* Thousands
360fc1f2750SBernard Iremonger
361fc1f2750SBernard Iremonger*Key format:* The pair of (Output interface, Next Hop IP address), which is typically 5 bytes for IPv4 and 17 bytes for IPv6.
362fc1f2750SBernard Iremonger
363fc1f2750SBernard Iremonger*Key value (key data):* MAC address of the next hop station (6 bytes).
364fc1f2750SBernard Iremonger
365fc1f2750SBernard IremongerHash Table Types
366fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^
367fc1f2750SBernard Iremonger
3688c9a3374SJohn McNamara:numref:`table_qos_22` lists the hash table configuration parameters shared by all different hash table types.
369fc1f2750SBernard Iremonger
3708c9a3374SJohn McNamara.. _table_qos_22:
371fc1f2750SBernard Iremonger
3728c9a3374SJohn McNamara.. table:: Configuration Parameters Common for All Hash Table Types
373fc1f2750SBernard Iremonger
374fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
375fc1f2750SBernard Iremonger   | # | Parameter                 | Details                                                                      |
376fc1f2750SBernard Iremonger   |   |                           |                                                                              |
377fc1f2750SBernard Iremonger   +===+===========================+==============================================================================+
378fc1f2750SBernard Iremonger   | 1 | Key size                  | Measured as number of bytes. All keys have the same size.                    |
379fc1f2750SBernard Iremonger   |   |                           |                                                                              |
380fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
381fc1f2750SBernard Iremonger   | 2 | Key value (key data) size | Measured as number of bytes.                                                 |
382fc1f2750SBernard Iremonger   |   |                           |                                                                              |
383fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
384fc1f2750SBernard Iremonger   | 3 | Number of buckets         | Needs to be a power of two.                                                  |
385fc1f2750SBernard Iremonger   |   |                           |                                                                              |
386fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
387fc1f2750SBernard Iremonger   | 4 | Maximum number of keys    | Needs to be a power of two.                                                  |
388fc1f2750SBernard Iremonger   |   |                           |                                                                              |
389fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
390fc1f2750SBernard Iremonger   | 5 | Hash function             | Examples: jhash, CRC hash, etc.                                              |
391fc1f2750SBernard Iremonger   |   |                           |                                                                              |
392fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
393fc1f2750SBernard Iremonger   | 6 | Hash function seed        | Parameter to be passed to the hash function.                                 |
394fc1f2750SBernard Iremonger   |   |                           |                                                                              |
395fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
396fc1f2750SBernard Iremonger   | 7 | Key offset                | Offset of the lookup key byte array within the packet meta-data stored in    |
397fc1f2750SBernard Iremonger   |   |                           | the packet buffer.                                                           |
398fc1f2750SBernard Iremonger   |   |                           |                                                                              |
399fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
400fc1f2750SBernard Iremonger
401fc1f2750SBernard IremongerBucket Full Problem
402fc1f2750SBernard Iremonger"""""""""""""""""""
403fc1f2750SBernard Iremonger
404fc1f2750SBernard IremongerOn initialization, each hash table bucket is allocated space for exactly 4 keys.
405fc1f2750SBernard IremongerAs keys are added to the table, it can happen that a given bucket already has 4 keys when a new key has to be added to this bucket.
406fc1f2750SBernard IremongerThe possible options are:
407fc1f2750SBernard Iremonger
408fc1f2750SBernard Iremonger#.  **Least Recently Used (LRU) Hash Table.**
409fc1f2750SBernard Iremonger    One of the existing keys in the bucket is deleted and the new key is added in its place.
410fc1f2750SBernard Iremonger    The number of keys in each bucket never grows bigger than 4. The logic to pick the key to be dropped from the bucket is LRU.
411fc1f2750SBernard Iremonger    The hash table lookup operation maintains the order in which the keys in the same bucket are hit, so every time a key is hit,
412fc1f2750SBernard Iremonger    it becomes the new Most Recently Used (MRU) key, i.e. the last candidate for drop.
413fc1f2750SBernard Iremonger    When a key is added to the bucket, it also becomes the new MRU key.
414fc1f2750SBernard Iremonger    When a key needs to be picked and dropped, the first candidate for drop, i.e. the current LRU key, is always picked.
415fc1f2750SBernard Iremonger    The LRU logic requires maintaining specific data structures per each bucket.
416fc1f2750SBernard Iremonger
417fea1d908SJohn McNamara#.  **Extendable Bucket Hash Table.**
418fc1f2750SBernard Iremonger    The bucket is extended with space for 4 more keys.
419fc1f2750SBernard Iremonger    This is done by allocating additional memory at table initialization time,
420fc1f2750SBernard Iremonger    which is used to create a pool of free keys (the size of this pool is configurable and always a multiple of 4).
421fc1f2750SBernard Iremonger    On key add operation, the allocation of a group of 4 keys only happens successfully within the limit of free keys,
422fc1f2750SBernard Iremonger    otherwise the key add operation fails.
423fc1f2750SBernard Iremonger    On key delete operation, a group of 4 keys is freed back to the pool of free keys
424fc1f2750SBernard Iremonger    when the key to be deleted is the only key that was used within its group of 4 keys at that time.
425fc1f2750SBernard Iremonger    On key lookup operation, if the current bucket is in extended state and a match is not found in the first group of 4 keys,
426fc1f2750SBernard Iremonger    the search continues beyond the first group of 4 keys, potentially until all keys in this bucket are examined.
427fea1d908SJohn McNamara    The extendable bucket logic requires maintaining specific data structures per table and per each bucket.
428fc1f2750SBernard Iremonger
4298c9a3374SJohn McNamara.. _table_qos_23:
430fc1f2750SBernard Iremonger
4312fe68f32SJohn McNamara.. table:: Configuration Parameters Specific to Extendable Bucket Hash Table
432fc1f2750SBernard Iremonger
433fc1f2750SBernard Iremonger   +---+---------------------------+--------------------------------------------------+
434fc1f2750SBernard Iremonger   | # | Parameter                 | Details                                          |
435fc1f2750SBernard Iremonger   |   |                           |                                                  |
436fc1f2750SBernard Iremonger   +===+===========================+==================================================+
437fc1f2750SBernard Iremonger   | 1 | Number of additional keys | Needs to be a power of two, at least equal to 4. |
438fc1f2750SBernard Iremonger   |   |                           |                                                  |
439fc1f2750SBernard Iremonger   +---+---------------------------+--------------------------------------------------+
440fc1f2750SBernard Iremonger
441fc1f2750SBernard Iremonger
442fc1f2750SBernard IremongerSignature Computation
443fc1f2750SBernard Iremonger"""""""""""""""""""""
444fc1f2750SBernard Iremonger
445fc1f2750SBernard IremongerThe possible options for key signature computation are:
446fc1f2750SBernard Iremonger
447fc1f2750SBernard Iremonger#.  **Pre-computed key signature.**
448fc1f2750SBernard Iremonger    The key lookup operation is split between two CPU cores.
449fc1f2750SBernard Iremonger    The first CPU core (typically the CPU core that performs packet RX) extracts the key from the input packet,
450fc1f2750SBernard Iremonger    computes the key signature and saves both the key and the key signature in the packet buffer as packet meta-data.
451fc1f2750SBernard Iremonger    The second CPU core reads both the key and the key signature from the packet meta-data
452fc1f2750SBernard Iremonger    and performs the bucket search step of the key lookup operation.
453fc1f2750SBernard Iremonger
454fc1f2750SBernard Iremonger#.  **Key signature computed on lookup ("do-sig" version).**
455fc1f2750SBernard Iremonger    The same CPU core reads the key from the packet meta-data, uses it to compute the key signature
456fc1f2750SBernard Iremonger    and also performs the bucket search step of the key lookup operation.
457fc1f2750SBernard Iremonger
4588c9a3374SJohn McNamara.. _table_qos_24:
459fc1f2750SBernard Iremonger
4608c9a3374SJohn McNamara.. table:: Configuration Parameters Specific to Pre-computed Key Signature Hash Table
461fc1f2750SBernard Iremonger
462fc1f2750SBernard Iremonger   +---+------------------+-----------------------------------------------------------------------+
463fc1f2750SBernard Iremonger   | # | Parameter        | Details                                                               |
464fc1f2750SBernard Iremonger   |   |                  |                                                                       |
465fc1f2750SBernard Iremonger   +===+==================+=======================================================================+
466fc1f2750SBernard Iremonger   | 1 | Signature offset | Offset of the pre-computed key signature within the packet meta-data. |
467fc1f2750SBernard Iremonger   |   |                  |                                                                       |
468fc1f2750SBernard Iremonger   +---+------------------+-----------------------------------------------------------------------+
469fc1f2750SBernard Iremonger
470fc1f2750SBernard IremongerKey Size Optimized Hash Tables
471fc1f2750SBernard Iremonger""""""""""""""""""""""""""""""
472fc1f2750SBernard Iremonger
473fc1f2750SBernard IremongerFor specific key sizes, the data structures and algorithm of key lookup operation can be specially handcrafted for further performance improvements,
474fc1f2750SBernard Iremongerso following options are possible:
475fc1f2750SBernard Iremonger
476fc1f2750SBernard Iremonger#.  **Implementation supporting configurable key size.**
477fc1f2750SBernard Iremonger
478fc1f2750SBernard Iremonger#.  **Implementation supporting a single key size.**
479fc1f2750SBernard Iremonger    Typical key sizes are 8 bytes and 16 bytes.
480fc1f2750SBernard Iremonger
481fc1f2750SBernard IremongerBucket Search Logic for Configurable Key Size Hash Tables
482fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
483fc1f2750SBernard Iremonger
484fc1f2750SBernard IremongerThe performance of the bucket search logic is one of the main factors influencing the performance of the key lookup operation.
485fc1f2750SBernard IremongerThe data structures and algorithm are designed to make the best use of Intel CPU architecture resources like:
486fc1f2750SBernard Iremongercache memory space, cache memory bandwidth, external memory bandwidth, multiple execution units working in parallel,
487fc1f2750SBernard Iremongerout of order instruction execution, special CPU instructions, etc.
488fc1f2750SBernard Iremonger
489fc1f2750SBernard IremongerThe bucket search logic handles multiple input packets in parallel.
490fc1f2750SBernard IremongerIt is built as a pipeline of several stages (3 or 4), with each pipeline stage handling two different packets from the burst of input packets.
491fc1f2750SBernard IremongerOn each pipeline iteration, the packets are pushed to the next pipeline stage: for the 4-stage pipeline,
492fc1f2750SBernard Iremongertwo packets (that just completed stage 3) exit the pipeline,
493fc1f2750SBernard Iremongertwo packets (that just completed stage 2) are now executing stage 3, two packets (that just completed stage 1) are now executing stage 2,
494fc1f2750SBernard Iremongertwo packets (that just completed stage 0) are now executing stage 1 and two packets (next two packets to read from the burst of input packets)
495fc1f2750SBernard Iremongerare entering the pipeline to execute stage 0.
496fc1f2750SBernard IremongerThe pipeline iterations continue until all packets from the burst of input packets execute the last stage of the pipeline.
497fc1f2750SBernard Iremonger
498fc1f2750SBernard IremongerThe bucket search logic is broken into pipeline stages at the boundary of the next memory access.
499fc1f2750SBernard IremongerEach pipeline stage uses data structures that are stored (with high probability) into the L1 or L2 cache memory of the current CPU core and
500fc1f2750SBernard Iremongerbreaks just before the next memory access required by the algorithm.
501fc1f2750SBernard IremongerThe current pipeline stage finalizes by prefetching the data structures required by the next pipeline stage,
502fc1f2750SBernard Iremongerso given enough time for the prefetch to complete,
503fc1f2750SBernard Iremongerwhen the next pipeline stage eventually gets executed for the same packets,
504fc1f2750SBernard Iremongerit will read the data structures it needs from L1 or L2 cache memory and thus avoid the significant penalty incurred by L2 or L3 cache memory miss.
505fc1f2750SBernard Iremonger
506fc1f2750SBernard IremongerBy prefetching the data structures required by the next pipeline stage in advance (before they are used)
507fc1f2750SBernard Iremongerand switching to executing another pipeline stage for different packets,
508fc1f2750SBernard Iremongerthe number of L2 or L3 cache memory misses is greatly reduced, hence one of the main reasons for improved performance.
509fc1f2750SBernard IremongerThis is because the cost of L2/L3 cache memory miss on memory read accesses is high, as usually due to data dependency between instructions,
510fc1f2750SBernard Iremongerthe CPU execution units have to stall until the read operation is completed from L3 cache memory or external DRAM memory.
511fc1f2750SBernard IremongerBy using prefetch instructions, the latency of memory read accesses is hidden,
512e4e1f2f7SFlore Norceideprovided that it is performed early enough before the respective data structure is actually used.
513fc1f2750SBernard Iremonger
514fc1f2750SBernard IremongerBy splitting the processing into several stages that are executed on different packets (the packets from the input burst are interlaced),
515fc1f2750SBernard Iremongerenough work is created to allow the prefetch instructions to complete successfully (before the prefetched data structures are actually accessed) and
516fc1f2750SBernard Iremongeralso the data dependency between instructions is loosened.
517fc1f2750SBernard IremongerFor example, for the 4-stage pipeline, stage 0 is executed on packets 0 and 1 and then,
518fc1f2750SBernard Iremongerbefore same packets 0 and 1 are used (i.e. before stage 1 is executed on packets 0 and 1),
519fc1f2750SBernard Iremongerdifferent packets are used: packets 2 and 3 (executing stage 1), packets 4 and 5 (executing stage 2) and packets 6 and 7 (executing stage 3).
520fc1f2750SBernard IremongerBy executing useful work while the data structures are brought into the L1 or L2 cache memory, the latency of the read memory accesses is hidden.
521fc1f2750SBernard IremongerBy increasing the gap between two consecutive accesses to the same data structure, the data dependency between instructions is loosened;
522fc1f2750SBernard Iremongerthis allows making the best use of the super-scalar and out-of-order execution CPU architecture,
523fc1f2750SBernard Iremongeras the number of CPU core execution units that are active (rather than idle or stalled due to data dependency constraints between instructions) is maximized.
524fc1f2750SBernard Iremonger
525fc1f2750SBernard IremongerThe bucket search logic is also implemented without using any branch instructions.
526fc1f2750SBernard IremongerThis avoids the important cost associated with flushing the CPU core execution pipeline on every instance of branch misprediction.
527fc1f2750SBernard Iremonger
528fc1f2750SBernard IremongerConfigurable Key Size Hash Table
529fc1f2750SBernard Iremonger""""""""""""""""""""""""""""""""
530fc1f2750SBernard Iremonger
5318c9a3374SJohn McNamara:numref:`figure_figure34`, :numref:`table_qos_25` and :numref:`table_qos_26` detail the main data structures used to implement configurable key size hash tables (either LRU or extendable bucket,
532fc1f2750SBernard Iremongereither with pre-computed signature or "do-sig").
533fc1f2750SBernard Iremonger
5344a22e6eeSJohn McNamara.. _figure_figure34:
535fc1f2750SBernard Iremonger
5364a22e6eeSJohn McNamara.. figure:: img/figure34.*
537fc1f2750SBernard Iremonger
5384a22e6eeSJohn McNamara   Data Structures for Configurable Key Size Hash Tables
539fc1f2750SBernard Iremonger
540fc1f2750SBernard Iremonger
5418c9a3374SJohn McNamara.. _table_qos_25:
542fc1f2750SBernard Iremonger
5438c9a3374SJohn McNamara.. table:: Main Large Data Structures (Arrays) used for Configurable Key Size Hash Tables
544fc1f2750SBernard Iremonger
545fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+---------------------------+-------------------------------+
546fc1f2750SBernard Iremonger   | # | Array name              | Number of entries            | Entry size (bytes)        | Description                   |
547fc1f2750SBernard Iremonger   |   |                         |                              |                           |                               |
548fc1f2750SBernard Iremonger   +===+=========================+==============================+===========================+===============================+
549fc1f2750SBernard Iremonger   | 1 | Bucket array            | n_buckets (configurable)     | 32                        | Buckets of the hash table.    |
550fc1f2750SBernard Iremonger   |   |                         |                              |                           |                               |
551fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+---------------------------+-------------------------------+
552fc1f2750SBernard Iremonger   | 2 | Bucket extensions array | n_buckets_ext (configurable) | 32                        | This array is only created    |
5532fe68f32SJohn McNamara   |   |                         |                              |                           | for extendable bucket tables. |
554fc1f2750SBernard Iremonger   |   |                         |                              |                           |                               |
555fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+---------------------------+-------------------------------+
556fc1f2750SBernard Iremonger   | 3 | Key array               | n_keys                       | key_size (configurable)   | Keys added to the hash table. |
557fc1f2750SBernard Iremonger   |   |                         |                              |                           |                               |
558fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+---------------------------+-------------------------------+
559fc1f2750SBernard Iremonger   | 4 | Data array              | n_keys                       | entry_size (configurable) | Key values (key data)         |
560fc1f2750SBernard Iremonger   |   |                         |                              |                           | associated with the hash      |
561fc1f2750SBernard Iremonger   |   |                         |                              |                           | table keys.                   |
562fc1f2750SBernard Iremonger   |   |                         |                              |                           |                               |
563fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+---------------------------+-------------------------------+
564fc1f2750SBernard Iremonger
5658c9a3374SJohn McNamara.. _table_qos_26:
566fc1f2750SBernard Iremonger
5678c9a3374SJohn McNamara.. table:: Field Description for Bucket Array Entry (Configurable Key Size Hash Tables)
568fc1f2750SBernard Iremonger
569fc1f2750SBernard Iremonger   +---+------------------+--------------------+------------------------------------------------------------------+
570fc1f2750SBernard Iremonger   | # | Field name       | Field size (bytes) | Description                                                      |
571fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
572fc1f2750SBernard Iremonger   +===+==================+====================+==================================================================+
573fc1f2750SBernard Iremonger   | 1 | Next Ptr/LRU     | 8                  | For LRU tables, this fields represents the LRU list for the      |
574fc1f2750SBernard Iremonger   |   |                  |                    | current bucket stored as array of 4 entries of 2 bytes each.     |
575fc1f2750SBernard Iremonger   |   |                  |                    | Entry 0 stores the index (0 .. 3) of the MRU key, while entry 3  |
576fc1f2750SBernard Iremonger   |   |                  |                    | stores the index of the LRU key.                                 |
577fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
5782fe68f32SJohn McNamara   |   |                  |                    | For extendable bucket tables, this field represents the next     |
579fc1f2750SBernard Iremonger   |   |                  |                    | pointer (i.e. the pointer to the next group of 4 keys linked to  |
580fc1f2750SBernard Iremonger   |   |                  |                    | the current bucket). The next pointer is not NULL if the bucket  |
581fc1f2750SBernard Iremonger   |   |                  |                    | is currently extended or NULL otherwise.                         |
582fc1f2750SBernard Iremonger   |   |                  |                    | To help the branchless implementation, bit 0 (least significant  |
583fc1f2750SBernard Iremonger   |   |                  |                    | bit) of this field is set to 1 if the next pointer is not NULL   |
584fc1f2750SBernard Iremonger   |   |                  |                    | and to 0 otherwise.                                              |
585fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
586fc1f2750SBernard Iremonger   +---+------------------+--------------------+------------------------------------------------------------------+
587fc1f2750SBernard Iremonger   | 2 | Sig[0 .. 3]      | 4 x 2              | If key X (X = 0 .. 3) is valid, then sig X bits 15 .. 1 store    |
588fc1f2750SBernard Iremonger   |   |                  |                    | the most significant 15 bits of key X signature and sig X bit 0  |
589fc1f2750SBernard Iremonger   |   |                  |                    | is set to 1.                                                     |
590fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
591fc1f2750SBernard Iremonger   |   |                  |                    | If key X is not valid, then sig X is set to zero.                |
592fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
593fc1f2750SBernard Iremonger   +---+------------------+--------------------+------------------------------------------------------------------+
594fc1f2750SBernard Iremonger   | 3 | Key Pos [0 .. 3] | 4 x 4              | If key X is valid (X = 0 .. 3), then Key Pos X represents the    |
595fc1f2750SBernard Iremonger   |   |                  |                    | index into the key array where key X is stored, as well as the   |
596fc1f2750SBernard Iremonger   |   |                  |                    | index into the data array where the value associated with key X  |
597fc1f2750SBernard Iremonger   |   |                  |                    | is stored.                                                       |
598fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
599fc1f2750SBernard Iremonger   |   |                  |                    | If key X is not valid, then the value of Key Pos X is undefined. |
600fc1f2750SBernard Iremonger   |   |                  |                    |                                                                  |
601fc1f2750SBernard Iremonger   +---+------------------+--------------------+------------------------------------------------------------------+
602fc1f2750SBernard Iremonger
603fc1f2750SBernard Iremonger
6048c9a3374SJohn McNamara:numref:`figure_figure35` and :numref:`table_qos_27` detail the bucket search pipeline stages (either LRU or extendable bucket,
605fc1f2750SBernard Iremongereither with pre-computed signature or "do-sig").
606fc1f2750SBernard IremongerFor each pipeline stage, the described operations are applied to each of the two packets handled by that stage.
607fc1f2750SBernard Iremonger
6084a22e6eeSJohn McNamara.. _figure_figure35:
609fc1f2750SBernard Iremonger
6104a22e6eeSJohn McNamara.. figure:: img/figure35.*
611fc1f2750SBernard Iremonger
6124a22e6eeSJohn McNamara   Bucket Search Pipeline for Key Lookup Operation (Configurable Key Size Hash
6134a22e6eeSJohn McNamara   Tables)
6144a22e6eeSJohn McNamara
615fc1f2750SBernard Iremonger
6168c9a3374SJohn McNamara.. _table_qos_27:
617fc1f2750SBernard Iremonger
6188c9a3374SJohn McNamara.. table:: Description of the Bucket Search Pipeline Stages (Configurable Key Size Hash Tables)
619fc1f2750SBernard Iremonger
620fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
621fc1f2750SBernard Iremonger   | # | Stage name                | Description                                                                  |
622fc1f2750SBernard Iremonger   |   |                           |                                                                              |
623fc1f2750SBernard Iremonger   +===+===========================+==============================================================================+
624fc1f2750SBernard Iremonger   | 0 | Prefetch packet meta-data | Select next two packets from the burst of input packets.                     |
625fc1f2750SBernard Iremonger   |   |                           |                                                                              |
626fc1f2750SBernard Iremonger   |   |                           | Prefetch packet meta-data containing the key and key signature.              |
627fc1f2750SBernard Iremonger   |   |                           |                                                                              |
628fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
629fc1f2750SBernard Iremonger   | 1 | Prefetch table bucket     | Read the key signature from the packet meta-data (for extendable bucket hash |
630fc1f2750SBernard Iremonger   |   |                           | tables) or read the key from the packet meta-data and compute key signature  |
631fc1f2750SBernard Iremonger   |   |                           | (for LRU tables).                                                            |
632fc1f2750SBernard Iremonger   |   |                           |                                                                              |
633fc1f2750SBernard Iremonger   |   |                           | Identify the bucket ID using the key signature.                              |
634fc1f2750SBernard Iremonger   |   |                           |                                                                              |
635fc1f2750SBernard Iremonger   |   |                           | Set bit 0 of the signature to 1 (to match only signatures of valid keys from |
636fc1f2750SBernard Iremonger   |   |                           | the table).                                                                  |
637fc1f2750SBernard Iremonger   |   |                           |                                                                              |
638fc1f2750SBernard Iremonger   |   |                           | Prefetch the bucket.                                                         |
639fc1f2750SBernard Iremonger   |   |                           |                                                                              |
640fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
641fc1f2750SBernard Iremonger   | 2 | Prefetch table key        | Read the key signatures from the bucket.                                     |
642fc1f2750SBernard Iremonger   |   |                           |                                                                              |
643fc1f2750SBernard Iremonger   |   |                           | Compare the signature of the input key against the 4 key signatures from the |
644fc1f2750SBernard Iremonger   |   |                           | packet. As result, the following is obtained:                                |
645fc1f2750SBernard Iremonger   |   |                           |                                                                              |
646fc1f2750SBernard Iremonger   |   |                           | *match*                                                                      |
647fc1f2750SBernard Iremonger   |   |                           | = equal to TRUE if there was at least one signature match and to FALSE in    |
648fc1f2750SBernard Iremonger   |   |                           | the case of no signature match;                                              |
649fc1f2750SBernard Iremonger   |   |                           |                                                                              |
650fc1f2750SBernard Iremonger   |   |                           | *match_many*                                                                 |
651fc1f2750SBernard Iremonger   |   |                           | = equal to TRUE is there were more than one signature matches (can be up to  |
652fc1f2750SBernard Iremonger   |   |                           | 4 signature matches in the worst case scenario) and to FALSE otherwise;      |
653fc1f2750SBernard Iremonger   |   |                           |                                                                              |
654fc1f2750SBernard Iremonger   |   |                           | *match_pos*                                                                  |
655fc1f2750SBernard Iremonger   |   |                           | = the index of the first key that produced signature match (only valid if    |
656fc1f2750SBernard Iremonger   |   |                           | match is true).                                                              |
657fc1f2750SBernard Iremonger   |   |                           |                                                                              |
658fc1f2750SBernard Iremonger   |   |                           | For extendable bucket hash tables only, set                                  |
659fc1f2750SBernard Iremonger   |   |                           | *match_many*                                                                 |
660fc1f2750SBernard Iremonger   |   |                           | to TRUE if next pointer is valid.                                            |
661fc1f2750SBernard Iremonger   |   |                           |                                                                              |
662fc1f2750SBernard Iremonger   |   |                           | Prefetch the bucket key indicated by                                         |
663fc1f2750SBernard Iremonger   |   |                           | *match_pos*                                                                  |
664fc1f2750SBernard Iremonger   |   |                           | (even if                                                                     |
665fc1f2750SBernard Iremonger   |   |                           | *match_pos*                                                                  |
666fc1f2750SBernard Iremonger   |   |                           | does not point to valid key valid).                                          |
667fc1f2750SBernard Iremonger   |   |                           |                                                                              |
668fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
669fc1f2750SBernard Iremonger   | 3 | Prefetch table data       | Read the bucket key indicated by                                             |
670fc1f2750SBernard Iremonger   |   |                           | *match_pos*.                                                                 |
671fc1f2750SBernard Iremonger   |   |                           |                                                                              |
672fc1f2750SBernard Iremonger   |   |                           | Compare the bucket key against the input key. As result, the following is    |
673fc1f2750SBernard Iremonger   |   |                           | obtained:                                                                    |
674fc1f2750SBernard Iremonger   |   |                           | *match_key*                                                                  |
675fc1f2750SBernard Iremonger   |   |                           | = equal to TRUE if the two keys match and to FALSE otherwise.                |
676fc1f2750SBernard Iremonger   |   |                           |                                                                              |
677fc1f2750SBernard Iremonger   |   |                           | Report input key as lookup hit only when both                                |
678fc1f2750SBernard Iremonger   |   |                           | *match*                                                                      |
679fc1f2750SBernard Iremonger   |   |                           | and                                                                          |
680fc1f2750SBernard Iremonger   |   |                           | *match_key*                                                                  |
681fc1f2750SBernard Iremonger   |   |                           | are equal to TRUE and as lookup miss otherwise.                              |
682fc1f2750SBernard Iremonger   |   |                           |                                                                              |
683fc1f2750SBernard Iremonger   |   |                           | For LRU tables only, use branchless logic to update the bucket LRU list      |
684fc1f2750SBernard Iremonger   |   |                           | (the current key becomes the new MRU) only on lookup hit.                    |
685fc1f2750SBernard Iremonger   |   |                           |                                                                              |
686fc1f2750SBernard Iremonger   |   |                           | Prefetch the key value (key data) associated with the current key (to avoid  |
687fc1f2750SBernard Iremonger   |   |                           | branches, this is done on both lookup hit and miss).                         |
688fc1f2750SBernard Iremonger   |   |                           |                                                                              |
689fc1f2750SBernard Iremonger   +---+---------------------------+------------------------------------------------------------------------------+
690fc1f2750SBernard Iremonger
691fc1f2750SBernard Iremonger
692fc1f2750SBernard IremongerAdditional notes:
693fc1f2750SBernard Iremonger
694fc1f2750SBernard Iremonger#.  The pipelined version of the bucket search algorithm is executed only if there are at least 7 packets in the burst of input packets.
695fc1f2750SBernard Iremonger    If there are less than 7 packets in the burst of input packets,
696fc1f2750SBernard Iremonger    a non-optimized implementation of the bucket search algorithm is executed.
697fc1f2750SBernard Iremonger
698fc1f2750SBernard Iremonger#.  Once the pipelined version of the bucket search algorithm has been executed for all the packets in the burst of input packets,
699fc1f2750SBernard Iremonger    the non-optimized implementation of the bucket search algorithm is also executed for any packets that did not produce a lookup hit,
700fc1f2750SBernard Iremonger    but have the *match_many* flag set.
701fc1f2750SBernard Iremonger    As result of executing the non-optimized version, some of these packets may produce a lookup hit or lookup miss.
702fc1f2750SBernard Iremonger    This does not impact the performance of the key lookup operation,
703fc1f2750SBernard Iremonger    as the probability of matching more than one signature in the same group of 4 keys or of having the bucket in extended state
704fc1f2750SBernard Iremonger    (for extendable bucket hash tables only) is relatively small.
705fc1f2750SBernard Iremonger
706fc1f2750SBernard Iremonger**Key Signature Comparison Logic**
707fc1f2750SBernard Iremonger
7088c9a3374SJohn McNamaraThe key signature comparison logic is described in :numref:`table_qos_28`.
709fc1f2750SBernard Iremonger
7108c9a3374SJohn McNamara.. _table_qos_28:
711fc1f2750SBernard Iremonger
7128c9a3374SJohn McNamara.. table:: Lookup Tables for Match, Match_Many and Match_Pos
713fc1f2750SBernard Iremonger
714fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
715fc1f2750SBernard Iremonger   | #  | mask | match (1 bit) | match_many (1 bit) | match_pos (2 bits) |
716fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
717fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
718fc1f2750SBernard Iremonger   | 0  | 0000 | 0             | 0                  | 00                 |
719fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
720fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
721fc1f2750SBernard Iremonger   | 1  | 0001 | 1             | 0                  | 00                 |
722fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
723fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
724fc1f2750SBernard Iremonger   | 2  | 0010 | 1             | 0                  | 01                 |
725fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
726fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
727fc1f2750SBernard Iremonger   | 3  | 0011 | 1             | 1                  | 00                 |
728fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
729fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
730fc1f2750SBernard Iremonger   | 4  | 0100 | 1             | 0                  | 10                 |
731fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
732fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
733fc1f2750SBernard Iremonger   | 5  | 0101 | 1             | 1                  | 00                 |
734fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
735fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
736fc1f2750SBernard Iremonger   | 6  | 0110 | 1             | 1                  | 01                 |
737fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
738fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
739fc1f2750SBernard Iremonger   | 7  | 0111 | 1             | 1                  | 00                 |
740fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
741fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
742fc1f2750SBernard Iremonger   | 8  | 1000 | 1             | 0                  | 11                 |
743fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
744fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
745fc1f2750SBernard Iremonger   | 9  | 1001 | 1             | 1                  | 00                 |
746fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
747fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
748fc1f2750SBernard Iremonger   | 10 | 1010 | 1             | 1                  | 01                 |
749fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
750fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
751fc1f2750SBernard Iremonger   | 11 | 1011 | 1             | 1                  | 00                 |
752fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
753fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
754fc1f2750SBernard Iremonger   | 12 | 1100 | 1             | 1                  | 10                 |
755fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
756fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
757fc1f2750SBernard Iremonger   | 13 | 1101 | 1             | 1                  | 00                 |
758fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
759fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
760fc1f2750SBernard Iremonger   | 14 | 1110 | 1             | 1                  | 01                 |
761fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
762fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
763fc1f2750SBernard Iremonger   | 15 | 1111 | 1             | 1                  | 00                 |
764fc1f2750SBernard Iremonger   |    |      |               |                    |                    |
765fc1f2750SBernard Iremonger   +----+------+---------------+--------------------+--------------------+
766fc1f2750SBernard Iremonger
767fc1f2750SBernard IremongerThe input *mask* hash bit X (X = 0 .. 3) set to 1 if input signature is equal to bucket signature X and set to 0 otherwise.
768fc1f2750SBernard IremongerThe outputs *match*, *match_many* and *match_pos* are 1 bit, 1 bit and 2 bits in size respectively and their meaning has been explained above.
769fc1f2750SBernard Iremonger
7708c9a3374SJohn McNamaraAs displayed in :numref:`table_qos_29`, the lookup tables for *match* and *match_many* can be collapsed into a single 32-bit value and the lookup table for
771fc1f2750SBernard Iremonger*match_pos* can be collapsed into a 64-bit value.
772fc1f2750SBernard IremongerGiven the input *mask*, the values for *match*, *match_many* and *match_pos* can be obtained by indexing their respective bit array to extract 1 bit,
773fc1f2750SBernard Iremonger1 bit and 2 bits respectively with branchless logic.
774fc1f2750SBernard Iremonger
7758c9a3374SJohn McNamara.. _table_qos_29:
776fc1f2750SBernard Iremonger
7778c9a3374SJohn McNamara.. table:: Collapsed Lookup Tables for Match, Match_Many and Match_Pos
778fc1f2750SBernard Iremonger
779fc1f2750SBernard Iremonger   +------------+------------------------------------------+-------------------+
780fc1f2750SBernard Iremonger   |            | Bit array                                | Hexadecimal value |
781fc1f2750SBernard Iremonger   |            |                                          |                   |
782fc1f2750SBernard Iremonger   +------------+------------------------------------------+-------------------+
783fc1f2750SBernard Iremonger   | match      | 1111_1111_1111_1110                      | 0xFFFELLU         |
784fc1f2750SBernard Iremonger   |            |                                          |                   |
785fc1f2750SBernard Iremonger   +------------+------------------------------------------+-------------------+
786fc1f2750SBernard Iremonger   | match_many | 1111_1110_1110_1000                      | 0xFEE8LLU         |
787fc1f2750SBernard Iremonger   |            |                                          |                   |
788fc1f2750SBernard Iremonger   +------------+------------------------------------------+-------------------+
789fc1f2750SBernard Iremonger   | match_pos  | 0001_0010_0001_0011__0001_0010_0001_0000 | 0x12131210LLU     |
790fc1f2750SBernard Iremonger   |            |                                          |                   |
791fc1f2750SBernard Iremonger   +------------+------------------------------------------+-------------------+
792fc1f2750SBernard Iremonger
793fc1f2750SBernard Iremonger
7944a22e6eeSJohn McNamaraThe pseudo-code for match, match_many and match_pos is::
795fc1f2750SBernard Iremonger
796fc1f2750SBernard Iremonger    match = (0xFFFELLU >> mask) & 1;
797fc1f2750SBernard Iremonger
798fc1f2750SBernard Iremonger    match_many = (0xFEE8LLU >> mask) & 1;
799fc1f2750SBernard Iremonger
800fc1f2750SBernard Iremonger    match_pos = (0x12131210LLU >> (mask << 1)) & 3;
801fc1f2750SBernard Iremonger
802fc1f2750SBernard IremongerSingle Key Size Hash Tables
803fc1f2750SBernard Iremonger"""""""""""""""""""""""""""
804fc1f2750SBernard Iremonger
8058c9a3374SJohn McNamara:numref:`figure_figure37`, :numref:`figure_figure38`, :numref:`table_qos_30` and :numref:`table_qos_31` detail the main data structures used to implement 8-byte and 16-byte key hash tables
806fc1f2750SBernard Iremonger(either LRU or extendable bucket, either with pre-computed signature or "do-sig").
807fc1f2750SBernard Iremonger
8084a22e6eeSJohn McNamara.. _figure_figure37:
809fc1f2750SBernard Iremonger
8104a22e6eeSJohn McNamara.. figure:: img/figure37.*
811fc1f2750SBernard Iremonger
8124a22e6eeSJohn McNamara   Data Structures for 8-byte Key Hash Tables
813fc1f2750SBernard Iremonger
814fc1f2750SBernard Iremonger
8154a22e6eeSJohn McNamara.. _figure_figure38:
816fc1f2750SBernard Iremonger
8174a22e6eeSJohn McNamara.. figure:: img/figure38.*
818fc1f2750SBernard Iremonger
8194a22e6eeSJohn McNamara   Data Structures for 16-byte Key Hash Tables
820fc1f2750SBernard Iremonger
821fc1f2750SBernard Iremonger
8228c9a3374SJohn McNamara.. _table_qos_30:
823fc1f2750SBernard Iremonger
8248c9a3374SJohn McNamara.. table:: Main Large Data Structures (Arrays) used for 8-byte and 16-byte Key Size Hash Tables
825fc1f2750SBernard Iremonger
826fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+----------------------+------------------------------------+
827fc1f2750SBernard Iremonger   | # | Array name              | Number of entries            | Entry size (bytes)   | Description                        |
828fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
829fc1f2750SBernard Iremonger   +===+=========================+==============================+======================+====================================+
830fc1f2750SBernard Iremonger   | 1 | Bucket array            | n_buckets (configurable)     | *8-byte key size:*   | Buckets of the hash table.         |
831fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
832fc1f2750SBernard Iremonger   |   |                         |                              | 64 + 4 x entry_size  |                                    |
833fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
834fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
835fc1f2750SBernard Iremonger   |   |                         |                              | *16-byte key size:*  |                                    |
836fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
837fc1f2750SBernard Iremonger   |   |                         |                              | 128 + 4 x entry_size |                                    |
838fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
839fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+----------------------+------------------------------------+
840fc1f2750SBernard Iremonger   | 2 | Bucket extensions array | n_buckets_ext (configurable) | *8-byte key size:*   | This array is only created for     |
8412fe68f32SJohn McNamara   |   |                         |                              |                      | extendable bucket tables.          |
842fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
843fc1f2750SBernard Iremonger   |   |                         |                              | 64 + 4 x entry_size  |                                    |
844fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
845fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
846fc1f2750SBernard Iremonger   |   |                         |                              | *16-byte key size:*  |                                    |
847fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
848fc1f2750SBernard Iremonger   |   |                         |                              | 128 + 4 x entry_size |                                    |
849fc1f2750SBernard Iremonger   |   |                         |                              |                      |                                    |
850fc1f2750SBernard Iremonger   +---+-------------------------+------------------------------+----------------------+------------------------------------+
851fc1f2750SBernard Iremonger
8528c9a3374SJohn McNamara.. _table_qos_31:
853fc1f2750SBernard Iremonger
8548c9a3374SJohn McNamara.. table:: Field Description for Bucket Array Entry (8-byte and 16-byte Key Hash Tables)
855fc1f2750SBernard Iremonger
856fc1f2750SBernard Iremonger   +---+---------------+--------------------+-------------------------------------------------------------------------------+
857fc1f2750SBernard Iremonger   | # | Field name    | Field size (bytes) | Description                                                                   |
858fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
859fc1f2750SBernard Iremonger   +===+===============+====================+===============================================================================+
860fc1f2750SBernard Iremonger   | 1 | Valid         | 8                  | Bit X (X = 0 .. 3) is set to 1 if key X is valid or to 0 otherwise.           |
861fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
8622fe68f32SJohn McNamara   |   |               |                    | Bit 4 is only used for extendable bucket tables to help with the              |
863fc1f2750SBernard Iremonger   |   |               |                    | implementation of the branchless logic. In this case, bit 4 is set to 1 if    |
864fc1f2750SBernard Iremonger   |   |               |                    | next pointer is valid (not NULL) or to 0 otherwise.                           |
865fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
866fc1f2750SBernard Iremonger   +---+---------------+--------------------+-------------------------------------------------------------------------------+
867fc1f2750SBernard Iremonger   | 2 | Next Ptr/LRU  | 8                  | For LRU tables, this fields represents the LRU list for the current bucket    |
868fc1f2750SBernard Iremonger   |   |               |                    | stored as array of 4 entries of 2 bytes each. Entry 0 stores the index        |
869fc1f2750SBernard Iremonger   |   |               |                    | (0 .. 3) of the MRU key, while entry 3 stores the index of the LRU key.       |
870fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
8712fe68f32SJohn McNamara   |   |               |                    | For extendable bucket tables, this field represents the next pointer (i.e.    |
872fc1f2750SBernard Iremonger   |   |               |                    | the pointer to the next group of 4 keys linked to the current bucket). The    |
873fc1f2750SBernard Iremonger   |   |               |                    | next pointer is not NULL if the bucket is currently extended or NULL          |
874fc1f2750SBernard Iremonger   |   |               |                    | otherwise.                                                                    |
875fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
876fc1f2750SBernard Iremonger   +---+---------------+--------------------+-------------------------------------------------------------------------------+
877fc1f2750SBernard Iremonger   | 3 | Key [0 .. 3]  | 4 x key_size       | Full keys.                                                                    |
878fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
879fc1f2750SBernard Iremonger   +---+---------------+--------------------+-------------------------------------------------------------------------------+
880fc1f2750SBernard Iremonger   | 4 | Data [0 .. 3] | 4 x entry_size     | Full key values (key data) associated with keys 0 .. 3.                       |
881fc1f2750SBernard Iremonger   |   |               |                    |                                                                               |
882fc1f2750SBernard Iremonger   +---+---------------+--------------------+-------------------------------------------------------------------------------+
883fc1f2750SBernard Iremonger
884fc1f2750SBernard Iremongerand detail the bucket search pipeline used to implement 8-byte and 16-byte key hash tables (either LRU or extendable bucket,
885fc1f2750SBernard Iremongereither with pre-computed signature or "do-sig").
886fc1f2750SBernard IremongerFor each pipeline stage, the described operations are applied to each of the two packets handled by that stage.
887fc1f2750SBernard Iremonger
8884a22e6eeSJohn McNamara.. _figure_figure39:
889fc1f2750SBernard Iremonger
8904a22e6eeSJohn McNamara.. figure:: img/figure39.*
891fc1f2750SBernard Iremonger
8924a22e6eeSJohn McNamara   Bucket Search Pipeline for Key Lookup Operation (Single Key Size Hash
8934a22e6eeSJohn McNamara   Tables)
8944a22e6eeSJohn McNamara
895fc1f2750SBernard Iremonger
8968c9a3374SJohn McNamara.. _table_qos_32:
897fc1f2750SBernard Iremonger
8988c9a3374SJohn McNamara.. table:: Description of the Bucket Search Pipeline Stages (8-byte and 16-byte Key Hash Tables)
899fc1f2750SBernard Iremonger
900fc1f2750SBernard Iremonger   +---+---------------------------+-----------------------------------------------------------------------------+
901fc1f2750SBernard Iremonger   | # | Stage name                | Description                                                                 |
902fc1f2750SBernard Iremonger   |   |                           |                                                                             |
903fc1f2750SBernard Iremonger   +===+===========================+=============================================================================+
904fc1f2750SBernard Iremonger   | 0 | Prefetch packet meta-data | #.  Select next two packets from the burst of input packets.                |
905fc1f2750SBernard Iremonger   |   |                           |                                                                             |
906fc1f2750SBernard Iremonger   |   |                           | #.  Prefetch packet meta-data containing the key and key signature.         |
907fc1f2750SBernard Iremonger   |   |                           |                                                                             |
908fc1f2750SBernard Iremonger   +---+---------------------------+-----------------------------------------------------------------------------+
909fc1f2750SBernard Iremonger   | 1 | Prefetch table bucket     | #.  Read the key signature from the packet meta-data (for extendable bucket |
910fc1f2750SBernard Iremonger   |   |                           |     hash tables) or read the key from the packet meta-data and compute key  |
911fc1f2750SBernard Iremonger   |   |                           |     signature (for LRU tables).                                             |
912fc1f2750SBernard Iremonger   |   |                           |                                                                             |
913fc1f2750SBernard Iremonger   |   |                           | #.  Identify the bucket ID using the key signature.                         |
914fc1f2750SBernard Iremonger   |   |                           |                                                                             |
915fc1f2750SBernard Iremonger   |   |                           | #.  Prefetch the bucket.                                                    |
916fc1f2750SBernard Iremonger   |   |                           |                                                                             |
917fc1f2750SBernard Iremonger   +---+---------------------------+-----------------------------------------------------------------------------+
918fc1f2750SBernard Iremonger   | 2 | Prefetch table data       | #.  Read the bucket.                                                        |
919fc1f2750SBernard Iremonger   |   |                           |                                                                             |
920fc1f2750SBernard Iremonger   |   |                           | #.  Compare all 4 bucket keys against the input key.                        |
921fc1f2750SBernard Iremonger   |   |                           |                                                                             |
922fc1f2750SBernard Iremonger   |   |                           | #.  Report input key as lookup hit only when a match is identified (more    |
923fc1f2750SBernard Iremonger   |   |                           |     than one key match is not possible)                                     |
924fc1f2750SBernard Iremonger   |   |                           |                                                                             |
925fc1f2750SBernard Iremonger   |   |                           | #.  For LRU tables only, use branchless logic to update the bucket LRU list |
926fc1f2750SBernard Iremonger   |   |                           |     (the current key becomes the new MRU) only on lookup hit.               |
927fc1f2750SBernard Iremonger   |   |                           |                                                                             |
928fc1f2750SBernard Iremonger   |   |                           | #.  Prefetch the key value (key data) associated with the matched key (to   |
929fc1f2750SBernard Iremonger   |   |                           |     avoid branches, this is done on both lookup hit and miss).              |
930fc1f2750SBernard Iremonger   |   |                           |                                                                             |
931fc1f2750SBernard Iremonger   +---+---------------------------+-----------------------------------------------------------------------------+
932fc1f2750SBernard Iremonger
933fc1f2750SBernard IremongerAdditional notes:
934fc1f2750SBernard Iremonger
935fc1f2750SBernard Iremonger#.  The pipelined version of the bucket search algorithm is executed only if there are at least 5 packets in the burst of input packets.
936fc1f2750SBernard Iremonger    If there are less than 5 packets in the burst of input packets, a non-optimized implementation of the bucket search algorithm is executed.
937fc1f2750SBernard Iremonger
938fea1d908SJohn McNamara#.  For extendable bucket hash tables only,
939fc1f2750SBernard Iremonger    once the pipelined version of the bucket search algorithm has been executed for all the packets in the burst of input packets,
940fc1f2750SBernard Iremonger    the non-optimized implementation of the bucket search algorithm is also executed for any packets that did not produce a lookup hit,
941fc1f2750SBernard Iremonger    but have the bucket in extended state.
942fc1f2750SBernard Iremonger    As result of executing the non-optimized version, some of these packets may produce a lookup hit or lookup miss.
943fc1f2750SBernard Iremonger    This does not impact the performance of the key lookup operation,
944fc1f2750SBernard Iremonger    as the probability of having the bucket in extended state is relatively small.
945fc1f2750SBernard Iremonger
946fc1f2750SBernard IremongerPipeline Library Design
947fc1f2750SBernard Iremonger-----------------------
948fc1f2750SBernard Iremonger
949fc1f2750SBernard IremongerA pipeline is defined by:
950fc1f2750SBernard Iremonger
951fc1f2750SBernard Iremonger#.  The set of input ports;
952fc1f2750SBernard Iremonger
953fc1f2750SBernard Iremonger#.  The set of output ports;
954fc1f2750SBernard Iremonger
955fc1f2750SBernard Iremonger#.  The set of tables;
956fc1f2750SBernard Iremonger
957fc1f2750SBernard Iremonger#.  The set of actions.
958fc1f2750SBernard Iremonger
959fc1f2750SBernard IremongerThe input ports are connected with the output ports through tree-like topologies of interconnected tables.
960fc1f2750SBernard IremongerThe table entries contain the actions defining the operations to be executed on the input packets and the packet flow within the pipeline.
961fc1f2750SBernard Iremonger
962fc1f2750SBernard IremongerConnectivity of Ports and Tables
963fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
964fc1f2750SBernard Iremonger
965fc1f2750SBernard IremongerTo avoid any dependencies on the order in which pipeline elements are created,
966fc1f2750SBernard Iremongerthe connectivity of pipeline elements is defined after all the pipeline input ports,
967fc1f2750SBernard Iremongeroutput ports and tables have been created.
968fc1f2750SBernard Iremonger
969fc1f2750SBernard IremongerGeneral connectivity rules:
970fc1f2750SBernard Iremonger
971fc1f2750SBernard Iremonger#.  Each input port is connected to a single table. No input port should be left unconnected;
972fc1f2750SBernard Iremonger
973fc1f2750SBernard Iremonger#.  The table connectivity to other tables or to output ports is regulated by the next hop actions of each table entry and the default table entry.
974fc1f2750SBernard Iremonger    The table connectivity is fluid, as the table entries and the default table entry can be updated during run-time.
975fc1f2750SBernard Iremonger
976fc1f2750SBernard Iremonger    *   A table can have multiple entries (including the default entry) connected to the same output port.
977fc1f2750SBernard Iremonger        A table can have different entries connected to different output ports.
978fc1f2750SBernard Iremonger        Different tables can have entries (including default table entry) connected to the same output port.
979fc1f2750SBernard Iremonger
980fc1f2750SBernard Iremonger    *   A table can have multiple entries (including the default entry) connected to another table,
981fc1f2750SBernard Iremonger        in which case all these entries have to point to the same table.
982fc1f2750SBernard Iremonger        This constraint is enforced by the API and prevents tree-like topologies from being created (allowing table chaining only),
983fc1f2750SBernard Iremonger        with the purpose of simplifying the implementation of the pipeline run-time execution engine.
984fc1f2750SBernard Iremonger
985fc1f2750SBernard IremongerPort Actions
986fc1f2750SBernard Iremonger~~~~~~~~~~~~
987fc1f2750SBernard Iremonger
988fc1f2750SBernard IremongerPort Action Handler
989fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^
990fc1f2750SBernard Iremonger
991fc1f2750SBernard IremongerAn action handler can be assigned to each input/output port to define actions to be executed on each input packet that is received by the port.
992fc1f2750SBernard IremongerDefining the action handler for a specific input/output port is optional (i.e. the action handler can be disabled).
993fc1f2750SBernard Iremonger
994fc1f2750SBernard IremongerFor input ports, the action handler is executed after RX function. For output ports, the action handler is executed before the TX function.
995fc1f2750SBernard Iremonger
996fc1f2750SBernard IremongerThe action handler can decide to drop packets.
997fc1f2750SBernard Iremonger
998fc1f2750SBernard IremongerTable Actions
999fc1f2750SBernard Iremonger~~~~~~~~~~~~~
1000fc1f2750SBernard Iremonger
1001fc1f2750SBernard IremongerTable Action Handler
1002fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^^^^^
1003fc1f2750SBernard Iremonger
1004fc1f2750SBernard IremongerAn action handler to be executed on each input packet can be assigned to each table.
1005fc1f2750SBernard IremongerDefining the action handler for a specific table is optional (i.e. the action handler can be disabled).
1006fc1f2750SBernard Iremonger
1007fc1f2750SBernard IremongerThe action handler is executed after the table lookup operation is performed and the table entry associated with each input packet is identified.
1008fc1f2750SBernard IremongerThe action handler can only handle the user-defined actions, while the reserved actions (e.g. the next hop actions) are handled by the Packet Framework.
1009fc1f2750SBernard IremongerThe action handler can decide to drop the input packet.
1010fc1f2750SBernard Iremonger
1011fc1f2750SBernard IremongerReserved Actions
1012fc1f2750SBernard Iremonger^^^^^^^^^^^^^^^^
1013fc1f2750SBernard Iremonger
1014fc1f2750SBernard IremongerThe reserved actions are handled directly by the Packet Framework without the user being able to change their meaning
1015fc1f2750SBernard Iremongerthrough the table action handler configuration.
1016fc1f2750SBernard IremongerA special category of the reserved actions is represented by the next hop actions, which regulate the packet flow between input ports,
1017fc1f2750SBernard Iremongertables and output ports through the pipeline.
10188c9a3374SJohn McNamara:numref:`table_qos_33` lists the next hop actions.
1019fc1f2750SBernard Iremonger
10208c9a3374SJohn McNamara.. _table_qos_33:
1021fc1f2750SBernard Iremonger
10228c9a3374SJohn McNamara.. table:: Next Hop Actions (Reserved)
1023fc1f2750SBernard Iremonger
1024fc1f2750SBernard Iremonger   +---+---------------------+-----------------------------------------------------------------------------------+
1025fc1f2750SBernard Iremonger   | # | Next hop action     | Description                                                                       |
1026fc1f2750SBernard Iremonger   |   |                     |                                                                                   |
1027fc1f2750SBernard Iremonger   +===+=====================+===================================================================================+
1028fc1f2750SBernard Iremonger   | 1 | Drop                | Drop the current packet.                                                          |
1029fc1f2750SBernard Iremonger   |   |                     |                                                                                   |
1030fc1f2750SBernard Iremonger   +---+---------------------+-----------------------------------------------------------------------------------+
1031fc1f2750SBernard Iremonger   | 2 | Send to output port | Send the current packet to specified output port. The output port ID is metadata  |
1032fc1f2750SBernard Iremonger   |   |                     | stored in the same table entry.                                                   |
1033fc1f2750SBernard Iremonger   |   |                     |                                                                                   |
1034fc1f2750SBernard Iremonger   +---+---------------------+-----------------------------------------------------------------------------------+
1035fc1f2750SBernard Iremonger   | 3 | Send to table       | Send the current packet to specified table. The table ID is metadata stored in    |
1036fc1f2750SBernard Iremonger   |   |                     | the same table entry.                                                             |
1037fc1f2750SBernard Iremonger   |   |                     |                                                                                   |
1038fc1f2750SBernard Iremonger   +---+---------------------+-----------------------------------------------------------------------------------+
1039fc1f2750SBernard Iremonger
1040fc1f2750SBernard IremongerUser Actions
1041fc1f2750SBernard Iremonger^^^^^^^^^^^^
1042fc1f2750SBernard Iremonger
1043fc1f2750SBernard IremongerFor each table, the meaning of user actions is defined through the configuration of the table action handler.
1044fc1f2750SBernard IremongerDifferent tables can be configured with different action handlers, therefore the meaning of the user actions
1045fc1f2750SBernard Iremongerand their associated meta-data is private to each table.
1046fc1f2750SBernard IremongerWithin the same table, all the table entries (including the table default entry) share the same definition
1047fc1f2750SBernard Iremongerfor the user actions and their associated meta-data,
1048fc1f2750SBernard Iremongerwith each table entry having its own set of enabled user actions and its own copy of the action meta-data.
10498c9a3374SJohn McNamara:numref:`table_qos_34` contains a non-exhaustive list of user action examples.
1050fc1f2750SBernard Iremonger
10518c9a3374SJohn McNamara.. _table_qos_34:
1052fc1f2750SBernard Iremonger
10538c9a3374SJohn McNamara.. table:: User Action Examples
1054fc1f2750SBernard Iremonger
1055fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1056fc1f2750SBernard Iremonger   | # | User action                       | Description                                                         |
1057fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1058fc1f2750SBernard Iremonger   +===+===================================+=====================================================================+
1059fc1f2750SBernard Iremonger   | 1 | Metering                          | Per flow traffic metering using the srTCM and trTCM algorithms.     |
1060fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1061fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1062fc1f2750SBernard Iremonger   | 2 | Statistics                        | Update the statistics counters maintained per flow.                 |
1063fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1064fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1065fc1f2750SBernard Iremonger   | 3 | App ID                            | Per flow state machine fed by variable length sequence of packets   |
1066fc1f2750SBernard Iremonger   |   |                                   | at the flow initialization with the purpose of identifying the      |
1067fc1f2750SBernard Iremonger   |   |                                   | traffic type and application.                                       |
1068fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1069fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1070fc1f2750SBernard Iremonger   | 4 | Push/pop labels                   | Push/pop VLAN/MPLS labels to/from the current packet.               |
1071fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1072fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1073fc1f2750SBernard Iremonger   | 5 | Network Address Translation (NAT) | Translate between the internal (LAN) and external (WAN) IP          |
1074fc1f2750SBernard Iremonger   |   |                                   | destination/source address and/or L4 protocol destination/source    |
1075fc1f2750SBernard Iremonger   |   |                                   | port.                                                               |
1076fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1077fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1078fc1f2750SBernard Iremonger   | 6 | TTL update                        | Decrement IP TTL and, in case of IPv4 packets, update the IP        |
1079fc1f2750SBernard Iremonger   |   |                                   | checksum.                                                           |
1080fc1f2750SBernard Iremonger   |   |                                   |                                                                     |
1081fc1f2750SBernard Iremonger   +---+-----------------------------------+---------------------------------------------------------------------+
1082cc85c078SFan Zhang   | 7 | Sym Crypto                        | Generate Cryptodev session based on the user-specified algorithm    |
1083cc85c078SFan Zhang   |   |                                   | and key(s), and assemble the cryptodev operation based on the       |
1084cc85c078SFan Zhang   |   |                                   | predefined offsets.                                                 |
1085cc85c078SFan Zhang   |   |                                   |                                                                     |
1086cc85c078SFan Zhang   +---+-----------------------------------+---------------------------------------------------------------------+
1087fc1f2750SBernard Iremonger
1088fc1f2750SBernard IremongerMulticore Scaling
1089fc1f2750SBernard Iremonger-----------------
1090fc1f2750SBernard Iremonger
1091fc1f2750SBernard IremongerA complex application is typically split across multiple cores, with cores communicating through SW queues.
1092fc1f2750SBernard IremongerThere is usually a performance limit on the number of table lookups
1093fc1f2750SBernard Iremongerand actions that can be fitted on the same CPU core due to HW constraints like:
1094fc1f2750SBernard Iremongeravailable CPU cycles, cache memory size, cache transfer BW, memory transfer BW, etc.
1095fc1f2750SBernard Iremonger
1096fc1f2750SBernard IremongerAs the application is split across multiple CPU cores, the Packet Framework facilitates the creation of several pipelines,
1097fc1f2750SBernard Iremongerthe assignment of each such pipeline to a different CPU core
1098fc1f2750SBernard Iremongerand the interconnection of all CPU core-level pipelines into a single application-level complex pipeline.
1099fc1f2750SBernard IremongerFor example, if CPU core A is assigned to run pipeline P1 and CPU core B pipeline P2,
1100fc1f2750SBernard Iremongerthen the interconnection of P1 with P2 could be achieved by having the same set of SW queues act like output ports
1101fc1f2750SBernard Iremongerfor P1 and input ports for P2.
1102fc1f2750SBernard Iremonger
1103fc1f2750SBernard IremongerThis approach enables the application development using the pipeline, run-to-completion (clustered) or hybrid (mixed) models.
1104fc1f2750SBernard Iremonger
1105fc1f2750SBernard IremongerIt is allowed for the same core to run several pipelines, but it is not allowed for several cores to run the same pipeline.
1106fc1f2750SBernard Iremonger
1107fc1f2750SBernard IremongerShared Data Structures
1108fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~
1109fc1f2750SBernard Iremonger
1110fc1f2750SBernard IremongerThe threads performing table lookup are actually table writers rather than just readers.
1111fc1f2750SBernard IremongerEven if the specific table lookup algorithm is thread-safe for multiple readers
1112fc1f2750SBernard Iremonger(e. g. read-only access of the search algorithm data structures is enough to conduct the lookup operation),
1113fc1f2750SBernard Iremongeronce the table entry for the current packet is identified, the thread is typically expected to update the action meta-data stored in the table entry
1114fc1f2750SBernard Iremonger(e.g. increment the counter tracking the number of packets that hit this table entry), and thus modify the table entry.
1115fc1f2750SBernard IremongerDuring the time this thread is accessing this table entry (either writing or reading; duration is application specific),
1116fc1f2750SBernard Iremongerfor data consistency reasons, no other threads (threads performing table lookup or entry add/delete operations) are allowed to modify this table entry.
1117fc1f2750SBernard Iremonger
1118fc1f2750SBernard IremongerMechanisms to share the same table between multiple threads:
1119fc1f2750SBernard Iremonger
1120fc1f2750SBernard Iremonger#.  **Multiple writer threads.**
1121fc1f2750SBernard Iremonger    Threads need to use synchronization primitives like semaphores (distinct semaphore per table entry) or atomic instructions.
1122fc1f2750SBernard Iremonger    The cost of semaphores is usually high, even when the semaphore is free.
1123fc1f2750SBernard Iremonger    The cost of atomic instructions is normally higher than the cost of regular instructions.
1124fc1f2750SBernard Iremonger
1125fc1f2750SBernard Iremonger#.  **Multiple writer threads, with single thread performing table lookup operations and multiple threads performing table entry add/delete operations.**
1126fc1f2750SBernard Iremonger    The threads performing table entry add/delete operations send table update requests to the reader (typically through message passing queues),
1127fc1f2750SBernard Iremonger    which does the actual table updates and then sends the response back to the request initiator.
1128fc1f2750SBernard Iremonger
1129fc1f2750SBernard Iremonger#.  **Single writer thread performing table entry add/delete operations and multiple reader threads that perform table lookup operations with read-only access to the table entries.**
1130fc1f2750SBernard Iremonger    The reader threads use the main table copy while the writer is updating the mirror copy.
1131fc1f2750SBernard Iremonger    Once the writer update is done, the writer can signal to the readers and busy wait until all readers swaps between the mirror copy (which now becomes the main copy) and
1132fc1f2750SBernard Iremonger    the mirror copy (which now becomes the main copy).
1133fc1f2750SBernard Iremonger
1134fc1f2750SBernard IremongerInterfacing with Accelerators
1135fc1f2750SBernard Iremonger-----------------------------
1136fc1f2750SBernard Iremonger
1137fc1f2750SBernard IremongerThe presence of accelerators is usually detected during the initialization phase by inspecting the HW devices that are part of the system (e.g. by PCI bus enumeration).
1138fc1f2750SBernard IremongerTypical devices with acceleration capabilities are:
1139fc1f2750SBernard Iremonger
1140fc1f2750SBernard Iremonger*   Inline accelerators: NICs, switches, FPGAs, etc;
1141fc1f2750SBernard Iremonger
1142cc85c078SFan Zhang*   Look-aside accelerators: chipsets, FPGAs, Intel QuickAssist, etc.
1143fc1f2750SBernard Iremonger
1144fc1f2750SBernard IremongerUsually, to support a specific functional block, specific implementation of Packet Framework tables and/or ports and/or actions has to be provided for each accelerator,
1145fc1f2750SBernard Iremongerwith all the implementations sharing the same API: pure SW implementation (no acceleration), implementation using accelerator A, implementation using accelerator B, etc.
1146fc1f2750SBernard IremongerThe selection between these implementations could be done at build time or at run-time (recommended), based on which accelerators are present in the system,
1147fc1f2750SBernard Iremongerwith no application changes required.
11485adff62eSCristian Dumitrescu
11495adff62eSCristian DumitrescuThe Software Switch (SWX) Pipeline
11505adff62eSCristian Dumitrescu----------------------------------
11515adff62eSCristian Dumitrescu
11525adff62eSCristian DumitrescuThe Software Switch (SWX) pipeline is designed to combine the DPDK performance with the flexibility of the P4-16 language [1]. It can be used either by itself
11535adff62eSCristian Dumitrescuto code a complete software switch or data plane application, or in combination with the open-source P4 compiler P4C [2], acting as a P4C back-end that allows
11545adff62eSCristian Dumitrescuthe P4 programs to be translated to the DPDK SWX API and run on multi-core CPUs.
11555adff62eSCristian Dumitrescu
11565adff62eSCristian DumitrescuThe main features of the SWX pipeline are:
11575adff62eSCristian Dumitrescu
11585adff62eSCristian Dumitrescu*   Nothing is hard-wired, everything is dynamically defined: The packet headers (i.e. the network protocols), the packet meta-data, the actions, the tables
11595adff62eSCristian Dumitrescu    and the pipeline itself are dynamically defined instead of selected from a predefined set.
11605adff62eSCristian Dumitrescu
11615adff62eSCristian Dumitrescu*   Instructions: The actions and the life of the packet through the pipeline are defined with instructions that manipulate the pipeline objects mentioned
11625adff62eSCristian Dumitrescu    above. The pipeline is the main function of the packet program, with actions as subroutines triggered by the tables.
11635adff62eSCristian Dumitrescu
11645adff62eSCristian Dumitrescu*   Call external plugins: Extern objects and functions can be defined to call functionality that cannot be efficiently implemented with the existing
11655adff62eSCristian Dumitrescu    pipeline-oriented instruction set, such as: error detecting/correcting codes, cryptographic operations, meters, statistics counter arrays, heuristics, etc.
11665adff62eSCristian Dumitrescu
11675adff62eSCristian Dumitrescu*   Better control plane interaction: Transaction-oriented table update mechanism that supports multi-table atomic updates. Multiple tables can be updated in a
11685adff62eSCristian Dumitrescu    single step with only the before-update and the after-update table entries visible to the packets. Alignment with the P4Runtime [3] protocol.
11695adff62eSCristian Dumitrescu
11705adff62eSCristian Dumitrescu*   Performance: Multiple packets are in-flight within the pipeline at any moment. Each packet is owned by a different time-sharing thread in
11715adff62eSCristian Dumitrescu    run-to-completion, with the thread pausing before memory access operations such as packet I/O and table lookup to allow the memory prefetch to complete.
11725adff62eSCristian Dumitrescu    The instructions are verified and translated at initialization time with no run-time impact. The instructions are also optimized to detect and "fuse"
11735adff62eSCristian Dumitrescu    frequently used patterns into vector-like instructions transparently to the user.
11745adff62eSCristian Dumitrescu
11755adff62eSCristian DumitrescuThe main SWX pipeline components are:
11765adff62eSCristian Dumitrescu
11775adff62eSCristian Dumitrescu*   Input and output ports: Each port instantiates a port type that defines the port operations, e.g. Ethernet device port, PCAP port, etc. The RX interface
11785adff62eSCristian Dumitrescu    of the input ports and the TX interface of the output ports are single packet based, with packet batching typically implemented internally by each port for
11795adff62eSCristian Dumitrescu    performance reasons.
11805adff62eSCristian Dumitrescu
11815adff62eSCristian Dumitrescu*   Structure types: Each structure type is used to define the logical layout of a memory block, such as: packet headers, packet meta-data, action data stored
11825adff62eSCristian Dumitrescu    in a table entry, mailboxes of extern objects and functions. Similar to C language structs, each structure type is a well defined sequence of fields, with
11835adff62eSCristian Dumitrescu    each field having a unique name and a constant size.
11845adff62eSCristian Dumitrescu
11855adff62eSCristian Dumitrescu*   Packet headers: Each packet typically has one or multiple headers. The headers are extracted from the input packet as part of the packet parsing operation,
11865adff62eSCristian Dumitrescu    which is likely executed immediately after the packet reception. As result of the extract operation, each header is logically removed from the packet, so
11875adff62eSCristian Dumitrescu    once the packet parsing operation is completed, the input packet is reduced to opaque payload. Just before transmission, one or several headers are pushed
11885adff62eSCristian Dumitrescu    in front of each output packet through the emit operation; these headers can be part of the set of headers that were previously extracted from the input
11895adff62eSCristian Dumitrescu    packet (and potentially modified afterwards) or some new headers whose contents is generated by the pipeline (e.g. by reading them from tables). The format
11905adff62eSCristian Dumitrescu    of each packet header is defined by instantiating a structure type.
11915adff62eSCristian Dumitrescu
11925adff62eSCristian Dumitrescu*   Packet meta-data: The packet meta-data is filled in by the pipeline (e.g. by reading it from tables) or computed by the pipeline. It is not sent out unless
11935adff62eSCristian Dumitrescu    some of the meta-data fields are explicitly written into the headers emitted into the output packet. The format of the packet meta-data is defined by
11945adff62eSCristian Dumitrescu    instantiating a structure type.
11955adff62eSCristian Dumitrescu
11965adff62eSCristian Dumitrescu*   Extern objects and functions: Used to plug into the pipeline any functionality that cannot be efficiently implemented with the existing pipeline instruction
11975adff62eSCristian Dumitrescu    set. Each extern object and extern function has its own mailbox, which is used to pass the input arguments to and retrieve the output arguments from the
11985adff62eSCristian Dumitrescu    extern object member functions or the extern function.  The mailbox format is defined by instantiating a structure type.
11995adff62eSCristian Dumitrescu
12005adff62eSCristian Dumitrescu*   Instructions: The pipeline and its actions are defined with instructions from a predefined instruction set. The instructions are used to receive and
12015adff62eSCristian Dumitrescu    transmit the current packet, extract and emit headers from/into the packet, read/write the packet headers, packet meta-data and mailboxes, start table
12025adff62eSCristian Dumitrescu    lookup operations, read the action arguments from the table entry, call extern object member functions or extern functions. See the rte_swx_pipeline.h file
12035adff62eSCristian Dumitrescu    for the complete list of instructions.
12045adff62eSCristian Dumitrescu
12055adff62eSCristian Dumitrescu*   Actions: The pipeline actions are dynamically defined through instructions as opposed to predefined. Essentially, the actions are subroutines of the
12065adff62eSCristian Dumitrescu    pipeline program and their execution is triggered by the table lookup. The input arguments of each action are read from the table entry (in case of table
12075adff62eSCristian Dumitrescu    lookup hit) or the default table action (in case of table lookup miss) and are read-only; their format is defined by instantiating a structure type. The
12085adff62eSCristian Dumitrescu    actions have read-write access to the packet headers and meta-data.
12095adff62eSCristian Dumitrescu
12105adff62eSCristian Dumitrescu*   Table: Each pipeline typically has one or more lookup tables. The match fields of each table are flexibly selected from the packet headers and meta-data
12115adff62eSCristian Dumitrescu    defined for the current pipeline. The set of table actions is flexibly selected for each table from the set of actions defined for the current pipeline. The
12125adff62eSCristian Dumitrescu    tables can be looked at as special pipeline operators that result in one of the table actions being called, depending on the result of the table lookup
12135adff62eSCristian Dumitrescu    operation.
12145adff62eSCristian Dumitrescu
12155adff62eSCristian Dumitrescu*   Pipeline: The pipeline represents the main program that defines the life of the packet, with subroutines (actions) executed on table lookup. As packets
12165adff62eSCristian Dumitrescu    go through the pipeline, the packet headers and meta-data are transformed along the way.
12175adff62eSCristian Dumitrescu
12185adff62eSCristian DumitrescuReferences:
12195adff62eSCristian Dumitrescu
12205adff62eSCristian Dumitrescu[1] P4-16 specification: https://p4.org/specs/
12215adff62eSCristian Dumitrescu
12225adff62eSCristian Dumitrescu[2] P4-16 compiler: https://github.com/p4lang/p4c
12235adff62eSCristian Dumitrescu
12245adff62eSCristian Dumitrescu[3] P4Runtime specification: https://p4.org/specs/
1225