xref: /dpdk/doc/guides/prog_guide/lpm_lib.rst (revision 41dd9a6bc2d9c6e20e139ad713cc9d172572dd43)
15630257fSFerruh Yigit..  SPDX-License-Identifier: BSD-3-Clause
25630257fSFerruh Yigit    Copyright(c) 2010-2014 Intel Corporation.
3fc1f2750SBernard Iremonger
4*41dd9a6bSDavid YoungLongest Prefix Match (LPM) Library
5*41dd9a6bSDavid Young==================================
6fc1f2750SBernard Iremonger
748624fd9SSiobhan ButlerThe DPDK LPM library component implements the Longest Prefix Match (LPM) table search method for 32-bit keys
8fc1f2750SBernard Iremongerthat is typically used to find the best route match in IP forwarding applications.
9fc1f2750SBernard Iremonger
10fc1f2750SBernard IremongerLPM API Overview
11fc1f2750SBernard Iremonger----------------
12fc1f2750SBernard Iremonger
13fc1f2750SBernard IremongerThe main configuration parameter for LPM component instances is the maximum number of rules to support.
14fc1f2750SBernard IremongerAn LPM prefix is represented by a pair of parameters (32- bit key, depth), with depth in the range of 1 to 32.
15fc1f2750SBernard IremongerAn LPM rule is represented by an LPM prefix and some user data associated with the prefix.
16fc1f2750SBernard IremongerThe prefix serves as the unique identifier of the LPM rule.
17fc1f2750SBernard IremongerIn this implementation, the user data is 1-byte long and is called next hop,
18fc1f2750SBernard Iremongerin correlation with its main use of storing the ID of the next hop in a routing table entry.
19fc1f2750SBernard Iremonger
20fc1f2750SBernard IremongerThe main methods exported by the LPM component are:
21fc1f2750SBernard Iremonger
22fc1f2750SBernard Iremonger*   Add LPM rule: The LPM rule is provided as input.
23fc1f2750SBernard Iremonger    If there is no rule with the same prefix present in the table, then the new rule is added to the LPM table.
24fc1f2750SBernard Iremonger    If a rule with the same prefix is already present in the table, the next hop of the rule is updated.
25fc1f2750SBernard Iremonger    An error is returned when there is no available rule space left.
26fc1f2750SBernard Iremonger
27fc1f2750SBernard Iremonger*   Delete LPM rule: The prefix of the LPM rule is provided as input.
28fc1f2750SBernard Iremonger    If a rule with the specified prefix is present in the LPM table, then it is removed.
29fc1f2750SBernard Iremonger
30fc1f2750SBernard Iremonger*   Lookup LPM key: The 32-bit key is provided as input.
31fc1f2750SBernard Iremonger    The algorithm selects the rule that represents the best match for the given key and returns the next hop of that rule.
32fc1f2750SBernard Iremonger    In the case that there are multiple rules present in the LPM table that have the same 32-bit key,
33fc1f2750SBernard Iremonger    the algorithm picks the rule with the highest depth as the best match rule,
34fc1f2750SBernard Iremonger    which means that the rule has the highest number of most significant bits matching between the input key and the rule key.
35fc1f2750SBernard Iremonger
3629e30cbcSThomas Monjalon.. _lpm4_details:
3729e30cbcSThomas Monjalon
38fc1f2750SBernard IremongerImplementation Details
39fc1f2750SBernard Iremonger----------------------
40fc1f2750SBernard Iremonger
41fc1f2750SBernard IremongerThe current implementation uses a variation of the DIR-24-8 algorithm that trades memory usage for improved LPM lookup speed.
42fc1f2750SBernard IremongerThe algorithm allows the lookup operation to be performed with typically a single memory read access.
43fc1f2750SBernard IremongerIn the statistically rare case when the best match rule is having a depth bigger than 24,
44fc1f2750SBernard Iremongerthe lookup operation requires two memory read accesses.
45fc1f2750SBernard IremongerTherefore, the performance of the LPM lookup operation is greatly influenced by
46fc1f2750SBernard Iremongerwhether the specific memory location is present in the processor cache or not.
47fc1f2750SBernard Iremonger
48fc1f2750SBernard IremongerThe main data structure is built using the following elements:
49fc1f2750SBernard Iremonger
50fc1f2750SBernard Iremonger*   A table with 2^24 entries.
51fc1f2750SBernard Iremonger
52fc1f2750SBernard Iremonger*   A number of tables (RTE_LPM_TBL8_NUM_GROUPS) with 2^8 entries.
53fc1f2750SBernard Iremonger
54fc1f2750SBernard IremongerThe first table, called tbl24, is indexed using the first 24 bits of the IP address to be looked up,
55fc1f2750SBernard Iremongerwhile the second table(s), called tbl8, is indexed using the last 8 bits of the IP address.
56fc1f2750SBernard IremongerThis means that depending on the outcome of trying to match the IP address of an incoming packet to the rule stored in the tbl24
57fc1f2750SBernard Iremongerwe might need to continue the lookup process in the second level.
58fc1f2750SBernard Iremonger
59fc1f2750SBernard IremongerSince every entry of the tbl24 can potentially point to a tbl8, ideally, we would have 2^24 tbl8s,
60fc1f2750SBernard Iremongerwhich would be the same as having a single table with 2^32 entries.
61fc1f2750SBernard IremongerThis is not feasible due to resource restrictions.
62fc1f2750SBernard IremongerInstead, this approach takes advantage of the fact that rules longer than 24 bits are very rare.
63fc1f2750SBernard IremongerBy splitting the process in two different tables/levels and limiting the number of tbl8s,
64fc1f2750SBernard Iremongerwe can greatly reduce memory consumption while maintaining a very good lookup speed (one memory access, most of the times).
65fc1f2750SBernard Iremonger
66fc1f2750SBernard Iremonger
674a22e6eeSJohn McNamara.. figure:: img/tbl24_tbl8.*
684a22e6eeSJohn McNamara
694a22e6eeSJohn McNamara   Table split into different levels
704a22e6eeSJohn McNamara
71fc1f2750SBernard Iremonger
72fc1f2750SBernard IremongerAn entry in tbl24 contains the following fields:
73fc1f2750SBernard Iremonger
74fc1f2750SBernard Iremonger*   next hop / index to the tbl8
75fc1f2750SBernard Iremonger
76fc1f2750SBernard Iremonger*   valid flag
77fc1f2750SBernard Iremonger
78fc1f2750SBernard Iremonger*   external entry flag
79fc1f2750SBernard Iremonger
80fc1f2750SBernard Iremonger*   depth of the rule (length)
81fc1f2750SBernard Iremonger
82fc1f2750SBernard IremongerThe first field can either contain a number indicating the tbl8 in which the lookup process should continue
83fc1f2750SBernard Iremongeror the next hop itself if the longest prefix match has already been found.
84fc1f2750SBernard IremongerThe two flags are used to determine whether the entry is valid or not and
85fc1f2750SBernard Iremongerwhether the search process have finished or not respectively.
86fc1f2750SBernard IremongerThe depth or length of the rule is the number of bits of the rule that is stored in a specific entry.
87fc1f2750SBernard Iremonger
88fc1f2750SBernard IremongerAn entry in a tbl8 contains the following fields:
89fc1f2750SBernard Iremonger
90fc1f2750SBernard Iremonger*   next hop
91fc1f2750SBernard Iremonger
92fc1f2750SBernard Iremonger*   valid
93fc1f2750SBernard Iremonger
94fc1f2750SBernard Iremonger*   valid group
95fc1f2750SBernard Iremonger
96fc1f2750SBernard Iremonger*   depth
97fc1f2750SBernard Iremonger
98fc1f2750SBernard IremongerNext hop and depth contain the same information as in the tbl24.
99fc1f2750SBernard IremongerThe two flags show whether the entry and the table are valid respectively.
100fc1f2750SBernard Iremonger
101fc1f2750SBernard IremongerThe other main data structure is a table containing the main information about the rules (IP and next hop).
102fc1f2750SBernard IremongerThis is a higher level table, used for different things:
103fc1f2750SBernard Iremonger
104fc1f2750SBernard Iremonger*   Check whether a rule already exists or not, prior to addition or deletion,
105fc1f2750SBernard Iremonger    without having to actually perform a lookup.
106fc1f2750SBernard Iremonger
107fc1f2750SBernard Iremonger*   When deleting, to check whether there is a rule containing the one that is to be deleted.
108fc1f2750SBernard Iremonger    This is important, since the main data structure will have to be updated accordingly.
109fc1f2750SBernard Iremonger
110fc1f2750SBernard IremongerAddition
111fc1f2750SBernard Iremonger~~~~~~~~
112fc1f2750SBernard Iremonger
113fc1f2750SBernard IremongerWhen adding a rule, there are different possibilities.
114fc1f2750SBernard IremongerIf the rule's depth is exactly 24 bits, then:
115fc1f2750SBernard Iremonger
116fc1f2750SBernard Iremonger*   Use the rule (IP address) as an index to the tbl24.
117fc1f2750SBernard Iremonger
118fc1f2750SBernard Iremonger*   If the entry is invalid (i.e. it doesn't already contain a rule) then set its next hop to its value,
119fc1f2750SBernard Iremonger    the valid flag to 1 (meaning this entry is in use),
120fc1f2750SBernard Iremonger    and the external entry flag to 0
121fc1f2750SBernard Iremonger    (meaning the lookup process ends at this point, since this is the longest prefix that matches).
122fc1f2750SBernard Iremonger
123fc1f2750SBernard IremongerIf the rule's depth is exactly 32 bits, then:
124fc1f2750SBernard Iremonger
125fc1f2750SBernard Iremonger*   Use the first 24 bits of the rule as an index to the tbl24.
126fc1f2750SBernard Iremonger
127fc1f2750SBernard Iremonger*   If the entry is invalid (i.e. it doesn't already contain a rule) then look for a free tbl8,
128fc1f2750SBernard Iremonger    set the index to the tbl8 to this value,
129fc1f2750SBernard Iremonger    the valid flag to 1 (meaning this entry is in use), and the external entry flag to 1
130fc1f2750SBernard Iremonger    (meaning the lookup process must continue since the rule hasn't been explored completely).
131fc1f2750SBernard Iremonger
132fc1f2750SBernard IremongerIf the rule's depth is any other value, prefix expansion must be performed.
133fc1f2750SBernard IremongerThis means the rule is copied to all the entries (as long as they are not in use) which would also cause a match.
134fc1f2750SBernard Iremonger
135fc1f2750SBernard IremongerAs a simple example, let's assume the depth is 20 bits.
136fc1f2750SBernard IremongerThis means that there are 2^(24 - 20) = 16 different combinations of the first 24 bits of an IP address that
137fc1f2750SBernard Iremongerwould cause a match.
138fc1f2750SBernard IremongerHence, in this case, we copy the exact same entry to every position indexed by one of these combinations.
139fc1f2750SBernard Iremonger
140fc1f2750SBernard IremongerBy doing this we ensure that during the lookup process, if a rule matching the IP address exists,
141fc1f2750SBernard Iremongerit is found in either one or two memory accesses,
142fc1f2750SBernard Iremongerdepending on whether we need to move to the next table or not.
143fc1f2750SBernard IremongerPrefix expansion is one of the keys of this algorithm,
144fc1f2750SBernard Iremongersince it improves the speed dramatically by adding redundancy.
145fc1f2750SBernard Iremonger
1468a9f8564SRuifeng WangDeletion
1478a9f8564SRuifeng Wang~~~~~~~~
1488a9f8564SRuifeng Wang
1498a9f8564SRuifeng WangWhen deleting a rule, a replacement rule is searched for. Replacement rule is an existing rule that has
1508a9f8564SRuifeng Wangthe longest prefix match with the rule to be deleted, but has shorter prefix.
1518a9f8564SRuifeng Wang
1528a9f8564SRuifeng WangIf a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
1538a9f8564SRuifeng Wangvalue with the replacement rule.
1548a9f8564SRuifeng Wang
1558a9f8564SRuifeng WangIf no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
1568a9f8564SRuifeng Wang
1578a9f8564SRuifeng WangPrefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
1588a9f8564SRuifeng Wang
1598a9f8564SRuifeng WangAfter deleting a rule, a group of tbl8s that belongs to the same tbl24 entry are freed in following cases:
1608a9f8564SRuifeng Wang
1618a9f8564SRuifeng Wang*   All tbl8s in the group are empty .
1628a9f8564SRuifeng Wang
1638a9f8564SRuifeng Wang*   All tbl8s in the group have the same values and with depth no greater than 24.
1648a9f8564SRuifeng Wang
1658a9f8564SRuifeng WangFree of tbl8s have different behaviors:
1668a9f8564SRuifeng Wang
1678a9f8564SRuifeng Wang*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
1688a9f8564SRuifeng Wang
1698a9f8564SRuifeng Wang*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
1708a9f8564SRuifeng Wang
1718a9f8564SRuifeng WangWhen the LPM is not using RCU, tbl8 group can be freed immediately even though the readers might be using
1728a9f8564SRuifeng Wangthe tbl8 group entries. This might result in incorrect lookup results.
1738a9f8564SRuifeng Wang
1748a9f8564SRuifeng WangRCU QSBR process is integrated for safe tbl8 group reclamation. Application has certain responsibilities
175*41dd9a6bSDavid Youngwhile using this feature. Please refer to resource reclamation framework of :doc:`rcu_lib`
1768a9f8564SRuifeng Wangfor more details.
1778a9f8564SRuifeng Wang
178fc1f2750SBernard IremongerLookup
179fc1f2750SBernard Iremonger~~~~~~
180fc1f2750SBernard Iremonger
181fc1f2750SBernard IremongerThe lookup process is much simpler and quicker. In this case:
182fc1f2750SBernard Iremonger
183fc1f2750SBernard Iremonger*   Use the first 24 bits of the IP address as an index to the tbl24.
184fc1f2750SBernard Iremonger    If the entry is not in use, then it means we don't have a rule matching this IP.
185fc1f2750SBernard Iremonger    If it is valid and the external entry flag is set to 0, then the next hop is returned.
186fc1f2750SBernard Iremonger
187fc1f2750SBernard Iremonger*   If it is valid and the external entry flag is set to 1,
188fc1f2750SBernard Iremonger    then we use the tbl8 index to find out the tbl8 to be checked,
189fc1f2750SBernard Iremonger    and the last 8 bits of the IP address as an index to this table.
190fc1f2750SBernard Iremonger    Similarly, if the entry is not in use, then we don't have a rule matching this IP address.
191fc1f2750SBernard Iremonger    If it is valid then the next hop is returned.
192fc1f2750SBernard Iremonger
193fc1f2750SBernard IremongerLimitations in the Number of Rules
194fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
195fc1f2750SBernard Iremonger
196fc1f2750SBernard IremongerThere are different things that limit the number of rules that can be added.
197fc1f2750SBernard IremongerThe first one is the maximum number of rules, which is a parameter passed through the API.
198fc1f2750SBernard IremongerOnce this number is reached,
199fc1f2750SBernard Iremongerit is not possible to add any more rules to the routing table unless one or more are removed.
200fc1f2750SBernard Iremonger
201fc1f2750SBernard IremongerThe second reason is an intrinsic limitation of the algorithm.
202fc1f2750SBernard IremongerAs explained before, to avoid high memory consumption, the number of tbl8s is limited in compilation time
203fc1f2750SBernard Iremonger(this value is by default 256).
204fc1f2750SBernard IremongerIf we exhaust tbl8s, we won't be able to add any more rules.
205fc1f2750SBernard IremongerHow many of them are necessary for a specific routing table is hard to determine in advance.
206fc1f2750SBernard Iremonger
207fc1f2750SBernard IremongerA tbl8 is consumed whenever we have a new rule with depth bigger than 24,
208fc1f2750SBernard Iremongerand the first 24 bits of this rule are not the same as the first 24 bits of a rule previously added.
209fc1f2750SBernard IremongerIf they are, then the new rule will share the same tbl8 than the previous one,
210fc1f2750SBernard Iremongersince the only difference between the two rules is within the last byte.
211fc1f2750SBernard Iremonger
212fc1f2750SBernard IremongerWith the default value of 256, we can have up to 256 rules longer than 24 bits that differ on their first three bytes.
213fc1f2750SBernard IremongerSince routes longer than 24 bits are unlikely, this shouldn't be a problem in most setups.
214fc1f2750SBernard IremongerEven if it is, however, the number of tbl8s can be modified.
215fc1f2750SBernard Iremonger
216fc1f2750SBernard IremongerUse Case: IPv4 Forwarding
217fc1f2750SBernard Iremonger~~~~~~~~~~~~~~~~~~~~~~~~~
218fc1f2750SBernard Iremonger
219fc1f2750SBernard IremongerThe LPM algorithm is used to implement Classless Inter-Domain Routing (CIDR) strategy used by routers implementing IPv4 forwarding.
220fc1f2750SBernard Iremonger
221fc1f2750SBernard IremongerReferences
222fc1f2750SBernard Iremonger~~~~~~~~~~
223fc1f2750SBernard Iremonger
224fc1f2750SBernard Iremonger*   RFC1519 Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy,
225fc1f2750SBernard Iremonger    `http://www.ietf.org/rfc/rfc1519 <http://www.ietf.org/rfc/rfc1519>`_
226fc1f2750SBernard Iremonger
227fc1f2750SBernard Iremonger*   Pankaj Gupta, Algorithms for Routing Lookups and Packet Classification, PhD Thesis, Stanford University,
228193774a7SHerakliusz Lipiec    2000  (`http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf <http://klamath.stanford.edu/~pankaj/thesis/thesis_1sided.pdf>`_ )
229