..  SPDX-License-Identifier: BSD-3-Clause
    Copyright 2015 6WIND S.A.
    Copyright 2015 Mellanox Technologies, Ltd

.. include:: <isonum.txt>

NVIDIA MLX5 Ethernet Driver
===========================

.. note::

   NVIDIA acquired Mellanox Technologies in 2020.
   The DPDK documentation and code might still include instances
   of or references to Mellanox trademarks (like BlueField and ConnectX)
   that are now NVIDIA trademarks.

The mlx5 Ethernet poll mode driver library (**librte_net_mlx5**) provides support
for **NVIDIA ConnectX-4**, **NVIDIA ConnectX-4 Lx**, **NVIDIA ConnectX-5**,
**NVIDIA ConnectX-6**, **NVIDIA ConnectX-6 Dx**, **NVIDIA ConnectX-6 Lx**,
**NVIDIA ConnectX-7**, **NVIDIA BlueField**, **NVIDIA BlueField-2** and
**NVIDIA BlueField-3** families of 10/25/40/50/100/200/400 Gb/s adapters
as well as their virtual functions (VF) in SR-IOV context.

Supported NICs
--------------

The following NVIDIA device families are supported by the same mlx5 driver:

  - ConnectX-4
  - ConnectX-4 Lx
  - ConnectX-5
  - ConnectX-5 Ex
  - ConnectX-6
  - ConnectX-6 Dx
  - ConnectX-6 Lx
  - ConnectX-7
  - BlueField
  - BlueField-2
  - BlueField-3

Below are detailed device names:

* NVIDIA\ |reg| ConnectX\ |reg|-4 10G MCX4111A-XCAT (1x10G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 10G MCX412A-XCAT (2x10G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 25G MCX4111A-ACAT (1x25G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 25G MCX412A-ACAT (2x25G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 40G MCX413A-BCAT (1x40G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 40G MCX4131A-BCAT (1x40G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 40G MCX415A-BCAT (1x40G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX413A-GCAT (1x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX4131A-GCAT (1x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX414A-BCAT (2x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX415A-GCAT (1x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX416A-BCAT (2x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 50G MCX416A-GCAT (2x50G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 100G MCX415A-CCAT (1x100G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 100G MCX416A-CCAT (2x100G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 Lx 10G MCX4111A-XCAT (1x10G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 Lx 10G MCX4121A-XCAT (2x10G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 Lx 25G MCX4111A-ACAT (1x25G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 Lx 25G MCX4121A-ACAT (2x25G)
* NVIDIA\ |reg| ConnectX\ |reg|-4 Lx 40G MCX4131A-BCAT (1x40G)
* NVIDIA\ |reg| ConnectX\ |reg|-5 100G MCX556A-ECAT (2x100G)
* NVIDIA\ |reg| ConnectX\ |reg|-5 Ex EN 100G MCX516A-CDAT (2x100G)
* NVIDIA\ |reg| ConnectX\ |reg|-6 200G MCX654106A-HCAT (2x200G)
* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx EN 200G MCX623105AN-VDAT (1x200G)
* NVIDIA\ |reg| ConnectX\ |reg|-6 Lx EN 25G MCX631102AN-ADAT (2x25G)
* NVIDIA\ |reg| ConnectX\ |reg|-7 200G CX713106AE-HEA_QP1_Ax (2x200G)
* NVIDIA\ |reg| BlueField\ |reg|-2 25G MBF2H332A-AEEOT_A1 (2x25G)
* NVIDIA\ |reg| BlueField\ |reg|-3 200GbE 900-9D3B6-00CV-AA0 (2x200)
* NVIDIA\ |reg| BlueField\ |reg|-3 200GbE 900-9D3B6-00SV-AA0 (2x200)
* NVIDIA\ |reg| BlueField\ |reg|-3 400GbE 900-9D3B6-00CN-AB0 (2x400)
* NVIDIA\ |reg| BlueField\ |reg|-3 100GbE 900-9D3B4-00CC-EA0 (2x100)
* NVIDIA\ |reg| BlueField\ |reg|-3 100GbE 900-9D3B4-00SC-EA0 (2x100)
* NVIDIA\ |reg| BlueField\ |reg|-3 400GbE 900-9D3B4-00EN-EA0 (1x400)


Design
------

Besides its dependency on libibverbs (that implies libmlx5 and associated
kernel support), librte_net_mlx5 relies heavily on system calls for control
operations such as querying/updating the MTU and flow control parameters.

This capability allows the PMD to coexist with kernel network interfaces
which remain functional, although they stop receiving unicast packets as
long as they share the same MAC address.
This means legacy Linux control tools (for example: ethtool, ifconfig and
more) can operate on the same network interfaces that are owned by the DPDK
application.

See :doc:`../../platform/mlx5` guide for more design details,
including prerequisites installation.

Features
--------

- Multi arch support: x86_64, POWER8, ARMv8, i686.
- Multiple TX and RX queues.
- Shared Rx queue.
- Rx queue delay drop.
- Rx queue available descriptor threshold event.
- Host shaper support.
- Support steering for external Rx queue created outside the PMD.
- Support for scattered TX frames.
- Advanced support for scattered Rx frames with tunable buffer attributes.
- IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
- RSS using different combinations of fields: L3 only, L4 only or both,
  and source only, destination only or both.
- Several RSS hash keys, one for each flow type.
- Default RSS operation with no hash key specification.
- Symmetric RSS function.
- Configurable RETA table.
- Link flow control (pause frame).
- Support for multiple MAC addresses.
- VLAN filtering.
- RX VLAN stripping.
- TX VLAN insertion.
- RX CRC stripping configuration.
- TX mbuf fast free offload.
- Promiscuous mode on PF and VF.
- Multicast promiscuous mode on PF and VF.
- Hardware checksum offloads.
- Flow director (RTE_FDIR_MODE_PERFECT, RTE_FDIR_MODE_PERFECT_MAC_VLAN and
  RTE_ETH_FDIR_REJECT).
- Flow API, including :ref:`flow_isolated_mode`.
- Multiple process.
- KVM and VMware ESX SR-IOV modes are supported.
- RSS hash result is supported.
- Hardware TSO for generic IP or UDP tunnel, including VXLAN and GRE.
- Hardware checksum Tx offload for generic IP or UDP tunnel, including VXLAN and GRE.
- RX interrupts.
- Statistics query including Basic, Extended and per queue.
- Rx HW timestamp.
- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLSoGRE, MPLSoUDP, IP-in-IP, Geneve, GTP.
- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification.
- NIC HW offloads: encapsulation (vxlan, gre, mplsoudp, mplsogre), NAT, routing, TTL
  increment/decrement, count, drop, mark. For details please see :ref:`mlx5_offloads_support`.
- Flow insertion rate of more than one million flows per second, when using Direct Rules.
- Support for multiple rte_flow groups.
- Per packet no-inline hint flag to disable packet data copying into Tx descriptors.
- Hardware LRO.
- Hairpin.
- Multiple-thread flow insertion.
- Matching on IPv4 Internet Header Length (IHL).
- Matching on IPv6 routing extension header.
- Matching on GTP extension header with raw encap/decap action.
- Matching on Geneve TLV option header with raw encap/decap action.
- Matching on ESP header SPI field.
- Matching on InfiniBand BTH.
- Matching on random value.
- Modify IPv4/IPv6 ECN field.
- Push or remove IPv6 routing extension.
- NAT64.
- RSS support in sample action.
- E-Switch mirroring and jump.
- E-Switch mirroring and modify.
- Send to kernel.
- 21844 flow priorities for ingress or egress flow groups greater than 0 and for any transfer
  flow group.
- Flow quota.
- Flow metering, including meter policy API.
- Flow meter hierarchy.
- Flow meter mark.
- Flow integrity offload API.
- Connection tracking.
- Sub-Function representors.
- Sub-Function.
- Matching on represented port.
- Matching on aggregated affinity.
- Matching on external Tx queue.
- Matching on E-Switch manager.


Limitations
-----------

- Windows support:

  On Windows, the features are limited:

  - Promiscuous mode is not supported
  - The following rules are supported:

    - IPv4/UDP with CVLAN filtering
    - Unicast MAC filtering

  - Additional rules are supported from WinOF2 version 2.70:

    - IPv4/TCP with CVLAN filtering
    - L4 steering rules for port RSS of UDP, TCP and IP

- PCI Virtual Function MTU:

  MTU settings on PCI Virtual Functions have no effect.
  The maximum receivable packet size for a VF is determined by the MTU
  configured on its associated Physical Function.
  DPDK applications using VFs must be prepared to handle packets
  up to the maximum size of this PF port.

- For secondary process:

  - Forked secondary processes are not supported.
  - MPRQ is not supported. Callback to free externally attached MPRQ buffer is set
    in a primary process, but has a different virtual address in a secondary process.
    Calling a function at the wrong address leads to a segmentation fault.
  - External memory unregistered in EAL memseg list cannot be used for DMA
    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
    primary process and remapped to the same virtual address in secondary
    process. If the external memory is registered by primary process but has
    a different virtual address in secondary process, unexpected errors may occur.

- Shared Rx queue:

  - Received packet and byte counters are the same for all devices in the same share group.
  - Received packet and byte counters are the same for all queues with the same group and queue ID.

- Available descriptor threshold event:

  - Does not support shared Rx queue and hairpin Rx queue.

- The symmetric RSS function is supported by swapping source and destination
  addresses and ports.

- Host shaper:

  - Supported on BlueField series NICs from BlueField-2 onward.
  - When configuring host shaper with ``RTE_PMD_MLX5_HOST_SHAPER_FLAG_AVAIL_THRESH_TRIGGERED`` flag,
    only rates 0 and 100Mbps are supported.

- HW steering:

  - WQE based high scaling and safer flow insertion/destruction.
  - Set ``dv_flow_en`` to 2 in order to enable HW steering.
  - Only the asynchronous queue-based ``rte_flow_async`` APIs are supported.
  - ConnectX-5 and earlier NICs are not supported.
  - Reconfiguring flow API engine is not supported.
    Any subsequent call to ``rte_flow_configure()`` with a different configuration
    than initially provided will be rejected with ``-ENOTSUP`` error code.
  - Partial match with item template is not supported.
  - IPv6 5-tuple matching is not supported.
  - With E-Switch enabled, ports which share the E-Switch domain
    should be started and stopped in a specific order:

    - When starting ports, the transfer proxy port should be started first
      and port representors should follow.
    - When stopping ports, all of the port representors
      should be stopped before stopping the transfer proxy port.

    If ports are started/stopped in an incorrect order,
    ``rte_eth_dev_start()``/``rte_eth_dev_stop()`` will return an appropriate error code:

    - ``-EAGAIN`` for ``rte_eth_dev_start()``.
    - ``-EBUSY`` for ``rte_eth_dev_stop()``.

  - When matching on ICMP6 following an IPv6 routing extension header,
    match ``ipv6_routing_ext_next_hdr`` instead of ICMP6.
    IPv6 routing extension matching is not supported in flow template relaxed
    matching mode (see ``struct rte_flow_pattern_template_attr::relaxed_matching``).

  - The supported actions order is as below::

          MARK (a)
          *_DECAP (b)
          OF_POP_VLAN
          COUNT | AGE
          METER_MARK | CONNTRACK
          OF_PUSH_VLAN
          MODIFY_FIELD
          *_ENCAP (c)
          JUMP | DROP | RSS (a) | QUEUE (a) | REPRESENTED_PORT (d)

    a. Only supported on ingress.
    b. Any decapsulation action, including the combination of RAW_ENCAP and RAW_DECAP actions
       which results in L3 decapsulation.
       Not supported on egress.
    c. Any encapsulation action, including the combination of RAW_ENCAP and RAW_DECAP actions
       which results in L3 encap.
    d. Only in transfer (switchdev) mode.
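
  As a rough illustration of the points above, HW steering could be enabled
  from the command line as sketched below.
  The PCI address is a placeholder, not a value taken from this guide::

        dpdk-testpmd -a 0000:08:00.0,dv_flow_en=2 -- -i

  A template-based rule could then be inserted through the asynchronous flow commands of testpmd.
  The template and table IDs, queue sizes and rule contents are arbitrary assumptions,
  the port is stopped first on the assumption that flow configuration requires a stopped port,
  and the exact syntax should be checked against the testpmd guide of the DPDK release in use::

        port stop 0
        flow configure 0 queues_number 1 queues_size 64
        port start 0
        flow pattern_template 0 create pattern_template_id 1 ingress template eth / ipv4 / udp / end
        flow actions_template 0 create actions_template_id 1 ingress template drop / end mask drop / end
        flow template_table 0 create table_id 1 group 1 ingress rules_number 64 pattern_template 1 actions_template 1
        flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone no pattern eth / ipv4 / udp / end actions drop / end
        flow pull 0 queue 0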

- When using Verbs flow engine (``dv_flow_en`` = 0), flow pattern without any
  specific VLAN will match for VLAN packets as well:

  When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card.
  Meaning, the flow rule::

        flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...

  will only match VLAN packets with vid=3, and the flow rule::

        flow create 0 ingress pattern eth / ipv4 / end ...

  will match any IPv4 packet (VLAN included).

- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged (QinQ) match is not supported.

- When using DV flow engine (``dv_flow_en`` = 1), flow pattern with any VLAN specification will match only single-tagged packets unless the ETH item ``type`` field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1.
  The flow rule::

        flow create 0 ingress pattern eth / ipv4 / end ...

  will match any IPv4 packet.
  The flow rules::

        flow create 0 ingress pattern eth / vlan / end ...
        flow create 0 ingress pattern eth has_vlan is 1 / end ...
        flow create 0 ingress pattern eth type is 0x8100 / end ...

  will match single-tagged packets only, with any VLAN ID value.
  The flow rules::

        flow create 0 ingress pattern eth type is 0x88A8 / end ...
        flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ...

  will match multi-tagged packets only, with any VLAN ID value.

- A flow pattern with 2 sequential VLAN items is not supported.

- VLAN pop offload command:

  - Flow rules that have a VLAN pop offload command as one of their actions
    but lack a match on VLAN as one of their items are not supported.
  - The command is not supported on egress traffic in NIC mode.

- VLAN push offload is not supported on ingress traffic in NIC mode.

- VLAN set PCP offload is not supported on existing headers.

- A multi-segment packet must not have more segments than reported by ``dev_infos_get()``
  in the ``tx_desc_lim.nb_seg_max`` field. This value depends on the maximal supported Tx descriptor
  size and ``txq_inline_min`` settings and may range from 2 (worst case forced by maximal
  inline settings) to 58.
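
  One way to inspect the reported limit, sketched here with testpmd and assuming port 0,
  is to display the port information, which includes the Tx descriptor limits
  (among them the maximum number of segments per packet)::

        show port info 0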

- Match on VXLAN supports any bits in the tunnel header:

  - Flag 8-bits and first 24-bits reserved fields matching
    is only supported when using DV flow engine (``dv_flow_en`` = 2).
  - For ConnectX-5, the UDP destination port must be the standard one (4789).
  - Default UDP destination is 4789 if not explicitly specified.
  - Behavior in group 0 may differ depending on the FW version.
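
  For illustration, a rule matching a VXLAN network identifier could be written in testpmd
  as follows. The port, VNI value and target queue are arbitrary assumptions;
  the UDP destination port is left unspecified, so the default 4789 applies::

        flow create 0 ingress pattern eth / ipv4 / udp / vxlan vni is 42 / end actions queue index 0 / end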

- Matching on VXLAN-GPE header fields:

     - ``rsvd0``/``rsvd1`` matching support depends on FW version
       when using DV flow engine (``dv_flow_en`` = 1).
     - ``protocol`` should be explicitly specified in HWS (``dv_flow_en`` = 2).

- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.

- MPLSoGRE is not supported in HW steering (``dv_flow_en`` = 2).

- MPLSoUDP with multiple MPLS headers is only supported in HW steering (``dv_flow_en`` = 2).

- Match on Geneve header supports the following fields only:

     - VNI
     - OAM
     - protocol type
     - options length

- Match on Geneve TLV option is supported on the following fields:

     - Class
     - Type
     - Length
     - Data

  Class/Type/Length fields must be specified as well as masks.
  Class/Type/Length specified masks must be full.
  Matching Geneve TLV option without specifying data is not supported.
  Matching Geneve TLV option with ``data & mask == 0`` is not supported.

  In SW steering (``dv_flow_en`` = 1):

     - Only one Class/Type/Length Geneve TLV option is supported per shared device.
     - Supported only with ``FLEX_PARSER_PROFILE_ENABLE`` = 0.

  In HW steering (``dv_flow_en`` = 2):

     - Multiple Class/Type/Length Geneve TLV options are supported per physical device.
     - Multiple instances of the same Geneve TLV option are not supported in the same pattern template.
     - Supported only with ``FLEX_PARSER_PROFILE_ENABLE`` = 8.
     - Supported also with ``FLEX_PARSER_PROFILE_ENABLE`` = 0 for single DW only.
     - Supported for FW version **xx.37.0142** and above.

  .. _geneve_parser_api:

  - An API (``rte_pmd_mlx5_create_geneve_tlv_parser``)
    is available for the flexible parser used in HW steering:

    Each physical device has 7 DWs for GENEVE TLV options.
    Partial option configuration is supported,
    mask for data is provided in parser creation
    indicating which DWs configuration is requested.
    Only masked data DWs can be matched later as item field using flow API.

    - Matching of ``type`` field is supported for each configured option.
    - However, for matching ``class`` field,
      the option should be configured with ``match_on_class_mode=2``.
      One extra DW is consumed for it.
    - Matching on ``length`` field is not supported.

    - More limitations with ``FLEX_PARSER_PROFILE_ENABLE`` = 0:

      - single DW
      - ``sample_len`` must be equal to ``option_len`` and not bigger than 1.
      - ``match_on_class_mode`` different than 1 is not supported.
      - ``offset`` must be 0.

    Although the parser is created per physical device, this API is port oriented.
    Each port should call this API before using GENEVE OPT item,
    but its configuration must use the same options list
    with same internal order configured by first port.

    Calling this API for different ports under same physical device doesn't consume
    more DWs, the first one creates the parser and the rest use same configuration.

- VF: flow rules created on VF devices can only match traffic targeted at the
  configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).

- Match on GTP tunnel header item supports the following fields only:

     - v_pt_rsv_flags: E flag, S flag, PN flag
     - msg_type
     - teid

- Match on GTP extension header only for GTP PDU session container (next
  extension header type = 0x85).
- Match on GTP extension header is not supported in group 0.
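
  A possible testpmd sketch, assuming port 0, TEID 1234, QFI 9 and Rx queue 0
  (all arbitrary values): because the extension header cannot be matched in group 0,
  a first rule jumps to group 1, where the GTP PDU session container is matched::

        flow create 0 ingress group 0 pattern eth / end actions jump group 1 / end
        flow create 0 ingress group 1 pattern eth / ipv4 / udp / gtp teid is 1234 / gtp_psc qfi is 9 / end actions queue index 0 / end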

- When using DV/Verbs flow engine (``dv_flow_en`` = 1/0 respectively),
  match on SPI field in ESP header for group 0 is supported from ConnectX-7.

- Matching on SPI field in ESP header is supported over the PF only.
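
  As a sketch, such a rule could be expressed in testpmd as follows,
  assuming port 0, an arbitrary SPI value and Rx queue 0.
  The rule is placed in group 1, which assumes a separate jump rule from group 0::

        flow create 0 ingress group 1 pattern eth / ipv4 / esp spi is 256 / end actions queue index 0 / end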

- Flex item:

  - Hardware support: **NVIDIA BlueField-2** and **NVIDIA BlueField-3**.
  - Flex item is supported on PF only.
  - Hardware limits ``header_length_mask_width`` up to 6 bits.
  - Firmware supports 8 global sample fields.
    Each flex item allocates non-shared sample fields from that pool.
  - Supported flex item can have 1 input link - ``eth`` or ``udp``
    and up to 3 output links - ``ipv4`` or ``ipv6``.
  - Flex item fields (``next_header``, ``next_protocol``, ``samples``)
    do not participate in RSS hash functions.
  - In flex item configuration, ``next_header.field_base`` value
    must be byte aligned (multiple of 8).
  - When modifying a field with a flex item, the offset must be byte aligned (multiple of 8).

- Match on random value:

  - Supported only with HW Steering enabled (``dv_flow_en`` = 2).
  - Supported only in table with ``nb_flows=1``.
  - NIC ingress/egress flow in group 0 is not supported.
  - Supports matching only 16 bits (LSB).

- Match with compare result item (``RTE_FLOW_ITEM_TYPE_COMPARE``):

  - Only supported in HW steering (``dv_flow_en`` = 2) mode.
  - Only a single flow is supported per flow table.
  - Only a single item is supported per pattern template.
  - In switch mode, when the ``repr_matching_en`` flag is enabled in the devargs
    (which is the default setting),
    the match with compare result item is not supported for ``ingress`` rules.
    This is because an implicit ``REPRESENTED_PORT`` needs to be added to the matcher,
    which conflicts with the single item limitation.
  - Only 32-bit comparison is supported, or 16-bit for the random field.
  - Only supported for ``RTE_FLOW_FIELD_META``, ``RTE_FLOW_FIELD_TAG``,
    ``RTE_FLOW_FIELD_ESP_SEQ_NUM``,
    ``RTE_FLOW_FIELD_RANDOM`` and ``RTE_FLOW_FIELD_VALUE``.
  - The field type ``RTE_FLOW_FIELD_VALUE`` must be the base (``b``) field.
  - The field type ``RTE_FLOW_FIELD_RANDOM`` can only be compared with
    ``RTE_FLOW_FIELD_VALUE``.
476cb25df7cSSuanming Mou
- Tx metadata does not go to the E-Switch steering domain for flow group 0.
  Flows within group 0 that use the set metadata action are rejected by hardware.

.. note::

   MAC addresses not already present in the bridge table of the associated
   kernel network device will be added and cleaned up by the PMD when closing
   the device. In case of ungraceful program termination, some entries may
   remain present and should be removed manually by other means.

- Buffer split offload is supported with the regular Rx burst routine only;
  neither the MPRQ feature nor vectorized code can be engaged.

- When Multi-Packet Rx queue is configured (``mprq_en``), an Rx packet can be
  externally attached to a user-provided mbuf with ``RTE_MBUF_F_EXTERNAL`` set in
  ``ol_flags``. As the mempool for the external buffer is managed by the PMD, all the
  Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
  the external buffers will be freed by the PMD and an application which still
  holds the external buffers may be corrupted.
  User-managed mempools with external pinned data buffers
  cannot be used in conjunction with MPRQ
  since packets may be already attached to PMD-managed external buffers.

- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is
  enabled (``rxq_cqe_comp_en``) at the same time, the RSS hash result is not fully
  supported. Some Rx packets may not have ``RTE_MBUF_F_RX_RSS_HASH`` set.

- IPv6 multicast messages are not supported on a VM while promiscuous mode
  and allmulticast mode are both set to off.
  To receive IPv6 multicast messages on a VM, explicitly set the relevant
  MAC address using the ``rte_eth_dev_mac_addr_add()`` API.

- To support a mixed traffic pattern (some buffers from local host memory, some
  buffers from other devices) with high bandwidth, an mbuf flag is used.

  An application hints the PMD whether or not it should try to inline the
  given mbuf data buffer. The PMD should make a best effort to act upon this request.

  The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic,
  registered by the application with ``rte_mbuf_dynflag_register()``. This flag is
  purely driver-specific and declared in the PMD-specific header ``rte_pmd_mlx5.h``,
  which is intended to be used by the application.

  To query the supported specific flags at runtime,
  the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of
  specific flags currently supported (on the present hardware and configuration).
  The "not inline hint" feature operating flow is the following one:

    - application starts
    - probe the devices, ports are created
    - query the port capabilities
    - if a port supporting the feature is found
    - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE``
    - application starts the ports
    - on ``dev_start()`` the PMD checks whether the feature flag is registered and
      enables the feature support in datapath
    - application might set the registered flag bit in the ``ol_flags`` field
      of the mbuf being sent and the PMD will handle such mbufs appropriately.

- The number of descriptors in a Tx queue may be limited by data inline settings.
  Inline data requires more descriptor building blocks, and the overall block
  count may exceed the limits supported by hardware. The application should
  reduce the requested Tx size or adjust data inline settings with the
  ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.

- To enable packet send scheduling on mbuf timestamps, the ``tx_pp``
  parameter should be specified.
  When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
  being sent, it tries to synchronize the time the packet appears on
  the wire with the specified packet timestamp. If the specified timestamp
  is in the past, it should be ignored; if it is in the distant future,
  it should be capped with some reasonable value (in the range of seconds).
  These specific cases ("too late" and "distant future") can be optionally
  reported via device xstats to help applications detect
  time-related problems.

  The timestamp upper "too-distant-future" limit
  at the moment of invoking the Tx burst routine
  can be estimated as the ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
  Please note, for the testpmd txonly mode,
  the limit is deduced from the expression::

        (n_tx_descriptors / burst_size + 1) * inter_burst_gap

  No packet reordering according to timestamps is performed,
  neither within a packet burst nor between packets; it is entirely
  the application's responsibility to generate packets and their timestamps
  in the desired order. The timestamp can be put only in the first packet
  of the burst to schedule the entire burst.

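The two limit estimates above can be worked through numerically.
The helper names below are hypothetical, and treating the division in the
testpmd expression as integer division is an assumption of this sketch.

```python
def max_future_offset_ns(tx_pp_ns):
    """Estimated "too-distant-future" limit: tx_pp (ns) multiplied by 2^23."""
    return tx_pp_ns * (1 << 23)


def txonly_limit(n_tx_descriptors, burst_size, inter_burst_gap):
    """testpmd txonly expression quoted above.

    Integer division is assumed; the result has the unit of inter_burst_gap.
    """
    return (n_tx_descriptors // burst_size + 1) * inter_burst_gap
```

With ``tx_pp=500``, the first estimate gives 500 * 2^23 = 4194304000 ns,
which is about 4.2 seconds and consistent with the "range of seconds" cap.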
- E-Switch decapsulation Flow:

  - can be applied to PF port only.
  - must specify VF port action (packet redirection from PF to VF).
  - optionally may specify tunnel inner source and destination MAC addresses.

- E-Switch encapsulation Flow:

  - can be applied to VF ports only.
  - must specify PF port action (packet redirection from VF to PF).

- E-Switch Manager matching:

  - For BlueField with old FW
    which doesn't expose the E-Switch Manager vport ID in the capability,
    matching E-Switch Manager should be used only in BlueField embedded CPU mode.

- Raw encapsulation:

  - The input buffer, used as outer header, is not validated.

- Raw decapsulation:

  - The decapsulation is always done up to the outermost tunnel detected by the HW.
  - The input buffer, providing the removal size, is not validated.
  - The buffer size must match the length of the headers to be removed.

- Outer UDP checksum calculation for encapsulation flow actions:

  - Currently available NVIDIA NICs and DPUs do not have the capability to calculate
    the UDP checksum in the header added using encapsulation flow actions.

    Applications are required to use 0 in the UDP checksum field in such flow actions.
    The resulting packet will have an outer UDP checksum equal to 0.

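As an illustration of the requirement above, the outer UDP header bytes placed
in an encapsulation buffer can be built with the checksum field left at 0.
The helper ``outer_udp_header`` is hypothetical; only the standard 8-byte UDP
header layout (source port, destination port, length, checksum) is assumed.

```python
import struct

UDP_HEADER_LEN = 8  # bytes: src port, dst port, length, checksum


def outer_udp_header(src_port, dst_port, payload_len):
    """Pack an outer UDP header with the checksum field set to 0."""
    length = UDP_HEADER_LEN + payload_len
    # Network byte order, four 16-bit fields; checksum is deliberately 0.
    return struct.pack("!HHHH", src_port, dst_port, length, 0)
```

The last two bytes of the returned header are the checksum field,
which stays zero in the resulting packet.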
- ICMP (code/type/identifier/sequence number) / ICMP6 (code/type/identifier/sequence number) matching,
  IP-in-IP and MPLS flow matching are all mutually exclusive features which cannot be supported together
  (see :ref:`mlx5_firmware_config`).

- LRO:

  - Requires DevX and DV flow to be enabled.
  - KEEP_CRC offload cannot be supported with LRO.
  - The first mbuf length, without head-room, must be big enough to include the
    TCP header (122B).
  - An Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward
    it with size limited to max LRO size, not to max Rx packet length.
  - The driver rounds down the port configuration value ``max_lro_pkt_size``
    (from ``rte_eth_rxmode``) to a multiple of 256 due to a hardware limitation.
  - LRO can be used with the outer header of TCP packets of the standard format::

        eth (with or without vlan) / ipv4 or ipv6 / tcp / payload

    Other TCP packets (e.g. with MPLS label) received on an Rx queue with LRO enabled
    will be received with bad checksum.
  - LRO packet aggregation is performed by HW only for packet size larger than
    ``lro_min_mss_size``. This value is reported on device start, when debug
    mode is enabled.

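The rounding of ``max_lro_pkt_size`` described above amounts to the following
arithmetic. The function name is hypothetical and this is not the driver's code,
only a sketch of rounding down to a multiple of 256.

```python
LRO_SIZE_UNIT = 256  # granularity stated in the limitation above


def effective_max_lro_pkt_size(requested):
    """Round the requested max_lro_pkt_size down to a multiple of 256."""
    return (requested // LRO_SIZE_UNIT) * LRO_SIZE_UNIT
```

For example, a requested value of 9000 becomes 8960 (35 * 256),
so an application should not rely on the exact value it configured.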
- CRC:

  - ``RTE_ETH_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation
    for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, ConnectX-7, BlueField-2,
    and BlueField-3).
    The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support.

- Tx mbuf fast free:

  - Fast free offload assumes that all mbufs being sent originate from the
    same memory pool and that there are no extra references to the mbufs (the
    reference counter for each mbuf is equal to 1 on the tx_burst call). The latter
    means there should be no externally attached buffers in mbufs. It is
    the application's responsibility to provide the correct mbufs if the fast
    free offload is engaged. The mlx5 PMD implicitly produces mbufs with
    externally attached buffers if the MPRQ option is enabled; hence, the fast
    free offload is neither supported nor advertised if MPRQ is enabled.

- Sample flow:

  - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and
    E-Switch steering domain.
  - In E-Switch steering domain, for sampling with sample ratio > 1 in a transfer rule,
    additional actions are not supported in the sample actions list.
  - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as
    the first action in the E-Switch egress flow if combined with header modify or
    encapsulation actions.
  - For NIC Rx flow, supports only ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the
    sample actions list.
  - In E-Switch steering domain, for mirroring with sample ratio = 1 in a transfer rule,
    supports only ``RAW_ENCAP``, ``PORT_ID``, ``REPRESENTED_PORT``, ``VXLAN_ENCAP``, ``NVGRE_ENCAP``
    in the sample actions list.
  - In E-Switch steering domain, for mirroring with sample ratio = 1 in a transfer rule,
    the encapsulation actions (``RAW_ENCAP`` or ``VXLAN_ENCAP`` or ``NVGRE_ENCAP``)
    support uplink port only.
  - In E-Switch steering domain, for mirroring with sample ratio = 1 in a transfer rule,
    the port actions (``PORT_ID`` or ``REPRESENTED_PORT``) with uplink port and ``JUMP`` action
    are not supported without the encapsulation actions
    (``RAW_ENCAP`` or ``VXLAN_ENCAP`` or ``NVGRE_ENCAP``) in the sample actions list.
  - For ConnectX-5 trusted device, the application metadata with ``SET_TAG`` index 0
    is not supported before ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action.

- Modify Field flow:

  - Supports the 'set' and 'add' operations for ``RTE_FLOW_ACTION_TYPE_MODIFY_FIELD`` action.
  - Modification of an arbitrary place in a packet via the special ``RTE_FLOW_FIELD_START`` Field ID is not supported.
  - Modify field action using ``RTE_FLOW_FIELD_RANDOM`` is not supported.
  - Modification of the 802.1Q tag is not supported.
  - Modification of VXLAN network or GENEVE network ID is supported only for HW steering.
  - Modification of the VXLAN header is supported with the following limitations:

    - Only for HW steering (``dv_flow_en=2``).
    - Supports VNI and the last reserved byte modifications for traffic
      with default UDP destination port: 4789 for VXLAN and VXLAN-GBP, 4790 for VXLAN-GPE.

  - Modification of GENEVE network ID is not supported when the configured
    ``FLEX_PARSER_PROFILE_ENABLE`` supports Geneve TLV options.
    See :ref:`mlx5_firmware_config` for more flex parser information.
  - Modification of GENEVE TLV option fields is supported only for HW steering.
    Only DWs configured in :ref:`parser creation <geneve_parser_api>` can be modified,
    'type' and 'class' fields can be modified when ``match_on_class_mode=2``.
  - Modification of GENEVE TLV option data supports one DW per action.
  - Offsets cannot skip past the boundary of a field.
  - If the field type is ``RTE_FLOW_FIELD_MAC_TYPE``
    and the packet contains one or more VLAN headers,
    the meaningful type field following the last VLAN header
    is used as the modify field operation argument.
    The modify field action is not intended to modify the VLAN headers type field,
    dedicated VLAN push and pop actions should be used instead.
  - For packet fields (e.g. MAC addresses, IPv4 addresses or L4 ports)
    offset specifies the number of bits to skip from the field's start,
    starting from MSB in the first byte, in the network order.
  - For flow metadata fields (e.g. META or TAG)
    offset specifies the number of bits to skip from the field's start,
    starting from LSB in the least significant byte, in the host order.
  - Modification of the MPLS header is supported with some limitations:

    - Only in HW steering.
    - Only in ``src`` field.
    - Only for outermost tunnel header (``level=2``).
      For ``RTE_FLOW_FIELD_MPLS``,
      the default encapsulation level ``0`` describes the outermost tunnel header.

      .. note::

         The default encapsulation level ``0`` describes
         the "outermost that match is supported",
         currently it is the first tunnel,
         but it can be changed to outer when it is supported.

  - Default encapsulation level ``0`` describes outermost.
  - Encapsulation level ``2`` is supported with some limitations:

    - Only in HW steering.
    - Only in ``src`` field.
    - ``RTE_FLOW_FIELD_VLAN_ID`` is not supported.
    - ``RTE_FLOW_FIELD_IPV4_PROTO`` is not supported.
    - ``RTE_FLOW_FIELD_IPV6_PROTO/DSCP/ECN`` are not supported.
    - ``RTE_FLOW_FIELD_ESP_PROTO/SPI/SEQ_NUM`` are not supported.
    - ``RTE_FLOW_FIELD_TCP_SEQ/ACK_NUM`` are not supported.
    - Second tunnel fields are not supported.

  - Encapsulation levels greater than ``2`` are not supported.

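The two offset conventions for modify field (MSB-first for packet fields,
LSB-first for flow metadata fields) can be illustrated with plain bit arithmetic.
Both helpers are hypothetical, and the 32-bit default width for metadata fields
is an assumption of this sketch.

```python
def packet_field_bits(field_bytes, offset, width):
    """Select `width` bits of a packet field given in network order.

    `offset` bits are skipped starting from the MSB of the first byte.
    """
    total = len(field_bytes) * 8
    shift = total - offset - width
    if offset < 0 or width <= 0 or shift < 0:
        raise ValueError("offset/width skip past the boundary of the field")
    value = int.from_bytes(field_bytes, "big")
    return (value >> shift) & ((1 << width) - 1)


def metadata_field_bits(value, offset, width, field_width=32):
    """Select `width` bits of a metadata field (host order).

    `offset` bits are skipped starting from the LSB.
    """
    if offset < 0 or width <= 0 or offset + width > field_width:
        raise ValueError("offset/width skip past the boundary of the field")
    return (value >> offset) & ((1 << width) - 1)
```

With the MAC address ``00:11:22:33:44:55``, offset 8 and width 16 select ``0x1122``
(the second and third bytes), while for a metadata value ``0x12345678``
the same offset and width select ``0x3456``.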
- Age action:

  - with HW steering (``dv_flow_en=2``)

    - Using the same indirect count action combined with multiple age actions
      in different flows may cause a wrong age state for the age actions.
    - Creating/destroying flow rules with indirect age action when it is active
      (timeout != 0) may cause a wrong age state for the indirect age action.

    - The driver reuses counters for aging action, so for optimization
      the values in ``rte_flow_port_attr`` structure should describe:

      - ``nb_counters`` is the number of flow rules using counter (with/without age)
        in addition to flow rules using only age (without count action).
      - ``nb_aging_objects`` is the number of flow rules containing age action.

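The sizing rule for ``nb_counters`` and ``nb_aging_objects`` can be expressed as
simple sums over the planned rule counts. The function and its three input
categories are hypothetical; the dictionary keys only echo the
``rte_flow_port_attr`` field names mentioned above.

```python
def port_attr_for_aging(count_only, count_and_age, age_only):
    """Derive counter and aging object totals from planned rule counts.

    count_only:    rules with a count action and no age action
    count_and_age: rules with both count and age actions
    age_only:      rules with an age action and no count action
    """
    return {
        # Counter users (with/without age) plus age-only rules.
        "nb_counters": count_only + count_and_age + age_only,
        # Every rule that contains an age action.
        "nb_aging_objects": count_and_age + age_only,
    }
```

For 100 count-only rules, 50 rules with both actions and 25 age-only rules,
this yields 175 counters and 75 aging objects.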
- The IPv6 header item 'proto' field, indicating the next header protocol, should
  not be set to an extension header.
  In case the next header is an extension header, it should not be specified in
  the IPv6 header item 'proto' field.
  The 'next header' field of the last extension header item can specify the following
  header protocol type.

- Match on IPv6 routing extension header supports the following fields only:

  - ``type``
  - ``next_hdr``
  - ``segments_left``

  Only supports HW steering (``dv_flow_en=2``).

- IPv6 routing extension push/remove:

  - Supported only with HW Steering enabled (``dv_flow_en=2``).
  - Supported in non-zero group
    (no limits on transfer domain if ``fdb_def_rule_en=1`` which is default).
  - Only supports TCP or UDP as next layer.
  - IPv6 routing header must be the only present extension.
  - Not supported on guest port.

- NAT64 action:

  - Supported only with HW Steering enabled (``dv_flow_en`` = 2).
  - FW version: at least ``XX.39.1002``.
  - Supported only on non-root table.
  - Actions order limitation should follow the modify fields action.
  - The last 2 TAG registers will be used implicitly in address backup mode.
  - Even if the action can be shared, new steering entries will be created per flow rule.
    It is recommended to share a single rule with NAT64
    to reduce the duplication of entries.
    The default address and other fields conversion will be handled with the NAT64 action.
    To support other addresses, new rule(s) with modify fields on the IP addresses should be created.
  - TOS / Traffic Class is not supported now.

- Hairpin:

  - Hairpin between two ports supports only manual binding and explicit Tx flow mode.
    For single port hairpin, all the combinations of auto/manual binding
    and explicit/implicit Tx flow mode are supported.
  - Hairpin in switchdev SR-IOV mode is not supported yet.
  - ``out_of_buffer`` statistics are not available on:

    - NICs older than ConnectX-7.
    - DPUs older than BlueField-3.

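The hairpin binding rule above reduces to a small predicate.
The function and its boolean parameters are hypothetical names used for
illustration; they do not correspond to fields of any DPDK structure.

```python
def hairpin_config_supported(two_ports, manual_bind, explicit_tx_flow):
    """Check a hairpin configuration against the limitation above.

    Two-port hairpin needs manual binding and explicit Tx flow mode;
    single-port hairpin accepts every combination.
    """
    if two_ports:
        return manual_bind and explicit_tx_flow
    return True
```

For example, a two-port hairpin with automatic binding is rejected,
whereas the same settings on a single port are accepted.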
- Quota:

  - Quota is implemented for HWS / template API.
  - Maximal value for quota SET and ADD operations is INT32_MAX (2GB).
  - Application cannot use 2 consecutive ADD updates.
    The next tokens update after ADD must always be SET.
  - Quota flow action cannot be used with Meter or CT flow actions in the same rule.
  - Quota flow action and item are supported in non-root HWS tables.
  - Maximal number of HW quota and HW meter objects <= 16e6.

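The update-ordering and value limits for quota can be sketched as a check over
a planned sequence of updates. ``validate_quota_updates`` and the
``("SET" | "ADD", value)`` tuples are hypothetical and model only the two rules
stated above.

```python
INT32_MAX = 2**31 - 1


def validate_quota_updates(updates):
    """Validate a sequence of (operation, tokens) quota updates.

    Raises ValueError when a value exceeds INT32_MAX or when an ADD
    update directly follows another ADD update.
    """
    previous_op = None
    for op, tokens in updates:
        if op not in ("SET", "ADD"):
            raise ValueError(f"unknown quota operation: {op}")
        if tokens > INT32_MAX:
            raise ValueError("quota value exceeds INT32_MAX")
        if op == "ADD" and previous_op == "ADD":
            raise ValueError("the update after ADD must be SET")
        previous_op = op
    return True
```

A sequence such as SET, ADD, SET passes, while two ADD updates in a row
raise ``ValueError``.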
- Meter:

  - All the meter colors with drop action will be counted only by the global drop statistics.
  - Yellow detection is only supported with ASO metering.
  - Red color must be with drop action.
  - Meter statistics are supported only for drop case.
  - A meter action created with a pre-defined policy must be the last action in the flow,
    except for the single case where the policy actions are:

    - green: NULL or END.
    - yellow: NULL or END.
    - red: DROP / END.

  - The only supported meter policy actions:

    - green: QUEUE, RSS, PORT_ID, REPRESENTED_PORT, JUMP, DROP, MODIFY_FIELD, MARK, METER and SET_TAG.
    - yellow: QUEUE, RSS, PORT_ID, REPRESENTED_PORT, JUMP, DROP, MODIFY_FIELD, MARK, METER and SET_TAG.
    - red: must be DROP.

  - Policy actions of RSS for green and yellow should have the same configuration except queues.
  - Policy with RSS/queue action is not supported when ``dv_xmeta_en`` is enabled.
  - If green action is METER, yellow action must be the same METER action or NULL.
  - Meter profile packet mode is supported.
  - Meter profiles of RFC2697, RFC2698 and RFC4115 are supported.
  - RFC4115 implementation follows MEF, meaning yellow traffic may reclaim unused green bandwidth when the green token bucket is full.
  - When using DV flow engine (``dv_flow_en`` = 1),
    if meter has drop count
    or meter hierarchy contains any meter that uses drop count,
    it cannot be used by flow rule matching all ports.
  - When using DV flow engine (``dv_flow_en`` = 1),
    if meter hierarchy contains any meter that has MODIFY_FIELD/SET_TAG,
    it cannot be used by flow matching all ports.
  - When using HWS flow engine (``dv_flow_en`` = 2),
    only meter mark action is supported.

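The per-color list of supported meter policy actions lends itself to a simple
lookup check. The sketch below is hypothetical and covers only the
allowed-action lists and the "red must be DROP" rule; the remaining meter
limitations (RSS configuration, nested METER consistency, and so on) are not modeled.

```python
# Action names are plain strings mirroring the list above; not a DPDK API.
GREEN_YELLOW_ACTIONS = {
    "QUEUE", "RSS", "PORT_ID", "REPRESENTED_PORT", "JUMP",
    "DROP", "MODIFY_FIELD", "MARK", "METER", "SET_TAG",
}


def check_meter_policy(green, yellow, red):
    """Return a list of violations for the given per-color action lists."""
    errors = []
    for color, actions in (("green", green), ("yellow", yellow)):
        for action in actions:
            if action not in GREEN_YELLOW_ACTIONS:
                errors.append(f"{color}: unsupported action {action}")
    if red != ["DROP"]:
        errors.append("red: must be DROP")
    return errors
```

An empty result means the policy passes these two checks;
for example, a ``COUNT`` action in the green list is reported as unsupported.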
- Ptype:

  - Only supports HW steering (``dv_flow_en=2``).
  - The supported values are:

    - L2: ``RTE_PTYPE_L2_ETHER``, ``RTE_PTYPE_L2_ETHER_VLAN``, ``RTE_PTYPE_L2_ETHER_QINQ``
    - L3: ``RTE_PTYPE_L3_IPV4``, ``RTE_PTYPE_L3_IPV6``
    - L4: ``RTE_PTYPE_L4_TCP``, ``RTE_PTYPE_L4_UDP``, ``RTE_PTYPE_L4_ICMP``

    and their ``RTE_PTYPE_INNER_XXX`` counterparts as well as ``RTE_PTYPE_TUNNEL_ESP``.
    Any other values are not supported. Using them as a value will cause unexpected behavior.
  - Matching on both outer and inner IP fragmented is supported
    using ``RTE_PTYPE_L4_FRAG`` and ``RTE_PTYPE_INNER_L4_FRAG`` values.
    They are not part of L4 types, so they should be provided explicitly
    as a mask value during pattern template creation.
    Providing ``RTE_PTYPE_L4_MASK`` during pattern template creation
    and ``RTE_PTYPE_L4_FRAG`` during flow rule creation
    will cause unexpected behavior.

- Integrity:

  - Verification bits provided by the hardware are ``l3_ok``, ``ipv4_csum_ok``, ``l4_ok``, ``l4_csum_ok``.
  - ``level`` value 0 references outer headers.
  - Negative integrity item verification is not supported.

  - With SW steering (``dv_flow_en=1``)

    - Integrity offload is enabled starting from **ConnectX-6 Dx**.
    - Multiple integrity items are not supported in a single flow rule.
    - Flow rule items supplied by the application must explicitly specify
      network headers referred by the integrity item.

      For example, if the integrity item mask sets ``l4_ok`` or ``l4_csum_ok`` bits,
      a reference to the L4 network header, TCP or UDP, must be in the rule pattern as well::

         flow create 0 ingress pattern integrity level is 0 value mask l3_ok value spec l3_ok / eth / ipv6 / end ...
         flow create 0 ingress pattern integrity level is 0 value mask l4_ok value spec l4_ok / eth / ipv4 proto is udp / end ...

  - With HW steering (``dv_flow_en=2``)

    - The ``l3_ok`` field represents all L3 checks, but nothing about IPv4 checksum.
    - The ``l4_ok`` field represents all L4 checks including L4 checksum.

- Connection tracking:

  - Cannot co-exist with ASO meter, ASO age action in a single flow rule.
  - Flow rules insertion rate and memory consumption need more optimization.
  - 16 ports maximum (with ``dv_flow_en=1``).
  - 32M connections maximum.

- Multi-thread flow insertion:

  - In order to achieve the best insertion rate, the application should manage the flows per lcore.
  - It is better to disable memory reclaim by setting ``reclaim_mem_mode`` to 0
    to accelerate the flow object allocation and release with cache.

- HW hashed bonding

  - TXQ affinity is subject to HW hash once enabled.

- Bonding under socket direct mode

  - Needs MLNX_OFED 5.4+.

- Match on aggregated affinity:

  - Supports NIC ingress flow in group 0.
  - Supports E-Switch flow in group 0 and depends on
    device-managed flow steering (DMFS) mode.

- Timestamps:

  - CQE timestamp field width is limited by hardware to 63 bits, MSB is zero.
  - In the free-running mode the timestamp counter is reset on power on
    and the 63-bit value provides over 1800 years of uptime until overflow.
  - In the real-time mode
    (configurable with ``REAL_TIME_CLOCK_ENABLE`` firmware settings),
    the timestamp presents the nanoseconds elapsed since 01-Jan-1970,
    hardware timestamp overflow will happen on 19-Jan-2038
    (0x80000000 seconds since 01-Jan-1970).
  - The send scheduling is based on timestamps
    from the reference "Clock Queue" completions,
    the scheduled send timestamps should not be specified with non-zero MSB.

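The real-time overflow date and the "MSB must be zero" rule can be checked with
a short calculation. Both helper names are hypothetical, and treating the
scheduled send timestamp as a 64-bit value is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def realtime_overflow():
    """UTC time that is 0x80000000 seconds after 01-Jan-1970."""
    return EPOCH + timedelta(seconds=0x80000000)


def has_zero_msb(timestamp):
    """True if a 64-bit timestamp fits in 63 bits (MSB is zero)."""
    return 0 <= timestamp < (1 << 63)
```

``realtime_overflow()`` falls on 19-Jan-2038, matching the date quoted above,
and ``has_zero_msb`` can be used to reject send timestamps with the MSB set.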
- Match on GRE header supports the following fields:

  - c_rsvd0_v: C bit, K bit, S bit
  - protocol type
  - checksum
  - key
  - sequence

  Matching on checksum and sequence needs MLNX_OFED 5.6+.

- Matching on NVGRE header:

  - c_rc_k_s_rsvd0_ver
  - protocol
  - tni
  - flow_id

  In SW steering (``dv_flow_en`` = 1), only tni is supported.
  In HW steering (``dv_flow_en`` = 2), all fields are supported.

- NIC egress flow rules on a representor port are not supported.

- In switch mode, a flow rule matching the ``RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT`` item
  with port ID ``UINT16_MAX`` matches packets sent by the E-Switch manager from software.
  Needs MLNX_OFED 24.04+.
9354cbeba6fSSuanming Mou
9363dce73a2SSuanming Mou- A driver limitation for ``RTE_FLOW_ACTION_TYPE_PORT_REPRESENTOR`` action
9373dce73a2SSuanming Mou  restricts the ``port_id`` configuration to only accept the value ``0xffff``,
9383dce73a2SSuanming Mou  indicating the E-Switch manager.
9393dce73a2SSuanming Mou  If the ``repr_matching_en`` flag is enabled, the traffic will be directed
9403dce73a2SSuanming Mou  to the representor of the source virtual port (SF/VF), while if it is disabled,
9413dce73a2SSuanming Mou  the traffic will be routed based on the steering rules in the ingress domain.
9423dce73a2SSuanming Mou
943b2cd3918SJiawei Wang- Send to kernel action (``RTE_FLOW_ACTION_TYPE_SEND_TO_KERNEL``):
944b2cd3918SJiawei Wang
945b2cd3918SJiawei Wang  - Supported on non-root table.
946b2cd3918SJiawei Wang  - Supported in isolated mode.
  - In HW steering (``dv_flow_en`` = 2), not supported on guest port.
949b2cd3918SJiawei Wang
- During live migration to a new process with its flow engine set to standby mode,
  the user should only program flow rules in group 0 (``fdb_def_rule_en=0``).
  Live migration is only supported under SWS (``dv_flow_en=1``).
  Flow group 0 is shared between DPDK processes,
  while the other flow groups are limited to the current process.
  The flow engine of a process cannot move from active to standby mode,
  or vice versa, if preceding active application rules are still present.
9572eece379SRongwei Liu
9585c4d4917SSean Zhang
959abdb903fSShahaf ShulerStatistics
960abdb903fSShahaf Shuler----------
961abdb903fSShahaf Shuler
96223bdbfecSThomas MonjalonMLX5 supports various methods to report statistics:
963abdb903fSShahaf Shuler
Port statistics can be queried using ``rte_eth_stats_get()``.
The received and sent statistics are maintained in software only
and count the number of packets received or sent successfully by the PMD.
The ``imissed`` counter is the number of packets that could not be delivered
to software because a queue was full.
Packets not received due to congestion on the bus or on the NIC
can be queried via the ``rx_discards_phy`` xstats counter.

Extended statistics can be queried using ``rte_eth_xstats_get()``.
The extended statistics expose a wider set of counters counted by the device.
The extended port statistics count the number of packets
received or sent successfully by the port.
As NVIDIA NICs are using the :ref:`Bifurcated Linux Driver <linux_gsg_linux_drivers>`,
those counters also count packets received or sent by the Linux kernel.
The counters with the ``_phy`` suffix count the total events on the physical port
and are therefore not valid for a VF.

Finally, per-flow statistics can be queried using ``rte_flow_query``
when attaching a count action to a specific flow.
The flow counter counts the number of packets
received successfully by the port that match the specific flow.
969abdb903fSShahaf Shuler
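Since counters with the ``_phy`` suffix are not valid for a VF, an application that walks the extended statistics names may want to skip them on VF ports. The sketch below is a plain string check and does not call any DPDK API; the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if an xstats counter name
 * ends with the "_phy" suffix (physical port counter).
 */
static bool
example_xstat_is_phy(const char *name)
{
    static const char suffix[] = "_phy";
    size_t len = strlen(name);
    size_t slen = sizeof(suffix) - 1; /* length without the terminating NUL */

    return len >= slen && strcmp(name + len - slen, suffix) == 0;
}
```

For example, ``example_xstat_is_phy("rx_discards_phy")`` is true, so that counter would be ignored when the port is a VF.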
970a3ade5e3SMichael Baum
971*4843aacbSViacheslav OvsiienkoExtended Statistics Counters
972*4843aacbSViacheslav Ovsiienko~~~~~~~~~~~~~~~~~~~~~~~~~~~~
973*4843aacbSViacheslav Ovsiienko
974*4843aacbSViacheslav OvsiienkoSend Scheduling Counters
975*4843aacbSViacheslav Ovsiienko^^^^^^^^^^^^^^^^^^^^^^^^
976*4843aacbSViacheslav Ovsiienko
977*4843aacbSViacheslav OvsiienkoThe mlx5 PMD provides a comprehensive set of counters designed for
978*4843aacbSViacheslav Ovsiienkodebugging and diagnostics related to packet scheduling during transmission.
979*4843aacbSViacheslav OvsiienkoThese counters are applicable only if the port was configured with the ``tx_pp`` devarg
980*4843aacbSViacheslav Ovsiienkoand reflect the status of the PMD scheduling infrastructure
based on Clock and Rearm Queues, used as a workaround on ConnectX-6 Dx NICs.
982*4843aacbSViacheslav Ovsiienko
983*4843aacbSViacheslav Ovsiienko``tx_pp_missed_interrupt_errors``
984*4843aacbSViacheslav Ovsiienko  Indicates that the Rearm Queue interrupt was not serviced on time.
985*4843aacbSViacheslav Ovsiienko  The EAL manages interrupts in a dedicated thread,
986*4843aacbSViacheslav Ovsiienko  and it is possible that other time-consuming actions were being processed concurrently.
987*4843aacbSViacheslav Ovsiienko
988*4843aacbSViacheslav Ovsiienko``tx_pp_rearm_queue_errors``
989*4843aacbSViacheslav Ovsiienko  Signifies hardware errors that occurred on the Rearm Queue,
990*4843aacbSViacheslav Ovsiienko  typically caused by delays in servicing interrupts.
991*4843aacbSViacheslav Ovsiienko
992*4843aacbSViacheslav Ovsiienko``tx_pp_clock_queue_errors``
993*4843aacbSViacheslav Ovsiienko  Reflects hardware errors on the Clock Queue,
994*4843aacbSViacheslav Ovsiienko  which usually indicate configuration issues
995*4843aacbSViacheslav Ovsiienko  or problems with the internal NIC hardware or firmware.
996*4843aacbSViacheslav Ovsiienko
997*4843aacbSViacheslav Ovsiienko``tx_pp_timestamp_past_errors``
  Tracks attempts by the application to send packets with timestamps set in the past.
999*4843aacbSViacheslav Ovsiienko  It is useful for debugging application code
1000*4843aacbSViacheslav Ovsiienko  and does not indicate a malfunction of the PMD.
1001*4843aacbSViacheslav Ovsiienko
1002*4843aacbSViacheslav Ovsiienko``tx_pp_timestamp_future_errors``
1003*4843aacbSViacheslav Ovsiienko  Records attempts by the application to send packets
1004*4843aacbSViacheslav Ovsiienko  with timestamps set too far into the future,
1005*4843aacbSViacheslav Ovsiienko  exceeding the hardware’s scheduling capabilities.
1006*4843aacbSViacheslav Ovsiienko  Like the previous counter, it aids in application debugging
1007*4843aacbSViacheslav Ovsiienko  without suggesting a PMD malfunction.
1008*4843aacbSViacheslav Ovsiienko
1009*4843aacbSViacheslav Ovsiienko``tx_pp_jitter``
1010*4843aacbSViacheslav Ovsiienko  Measures the internal NIC real-time clock jitter estimation
1011*4843aacbSViacheslav Ovsiienko  between two consecutive Clock Queue completions, expressed in nanoseconds.
1012*4843aacbSViacheslav Ovsiienko  Significant jitter may signal potential clock synchronization issues,
1013*4843aacbSViacheslav Ovsiienko  possibly due to inappropriate adjustments
1014*4843aacbSViacheslav Ovsiienko  made by a system PTP (Precision Time Protocol) agent.
1015*4843aacbSViacheslav Ovsiienko
1016*4843aacbSViacheslav Ovsiienko``tx_pp_wander``
1017*4843aacbSViacheslav Ovsiienko  Indicates the long-term stability of the internal NIC real-time clock
1018*4843aacbSViacheslav Ovsiienko  over 2^24 completions, measured in nanoseconds.
1019*4843aacbSViacheslav Ovsiienko  Significant wander may also suggest clock synchronization problems.
1020*4843aacbSViacheslav Ovsiienko
1021*4843aacbSViacheslav Ovsiienko``tx_pp_sync_lost``
1022*4843aacbSViacheslav Ovsiienko  A general operational indicator;
1023*4843aacbSViacheslav Ovsiienko  a non-zero value indicates that the driver has lost synchronization with the Clock Queue,
1024*4843aacbSViacheslav Ovsiienko  resulting in improper scheduling operations.
1025*4843aacbSViacheslav Ovsiienko  To restore correct scheduling functionality, it is necessary to restart the port.
1026*4843aacbSViacheslav Ovsiienko
1027*4843aacbSViacheslav OvsiienkoThe following counters are particularly valuable for verifying and debugging application code.
1028*4843aacbSViacheslav OvsiienkoThey do not indicate driver or hardware malfunctions
1029*4843aacbSViacheslav Ovsiienkoand are applicable to newer hardware with direct on-time scheduling capabilities
1030*4843aacbSViacheslav Ovsiienko(such as ConnectX-7 and above):
1031*4843aacbSViacheslav Ovsiienko
1032*4843aacbSViacheslav Ovsiienko``tx_pp_timestamp_order_errors``
1033*4843aacbSViacheslav Ovsiienko  Indicates attempts by the application to send packets
1034*4843aacbSViacheslav Ovsiienko  with timestamps that are not in strictly ascending order.
1035*4843aacbSViacheslav Ovsiienko  Since the PMD does not reorder packets within hardware queues,
1036*4843aacbSViacheslav Ovsiienko  violations of timestamp order can lead to packets being sent at incorrect times.
1037*4843aacbSViacheslav Ovsiienko
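Timestamp ordering can also be verified in application code before a burst is handed to the PMD. The following is a minimal sketch with a hypothetical function name. It takes "strictly ascending" literally, so equal consecutive timestamps are counted as violations; the exact comparison used by the driver for ``tx_pp_timestamp_order_errors`` may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: counts positions where a timestamp is not
 * strictly greater than the previous one in the same queue.
 */
static size_t
example_count_order_violations(const uint64_t *ts, size_t n)
{
    size_t violations = 0;

    for (size_t i = 1; i < n; i++) {
        if (ts[i] <= ts[i - 1])
            violations++;
    }
    return violations;
}
```

A non-zero result in a debug build points at the application's scheduling logic rather than at the PMD or the hardware.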
1038*4843aacbSViacheslav Ovsiienko
1039a3ade5e3SMichael BaumCompilation
1040a3ade5e3SMichael Baum-----------
1041a3ade5e3SMichael Baum
1042a3ade5e3SMichael BaumSee :ref:`mlx5 common compilation <mlx5_common_compilation>`.
1043a3ade5e3SMichael Baum
1044a3ade5e3SMichael Baum
1045a7e11a0cSAdrien MazarguilConfiguration
1046a7e11a0cSAdrien Mazarguil-------------
1047a7e11a0cSAdrien Mazarguil
1048a3ade5e3SMichael BaumEnvironment Configuration
1049a3ade5e3SMichael Baum~~~~~~~~~~~~~~~~~~~~~~~~~
1050a7e11a0cSAdrien Mazarguil
1051a3ade5e3SMichael BaumSee :ref:`mlx5 common configuration <mlx5_common_env>`.
1052a7e11a0cSAdrien Mazarguil
1053a3ade5e3SMichael BaumFirmware configuration
1054a7e11a0cSAdrien Mazarguil~~~~~~~~~~~~~~~~~~~~~~
1055a7e11a0cSAdrien Mazarguil
1056a3ade5e3SMichael BaumSee :ref:`mlx5_firmware_config` guide.
1057f772cc42SThomas Monjalon
1058b583b9a1SFerruh YigitRuntime Configuration
1059b583b9a1SFerruh Yigit~~~~~~~~~~~~~~~~~~~~~
1060a3ade5e3SMichael Baum
1061a3ade5e3SMichael BaumPlease refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
1062a3ade5e3SMichael Baumfor an additional list of options shared with other mlx5 drivers.
1063f772cc42SThomas Monjalon
106499c12dccSNélio Laranjeiro- ``rxq_cqe_comp_en`` parameter [int]
106599c12dccSNélio Laranjeiro
  A nonzero value enables the compression of CQE on RX side. This feature
  saves PCI bandwidth and improves performance. Enabled by default.
106854c2d46bSAlexander Kozyrev  Different compression formats are supported in order to achieve the best
1069fdc44cdcSAlexander Kozyrev  performance for different traffic patterns. Default format depends on
1070fdc44cdcSAlexander Kozyrev  Multi-Packet Rx queue configuration: Hash RSS format is used in case
1071fdc44cdcSAlexander Kozyrev  MPRQ is disabled, Checksum format is used in case MPRQ is enabled.
107254c2d46bSAlexander Kozyrev
107399532fb1SAlexander Kozyrev  The lower 3 bits define the CQE compression format:
107499532fb1SAlexander Kozyrev  Specifying 2 in these bits of the ``rxq_cqe_comp_en`` parameter selects
107599532fb1SAlexander Kozyrev  the flow tag format for better compression rate in case of flow mark traffic.
107699532fb1SAlexander Kozyrev  Specifying 3 in these bits selects checksum format.
107799532fb1SAlexander Kozyrev  Specifying 4 in these bits selects L3/L4 header format for
107854c2d46bSAlexander Kozyrev  better compression rate in case of mixed TCP/UDP and IPv4/IPv6 traffic.
1079fdc44cdcSAlexander Kozyrev  CQE compression format selection requires DevX to be enabled. If there is
1080fdc44cdcSAlexander Kozyrev  no DevX enabled/supported the value is reset to 1 by default.
108199c12dccSNélio Laranjeiro
108299532fb1SAlexander Kozyrev  8th bit defines the CQE compression layout.
108399532fb1SAlexander Kozyrev  Setting this bit to 1 turns enhanced CQE compression layout on.
108499532fb1SAlexander Kozyrev  Enhanced CQE compression is designed for better latency and SW utilization.
108599532fb1SAlexander Kozyrev  This bit is ignored if only the basic CQE compression layout is supported.
108699532fb1SAlexander Kozyrev
108713726648SOlga Shern  Supported on:
108813726648SOlga Shern
10896c21c887SRaslan Darawsheh  - x86_64 with ConnectX-4, ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
1090cb0da841SRaslan Darawsheh    ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2, and BlueField-3.
10916c21c887SRaslan Darawsheh  - POWER9 and ARMv8 with ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
    ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2, and BlueField-3.
109313726648SOlga Shern
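The bit layout of ``rxq_cqe_comp_en`` described above can be illustrated by composing the value from its two parts. This is only a sketch: the names are hypothetical, and it assumes that "8th bit" means bit 7 when counting from zero (mask ``0x80``).

```c
#include <stdint.h>

/* Format codes taken from the text above; the identifiers are hypothetical. */
enum example_cqe_fmt {
    EXAMPLE_CQE_FMT_FLOW_TAG = 2, /* flow tag format */
    EXAMPLE_CQE_FMT_CSUM = 3,     /* checksum format */
    EXAMPLE_CQE_FMT_L34H = 4,     /* L3/L4 header format */
};

#define EXAMPLE_CQE_FMT_MASK   0x07u /* lower 3 bits: compression format */
#define EXAMPLE_CQE_ENH_LAYOUT 0x80u /* assumption: "8th bit" is bit 7 */

/* Builds the integer to pass as the rxq_cqe_comp_en devarg value. */
static inline uint32_t
example_rxq_cqe_comp_en(uint32_t fmt, int enhanced)
{
    return (fmt & EXAMPLE_CQE_FMT_MASK) |
           (enhanced ? EXAMPLE_CQE_ENH_LAYOUT : 0u);
}
```

Under these assumptions, the checksum format with the enhanced layout gives ``0x83`` (131 in decimal).
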
109478c7a16dSYongseok Koh- ``rxq_pkt_pad_en`` parameter [int]
109578c7a16dSYongseok Koh
109678c7a16dSYongseok Koh  A nonzero value enables padding Rx packet to the size of cacheline on PCI
109778c7a16dSYongseok Koh  transaction. This feature would waste PCI bandwidth but could improve
109878c7a16dSYongseok Koh  performance by avoiding partial cacheline write which may cause costly
109978c7a16dSYongseok Koh  read-modify-copy in memory transaction on some architectures. Disabled by
110078c7a16dSYongseok Koh  default.
110178c7a16dSYongseok Koh
110278c7a16dSYongseok Koh  Supported on:
110378c7a16dSYongseok Koh
11046c21c887SRaslan Darawsheh  - x86_64 with ConnectX-4, ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
1105cb0da841SRaslan Darawsheh    ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2, and BlueField-3.
11066c21c887SRaslan Darawsheh  - POWER8 and ARMv8 with ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx,
1107cb0da841SRaslan Darawsheh    ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2, and BlueField-3.
110878c7a16dSYongseok Koh
1109febcac7bSBing Zhao- ``delay_drop`` parameter [int]
1110febcac7bSBing Zhao
1111febcac7bSBing Zhao  Bitmask value for the Rx queue delay drop attribute. Bit 0 is used for the
1112febcac7bSBing Zhao  standard Rx queue and bit 1 is used for the hairpin Rx queue. By default, the
1113febcac7bSBing Zhao  delay drop is disabled for all Rx queues. It will be ignored if the port does
1114febcac7bSBing Zhao  not support the attribute even if it is enabled explicitly.
1115febcac7bSBing Zhao
1116febcac7bSBing Zhao  The packets being received will not be dropped immediately when the WQEs are
1117febcac7bSBing Zhao  exhausted in a Rx queue with delay drop enabled.
1118febcac7bSBing Zhao
  A timeout value is set in the driver to control the waiting time before
  dropping a packet. Once the timer expires, the delay drop will be
  deactivated for all the Rx queues with this feature enabled. To re-activate
  it, a rearming is needed, which is part of the kernel driver starting from
  MLNX_OFED 5.5.

  To enable or disable the delay drop rearming, the private flag ``dropless_rq``
  can be set and queried via ethtool:

  - ``ethtool --set-priv-flags <netdev> dropless_rq on`` (or ``off``)
  - ``ethtool --show-priv-flags <netdev>``

  The configuration flag is global per PF and can only be set on the PF. Once
  it is on, all the VFs', SFs' and representors' Rx queues will share the timer
  and rearming.
1134e8482187SBing Zhao
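The ``delay_drop`` bitmask can be composed from the two queue types as follows. This is a minimal sketch with hypothetical names that only encodes the bit assignment given above (bit 0 for standard Rx queues, bit 1 for hairpin Rx queues).

```c
#define EXAMPLE_DELAY_DROP_STANDARD (1u << 0) /* standard Rx queues */
#define EXAMPLE_DELAY_DROP_HAIRPIN  (1u << 1) /* hairpin Rx queues */

/* Builds the integer to pass as the delay_drop devarg value. */
static inline unsigned int
example_delay_drop(int standard, int hairpin)
{
    return (standard ? EXAMPLE_DELAY_DROP_STANDARD : 0u) |
           (hairpin ? EXAMPLE_DELAY_DROP_HAIRPIN : 0u);
}
```

With this encoding, enabling delay drop for both queue types corresponds to the value 3, and 0 keeps the default (disabled for all Rx queues).
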
11357d6bf6b8SYongseok Koh- ``mprq_en`` parameter [int]
11367d6bf6b8SYongseok Koh
11377d6bf6b8SYongseok Koh  A nonzero value enables configuring Multi-Packet Rx queues. Rx queue is
11387d6bf6b8SYongseok Koh  configured as Multi-Packet RQ if the total number of Rx queues is
1139ecb16045SAlexander Kozyrev  ``rxqs_min_mprq`` or more. Disabled by default.
11407d6bf6b8SYongseok Koh
  Multi-Packet Rx Queue (MPRQ, a.k.a. Striding RQ) can further save PCIe bandwidth
  by posting a single large buffer for multiple packets. Instead of posting one
  buffer per packet, one large buffer is posted in order to receive multiple
  packets in the buffer. An MPRQ buffer consists of multiple fixed-size strides
  and each stride receives one packet. MPRQ can improve throughput for
  small-packet traffic.

  When MPRQ is enabled, the MTU can be larger than the size of a
  user-provided mbuf even if ``RTE_ETH_RX_OFFLOAD_SCATTER`` isn't enabled. The PMD
  will configure a stride size large enough to accommodate the MTU as long as
  the device allows it. Note that this can waste system memory compared to
  enabling Rx scatter and multi-segment packets.
11537d6bf6b8SYongseok Koh
11547d6bf6b8SYongseok Koh- ``mprq_log_stride_num`` parameter [int]
11557d6bf6b8SYongseok Koh
  Log 2 of the number of strides for Multi-Packet Rx queue. Configuring more
  strides can reduce PCIe traffic further. If the configured value is not in the
  range of device capability, the default value will be set with a warning
  message. The default value is 4, which is 16 strides per buffer. Valid only
  if ``mprq_en`` is set.
11617d6bf6b8SYongseok Koh
11627d6bf6b8SYongseok Koh  The size of Rx queue should be bigger than the number of strides.
11637d6bf6b8SYongseok Koh
1164ecb16045SAlexander Kozyrev- ``mprq_log_stride_size`` parameter [int]
1165ecb16045SAlexander Kozyrev
  Log 2 of the size of a stride for Multi-Packet Rx queue. Configuring a smaller
  stride size can save some memory and reduce the probability of depleting all
  available strides due to packets not yet released by an application. If the
  configured value is not in the range of device capability, the default value
  will be set with a warning message. The default value is 11, which is
  2048 bytes per stride. Valid only if ``mprq_en`` is set. With
  ``mprq_log_stride_size`` set, it is possible for a packet to span across
  multiple strides. This mode allows support of jumbo frames (9K) with MPRQ.
  A memcpy of some packets (or part of a packet if Rx scatter is configured)
  may be required in case there is no space left for headroom at the end of
  a stride, which incurs some performance penalty.
1177ecb16045SAlexander Kozyrev
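The two logarithmic MPRQ parameters translate into buffer geometry by simple shifts. The sketch below uses hypothetical names and assumes that one MPRQ buffer is exactly the number of strides multiplied by the stride size, ignoring any headroom or metadata the PMD may add. Both arguments are assumed to be smaller than 32.

```c
#include <stdint.h>

/* Hypothetical structure describing one MPRQ buffer. */
struct example_mprq_geometry {
    uint32_t strides;      /* 2^mprq_log_stride_num */
    uint32_t stride_bytes; /* 2^mprq_log_stride_size */
    uint64_t buffer_bytes; /* strides * stride_bytes (assumption) */
};

/* Derives the geometry from the two log2 devarg values (each below 32). */
static inline struct example_mprq_geometry
example_mprq_calc(unsigned int log_stride_num, unsigned int log_stride_size)
{
    struct example_mprq_geometry g;

    g.strides = UINT32_C(1) << log_stride_num;
    g.stride_bytes = UINT32_C(1) << log_stride_size;
    g.buffer_bytes = (uint64_t)g.strides * g.stride_bytes;
    return g;
}
```

With the defaults (4 and 11), this gives 16 strides of 2048 bytes, that is 32768 bytes per buffer under the stated assumption.
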
11787d6bf6b8SYongseok Koh- ``mprq_max_memcpy_len`` parameter [int]
11797d6bf6b8SYongseok Koh
11807d6bf6b8SYongseok Koh  The maximum length of packet to memcpy in case of Multi-Packet Rx queue. Rx
11817d6bf6b8SYongseok Koh  packet is mem-copied to a user-provided mbuf if the size of Rx packet is less
11827d6bf6b8SYongseok Koh  than or equal to this parameter. Otherwise, PMD will attach the Rx packet to
11837d6bf6b8SYongseok Koh  the mbuf by external buffer attachment - ``rte_pktmbuf_attach_extbuf()``.
11847d6bf6b8SYongseok Koh  A mempool for external buffers will be allocated and managed by PMD. If Rx
11857d6bf6b8SYongseok Koh  packet is externally attached, ol_flags field of the mbuf will have
1186daa02b5cSOlivier Matz  RTE_MBUF_F_EXTERNAL and this flag must be preserved. ``RTE_MBUF_HAS_EXTBUF()``
11877d6bf6b8SYongseok Koh  checks the flag. The default value is 128, valid only if ``mprq_en`` is set.
11887d6bf6b8SYongseok Koh
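The copy-or-attach rule of ``mprq_max_memcpy_len`` reduces to a single comparison. The following sketch uses hypothetical names and only models the decision described above, not the actual Rx burst code.

```c
#include <stdint.h>

/* Hypothetical labels for the two delivery paths described above. */
enum example_rx_delivery {
    EXAMPLE_RX_MEMCPY,        /* packet copied into a user-provided mbuf */
    EXAMPLE_RX_ATTACH_EXTBUF, /* packet attached as an external buffer */
};

/* Packets up to and including max_memcpy_len bytes are copied. */
static inline enum example_rx_delivery
example_mprq_delivery(uint32_t pkt_len, uint32_t max_memcpy_len)
{
    return pkt_len <= max_memcpy_len ?
           EXAMPLE_RX_MEMCPY : EXAMPLE_RX_ATTACH_EXTBUF;
}
```

With the default value of 128, a 128-byte packet is copied while a 129-byte packet is attached externally.
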
11897d6bf6b8SYongseok Koh- ``rxqs_min_mprq`` parameter [int]
11907d6bf6b8SYongseok Koh
  Configure Rx queues as Multi-Packet RQ if the total number of Rx queues is
  greater than or equal to this value. The default value is 12, valid only if
  ``mprq_en`` is set.
11947d6bf6b8SYongseok Koh
11952a66cf37SYaacov Hazan- ``txq_inline`` parameter [int]
11962a66cf37SYaacov Hazan
1197a6bd4911SViacheslav Ovsiienko  Amount of data to be inlined during TX operations. This parameter is
1198505f1fe4SViacheslav Ovsiienko  deprecated and converted to the new parameter ``txq_inline_max`` providing
1199505f1fe4SViacheslav Ovsiienko  partial compatibility.
12002a66cf37SYaacov Hazan
12012a66cf37SYaacov Hazan- ``txqs_min_inline`` parameter [int]
12022a66cf37SYaacov Hazan
  Enable inline data send only when the number of TX queues is greater than or
  equal to this value.

  This option should be used in combination with ``txq_inline_max`` and
  ``txq_inline_mpw`` below and does not affect the ``txq_inline_min`` setting.

  If this option is not specified, the default value 16 is used for BlueField
  and 8 for other platforms.
12115747882cSShahaf Shuler
  Data inlining consumes CPU cycles, so this option is intended to
  automatically enable inline data when there are enough Tx queues, which
  implies there are enough CPU cores, PCI bandwidth is becoming more critical,
  and the CPU is no longer expected to be the bottleneck.

  Copying data into the WQE improves latency and can improve PPS performance
  when PCI back pressure is detected. It may be useful for scenarios involving
  heavy traffic on many queues.
12205747882cSShahaf Shuler
1221505f1fe4SViacheslav Ovsiienko  Because additional software logic is necessary to handle this mode, this
1222505f1fe4SViacheslav Ovsiienko  option should be used with care, as it may lower performance when back
1223505f1fe4SViacheslav Ovsiienko  pressure is not expected.
1224505f1fe4SViacheslav Ovsiienko
12253a418d1dSViacheslav Ovsiienko  If inline data are enabled it may affect the maximal size of Tx queue in
12263a418d1dSViacheslav Ovsiienko  descriptors because the inline data increase the descriptor size and
12273a418d1dSViacheslav Ovsiienko  queue size limits supported by hardware may be exceeded.
12283a418d1dSViacheslav Ovsiienko
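The threshold logic of ``txqs_min_inline`` can be sketched as follows. The names are hypothetical, and the function models only the default thresholds quoted above (16 for BlueField, 8 for other platforms) with the "greater than or equal" rule. It does not cover ``txq_inline_min``, which this option does not affect.

```c
#include <stdbool.h>

#define EXAMPLE_TXQS_MIN_INLINE_BLUEFIELD 16u /* default for BlueField */
#define EXAMPLE_TXQS_MIN_INLINE_OTHER      8u /* default for other platforms */

/*
 * Returns true if data inlining would be engaged for the given
 * number of Tx queues when txqs_min_inline is left at its default.
 */
static inline bool
example_inline_enabled(unsigned int txqs_n, bool is_bluefield)
{
    unsigned int min_txqs = is_bluefield ?
        EXAMPLE_TXQS_MIN_INLINE_BLUEFIELD : EXAMPLE_TXQS_MIN_INLINE_OTHER;

    return txqs_n >= min_txqs;
}
```

For instance, a port with 8 Tx queues would engage inlining on a non-BlueField platform but not on BlueField.
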
1229505f1fe4SViacheslav Ovsiienko- ``txq_inline_min`` parameter [int]
1230505f1fe4SViacheslav Ovsiienko
1231505f1fe4SViacheslav Ovsiienko  Minimal amount of data to be inlined into WQE during Tx operations. NICs
1232505f1fe4SViacheslav Ovsiienko  may require this minimal data amount to operate correctly. The exact value
1233e98e44baSViacheslav Ovsiienko  may depend on NIC operation mode, requested offloads, etc. It is strongly
1234e98e44baSViacheslav Ovsiienko  recommended to omit this parameter and use the default values. Anyway,
1235e98e44baSViacheslav Ovsiienko  applications using this parameter should take into consideration that
1236e98e44baSViacheslav Ovsiienko  specifying an inconsistent value may prevent the NIC from sending packets.
1237505f1fe4SViacheslav Ovsiienko
1238505f1fe4SViacheslav Ovsiienko  If ``txq_inline_min`` key is present the specified value (may be aligned
1239505f1fe4SViacheslav Ovsiienko  by the driver in order not to exceed the limits and provide better descriptor
1240e98e44baSViacheslav Ovsiienko  space utilization) will be used by the driver and it is guaranteed that
1241e98e44baSViacheslav Ovsiienko  requested amount of data bytes are inlined into the WQE beside other inline
1242e98e44baSViacheslav Ovsiienko  settings. This key also may update ``txq_inline_max`` value (default
1243e98e44baSViacheslav Ovsiienko  or specified explicitly in devargs) to reserve the space for inline data.
1244505f1fe4SViacheslav Ovsiienko
  If ``txq_inline_min`` key is not present, the value may be queried by the
  driver from the NIC via DevX if this feature is available. If there is no DevX
  enabled/supported, the value 18 (supposing L2 header including VLAN) is set
  for ConnectX-4 and ConnectX-4 Lx, and 0 is set by default for ConnectX-5
  and newer NICs. If a packet is shorter than the ``txq_inline_min`` value,
  the entire packet is inlined.

  For ConnectX-4 NIC, the driver does not allow specifying a value below 18
  (minimal L2 header, including VLAN); an error will be raised.
1254e98e44baSViacheslav Ovsiienko
1255ee76bddcSThomas Monjalon  For ConnectX-4 Lx NIC, it is allowed to specify values below 18, but
1256e98e44baSViacheslav Ovsiienko  it is not recommended and may prevent NIC from sending packets over
1257e98e44baSViacheslav Ovsiienko  some configurations.
1258505f1fe4SViacheslav Ovsiienko
1259e60561ccSDmitry Kozlyuk  For ConnectX-4 and ConnectX-4 Lx NICs, automatically configured value
1260e60561ccSDmitry Kozlyuk  is insufficient for some traffic, because they require at least all L2 headers
1261e60561ccSDmitry Kozlyuk  to be inlined. For example, Q-in-Q adds 4 bytes to default 18 bytes
1262e60561ccSDmitry Kozlyuk  of Ethernet and VLAN, thus ``txq_inline_min`` must be set to 22.
1263e60561ccSDmitry Kozlyuk  MPLS would add 4 bytes per label. Final value must account for all possible
1264e60561ccSDmitry Kozlyuk  L2 encapsulation headers used in particular environment.
1265e60561ccSDmitry Kozlyuk
  Please note that this minimal data inlining disengages the eMPW feature
  (Enhanced Multi-Packet Write), because the latter does not support partial
  packet inlining. This is not very critical because minimal data inlining is
  mostly required by ConnectX-4 and ConnectX-4 Lx, and these NICs do not
  support the eMPW feature.
1270505f1fe4SViacheslav Ovsiienko
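The arithmetic for choosing ``txq_inline_min`` on ConnectX-4 and ConnectX-4 Lx can be written down explicitly. This is a sketch with hypothetical names that covers only the encapsulations mentioned above: 18 bytes for Ethernet plus VLAN, 4 more bytes for Q-in-Q, and 4 bytes per MPLS label. Other L2 encapsulations used in a given environment would have to be added by hand.

```c
#define EXAMPLE_L2_VLAN_LEN   18u /* Ethernet header including one VLAN tag */
#define EXAMPLE_QINQ_EXTRA     4u /* second VLAN tag (Q-in-Q) */
#define EXAMPLE_MPLS_LABEL_LEN 4u /* one MPLS label */

/* Estimates the txq_inline_min value needed to inline all L2 headers. */
static inline unsigned int
example_txq_inline_min(int qinq, unsigned int mpls_labels)
{
    return EXAMPLE_L2_VLAN_LEN +
           (qinq ? EXAMPLE_QINQ_EXTRA : 0u) +
           EXAMPLE_MPLS_LABEL_LEN * mpls_labels;
}
```

Q-in-Q traffic without MPLS gives 22, matching the example in the text, while two MPLS labels on top of a single VLAN give 26.
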
1271505f1fe4SViacheslav Ovsiienko- ``txq_inline_max`` parameter [int]
1272505f1fe4SViacheslav Ovsiienko
  Specifies the maximal packet length to be completely inlined into WQE
  Ethernet Segment for ordinary SEND method. If a packet is larger than the
  specified value, the packet data won't be copied by the driver at all, and
  the data buffer is addressed with a pointer. If the packet length is less
  than or equal to the specified value, all packet data will be copied into
  the WQE. This may improve PCI bandwidth utilization for short packets
  significantly but requires extra CPU cycles.

  The data inline feature is controlled by the number of Tx queues. If the
  number of Tx queues is greater than or equal to the ``txqs_min_inline`` key
  parameter, the inline feature is engaged. If there are not enough Tx queues
  (which means not enough CPU cores and CPU resources are scarce), data inline
  is not performed by the driver. Assigning ``txqs_min_inline`` with zero
  always enables the data inline.

  The default ``txq_inline_max`` value is 290. The specified value may be adjusted
  by the driver in order not to exceed the limit (930 bytes) and to provide better
  WQE space filling without gaps; the adjustment is reflected in the debug log.
  Also, the default value (290) may be decreased at run-time if a large transmit
  queue size is requested and the hardware does not support enough descriptors;
  in this case a warning is emitted. If the ``txq_inline_max`` key is
  specified and the requested inline settings cannot be satisfied, an error
  will be raised.
1294505f1fe4SViacheslav Ovsiienko
1295505f1fe4SViacheslav Ovsiienko- ``txq_inline_mpw`` parameter [int]
1296505f1fe4SViacheslav Ovsiienko
  Specifies the maximal packet length to be completely inlined into WQE for
  Enhanced MPW method. If a packet is larger than the specified value, the
  packet data won't be copied, and the data buffer is addressed with a pointer.
  If the packet length is less than or equal to the specified value, all packet
  data will be copied into the WQE. This may improve PCI bandwidth utilization
  for short packets significantly but requires extra CPU cycles.

  The data inline feature is controlled by the number of Tx queues. If the
  number of Tx queues is greater than or equal to the ``txqs_min_inline`` key
  parameter, the inline feature is engaged. If there are not enough Tx queues
  (which means not enough CPU cores and CPU resources are scarce), data inline
  is not performed by the driver. Assigning ``txqs_min_inline`` with zero
  always enables the data inline.

  The default ``txq_inline_mpw`` value is 268. The specified value may be adjusted
  by the driver in order not to exceed the limit (930 bytes) and to provide better
  WQE space filling without gaps; the adjustment is reflected in the debug log.
  Because multiple packets may be included in the same WQE with the Enhanced
  Multi-Packet Write method and the overall WQE size is limited, it is not
  recommended to specify large values for ``txq_inline_mpw``. Also, the default
  value (268) may be decreased at run-time if a large transmit queue size is
  requested and the hardware does not support enough descriptors; in this case
  a warning is emitted. If the ``txq_inline_mpw`` key is specified and the
  requested inline settings cannot be satisfied, an error will be raised.
13205747882cSShahaf Shuler
132109d8b416SYongseok Koh- ``txqs_max_vec`` parameter [int]
132209d8b416SYongseok Koh
  Enable vectorized Tx only when the number of TX queues is less than or
  equal to this value. This parameter is deprecated and ignored. It is kept
  for compatibility so as not to prevent the driver from probing.
132609d8b416SYongseok Koh
13276ce84bd8SYongseok Koh- ``txq_mpw_hdr_dseg_en`` parameter [int]
13286ce84bd8SYongseok Koh
  A nonzero value enables including two pointers in the first block of TX
  descriptor. The parameter is deprecated and ignored. It is kept for
  compatibility.
13326ce84bd8SYongseok Koh
13336ce84bd8SYongseok Koh- ``txq_max_inline_len`` parameter [int]
13346ce84bd8SYongseok Koh
  Maximum size of a packet to be inlined. If a packet is larger than the
  configured value, it is not inlined even if enough space remains in the
  descriptor; instead, the packet is included by pointer. This parameter
  is deprecated and converted directly to ``txq_inline_mpw``, providing full
  compatibility. Valid only if the eMPW feature is engaged.
1341505f1fe4SViacheslav Ovsiienko
1342505f1fe4SViacheslav Ovsiienko- ``txq_mpw_en`` parameter [int]
1343505f1fe4SViacheslav Ovsiienko
  A nonzero value enables Enhanced Multi-Packet Write (eMPW) for ConnectX-5,
  ConnectX-6, ConnectX-6 Dx, ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2,
  and BlueField-3. eMPW allows the Tx burst function to pack multiple packets
  into a single descriptor session in order to save PCI bandwidth
  and improve performance at the cost of slightly higher CPU usage.
  When ``txq_inline_mpw`` is set along with ``txq_mpw_en``,
  the Tx burst function copies the entire packet data into the Tx descriptor
  instead of including a pointer to the packet.
1352505f1fe4SViacheslav Ovsiienko
  The Enhanced Multi-Packet Write feature is enabled by default if the NIC
  supports it, and can be disabled by explicitly setting ``txq_mpw_en`` to 0.
  Also, if minimal data inlining is requested by a nonzero ``txq_inline_min``
  option or reported by the NIC, the eMPW feature is disengaged.
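
  As an illustration, eMPW could be switched off for a single device as
  sketched below; ``<PCI_BDF>`` is a placeholder for the device address and
  the remaining testpmd options are only an assumed minimal invocation::

    dpdk-testpmd -a <PCI_BDF>,txq_mpw_en=0 -- -i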
13576ce84bd8SYongseok Koh
13588409a285SViacheslav Ovsiienko- ``tx_db_nc`` parameter [int]
13598409a285SViacheslav Ovsiienko
1360a6b9d5a5SMichael Baum  This parameter name is deprecated and ignored.
1361a6b9d5a5SMichael Baum  The new name for this parameter is ``sq_db_nc``.
1362a6b9d5a5SMichael Baum  See :ref:`common driver options <mlx5_common_driver_options>`.
13638409a285SViacheslav Ovsiienko
13648f848f32SViacheslav Ovsiienko- ``tx_pp`` parameter [int]
13658f848f32SViacheslav Ovsiienko
  If a nonzero value is specified, the driver creates all necessary internal
  objects to provide accurate packet send scheduling on mbuf timestamps.
  A positive value specifies the scheduling granularity in nanoseconds;
  packet sending is accurate up to the specified granularity. The allowed
  range is from 500 to 1 million nanoseconds. A negative value specifies the
  granularity by its absolute value and engages a special test mode to check
  the scheduling rate.
  By default (if ``tx_pp`` is not specified), send scheduling on timestamps
  is disabled.
13748f848f32SViacheslav Ovsiienko
  Starting with ConnectX-7, traffic can be scheduled directly
  on the timestamp specified in the descriptor;
  no extra objects are needed and the scheduling capability
  is advertised and handled regardless of whether ``tx_pp`` is specified.
137949e87976SViacheslav Ovsiienko
13808f848f32SViacheslav Ovsiienko- ``tx_skew`` parameter [int]
13818f848f32SViacheslav Ovsiienko
  This parameter adjusts packet send scheduling on timestamps and represents
  the average delay between the start of transmit descriptor processing
  by the hardware and the appearance of actual packet data on the wire. The value
  should be provided in nanoseconds and is valid only if the ``tx_pp`` parameter is
  specified. The default value is zero.
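
  As an illustration, both scheduling parameters could be passed together as
  sketched below. The granularity of 500 ns is the lower bound of the allowed
  ``tx_pp`` range; the skew of 20 ns is an arbitrary placeholder, not a
  recommended value, and ``<PCI_BDF>`` stands for the device address::

    dpdk-testpmd -a <PCI_BDF>,tx_pp=500,tx_skew=20 -- -i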
13878f848f32SViacheslav Ovsiienko
13885644d5b9SNelio Laranjeiro- ``tx_vec_en`` parameter [int]
13895644d5b9SNelio Laranjeiro
13906c21c887SRaslan Darawsheh  A nonzero value enables Tx vector on ConnectX-5, ConnectX-6, ConnectX-6 Dx,
1391cb0da841SRaslan Darawsheh  ConnectX-6 Lx, ConnectX-7, BlueField, BlueField-2, and BlueField-3 NICs
13926c21c887SRaslan Darawsheh  if the number of global Tx queues on the port is less than ``txqs_max_vec``.
13936c21c887SRaslan Darawsheh  The parameter is deprecated and ignored.
13945644d5b9SNelio Laranjeiro
13955644d5b9SNelio Laranjeiro- ``rx_vec_en`` parameter [int]
13965644d5b9SNelio Laranjeiro
  A nonzero value enables Rx vector if the port is not configured in
  multi-segment mode; otherwise this parameter is ignored.
13995644d5b9SNelio Laranjeiro
14005644d5b9SNelio Laranjeiro  Enabled by default.
14015644d5b9SNelio Laranjeiro
1402db209cc3SNélio Laranjeiro- ``vf_nl_en`` parameter [int]
1403db209cc3SNélio Laranjeiro
  A nonzero value enables Netlink requests from the VF to add/remove MAC
  addresses and/or enable/disable promiscuous/all-multicast mode on the netdevice.
  Otherwise the relevant configuration must be done with Linux iproute2 tools.
  This is a prerequisite to receive this kind of traffic.

  Enabled by default. Valid only on VF devices; ignored otherwise.
1410db209cc3SNélio Laranjeiro
141178a54648SXueming Li- ``l3_vxlan_en`` parameter [int]
141278a54648SXueming Li
  A nonzero value allows L3 VXLAN and VXLAN-GPE flow creation. To enable
  L3 VXLAN or VXLAN-GPE, users have to configure the firmware and enable this
  parameter. This is a prerequisite to receive this kind of traffic.
141678a54648SXueming Li
141778a54648SXueming Li  Disabled by default.
141878a54648SXueming Li
14192d241515SViacheslav Ovsiienko- ``dv_xmeta_en`` parameter [int]
14202d241515SViacheslav Ovsiienko
  A nonzero value enables extensive flow metadata support if the device is
  capable and the driver supports it. This can enable extensive support of
  the ``MARK`` and ``META`` items of ``rte_flow``. The newly introduced
  ``SET_TAG`` and ``SET_META`` actions do not depend on ``dv_xmeta_en``.
14252d241515SViacheslav Ovsiienko
14262d241515SViacheslav Ovsiienko  There are some possible configurations, depending on parameter value:
14272d241515SViacheslav Ovsiienko
  - 0, this is the default value and defines the legacy mode. The ``MARK`` and
    ``META`` related actions and items operate only within the NIC Tx and
    NIC Rx steering domains; no ``MARK`` and ``META`` information crosses
    the domain boundaries. The ``MARK`` item is 24 bits wide, the ``META``
    item is 32 bits wide and matching is supported on egress only
    when ``dv_flow_en`` = 1.

  - 1, this engages extensive metadata mode. The ``MARK`` and ``META``
    related actions and items operate within all supported steering domains,
    including FDB, and ``MARK`` and ``META`` information may cross the domain
    boundaries. The ``MARK`` item is 24 bits wide; the ``META`` item width
    depends on kernel and firmware configurations and might be 0, 16 or
    32 bits. Within the NIC Tx domain the ``META`` data width is 32 bits for
    compatibility; the actual width of data transferred to the FDB domain
    depends on kernel configuration and may vary. The actual supported
    width can be retrieved at run time by a series of rte_flow_validate()
    trials.

  - 2, this engages extensive metadata mode. The ``MARK`` and ``META``
    related actions and items operate within all supported steering domains,
    including FDB, and ``MARK`` and ``META`` information may cross the domain
    boundaries. The ``META`` item is 32 bits wide; the ``MARK`` item width
    depends on kernel and firmware configurations and might be 0, 16 or
    24 bits. The actual supported width can be retrieved at run time by a
    series of rte_flow_validate() trials.

  - 3, this engages tunnel offload mode. In E-Switch configuration, this
    mode implicitly activates ``dv_xmeta_en=1``.

  - 4, this mode is supported only in HWS (``dv_flow_en=2``).
    Copying of 32-bit Rx/Tx metadata between FDB and NIC is supported.
    The mark is supported only in NIC and is not copied.

1460ddb68e47SBing Zhao
  +------+-----------+-----------+-------------+-------------+
  | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
  +======+===========+===========+=============+=============+
  | 0    | 24 bits   | 32 bits   | 32 bits     | no          |
  +------+-----------+-----------+-------------+-------------+
  | 1    | 24 bits   | vary 0-32 | 32 bits     | yes         |
  +------+-----------+-----------+-------------+-------------+
  | 2    | vary 0-24 | 32 bits   | 32 bits     | yes         |
  +------+-----------+-----------+-------------+-------------+
14702d241515SViacheslav Ovsiienko
14712d241515SViacheslav Ovsiienko  If there is no E-Switch configuration the ``dv_xmeta_en`` parameter is
14722d241515SViacheslav Ovsiienko  ignored and the device is configured to operate in legacy mode (0).
14732d241515SViacheslav Ovsiienko
14742d241515SViacheslav Ovsiienko  Disabled by default (set to 0).
14752d241515SViacheslav Ovsiienko
1476771e5af0SViacheslav Ovsiienko  The Direct Verbs/Rules (engaged with ``dv_flow_en`` = 1) supports all
1477771e5af0SViacheslav Ovsiienko  of the extensive metadata features. The legacy Verbs supports FLAG and
1478771e5af0SViacheslav Ovsiienko  MARK metadata actions over NIC Rx steering domain only.
1479771e5af0SViacheslav Ovsiienko
  Setting the META value to zero in a flow action means that no item is provided,
  and the receive datapath will not report in mbufs that metadata is present.
  Setting the MARK value to zero in a flow action means that a zero FDIR ID value
  will be reported on packet reception.
14843ceeed9fSViacheslav Ovsiienko
14853ceeed9fSViacheslav Ovsiienko  For the MARK action the last 16 values in the full range are reserved for
14863ceeed9fSViacheslav Ovsiienko  internal PMD purposes (to emulate FLAG action). The valid range for the
14877be78d02SJosh Soref  MARK action values is 0-0xFFEF for the 16-bit mode and 0-0xFFFFEF
14883ceeed9fSViacheslav Ovsiienko  for the 24-bit mode, the flows with the MARK action value outside
14893ceeed9fSViacheslav Ovsiienko  the specified range will be rejected.
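
  The MARK range can be probed from testpmd, as in the following sketch.
  It assumes the 24-bit mode, a port 0 with an Rx queue 0, and a minimal
  ``eth`` pattern chosen only for illustration. The first value is the last
  valid one (0xFFFFEF); the second falls into the 16 reserved values and is
  therefore expected to be rejected::

    testpmd> flow validate 0 ingress pattern eth / end actions mark id 0xFFFFEF / queue index 0 / end
    testpmd> flow validate 0 ingress pattern eth / end actions mark id 0xFFFFF0 / queue index 0 / end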
14903ceeed9fSViacheslav Ovsiienko
149151e72d38SOri Kam- ``dv_flow_en`` parameter [int]
149251e72d38SOri Kam
1493d84c3cf7SSuanming Mou  Value 0 means legacy Verbs flow offloading.
149451e72d38SOri Kam
1495d84c3cf7SSuanming Mou  Value 1 enables the DV flow steering assuming it is supported by the
1496d84c3cf7SSuanming Mou  driver (requires rdma-core 24 or higher).
1497d84c3cf7SSuanming Mou
1498d84c3cf7SSuanming Mou  Value 2 enables the WQE based hardware steering.
1499d84c3cf7SSuanming Mou  In this mode, only queue-based flow management is supported.
1500d84c3cf7SSuanming Mou
1501d84c3cf7SSuanming Mou  It is configured by default to 1 (DV flow steering) if supported.
1502d84c3cf7SSuanming Mou  Otherwise, the value is 0 which indicates legacy Verbs flow offloading.
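
  For example, a sketch of selecting the WQE based hardware steering
  explicitly (``<PCI_BDF>`` is a placeholder for the device address and the
  testpmd options are an assumed minimal invocation)::

    dpdk-testpmd -a <PCI_BDF>,dv_flow_en=2 -- -i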
150351e72d38SOri Kam
1504909be50aSOri Kam- ``dv_esw_en`` parameter [int]
1505909be50aSOri Kam
1506909be50aSOri Kam  A nonzero value enables E-Switch using Direct Rules.
1507909be50aSOri Kam
1508909be50aSOri Kam  Enabled by default if supported.
1509909be50aSOri Kam
15101939eb6fSDariusz Sosnowski- ``fdb_def_rule_en`` parameter [int]
15111939eb6fSDariusz Sosnowski
  A nonzero value makes the PMD create a dedicated rule on the E-Switch root table.
  This dedicated rule forwards all incoming packets into table 1.
  Other rules are then created at the original E-Switch table level plus one,
  which improves the flow insertion rate by skipping the root table managed by firmware.
  If set to 0, all rules are created at the original E-Switch table level.
15171939eb6fSDariusz Sosnowski
15181939eb6fSDariusz Sosnowski  By default, the PMD will set this value to 1.
15191939eb6fSDariusz Sosnowski
15200f0ae73aSShiri Kuzin- ``lacp_by_user`` parameter [int]
15210f0ae73aSShiri Kuzin
  A nonzero value enables the control of LACP traffic by the user application.
  When a bond exists in the driver, by default it should be managed by the
  kernel and therefore LACP traffic should be steered to the kernel.
  If this devarg is set to 1, the user application manages the bond
  itself and LACP traffic is not steered to the kernel.
15270f0ae73aSShiri Kuzin
15280f0ae73aSShiri Kuzin  Disabled by default (set to 0).
15290f0ae73aSShiri Kuzin
15306de569f5SAdrien Mazarguil- ``representor`` parameter [list]
15316de569f5SAdrien Mazarguil
15326de569f5SAdrien Mazarguil  This parameter can be used to instantiate DPDK Ethernet devices from
1533cb95feefSXueming Li  existing port (PF, VF or SF) representors configured on the device.
15346de569f5SAdrien Mazarguil
15356de569f5SAdrien Mazarguil  It is a standard parameter whose format is described in
15366de569f5SAdrien Mazarguil  :ref:`ethernet_device_standard_device_arguments`.
15376de569f5SAdrien Mazarguil
1538cb95feefSXueming Li  For instance, to probe VF port representors 0 through 2::
15396de569f5SAdrien Mazarguil
1540f926cce3SXueming Li    <PCI_BDF>,representor=vf[0-2]
1541cb95feefSXueming Li
1542cb95feefSXueming Li  To probe SF port representors 0 through 2::
1543cb95feefSXueming Li
1544f926cce3SXueming Li    <PCI_BDF>,representor=sf[0-2]
15456de569f5SAdrien Mazarguil
154608c2772fSXueming Li  To probe VF port representors 0 through 2 on both PFs of bonding device::
154708c2772fSXueming Li
154808c2772fSXueming Li    <Primary_PCI_BDF>,representor=pf[0,1]vf[0-2]
154908c2772fSXueming Li
1550483181f7SDariusz Sosnowski- ``repr_matching_en`` parameter [int]
1551483181f7SDariusz Sosnowski
1552483181f7SDariusz Sosnowski  - 0. If representor matching is disabled, then there will be no implicit
1553483181f7SDariusz Sosnowski    item added. As a result, ingress flow rules will match traffic
1554483181f7SDariusz Sosnowski    coming to any port, not only the port on which flow rule is created.
1555042f52ddSDariusz Sosnowski    Because of that, default flow rules for ingress traffic cannot be created
1556042f52ddSDariusz Sosnowski    and port starts in isolated mode by default. Port cannot be switched back
1557042f52ddSDariusz Sosnowski    to non-isolated mode.
1558483181f7SDariusz Sosnowski
1559483181f7SDariusz Sosnowski  - 1. If representor matching is enabled (default setting),
1560483181f7SDariusz Sosnowski    then each ingress pattern template has an implicit REPRESENTED_PORT
1561483181f7SDariusz Sosnowski    item added. Flow rules based on this pattern template will match
1562483181f7SDariusz Sosnowski    the vport associated with port on which rule is created.
1563483181f7SDariusz Sosnowski
1564066cfecdSMatan Azrad- ``max_dump_files_num`` parameter [int]
1565066cfecdSMatan Azrad
  The maximum number of files per PMD entity that may be created for debug information.
  The files are created in the ``/var/log`` directory or in the current directory.

  Set to 128 by default.
1570066cfecdSMatan Azrad
157121bb6c7eSDekel Peled- ``lro_timeout_usec`` parameter [int]
157221bb6c7eSDekel Peled
  The maximum allowed duration of an LRO session, in microseconds.
  The PMD sets the nearest value supported by the HW that is not greater than
  the input ``lro_timeout_usec`` value.
  If this parameter is not specified, the PMD sets
  the smallest value supported by the HW.
157821bb6c7eSDekel Peled
157918f127adSBing Zhao- ``hp_buf_log_sz`` parameter [int]
158018f127adSBing Zhao
  The total data buffer size of a hairpin queue (logarithmic form), in bytes.
  The PMD sets the data buffer size to 2 ** ``hp_buf_log_sz``, for both Rx and Tx.
  The allowed range is defined by the firmware, and initialization fails
  if the value is outside that range.
  Currently the range is from 11 to 19, and the supported frame
  size of a single packet for hairpin is from 512B to 128KB. This might change
  with a different firmware release. A small value
  reduces memory consumption but does not work with large frames. If the value is
  too large, memory consumption is high and some performance
  degradation may be introduced.
  By default, the PMD sets this value to 16, which means that 9KB jumbo
  frames are supported.
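
  As a worked example, the default of 16 gives a data buffer of
  2 ** 16 = 65536 bytes, while a value of 14 gives 2 ** 14 = 16384 bytes.
  The following sketch requests the smaller buffer; ``<PCI_BDF>`` is a
  placeholder, and the single hairpin queue requested with the testpmd
  ``--hairpinq`` option is only an assumed test setup::

    dpdk-testpmd -a <PCI_BDF>,hp_buf_log_sz=14 -- -i --hairpinq=1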
159318f127adSBing Zhao
1594a1da6f62SSuanming Mou- ``reclaim_mem_mode`` parameter [int]
1595a1da6f62SSuanming Mou
  Caching some resources on flow destruction makes flow re-creation more efficient.
  However, some systems may require that all resources be reclaimed after
  a flow is destroyed.
  The ``reclaim_mem_mode`` parameter lets the user configure
  whether the resource cache is needed.

  There are three options to choose from:

  - 0. The flow resources are cached as usual, which helps the flow
    insertion rate.

  - 1. Only DPDK PMD level resource reclaim is enabled.

  - 2. Both the DPDK PMD level and the rdma-core low level are configured in
    reclaim mode.
1611a1da6f62SSuanming Mou
1612a1da6f62SSuanming Mou  By default, the PMD will set this value to 0.
1613a1da6f62SSuanming Mou
161450f95b23SSuanming Mou- ``decap_en`` parameter [int]
161550f95b23SSuanming Mou
161650f95b23SSuanming Mou  Some devices do not support FCS (frame checksum) scattering for
161750f95b23SSuanming Mou  tunnel-decapsulated packets.
161850f95b23SSuanming Mou  If set to 0, this option forces the FCS feature and rejects tunnel
161950f95b23SSuanming Mou  decapsulation in the flow engine for such devices.
162050f95b23SSuanming Mou
162150f95b23SSuanming Mou  By default, the PMD will set this value to 1.
162250f95b23SSuanming Mou
1623e39226bdSJiawei Wang- ``allow_duplicate_pattern`` parameter [int]
1624e39226bdSJiawei Wang
  There are two options to choose from:

  - 0. Prevent insertion of rules with the same pattern items on a non-root table.
    In this case, only the first rule is inserted; the following rules are
    rejected and error code EEXIST is returned.

  - 1. Allow insertion of rules with the same pattern items.
    In this case, all rules are inserted but only the first rule takes effect;
    the next rule takes effect only if the previous rules are deleted.
1634e39226bdSJiawei Wang
1635e39226bdSJiawei Wang  By default, the PMD will set this value to 1.
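
  The difference between the two settings can be observed with a sketch like
  the one below. It assumes that group 1 is a non-root table, that port 0 has
  an Rx queue 0, and uses ``<PCI_BDF>`` as a placeholder for the device
  address. With ``allow_duplicate_pattern=0`` the second, identical rule is
  expected to be rejected with EEXIST; with the default value of 1 both rules
  are expected to be inserted::

    $ sudo dpdk-testpmd -a <PCI_BDF>,allow_duplicate_pattern=0 -- -i
    testpmd> flow create 0 ingress group 1 pattern eth / ipv4 / end actions queue index 0 / end
    testpmd> flow create 0 ingress group 1 pattern eth / ipv4 / end actions queue index 0 / end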
1636e39226bdSJiawei Wang
1637a7e11a0cSAdrien Mazarguil
Multiport E-Switch
------------------
164011c73de9SDariusz Sosnowski
164111c73de9SDariusz SosnowskiIn standard deployments of NVIDIA ConnectX and BlueField HCAs, where embedded switch is enabled,
164211c73de9SDariusz Sosnowskieach physical port is associated with a single switching domain.
164311c73de9SDariusz SosnowskiOnly PFs, VFs and SFs related to that physical port are connected to this domain
164411c73de9SDariusz Sosnowskiand offloaded flow rules are allowed to steer traffic only between the entities in the given domain.
164511c73de9SDariusz Sosnowski
The following diagram shows a high-level overview of this architecture::
164711c73de9SDariusz Sosnowski
164811c73de9SDariusz Sosnowski       .---. .------. .------. .---. .------. .------.
164911c73de9SDariusz Sosnowski       |PF0| |PF0VFi| |PF0SFi| |PF1| |PF1VFi| |PF1SFi|
165011c73de9SDariusz Sosnowski       .-+-. .--+---. .--+---. .-+-. .--+---. .--+---.
165111c73de9SDariusz Sosnowski         |      |        |       |      |        |
165211c73de9SDariusz Sosnowski     .---|------|--------|-------|------|--------|---------.
165311c73de9SDariusz Sosnowski     |   |      |        |       |      |        |      HCA|
165411c73de9SDariusz Sosnowski     | .-+------+--------+---. .-+------+--------+---.     |
165511c73de9SDariusz Sosnowski     | |                     | |                     |     |
165611c73de9SDariusz Sosnowski     | |      E-Switch       | |     E-Switch        |     |
165711c73de9SDariusz Sosnowski     | |         PF0         | |        PF1          |     |
165811c73de9SDariusz Sosnowski     | |                     | |                     |     |
165911c73de9SDariusz Sosnowski     | .---------+-----------. .--------+------------.     |
166011c73de9SDariusz Sosnowski     |           |                      |                  |
166111c73de9SDariusz Sosnowski     .--------+--+---+---------------+--+---+--------------.
166211c73de9SDariusz Sosnowski              |      |               |      |
166311c73de9SDariusz Sosnowski              | PHY0 |               | PHY1 |
166411c73de9SDariusz Sosnowski              |      |               |      |
166511c73de9SDariusz Sosnowski              .------.               .------.
166611c73de9SDariusz Sosnowski
166711c73de9SDariusz SosnowskiMultiport E-Switch is a deployment scenario where:
166811c73de9SDariusz Sosnowski
166911c73de9SDariusz Sosnowski- All physical ports, PFs, VFs and SFs share the same switching domain.
167011c73de9SDariusz Sosnowski- Each physical port gets a separate representor port.
167111c73de9SDariusz Sosnowski- Traffic can be matched or forwarded explicitly between any of the entities
167211c73de9SDariusz Sosnowski  connected to the domain.
167311c73de9SDariusz Sosnowski
The following diagram shows a high-level overview of this architecture::
167511c73de9SDariusz Sosnowski
167611c73de9SDariusz Sosnowski       .---. .------. .------. .---. .------. .------.
167711c73de9SDariusz Sosnowski       |PF0| |PF0VFi| |PF0SFi| |PF1| |PF1VFi| |PF1SFi|
167811c73de9SDariusz Sosnowski       .-+-. .--+---. .--+---. .-+-. .--+---. .--+---.
167911c73de9SDariusz Sosnowski         |      |        |       |      |        |
168011c73de9SDariusz Sosnowski     .---|------|--------|-------|------|--------|---------.
168111c73de9SDariusz Sosnowski     |   |      |        |       |      |        |      HCA|
168211c73de9SDariusz Sosnowski     | .-+------+--------+-------+------+--------+---.     |
168311c73de9SDariusz Sosnowski     | |                                             |     |
168411c73de9SDariusz Sosnowski     | |                   Shared                    |     |
168511c73de9SDariusz Sosnowski     | |                  E-Switch                   |     |
168611c73de9SDariusz Sosnowski     | |                                             |     |
168711c73de9SDariusz Sosnowski     | .---------+----------------------+------------.     |
168811c73de9SDariusz Sosnowski     |           |                      |                  |
168911c73de9SDariusz Sosnowski     .--------+--+---+---------------+--+---+--------------.
169011c73de9SDariusz Sosnowski              |      |               |      |
169111c73de9SDariusz Sosnowski              | PHY0 |               | PHY1 |
169211c73de9SDariusz Sosnowski              |      |               |      |
169311c73de9SDariusz Sosnowski              .------.               .------.
169411c73de9SDariusz Sosnowski
169511c73de9SDariusz SosnowskiIn this deployment a single application can control the switching and forwarding behavior for all
169611c73de9SDariusz Sosnowskientities on the HCA.
169711c73de9SDariusz Sosnowski
169811c73de9SDariusz SosnowskiWith this configuration, mlx5 PMD supports:
169911c73de9SDariusz Sosnowski
- matching traffic coming from physical port, PF, VF or SF using REPRESENTED_PORT items;
- matching traffic coming from E-Switch manager
  using REPRESENTED_PORT item with port ID ``UINT16_MAX``;
- forwarding traffic to physical port, PF, VF or SF using REPRESENTED_PORT actions.
170411c73de9SDariusz Sosnowski
Requirements
~~~~~~~~~~~~
170711c73de9SDariusz Sosnowski
170811c73de9SDariusz SosnowskiSupported HCAs:
170911c73de9SDariusz Sosnowski
171011c73de9SDariusz Sosnowski- ConnectX family: ConnectX-6 Dx and above.
171111c73de9SDariusz Sosnowski- BlueField family: BlueField-2 and above.
171211c73de9SDariusz Sosnowski- FW version: at least ``XX.37.1014``.
171311c73de9SDariusz Sosnowski
171411c73de9SDariusz SosnowskiSupported mlx5 kernel modules versions:
171511c73de9SDariusz Sosnowski
171611c73de9SDariusz Sosnowski- Upstream Linux - from version 6.3.
171711c73de9SDariusz Sosnowski- Modules packaged in MLNX_OFED - from version v23.04-0.5.3.3.
171811c73de9SDariusz Sosnowski
Configuration
~~~~~~~~~~~~~
172111c73de9SDariusz Sosnowski
172211c73de9SDariusz Sosnowski#. Apply required FW configuration::
172311c73de9SDariusz Sosnowski
172411c73de9SDariusz Sosnowski      sudo mlxconfig -d /dev/mst/mt4125_pciconf0 set LAG_RESOURCE_ALLOCATION=1
172511c73de9SDariusz Sosnowski
172611c73de9SDariusz Sosnowski#. Reset FW or cold reboot the host.
172711c73de9SDariusz Sosnowski
172811c73de9SDariusz Sosnowski#. Switch E-Switch mode on all of the PFs to ``switchdev`` mode::
172911c73de9SDariusz Sosnowski
173011c73de9SDariusz Sosnowski      sudo devlink dev eswitch set pci/0000:08:00.0 mode switchdev
173111c73de9SDariusz Sosnowski      sudo devlink dev eswitch set pci/0000:08:00.1 mode switchdev
173211c73de9SDariusz Sosnowski
173311c73de9SDariusz Sosnowski#. Enable Multiport E-Switch on all of the PFs::
173411c73de9SDariusz Sosnowski
173511c73de9SDariusz Sosnowski      sudo devlink dev param set pci/0000:08:00.0 name esw_multiport value true cmode runtime
173611c73de9SDariusz Sosnowski      sudo devlink dev param set pci/0000:08:00.1 name esw_multiport value true cmode runtime
173711c73de9SDariusz Sosnowski
173811c73de9SDariusz Sosnowski#. Configure required number of VFs/SFs::
173911c73de9SDariusz Sosnowski
174011c73de9SDariusz Sosnowski      echo 4 | sudo tee /sys/class/net/eth2/device/sriov_numvfs
174111c73de9SDariusz Sosnowski      echo 4 | sudo tee /sys/class/net/eth3/device/sriov_numvfs
174211c73de9SDariusz Sosnowski
174311c73de9SDariusz Sosnowski#. Start testpmd and verify that all ports are visible::
174411c73de9SDariusz Sosnowski
174511c73de9SDariusz Sosnowski      $ sudo dpdk-testpmd -a 08:00.0,dv_flow_en=2,representor=pf0-1vf0-3 -- -i
174611c73de9SDariusz Sosnowski      testpmd> show port summary all
174711c73de9SDariusz Sosnowski      Number of available ports: 10
174811c73de9SDariusz Sosnowski      Port MAC Address       Name         Driver         Status   Link
174911c73de9SDariusz Sosnowski      0    E8:EB:D5:18:22:BC 08:00.0_p0   mlx5_pci       up       200 Gbps
175011c73de9SDariusz Sosnowski      1    E8:EB:D5:18:22:BD 08:00.0_p1   mlx5_pci       up       200 Gbps
175111c73de9SDariusz Sosnowski      2    D2:F6:43:0B:9E:19 08:00.0_representor_c0pf0vf0 mlx5_pci       up       200 Gbps
175211c73de9SDariusz Sosnowski      3    E6:42:27:B7:68:BD 08:00.0_representor_c0pf0vf1 mlx5_pci       up       200 Gbps
175311c73de9SDariusz Sosnowski      4    A6:5B:7F:8B:B8:47 08:00.0_representor_c0pf0vf2 mlx5_pci       up       200 Gbps
175411c73de9SDariusz Sosnowski      5    12:93:50:45:89:02 08:00.0_representor_c0pf0vf3 mlx5_pci       up       200 Gbps
175511c73de9SDariusz Sosnowski      6    06:D3:B2:79:FE:AC 08:00.0_representor_c0pf1vf0 mlx5_pci       up       200 Gbps
175611c73de9SDariusz Sosnowski      7    12:FC:08:E4:C2:CA 08:00.0_representor_c0pf1vf1 mlx5_pci       up       200 Gbps
175711c73de9SDariusz Sosnowski      8    8E:A9:9A:D0:35:4C 08:00.0_representor_c0pf1vf2 mlx5_pci       up       200 Gbps
175811c73de9SDariusz Sosnowski      9    E6:35:83:1F:B0:A9 08:00.0_representor_c0pf1vf3 mlx5_pci       up       200 Gbps
175911c73de9SDariusz Sosnowski
Limitations
~~~~~~~~~~~
176211c73de9SDariusz Sosnowski
176311c73de9SDariusz Sosnowski- Multiport E-Switch is not supported on Windows.
176411c73de9SDariusz Sosnowski- Multiport E-Switch is supported only with HW Steering flow engine (``dv_flow_en=2``).
176511c73de9SDariusz Sosnowski- Matching traffic coming from a physical port and forwarding it to a physical port
176611c73de9SDariusz Sosnowski  (either the same or other one) is not supported.
176711c73de9SDariusz Sosnowski
  To achieve such functionality, an application has to set up hairpin queues
  between physical port representors and forward the traffic using hairpin queues.
177011c73de9SDariusz Sosnowski
177111c73de9SDariusz Sosnowski
Sub-Function
------------

See :ref:`mlx5_sub_function`.

Sub-Function representor support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An SF netdev supports E-Switch representation offload
similar to PF and VF representors.
Use <sfnum> to probe SF representor::

   testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1


Performance tuning
------------------

#. Configure aggressive CQE Zipping for maximum performance::

        mlxconfig -d <mst device> s CQE_COMPRESSION=1

   To set it back to the default CQE Zipping mode use::

        mlxconfig -d <mst device> s CQE_COMPRESSION=0

#. In case of virtualization:

   - Make sure that the hypervisor kernel is 3.16 or newer.
   - Configure boot with ``iommu=pt``.
   - Use 1G huge pages.
   - Make sure to allocate a VM on huge pages.
   - Make sure to set CPU pinning.

#. For better performance, use CPUs on the NUMA node
   to which the PCIe adapter is connected. For VMs, verify that the right CPU
   and NUMA node are pinned according to the above. Run::

        lstopo-no-graphics --merge

   to identify the NUMA node to which the PCIe adapter is connected.

#. If more than one adapter is used, and root complex capabilities allow
   putting both adapters on the same NUMA node without PCI bandwidth degradation,
   it is recommended to locate both adapters on the same NUMA node,
   so that packets can be forwarded from one to the other without
   a NUMA performance penalty.

#. Disable pause frames::

        ethtool -A <netdev> rx off tx off

#. Verify that IO non-posted prefetch is disabled by default. This can be checked
   via the BIOS configuration. Please contact your server provider for more
   information about the settings.

   .. note::

        On some machines, depending on the machine integrator, it is beneficial
        to set the PCI max read request parameter to 1K. This can be
        done in the following way:

        To query the read request size use::

                setpci -s <NIC PCI address> 68.w

        If the output is different from 3XXX, set it by::

                setpci -s <NIC PCI address> 68.w=3XXX

        The XXX can be different on different systems. Make sure to configure
        according to the setpci output.

#. To minimize overhead of searching Memory Regions:

   - ``--socket-mem`` is recommended to pin memory by predictable amount.
   - Configure per-lcore cache when creating Mempools for packet buffer.
   - Refrain from dynamically allocating/freeing memory in run-time.
1850974f1e7eSYongseok Koh
18516457d0ecSAsaf PensoRx burst functions
18526457d0ecSAsaf Penso------------------
18536457d0ecSAsaf Penso
18546457d0ecSAsaf PensoThere are multiple Rx burst functions with different advantages and limitations.
18556457d0ecSAsaf Penso
18566457d0ecSAsaf Penso.. table:: Rx burst functions
18576457d0ecSAsaf Penso
18586457d0ecSAsaf Penso   +-------------------+------------------------+---------+-----------------+------+-------+
18596457d0ecSAsaf Penso   || Function Name    || Enabler               || Scatter|| Error Recovery || CQE || Large|
18606457d0ecSAsaf Penso   |                   |                        |         |                 || comp|| MTU  |
18616457d0ecSAsaf Penso   +===================+========================+=========+=================+======+=======+
18626457d0ecSAsaf Penso   | rx_burst          | rx_vec_en=0            |   Yes   | Yes             |  Yes |  Yes  |
18636457d0ecSAsaf Penso   +-------------------+------------------------+---------+-----------------+------+-------+
18646457d0ecSAsaf Penso   | rx_burst_vec      | rx_vec_en=1 (default)  |   No    | if CQE comp off |  Yes |  No   |
18656457d0ecSAsaf Penso   +-------------------+------------------------+---------+-----------------+------+-------+
18666457d0ecSAsaf Penso   | rx_burst_mprq     || mprq_en=1             |   No    | Yes             |  Yes |  Yes  |
18676457d0ecSAsaf Penso   |                   || RxQs >= rxqs_min_mprq |         |                 |      |       |
18686457d0ecSAsaf Penso   +-------------------+------------------------+---------+-----------------+------+-------+
18696457d0ecSAsaf Penso   | rx_burst_mprq_vec || rx_vec_en=1 (default) |   No    | if CQE comp off |  Yes |  Yes  |
18706457d0ecSAsaf Penso   |                   || mprq_en=1             |         |                 |      |       |
18716457d0ecSAsaf Penso   |                   || RxQs >= rxqs_min_mprq |         |                 |      |       |
18726457d0ecSAsaf Penso   +-------------------+------------------------+---------+-----------------+------+-------+
18736457d0ecSAsaf Penso
.. _mlx5_offloads_support:

Supported hardware offloads
---------------------------

The tables below show offload support depending on hardware, firmware,
and Linux software support.

The :ref:`Linux prerequisites <mlx5_linux_prerequisites>`
are Linux kernel and rdma-core libraries.
These dependencies are also packaged in MLNX_OFED or MLNX_EN,
shortened below as "OFED".

.. table:: Minimal SW/HW versions for queue offloads

   ============== ===== ===== ========= ===== ========== =============
   Offload        DPDK  Linux rdma-core OFED   firmware   hardware
   ============== ===== ===== ========= ===== ========== =============
   common base    17.11  4.14    16     4.2-1 12.21.1000 ConnectX-4
   checksums      17.11  4.14    16     4.2-1 12.21.1000 ConnectX-4
   Rx timestamp   17.11  4.14    16     4.2-1 12.21.1000 ConnectX-4
   TSO            17.11  4.14    16     4.2-1 12.21.1000 ConnectX-4
   LRO            19.08  N/A     N/A    4.6-4 16.25.6406 ConnectX-5
   Tx scheduling  20.08  N/A     N/A    5.1-2 22.28.2006 ConnectX-6 Dx
   Buffer Split   20.11  N/A     N/A    5.1-2 16.28.2006 ConnectX-5
   ============== ===== ===== ========= ===== ========== =============

.. table:: Minimal SW/HW versions for rte_flow offloads

   +-----------------------+-----------------+-----------------+
   | Offload               | with E-Switch   | with NIC        |
   +=======================+=================+=================+
   | Count                 | | DPDK 19.05    | | DPDK 19.02    |
   |                       | | OFED 4.6      | | OFED 4.6      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Drop                  | | DPDK 19.05    | | DPDK 18.11    |
   |                       | | OFED 4.6      | | OFED 4.5      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | Queue / RSS           | |               | | DPDK 18.11    |
   |                       | |     N/A       | | OFED 4.5      |
   |                       | |               | | rdma-core 23  |
   |                       | |               | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | Shared action         | |               | |               |
   |                       | | :numref:`sact`| | :numref:`sact`|
   |                       | |               | |               |
   |                       | |               | |               |
   +-----------------------+-----------------+-----------------+
   | | VLAN                | | DPDK 19.11    | | DPDK 19.11    |
   | | (of_pop_vlan /      | | OFED 4.7-1    | | OFED 4.7-1    |
   | | of_push_vlan /      | | ConnectX-5    | | ConnectX-5    |
   | | of_set_vlan_pcp /   | |               | |               |
   | | of_set_vlan_vid)    | |               | |               |
   +-----------------------+-----------------+-----------------+
   | | VLAN                | | DPDK 21.05    | |               |
   | | ingress and /       | | OFED 5.3      | |    N/A        |
   | | of_push_vlan /      | | ConnectX-6 Dx | |               |
   +-----------------------+-----------------+-----------------+
   | | VLAN                | | DPDK 21.05    | |               |
   | | egress and /        | | OFED 5.3      | |    N/A        |
   | | of_pop_vlan /       | | ConnectX-6 Dx | |               |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | | DPDK 19.05    | | DPDK 19.02    |
   | (VXLAN / NVGRE / RAW) | | OFED 4.7-1    | | OFED 4.6      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | | DPDK 19.11    | | DPDK 19.11    |
   | GENEVE                | | OFED 4.7-3    | | OFED 4.7-3    |
   |                       | | rdma-core 27  | | rdma-core 27  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Tunnel Offload        | |  DPDK 20.11   | | DPDK 20.11    |
   |                       | |  OFED 5.1-2   | | OFED 5.1-2    |
   |                       | |  rdma-core 32 | | N/A           |
   |                       | |  ConnectX-5   | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | | Header rewrite      | | DPDK 19.05    | | DPDK 19.02    |
   | | (set_ipv4_src /     | | OFED 4.7-1    | | OFED 4.7-1    |
   | | set_ipv4_dst /      | | rdma-core 24  | | rdma-core 24  |
   | | set_ipv6_src /      | | ConnectX-5    | | ConnectX-5    |
   | | set_ipv6_dst /      | |               | |               |
   | | set_tp_src /        | |               | |               |
   | | set_tp_dst /        | |               | |               |
   | | dec_ttl /           | |               | |               |
   | | set_ttl /           | |               | |               |
   | | set_mac_src /       | |               | |               |
   | | set_mac_dst)        | |               | |               |
   +-----------------------+-----------------+-----------------+
   | | Header rewrite      | | DPDK 20.02    | | DPDK 20.02    |
   | | (set_dscp)          | | OFED 5.0      | | OFED 5.0      |
   | |                     | | rdma-core 24  | | rdma-core 24  |
   | |                     | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | | Header rewrite      | | DPDK 22.07    | | DPDK 22.07    |
   | | (ipv4_ecn /         | | OFED 5.6-2    | | OFED 5.6-2    |
   | | ipv6_ecn)           | | rdma-core 41  | | rdma-core 41  |
   | |                     | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Jump                  | | DPDK 19.05    | | DPDK 19.02    |
   |                       | | OFED 4.7-1    | | OFED 4.7-1    |
   |                       | | rdma-core 24  | | N/A           |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Mark / Flag           | | DPDK 19.05    | | DPDK 18.11    |
   |                       | | OFED 4.6      | | OFED 4.5      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-4    |
   +-----------------------+-----------------+-----------------+
   | Meta data             | |  DPDK 19.11   | | DPDK 19.11    |
   |                       | |  OFED 4.7-3   | | OFED 4.7-3    |
   |                       | |  rdma-core 26 | | rdma-core 26  |
   |                       | |  ConnectX-5   | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Port ID               | | DPDK 19.05    |     | N/A       |
   |                       | | OFED 4.7-1    |     | N/A       |
   |                       | | rdma-core 24  |     | N/A       |
   |                       | | ConnectX-5    |     | N/A       |
   +-----------------------+-----------------+-----------------+
   | Hairpin               | |               | | DPDK 19.11    |
   |                       | |     N/A       | | OFED 4.7-3    |
   |                       | |               | | rdma-core 26  |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | 2-port Hairpin        | |               | | DPDK 20.11    |
   |                       | |     N/A       | | OFED 5.1-2    |
   |                       | |               | | N/A           |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Metering              | |  DPDK 19.11   | | DPDK 19.11    |
   |                       | |  OFED 4.7-3   | | OFED 4.7-3    |
   |                       | |  rdma-core 26 | | rdma-core 26  |
   |                       | |  ConnectX-5   | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | ASO Metering          | |  DPDK 21.05   | | DPDK 21.05    |
   |                       | |  OFED 5.3     | | OFED 5.3      |
   |                       | |  rdma-core 33 | | rdma-core 33  |
   |                       | |  ConnectX-6 Dx| | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
   | Metering Hierarchy    | |  DPDK 21.08   | | DPDK 21.08    |
   |                       | |  OFED 5.3     | | OFED 5.3      |
   |                       | |  N/A          | | N/A           |
   |                       | |  ConnectX-6 Dx| | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
   | Sampling              | |  DPDK 20.11   | | DPDK 20.11    |
   |                       | |  OFED 5.1-2   | | OFED 5.1-2    |
   |                       | |  rdma-core 32 | | N/A           |
   |                       | |  ConnectX-5   | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | |  DPDK 21.02   | | DPDK 21.02    |
   | GTP PSC               | |  OFED 5.2     | | OFED 5.2      |
   |                       | |  rdma-core 35 | | rdma-core 35  |
   |                       | |  ConnectX-6 Dx| | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
   | Encapsulation         | | DPDK 21.02    | | DPDK 21.02    |
   | GENEVE TLV option     | | OFED 5.2      | | OFED 5.2      |
   |                       | | rdma-core 34  | | rdma-core 34  |
   |                       | | ConnectX-6 Dx | | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
   | Modify Field          | | DPDK 21.02    | | DPDK 21.02    |
   |                       | | OFED 5.2      | | OFED 5.2      |
   |                       | | rdma-core 35  | | rdma-core 35  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Connection tracking   | |               | | DPDK 21.05    |
   |                       | |     N/A       | | OFED 5.3      |
   |                       | |               | | rdma-core 35  |
   |                       | |               | | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+

.. table:: Minimal SW/HW versions for shared action offload
   :name: sact

   +-----------------------+-----------------+-----------------+
   | Shared Action         | with E-Switch   | with NIC        |
   +=======================+=================+=================+
   | RSS                   | |               | | DPDK 20.11    |
   |                       | |     N/A       | | OFED 5.2      |
   |                       | |               | | rdma-core 33  |
   |                       | |               | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+
   | Age                   | | DPDK 20.11    | | DPDK 20.11    |
   |                       | | OFED 5.2      | | OFED 5.2      |
   |                       | | rdma-core 32  | | rdma-core 32  |
   |                       | | ConnectX-6 Dx | | ConnectX-6 Dx |
   +-----------------------+-----------------+-----------------+
   | Count                 | | DPDK 21.05    | | DPDK 21.05    |
   |                       | | OFED 4.6      | | OFED 4.6      |
   |                       | | rdma-core 24  | | rdma-core 23  |
   |                       | | ConnectX-5    | | ConnectX-5    |
   +-----------------------+-----------------+-----------------+

.. table:: Minimal SW/HW versions for flow template API

   +-----------------+--------------------+--------------------+
   | DPDK            | NIC                | Firmware           |
   +=================+====================+====================+
   | 22.11           | ConnectX-6 Dx      | xx.35.1012         |
   +-----------------+--------------------+--------------------+

Notes for metadata
------------------

MARK and META items are interrelated with the datapath: they might move from/to
the applications in mbuf fields. Hence, a zero value for these items has a
special meaning: "no metadata are provided". Non-zero values are
treated by applications and the PMD as valid ones.

Moreover, in the flow engine domain the value zero is acceptable to match and
set, so specifying zero values as rte_flow parameters for the
META and MARK items and actions should be allowed. At the same time, a zero mask
has no meaning and should be rejected at the validation stage.
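
As a minimal sketch of the validation rule above (this is not the PMD code,
and the function name is hypothetical), a check could accept any value,
including zero, and reject only an all-zero mask:

.. code-block:: c

   #include <errno.h>
   #include <stdint.h>

   /* Hypothetical check for a META or MARK match: 0 if valid, -EINVAL if not. */
   static int
   meta_match_validate(uint32_t value, uint32_t mask)
   {
       (void)value; /* any value, including zero, is acceptable */
       if (mask == 0)
           return -EINVAL; /* a zero mask matches nothing meaningful */
       return 0;
   }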
2090dac98e87SWisam Jaddo
2091a02aa826SBing ZhaoNotes for rte_flow
2092a02aa826SBing Zhao------------------
2093a02aa826SBing Zhao
2094a02aa826SBing ZhaoFlows are not cached in the driver.
2095a02aa826SBing ZhaoWhen stopping a device port, all the flows created on this port from the
2096a02aa826SBing Zhaoapplication will be flushed automatically in the background.
2097a02aa826SBing ZhaoAfter stopping the device port, all flows on this port become invalid and
2098a02aa826SBing Zhaonot represented in the system.
2099a02aa826SBing ZhaoAll references to these flows held by the application should be discarded
2100a02aa826SBing Zhaodirectly but neither destroyed nor flushed.
2101a02aa826SBing Zhao
2102a02aa826SBing ZhaoThe application should re-create the flows as required after the port restart.
2103a02aa826SBing Zhao
21047274b417SDariusz Sosnowski
Notes for flow counters
-----------------------

mlx5 PMD supports the ``COUNT`` flow action,
which provides an ability to count packets (and bytes)
matched against a given flow rule.
This section gives a high-level overview of
how this support is implemented and of its limitations.

HW steering flow engine
~~~~~~~~~~~~~~~~~~~~~~~

Flow counters are allocated from HW in bulks.
A set of bulks forms a flow counter pool managed by PMD.
When flow counters are queried from HW,
each counter is identified by an offset in a given bulk.
Querying HW flow counter requires sending a request to HW,
which will request a read of counter values for given offsets.
HW will asynchronously provide these values through a DMA write.

In order to optimize HW to SW communication,
these requests are handled in a separate counter service thread
spawned by mlx5 PMD.
This service thread will refresh the counter values stored in memory,
in cycles, each spanning ``svc_cycle_time`` milliseconds.
By default, ``svc_cycle_time`` is set to 500.
When applications query the ``COUNT`` flow action,
PMD returns the values stored in host memory.

mlx5 PMD manages 3 global rings of allocated counter offsets:

- ``free`` ring - Counters which were not used at all.
- ``wait_reset`` ring - Counters which were used in some flow rules,
  but were recently freed (flow rule was destroyed
  or an indirect action was destroyed).
  Since the count value might have changed
  between the last counter service thread cycle and the moment it was freed,
  the value in host memory might be stale.
  During the next service thread cycle,
  such counters will be moved to ``reuse`` ring.
- ``reuse`` ring - Counters which were used at least once
  and can be reused in new flow rules.

When counters are assigned to a flow rule (or allocated to indirect action),
the PMD first tries to fetch a counter from ``reuse`` ring.
If it's empty, the PMD fetches a counter from ``free`` ring.

The counter service thread works as follows:

#. Record counters stored in ``wait_reset`` ring.
#. Read values of all counters which were used at least once
   or are currently in use.
#. Move recorded counters from ``wait_reset`` to ``reuse`` ring.
#. Sleep for ``svc_cycle_time - (query time)`` milliseconds.
#. Repeat.

Because freeing a counter (by destroying a flow rule or destroying indirect action)
does not immediately make it available for the application,
the PMD might return:

- ``ENOENT`` if no counter is available in ``free``, ``reuse``
  or ``wait_reset`` rings.
  No counter will be available until the application releases some of them.
- ``EAGAIN`` if no counter is available in ``free`` and ``reuse`` rings,
  but there are counters in ``wait_reset`` ring.
  This means that after the next service thread cycle new counters will be available.

The application has to be aware that flow rule creation or indirect action creation
might need to be retried.

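The allocation and recycling rules above can be modelled as follows.
This is a simplified, single-threaded sketch and not the PMD implementation:
each ring is reduced to a count of its entries, all names are hypothetical,
and errors are returned as negative ``errno`` values by assumption.

.. code-block:: c

   #include <errno.h>

   /* Simplified model: each "ring" only tracks how many counters it holds. */
   struct cnt_pool {
       unsigned int free;       /* never used */
       unsigned int reuse;      /* used before, refreshed, ready for reuse */
       unsigned int wait_reset; /* freed since the last service cycle */
   };

   /* Take one counter: 0 on success, -EAGAIN or -ENOENT otherwise. */
   static int
   cnt_alloc(struct cnt_pool *p)
   {
       if (p->reuse > 0) {
           p->reuse--;
           return 0;
       }
       if (p->free > 0) {
           p->free--;
           return 0;
       }
       return p->wait_reset > 0 ? -EAGAIN : -ENOENT;
   }

   /* Release a counter: it is parked until the next service cycle. */
   static void
   cnt_free(struct cnt_pool *p)
   {
       p->wait_reset++;
   }

   /* One service cycle: counters recorded at its start become reusable. */
   static void
   cnt_service_cycle(struct cnt_pool *p)
   {
       unsigned int recorded = p->wait_reset; /* record wait_reset content */

       /* The HW query of counter values would happen here. */
       p->wait_reset -= recorded;
       p->reuse += recorded;
   }

   /* Retry on -EAGAIN for at most max_cycles service cycles. */
   static int
   cnt_alloc_retry(struct cnt_pool *p, unsigned int max_cycles)
   {
       int ret = cnt_alloc(p);

       while (ret == -EAGAIN && max_cycles-- > 0) {
           /* A real application would wait for the service thread instead. */
           cnt_service_cycle(p);
           ret = cnt_alloc(p);
       }
       return ret;
   }

In this model, a pool holding a single unused counter serves one allocation
and then reports ``-ENOENT``. After ``cnt_free()``, an allocation reports ``-EAGAIN``
until one service cycle has run, which is why the retry helper only loops on ``-EAGAIN``.
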
2175c0e29968SDariusz Sosnowski
21767274b417SDariusz SosnowskiNotes for hairpin
21777274b417SDariusz Sosnowski-----------------
21787274b417SDariusz Sosnowski
21797274b417SDariusz SosnowskiNVIDIA ConnectX and BlueField devices support
21807274b417SDariusz Sosnowskispecifying memory placement for hairpin Rx and Tx queues.
21817274b417SDariusz SosnowskiThis feature requires NVIDIA MLNX_OFED 5.8.
21827274b417SDariusz Sosnowski
21837274b417SDariusz SosnowskiBy default, data buffers and packet descriptors for hairpin queues
21847274b417SDariusz Sosnowskiare placed in device memory
21857274b417SDariusz Sosnowskiwhich is shared with other resources (e.g. flow rules).
21867274b417SDariusz Sosnowski
21877274b417SDariusz SosnowskiStarting with DPDK 22.11 and NVIDIA MLNX_OFED 5.8,
21887274b417SDariusz Sosnowskiapplications are allowed to:
21897274b417SDariusz Sosnowski
2190f2d43ff5SDariusz Sosnowski#. Place data buffers and Rx packet descriptors in dedicated device memory.
2191f2d43ff5SDariusz Sosnowski   Application can request that configuration
2192f2d43ff5SDariusz Sosnowski   through ``use_locked_device_memory`` configuration option.
2193f2d43ff5SDariusz Sosnowski
2194f2d43ff5SDariusz Sosnowski   Placing data buffers and Rx packet descriptors in dedicated device memory
2195f2d43ff5SDariusz Sosnowski   can decrease latency on hairpinned traffic,
2196f2d43ff5SDariusz Sosnowski   since traffic processing for the hairpin queue will not be memory starved.
2197f2d43ff5SDariusz Sosnowski
   However, reserving device memory for hairpin Rx queues
   may decrease throughput under heavy load,
   since fewer resources will be available on the device.
2201f2d43ff5SDariusz Sosnowski
2202f2d43ff5SDariusz Sosnowski   This option is supported only for Rx hairpin queues.
2203f2d43ff5SDariusz Sosnowski
22047274b417SDariusz Sosnowski#. Place Tx packet descriptors in host memory.
   An application can request this configuration
   through the ``use_rte_memory`` configuration option.
22077274b417SDariusz Sosnowski
   Placing Tx packet descriptors in host memory can increase traffic throughput.
   This leaves more resources available on the device for other purposes,
   which reduces memory contention on the device.
   A side effect of this option is a visible increase in latency,
   since each packet incurs additional PCI transactions.
22137274b417SDariusz Sosnowski
22147274b417SDariusz Sosnowski   This option is supported only for Tx hairpin queues.
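
The direction restrictions of the two options above can be summarized in a small
check. This is only an illustrative sketch: the function name and the ``"rx"``/``"tx"``
strings are hypothetical, and only the two option names come from this section.

```python
def check_hairpin_placement(direction, use_locked_device_memory=False,
                            use_rte_memory=False):
    """Return True if the requested placement matches the rules above.

    `direction` is "rx" or "tx" (hypothetical encoding for this sketch).
    """
    if direction not in ("rx", "tx"):
        raise ValueError("direction must be 'rx' or 'tx'")
    if use_locked_device_memory and direction != "rx":
        # Dedicated device memory is supported only for Rx hairpin queues.
        return False
    if use_rte_memory and direction != "tx":
        # Descriptors in host memory are supported only for Tx hairpin queues.
        return False
    return True
```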
22157274b417SDariusz Sosnowski
22167274b417SDariusz Sosnowski
2217699abebaSOlga ShernNotes for testpmd
2218699abebaSOlga Shern-----------------
2219699abebaSOlga Shern
22208809f78cSBruce RichardsonCompared to librte_net_mlx4 that implements a single RSS configuration per
22218809f78cSBruce Richardsonport, librte_net_mlx5 supports per-protocol RSS configuration.
2222699abebaSOlga Shern
2223699abebaSOlga ShernSince ``testpmd`` defaults to IP RSS mode and there is currently no
2224699abebaSOlga Sherncommand-line parameter to enable additional protocols (UDP and TCP as well
2225699abebaSOlga Shernas IP), the following commands must be entered from its CLI to get the same
22268809f78cSBruce Richardsonbehavior as librte_net_mlx4::
2227699abebaSOlga Shern
2228699abebaSOlga Shern   > port stop all
2229699abebaSOlga Shern   > port config all rss all
2230699abebaSOlga Shern   > port start all
2231699abebaSOlga Shern
2232a7e11a0cSAdrien MazarguilUsage example
2233a7e11a0cSAdrien Mazarguil-------------
2234a7e11a0cSAdrien Mazarguil
22357b61f14eSRaslan DarawshehThis section demonstrates how to launch **testpmd** with NVIDIA
22368809f78cSBruce RichardsonConnectX-4/ConnectX-5/ConnectX-6/BlueField devices managed by librte_net_mlx5.
2237a7e11a0cSAdrien Mazarguil
2238499b461fSThomas Monjalon#. Load the kernel modules::
2239a7e11a0cSAdrien Mazarguil
2240a7e11a0cSAdrien Mazarguil      modprobe -a ib_uverbs mlx5_core mlx5_ib
2241a7e11a0cSAdrien Mazarguil
   Alternatively, if MLNX_OFED/MLNX_EN is fully installed, the following script
   can be run::
2244699abebaSOlga Shern
2245699abebaSOlga Shern      /etc/init.d/openibd restart
2246699abebaSOlga Shern
2247a7e11a0cSAdrien Mazarguil   .. note::
2248a7e11a0cSAdrien Mazarguil
2249a7e11a0cSAdrien Mazarguil      User space I/O kernel modules (uio and igb_uio) are not used and do
2250a7e11a0cSAdrien Mazarguil      not have to be loaded.
2251a7e11a0cSAdrien Mazarguil
2252a7e11a0cSAdrien Mazarguil#. Make sure Ethernet interfaces are in working order and linked to kernel
2253499b461fSThomas Monjalon   verbs. Related sysfs entries should be present::
2254a7e11a0cSAdrien Mazarguil
2255a7e11a0cSAdrien Mazarguil      ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
2256a7e11a0cSAdrien Mazarguil
2257499b461fSThomas Monjalon   Example output::
2258a7e11a0cSAdrien Mazarguil
2259a7e11a0cSAdrien Mazarguil      eth30
2260a7e11a0cSAdrien Mazarguil      eth31
2261a7e11a0cSAdrien Mazarguil      eth32
2262a7e11a0cSAdrien Mazarguil      eth33
2263a7e11a0cSAdrien Mazarguil
#. Optionally, retrieve their PCI bus addresses to be used with the allow list::
2265a7e11a0cSAdrien Mazarguil
2266a7e11a0cSAdrien Mazarguil      {
2267a7e11a0cSAdrien Mazarguil          for intf in eth2 eth3 eth4 eth5;
2268a7e11a0cSAdrien Mazarguil          do
2269a7e11a0cSAdrien Mazarguil              (cd "/sys/class/net/${intf}/device/" && pwd -P);
2270a7e11a0cSAdrien Mazarguil          done;
2271a7e11a0cSAdrien Mazarguil      } |
2272db27370bSStephen Hemminger      sed -n 's,.*/\(.*\),-a \1,p'
2273a7e11a0cSAdrien Mazarguil
2274499b461fSThomas Monjalon   Example output::
2275a7e11a0cSAdrien Mazarguil
2276db27370bSStephen Hemminger      -a 0000:05:00.1
2277db27370bSStephen Hemminger      -a 0000:06:00.0
2278db27370bSStephen Hemminger      -a 0000:06:00.1
2279db27370bSStephen Hemminger      -a 0000:05:00.0
2280a7e11a0cSAdrien Mazarguil
2281499b461fSThomas Monjalon#. Request huge pages::
2282a7e11a0cSAdrien Mazarguil
2283de34aaa9SThomas Monjalon      dpdk-hugepages.py --setup 2G
2284a7e11a0cSAdrien Mazarguil
2285499b461fSThomas Monjalon#. Start testpmd with basic parameters::
2286a7e11a0cSAdrien Mazarguil
2287e3f15be4SSarosh Arif      dpdk-testpmd -l 8-15 -n 4 -a 05:00.0 -a 05:00.1 -a 06:00.0 -a 06:00.1 -- --rxq=2 --txq=2 -i
2288a7e11a0cSAdrien Mazarguil
2289499b461fSThomas Monjalon   Example output::
2290a7e11a0cSAdrien Mazarguil
2291a7e11a0cSAdrien Mazarguil      [...]
2292a7e11a0cSAdrien Mazarguil      EAL: PCI device 0000:05:00.0 on NUMA socket 0
22938809f78cSBruce Richardson      EAL:   probe driver: 15b3:1013 librte_net_mlx5
22948809f78cSBruce Richardson      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
22958809f78cSBruce Richardson      PMD: librte_net_mlx5: 1 port(s) detected
22968809f78cSBruce Richardson      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe
2297a7e11a0cSAdrien Mazarguil      EAL: PCI device 0000:05:00.1 on NUMA socket 0
22988809f78cSBruce Richardson      EAL:   probe driver: 15b3:1013 librte_net_mlx5
22998809f78cSBruce Richardson      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_1" (VF: false)
23008809f78cSBruce Richardson      PMD: librte_net_mlx5: 1 port(s) detected
23018809f78cSBruce Richardson      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff
2302a7e11a0cSAdrien Mazarguil      EAL: PCI device 0000:06:00.0 on NUMA socket 0
23038809f78cSBruce Richardson      EAL:   probe driver: 15b3:1013 librte_net_mlx5
23048809f78cSBruce Richardson      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_2" (VF: false)
23058809f78cSBruce Richardson      PMD: librte_net_mlx5: 1 port(s) detected
23068809f78cSBruce Richardson      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa
2307a7e11a0cSAdrien Mazarguil      EAL: PCI device 0000:06:00.1 on NUMA socket 0
23088809f78cSBruce Richardson      EAL:   probe driver: 15b3:1013 librte_net_mlx5
23098809f78cSBruce Richardson      PMD: librte_net_mlx5: PCI information matches, using device "mlx5_3" (VF: false)
23108809f78cSBruce Richardson      PMD: librte_net_mlx5: 1 port(s) detected
23118809f78cSBruce Richardson      PMD: librte_net_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb
2312a7e11a0cSAdrien Mazarguil      Interactive-mode selected
2313a7e11a0cSAdrien Mazarguil      Configuring Port 0 (socket 0)
23148809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8cba80: TX queues number update: 0 -> 2
23158809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8cba80: RX queues number update: 0 -> 2
2316a7e11a0cSAdrien Mazarguil      Port 0: E4:1D:2D:E7:0C:FE
2317a7e11a0cSAdrien Mazarguil      Configuring Port 1 (socket 0)
23188809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8ccac8: TX queues number update: 0 -> 2
23198809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8ccac8: RX queues number update: 0 -> 2
2320a7e11a0cSAdrien Mazarguil      Port 1: E4:1D:2D:E7:0C:FF
2321a7e11a0cSAdrien Mazarguil      Configuring Port 2 (socket 0)
23228809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8cdb10: TX queues number update: 0 -> 2
23238809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8cdb10: RX queues number update: 0 -> 2
2324a7e11a0cSAdrien Mazarguil      Port 2: E4:1D:2D:E7:0C:FA
2325a7e11a0cSAdrien Mazarguil      Configuring Port 3 (socket 0)
23268809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8ceb58: TX queues number update: 0 -> 2
23278809f78cSBruce Richardson      PMD: librte_net_mlx5: 0x8ceb58: RX queues number update: 0 -> 2
2328a7e11a0cSAdrien Mazarguil      Port 3: E4:1D:2D:E7:0C:FB
2329a7e11a0cSAdrien Mazarguil      Checking link statuses...
2330a7e11a0cSAdrien Mazarguil      Port 0 Link Up - speed 40000 Mbps - full-duplex
2331a7e11a0cSAdrien Mazarguil      Port 1 Link Up - speed 40000 Mbps - full-duplex
2332a7e11a0cSAdrien Mazarguil      Port 2 Link Up - speed 10000 Mbps - full-duplex
2333a7e11a0cSAdrien Mazarguil      Port 3 Link Up - speed 10000 Mbps - full-duplex
2334a7e11a0cSAdrien Mazarguil      Done
2335a7e11a0cSAdrien Mazarguil      testpmd>
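
The allow-list step above (resolving each interface's ``device`` link and keeping
the last path component) can also be sketched in Python. The helper names are
hypothetical, and the sketch assumes the same sysfs layout as the shell snippet.

```python
import os


def pci_address(device_path):
    """Return the last component of a resolved sysfs device path."""
    return os.path.basename(os.path.normpath(device_path))


def allow_args(interfaces, sysfs_root="/sys/class/net"):
    """Return ['-a', '<pci-address>', ...] for the given interface names."""
    args = []
    for intf in interfaces:
        # Equivalent of: (cd "/sys/class/net/${intf}/device/" && pwd -P)
        device = os.path.realpath(os.path.join(sysfs_root, intf, "device"))
        args += ["-a", pci_address(device)]
    return args
```

The resulting list can be spliced into the EAL part of the ``dpdk-testpmd``
command line shown in the last step.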
23369c5d2184SXueming Li
23379c5d2184SXueming LiHow to dump flows
23389c5d2184SXueming Li-----------------
23399c5d2184SXueming Li
This section demonstrates how to dump flows. Currently, all flows can be dumped
with the assistance of external tools.
23429c5d2184SXueming Li
#. There are two ways to get the raw flow file:

   - Using the testpmd CLI, to dump all flows or a single flow:

     .. code-block:: console

        testpmd> flow dump <port> all <output_file>
        testpmd> flow dump <port> rule <rule_id> <output_file>

   - Calling the ``rte_flow_dev_dump()`` API:

     .. code-block:: c

        rte_flow_dev_dump(port, flow, file, NULL);
23599c5d2184SXueming Li
#. Dump human-readable flows from the raw file:

   Get the flow parsing tool from: https://github.com/Mellanox/mlx_steering_dump
23639c5d2184SXueming Li
23649c5d2184SXueming Li   .. code-block:: console
23659c5d2184SXueming Li
236650c38379SHaifei Luo       mlx_steering_dump.py -f <output_file> -flowptr <flow_ptr>
236748fbc1beSShun Hao
236848fbc1beSShun HaoHow to share a meter between ports in the same switch domain
236948fbc1beSShun Hao------------------------------------------------------------
237048fbc1beSShun Hao
This section demonstrates how to use the shared meter. A meter M created
on port X can be shared with a port Y in the same switch domain as follows:
237348fbc1beSShun Hao
237448fbc1beSShun Hao.. code-block:: console
237548fbc1beSShun Hao
237648fbc1beSShun Hao   flow create X ingress transfer pattern eth / port_id id is Y / end actions meter mtr_id M / end
2377a3b7af90SShun Hao
2378a3b7af90SShun HaoHow to use meter hierarchy
2379a3b7af90SShun Hao--------------------------
2380a3b7af90SShun Hao
2381a3b7af90SShun HaoThis section demonstrates how to create and use a meter hierarchy.
A termination meter M can be the policy green action of another termination meter N.
The two meters are then chained together.
Using meter N in a flow applies both meters of the hierarchy to that flow.
2385a3b7af90SShun Hao
2386a3b7af90SShun Hao.. code-block:: console
2387a3b7af90SShun Hao
2388a3b7af90SShun Hao   add port meter policy 0 1 g_actions queue index 0 / end y_actions end r_actions drop / end
2389a3b7af90SShun Hao   create port meter 0 M 1 1 yes 0xffff 1 0
2390a3b7af90SShun Hao   add port meter policy 0 2 g_actions meter mtr_id M / end y_actions end r_actions drop / end
2391a3b7af90SShun Hao   create port meter 0 N 2 2 yes 0xffff 1 0
2392a3b7af90SShun Hao   flow create 0 ingress group 1 pattern eth / end actions meter mtr_id N / end
2393dfd3e840SAsaf Penso
2394dfd3e840SAsaf PensoHow to configure a VF as trusted
2395dfd3e840SAsaf Penso--------------------------------
2396dfd3e840SAsaf Penso
2397dfd3e840SAsaf PensoThis section demonstrates how to configure a virtual function (VF) interface as trusted.
A trusted VF is needed to offload rules with rte_flow to a group greater than 0.
2399dfd3e840SAsaf PensoThe configuration is done in two parts: driver and FW.
2400dfd3e840SAsaf Penso
2401dfd3e840SAsaf PensoThe procedure below is an example of using a ConnectX-5 adapter card (pf0) with 2 VFs:
2402dfd3e840SAsaf Penso
2403dfd3e840SAsaf Penso#. Create 2 VFs on the PF pf0 when in Legacy SR-IOV mode::
2404dfd3e840SAsaf Penso
2405dfd3e840SAsaf Penso   $ echo 2 > /sys/class/net/pf0/device/mlx5_num_vfs
2406dfd3e840SAsaf Penso
2407dfd3e840SAsaf Penso#. Verify the VFs are created:
2408dfd3e840SAsaf Penso
2409dfd3e840SAsaf Penso   .. code-block:: console
2410dfd3e840SAsaf Penso
2411dfd3e840SAsaf Penso      $ lspci | grep Mellanox
2412dfd3e840SAsaf Penso      82:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
2413dfd3e840SAsaf Penso      82:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
2414dfd3e840SAsaf Penso      82:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
2415dfd3e840SAsaf Penso      82:00.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
2416dfd3e840SAsaf Penso
#. Unbind all VFs. For each VF PCIe address, use the following command to unbind the driver::
2418dfd3e840SAsaf Penso
2419dfd3e840SAsaf Penso   $ echo "0000:82:00.2" >> /sys/bus/pci/drivers/mlx5_core/unbind
2420dfd3e840SAsaf Penso
2421dfd3e840SAsaf Penso#. Set the VFs to be trusted for the kernel by using one of the methods below:
2422dfd3e840SAsaf Penso
2423dfd3e840SAsaf Penso      - Using sysfs file::
2424dfd3e840SAsaf Penso
2425dfd3e840SAsaf Penso        $ echo ON | tee /sys/class/net/pf0/device/sriov/0/trust
2426dfd3e840SAsaf Penso        $ echo ON | tee /sys/class/net/pf0/device/sriov/1/trust
2427dfd3e840SAsaf Penso
      - Using the ``ip link`` command::
2429dfd3e840SAsaf Penso
        $ ip link set pf0 vf 0 trust on
        $ ip link set pf0 vf 1 trust on
2432dfd3e840SAsaf Penso
24333934143fSAli Alnubani#. Configure all VFs using ``mlxreg``:
24343934143fSAli Alnubani
24353934143fSAli Alnubani   - For MFT >= 4.21::
24363934143fSAli Alnubani
24373934143fSAli Alnubani     $ mlxreg -d /dev/mst/mt4121_pciconf0 --reg_name VHCA_TRUST_LEVEL --yes --indexes 'all_vhca=0x1,vhca_id=0x0' --set 'trust_level=0x1'
24383934143fSAli Alnubani
24393934143fSAli Alnubani   - For MFT < 4.21::
2440dfd3e840SAsaf Penso
2441dfd3e840SAsaf Penso     $ mlxreg -d /dev/mst/mt4121_pciconf0 --reg_name VHCA_TRUST_LEVEL --yes --set "all_vhca=0x1,trust_level=0x1"
2442dfd3e840SAsaf Penso
2443dfd3e840SAsaf Penso   .. note::
2444dfd3e840SAsaf Penso
2445dfd3e840SAsaf Penso      Firmware version used must be >= xx.29.1016 and MFT >= 4.18
2446dfd3e840SAsaf Penso
#. For each VF PCIe address, use the following command to bind the driver::
2448dfd3e840SAsaf Penso
2449dfd3e840SAsaf Penso   $ echo "0000:82:00.2" >> /sys/bus/pci/drivers/mlx5_core/bind
24502235fcdaSSpike Du
24519cbf71e9SViacheslav OvsiienkoHow to trace Tx datapath
24529cbf71e9SViacheslav Ovsiienko------------------------
24539cbf71e9SViacheslav Ovsiienko
24549cbf71e9SViacheslav OvsiienkoThe mlx5 PMD provides Tx datapath tracing capability with extra debug information:
24559cbf71e9SViacheslav Ovsiienkowhen and how packets were scheduled,
24569cbf71e9SViacheslav Ovsiienkoand when the actual sending was completed by the NIC hardware.
24579cbf71e9SViacheslav Ovsiienko
24589cbf71e9SViacheslav OvsiienkoSteps to enable Tx datapath tracing:
24599cbf71e9SViacheslav Ovsiienko
24609cbf71e9SViacheslav Ovsiienko#. Build DPDK application with enabled datapath tracing
24619cbf71e9SViacheslav Ovsiienko
   The Meson option ``-Denable_trace_fp=true`` and
   the C flag ``ALLOW_EXPERIMENTAL_API`` should be specified.
24649cbf71e9SViacheslav Ovsiienko
24659cbf71e9SViacheslav Ovsiienko   .. code-block:: console
24669cbf71e9SViacheslav Ovsiienko
      meson configure --buildtype=debug -Denable_trace_fp=true \
         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
24699cbf71e9SViacheslav Ovsiienko
24709cbf71e9SViacheslav Ovsiienko#. Configure the NIC
24719cbf71e9SViacheslav Ovsiienko
24729cbf71e9SViacheslav Ovsiienko   If the sending completion timings are important,
24739cbf71e9SViacheslav Ovsiienko   the NIC should be configured to provide realtime timestamps.
   The non-volatile settings parameter ``REAL_TIME_CLOCK_ENABLE`` should be configured as ``1``.
24759cbf71e9SViacheslav Ovsiienko
24769cbf71e9SViacheslav Ovsiienko   .. code-block:: console
24779cbf71e9SViacheslav Ovsiienko
24789cbf71e9SViacheslav Ovsiienko      mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
24799cbf71e9SViacheslav Ovsiienko
24809cbf71e9SViacheslav Ovsiienko   The ``mlxconfig`` utility is part of the MFT package.
24819cbf71e9SViacheslav Ovsiienko
24829cbf71e9SViacheslav Ovsiienko#. Run application with EAL parameter enabling tracing in mlx5 Tx datapath
24839cbf71e9SViacheslav Ovsiienko
24849cbf71e9SViacheslav Ovsiienko   By default all tracepoints are disabled.
24859cbf71e9SViacheslav Ovsiienko   To analyze Tx datapath and its timings: ``--trace=pmd.net.mlx5.tx``.
24869cbf71e9SViacheslav Ovsiienko
24879cbf71e9SViacheslav Ovsiienko#. Commit the tracing data to the storage (with ``rte_trace_save()`` API call).
24889cbf71e9SViacheslav Ovsiienko
24899cbf71e9SViacheslav Ovsiienko#. Install or build the ``babeltrace2`` package
24909cbf71e9SViacheslav Ovsiienko
24919cbf71e9SViacheslav Ovsiienko   The Python script analyzing gathered trace data uses the ``babeltrace2`` library.
24929cbf71e9SViacheslav Ovsiienko   The package should be either installed or built from source as shown below.
24939cbf71e9SViacheslav Ovsiienko
24949cbf71e9SViacheslav Ovsiienko   .. code-block:: console
24959cbf71e9SViacheslav Ovsiienko
24969cbf71e9SViacheslav Ovsiienko      git clone https://github.com/efficios/babeltrace.git
24979cbf71e9SViacheslav Ovsiienko      cd babeltrace
24989cbf71e9SViacheslav Ovsiienko      ./bootstrap
      ./configure --help
      ./configure --disable-api-doc --disable-man-pages \
                  --disable-python-bindings-doc --enable-python-plugins \
                  --enable-python-binding
25039cbf71e9SViacheslav Ovsiienko
25049cbf71e9SViacheslav Ovsiienko#. Run analyzing script
25059cbf71e9SViacheslav Ovsiienko
25069cbf71e9SViacheslav Ovsiienko   ``mlx5_trace.py`` is used to combine related events (packet firing and completion)
25079cbf71e9SViacheslav Ovsiienko   and to show the results in human-readable view.
25089cbf71e9SViacheslav Ovsiienko
25099cbf71e9SViacheslav Ovsiienko   The analyzing script is located in the DPDK source tree: ``drivers/net/mlx5/tools``.
25109cbf71e9SViacheslav Ovsiienko
25119cbf71e9SViacheslav Ovsiienko   It requires Python 3.6 and ``babeltrace2`` package.
25129cbf71e9SViacheslav Ovsiienko
25139cbf71e9SViacheslav Ovsiienko   The parameter of the script is the trace data folder.
25149cbf71e9SViacheslav Ovsiienko
   The optional parameter ``-a`` forces dumping of incomplete bursts.

   The optional parameter ``-v [level]`` forces dumping of raw record data
   for the specified level and below.
   Level 0 dumps bursts, level 1 dumps WQEs, level 2 dumps mbufs.
2520171360dfSViacheslav Ovsiienko
25219cbf71e9SViacheslav Ovsiienko   .. code-block:: console
25229cbf71e9SViacheslav Ovsiienko
25239cbf71e9SViacheslav Ovsiienko      mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
25249cbf71e9SViacheslav Ovsiienko
25259cbf71e9SViacheslav Ovsiienko#. Interpreting the script output data
25269cbf71e9SViacheslav Ovsiienko
25279cbf71e9SViacheslav Ovsiienko   All the timings are given in nanoseconds.
25289cbf71e9SViacheslav Ovsiienko   The list of Tx bursts per port/queue is presented in the output.
25299cbf71e9SViacheslav Ovsiienko   Each list element contains the list of built WQEs with specific opcodes.
25309cbf71e9SViacheslav Ovsiienko   Each WQE contains the list of the encompassed packets to send.
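
As a rough illustration of the script parameters described above, the helper
below assembles an ``mlx5_trace.py`` command line. The helper itself is
hypothetical, the placement of options before the trace folder is an assumption,
and only the levels documented above (0, 1 and 2) are accepted by this sketch.

```python
def mlx5_trace_argv(trace_dir, dump_incomplete=False, verbosity=None):
    """Build an argument list for mlx5_trace.py (illustrative helper)."""
    argv = ["mlx5_trace.py"]
    if dump_incomplete:
        argv.append("-a")  # also dump incomplete bursts
    if verbosity is not None:
        # Documented levels: 0 = bursts, 1 = WQEs, 2 = mbufs.
        if verbosity not in (0, 1, 2):
            raise ValueError("verbosity must be 0, 1 or 2")
        argv += ["-v", str(verbosity)]
    argv.append(trace_dir)  # the trace data folder
    return argv
```

The returned list is suitable for ``subprocess.run()`` once the script location
in ``drivers/net/mlx5/tools`` is on the search path.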
25319cbf71e9SViacheslav Ovsiienko
25322235fcdaSSpike DuHost shaper
25332235fcdaSSpike Du-----------
25342235fcdaSSpike Du
The host shaper register is a per-host-port register
which sets a shaper on the host port.
All VF/host PF representors belonging to one host port share one host shaper.
For example, if representor 0 and representor 1 belong to the same host port,
and a host shaper rate of 1Gbps is configured,
the shaper throttles the traffic of both representors from the host.

The host shaper has two modes for setting the shaper:
immediate, and deferred to the available descriptor threshold event trigger.

In immediate mode, the rate limit is applied to the host shaper immediately.
25462235fcdaSSpike Du
25472235fcdaSSpike DuWhen deferring to the available descriptor threshold trigger,
25482235fcdaSSpike Duthe shaper is not set until an available descriptor threshold event
25492235fcdaSSpike Duis received by any Rx queue in a VF representor belonging to the host port.
25502235fcdaSSpike DuThe only rate supported for deferred mode is 100Mbps
25512235fcdaSSpike Du(there is no limit on the supported rates for immediate mode).
25522235fcdaSSpike DuIn deferred mode, the shaper is set on the host port by the firmware
25532235fcdaSSpike Duupon receiving the available descriptor threshold event,
25542235fcdaSSpike Duwhich allows throttling host traffic on available descriptor threshold events
25552235fcdaSSpike Duat minimum latency, preventing excess drops in the Rx queue.
25562235fcdaSSpike Du
25572235fcdaSSpike DuDependency on mstflint package
25582235fcdaSSpike Du~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25592235fcdaSSpike Du
In order to configure the host shaper register,
``librte_net_mlx5`` depends on ``libmtcr_ul``
which can be installed from the MLNX_OFED mstflint package.
Meson detects the presence of ``libmtcr_ul`` at the configure stage.
If the library is detected, the application must link with ``-lmtcr_ul``,
as done by the pkg-config file libdpdk.pc.
2566f41a5092SSpike Du
2567f41a5092SSpike DuAvailable descriptor threshold and host shaper
2568f41a5092SSpike Du~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2569f41a5092SSpike Du
There is a command to configure the available descriptor threshold in testpmd.
Testpmd also contains sample logic to handle available descriptor threshold events.
The typical workflow is:
testpmd configures the available descriptor threshold for Rx queues,
enables ``avail_thresh_triggered`` in the host shaper and registers a callback.
When traffic from the host is too high
and Rx queue emptiness is below the available descriptor threshold,
the PMD receives an event
and the firmware configures a 100Mbps shaper on the host port automatically.
Then the PMD calls the previously registered callback,
which waits for a while to let the Rx queue drain,
then disables the host shaper.
2582f41a5092SSpike Du
2583eb1dcc01SThomas MonjalonLet's assume we have a simple BlueField-2 setup:
2584f41a5092SSpike Duport 0 is uplink, port 1 is VF representor.
2585f41a5092SSpike DuEach port has 2 Rx queues.
2586f41a5092SSpike DuTo control traffic from the host to the Arm device,
2587f41a5092SSpike Duwe can enable the available descriptor threshold in testpmd by:
2588f41a5092SSpike Du
2589f41a5092SSpike Du.. code-block:: console
2590f41a5092SSpike Du
2591f41a5092SSpike Du   testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 1 rate 0
2592f41a5092SSpike Du   testpmd> set port 1 rxq 0 avail_thresh 70
2593f41a5092SSpike Du   testpmd> set port 1 rxq 1 avail_thresh 70
2594f41a5092SSpike Du
2595f41a5092SSpike DuThe first command disables the current host shaper
2596f41a5092SSpike Duand enables the available descriptor threshold triggered mode.
2597f41a5092SSpike DuThe other commands configure the available descriptor threshold
2598f41a5092SSpike Duto 70% of Rx queue size for both Rx queues.
2599f41a5092SSpike Du
When traffic from the host is too high,
the testpmd console prints a log message about the available descriptor threshold event,
then the host shaper is disabled.
The traffic rate from the host is controlled and fewer drops happen in the Rx queues.
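
For setups with more Rx queues, the testpmd commands above follow a regular
pattern, which the sketch below generates. The helper is hypothetical, and the
accepted percentage range (0 to 99, with 0 meaning disabled) is an assumption
of this sketch rather than something stated in this section.

```python
def avail_thresh_commands(port, rxq_count, percent):
    """Return testpmd commands configuring the threshold on all Rx queues.

    A percent of 0 produces the sequence that disables the feature.
    """
    if not 0 <= percent <= 99:  # assumed valid range
        raise ValueError("percent must be in [0, 99]")
    triggered = 1 if percent > 0 else 0
    commands = [
        f"mlx5 set port {port} host_shaper "
        f"avail_thresh_triggered {triggered} rate 0"
    ]
    for rxq in range(rxq_count):
        commands.append(f"set port {port} rxq {rxq} avail_thresh {percent}")
    return commands
```

Calling ``avail_thresh_commands(1, 2, 70)`` reproduces the three commands of the
example above.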
2604f41a5092SSpike Du
2605f41a5092SSpike DuThe threshold event and shaper can be disabled like this:
2606f41a5092SSpike Du
2607f41a5092SSpike Du.. code-block:: console
2608f41a5092SSpike Du
2609f41a5092SSpike Du   testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 0
2610f41a5092SSpike Du   testpmd> set port 1 rxq 0 avail_thresh 0
2611f41a5092SSpike Du   testpmd> set port 1 rxq 1 avail_thresh 0
2612f41a5092SSpike Du
It is recommended that an application disables the available descriptor threshold
and ``avail_thresh_triggered`` before exiting,
if it enabled them earlier.
2616f41a5092SSpike Du
The shaper can also be configured with a value; the rate unit is 100Mbps.
The command below sets the current shaper to 5Gbps
and disables ``avail_thresh_triggered``.
2620f41a5092SSpike Du
2621f41a5092SSpike Du.. code-block:: console
2622f41a5092SSpike Du
2623f41a5092SSpike Du   testpmd> mlx5 set port 1 host_shaper avail_thresh_triggered 0 rate 50
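
Since the rate argument is expressed in units of 100Mbps, a target rate has to
be converted before it is passed to the command. The following sketch shows the
arithmetic; both helper names are hypothetical, and rejecting rates that are not
a multiple of 100Mbps is a choice made for this example.

```python
def host_shaper_rate_units(rate_mbps):
    """Convert a rate in Mbps to host shaper units of 100Mbps."""
    if rate_mbps < 0 or rate_mbps % 100 != 0:
        raise ValueError("rate must be a non-negative multiple of 100 Mbps")
    return rate_mbps // 100


def host_shaper_command(port, rate_mbps, avail_thresh_triggered=False):
    """Return the testpmd command setting the host shaper on a port."""
    units = host_shaper_rate_units(rate_mbps)
    return (f"mlx5 set port {port} host_shaper "
            f"avail_thresh_triggered {int(avail_thresh_triggered)} "
            f"rate {units}")
```

For 5Gbps (5000Mbps) the conversion yields 50, which matches the command above.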
262485d9252eSMichael Baum
262585d9252eSMichael Baum
2626b583b9a1SFerruh YigitTestpmd driver specific commands
2627b583b9a1SFerruh Yigit--------------------------------
262885d9252eSMichael Baum
262985d9252eSMichael Baumport attach with socket path
263085d9252eSMichael Baum~~~~~~~~~~~~~~~~~~~~~~~~~~~~
263185d9252eSMichael Baum
It is possible to allocate a port with ``libibverbs`` from an external application.
For importing the external port with extra device arguments,
there is a specific testpmd command
similar to the :ref:`port attach command <port_attach>`::
263685d9252eSMichael Baum
263785d9252eSMichael Baum   testpmd> mlx5 port attach (identifier) socket=(path)
263885d9252eSMichael Baum
263985d9252eSMichael Baumwhere:
264085d9252eSMichael Baum
* ``identifier``: device identifier with optional parameters,
  the same as for the :ref:`port attach command <port_attach>`.
264385d9252eSMichael Baum* ``path``: path to IPC server socket created by the external application.
264485d9252eSMichael Baum
This command performs the following steps:
264685d9252eSMichael Baum
264785d9252eSMichael Baum#. Open IPC client socket using the given path, and connect it.
264885d9252eSMichael Baum
264985d9252eSMichael Baum#. Import ibverbs context and ibverbs protection domain.
265085d9252eSMichael Baum
265185d9252eSMichael Baum#. Add two device arguments for context (``cmd_fd``)
265285d9252eSMichael Baum   and protection domain (``pd_handle``) to the device identifier.
265385d9252eSMichael Baum   See :ref:`mlx5 driver options <mlx5_common_driver_options>` for more
265485d9252eSMichael Baum   information about these device arguments.
265585d9252eSMichael Baum
265685d9252eSMichael Baum#. Call the regular ``port attach`` function with updated identifier.
265785d9252eSMichael Baum
265885d9252eSMichael BaumFor example, to attach a port whose PCI address is ``0000:0a:00.0``
265985d9252eSMichael Baumand its socket path is ``/var/run/import_ipc_socket``:
266085d9252eSMichael Baum
266185d9252eSMichael Baum.. code-block:: console
266285d9252eSMichael Baum
266385d9252eSMichael Baum   testpmd> mlx5 port attach 0000:0a:00.0 socket=/var/run/import_ipc_socket
266485d9252eSMichael Baum   testpmd: MLX5 socket path is /var/run/import_ipc_socket
266585d9252eSMichael Baum   testpmd: Attach port with extra devargs 0000:0a:00.0,cmd_fd=40,pd_handle=1
266685d9252eSMichael Baum   Attaching a new port...
266785d9252eSMichael Baum   EAL: Probe PCI driver: mlx5_pci (15b3:101d) device: 0000:0a:00.0 (socket 0)
266885d9252eSMichael Baum   Port 0 is attached. Now total ports is 1
266985d9252eSMichael Baum   Done
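
The third step of the command (extending the device identifier) amounts to
appending two key/value pairs, as seen in the ``Attach port with extra devargs``
log line. A minimal sketch, with a hypothetical helper name and with the values
of ``cmd_fd`` and ``pd_handle`` assumed to have been obtained over the IPC socket:

```python
def import_devargs(identifier, cmd_fd, pd_handle):
    """Append the imported context and protection domain to the identifier."""
    return f"{identifier},cmd_fd={cmd_fd},pd_handle={pd_handle}"
```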
267085d9252eSMichael Baum
2671740a2836SMichael Baum
2672740a2836SMichael Baumport map external Rx queue
2673740a2836SMichael Baum~~~~~~~~~~~~~~~~~~~~~~~~~~
2674740a2836SMichael Baum
2675740a2836SMichael BaumExternal Rx queue indexes mapping management.
2676740a2836SMichael Baum
2677740a2836SMichael BaumMap HW queue index (32-bit) to ethdev queue index (16-bit) for external Rx queue::
2678740a2836SMichael Baum
2679740a2836SMichael Baum   testpmd> mlx5 port (port_id) ext_rxq map (sw_queue_id) (hw_queue_id)
2680740a2836SMichael Baum
2681740a2836SMichael BaumUnmap external Rx queue::
2682740a2836SMichael Baum
2683740a2836SMichael Baum   testpmd> mlx5 port (port_id) ext_rxq unmap (sw_queue_id)
2684740a2836SMichael Baum
2685740a2836SMichael Baumwhere:
2686740a2836SMichael Baum
* ``sw_queue_id``: queue index in range [64536, 65535].
  This range covers the highest 1000 values of a 16-bit index.
2689740a2836SMichael Baum* ``hw_queue_id``: queue index given by HW in queue creation.
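
The bounds of the ``sw_queue_id`` range follow from the 16-bit index width. The
sketch below derives them and builds the map command; the constant and function
names are hypothetical and only illustrate the arithmetic.

```python
UINT16_MAX = 0xFFFF                           # largest 16-bit ethdev queue index
EXT_RXQ_COUNT = 1000                          # size of the reserved range
EXT_RXQ_MIN = UINT16_MAX - EXT_RXQ_COUNT + 1  # 64536


def is_ext_rxq_index(sw_queue_id):
    """Return True if the index lies in the external Rx queue range."""
    return EXT_RXQ_MIN <= sw_queue_id <= UINT16_MAX


def ext_rxq_map_command(port_id, sw_queue_id, hw_queue_id):
    """Return the testpmd command mapping a HW queue to an ethdev index."""
    if not is_ext_rxq_index(sw_queue_id):
        raise ValueError("sw_queue_id must be in [64536, 65535]")
    if not 0 <= hw_queue_id <= 0xFFFFFFFF:
        raise ValueError("hw_queue_id must fit in 32 bits")
    return f"mlx5 port {port_id} ext_rxq map {sw_queue_id} {hw_queue_id}"
```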
2690baafc81eSRongwei Liu
26913dfa7877SKiran Vedere
26923dfa7877SKiran VedereDump RQ/SQ/CQ HW context for debug purposes
26933dfa7877SKiran Vedere~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
26943dfa7877SKiran Vedere
26953dfa7877SKiran VedereDump RQ/CQ HW context for a given port/queue to a file::
26963dfa7877SKiran Vedere
26973dfa7877SKiran Vedere   testpmd> mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)
26983dfa7877SKiran Vedere
26993dfa7877SKiran VedereDump SQ/CQ HW context for a given port/queue to a file::
27003dfa7877SKiran Vedere
27013dfa7877SKiran Vedere   testpmd> mlx5 port (port_id) queue (queue_id) dump sq_context (file_name)
27023dfa7877SKiran Vedere
27033dfa7877SKiran Vedere
Set Flow Engine Mode
~~~~~~~~~~~~~~~~~~~~

Set the flow engine to active or standby mode with specific flags (bitmap style).
See ``RTE_PMD_MLX5_FLOW_ENGINE_FLAG_*`` for the flag definitions.

.. code-block:: console

   testpmd> mlx5 set flow_engine <active|standby> [<flags>]

This command is used for testing live migration,
and works for software steering only.
Default FDB jump should be disabled if switchdev is enabled.
The mode propagates to all probed ports.


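Because the flags argument is a bitmap,
several flags are combined with a bitwise OR before being passed on the command line.
The Python sketch below is only an illustration:
``FLAG_A`` and ``FLAG_B`` are placeholders, not the real
``RTE_PMD_MLX5_FLOW_ENGINE_FLAG_*`` values, which must be taken from the PMD header.
The helper ``flow_engine_cmd`` is hypothetical,
and it assumes the flags token is accepted as a plain integer.

.. code-block:: python

   # Placeholder bits for illustration only; look up the real values
   # of RTE_PMD_MLX5_FLOW_ENGINE_FLAG_* in the PMD header.
   FLAG_A = 1 << 0
   FLAG_B = 1 << 1


   def flow_engine_cmd(mode, flags=None):
       """Build the 'mlx5 set flow_engine' command line (hypothetical helper)."""
       if mode not in ("active", "standby"):
           raise ValueError(f"unknown mode: {mode}")
       cmd = f"mlx5 set flow_engine {mode}"
       if flags is not None:
           # The optional flags argument is a single bitmap value.
           cmd += f" {flags}"
       return cmd

Calling ``flow_engine_cmd("standby", FLAG_A | FLAG_B)`` passes both placeholder bits
as one integer, while ``flow_engine_cmd("active")`` omits the optional argument.
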
GENEVE TLV options parser
~~~~~~~~~~~~~~~~~~~~~~~~~

See the :ref:`GENEVE parser API <geneve_parser_api>` for more information.

Set
^^^

Add a single option to the global option list::

   testpmd> mlx5 set tlv_option class (class) type (type) len (length) \
            offset (sample_offset) sample_len (sample_len) \
            class_mode (ignore|fixed|matchable) data (0xffffffff|0x0 [0xffffffff|0x0]*)

where:

* ``class``: option class.
* ``type``: option type.
* ``length``: option data length in 4-byte granularity.
* ``sample_offset``: offset of the data list relative to the start of the option data.
  The offset is in 4-byte granularity.
* ``sample_len``: length of the data list in 4-byte granularity.
* ``ignore``: ignore the ``class`` field.
* ``fixed``: option class is fixed and defines the option along with the type.
* ``matchable``: the ``class`` field is matchable.
* ``data``: list of masks indicating which DWs (32-bit double words) should be configured.
  The size of the list must be equal to ``sample_len``.
* ``0xffffffff``: this DW should be configured.
* ``0x0``: this DW shouldn't be configured.

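The constraints listed above can be checked before the command is typed.
The following Python sketch is a hedged illustration, not part of testpmd:
``tlv_option_cmd`` is a hypothetical helper that enforces only the rules stated in this section
(a known ``class_mode``, one mask per sampled DW, and masks limited to ``0xffffffff`` or ``0x0``).
Any further limits imposed by the driver or firmware are not modeled.

.. code-block:: python

   CLASS_MODES = ("ignore", "fixed", "matchable")
   DW_MASKS = (0xFFFFFFFF, 0x0)


   def tlv_option_cmd(opt_class, opt_type, length, offset, sample_len,
                      class_mode, data):
       """Build the 'mlx5 set tlv_option' command line (hypothetical helper)."""
       if class_mode not in CLASS_MODES:
           raise ValueError(f"unknown class_mode: {class_mode}")
       # The mask list must contain exactly one entry per sampled DW.
       if len(data) != sample_len:
           raise ValueError("data list size must be equal to sample_len")
       # Each mask either selects a whole DW or skips it.
       if any(mask not in DW_MASKS for mask in data):
           raise ValueError("each mask must be 0xffffffff or 0x0")
       masks = " ".join(f"0x{mask:x}" for mask in data)
       return (f"mlx5 set tlv_option class {opt_class} type {opt_type} "
               f"len {length} offset {offset} sample_len {sample_len} "
               f"class_mode {class_mode} data {masks}")

For example, ``tlv_option_cmd(1, 1, 4, 1, 3, "fixed", [0xffffffff, 0x0, 0xffffffff])``
produces a single-line command that samples the first and third DW of the data list.
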
Flush
^^^^^

Remove several options from the global option list::

   testpmd> mlx5 flush tlv_options max (nb_option)

where:

* ``nb_option``: maximum number of options to remove from the list.
  Options are removed in LIFO order.

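The LIFO order and the meaning of ``max`` can be pictured with a toy model.
The Python sketch below is not the testpmd implementation:
it represents the global option list as a plain Python list,
and ``flush_tlv_options`` is a hypothetical name.

.. code-block:: python

   def flush_tlv_options(options, nb_option):
       """Remove up to nb_option entries, most recently added first.

       Returns the number of entries actually removed.
       """
       removed = min(nb_option, len(options))
       if removed:
           # The newest options sit at the end of the list (LIFO).
           del options[-removed:]
       return removed

With three options in the list, flushing with ``max 2`` leaves only the oldest one,
and asking for more options than remain simply empties the list.
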
List
^^^^

Print all options which are set in the global option list so far::

   testpmd> mlx5 list tlv_options

The output contains the values of each option, one per line.
Nothing is printed when the global list is empty::

   ID      Type    Class   Class_mode   Len     Offset  Sample_len   Data
   [...]   [...]   [...]   [...]        [...]   [...]   [...]        [...]

Setting several options and listing them::

   testpmd> mlx5 set tlv_option class 1 type 1 len 4 offset 1 sample_len 3
            class_mode fixed data 0xffffffff 0x0 0xffffffff
   testpmd: set new option in global list, now it has 1 options
   testpmd> mlx5 set tlv_option class 1 type 2 len 2 offset 0 sample_len 2
            class_mode fixed data 0xffffffff 0xffffffff
   testpmd: set new option in global list, now it has 2 options
   testpmd> mlx5 set tlv_option class 1 type 3 len 5 offset 4 sample_len 1
            class_mode fixed data 0xffffffff
   testpmd: set new option in global list, now it has 3 options
   testpmd> mlx5 list tlv_options
   ID      Type    Class   Class_mode   Len    Offset  Sample_len  Data
   0       1       1       fixed        4      1       3           0xffffffff 0x0 0xffffffff
   1       2       1       fixed        2      0       2           0xffffffff 0xffffffff
   2       3       1       fixed        5      4       1           0xffffffff
   testpmd>

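When the listing is captured by a test script, it can be turned back into structured data.
The Python sketch below is an assumption-laden illustration:
it relies only on the column layout shown in the example above
(seven whitespace-separated fields followed by one or more masks),
and ``parse_tlv_listing`` is a hypothetical helper, not a DPDK tool.

.. code-block:: python

   def parse_tlv_listing(text):
       """Parse captured 'mlx5 list tlv_options' output into dictionaries."""
       options = []
       for line in text.splitlines():
           fields = line.split()
           # Skip the header, prompts and blank lines: option rows start
           # with a numeric ID and carry at least one mask in the Data column.
           if len(fields) < 8 or not fields[0].isdigit():
               continue
           options.append({
               "id": int(fields[0]),
               "type": int(fields[1], 0),
               "class": int(fields[2], 0),
               "class_mode": fields[3],
               "len": int(fields[4], 0),
               "offset": int(fields[5], 0),
               "sample_len": int(fields[6], 0),
               # Everything after the seventh column is the mask list.
               "data": [int(mask, 16) for mask in fields[7:]],
           })
       return options

Each returned dictionary mirrors one output row,
so a script can, for example, check that ``len(row["data"])`` equals ``row["sample_len"]``.
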
Apply
^^^^^

Create a GENEVE TLV parser for a specific port using the options set so far::

   testpmd> mlx5 port (port_id) apply tlv_options

The same global option list can be used by several ports.

Destroy
^^^^^^^

Destroy the GENEVE TLV parser for a specific port::

   testpmd> mlx5 port (port_id) destroy tlv_options

This command doesn't destroy the global list.
To release the options, use the ``flush`` command.
