..  SPDX-License-Identifier: BSD-3-Clause
    Copyright 2022 6WIND S.A.
    Copyright (c) 2022 NVIDIA Corporation & Affiliates

.. include:: <isonum.txt>

NVIDIA MLX5 Common Driver
=========================

.. note::

   NVIDIA acquired Mellanox Technologies in 2020.
   The DPDK documentation and code might still include instances
   of or references to Mellanox trademarks (like BlueField and ConnectX)
   that are now NVIDIA trademarks.

The mlx5 common driver library (**librte_common_mlx5**) provides support for
**NVIDIA ConnectX-4**, **NVIDIA ConnectX-4 Lx**, **NVIDIA ConnectX-5**,
**NVIDIA ConnectX-6**, **NVIDIA ConnectX-6 Dx**, **NVIDIA ConnectX-6 Lx**,
**NVIDIA ConnectX-7**, **NVIDIA BlueField**, **NVIDIA BlueField-2** and
**NVIDIA BlueField-3** families of 10/25/40/50/100/200 Gb/s adapters.

Information and documentation for these adapters can be found on the
`NVIDIA website <https://www.nvidia.com/en-us/networking/>`_.
Help is also provided by the
`NVIDIA Networking forum <https://forums.developer.nvidia.com/c/infrastructure/369/>`_.
In addition, there is a `web section dedicated to DPDK
<https://developer.nvidia.com/networking/dpdk>`_.


Design
------

For security reasons and to enhance robustness,
this driver only handles virtual memory addresses.
The way resource allocations are handled by the kernel,
combined with hardware specifications that allow handling virtual memory addresses directly,
ensures that DPDK applications cannot access random physical memory
(or memory that does not belong to the current process).

There are different levels of objects and bypassing abilities
which are used to get the best performance:

- **Verbs** is a complete high-level generic API
- **Direct Verbs** is a device-specific API
- **DevX** allows accessing firmware objects
- **Direct Rules** manages flow steering at the low-level hardware layer

On Linux, the above interfaces are provided by linking with ``libibverbs`` and ``libmlx5``.
See :ref:`mlx5_linux_prerequisites` for installation.

On Windows, DevX is the only requirement from the above list.
See :ref:`mlx5_windows_prerequisites` for DevX SDK package installation.


.. _mlx5_classes:

Classes
-------

One mlx5 device can be probed by a number of different PMDs.
To select a specific PMD, its class name should be specified as a device parameter
(e.g. ``0000:08:00.1,class=eth``).

In order to allow probing by multiple PMDs,
several classes may be listed separated by a colon.
For example: ``class=crypto:regex`` will probe both Crypto and RegEx PMDs.


Supported Classes
~~~~~~~~~~~~~~~~~

- ``class=compress`` for :doc:`../../compressdevs/mlx5`.
- ``class=crypto`` for :doc:`../../cryptodevs/mlx5`.
- ``class=eth`` for :doc:`../../nics/mlx5`.
- ``class=regex`` for :doc:`../../regexdevs/mlx5`.
- ``class=vdpa`` for :doc:`../../vdpadevs/mlx5`.

By default, the mlx5 device will be probed by the ``eth`` PMD.


Limitations
~~~~~~~~~~~

- ``eth`` and ``vdpa`` PMDs cannot be probed at the same time.
  All other combinations are possible.

- On Windows, only ``eth`` and ``crypto`` are supported.

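The class selection rules above can be captured in a small shell helper.
This is a minimal sketch and not part of DPDK:
the function name ``mlx5_class_devargs`` is hypothetical.
It joins the requested classes with colons
and rejects the one combination listed as unsupported (``eth`` with ``vdpa``).

.. code-block:: sh

   # Hypothetical helper (not part of DPDK): print "<device>,class=<c1>:<c2>..."
   # for a device and a list of mlx5 classes.
   mlx5_class_devargs() {
       dev=$1; shift
       classes=""
       has_eth=0
       has_vdpa=0
       for c in "$@"; do
           case $c in
               compress|crypto|regex) ;;
               eth) has_eth=1 ;;
               vdpa) has_vdpa=1 ;;
               *) echo "unknown class: $c" >&2; return 1 ;;
           esac
           # Append the class, colon-separated.
           classes="${classes:+$classes:}$c"
       done
       # eth and vdpa cannot be probed at the same time.
       if [ "$has_eth" -eq 1 ] && [ "$has_vdpa" -eq 1 ]; then
           echo "eth and vdpa cannot be combined" >&2
           return 1
       fi
       # Without any class, the device is probed by the eth PMD.
       printf '%s,class=%s\n' "$dev" "${classes:-eth}"
   }

For example, ``mlx5_class_devargs 0000:08:00.1 crypto regex``
prints ``0000:08:00.1,class=crypto:regex``,
which can then be passed to the EAL option ``-a``.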

.. _mlx5_common_compilation:

Compilation Prerequisites
-------------------------

.. _mlx5_linux_prerequisites:

Linux Prerequisites
~~~~~~~~~~~~~~~~~~~

This driver relies on external libraries and kernel drivers for resource
allocation and initialization.
The following dependencies are not part of DPDK and must be installed separately:

- **libibverbs**

  User space Verbs framework used by ``librte_common_mlx5``.
  This library provides a generic interface between the kernel
  and low-level user space drivers such as ``libmlx5``.

  It allows slow and privileged operations (context initialization,
  hardware resource allocation) to be managed by the kernel
  and fast operations to never leave user space.

- **libmlx5**

  Low-level user space driver library for NVIDIA devices.
  It is automatically loaded by ``libibverbs``.

  This library basically implements send/receive calls to the hardware queues.

- **Kernel modules**

  They provide the kernel-side Verbs API and low-level device drivers
  that manage actual hardware initialization
  and resource sharing with user-space processes.

  Unlike most other PMDs, these modules must remain loaded and bound to
  their devices:

  - ``mlx5_core``: hardware driver managing NVIDIA devices
    and related Ethernet kernel network devices.
  - ``mlx5_ib``: InfiniBand device driver.
  - ``ib_uverbs``: user space driver for Verbs (entry point for ``libibverbs``).

- **Firmware**

  Minimal supported firmware version:

  - ConnectX-4: **12.21.1000** and above.
  - ConnectX-4 Lx: **14.21.1000** and above.
  - ConnectX-5: **16.21.1000** and above.
  - ConnectX-5 Ex: **16.21.1000** and above.
  - ConnectX-6: **20.27.0090** and above.
  - ConnectX-6 Dx: **22.27.0090** and above.
  - ConnectX-6 Lx: **26.27.0090** and above.
  - ConnectX-7: **28.33.2028** and above.
  - BlueField: **18.25.1010** and above.
  - BlueField-2: **24.28.1002** and above.
  - BlueField-3: **32.36.3126** and above.

  New features may be added in more recent firmware versions.

Libraries and kernel modules can be provided either by the Linux distribution,
or by installing NVIDIA MLNX_OFED/EN which provides compatibility with older kernels.


Upstream Dependencies
^^^^^^^^^^^^^^^^^^^^^

The mlx5 kernel modules are part of upstream Linux.
The minimal supported kernel version is 4.14.
For 32-bit, version 4.14.41 or above is required.

The libraries ``libibverbs`` and ``libmlx5`` are part of ``rdma-core``.
It is packaged by most Linux distributions.
The minimal supported rdma-core version is 16.
For 32-bit, version 18 or above is required.

The rdma-core sources can be downloaded at
https://github.com/linux-rdma/rdma-core

It is possible to build rdma-core as static libraries starting with version 21::

    cd build
    CFLAGS=-fPIC cmake -DENABLE_STATIC=1 -DNO_PYVERBS=1 -DNO_MAN_PAGES=1 -GNinja ..
    ninja
    ninja install

The firmware can be updated with `mlxup
<https://docs.nvidia.com/networking/display/mlxupfwutility>`_.
The latest firmware can be downloaded at
https://network.nvidia.com/support/firmware/firmware-downloads/


NVIDIA MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^

The kernel modules and libraries are packaged with other tools
in NVIDIA MLNX_OFED or NVIDIA MLNX_EN.
The minimal supported versions are:

- NVIDIA MLNX_OFED version: **4.5** and above.
- NVIDIA MLNX_EN version: **4.5** and above.

The firmware, the libraries libibverbs, libmlx5, and mlnx-ofed-kernel modules
are packaged in `NVIDIA MLNX_OFED
<https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`_.
After downloading, it can be installed with this command::

   ./mlnxofedinstall --dpdk

`NVIDIA MLNX_EN
<https://network.nvidia.com/products/ethernet-drivers/linux/mlnx_en/>`_
is a smaller package including what is needed for DPDK.
After downloading, it can be installed with this command::

   ./install --dpdk

After installing, the firmware version can be checked::

   ibv_devinfo

The firmware updates are included in NVIDIA MLNX_OFED/EN packages.
Because each release provides new features, these updates must be applied
to match the kernel modules and libraries they come with.

.. note::

   Several versions of NVIDIA MLNX_OFED/EN are available. Installing the version
   this DPDK release was developed and tested against is strongly recommended.
   Please check the "Tested Platforms" section in the :doc:`../../rel_notes/index`.

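The firmware version reported by ``ibv_devinfo`` can be compared
with the minimal versions listed in the Linux prerequisites.
The following is a sketch only: the helper name ``version_ge`` is hypothetical,
it assumes GNU ``sort`` (for the ``-V`` option)
and assumes that ``ibv_devinfo`` prints a ``fw_ver:`` line for the device.
The ConnectX-5 minimum is used as the example threshold.

.. code-block:: sh

   # Succeed if dotted version $1 is greater than or equal to $2.
   # Relies on GNU sort -V, which orders version strings.
   version_ge() {
       [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
   }

   # Take the firmware version of the first device reported by ibv_devinfo.
   fw=$(ibv_devinfo 2>/dev/null | awk '/fw_ver:/ { print $2; exit }')

   # 16.21.1000 is the minimal firmware listed for ConnectX-5.
   if [ -n "$fw" ] && version_ge "$fw" 16.21.1000; then
       echo "firmware $fw is recent enough"
   else
       echo "firmware is missing or older than 16.21.1000"
   fi

The threshold must be replaced with the value matching the adapter family in use.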

.. _mlx5_windows_prerequisites:

Windows Prerequisites
~~~~~~~~~~~~~~~~~~~~~

The mlx5 PMDs rely on external libraries and kernel drivers
for resource allocation and initialization.


DevX SDK Installation
^^^^^^^^^^^^^^^^^^^^^

The DevX SDK must be installed on the machine building the Windows PMD.
Additional information can be found at
`How to Integrate Windows DevX in Your Development Environment
<https://docs.nvidia.com/networking/display/winof2v290/devx+interface>`_.
The minimal supported WinOF2 version is 2.60.


Compilation Options
-------------------

Compilation on Linux
~~~~~~~~~~~~~~~~~~~~

The ibverbs libraries can be linked with this PMD in a number of ways,
configured by the ``ibverbs_link`` build option:

``shared`` (default)
   The PMD depends on some .so files.

``dlopen``
   Split the dependency glue into a separate library
   loaded when needed by dlopen (see ``MLX5_GLUE_PATH``).
   It makes dependencies on libibverbs and libmlx5 optional,
   and has no performance impact.

``static``
   Embed the static flavor of the dependencies libibverbs and libmlx5
   in the PMD shared library or the executable static binary.

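As an illustration, the link mode is chosen when the build is configured with Meson.
The build directory name ``build`` is an arbitrary choice for this sketch.

.. code-block:: sh

   # Configure DPDK so that the mlx5 glue library is loaded with dlopen,
   # then compile. "build" is an arbitrary build directory name.
   meson setup build -Dibverbs_link=dlopen
   ninja -C build

Replacing ``dlopen`` with ``static`` or ``shared`` selects the other modes described above.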

Compilation on Windows
~~~~~~~~~~~~~~~~~~~~~~

The DevX SDK location must be set through CFLAGS/LDFLAGS,
either::

   meson.exe setup "-Dc_args=-I\"%DEVX_INC_PATH%\"" "-Dc_link_args=-L\"%DEVX_LIB_PATH%\"" ...

or::

   set CFLAGS=-I"%DEVX_INC_PATH%" && set LDFLAGS=-L"%DEVX_LIB_PATH%" && meson.exe setup ...


.. _mlx5_common_env:

Environment Configuration
-------------------------

Linux Environment
~~~~~~~~~~~~~~~~~

The kernel network interfaces are brought up during initialization.
Forcing them down prevents packet reception.

The ethtool operations on the kernel interfaces may also affect the PMD.

Some runtime behaviours may be configured through environment variables.

``MLX5_GLUE_PATH``
   If built with ``ibverbs_link=dlopen``,
   list of directories in which to search for the rdma-core "glue" plug-in,
   separated by colons or semi-colons.

``MLX5_SHUT_UP_BF``
   If Verbs is used (DevX disabled),
   HW queue doorbell register mapping.
   The value 0 means a regular memory mapping,
   while 1 is a non-cached IO mapping.

   With regular memory mapping, the register is flushed to HW
   usually when the write-combining buffer becomes full,
   but it depends on CPU design.

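A sketch of setting ``MLX5_GLUE_PATH`` before starting an application
built with ``ibverbs_link=dlopen``.
The directories, the application name ``./my-dpdk-app`` and the PCI address
are placeholders chosen for this example, not real installation paths.

.. code-block:: sh

   # Placeholder directories searched, in order, for the glue plug-in.
   export MLX5_GLUE_PATH=/opt/dpdk/glue:/usr/local/lib/dpdk-glue

   # Placeholder application and PCI address.
   ./my-dpdk-app -a 0000:08:00.1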

Port Link with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Port links must be set to Ethernet::

   mlxconfig -d <mst device> query | grep LINK_TYPE
   LINK_TYPE_P1                        ETH(2)
   LINK_TYPE_P2                        ETH(2)

   mlxconfig -d <mst device> set LINK_TYPE_P1/2=1/2/3

Link type values are:

* ``1`` InfiniBand
* ``2`` Ethernet
* ``3`` VPI (auto-sense)

If the link type was changed, the firmware must be reset as well::

   mlxfwreset -d <mst device> reset

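The ``LINK_TYPE_P1/2=1/2/3`` notation above is schematic.
A concrete sketch that sets both ports to Ethernet could look as follows;
the value of ``MST_DEV`` is only an example name
and must be replaced with a device reported by ``mst status``.

.. code-block:: sh

   # Example mst device name (assumption); take the real one from "mst status".
   MST_DEV=/dev/mst/mt4119_pciconf0

   # Set both ports to Ethernet (value 2), then reset the firmware.
   mlxconfig -d "$MST_DEV" set LINK_TYPE_P1=2 LINK_TYPE_P2=2
   mlxfwreset -d "$MST_DEV" reset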

.. _mlx5_vf:

SR-IOV Virtual Function with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SR-IOV must be enabled on the NIC.
It can be checked with the following command::

   mlxconfig -d <mst device> query | grep SRIOV_EN
   SRIOV_EN                            True(1)

If needed, configure SR-IOV::

   mlxconfig -d <mst device> set SRIOV_EN=1 NUM_OF_VFS=16
   mlxfwreset -d <mst device> reset

After making the change, restart the driver::

   /etc/init.d/openibd restart

or::

   service openibd restart

Then the virtual functions can be instantiated::

   echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

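As a concrete sketch of the last step, the commands below create four VFs
on the device ``mlx5_0`` and check the result.
The VF count is an arbitrary example,
and the commands are assumed to run as root.

.. code-block:: sh

   DEV=/sys/class/infiniband/mlx5_0/device

   # Upper bound allowed by the firmware configuration (NUM_OF_VFS).
   cat "$DEV/sriov_totalvfs"

   # Instantiate 4 VFs (example value) and read the setting back.
   echo 4 > "$DEV/sriov_numvfs"
   cat "$DEV/sriov_numvfs"

   # The VFs appear as additional PCI functions;
   # 15b3 is the Mellanox PCI vendor ID.
   lspci -d 15b3: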

.. _mlx5_sub_function:

Sub-Function with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A Sub-Function (SF) is a portion of the PCI device
with its own dedicated queues.
An SF shares PCI-level resources with other SFs and/or with its parent PCI function.

#. Requirement::

      MLNX_OFED version >= 5.4-0.3.3.0

#. Configure SF feature::

      # Run mlxconfig on both PFs on host and ECPFs on BlueField.
      mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12

#. Enable switchdev mode::

      mlxdevm dev eswitch set pci/<DBDF> mode switchdev

#. Add SF port::

      mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>

      Get SFID from output: pci/<DBDF>/<SFID>

#. Modify MAC address::

      mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>

#. Activate SF port::

      mlxdevm port function set pci/<DBDF>/<SFID> state active

#. Devargs to probe SF device::

      auxiliary:mlx5_core.sf.<num>,class=eth:regex

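Once activated, an SF is exposed by the kernel as an auxiliary bus device
whose name gives the ``<num>`` used in the devargs.
The loop below is a sketch for listing these devices.
It assumes that sysfs is mounted at ``/sys``
and that each SF directory provides an ``sfnum`` attribute;
if the attribute is absent, the value is simply printed empty.

.. code-block:: sh

   # List mlx5 SF auxiliary devices, for example "mlx5_core.sf.2".
   for dev in /sys/bus/auxiliary/devices/mlx5_core.sf.*; do
       # Skip the unexpanded pattern when no SF exists.
       [ -e "$dev" ] || continue
       # sfnum is assumed to hold the number given to "mlxdevm port add".
       printf '%s sfnum=%s\n' "$(basename "$dev")" "$(cat "$dev/sfnum" 2>/dev/null)"
   done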

Enable Switchdev Mode
^^^^^^^^^^^^^^^^^^^^^

Switchdev mode is an E-Switch mode that binds a representor to a VF or SF.
A representor is a DPDK port connected to a VF or SF in such a way that,
assuming there are no offload flows, each packet sent from the VF or SF
is received by the corresponding representor,
and each packet sent to a representor is received by the VF or SF.

After :ref:`configuring VF <mlx5_vf>`, the device must be unbound::

   printf "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/unbind

Then switchdev mode is enabled::

   echo switchdev > /sys/class/net/<net device>/compat/devlink/mode

The device can be bound again at this point.

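Binding the device again mirrors the unbind step.
The sketch below uses the same placeholder as above for the PCI address,
and assumes the ``devlink`` tool from iproute2 is available
to read back the E-Switch mode; ``PF_PCI`` is an example address.

.. code-block:: sh

   # Rebind the device to the mlx5_core kernel driver.
   printf "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/bind

   # Optionally read back the E-Switch mode of the PF (example address).
   PF_PCI=0000:08:00.0
   devlink dev eswitch show "pci/$PF_PCI"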

Run as Non-Root
^^^^^^^^^^^^^^^

Hugepage and resource limit setup are documented
in the :ref:`common Linux guide <Running_Without_Root_Privileges>`.
This PMD can operate without access to physical addresses,
therefore it does not require ``SYS_ADMIN`` to access ``/proc/self/pagemap``.
Note that this requirement may still come from other drivers.

Below are the additional capabilities that must be granted to the application,
with the reason each one is needed:

``NET_RAW``
   For raw Ethernet queue allocation through the kernel driver.

``NET_ADMIN``
   For device configuration, like setting link status or MTU.

``SYS_RAWIO``
   For using group 1 and above (software steering) in Flow API.

They can be manually granted for a specific executable file::

   setcap cap_net_raw,cap_net_admin,cap_sys_rawio+ep <executable>

Alternatively, a service manager or a container runtime
may configure the capabilities for a process.

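The granted capabilities can be inspected with ``getcap``,
and a container runtime can add the same set at start-up.
The sketch below uses Docker as an example runtime;
the executable path and the image name are placeholders,
and the device and hugepage options a real container needs are left out.

.. code-block:: sh

   # Placeholder path of the DPDK application.
   APP=./my-dpdk-app

   # Show the file capabilities set on the executable.
   getcap "$APP"

   # Example with Docker: add the three capabilities to the container.
   # "my-dpdk-image" is a placeholder image name.
   docker run --cap-add=NET_RAW --cap-add=NET_ADMIN --cap-add=SYS_RAWIO \
       my-dpdk-image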

Windows Environment
~~~~~~~~~~~~~~~~~~~

WinOF2 version 2.60 or higher must be installed on the machine.


WinOF2 Installation
^^^^^^^^^^^^^^^^^^^

The driver can be downloaded from the following site: `WINOF2
<https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/>`_.


DevX Enablement
^^^^^^^^^^^^^^^

DevX for Windows must be enabled in the Windows registry.
The keys ``DevxEnabled`` and ``DevxFsRules`` must be set.
Additional information can be found in the WinOF2 user manual.


.. _mlx5_firmware_config:

Firmware Configuration
~~~~~~~~~~~~~~~~~~~~~~

Firmware features can be configured as key/value pairs.

The command to set a value is::

  mlxconfig -d <device> set <key>=<value>

The command to query a value is::

  mlxconfig -d <device> query <key>

The device name for the command ``mlxconfig`` can be either the PCI address,
or the mst device name found with::

  mst status

Some firmware configurations are listed below.

- link type::

    LINK_TYPE_P1
    LINK_TYPE_P2
    value: 1=Infiniband 2=Ethernet 3=VPI(auto-sense)

- enable SR-IOV::

    SRIOV_EN=1

- the maximum number of SR-IOV virtual functions::

    NUM_OF_VFS=<max>

- enable DevX (required by Direct Rules and other features)::

    UCTX_EN=1

- aggressive CQE zipping::

    CQE_COMPRESSION=1

- L3 VXLAN and VXLAN-GPE destination UDP port::

    IP_OVER_VXLAN_EN=1
    IP_OVER_VXLAN_PORT=<udp dport>

- enable VXLAN-GPE tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=2

- enable IP-in-IP tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0

- enable MPLS flow matching::

    FLEX_PARSER_PROFILE_ENABLE=1

- enable ICMP(code/type/identifier/sequence number) / ICMP6(code/type) fields matching::

    FLEX_PARSER_PROFILE_ENABLE=2

- enable Geneve flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=1

- enable Geneve TLV option flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=8

- enable GTP flow matching::

    FLEX_PARSER_PROFILE_ENABLE=3

- enable eCPRI flow matching::

    FLEX_PARSER_PROFILE_ENABLE=4
    PROG_PARSE_GRAPH=1

- enable dynamic flex parser for flex item::

    FLEX_PARSER_PROFILE_ENABLE=4
    PROG_PARSE_GRAPH=1

- enable realtime timestamp format::

    REAL_TIME_CLOCK_ENABLE=1

- allow locking hairpin RQ data buffer in device memory::

    HAIRPIN_DATA_BUFFER_LOCK=1
    MEMIC_SIZE_LIMIT=0

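Several features in the list above depend on the single key ``FLEX_PARSER_PROFILE_ENABLE``.
The lookup below restates those values in one place.
It is a sketch only: the function name and the feature keywords
are hypothetical labels chosen for this example,
and the values are copied from the list above.

.. code-block:: sh

   # Print the FLEX_PARSER_PROFILE_ENABLE values listed for a feature.
   # Feature keywords are hypothetical labels, not mlxconfig names.
   flex_parser_profiles() {
       case $1 in
           vxlan-gpe)       echo "0 2" ;;
           ip-in-ip)        echo "0" ;;
           mpls)            echo "1" ;;
           icmp)            echo "2" ;;
           geneve)          echo "0 1" ;;
           geneve-tlv)      echo "0 8" ;;
           gtp)             echo "3" ;;
           ecpri|flex-item) echo "4" ;;  # also requires PROG_PARSE_GRAPH=1
           *) echo "unknown feature: $1" >&2; return 1 ;;
       esac
   }

For example, ``flex_parser_profiles vxlan-gpe`` prints ``0 2``.
Since the key holds one value at a time,
two features can presumably be enabled together only if their lists share a value,
as VXLAN-GPE and ICMP matching do with profile 2.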
575a3ade5e3SMichael Baum
576a3ade5e3SMichael Baum.. _mlx5_common_driver_options:
577a3ade5e3SMichael Baum
578a3ade5e3SMichael BaumDevice Arguments
579a3ade5e3SMichael Baum----------------
580a3ade5e3SMichael Baum
581a3ade5e3SMichael BaumThe driver can be configured per device.
582a3ade5e3SMichael BaumA single argument list can be used for a device managed by multiple PMDs.
583a3ade5e3SMichael BaumThe parameters must be passed through the EAL option ``-a``,
584a3ade5e3SMichael Baumas examples below:
585a3ade5e3SMichael Baum
586a3ade5e3SMichael Baum- PCI device::
587a3ade5e3SMichael Baum
588a3ade5e3SMichael Baum  -a 0000:03:00.2,class=eth:regex,mr_mempool_reg_en=0
589a3ade5e3SMichael Baum
590a3ade5e3SMichael Baum- Auxiliary SF::
591a3ade5e3SMichael Baum
592a3ade5e3SMichael Baum  -a auxiliary:mlx5_core.sf.2,class=compress,mr_ext_memseg_en=0
593a3ade5e3SMichael Baum
594a3ade5e3SMichael BaumEach device class PMD has its own list of specific arguments,
595a3ade5e3SMichael Baumand below are the arguments supported by the common mlx5 layer.
596a3ade5e3SMichael Baum
597a3ade5e3SMichael Baum- ``class`` parameter [string]
598a3ade5e3SMichael Baum
599a3ade5e3SMichael Baum  Select the classes of the drivers that should probe the device.
600a3ade5e3SMichael Baum  See :ref:`mlx5_classes` for more explanation and details.
601a3ade5e3SMichael Baum
602a3ade5e3SMichael Baum  The default value is ``eth``.
603a3ade5e3SMichael Baum
- ``mr_ext_memseg_en`` parameter [int]

  A nonzero value enables extending memseg when registering DMA memory. If
  enabled, the number of entries in the MR (Memory Region) lookup table on the
  datapath is minimized, which benefits performance. On the other hand, memory
  utilization worsens because registered memory is pinned by the kernel driver:
  even if a page in the extended chunk is freed, it does not become reusable
  until the entire memory is freed.

  Enabled by default.

- ``mr_mempool_reg_en`` parameter [int]

  A nonzero value enables implicit registration of DMA memory of all mempools
  except those having ``RTE_MEMPOOL_F_NON_IO``. This flag is set automatically
  for mempools populated with non-contiguous objects or those without IOVA.
  The effect is that when a packet from a mempool is transmitted,
  its memory is already registered for DMA in the PMD and no registration
  will happen on the data path. The tradeoff is extra work on the creation
  of each mempool and increased HW resource use if some mempools
  are not used with MLX5 devices.

  Enabled by default.

- ``sys_mem_en`` parameter [int]

  A nonzero value makes the PMD memory management allocate memory
  from the system by default, when no explicit rte memory flag is given.

  By default, the PMD will set this value to 0.

- ``sq_db_nc`` parameter [int]

  The rdma core library can map the doorbell register in two ways,
  depending on the environment variable "MLX5_SHUT_UP_BF":

  - As regular cached memory (usually with write combining attribute),
    if the variable is either missing or set to zero.
  - As non-cached memory, if the variable is present and set to a value
    other than "0".

  The same doorbell mapping approach is implemented directly by the PMD
  in UAR generation for queues created with DevX.

  The type of mapping may slightly affect the send queue performance.
  The optimal choice depends strongly on the host architecture
  and should be determined experimentally.

  If ``sq_db_nc`` is set to zero, the doorbell is forced to be mapped to
  regular memory (with write combining), and the PMD performs an extra write
  memory barrier after writing to the doorbell. This might increase the CPU
  clocks needed per packet sent, but latency might be improved.

  If ``sq_db_nc`` is set to one, the doorbell is forced to be mapped to
  non-cached memory, and the PMD does not perform the extra write memory
  barrier after writing to the doorbell. On some architectures this might
  improve performance.

  If ``sq_db_nc`` is set to two, the doorbell is forced to be mapped to
  regular memory, and the PMD uses heuristics to decide whether a write memory
  barrier should be performed. For bursts whose size is a multiple of the
  recommended one (64 packets), the PMD assumes that the next burst is coming
  and does not issue the extra memory barrier (it is expected to be issued in
  the next burst, at least after descriptor writing). This might increase
  latency (on some hosts until the next packets are transmitted) and should be
  used with care.
  The PMD uses heuristics only for Tx queues; for other send queues the
  doorbell is mapped to regular memory, as when ``sq_db_nc`` is set to 0.

  If ``sq_db_nc`` is omitted, the value of the environment variable
  "MLX5_SHUT_UP_BF" is used if it is set. If there is no "MLX5_SHUT_UP_BF", the
  default ``sq_db_nc`` value is zero for ARM64 hosts and one for others.

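  As a rough sketch of how the two mechanisms relate, the mapping can be
  selected per device through ``sq_db_nc``, or through the environment
  variable, which is consulted only when ``sq_db_nc`` is omitted.
  The PCI address and the use of ``dpdk-testpmd`` below are placeholders
  chosen for illustration, not recommended settings::

    # Force non-cached doorbell mapping for one device.
    dpdk-testpmd -a 0000:03:00.0,sq_db_nc=1 -- -i

    # No sq_db_nc given: the mapping follows MLX5_SHUT_UP_BF.
    MLX5_SHUT_UP_BF=1 dpdk-testpmd -a 0000:03:00.0 -- -i
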
- ``cmd_fd`` parameter [int]

  File descriptor of an ``ibv_context`` created outside the PMD.
  The PMD uses this FD to import the remote context. The ``cmd_fd`` is
  obtained from the ``ibv_context->cmd_fd`` member, which must be duplicated
  (``dup()``) before being passed.
  This parameter is valid only if the ``pd_handle`` parameter is specified.

  By default, the PMD will create a new ``ibv_context``.

  .. note::

     When the FD comes from another process, it is the user's responsibility
     to share the FD between the processes (e.g. by ``SCM_RIGHTS``).

- ``pd_handle`` parameter [int]

  Protection domain handle of an ``ibv_pd`` created outside the PMD.
  The PMD uses this handle to import the remote PD. The ``pd_handle`` can be
  obtained from the original PD by reading its ``ibv_pd->handle`` member.
  This parameter is valid only if the ``cmd_fd`` parameter is specified,
  and its value must be a valid kernel handle for a PD object
  in the context represented by the given ``cmd_fd``.

  By default, the PMD will allocate a new PD.

  .. note::

     The ``ibv_pd->handle`` member is different from the ``mlx5dv_pd->pdn``
     member.

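  Because ``cmd_fd`` and ``pd_handle`` are valid only together, a device that
  imports an external context can be passed with both arguments in one list.
  In this sketch the PCI address, the FD number ``12`` and the PD handle ``3``
  are made-up values; in practice the FD is the duplicate of
  ``ibv_context->cmd_fd`` and the handle is the value of ``ibv_pd->handle``::

    -a 0000:03:00.0,cmd_fd=12,pd_handle=3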