..  SPDX-License-Identifier: BSD-3-Clause
    Copyright 2022 6WIND S.A.
    Copyright (c) 2022 NVIDIA Corporation & Affiliates

.. include:: <isonum.txt>

NVIDIA MLX5 Common Driver
=========================

.. note::

   NVIDIA acquired Mellanox Technologies in 2020.
   The DPDK documentation and code might still include instances
   of or references to Mellanox trademarks (like BlueField and ConnectX)
   that are now NVIDIA trademarks.

The mlx5 common driver library (**librte_common_mlx5**) provides support for
**NVIDIA ConnectX-4**, **NVIDIA ConnectX-4 Lx**, **NVIDIA ConnectX-5**,
**NVIDIA ConnectX-6**, **NVIDIA ConnectX-6 Dx**, **NVIDIA ConnectX-6 Lx**,
**NVIDIA ConnectX-7**, **NVIDIA BlueField**, **NVIDIA BlueField-2** and
**NVIDIA BlueField-3** families of 10/25/40/50/100/200 Gb/s adapters.

Information and documentation for these adapters can be found on the
`NVIDIA website <https://www.nvidia.com/en-us/networking/>`_.
Help is also provided by the
`NVIDIA Networking forum <https://forums.developer.nvidia.com/c/infrastructure/369/>`_.
In addition, there is a `web section dedicated to DPDK
<https://developer.nvidia.com/networking/dpdk>`_.


Design
------

For security reasons and to enhance robustness,
this driver only handles virtual memory addresses.
The way resource allocations are handled by the kernel,
combined with hardware specifications that allow handling virtual memory addresses directly,
ensures that DPDK applications cannot access random physical memory
(or memory that does not belong to the current process).

There are different levels of objects and bypassing abilities
which are used to get the best performance:

- **Verbs** is a complete high-level generic API
- **Direct Verbs** is a device-specific API
- **DevX** allows accessing firmware objects
- **Direct Rules** manages flow steering at the low-level hardware layer

On Linux, the above interfaces are provided by linking with ``libibverbs`` and ``libmlx5``.
See :ref:`mlx5_linux_prerequisites` for installation.

On Windows, DevX is the only requirement from the above list.
See :ref:`mlx5_windows_prerequisites` for DevX SDK package installation.


.. _mlx5_classes:

Classes
-------

One mlx5 device can be probed by a number of different PMDs.
To select a specific PMD, its name should be specified as a device parameter
(e.g. ``0000:08:00.1,class=eth``).
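
As a hedged illustration, assuming the ``dpdk-testpmd`` application
and reusing the PCI address from the example above,
the class can be passed on the command line through the EAL option ``-a``::

   dpdk-testpmd -a 0000:08:00.1,class=eth -- -i
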

In order to allow probing by multiple PMDs,
several classes may be listed separated by a colon.
For example: ``class=crypto:regex`` will probe both Crypto and RegEx PMDs.


Supported Classes
~~~~~~~~~~~~~~~~~

- ``class=compress`` for :doc:`../../compressdevs/mlx5`.
- ``class=crypto`` for :doc:`../../cryptodevs/mlx5`.
- ``class=eth`` for :doc:`../../nics/mlx5`.
- ``class=regex`` for :doc:`../../regexdevs/mlx5`.
- ``class=vdpa`` for :doc:`../../vdpadevs/mlx5`.

By default, the mlx5 device will be probed by the ``eth`` PMD.


Limitations
~~~~~~~~~~~

- ``eth`` and ``vdpa`` PMDs cannot be probed at the same time.
  All other combinations are possible.

- On Windows, only ``eth`` and ``crypto`` are supported.


.. _mlx5_common_compilation:

Compilation Prerequisites
-------------------------

.. _mlx5_linux_prerequisites:

Linux Prerequisites
~~~~~~~~~~~~~~~~~~~

This driver relies on external libraries and kernel drivers for resource
allocation and initialization.
The following dependencies are not part of DPDK and must be installed separately:

- **libibverbs**

  User space Verbs framework used by ``librte_common_mlx5``.
  This library provides a generic interface between the kernel
  and low-level user space drivers such as ``libmlx5``.

  It allows slow and privileged operations (context initialization,
  hardware resource allocations) to be managed by the kernel
  and fast operations to never leave user space.

- **libmlx5**

  Low-level user space driver library for NVIDIA devices;
  it is automatically loaded by ``libibverbs``.

  This library basically implements send/receive calls to the hardware queues.

- **Kernel modules**

  They provide the kernel-side Verbs API and low-level device drivers
  that manage actual hardware initialization
  and resource sharing with user-space processes.

  Unlike most other PMDs, these modules must remain loaded and bound to
  their devices:

  - ``mlx5_core``: hardware driver managing NVIDIA devices
    and related Ethernet kernel network devices.
  - ``mlx5_ib``: InfiniBand device driver.
  - ``ib_uverbs``: user space driver for Verbs (entry point for ``libibverbs``).

- **Firmware**

  Minimal supported firmware version:

  - ConnectX-4: **12.21.1000** and above.
  - ConnectX-4 Lx: **14.21.1000** and above.
  - ConnectX-5: **16.21.1000** and above.
  - ConnectX-5 Ex: **16.21.1000** and above.
  - ConnectX-6: **20.27.0090** and above.
  - ConnectX-6 Dx: **22.27.0090** and above.
  - ConnectX-6 Lx: **26.27.0090** and above.
  - ConnectX-7: **28.33.2028** and above.
  - BlueField: **18.25.1010** and above.
  - BlueField-2: **24.28.1002** and above.
  - BlueField-3: **32.36.3126** and above.

  New features may be added in more recent firmware versions.

Libraries and kernel modules can be provided either by the Linux distribution,
or by installing NVIDIA MLNX_OFED/EN which provides compatibility with older kernels.


Upstream Dependencies
^^^^^^^^^^^^^^^^^^^^^

The mlx5 kernel modules are part of upstream Linux.
The minimal supported kernel version is 4.14.
For 32-bit, version 4.14.41 or above is required.

The libraries ``libibverbs`` and ``libmlx5`` are part of ``rdma-core``.
It is packaged by most Linux distributions.
The minimal supported rdma-core version is 16.
For 32-bit, version 18 or above is required.

The rdma-core sources can be downloaded at
https://github.com/linux-rdma/rdma-core

It is possible to build rdma-core as static libraries starting with version 21::

   cd build
   CFLAGS=-fPIC cmake -DENABLE_STATIC=1 -DNO_PYVERBS=1 -DNO_MAN_PAGES=1 -GNinja ..
   ninja
   ninja install

The firmware can be updated with `mlxup
<https://docs.nvidia.com/networking/display/mlxupfwutility>`_.
The latest firmwares can be downloaded at
https://network.nvidia.com/support/firmware/firmware-downloads/


NVIDIA MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^

The kernel modules and libraries are packaged with other tools
in NVIDIA MLNX_OFED or NVIDIA MLNX_EN.
The minimal supported versions are:

- NVIDIA MLNX_OFED version: **4.5** and above.
- NVIDIA MLNX_EN version: **4.5** and above.

The firmware, the libraries libibverbs, libmlx5, and mlnx-ofed-kernel modules
are packaged in `NVIDIA MLNX_OFED
<https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`_.
After downloading, it can be installed with this command::

   ./mlnxofedinstall --dpdk

`NVIDIA MLNX_EN
<https://network.nvidia.com/products/ethernet-drivers/linux/mlnx_en/>`_
is a smaller package including what is needed for DPDK.
After downloading, it can be installed with this command::

   ./install --dpdk

After installing, the firmware version can be checked::

   ibv_devinfo

The firmware updates are included in NVIDIA MLNX_OFED/EN packages.
Because each release provides new features, these updates must be applied
to match the kernel modules and libraries they come with.

.. note::

   Several versions of NVIDIA MLNX_OFED/EN are available. Installing the version
   this DPDK release was developed and tested against is strongly recommended.
   Please check the "Tested Platforms" section in the :doc:`../../rel_notes/index`.


.. _mlx5_windows_prerequisites:

Windows Prerequisites
~~~~~~~~~~~~~~~~~~~~~

The mlx5 PMDs rely on external libraries and kernel drivers
for resource allocation and initialization.


DevX SDK Installation
^^^^^^^^^^^^^^^^^^^^^

The DevX SDK must be installed on the machine building the Windows PMD.
Additional information can be found at
`How to Integrate Windows DevX in Your Development Environment
<https://docs.nvidia.com/networking/display/winof2v290/devx+interface>`_.
The minimal supported WinOF2 version is 2.60.


Compilation Options
-------------------

Compilation on Linux
~~~~~~~~~~~~~~~~~~~~

The ibverbs libraries can be linked with this PMD in a number of ways,
configured by the ``ibverbs_link`` build option:

``shared`` (default)
   The PMD depends on some .so files.

``dlopen``
   Split the dependencies glue in a separate library
   loaded when needed by dlopen (see ``MLX5_GLUE_PATH``).
   It makes dependencies on libibverbs and libmlx5 optional,
   and has no performance impact.

``static``
   Embed static flavor of the dependencies libibverbs and libmlx5
   in the PMD shared library or the executable static binary.
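
As a minimal sketch, assuming a build directory named ``build``
(an arbitrary name chosen for this example),
the link mode could be selected when configuring the build::

   meson setup build -Dibverbs_link=dlopen
   ninja -C build

The other values (``shared`` and ``static``) would be passed the same way.
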


Compilation on Windows
~~~~~~~~~~~~~~~~~~~~~~

The DevX SDK location must be set through CFLAGS/LDFLAGS,
either::

   meson.exe setup "-Dc_args=-I\"%DEVX_INC_PATH%\"" "-Dc_link_args=-L\"%DEVX_LIB_PATH%\"" ...

or::

   set CFLAGS=-I"%DEVX_INC_PATH%" && set LDFLAGS=-L"%DEVX_LIB_PATH%" && meson.exe setup ...


.. _mlx5_common_env:

Environment Configuration
-------------------------

Linux Environment
~~~~~~~~~~~~~~~~~

The kernel network interfaces are brought up during initialization.
Forcing them down prevents packet reception.

The ethtool operations on the kernel interfaces may also affect the PMD.

Some runtime behaviours may be configured through environment variables.

``MLX5_GLUE_PATH``
   If built with ``ibverbs_link=dlopen``,
   list of directories in which to search for the rdma-core "glue" plug-in,
   separated by colons or semi-colons.

``MLX5_SHUT_UP_BF``
   If Verbs is used (DevX disabled),
   HW queue doorbell register mapping.
   The value 0 means non-cached IO mapping,
   while 1 is a regular memory mapping.

   With regular memory mapping, the register is flushed to HW
   usually when the write-combining buffer becomes full,
   but it depends on CPU design.


Port Link with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Port links must be set to Ethernet::

   mlxconfig -d <mst device> query | grep LINK_TYPE
   LINK_TYPE_P1                        ETH(2)
   LINK_TYPE_P2                        ETH(2)

   mlxconfig -d <mst device> set LINK_TYPE_P1/2=1/2/3

Link type values are:

* ``1`` Infiniband
* ``2`` Ethernet
* ``3`` VPI (auto-sense)

If link type was changed, firmware must be reset as well::

   mlxfwreset -d <mst device> reset


.. _mlx5_vf:

SR-IOV Virtual Function with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

SR-IOV must be enabled on the NIC.
It can be checked with the following command::

   mlxconfig -d <mst device> query | grep SRIOV_EN
   SRIOV_EN                            True(1)

If needed, configure SR-IOV::

   mlxconfig -d <mst device> set SRIOV_EN=1 NUM_OF_VFS=16
   mlxfwreset -d <mst device> reset

After doing the change, restart the driver::

   /etc/init.d/openibd restart

or::

   service openibd restart

Then the virtual functions can be instantiated::

   echo [num_vfs] > /sys/class/infiniband/mlx5_0/device/sriov_numvfs


.. _mlx5_sub_function:

Sub-Function with MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A Sub-Function (SF) is a portion of the PCI device
that has its own dedicated queues.
An SF shares PCI-level resources with other SFs and/or with its parent PCI function.

#. Requirement::

      MLNX_OFED version >= 5.4-0.3.3.0

#. Configure SF feature::

      # Run mlxconfig on both PFs on host and ECPFs on BlueField.
      mlxconfig -d <mst device> set PER_PF_NUM_SF=1 PF_TOTAL_SF=252 PF_SF_BAR_SIZE=12

#. Enable switchdev mode::

      mlxdevm dev eswitch set pci/<DBDF> mode switchdev

#. Add SF port::

      mlxdevm port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>

   Get SFID from output: ``pci/<DBDF>/<SFID>``

#. Modify MAC address::

      mlxdevm port function set pci/<DBDF>/<SFID> hw_addr <MAC>

#. Activate SF port::

      mlxdevm port function set pci/<DBDF>/<SFID> state active

#. Devargs to probe SF device::

      auxiliary:mlx5_core.sf.<num>,class=eth:regex


Enable Switchdev Mode
^^^^^^^^^^^^^^^^^^^^^

Switchdev mode is an E-Switch mode that binds a representor to a VF or SF.
A representor is a port in DPDK that is connected to a VF or SF in such a way
that, assuming there are no offload flows, each packet that is sent from the VF or SF
will be received by the corresponding representor.
Conversely, each packet that is sent to a representor will be received by the VF or SF.

After :ref:`configuring VF <mlx5_vf>`, the device must be unbound::

   printf "<device pci address>" > /sys/bus/pci/drivers/mlx5_core/unbind

Then switchdev mode is enabled::

   echo switchdev > /sys/class/net/<net device>/compat/devlink/mode

The device can be bound again at this point.


Run as Non-Root
^^^^^^^^^^^^^^^

Hugepage and resource limit setup are documented
in the :ref:`common Linux guide <Running_Without_Root_Privileges>`.
This PMD can operate without access to physical addresses,
therefore it does not require ``SYS_ADMIN`` to access ``/proc/self/pagemap``.
Note that this requirement may still come from other drivers.

Below are the additional capabilities that must be granted to the application,
with the reason each one is needed:

``NET_RAW``
   For raw Ethernet queue allocation through the kernel driver.

``NET_ADMIN``
   For device configuration, like setting link status or MTU.

``SYS_RAWIO``
   For using group 1 and above (software steering) in Flow API.

They can be manually granted for a specific executable file::

   setcap cap_net_raw,cap_net_admin,cap_sys_rawio+ep <executable>

Alternatively, a service manager or a container runtime
may configure the capabilities for a process.


Windows Environment
~~~~~~~~~~~~~~~~~~~

WinOF2 version 2.60 or higher must be installed on the machine.


WinOF2 Installation
^^^^^^^^^^^^^^^^^^^

The driver can be downloaded from the following site: `WINOF2
<https://network.nvidia.com/products/adapter-software/ethernet/windows/winof-2/>`_.


DevX Enablement
^^^^^^^^^^^^^^^

DevX for Windows must be enabled in the Windows registry.
The keys ``DevxEnabled`` and ``DevxFsRules`` must be set.
Additional information can be found in the WinOF2 user manual.


.. _mlx5_firmware_config:

Firmware Configuration
~~~~~~~~~~~~~~~~~~~~~~

Firmware features can be configured as key/value pairs.

The command to set a value is::

   mlxconfig -d <device> set <key>=<value>

The command to query a value is::

   mlxconfig -d <device> query <key>

The device name for the command ``mlxconfig`` can be either the PCI address,
or the mst device name found with::

   mst status

Some firmware configurations are listed below.

- link type::

    LINK_TYPE_P1
    LINK_TYPE_P2
    value: 1=Infiniband 2=Ethernet 3=VPI(auto-sense)

- enable SR-IOV::

    SRIOV_EN=1

- the maximum number of SR-IOV virtual functions::

    NUM_OF_VFS=<max>

- enable DevX (required by Direct Rules and other features)::

    UCTX_EN=1

- aggressive CQE zipping::

    CQE_COMPRESSION=1

- L3 VXLAN and VXLAN-GPE destination UDP port::

    IP_OVER_VXLAN_EN=1
    IP_OVER_VXLAN_PORT=<udp dport>

- enable VXLAN-GPE tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=2

- enable IP-in-IP tunnel flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0

- enable MPLS flow matching::

    FLEX_PARSER_PROFILE_ENABLE=1

- enable ICMP(code/type/identifier/sequence number) / ICMP6(code/type) fields matching::

    FLEX_PARSER_PROFILE_ENABLE=2

- enable Geneve flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=1

- enable Geneve TLV option flow matching::

    FLEX_PARSER_PROFILE_ENABLE=0
    or
    FLEX_PARSER_PROFILE_ENABLE=8

- enable GTP flow matching::

    FLEX_PARSER_PROFILE_ENABLE=3

- enable eCPRI flow matching::

    FLEX_PARSER_PROFILE_ENABLE=4
    PROG_PARSE_GRAPH=1

- enable dynamic flex parser for flex item::

    FLEX_PARSER_PROFILE_ENABLE=4
    PROG_PARSE_GRAPH=1

- enable realtime timestamp format::

    REAL_TIME_CLOCK_ENABLE=1

- allow locking hairpin RQ data buffer in device memory::

    HAIRPIN_DATA_BUFFER_LOCK=1
    MEMIC_SIZE_LIMIT=0


.. _mlx5_common_driver_options:

Device Arguments
----------------

The driver can be configured per device.
A single argument list can be used for a device managed by multiple PMDs.
The parameters must be passed through the EAL option ``-a``,
as in the examples below:

- PCI device::

    -a 0000:03:00.2,class=eth:regex,mr_mempool_reg_en=0

- Auxiliary SF::

    -a auxiliary:mlx5_core.sf.2,class=compress,mr_ext_memseg_en=0

Each device class PMD has its own list of specific arguments,
and below are the arguments supported by the common mlx5 layer.

- ``class`` parameter [string]

  Select the classes of the drivers that should probe the device.
  See :ref:`mlx5_classes` for more explanation and details.

  The default value is ``eth``.

- ``mr_ext_memseg_en`` parameter [int]

  A nonzero value enables extending memseg when registering DMA memory. If
  enabled, the number of entries in MR (Memory Region) lookup table on datapath
  is minimized and it benefits performance.
  On the other hand, it worsens memory
  utilization because registered memory is pinned by kernel driver. Even if a
  page in the extended chunk is freed, that doesn't become reusable until the
  entire memory is freed.

  Enabled by default.

- ``mr_mempool_reg_en`` parameter [int]

  A nonzero value enables implicit registration of DMA memory of all mempools
  except those having ``RTE_MEMPOOL_F_NON_IO``. This flag is set automatically
  for mempools populated with non-contiguous objects or those without IOVA.
  The effect is that when a packet from a mempool is transmitted,
  its memory is already registered for DMA in the PMD and no registration
  will happen on the data path. The tradeoff is extra work on the creation
  of each mempool and increased HW resource use if some mempools
  are not used with MLX5 devices.

  Enabled by default.

- ``sys_mem_en`` parameter [int]

  A non-zero value enables the PMD memory management allocating memory
  from system by default, without explicit rte memory flag.

  By default, the PMD will set this value to 0.

- ``sq_db_nc`` parameter [int]

  The rdma core library can map the doorbell register in two ways,
  depending on the environment variable "MLX5_SHUT_UP_BF":

  - As regular cached memory (usually with write combining attribute),
    if the variable is either missing or set to zero.
  - As non-cached memory, if the variable is present and set to a value
    other than "0".

  The same doorbell mapping approach is implemented directly by the PMD
  in UAR generation for queues created with DevX.

  The type of mapping may slightly affect the send queue performance.
  The optimal choice depends strongly on the host architecture
  and should be determined empirically.

  If ``sq_db_nc`` is set to zero, the doorbell is forced to be mapped to
  regular memory (with write combining) and the PMD will perform the extra
  write memory barrier after writing to the doorbell. This might increase the
  CPU clocks needed per packet sent, but latency might be improved.

  If ``sq_db_nc`` is set to one, the doorbell is forced to be mapped to
  non-cached memory and the PMD will not perform the extra write memory barrier
  after writing to the doorbell. On some architectures this might improve
  performance.

  If ``sq_db_nc`` is set to two, the doorbell is forced to be mapped to
  regular memory and the PMD will use heuristics to decide whether a write
  memory barrier should be performed. For bursts whose size is a multiple of
  the recommended one (64 pkts), the PMD assumes that the next burst is coming
  and does not issue the extra memory barrier (it is expected to be issued in
  the next coming burst, at least after descriptor writing). This might
  increase latency (on some hosts until the next packets are transmitted) and
  should be used with care.
  The PMD uses heuristics only for Tx queues. For other send queues the
  doorbell is forced to be mapped to regular memory, the same as when
  ``sq_db_nc`` is set to 0.

  If ``sq_db_nc`` is omitted, the preset (if any) environment variable
  "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF", the
  default ``sq_db_nc`` value is zero for ARM64 hosts and one for others.

- ``cmd_fd`` parameter [int]

  File descriptor of ``ibv_context`` created outside the PMD.
  The PMD will use this FD to import the remote CTX. The ``cmd_fd`` is obtained
  from the ``ibv_context->cmd_fd`` member, which must be dup'd before being
  passed.
  This parameter is valid only if the ``pd_handle`` parameter is specified.

  By default, the PMD will create a new ``ibv_context``.

  .. note::

     When the FD comes from another process, it is the user's responsibility
     to share the FD between the processes (e.g. by SCM_RIGHTS).

- ``pd_handle`` parameter [int]

  Protection domain handle of ``ibv_pd`` created outside the PMD.
  The PMD will use this handle to import the remote PD. The ``pd_handle`` can
  be obtained from the original PD by reading its ``ibv_pd->handle`` member.
  This parameter is valid only if the ``cmd_fd`` parameter is specified,
  and its value must be a valid kernel handle for a PD object
  in the context represented by the given ``cmd_fd``.

  By default, the PMD will allocate a new PD.

  .. note::

     The ``ibv_pd->handle`` member is different from the ``mlx5dv_pd->pdn``
     member.
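
As an illustration only, the arguments described above might be combined
as follows. The PCI address and the ``cmd_fd`` and ``pd_handle`` values
are hypothetical placeholders: real values depend on the system and, for the
last example, on the application that created the ``ibv_context`` and
``ibv_pd`` being imported.

- Non-cached doorbell mapping, with implicit mempool registration disabled::

     -a 0000:03:00.0,sq_db_nc=1,mr_mempool_reg_en=0

- Importing an externally created context and protection domain
  (both arguments must be given together)::

     -a 0000:03:00.0,cmd_fd=42,pd_handle=3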