.. SPDX-License-Identifier: BSD-3-Clause
   Copyright (c) 2015-2020 Amazon.com, Inc. or its affiliates.
   All rights reserved.

ENA Poll Mode Driver
====================

The ENA PMD is a DPDK poll-mode driver for the Amazon Elastic
Network Adapter (ENA) family.

Supported ENA adapters
----------------------

The ENA PMD supports the following ENA adapters:

* ``1d0f:ec20`` - ENA VF
* ``1d0f:ec21`` - ENA VF RSERV0

Supported features
------------------

* MTU configuration
* Jumbo frames up to 9K
* IPv4/TCP/UDP checksum offload
* TSO offload
* Multiple receive and transmit queues
* RSS hash
* RSS indirection table configuration
* Low Latency Queue for Tx
* Basic and extended statistics
* LSC event notification
* Watchdog (requires handling of timers in the application)
* Device reset upon failure
* Rx interrupts

Overview
--------

The ENA driver exposes a lightweight management interface with a
minimal set of memory-mapped registers and an extendable command set
through an Admin Queue.

The driver supports a wide range of ENA adapters, is link-speed
independent (i.e., the same driver is used for 10GbE, 25GbE, 40GbE,
etc.), and negotiates and supports an extendable feature set.

ENA adapters allow high-speed and low-overhead Ethernet traffic
processing by providing a dedicated Tx/Rx queue pair per CPU core.

The ENA driver supports industry-standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).

Receive-side scaling (RSS) is supported for multi-core scaling.

Some ENA devices support a working mode called Low-latency
Queue (LLQ), which can save several more microseconds of latency.

Management Interface
--------------------

The ENA management interface is exposed by means of:

* Device Registers
* Admin Queue (AQ) and Admin Completion Queue (ACQ)

The ENA device's memory-mapped PCIe register space (MMIO registers)
is accessed only during driver initialization and is not involved
in further normal device operation.

The AQ is used for submitting management commands, and the
results/responses are reported asynchronously through the ACQ.

ENA introduces a very small set of management commands with room for
vendor-specific extensions. Most of the management operations are
framed in a generic Get/Set feature command.

The following admin queue commands are supported:

* Create I/O submission queue
* Create I/O completion queue
* Destroy I/O submission queue
* Destroy I/O completion queue
* Get feature
* Set feature
* Get statistics

Refer to ``ena_admin_defs.h`` for the list of supported Get/Set Feature
properties.

Data Path Interface
-------------------

I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
SQ, respectively). Each SQ has a completion queue (CQ) associated
with it.

The SQs and CQs are implemented as descriptor rings in contiguous
physical memory.

Refer to ``ena_eth_io_defs.h`` for the detailed structure of the descriptors.

The driver supports multi-queue for both Tx and Rx.

Configuration
-------------

Runtime Configuration
^^^^^^^^^^^^^^^^^^^^^

   * **llq_policy** (default 1)

     Controls whether to use the device-recommended header policy or override it:

     0 - Disable LLQ (use with extreme caution, as it leads to a huge performance
     degradation on AWS instances built with Nitro v4 onwards).

     1 - Accept the device-recommended LLQ policy (default).

     2 - Enforce normal LLQ policy.

     3 - Enforce large LLQ policy.

   * **miss_txc_to** (default 5)

     Number of seconds after which a Tx packet is considered missing.
     If the number of missing packets exceeds a dynamically calculated threshold,
     the driver triggers a device reset, which should be handled by the
     application. Checking for missing Tx completions happens in the driver's
     timer service. Setting this parameter to 0 disables this feature. The maximum
     allowed value is 60 seconds.

   * **control_path_poll_interval** (default 0)

     Enables polling-based operation of the admin queues,
     eliminating the need for interrupts in the control path:

     0 - Disable (the admin queue works in interrupt mode).

     [1..1000] - Number of milliseconds to wait between periodic inspections of the admin queues.

     **A non-zero value for this devarg is mandatory for control path functionality
     when binding ports to the uio_pci_generic kernel module, which lacks interrupt support.**

ENA Configuration Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   * **Number of Queues**

     This is the requested number of queues upon initialization. However, the actual
     number of receive and transmit queues to be created will be the minimum of
     the maximal number supported by the device and the number of queues requested.

   * **Size of Queues**

     This is the requested size of the receive/transmit queues, while the actual size
     will be the minimum of the requested size and the maximal receive/transmit
     queue size supported by the device.

Building DPDK
-------------

See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
instructions on how to build DPDK.

By default, the ENA PMD library is built into the DPDK library.

For configuring and using the UIO and VFIO frameworks, please also refer to :ref:`the
documentation that comes with the DPDK suite <linux_gsg>`.

Supported Operating Systems
---------------------------

Any Linux distribution fulfilling the conditions described in the ``System Requirements``
section of :ref:`the DPDK documentation <linux_gsg>`; also refer to the *DPDK Release Notes*.

Prerequisites
-------------

#. Prepare the system as recommended by the DPDK suite. This includes environment
   variables, hugepages configuration, tool-chains and configuration.

#. The ENA PMD can operate with the ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.

   (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
   reduces packet latency by pushing the header directly through
   PCI to the device, before the DMA is even triggered. To work properly,
   the kernel PCI driver must support write-combining (WC).
   In DPDK ``igb_uio``, WC must be enabled by loading the module with the
   ``wc_activate=1`` flag (example below). However, the mainline ``vfio-pci``
   driver in the kernel doesn't have WC support yet (planned to be added).
   If ``vfio-pci`` is used, the user should follow the `AWS ENA PMD documentation
   <https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk/README.md>`_.

#. For ``igb_uio``:
   Insert the ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``.

#. For ``vfio-pci``:
   Insert the ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``.
   Please make sure that the ``IOMMU`` is enabled in your system,
   or use the ``vfio`` driver in ``noiommu`` mode::

     echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

   To use ``noiommu`` mode, the ``vfio`` module must be built with the
   ``CONFIG_VFIO_NOIOMMU`` option.

#. For ``uio_pci_generic``:
   Insert the ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
   Make sure that the IOMMU is disabled or is in passthrough mode,
   for example by adding ``intel_iommu=off`` to the kernel boot arguments.

   Note that when launching the application,
   the ``control_path_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
   as ``uio_pci_generic`` lacks interrupt support.
   The control path (admin queues) of the ENA requires poll mode
   to process command completions and asynchronous notifications from the device.
   For example: ``dpdk-app -a "00:06.0,control_path_poll_interval=1000"``.

#. Bind the intended ENA device to the ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.

At this point the system should be ready to run DPDK applications. Once the
application runs to completion, the ENA device can be unbound from the kernel
module it was bound to, if necessary.
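
The runtime devargs described under *Runtime Configuration* are appended to the
PCI address passed with the EAL ``-a`` option, as in the ``uio_pci_generic``
example above. The snippet below is a minimal sketch, assuming a POSIX shell:
``ena_devargs`` is a hypothetical helper (it is not shipped with DPDK) that checks
only the value ranges documented in this guide before printing the argument.

.. code-block:: shell

   # Hypothetical helper, not part of DPDK: print an EAL "-a" argument for an
   # ENA port after checking the devarg ranges documented in this guide.
   # Usage: ena_devargs PCI_ADDR LLQ_POLICY MISS_TXC_TO POLL_INTERVAL_MS
   ena_devargs() {
       addr=$1 llq=$2 txc=$3 poll=$4

       # llq_policy: 0 disable, 1 device-recommended, 2 normal, 3 large
       case $llq in
           0|1|2|3) ;;
           *) echo "llq_policy must be 0..3" >&2; return 1 ;;
       esac

       # miss_txc_to: seconds, 0 disables the check, at most 60
       case $txc in
           ''|*[!0-9]*) echo "miss_txc_to must be a non-negative integer" >&2; return 1 ;;
       esac
       if [ "$txc" -gt 60 ]; then
           echo "miss_txc_to must be 0..60" >&2; return 1
       fi

       # control_path_poll_interval: 0 = interrupt mode, else 1..1000 ms
       case $poll in
           ''|*[!0-9]*) echo "control_path_poll_interval must be a non-negative integer" >&2; return 1 ;;
       esac
       if [ "$poll" -gt 1000 ]; then
           echo "control_path_poll_interval must be 0..1000" >&2; return 1
       fi

       printf '%s,llq_policy=%s,miss_txc_to=%s,control_path_poll_interval=%s\n' \
           "$addr" "$llq" "$txc" "$poll"
   }

For example, ``dpdk-testpmd -a "$(ena_devargs 0000:00:06.0 1 5 1000)" -- -i``
would start testpmd interactively with the default LLQ policy and Tx completion
timeout, and with admin queue polling every 1000 ms as recommended for
``uio_pci_generic``. Checking the values in the launch script is a design choice:
it surfaces typos before EAL initialization even starts.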

**Rx interrupts support**

The ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for input.
Please note that they don't work with ``igb_uio`` and ``uio_pci_generic``,
so ``vfio-pci`` must be used to enable this feature.

The ENA handles admin interrupts and AENQ notifications on a separate interrupt.
There may not be enough event file descriptors to handle both admin and
Rx interrupts. In that situation, the Rx interrupt request will fail.

**Note about usage on \*.metal instances**

On AWS, the metal instances support IOMMU for both arm64 and x86_64 hosts.
Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used on metal instances.

* x86_64 (e.g. c5.metal, i3.metal):
  IOMMU should be disabled by default. In that situation, ``igb_uio`` can
  be used as is, but ``vfio-pci`` must be used in no-IOMMU mode (please
  see above).

  When IOMMU is enabled, ``igb_uio`` cannot be used, as it doesn't support this
  feature, while ``vfio-pci`` should work without any changes.
  To enable IOMMU on those hosts, please update ``GRUB_CMDLINE_LINUX`` in the file
  ``/etc/default/grub`` with the extra boot arguments below::

    iommu=1 intel_iommu=on

  Then, make the changes live by executing as root::

    # grub2-mkconfig > /boot/grub2/grub.cfg

  Finally, a reboot should result in IOMMU being enabled.

* arm64 (a1.metal):
  IOMMU should be enabled by default. Unfortunately, ``vfio-pci`` doesn't
  support SMMU, which is the implementation of IOMMU for the arm64 architecture, and
  ``igb_uio`` doesn't support IOMMU at all, so to use DPDK with ENA on those
  hosts, one must disable IOMMU. This can be done by updating
  ``GRUB_CMDLINE_LINUX`` in the file ``/etc/default/grub`` with the extra boot
  argument::

    iommu.passthrough=1

  Then, make the changes live by executing as root::

    # grub2-mkconfig > /boot/grub2/grub.cfg

  Finally, a reboot should result in IOMMU being disabled.
  Without IOMMU, ``igb_uio`` can be used as is, but ``vfio-pci`` must be used
  in no-IOMMU mode (please see above).
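
After the reboot, it can be worth confirming that the new boot argument actually
reached the kernel. The snippet below is a minimal sketch, assuming a POSIX shell;
``has_boot_arg`` is a hypothetical helper that looks for an exact, space-separated
token in a kernel command line such as the contents of ``/proc/cmdline``. It only
shows what was requested at boot, not the resulting IOMMU state.

.. code-block:: shell

   # Hypothetical helper: succeed if the exact token $1 appears as a
   # space-separated word in the kernel command line $2.
   has_boot_arg() {
       # Pad with spaces so the first and last words match too; the quoted
       # token is compared literally rather than as a glob pattern.
       case " $2 " in
           *" $1 "*) return 0 ;;
           *) return 1 ;;
       esac
   }

   # On a live system, for example:
   #   has_boot_arg intel_iommu=on "$(cat /proc/cmdline)"       # x86_64
   #   has_boot_arg iommu.passthrough=1 "$(cat /proc/cmdline)"  # arm64

Matching whole tokens rather than substrings keeps ``iommu=1`` from matching a
longer argument that merely ends in the same text.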

Usage example
-------------

Follow the instructions available in the document
:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to launch
**testpmd** with Amazon ENA devices managed by librte_net_ena.

Example output:

.. code-block:: console

   [...]
   EAL: PCI device 0000:00:06.0 on NUMA socket -1
   EAL: Device 0000:00:06.0 is not NUMA-aware, defaulting socket to 0
   EAL: probe driver: 1d0f:ec20 net_ena

   Interactive-mode selected
   testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
   testpmd: preferred mempool ops selected: ring_mp_mc
   Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
   Configuring Port 0 (socket 0)
   Port 0: 00:00:00:11:00:01
   Checking link statuses...

   Done
   testpmd>

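
Before launching **testpmd**, a quick sysfs check can confirm that the address
passed to it belongs to one of the adapters listed under *Supported ENA adapters*
and is bound to one of the drivers listed under *Prerequisites*. The snippet below
is a minimal sketch, assuming a POSIX shell and the standard Linux sysfs layout;
``is_supported_ena`` and ``is_dpdk_driver`` are hypothetical helpers and the PCI
address is illustrative.

.. code-block:: shell

   # Hypothetical helper: succeed if the vendor/device IDs, in the form sysfs
   # reports them (e.g. "0x1d0f" and "0xec20"), match an adapter in this guide.
   is_supported_ena() {
       case "$1:$2" in
           0x1d0f:0xec20|0x1d0f:0xec21) return 0 ;;
           *) return 1 ;;
       esac
   }

   # Hypothetical helper: succeed if the last path component of the device's
   # "driver" symlink target names a driver the ENA PMD can run on.
   is_dpdk_driver() {
       case "${1##*/}" in
           vfio-pci|igb_uio|uio_pci_generic) return 0 ;;
           *) return 1 ;;
       esac
   }

   # On a live system, for example:
   #   dev=/sys/bus/pci/devices/0000:00:06.0
   #   is_supported_ena "$(cat "$dev/vendor")" "$(cat "$dev/device")" &&
   #       is_dpdk_driver "$(readlink "$dev/driver")" &&
   #       echo "0000:00:06.0 is ready for DPDK"

An unbound device has no ``driver`` symlink, so ``readlink`` prints nothing and
``is_dpdk_driver`` fails, which is the desired outcome for this check.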