..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Cavium, Inc

ThunderX NICVF Poll Mode Driver
===============================

The ThunderX NICVF PMD (**librte_pmd_thunderx_nicvf**) provides poll mode driver
support for the inbuilt NIC found in the **Cavium ThunderX** SoC family
as well as its virtual functions (VF) in SR-IOV context.

More information can be found at `Cavium, Inc Official Website
<http://www.cavium.com/ThunderX_ARM_Processors.html>`_.

Features
--------

Features of the ThunderX PMD are:

- Multiple queues for TX and RX
- Receive Side Scaling (RSS)
- Packet type information
- Checksum offload
- Promiscuous mode
- Multicast mode
- Port hardware statistics
- Jumbo frames
- Link state information
- Setting up link state
- Scatter and gather for TX and RX
- VLAN stripping
- SR-IOV VF
- NUMA support
- Multi queue set support (up to 96 queues (12 queue sets) per port)
- Skip data bytes

Supported ThunderX SoCs
-----------------------

- CN88xx
- CN81xx
- CN83xx

Prerequisites
-------------

- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment.

Pre-Installation Configuration
------------------------------

Config File Options
~~~~~~~~~~~~~~~~~~~

The following options can be modified in the ``config`` file.
Please note that enabling debugging options may affect system performance.

- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD`` (default ``y``)

  Toggle compilation of the ``librte_pmd_thunderx_nicvf`` driver.

- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_RX`` (default ``n``)

  Toggle asserts of receive fast path.

- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_TX`` (default ``n``)

  Toggle asserts of transmit fast path.

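
As an illustration, enabling the receive and transmit fast path asserts means
switching the two debug options away from their defaults. The fragment below is
only a sketch: it assumes the ``NAME=value`` syntax used by the make-based DPDK
``config`` files, and the exact file to edit depends on the build setup.

.. code-block:: console

   CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=y
   CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_RX=y
   CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_TX=y

Both debug options should normally stay at ``n`` for performance measurements,
since the asserts run in the fast path.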

Driver compilation and testing
------------------------------

Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
for details.

To compile the ThunderX NICVF PMD for Linux arm64 gcc,
use arm64-thunderx-linux-gcc as target.

Linux
-----

SR-IOV: Prerequisites and sample Application Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The current ThunderX NIC PF/VF kernel modules automatically map each physical Ethernet port
to a virtual function (VF) and present them as PCIe-like SR-IOV devices.
This section provides instructions to configure SR-IOV with Linux OS.

#. Verify PF device capabilities using ``lspci``:

   .. code-block:: console

      lspci -vvv

   Example output:

   .. code-block:: console

      0002:01:00.0 Ethernet controller: Cavium Networks Device a01e (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
      ...
      Kernel driver in use: thunder-nic
      ...

   .. note::

      If the ``thunder-nic`` driver is not in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_PF`` setting.

#. Verify VF device capabilities and drivers using ``lspci``:

   .. code-block:: console

      lspci -vvv

   Example output:

   .. code-block:: console

      0002:01:00.1 Ethernet controller: Cavium Networks Device 0011 (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Kernel driver in use: thunder-nicvf
      ...

      0002:01:00.2 Ethernet controller: Cavium Networks Device 0011 (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Kernel driver in use: thunder-nicvf
      ...

   .. note::

      If the ``thunder-nicvf`` driver is not in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_VF`` setting.

#. Pass VF device to VM context (PCIe Passthrough):

   The VF devices may be passed through to the guest VM using qemu,
   virt-manager, virsh, etc.

   Example qemu guest launch command:

   .. code-block:: console

      sudo qemu-system-aarch64 -name vm1 \
      -machine virt,gic_version=3,accel=kvm,usb=off \
      -cpu host -m 4096 \
      -smp 4,sockets=1,cores=8,threads=1 \
      -nographic -nodefaults \
      -kernel <kernel image> \
      -append "root=/dev/vda console=ttyAMA0 rw hugepagesz=512M hugepages=3" \
      -device vfio-pci,host=0002:01:00.1 \
      -drive file=<rootfs.ext3>,if=none,id=disk1,format=raw \
      -device virtio-blk-device,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
      -netdev tap,id=net0,ifname=tap0,script=/etc/qemu-ifup_thunder \
      -device virtio-net-device,netdev=net0 \
      -serial stdio \
      -mem-path /dev/hugepages

#. Enable **VFIO-NOIOMMU** mode (optional):

   .. code-block:: console

      echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

   .. note::

      **VFIO-NOIOMMU** is required only when running in VM context and should not be enabled otherwise.

#. Running testpmd:

   Follow instructions available in the document
   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
   to run testpmd.

   Example command and output:

   .. code-block:: console

      ./arm64-thunderx-linux-gcc/app/testpmd -l 0-3 -n 4 -w 0002:01:00.2 \
        -- -i --no-flush-rx \
        --port-topology=loop

      ...

      PMD: rte_nicvf_pmd_init(): librte_pmd_thunderx nicvf version 1.0

      ...
      EAL: probe driver: 177d:11 rte_nicvf_pmd
      EAL: using IOMMU type 1 (Type 1)
      EAL: PCI memory mapped at 0x3ffade50000
      EAL: Trying to map BAR 4 that contains the MSI-X table.
      Trying offsets: 0x40000000000:0x0000, 0x10000:0x1f0000
      EAL: PCI memory mapped at 0x3ffadc60000
      PMD: nicvf_eth_dev_init(): nicvf: device (177d:11) 2:1:0:2
      PMD: nicvf_eth_dev_init(): node=0 vf=1 mode=tns-bypass sqs=false
      loopback_supported=true
      PMD: nicvf_eth_dev_init(): Port 0 (177d:11) mac=a6:c6:d9:17:78:01
      Interactive-mode selected
      Configuring Port 0 (socket 0)
      ...

      PMD: nicvf_dev_configure(): Configured ethdev port0 hwcap=0x0
      Port 0: A6:C6:D9:17:78:01
      Checking link statuses...
      Port 0 Link Up - speed 10000 Mbps - full-duplex
      Done
      testpmd>

Multiple Queue Set per DPDK port configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two types of VFs:

- Primary VF
- Secondary VF

Each port consists of a primary VF and n secondary VF(s). Each VF provides 8 Tx/Rx queues to a port.
When a given port is configured to use more than 8 queues, it requires one (or more) secondary VFs.
Each secondary VF adds 8 additional queues to the queue set.

During PMD initialization, the primary VFs are enumerated by checking a
specific flag (see the sqs message in the DPDK boot log; sqs indicates secondary queue set).
They are at the beginning of the VF list (the remaining ones are secondary VFs).

The primary VFs are used as master queue sets. Secondary VFs provide
additional queue sets for primary ones. If a port is configured for more than
8 queues, it will request additional queues from secondary VFs.

Secondary VFs cannot be shared between primary VFs.

Primary VFs are present at the beginning of the 'Network devices using kernel
driver' list; secondary VFs are in the remaining part of the list.

.. note::

   The VNIC driver in the multiqueue setup works differently than other drivers like ``ixgbe``.
   Each queue set device must be bound separately with the ``usertools/dpdk-devbind.py`` utility.

.. note::

   Depending on the hardware used, the kernel driver sets a threshold ``vf_id``. VFs that try to attach with an id below or equal to
   this boundary are considered primary VFs. VFs that try to attach with an id above this boundary are considered secondary VFs.

LBK HW Access
~~~~~~~~~~~~~

The Loopback HW Unit (LBK) receives packets from NIC-RX and sends packets back to NIC-TX.
The loopback block has N channels and contains data buffering that is shared across
all channels. Four primary VFs are reserved as loopback ports.

Example device binding
~~~~~~~~~~~~~~~~~~~~~~

If a system has three interfaces, a total of 18 VF devices will be created
on a non-NUMA machine.

.. note::

   NUMA systems have 12 VFs per port and non-NUMA systems have 6 VFs per port.

.. code-block:: console

   # usertools/dpdk-devbind.py --status

   Network devices using DPDK-compatible driver
   ============================================
   <none>

   Network devices using kernel driver
   ===================================
   0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface) a026' if= drv=thunder-BGX unused=vfio-pci
   0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface) a026' if= drv=thunder-BGX unused=vfio-pci
   0001:01:00.0 'THUNDERX Network Interface Controller a01e' if= drv=thunder-nic unused=vfio-pci
   0001:01:00.1 'Device a034' if=eth0 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.2 'Device a034' if=eth1 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.3 'Device a034' if=eth2 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.4 'Device a034' if=eth3 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.5 'Device a034' if=eth4 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.6 'Device a034' if=lbk0 drv=thunder-nicvf unused=vfio-pci
   0001:01:00.7 'Device a034' if=lbk1 drv=thunder-nicvf unused=vfio-pci
   0001:01:01.0 'Device a034' if=lbk2 drv=thunder-nicvf unused=vfio-pci
   0001:01:01.1 'Device a034' if=lbk3 drv=thunder-nicvf unused=vfio-pci
   0001:01:01.2 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:01.3 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:01.4 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:01.5 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:01.6 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:01.7 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:02.0 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:02.1 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
   0001:01:02.2 'Device a034' if= drv=thunder-nicvf unused=vfio-pci

   Other network devices
   =====================
   0002:00:03.0 'Device a01f' unused=vfio-pci,uio_pci_generic

.. note::

   Here, the total number of primary VFs = 5 (variable, depends on the number of Ethernet ports present) + 4 (fixed, loopback ports).
   Ethernet ports are indicated as ``if=eth0`` while loopback ports are indicated as ``if=lbk0``.

To bind two physical interfaces with 24 queues each, we attach two primary VFs
and four secondary VFs. Each port needs three queue sets of 8 queues: one primary VF plus two secondary VFs.
In our example we choose two 10G interfaces, eth1 (0001:01:00.2) and eth2 (0001:01:00.3).
We choose four secondary queue sets from the end of the list (0001:01:01.2-0001:01:02.2).

#. Bind two primary VFs to the ``vfio-pci`` driver:

   .. code-block:: console

      usertools/dpdk-devbind.py -b vfio-pci 0001:01:00.2
      usertools/dpdk-devbind.py -b vfio-pci 0001:01:00.3

#. Bind four secondary VFs to the ``vfio-pci`` driver:

   .. code-block:: console

      usertools/dpdk-devbind.py -b vfio-pci 0001:01:01.7
      usertools/dpdk-devbind.py -b vfio-pci 0001:01:02.0
      usertools/dpdk-devbind.py -b vfio-pci 0001:01:02.1
      usertools/dpdk-devbind.py -b vfio-pci 0001:01:02.2

The nicvf thunderx driver will make use of attached secondary VFs automatically during the interface configuration stage.

Thunder-nic VFs
~~~~~~~~~~~~~~~

Use sysfs to distinguish thunder-nic primary VFs and secondary VFs.

.. code-block:: console

   ls -l /sys/bus/pci/drivers/thunder-nic/
   total 0
   drwxr-xr-x 2 root root 0 Jan 22 11:19 ./
   drwxr-xr-x 86 root root 0 Jan 22 11:07 ../
   lrwxrwxrwx 1 root root 0 Jan 22 11:19 0001:01:00.0 -> '../../../../devices/platform/soc@0/849000000000.pci/pci0001:00/0001:00:10.0/0001:01:00.0'/

.. code-block:: console

   cat /sys/bus/pci/drivers/thunder-nic/0001\:01\:00.0/sriov_sqs_assignment
   12
   0 0001:01:00.1 vfio-pci +: 12 13
   1 0001:01:00.2 thunder-nicvf -:
   2 0001:01:00.3 thunder-nicvf -:
   3 0001:01:00.4 thunder-nicvf -:
   4 0001:01:00.5 thunder-nicvf -:
   5 0001:01:00.6 thunder-nicvf -:
   6 0001:01:00.7 thunder-nicvf -:
   7 0001:01:01.0 thunder-nicvf -:
   8 0001:01:01.1 thunder-nicvf -:
   9 0001:01:01.2 thunder-nicvf -:
   10 0001:01:01.3 thunder-nicvf -:
   11 0001:01:01.4 thunder-nicvf -:
   12 0001:01:01.5 vfio-pci: 0
   13 0001:01:01.6 vfio-pci: 0
   14 0001:01:01.7 thunder-nicvf: 255
   15 0001:01:02.0 thunder-nicvf: 255
   16 0001:01:02.1 thunder-nicvf: 255
   17 0001:01:02.2 thunder-nicvf: 255
   18 0001:01:02.3 thunder-nicvf: 255
   19 0001:01:02.4 thunder-nicvf: 255
   20 0001:01:02.5 thunder-nicvf: 255
   21 0001:01:02.6 thunder-nicvf: 255
   22 0001:01:02.7 thunder-nicvf: 255
   23 0001:01:03.0 thunder-nicvf: 255
   24 0001:01:03.1 thunder-nicvf: 255
   25 0001:01:03.2 thunder-nicvf: 255
   26 0001:01:03.3 thunder-nicvf: 255
   27 0001:01:03.4 thunder-nicvf: 255
   28 0001:01:03.5 thunder-nicvf: 255
   29 0001:01:03.6 thunder-nicvf: 255
   30 0001:01:03.7 thunder-nicvf: 255
   31 0001:01:04.0 thunder-nicvf: 255

Every row that ends with ``thunder-nicvf: number`` can be used as a secondary VF.
In the printout above, all entries starting from ``14 0001:01:01.7 thunder-nicvf: 255`` can be used as secondary VFs.

Debugging Options
-----------------

EAL command option to change the log level:

.. code-block:: console

   --log-level=pmd.net.thunderx.driver:info
   or
   --log-level=pmd.net.thunderx.driver,7

Module params
-------------

skip_data_bytes
~~~~~~~~~~~~~~~

This feature is used to create a hole between HEADROOM and the actual data. The size of the hole is specified
in bytes through the ``skip_data_bytes`` module parameter passed to the PMD.
This scheme is useful when an application would like to insert a VLAN header without disturbing HEADROOM.

Example:

.. code-block:: console

   -w 0002:01:00.2,skip_data_bytes=8

Limitations
-----------

CRC stripping
~~~~~~~~~~~~~

The ThunderX SoC family NICs strip the CRC for every packet coming into the
host interface irrespective of the offload configuration.

Maximum packet length
~~~~~~~~~~~~~~~~~~~~~

The ThunderX SoC family NICs support a maximum of a 9K jumbo frame. The value
is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
member of ``struct rte_eth_conf`` is set to a value lower than 9200, frames
up to 9200 bytes can still reach the host interface.

Maximum packet segments
~~~~~~~~~~~~~~~~~~~~~~~

The ThunderX SoC family NICs support up to 12 segments per packet when working
in scatter/gather mode. So, setting the MTU results in ``EINVAL`` when the
frame size does not fit in the maximum number of segments.

skip_data_bytes
~~~~~~~~~~~~~~~

The maximum value of ``skip_data_bytes`` is 128 bytes, and the number of bytes must be a multiple of 8.
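
The two limits above can be checked before the value is handed to the PMD.
The following is a minimal sketch in Python and is not part of the driver: the
helper names ``is_valid_skip_data_bytes`` and ``format_device_arg`` are
hypothetical, and treating ``0`` as a valid value (no hole) is an assumption.
The output string follows the ``-w`` example shown in the module params section.

.. code-block:: python

   MAX_SKIP_DATA_BYTES = 128  # upper limit stated in this section
   SKIP_DATA_ALIGNMENT = 8    # the value must be a multiple of 8


   def is_valid_skip_data_bytes(value):
       """Return True if value satisfies both limits described above."""
       return (
           isinstance(value, int)
           and 0 <= value <= MAX_SKIP_DATA_BYTES  # 0 accepted by assumption
           and value % SKIP_DATA_ALIGNMENT == 0
       )


   def format_device_arg(pci_address, skip_bytes):
       """Build the '<pci>,skip_data_bytes=<n>' string used with -w."""
       if not is_valid_skip_data_bytes(skip_bytes):
           raise ValueError(
               "skip_data_bytes must be a multiple of 8 between 0 and 128"
           )
       return "{},skip_data_bytes={}".format(pci_address, skip_bytes)


   # Same device and value as the example in the module params section.
   print(format_device_arg("0002:01:00.2", 8))

A value such as ``12`` (not a multiple of 8) or ``136`` (above the limit) makes
``format_device_arg`` raise ``ValueError`` instead of producing an argument the
PMD would not accept.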