..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Cavium, Inc

ThunderX NICVF Poll Mode Driver
===============================

The ThunderX NICVF PMD (**librte_net_thunderx**) provides poll mode driver
support for the inbuilt NIC found in the **Cavium ThunderX** SoC family
as well as their virtual functions (VF) in SR-IOV context.

More information can be found at `Cavium, Inc Official Website
<http://www.cavium.com/ThunderX_ARM_Processors.html>`_.

Supported ThunderX SoCs
-----------------------
- CN88xx
- CN81xx
- CN83xx

Features
--------

Features of the ThunderX PMD are:

- Multiple queues for TX and RX
- Receive Side Scaling (RSS)
- Packet type information
- Checksum offload
- Promiscuous mode
- Multicast mode
- Port hardware statistics
- Jumbo frames
- Link state information
- Setting up link state
- Scatter and gather for TX and RX
- VLAN stripping
- SR-IOV VF
- NUMA support
- Multi queue set support (up to 96 queues (12 queue sets)) per port
- Skip data bytes

Prerequisites
-------------
- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to set up the basic DPDK environment.

Driver compilation and testing
------------------------------

Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
for details.

Use ``config/arm/arm64-thunderx-linux-gcc`` as a meson cross-file when cross-compiling.

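
For example, a cross build can be set up as follows (a minimal sketch, assuming an
aarch64 cross toolchain is installed; the ``build`` directory name is arbitrary):

.. code-block:: console

   meson setup build --cross-file config/arm/arm64-thunderx-linux-gcc
   ninja -C build
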

Linux
-----

SR-IOV: Prerequisites and sample Application Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The current ThunderX NIC PF/VF kernel modules map each physical Ethernet port
automatically to a virtual function (VF) and present it as a PCIe-like SR-IOV device.
This section provides instructions to configure SR-IOV with Linux OS.

#. Verify PF devices capabilities using ``lspci``:

   .. code-block:: console

      lspci -vvv

   Example output:

   .. code-block:: console

      0002:01:00.0 Ethernet controller: Cavium Networks Device a01e (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
      ...
      Kernel driver in use: thunder-nic
      ...

   .. note::

      Unless the ``thunder-nic`` driver is in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_PF`` setting.

#. Verify VF devices capabilities and drivers using ``lspci``:

   .. code-block:: console

      lspci -vvv

   Example output:

   .. code-block:: console

      0002:01:00.1 Ethernet controller: Cavium Networks Device 0011 (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Kernel driver in use: thunder-nicvf
      ...

      0002:01:00.2 Ethernet controller: Cavium Networks Device 0011 (rev 01)
      ...
      Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
      ...
      Kernel driver in use: thunder-nicvf
      ...

   .. note::

      Unless the ``thunder-nicvf`` driver is in use, make sure your kernel config includes the ``CONFIG_THUNDER_NIC_VF`` setting.

#. Pass VF device to VM context (PCIe Passthrough):

   The VF devices may be passed through to the guest VM using qemu,
   virt-manager, virsh, etc.

   Example qemu guest launch command:

   .. code-block:: console

      sudo qemu-system-aarch64 -name vm1 \
      -machine virt,gic_version=3,accel=kvm,usb=off \
      -cpu host -m 4096 \
      -smp 4,sockets=1,cores=8,threads=1 \
      -nographic -nodefaults \
      -kernel <kernel image> \
      -append "root=/dev/vda console=ttyAMA0 rw hugepagesz=512M hugepages=3" \
      -device vfio-pci,host=0002:01:00.1 \
      -drive file=<rootfs.ext3>,if=none,id=disk1,format=raw \
      -device virtio-blk-device,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
      -netdev tap,id=net0,ifname=tap0,script=/etc/qemu-ifup_thunder \
      -device virtio-net-device,netdev=net0 \
      -serial stdio \
      -mem-path /dev/hugepages

#. Enable **VFIO-NOIOMMU** mode (optional):

   .. code-block:: console

      echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode

   .. note::

      **VFIO-NOIOMMU** is required only when running in VM context and should not be enabled otherwise.

#. Running testpmd:

   Follow the instructions available in the document
   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
   to run testpmd.

   Example output:

   .. code-block:: console

      ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 -a 0002:01:00.2 \
        -- -i --no-flush-rx \
        --port-topology=loop

      ...

      PMD: rte_nicvf_pmd_init(): librte_net_thunderx nicvf version 1.0

      ...
      EAL: probe driver: 177d:11 rte_nicvf_pmd
      EAL: using IOMMU type 1 (Type 1)
      EAL: PCI memory mapped at 0x3ffade50000
      EAL: Trying to map BAR 4 that contains the MSI-X table.
           Trying offsets: 0x40000000000:0x0000, 0x10000:0x1f0000
      EAL: PCI memory mapped at 0x3ffadc60000
      PMD: nicvf_eth_dev_init(): nicvf: device (177d:11) 2:1:0:2
      PMD: nicvf_eth_dev_init(): node=0 vf=1 mode=tns-bypass sqs=false
           loopback_supported=true
      PMD: nicvf_eth_dev_init(): Port 0 (177d:11) mac=a6:c6:d9:17:78:01
      Interactive-mode selected
      Configuring Port 0 (socket 0)
      ...

      PMD: nicvf_dev_configure(): Configured ethdev port0 hwcap=0x0
      Port 0: A6:C6:D9:17:78:01
      Checking link statuses...
      Port 0 Link Up - speed 10000 Mbps - full-duplex
      Done
      testpmd>

Multiple Queue Set per DPDK port configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two types of VFs:

- Primary VF
- Secondary VF

Each port consists of a primary VF and n secondary VF(s). Each VF provides 8 Tx/Rx queues to a port.
When a given port is configured to use more than 8 queues, it requires one (or more) secondary VF.
Each secondary VF adds 8 additional queues to the queue set.

During PMD initialization, the primary VFs are enumerated by checking the
specific flag (see the sqs message in the DPDK boot log - sqs indicates a secondary queue set).
They are at the beginning of the VF list (the remaining ones are secondary VFs).

The primary VFs are used as master queue sets. Secondary VFs provide
additional queue sets for the primary ones. If a port is configured for more than
8 queues, it will request additional queues from the secondary VFs.

Secondary VFs cannot be shared between primary VFs.

Primary VFs are present at the beginning of the 'Network devices using kernel
driver' list; secondary VFs make up the remaining part of it.

   .. note::

      The VNIC driver in the multiqueue setup works differently than other drivers like `ixgbe`.
      Each specific queue set device must be bound separately with the ``usertools/dpdk-devbind.py`` utility.

   .. note::

      Depending on the hardware used, the kernel driver sets a threshold ``vf_id``. VFs that attach with an id below or equal to
      this boundary are considered primary VFs. VFs that attach with an id above this boundary are considered secondary VFs.

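
As an illustration of the queue set arithmetic (a sketch only, assuming enough secondary
VFs have been bound to ``vfio-pci`` as shown in the device binding example below),
requesting 16 queues from within testpmd makes a port claim one additional secondary queue set:

.. code-block:: console

   testpmd> port stop all
   testpmd> port config all rxq 16
   testpmd> port config all txq 16
   testpmd> port start all
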

LBK HW Access
~~~~~~~~~~~~~

The Loopback HW Unit (LBK) receives packets from NIC-RX and sends packets back to NIC-TX.
The loopback block has N channels and contains data buffering that is shared across
all channels. Four primary VFs are reserved as loopback ports.

Example device binding
~~~~~~~~~~~~~~~~~~~~~~

If a system has three interfaces, a total of 18 VF devices will be created
on a non-NUMA machine.

   .. note::

      NUMA systems have 12 VFs per port and non-NUMA 6 VFs per port.

   .. code-block:: console

      # usertools/dpdk-devbind.py --status

      Network devices using DPDK-compatible driver
      ============================================
      <none>

      Network devices using kernel driver
      ===================================
      0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface) a026' if= drv=thunder-BGX unused=vfio-pci
      0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface) a026' if= drv=thunder-BGX unused=vfio-pci
      0001:01:00.0 'THUNDERX Network Interface Controller a01e' if= drv=thunder-nic unused=vfio-pci
      0001:01:00.1 'Device a034' if=eth0 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.2 'Device a034' if=eth1 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.3 'Device a034' if=eth2 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.4 'Device a034' if=eth3 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.5 'Device a034' if=eth4 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.6 'Device a034' if=lbk0 drv=thunder-nicvf unused=vfio-pci
      0001:01:00.7 'Device a034' if=lbk1 drv=thunder-nicvf unused=vfio-pci
      0001:01:01.0 'Device a034' if=lbk2 drv=thunder-nicvf unused=vfio-pci
      0001:01:01.1 'Device a034' if=lbk3 drv=thunder-nicvf unused=vfio-pci
      0001:01:01.2 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:01.3 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:01.4 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:01.5 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:01.6 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:01.7 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:02.0 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:02.1 'Device a034' if= drv=thunder-nicvf unused=vfio-pci
      0001:01:02.2 'Device a034' if= drv=thunder-nicvf unused=vfio-pci

      Other network devices
      =====================
      0002:00:03.0 'Device a01f' unused=vfio-pci,uio_pci_generic

   .. note::

      Here the total number of primary VFs = 5 (variable, depends on the number of Ethernet ports present) + 4 (fixed, loopback ports).
      Ethernet ports are indicated as `if=eth0` while loopback ports as `if=lbk0`.

We want to bind two physical interfaces with 24 queues each; to do so, we attach two primary VFs
and four secondary VFs. In our example we choose two 10G interfaces eth1 (0002:01:00.2) and eth2 (0002:01:00.3).
We will choose four secondary queue sets from the end of the list (0002:01:01.7-0002:01:02.2).

#. Bind two primary VFs to the ``vfio-pci`` driver:

   .. code-block:: console

      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.2
      usertools/dpdk-devbind.py -b vfio-pci 0002:01:00.3

#. Bind four secondary VFs to the ``vfio-pci`` driver:

   .. code-block:: console

      usertools/dpdk-devbind.py -b vfio-pci 0002:01:01.7
      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.0
      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.1
      usertools/dpdk-devbind.py -b vfio-pci 0002:01:02.2

The nicvf thunderx driver will make use of the attached secondary VFs automatically during the interface configuration stage.

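
For instance, testpmd could then be launched with all six bound VFs on its allow list and
24 queues per port (a sketch only; the PCI addresses follow the binding example above,
while the core and queue arguments are illustrative):

.. code-block:: console

   ./<build_dir>/app/dpdk-testpmd -l 0-7 -n 4 \
      -a 0002:01:00.2 -a 0002:01:00.3 \
      -a 0002:01:01.7 -a 0002:01:02.0 -a 0002:01:02.1 -a 0002:01:02.2 \
      -- -i --rxq=24 --txq=24

The secondary VFs only contribute their queue sets to the primary ports during the
configuration stage; they are not meant to be used as separate ports.
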

Thunder-nic VFs
~~~~~~~~~~~~~~~

Use sysfs to distinguish thunder-nic primary VFs and secondary VFs.

   .. code-block:: console

      ls -l /sys/bus/pci/drivers/thunder-nic/
      total 0
      drwxr-xr-x  2 root root 0 Jan 22 11:19 ./
      drwxr-xr-x 86 root root 0 Jan 22 11:07 ../
      lrwxrwxrwx  1 root root 0 Jan 22 11:19 0001:01:00.0 -> '../../../../devices/platform/soc@0/849000000000.pci/pci0001:00/0001:00:10.0/0001:01:00.0'/

   .. code-block:: console

      cat /sys/bus/pci/drivers/thunder-nic/0001\:01\:00.0/sriov_sqs_assignment
      12
      0 0001:01:00.1 vfio-pci +: 12 13
      1 0001:01:00.2 thunder-nicvf -:
      2 0001:01:00.3 thunder-nicvf -:
      3 0001:01:00.4 thunder-nicvf -:
      4 0001:01:00.5 thunder-nicvf -:
      5 0001:01:00.6 thunder-nicvf -:
      6 0001:01:00.7 thunder-nicvf -:
      7 0001:01:01.0 thunder-nicvf -:
      8 0001:01:01.1 thunder-nicvf -:
      9 0001:01:01.2 thunder-nicvf -:
      10 0001:01:01.3 thunder-nicvf -:
      11 0001:01:01.4 thunder-nicvf -:
      12 0001:01:01.5 vfio-pci: 0
      13 0001:01:01.6 vfio-pci: 0
      14 0001:01:01.7 thunder-nicvf: 255
      15 0001:01:02.0 thunder-nicvf: 255
      16 0001:01:02.1 thunder-nicvf: 255
      17 0001:01:02.2 thunder-nicvf: 255
      18 0001:01:02.3 thunder-nicvf: 255
      19 0001:01:02.4 thunder-nicvf: 255
      20 0001:01:02.5 thunder-nicvf: 255
      21 0001:01:02.6 thunder-nicvf: 255
      22 0001:01:02.7 thunder-nicvf: 255
      23 0001:01:03.0 thunder-nicvf: 255
      24 0001:01:03.1 thunder-nicvf: 255
      25 0001:01:03.2 thunder-nicvf: 255
      26 0001:01:03.3 thunder-nicvf: 255
      27 0001:01:03.4 thunder-nicvf: 255
      28 0001:01:03.5 thunder-nicvf: 255
      29 0001:01:03.6 thunder-nicvf: 255
      30 0001:01:03.7 thunder-nicvf: 255
      31 0001:01:04.0 thunder-nicvf: 255

Every entry that ends with 'thunder-nicvf: <number>' can be used as a secondary VF.
In the printout above, all entries from '14 0001:01:01.7 thunder-nicvf: 255' onwards can be used as secondary VFs.

Debugging Options
-----------------

EAL command line option to change the log level:

   .. code-block:: console

      --log-level=pmd.net.thunderx.driver:info
      or
      --log-level=pmd.net.thunderx.driver,7

Runtime Configuration
---------------------

skip_data_bytes
~~~~~~~~~~~~~~~

This feature creates a hole between the HEADROOM and the actual packet data. The size of the hole
is specified in bytes as the devargs parameter ``skip_data_bytes`` to the PMD.
This scheme is useful when an application would like to insert a VLAN header without disturbing the HEADROOM.

Example:
   .. code-block:: console

      -a 0002:01:00.2,skip_data_bytes=8

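
For instance, a complete testpmd invocation passing this devargs could look like the following
(a sketch; the device address and EAL arguments are illustrative):

.. code-block:: console

   ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 \
      -a 0002:01:00.2,skip_data_bytes=8 -- -i
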

Limitations
-----------

CRC stripping
~~~~~~~~~~~~~

The ThunderX SoC family NICs strip the CRC for every packet coming into the
host interface irrespective of the offload configuration.

Maximum packet length
~~~~~~~~~~~~~~~~~~~~~

The ThunderX SoC family NICs support a maximum of a 9K jumbo frame. The value
is fixed and cannot be changed. So, even when the ``rxmode.mtu``
member of ``struct rte_eth_conf`` is set to a value lower than 9200, frames
up to 9200 bytes can still reach the host interface.

Maximum packet segments
~~~~~~~~~~~~~~~~~~~~~~~

The ThunderX SoC family NICs support up to 12 segments per packet when working
in scatter/gather mode. Setting the MTU will therefore fail with ``EINVAL`` when the
resulting frame size does not fit in the maximum number of segments.

skip_data_bytes
~~~~~~~~~~~~~~~

The maximum value of ``skip_data_bytes`` is 128 bytes and the number of bytes must be a multiple of 8.