..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

.. include:: <isonum.txt>

Intel Virtual Function Driver
=============================

Supported Intel® Ethernet Controllers (see the *DPDK Release Notes* for details)
support the following modes of operation in a virtualized environment:

* **SR-IOV mode**: Involves direct assignment of part of the port resources to different guest operating systems
  using the PCI-SIG Single Root I/O Virtualization (SR-IOV) standard,
  also known as "native mode" or "pass-through" mode.
  In this chapter, this mode is referred to as IOV mode.

* **VMDq mode**: Involves central management of the networking resources by an IO Virtual Machine (IOVM) or
  a Virtual Machine Monitor (VMM), also known as software switch acceleration mode.
  In this chapter, this mode is referred to as the Next Generation VMDq mode.

SR-IOV Mode Utilization in a DPDK Environment
---------------------------------------------

The DPDK uses the SR-IOV feature for hardware-based I/O sharing in IOV mode.
With SR-IOV, the NIC resources of an Ethernet controller can be logically partitioned and
exposed to a virtual machine as a separate PCI function called a "Virtual Function".
Refer to :numref:`figure_single_port_nic`.

As a result, a NIC is logically distributed among multiple virtual machines (as shown in :numref:`figure_single_port_nic`),
while still sharing global data with the Physical Function and the other Virtual Functions.
The DPDK fm10kvf, iavf, igbvf and ixgbevf Poll Mode Drivers (PMDs) serve the virtual PCI functions of the
Intel® 82576 Gigabit Ethernet Controller, the Intel® Ethernet Controller I350 family,
the Intel® 82599 10 Gigabit Ethernet Controller, the Intel® Fortville 10/40 Gigabit Ethernet Controller,
and the PCIe host-interface of the Intel Ethernet Switch FM10000 Series.
The DPDK Poll Mode Drivers also support the "Physical Function" of such NICs on the host.

The DPDK PF/VF Poll Mode Driver (PMD) supports the Layer 2 switch on the Intel® 82576 Gigabit Ethernet Controller,
Intel® Ethernet Controller I350 family, Intel® 82599 10 Gigabit Ethernet Controller,
and Intel® Fortville 10/40 Gigabit Ethernet Controller NICs, so that guests can use it for inter-VM traffic in SR-IOV mode.

For more detail on SR-IOV, please refer to the following documents:

* `SR-IOV provides hardware based I/O sharing <http://www.intel.com/network/connectivity/solutions/vmdc.htm>`_

* `PCI-SIG-Single Root I/O Virtualization Support on IA
  <http://www.intel.com/content/www/us/en/pci-express/pci-sig-single-root-io-virtualization-support-in-virtualization-technology-for-connectivity-paper.html>`_

* `Scalable I/O Virtualized Servers <http://www.intel.com/content/www/us/en/virtualization/server-virtualization/scalable-i-o-virtualized-servers-paper.html>`_

.. _figure_single_port_nic:

.. figure:: img/single_port_nic.*

   Virtualization for a Single Port NIC in SR-IOV Mode


Physical and Virtual Function Infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following describes the Physical Function and Virtual Function infrastructure for the supported Ethernet Controller NICs.

Virtual Functions operate under the respective Physical Function on the same NIC port and therefore have no access
to the global NIC resources that are shared between other functions on the same NIC port.

A Virtual Function has basic access to the queue resources and control structures of the queues assigned to it.
For global resource access, a Virtual Function has to send a request to the Physical Function for that port,
and the Physical Function operates on the global resources on behalf of the Virtual Function.
For this out-of-band communication, an SR-IOV enabled NIC provides a memory buffer for each Virtual Function,
which is called a "Mailbox".

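
Within a guest, an assigned Virtual Function is probed like any other ethdev port and is served by the matching VF PMD.
As a minimal illustrative sketch, the following prints the driver name and the queue limits reported for each detected port,
which for a VF reflect the queue resources granted by the Physical Function:

.. code-block:: c

   #include <stdio.h>
   #include <rte_eal.h>
   #include <rte_ethdev.h>

   int
   main(int argc, char **argv)
   {
       uint16_t port_id;

       /* The assigned VF shows up as a regular ethdev port after EAL init. */
       if (rte_eal_init(argc, argv) < 0)
           return -1;

       RTE_ETH_FOREACH_DEV(port_id) {
           struct rte_eth_dev_info dev_info;

           if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
               continue;

           /* The driver name tells which (VF) PMD serves this port. */
           printf("port %u: driver %s, max %u Rx / %u Tx queues\n",
                  port_id, dev_info.driver_name,
                  dev_info.max_rx_queues, dev_info.max_tx_queues);
       }

       return 0;
   }
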

Intel® Ethernet Adaptive Virtual Function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Adaptive Virtual Function (IAVF) is an SR-IOV Virtual Function with the same device ID (8086:1889) on different Intel Ethernet Controllers.
The IAVF driver is a VF driver that supports current and future Intel devices without requiring a VM update.
Because the driver is adaptive, each new release can enable additional advanced features in the VM,
in a device-agnostic way, whenever the underlying hardware supports them, without compromising the base functionality.
IAVF provides a generic hardware interface, and the interface between the IAVF driver and a compliant PF driver is specified.

Intel products starting from the Ethernet Controller 700 Series support the Adaptive Virtual Function.

Virtual Functions are generated in the usual way, and the resources assigned to a VF depend on the NIC infrastructure.

For more detail on IAVF, please refer to the following document:

* `Intel® IAVF HAS <https://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ethernet-adaptive-virtual-function-hardware-spec.pdf>`_

.. note::

   To use the DPDK IAVF PMD on an Intel® 700 Series Ethernet Controller, the device ID (0x1889) needs to be specified during device
   assignment in the hypervisor. Taking QEMU as an example, the device assignment should carry the IAVF device ID (0x1889), like
   ``-device vfio-pci,x-pci-device-id=0x1889,host=03:0a.0``.

   When IAVF is backed by an Intel® E810 device, the "Protocol Extraction" feature supported by the ice PMD is also
   available for the IAVF PMD. The same devargs with the same parameters can be applied to the IAVF PMD; for details please refer to
   the section ``Protocol extraction for per queue`` of ice.rst.

   Quanta size configuration is also supported when IAVF is backed by an Intel® E810 device by setting the ``devargs``
   parameter ``quanta_size``, like ``-a 18:00.0,quanta_size=2048``. The default value is 1024, and the quanta size should be
   set to a multiple of 64 in legacy host interface mode.

   When IAVF is backed by an Intel® E810 device or an Intel® 700 Series Ethernet device, the reset watchdog is enabled
   when the link state changes to down. The default period is 2000us, defined by ``IAVF_DEV_WATCHDOG_PERIOD``.
   Set the ``devargs`` parameter ``watchdog_period`` to adjust the watchdog period in microseconds, or set it to 0 to disable the watchdog,
   for example, ``-a 18:01.0,watchdog_period=5000`` or ``-a 18:01.0,watchdog_period=0``.

   Enable VF auto-reset by setting the ``devargs`` parameter like ``-a 18:01.0,auto_reset=1``
   when IAVF is backed by an Intel\ |reg| E810 device
   or an Intel\ |reg| 700 Series Ethernet device.

   Stop polling Rx/Tx hardware queues when the link is down
   by setting the ``devargs`` parameter like ``-a 18:01.0,no-poll-on-link-down=1``
   when IAVF is backed by an Intel\ |reg| E810 device or an Intel\ |reg| 700 Series Ethernet device.

   Similarly, when IAVF is backed by an Intel\ |reg| E810 device
   or an Intel\ |reg| 700 Series Ethernet device,
   set the ``devargs`` parameter ``mbuf_check`` to enable Tx diagnostics.
   For example, ``-a 18:01.0,mbuf_check=<case>`` or ``-a 18:01.0,mbuf_check=[<case1>,<case2>...]``.
   Thereafter, ``rte_eth_xstats_get()`` can be used to get the error counts,
   which are collected in the ``tx_mbuf_error_packets`` xstats.
   In testpmd these can be shown via: ``testpmd> show port xstats all``.
   Supported values for the ``case`` parameter are:

   * ``mbuf``: Check for a corrupted mbuf.
   * ``size``: Check min/max packet length according to the HW spec.
   * ``segment``: Check that the number of mbuf segments does not exceed the HW limits.
   * ``offload``: Check for use of an unsupported offload flag.

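
Outside of testpmd, the same diagnostic counter can be read through the generic xstats API.
The following minimal sketch looks up the ``tx_mbuf_error_packets`` counter by name on a given port;
the helper name is illustrative:

.. code-block:: c

   #include <stdio.h>
   #include <string.h>
   #include <inttypes.h>
   #include <rte_ethdev.h>

   /* Print the tx_mbuf_error_packets counter of a port, if the PMD exposes it. */
   static void
   print_mbuf_check_errors(uint16_t port_id)
   {
       int nb = rte_eth_xstats_get_names(port_id, NULL, 0);

       if (nb <= 0)
           return;

       struct rte_eth_xstat_name names[nb];
       struct rte_eth_xstat values[nb];

       if (rte_eth_xstats_get_names(port_id, names, nb) != nb ||
           rte_eth_xstats_get(port_id, values, nb) != nb)
           return;

       for (int i = 0; i < nb; i++) {
           if (strcmp(names[i].name, "tx_mbuf_error_packets") == 0)
               printf("port %u: tx_mbuf_error_packets = %" PRIu64 "\n",
                      port_id, values[i].value);
       }
   }
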

The PCIe host-interface of Intel Ethernet Switch FM10000 Series VF infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a virtualized environment, the programmer can enable a maximum of *64 Virtual Functions (VF)*
globally per PCIe host-interface of the Intel Ethernet Switch FM10000 Series device.
Each VF can have a maximum of 16 queue pairs.
The Physical Function in the host can only be configured by the Linux* fm10k driver
(in the case of the Linux Kernel-based Virtual Machine [KVM]); the DPDK PMD PF driver doesn't support it yet.

For example,

* Using the Linux* fm10k driver:

  .. code-block:: console

     rmmod fm10k                     (To remove the fm10k module)
     insmod fm10k.ko max_vfs=2,2     (To enable two Virtual Functions per port)

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
However:

* Virtual Functions 0 and 2 belong to Physical Function 0

* Virtual Functions 1 and 3 belong to Physical Function 1

.. note::

   The above is an important consideration to take into account when targeting specific packets to a selected port.

Intel® X710/XL710 Gigabit Ethernet Controller VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a virtualized environment, the programmer can enable a maximum of *128 Virtual Functions (VF)*
globally per Intel® X710/XL710 Gigabit Ethernet Controller NIC device.
The Physical Function in the host can be configured either by the Linux* i40e driver
(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
When both the DPDK PMD PF and VF drivers are used, the whole NIC is taken over by the DPDK-based application.

For example,

* Using the Linux* i40e driver:

  .. code-block:: console

     rmmod i40e                      (To remove the i40e module)
     insmod i40e.ko max_vfs=2,2      (To enable two Virtual Functions per port)

* Using the DPDK PMD PF i40e driver, bind the PF device to ``vfio_pci`` or ``igb_uio`` and
  create VF devices. See :ref:`linux_gsg_binding_kernel`.

  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
However:

* Virtual Functions 0 and 2 belong to Physical Function 0

* Virtual Functions 1 and 3 belong to Physical Function 1

.. note::

   The above is an important consideration to take into account when targeting specific packets to a selected port.

   For the Intel® X710/XL710 Gigabit Ethernet Controller, queues come in pairs. One queue pair means one receive queue and
   one transmit queue. The default number of queue pairs per VF is 4, and the maximum is 16.

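
As a minimal illustrative sketch, a guest application requests its queue pairs through the usual ethdev configuration calls.
The helper below clamps the request to the limits the VF advertises, since asking for more queue pairs than the PF granted
makes the configuration fail; the helper name and the descriptor count of 512 are arbitrary example values.

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_mempool.h>

   /* Configure nb_qp Rx/Tx queue pairs on a VF port (4 is the default per VF). */
   static int
   vf_setup_queue_pairs(uint16_t port_id, uint16_t nb_qp, struct rte_mempool *mb_pool)
   {
       struct rte_eth_conf port_conf = {0};
       struct rte_eth_dev_info dev_info;
       uint16_t q;

       if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
           return -1;

       /* Never ask for more queues than the PF granted to this VF. */
       if (nb_qp > dev_info.max_rx_queues)
           nb_qp = dev_info.max_rx_queues;
       if (nb_qp > dev_info.max_tx_queues)
           nb_qp = dev_info.max_tx_queues;

       if (rte_eth_dev_configure(port_id, nb_qp, nb_qp, &port_conf) != 0)
           return -1;

       for (q = 0; q < nb_qp; q++) {
           if (rte_eth_rx_queue_setup(port_id, q, 512,
                   rte_eth_dev_socket_id(port_id), NULL, mb_pool) != 0)
               return -1;
           if (rte_eth_tx_queue_setup(port_id, q, 512,
                   rte_eth_dev_socket_id(port_id), NULL) != 0)
               return -1;
       }

       return rte_eth_dev_start(port_id);
   }
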

Intel® 82599 10 Gigabit Ethernet Controller VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The programmer can enable a maximum of *63 Virtual Functions* and there must be *one Physical Function* per Intel® 82599
10 Gigabit Ethernet Controller NIC port.
The reason for this is that the device allows for a maximum of 128 queues per port and a virtual/physical function has to
have at least one queue pair (RX/TX).
The current implementation of the DPDK ixgbevf driver supports a single queue pair (RX/TX) per Virtual Function.
The Physical Function in the host can be configured either by the Linux* ixgbe driver
(in the case of the Linux Kernel-based Virtual Machine [KVM]) or by the DPDK PMD PF driver.
When both the DPDK PMD PF and VF drivers are used, the whole NIC is taken over by the DPDK-based application.

For example,

* Using the Linux* ixgbe driver:

  .. code-block:: console

     rmmod ixgbe                     (To remove the ixgbe module)
     insmod ixgbe max_vfs=2,2        (To enable two Virtual Functions per port)

* Using the DPDK PMD PF ixgbe driver, bind the PF device to ``vfio_pci`` or ``igb_uio`` and
  create VF devices. See :ref:`linux_gsg_binding_kernel`.

  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

* Using the DPDK PMD PF ixgbe driver to enable VF RSS:

  Same steps as above to bind the PF device, create VF devices, and
  launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

  The available number of queues per VF (at most 4) depends on the total number of pools, which is
  determined by the maximum number of VFs at the PF initialization stage and the number of queues specified
  in the configuration:

  * If the maximum number of VFs (max_vfs) is set in the range of 1 to 32:

    If the number of Rx queues is specified as 4 (``--rxq=4`` in testpmd), then there are in total 32
    pools (RTE_ETH_32_POOLS), and each VF can have 4 Rx queues;

    If the number of Rx queues is specified as 2 (``--rxq=2`` in testpmd), then there are in total 32
    pools (RTE_ETH_32_POOLS), and each VF can have 2 Rx queues;

  * If the maximum number of VFs (max_vfs) is in the range of 33 to 64:

    If the number of Rx queues is specified as 4 (``--rxq=4`` in testpmd), then an error message is expected
    because ``rxq`` is not correct in this case;

    If the number of Rx queues is 2 (``--rxq=2`` in testpmd), then there are in total 64 pools (RTE_ETH_64_POOLS),
    and each VF can have 2 Rx queues;

  On the host, to enable VF RSS functionality, the Rx mq mode should be set to RTE_ETH_MQ_RX_VMDQ_RSS
  or RTE_ETH_MQ_RX_RSS mode, and SR-IOV mode should be activated (max_vfs >= 1).
  The VF RSS information, such as the hash function, the RSS key and the RSS key length, also needs to be configured;
  a sketch of this host-side configuration is shown after the note below.

.. note::

   The limitation for VF RSS on the Intel® 82599 10 Gigabit Ethernet Controller is:
   the hash and key are shared among the PF and all VFs, and the RETA table with 128 entries is also shared
   among the PF and all VFs. Therefore it is not possible to query the hash and RETA content per
   VF on the guest; if needed, query them on the host for the shared RETA information.

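
A rough sketch of the host-side PF configuration mentioned above is shown below;
the helper name, the RSS key bytes and the hash types are placeholders to be replaced by the application.

.. code-block:: c

   #include <rte_ethdev.h>

   /* Example 40-byte RSS key (placeholder values). */
   static uint8_t example_rss_key[40] = {
       0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
       0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
       0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
       0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
       0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
   };

   static int
   pf_configure_vf_rss(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
   {
       struct rte_eth_conf port_conf = {
           .rxmode = {
               .mq_mode = RTE_ETH_MQ_RX_VMDQ_RSS,   /* or RTE_ETH_MQ_RX_RSS */
           },
           .rx_adv_conf = {
               .rss_conf = {
                   .rss_key = example_rss_key,
                   .rss_key_len = sizeof(example_rss_key),
                   .rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP | RTE_ETH_RSS_UDP,
               },
           },
       };

       /* SR-IOV must already be active (max_vfs >= 1) for the pool layout to apply. */
       return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);
   }
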

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a dual-port NIC.
When you enable the four Virtual Functions with the above command, the four enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence starting from 0 to 3.
However:

* Virtual Functions 0 and 2 belong to Physical Function 0

* Virtual Functions 1 and 3 belong to Physical Function 1

.. note::

   The above is an important consideration to take into account when targeting specific packets to a selected port.

Intel® 82576 Gigabit Ethernet Controller and Intel® Ethernet Controller I350 Family VF Infrastructure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a virtualized environment, an Intel® 82576 Gigabit Ethernet Controller serves up to eight virtual machines (VMs).
The controller has 16 TX and 16 RX queues.
They are generally referred to (or thought of) as queue pairs (one TX and one RX queue).
This gives the controller 16 queue pairs.

A pool is a group of queue pairs for assignment to the same VF, used for transmit and receive operations.
The controller has eight pools, with each pool containing two queue pairs, that is, two TX and two RX queues assigned to each VF.

In a virtualized environment, an Intel® Ethernet Controller I350 family device serves up to eight virtual machines (VMs) per port.
The eight queues can be accessed by eight different VMs if configured correctly
(the I350 has 4 x 1GbE ports, each with 8 TX and 8 RX queues);
that means one transmit and one receive queue assigned to each VF.

For example,

* Using the Linux* igb driver:

  .. code-block:: console

     rmmod igb                       (To remove the igb module)
     insmod igb max_vfs=2,2          (To enable two Virtual Functions per port)

* Using the DPDK PMD PF igb driver, bind the PF device to ``vfio_pci`` or ``igb_uio`` and
  create VF devices. See :ref:`linux_gsg_binding_kernel`.

  Launch the DPDK testpmd/example or your own host daemon application using the DPDK PMD library.

Virtual Function enumeration is performed in the following sequence by the Linux* pci driver for a four-port NIC.
When you enable two Virtual Functions per port on a four-port NIC, the eight enabled functions have a Function#
represented by (Bus#, Device#, Function#) in sequence, starting from 0 to 7.
However:

* Virtual Functions 0 and 4 belong to Physical Function 0

* Virtual Functions 1 and 5 belong to Physical Function 1

* Virtual Functions 2 and 6 belong to Physical Function 2

* Virtual Functions 3 and 7 belong to Physical Function 3

.. note::

   The above is an important consideration to take into account
   when targeting specific packets to a selected port.

Validated Hypervisors
~~~~~~~~~~~~~~~~~~~~~

The validated hypervisor is:

* KVM (Kernel Virtual Machine) with Qemu, version 0.14.0

However, since the hypervisor is bypassed and the Virtual Function devices are configured using the Mailbox interface,
the solution is hypervisor-agnostic.
Xen* and VMware* (when SR-IOV is supported) will also be able to support the DPDK with Virtual Function driver support.

Expected Guest Operating System in Virtual Machine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The expected guest operating systems in a virtualized environment are:

* Fedora* 14 (64-bit)

* Ubuntu* 10.04 (64-bit)

For supported kernel versions, refer to the *DPDK Release Notes*.

.. _intel_vf_kvm:

Setting Up a KVM Virtual Machine Monitor
----------------------------------------

The following describes a target environment:

* Host Operating System: Fedora 14

* Hypervisor: KVM (Kernel Virtual Machine) with Qemu version 0.14.0

* Guest Operating System: Fedora 14

* Linux Kernel Version: Refer to the *DPDK Getting Started Guide*

* Target Applications: l2fwd, l3fwd-vf

The setup procedure is as follows:

#. Before booting the Host OS, open **BIOS setup** and enable **Intel® VT features**.

#. While booting the Host OS kernel, pass the intel_iommu=on kernel command line argument using GRUB.
   When using the DPDK PF driver on the host, pass the iommu=pt kernel command line argument in GRUB.

#. Download qemu-kvm-0.14.0 from
   `http://sourceforge.net/projects/kvm/files/qemu-kvm/ <http://sourceforge.net/projects/kvm/files/qemu-kvm/>`_
   and install it in the Host OS using the following steps:

   When using a recent kernel (2.6.25+) with kvm modules included:

   .. code-block:: console

      tar xzf qemu-kvm-release.tar.gz
      cd qemu-kvm-release
      ./configure --prefix=/usr/local/kvm
      make
      sudo make install
      sudo /sbin/modprobe kvm-intel

   When using an older kernel, or a kernel from a distribution without the kvm modules,
   you must download (from the same link), compile and install the modules yourself:

   .. code-block:: console

      tar xjf kvm-kmod-release.tar.bz2
      cd kvm-kmod-release
      ./configure
      make
      sudo make install
      sudo /sbin/modprobe kvm-intel

   qemu-kvm installs in the /usr/local/bin directory.

   For more details about KVM configuration and usage, please refer to:

   `http://www.linux-kvm.org/page/HOWTO1 <http://www.linux-kvm.org/page/HOWTO1>`_.

#. Create a Virtual Machine and install Fedora 14 on the Virtual Machine.
   This is referred to as the Guest Operating System (Guest OS).

#. Download and install the latest ixgbe driver from
   `intel.com <https://downloadcenter.intel.com/download/14687>`_.

#. In the Host OS:

   When using the Linux kernel ixgbe driver, unload the Linux ixgbe driver and reload it with the max_vfs=2,2 argument:

   .. code-block:: console

      rmmod ixgbe
      modprobe ixgbe max_vfs=2,2

   When using the DPDK PMD PF driver, bind the PF device to ``vfio_pci`` or ``igb_uio`` and
   create VF devices. See :ref:`linux_gsg_binding_kernel`.

   Let's say we have a machine with four physical ixgbe ports:

      0000:02:00.0

      0000:02:00.1

      0000:0e:00.0

      0000:0e:00.1

   The steps above should result in two VFs
   for device 0000:02:00.0:

   .. code-block:: console

      ls -alrt /sys/bus/pci/devices/0000\:02\:00.0/virt*
      lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn1 -> ../0000:02:10.2
      lrwxrwxrwx. 1 root root 0 Apr 13 05:40 /sys/bus/pci/devices/0000:02:00.0/virtfn0 -> ../0000:02:10.0

   It also creates two VFs for device 0000:02:00.1:

   .. code-block:: console

      ls -alrt /sys/bus/pci/devices/0000\:02\:00.1/virt*
      lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn1 -> ../0000:02:10.3
      lrwxrwxrwx. 1 root root 0 Apr 13 05:51 /sys/bus/pci/devices/0000:02:00.1/virtfn0 -> ../0000:02:10.1

#. List the PCI devices connected and notice that the Host OS shows two Physical Functions (traditional ports)
   and four Virtual Functions (two for each port).
   This is the result of the previous step.

#. Insert the pci_stub module to hold the PCI devices that are freed from the default driver using the following command
   (see http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Section 4 for more information):

   .. code-block:: console

      sudo /sbin/modprobe pci-stub

   Unbind the default driver from the PCI devices representing the Virtual Functions.
   A script to perform this action is as follows:

   .. code-block:: console

      echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
      echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind
      echo 0000:08:10.0 > /sys/bus/pci/drivers/pci-stub/bind

   where 0000:08:10.0 belongs to the Virtual Function visible in the Host OS.

#. Now, start the Virtual Machine by running the following command:

   .. code-block:: console

      /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0

   where:

   * ``-m`` = memory to assign

   * ``-smp`` = number of smp cores

   * ``-boot`` = boot option

   * ``-hda`` = virtual disk image

   * ``-device`` = device to attach

   .. note::

      * The ``pci-assign,host=08:10.0`` value indicates that you want to attach a PCI device
        to a Virtual Machine and the respective (Bus:Device.Function)
        numbers should be passed for the Virtual Function to be attached.

      * qemu-kvm-0.14.0 allows a maximum of four PCI devices assigned to a VM,
        but this is qemu-kvm version dependent since qemu-kvm-0.14.1 allows a maximum of five PCI devices.

      * qemu-system-x86_64 also has a -cpu command line option that is used to select the cpu_model
        to emulate in a Virtual Machine. Therefore, it can be used as:

        .. code-block:: console

           /usr/local/kvm/bin/qemu-system-x86_64 -cpu ?

           (to list all available cpu_models)

           /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -cpu host -smp 4 -boot c -hda lucid.qcow2 -device pci-assign,host=08:10.0

           (to use the same cpu_model equivalent to the host cpu)

        For more information, please refer to: `http://wiki.qemu.org/Features/CPUModels <http://wiki.qemu.org/Features/CPUModels>`_.

#. If you use vfio-pci to pass through the device instead of pci-assign,
   steps 8 and 9 need to be updated to bind the device to vfio-pci
   and to replace pci-assign with vfio-pci when starting the virtual machine.

   .. code-block:: console

      sudo /sbin/modprobe vfio-pci

      echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/new_id
      echo 0000:08:10.0 > /sys/bus/pci/devices/0000:08:10.0/driver/unbind
      echo 0000:08:10.0 > /sys/bus/pci/drivers/vfio-pci/bind

      /usr/local/kvm/bin/qemu-system-x86_64 -m 4096 -smp 4 -boot c -hda lucid.qcow2 -device vfio-pci,host=08:10.0

#. Install and run a DPDK host application to take over the Physical Function, e.g.:

   .. code-block:: console

      ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 -- -i

#. Finally, access the Guest OS using vncviewer with the localhost:5900 port and check the lspci command output in the Guest OS.
   The Virtual Functions will be listed as available for use.

#. Configure and install the DPDK on the Guest OS as normal, that is, there is no change to the normal installation procedure.

.. note::

   If you are unable to compile the DPDK and you are getting "error: CPU you selected does not support x86-64 instruction set",
   power off the Guest OS and start the virtual machine with the correct -cpu option in the qemu-system-x86_64 command as shown in step 9.
   You must select the best x86_64 cpu_model to emulate, or you can select the host option if available.

.. note::

   Run the DPDK l2fwd sample application in the Guest OS with Hugepages enabled.
   For the expected benchmark performance, you must pin the cores from the Guest OS to the Host OS (taskset can be used to do this) and
   you must also look at the PCI Bus layout on the board to ensure you are not running the traffic over the QPI Interface.

.. note::

   * The Virtual Machine Manager (the Fedora package name is virt-manager) is a utility for virtual machine management
     that can also be used to create, start, stop and delete virtual machines.
     If this option is used, steps 2 and 6 in the instructions provided will be different.

   * virsh, a command line utility for virtual machine management,
     can also be used to bind and unbind devices to a virtual machine in Ubuntu.
     If this option is used, step 6 in the instructions provided will be different.

   * The Virtual Machine Monitor (see :numref:`figure_perf_benchmark`) is equivalent to a Host OS with KVM installed as described in the instructions.

.. _figure_perf_benchmark:

.. figure:: img/perf_benchmark.*

   Performance Benchmark Setup


DPDK SR-IOV PMD PF/VF Driver Usage Model
----------------------------------------

Fast Host-based Packet Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Software Defined Network (SDN) trends are demanding fast host-based packet handling.
In a virtualized environment,
the DPDK VF PMD achieves the same throughput as in a non-virtualized native environment.

With such fast packet processing in the host instance, many services such as filtering, QoS and
DPI can be offloaded onto the host fast path.

:numref:`figure_fast_pkt_proc` shows the scenario where some VMs directly communicate externally via VFs,
while others connect to a virtual switch and share the same uplink bandwidth.

.. _figure_fast_pkt_proc:

.. figure:: img/fast_pkt_proc.*

   Fast Host-based Packet Processing


SR-IOV (PF/VF) Approach for Inter-VM Communication
--------------------------------------------------

Inter-VM data communication is one of the traffic bottlenecks in virtualization platforms.
SR-IOV device assignment allows a VM to attach a real device and take advantage of the bridge in the NIC.
So VF-to-VF traffic within the same physical port (VM0<->VM1) has hardware acceleration.
However, when a VF crosses physical ports (VM0<->VM2), there is no such hardware bridge.
In this case, the DPDK PMD PF driver provides host forwarding between such VMs.

:numref:`figure_inter_vm_comms` shows an example.
In this case an update of the MAC address lookup tables in both the NIC and the host DPDK application is required.

In the NIC, the destination MAC address that belongs to another cross-device VM is written to the PF specific pool.
So when a packet comes in, its destination MAC address will match and the packet will be forwarded to the host DPDK PMD application.

In the host DPDK application, the behavior is similar to L2 forwarding,
that is, the packet is forwarded to the correct PF pool.
The SR-IOV NIC switch forwards the packet to a specific VM according to the destination MAC address,
which belongs to the destination VF on the VM.

.. _figure_inter_vm_comms:

.. figure:: img/inter_vm_comms.*

   Inter-VM Communication


Windows Support
---------------

* The IAVF PMD is currently supported only inside a Windows guest created on a Linux host.

* Physical PCI resources are exposed as Virtual Functions
  to the Windows VM using the SR-IOV pass-through feature.

* Create a Windows guest on a Linux host using the KVM hypervisor.
  Refer to the steps mentioned in the above section: :ref:`intel_vf_kvm`.

* On the host machine, download and install the kernel Ethernet driver
  for `i40e <https://downloadcenter.intel.com/download/24411>`_
  or `ice <https://downloadcenter.intel.com/download/29746>`_.

* For the Windows guest, install the NetUIO driver
  in place of the existing built-in (inbox) Virtual Function driver.

* To load the NetUIO driver, follow the steps mentioned in the `dpdk-kmods repository
  <https://git.dpdk.org/dpdk-kmods/tree/windows/netuio/README.rst>`_.


Inline IPsec Support
--------------------

* The IAVF PMD supports inline crypto processing depending on the underlying
  hardware crypto capabilities. The IPsec Security Gateway Sample Application
  supports inline IPsec processing for the IAVF PMD. For more details, see the
  IPsec Security Gateway Sample Application and Security library
  documentation.


Diagnostic Utilities
--------------------

Register mbuf dynfield to test Tx LLDP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Register an mbuf dynfield ``IAVF_TX_LLDP_DYNFIELD`` on ``dev_start``
to indicate the need to send an LLDP packet.
This dynfield needs to be set to 1 when preparing the packet.

For the ``dpdk-testpmd`` application, the Tx port needs to be stopped and restarted for this to take effect.

Usage::

   testpmd> set tx lldp on

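
For applications other than testpmd, the dynfield can be set directly on the mbuf that carries the LLDP frame.
The sketch below is illustrative only: it assumes the dynfield is registered under the name referenced above
and is one byte wide, so check the iavf PMD source for the exact registration details.

.. code-block:: c

   #include <rte_mbuf.h>
   #include <rte_mbuf_dyn.h>

   /* Mark an mbuf carrying an LLDP frame so the PMD transmits it as LLDP. */
   static int
   mark_tx_lldp(struct rte_mbuf *m)
   {
       /* Name and width as referenced above; assumes registration already happened. */
       int offset = rte_mbuf_dynfield_lookup("IAVF_TX_LLDP_DYNFIELD", NULL);

       if (offset < 0)
           return offset;   /* dynfield not registered, e.g. port not started */

       *RTE_MBUF_DYNFIELD(m, offset, uint8_t *) = 1;
       return 0;
   }
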

Limitations or Known Issues
---------------------------

16 Byte RX Descriptor setting is not available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, the VF's RX descriptor size is decided by the PF. There is no PF-VF
interface for the VF to request an RX descriptor size, and no interface to notify
the VF of its own RX descriptor size.
None of the available versions of the kernel PF drivers support 16 byte RX descriptors.
Therefore, if the Linux kernel driver is used as the host driver while the DPDK iavf PMD
is used as the VF driver, DPDK cannot choose 16 byte receive descriptors:
the RX descriptor is already set to 32 bytes by all existing kernel drivers.
In the future, if any kernel driver supports 16 byte RX descriptors, the user
should make sure the DPDK VF uses the same RX descriptor size.

i40e: VF performance is impacted by PCI extended tag setting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To reach maximum NIC performance in the VF, the PCI extended tag must be
enabled. However, the kernel driver does not set this feature during initialization.
So when running traffic on a VF which is managed by the kernel PF driver, a
significant NIC performance downgrade has been observed (for 64 byte packets,
there is about a 25% line-rate downgrade for a 25GbE device and about 35% for a
40GbE device).

For kernel versions >= 4.11, the kernel's PCI driver will enable the extended
tag if it detects that the device supports it. So by default, this is not an
issue. For kernels <= 4.11, or when the PCI extended tag is disabled, it can be
enabled using the steps below.

#. Get the current value of the PCI configuration register::

      setpci -s <XX:XX.X> a8.w

#. Set bit 8::

      value = value | 0x100

#. Set the PCI configuration register with the new value::

      setpci -s <XX:XX.X> a8.w=<value>

i40e: VLAN strip of VF
~~~~~~~~~~~~~~~~~~~~~~

The VF VLAN strip function is only supported by the i40e kernel driver >= 2.1.26.

i40e: VLAN filtering of VF
~~~~~~~~~~~~~~~~~~~~~~~~~~

For i40e driver 2.17.15, configuring VLAN filters from the DPDK VF is unsupported.
When applying VLAN filters on the VF, they must first be configured on the
corresponding PF.

ice: VF inserts VLAN tag incorrectly on AVX-512 Tx path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the kernel driver requests the VF to use the L2TAG2 field of the Tx context
descriptor to insert the hardware offload VLAN tag,
the AVX-512 Tx path cannot handle this case correctly
due to its lack of support for the Tx context descriptor.

The VLAN tag will be inserted in the wrong location (inner of QinQ)
on the AVX-512 Tx path.
That is inconsistent with the behavior of the PF (outer of QinQ).
ice kernel driver versions newer than 1.8.9 request the use of L2TAG2
and have this issue.

Set the parameter ``--force-max-simd-bitwidth`` to 64/128/256
to avoid selecting the AVX-512 Tx path.

ice: VLAN tag length not included in MTU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When configuring the MTU for a VF, the MTU must not include the VLAN tag length.
In practice, when the kernel driver configures VLAN filtering for a VF,
the VLAN header tag length is automatically added to the MTU when configuring queues.
As a consequence, when attempting to configure a VF port with an MTU that,
together with a VLAN tag header, exceeds the maximum supported MTU,
the port configuration will fail if the kernel driver has configured VLAN filtering on that VF.
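
As an illustrative sketch, an application can leave room for the VLAN tag by reducing the requested MTU
before configuring the port; ``RTE_VLAN_HLEN`` (4 bytes) is the length of a single VLAN tag,
and the helper name below is hypothetical.

.. code-block:: c

   #include <rte_ethdev.h>
   #include <rte_ether.h>

   /*
    * Request an MTU on a VF port while leaving room for the VLAN tag that the
    * kernel PF driver may account for when VLAN filtering is enabled on the VF.
    */
   static int
   vf_set_mtu_with_vlan_room(uint16_t port_id, uint16_t wanted_mtu)
   {
       struct rte_eth_dev_info dev_info;
       uint16_t mtu = wanted_mtu;

       if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
           return -1;

       /* Keep wanted_mtu plus a VLAN tag within the device maximum. */
       if (mtu + RTE_VLAN_HLEN > dev_info.max_mtu)
           mtu = dev_info.max_mtu - RTE_VLAN_HLEN;

       return rte_eth_dev_set_mtu(port_id, mtu);
   }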