..  BSD LICENSE

    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Vhost Sample Application
========================

The vhost sample application demonstrates integration of the Data Plane Development Kit (DPDK)
with the Linux* KVM hypervisor by implementing the vhost-net offload API.
The sample application performs simple packet switching between virtual machines based on Media Access Control
(MAC) address or Virtual Local Area Network (VLAN) tag.
The splitting of Ethernet traffic from an external switch is performed in hardware by the Virtual Machine Device Queues
(VMDQ) and Data Center Bridging (DCB) features of the Intel® 82599 10 Gigabit Ethernet Controller.

Background
----------

Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
between host and guest.
It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
The following figure shows the system architecture for virtio-based networking (virtio-net).

.. _figure_qemu_virtio_net:

.. figure:: img/qemu_virtio_net.*

   System Architecture for Virtio-based Networking (virtio-net).


The Linux* kernel vhost-net module was developed as an offload mechanism for virtio-net.
The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
reducing the context switching and packet copies in the virtual dataplane.

This is achieved by QEMU sharing the following information with the vhost-net module through the vhost-net API:

* The layout of the guest memory space, to enable the vhost-net module to translate addresses.

* The locations of the virtual queues in QEMU virtual address space,
  to enable the vhost-net module to read from and write to the virtqueues directly.

* An event file descriptor (eventfd) configured in KVM to send interrupts to the virtio-net device driver in the guest.
  This enables the vhost-net module to notify (call) the guest.

* An eventfd configured in KVM to be triggered on writes to the virtio-net device's
  Peripheral Component Interconnect (PCI) config space.
  This enables the vhost-net module to receive notifications (kicks) from the guest.

The following figure shows the system architecture for virtio-net networking with vhost-net offload.

.. _figure_virtio_linux_vhost:

.. figure:: img/virtio_linux_vhost.*

   Virtio with Linux


Sample Code Overview
--------------------

The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.

The DPDK vhost-net sample code is based on the vhost library.
The vhost library is designed to let a user space Ethernet switch integrate easily with vhost functionality.

The vhost library implements the following features:

* Management of virtio-net device creation/destruction events.

* Mapping of the VM's physical memory into the DPDK vhost-net's address space.

* Triggering/receiving notifications to/from VMs via eventfds.

* A virtio-net back-end implementation providing a subset of virtio-net features.

There are two vhost implementations in the vhost library: vhost cuse and vhost user.
In vhost cuse, a character device driver is implemented to receive and process vhost requests through ioctl messages.
In vhost user, a socket server is created to receive vhost requests through socket messages.
Most of the messages share the same handler routine.

.. note::
   **Any vhost cuse specific requirement in the following sections will be emphasized**.

The two implementations are enabled and disabled statically at build time through the configuration file.
Only one implementation can be enabled at a time; they do not co-exist in the current implementation.

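
The selection is made through two build-time options in config/common_linuxapp, described in detail in
`Compiling the Sample Code`_. As an illustration only, a build with the vhost library enabled and the
vhost user implementation selected would contain:

.. code-block:: console

    CONFIG_RTE_LIBRTE_VHOST=y
    CONFIG_RTE_LIBRTE_VHOST_USER=y

Setting CONFIG_RTE_LIBRTE_VHOST_USER to n selects vhost cuse instead.
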
The vhost sample code application is a simple packet switching application with the following feature:

* Packet switching between virtio-net devices and the network interface card,
  including using VMDQs to reduce the switching that needs to be performed in software.

The following figure shows the architecture of the vhost sample application based on vhost-cuse.

.. _figure_vhost_net_arch:

.. figure:: img/vhost_net_arch.*

   Vhost-net Architectural Overview


The following figure shows the flow of packets through the vhost-net sample application.

.. _figure_vhost_net_sample_app:

.. figure:: img/vhost_net_sample_app.*

   Packet Flow Through the vhost-net Sample Application


Supported Distributions
-----------------------

The example in this section has been validated with the following distributions:

* Fedora* 18

* Fedora* 19

* Fedora* 20

.. _vhost_app_prerequisites:

Prerequisites
-------------

This section lists prerequisite packages that must be installed.

Installing Packages on the Host (vhost cuse required)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost cuse code uses the following packages: fuse, fuse-devel, and kernel-modules-extra.
The vhost user code does not rely on these modules, as the eventfds are already installed into the vhost process
through the Unix domain socket.

#. Install the Fuse development libraries and headers:

   .. code-block:: console

       yum -y install fuse fuse-devel

#. Install the Cuse kernel module:

   .. code-block:: console

       yum -y install kernel-modules-extra

QEMU simulator
~~~~~~~~~~~~~~

For vhost user, QEMU 2.2 is required.

Setting up the Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost sample code requires that QEMU allocates a VM's memory on the hugetlbfs file system.
As the vhost sample code requires hugepages as well,
the best practice is to partition the system into separate hugepage mount points for the VMs and the vhost sample code.

.. note::

   This is best practice only and is not mandatory.
   For systems that only support 2 MB page sizes,
   both QEMU and the vhost sample code can use the same hugetlbfs mount point without issue.

**QEMU**

VMs with gigabytes of memory can benefit from having QEMU allocate their memory from 1 GB huge pages.
1 GB huge pages must be allocated at boot time by passing kernel parameters through the grub boot loader.

#. Calculate the maximum memory usage of all VMs to be run on the system.
   Then, round this value up to the nearest gigabyte; this is the amount of 1 GB pages the execution environment
   will require.

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   .. code-block:: console

       GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=<Number of hugepages required> default_hugepagesz=1G"

#. Update the grub boot loader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

#. The hugetlbfs mount point (/dev/hugepages) should now default to allocating gigabyte pages.

.. note::

   Making the above modification will change the system default hugepage size to 1 GB for all applications.

**Vhost Sample Code**

In this section, we create a second hugetlbfs mount point to allocate hugepages for the DPDK vhost sample code.

#. Allocate sufficient 2 MB pages for the DPDK vhost sample code:

   .. code-block:: console

       echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#. Mount hugetlbfs at a separate mount point for 2 MB pages:

   .. code-block:: console

       mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M

The above steps can be automated by doing the following:

#. Edit /etc/fstab to add an entry to automatically mount the second hugetlbfs mount point:

   ::

       hugetlbfs <tab> /mnt/huge <tab> hugetlbfs defaults,pagesize=2M 0 0

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   ::

       GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=256 ... default_hugepagesz=1G"

#. Update the grub boot loader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

.. note::

   Ensure that the default hugepage size after this setup is 1 GB.

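
After the reboot, it is worth confirming that both page sizes were allocated and that the mount points are in place.
A quick sanity check, using the standard procfs/sysfs paths, might look like this:

.. code-block:: console

    grep Huge /proc/meminfo
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    mount | grep hugetlbfs
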
Setting up the Guest Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is recommended for testing purposes that the DPDK testpmd sample application is used in the guest to forward packets;
the reasons for this are discussed in `Running the Virtual Machine (QEMU)`_.

The testpmd application forwards packets between pairs of Ethernet devices,
so it requires an even number of Ethernet devices (virtio or otherwise) to execute.
It is therefore recommended to create a multiple of two virtio-net devices for each Virtual Machine,
either through libvirt or at the command line as follows.

.. note::

   Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.

For vhost cuse:

.. code-block:: console

    qemu-system-x86_64 ... \
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -netdev tap,id=hostnet2,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet2,id=net2

For vhost user:

.. code-block:: console

    qemu-system-x86_64 ... \
    -chardev socket,id=char1,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet1,chardev=char1 \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -chardev socket,id=char2,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet2,chardev=char2 \
    -device virtio-net-pci,netdev=hostnet2,id=net2

sock_path is the path of the socket file created by vhost.

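
Once the guest has booted, the virtio-net devices should appear on its PCI bus.
A quick way to confirm this from inside the guest (the exact output varies with the distribution) is:

.. code-block:: console

    lspci | grep -i virtio
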
Compiling the Sample Code
-------------------------

#. Compile the vhost library:

   To enable vhost, turn on the vhost library in the configuration file config/common_linuxapp:

   .. code-block:: console

       CONFIG_RTE_LIBRTE_VHOST=y

   vhost user is turned on by default in the configuration file config/common_linuxapp.
   To enable vhost cuse, disable vhost user:

   .. code-block:: console

       CONFIG_RTE_LIBRTE_VHOST_USER=n

   After vhost is enabled and the implementation is selected, build the vhost library.

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost

#. Set the target (a default target is used if not specified). For example:

   .. code-block:: console

       export RTE_TARGET=x86_64-native-linuxapp-gcc

   See the DPDK Getting Started Guide for possible RTE_TARGET values.

#. Build the application:

   .. code-block:: console

       cd ${RTE_SDK}
       make config T=${RTE_TARGET}
       make install T=${RTE_TARGET}
       cd ${RTE_SDK}/examples/vhost
       make

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Build the eventfd_link kernel module (vhost cuse required):

   .. code-block:: console

       make

Running the Sample Code
-----------------------

#. Install the cuse kernel module (vhost cuse required):

   .. code-block:: console

       modprobe cuse

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Install the eventfd_link module (vhost cuse required):

   .. code-block:: console

       insmod ./eventfd_link.ko

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost/build/app

#. Run the vhost-switch sample code:

   vhost cuse:

   .. code-block:: console

       ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
        -- -p 0x1 --dev-basename usvhost

   vhost user: a socket file named usvhost will be created under the current directory.
   Use its path as the socket path in the guest's QEMU command line.

   .. code-block:: console

       ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
        -- -p 0x1 --dev-basename usvhost

.. note::

   Please note that the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.

.. note::

   The number used with the --socket-mem parameter may need to be more than 1024.
   The number required depends on the number of mbufs allocated by vhost-switch.

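
For the vhost user case, the presence of the socket file can be verified before the guest is started;
assuming vhost-switch was launched from the current directory as shown above:

.. code-block:: console

    ls -l ./usvhost
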
.. _vhost_app_parameters:

Parameters
~~~~~~~~~~

**Basename.**
vhost cuse uses a Linux* character device to communicate with QEMU.
The basename is used to generate the character device's name:

   /dev/<basename>

For compatibility with the QEMU wrapper script, a basename of "usvhost" should be used:

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- -p 0x1 --dev-basename usvhost

**vm2vm.**
The vm2vm parameter disables or sets the mode of packet switching between guests on the host.
A value of 0 disables vm2vm, so that packets transmitted by a virtual machine always go to the Ethernet port.
A value of 1 selects software mode, in which packets are forwarded between guests by the vhost application;
this requires a packet copy in vhost and is therefore valid only for the one-copy implementation.
A value of 2 selects hardware mode, in which packets go out to the Ethernet port and the hardware L2 switch determines,
based on the packet's destination MAC address and VLAN tag,
which guest a packet should be forwarded to or whether it should be sent externally.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --vm2vm [0,1,2]

**Mergeable Buffers.**
The mergeable buffers parameter controls how virtio-net descriptors are used for virtio-net headers.
In a disabled state, one virtio-net header is used per packet buffer;
in an enabled state, one virtio-net header is used for multiple packets.
The default value is 0, or disabled, since recent kernel virtio-net drivers show performance degradation
when this feature is enabled.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --mergeable [0,1]

**Stats.**
The stats parameter controls the printing of virtio-net device statistics.
The parameter specifies the interval, in seconds, at which to print statistics;
an interval of 0 seconds disables statistics.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --stats [0,n]

**RX Retry.**
The rx-retry option enables/disables enqueue retries when the guest's RX queue is full.
This feature resolves a packet loss that is observed at high data rates,
by allowing the receive path to delay and retry.
This option is enabled by default.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --rx-retry [0,1]

**RX Retry Number.**
The rx-retry-num option specifies the number of retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 4.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --rx-retry 1 --rx-retry-num 5

**RX Retry Delay Time.**
The rx-retry-delay option specifies the timeout (in microseconds) between retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 15.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --rx-retry 1 --rx-retry-delay 20

**Zero copy.**
Zero copy mode has been removed, as it had not been working for some time,
and because of the size and complexity of the code it is better to redesign it than to patch it back into a working state.
Zero copy may therefore be added back later.

**VLAN strip.**
The vlan-strip option enables/disables VLAN stripping on the host.
If disabled, the guest will receive packets with their VLAN tag intact.
It is enabled by default.

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- --vlan-strip [0,1]

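
The parameters described above can be combined on a single command line.
As an illustration only (the values are arbitrary), the following starts vhost-switch with hardware vm2vm forwarding,
statistics printed every 2 seconds, and a more aggressive RX retry policy:

.. code-block:: console

    ./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
     -- -p 0x1 --dev-basename usvhost --vm2vm 2 --stats 2 \
     --rx-retry 1 --rx-retry-num 5 --rx-retry-delay 20
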
.. _vhost_app_running:

Running the Virtual Machine (QEMU)
----------------------------------

QEMU must be executed with specific parameters to:

* Ensure the guest is configured to use virtio-net network adapters.

  .. code-block:: console

      qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1, \
      id=net1 ...

* Ensure the guest's virtio-net network adapter is configured with offloads disabled.

  .. code-block:: console

      qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1, \
      id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off

* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module (vhost cuse).

  .. code-block:: console

      qemu-system-x86_64 ... -netdev tap,id=hostnet1,vhost=on, \
      vhostfd=<open fd> ...

* Enable the vhost-net sample code to map the VM's memory into its own process address space.

  .. code-block:: console

      qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

   The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
   It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.

Redirecting QEMU to vhost-net Sample Code (vhost cuse)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
an open file descriptor must be passed to QEMU running as a child process.

.. code-block:: python

    #!/usr/bin/python
    import os
    import subprocess

    # The open file descriptor is inherited by the QEMU child process;
    # its number is passed on the command line ("..." elided as in the original).
    fd = os.open("/dev/usvhost-1", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 ... -netdev tap,id=vhostnet0,vhost=on,vhostfd="
                    + str(fd) + " ...", shell=True)

.. note::

   This process is automated in the `QEMU Wrapper Script`_.

Mapping the Virtual Machine's Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the DPDK vhost-net sample code to run correctly, QEMU must allocate the VM's memory on hugetlbfs.
This is done by specifying mem-prealloc and mem-path when executing QEMU.
The vhost-net sample code accesses the virtio-net device's virtual rings and packet buffers
by finding and mapping the VM's physical memory on hugetlbfs.
In this case, the path passed to the guest should be that of the 1 GB page hugetlbfs:

.. code-block:: console

    qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

   This process is automated in the `QEMU Wrapper Script`_.
   The following two sections only apply to vhost cuse.
   For vhost-user, please make the corresponding changes to the qemu-wrapper script and the guest XML file.

QEMU Wrapper Script
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters required
to integrate with the vhost sample code.
It performs the following actions:

* Automatically detects the location of the hugetlbfs and inserts this into the command line parameters.

* Automatically opens file descriptors for each virtio-net device and inserts these into the command line parameters.

* Disables offloads on each virtio-net device.

* Calls QEMU passing both the command line parameters passed to the script itself and those it has auto-detected.

The QEMU wrapper script will automatically configure calls to QEMU:

.. code-block:: console

    qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off \
    -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1 \
    -netdev tap,id=hostnet1,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -hda <disk img> -m 4096

which will become the following call to QEMU:

.. code-block:: console

    qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off \
    -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1 \
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet1,id=net1, \
    csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off \
    -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc

Libvirt Integration
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script (qemu-wrap.py) "wraps" libvirt calls to QEMU,
such that QEMU is called with the correct parameters described above.
To call the QEMU wrapper automatically from libvirt, the following configuration changes must be made:

* Place the QEMU wrapper script in libvirt's binary search PATH ($PATH).
  A good location is in the directory that contains the QEMU binary.

* Ensure that the script has the same owner/group and file permissions as the QEMU binary.

* Update the VM xml file using virsh edit <vm name>:

  * Set the VM to use the launch script.

    Set the emulator path contained in the <emulator></emulator> tags.
    For example, replace <emulator>/usr/bin/qemu-kvm</emulator> with <emulator>/usr/bin/qemu-wrap.py</emulator>.

  * Set the VM's virtio-net devices to use vhost-net offload:

    .. code-block:: xml

        <interface type="network">
          <model type="virtio"/>
          <driver name="vhost"/>
        </interface>

* Enable libvirt to access the DPDK vhost sample code's character device file by adding it
  to the device controller cgroup for libvirtd, using the following settings in the libvirtd
  QEMU configuration:

  .. code-block:: console

      cgroup_controllers = [ ... "devices", ... ]
      clear_emulator_capabilities = 0
      user = "root"
      group = "root"
      cgroup_device_acl = [
          "/dev/null", "/dev/full", "/dev/zero",
          "/dev/random", "/dev/urandom",
          "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
          "/dev/rtc", "/dev/hpet", "/dev/net/tun",
          "/dev/<devbase-name>-<index>",
      ]

* Disable SELinux or set it to permissive mode.

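  For example, the current mode can be checked with getenforce, and setenforce can switch
  to permissive mode until the next reboot (a persistent change requires editing
  /etc/selinux/config):

  .. code-block:: console

      getenforce
      setenforce 0
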
* Mount the cgroup device controller:

  .. code-block:: console

      mkdir /dev/cgroup
      mount -t cgroup none /dev/cgroup -o devices

* Restart the libvirtd system process.

  For example, on Fedora*: "systemctl restart libvirtd.service"

* Edit the configuration parameters section of the script:

  * Configure the "emul_path" variable to point to the QEMU emulator.

    .. code-block:: python

        emul_path = "/usr/local/bin/qemu-system-x86_64"

  * Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character device name.
    The DPDK vhost-net sample code's character device will be in the format "/dev/<basename>".

    .. code-block:: python

        us_vhost_path = "/dev/usvhost"

Common Issues
~~~~~~~~~~~~~

* QEMU failing to allocate memory on hugetlbfs, with an error like the following::

      file_ram_alloc: can't mmap RAM pages: Cannot allocate memory

  When running QEMU, the above error indicates that it has failed to allocate memory for the Virtual Machine on
  the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
  The number of free hugepages can be checked as follows:

  .. code-block:: console

      cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/free_hugepages

  The command above indicates how many hugepages are free to support QEMU's allocation request.

* User space VHOST when the guest has 2 MB sized huge pages:

  The guest may have 2 MB or 1 GB sized huge pages. User space VHOST should work properly in both cases.

* User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:

  The current implementation works properly only when the guest memory is pre-allocated, so it is required to
  use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
  specified explicitly in the QEMU command line.

* User space VHOST will not work with a QEMU version without shared memory mapping:

  As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
  needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
  the QEMU version supports shared memory mapping.

* In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
  to allocate huge pages for the new guest:

  The solution for this issue is to add ``-boot c`` to the QEMU command line to make sure the huge pages are
  allocated properly; the guest should then start normally.

  Use ``cat /proc/meminfo`` to check whether the values of ``HugePages_Total`` and ``HugePages_Free``
  change after the guest startup.

* Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:

  This log message may be ignored. The message occurs because the kernel module ``eventfd_link`` is not a standard
  Linux module, but it is necessary for the current (CUSE-based) user space VHOST implementation to communicate with
  the guest.

.. _vhost_app_running_dpdk:

Running DPDK in the Virtual Machine
-----------------------------------

For the DPDK vhost-net sample code to switch packets into the VM,
the sample code must first learn the MAC address of the VM's virtio-net device.
The sample code detects the address from packets being transmitted from the VM, similar to a learning switch.

This behavior requires no special action or configuration with the Linux* virtio-net driver in the VM,
as the Linux* kernel will automatically transmit packets during device initialization.
However, DPDK-based applications must be modified to automatically transmit packets during initialization
to facilitate the DPDK vhost-net sample code's MAC learning.

The DPDK testpmd application can be configured to automatically transmit packets during initialization
and to act as an L2 forwarding switch.

Testpmd MAC Forwarding
~~~~~~~~~~~~~~~~~~~~~~

At high packet rates, a minor packet loss may be observed.
To resolve this issue, a "wait and retry" mode is implemented in the testpmd and vhost sample code.
In the "wait and retry" mode, if the virtqueue is found to be full,
testpmd waits for a period of time before retrying to enqueue packets.

The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method called "mac_retry".
The following sequence diagram describes the algorithm in detail.

.. _figure_tx_dpdk_testpmd:

.. figure:: img/tx_dpdk_testpmd.*

   Packet Flow on TX in DPDK-testpmd


Running Testpmd
~~~~~~~~~~~~~~~

The testpmd application is automatically built when DPDK is installed.
Run the testpmd application as follows:

.. code-block:: console

    cd ${RTE_SDK}/x86_64-native-linuxapp-gcc/app
    ./testpmd -c 0x3 -n 4 --socket-mem 512 \
     -- --burst=64 -i --disable-hw-vlan-filter

The destination MAC address for packets transmitted on each port can be set at the command line:

.. code-block:: console

    ./testpmd -c 0x3 -n 4 --socket-mem 512 \
     -- --burst=64 -i --disable-hw-vlan-filter \
     --eth-peer=0,aa:bb:cc:dd:ee:ff --eth-peer=1,ff:ee:dd:cc:bb:aa

* Packets received on port 1 will be forwarded on port 0 to MAC address aa:bb:cc:dd:ee:ff.

* Packets received on port 0 will be forwarded on port 1 to MAC address ff:ee:dd:cc:bb:aa.

The testpmd application can then be configured to act as an L2 forwarding application:

.. code-block:: console

    testpmd> set fwd mac_retry

The testpmd application can then be configured to start processing packets,
transmitting packets first so that the DPDK vhost sample code on the host can learn the MAC address:

.. code-block:: console

    testpmd> start tx_first

.. note::

   Please note that "set fwd mac_retry" is used in place of "set fwd mac_fwd" to ensure the retry feature is activated.

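
Once forwarding has started, the per-port counters can be inspected from the testpmd prompt
to confirm that packets are flowing in both directions (a standard testpmd command):

.. code-block:: console

    testpmd> show port stats all
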
Passing Traffic to the Virtual Machine Device
---------------------------------------------

For a virtio-net device to receive traffic,
the traffic's Layer 2 header must include both the virtio-net device's MAC address and VLAN tag.
The DPDK sample code behaves in a similar manner to a learning switch, in that
it learns the MAC address of the virtio-net devices from the first transmitted packet.
On learning the MAC address,
the DPDK vhost sample code prints a message with the MAC address and VLAN tag of the virtio-net device.
For example:

.. code-block:: console

    DATA: (0) MAC_ADDRESS cc:bb:bb:bb:bb:bb and VLAN_TAG 1000 registered

The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
Any packets received on the NIC with these values are placed on the device's receive queue.
When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.

Running virtio_user with vhost-switch
-------------------------------------

We can also use virtio_user with vhost-switch.
virtio_user is a virtual device that can be run in an application (for example, a container) in parallel with vhost
in the same OS; that is, there is no need to start a VM.
We just run it with a different --file-prefix to avoid startup failure.

.. code-block:: console

    cd ${RTE_SDK}/x86_64-native-linuxapp-gcc/app
    ./testpmd -c 0x3 -n 4 --socket-mem 1024 --no-pci --file-prefix=virtio_user-testpmd \
    --vdev=virtio_user0,mac=00:01:02:03:04:05,path=$path_vhost \
    -- -i --txqflags=0xf01 --disable-hw-vlan

There is no difference on the vhost side.
Please note that there are some limitations in the usage of virtio_user; see the release notes for more information.

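
For example, if vhost-switch was started as shown in `Running the Sample Code`_,
the socket is created as usvhost in its working directory; $path_vhost in the command above
is assumed to be set to point at that file, for instance:

.. code-block:: console

    # in the directory where vhost-switch was started
    export path_vhost=$(pwd)/usvhost
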