..  BSD LICENSE

    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Vhost Sample Application
========================

The vhost sample application demonstrates integration of the Data Plane Development Kit (DPDK)
with the Linux* KVM hypervisor by implementing the vhost-net offload API.
The sample application performs simple packet switching between virtual machines based on Media Access Control
(MAC) address or Virtual Local Area Network (VLAN) tag.
The splitting of Ethernet traffic from an external switch is performed in hardware by the Virtual Machine Device Queues
(VMDQ) and Data Center Bridging (DCB) features of the Intel® 82599 10 Gigabit Ethernet Controller.

Background
----------

Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
between host and guest.
It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
The following figure shows the system architecture for virtio-based networking (virtio-net).

.. _figure_16:

**Figure 16. QEMU Virtio-net (prior to vhost-net)**

|qemu_virtio_net|

The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
reducing the context switching and packet copies in the virtual dataplane.

This is achieved by QEMU sharing the following information with the vhost-net module through the vhost-net API
(the eventfd mechanism used by the last two items is sketched after this list):

* The layout of the guest memory space, to enable the vhost-net module to translate addresses.

* The locations of virtual queues in QEMU virtual address space,
  to enable the vhost module to read/write directly to and from the virtqueues.

* An event file descriptor (eventfd) configured in KVM to send interrupts to the virtio-net device driver in the guest.
  This enables the vhost-net module to notify (call) the guest.

* An eventfd configured in KVM to be triggered on writes to the virtio-net device's
  Peripheral Component Interconnect (PCI) config space.
  This enables the vhost-net module to receive notifications (kicks) from the guest.
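The call and kick notifications in the last two items are plain eventfd signalling.
The following minimal C program (illustrative only, not part of the sample code) shows the mechanism:
one side writes to an eventfd to signal, the other side reads from it to consume the event.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/eventfd.h>

    int main(void)
    {
        /* In vhost, eventfds like this are created by QEMU/KVM and handed
         * to the vhost back-end; one is created locally here for illustration. */
        int efd = eventfd(0, 0);
        if (efd < 0)
            return 1;

        uint64_t one = 1;
        write(efd, &one, sizeof(one));    /* "kick": signal the other side */

        uint64_t count;
        read(efd, &count, sizeof(count)); /* "call" side: reap the event */
        printf("received %llu event(s)\n", (unsigned long long)count);

        close(efd);
        return 0;
    }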
The following figure shows the system architecture for virtio-net networking with vhost-net offload.

.. _figure_17:

**Figure 17. Virtio with Linux* Kernel Vhost**

|virtio_linux_vhost|

Sample Code Overview
--------------------

The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.

The DPDK vhost-net sample code is based on the vhost library.
The vhost library is developed so that a user space Ethernet switch can easily integrate vhost functionality.

The vhost library implements the following features (a sketch of how an application hooks into the library follows this list):

* Management of virtio-net device creation/destruction events.

* Mapping of the VM's physical memory into the DPDK vhost-net's address space.

* Triggering/receiving notifications to/from VMs via eventfds.

* A virtio-net back-end implementation providing a subset of virtio-net features.
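To give a feel for how an application consumes these features, the following C sketch shows the basic
registration flow against the vhost library. It is a sketch only: it assumes the DPDK 2.x era librte_vhost
interface (see rte_virtio_net.h for the authoritative declarations) and leaves the callback bodies as placeholders.

.. code-block:: c

    #include <rte_virtio_net.h>

    /* Invoked by the vhost library when a guest's virtio-net device comes up. */
    static int
    new_device(struct virtio_net *dev)
    {
        /* Add the device to the application's switching data structures. */
        return 0;
    }

    /* Invoked when the virtio-net device is removed. */
    static void
    destroy_device(volatile struct virtio_net *dev)
    {
        /* Remove the device from the switching data structures. */
    }

    static const struct virtio_net_device_ops ops = {
        .new_device     = new_device,
        .destroy_device = destroy_device,
    };

    static void
    vhost_setup(void)
    {
        /* Register the device (character device basename or socket path),
         * register the callbacks, then enter the vhost session loop. */
        rte_vhost_driver_register("usvhost");
        rte_vhost_driver_callback_register(&ops);
        rte_vhost_driver_session_start();
    }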
There are two vhost implementations in the vhost library, vhost cuse and vhost user.
In vhost cuse, a character device driver is implemented to receive and process vhost requests through ioctl messages.
In vhost user, a socket server is created to receive vhost requests through socket messages.
Most of the messages share the same handler routine.

.. note::

    **Any vhost cuse specific requirement in the following sections will be emphasized**.

The two implementations are enabled and disabled statically through the configuration file.
Only one implementation can be enabled at a time; they do not co-exist in the current implementation.

The vhost sample code application is a simple packet switching application with the following feature:

* Packet switching between virtio-net devices and the network interface card,
  including using VMDQs to reduce the switching that needs to be performed in software.

The following figure shows the architecture of the vhost sample application based on vhost-cuse.

.. _figure_18:

**Figure 18. Vhost-net Architectural Overview**

|vhost_net_arch|

The following figure shows the flow of packets through the vhost-net sample application.

.. _figure_19:

**Figure 19. Packet Flow Through the vhost-net Sample Application**

|vhost_net_sample_app|

Supported Distributions
-----------------------

The example in this section has been validated with the following distributions:

* Fedora* 18

* Fedora* 19

* Fedora* 20

Prerequisites
-------------

This section lists prerequisite packages that must be installed.

Installing Packages on the Host (vhost cuse required)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost cuse code uses the following packages: fuse, fuse-devel, and kernel-modules-extra.
The vhost user code does not rely on those modules, as eventfds are already installed into the vhost process through the
Unix domain socket.

#. Install Fuse Development Libraries and headers:

   .. code-block:: console

       yum -y install fuse fuse-devel

#. Install the Cuse Kernel Module:

   .. code-block:: console

       yum -y install kernel-modules-extra

QEMU simulator
~~~~~~~~~~~~~~

For vhost user, QEMU 2.2 is required.

Setting up the Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost sample code requires that QEMU allocates a VM's memory on the hugetlbfs file system.
As the vhost sample code requires hugepages,
the best practice is to partition the system into separate hugepage mount points for the VMs and the vhost sample code.

.. note::

    This is best-practice only and is not mandatory.
    For systems that only support 2 MB page sizes,
    both QEMU and vhost sample code can use the same hugetlbfs mount point without issue.

**QEMU**

VMs with gigabytes of memory can benefit from having QEMU allocate their memory from 1 GB huge pages.
1 GB huge pages must be allocated at boot time by passing kernel parameters through the grub boot loader.

#. Calculate the maximum memory usage of all VMs to be run on the system.
   Then, round this value up to the nearest gigabyte; this is the number of 1 GB hugepages
   the execution environment will require.

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   .. code-block:: console

       GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=<Number of hugepages required> default_hugepagesz=1G"

#. Update the grub boot loader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

#. The hugetlbfs mount point (/dev/hugepages) should now default to allocating gigabyte pages.

.. note::

    Making the above modification will change the system default hugepage size to 1 GB for all applications.

**Vhost Sample Code**

In this section, we create a second hugetlbfs mount point to allocate hugepages for the DPDK vhost sample code.

#. Allocate sufficient 2 MB pages for the DPDK vhost sample code:

   .. code-block:: console

       echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#. Mount hugetlbfs at a separate mount point for 2 MB pages:

   .. code-block:: console

       mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M

The above steps can be automated by doing the following:

#. Edit /etc/fstab to add an entry to automatically mount the second hugetlbfs mount point:

   ::

       hugetlbfs <tab> /mnt/huge <tab> hugetlbfs defaults,pagesize=2M 0 0

#. Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

   ::

       GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=256 ... default_hugepagesz=1G"

#. Update the grub bootloader:

   .. code-block:: console

       grub2-mkconfig -o /boot/grub2/grub.cfg

#. Reboot the system.

.. note::

    Ensure that the default hugepage size after this setup is 1 GB.
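After rebooting, it is worth sanity-checking the hugepage setup before launching anything.
These are standard Linux commands, not part of the sample code;
the output should show the 1 GB default pool, the 256 reserved 2 MB pages, and both mount points:

.. code-block:: console

    user@target:~$ grep Huge /proc/meminfo
    user@target:~$ mount | grep hugetlbfs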
Setting up the Guest Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For testing purposes, it is recommended that the DPDK testpmd sample application is used in the guest to forward packets.
The reasons for this are discussed in the section Running the Virtual Machine (QEMU) below.

The testpmd application forwards packets between pairs of Ethernet devices;
it requires an even number of Ethernet devices (virtio or otherwise) to execute.
It is therefore recommended to create multiples of two virtio-net devices for each Virtual Machine either through libvirt or
at the command line as follows.

.. note::

    Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.

For vhost cuse:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -netdev tap,id=hostnet2,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet2,id=net2

For vhost user:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -chardev socket,id=char1,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet1,chardev=char1 \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -chardev socket,id=char2,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet2,chardev=char2 \
    -device virtio-net-pci,netdev=hostnet2,id=net2

sock_path is the path for the socket file created by vhost.
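Once the VM has booted, the number of virtio-net devices visible to the guest can be confirmed
from inside it, since testpmd needs this count to be even. For example, using standard lspci
(the exact device string may vary with the distribution):

.. code-block:: console

    user@guest:~$ lspci | grep -i virtio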
Compiling the Sample Code
-------------------------

#. Compile the vhost lib:

   To enable vhost, turn on the vhost library in the configuration file config/common_linuxapp:

   .. code-block:: console

       CONFIG_RTE_LIBRTE_VHOST=y

   vhost user is turned on by default in lib/librte_vhost/Makefile.
   To enable vhost cuse instead, uncomment the vhost cuse lines and comment out the vhost user lines manually,
   so that the Makefile reads as follows.
   In the future, a configuration option will be created to switch between the two implementations.

   .. code-block:: console

       SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
       #SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c

   After vhost is enabled and the implementation is selected, build the vhost library.

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost

#. Set the target (a default target is used if not specified). For example:

   .. code-block:: console

       export RTE_TARGET=x86_64-native-linuxapp-gcc

   See the DPDK Getting Started Guide for possible RTE_TARGET values.

#. Build the application:

   .. code-block:: console

       cd ${RTE_SDK}
       make config T=${RTE_TARGET}
       make install T=${RTE_TARGET}
       cd ${RTE_SDK}/examples/vhost
       make

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Build the eventfd_link kernel module (vhost cuse required):

   .. code-block:: console

       make

Running the Sample Code
-----------------------

#. Install the cuse kernel module (vhost cuse required):

   .. code-block:: console

       modprobe cuse

#. Go to the eventfd_link directory (vhost cuse required):

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#. Install the eventfd_link module (vhost cuse required):

   .. code-block:: console

       insmod ./eventfd_link.ko

#. Go to the examples directory:

   .. code-block:: console

       export RTE_SDK=/path/to/rte_sdk
       cd ${RTE_SDK}/examples/vhost

#. Run the vhost-switch sample code:

   vhost cuse:

   .. code-block:: console

       user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

   vhost user: a socket file named usvhost will be created under the current directory.
   Use its path as the socket path in the guest's QEMU command line.

   .. code-block:: console

       user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost

.. note::

    Please note the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.
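Before launching the guest, it can be useful to confirm that the vhost back-end came up.
Assuming the basename and index used above, a quick check might look like this (illustrative commands only):

.. code-block:: console

    user@target:~$ ls -l /dev/usvhost-1    # vhost cuse: character device
    user@target:~$ ls -l ./usvhost         # vhost user: socket file in the current directory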
Parameters
~~~~~~~~~~

**Basename and Index.**
vhost cuse uses a Linux* character device to communicate with QEMU.
The basename and the index are used to generate the character device's name:

    /dev/<basename>-<index>

The index parameter is provided for situations where multiple instances of the virtual switch are required.

For compatibility with the QEMU wrapper script, a base name of "usvhost" and an index of "1" should be used:

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

**vm2vm.**
The vm2vm parameter sets the mode of packet switching between guests on the host.
A value of 0 disables vm2vm switching, meaning packets transmitted by a virtual machine always go to the Ethernet port.
A value of 1 selects software mode, where packets are forwarded between guests in software; this requires a packet copy in
vhost, so it is valid only for the one-copy implementation and invalid for the zero copy implementation.
A value of 2 selects hardware mode, where packets go to the Ethernet port and the hardware L2 switch determines,
based on the packet's destination MAC address and VLAN tag, which guest the packet should be forwarded to
or whether it should be sent externally.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vm2vm [0,1,2]

**Mergeable Buffers.**
The mergeable buffers parameter controls how virtio-net descriptors are used for virtio-net headers.
In a disabled state, one virtio-net header is used per packet buffer;
in an enabled state, one virtio-net header is used for multiple packets.
The default value is 0 (disabled), since recent kernel virtio-net drivers show performance degradation when this feature is enabled.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --mergeable [0,1]

**Stats.**
The stats parameter controls the printing of virtio-net device statistics.
The parameter specifies the interval, in seconds, at which to print statistics;
an interval of 0 seconds disables statistics.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --stats [0,n]

**RX Retry.**
The rx-retry option enables/disables enqueue retries when the guest's RX queue is full.
This feature resolves packet loss observed at high data rates by allowing the receive path to delay and retry.
This option is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry [0,1]

**RX Retry Number.**
The rx-retry-num option specifies the number of retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 4.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-num 5

**RX Retry Delay Time.**
The rx-retry-delay option specifies the timeout (in microseconds) between retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 15.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-delay 20

**Zero copy.**
The zero copy option enables/disables zero copy mode for RX/TX packets.
In zero copy mode, the packet buffer addresses from the guest are translated into host physical addresses
and then used directly as DMA addresses.
If zero copy mode is disabled, one-copy mode is utilized in the sample.
This option is disabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy [0,1]

**RX descriptor number.**
The RX descriptor number option specifies the number of Ethernet RX descriptors.
The legacy Linux virtio-net driver uses the vring descriptors differently from the DPDK virtio-net PMD:
the former typically allocates half of them for virtio headers and the other half for frame buffers,
while the latter allocates all of them for frame buffers.
This leads to a different number of available frame buffers in the vring,
and therefore a different number of Ethernet RX descriptors that can be used in zero copy mode.
The option is valid only when zero copy mode is enabled. The default value is 32.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --rx-desc-num [0, n]

**TX descriptor number.**
The TX descriptor number option specifies the number of Ethernet TX descriptors;
it is valid only when zero copy mode is enabled.
The default value is 64.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --tx-desc-num [0, n]

**VLAN strip.**
The VLAN strip option enables/disables VLAN stripping on the host.
If disabled, the guest will receive packets with the VLAN tag intact.
This option is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vlan-strip [0, 1]
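The options described above can be combined in a single invocation. For example, the following
(illustrative) command line enables software vm2vm forwarding, per-second statistics,
and a more aggressive RX retry policy:

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- \
                   -p 0x1 --dev-basename usvhost --vm2vm 1 --stats 1 --rx-retry 1 --rx-retry-num 5 --rx-retry-delay 20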
Running the Virtual Machine (QEMU)
----------------------------------

QEMU must be executed with specific parameters to:

* Ensure the guest is configured to use virtio-net network adapters.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1 ...

* Ensure the guest's virtio-net network adapter is configured with offloads disabled.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off

* Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module (vhost cuse).

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> ...

* Enable the vhost-net sample code to map the VM's memory into its own process address space.

  .. code-block:: console

      user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
    It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.

Redirecting QEMU to vhost-net Sample Code (vhost cuse)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
an open file descriptor must be passed to QEMU running as a child process.

.. code-block:: python

    #!/usr/bin/python
    import os, subprocess

    # Open the sample code's character device and pass the descriptor to QEMU.
    fd = os.open("/dev/usvhost-1", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 ... -netdev tap,id=vhostnet0,vhost=on,vhostfd="
                    + str(fd) + " ...", shell=True)

.. note::

    This process is automated in the QEMU wrapper script discussed in the QEMU Wrapper Script section below.

Mapping the Virtual Machine's Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the DPDK vhost-net sample code to run correctly, QEMU must allocate the VM's memory on hugetlbfs.
This is done by specifying mem-prealloc and mem-path when executing QEMU.
The vhost-net sample code accesses the virtio-net device's virtual rings and packet buffers
by finding and mapping the VM's physical memory on hugetlbfs.
In this case, the path passed to the guest should be that of the 1 GB page hugetlbfs:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    This process is automated in the QEMU wrapper script discussed in the QEMU Wrapper Script section below.
    The following two sections only apply to vhost cuse.
    For vhost-user, please make corresponding changes to the qemu-wrapper script and the guest XML file.
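A simple way to confirm that QEMU really allocated the VM's memory from hugetlbfs is to compare
the free hugepage count before and after the VM starts (a standard procfs query, also used in the
Common Issues section below); the free count should drop by roughly the VM's memory size:

.. code-block:: console

    user@target:~$ grep HugePages_Free /proc/meminfo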
QEMU Wrapper Script
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters required
to integrate with the vhost sample code.
It performs the following actions:

* Automatically detects the location of the hugetlbfs and inserts this into the command line parameters.

* Automatically opens file descriptors for each virtio-net device and inserts these into the command line parameters.

* Disables offloads on each virtio-net device.

* Calls QEMU, passing both the command line parameters passed to the script itself and those it has auto-detected.

The QEMU wrapper script will automatically configure calls to QEMU:

.. code-block:: console

    user@target:~$ qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1 -hda <disk img> -m 4096

which will become the following call to QEMU:

.. code-block:: console

    /usr/local/bin/qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> -device virtio-net-pci,netdev=hostnet1,id=net1,
    csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc

Libvirt Integration
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script (qemu-wrap.py) "wraps" libvirt calls to QEMU,
such that QEMU is called with the correct parameters described above.
To call the QEMU wrapper automatically from libvirt, the following configuration changes must be made:

* Place the QEMU wrapper script in libvirt's binary search PATH ($PATH).
  A good location is in the directory that contains the QEMU binary.

* Ensure that the script has the same owner/group and file permissions as the QEMU binary.

* Update the VM xml file using virsh edit <vm name>:

  * Set the VM to use the launch script.

  * Set the emulator path contained in the <emulator></emulator> tags.
    For example, replace <emulator>/usr/bin/qemu-kvm</emulator> with <emulator>/usr/bin/qemu-wrap.py</emulator>.

  * Set the VM's virtio-net devices to use vhost-net offload:

    .. code-block:: xml

        <interface type="network">
            <model type="virtio"/>
            <driver name="vhost"/>
        </interface>

* Enable libvirt to access the DPDK vhost sample code's character device file by adding it
  to the device controller cgroup for libvirtd, using the following configuration settings:

  ::

      cgroup_controllers = [ ... "devices", ... ]
      clear_emulator_capabilities = 0
      user = "root"
      group = "root"
      cgroup_device_acl = [
          "/dev/null", "/dev/full", "/dev/zero",
          "/dev/random", "/dev/urandom",
          "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
          "/dev/rtc", "/dev/hpet", "/dev/net/tun",
          "/dev/<devbase-name>-<index>",
      ]

* Disable SELinux or set it to permissive mode.

* Mount the cgroup device controller:

  .. code-block:: console

      user@target:~$ mkdir /dev/cgroup
      user@target:~$ mount -t cgroup none /dev/cgroup -o devices

* Restart the libvirtd system process.

  For example, on Fedora*: "systemctl restart libvirtd.service"

* Edit the configuration parameters section of the script:

  * Configure the "emul_path" variable to point to the QEMU emulator.

    .. code-block:: python

        emul_path = "/usr/local/bin/qemu-system-x86_64"

  * Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character device name.
    The DPDK vhost-net sample code's character device will be in the format "/dev/<basename>-<index>".

    .. code-block:: python

        us_vhost_path = "/dev/usvhost-1"
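With the above configuration in place, a typical sequence to apply the changes and bring the VM up
under libvirt might look as follows (the VM name is a placeholder for your own domain):

.. code-block:: console

    user@target:~$ systemctl restart libvirtd.service
    user@target:~$ virsh edit <vm name>
    user@target:~$ virsh start <vm name>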
Common Issues
~~~~~~~~~~~~~

* QEMU failing to allocate memory on hugetlbfs, with an error like the following::

      file_ram_alloc: can't mmap RAM pages: Cannot allocate memory

  When running QEMU the above error indicates that it has failed to allocate memory for the Virtual Machine on
  the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
  The number of free hugepages can be checked as follows:

  .. code-block:: console

      cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/free_hugepages

  The command above indicates how many hugepages are free to support QEMU's allocation request.

* User space VHOST when the guest has 2 MB sized huge pages:

  The guest may have 2 MB or 1 GB sized huge pages. The user space VHOST should work properly in both cases.

* User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:

  The current implementation works properly only when the guest memory is pre-allocated, so it is required to
  use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
  specified explicitly in the QEMU command line.

* User space VHOST will not work with a QEMU version without shared memory mapping:

  As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
  needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
  the QEMU version supports shared memory mapping.

* Issues with ``virsh destroy`` not destroying the VM:

  When using libvirt's ``virsh create``, ``qemu-wrap.py`` spawns a new process to run ``qemu-kvm``. This impacts the behavior
  of ``virsh destroy``, which kills the process running ``qemu-wrap.py`` without actually destroying the VM (it leaves
  the ``qemu-kvm`` process running).

  The following patch should fix this issue:
  http://dpdk.org/ml/archives/dev/2014-June/003607.html

* In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
  to allocate huge pages for the new guest:

  The solution for this issue is to add ``-boot c`` to the QEMU command line to make sure the huge pages are
  allocated properly and then the guest should start normally.

  Use ``cat /proc/meminfo`` to check if there are any changes in the values of ``HugePages_Total`` and ``HugePages_Free``
  after the guest startup.

* Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:

  This log message may be ignored. The message occurs due to the kernel module ``eventfd_link``, which is not a standard
  Linux module but which is necessary for the user space VHOST current implementation (CUSE-based) to communicate with
  the guest.

Running DPDK in the Virtual Machine
-----------------------------------

For the DPDK vhost-net sample code to switch packets into the VM,
the sample code must first learn the MAC address of the VM's virtio-net device.
The sample code detects the address from packets being transmitted from the VM, similar to a learning switch.

This behavior requires no special action or configuration with the Linux* virtio-net driver in the VM
as the Linux* Kernel will automatically transmit packets during device initialization.
However, DPDK-based applications must be modified to automatically transmit packets during initialization
to facilitate the DPDK vhost-net sample code's MAC learning.

The DPDK testpmd application can be configured to automatically transmit packets during initialization
and to act as an L2 forwarding switch.

Testpmd MAC Forwarding
~~~~~~~~~~~~~~~~~~~~~~

At high packet rates, a minor packet loss may be observed.
To resolve this issue, a "wait and retry" mode is implemented in the testpmd and vhost sample code.
In the "wait and retry" mode, if the virtqueue is found to be full,
then testpmd waits for a period of time before retrying to enqueue packets.

The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method called "mac_retry".
The following sequence diagram describes the algorithm in detail.

.. _figure_20:

**Figure 20. Packet Flow on TX in DPDK-testpmd**

|tx_dpdk_testpmd|
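On the vhost side, the same wait-and-retry idea is a bounded retry loop around the virtqueue enqueue call.
The following C sketch illustrates the logic behind the --rx-retry, --rx-retry-num and --rx-retry-delay
options; the helper name and parameters are hypothetical (only the enqueue and delay calls are the
librte_vhost and EAL APIs), and this is not a copy of the sample code:

.. code-block:: c

    #include <rte_mbuf.h>         /* struct rte_mbuf */
    #include <rte_cycles.h>       /* rte_delay_us() */
    #include <rte_virtio_net.h>   /* rte_vhost_enqueue_burst(), VIRTIO_RXQ */

    /* Hypothetical helper: enqueue a burst toward the guest, delaying and
     * retrying while the guest's RX virtqueue is full. */
    static uint16_t
    enqueue_with_retry(struct virtio_net *dev, struct rte_mbuf **pkts,
                       uint16_t count, uint32_t max_retries, uint32_t delay_us)
    {
        uint16_t sent = rte_vhost_enqueue_burst(dev, VIRTIO_RXQ, pkts, count);
        uint32_t retries = 0;

        while (sent < count && retries++ < max_retries) {
            rte_delay_us(delay_us); /* wait for the guest to free ring entries */
            sent += rte_vhost_enqueue_burst(dev, VIRTIO_RXQ,
                                            &pkts[sent], count - sent);
        }
        return sent; /* the caller drops any packets that were not accepted */
    }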
Running Testpmd
~~~~~~~~~~~~~~~

The testpmd application is automatically built when DPDK is installed.
Run the testpmd application as follows:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i

The destination MAC address for packets transmitted on each port can be set at the command line:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i --eth-peer=0,aa:bb:cc:dd:ee:ff --eth-peer=1,ff:ee:dd:cc:bb:aa

* Packets received on port 1 will be forwarded on port 0 to MAC address aa:bb:cc:dd:ee:ff.

* Packets received on port 0 will be forwarded on port 1 to MAC address ff:ee:dd:cc:bb:aa.

The testpmd application can then be configured to act as an L2 forwarding application:

.. code-block:: console

    testpmd> set fwd mac_retry

The testpmd application can then be configured to start processing packets,
transmitting packets first so the DPDK vhost sample code on the host can learn the MAC address:

.. code-block:: console

    testpmd> start tx_first

.. note::

    Please note "set fwd mac_retry" is used in place of "set fwd mac_fwd" to ensure the retry feature is activated.

Passing Traffic to the Virtual Machine Device
---------------------------------------------

For a virtio-net device to receive traffic,
the traffic's Layer 2 header must include both the virtio-net device's MAC address and VLAN tag.
The DPDK sample code behaves in a similar manner to a learning switch in that
it learns the MAC address of the virtio-net devices from the first transmitted packet.
On learning the MAC address,
the DPDK vhost sample code prints a message with the MAC address and VLAN tag of the virtio-net device.
For example:

.. code-block:: console

    DATA: (0) MAC_ADDRESS cc:bb:bb:bb:bb:bb and VLAN_TAG 1000 registered

The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
Any packets received on the NIC with these values are placed on the device's receive queue.
When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.

.. |vhost_net_arch| image:: img/vhost_net_arch.*

.. |qemu_virtio_net| image:: img/qemu_virtio_net.*

.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*

.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*

.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*