..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Vhost Sample Application
========================

The vhost sample application demonstrates integration of the Data Plane Development Kit (DPDK)
with the Linux* KVM hypervisor by implementing the vhost-net offload API.
The sample application performs simple packet switching between virtual machines based on Media Access Control
(MAC) address or Virtual Local Area Network (VLAN) tag.
The splitting of Ethernet traffic from an external switch is performed in hardware by the Virtual Machine Device Queues
(VMDQ) and Data Center Bridging (DCB) features of the Intel® 82599 10 Gigabit Ethernet Controller.

Background
----------

Virtio networking (virtio-net) was developed as the Linux* KVM para-virtualized method for communicating network packets
between host and guest.
It was found that virtio-net performance was poor due to context switching and packet copying between host, guest, and QEMU.
The following figure shows the system architecture for virtio-based networking (virtio-net).

.. _figure_16:

**Figure 16. QEMU Virtio-net (prior to vhost-net)**

.. image19_png has been renamed

|qemu_virtio_net|

The Linux* Kernel vhost-net module was developed as an offload mechanism for virtio-net.
The vhost-net module enables KVM (QEMU) to offload the servicing of virtio-net devices to the vhost-net kernel module,
reducing the context switching and packet copies in the virtual dataplane.

This is achieved by QEMU sharing the following information with the vhost-net module through the vhost-net API:

*   The layout of the guest memory space, to enable the vhost-net module to translate addresses.

*   The locations of virtual queues in QEMU virtual address space,
    to enable the vhost module to read/write directly to and from the virtqueues.

*   An event file descriptor (eventfd) configured in KVM to send interrupts to the virtio-net device driver in the guest.
    This enables the vhost-net module to notify (call) the guest.

*   An eventfd configured in KVM to be triggered on writes to the virtio-net device's
    Peripheral Component Interconnect (PCI) config space.
    This enables the vhost-net module to receive notifications (kicks) from the guest.

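As a point of reference, the kernel implementation described above is exposed on the host as a character device. Assuming the module is available for the running kernel, its presence can be confirmed as follows (the DPDK sample code described later replaces this module rather than using it):

.. code-block:: console

    modprobe vhost-net
    ls -l /dev/vhost-net
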
The following figure shows the system architecture for virtio-net networking with vhost-net offload.

.. _figure_17:

**Figure 17. Virtio with Linux* Kernel Vhost**

.. image20_png has been renamed

|virtio_linux_vhost|

Sample Code Overview
--------------------

The DPDK vhost-net sample code demonstrates KVM (QEMU) offloading the servicing of a Virtual Machine's (VM's)
virtio-net devices to a DPDK-based application in place of the kernel's vhost-net module.

The DPDK vhost-net sample code is based on the vhost library,
which was developed to let a user space Ethernet switch easily integrate vhost functionality.

The vhost library implements the following features:

*   Management of virtio-net device creation/destruction events.

*   Mapping of the VM's physical memory into the DPDK vhost-net's address space.

*   Triggering/receiving notifications to/from VMs via eventfds.

*   A virtio-net back-end implementation providing a subset of virtio-net features.

There are two vhost implementations in the vhost library, vhost cuse and vhost user.
In vhost cuse, a character device driver is implemented to receive and process vhost requests through ioctl messages.
In vhost user, a socket server is created to receive vhost requests through socket messages.
Most of the messages share the same handler routine.

.. note::

    **Any vhost cuse specific requirements in the following sections will be emphasized**.

The two implementations are turned on and off statically through the configuration file;
only one implementation can be turned on at a time, as they do not co-exist in the current implementation.

The vhost sample code application is a simple packet switching application with the following feature:

*   Packet switching between virtio-net devices and the network interface card,
    including using VMDQs to reduce the switching that needs to be performed in software.

The following figure shows the architecture of the Vhost sample application based on vhost-cuse.

.. _figure_18:

**Figure 18. Vhost-net Architectural Overview**

.. image21_png has been renamed

|vhost_net_arch|

The following figure shows the flow of packets through the vhost-net sample application.

.. _figure_19:

**Figure 19. Packet Flow Through the vhost-net Sample Application**

.. image22_png has been renamed

|vhost_net_sample_app|

Supported Distributions
-----------------------

The example in this section has been validated with the following distributions:

*   Fedora* 18

*   Fedora* 19

*   Fedora* 20

Prerequisites
-------------

This section lists prerequisite packages that must be installed.

Installing Packages on the Host (vhost cuse required)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost cuse code uses the following packages: fuse, fuse-devel, and kernel-modules-extra.
The vhost user code does not rely on those modules, as the eventfds are already installed into the vhost process through the
Unix domain socket.

#.  Install Fuse Development Libraries and headers:

    .. code-block:: console

        yum -y install fuse fuse-devel

#.  Install the Cuse Kernel Module:

    .. code-block:: console

        yum -y install kernel-modules-extra

QEMU simulator
~~~~~~~~~~~~~~

For vhost user, QEMU 2.2 is required.

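The version of the installed QEMU can be checked with:

.. code-block:: console

    qemu-system-x86_64 --version
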
Setting up the Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vhost sample code requires that QEMU allocates a VM's memory on the hugetlbfs file system.
As the vhost sample code requires hugepages,
the best practice is to partition the system into separate hugepage mount points for the VMs and the vhost sample code.

.. note::

    This is best-practice only and is not mandatory.
    For systems that only support 2 MB page sizes,
    both QEMU and vhost sample code can use the same hugetlbfs mount point without issue.

**QEMU**

VMs with gigabytes of memory can benefit from having QEMU allocate their memory from 1 GB huge pages.
1 GB huge pages must be allocated at boot time by passing kernel parameters through the grub boot loader.

#.  Calculate the maximum memory usage of all VMs to be run on the system.
    Then, round this value up to the nearest gigabyte; this is the amount of hugepage memory the execution environment will require.

#.  Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

    .. code-block:: console

        GRUB_CMDLINE_LINUX="... hugepagesz=1G hugepages=<Number of hugepages required> default_hugepagesz=1G"

#.  Update the grub boot loader:

    .. code-block:: console

        grub2-mkconfig -o /boot/grub2/grub.cfg

#.  Reboot the system.

#.  The hugetlbfs mount point (/dev/hugepages) should now default to allocating gigabyte pages.

.. note::

    Making the above modification will change the system default hugepage size to 1 GB for all applications.

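After rebooting, the new default can be verified; with the setup above, the ``Hugepagesize`` field should read ``1048576 kB``:

.. code-block:: console

    grep Huge /proc/meminfo
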
**Vhost Sample Code**

In this section, we create a second hugetlbfs mount point to allocate hugepages for the DPDK vhost sample code.

#.  Allocate sufficient 2 MB pages for the DPDK vhost sample code:

    .. code-block:: console

        echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

#.  Mount hugetlbfs at a separate mount point for 2 MB pages:

    .. code-block:: console

        mount -t hugetlbfs nodev /mnt/huge -o pagesize=2M

The above steps can be automated by doing the following:

#.  Edit /etc/fstab to add an entry to automatically mount the second hugetlbfs mount point:

    ::

        hugetlbfs <tab> /mnt/huge <tab> hugetlbfs defaults,pagesize=2M 0 0

#.  Edit the /etc/default/grub file, and add the following to the GRUB_CMDLINE_LINUX entry:

    ::

        GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=256 ... default_hugepagesz=1G"

#.  Update the grub bootloader:

    .. code-block:: console

        grub2-mkconfig -o /boot/grub2/grub.cfg

#.  Reboot the system.

.. note::

    Ensure that the default hugepage size after this setup is 1 GB.

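Both hugepage pools can then be inspected through sysfs; assuming the values used above, the 2 MB pool should report 256 pages:

.. code-block:: console

    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
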
Setting up the Guest Execution Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is recommended for testing purposes that the DPDK testpmd sample application is used in the guest to forward packets;
the reasons for this are discussed in Section 22.7, "Running the Virtual Machine (QEMU)".

The testpmd application forwards packets between pairs of Ethernet devices
and requires an even number of Ethernet devices (virtio or otherwise) to execute.
It is therefore recommended to create multiples of two virtio-net devices for each Virtual Machine, either through libvirt or
at the command line, as follows.

.. note::

    Observe that in the example, "-device" and "-netdev" are repeated for two virtio-net devices.

For vhost cuse:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -netdev tap,id=hostnet2,vhost=on,vhostfd=<open fd> \
    -device virtio-net-pci,netdev=hostnet2,id=net2

For vhost user:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... \
    -chardev socket,id=char1,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet1,chardev=char1 \
    -device virtio-net-pci,netdev=hostnet1,id=net1 \
    -chardev socket,id=char2,path=<sock_path> \
    -netdev type=vhost-user,id=hostnet2,chardev=char2 \
    -device virtio-net-pci,netdev=hostnet2,id=net2

sock_path is the path for the socket file created by vhost.

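For example, if the vhost-switch application described later is started with ``--dev-basename usvhost``, a socket file named usvhost appears in its working directory; its absolute path is what should be substituted for <sock_path>, which can be confirmed with:

.. code-block:: console

    ls -l ./usvhost
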
Compiling the Sample Code
-------------------------

#.  Compile vhost lib:

    To enable vhost, turn on the vhost library in the configure file config/common_linuxapp by setting it to "y" (it is "n" by default):

    .. code-block:: console

        CONFIG_RTE_LIBRTE_VHOST=y

    vhost user is turned on by default in the lib/librte_vhost/Makefile.
    To enable vhost cuse, uncomment the vhost cuse lines and comment out the vhost user lines manually.
    In the future, a configuration option will be created to switch between the two implementations.

    .. code-block:: console

        SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
        #SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c

    After vhost is enabled and the implementation is selected, build the vhost library.

#.  Go to the examples directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/vhost

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the DPDK Getting Started Guide for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        cd ${RTE_SDK}
        make config T=${RTE_TARGET}
        make install T=${RTE_TARGET}
        cd ${RTE_SDK}/examples/vhost
        make

#.  Go to the eventfd_link directory (vhost cuse required):

    .. code-block:: console

        cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#.  Build the eventfd_link kernel module (vhost cuse required):

    .. code-block:: console

        make

Running the Sample Code
-----------------------

#.  Install the cuse kernel module (vhost cuse required):

    .. code-block:: console

        modprobe cuse

#.  Go to the eventfd_link directory (vhost cuse required):

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/lib/librte_vhost/eventfd_link

#.  Install the eventfd_link module (vhost cuse required):

    .. code-block:: console

        insmod ./eventfd_link.ko

#.  Go to the examples directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/vhost

#.  Run the vhost-switch sample code:

    vhost cuse:

    .. code-block:: console

        user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

    vhost user: a socket file named usvhost will be created under the current directory.
    Use its path as the socket path in the guest's QEMU command line.

    .. code-block:: console

        user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost

.. note::

    Please note the huge-dir parameter instructs the DPDK to allocate its memory from the 2 MB page hugetlbfs.

Parameters
~~~~~~~~~~

**Basename and Index.**
vhost cuse uses a Linux* character device to communicate with QEMU.
The basename and the index are used to generate the character device's name.

    /dev/<basename>-<index>

The index parameter is provided for situations where multiple instances of the virtual switch are required.

For compatibility with the QEMU wrapper script, a base name of "usvhost" and an index of "1" should be used:

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- -p 0x1 --dev-basename usvhost --dev-index 1

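Once the switch is running with those values, the resulting character device can be checked on the host (the path follows the /dev/<basename>-<index> pattern described above):

.. code-block:: console

    ls -l /dev/usvhost-1
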
**vm2vm.**
The vm2vm parameter disables/sets the mode of packet switching between guests in the host.
A value of "0" disables vm2vm switching, which implies that packets transmitted by a virtual machine always go to the Ethernet port.
A value of "1" selects software-mode packet forwarding between guests; it requires a packet copy in vhost,
so it is valid only in the one-copy implementation and invalid for the zero-copy implementation.
A value of "2" selects hardware-mode packet forwarding between guests: packets go to the Ethernet port and
the hardware L2 switch determines, based on the packet's destination MAC address and VLAN tag,
which guest a packet should be forwarded to or whether it needs to be sent externally.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vm2vm [0,1,2]

**Mergeable Buffers.**
The mergeable buffers parameter controls how virtio-net descriptors are used for virtio-net headers.
In a disabled state, one virtio-net header is used per packet buffer;
in an enabled state, one virtio-net header is used for multiple packets.
The default value is 0, or disabled, since the virtio-net drivers in recent kernels show performance degradation when this feature is enabled.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --mergeable [0,1]

**Stats.**
The stats parameter controls the printing of virtio-net device statistics.
The parameter specifies the interval, in seconds, at which to print statistics; an interval of 0 seconds disables statistics.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --stats [0,n]

**RX Retry.**
The rx-retry option enables/disables enqueue retries when the guest's RX queue is full.
This feature resolves packet loss that is observed at high data rates,
by allowing the receive path to delay and retry.
This option is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry [0,1]

**RX Retry Number.**
The rx-retry-num option specifies the number of retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 4.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-num 5

**RX Retry Delay Time.**
The rx-retry-delay option specifies the timeout (in microseconds) between retries on an RX burst;
it takes effect only when rx retry is enabled.
The default value is 15.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --rx-retry 1 --rx-retry-delay 20

**Zero copy.**
The zero copy option enables/disables zero copy mode for RX/TX packets.
In zero copy mode, the packet buffer address from the guest is translated into a host physical address
and then set directly as the DMA address.
If zero copy mode is disabled, one-copy mode is utilized in the sample.
This option is disabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy [0,1]

**RX descriptor number.**
The RX descriptor number option specifies the number of Ethernet RX descriptors.
The Linux legacy virtio-net driver and the DPDK-based virtio-net PMD use the vring descriptors differently:
the former typically allocates half of them for virtio headers and the other half for frame buffers,
while the latter allocates all of them for frame buffers.
This leads to a different number of available frame buffers in the vring,
and therefore a different number of Ethernet RX descriptors that can be used in zero copy mode.
This option is valid only when zero copy mode is enabled. The default value is 32.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --rx-desc-num [0, n]

**TX descriptor number.**
The TX descriptor number option specifies the number of Ethernet TX descriptors; it is valid only when zero copy mode is enabled.
The default value is 64.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --zero-copy 1 --tx-desc-num [0, n]

**VLAN strip.**
The VLAN strip option enables/disables VLAN stripping on the host; if disabled, the guest will receive the packets with the VLAN tag.
It is enabled by default.

.. code-block:: console

    user@target:~$ ./build/app/vhost-switch -c f -n 4 --huge-dir /mnt/huge -- --vlan-strip [0, 1]

Running the Virtual Machine (QEMU)
----------------------------------

QEMU must be executed with specific parameters to:

*   Ensure the guest is configured to use virtio-net network adapters.

    .. code-block:: console

        user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1 ...

*   Ensure the guest's virtio-net network adapter is configured with offloads disabled.

    .. code-block:: console

        user@target:~$ qemu-system-x86_64 ... -device virtio-net-pci,netdev=hostnet1,id=net1,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off

*   Redirect QEMU to communicate with the DPDK vhost-net sample code in place of the vhost-net kernel module (vhost cuse).

    .. code-block:: console

        user@target:~$ qemu-system-x86_64 ... -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> ...

*   Enable the vhost-net sample code to map the VM's memory into its own process address space.

    .. code-block:: console

        user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    The QEMU wrapper (qemu-wrap.py) is a Python script designed to automate the QEMU configuration described above.
    It also facilitates integration with libvirt, although the script may also be used standalone without libvirt.

Redirecting QEMU to vhost-net Sample Code (vhost cuse)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To redirect QEMU to the vhost-net sample code implementation of the vhost-net API,
an open file descriptor must be passed to QEMU running as a child process.

.. code-block:: python

    #!/usr/bin/python
    import os
    import subprocess

    # Open the vhost-net sample code's character device and pass the
    # descriptor number on the QEMU command line.
    fd = os.open("/dev/usvhost-1", os.O_RDWR)
    subprocess.call("qemu-system-x86_64 ... -netdev tap,id=vhostnet0,vhost=on,vhostfd="
                    + str(fd) + " ...", shell=True)

.. note::

    This process is automated in the QEMU wrapper script discussed in Section 24.7.3.

Mapping the Virtual Machine's Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the DPDK vhost-net sample code to be run correctly, QEMU must allocate the VM's memory on hugetlbfs.
This is done by specifying mem-prealloc and mem-path when executing QEMU.
The vhost-net sample code accesses the virtio-net device's virtual rings and packet buffers
by finding and mapping the VM's physical memory on hugetlbfs.
In this case, the path passed to the guest should be that of the 1 GB page hugetlbfs:

.. code-block:: console

    user@target:~$ qemu-system-x86_64 ... -mem-prealloc -mem-path /dev/hugepages ...

.. note::

    This process is automated in the QEMU wrapper script discussed in Section 24.7.3.
    The following two sections only apply to vhost cuse. For vhost-user, please make the corresponding changes to the qemu-wrapper script and the guest XML file.

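Since -mem-path causes QEMU to back the guest's RAM with hugepages from the given mount point, a simple cross-check is to watch the free page count drop once the VM starts:

.. code-block:: console

    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
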
QEMU Wrapper Script
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script automatically detects and calls QEMU with the necessary parameters required
to integrate with the vhost sample code.
It performs the following actions:

*   Automatically detects the location of the hugetlbfs and inserts this into the command line parameters.

*   Automatically opens file descriptors for each virtio-net device and inserts these into the command line parameters.

*   Disables offloads on each virtio-net device.

*   Calls QEMU passing both the command line parameters passed to the script itself and those it has auto-detected.

The QEMU wrapper script will automatically configure calls to QEMU:

.. code-block:: console

    user@target:~$ qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1 -hda <disk img> -m 4096

which will become the following call to QEMU:

.. code-block:: console

    /usr/local/bin/qemu-system-x86_64 -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu SandyBridge -smp 4,sockets=4,cores=1,threads=1
    -netdev tap,id=hostnet1,vhost=on,vhostfd=<open fd> -device virtio-net-pci,netdev=hostnet1,id=net1,
    csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off -hda <disk img> -m 4096 -mem-path /dev/hugepages -mem-prealloc

Libvirt Integration
~~~~~~~~~~~~~~~~~~~

The QEMU wrapper script (qemu-wrap.py) "wraps" libvirt calls to QEMU,
such that QEMU is called with the correct parameters described above.
To call the QEMU wrapper automatically from libvirt, the following configuration changes must be made:

*   Place the QEMU wrapper script in libvirt's binary search PATH ($PATH).
    A good location is in the directory that contains the QEMU binary.

*   Ensure that the script has the same owner/group and file permissions as the QEMU binary.

*   Update the VM xml file using virsh edit <vm name>:

    *   Set the VM to use the launch script.

    *   Set the emulator path contained in the <emulator></emulator> tags.
        For example, replace <emulator>/usr/bin/qemu-kvm</emulator> with <emulator>/usr/bin/qemu-wrap.py</emulator>.

    *   Set the VM's virtio-net devices to use vhost-net offload:

        .. code-block:: xml

            <interface type="network">
                <model type="virtio"/>
                <driver name="vhost"/>
            </interface>

    *   Enable libvirt to access the DPDK Vhost sample code's character device file by adding it
        to the device controller cgroup for libvirtd, using the following settings in the libvirtd configuration:

        ::

            cgroup_controllers = [ ... "devices", ... ]
            clear_emulator_capabilities = 0
            user = "root"
            group = "root"
            cgroup_device_acl = [
                "/dev/null", "/dev/full", "/dev/zero",
                "/dev/random", "/dev/urandom",
                "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
                "/dev/rtc", "/dev/hpet", "/dev/net/tun",
                "/dev/<devbase-name>-<index>",
            ]

*   Disable SELinux or set it to permissive mode.

*   Mount cgroup device controller:

    .. code-block:: console

        user@target:~$ mkdir /dev/cgroup
        user@target:~$ mount -t cgroup none /dev/cgroup -o devices

*   Restart the libvirtd system process.

    For example, on Fedora*: "systemctl restart libvirtd.service"

*   Edit the configuration parameters section of the script:

    *   Configure the "emul_path" variable to point to the QEMU emulator.

        .. code-block:: python

            emul_path = "/usr/local/bin/qemu-system-x86_64"

    *   Configure the "us_vhost_path" variable to point to the DPDK vhost-net sample code's character device name.
        The DPDK vhost-net sample code's character device will be in the format "/dev/<basename>-<index>".

        .. code-block:: python

            us_vhost_path = "/dev/usvhost-1"

Common Issues
~~~~~~~~~~~~~

*   QEMU failing to allocate memory on hugetlbfs, with an error like the following::

       file_ram_alloc: can't mmap RAM pages: Cannot allocate memory

    When running QEMU, the above error indicates that it has failed to allocate memory for the Virtual Machine on
    the hugetlbfs. This is typically due to insufficient hugepages being free to support the allocation request.
    The number of free hugepages can be checked as follows:

    .. code-block:: console

        cat /sys/kernel/mm/hugepages/hugepages-<pagesize>/free_hugepages

    The command above indicates how many hugepages are free to support QEMU's allocation request.

*   User space VHOST when the guest has 2MB sized huge pages:

    The guest may have 2MB or 1GB sized huge pages. The user space VHOST should work properly in both cases.

*   User space VHOST will not work with QEMU without the ``-mem-prealloc`` option:

    The current implementation works properly only when the guest memory is pre-allocated, so it is required to
    use a QEMU version (e.g. 1.6) which supports ``-mem-prealloc``. The ``-mem-prealloc`` option must be
    specified explicitly in the QEMU command line.

*   User space VHOST will not work with a QEMU version without shared memory mapping:

    As shared memory mapping is mandatory for user space VHOST to work properly with the guest, user space VHOST
    needs access to the shared memory from the guest to receive and transmit packets. It is important to make sure
    the QEMU version supports shared memory mapping.

*   Issues with ``virsh destroy`` not destroying the VM:

    When using libvirt's ``virsh create``, the ``qemu-wrap.py`` script spawns a new process to run ``qemu-kvm``.
    This impacts the behavior of ``virsh destroy``, which kills the process running ``qemu-wrap.py`` without
    actually destroying the VM (it leaves the ``qemu-kvm`` process running).

    The following patch should fix this issue:
        http://dpdk.org/ml/archives/dev/2014-June/003607.html

*   In an Ubuntu environment, QEMU fails to start a new guest normally with user space VHOST due to not being able
    to allocate huge pages for the new guest:

    The solution for this issue is to add ``-boot c`` into the QEMU command line to make sure the huge pages are
    allocated properly and then the guest should start normally.

    Use ``cat /proc/meminfo`` to check if there is any change in the values of ``HugePages_Total`` and ``HugePages_Free``
    after the guest startup.

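    For example, both counters can be checked at once:

    .. code-block:: console

        grep -E 'HugePages_(Total|Free)' /proc/meminfo
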
*   Log message: ``eventfd_link: module verification failed: signature and/or required key missing - tainting kernel``:

    This log message may be ignored. The message occurs due to the kernel module ``eventfd_link``, which is not a standard
    Linux module but which is necessary for the user space VHOST current implementation (CUSE-based) to communicate with
    the guest.

Running DPDK in the Virtual Machine
-----------------------------------

For the DPDK vhost-net sample code to switch packets into the VM,
the sample code must first learn the MAC address of the VM's virtio-net device.
The sample code detects the address from packets being transmitted from the VM, similar to a learning switch.

This behavior requires no special action or configuration with the Linux* virtio-net driver in the VM
as the Linux* Kernel will automatically transmit packets during device initialization.
However, DPDK-based applications must be modified to automatically transmit packets during initialization
to facilitate the DPDK vhost-net sample code's MAC learning.

The DPDK testpmd application can be configured to automatically transmit packets during initialization
and to act as an L2 forwarding switch.

Testpmd MAC Forwarding
~~~~~~~~~~~~~~~~~~~~~~

At high packet rates, a minor packet loss may be observed.
To resolve this issue, a "wait and retry" mode is implemented in the testpmd and vhost sample code.
In the "wait and retry" mode, if the virtqueue is found to be full, then testpmd waits for a period of time before retrying to enqueue packets.

The "wait and retry" algorithm is implemented in DPDK testpmd as a forwarding method called "mac_retry".
The following sequence diagram describes the algorithm in detail.

.. _figure_20:

**Figure 20. Packet Flow on TX in DPDK-testpmd**

.. image23_png has been renamed

|tx_dpdk_testpmd|

Running Testpmd
~~~~~~~~~~~~~~~

The testpmd application is automatically built when DPDK is installed.
Run the testpmd application as follows:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i

The destination MAC address for packets transmitted on each port can be set at the command line:

.. code-block:: console

    user@target:~$ x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --socket-mem 128 -- --burst=64 -i --eth-peer=0,aa:bb:cc:dd:ee:ff --eth-peer=1,ff:ee:dd:cc:bb:aa

*   Packets received on port 1 will be forwarded on port 0 to MAC address aa:bb:cc:dd:ee:ff.

*   Packets received on port 0 will be forwarded on port 1 to MAC address ff:ee:dd:cc:bb:aa.

The testpmd application can then be configured to act as an L2 forwarding application:

.. code-block:: console

    testpmd> set fwd mac_retry

The testpmd application can then be configured to start processing packets,
transmitting packets first so the DPDK vhost sample code on the host can learn the MAC address:

.. code-block:: console

    testpmd> start tx_first

.. note::

    Please note "set fwd mac_retry" is used in place of "set fwd mac" to ensure the retry feature is activated.

Passing Traffic to the Virtual Machine Device
---------------------------------------------

For a virtio-net device to receive traffic,
the traffic's Layer 2 header must include both the virtio-net device's MAC address and VLAN tag.
The DPDK sample code behaves in a similar manner to a learning switch in that
it learns the MAC address of the virtio-net devices from the first transmitted packet.
On learning the MAC address,
the DPDK vhost sample code prints a message with the MAC address and VLAN tag of the virtio-net device.
For example:

.. code-block:: console

    DATA: (0) MAC_ADDRESS cc:bb:bb:bb:bb:bb and VLAN_TAG 1000 registered

The above message indicates that device 0 has been registered with MAC address cc:bb:bb:bb:bb:bb and VLAN tag 1000.
Any packets received on the NIC with these values are placed on the device's receive queue.
When a virtio-net device transmits packets, the VLAN tag is added to the packet by the DPDK vhost sample code.

.. |vhost_net_arch| image:: img/vhost_net_arch.*

.. |qemu_virtio_net| image:: img/qemu_virtio_net.*

.. |tx_dpdk_testpmd| image:: img/tx_dpdk_testpmd.*

.. |vhost_net_sample_app| image:: img/vhost_net_sample_app.*

.. |virtio_linux_vhost| image:: img/virtio_linux_vhost.*