# vhost Target {#vhost}

# Table of Contents {#vhost_toc}

- @ref vhost_intro
- @ref vhost_prereqs
- @ref vhost_start
- @ref vhost_config
- @ref vhost_qemu_config
- @ref vhost_example
- @ref vhost_advanced_topics
- @ref vhost_bugs

# Introduction {#vhost_intro}

A vhost target provides a local storage service as a process running on a local machine.
It is capable of exposing virtualized block devices to QEMU instances or other arbitrary
processes.

The following diagram shows how a QEMU-based VM communicates with an SPDK Vhost-SCSI device.

The diagram, and the vhost protocol itself, are described in the @ref vhost_processing doc.

SPDK provides an accelerated vhost target by applying the same user space and polling
techniques as other components in SPDK. Since SPDK is polling for vhost submissions,
it can signal the VM to skip notifications on submission. This avoids VMEXITs on I/O
submission and can significantly reduce CPU usage in the VM on heavy I/O workloads.

# Prerequisites {#vhost_prereqs}

This guide assumes SPDK has been built according to the instructions in @ref
getting_started. The SPDK vhost target is built with the default configure options.

## Vhost Command Line Parameters {#vhost_cmd_line_args}

Additional command line flags are available for the vhost target.

Param    | Type     | Default                | Description
-------- | -------- | ---------------------- | -----------
-S       | string   | $PWD                   | directory where UNIX domain sockets will be created

## Supported Guest Operating Systems

The guest OS must contain virtio-scsi or virtio-blk drivers. Most Linux and FreeBSD
distributions include virtio drivers.
[Windows virtio drivers](https://fedoraproject.org/wiki/Windows_Virtio_Drivers) must be
installed separately. The SPDK vhost target has been tested with recent versions of Ubuntu,
Fedora, and Windows.

## QEMU

Userspace vhost-scsi target support was added to upstream QEMU in v2.10.0. Run
the following command to confirm your QEMU supports userspace vhost-scsi.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-scsi-pci,help
~~~

Userspace vhost-blk target support was added to upstream QEMU in v2.12.0. Run
the following command to confirm your QEMU supports userspace vhost-blk.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-blk-pci,help
~~~

The userspace vhost-nvme target was added as an experimental feature in the SPDK 18.04
release; the corresponding QEMU patches are available only in SPDK's QEMU repository.

Run the following command to confirm your QEMU supports userspace vhost-nvme.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-nvme,help
~~~

# Starting SPDK vhost target {#vhost_start}

First, run the SPDK setup.sh script to set up hugepages for the SPDK vhost target
application. This will allocate 4096MiB (4GiB) of hugepages, enough for the SPDK
vhost target and the virtual machine.

~~~{.sh}
HUGEMEM=4096 scripts/setup.sh
~~~

Next, start the SPDK vhost target application. The following command will start vhost
on CPU cores 0 and 1 (cpumask 0x3) with all future socket files placed in /var/tmp.
Vhost will fully occupy the given CPU cores for I/O polling. Particular vhost devices can
be restricted to run on a subset of these CPU cores. See @ref vhost_vdev_create for
details.

~~~{.sh}
build/bin/vhost -S /var/tmp -m 0x3
~~~
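If you want to confirm that the target is up and accepting RPC requests before
configuring it, one quick sanity check is to query the SPDK version over the
JSON-RPC socket. This assumes the default RPC listen address of `/var/tmp/spdk.sock`:

~~~{.sh}
scripts/rpc.py spdk_get_version
~~~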
To list all available vhost options, use the following command.

~~~{.sh}
build/bin/vhost -h
~~~

# SPDK Configuration {#vhost_config}

## Create bdev (block device) {#vhost_bdev_create}

SPDK bdevs are block devices which will be exposed to the guest OS.
For vhost-scsi, bdevs are exposed as SCSI LUNs on SCSI devices attached to the
vhost-scsi controller in the guest OS.
For vhost-blk, bdevs are exposed directly as block devices in the guest OS and are
not associated at all with SCSI.

SPDK supports several different types of storage backends, including NVMe,
Linux AIO, malloc ramdisk, and Ceph RBD. Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 64MB malloc bdev with a 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
~~~

## Create a vhost device {#vhost_vdev_create}

### Vhost-SCSI

The following RPC will create a vhost-scsi controller which can be accessed
by QEMU via /var/tmp/vhost.0. At the time of creation the controller will be
bound to a single CPU core - the one with the fewest vhost controllers already
assigned to it. The optional `--cpumask` parameter can directly specify which
cores should be taken into account - in this case always CPU 0. To achieve
optimal performance on NUMA systems, the cpumask should specify cores on the
same CPU socket as its associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
~~~

The following RPC will attach the Malloc0 bdev to the vhost.0 vhost-scsi
controller. Malloc0 will appear as a single LUN on a SCSI device with
target ID 0. The SPDK Vhost-SCSI device currently supports only one LUN per SCSI target.
Additional LUNs can be added by specifying a different target ID.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

To remove a bdev from a vhost-scsi controller use the following RPC:

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

### Vhost-BLK

The following RPC will create a vhost-blk device exposing the Malloc0 bdev.
The device will be accessible to QEMU via /var/tmp/vhost.1. All the I/O polling
will be pinned to the least occupied CPU core within the given cpumask - in this case
always CPU 0. For NUMA systems, the cpumask should specify cores on the same CPU
socket as its associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
~~~

It is also possible to create a read-only vhost-blk device by specifying an
extra `-r` or `--readonly` parameter.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 -r vhost.1 Malloc0
~~~

## QEMU {#vhost_qemu_config}

Now the virtual machine can be started with QEMU. The following command-line
parameters must be added to connect the virtual machine to its vhost controller.

First, specify the memory backend for the virtual machine. Since QEMU must
share the virtual machine's memory with the SPDK vhost target, the memory
must be specified in this format with share=on.

~~~{.sh}
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem
~~~
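Optionally, the hugepage memory can also be preallocated at VM startup so that
allocation failures surface immediately rather than at run time. This variant is
shown only as an illustration; it is not required by SPDK:

~~~{.sh}
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on,prealloc=on
-numa node,memdev=mem
~~~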
Second, ensure QEMU boots from the virtual machine image and not the
SPDK malloc block device by specifying bootindex=0 for the boot image.

~~~{.sh}
-drive file=guest_os_image.qcow2,if=none,id=disk
-device ide-hd,drive=disk,bootindex=0
~~~

Finally, specify the SPDK vhost devices:

### Vhost-SCSI

~~~{.sh}
-chardev socket,id=char0,path=/var/tmp/vhost.0
-device vhost-user-scsi-pci,id=scsi0,chardev=char0
~~~

### Vhost-BLK

~~~{.sh}
-chardev socket,id=char1,path=/var/tmp/vhost.1
-device vhost-user-blk-pci,id=blk0,chardev=char1
~~~

## Example output {#vhost_example}

This example uses an NVMe bdev alongside two malloc bdevs. The SPDK vhost application
is started on CPU cores 0 and 1 and QEMU on cores 2 and 3.

~~~{.sh}
host:~# HUGEMEM=2048 ./scripts/setup.sh
0000:01:00.0 (8086 0953): nvme -> vfio-pci
~~~

~~~{.sh}
host:~# ./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: vhost -c 3 -m 1024 --master-lcore=1 --file-prefix=spdk_pid156014 ]
EAL: Detected 48 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
app.c: 369:spdk_app_start: *NOTICE*: Total cores available: 2
reactor.c: 668:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 1 on socket 0
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t pcie -a 0000:01:00.0
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: using IOMMU type 1 (Type 1)
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 128 4096 -b Malloc0
Malloc0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: vhost-user server: socket created, fd: 21
VHOST_CONFIG: bind to /var/tmp/vhost.0
vhost.c: 596:spdk_vhost_dev_construct: *NOTICE*: Controller vhost.0: new controller added
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Nvme0n1
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 0' using lun 'Nvme0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc0
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 1' using lun 'Malloc0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 64 512 -b Malloc1
Malloc1
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_blk_controller --cpumask 0x2 vhost.1 Malloc1
vhost_blk.c: 719:spdk_vhost_blk_construct: *NOTICE*: Controller vhost.1: using bdev 'Malloc1'
~~~

~~~{.sh}
host:~# taskset -c 2,3 qemu-system-x86_64 \
  --enable-kvm \
  -cpu host -smp 2 \
  -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
  -drive file=guest_os_image.qcow2,if=none,id=disk \
  -device ide-hd,drive=disk,bootindex=0 \
  -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
  -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 \
  -chardev socket,id=spdk_vhost_blk0,path=/var/tmp/vhost.1 \
  -device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=4
~~~
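As an optional check, you can confirm from the host that QEMU has connected to both
vhost-user sockets by listing UNIX domain sockets and looking for the socket paths;
the exact output format varies between distributions:

~~~{.sh}
host:~# ss -xp | grep vhost
~~~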
Please note the following two commands are run on the guest VM.

~~~{.sh}
guest:~# lsblk --output "NAME,KNAME,MODEL,HCTL,SIZE,VENDOR,SUBSYSTEMS"
NAME   KNAME MODEL            HCTL       SIZE VENDOR   SUBSYSTEMS
sda    sda   QEMU HARDDISK    1:0:0:0     80G ATA      block:scsi:pci
  sda1 sda1                               80G          block:scsi:pci
sdb    sdb   NVMe disk        2:0:0:0  372,6G INTEL    block:scsi:virtio:pci
sdc    sdc   Malloc disk      2:0:1:0    128M INTEL    block:scsi:virtio:pci
vda    vda                               128M 0x1af4   block:virtio:pci
~~~

~~~{.sh}
guest:~# poweroff
~~~

~~~{.sh}
host:~# fg
<< CTRL + C >>
vhost.c:1006:session_shutdown: *NOTICE*: Exiting
~~~

We can see that `sdb` and `sdc` are SPDK vhost-scsi LUNs, and `vda` is an SPDK
vhost-blk disk.

# Advanced Topics {#vhost_advanced_topics}

## Multi-Queue Block Layer (blk-mq) {#vhost_multiqueue}

For best performance use the Linux kernel block multi-queue feature with vhost.
To enable it, modify the kernel command line inside the virtual machine.

The instructions below are for Ubuntu:

1. `vi /etc/default/grub`
2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"`
3. `sudo update-grub`
4. Reboot the virtual machine

To achieve better performance, make sure to increase the number of cores assigned
to the VM and add the `num_queues` parameter to the QEMU `-device` option. Setting
`num_queues=4` should be enough to saturate the physical device. Adding too many
queues might degrade SPDK vhost performance when many vhost devices are used,
because each device requires an additional `num_queues` queues to be polled.

## Hot-attach/hot-detach {#vhost_hotattach}

Hotplug/hotremove within a vhost controller is called hot-attach/detach. This is to
distinguish it from SPDK bdev hotplug/hotremove. E.g. if an NVMe bdev is attached
to a vhost-scsi controller, physically hotremoving the NVMe will trigger vhost-scsi
hot-detach. It is also possible to hot-detach a bdev manually via RPC - for example
when the bdev is about to be attached to another controller. See the details below.

Please also note that hot-attach/detach is Vhost-SCSI-specific. There are no RPCs
to hot-attach/detach a bdev from a Vhost-BLK device. If a Vhost-BLK device exposes
an NVMe bdev that is hotremoved, all the I/O traffic on that Vhost-BLK device will
be aborted - possibly flooding the VM with syslog warnings and errors.

### Hot-attach

Hot-attach is done by simply attaching a bdev to a vhost controller while the QEMU VM
is already started. No other extra action is necessary.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

### Hot-detach

Just like hot-attach, hot-detach is done by simply removing a bdev from a controller
while the QEMU VM is already started.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

Removing an entire bdev will hot-detach it from a controller as well.

~~~{.sh}
scripts/rpc.py bdev_malloc_delete Malloc0
~~~

# Known bugs and limitations {#vhost_bugs}

## Vhost-NVMe (experimental) requires a recent Linux guest kernel

The Vhost-NVMe target relies on a new feature of the NVMe 1.3 specification, the Doorbell
Buffer Config admin command, which is used only for emulated NVMe controllers. Linux 4.12
added support for this command, so a guest kernel of version 4.12 or newer is required to
test this feature.
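A quick way to check whether a given guest is new enough is to print the kernel
version from inside the VM:

~~~{.sh}
guest:~# uname -r
~~~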
## Windows virtio-blk driver before version 0.1.130-1 only works with 512-byte sectors

The Windows `viostor` driver before version 0.1.130-1 is buggy and does not
correctly support vhost-blk devices with non-512-byte block size.
See the [bug report](https://bugzilla.redhat.com/show_bug.cgi?id=1411092) for
more information.

## QEMU vhost-user-blk

QEMU [vhost-user-blk](https://git.qemu.org/?p=qemu.git;a=commit;h=00343e4b54ba) is
supported from version 2.12.
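To check which QEMU version is installed, and therefore whether vhost-user-blk is
available, run:

~~~{.sh}
qemu-system-x86_64 --version
~~~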