# vhost Target {#vhost}

# Table of Contents {#vhost_toc}

- @ref vhost_intro
- @ref vhost_prereqs
- @ref vhost_start
- @ref vhost_config
- @ref vhost_qemu_config
- @ref vhost_example
- @ref vhost_advanced_topics
- @ref vhost_bugs

# Introduction {#vhost_intro}

A vhost target provides a local storage service as a process running on a local machine.
It is capable of exposing virtualized block devices to QEMU instances or other arbitrary
processes.

The following diagram presents how a QEMU-based VM communicates with an SPDK Vhost-SCSI device.

![QEMU/SPDK vhost data flow](img/qemu_vhost_data_flow.svg)

The diagram, as well as the vhost protocol itself, is described in the @ref vhost_processing doc.

SPDK provides an accelerated vhost target by applying the same user space and polling
techniques as other components in SPDK.  Since SPDK is polling for vhost submissions,
it can signal the VM to skip notifications on submission.  This avoids VMEXITs on I/O
submission and can significantly reduce CPU usage in the VM under heavy I/O workloads.

# Prerequisites {#vhost_prereqs}

This guide assumes SPDK has been built according to the instructions in @ref
getting_started.  The SPDK vhost target is built with the default configure options.

## Vhost Command Line Parameters {#vhost_cmd_line_args}

Additional command line flags are available for the vhost target.

Param    | Type     | Default                | Description
-------- | -------- | ---------------------- | -----------
-S       | string   | $PWD                   | directory where UNIX domain sockets will be created
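
For example, launching the target with `-S /var/tmp` causes each vhost controller to be
exposed as a UNIX domain socket under that directory; a controller later created as
`vhost.0` would then be reachable at /var/tmp/vhost.0. A minimal illustration:

~~~{.sh}
# Place all vhost-user sockets under /var/tmp instead of the current directory.
build/bin/vhost -S /var/tmp
~~~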

## Supported Guest Operating Systems

The guest OS must contain virtio-scsi or virtio-blk drivers.  Most Linux and FreeBSD
distributions include virtio drivers.
[Windows virtio drivers](https://fedoraproject.org/wiki/Windows_Virtio_Drivers) must be
installed separately.  The SPDK vhost target has been tested with recent versions of Ubuntu,
Fedora, and Windows.

## QEMU

Userspace vhost-scsi target support was added to upstream QEMU in v2.10.0.  Run
the following command to confirm your QEMU supports userspace vhost-scsi.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-scsi-pci,help
~~~

Userspace vhost-blk target support was added to upstream QEMU in v2.12.0.  Run
the following command to confirm your QEMU supports userspace vhost-blk.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-blk-pci,help
~~~

The userspace vhost-nvme target was added as an experimental feature in the SPDK 18.04
release; the corresponding QEMU patches are available only in SPDK's QEMU repository.

Run the following command to confirm your QEMU supports userspace vhost-nvme.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-nvme,help
~~~

# Starting SPDK vhost target {#vhost_start}

First, run the SPDK setup.sh script to set up some hugepages for the SPDK vhost target
application.  This will allocate 4096MiB (4GiB) of hugepages, enough for the SPDK
vhost target and the virtual machine.

~~~{.sh}
HUGEMEM=4096 scripts/setup.sh
~~~

Next, start the SPDK vhost target application.  The following command will start vhost
on CPU cores 0 and 1 (cpumask 0x3) with all future socket files placed in /var/tmp.
Vhost will fully occupy the given CPU cores for I/O polling. Individual vhost devices can
be restricted to run on a subset of these CPU cores. See @ref vhost_vdev_create for
details.

~~~{.sh}
build/bin/vhost -S /var/tmp -m 0x3
~~~

To list all available vhost options, use the following command.

~~~{.sh}
build/bin/vhost -h
~~~

# SPDK Configuration {#vhost_config}

## Create bdev (block device) {#vhost_bdev_create}

SPDK bdevs are block devices which will be exposed to the guest OS.
For vhost-scsi, bdevs are exposed as SCSI LUNs on SCSI devices attached to the
vhost-scsi controller in the guest OS.
For vhost-blk, bdevs are exposed directly as block devices in the guest OS and are
not associated at all with SCSI.

SPDK supports several different types of storage backends, including NVMe,
Linux AIO, malloc ramdisk and Ceph RBD.  Refer to @ref bdev for
additional information on configuring SPDK storage backends.
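
As an illustration of a non-malloc backend, the RPC below attaches a local NVMe controller
as a bdev (the same RPC appears in the example output later in this guide). The PCI address
`0000:01:00.0` is a placeholder that must match a device in your system.

~~~{.sh}
# Attach the NVMe controller at PCI address 0000:01:00.0 (placeholder) and expose
# its namespaces as bdevs named Nvme0n1, Nvme0n2, ...
scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t pcie -a 0000:01:00.0
~~~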

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 64MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
~~~

## Create a vhost device {#vhost_vdev_create}

### Vhost-SCSI

The following RPC will create a vhost-scsi controller which can be accessed
by QEMU via /var/tmp/vhost.0. At the time of creation the controller will be
bound to a single CPU core - the one with the fewest vhost controllers already assigned.
The optional `--cpumask` parameter can directly specify which cores should be
taken into account - in this case always CPU 0. To achieve optimal performance
on NUMA systems, the cpumask should specify cores on the same CPU socket as its
associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
~~~

The following RPC will attach the Malloc0 bdev to the vhost.0 vhost-scsi
controller.  Malloc0 will appear as a single LUN on a SCSI device with
target ID 0. An SPDK Vhost-SCSI device currently supports only one LUN per SCSI target.
Additional LUNs can be added by specifying a different target ID, as shown below.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~
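
For instance, assuming a second bdev named Malloc1 has already been created, it could be
exposed through the same controller as another SCSI target:

~~~{.sh}
# Hypothetical second bdev exposed as target ID 1 on the same vhost-scsi controller.
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc1
~~~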

To remove a bdev from a vhost-scsi controller, use the following RPC:

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

### Vhost-BLK

The following RPC will create a vhost-blk device exposing the Malloc0 bdev.
The device will be accessible to QEMU via /var/tmp/vhost.1. All the I/O polling
will be pinned to the least occupied CPU core within the given cpumask - in this case
always CPU 0. For NUMA systems, the cpumask should specify cores on the same CPU
socket as its associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
~~~

It is also possible to create a read-only vhost-blk device by specifying an
extra `-r` or `--readonly` parameter.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 -r vhost.1 Malloc0
~~~

### Vhost-NVMe (experimental)

The following RPCs will create a vhost-nvme controller named vhost.2 and attach the
Malloc0 bdev to it. Malloc0 will appear as Namespace 1 of the vhost.2 controller. The
`--cpumask` parameter can be used to specify which cores should be used for this
controller. The maximum number of I/O queues supported by the controller must be
specified, and at least one Namespace is required for each controller.

~~~{.sh}
scripts/rpc.py vhost_create_nvme_controller --cpumask 0x1 vhost.2 16
scripts/rpc.py vhost_nvme_controller_add_ns vhost.2 Malloc0
~~~

The following command removes the controller; all block devices attached to the
controller's Namespaces will be removed from it automatically.

~~~{.sh}
scripts/rpc.py vhost_delete_controller vhost.2
~~~

## QEMU {#vhost_qemu_config}

Now the virtual machine can be started with QEMU.  The following command-line
parameters must be added to connect the virtual machine to its vhost controller.

First, specify the memory backend for the virtual machine.  Since QEMU must
share the virtual machine's memory with the SPDK vhost target, the memory
must be specified in this format with share=on.

~~~{.sh}
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem
~~~

Second, ensure QEMU boots from the virtual machine image and not the
SPDK malloc block device by specifying bootindex=0 for the boot image.

~~~{.sh}
-drive file=guest_os_image.qcow2,if=none,id=disk
-device ide-hd,drive=disk,bootindex=0
~~~

Finally, specify the SPDK vhost devices:

### Vhost-SCSI

~~~{.sh}
-chardev socket,id=char0,path=/var/tmp/vhost.0
-device vhost-user-scsi-pci,id=scsi0,chardev=char0
~~~

### Vhost-BLK

~~~{.sh}
-chardev socket,id=char1,path=/var/tmp/vhost.1
-device vhost-user-blk-pci,id=blk0,chardev=char1
~~~

### Vhost-NVMe (experimental)

~~~{.sh}
-chardev socket,id=char2,path=/var/tmp/vhost.2
-device vhost-user-nvme,id=nvme0,chardev=char2,num_io_queues=4
~~~

## Example output {#vhost_example}

This example uses an NVMe bdev alongside malloc bdevs. The SPDK vhost application is started
on CPU cores 0 and 1, and QEMU on cores 2 and 3.

~~~{.sh}
host:~# HUGEMEM=2048 ./scripts/setup.sh
0000:01:00.0 (8086 0953): nvme -> vfio-pci
~~~

~~~{.sh}
host:~# ./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: vhost -c 3 -m 1024 --master-lcore=1 --file-prefix=spdk_pid156014 ]
EAL: Detected 48 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
app.c: 369:spdk_app_start: *NOTICE*: Total cores available: 2
reactor.c: 668:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 1 on socket 0
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t pcie -a 0000:01:00.0
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   using IOMMU type 1 (Type 1)
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 128 4096 Malloc0
Malloc0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: vhost-user server: socket created, fd: 21
VHOST_CONFIG: bind to /var/tmp/vhost.0
vhost.c: 596:spdk_vhost_dev_construct: *NOTICE*: Controller vhost.0: new controller added
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Nvme0n1
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 0' using lun 'Nvme0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc0
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 1' using lun 'Malloc0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 64 512 -b Malloc1
Malloc1
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_blk_controller --cpumask 0x2 vhost.1 Malloc1
vhost_blk.c: 719:spdk_vhost_blk_construct: *NOTICE*: Controller vhost.1: using bdev 'Malloc1'
~~~

~~~{.sh}
host:~# taskset -c 2,3 qemu-system-x86_64 \
  --enable-kvm \
  -cpu host -smp 2 \
  -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
  -drive file=guest_os_image.qcow2,if=none,id=disk \
  -device ide-hd,drive=disk,bootindex=0 \
  -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
  -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 \
  -chardev socket,id=spdk_vhost_blk0,path=/var/tmp/vhost.1 \
  -device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=4
~~~

Please note that the following two commands are run on the guest VM.

~~~{.sh}
guest:~# lsblk --output "NAME,KNAME,MODEL,HCTL,SIZE,VENDOR,SUBSYSTEMS"
NAME   KNAME MODEL            HCTL         SIZE VENDOR   SUBSYSTEMS
sda    sda   QEMU HARDDISK    1:0:0:0       80G ATA      block:scsi:pci
  sda1 sda1                                 80G          block:scsi:pci
sdb    sdb   NVMe disk        2:0:0:0    372,6G INTEL    block:scsi:virtio:pci
sdc    sdc   Malloc disk      2:0:1:0      128M INTEL    block:scsi:virtio:pci
vda    vda                                 128M 0x1af4   block:virtio:pci
~~~

~~~{.sh}
guest:~# poweroff
~~~

~~~{.sh}
host:~# fg
<< CTRL + C >>
vhost.c:1006:session_shutdown: *NOTICE*: Exiting
~~~

We can see that `sdb` and `sdc` are SPDK vhost-scsi LUNs, and `vda` is an SPDK
vhost-blk disk.

# Advanced Topics {#vhost_advanced_topics}

## Multi-Queue Block Layer (blk-mq) {#vhost_multiqueue}

For best performance, use the Linux kernel block multi-queue (blk-mq) feature with vhost.
To enable it on Linux, the kernel options inside the virtual machine must be modified.

Instructions for Ubuntu are below; a verification sketch follows the list.

1. `vi /etc/default/grub`
2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"`
3. `sudo update-grub`
4. Reboot the virtual machine
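
A minimal verification sketch, assuming the guest kernel still exposes `use_blk_mq` as a
`scsi_mod` module parameter (newer kernels use blk-mq unconditionally and may not provide
this file):

~~~{.sh}
# Inside the guest: prints Y once blk-mq is enabled for the SCSI layer.
cat /sys/module/scsi_mod/parameters/use_blk_mq
~~~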

To achieve better performance, make sure to increase the number of cores
assigned to the VM and add the `num_queues` parameter to the QEMU `-device` option, as shown
below. Setting `num_queues=4` should be enough to saturate the physical device. Adding too
many queues might lead to SPDK vhost performance degradation if many vhost devices are used,
because each device will require additional `num_queues` to be polled.
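
For example, the QEMU device definitions from the example output above request four queues
per device; note that the spelling differs between devices (`num_queues` for
vhost-user-scsi-pci, `num-queues` for vhost-user-blk-pci):

~~~{.sh}
-device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4
-device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=4
~~~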

## Hot-attach/hot-detach {#vhost_hotattach}

Hotplug/hotremove within a vhost controller is called hot-attach/detach. This is to
distinguish it from SPDK bdev hotplug/hotremove. E.g. if an NVMe bdev is attached
to a vhost-scsi controller, physically hotremoving the NVMe will trigger vhost-scsi
hot-detach. It is also possible to hot-detach a bdev manually via RPC - for example
when the bdev is about to be attached to another controller. See the details below.

Please also note that hot-attach/detach is Vhost-SCSI-specific. There are no RPCs
to hot-attach/detach a bdev from a Vhost-BLK device. If a Vhost-BLK device exposes
an NVMe bdev that is hotremoved, all the I/O traffic on that Vhost-BLK device will
be aborted - possibly flooding the VM with syslog warnings and errors.

### Hot-attach

Hot-attach is done by simply attaching a bdev to a vhost controller while the QEMU VM
is already started. No other extra action is necessary.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

### Hot-detach

Just like hot-attach, hot-detach is done by simply removing a bdev from a controller
while the QEMU VM is already started.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

Deleting an entire bdev will hot-detach it from a controller as well.

~~~{.sh}
scripts/rpc.py bdev_malloc_delete Malloc0
~~~

# Known bugs and limitations {#vhost_bugs}

## Vhost-NVMe (experimental) is only supported with recent Linux guest kernels

The vhost-NVMe target relies on a feature of the NVMe 1.3 specification, the Doorbell
Buffer Config admin command, which is defined for emulated NVMe controllers only. Linux
gained support for this command in kernel 4.12, so a guest kernel of version 4.12 or newer
is required to test this feature.
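
A quick sanity check is to confirm the kernel version from inside the guest:

~~~{.sh}
# Inside the guest: the reported version should be 4.12 or newer.
guest:~# uname -r
~~~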

## Windows virtio-blk driver before version 0.1.130-1 only works with 512-byte sectors

The Windows `viostor` driver before version 0.1.130-1 is buggy and does not
correctly support vhost-blk devices with non-512-byte block size.
See the [bug report](https://bugzilla.redhat.com/show_bug.cgi?id=1411092) for
more information.
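
As a workaround sketch for guests running an affected driver, back the vhost-blk device
with a bdev that uses a 512-byte block size, as in the malloc example earlier in this guide:

~~~{.sh}
# 64 MB malloc bdev with a 512-byte block size, compatible with old viostor drivers.
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
~~~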

## QEMU vhost-user-blk

QEMU [vhost-user-blk](https://git.qemu.org/?p=qemu.git;a=commit;h=00343e4b54ba) is
supported from version 2.12.
417