# vhost Target {#vhost}

## Table of Contents {#vhost_toc}

- @ref vhost_intro
- @ref vhost_prereqs
- @ref vhost_start
- @ref vhost_config
- @ref vhost_qemu_config
- @ref vhost_example
- @ref vhost_advanced_topics
- @ref vhost_bugs

## Introduction {#vhost_intro}

A vhost target provides a local storage service as a process running on a local machine.
It is capable of exposing virtualized block devices to QEMU instances or other arbitrary
processes.

The following diagram presents how a QEMU-based VM communicates with an SPDK Vhost-SCSI device.

![QEMU/SPDK vhost data flow](img/qemu_vhost_data_flow.svg)

The diagram, and the vhost protocol itself, are described in the @ref vhost_processing doc.

SPDK provides an accelerated vhost target by applying the same user space and polling
techniques as other components in SPDK.  Since SPDK is polling for vhost submissions,
it can signal the VM to skip notifications on submission.  This avoids VMEXITs on I/O
submission and can significantly reduce CPU usage in the VM on heavy I/O workloads.

## Prerequisites {#vhost_prereqs}

This guide assumes SPDK has been built according to the instructions in @ref
getting_started.  The SPDK vhost target is built with the default configure options.

### Vhost Command Line Parameters {#vhost_cmd_line_args}

Additional command line flags are available for the vhost target.

Param    | Type     | Default                | Description
-------- | -------- | ---------------------- | -----------
-S       | string   | $PWD                   | Directory where UNIX domain sockets will be created

### Supported Guest Operating Systems

The guest OS must contain virtio-scsi or virtio-blk drivers.  Most Linux and FreeBSD
distributions include virtio drivers.
[Windows virtio drivers](https://fedoraproject.org/wiki/Windows_Virtio_Drivers) must be
installed separately.  The SPDK vhost target has been tested with recent versions of Ubuntu,
Fedora, and Windows.

### QEMU

Userspace vhost-scsi target support was added to upstream QEMU in v2.10.0.  Run
the following command to confirm your QEMU supports userspace vhost-scsi.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-scsi-pci,help
~~~

Userspace vhost-blk target support was added to upstream QEMU in v2.12.0.  Run
the following command to confirm your QEMU supports userspace vhost-blk.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-blk-pci,help
~~~

## Starting SPDK vhost target {#vhost_start}

First, run the SPDK setup.sh script to set up hugepages for the SPDK vhost target
application.  This will allocate 4096MiB (4GiB) of hugepages, enough for the SPDK
vhost target and the virtual machine.

~~~{.sh}
HUGEMEM=4096 scripts/setup.sh
~~~

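To confirm that the hugepages were allocated, you can inspect the kernel's hugepage
counters; this is standard Linux tooling rather than an SPDK command.

~~~{.sh}
# HugePages_Total / HugePages_Free show the current hugepage allocation
grep Huge /proc/meminfo
~~~
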
Next, start the SPDK vhost target application.  The following command will start vhost
on CPU cores 0 and 1 (cpumask 0x3) with all future socket files placed in /var/tmp.
Vhost will fully occupy the given CPU cores for I/O polling. Particular vhost devices can
be restricted to run on a subset of these CPU cores. See @ref vhost_vdev_create for
details.

~~~{.sh}
build/bin/vhost -S /var/tmp -m 0x3
~~~

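Once the target is running, it accepts JSON-RPC commands over its RPC socket. As an
optional sanity check that the target is responsive, you can query its version,
assuming the default RPC socket location is used:

~~~{.sh}
scripts/rpc.py spdk_get_version
~~~
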
To list all available vhost options, use the following command.

~~~{.sh}
build/bin/vhost -h
~~~

## SPDK Configuration {#vhost_config}

### Create bdev (block device) {#vhost_bdev_create}

SPDK bdevs are block devices which will be exposed to the guest OS.
For vhost-scsi, bdevs are exposed as SCSI LUNs on SCSI devices attached to the
vhost-scsi controller in the guest OS.
For vhost-blk, bdevs are exposed directly as block devices in the guest OS and are
not associated at all with SCSI.

SPDK supports several different types of storage backends, including NVMe,
Linux AIO, malloc ramdisk, and Ceph RBD.  Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 64MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
~~~

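To verify that the bdev was created as expected, the configured bdevs can be listed
with the `bdev_get_bdevs` RPC, optionally filtered by name:

~~~{.sh}
scripts/rpc.py bdev_get_bdevs -b Malloc0
~~~
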
### Create a vhost device {#vhost_vdev_create}

#### Vhost-SCSI

The following RPC will create a vhost-scsi controller which can be accessed
by QEMU via /var/tmp/vhost.0. At the time of creation the controller will be
bound to a single CPU core - the one currently handling the fewest vhost controllers.
The optional `--cpumask` parameter can directly specify which cores should be
taken into account - in this case always CPU 0. To achieve optimal performance
on NUMA systems, the cpumask should specify cores on the same CPU socket as its
associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
~~~

The following RPC will attach the Malloc0 bdev to the vhost.0 vhost-scsi
controller.  Malloc0 will appear as a single LUN on a SCSI device with
target ID 0. The SPDK Vhost-SCSI device currently supports only one LUN per SCSI target.
Additional LUNs can be added by specifying a different target ID.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

To remove a bdev from a vhost-scsi controller, use the following RPC:

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

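At any point, the current vhost configuration can be inspected with the
`vhost_get_controllers` RPC, which reports each controller's name, cpumask and
backend devices:

~~~{.sh}
scripts/rpc.py vhost_get_controllers
~~~
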
#### Vhost-BLK

The following RPC will create a vhost-blk device exposing the Malloc0 bdev.
The device will be accessible to QEMU via /var/tmp/vhost.1. All the I/O polling
will be pinned to the least occupied CPU core within the given cpumask - in this case
always CPU 0. For NUMA systems, the cpumask should specify cores on the same CPU
socket as its associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
~~~

It is also possible to create a read-only vhost-blk device by specifying an
extra `-r` or `--readonly` parameter.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 -r vhost.1 Malloc0
~~~

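A vhost device that is no longer needed and is not in use by a VM can be removed with
the `vhost_delete_controller` RPC, which applies to both vhost-scsi and vhost-blk
controllers:

~~~{.sh}
scripts/rpc.py vhost_delete_controller vhost.1
~~~
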
### QEMU {#vhost_qemu_config}

Now the virtual machine can be started with QEMU.  The following command-line
parameters must be added to connect the virtual machine to its vhost controller.

First, specify the memory backend for the virtual machine.  Since QEMU must
share the virtual machine's memory with the SPDK vhost target, the memory
must be specified in this format with share=on.

~~~{.sh}
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem
~~~

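If your QEMU supports it (2.12 and later), a memfd-backed memory object may be used
instead of an explicit hugepage file; the key requirement remains that the guest
memory is hugepage-backed and shared with the vhost target. A minimal sketch, assuming
memfd support in your QEMU build:

~~~{.sh}
-object memory-backend-memfd,id=mem,size=1G,share=on,hugetlb=on
-numa node,memdev=mem
~~~
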
Second, ensure QEMU boots from the virtual machine image and not the
SPDK malloc block device by specifying bootindex=0 for the boot image.

~~~{.sh}
-drive file=guest_os_image.qcow2,if=none,id=disk
-device ide-hd,drive=disk,bootindex=0
~~~

Finally, specify the SPDK vhost devices:

#### Vhost-SCSI

~~~{.sh}
-chardev socket,id=char0,path=/var/tmp/vhost.0
-device vhost-user-scsi-pci,id=scsi0,chardev=char0
~~~

#### Vhost-BLK

~~~{.sh}
-chardev socket,id=char1,path=/var/tmp/vhost.1
-device vhost-user-blk-pci,id=blk0,chardev=char1
~~~

### Example output {#vhost_example}

This example uses an NVMe bdev alongside malloc bdevs. The SPDK vhost application is
started on CPU cores 0 and 1, and QEMU on cores 2 and 3.

~~~{.sh}
host:~# HUGEMEM=2048 ./scripts/setup.sh
0000:01:00.0 (8086 0953): nvme -> vfio-pci
~~~

~~~{.sh}
host:~# ./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: vhost -c 3 -m 1024 --main-lcore=1 --file-prefix=spdk_pid156014 ]
EAL: Detected 48 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
app.c: 369:spdk_app_start: *NOTICE*: Total cores available: 2
reactor.c: 668:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 1 on socket 0
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t pcie -a 0000:01:00.0
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   using IOMMU type 1 (Type 1)
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 128 4096 -b Malloc0
Malloc0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: vhost-user server: socket created, fd: 21
VHOST_CONFIG: bind to /var/tmp/vhost.0
vhost.c: 596:spdk_vhost_dev_construct: *NOTICE*: Controller vhost.0: new controller added
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Nvme0n1
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 0' using lun 'Nvme0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc0
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 1' using lun 'Malloc0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 64 512 -b Malloc1
Malloc1
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_blk_controller --cpumask 0x2 vhost.1 Malloc1
vhost_blk.c: 719:spdk_vhost_blk_construct: *NOTICE*: Controller vhost.1: using bdev 'Malloc1'
~~~

~~~{.sh}
host:~# taskset -c 2,3 qemu-system-x86_64 \
  --enable-kvm \
  -cpu host -smp 2 \
  -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
  -drive file=guest_os_image.qcow2,if=none,id=disk \
  -device ide-hd,drive=disk,bootindex=0 \
  -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
  -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=2 \
  -chardev socket,id=spdk_vhost_blk0,path=/var/tmp/vhost.1 \
  -device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=2
~~~

Please note the following two commands are run on the guest VM.

~~~{.sh}
guest:~# lsblk --output "NAME,KNAME,MODEL,HCTL,SIZE,VENDOR,SUBSYSTEMS"
NAME   KNAME MODEL            HCTL         SIZE VENDOR   SUBSYSTEMS
sda    sda   QEMU HARDDISK    1:0:0:0       80G ATA      block:scsi:pci
  sda1 sda1                                 80G          block:scsi:pci
sdb    sdb   NVMe disk        2:0:0:0    372,6G INTEL    block:scsi:virtio:pci
sdc    sdc   Malloc disk      2:0:1:0      128M INTEL    block:scsi:virtio:pci
vda    vda                                 128M 0x1af4   block:virtio:pci
~~~

~~~{.sh}
guest:~# poweroff
~~~

~~~{.sh}
host:~# fg
<< CTRL + C >>
vhost.c:1006:session_shutdown: *NOTICE*: Exiting
~~~

We can see that `sdb` and `sdc` are SPDK vhost-scsi LUNs, and `vda` is an SPDK
vhost-blk disk.

## Advanced Topics {#vhost_advanced_topics}

### Multi-Queue Block Layer (blk-mq) {#vhost_multiqueue}

For best performance use the Linux kernel block multi-queue feature with vhost.
To enable it on Linux, the kernel boot options inside the virtual machine need to
be modified.

The instructions below are for Ubuntu:

1. `vi /etc/default/grub`
2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"`
3. `sudo update-grub`
4. Reboot virtual machine

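After the guest reboots, you can check whether blk-mq is active for the SCSI stack;
this reads a standard sysfs parameter inside the VM (on recent kernels blk-mq is
always enabled and this parameter may no longer exist).

~~~{.sh}
# run inside the guest; prints Y when scsi-mq is enabled
cat /sys/module/scsi_mod/parameters/use_blk_mq
~~~
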
To achieve better performance, make sure to increase the number of cores assigned
to the VM and add a `num_queues` parameter to the QEMU `device`. Setting `num_queues=4`
should be enough to saturate the physical device. Adding too many queues might degrade
SPDK vhost performance if many vhost devices are used, because each device will require
additional `num_queues` to be polled.

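For example, reusing the chardev IDs from the earlier QEMU snippets, devices with four
queues could be declared as below. Note that the vhost-user-scsi device spells the
property `num_queues`, while the vhost-user-blk device spells it `num-queues`.

~~~{.sh}
-device vhost-user-scsi-pci,id=scsi0,chardev=char0,num_queues=4
-device vhost-user-blk-pci,id=blk0,chardev=char1,num-queues=4
~~~
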
Some Linux distributions report a kernel panic when starting the VM if the number of I/O queues
specified via the `num-queues` parameter is greater than the number of vCPUs. If you need to use
more I/O queues than vCPUs, check that your OS image supports that configuration.

### Hot-attach/hot-detach {#vhost_hotattach}

Hotplug/hotremove within a vhost controller is called hot-attach/detach. This is to
distinguish it from SPDK bdev hotplug/hotremove. For example, if an NVMe bdev is attached
to a vhost-scsi controller, physically hot-removing the NVMe device will trigger vhost-scsi
hot-detach. It is also possible to hot-detach a bdev manually via RPC - for example
when the bdev is about to be attached to another controller. See the details below.

Please also note that hot-attach/detach is Vhost-SCSI-specific. There are no RPCs
to hot-attach/detach a bdev from a Vhost-BLK device. If a Vhost-BLK device exposes
an NVMe bdev that is hot-removed, all the I/O traffic on that Vhost-BLK device will
be aborted - possibly flooding the VM with syslog warnings and errors.

#### Hot-attach

Hot-attach is done by simply attaching a bdev to a vhost controller while the QEMU VM
is already running. No other extra action is necessary.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

#### Hot-detach

Just like hot-attach, hot-detach is done by simply removing a bdev from a controller
while the QEMU VM is running.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

Removing an entire bdev will hot-detach it from a controller as well.

~~~{.sh}
scripts/rpc.py bdev_malloc_delete Malloc0
~~~

## Known bugs and limitations {#vhost_bugs}

### Windows virtio-blk driver before version 0.1.130-1 only works with 512-byte sectors

The Windows `viostor` driver before version 0.1.130-1 is buggy and does not
correctly support vhost-blk devices with non-512-byte block size.
See the [bug report](https://bugzilla.redhat.com/show_bug.cgi?id=1411092) for
more information.

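If such a Windows guest must be supported, a simple workaround is to create the backing
bdev with a 512-byte block size, as in the earlier examples:

~~~{.sh}
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
~~~
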
### QEMU vhost-user-blk

QEMU [vhost-user-blk](https://git.qemu.org/?p=qemu.git;a=commit;h=00343e4b54ba) is
supported from version 2.12.
