# vhost Target {#vhost}

# Table of Contents {#vhost_toc}

- @ref vhost_intro
- @ref vhost_prereqs
- @ref vhost_start
- @ref vhost_config
- @ref vhost_qemu_config
- @ref vhost_example
- @ref vhost_advanced_topics
- @ref vhost_bugs

# Introduction {#vhost_intro}

A vhost target provides a local storage service as a process running on a local machine.
It is capable of exposing virtualized block devices to QEMU instances or other arbitrary
processes.

The following diagram presents how a QEMU-based VM communicates with an SPDK Vhost-SCSI device.

![QEMU/SPDK vhost data flow](img/qemu_vhost_data_flow.svg)

The diagram, and the vhost protocol itself, are described in the @ref vhost_processing doc.

SPDK provides an accelerated vhost target by applying the same user space and polling
techniques as other components in SPDK.  Since SPDK is polling for vhost submissions,
it can signal the VM to skip notifications on submission.  This avoids VMEXITs on I/O
submission and can significantly reduce CPU usage in the VM on heavy I/O workloads.

# Prerequisites {#vhost_prereqs}

This guide assumes the SPDK has been built according to the instructions in @ref
getting_started.  The SPDK vhost target is built with the default configure options.

## Vhost Command Line Parameters {#vhost_cmd_line_args}

Additional command line flags are available for the vhost target.

Param    | Type     | Default                | Description
-------- | -------- | ---------------------- | -----------
-S       | string   | $PWD                   | directory where UNIX domain sockets will be created
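
For example, starting the target with `-S /var/tmp` causes a controller created later with
the name `vhost.0` to be exposed at `/var/tmp/vhost.0`:

~~~{.sh}
build/bin/vhost -S /var/tmp
~~~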

## Supported Guest Operating Systems

The guest OS must contain virtio-scsi or virtio-blk drivers.  Most Linux and FreeBSD
distributions include virtio drivers.
[Windows virtio drivers](https://fedoraproject.org/wiki/Windows_Virtio_Drivers) must be
installed separately.  The SPDK vhost target has been tested with recent versions of Ubuntu,
Fedora, and Windows.
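
In a Linux guest, one quick way to check for the drivers is to look for the virtio modules.
Note that on some distributions the drivers are built directly into the kernel, in which case
the modules will not be listed even though the drivers are available:

~~~{.sh}
lsmod | grep -e virtio_scsi -e virtio_blk
~~~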

## QEMU

Userspace vhost-scsi target support was added to upstream QEMU in v2.10.0.  Run
the following command to confirm your QEMU supports userspace vhost-scsi.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-scsi-pci,help
~~~

Userspace vhost-blk target support was added to upstream QEMU in v2.12.0.  Run
the following command to confirm your QEMU supports userspace vhost-blk.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-blk-pci,help
~~~

The userspace vhost-nvme target was added as an experimental feature in the SPDK 18.04
release; patches for QEMU are available only in SPDK's QEMU repository.

Run the following command to confirm your QEMU supports userspace vhost-nvme.

~~~{.sh}
qemu-system-x86_64 -device vhost-user-nvme,help
~~~

# Starting SPDK vhost target {#vhost_start}

First, run the SPDK setup.sh script to set up hugepages for the SPDK vhost target
application.  This will allocate 4096MiB (4GiB) of hugepages, enough for the SPDK
vhost target and the virtual machine.

~~~{.sh}
HUGEMEM=4096 scripts/setup.sh
~~~
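
You can confirm the hugepage allocation by inspecting `/proc/meminfo`:

~~~{.sh}
grep Huge /proc/meminfo
~~~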

Next, start the SPDK vhost target application.  The following command will start vhost
on CPU cores 0 and 1 (cpumask 0x3) with all future socket files placed in /var/tmp.
Vhost will fully occupy the given CPU cores for I/O polling. Individual vhost devices can
be restricted to run on a subset of these CPU cores. See @ref vhost_vdev_create for
details.

~~~{.sh}
build/bin/vhost -S /var/tmp -m 0x3
~~~

To list all available vhost options, use the following command.

~~~{.sh}
build/bin/vhost -h
~~~

# SPDK Configuration {#vhost_config}

## Create bdev (block device) {#vhost_bdev_create}

SPDK bdevs are block devices which will be exposed to the guest OS.
For vhost-scsi, bdevs are exposed as SCSI LUNs on SCSI devices attached to the
vhost-scsi controller in the guest OS.
For vhost-blk, bdevs are exposed directly as block devices in the guest OS and are
not associated at all with SCSI.

SPDK supports several different types of storage backends, including NVMe,
Linux AIO, malloc ramdisk and Ceph RBD.  Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 64MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
~~~
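
You can verify that the bdev was created with the `bdev_get_bdevs` RPC, which lists
registered bdevs and their properties:

~~~{.sh}
scripts/rpc.py bdev_get_bdevs -b Malloc0
~~~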

## Create a vhost device {#vhost_vdev_create}

### Vhost-SCSI

The following RPC will create a vhost-scsi controller which can be accessed
by QEMU via /var/tmp/vhost.0. At the time of creation the controller will be
bound to a single CPU core with the smallest number of vhost controllers.
The optional `--cpumask` parameter can directly specify which cores should be
taken into account - in this case always CPU 0. To achieve optimal performance
on NUMA systems, the cpumask should specify cores on the same CPU socket as its
associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
~~~

The following RPC will attach the Malloc0 bdev to the vhost.0 vhost-scsi
controller.  Malloc0 will appear as a single LUN on a SCSI device with
target ID 0. The SPDK vhost-scsi device currently supports only one LUN per SCSI target.
Additional LUNs can be added by specifying a different target ID, as shown below.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~
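
For example, if a second bdev has been created (here a hypothetical `Malloc1`), it can be
exposed on the same controller under target ID 1:

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc1
~~~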

To remove a bdev from a vhost-scsi controller, use the following RPC:

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

### Vhost-BLK

The following RPC will create a vhost-blk device exposing the Malloc0 bdev.
The device will be accessible to QEMU via /var/tmp/vhost.1. All the I/O polling
will be pinned to the least occupied CPU core within the given cpumask - in this case
always CPU 0. For NUMA systems, the cpumask should specify cores on the same CPU
socket as its associated VM.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 vhost.1 Malloc0
~~~
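
At this point you can list all configured vhost controllers and their backends with the
`vhost_get_controllers` RPC:

~~~{.sh}
scripts/rpc.py vhost_get_controllers
~~~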

It is also possible to create a read-only vhost-blk device by specifying an
extra `-r` or `--readonly` parameter.

~~~{.sh}
scripts/rpc.py vhost_create_blk_controller --cpumask 0x1 -r vhost.1 Malloc0
~~~

## QEMU {#vhost_qemu_config}

Now the virtual machine can be started with QEMU.  The following command-line
parameters must be added to connect the virtual machine to its vhost controller.

First, specify the memory backend for the virtual machine.  Since QEMU must
share the virtual machine's memory with the SPDK vhost target, the memory
must be specified in this format with share=on.

~~~{.sh}
-object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem
~~~
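
The size of the memory backend should match the amount of memory given to the VM. For a
1 GiB guest, the combination would look like the following (mirroring the full command
in the example output below):

~~~{.sh}
-m 1G -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem
~~~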

Second, ensure QEMU boots from the virtual machine image and not the
SPDK malloc block device by specifying bootindex=0 for the boot image.

~~~{.sh}
-drive file=guest_os_image.qcow2,if=none,id=disk
-device ide-hd,drive=disk,bootindex=0
~~~

Finally, specify the SPDK vhost devices:

### Vhost-SCSI

~~~{.sh}
-chardev socket,id=char0,path=/var/tmp/vhost.0
-device vhost-user-scsi-pci,id=scsi0,chardev=char0
~~~

### Vhost-BLK

~~~{.sh}
-chardev socket,id=char1,path=/var/tmp/vhost.1
-device vhost-user-blk-pci,id=blk0,chardev=char1
~~~

## Example output {#vhost_example}

This example uses an NVMe bdev alongside malloc bdevs. The SPDK vhost application is started
on CPU cores 0 and 1, and QEMU on cores 2 and 3.

~~~{.sh}
host:~# HUGEMEM=2048 ./scripts/setup.sh
0000:01:00.0 (8086 0953): nvme -> vfio-pci
~~~

~~~{.sh}
host:~# ./build/bin/vhost -S /var/tmp -s 1024 -m 0x3 &
Starting DPDK 17.11.0 initialization...
[ DPDK EAL parameters: vhost -c 3 -m 1024 --master-lcore=1 --file-prefix=spdk_pid156014 ]
EAL: Detected 48 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
app.c: 369:spdk_app_start: *NOTICE*: Total cores available: 2
reactor.c: 668:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 1 on socket 0
reactor.c: 424:_spdk_reactor_run: *NOTICE*: Reactor started on core 0 on socket 0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t pcie -a 0000:01:00.0
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:953 spdk_nvme
EAL:   using IOMMU type 1 (Type 1)
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 128 4096 Malloc0
Malloc0
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: vhost-user server: socket created, fd: 21
VHOST_CONFIG: bind to /var/tmp/vhost.0
vhost.c: 596:spdk_vhost_dev_construct: *NOTICE*: Controller vhost.0: new controller added
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Nvme0n1
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 0' using lun 'Nvme0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_scsi_controller_add_target vhost.0 1 Malloc0
vhost_scsi.c: 840:spdk_vhost_scsi_dev_add_tgt: *NOTICE*: Controller vhost.0: defined target 'Target 1' using lun 'Malloc0'
~~~

~~~{.sh}
host:~# ./scripts/rpc.py bdev_malloc_create 64 512 -b Malloc1
Malloc1
~~~

~~~{.sh}
host:~# ./scripts/rpc.py vhost_create_blk_controller --cpumask 0x2 vhost.1 Malloc1
vhost_blk.c: 719:spdk_vhost_blk_construct: *NOTICE*: Controller vhost.1: using bdev 'Malloc1'
~~~

~~~{.sh}
host:~# taskset -c 2,3 qemu-system-x86_64 \
  --enable-kvm \
  -cpu host -smp 2 \
  -m 1G -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
  -drive file=guest_os_image.qcow2,if=none,id=disk \
  -device ide-hd,drive=disk,bootindex=0 \
  -chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
  -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 \
  -chardev socket,id=spdk_vhost_blk0,path=/var/tmp/vhost.1 \
  -device vhost-user-blk-pci,chardev=spdk_vhost_blk0,num-queues=4
~~~

Please note the following two commands are run on the guest VM.

~~~{.sh}
guest:~# lsblk --output "NAME,KNAME,MODEL,HCTL,SIZE,VENDOR,SUBSYSTEMS"
NAME   KNAME MODEL            HCTL         SIZE VENDOR   SUBSYSTEMS
sda    sda   QEMU HARDDISK    1:0:0:0       80G ATA      block:scsi:pci
  sda1 sda1                                 80G          block:scsi:pci
sdb    sdb   NVMe disk        2:0:0:0    372,6G INTEL    block:scsi:virtio:pci
sdc    sdc   Malloc disk      2:0:1:0      128M INTEL    block:scsi:virtio:pci
vda    vda                                 128M 0x1af4   block:virtio:pci
~~~

~~~{.sh}
guest:~# poweroff
~~~

~~~{.sh}
host:~# fg
<< CTRL + C >>
vhost.c:1006:session_shutdown: *NOTICE*: Exiting
~~~

We can see that `sdb` and `sdc` are SPDK vhost-scsi LUNs, and `vda` is an SPDK
vhost-blk disk.

# Advanced Topics {#vhost_advanced_topics}

## Multi-Queue Block Layer (blk-mq) {#vhost_multiqueue}

For best performance use the Linux kernel block multi-queue feature with vhost.
To enable it on Linux, modify the kernel boot options inside the virtual machine.

Instructions below for Ubuntu OS:

1. `vi /etc/default/grub`
2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"`
3. `sudo update-grub`
4. Reboot the virtual machine and verify the setting as shown below
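
After the reboot, one way to confirm that the guest is using blk-mq for SCSI is to read the
module parameter. This applies to guest kernels where `scsi_mod.use_blk_mq` is still a
tunable; newer kernels always use blk-mq and may not expose this parameter:

~~~{.sh}
cat /sys/module/scsi_mod/parameters/use_blk_mq
~~~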

To achieve better performance, make sure to increase the number of cores
assigned to the VM and add the `num_queues` parameter to the QEMU `device`. It should be enough
to set `num_queues=4` to saturate the physical device. Adding too many queues might lead to SPDK
vhost performance degradation if many vhost devices are used, because each device adds
`num_queues` additional queues that must be polled.
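
For example, the vhost-user-scsi device from the QEMU configuration above can be given four
queues like this (the chardev id is the one defined earlier):

~~~{.sh}
-device vhost-user-scsi-pci,id=scsi0,chardev=char0,num_queues=4
~~~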

## Hot-attach/hot-detach {#vhost_hotattach}

Hotplug/hotremove within a vhost controller is called hot-attach/detach. This is to
distinguish it from SPDK bdev hotplug/hotremove. For example, if an NVMe bdev is attached
to a vhost-scsi controller, physically hotremoving the NVMe will trigger vhost-scsi
hot-detach. It is also possible to hot-detach a bdev manually via RPC - for example
when the bdev is about to be attached to another controller. See the details below.

Please also note that hot-attach/detach is Vhost-SCSI-specific. There are no RPCs
to hot-attach/detach a bdev from a Vhost-BLK device. If a Vhost-BLK device exposes
an NVMe bdev that is hotremoved, all the I/O traffic on that Vhost-BLK device will
be aborted - possibly flooding the VM with syslog warnings and errors.

### Hot-attach

Hot-attach is done by simply attaching a bdev to a vhost controller while the QEMU VM
is already running. No other action is necessary.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 Malloc0
~~~

### Hot-detach

Just like hot-attach, hot-detach is done by simply removing the bdev from a controller
while the QEMU VM is running.

~~~{.sh}
scripts/rpc.py vhost_scsi_controller_remove_target vhost.0 0
~~~

Deleting an entire bdev will hot-detach it from the controller as well.

~~~{.sh}
scripts/rpc.py bdev_malloc_delete Malloc0
~~~

# Known bugs and limitations {#vhost_bugs}

## Vhost-NVMe (experimental) is only supported with recent Linux guest kernels

The vhost-NVMe target relies on the Doorbell Buffer Config admin command, a feature introduced
in the NVMe 1.3 specification specifically for emulated NVMe controllers. Linux 4.12 added this
feature, so a guest kernel of version 4.12 or later is required to test it.

## Windows virtio-blk driver before version 0.1.130-1 only works with 512-byte sectors

The Windows `viostor` driver before version 0.1.130-1 is buggy and does not
correctly support vhost-blk devices with non-512-byte block size.
See the [bug report](https://bugzilla.redhat.com/show_bug.cgi?id=1411092) for
more information.

## QEMU vhost-user-blk

QEMU [vhost-user-blk](https://git.qemu.org/?p=qemu.git;a=commit;h=00343e4b54ba) is
supported from version 2.12.
390