..  BSD LICENSE
    Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Vhost Library
=============

The vhost library implements a user space virtio net server, allowing the user
to manipulate the virtio ring directly. In other words, it allows the user
to fetch/put packets from/to the VM virtio net device. To achieve this, a
vhost library should be able to:

* Access the guest memory:

  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
  option, which means QEMU will create a file to serve as the guest RAM.
  The ``share=on`` option allows another process to map that file, which
  means that process can access the guest RAM.

* Know all the necessary information about the vring:

  For example, where the available ring is stored. Vhost defines a set of
  messages that tell the backend everything it needs to know to manipulate
  the vring.
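
The effect of ``share=on`` can be sketched with plain POSIX calls: two
``MAP_SHARED`` mappings of the same file see the same bytes, which is what
lets a separate vhost process read and write the guest RAM. This is a minimal,
single-process illustration; the temporary file and the hypothetical
``map_shared_file()`` helper stand in for the file QEMU creates and for the
mapping done on the vhost side.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Map the first len bytes of the file at path with MAP_SHARED, so that
     * stores through one mapping are visible through every other mapping. */
    static void *map_shared_file(const char *path, size_t len)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping outlives the descriptor */
        return p == MAP_FAILED ? NULL : p;
    }

    int main(void)
    {
        char path[] = "/tmp/guest-ram-XXXXXX";
        const size_t len = 4096;

        /* "QEMU" side: create and size the file that backs guest RAM. */
        int fd = mkstemp(path);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0)
            return 1;
        close(fd);

        /* In reality these two mappings live in two different processes. */
        char *guest_ram = map_shared_file(path, len);
        char *vhost_view = map_shared_file(path, len);
        if (guest_ram == NULL || vhost_view == NULL)
            return 1;

        strcpy(guest_ram, "packet data");                /* guest writes */
        assert(strcmp(vhost_view, "packet data") == 0);  /* vhost reads */
        printf("vhost sees: %s\n", vhost_view);

        munmap(guest_ram, len);
        munmap(vhost_view, len);
        unlink(path);
        return 0;
    }

In a real deployment the two mappings belong to QEMU and to the vhost
process; the sharing works the same way because both map the same file with
``MAP_SHARED``.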

Currently, there are two ways to pass these messages and, as a result, there
are two Vhost implementations in DPDK: *vhost-cuse* (where the character
devices are in user space) and *vhost-user*.

Vhost-cuse creates a user space character device and hooks into its ioctl
handler, so that all ioctl commands sent from the frontend (QEMU) are
captured and handled.

Vhost-user creates a Unix domain socket file through which messages are
passed.

.. Note::

   Since DPDK v2.2, the majority of the development effort has gone into
   enhancing vhost-user, with features such as multiple queues, live
   migration, and reconnect. Thus, it is strongly advised to use vhost-user
   instead of vhost-cuse.


Vhost API Overview
------------------

The following is an overview of the Vhost API functions:

* ``rte_vhost_driver_register(path, flags)``

  This function registers a vhost driver into the system. For vhost-cuse, a
  ``/dev/path`` character device file will be created. For vhost-user server
  mode, a Unix domain socket file ``path`` will be created.

  Currently two flags are supported (these are valid for vhost-user only):

  - ``RTE_VHOST_USER_CLIENT``

    DPDK vhost-user will act as the client when this flag is given. See below
    for an explanation.

  - ``RTE_VHOST_USER_NO_RECONNECT``

    When DPDK vhost-user acts as the client, it will keep trying to reconnect
    to the server (QEMU) until it succeeds. This is useful in two cases:

    * When QEMU is not started yet.
    * When QEMU restarts (for example due to a guest OS reboot).

    This reconnect option is enabled by default. However, it can be turned off
    by setting this flag.

* ``rte_vhost_driver_session_start()``

  This function starts the vhost session loop to handle vhost messages. It
  runs an infinite loop and does not return, so it should be called in a
  dedicated thread.

* ``rte_vhost_driver_callback_register(virtio_net_device_ops)``

  This function registers a set of callbacks that let DPDK applications take
  the appropriate action when certain events happen. The following events are
  currently supported:

  * ``new_device(int vid)``

    This callback is invoked when a virtio net device becomes ready. ``vid``
    is the virtio net device ID.

  * ``destroy_device(int vid)``

    This callback is invoked when a virtio net device shuts down (or when the
    vhost connection is broken).

  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``

    This callback is invoked when a specific queue's state changes, for
    example when it is enabled or disabled.

* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``

  Transmits (enqueues) up to ``count`` packets from the host to the guest and
  returns the number of packets actually enqueued.

* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``

  Receives (dequeues) up to ``count`` packets from the guest into mbufs
  allocated from ``mbuf_pool``, stores them in ``pkts``, and returns the
  number of packets actually dequeued.

* ``rte_vhost_feature_disable(feature_mask)`` and
  ``rte_vhost_feature_enable(feature_mask)``

  These functions disable/enable the features given in ``feature_mask``. For
  example, they can be used to disable the mergeable buffers and TSO features,
  both of which are enabled by default.
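
A feature mask is a bitmask of virtio feature bits, built with
``1ULL << bit``. The sketch below models disable/enable as plain bit
operations on a 64-bit mask. The bit positions are the virtio-net feature bits
from the virtio specification, while ``supported_features``,
``feature_disable()`` and ``feature_enable()`` are hypothetical stand-ins, not
the library's internals. With DPDK, a mask built the same way would be passed
to ``rte_vhost_feature_disable()`` or ``rte_vhost_feature_enable()``.

.. code-block:: c

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* virtio-net feature bit positions from the virtio specification. */
    #define VIRTIO_NET_F_HOST_TSO4 11
    #define VIRTIO_NET_F_HOST_TSO6 12
    #define VIRTIO_NET_F_MRG_RXBUF 15

    /* Hypothetical stand-in: only the three features used in this sketch. */
    static uint64_t supported_features =
        (1ULL << VIRTIO_NET_F_HOST_TSO4) |
        (1ULL << VIRTIO_NET_F_HOST_TSO6) |
        (1ULL << VIRTIO_NET_F_MRG_RXBUF);

    /* Clear every feature whose bit is set in feature_mask. */
    static void feature_disable(uint64_t feature_mask)
    {
        supported_features &= ~feature_mask;
    }

    /* Set every feature whose bit is set in feature_mask. */
    static void feature_enable(uint64_t feature_mask)
    {
        supported_features |= feature_mask;
    }

    int main(void)
    {
        /* Disable mergeable RX buffers and both TSO variants in one call. */
        feature_disable((1ULL << VIRTIO_NET_F_MRG_RXBUF) |
                        (1ULL << VIRTIO_NET_F_HOST_TSO4) |
                        (1ULL << VIRTIO_NET_F_HOST_TSO6));
        assert(supported_features == 0);

        /* Re-enable mergeable RX buffers only. */
        feature_enable(1ULL << VIRTIO_NET_F_MRG_RXBUF);
        printf("features: 0x%" PRIx64 "\n", supported_features); /* 0x8000 */
        return 0;
    }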
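
The callbacks described above are usually thin handlers that record which
devices and queues are usable by the data path. The sketch below shows that
application side. To compile without DPDK, it declares a local
``struct device_ops`` that mirrors the three callbacks listed above; the real
structure is ``virtio_net_device_ops`` from the DPDK vhost header, and the
return types used here (``int``, ``void``, ``int``) should be checked against
that header. All ``app_*`` names and the fixed-size tables are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_DEVICES 8
    #define MAX_QUEUES  4

    /* Local stand-in mirroring the three documented callbacks. */
    struct device_ops {
        int (*new_device)(int vid);
        void (*destroy_device)(int vid);
        int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);
    };

    /* Hypothetical application-side bookkeeping. */
    static bool dev_ready[MAX_DEVICES];
    static bool queue_enabled[MAX_DEVICES][MAX_QUEUES];

    static int app_new_device(int vid)
    {
        if (vid < 0 || vid >= MAX_DEVICES)
            return -1;              /* reject devices we cannot track */
        dev_ready[vid] = true;      /* the data path may now poll this vid */
        return 0;
    }

    static void app_destroy_device(int vid)
    {
        if (vid < 0 || vid >= MAX_DEVICES)
            return;
        dev_ready[vid] = false;     /* stop using this vid */
        for (int q = 0; q < MAX_QUEUES; q++)
            queue_enabled[vid][q] = false;
    }

    static int app_vring_state_changed(int vid, uint16_t queue_id, int enable)
    {
        if (vid < 0 || vid >= MAX_DEVICES || queue_id >= MAX_QUEUES)
            return -1;
        queue_enabled[vid][queue_id] = enable != 0;
        return 0;
    }

    static const struct device_ops app_ops = {
        .new_device = app_new_device,
        .destroy_device = app_destroy_device,
        .vring_state_changed = app_vring_state_changed,
    };

    int main(void)
    {
        /* Simulate the sequence of events the library would deliver. */
        if (app_ops.new_device(0) != 0 ||
            app_ops.vring_state_changed(0, 1, 1) != 0)
            return 1;
        printf("vid 0 ready=%d, queue 1 enabled=%d\n",
               dev_ready[0], queue_enabled[0][1]);

        app_ops.destroy_device(0);
        assert(!dev_ready[0] && !queue_enabled[0][1]);
        return 0;
    }

In a real application the handlers would be placed in a
``virtio_net_device_ops`` structure passed to
``rte_vhost_driver_callback_register()``. Because the callbacks run in the
thread that calls ``rte_vhost_driver_session_start()`` while packets are
usually processed on other cores, real code typically needs some
synchronization around this state, which the sketch omits.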

Vhost Implementations
---------------------

Vhost-cuse implementation
~~~~~~~~~~~~~~~~~~~~~~~~~

When the vSwitch registers the vhost driver, it will register a cuse device
driver into the system and create a character device file. This cuse driver
will receive vhost open/release/IOCTL messages from QEMU.

When the open call is received, the vhost driver will create a vhost device
for the virtio device in the guest.

When the ``VHOST_SET_MEM_TABLE`` ioctl is received, vhost searches the memory
region to find the starting user space virtual address that maps the memory of
the guest virtual machine. Through this virtual address and the QEMU pid,
vhost can find the file QEMU uses to map the guest memory. Vhost maps this
file into its own address space; in this way vhost can fully access the guest
physical memory, which means it can access the shared virtio ring and the
guest physical addresses specified in the ring entries.
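
Once the guest memory is mapped, each guest physical address found in a ring
entry has to be translated into an address in vhost's own address space. The
sketch below shows that translation as a linear search over the guest memory
regions. The ``struct mem_region`` layout and the ``gpa_to_vva()`` helper are
hypothetical illustrations rather than the library's actual data structures,
and returning 0 for an unmapped address is only a convention chosen for the
sketch. The same idea applies to the regions received with the vhost-user
``VHOST_SET_MEM_TABLE`` message.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One guest memory region (hypothetical layout for illustration). */
    struct mem_region {
        uint64_t guest_phys_addr; /* start of the region in guest physical space */
        uint64_t size;            /* length of the region in bytes */
        uint64_t host_user_addr;  /* where vhost mapped the region locally */
    };

    /* Translate a guest physical address to a vhost virtual address.
     * Returns 0 if no region covers the address. */
    static uint64_t gpa_to_vva(const struct mem_region *regions, size_t n,
                               uint64_t gpa)
    {
        for (size_t i = 0; i < n; i++) {
            const struct mem_region *r = &regions[i];
            /* Written as a subtraction to avoid overflow near the top. */
            if (gpa >= r->guest_phys_addr &&
                gpa - r->guest_phys_addr < r->size)
                return r->host_user_addr + (gpa - r->guest_phys_addr);
        }
        return 0;
    }

    int main(void)
    {
        /* Two regions: low memory, and memory starting at 4 GiB. */
        const struct mem_region regions[] = {
            { 0x0,            0x80000000ULL, 0x7f0000000000ULL },
            { 0x100000000ULL, 0x40000000ULL, 0x7f1000000000ULL },
        };
        uint64_t vva = gpa_to_vva(regions, 2, 0x100001000ULL);
        assert(vva == 0x7f1000001000ULL);
        printf("0x%llx\n", (unsigned long long)vva);
        return 0;
    }

A linear search is usually adequate because a guest has only a few memory
regions.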

The guest virtual machine tells vhost whether the virtio device is ready
for processing or has been de-activated through the ``VHOST_NET_SET_BACKEND``
message. The corresponding callback registered by the vSwitch is then called.

When the release call is made, vhost will destroy the device.

Vhost-user implementation
~~~~~~~~~~~~~~~~~~~~~~~~~

Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:

* DPDK vhost-user acts as the server.

  DPDK will create a Unix domain socket server file and listen for
  connections from the frontend.

  Note: this is the default mode, and it was the only mode available before
  DPDK v16.07.

* DPDK vhost-user acts as the client.

  Unlike the server mode, this mode doesn't create the socket file;
  it just tries to connect to the server (which is responsible for creating
  the file instead).

  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
  connect to the server again. This is how the "reconnect" feature works.

  Note: the "reconnect" feature requires **QEMU v2.7** (or above).
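
At the socket level, client mode comes down to retrying ``connect()`` on the
socket path until the server side has created and bound it. The sketch below
is a minimal illustration with plain POSIX sockets. ``connect_with_retry()``,
``listen_at()``, the socket path and the retry parameters are hypothetical,
and the sketch gives up after a fixed number of attempts, whereas DPDK
vhost-user keeps retrying until it succeeds unless
``RTE_VHOST_USER_NO_RECONNECT`` is set.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <time.h>
    #include <unistd.h>

    /* Fill a Unix domain socket address; returns -1 if path is too long. */
    static int fill_addr(struct sockaddr_un *addr, const char *path)
    {
        memset(addr, 0, sizeof(*addr));
        addr->sun_family = AF_UNIX;
        if (strlen(path) >= sizeof(addr->sun_path))
            return -1;
        strcpy(addr->sun_path, path);
        return 0;
    }

    /* Client side: retry connect() up to max_attempts times, sleeping
     * delay_ms between attempts. Returns a connected fd or -1. */
    static int connect_with_retry(const char *path, int max_attempts,
                                  long delay_ms)
    {
        struct sockaddr_un addr;
        struct timespec delay = { delay_ms / 1000,
                                  (delay_ms % 1000) * 1000000L };

        if (fill_addr(&addr, path) != 0)
            return -1;
        for (int i = 0; i < max_attempts; i++) {
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            if (fd < 0)
                return -1;
            if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                return fd;              /* the server is up */
            close(fd);                  /* not yet: wait and try again */
            if (i + 1 < max_attempts)
                nanosleep(&delay, NULL);
        }
        return -1;
    }

    /* Server side (the role QEMU plays in client mode): create the socket
     * file and listen on it. Returns the listening fd or -1. */
    static int listen_at(const char *path)
    {
        struct sockaddr_un addr;
        if (fill_addr(&addr, path) != 0)
            return -1;
        unlink(path);                   /* remove a stale socket file */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
            listen(fd, 1) != 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    int main(void)
    {
        const char *path = "/tmp/vhost-sketch.sock";
        unlink(path);

        /* No server yet: every attempt fails. */
        int cli = connect_with_retry(path, 3, 10);
        assert(cli == -1);

        /* Once the server has created the socket, connect succeeds. */
        int srv = listen_at(path);
        if (srv < 0)
            return 1;
        cli = connect_with_retry(path, 3, 10);
        if (cli < 0)
            return 1;
        printf("connected\n");

        close(cli);
        close(srv);
        unlink(path);
        return 0;
    }

The connection succeeds even though the server never calls ``accept()``,
because the kernel queues the pending connection on the listening socket.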

No matter which mode is used, once a connection is established, DPDK
vhost-user will start receiving and processing vhost messages from QEMU.

For messages that carry a file descriptor, the descriptor can be used directly
in the vhost process, because it has already been installed into the process
by the Unix domain socket when the message was received.
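
File descriptors travel over a Unix domain socket as ``SCM_RIGHTS`` ancillary
data; on receipt the kernel installs a new descriptor in the receiving process
that refers to the same open file. The sketch below demonstrates this with a
socket pair and a pipe. ``send_fd()``, ``recv_fd()`` and ``union fd_ctrl`` are
hypothetical helpers written for illustration, not part of the vhost library.

.. code-block:: c

    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Control buffer large enough for one fd, correctly aligned. */
    union fd_ctrl {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    };

    /* Send a one-byte payload with fd attached as SCM_RIGHTS data. */
    static int send_fd(int sock, int fd)
    {
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union fd_ctrl ctrl;
        struct msghdr msg;

        memset(&ctrl, 0, sizeof(ctrl));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }

    /* Receive one message; return the fd from its ancillary data, or -1. */
    static int recv_fd(int sock)
    {
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union fd_ctrl ctrl;
        struct msghdr msg;

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        if (recvmsg(sock, &msg, 0) <= 0)
            return -1;
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
                int fd;
                memcpy(&fd, CMSG_DATA(c), sizeof(int));
                return fd;  /* already a valid descriptor in this process */
            }
        }
        return -1;
    }

    int main(void)
    {
        int sv[2], pipefd[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0 || pipe(pipefd) != 0)
            return 1;

        /* One end ("QEMU") sends the pipe's write end to the other ("vhost"). */
        if (send_fd(sv[0], pipefd[1]) != 0)
            return 1;
        int received = recv_fd(sv[1]);
        if (received < 0)
            return 1;

        /* The received descriptor refers to the same pipe. */
        char c = 0;
        if (write(received, "k", 1) != 1 || read(pipefd[0], &c, 1) != 1)
            return 1;
        assert(c == 'k');
        printf("read '%c' back through fd %d\n", c, received);
        return 0;
    }

A real ``VHOST_SET_MEM_TABLE`` message can carry several descriptors (one per
memory region) in its ancillary data; the sketch handles only one.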

The supported vhost messages that carry a file descriptor include:

* ``VHOST_SET_MEM_TABLE``
* ``VHOST_SET_VRING_KICK``
* ``VHOST_SET_VRING_CALL``
* ``VHOST_SET_LOG_FD``
* ``VHOST_SET_VRING_ERR``

For the ``VHOST_SET_MEM_TABLE`` message, QEMU sends the description of each
memory region in the message payload and the corresponding file descriptors
in the ancillary data of the message. Each file descriptor is used to map its
region.

There is no ``VHOST_NET_SET_BACKEND`` message as in vhost-cuse to signal
whether the virtio device is ready or stopped. Instead,
``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.
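
The kick/get-base rule above can be pictured as a small state machine. The
sketch below is a simplified model, assuming a device with two vrings that
enters the data plane once every vring has received ``VHOST_SET_VRING_KICK``
and leaves it on the first ``VHOST_GET_VRING_BASE``. The ``struct vdev`` type,
the handler names and the call counters (standing in for the ``new_device()``
and ``destroy_device()`` callbacks) are hypothetical, and the library's real
readiness check may consider more than the kick descriptor.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_VRINGS 2  /* assumption: one RX and one TX ring */

    /* Hypothetical per-device state updated by the message handlers. */
    struct vdev {
        bool kick_set[NR_VRINGS];
        bool running;    /* true while the device is in the data plane */
    };

    /* Counters standing in for the new_device()/destroy_device() callbacks. */
    static int new_device_calls;
    static int destroy_device_calls;

    /* VHOST_SET_VRING_KICK: once every vring has a kick, enter the data plane. */
    static void on_set_vring_kick(struct vdev *d, int vring)
    {
        d->kick_set[vring] = true;
        for (int i = 0; i < NR_VRINGS; i++)
            if (!d->kick_set[i])
                return;                 /* some vring is not ready yet */
        if (!d->running) {
            d->running = true;
            new_device_calls++;
        }
    }

    /* VHOST_GET_VRING_BASE: leave the data plane (once) and reset the ring. */
    static void on_get_vring_base(struct vdev *d, int vring)
    {
        if (d->running) {
            d->running = false;
            destroy_device_calls++;
        }
        d->kick_set[vring] = false;
    }

    int main(void)
    {
        struct vdev dev = { { false, false }, false };

        on_set_vring_kick(&dev, 0);     /* first ring only: not ready yet */
        on_set_vring_kick(&dev, 1);     /* all rings kicked: device starts */
        on_get_vring_base(&dev, 0);     /* QEMU stops the first ring */
        on_get_vring_base(&dev, 1);     /* already stopped: no second destroy */

        assert(new_device_calls == 1 && destroy_device_calls == 1);
        printf("new_device: %d, destroy_device: %d\n",
               new_device_calls, destroy_device_calls);
        return 0;
    }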

When the socket connection is closed, vhost will destroy the device.

Vhost supported vSwitch reference
---------------------------------

For more details on vhost and how to support vhost in a vSwitch, please refer
to the vhost example in the DPDK Sample Applications Guide.