..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2016 Intel Corporation.

Vhost Library
=============

The vhost library implements a user space virtio net server allowing the user
to manipulate the virtio ring directly. In other words, it allows the user
to fetch/put packets from/to the VM virtio net device. To achieve this, a
vhost library should be able to:

* Access the guest memory:

  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
  option, which means QEMU will create a file to serve as the guest RAM.
  The ``share=on`` option allows another process to map that file, which
  means it can access the guest RAM.

* Know all the necessary information about the vring:

  Information such as where the available ring is stored. Vhost defines some
  messages (passed through a Unix domain socket file) to tell the backend all
  the information it needs to know how to manipulate the vring.
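
The effect of ``share=on`` can be illustrated without QEMU. The following is a
minimal sketch, assuming a POSIX system: it maps one temporary file twice with
``MAP_SHARED``, the two mappings standing in for QEMU's view and the vhost
backend's view of the guest RAM file. The function name
``shared_mapping_demo`` and the temporary file template are made up for this
illustration and are not part of DPDK or QEMU.

.. code-block:: c

   #define _POSIX_C_SOURCE 200809L
   #include <stdlib.h>
   #include <string.h>
   #include <sys/mman.h>
   #include <unistd.h>

   /*
    * Hypothetical helper: map the same file twice with MAP_SHARED and check
    * that a write through one mapping is visible through the other.
    * Returns 1 if visible, 0 if not, -1 on setup failure.
    */
   int
   shared_mapping_demo(void)
   {
       char path[] = "/tmp/guest-ram-XXXXXX"; /* illustrative file name */
       const size_t len = 4096;
       int ret = -1;
       int fd = mkstemp(path);

       if (fd < 0)
           return -1;
       unlink(path); /* the open descriptor keeps the file alive */

       if (ftruncate(fd, (off_t)len) == 0) {
           /* first mapping plays QEMU, second plays the backend */
           char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
           char *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);

           if (a != MAP_FAILED && b != MAP_FAILED) {
               strcpy(a, "packet");
               ret = (strcmp(b, "packet") == 0);
           }
           if (a != MAP_FAILED)
               munmap(a, len);
           if (b != MAP_FAILED)
               munmap(b, len);
       }
       close(fd);
       return ret;
   }

In a real deployment the two mappings live in different processes, and the
backend obtains the file descriptor through the vhost messages rather than by
opening the file itself.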


Vhost API Overview
------------------

The following is an overview of some key Vhost API functions:

* ``rte_vhost_driver_register(path, flags)``

  This function registers a vhost driver into the system. ``path`` specifies
  the Unix domain socket file path.

  Currently supported flags are:

  - ``RTE_VHOST_USER_CLIENT``

    DPDK vhost-user will act as the client when this flag is given. See below
    for an explanation.

  - ``RTE_VHOST_USER_NO_RECONNECT``

    When DPDK vhost-user acts as the client, it will keep trying to reconnect
    to the server (QEMU) until it succeeds. This is useful in two cases:

    * When QEMU is not started yet.
    * When QEMU restarts (for example due to a guest OS reboot).

    This reconnect option is enabled by default. However, it can be turned off
    by setting this flag.

  - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY``

    Dequeue zero copy will be enabled when this flag is set. It is disabled by
    default.

    There are some facts (including limitations) worth knowing before
    setting this flag:

    * Zero copy is not good for small packets (typically for packet sizes
      below 512 bytes).

    * Zero copy is really good for the VM2VM case. For iperf between two VMs,
      the boost could be above 70% (when TSO is enabled).

    * For zero copy in the VM2NIC case, the guest Tx used vring may be starved
      if the PMD driver consumes the mbufs but does not release them in a
      timely manner.

      For example, the i40e driver has an optimization to maximize the NIC
      pipeline which postpones returning transmitted mbufs until only
      ``tx_free_threshold`` free descriptors are left. The virtio Tx used ring
      will be starved if the formula
      ``(num_i40e_tx_desc - num_virtio_tx_desc > tx_free_threshold)`` is true,
      since i40e will not return the mbufs.

      A performance tip for tuning zero copy in the VM2NIC case is to adjust
      the frequency of mbuf free (i.e. adjust ``tx_free_threshold`` of the
      i40e driver) to balance consumer and producer.

    * Guest memory should be backed by huge pages to achieve better
      performance. Using 1G page size is the best.

      When dequeue zero copy is enabled, the guest phys address and host phys
      address mapping has to be established. Using non-huge pages means far
      more page segments. To make it simple, DPDK vhost does a linear search
      of those segments, thus the fewer the segments, the quicker we will get
      the mapping. NOTE: we may speed it up by using tree searching in the
      future.

    * Zero copy currently cannot work when using vfio-pci in IOMMU mode,
      because the IOMMU DMA mapping is not set up for guest memory. If you
      have to use the vfio-pci driver, please insert the vfio-pci kernel
      module in noiommu mode.

    * The consumer of zero copy mbufs should consume these mbufs as soon as
      possible, otherwise it may block the operations in vhost.

  - ``RTE_VHOST_USER_IOMMU_SUPPORT``

    IOMMU support will be enabled when this flag is set. It is disabled by
    default.

    Enabling this flag makes it possible to use guest vIOMMU to protect vhost
    from accessing memory the virtio device isn't allowed to, when the feature
    is negotiated and an IOMMU device is declared.

  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``

    Postcopy live-migration support will be enabled when this flag is set.
    It is disabled by default.

    Enabling this flag should only be done when the calling application does
    not pre-fault the guest shared memory, otherwise migration would fail.

  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``

    Enabling this flag forces the vhost dequeue function to only provide
    linear pktmbufs (no multi-segmented pktmbufs).

    The vhost library by default provides a single pktmbuf for a given
    packet, but if for some reason the data doesn't fit into a single
    pktmbuf (e.g., TSO is enabled), the library will allocate additional
    pktmbufs from the same mempool and chain them together to create a
    multi-segmented pktmbuf.

    However, the vhost application needs to support the multi-segmented
    format. If the vhost application does not support that format and
    requires large buffers to be dequeued, this flag should be enabled to
    force only linear buffers (see ``RTE_VHOST_USER_EXTBUF_SUPPORT``) or drop
    the packet.

    It is disabled by default.

  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``

    Enabling this flag allows the vhost dequeue function to allocate and
    attach an external buffer to a pktmbuf if the pktmbuf doesn't provide
    enough space to store all data.

    This is useful when the vhost application wants to support large packets
    but doesn't want to increase the default mempool object size nor to
    support multi-segmented mbufs (non-linear). In this case, a fresh buffer
    is allocated using ``rte_malloc()`` which gets attached to a pktmbuf using
    ``rte_pktmbuf_attach_extbuf()``.

    See ``RTE_VHOST_USER_LINEARBUF_SUPPORT`` as well to disable
    multi-segmented mbufs for applications that don't support chained mbufs.

    It is disabled by default.

  - ``RTE_VHOST_USER_ASYNC_COPY``

    Asynchronous data path will be enabled when this flag is set. Async data
    path allows applications to register async copy devices (typically
    hardware DMA channels) to the vhost queues. Vhost leverages the
    registered copy device to free the CPU from memory copy operations. A set
    of async data path APIs are defined for DPDK applications to make use of
    the async capability. Only packets enqueued/dequeued by async APIs are
    processed through the async data path.

    Currently this feature is only implemented on the split ring enqueue data
    path.

    It is disabled by default.

* ``rte_vhost_driver_set_features(path, features)``

  This function sets the feature bits the vhost-user driver supports. The
  vhost-user driver could be vhost-user net, yet it could be something else,
  say, vhost-user SCSI.

* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``

  This function registers a set of callbacks, to let DPDK applications take
  the appropriate action when some events happen. The following events are
  currently supported:

  * ``new_device(int vid)``

    This callback is invoked when a virtio device becomes ready. ``vid``
    is the vhost device ID.

  * ``destroy_device(int vid)``

    This callback is invoked when a virtio device is paused or shut down.

  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``

    This callback is invoked when a specific queue's state is changed, for
    example to enabled or disabled.

  * ``features_changed(int vid, uint64_t features)``

    This callback is invoked when the features are changed. For example,
    ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live
    migration, respectively.

  * ``new_connection(int vid)``

    This callback is invoked on a new vhost-user socket connection. If DPDK
    acts as the server, the device should not be deleted before the
    ``destroy_connection`` callback is received.

  * ``destroy_connection(int vid)``

    This callback is invoked when the vhost-user socket connection is closed.
    It indicates that the device with ID ``vid`` is no longer in use and can
    be safely deleted.

* ``rte_vhost_driver_disable/enable_features(path, features)``

  This function disables/enables some features. For example, it can be used to
  disable mergeable buffers and TSO features, which are both enabled by
  default.

* ``rte_vhost_driver_start(path)``

  This function triggers the vhost-user negotiation. It should be invoked at
  the end of initializing a vhost-user driver.

* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``

  Transmits (enqueues) ``count`` packets from host to guest.

* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``

  Receives (dequeues) ``count`` packets from guest, and stores them in
  ``pkts``.

* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)``

  As an extension of ``new_device()``, this function adds virtio-crypto
  workload acceleration capability to the device. All crypto workload is
  processed by DPDK cryptodev with the device ID of ``cryptodev_id``.

* ``rte_vhost_crypto_free(vid)``

  Frees the memory and vhost-user message handlers created in
  ``rte_vhost_crypto_create()``.

* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)``

  Receives (dequeues) ``nb_ops`` virtio-crypto requests from guest, parses
  them to DPDK Crypto Operations, and fills the ``ops`` with parsing results.

* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)``

  After the ``ops`` are dequeued from Cryptodev, finalizes the jobs and
  notifies the guest(s).

* ``rte_vhost_crypto_set_zero_copy(vid, option)``

  Enables or disables the zero copy feature of the vhost crypto backend.

* ``rte_vhost_async_channel_register(vid, queue_id, features, ops)``

  Registers a vhost queue with an async copy device channel.
  The following device ``features`` must be specified together with the
  registration:

  * ``async_inorder``

    The async copy device can guarantee the ordering of the copy completion
    sequence. Copies are completed in the same order as they were
    submitted.

    Currently, only ``async_inorder`` capable devices are supported by vhost.

  * ``async_threshold``

    The copy length (in bytes) below which CPU copy will be used even if
    applications call async vhost APIs to enqueue/dequeue data.

    A typical value is 512 to 1024, depending on the async device capability.

  Applications must provide the following ``ops`` callbacks for the vhost
  library to work with the async copy devices:

  * ``transfer_data(vid, queue_id, descs, opaque_data, count)``

    vhost invokes this function to submit copy data to the async devices.
    For non-async_inorder capable devices, ``opaque_data`` could be used
    for identifying the completed packets.

  * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)``

    vhost invokes this function to get the copy data completed by async
    devices.

* ``rte_vhost_async_channel_unregister(vid, queue_id)``

  Unregisters the async copy device channel from a vhost queue.

* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count)``

  Submits an enqueue request to transmit ``count`` packets from host to guest
  by the async data path. Enqueue is not guaranteed to finish upon the return
  of this API call.

  Applications must not free the packets submitted for enqueue until the
  packets are completed.

* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)``

  Polls enqueue completion status from the async data path. Completed packets
  are returned to applications through ``pkts``.
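
Two of the dequeue zero copy notes listed under
``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` can be modelled in a few lines of plain C.
The sketch below is not the library's code: the structure
``guest_page_model`` and both function names are assumptions made for this
illustration. The first function evaluates the starvation formula for the
VM2NIC case. The second performs a linear search over guest physical to host
physical segments of the kind the text describes.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical model of one physically contiguous guest memory segment. */
   struct guest_page_model {
       uint64_t guest_phys_addr;
       uint64_t host_phys_addr;
       uint64_t size;
   };

   /*
    * Starvation formula from the text:
    * num_nic_tx_desc - num_virtio_tx_desc > tx_free_threshold.
    * Signed 64-bit arithmetic avoids unsigned wrap-around when the virtio
    * ring is the larger of the two.
    */
   int
   zero_copy_tx_may_starve(uint32_t num_nic_tx_desc,
                           uint32_t num_virtio_tx_desc,
                           uint32_t tx_free_threshold)
   {
       int64_t diff = (int64_t)num_nic_tx_desc - (int64_t)num_virtio_tx_desc;

       return diff > (int64_t)tx_free_threshold;
   }

   /*
    * Linear search: translate a guest physical address to a host physical
    * address. Returns 0 when [gpa, gpa + len) does not lie fully inside a
    * single segment.
    */
   uint64_t
   gpa_to_hpa_model(const struct guest_page_model *pages, size_t nr_pages,
                    uint64_t gpa, uint64_t len)
   {
       size_t i;

       for (i = 0; i < nr_pages; i++) {
           const struct guest_page_model *p = &pages[i];

           if (gpa >= p->guest_phys_addr &&
               len <= p->size &&
               gpa - p->guest_phys_addr <= p->size - len)
               return gpa - p->guest_phys_addr + p->host_phys_addr;
       }
       return 0;
   }

With 1G pages the same amount of guest memory is covered by far fewer
segments, so the loop in ``gpa_to_hpa_model`` visits fewer entries per lookup.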

Vhost-user Implementations
--------------------------

Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:

* DPDK vhost-user acts as the server.

  DPDK will create a Unix domain socket server file and listen for
  connections from the frontend.

  Note, this is the default mode, and the only mode before DPDK v16.07.


* DPDK vhost-user acts as the client.

  Unlike the server mode, this mode doesn't create the socket file;
  it just tries to connect to the server (which is responsible for creating
  the file instead).

  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
  connect to the server again. This is how the "reconnect" feature works.

  .. Note::
     * The "reconnect" feature requires **QEMU v2.7** (or above).

     * The vhost supported features must be exactly the same before and
       after the restart. For example, if TSO is disabled and then enabled,
       nothing will work and undefined issues might happen.
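
The client-side behaviour can be pictured as a retry loop around ``connect()``
on a Unix domain socket. This is a simplified, self-contained sketch using
POSIX sockets only. The function name ``connect_with_retry``, the bounded
attempt count and the fixed delay are assumptions chosen to keep the example
finite; the library's own reconnect logic is not reproduced here.

.. code-block:: c

   #include <string.h>
   #include <sys/socket.h>
   #include <sys/un.h>
   #include <unistd.h>

   /*
    * Hypothetical helper: try to connect to the Unix domain socket at 'path',
    * retrying up to 'max_attempts' times with 'delay_sec' seconds in between.
    * Returns a connected descriptor, or -1 if the server never showed up.
    */
   int
   connect_with_retry(const char *path, int max_attempts,
                      unsigned int delay_sec)
   {
       struct sockaddr_un addr;
       int attempt;

       if (strlen(path) >= sizeof(addr.sun_path))
           return -1;
       memset(&addr, 0, sizeof(addr));
       addr.sun_family = AF_UNIX;
       strcpy(addr.sun_path, path);

       for (attempt = 0; attempt < max_attempts; attempt++) {
           int fd = socket(AF_UNIX, SOCK_STREAM, 0);

           if (fd < 0)
               return -1;
           if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
               return fd; /* the server side (e.g. QEMU) is listening */
           close(fd); /* not there yet: back off and try again */
           if (attempt + 1 < max_attempts)
               sleep(delay_sec);
       }
       return -1;
   }

The same loop covers both cases named for the reconnect option: a server that
has not started yet and a server that went away and came back.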

No matter which mode is used, once a connection is established, DPDK
vhost-user will start receiving and processing vhost messages from QEMU.

For messages with a file descriptor, the file descriptor can be used directly
in the vhost process as it is already installed by the Unix domain socket.

The supported vhost messages are:

* ``VHOST_SET_MEM_TABLE``
* ``VHOST_SET_VRING_KICK``
* ``VHOST_SET_VRING_CALL``
* ``VHOST_SET_LOG_FD``
* ``VHOST_SET_VRING_ERR``

For the ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each
memory region and its file descriptor in the ancillary data of the message.
The file descriptor is used to map that region.
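
On a Unix domain socket, a file descriptor carried in the ancillary data of a
message is sent and received as an ``SCM_RIGHTS`` control message through
``sendmsg()`` and ``recvmsg()``. The sketch below shows that mechanism in
isolation using only POSIX calls. ``send_fd`` and ``recv_fd`` are illustrative
names, a single descriptor per message is assumed, and the vhost message
layout itself is not modelled.

.. code-block:: c

   #include <string.h>
   #include <sys/socket.h>
   #include <sys/uio.h>

   /* Control buffer sized and aligned for exactly one descriptor. */
   union fd_ctrl {
       char buf[CMSG_SPACE(sizeof(int))];
       struct cmsghdr align;
   };

   /* Send one byte of payload plus 'fd' as SCM_RIGHTS ancillary data. */
   int
   send_fd(int sock, int fd)
   {
       char payload = 'F';
       struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
       union fd_ctrl ctrl;
       struct msghdr msg;
       struct cmsghdr *cmsg;

       memset(&msg, 0, sizeof(msg));
       memset(&ctrl, 0, sizeof(ctrl));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = ctrl.buf;
       msg.msg_controllen = sizeof(ctrl.buf);

       cmsg = CMSG_FIRSTHDR(&msg);
       cmsg->cmsg_level = SOL_SOCKET;
       cmsg->cmsg_type = SCM_RIGHTS;
       cmsg->cmsg_len = CMSG_LEN(sizeof(int));
       memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

       return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
   }

   /* Receive one message and return the descriptor it carries, or -1. */
   int
   recv_fd(int sock)
   {
       char payload;
       struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
       union fd_ctrl ctrl;
       struct msghdr msg;
       struct cmsghdr *cmsg;
       int fd = -1;

       memset(&msg, 0, sizeof(msg));
       memset(&ctrl, 0, sizeof(ctrl));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = ctrl.buf;
       msg.msg_controllen = sizeof(ctrl.buf);

       if (recvmsg(sock, &msg, 0) <= 0)
           return -1;

       for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
            cmsg = CMSG_NXTHDR(&msg, cmsg)) {
           if (cmsg->cmsg_level == SOL_SOCKET &&
               cmsg->cmsg_type == SCM_RIGHTS) {
               /* the kernel has already installed the descriptor */
               memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
               break;
           }
       }
       return fd;
   }

The descriptor returned by ``recv_fd`` is valid in the receiving process as
soon as ``recvmsg()`` returns, which is why it can be passed straight to
``mmap()`` to map a memory region.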

``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.

When the socket connection is closed, vhost will destroy the device.

Guest memory requirement
------------------------

* Memory pre-allocation

  For the non-zerocopy, non-async data path, guest memory pre-allocation is
  not a must. This can help save memory. If users really want the guest
  memory to be pre-allocated (e.g., for performance reasons), the
  ``-mem-prealloc`` option can be added when starting QEMU. Alternatively,
  all memory can be locked at the vhost side, which forces memory to be
  allocated when it is mapped there; the ``--mlockall`` option in ovs-dpdk
  is an example.

  For the async and zerocopy data paths, the vhost library forces the VM
  memory to be pre-allocated when mapping the guest memory, and the memory
  also needs to be locked to prevent pages from being swapped out to disk.

* Memory sharing

  Make sure the ``share=on`` QEMU option is given. vhost-user will not work
  with a QEMU version without shared memory mapping.

Vhost supported vSwitch reference
---------------------------------

For more vhost details and how to support vhost in vSwitch, please refer to
the vhost example in the DPDK Sample Applications Guide.

Vhost data path acceleration (vDPA)
-----------------------------------

vDPA supports selective datapath in vhost-user lib by enabling virtio ring
compatible devices to serve the virtio driver directly for datapath
acceleration.

``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost device
with an accelerated backend.

Also, vhost device capabilities are made configurable to adapt to various
devices. Such capabilities include supported features, protocol features, and
queue number.

Finally, a set of device ops is defined for device specific operations:

* ``get_queue_num``

  Called to get supported queue number of the device.

* ``get_features``

  Called to get supported features of the device.

* ``get_protocol_features``

  Called to get supported protocol features of the device.

* ``dev_conf``

  Called to configure the actual device when the virtio device becomes ready.

* ``dev_close``

  Called to close the actual device when the virtio device is stopped.

* ``set_vring_state``

  Called to change the state of the vring in the actual device when vring state
  changes.

* ``set_features``

  Called to set the negotiated features to device.

* ``migration_done``

  Called to allow the device to respond to RARP sending.

* ``get_vfio_group_fd``

  Called to get the VFIO group fd of the device.

* ``get_vfio_device_fd``

  Called to get the VFIO device fd of the device.

* ``get_notify_area``

  Called to get the notify area info of the queue.