..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2016 Intel Corporation.

Vhost Library
=============

The vhost library implements a user space virtio net server allowing the user
to manipulate the virtio ring directly. In other words, it allows the user
to fetch/put packets from/to the VM virtio net device. To achieve this, a
vhost library should be able to:

* Access the guest memory:

  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
  option, which means QEMU will create a file to serve as the guest RAM.
  The ``share=on`` option allows another process to map that file, which
  means it can access the guest RAM.

* Know all the necessary information about the vring:

  Information such as where the available ring is stored. Vhost defines some
  messages (passed through a Unix domain socket file) to tell the backend all
  the information it needs to know how to manipulate the vring.

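
The effect of ``share=on`` can be seen without QEMU or DPDK. The following is
a minimal POSIX sketch, not vhost library code: it maps one temporary file
twice with ``MAP_SHARED``, with the two mappings standing in for the QEMU side
and the backend side, and checks that a write through one mapping is visible
through the other. The function name ``shared_mapping_demo`` and the ``/tmp``
file template are illustrative assumptions.

.. code-block:: c

   #define _GNU_SOURCE /* exposes mkstemp() under strict -std= modes */
   #include <stdlib.h>
   #include <string.h>
   #include <sys/mman.h>
   #include <unistd.h>

   /*
    * Map one file twice with MAP_SHARED and check that a write through the
    * first mapping is visible through the second one.
    * Returns 0 on success, -1 on any failure. Illustrative helper only.
    */
   int
   shared_mapping_demo(void)
   {
       char path[] = "/tmp/guest-ram-XXXXXX"; /* stand-in for the guest RAM file */
       const size_t len = 4096;
       char *frontend = MAP_FAILED; /* plays the role of the QEMU mapping */
       char *backend = MAP_FAILED;  /* plays the role of the backend mapping */
       int ret = -1;
       int fd;

       fd = mkstemp(path);
       if (fd < 0)
           return -1;
       unlink(path); /* the open descriptor keeps the file alive */

       if (ftruncate(fd, len) < 0)
           goto out;

       frontend = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
       backend = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
       if (frontend == MAP_FAILED || backend == MAP_FAILED)
           goto out;

       /* Write on one side, read on the other. */
       strcpy(frontend, "descriptor");
       if (strcmp(backend, "descriptor") == 0)
           ret = 0;

   out:
       if (frontend != MAP_FAILED)
           munmap(frontend, len);
       if (backend != MAP_FAILED)
           munmap(backend, len);
       close(fd);
       return ret;
   }
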

Vhost API Overview
------------------

The following is an overview of some key Vhost API functions:

* ``rte_vhost_driver_register(path, flags)``

  This function registers a vhost driver into the system. ``path`` specifies
  the Unix domain socket file path.

  Currently supported flags are:

  - ``RTE_VHOST_USER_CLIENT``

    DPDK vhost-user will act as the client when this flag is given. See below
    for an explanation.

  - ``RTE_VHOST_USER_NO_RECONNECT``

    When DPDK vhost-user acts as the client it will keep trying to reconnect
    to the server (QEMU) until it succeeds. This is useful in two cases:

    * When QEMU is not started yet.
    * When QEMU restarts (for example due to a guest OS reboot).

    This reconnect option is enabled by default. However, it can be turned off
    by setting this flag.

  - ``RTE_VHOST_USER_IOMMU_SUPPORT``

    IOMMU support will be enabled when this flag is set. It is disabled by
    default.

    Enabling this flag makes it possible to use guest vIOMMU to protect vhost
    from accessing memory the virtio device isn't allowed to, when the feature
    is negotiated and an IOMMU device is declared.

  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``

    Postcopy live-migration support will be enabled when this flag is set.
    It is disabled by default.

    Enabling this flag should only be done when the calling application does
    not pre-fault the guest shared memory, otherwise migration would fail.

  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``

    Enabling this flag forces the vhost dequeue function to only provide linear
    pktmbufs (no multi-segmented pktmbufs).

    The vhost library by default provides a single pktmbuf for a given
    packet, but if for some reason the data doesn't fit into a single
    pktmbuf (e.g., TSO is enabled), the library will allocate additional
    pktmbufs from the same mempool and chain them together to create a
    multi-segmented pktmbuf.

    However, the vhost application needs to support the multi-segmented format.
    If the vhost application does not support that format and requires large
    buffers to be dequeued, this flag should be enabled to force only linear
    buffers (see ``RTE_VHOST_USER_EXTBUF_SUPPORT``) or drop the packet.

    It is disabled by default.

  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``

    Enabling this flag allows the vhost dequeue function to allocate and attach
    an external buffer to a pktmbuf if the pktmbuf doesn't provide enough
    space to store all data.

    This is useful when the vhost application wants to support large packets
    but doesn't want to increase the default mempool object size or to
    support multi-segmented mbufs (non-linear). In this case, a fresh buffer
    is allocated using ``rte_malloc()`` and attached to a pktmbuf using
    ``rte_pktmbuf_attach_extbuf()``.

    See ``RTE_VHOST_USER_LINEARBUF_SUPPORT`` as well to disable multi-segmented
    mbufs for applications that don't support chained mbufs.

    It is disabled by default.

  - ``RTE_VHOST_USER_ASYNC_COPY``

    Asynchronous data path will be enabled when this flag is set. Async
    data path allows applications to enable DMA acceleration for vhost
    queues. Vhost leverages the registered DMA channels to free the CPU from
    memory copy operations in the data path. A set of async data path APIs is
    defined for DPDK applications to make use of the async capability. Only
    packets enqueued/dequeued by async APIs are processed through the async
    data path.

    Currently this feature is only implemented on split ring enqueue data
    path.

    It is disabled by default.

  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``

    Since v16.04, the vhost library forwards checksum and GSO requests for
    packets received from a virtio driver by filling Tx offload metadata in
    the mbuf. This behavior is inconsistent with other drivers but it is left
    untouched for existing applications that might rely on it.

    This flag disables the legacy behavior and instead asks vhost to simply
    populate Rx offload metadata in the mbuf.

    It is disabled by default.

  - ``RTE_VHOST_USER_NET_STATS_ENABLE``

    Per-virtqueue statistics collection will be enabled when this flag is set.
    When enabled, the application may use ``rte_vhost_vring_stats_get_names()``
    and ``rte_vhost_vring_stats_get()`` to collect statistics, and
    ``rte_vhost_vring_stats_reset()`` to reset them.

    It is disabled by default.

* ``rte_vhost_driver_set_features(path, features)``

  This function sets the feature bits the vhost-user driver supports. The
  vhost-user driver could be vhost-user net, yet it could be something else,
  say, vhost-user SCSI.

* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``

  This function registers a set of callbacks, to let DPDK applications take
  the appropriate action when some events happen. The following events are
  currently supported:

  * ``new_device(int vid)``

    This callback is invoked when a virtio device becomes ready. ``vid``
    is the vhost device ID.

  * ``destroy_device(int vid)``

    This callback is invoked when a virtio device is paused or shut down.

  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``

    This callback is invoked when a specific queue's state is changed, for
    example to enabled or disabled.

  * ``features_changed(int vid, uint64_t features)``

    This callback is invoked when the features are changed. For example,
    ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live
    migration, respectively.

  * ``new_connection(int vid)``

    This callback is invoked on a new vhost-user socket connection. If DPDK
    acts as the server the device should not be deleted before the
    ``destroy_connection`` callback is received.

  * ``destroy_connection(int vid)``

    This callback is invoked when the vhost-user socket connection is closed.
    It indicates that the device with id ``vid`` is no longer in use and can
    be safely deleted.

* ``rte_vhost_driver_disable/enable_features(path, features)``

  This function disables/enables some features. For example, it can be used to
  disable mergeable buffers and TSO features, both of which are enabled by
  default.

* ``rte_vhost_driver_start(path)``

  This function triggers the vhost-user negotiation. It should be invoked at
  the end of initializing a vhost-user driver.

* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``

  Transmits (enqueues) ``count`` packets from host to guest.

* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``

  Receives (dequeues) ``count`` packets from guest, and stores them at ``pkts``.

* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)``

  As an extension of ``new_device()``, this function adds virtio-crypto workload
  acceleration capability to the device. All crypto workload is processed by
  DPDK cryptodev with the device ID of ``cryptodev_id``.

* ``rte_vhost_crypto_free(vid)``

  Frees the memory and vhost-user message handlers created in
  ``rte_vhost_crypto_create()``.

* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)``

  Receives (dequeues) ``nb_ops`` virtio-crypto requests from guest, parses
  them to DPDK Crypto Operations, and fills the ``ops`` with parsing results.

* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)``

  After the ``ops`` are dequeued from Cryptodev, finalizes the jobs and
  notifies the guest(s).

* ``rte_vhost_crypto_set_zero_copy(vid, option)``

  Enable or disable zero copy feature of the vhost crypto backend.

* ``rte_vhost_async_dma_configure(dma_id, vchan_id)``

  Tell vhost which DMA vChannel it is going to use. This function needs to
  be called before registering the async data path for a vring.

* ``rte_vhost_async_channel_register(vid, queue_id)``

  Register async DMA acceleration for a vhost queue after the vring is enabled.

* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id)``

  Register async DMA acceleration for a vhost queue without performing
  any locking.

  This function is only safe to call in vhost callback functions
  (i.e., struct rte_vhost_device_ops).

* ``rte_vhost_async_channel_unregister(vid, queue_id)``

  Unregister the async DMA acceleration from a vhost queue.
  Unregistration will fail if the vhost queue has in-flight
  packets that are not completed.

  Unregistering async DMA acceleration in ``vring_state_changed()`` may
  fail, as this API tries to acquire the spinlock of the vhost
  queue. The recommended way is to unregister async copy
  devices for all vhost queues in ``destroy_device()``, when a
  virtio device is paused or shut down.

* ``rte_vhost_async_channel_unregister_thread_unsafe(vid, queue_id)``

  Unregister async DMA acceleration for a vhost queue without performing
  any locking.

  This function is only safe to call in vhost callback functions
  (i.e., struct rte_vhost_device_ops).

* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, dma_id, vchan_id)``

  Submit an enqueue request to transmit ``count`` packets from host to guest
  by async data path. Applications must not free the packets submitted for
  enqueue until the packets are completed.

* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count, dma_id, vchan_id)``

  Poll enqueue completion status from async data path. Completed packets
  are returned to applications through ``pkts``.

* ``rte_vhost_async_get_inflight(vid, queue_id)``

  This function returns the number of in-flight packets for the vhost
  queue using async acceleration.

* ``rte_vhost_async_get_inflight_thread_unsafe(vid, queue_id)``

  Get the number of in-flight packets for a vhost queue without performing
  any locking. It should only be used within the vhost ops, which already
  hold the lock.

* ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count, dma_id, vchan_id)``

  Clear in-flight packets which are submitted to async channel in vhost
  async data path without performing locking on virtqueue. Completed
  packets are returned to applications through ``pkts``.

* ``rte_vhost_clear_queue(vid, queue_id, **pkts, count, dma_id, vchan_id)``

  Clear in-flight packets which are submitted to async channel in vhost async data
  path. Completed packets are returned to applications through ``pkts``.

* ``rte_vhost_vring_stats_get_names(int vid, uint16_t queue_id, struct rte_vhost_stat_name *names, unsigned int size)``

  This function returns the names of the queue statistics. It requires
  statistics collection to be enabled at registration time.

* ``rte_vhost_vring_stats_get(int vid, uint16_t queue_id, struct rte_vhost_stat *stats, unsigned int n)``

  This function returns the queue statistics. It requires statistics
  collection to be enabled at registration time.

* ``rte_vhost_vring_stats_reset(int vid, uint16_t queue_id)``

  This function resets the queue statistics. It requires statistics
  collection to be enabled at registration time.

* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count,
  nr_inflight, dma_id, vchan_id)``

  Receive ``count`` packets from guest to host in async data path,
  and store them at ``pkts``.

* ``rte_vhost_driver_get_vdpa_dev_type(path, type)``

  Get device type of vDPA device, such as VDPA_DEVICE_TYPE_NET,
  VDPA_DEVICE_TYPE_BLK.

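
As an illustration of the ``features`` argument taken by
``rte_vhost_driver_set_features()`` and
``rte_vhost_driver_disable/enable_features()``: each feature is one bit in a
64-bit mask, indexed by the ``VIRTIO_NET_F_*`` constants from the Linux UAPI
header ``<linux/virtio_net.h>``. The sketch below builds a mask for mergeable
buffers and TSO and clears it from a feature set with plain bit arithmetic.
Both helper names are hypothetical, and covering only the host-side TSO bits
is an assumption made for brevity. In an application the mask would be passed
to ``rte_vhost_driver_disable_features()`` instead of being applied by hand.

.. code-block:: c

   #include <stdint.h>
   #include <linux/virtio_net.h>

   /* Mask covering mergeable Rx buffers and host TSO for IPv4 and IPv6. */
   uint64_t
   mrg_rxbuf_and_tso_mask(void)
   {
       return (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
              (1ULL << VIRTIO_NET_F_HOST_TSO4) |
              (1ULL << VIRTIO_NET_F_HOST_TSO6);
   }

   /*
    * Clear every bit of 'mask' from 'features'. This only models what
    * disabling features means for the advertised feature set.
    */
   uint64_t
   features_without(uint64_t features, uint64_t mask)
   {
       return features & ~mask;
   }
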
Vhost-user Implementations
--------------------------

Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:

* DPDK vhost-user acts as the server.

  DPDK will create a Unix domain socket server file and listen for
  connections from the frontend.

  Note, this is the default mode, and the only mode before DPDK v16.07.

* DPDK vhost-user acts as the client.

  Unlike the server mode, this mode doesn't create the socket file;
  it just tries to connect to the server (which is responsible for creating
  the file instead).

  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
  connect to the server again. This is how the "reconnect" feature works.

  .. Note::
     * The "reconnect" feature requires **QEMU v2.7** (or above).

     * The vhost supported features must be exactly the same before and
       after the restart. For example, if TSO is disabled and then enabled,
       nothing will work and undefined issues might happen.

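
The division of roles described above can be sketched with plain POSIX
sockets. This is not the vhost library's implementation; it only shows that
the server side creates the socket file, that a client attempt fails while no
server exists, and that a later retry succeeds. All function names and the
``/tmp`` socket path are assumptions made for the example.

.. code-block:: c

   #include <stdio.h>
   #include <string.h>
   #include <sys/socket.h>
   #include <sys/un.h>
   #include <unistd.h>

   /* Fill a sockaddr_un for 'path'. Returns 0, or -1 if the path is too long. */
   static int
   fill_addr(struct sockaddr_un *addr, const char *path)
   {
       if (strlen(path) >= sizeof(addr->sun_path))
           return -1;
       memset(addr, 0, sizeof(*addr));
       addr->sun_family = AF_UNIX;
       strcpy(addr->sun_path, path);
       return 0;
   }

   /* Server role: create the socket file and listen on it. */
   int
   unix_listen(const char *path)
   {
       struct sockaddr_un addr;
       int fd;

       if (fill_addr(&addr, path) < 0)
           return -1;
       fd = socket(AF_UNIX, SOCK_STREAM, 0);
       if (fd < 0)
           return -1;
       if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
           listen(fd, 1) < 0) {
           close(fd);
           return -1;
       }
       return fd;
   }

   /* Client role: a single connection attempt; never creates the file. */
   int
   unix_try_connect(const char *path)
   {
       struct sockaddr_un addr;
       int fd;

       if (fill_addr(&addr, path) < 0)
           return -1;
       fd = socket(AF_UNIX, SOCK_STREAM, 0);
       if (fd < 0)
           return -1;
       if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
           close(fd);
           return -1;
       }
       return fd;
   }

   /*
    * A client attempt fails while no server exists and succeeds once the
    * server has created the socket. Returns 0 if both hold, -1 otherwise.
    */
   int
   reconnect_demo(void)
   {
       char path[64];
       int first, server, second;

       snprintf(path, sizeof(path), "/tmp/vhost-demo-%ld.sock", (long)getpid());
       unlink(path); /* drop any stale file from an earlier run */

       first = unix_try_connect(path);  /* server not started yet */
       server = unix_listen(path);      /* server creates the socket file */
       second = unix_try_connect(path); /* the retry now succeeds */

       if (first >= 0)
           close(first);
       if (second >= 0)
           close(second);
       if (server >= 0)
           close(server);
       unlink(path);

       return (first < 0 && server >= 0 && second >= 0) ? 0 : -1;
   }
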
No matter which mode is used, once a connection is established, DPDK
vhost-user will start receiving and processing vhost messages from QEMU.

For messages with a file descriptor, the file descriptor can be used directly
in the vhost process as it is already installed by the Unix domain socket.

The supported vhost messages are:

* ``VHOST_SET_MEM_TABLE``
* ``VHOST_SET_VRING_KICK``
* ``VHOST_SET_VRING_CALL``
* ``VHOST_SET_LOG_FD``
* ``VHOST_SET_VRING_ERR``

For ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each
memory region and its file descriptor in the ancillary data of the message.
The file descriptor is used to map that region.

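
The mechanism behind "file descriptor in the ancillary data" is the POSIX
``SCM_RIGHTS`` control message. The vhost library handles this internally;
the following self-contained sketch only illustrates the underlying socket
calls. It passes the read end of a pipe across a socket pair and reads
through the received descriptor. The names ``send_fd``, ``recv_fd`` and
``fd_passing_demo`` are hypothetical, and a pipe stands in for the guest
memory file.

.. code-block:: c

   #include <string.h>
   #include <sys/socket.h>
   #include <sys/uio.h>
   #include <unistd.h>

   /* Control buffer sized and aligned for exactly one file descriptor. */
   union fd_ctrl {
       struct cmsghdr align;
       char buf[CMSG_SPACE(sizeof(int))];
   };

   /* Send 'fd' over 'sock' as SCM_RIGHTS ancillary data. Returns 0 or -1. */
   int
   send_fd(int sock, int fd)
   {
       char payload = 'M'; /* one byte of ordinary data carries the control message */
       struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
       union fd_ctrl ctrl;
       struct msghdr msg;
       struct cmsghdr *cmsg;

       memset(&ctrl, 0, sizeof(ctrl));
       memset(&msg, 0, sizeof(msg));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = ctrl.buf;
       msg.msg_controllen = sizeof(ctrl.buf);

       cmsg = CMSG_FIRSTHDR(&msg);
       cmsg->cmsg_level = SOL_SOCKET;
       cmsg->cmsg_type = SCM_RIGHTS;
       cmsg->cmsg_len = CMSG_LEN(sizeof(int));
       memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

       return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
   }

   /* Receive one descriptor from 'sock'. Returns the new fd, or -1. */
   int
   recv_fd(int sock)
   {
       char payload;
       struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
       union fd_ctrl ctrl;
       struct msghdr msg;
       struct cmsghdr *cmsg;
       int fd = -1;

       memset(&ctrl, 0, sizeof(ctrl));
       memset(&msg, 0, sizeof(msg));
       msg.msg_iov = &iov;
       msg.msg_iovlen = 1;
       msg.msg_control = ctrl.buf;
       msg.msg_controllen = sizeof(ctrl.buf);

       if (recvmsg(sock, &msg, 0) != 1)
           return -1;
       cmsg = CMSG_FIRSTHDR(&msg);
       if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
           cmsg->cmsg_type != SCM_RIGHTS ||
           cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
           return -1;
       memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
       return fd; /* already installed in this process by the kernel */
   }

   /* Round trip over a socket pair. Returns 0 if the received fd is usable. */
   int
   fd_passing_demo(void)
   {
       int sv[2] = { -1, -1 }, pfd[2] = { -1, -1 };
       int received = -1, ret = -1;
       char c = 0;

       if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0 && pipe(pfd) == 0 &&
           send_fd(sv[0], pfd[0]) == 0 &&
           (received = recv_fd(sv[1])) >= 0 &&
           write(pfd[1], "x", 1) == 1 &&
           read(received, &c, 1) == 1 && c == 'x')
           ret = 0;

       /* close() on a descriptor that was never opened just fails harmlessly */
       close(received);
       close(pfd[0]);
       close(pfd[1]);
       close(sv[0]);
       close(sv[1]);
       return ret;
   }
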
``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.

When the socket connection is closed, vhost will destroy the device.

Guest memory requirement
------------------------

* Memory pre-allocation

  For the non-async data path, guest memory pre-allocation is not required,
  and skipping it can save memory. To pre-allocate the guest memory anyway,
  either add the ``-mem-prealloc`` option when starting QEMU, or lock all
  memory on the vhost side, which forces memory to be allocated when vhost
  calls mmap (the ``--mlockall`` option in ovs-dpdk is one example).

  For the async data path, the vhost library forces the VM memory to be
  pre-allocated when mapping the guest memory. The memory also needs to be
  locked to prevent pages from being swapped out to disk.

* Memory sharing

  Make sure the ``share=on`` QEMU option is given. Vhost-user will not work with
  a QEMU instance without shared memory mapping.

Vhost supported vSwitch reference
---------------------------------

For more vhost details and how to support vhost in vSwitch, please refer to
the vhost example in the DPDK Sample Applications Guide.

Vhost data path acceleration (vDPA)
-----------------------------------

vDPA supports selective datapath in vhost-user lib by enabling virtio ring
compatible devices to serve the virtio driver directly for datapath acceleration.

``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost device
with an accelerated backend.

Vhost device capabilities are also configurable to accommodate various devices.
Such capabilities include supported features, protocol features, and queue number.

Finally, a set of device ops is defined for device specific operations:

* ``get_queue_num``

  Called to get supported queue number of the device.

* ``get_features``

  Called to get supported features of the device.

* ``get_protocol_features``

  Called to get supported protocol features of the device.

* ``dev_conf``

  Called to configure the actual device when the virtio device becomes ready.

* ``dev_close``

  Called to close the actual device when the virtio device is stopped.

* ``set_vring_state``

  Called to change the state of the vring in the actual device when vring state
  changes.

* ``set_features``

  Called to set the negotiated features to the device.

* ``migration_done``

  Called to allow the device to respond to RARP sending.

* ``get_vfio_group_fd``

  Called to get the VFIO group fd of the device.

* ``get_vfio_device_fd``

  Called to get the VFIO device fd of the device.

* ``get_notify_area``

  Called to get the notify area info of the queue.

Vhost asynchronous data path
----------------------------

Vhost asynchronous data path leverages DMA devices to offload memory
copies from the CPU and it is implemented in an asynchronous way. It
enables applications, like OVS, to save CPU cycles and hide memory copy
overhead, thus achieving higher throughput.

Vhost doesn't manage DMA devices; applications, like OVS, need to
manage and configure them. Applications need to tell vhost which
DMA devices to use in every data path function call. This design gives
applications the flexibility to dynamically use DMA channels in
different function modules, not only in vhost.

In addition, vhost supports M:N mapping between vrings and DMA virtual
channels. Specifically, one vring can use multiple different DMA channels
and one DMA channel can be shared by multiple vrings at the same time.
One vring may use multiple DMA channels because more than one dataplane
thread may enqueue packets to the same vring, each with its own DMA
virtual channel. Besides, the number of DMA devices is limited, so for
scaling purposes it is necessary to support sharing DMA channels among
vrings.

* Async enqueue API usage

  In the async enqueue path, ``rte_vhost_poll_enqueue_completed()`` needs to
  be called in time to notify the guest of packets whose DMA copy has
  completed. Moreover, calling ``rte_vhost_submit_enqueue_burst()``
  repeatedly without polling for completions will fill the DMA ring,
  which eventually results in packet loss.

* Recommended IOVA mode in async datapath

  When DMA devices are bound to the VFIO driver, VA mode is recommended.
  For PA mode, page-by-page mapping may exceed the IOMMU's maximum
  capability, so it is better to use 1G guest hugepages.

  For the UIO driver or a kernel driver, any VFIO related error messages
  can be ignored.

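
The interaction between submitting and polling in the async enqueue path can
be shown with a toy model. The code below does not call the vhost or dmadev
APIs: it models the in-flight ring as a bounded counter, with a hypothetical
capacity of 8 entries, and treats every submitted copy as complete by the
time it is polled. All names are invented for the illustration. It shows
that bursts submitted without polling are eventually rejected, while polling
after each burst keeps the ring from filling.

.. code-block:: c

   #include <stdint.h>

   #define MODEL_RING_SIZE 8 /* hypothetical capacity of the in-flight ring */

   struct inflight_ring {
       uint16_t inflight; /* copies submitted but not yet polled */
   };

   /* Accept up to 'count' packets; returns how many fit in the ring. */
   uint16_t
   model_submit(struct inflight_ring *r, uint16_t count)
   {
       uint16_t room = MODEL_RING_SIZE - r->inflight;
       uint16_t n = count < room ? count : room;

       r->inflight += n;
       return n;
   }

   /* Reap up to 'count' completions; the model assumes all copies are done. */
   uint16_t
   model_poll(struct inflight_ring *r, uint16_t count)
   {
       uint16_t n = count < r->inflight ? count : r->inflight;

       r->inflight -= n;
       return n;
   }

   /*
    * Submit 'rounds' bursts of 'burst' packets, polling after each burst only
    * when 'poll_each_round' is non-zero. Returns the number of packets the
    * ring could not accept.
    */
   unsigned int
   model_run(unsigned int rounds, uint16_t burst, int poll_each_round)
   {
       struct inflight_ring r = { 0 };
       unsigned int rejected = 0;
       unsigned int i;

       for (i = 0; i < rounds; i++) {
           rejected += burst - model_submit(&r, burst);
           if (poll_each_round)
               model_poll(&r, burst);
       }
       return rejected;
   }
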