..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2016 Intel Corporation.

Vhost Library
=============

The vhost library implements a user space virtio net server allowing the user
to manipulate the virtio ring directly. In other words, it allows the user
to fetch/put packets from/to the VM virtio net device. To achieve this, a
vhost library should be able to:

* Access the guest memory:

  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
  option, which means QEMU will create a file to serve as the guest RAM.
  The ``share=on`` option allows another process to map that file, which
  means it can access the guest RAM.

* Know all the necessary information about the vring:

  Information such as where the available ring is stored. Vhost defines some
  messages (passed through a Unix domain socket file) to tell the backend all
  the information it needs to know how to manipulate the vring.
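
As a rough illustration of the first point, the sketch below maps a file with
``MAP_SHARED`` using plain POSIX calls, which is the mechanism that lets a
second process see memory backed by the same file. It is not the vhost
library's code: the helper name ``map_shared_file`` is hypothetical, and a real
backend maps a file descriptor received from QEMU rather than opening a path.

.. code-block:: c

    #define _GNU_SOURCE
    #include <assert.h>
    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: set the size of the file at `path` to `len` bytes
     * and map it read/write and shared. Returns the mapping, or NULL on
     * failure.
     */
    static void *
    map_shared_file(const char *path, size_t len)
    {
        void *addr;
        int fd;

        assert(path != NULL && len > 0);

        fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return NULL;
        if (ftruncate(fd, (off_t)len) != 0) {
            close(fd);
            return NULL;
        }
        /* MAP_SHARED makes stores visible to every mapping of the same file. */
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd); /* the mapping stays valid after the descriptor is closed */
        return addr == MAP_FAILED ? NULL : addr;
    }

Two calls with the same path return two distinct mappings of the same pages, so
a byte stored through one is readable through the other.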


Vhost API Overview
------------------

The following is an overview of some key Vhost API functions:

* ``rte_vhost_driver_register(path, flags)``

  This function registers a vhost driver into the system. ``path`` specifies
  the Unix domain socket file path.

  Currently supported flags are:

  - ``RTE_VHOST_USER_CLIENT``

    DPDK vhost-user will act as the client when this flag is given. See the
    "Vhost-user Implementations" section below for an explanation.

  - ``RTE_VHOST_USER_NO_RECONNECT``

    When DPDK vhost-user acts as the client it will keep trying to reconnect
    to the server (QEMU) until it succeeds. This is useful in two cases:

    * When QEMU is not started yet.
    * When QEMU restarts (for example due to a guest OS reboot).

    This reconnect option is enabled by default. However, it can be turned off
    by setting this flag.

  - ``RTE_VHOST_USER_IOMMU_SUPPORT``

    IOMMU support will be enabled when this flag is set. It is disabled by
    default.

    Enabling this flag makes it possible to use the guest vIOMMU to prevent
    vhost from accessing memory that the virtio device is not allowed to
    access, provided the feature is negotiated and an IOMMU device is declared.

  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``

    Postcopy live-migration support will be enabled when this flag is set.
    It is disabled by default.

    This flag should only be enabled when the calling application does
    not pre-fault the guest shared memory, otherwise migration would fail.

  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``

    Enabling this flag forces the vhost dequeue function to only provide
    linear pktmbufs (no multi-segmented pktmbufs).

    The vhost library by default provides a single pktmbuf for a given
    packet, but if for some reason the data doesn't fit into a single
    pktmbuf (e.g., TSO is enabled), the library will allocate additional
    pktmbufs from the same mempool and chain them together to create a
    multi-segmented pktmbuf.

    However, the vhost application needs to support the multi-segmented format.
    If the vhost application does not support that format and requires large
    buffers to be dequeued, this flag should be enabled to force only linear
    buffers (see ``RTE_VHOST_USER_EXTBUF_SUPPORT``) or drop the packet.

    It is disabled by default.

  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``

    Enabling this flag allows the vhost dequeue function to allocate and attach
    an external buffer to a pktmbuf if the pktmbuf doesn't provide enough
    space to store all data.

    This is useful when the vhost application wants to support large packets
    but doesn't want to increase the default mempool object size nor to
    support multi-segmented mbufs (non-linear). In this case, a fresh buffer
    is allocated using ``rte_malloc()`` which gets attached to a pktmbuf using
    ``rte_pktmbuf_attach_extbuf()``.

    See ``RTE_VHOST_USER_LINEARBUF_SUPPORT`` as well to disable multi-segmented
    mbufs for applications that do not support chained mbufs.

    It is disabled by default.

  - ``RTE_VHOST_USER_ASYNC_COPY``

    Asynchronous data path will be enabled when this flag is set. Async data
    path allows applications to register async copy devices (typically
    hardware DMA channels) to the vhost queues. Vhost uses the registered copy
    device to offload memory copy operations from the CPU. A set of
    async data path APIs is defined for DPDK applications to make use of
    the async capability. Only packets enqueued/dequeued by async APIs are
    processed through the async data path.

    Currently this feature is only implemented on the split ring enqueue data
    path.

    It is disabled by default.

  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``

    Since v16.04, the vhost library forwards checksum and GSO requests for
    packets received from a virtio driver by filling Tx offload metadata in
    the mbuf. This behavior is inconsistent with other drivers but it is left
    untouched for existing applications that might rely on it.

    This flag disables the legacy behavior and instead asks vhost to simply
    populate Rx offload metadata in the mbuf.

    It is disabled by default.

* ``rte_vhost_driver_set_features(path, features)``

  This function sets the feature bits the vhost-user driver supports. The
  vhost-user driver could be vhost-user net, yet it could be something else,
  say, vhost-user SCSI.

* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``

  This function registers a set of callbacks, to let DPDK applications take
  the appropriate action when some events happen. The following events are
  currently supported:

  * ``new_device(int vid)``

    This callback is invoked when a virtio device becomes ready. ``vid``
    is the vhost device ID.

  * ``destroy_device(int vid)``

    This callback is invoked when a virtio device is paused or shut down.

  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``

    This callback is invoked when a specific queue's state is changed, for
    example to enabled or disabled.

  * ``features_changed(int vid, uint64_t features)``

    This callback is invoked when the features are changed. For example,
    ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live
    migration, respectively.

  * ``new_connection(int vid)``

    This callback is invoked on a new vhost-user socket connection. If DPDK
    acts as the server the device should not be deleted before the
    ``destroy_connection`` callback is received.

  * ``destroy_connection(int vid)``

    This callback is invoked when the vhost-user socket connection is closed.
    It indicates that the device with ID ``vid`` is no longer in use and can
    be safely deleted.

* ``rte_vhost_driver_disable/enable_features(path, features)``

  These functions disable/enable some features. For example, they can be used
  to disable mergeable buffers and TSO features, both of which are enabled by
  default.

* ``rte_vhost_driver_start(path)``

  This function triggers the vhost-user negotiation. It should be invoked at
  the end of initializing a vhost-user driver.

* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``

  Transmits (enqueues) ``count`` packets from host to guest.

* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``

  Receives (dequeues) up to ``count`` packets from the guest and stores them
  in ``pkts``.

* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)``

  As an extension of ``new_device()``, this function adds virtio-crypto
  workload acceleration capability to the device. All crypto workload is
  processed by the DPDK cryptodev with the device ID of ``cryptodev_id``.

* ``rte_vhost_crypto_free(vid)``

  Frees the memory and vhost-user message handlers created in
  ``rte_vhost_crypto_create()``.

* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)``

  Receives (dequeues) ``nb_ops`` virtio-crypto requests from the guest, parses
  them to DPDK Crypto Operations, and fills ``ops`` with the parsing results.

* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)``

  After the ``ops`` are dequeued from Cryptodev, finalizes the jobs and
  notifies the guest(s).

* ``rte_vhost_crypto_set_zero_copy(vid, option)``

  Enables or disables the zero copy feature of the vhost crypto backend.

* ``rte_vhost_async_channel_register(vid, queue_id, features, ops)``

  Registers an async copy device channel for a vhost queue after the vring
  is enabled. The following device ``features`` must be specified together
  with the registration:

  * ``async_inorder``

    The async copy device can guarantee the ordering of the copy completion
    sequence. Copies are completed in the same order in which they were
    submitted.

    Currently, only ``async_inorder`` capable devices are supported by vhost.

  * ``async_threshold``

    The copy length (in bytes) below which CPU copy will be used even if
    applications call async vhost APIs to enqueue/dequeue data.

    A typical value is 512 to 1024, depending on the async device capability.

  Applications must provide the following ``ops`` callbacks for the vhost
  library to work with the async copy devices:

  * ``transfer_data(vid, queue_id, descs, opaque_data, count)``

    vhost invokes this function to submit copy data to the async devices.
    For non-async_inorder capable devices, ``opaque_data`` could be used
    for identifying the completed packets.

  * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)``

    vhost invokes this function to get the copy data completed by async
    devices.

* ``rte_vhost_async_channel_unregister(vid, queue_id)``

  Unregisters the async copy device channel from a vhost queue.
  Unregistration will fail if the vhost queue has in-flight
  packets that are not completed.

  Unregistering async copy devices in ``vring_state_changed()`` may
  fail, as this API tries to acquire the spinlock of the vhost
  queue. The recommended way is to unregister async copy
  devices for all vhost queues in ``destroy_device()``, when a
  virtio device is paused or shut down.

* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, comp_pkts, comp_count)``

  Submits an enqueue request to transmit ``count`` packets from host to guest
  by the async data path. When this call returns, successfully enqueued
  packets are either transfer-completed or still in use by DMA engines:
  transfer-completed packets are returned in ``comp_pkts``, while the others
  are not guaranteed to have finished.

  Applications must not free the packets submitted for enqueue until the
  packets are completed.

* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)``

  Polls the enqueue completion status from the async data path. Completed
  packets are returned to applications through ``pkts``.
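
The ``async_threshold`` rule listed above under
``rte_vhost_async_channel_register()`` can be modelled in a few lines. The
sketch below is a standalone illustration, not the library's implementation:
the names ``copy_path``, ``choose_copy_path`` and ``count_async_copies`` are
hypothetical, and only the rule itself (copies shorter than the threshold stay
on the CPU) is taken from the description.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of the two ways a copy can be carried out. */
    enum copy_path { COPY_PATH_CPU, COPY_PATH_ASYNC };

    /* Copies shorter than the threshold use the CPU, the rest are offloaded. */
    static enum copy_path
    choose_copy_path(uint32_t copy_len, uint32_t async_threshold)
    {
        return copy_len < async_threshold ? COPY_PATH_CPU : COPY_PATH_ASYNC;
    }

    /* Count how many of the `n` copy lengths would go to the async device. */
    static size_t
    count_async_copies(const uint32_t *lens, size_t n, uint32_t async_threshold)
    {
        size_t i, n_async = 0;

        assert(lens != NULL || n == 0);
        for (i = 0; i < n; i++)
            if (choose_copy_path(lens[i], async_threshold) == COPY_PATH_ASYNC)
                n_async++;
        return n_async;
    }

With a threshold of 512 bytes, a 64-byte copy stays on the CPU while 512-byte
and 1500-byte copies are handed to the async device.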

Vhost-user Implementations
--------------------------

Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:

* DPDK vhost-user acts as the server.

  DPDK will create a Unix domain socket server file and listen for
  connections from the frontend.

  Note, this is the default mode, and the only mode before DPDK v16.07.


* DPDK vhost-user acts as the client.

  Unlike the server mode, this mode doesn't create the socket file;
  it just tries to connect to the server (which is responsible for creating
  the file instead).

  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
  connect to the server again. This is how the "reconnect" feature works.

  .. Note::
     * The "reconnect" feature requires **QEMU v2.7** (or above).

     * The vhost supported features must be exactly the same before and
       after the restart. For example, if TSO is disabled and then enabled,
       nothing will work and undefined behavior may occur.
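
The client-side behaviour described above, repeatedly trying to connect until
the server's socket file becomes available, can be sketched with plain POSIX
sockets. This is an illustration only, not the code DPDK uses:
``connect_with_retry`` and its ``max_attempts`` and ``delay_ms`` parameters are
hypothetical, and DPDK itself keeps retrying rather than giving up after a
fixed number of attempts.

.. code-block:: c

    #define _GNU_SOURCE
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <time.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: try to connect to the Unix domain socket at `path`
     * up to `max_attempts` times, sleeping `delay_ms` between attempts.
     * Returns the connected descriptor, or -1 if every attempt failed.
     */
    static int
    connect_with_retry(const char *path, int max_attempts, long delay_ms)
    {
        struct sockaddr_un addr;
        struct timespec delay;
        int attempt, fd;

        assert(path != NULL && delay_ms >= 0);
        if (strlen(path) >= sizeof(addr.sun_path))
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strcpy(addr.sun_path, path); /* length checked above */

        delay.tv_sec = delay_ms / 1000;
        delay.tv_nsec = (delay_ms % 1000) * 1000000L;

        for (attempt = 0; attempt < max_attempts; attempt++) {
            fd = socket(AF_UNIX, SOCK_STREAM, 0);
            if (fd < 0)
                return -1;
            if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                return fd;
            /* Server not there (yet): drop this socket and try again. */
            close(fd);
            if (attempt + 1 < max_attempts)
                nanosleep(&delay, NULL);
        }
        return -1;
    }

The same loop covers both cases listed for ``RTE_VHOST_USER_NO_RECONNECT``: a
server that has not started yet and a server that went away and came back.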

No matter which mode is used, once a connection is established, DPDK
vhost-user will start receiving and processing vhost messages from QEMU.

For messages with a file descriptor, the file descriptor can be used directly
in the vhost process as it is already installed by the Unix domain socket.

The supported vhost messages are:

* ``VHOST_SET_MEM_TABLE``
* ``VHOST_SET_VRING_KICK``
* ``VHOST_SET_VRING_CALL``
* ``VHOST_SET_LOG_FD``
* ``VHOST_SET_VRING_ERR``

For the ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each
memory region and its file descriptor in the ancillary data of the message.
The file descriptor is used to map that region.
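
Passing a descriptor in ancillary data relies on the ``SCM_RIGHTS`` control
message of Unix domain sockets. The sketch below shows that mechanism in
isolation, under simplifying assumptions: it carries exactly one descriptor and
a one-byte payload, whereas a real vhost-user message carries a protocol header
and possibly several descriptors. The helper names ``send_fd`` and ``recv_fd``
are hypothetical.

.. code-block:: c

    #define _GNU_SOURCE
    #include <assert.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Control buffer sized and aligned for a single descriptor. */
    union fd_ctrl {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    };

    /* Send one payload byte with `fd` attached as SCM_RIGHTS data. */
    static int
    send_fd(int sock, int fd)
    {
        char payload = 'F';
        struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
        union fd_ctrl ctrl;
        struct msghdr msg;
        struct cmsghdr *cmsg;

        assert(sock >= 0 && fd >= 0);
        memset(&ctrl, 0, sizeof(ctrl));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }

    /* Receive a descriptor sent by send_fd(); returns it, or -1 on failure. */
    static int
    recv_fd(int sock)
    {
        char payload;
        struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
        union fd_ctrl ctrl;
        struct msghdr msg;
        struct cmsghdr *cmsg;
        int fd = -1;

        assert(sock >= 0);
        memset(&ctrl, 0, sizeof(ctrl));
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);

        if (recvmsg(sock, &msg, 0) != 1)
            return -1;
        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS ||
            cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
            return -1;
        /* The kernel has already installed the descriptor in this process. */
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
    }

The number returned by ``recv_fd()`` generally differs from the one passed to
``send_fd()``, but both refer to the same open file, which is why the receiver
can map the region straight away.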

``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.

When the socket connection is closed, vhost will destroy the device.

Guest memory requirement
------------------------

* Memory pre-allocation

  For the non-async data path, guest memory pre-allocation is not
  required. This can help save memory. If users really want the guest memory
  to be pre-allocated (e.g., for performance reasons), the ``-mem-prealloc``
  option can be added when starting QEMU. Alternatively, all memory can be
  locked on the vhost side, which forces memory to be allocated when it is
  mapped there; the ``--mlockall`` option in ovs-dpdk is an example.

  For the async data path, the vhost library forces the VM memory to be
  pre-allocated when mapping the guest memory, and the memory must also be
  locked to prevent pages from being swapped out to disk.

* Memory sharing

  Make sure the ``share=on`` QEMU option is given. vhost-user will not work
  with a QEMU version without shared memory mapping.
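
One way to obtain pre-allocated and locked memory on Linux is to combine
``MAP_POPULATE`` with ``mlock()``. The sketch below is only an illustration of
those two system facilities, not the library's mapping code: the helper
``map_prefaulted`` is hypothetical, and it uses an anonymous mapping to stay
self-contained, whereas guest memory is a file-backed mapping.

.. code-block:: c

    #define _GNU_SOURCE /* for MAP_POPULATE and MAP_ANONYMOUS */
    #include <assert.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /*
     * Hypothetical helper: create a shared mapping whose pages are faulted in
     * up front (MAP_POPULATE), then try to pin them with mlock() so they
     * cannot be swapped out. `*locked` is set to 1 if mlock() succeeded and
     * to 0 otherwise (for example when RLIMIT_MEMLOCK is too low).
     * Returns the mapping, or NULL on failure.
     */
    static void *
    map_prefaulted(size_t len, int *locked)
    {
        void *addr;

        assert(len > 0 && locked != NULL);

        addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        if (addr == MAP_FAILED)
            return NULL;
        *locked = (mlock(addr, len) == 0);
        return addr;
    }

Reporting the ``mlock()`` result separately lets a caller decide whether an
unlocked mapping is acceptable for its data path.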

Vhost supported vSwitch reference
---------------------------------

For more vhost details and how to support vhost in vSwitch, please refer to
the vhost example in the DPDK Sample Applications Guide.

Vhost data path acceleration (vDPA)
-----------------------------------

vDPA supports selective datapath in the vhost-user lib by enabling virtio ring
compatible devices to serve the virtio driver directly for datapath
acceleration.

``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost device
with an accelerated backend.

Vhost device capabilities are also configurable in order to adapt to various
devices. Such capabilities include supported features, protocol features and
queue number.

Finally, a set of device ops is defined for device specific operations:

* ``get_queue_num``

  Called to get supported queue number of the device.

* ``get_features``

  Called to get supported features of the device.

* ``get_protocol_features``

  Called to get supported protocol features of the device.

* ``dev_conf``

  Called to configure the actual device when the virtio device becomes ready.

* ``dev_close``

  Called to close the actual device when the virtio device is stopped.

* ``set_vring_state``

  Called to change the state of the vring in the actual device when vring state
  changes.

* ``set_features``

  Called to set the negotiated features to device.

* ``migration_done``

  Called to allow the device to respond to RARP sending.

* ``get_vfio_group_fd``

  Called to get the VFIO group fd of the device.

* ``get_vfio_device_fd``

  Called to get the VFIO device fd of the device.

* ``get_notify_area``

  Called to get the notify area info of the queue.
423