..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2016 Intel Corporation.

Vhost Library
=============

The vhost library implements a user space virtio net server, allowing the user
to manipulate the virtio ring directly. In other words, it allows the user
to fetch/put packets from/to the VM virtio net device. To achieve this, a
vhost library should be able to:

* Access the guest memory:

  For QEMU, this is done by using the ``-object memory-backend-file,share=on,...``
  option, which means QEMU will create a file to serve as the guest RAM.
  The ``share=on`` option allows another process to map that file, which
  means it can access the guest RAM.

* Know all the necessary information about the vring:

  Information such as where the available ring is stored. Vhost defines some
  messages (passed through a Unix domain socket file) to tell the backend all
  the information it needs to know how to manipulate the vring.


Vhost API Overview
------------------

The following is an overview of some key Vhost API functions:

* ``rte_vhost_driver_register(path, flags)``

  This function registers a vhost driver into the system. ``path`` specifies
  the Unix domain socket file path.

  Currently supported flags are:

  - ``RTE_VHOST_USER_CLIENT``

    DPDK vhost-user will act as the client when this flag is given. See below
    for an explanation.

  - ``RTE_VHOST_USER_NO_RECONNECT``

    When DPDK vhost-user acts as the client, it will keep trying to reconnect
    to the server (QEMU) until it succeeds. This is useful in two cases:

    * When QEMU is not started yet.
    * When QEMU restarts (for example due to a guest OS reboot).

    This reconnect option is enabled by default. However, it can be turned off
    by setting this flag.

  - ``RTE_VHOST_USER_IOMMU_SUPPORT``

    IOMMU support will be enabled when this flag is set. It is disabled by
    default.

    Enabling this flag makes it possible to use the guest vIOMMU to protect
    vhost from accessing memory the virtio device isn't allowed to, when the
    feature is negotiated and an IOMMU device is declared.

  - ``RTE_VHOST_USER_POSTCOPY_SUPPORT``

    Postcopy live-migration support will be enabled when this flag is set.
    It is disabled by default.

    Enabling this flag should only be done when the calling application does
    not pre-fault the guest shared memory, otherwise migration would fail.

  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``

    Enabling this flag forces the vhost dequeue function to provide only
    linear pktmbufs (no multi-segmented pktmbufs).

    The vhost library by default provides a single pktmbuf for a given
    packet, but if for some reason the data doesn't fit into a single
    pktmbuf (e.g., TSO is enabled), the library will allocate additional
    pktmbufs from the same mempool and chain them together to create a
    multi-segmented pktmbuf.

    However, the vhost application needs to support the multi-segmented format.
    If the vhost application does not support that format and requires large
    buffers to be dequeued, this flag should be enabled to force only linear
    buffers (see ``RTE_VHOST_USER_EXTBUF_SUPPORT``) or drop the packet.

    It is disabled by default.

  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``

    Enabling this flag allows the vhost dequeue function to allocate and attach
    an external buffer to a pktmbuf if the pktmbuf doesn't provide enough
    space to store all data.

    This is useful when the vhost application wants to support large packets
    but doesn't want to increase the default mempool object size nor to
    support multi-segmented mbufs (non-linear).
    In this case, a fresh buffer
    is allocated using ``rte_malloc()`` and attached to a pktmbuf using
    ``rte_pktmbuf_attach_extbuf()``.

    See ``RTE_VHOST_USER_LINEARBUF_SUPPORT`` as well to disable multi-segmented
    mbufs for applications that don't support chained mbufs.

    It is disabled by default.

  - ``RTE_VHOST_USER_ASYNC_COPY``

    The asynchronous data path will be enabled when this flag is set. The
    async data path allows applications to register async copy devices
    (typically hardware DMA channels) to the vhost queues. Vhost leverages the
    registered copy device to free the CPU from memory copy operations. A set
    of async data path APIs is defined for DPDK applications to make use of
    the async capability. Only packets enqueued/dequeued by async APIs are
    processed through the async data path.

    Currently this feature is only implemented on the split ring enqueue data
    path.

    It is disabled by default.

  - ``RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS``

    Since v16.04, the vhost library forwards checksum and GSO requests for
    packets received from a virtio driver by filling Tx offload metadata in
    the mbuf. This behavior is inconsistent with other drivers but it is left
    untouched for existing applications that might rely on it.

    This flag disables the legacy behavior and instead asks vhost to simply
    populate Rx offload metadata in the mbuf.

    It is disabled by default.

* ``rte_vhost_driver_set_features(path, features)``

  This function sets the feature bits the vhost-user driver supports. The
  vhost-user driver could be vhost-user net, yet it could be something else,
  say, vhost-user SCSI.

* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``

  This function registers a set of callbacks, to let DPDK applications take
  the appropriate action when some events happen. The following events are
  currently supported:

  * ``new_device(int vid)``

    This callback is invoked when a virtio device becomes ready. ``vid``
    is the vhost device ID.

  * ``destroy_device(int vid)``

    This callback is invoked when a virtio device is paused or shut down.

  * ``vring_state_changed(int vid, uint16_t queue_id, int enable)``

    This callback is invoked when a specific queue's state is changed, for
    example to enabled or disabled.

  * ``features_changed(int vid, uint64_t features)``

    This callback is invoked when the features are changed.
    For example,
    ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live
    migration, respectively.

  * ``new_connection(int vid)``

    This callback is invoked on a new vhost-user socket connection. If DPDK
    acts as the server, the device should not be deleted before the
    ``destroy_connection`` callback is received.

  * ``destroy_connection(int vid)``

    This callback is invoked when the vhost-user socket connection is closed.
    It indicates that the device with ID ``vid`` is no longer in use and can be
    safely deleted.

* ``rte_vhost_driver_disable/enable_features(path, features)``

  This function disables/enables some features. For example, it can be used to
  disable mergeable buffers and TSO features, which are both enabled by
  default.

* ``rte_vhost_driver_start(path)``

  This function triggers the vhost-user negotiation. It should be invoked at
  the end of initializing a vhost-user driver.

* ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``

  Transmits (enqueues) ``count`` packets from host to guest.

* ``rte_vhost_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count)``

  Receives (dequeues) ``count`` packets from guest, and stores them at ``pkts``.

* ``rte_vhost_crypto_create(vid, cryptodev_id, sess_mempool, socket_id)``

  As an extension of ``new_device()``, this function adds virtio-crypto
  workload acceleration capability to the device. All crypto workload is
  processed by the DPDK cryptodev with the device ID of ``cryptodev_id``.

* ``rte_vhost_crypto_free(vid)``

  Frees the memory and vhost-user message handlers created in
  ``rte_vhost_crypto_create()``.

* ``rte_vhost_crypto_fetch_requests(vid, queue_id, ops, nb_ops)``

  Receives (dequeues) ``nb_ops`` virtio-crypto requests from guest, parses
  them to DPDK Crypto Operations, and fills the ``ops`` with parsing results.

* ``rte_vhost_crypto_finalize_requests(queue_id, ops, nb_ops)``

  After the ``ops`` are dequeued from Cryptodev, finalizes the jobs and
  notifies the guest(s).

* ``rte_vhost_crypto_set_zero_copy(vid, option)``

  Enables or disables the zero copy feature of the vhost crypto backend.

* ``rte_vhost_async_channel_register(vid, queue_id, config, ops)``

  Register an async copy device channel for a vhost queue after the vring
  is enabled. The following device ``config`` must be specified together
  with the registration:

  * ``features``

    This field is used to specify async copy device features.

    ``RTE_VHOST_ASYNC_INORDER`` indicates that the async copy device can
    guarantee that the order of copy completion is the same as the order
    of copy submission.

    Currently, only ``RTE_VHOST_ASYNC_INORDER`` capable devices are
    supported by vhost.

  * ``async_threshold``

    The copy length (in bytes) below which CPU copy will be used even if
    applications call async vhost APIs to enqueue/dequeue data.

    A typical value is 256~1024, depending on the async device capability.

  Applications must provide the following ``ops`` callbacks for the vhost lib
  to work with the async copy devices:

  * ``transfer_data(vid, queue_id, descs, opaque_data, count)``

    vhost invokes this function to submit copy data to the async devices.
    For non-async_inorder capable devices, ``opaque_data`` could be used
    for identifying the completed packets.

  * ``check_completed_copies(vid, queue_id, opaque_data, max_packets)``

    vhost invokes this function to get the copy data completed by async
    devices.

* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id, config, ops)``

  Register an async copy device channel for a vhost queue without
  performing any locking.

  This function is only safe to call in vhost callback functions
  (i.e., ``struct vhost_device_ops``).

* ``rte_vhost_async_channel_unregister(vid, queue_id)``

  Unregister the async copy device channel from a vhost queue.
  Unregistration will fail if the vhost queue has in-flight
  packets that are not completed.

  Unregistering async copy devices in ``vring_state_changed()`` may
  fail, as this API tries to acquire the spinlock of the vhost
  queue. The recommended way is to unregister async copy
  devices for all vhost queues in ``destroy_device()``, when a
  virtio device is paused or shut down.

* ``rte_vhost_async_channel_unregister_thread_unsafe(vid, queue_id)``

  Unregister the async copy device channel for a vhost queue without
  performing any locking.

  This function is only safe to call in vhost callback functions
  (i.e., ``struct vhost_device_ops``).

* ``rte_vhost_submit_enqueue_burst(vid, queue_id, pkts, count, comp_pkts, comp_count)``

  Submit an enqueue request to transmit ``count`` packets from host to guest
  by the async data path. Successfully enqueued packets may have completed
  their transfer or may still be occupied by DMA engines; transfer-completed
  packets are returned in ``comp_pkts``, but the others are not guaranteed to
  finish when this API call returns.

  Applications must not free the packets submitted for enqueue until the
  packets are completed.

* ``rte_vhost_poll_enqueue_completed(vid, queue_id, pkts, count)``

  Poll enqueue completion status from the async data path. Completed packets
  are returned to applications through ``pkts``.

* ``rte_vhost_async_get_inflight(vid, queue_id)``

  This function returns the number of in-flight packets for the vhost
  queue using async acceleration.

* ``rte_vhost_clear_queue_thread_unsafe(vid, queue_id, **pkts, count)``

  Clear in-flight packets which are submitted to the DMA engine in the vhost
  async data path. Completed packets are returned to applications through
  ``pkts``.

Vhost-user Implementations
--------------------------

Vhost-user uses Unix domain sockets for passing messages. This means the DPDK
vhost-user implementation has two options:

* DPDK vhost-user acts as the server.

  DPDK will create a Unix domain socket server file and listen for
  connections from the frontend.

  Note, this is the default mode, and the only mode before DPDK v16.07.


* DPDK vhost-user acts as the client.

  Unlike the server mode, this mode doesn't create the socket file;
  it just tries to connect to the server (which is responsible for creating
  the file instead).

  When the DPDK vhost-user application restarts, DPDK vhost-user will try to
  connect to the server again. This is how the "reconnect" feature works.

  .. Note::
     * The "reconnect" feature requires **QEMU v2.7** (or above).

     * The vhost supported features must be exactly the same before and
       after the restart. For example, if TSO is disabled and then enabled,
       nothing will work and undefined issues might happen.

No matter which mode is used, once a connection is established, DPDK
vhost-user will start receiving and processing vhost messages from QEMU.

For messages with a file descriptor, the file descriptor can be used directly
in the vhost process as it is already installed by the Unix domain socket.

The supported vhost messages are:

* ``VHOST_SET_MEM_TABLE``
* ``VHOST_SET_VRING_KICK``
* ``VHOST_SET_VRING_CALL``
* ``VHOST_SET_LOG_FD``
* ``VHOST_SET_VRING_ERR``

For the ``VHOST_SET_MEM_TABLE`` message, QEMU will send information for each
memory region and its file descriptor in the ancillary data of the message.
The file descriptor is used to map that region.
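
The descriptor-passing mechanism described above can be sketched with plain
POSIX calls. This is a minimal illustration, not the vhost library's own code:
it assumes a Linux host, uses a temporary file as a stand-in for the guest RAM
file, and the helper names ``send_fd``, ``recv_fd`` and ``demo_share_region``
are hypothetical. One end of a Unix domain socket pair sends a descriptor as
``SCM_RIGHTS`` ancillary data; the other end maps the received descriptor and
sees the same memory.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes mkstemp(), ftruncate(), CMSG_* under strict -std= */
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send `fd` as SCM_RIGHTS ancillary data. 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 'M'; /* at least one byte of ordinary data must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor. Returns the fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/*
 * Returns 0 if data written through the sender's mapping is visible through
 * a mapping created from the received descriptor. Error paths skip cleanup
 * for brevity.
 */
int demo_share_region(void)
{
    const size_t len = 4096;
    char path[] = "/tmp/guest-ram-XXXXXX"; /* stand-in for the guest RAM file */
    int sv[2], fd, rfd, ok;
    char *front, *back;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) < 0)
        return -1;

    front = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (front == MAP_FAILED || send_fd(sv[0], fd) < 0)
        return -1;
    rfd = recv_fd(sv[1]);
    if (rfd < 0)
        return -1;
    back = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, rfd, 0);
    if (back == MAP_FAILED)
        return -1;

    strcpy(front, "hello from guest");
    ok = strcmp(back, "hello from guest") == 0;

    munmap(front, len);
    munmap(back, len);
    close(fd);
    close(rfd);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Calling ``demo_share_region()`` from a ``main()`` exercises both ends in one
process. In a real deployment the sender is QEMU and the receiver is the vhost
backend, and a single message may carry several region descriptors rather than
one.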

``VHOST_SET_VRING_KICK`` is used as the signal to put the vhost device into
the data plane, and ``VHOST_GET_VRING_BASE`` is used as the signal to remove
the vhost device from the data plane.

When the socket connection is closed, vhost will destroy the device.

Guest memory requirement
------------------------

* Memory pre-allocation

  For the non-async data path, guest memory pre-allocation is not
  required, which can help save memory. If users really want the guest memory
  to be pre-allocated (e.g., for performance reasons), they can add the option
  ``-mem-prealloc`` when starting QEMU. Alternatively, they can lock all
  memory at the vhost side, which will force memory to be allocated when it is
  mapped at the vhost side; the ``--mlockall`` option in ovs-dpdk is an example.

  For the async data path, the VM memory is forcibly pre-allocated by the vhost
  lib when mapping the guest memory, and the memory also needs to be locked to
  prevent pages from being swapped out to disk.

* Memory sharing

  Make sure the ``share=on`` QEMU option is given. vhost-user will not work with
  a QEMU version without shared memory mapping.
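
As a rough illustration of pre-allocating and locking a shared mapping, the
sketch below shows one way to do it on Linux with ``MAP_POPULATE`` and
``mlock()``. It is not taken from the vhost library: the function names
``map_guest_region`` and ``demo_prealloc`` are hypothetical, and a temporary
file stands in for the guest RAM file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_POPULATE, mkstemp() under strict -std= */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `fd` as shared memory.
 * When `prealloc` is non-zero, ask the kernel to fault in every page up
 * front (MAP_POPULATE, Linux-specific) and then pin the pages with mlock()
 * so they cannot be swapped out.
 * Returns the mapping, or NULL on failure. `*locked` is set to 1 only if
 * mlock() succeeded; it can fail when RLIMIT_MEMLOCK is too small.
 */
void *map_guest_region(int fd, size_t len, int prealloc, int *locked)
{
    int flags = MAP_SHARED | (prealloc ? MAP_POPULATE : 0);
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);

    *locked = 0;
    if (addr == MAP_FAILED)
        return NULL;
    if (prealloc && mlock(addr, len) == 0)
        *locked = 1;
    return addr;
}

/* Returns 0 if a pre-allocated mapping could be created and used. */
int demo_prealloc(void)
{
    char path[] = "/tmp/guest-ram-XXXXXX"; /* stand-in for the guest RAM file */
    size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
    int locked = 0, ok;
    int fd = mkstemp(path);
    char *ram;

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    ram = map_guest_region(fd, len, 1, &locked);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ram == NULL)
        return -1;

    /* Touch the first and last byte to confirm the region is usable. */
    ram[0] = 1;
    ram[len - 1] = 2;
    ok = (ram[0] == 1 && ram[len - 1] == 2);

    munmap(ram, len); /* unmapping also drops the lock */
    return ok ? 0 : -1;
}
```

The sketch reports a failed ``mlock()`` through ``locked`` instead of treating
it as fatal, since whether locking is mandatory depends on the data path in
use.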

Vhost supported vSwitch reference
---------------------------------

For more vhost details and how to support vhost in vSwitch, please refer to
the vhost example in the DPDK Sample Applications Guide.

Vhost data path acceleration (vDPA)
-----------------------------------

vDPA supports selective datapath in the vhost-user lib by enabling virtio ring
compatible devices to serve the virtio driver directly for datapath
acceleration.

``rte_vhost_driver_attach_vdpa_device`` is used to configure the vhost device
with an accelerated backend.

Also, vhost device capabilities are made configurable to adapt to various
devices. Such capabilities include supported features, protocol features, and
queue number.

Finally, a set of device ops is defined for device specific operations:

* ``get_queue_num``

  Called to get the supported queue number of the device.

* ``get_features``

  Called to get the supported features of the device.

* ``get_protocol_features``

  Called to get the supported protocol features of the device.

* ``dev_conf``

  Called to configure the actual device when the virtio device becomes ready.

* ``dev_close``

  Called to close the actual device when the virtio device is stopped.

* ``set_vring_state``

  Called to change the state of the vring in the actual device when the vring
  state changes.

* ``set_features``

  Called to set the negotiated features to the device.

* ``migration_done``

  Called to allow the device to respond to RARP sending.

* ``get_vfio_group_fd``

  Called to get the VFIO group fd of the device.

* ``get_vfio_device_fd``

  Called to get the VFIO device fd of the device.

* ``get_notify_area``

  Called to get the notify area info of the queue.