..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.

Poll Mode Driver for Emulated Virtio NIC
========================================

Virtio is a para-virtualization framework initiated by IBM and supported by the KVM hypervisor.
In the Data Plane Development Kit (DPDK),
we provide a virtio Poll Mode Driver (PMD) as a software solution, in contrast to the SR-IOV hardware solution,
for fast guest VM to guest VM communication and guest VM to host communication.

Vhost is a kernel acceleration module for the virtio qemu back end.

For basic qemu-KVM installation and other Intel EM poll mode drivers in the guest VM,
please refer to the chapter "Driver for VM Emulated Devices".

In this chapter, we demonstrate usage of the virtio PMD with the
standard qemu vhost back end.

Virtio Implementation in DPDK
-----------------------------

For details about the virtio spec, refer to the latest
`VIRTIO (Virtual I/O) Device Specification
<https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio>`_.

As a PMD, virtio provides packet reception and transmission callbacks.

In Rx, packets described by the used descriptors in vring are available
for virtio to burst out.

In Tx, packets described by the used descriptors in vring are available
for virtio to clean. Virtio enqueues the packets to be transmitted into the
vring, makes them available to the device, and then notifies the host back
end if necessary.

Features and Limitations of virtio PMD
--------------------------------------

In this release, the virtio PMD provides the basic functionality of packet reception and transmission.

*   It supports mergeable buffers per packet when receiving packets and scattered buffers per packet
    when transmitting packets. The supported packet size ranges from 64 to 9728 bytes.

*   It supports multicast packets and promiscuous mode.

*   The descriptor number for the Rx/Tx queue is hard-coded to 256 by qemu 2.7 and below.
    If given a different descriptor number by the upper application,
    the virtio PMD generates a warning and falls back to the hard-coded value.
    Since qemu 2.8, the Rx queue size is configurable up to 1024, with a default of 256.
    The Tx queue size is still hard-coded to 256.

*   MAC/VLAN filters are supported, but negotiation with the vhost back end is needed to enable them.
    When the back end can't support the VLAN filter, the virtio application on the guest should not enable it,
    so that the virtio port is configured correctly. For example, do not specify '--enable-hw-vlan' on the testpmd
    command line. Note that the MAC/VLAN filter is best effort: unwanted packets could still arrive.

*   "RTE_PKTMBUF_HEADROOM" should be defined to be
    no less than "sizeof(struct virtio_net_hdr_mrg_rxbuf)", which is 12 bytes, when mergeable Rx buffers or
    "VIRTIO_F_VERSION_1" is set, and
    no less than "sizeof(struct virtio_net_hdr)", which is 10 bytes, when using non-mergeable buffers.

*   Virtio does not support runtime configuration.

*   Virtio supports Link State interrupt.

*   Virtio supports Rx interrupt (so far, only 1:1 mapping for queue/interrupt is supported).

*   Virtio supports software VLAN stripping and inserting.

*   Virtio supports using port IO to get PCI resource when UIO module is not available.

*   Virtio supports RSS Rx mode with 40B configurable hash key length, 128
    configurable RETA entries and configurable hash types.

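The headroom rule in the list above can be written as a small check. The
following is a minimal Python sketch, not code from the PMD: the function names
and boolean parameters are hypothetical, and the header sizes (12 and 10 bytes)
are the ones quoted above.

.. code-block:: python

    # Header sizes quoted in this guide, in bytes.
    VIRTIO_NET_HDR_MRG_RXBUF_SIZE = 12  # sizeof(struct virtio_net_hdr_mrg_rxbuf)
    VIRTIO_NET_HDR_SIZE = 10            # sizeof(struct virtio_net_hdr)


    def min_virtio_headroom(mergeable: bool, version_1: bool) -> int:
        """Return the smallest RTE_PKTMBUF_HEADROOM allowed by the rule above."""
        if mergeable or version_1:
            return VIRTIO_NET_HDR_MRG_RXBUF_SIZE
        return VIRTIO_NET_HDR_SIZE


    def headroom_is_sufficient(headroom: int, mergeable: bool, version_1: bool) -> bool:
        """Check a configured headroom value against the required minimum."""
        return headroom >= min_virtio_headroom(mergeable, version_1)
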
Prerequisites
-------------

The following prerequisites apply:

*   In the BIOS, turn VT-x and VT-d on

*   Linux kernel with KVM module; vhost module loaded and ioeventfd supported.
    Qemu standard backend without vhost support isn't tested, and probably isn't supported.

*   When using legacy interface, ``SYS_RAWIO`` capability is required
    for ``iopl()`` call to enable access to PCI I/O ports.


Virtio with qemu virtio Back End
--------------------------------

.. _figure_host_vm_comms_qemu:

.. figure:: img/host_vm_comms_qemu.*

   Host2VM Communication Example Using qemu vhost Back End


.. code-block:: console

    qemu-system-x86_64 -enable-kvm -cpu host -m 2048 -smp 2 \
        -mem-path /dev/hugepages -mem-prealloc \
        -drive file=/data/DPDKVMS/dpdk-vm1 \
        -netdev tap,id=vm1_p1,ifname=tap0,script=no,vhost=on \
        -device virtio-net-pci,netdev=vm1_p1,bus=pci.0,addr=0x3,ioeventfd=on \
        -device pci-assign,host=04:10.1

In this example, the packet reception flow path is:

    IXIA packet generator -> 82599 PF -> Linux Bridge -> TAP0's socket queue -> Guest
    VM virtio port 0 Rx burst -> Guest VM 82599 VF port1 Tx burst -> IXIA packet
    generator

The packet transmission flow is:

    IXIA packet generator -> Guest VM 82599 VF port1 Rx burst -> Guest VM virtio
    port 0 Tx burst -> tap -> Linux Bridge -> 82599 PF -> IXIA packet generator


Virtio PMD Rx/Tx Callbacks
--------------------------

Virtio driver has 6 Rx callbacks and 3 Tx callbacks.

Rx callbacks:

#. ``virtio_recv_pkts``:
   Regular version without mergeable Rx buffer support for split virtqueue.

#. ``virtio_recv_mergeable_pkts``:
   Regular version with mergeable Rx buffer support for split virtqueue.

#. ``virtio_recv_pkts_vec``:
   Vector version without mergeable Rx buffer support, also fixes the available
   ring indexes and uses vector instructions to optimize performance for split
   virtqueue.

#. ``virtio_recv_pkts_inorder``:
   In-order version with mergeable and non-mergeable Rx buffer support
   for split virtqueue.

#. ``virtio_recv_pkts_packed``:
   Regular and in-order version without mergeable Rx buffer support for
   packed virtqueue.

#. ``virtio_recv_mergeable_pkts_packed``:
   Regular and in-order version with mergeable Rx buffer support for packed
   virtqueue.

Tx callbacks:

#. ``virtio_xmit_pkts``:
   Regular version for split virtqueue.

#. ``virtio_xmit_pkts_inorder``:
   In-order version for split virtqueue.

#. ``virtio_xmit_pkts_packed``:
   Regular and in-order version for packed virtqueue.

By default, the non-vector callbacks are used:

*   For Rx: If mergeable Rx buffers are disabled then ``virtio_recv_pkts``
    or ``virtio_recv_pkts_packed`` will be used, otherwise
    ``virtio_recv_mergeable_pkts`` or ``virtio_recv_mergeable_pkts_packed``
    will be used.

*   For Tx: ``virtio_xmit_pkts`` or ``virtio_xmit_pkts_packed`` will be used.


Vector callbacks will be used when:

*   Mergeable Rx buffers are disabled.

The corresponding callbacks are:

*   For Rx: ``virtio_recv_pkts_vec``.

There are no vector callbacks for packed virtqueue for now.


Example of using the vector version of the virtio poll mode driver in
``testpmd``::

   dpdk-testpmd -l 0-2 -n 4 -- -i --rxq=1 --txq=1 --nb-cores=1

In-order callbacks only work on simulated virtio user vdev.

For split virtqueue:

*   For Rx: If in-order is enabled then ``virtio_recv_pkts_inorder`` is used.

*   For Tx: If in-order is enabled then ``virtio_xmit_pkts_inorder`` is used.

For packed virtqueue, the default callbacks already support the
in-order feature.

Interrupt mode
--------------

.. _virtio_interrupt_mode:

There are three kinds of interrupts from a virtio device over PCI bus: config
interrupt, Rx interrupts, and Tx interrupts. Config interrupt is used for
notification of device configuration changes, especially link status (lsc).
Interrupt mode is translated into Rx interrupts in the context of DPDK.

.. Note::

   Virtio PMD already has support for receiving lsc from qemu when the link
   status changes, especially when vhost user disconnects. However, it fails
   to do that if the VM is created by qemu 2.6.2 or below, since the
   capability to detect vhost user disconnection is introduced in qemu 2.7.0.

Prerequisites for Rx interrupts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To support Rx interrupts,

#. Check if guest kernel supports VFIO-NOIOMMU:

    Linux has supported VFIO-NOIOMMU since 4.8.0. Make sure the guest
    kernel is compiled with:

    .. code-block:: console

        CONFIG_VFIO_NOIOMMU=y

#. Properly set msix vectors when starting VM:

    Enable multi-queue when starting VM, and specify msix vectors in qemu
    cmdline. (N+1) is the minimum, and (2N+2) is recommended in most cases.

    .. code-block:: console

        $(QEMU) ... -device virtio-net-pci,mq=on,vectors=2N+2 ...

#. In VM, insert vfio module in NOIOMMU mode:

    .. code-block:: console

        modprobe vfio enable_unsafe_noiommu_mode=1
        modprobe vfio-pci

#. In VM, bind the virtio device with vfio-pci:

    .. code-block:: console

        ./usertools/dpdk-devbind.py -b vfio-pci 00:03.0

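As a worked example of the vector counts in step 2, the sketch below computes
both values and formats the ``-device`` option. It assumes that N is the number
of queue pairs, which this guide does not state explicitly, and the helper
names are hypothetical.

.. code-block:: python

    def msix_vectors(n: int) -> tuple:
        """Return (minimum, recommended) MSI-X vector counts for N.

        Assumption: N is the number of queue pairs of the virtio device.
        """
        if n < 1:
            raise ValueError("N must be at least 1")
        return n + 1, 2 * n + 2


    def virtio_net_device_option(n: int) -> str:
        """Format the qemu -device option with the recommended vector count."""
        _, recommended = msix_vectors(n)
        return f"virtio-net-pci,mq=on,vectors={recommended}"


    # With N = 4 the minimum is 5 vectors and the recommended value is 10.
    print(virtio_net_device_option(4))  # virtio-net-pci,mq=on,vectors=10
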
Example
~~~~~~~

Here we use l3fwd-power as an example to show how to get started.

    Example:

    .. code-block:: console

        $ dpdk-l3fwd-power -l 0-1 -- -p 1 -P --config="(0,0,1)" \
                                               --no-numa --parse-ptype


Runtime Configuration
---------------------

The following devargs are supported by the PCI virtio driver:

#.  ``vdpa``:

    A virtio device can also be driven by a vDPA (vhost data path acceleration)
    driver and work as a HW vhost back end. This argument specifies that
    a virtio device needs to work in vDPA mode.
    (Default: 0 (disabled))

#.  ``speed``:

    It is used to specify the link speed of the virtio device. Link speed is part of
    the link status structure. The application can request it using the
    ``rte_eth_link_get_nowait`` function.
    (Default: 0xffffffff (Unknown))

#.  ``vectorized``:

    It is used to specify whether the virtio device prefers to use the vectorized path.
    The dependencies of the vectorized path are then checked during path
    selection.
    (Default: 0 (disabled))

The following devargs are supported by the virtio-user vdev:

#.  ``path``:

    It is used to specify a path to connect to the vhost back end.

#.  ``mac``:

    It is used to specify the MAC address.

#.  ``cq``:

    It is used to enable the control queue. (Default: 0 (disabled))

#.  ``queue_size``:

    It is used to specify the queue size. (Default: 256)

#.  ``queues``:

    It is used to specify the queue number. (Default: 1)

#.  ``iface``:

    It is used to specify the host interface name for the vhost-kernel
    back end.

#.  ``server``:

    It is used to enable the server mode when using the vhost-user back end.
    (Default: 0 (disabled))

#.  ``mrg_rxbuf``:

    It is used to enable the virtio device mergeable Rx buffer feature.
    (Default: 1 (enabled))

#.  ``in_order``:

    It is used to enable the virtio device in-order feature.
    (Default: 1 (enabled))

#.  ``packed_vq``:

    It is used to enable the virtio device packed virtqueue feature.
    (Default: 0 (disabled))

#.  ``speed``:

    It is used to specify the link speed of the virtio device. Link speed is part of
    the link status structure. The application can request it using the
    ``rte_eth_link_get_nowait`` function.
    (Default: 0xffffffff (Unknown))

#.  ``vectorized``:

    It is used to specify whether the virtio device prefers to use the vectorized path.
    The dependencies of the vectorized path are then checked during path
    selection.
    (Default: 0 (disabled))

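The virtio-user devargs above are passed as a comma-separated list after the
device name in the EAL ``--vdev`` option. The sketch below assembles such a
string and rejects keys that are not listed in this guide. The helper
``virtio_user_vdev`` is hypothetical, the socket path ``/tmp/vhost-user.sock``
is a placeholder, and the device name prefix ``net_virtio_user`` is assumed to
match the virtio-user driver name of the DPDK release in use.

.. code-block:: python

    def virtio_user_vdev(index: int, path: str, **devargs) -> str:
        """Build the value of an EAL --vdev option for a virtio-user port.

        Only devargs listed in this guide are accepted, so typos fail early.
        """
        allowed = {
            "mac", "cq", "queue_size", "queues", "iface", "server",
            "mrg_rxbuf", "in_order", "packed_vq", "speed", "vectorized",
        }
        unknown = set(devargs) - allowed
        if unknown:
            raise ValueError(f"unsupported devargs: {sorted(unknown)}")
        # Assumed device name prefix; "path" always comes first.
        parts = [f"net_virtio_user{index}", f"path={path}"]
        parts += [f"{key}={value}" for key, value in devargs.items()]
        return ",".join(parts)


    # Placeholder socket path; packed virtqueue with in-order enabled
    # and mergeable Rx buffers disabled.
    vdev = virtio_user_vdev(0, "/tmp/vhost-user.sock",
                            queues=2, packed_vq=1, in_order=1, mrg_rxbuf=0)
    print(f"--vdev={vdev}")
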
Virtio paths Selection and Usage
--------------------------------

Logically, the virtio PMD has 11 paths, selected according to the negotiated virtio features
(Rx mergeable, in-order, packed virtqueue) and, for the vectorized paths, some additional
conditions. These features are introduced below:

*   `Rx mergeable <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/
    virtio-v1.1-cs01.html#x1-2140004>`_: With this feature negotiated, the device
    can receive large packets by combining individual descriptors.
*   `In-order <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/
    virtio-v1.1-cs01.html#x1-690008>`_: Some devices always use descriptors
    in the same order in which they have been made available. These
    devices can offer the VIRTIO_F_IN_ORDER feature. With this feature negotiated,
    the driver will use descriptors in order.
*   `Packed virtqueue <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/
    virtio-v1.1-cs01.html#x1-610007>`_: The structure of a packed virtqueue is
    different from that of a split virtqueue. A split virtqueue is composed of an available ring,
    a used ring and a descriptor table, while a packed virtqueue is composed of a descriptor
    ring, driver event suppression and device event suppression. The idea behind
    this is to improve performance by avoiding cache misses and to make it easier
    for hardware to implement.

Virtio paths Selection
~~~~~~~~~~~~~~~~~~~~~~

If packed virtqueue is not negotiated, one of the following split virtqueue paths is selected
according to the configuration:

#. Split virtqueue mergeable path: selected if Rx mergeable is negotiated and the in-order
   feature is not negotiated.

#. Split virtqueue non-mergeable path: selected if neither Rx mergeable nor the in-order
   feature is negotiated and Rx offloads are requested.

#. Split virtqueue in-order mergeable path: selected if Rx mergeable and the in-order feature
   are both negotiated.

#. Split virtqueue in-order non-mergeable path: selected if the in-order feature is negotiated
   and Rx mergeable is not negotiated.

#. Split virtqueue vectorized Rx path: selected if Rx mergeable is disabled and no Rx offload
   is requested.

If packed virtqueue is negotiated, one of the following packed virtqueue paths is selected
according to the configuration:

#. Packed virtqueue mergeable path: selected if Rx mergeable is negotiated and the in-order
   feature is not negotiated.

#. Packed virtqueue non-mergeable path: selected if neither Rx mergeable nor the in-order
   feature is negotiated.

#. Packed virtqueue in-order mergeable path: selected if the in-order feature and Rx mergeable
   are both negotiated.

#. Packed virtqueue in-order non-mergeable path: selected if the in-order feature is negotiated
   and Rx mergeable is not negotiated.

#. Packed virtqueue vectorized Rx path: selected if the build and runtime environments support
   AVX512 or NEON, the in-order feature is negotiated, Rx mergeable is not negotiated,
   TCP_LRO Rx offloading is disabled, and the vectorized option is enabled.

#. Packed virtqueue vectorized Tx path: selected if the build and runtime environments support
   AVX512 or NEON, the in-order feature is negotiated, and the vectorized option is enabled.

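The selection rules above can be modelled as a small decision function. The
following Python sketch is only an illustration of the rules as written in this
guide, not the PMD's actual selection code. The ``VirtioConfig`` type and
``select_paths`` function are hypothetical, and two modelling choices are
assumptions: the Rx and Tx directions are reported separately, and for split
virtqueue the in-order rules take precedence over the vectorized Rx rule.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class VirtioConfig:
        packed: bool               # packed virtqueue negotiated
        mergeable: bool            # Rx mergeable negotiated
        in_order: bool             # in-order feature negotiated
        rx_offloads: bool = False  # any Rx offload requested (split rules)
        tcp_lro: bool = False      # TCP_LRO Rx offload enabled (packed rules)
        vectorized: bool = False   # "vectorized" option enabled
        simd: bool = False         # build and runtime support AVX512 or NEON


    def _base_path(cfg: VirtioConfig) -> str:
        """Name of the non-vectorized path for the negotiated features."""
        ring = "Packed" if cfg.packed else "Split"
        order = "in-order " if cfg.in_order else ""
        merge = "mergeable" if cfg.mergeable else "non-mergeable"
        return f"{ring} virtqueue {order}{merge} path"


    def select_paths(cfg: VirtioConfig) -> tuple:
        """Return (rx_path, tx_path) names following the rules listed above."""
        rx = tx = _base_path(cfg)
        if cfg.packed:
            vec_ok = cfg.simd and cfg.in_order and cfg.vectorized
            if vec_ok and not cfg.mergeable and not cfg.tcp_lro:
                rx = "Packed virtqueue vectorized Rx path"
            if vec_ok:
                tx = "Packed virtqueue vectorized Tx path"
        elif not cfg.mergeable and not cfg.in_order and not cfg.rx_offloads:
            # Assumption: in-order takes precedence over the vectorized Rx rule.
            rx = "Split virtqueue vectorized Rx path"
        return rx, tx
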
Rx/Tx callbacks of each Virtio path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Based on the description above, the virtio path and the corresponding Rx/Tx callbacks
are selected automatically. The Rx and Tx callbacks for each virtio path
are shown in the table below:

.. table:: Virtio Paths and Callbacks

   ============================================ ================================= ========================
   Virtio paths                                 Rx callbacks                      Tx callbacks
   ============================================ ================================= ========================
   Split virtqueue mergeable path               virtio_recv_mergeable_pkts        virtio_xmit_pkts
   Split virtqueue non-mergeable path           virtio_recv_pkts                  virtio_xmit_pkts
   Split virtqueue in-order mergeable path      virtio_recv_pkts_inorder          virtio_xmit_pkts_inorder
   Split virtqueue in-order non-mergeable path  virtio_recv_pkts_inorder          virtio_xmit_pkts_inorder
   Split virtqueue vectorized Rx path           virtio_recv_pkts_vec              virtio_xmit_pkts
   Packed virtqueue mergeable path              virtio_recv_mergeable_pkts_packed virtio_xmit_pkts_packed
   Packed virtqueue non-mergeable path          virtio_recv_pkts_packed           virtio_xmit_pkts_packed
   Packed virtqueue in-order mergeable path     virtio_recv_mergeable_pkts_packed virtio_xmit_pkts_packed
   Packed virtqueue in-order non-mergeable path virtio_recv_pkts_packed           virtio_xmit_pkts_packed
   Packed virtqueue vectorized Rx path          virtio_recv_pkts_packed_vec       virtio_xmit_pkts_packed
   Packed virtqueue vectorized Tx path          virtio_recv_pkts_packed           virtio_xmit_pkts_packed_vec
   ============================================ ================================= ========================

Virtio paths Support Status from Release to Release
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio feature implementation:

*   The in-order feature has been supported since DPDK 18.08, which added the Rx/Tx callbacks
    ``virtio_recv_pkts_inorder`` and ``virtio_xmit_pkts_inorder``.
*   Packed virtqueue has been supported since DPDK 19.02, which added the Rx/Tx callbacks
    ``virtio_recv_pkts_packed``, ``virtio_recv_mergeable_pkts_packed`` and
    ``virtio_xmit_pkts_packed``.

The support status of all virtio paths is shown in the table below:

.. table:: Virtio Paths and Releases

   ============================================ ============= ============= ============= =======
   Virtio paths                                 16.11 ~ 18.05 18.08 ~ 18.11 19.02 ~ 19.11 20.05 ~
   ============================================ ============= ============= ============= =======
   Split virtqueue mergeable path                     Y             Y             Y          Y
   Split virtqueue non-mergeable path                 Y             Y             Y          Y
   Split virtqueue vectorized Rx path                 Y             Y             Y          Y
   Split virtqueue simple Tx path                     Y             N             N          N
   Split virtqueue in-order mergeable path                          Y             Y          Y
   Split virtqueue in-order non-mergeable path                      Y             Y          Y
   Packed virtqueue mergeable path                                                Y          Y
   Packed virtqueue non-mergeable path                                            Y          Y
   Packed virtqueue in-order mergeable path                                       Y          Y
   Packed virtqueue in-order non-mergeable path                                   Y          Y
   Packed virtqueue vectorized Rx path                                                       Y
   Packed virtqueue vectorized Tx path                                                       Y
   ============================================ ============= ============= ============= =======

QEMU Support Status
~~~~~~~~~~~~~~~~~~~

*   Qemu now supports three paths of split virtqueue: Split virtqueue mergeable path,
    Split virtqueue non-mergeable path, and Split virtqueue vectorized Rx path.
*   Since qemu 4.2.0, Packed virtqueue mergeable path and Packed virtqueue non-mergeable
    path are supported.

How to Debug
~~~~~~~~~~~~

If you encounter a performance drop or other issues after upgrading the driver
or changing the configuration, the steps below can help you identify which path
is selected and find the root cause faster.

#. Run the vhost/virtio test case.

#. Run "perf top" and check the virtio Rx/Tx callback names.

#. Identify which virtio path is selected by referring to the table above.

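The last step amounts to a reverse lookup in the "Virtio Paths and Callbacks"
table. The sketch below shows one way to do it in Python; the ``PATH_CALLBACKS``
dictionary is copied from that table, while the ``candidate_paths`` helper is
hypothetical. Several paths share the same callback pair, so the lookup can
return more than one candidate, and the negotiated features are then needed to
tell them apart.

.. code-block:: python

    # Path name -> (Rx callback, Tx callback), copied from the
    # "Virtio Paths and Callbacks" table in this guide.
    PATH_CALLBACKS = {
        "Split virtqueue mergeable path":
            ("virtio_recv_mergeable_pkts", "virtio_xmit_pkts"),
        "Split virtqueue non-mergeable path":
            ("virtio_recv_pkts", "virtio_xmit_pkts"),
        "Split virtqueue in-order mergeable path":
            ("virtio_recv_pkts_inorder", "virtio_xmit_pkts_inorder"),
        "Split virtqueue in-order non-mergeable path":
            ("virtio_recv_pkts_inorder", "virtio_xmit_pkts_inorder"),
        "Split virtqueue vectorized Rx path":
            ("virtio_recv_pkts_vec", "virtio_xmit_pkts"),
        "Packed virtqueue mergeable path":
            ("virtio_recv_mergeable_pkts_packed", "virtio_xmit_pkts_packed"),
        "Packed virtqueue non-mergeable path":
            ("virtio_recv_pkts_packed", "virtio_xmit_pkts_packed"),
        "Packed virtqueue in-order mergeable path":
            ("virtio_recv_mergeable_pkts_packed", "virtio_xmit_pkts_packed"),
        "Packed virtqueue in-order non-mergeable path":
            ("virtio_recv_pkts_packed", "virtio_xmit_pkts_packed"),
        "Packed virtqueue vectorized Rx path":
            ("virtio_recv_pkts_packed_vec", "virtio_xmit_pkts_packed"),
        "Packed virtqueue vectorized Tx path":
            ("virtio_recv_pkts_packed", "virtio_xmit_pkts_packed_vec"),
    }


    def candidate_paths(rx_callback: str, tx_callback: str) -> list:
        """Return every path whose callbacks match the observed pair."""
        observed = (rx_callback, tx_callback)
        return [path for path, cbs in PATH_CALLBACKS.items() if cbs == observed]


    # Callback names as they might be read from "perf top".
    print(candidate_paths("virtio_recv_pkts_vec", "virtio_xmit_pkts"))
    print(candidate_paths("virtio_recv_pkts_inorder", "virtio_xmit_pkts_inorder"))
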
497