..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019-2020 Intel Corporation.

AF_XDP Poll Mode Driver
==========================

AF_XDP is an address family that is optimized for high performance
packet processing. AF_XDP sockets allow an XDP program to
redirect packets to a memory buffer in userspace.

Further information about AF_XDP can be found in the
`AF_XDP kernel documentation
<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.

This Linux-specific PMD creates the AF_XDP socket and binds it to a
specific netdev queue. The DPDK application can then send and receive raw
packets through the socket, bypassing the kernel network stack.

Prerequisites
-------------

*  A Linux kernel (version >= v4.18) with the XDP sockets configuration option
   enabled (``CONFIG_XDP_SOCKETS=y``).
*  Both libxdp (>= v1.2.2) and libbpf (any version) libraries installed, or
   alternatively just the libbpf library <= v0.6.0.
*  The pkg-config package should be installed on the system as it is used to
   discover the libbpf and libxdp libraries and determine whether their
   versions are sufficient.
*  If using libxdp, the environment variable ``LIBXDP_OBJECT_PATH`` must be
   set to the location where libxdp placed its BPF object files. This is
   usually ``/usr/local/lib/bpf`` or ``/usr/local/lib64/bpf``.
*  A kernel-bound interface to attach to.
*  The need_wakeup feature requires kernel version >= v5.4.
*  The PMD zero copy feature requires kernel version >= v5.4.
*  The shared UMEM feature requires kernel version >= v5.10 and libbpf version
   v0.2.0 or later. The ``LINUX_VERSION_CODE`` defined in the ``version.h`` kernel
   header is used to determine the kernel version at compile time.
*  A kernel with version 5.4 or later is required for 32-bit OS.
*  The busy polling feature requires kernel version >= v5.11.
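
The kernel version requirements above can be checked against a running kernel
with a short script. This is a minimal sketch, not part of DPDK: the helper
names ``version_ge`` and ``af_xdp_features`` are hypothetical, the comparison
assumes GNU ``sort -V`` is available, and only kernel versions are checked
(not the libbpf, libxdp or 32-bit requirements). Because the PMD determines the
kernel version at compile time, the output is only a hint about the host it is
run on.

.. code-block:: shell

    # version_ge A B: succeed when version A is at least version B.
    # Assumes GNU sort, which provides the -V (version sort) option.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # af_xdp_features KERNEL_RELEASE: report which of the features listed
    # above meet their minimum kernel version.
    af_xdp_features() {
        kver="${1%%-*}"   # drop a local suffix such as "-generic"
        for entry in af_xdp_sockets:4.18 need_wakeup:5.4 zero_copy:5.4 \
                     shared_umem:5.10 busy_polling:5.11; do
            name="${entry%%:*}"
            min="${entry##*:}"
            if version_ge "$kver" "$min"; then
                echo "$name: yes"
            else
                echo "$name: no"
            fi
        done
    }

    af_xdp_features "$(uname -r)"

For a release string such as ``5.10.0-21-generic`` every line except
``busy_polling`` reports ``yes``.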


Options
-------

iface
~~~~~

The ``iface`` option is the only required option. It is the name of the kernel
interface to attach to.

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1

By default, the socket is created on Rx queue 0. To ensure traffic lands on
this queue, one can use flow steering if the network card supports it. A
simpler alternative is to reduce the number of configured queues for the device
to a single queue, which ensures that all traffic lands on that queue (queue 0)
and thus reaches the socket. This can be configured using ethtool:

.. code-block:: console

    ethtool -L ens786f1 combined 1

start_queue
~~~~~~~~~~~

To create a socket on another queue, first configure the netdev with multiple
queues, for example 2, like so:

.. code-block:: console

    ethtool -L ens786f1 combined 2

Then, create the socket on one of those queues, for example queue 1:

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,start_queue=1

queue_count
~~~~~~~~~~~

To create a PMD with sockets on multiple queues, use the ``queue_count``
argument. The following example creates sockets on queues 0 and 1:

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,queue_count=2
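
The ``start_queue`` and ``queue_count`` options can also be combined. The
sketch below is illustrative only: ``af_xdp_vdev`` is a hypothetical helper
that just prints an argument string, and it assumes that sockets are created on
``queue_count`` consecutive queues beginning at ``start_queue``, so the netdev
needs at least ``start_queue + queue_count`` queues.

.. code-block:: shell

    # af_xdp_vdev IFACE [START_QUEUE] [QUEUE_COUNT]: print a --vdev argument.
    # Defaults mirror the behaviour described above: queue 0, one socket.
    af_xdp_vdev() {
        printf '%s\n' "--vdev net_af_xdp,iface=$1,start_queue=${2:-0},queue_count=${3:-1}"
    }

    # Sockets on queues 1 and 2 need a netdev with at least 3 queues,
    # configured beforehand with: ethtool -L ens786f1 combined 3
    af_xdp_vdev ens786f1 1 2

The printed string can then be passed on the EAL command line of the DPDK
application.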

shared_umem
~~~~~~~~~~~

The shared UMEM feature allows two sockets to share a UMEM and can be
configured like so:

.. code-block:: console

    --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
    --vdev net_af_xdp1,iface=ens786f2,shared_umem=1

xdp_prog
~~~~~~~~

The ``xdp_prog`` argument allows the user to provide a path to a custom XDP
program, which is used in place of the default libbpf/libxdp program that
simply redirects packets to the sockets. For example:

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,xdp_prog=/path/to/prog.o

busy_budget
~~~~~~~~~~~

The busy polling feature aims to improve single-core performance for AF_XDP
sockets under heavy load. It is enabled by default if the detected kernel
version is sufficient, i.e. >= v5.11. The ``busy_budget`` argument sets the
busy-polling NAPI budget, which is the number of packets the kernel will
attempt to process in the netdev's NAPI context. It can be configured like so:

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,busy_budget=32

To disable busy polling, simply set the busy_budget to 0:

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,busy_budget=0

It is also strongly recommended to set the following for optimal performance
when using the busy polling feature:

.. code-block:: console

    echo 2 | sudo tee /sys/class/net/ens786f1/napi_defer_hard_irqs
    echo 200000 | sudo tee /sys/class/net/ens786f1/gro_flush_timeout

The above defers interrupts for interface ens786f1 and schedules its
NAPI context from a watchdog timer instead of from softirqs. More information
on this feature can be found at [1].
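
The two sysfs writes can be wrapped in a small helper that also prints the
previous values so they can be restored later. This is a sketch only: the
function name ``busy_poll_tune`` and its optional second argument (an alternate
sysfs root, useful for a dry run against a scratch directory) are hypothetical,
and writing to the real sysfs files requires root privileges.

.. code-block:: shell

    # busy_poll_tune IFACE [SYSFS_ROOT]: apply the values recommended above
    # and print the values they replace.
    busy_poll_tune() {
        dir="${2:-/sys/class/net}/$1"
        for setting in napi_defer_hard_irqs=2 gro_flush_timeout=200000; do
            name="${setting%%=*}"
            value="${setting#*=}"
            printf '%s was %s\n' "$name" "$(cat "$dir/$name")"
            echo "$value" > "$dir/$name"
        done
    }

    # Example invocation, run as root:
    #   busy_poll_tune ens786f1

Recording the previous values makes it straightforward to undo the tuning once
the DPDK application has exited.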

force_copy
~~~~~~~~~~

The force_copy argument allows the user to force the socket to use copy mode
instead of zero copy mode (if available).

.. code-block:: console

    --vdev net_af_xdp,iface=ens786f1,force_copy=1

use_cni
~~~~~~~

The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_cni=1

.. note::

   When using `use_cni`_, both parameters `xdp_prog`_ and `busy_budget`_ are disabled
   as both of these will be handled by the AF_XDP plugin.
   Because the DPDK application runs with limited privileges,
   enabling and disabling promiscuous mode through the DPDK application
   is also not supported.

use_pinned_map
~~~~~~~~~~~~~~

The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
application/pod.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_pinned_map=1

.. note::

    This feature can also be used with any external entity that can pin an eBPF map, not just
    the `AF_XDP Device Plugin for Kubernetes`_.

dp_path
~~~~~~~

The EAL vdev argument ``dp_path`` is used alongside
the ``use_cni`` or ``use_pinned_map`` arguments
to explicitly tell the AF_XDP PMD where to find either:

1. The Unix domain socket (UDS) used to interact with the AF_XDP Device Plugin, or
2. The pinned xskmap to use when creating AF_XDP sockets.

If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
the AF_XDP PMD falls back internally to the default location used by the
`AF_XDP Device Plugin for Kubernetes`_.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"

.. code-block:: console

   --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"

Limitations
-----------

- **MTU**

  The MTU of the AF_XDP PMD is limited due to the XDP requirement of one packet
  per page. In the PMD we report the maximum MTU for zero copy to be equal
  to the page size less the frame overhead introduced by AF_XDP (XDP HR = 256)
  and DPDK (frame headroom = 320). With a 4K page size this works out at 3520.
  However, in practice this value may be even smaller, due to differences between
  the supported RX buffer sizes of the underlying kernel netdev driver.

  For example, the largest RX buffer size supported by the underlying kernel driver
  which is less than the page size (4096B) may be 3072B. In this case, the maximum
  MTU value will be at most 3072, but likely even smaller than this, once relevant
  headers are accounted for, e.g. Ethernet and VLAN.

  To determine the actual maximum MTU value of the interface you are using with the
  AF_XDP PMD, consult the documentation for the kernel driver.

  Note: The AF_XDP PMD will fail to initialise if an MTU which violates the driver's
  conditions as above is set prior to launching the application.
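
  The zero copy figure above is simple arithmetic, sketched here in shell.
  The helper name ``max_zc_mtu`` is hypothetical, and the two overheads are
  the values quoted above rather than values read from the PMD or the kernel
  driver, so the result should be treated as an upper bound only.

  .. code-block:: shell

    # max_zc_mtu PAGE_SIZE: page size less the XDP headroom (256) and the
    # DPDK frame headroom (320), as described above.
    max_zc_mtu() {
        echo $(( $1 - 256 - 320 ))
    }

    # Upper bound for the page size of the running system.
    max_zc_mtu "$(getconf PAGESIZE)"

  With a 4096-byte page this prints 3520. The MTU currently set on the
  interface can be read from ``/sys/class/net/ens786f1/mtu`` for comparison.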

- **Shared UMEM**

  The sharing of UMEM is only supported for AF_XDP sockets with unique contexts.
  The context refers to the netdev,qid tuple.

  The following combination will fail:

  .. code-block:: console

    --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
    --vdev net_af_xdp1,iface=ens786f1,shared_umem=1

  Either of the following, however, is permitted since either the netdev or qid differs
  between the two vdevs:

  .. code-block:: console

    --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
    --vdev net_af_xdp1,iface=ens786f1,start_queue=1,shared_umem=1

  .. code-block:: console

    --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
    --vdev net_af_xdp1,iface=ens786f2,shared_umem=1
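
  The uniqueness rule can be checked before launching an application. The
  sketch below is illustrative only: ``contexts_unique`` is a hypothetical
  helper that takes each context written as ``netdev:qid`` and assumes one
  queue per vdev.

  .. code-block:: shell

    # contexts_unique CONTEXT...: succeed when no "netdev:qid" pair repeats.
    contexts_unique() {
        [ -z "$(printf '%s\n' "$@" | sort | uniq -d)" ]
    }

    # Same netdev and same queue: this combination is expected to fail.
    contexts_unique ens786f1:0 ens786f1:0 || echo "duplicate context"

    # Same netdev, different queue: permitted.
    contexts_unique ens786f1:0 ens786f1:1 && echo "unique contexts"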

- **Secondary Processes**

  Rx and Tx are not supported for secondary processes because the memory mapping of
  the AF_XDP rings is assigned by the kernel in the primary process only.
  However, other operations including statistics retrieval are permitted.
  The maximum number of queues permitted for PMDs operating in this model is 8
  as this is the maximum number of fds that can be sent through the IPC APIs as
  defined by ``RTE_MP_MAX_FD_NUM``.

- **libxdp**

  When using the default program (i.e. when the vdev argument ``xdp_prog`` is not used),
  the following logs will appear when an application is launched:

  .. code-block:: console

    libbpf: elf: skipping unrecognized data section(7) .xdp_run_config
    libbpf: elf: skipping unrecognized data section(8) xdp_metadata

  These logs are not errors and can be ignored.

  [1] https://lwn.net/Articles/837010/
285