..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2019-2020 Intel Corporation.

AF_XDP Poll Mode Driver
==========================

AF_XDP is an address family that is optimized for high performance
packet processing. AF_XDP sockets allow an XDP program to
redirect packets to a memory buffer in userspace.

Further information about AF_XDP can be found in the
`AF_XDP kernel documentation
<https://www.kernel.org/doc/Documentation/networking/af_xdp.rst>`_.

This Linux-specific PMD creates the AF_XDP socket and binds it to a
specific netdev queue. The DPDK application can then send and receive raw
packets through the socket, bypassing the kernel network stack.

Prerequisites
-------------

* A Linux Kernel (version >= v4.18) with the XDP sockets configuration option
  enabled (CONFIG_XDP_SOCKETS=y).
* Both libxdp (>= v1.2.2) and libbpf (any version) libraries installed, or
  alternatively just the libbpf library <= v0.6.0.
* The pkg-config package should be installed on the system as it is used to
  discover the libbpf and libxdp libraries and determine whether their versions
  are sufficient.
* If using libxdp, the environment variable LIBXDP_OBJECT_PATH must be set to
  the location where libxdp placed its bpf object files. This is usually
  /usr/local/lib/bpf or /usr/local/lib64/bpf.
* A Kernel bound interface to attach to.
* The need_wakeup feature requires kernel version >= v5.4.
* The PMD zero copy feature requires kernel version >= v5.4.
* The shared UMEM feature requires kernel version >= v5.10 and libbpf version
  v0.2.0 or later. The LINUX_VERSION_CODE defined in the version.h kernel
  header is used to determine the kernel version at compile time.
* A kernel with version 5.4 or later is required for 32-bit OS.
* The busy polling feature requires kernel version >= v5.11.

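
The kernel version thresholds listed above can be summarised in a small lookup.
The following is a minimal sketch, not part of DPDK: the names
``FEATURE_MIN_KERNEL``, ``parse_kernel_version`` and ``available_features`` are
hypothetical helpers, and the check runs against a release string at run time,
whereas the PMD itself relies on LINUX_VERSION_CODE at compile time.

.. code-block:: python

   # Minimum kernel versions copied from the prerequisites list above.
   # The feature names used as keys are illustrative labels only.
   FEATURE_MIN_KERNEL = {
       "af_xdp": (4, 18),
       "need_wakeup": (5, 4),
       "zero_copy": (5, 4),
       "shared_umem": (5, 10),
       "busy_polling": (5, 11),
   }


   def parse_kernel_version(release):
       """Return (major, minor) from a string such as '5.15.0-91-generic'."""
       major, minor = release.split(".")[:2]
       # Keep only the leading digits of the minor field ('4-rc1' -> '4').
       digits = ""
       for ch in minor:
           if not ch.isdigit():
               break
           digits += ch
       return int(major), int(digits)


   def available_features(release):
       """Return the sorted feature labels whose minimum kernel version is met."""
       version = parse_kernel_version(release)
       return sorted(
           name
           for name, minimum in FEATURE_MIN_KERNEL.items()
           if version >= minimum
       )

For instance, passing the output of ``uname -r`` (or Python's
``platform.release()``) to ``available_features`` gives a quick indication of
which of the features above the running kernel is new enough for. Library
version requirements (libbpf, libxdp) are not covered by this sketch.
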

Options
-------

iface
~~~~~

The ``iface`` option is the only required option. It is the name of the Kernel
interface to attach to.

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1

The socket will by default be created on Rx queue 0. To ensure traffic lands on
this queue, one can use flow steering if the network card supports it. A
simpler way is to reduce the number of configured queues for the device to just
a single queue, which ensures that all traffic lands on that queue (queue 0)
and thus reaches the socket. This can be configured using ethtool:

.. code-block:: console

   ethtool -L ens786f1 combined 1

start_queue
~~~~~~~~~~~

To create a socket on another queue, first configure the netdev with multiple
queues, for example 2, like so:

.. code-block:: console

   ethtool -L ens786f1 combined 2

Then, create the socket on one of those queues, for example queue 1:

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,start_queue=1

queue_count
~~~~~~~~~~~

To create a PMD with sockets on multiple queues, use the ``queue_count``
argument. The following example creates sockets on queues 0 and 1:

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,queue_count=2

shared_umem
~~~~~~~~~~~

The shared UMEM feature allows for two sockets to share UMEM and can be
configured like so:

.. code-block:: console

   --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
   --vdev net_af_xdp1,iface=ens786f2,shared_umem=1

xdp_prog
~~~~~~~~

The ``xdp_prog`` argument allows the user to provide a path to a custom XDP
program which should be used in place of the default libbpf/libxdp program,
which simply redirects packets to the sockets. For example:

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,xdp_prog=/path/to/prog.o

busy_budget
~~~~~~~~~~~

The busy polling feature aims to improve single-core performance for AF_XDP
sockets under heavy load. It is enabled by default if the detected kernel
version is sufficient, i.e. >= v5.11. The ``busy_budget`` argument sets the
busy-polling NAPI budget, which is the number of packets the kernel will
attempt to process in the netdev's NAPI context. It can be configured like so:

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,busy_budget=32

To disable busy polling, simply set the busy_budget to 0:

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,busy_budget=0

It is also strongly recommended to set the following for optimal performance
when using the busy polling feature:

.. code-block:: console

   echo 2 | sudo tee /sys/class/net/ens786f1/napi_defer_hard_irqs
   echo 200000 | sudo tee /sys/class/net/ens786f1/gro_flush_timeout

The above defers interrupts for interface ens786f1 and schedules its NAPI
context from a watchdog timer instead of from softirqs. More information
on this feature can be found at [1].

force_copy
~~~~~~~~~~

The ``force_copy`` argument allows the user to force the socket to use copy
mode instead of zero copy mode (if available).

.. code-block:: console

   --vdev net_af_xdp,iface=ens786f1,force_copy=1

use_cni
~~~~~~~

The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_cni=1

.. note::

   When using `use_cni`_, both parameters `xdp_prog`_ and `busy_budget`_ are disabled
   as both of these will be handled by the AF_XDP plugin.
   Because the DPDK application runs with limited privileges,
   enabling and disabling promiscuous mode through the DPDK application
   is also not supported.

use_pinned_map
~~~~~~~~~~~~~~

The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
application/pod.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_pinned_map=1

.. note::

   This feature can also be used with any external entity that can pin an eBPF map, not just
   the `AF_XDP Device Plugin for Kubernetes`_.

dp_path
~~~~~~~

The EAL vdev argument ``dp_path`` is used alongside
the ``use_cni`` or ``use_pinned_map`` arguments
to explicitly tell the AF_XDP PMD where to find either:

1. The Unix domain socket (UDS) used to interact with the AF_XDP Device Plugin, or
2. The pinned xskmap to use when creating AF_XDP sockets.

If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_.

.. _AF_XDP Device Plugin for Kubernetes: https://github.com/redhat-et/afxdp-plugins-for-kubernetes

.. code-block:: console

   --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"

.. code-block:: console

   --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"

Limitations
-----------

- **MTU**

  The MTU of the AF_XDP PMD is limited due to the XDP requirement of one packet
  per page.
  In the PMD we report the maximum MTU for zero copy to be equal
  to the page size less the frame overhead introduced by AF_XDP (XDP HR = 256)
  and DPDK (frame headroom = 320). With a 4K page size this works out at 3520.
  However, in practice this value may be even smaller, due to differences between
  the supported RX buffer sizes of the underlying kernel netdev driver.

  For example, the largest RX buffer size supported by the underlying kernel driver
  which is less than the page size (4096B) may be 3072B. In this case, the maximum
  MTU value will be at most 3072, but likely even smaller than this once relevant
  headers are accounted for, e.g. Ethernet and VLAN.

  To determine the actual maximum MTU value of the interface you are using with the
  AF_XDP PMD, consult the documentation for the kernel driver.

  Note: The AF_XDP PMD will fail to initialise if an MTU which violates the driver's
  conditions as above is set prior to launching the application.

- **Shared UMEM**

  The sharing of UMEM is only supported for AF_XDP sockets with unique contexts.
  The context refers to the netdev,qid tuple.

  The following combination will fail, since both vdevs use the same netdev and
  the same (default) queue:

  .. code-block:: console

     --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
     --vdev net_af_xdp1,iface=ens786f1,shared_umem=1

  Either of the following, however, is permitted since either the netdev or qid differs
  between the two vdevs:

  .. code-block:: console

     --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
     --vdev net_af_xdp1,iface=ens786f1,start_queue=1,shared_umem=1

  .. code-block:: console

     --vdev net_af_xdp0,iface=ens786f1,shared_umem=1 \
     --vdev net_af_xdp1,iface=ens786f2,shared_umem=1

- **Secondary Processes**

  Rx and Tx are not supported for secondary processes due to memory mapping of
  the AF_XDP rings being assigned by the kernel in the primary process only.
  However, other operations including statistics retrieval are permitted.
  The maximum number of queues permitted for PMDs operating in this model is 8
  as this is the maximum number of fds that can be sent through the IPC APIs as
  defined by RTE_MP_MAX_FD_NUM.

- **libxdp**

  When using the default program (i.e. when the vdev arg ``xdp_prog`` is not used),
  the following logs will appear when an application is launched:

  .. code-block:: console

     libbpf: elf: skipping unrecognized data section(7) .xdp_run_config
     libbpf: elf: skipping unrecognized data section(8) xdp_metadata

  These logs are not errors and can be ignored.

  [1] https://lwn.net/Articles/837010/
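
As a worked example of the zero copy MTU arithmetic described under the MTU
limitation, the sketch below reproduces the 3520 figure for a 4K page and the
3072B driver buffer case. It is illustrative only: the function and constant
names are hypothetical, the two headroom values are the ones quoted in this
document, and the result is an upper bound that does not subtract Ethernet or
VLAN headers.

.. code-block:: python

   XDP_PACKET_HEADROOM = 256   # "XDP HR" quoted in the MTU limitation
   DPDK_FRAME_HEADROOM = 320   # DPDK frame headroom quoted in the MTU limitation


   def max_zero_copy_mtu(page_size=4096):
       """Page size less the AF_XDP and DPDK frame overhead."""
       return page_size - XDP_PACKET_HEADROOM - DPDK_FRAME_HEADROOM


   def mtu_upper_bound(page_size=4096, driver_rx_buf_size=None):
       """Upper bound on the MTU, optionally capped by the largest RX buffer
       size the kernel driver supports (a value taken from the driver's own
       documentation, not something this sketch can discover)."""
       bound = max_zero_copy_mtu(page_size)
       if driver_rx_buf_size is not None:
           bound = min(bound, driver_rx_buf_size)
       return bound

With the defaults, ``max_zero_copy_mtu()`` evaluates to 4096 - 256 - 320 = 3520,
and ``mtu_upper_bound(4096, 3072)`` is capped at 3072. The usable MTU may still
be lower than either bound, so the kernel driver documentation remains the
authority.
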