..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Intel Corporation.

Tun|Tap Poll Mode Driver
========================

The ``rte_eth_tap.c`` PMD creates a device using TAP interfaces on the
local host. The PMD allows DPDK and the host to communicate using a raw
device interface on the host and in the DPDK application.

The device created is a TAP device, which sends and receives packets in a raw
format with an L2 header. The TAP PMD is used for connectivity to the
local host through a TAP interface. When the TAP PMD is initialized it
creates a number of tap devices in the host, visible via the ``ifconfig -a`` or
``ip`` command. These commands can be used to configure and query the virtual
device.

These TAP interfaces can be used with Wireshark, tcpdump or Pktgen-DPDK,
and can also serve as a network connection to the DPDK application.
To enable one or more interfaces, use the ``--vdev=net_tap0`` option on the
DPDK application command line. Each additional ``--vdev=net_tapX`` option
(for example ``--vdev=net_tap1``) creates another interface; the interfaces
are named ``dtap0``, ``dtap1``, and so on.

The interface name can be changed by adding the ``iface`` argument, for example::

   --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...

Normally the PMD will generate a random MAC address, but when testing or with
a static configuration the developer may need a fixed MAC address.
Using the option ``mac=fixed`` you can create a fixed known MAC address::

   --vdev=net_tap0,mac=fixed

The MAC address will have a fixed value with the last octet incrementing by one
for each interface string containing ``mac=fixed``. The MAC address is formatted
as 00:'d':'t':'a':'p':[00-FF]. Convert the characters to hex and you get the
actual MAC address: ``00:64:74:61:70:[00-FF]``.

A specific MAC address can also be passed with the ``mac`` option::

   --vdev=net_tap0,mac="00:64:74:61:70:11"

The MAC address will then have the user value passed as a string.
The string uses ``:`` as the delimiter between octets. Each octet is parsed as
hex to give the actual MAC address: ``00:64:74:61:70:11``.

It is possible to specify a remote netdevice to capture packets from by adding
``remote=foo1``, for example::

   --vdev=net_tap,iface=tap0,remote=foo1

If a ``remote`` is set, the tap MAC address will be set to match the remote one
just after netdevice creation. Using TC rules, traffic from the remote netdevice
will be redirected to the tap. If the tap is in promiscuous mode, then all
packets will be redirected. In allmulti mode, all multicast packets will be
redirected.

Using the remote feature is especially useful for capturing traffic from a
netdevice that has no support in the DPDK. It is possible to add explicit
rte_flow rules on the tap PMD to capture specific traffic (see the Flow API
support section for examples).

After the DPDK application is started you can send and receive packets on the
interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
and others to communicate with the DPDK application. The DPDK application may
not understand network protocols like IPv4/6, UDP or TCP unless the
application has been written to understand these protocols.

If you need the interface to behave as a real network interface, meaning it is
up and has a valid IP address, you can do this with the following commands::

   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1

Please change the IP addresses as you see fit.

If routing is enabled on the host you can also communicate with the DPDK
application over the internet via a standard socket layer application, as long
as you account for the protocol handling in the application.
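
Going back to the ``mac=fixed`` option described earlier, the following is a
minimal sketch in C of how the documented address pattern maps characters to
octets. The helper names ``fixed_tap_mac()`` and ``format_mac()`` are
hypothetical and exist only for illustration; they are not functions of the
PMD.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   #define MAC_STR_LEN 18 /* "xx:xx:xx:xx:xx:xx" plus the terminating NUL */

   /*
    * Illustrative helper (an assumption, not PMD code): fill in the
    * address pattern 00:'d':'t':'a':'p':index described in the text.
    */
   static void fixed_tap_mac(uint8_t index, uint8_t mac[6])
   {
       assert(mac != NULL);
       mac[0] = 0x00;
       mac[1] = 'd';   /* 0x64 */
       mac[2] = 't';   /* 0x74 */
       mac[3] = 'a';   /* 0x61 */
       mac[4] = 'p';   /* 0x70 */
       mac[5] = index; /* last octet, one value per mac=fixed device */
   }

   /* Render six octets as a lower-case, colon-delimited string. */
   static void format_mac(const uint8_t mac[6], char out[MAC_STR_LEN])
   {
       snprintf(out, MAC_STR_LEN, "%02x:%02x:%02x:%02x:%02x:%02x",
                mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
   }

Calling ``fixed_tap_mac(0x11, mac)`` followed by ``format_mac(mac, buf)``
yields the string ``00:64:74:61:70:11``, the same address used in the
``mac="..."`` example.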

If you have a network stack in your DPDK application, or something like it, you
can use that stack to handle the network protocols. You would also be able
to address the interface using an IP address assigned to the internal
interface.

The TUN PMD allows the user to create a TUN device on the host. The PMD allows
the user to transmit and receive packets via DPDK API calls with an L3 header
and payload. The devices in the host can be accessed via the ``ifconfig`` or
``ip`` command. TUN interfaces are passed to DPDK ``rte_eal_init`` arguments as
``--vdev=net_tunX``, where X stands for a unique id, for example::

   --vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...

Unlike the TAP PMD, the TUN PMD does not support the ``MAC`` or ``remote`` user
options. The default interface name is ``dtunX``, where X stands for a unique id.

Flow API support
----------------

The tap PMD supports major flow API pattern items and actions, when running on
linux kernels above 4.2 ("Flower" classifier required).
The kernel support can be checked with this command::

   zcat /proc/config.gz | ( grep 'CLS_FLOWER=' || echo 'not supported' ) |
   tee -a /dev/stderr | grep -q '=m' &&
   lsmod | ( grep cls_flower || echo 'try modprobe cls_flower' )

Supported items:

- eth: src and dst (with variable masks), and eth_type (0xffff mask).
- vlan: vid, pcp, but not eid. (requires kernel 4.9)
- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
- udp/tcp: src and dst port (0xffff mask).

Supported actions:

- DROP
- QUEUE
- PASSTHRU
- RSS (requires kernel 4.9)

It is generally not possible to provide a "last" item. However, if the "last"
item, once masked, is identical to the masked spec, then it is supported.

Only IPv4/6 and MAC addresses can use a variable mask. All other items need a
full mask (exact match).
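
The two mask rules above can be pictured with a small sketch in C. It is an
illustration only: ``last_equals_spec_under_mask()`` and ``mask_is_full()``
are hypothetical helper names, and the code makes no claim about how the PMD
performs these checks internally.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Hypothetical helper: returns 1 when "last" adds nothing beyond
    * "spec" once both are masked, that is, when the range collapses to a
    * single masked value. This is the case the text calls supported.
    */
   static int last_equals_spec_under_mask(const uint8_t *spec,
                                          const uint8_t *last,
                                          const uint8_t *mask, size_t len)
   {
       size_t i;

       assert(spec != NULL && last != NULL && mask != NULL);
       for (i = 0; i < len; i++) {
           if ((spec[i] & mask[i]) != (last[i] & mask[i]))
               return 0;
       }
       return 1;
   }

   /* Hypothetical helper: returns 1 when every mask bit is set (exact match). */
   static int mask_is_full(const uint8_t *mask, size_t len)
   {
       size_t i;

       assert(mask != NULL);
       for (i = 0; i < len; i++) {
           if (mask[i] != 0xff)
               return 0;
       }
       return 1;
   }

For an IPv4 destination of 192.0.2.1 with a 255.255.255.0 mask, a "last" value
of 192.0.2.200 passes the first check, while 192.0.3.1 does not. The same
mask fails ``mask_is_full()``, which is acceptable for IPv4/6 and MAC
addresses but not for the other items.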

As rules are translated to TC, it is possible to show them with something like::

   tc -s filter show dev tap1 parent 1:

Examples of testpmd flow rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Drop packets for destination IP 192.0.2.1::

   testpmd> flow create 0 priority 1 ingress pattern eth / ipv4 dst is 192.0.2.1 \
            / end actions drop / end

Ensure packets from a given MAC address are received on queue 2::

   testpmd> flow create 0 priority 2 ingress pattern eth src is 06:05:04:03:02:01 \
            / end actions queue index 2 / end

Drop UDP packets in vlan 3::

   testpmd> flow create 0 priority 3 ingress pattern eth / vlan vid is 3 / \
            ipv4 proto is 17 / end actions drop / end

Distribute IPv4 TCP packets destined to a given MAC address over queues 0-3
using RSS::

   testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
            / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end

Multi-process sharing
---------------------

It is possible to attach an existing TAP device in a secondary process,
by declaring it as a vdev with the same name as in the primary process,
and without any parameter.

The port attached in a secondary process will give access to the
statistics and the queues.
Therefore it can be used for monitoring or Rx/Tx processing.

The IPC synchronization of Rx/Tx queues is currently limited:

- Maximum 8 queues shared
- Synchronized on probing, but not on later port update

Example
-------

The following is a simple example of using the TAP PMD with the Pktgen
packet generator. It requires that the ``socat`` utility is installed on the
test system.

Build DPDK, then download Pktgen and build it using the same DPDK SDK/target
that was used to build DPDK.

Run pktgen from the pktgen directory in a terminal with a command line like the
following::

   sudo ./app/app/x86_64-native-linux-gcc/app/pktgen -l 1-5 -n 4 \
    --proc-type auto --log-level debug --socket-mem 512,512 --file-prefix pg \
    --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1 \
    -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3 \
    -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3 \
    -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1 \
    -f themes/black-yellow.theme

.. note::

   Change the ``-b`` options to exclude all of your physical ports. The
   command line above is all one line.

   Also, ``-f themes/black-yellow.theme`` is optional if the default colors
   work on your system configuration. See the Pktgen docs for more
   information.

Verify with the ``ifconfig -a`` command in a different xterm window that the
``dtap0`` and ``dtap1`` interfaces have been created.

Next set the links for the two interfaces to up via the commands below::

   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1

Then use socat to create a loopback for the two interfaces::

   sudo socat interface:dtap0 interface:dtap1

Then on the Pktgen command line interface you can start sending packets using
the commands ``start 0`` and ``start 1`` or you can start both at the same
time with ``start all``. The command ``str`` is an alias for ``start all`` and
``stp`` is an alias for ``stop all``.

While running you should see the 64 byte counters increasing, which verifies
that the traffic is being looped back. You can use ``set all size XXX`` to
change the size of the packets after you stop the traffic. Use the pktgen
``help`` command to see a list of all commands. You can also use the ``-f``
option to load a command file or Lua script at pktgen startup.

RSS specifics
-------------

Packet distribution in TAP is done by the kernel, which has a default
distribution. This feature adds RSS distribution based on eBPF code.
The default eBPF code calculates the RSS hash based on the Toeplitz algorithm
for a fixed RSS key. It is calculated on fixed packet offsets. For IPv4 and
IPv6 it is calculated over src/dst addresses (8 or 32 bytes for IPv4 or IPv6
respectively) and src/dst TCP/UDP ports (4 bytes).

The RSS algorithm is written in file ``tap_bpf_program.c`` which
does not take part in TAP PMD compilation. Instead this file is compiled
in advance to an eBPF object file. The eBPF object file is then parsed and
translated into eBPF byte code in the format of C arrays of eBPF
instructions. The C array of eBPF instructions is part of the TAP PMD tree and
takes part in TAP PMD compilation. At run time the C arrays are uploaded to
the kernel via BPF system calls and the RSS hash is calculated by the
kernel.

It is possible to support different RSS hash algorithms by updating file
``tap_bpf_program.c``. In order to add a new RSS hash algorithm follow these
steps:

1. Write the new RSS implementation in file ``tap_bpf_program.c``

   BPF programs which are uploaded to the kernel correspond to
   C functions under different ELF sections.

2. Install ``LLVM`` library and ``clang`` compiler versions 3.7 and above

3. Compile ``tap_bpf_program.c`` via ``LLVM`` into an object file::

    clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
    -filetype=obj -o <tap_bpf_program.o>

4. Use a tool that receives two parameters: an eBPF object file and a section
   name, and prints out the section as a C array of eBPF instructions.
   Embed the C array in your TAP PMD tree.

The C arrays are uploaded to the kernel using BPF system calls.
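
To give an idea of what "a C array of eBPF instructions" can look like, here
is a minimal sketch. It is not taken from the PMD: ``struct ebpf_insn`` is a
local stand-in that mirrors the layout of the kernel's ``struct bpf_insn``,
and ``example_section_insns`` is a hypothetical two-instruction program
(``r0 = 0; exit``) rather than the RSS code.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /*
    * Local stand-in mirroring the kernel's struct bpf_insn layout
    * (<linux/bpf.h>); defined here only to keep the sketch self-contained.
    */
   struct ebpf_insn {
       uint8_t code;        /* opcode */
       uint8_t dst_reg : 4; /* destination register */
       uint8_t src_reg : 4; /* source register */
       int16_t off;         /* signed offset */
       int32_t imm;         /* signed immediate constant */
   };

   /* Each eBPF instruction is 8 bytes wide. */
   static_assert(sizeof(struct ebpf_insn) == 8, "unexpected instruction size");

   /* Hypothetical section contents, one array element per instruction. */
   static const struct ebpf_insn example_section_insns[] = {
       /* BPF_ALU64 | BPF_MOV | BPF_K: r0 = 0 */
       { .code = 0xb7, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
       /* BPF_JMP | BPF_EXIT: return r0 */
       { .code = 0x95, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
   };

   /* Number of instructions, as a program loader would need it. */
   static const size_t example_section_insn_count =
       sizeof(example_section_insns) / sizeof(example_section_insns[0]);

A tool as described in step 4 would emit one such array per ELF section, so
that the array and its instruction count can be handed to the BPF system call
at run time.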

``tc`` (traffic control) is a well known user space utility program used to
configure the Linux kernel packet scheduler. It is usually packaged as
part of the ``iproute2`` package.
Since commit 11c39b5e9 ("tc: add eBPF support to f_bpf") ``tc`` can be used
to upload eBPF code to the kernel and can be patched in order to print the
C arrays of eBPF instructions just before calling the BPF system call.
Please refer to ``iproute2`` package file ``lib/bpf.c`` function
``bpf_prog_load()``.

An example utility for eBPF instruction generation in the format of C arrays
will be added in future releases.

TAP reports on supported RSS functions as part of the dev_infos_get callback:
``ETH_RSS_IP``, ``ETH_RSS_UDP`` and ``ETH_RSS_TCP``.

**Known limitation:** TAP supports all of the above hash functions together
and not in partial combinations.

Systems supporting flow API
---------------------------

- "tc flower" classifier requires linux kernel above 4.2
- eBPF/RSS requires linux kernel above 4.9

+--------------------+-----------------------+
| RH7.3              | No flow rule support  |
+--------------------+-----------------------+
| RH7.4              | No RSS action support |
+--------------------+-----------------------+
| RH7.5              | No RSS action support |
+--------------------+-----------------------+
| SLES 15,           | No limitation         |
| kernel 4.12        |                       |
+--------------------+-----------------------+
| Azure Ubuntu 16.04,| No limitation         |
| kernel 4.13        |                       |
+--------------------+-----------------------+
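
The kernel thresholds listed above can be turned into a quick run-time hint,
sketched below in C. The helper names ``release_at_least()`` and
``print_tap_feature_hints()`` are hypothetical, and the sketch assumes that
"above 4.2" and "above 4.9" include those versions themselves. Distribution
kernels may backport features, so the version number is only a hint and the
table above remains the better guide for those systems.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <sys/utsname.h>

   /*
    * Hypothetical helper: compare a kernel release string such as
    * "4.12.14-default" against major.minor. Returns 1 if the release is
    * at least that version, 0 if it is older, -1 if it cannot be parsed.
    */
   static int release_at_least(const char *release, int major, int minor)
   {
       int rel_major, rel_minor;

       assert(release != NULL);
       if (sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
           return -1;
       if (rel_major != major)
           return rel_major > major;
       return rel_minor >= minor;
   }

   /* Print which thresholds from the list above the running kernel meets. */
   static void print_tap_feature_hints(void)
   {
       struct utsname u;

       if (uname(&u) != 0) {
           perror("uname");
           return;
       }
       printf("kernel %s\n", u.release);
       printf("  tc flower (4.2): %s\n",
              release_at_least(u.release, 4, 2) == 1 ? "version ok" : "check backports");
       printf("  eBPF/RSS  (4.9): %s\n",
              release_at_least(u.release, 4, 9) == 1 ? "version ok" : "check backports");
   }

Calling ``print_tap_feature_hints()`` early in an application gives a
first indication of whether flow rules and the RSS action are likely to work.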