..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016 Intel Corporation.

Tun|Tap Poll Mode Driver
========================

The ``rte_eth_tap.c`` PMD creates a device using TAP interfaces on the
local host. The PMD allows DPDK and the host to communicate using a raw
device interface on the host and in the DPDK application.

The device created is a TAP device, which sends and receives packets in a raw
format with an L2 header. The TAP PMD is used for connectivity to the local
host through a TAP interface. When the TAP PMD is initialized, it creates a
number of tap devices in the host, which can be listed with ``ifconfig -a`` or
the ``ip`` command. These commands can also be used to configure and query the
virtual devices.

These TAP interfaces can be used with Wireshark, tcpdump or Pktgen-DPDK, and
can also serve as a network connection to the DPDK application. To enable one
or more interfaces, use the ``--vdev=net_tap0`` option on the DPDK application
command line. Each ``--vdev=net_tapX`` option given will create an interface
named dtap0, dtap1, and so on.

The interface name can be changed by adding the ``iface=foo0`` argument, for
example::

   --vdev=net_tap0,iface=foo0 --vdev=net_tap1,iface=foo1, ...

Normally the PMD will generate a random MAC address, but when testing or with
a static configuration the developer may need a fixed MAC address.
Using the option ``mac=fixed`` you can create a fixed, known MAC address::

   --vdev=net_tap0,mac=fixed

The MAC address will have a fixed value with the last octet incrementing by one
for each interface string containing ``mac=fixed``. The MAC address is formatted
as 00:'d':'t':'a':'p':[00-FF]. Convert the characters to hex and you get the
actual MAC address: ``00:64:74:61:70:[00-FF]``.

A specific MAC address can also be passed as a string::

   --vdev=net_tap0,mac="00:64:74:61:70:11"

The MAC address then takes the user-supplied value. The string consists of
hexadecimal bytes delimited by ``:``. Each byte is converted from hex, giving
the actual MAC address: ``00:64:74:61:70:11``.
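The character-to-hex conversion described above can be illustrated with a short
Python sketch. The helper names ``fixed_tap_mac`` and ``parse_mac`` are
hypothetical and exist only for this illustration. The sketch also assumes that
the last octet counts ``mac=fixed`` interfaces starting from zero, which the
text suggests but does not state.

```python
def fixed_tap_mac(index: int) -> str:
    """Return the mac=fixed style address for the index-th interface.

    Hypothetical helper: starting the counter at 0 is an assumption.
    """
    if not 0 <= index <= 0xFF:
        raise ValueError("last octet must fit in one byte")
    # 0x00 followed by the ASCII codes of 'd', 't', 'a', 'p'.
    prefix = bytes([0x00]) + b"dtap"
    return ":".join(f"{octet:02x}" for octet in prefix + bytes([index]))


def parse_mac(text: str) -> bytes:
    """Convert a colon-delimited MAC string into its six raw bytes."""
    octets = [int(part, 16) for part in text.split(":")]
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six octets")
    return bytes(octets)  # bytes() rejects values outside 0..255


print(fixed_tap_mac(0))
print(parse_mac("00:64:74:61:70:11") == b"\x00dtap\x11")
```

The second call shows that the user-supplied string from the example above
decodes to the same ``00 'd' 't' 'a' 'p'`` prefix used by ``mac=fixed``.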

It is possible to specify a remote netdevice to capture packets from by adding
``remote=foo1``, for example::

   --vdev=net_tap,iface=tap0,remote=foo1

If a ``remote`` is set, the tap MAC address will be set to match the remote one
just after netdevice creation. Using TC rules, traffic from the remote netdevice
will be redirected to the tap. If the tap is in promiscuous mode, then all
packets will be redirected. In allmulti mode, all multicast packets will be
redirected.

Using the remote feature is especially useful for capturing traffic from a
netdevice that has no support in DPDK. It is possible to add explicit
rte_flow rules on the tap PMD to capture specific traffic (see the Flow API
support section below for examples).

After the DPDK application is started you can send and receive packets on the
interface using the standard rx_burst/tx_burst APIs in DPDK. From the host
point of view you can use any host tool like tcpdump, Wireshark, ping, Pktgen
and others to communicate with the DPDK application. The DPDK application may
not understand network protocols like IPv4/6, UDP or TCP unless the
application has been written to understand these protocols.

If you need the interface to act as a real network interface, meaning that it
is up and has a valid IP address, you can do this with the following commands::

   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1

Please change the IP addresses as you see fit.

If routing is enabled on the host you can also communicate with the DPDK
application over the internet via a standard socket layer application, as long
as you account for the protocol handling in the application.

If you have a network stack in your DPDK application, or something like it, you
can use that stack to handle the network protocols. You would also be able
to address the interface using an IP address assigned to the internal
interface.

The TUN PMD allows the user to create a TUN device on the host. The PMD allows
the user to transmit and receive packets via DPDK API calls with an L3 header
and payload. The devices in the host can be accessed via the ``ifconfig`` or
``ip`` command. TUN interfaces are passed to DPDK ``rte_eal_init`` arguments as
``--vdev=net_tunX``, where X stands for a unique id, for example::

   --vdev=net_tun0 --vdev=net_tun1,iface=foo1, ...

Unlike the TAP PMD, the TUN PMD does not support the ``mac`` or ``remote`` user
options. The default interface name is ``dtunX``, where X stands for a unique id.

Flow API support
----------------

The tap PMD supports major flow API pattern items and actions when running on
Linux kernels above 4.2 ("Flower" classifier required).
The kernel support can be checked with this command::

   zcat /proc/config.gz | ( grep 'CLS_FLOWER=' || echo 'not supported' ) |
   tee -a /dev/stderr | grep -q '=m' &&
   lsmod | ( grep cls_flower || echo 'try modprobe cls_flower' )

Supported items:

- eth: src and dst (with variable masks), and eth_type (0xffff mask).
- vlan: vid, pcp, but not eid. (requires kernel 4.9)
- ipv4/6: src and dst (with variable masks), and ip_proto (0xffff mask).
- udp/tcp: src and dst port (0xffff mask).

Supported actions:

- DROP
- QUEUE
- PASSTHRU
- RSS (requires kernel 4.9)

It is generally not possible to provide a "last" item. However, if the "last"
item, once masked, is identical to the masked spec, then it is supported.

Only IPv4/6 and MAC addresses can use a variable mask. All other items need a
full mask (exact match).
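The condition on "last" can be read as a comparison of masked values. The
following Python sketch only restates that condition. It is not taken from the
driver source, and the function names ``last_is_supported`` and
``is_full_mask`` are hypothetical.

```python
import socket


def last_is_supported(spec: bytes, last: bytes, mask: bytes) -> bool:
    """True when 'last', once masked, equals the masked 'spec'."""
    if not (len(spec) == len(last) == len(mask)):
        raise ValueError("spec, last and mask must have the same length")
    masked_spec = bytes(s & m for s, m in zip(spec, mask))
    masked_last = bytes(l & m for l, m in zip(last, mask))
    return masked_spec == masked_last


def is_full_mask(mask: bytes) -> bool:
    """True when every bit of the mask is set (exact match)."""
    return len(mask) > 0 and all(byte == 0xFF for byte in mask)


spec = socket.inet_aton("192.0.2.1")
last = socket.inet_aton("192.0.2.200")

# With a /24 mask both values collapse to 192.0.2.0, so the range is accepted.
print(last_is_supported(spec, last, socket.inet_aton("255.255.255.0")))
# With a full mask the two values differ, so the range is rejected.
print(last_is_supported(spec, last, socket.inet_aton("255.255.255.255")))
```

In other words, a "last" value is only accepted when it does not actually widen
the match beyond what spec and mask already describe.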

As rules are translated to TC, it is possible to show them with something like::

   tc -s filter show dev tap1 parent 1:

Examples of testpmd flow rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Drop packets for destination IP 192.0.2.1::

   testpmd> flow create 0 priority 1 ingress pattern eth / ipv4 dst is 192.0.2.1 \
            / end actions drop / end

Ensure packets from a given MAC address are received on queue 2::

   testpmd> flow create 0 priority 2 ingress pattern eth src is 06:05:04:03:02:01 \
            / end actions queue index 2 / end

Drop UDP packets in VLAN 3::

   testpmd> flow create 0 priority 3 ingress pattern eth / vlan vid is 3 / \
            ipv4 proto is 17 / end actions drop / end

Distribute IPv4 TCP packets sent to a given MAC address over queues 0-3 using
RSS::

   testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
            / ipv4 / tcp / end actions rss queues 0 1 2 3 end / end

Multi-process sharing
---------------------

It is possible to attach an existing TAP device in a secondary process,
by declaring it as a vdev with the same name as in the primary process,
and without any parameter.

The port attached in a secondary process will give access to the
statistics and the queues.
Therefore it can be used for monitoring or Rx/Tx processing.

The IPC synchronization of Rx/Tx queues is currently limited:

  - Maximum 8 queues shared
  - Synchronized on probing, but not on later port update

Example
-------

The following is a simple example of using the TAP PMD with the Pktgen
packet generator. It requires that the ``socat`` utility is installed on the
test system.

Build DPDK, then pull down Pktgen and build it using the same DPDK SDK/Target
that you used to build DPDK.

Run pktgen from the pktgen directory in a terminal with a command line like the
following::

    sudo ./app/app/x86_64-native-linux-gcc/app/pktgen -l 1-5 -n 4               \
     --proc-type auto --log-level debug --socket-mem 512,512 --file-prefix pg   \
     --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1                      \
     -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3                                \
     -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3                                \
     -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1                       \
     -f themes/black-yellow.theme

.. note::

   Change the ``-b`` options to exclude all of your physical ports. The
   command line above is all one line.

   Also, ``-f themes/black-yellow.theme`` is optional if the default colors
   work on your system configuration. See the Pktgen docs for more
   information.

Verify with the ``ifconfig -a`` command in a different xterm window that the
``dtap0`` and ``dtap1`` interfaces have been created.

Next, bring up the links for the two interfaces with the commands below::

    sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
    sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1

Then use socat to create a loopback for the two interfaces::

    sudo socat interface:dtap0 interface:dtap1

Then on the Pktgen command line interface you can start sending packets using
the commands ``start 0`` and ``start 1``, or you can start both at the same
time with ``start all``. The command ``str`` is an alias for ``start all`` and
``stp`` is an alias for ``stop all``.

While running, you should see the 64-byte counters increasing, which verifies
that the traffic is being looped back. You can use ``set all size XXX`` to
change the size of the packets after you stop the traffic. Use the pktgen
``help`` command to see a list of all commands. You can also use the ``-f``
option to load commands at startup from a command file or Lua script.

RSS specifics
-------------
Packet distribution in TAP is done by the kernel, which has a default
distribution. This feature adds RSS distribution based on eBPF code.
The default eBPF code calculates the RSS hash using the Toeplitz algorithm
with a fixed RSS key. The hash is calculated on fixed packet offsets. For IPv4
and IPv6 it is calculated over src/dst addresses (8 or 32 bytes for IPv4 or
IPv6 respectively) and src/dst TCP/UDP ports (4 bytes).
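To make the hash input concrete, here is a minimal Python sketch of a Toeplitz
hash over the IPv4 input described above (8 bytes of addresses plus 4 bytes of
ports). It is not the eBPF code from ``tap_bpf_program.c``. The function names
are hypothetical, the field order (source before destination) is an assumption,
and ``EXAMPLE_KEY`` is a commonly published 40-byte sample key used here only
for illustration, since this guide does not give the PMD's own fixed key.

```python
import socket
import struct

# Sample 40-byte RSS key (an assumption for this sketch, not necessarily
# the fixed key used by the TAP PMD).
EXAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for bit in range(len(data) * 8):
        # Test input bits from the most significant bit of the first byte.
        if data[bit // 8] & (0x80 >> (bit % 8)):
            # XOR in the 32-bit key window that starts at this bit position.
            result ^= (key_int >> (key_bits - 32 - bit)) & 0xFFFFFFFF
    return result


def ipv4_l4_tuple(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Build the 12-byte hash input: addresses (8 bytes) then ports (4 bytes)."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!HH", sport, dport))


data = ipv4_l4_tuple("192.0.2.1", "192.0.2.2", 1024, 80)
print(len(data), hex(toeplitz_hash(EXAMPLE_KEY, data)))
```

The 40-byte key is just long enough for the 36-byte IPv6 input (32 bytes of
addresses plus 4 bytes of ports), because each input bit needs a 32-bit key
window.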

The RSS algorithm is written in the file ``tap_bpf_program.c``, which
does not take part in TAP PMD compilation. Instead, this file is compiled
in advance to an eBPF object file. The eBPF object file is then parsed and
translated into eBPF byte code in the format of C arrays of eBPF
instructions. The C arrays of eBPF instructions are part of the TAP PMD tree
and take part in TAP PMD compilation. At run time the C arrays are uploaded to
the kernel via BPF system calls and the RSS hash is calculated by the
kernel.

It is possible to support different RSS hash algorithms by updating the file
``tap_bpf_program.c``. In order to add a new RSS hash algorithm, follow these
steps:

1. Write the new RSS implementation in the file ``tap_bpf_program.c``.

   BPF programs which are uploaded to the kernel correspond to
   C functions under different ELF sections.

2. Install the ``LLVM`` library and the ``clang`` compiler, version 3.7 or
   above.

3. Compile ``tap_bpf_program.c`` via ``LLVM`` into an object file::

      clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
      -filetype=obj -o <tap_bpf_program.o>

4. Use a tool that receives two parameters: an eBPF object file and a section
   name, and prints out the section as a C array of eBPF instructions.
   Embed the C array in your TAP PMD tree.

The C arrays are uploaded to the kernel using BPF system calls.

``tc`` (traffic control) is a well-known user space utility program used to
configure the Linux kernel packet scheduler. It is usually packaged as
part of the ``iproute2`` package.
Since commit 11c39b5e9 ("tc: add eBPF support to f_bpf"), ``tc`` can be used
to upload eBPF code to the kernel, and it can be patched in order to print the
C arrays of eBPF instructions just before calling the BPF system call.
Please refer to the ``iproute2`` package file ``lib/bpf.c``, function
``bpf_prog_load()``.

An example utility for eBPF instruction generation in the format of C arrays
will be added in a future release.
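As an illustration of step 4, the following Python sketch formats the raw
bytes of one eBPF section as a C array. It is not the utility mentioned above.
The function names are hypothetical, the section bytes are assumed to have
been extracted from the object file already (no ELF parsing is done), and a
little-endian object is assumed. The array layout printed here is only one
possible format and may differ from the one used in the TAP PMD tree.

```python
import struct


def decode_insns(raw: bytes):
    """Split raw section bytes into (code, dst_reg, src_reg, off, imm) tuples."""
    if len(raw) % 8 != 0:
        raise ValueError("eBPF instructions are 8 bytes each")
    insns = []
    for pos in range(0, len(raw), 8):
        # One instruction: u8 opcode, u8 registers, s16 offset, s32 immediate.
        code, regs, off, imm = struct.unpack_from("<BBhi", raw, pos)
        # Little-endian layout: low nibble is dst_reg, high nibble is src_reg.
        insns.append((code, regs & 0x0F, regs >> 4, off, imm))
    return insns


def to_c_array(name: str, raw: bytes) -> str:
    """Render the instructions as a C array of struct bpf_insn initializers."""
    lines = [f"static const struct bpf_insn {name}[] = {{"]
    for code, dst, src, off, imm in decode_insns(raw):
        lines.append(f"\t{{0x{code:02x}, {dst}, {src}, {off}, {imm}}},")
    lines.append("};")
    return "\n".join(lines)


# "r0 = 0; exit", encoded by hand for the demonstration.
demo = bytes.fromhex("b700000000000000" "9500000000000000")
print(to_c_array("demo_insns", demo))
```

A real tool would additionally locate the requested section inside the ELF
object before formatting it.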

TAP reports the supported RSS functions as part of the dev_infos_get callback:
``ETH_RSS_IP``, ``ETH_RSS_UDP`` and ``ETH_RSS_TCP``.
**Known limitation:** TAP supports all of the above hash functions together
and not in partial combinations.

Systems supporting flow API
---------------------------

- "tc flower" classifier requires Linux kernel above 4.2
- eBPF/RSS requires Linux kernel above 4.9

+--------------------+-----------------------+
| RH7.3              | No flow rule support  |
+--------------------+-----------------------+
| RH7.4              | No RSS action support |
+--------------------+-----------------------+
| RH7.5              | No RSS action support |
+--------------------+-----------------------+
| SLES 15,           | No limitation         |
| kernel 4.12        |                       |
+--------------------+-----------------------+
| Azure Ubuntu 16.04,| No limitation         |
| kernel 4.13        |                       |
+--------------------+-----------------------+
