..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

L2 Forwarding Sample Application (in Real and Virtualized Environments)
=======================================================================

The L2 Forwarding sample application is a simple example of packet processing using
the Data Plane Development Kit (DPDK), which
also takes advantage of Single Root I/O Virtualization (SR-IOV) features in a virtualized environment.

.. note::

    Previously, a separate L2 Forwarding in Virtualized Environments sample application was used;
    in later DPDK versions, these sample applications have been merged.

Overview
--------

The L2 Forwarding sample application, which can operate in real and virtualized environments,
performs L2 forwarding for each packet that is received on an RX_PORT.
The destination port is the adjacent port from the enabled portmask, that is,
if the first four ports are enabled (portmask 0xf),
ports 0 and 1 forward into each other, and ports 2 and 3 forward into each other.
Also, the MAC addresses are affected as follows:

*   The source MAC address is replaced by the TX_PORT MAC address

*   The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

This application can be used to benchmark performance using a traffic generator, as shown in :numref:`figure_l2_fwd_benchmark_setup`.

The application can also be used in a virtualized environment as shown in :numref:`figure_l2_fwd_virtenv_benchmark_setup`.

The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.

.. _figure_l2_fwd_benchmark_setup:

.. figure:: img/l2_fwd_benchmark_setup.*

   Performance Benchmark Setup (Basic Environment)


.. _figure_l2_fwd_virtenv_benchmark_setup:

.. figure:: img/l2_fwd_virtenv_benchmark_setup.*

   Performance Benchmark Setup (Virtualized Environment)


Virtual Function Setup Instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This application can use the virtual functions available in the system and
therefore can be used in a virtual machine without passing
the whole network device through to a guest machine in a virtualized scenario.
The virtual functions can be enabled in the host machine or the hypervisor with the respective physical function driver.

For example, in a Linux* host machine, it is possible to enable a virtual function using the following command:

.. code-block:: console

    modprobe ixgbe max_vfs=2,2

This command enables two Virtual Functions on each Physical Function of a NIC
with two physical ports in the PCI configuration space,
in this case enabling a total of four Virtual Functions.
It is important to note that the enabled Virtual Functions 0 and 2 would belong to Physical Function 0,
and Virtual Functions 1 and 3 would belong to Physical Function 1.

Compiling the Application
-------------------------

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/l2fwd

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    ./build/l2fwd [EAL options] -- -p PORTMASK [-q NQ]

where,

*   -p PORTMASK: A hexadecimal bitmask of the ports to configure

*   -q NQ: The number of queues (that is, ports) polled per lcore (default is 1)

To run the application in the linuxapp environment with 4 lcores, 16 ports and 8 RX queues per lcore, issue the command:

.. code-block:: console

    $ ./build/l2fwd -c f -n 4 -- -q 8 -p ffff

Refer to the *DPDK Getting Started Guide* for general information on running applications
and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide some explanation of the code.
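
As a first illustration before the code walkthrough, the effect of the -p option and the port pairing described in the Overview can be reproduced in a small standalone program.
This is a minimal sketch, not the sample's own l2fwd_parse_args() code: the helpers parse_portmask() and pair_ports() and the MAX_PORTS limit are hypothetical names chosen for this example.
It assumes the portmask is parsed with strtoul() in base 16, and it pairs enabled ports in ascending order in the same way as the loop shown in `Driver Initialization`_.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_PORTS 32   /* hypothetical limit for this sketch */

    /* Parse a hexadecimal portmask such as "ffff"; returns 0 on error. */
    static uint32_t
    parse_portmask(const char *arg)
    {
        char *end = NULL;
        unsigned long pm;

        errno = 0;
        pm = strtoul(arg, &end, 16);
        if (errno != 0 || end == arg || *end != '\0' || pm > UINT32_MAX)
            return 0;
        return (uint32_t)pm;
    }

    /*
     * Pair enabled ports in ascending order: 1st enabled with 2nd, 3rd with 4th, ...
     * Fills dst[] and returns the number of enabled ports, or -1 if it is odd.
     */
    static int
    pair_ports(uint32_t mask, unsigned dst[MAX_PORTS])
    {
        unsigned port, last_port = 0;
        int n = 0;

        for (port = 0; port < MAX_PORTS; port++) {
            if ((mask & (1u << port)) == 0)
                continue;               /* skip ports that are not enabled */
            if (n % 2) {                /* second port of a pair */
                dst[port] = last_port;
                dst[last_port] = port;
            } else {
                last_port = port;       /* first port of a pair */
            }
            n++;
        }
        return (n % 2) ? -1 : n;
    }

    int
    main(void)
    {
        unsigned dst[MAX_PORTS] = {0};
        uint32_t mask = parse_portmask("f");   /* as given with -p f */

        if (pair_ports(mask, dst) < 0)
            return 1;
        for (unsigned p = 0; p < MAX_PORTS; p++)
            if (mask & (1u << p))
                printf("port %u -> port %u\n", p, dst[p]);
        return 0;
    }

With the portmask f, the program reports that port 0 forwards to port 1, port 1 to port 0, port 2 to port 3 and port 3 to port 2, matching the pairing described in the Overview.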

Command Line Arguments
~~~~~~~~~~~~~~~~~~~~~~

The L2 Forwarding sample application takes specific parameters,
in addition to Environment Abstraction Layer (EAL) arguments (see `Running the Application`_).
The preferred way to parse parameters is to use the getopt() function,
since it is part of a well-defined and portable library.

The parsing of arguments is done in the l2fwd_parse_args() function.
The method of argument parsing is not described here.
Refer to the *glibc getopt(3)* man page for details.

EAL arguments are parsed first, then application-specific arguments.
This is done at the beginning of the main() function:

.. code-block:: c

    /* init EAL */

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");

    /* skip the arguments consumed by the EAL */
    argc -= ret;
    argv += ret;

    /* parse application arguments (after the EAL ones) */

    ret = l2fwd_parse_args(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid L2FWD arguments\n");

Mbuf Pool Initialization
~~~~~~~~~~~~~~~~~~~~~~~~

Once the arguments are parsed, the mbuf pool is created.
The mbuf pool contains a set of mbuf objects that will be used by the driver
and the application to store network packet data:

.. code-block:: c

    /* create the mbuf pool */

    l2fwd_pktmbuf_pool = rte_mempool_create("mbuf_pool", NB_MBUF, MBUF_SIZE, 32,
            sizeof(struct rte_pktmbuf_pool_private),
            rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL, SOCKET0, 0);

    if (l2fwd_pktmbuf_pool == NULL)
        rte_panic("Cannot init mbuf pool\n");

The rte_mempool is a generic structure used to handle pools of objects.
In this case, it is necessary to create a pool that will be used by the driver,
which expects to have some reserved space in the mempool structure,
sizeof(struct rte_pktmbuf_pool_private) bytes.
The number of allocated packet mbufs is NB_MBUF, with a size of MBUF_SIZE each.
A per-lcore cache of 32 mbufs is kept.
The memory is allocated in NUMA socket 0,
but it is possible to extend this code to allocate one mbuf pool per socket.

Two callback pointers are also given to the rte_mempool_create() function:

*   The first callback pointer is to rte_pktmbuf_pool_init() and is used
    to initialize the private data of the mempool, which is needed by the driver.
    This function is provided by the mbuf API, but can be copied and extended by the developer.

*   The second callback pointer given to rte_mempool_create() is the mbuf initializer.
    The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library.
    If a more complex application wants to extend the rte_pktmbuf structure for its own needs,
    a new function derived from rte_pktmbuf_init() can be created.

Driver Initialization
~~~~~~~~~~~~~~~~~~~~~

The main part of the code in the main() function relates to the initialization of the driver.
To fully understand this code, it is recommended to study the chapters that relate to the Poll Mode Driver
in the *DPDK Programmer's Guide* - Rel 1.4 EAR and the *DPDK API Reference*.

.. code-block:: c

    if (rte_eal_pci_probe() < 0)
        rte_exit(EXIT_FAILURE, "Cannot probe PCI\n");

    nb_ports = rte_eth_dev_count();

    if (nb_ports == 0)
        rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

    if (nb_ports > RTE_MAX_ETHPORTS)
        nb_ports = RTE_MAX_ETHPORTS;

    /* reset l2fwd_dst_ports */

    for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
        l2fwd_dst_ports[portid] = 0;

    last_port = 0;
    nb_ports_in_mask = 0;

    /*
     * Pair each enabled port with the adjacent enabled port:
     * the two ports of a pair forward into each other.
     */

    for (portid = 0; portid < nb_ports; portid++) {
        /* skip ports that are not enabled */

        if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
            continue;

        if (nb_ports_in_mask % 2) {
            /* second port of a pair: link it with the previous one */
            l2fwd_dst_ports[portid] = last_port;
            l2fwd_dst_ports[last_port] = portid;
        }
        else
            last_port = portid;

        nb_ports_in_mask++;

        rte_eth_dev_info_get((uint8_t) portid, &dev_info);
    }

Observe that:

*   rte_igb_pmd_init_all() simultaneously registers the driver as a PCI driver and as an Ethernet* Poll Mode Driver.

*   rte_eal_pci_probe() parses the devices on the PCI bus and initializes recognized devices.

The next step is to configure the RX and TX queues.
For each port, there is only one RX queue (only one lcore is able to poll a given port).
In this example, each port is also configured with a single TX queue.
The rte_eth_dev_configure() function is used to configure the number of queues for a port:

.. code-block:: c

    ret = rte_eth_dev_configure((uint8_t)portid, 1, 1, &port_conf);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Cannot configure device: "
            "err=%d, port=%u\n",
            ret, portid);

The global configuration is stored in a static structure:

.. code-block:: c

    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .split_hdr_size = 0,
            .header_split   = 0, /**< Header Split disabled */
            .hw_ip_checksum = 0, /**< IP checksum offload disabled */
            .hw_vlan_filter = 0, /**< VLAN filtering disabled */
            .jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
            .hw_strip_crc   = 0, /**< CRC stripping by hardware disabled */
        },

        .txmode = {
            .mq_mode = ETH_DCB_NONE
        },
    };

RX Queue Initialization
~~~~~~~~~~~~~~~~~~~~~~~

The application uses one lcore to poll one or several ports, depending on the -q option,
which specifies the number of queues per lcore.
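
The mapping from enabled ports to polling lcores implied by the -q option can be sketched in plain C.
This is a simplified sketch under stated assumptions: struct lcore_rx_conf, assign_rx_ports() and the MAX_PORTS and MAX_LCORES limits are hypothetical, and, unlike the sample application, it assumes that lcores 0 to MAX_LCORES - 1 are all available instead of checking which lcores were enabled with the EAL -c option.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PORTS  32   /* hypothetical limits for this sketch */
    #define MAX_LCORES 8

    /* Hypothetical, simplified counterpart of struct lcore_queue_conf. */
    struct lcore_rx_conf {
        unsigned n_rx_port;
        unsigned rx_port_list[MAX_PORTS];
    };

    /*
     * Give each enabled port in `mask` to an lcore: lcore 0 takes up to `q`
     * ports, then lcore 1, and so on. `conf` must be zero-initialized.
     * Returns the number of lcores used, or -1 on error.
     */
    static int
    assign_rx_ports(uint32_t mask, unsigned q, struct lcore_rx_conf *conf)
    {
        unsigned port, lcore = 0;
        int used = 0;

        if (q == 0)
            return -1;

        for (port = 0; port < MAX_PORTS; port++) {
            if ((mask & (1u << port)) == 0)
                continue;                       /* port not enabled */
            if (conf[lcore].n_rx_port == q) {   /* current lcore is full */
                if (++lcore == MAX_LCORES)
                    return -1;                  /* not enough lcores */
            }
            conf[lcore].rx_port_list[conf[lcore].n_rx_port++] = port;
            used = (int)lcore + 1;
        }
        return used;
    }

    int
    main(void)
    {
        struct lcore_rx_conf conf[MAX_LCORES] = {{0}};
        int n = assign_rx_ports(0xffff, 8, conf);   /* -p ffff -q 8 */

        printf("lcores used: %d\n", n);
        for (int l = 0; l < n; l++)
            printf("lcore %d polls %u ports, first port %u\n",
                   l, conf[l].n_rx_port, conf[l].rx_port_list[0]);
        return 0;
    }

For the command line shown in `Running the Application`_ (-q 8 -p ffff), this assignment uses two lcores, each polling eight ports.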

For example, if the user specifies -q 4, the application is able to poll four ports with one lcore.
If there are 16 ports on the target (and if the portmask argument is -p ffff),
the application will need four lcores to poll all the ports.

.. code-block:: c

    ret = rte_eth_rx_queue_setup((uint8_t) portid, 0, nb_rxd, SOCKET0, &rx_conf, l2fwd_pktmbuf_pool);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup: "
            "err=%d, port=%u\n",
            ret, portid);

The list of queues that must be polled for a given lcore is stored in a private structure called struct lcore_queue_conf.

.. code-block:: c

    struct lcore_queue_conf {
        unsigned n_rx_port;
        unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
        struct mbuf_table tx_mbufs[L2FWD_MAX_PORTS];
    } rte_cache_aligned;

    struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];

The values n_rx_port and rx_port_list[] are used in the main packet processing loop
(see `Receive, Process and Transmit Packets`_ later in this chapter).

The global configuration for the RX queues is stored in a static structure:

.. code-block:: c

    static const struct rte_eth_rxconf rx_conf = {
        .rx_thresh = {
            .pthresh = RX_PTHRESH,
            .hthresh = RX_HTHRESH,
            .wthresh = RX_WTHRESH,
        },
    };

TX Queue Initialization
~~~~~~~~~~~~~~~~~~~~~~~

Each lcore should be able to transmit on any port. For every port, a single TX queue is initialized.

.. code-block:: c

    /* init one TX queue on each port */

    fflush(stdout);

    ret = rte_eth_tx_queue_setup((uint8_t) portid, 0, nb_txd, rte_eth_dev_socket_id(portid), &tx_conf);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n", ret, (unsigned) portid);

The global configuration for TX queues is stored in a static structure:

.. code-block:: c

    static const struct rte_eth_txconf tx_conf = {
        .tx_thresh = {
            .pthresh = TX_PTHRESH,
            .hthresh = TX_HTHRESH,
            .wthresh = TX_WTHRESH,
        },
        .tx_free_thresh = RTE_TEST_TX_DESC_DEFAULT + 1, /* disable feature */
    };

Receive, Process and Transmit Packets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the l2fwd_main_loop() function, the main task is to read ingress packets from the RX queues.
This is done using the following code:

.. code-block:: c

    /*
     * Read packet from RX queues
     */

    for (i = 0; i < qconf->n_rx_port; i++) {
        portid = qconf->rx_port_list[i];
        nb_rx = rte_eth_rx_burst((uint8_t) portid, 0, pkts_burst, MAX_PKT_BURST);

        for (j = 0; j < nb_rx; j++) {
            m = pkts_burst[j];
            /* prefetch the packet data before its Ethernet header is rewritten */
            rte_prefetch0(rte_pktmbuf_mtod(m, void *));
            l2fwd_simple_forward(m, portid);
        }
    }

Packets are read in a burst of size MAX_PKT_BURST.
The rte_eth_rx_burst() function writes the mbuf pointers in a local table and returns the number of available mbufs in the table.

Then, each mbuf in the table is processed by the l2fwd_simple_forward() function.
The processing is very simple: determine the TX port from the RX port, then replace the source and destination MAC addresses.

.. note::

    In the following code, one line for getting the output port requires some explanation.

During the initialization process, a static array of destination ports (l2fwd_dst_ports[]) is filled such that for each source port,
a destination port is assigned that is either the next or previous enabled port from the portmask.
Naturally, the number of ports in the portmask must be even, otherwise, the application exits.

.. code-block:: c

    static void
    l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
    {
        struct ether_hdr *eth;
        void *tmp;
        unsigned dst_port;

        /* look up the paired output port computed at initialization */
        dst_port = l2fwd_dst_ports[portid];

        eth = rte_pktmbuf_mtod(m, struct ether_hdr *);

        /* 02:00:00:00:00:xx */

        tmp = &eth->d_addr.addr_bytes[0];

        /*
         * 8-byte store on a little-endian CPU: byte 0 = 0x02, byte 5 = dst_port.
         * It also overwrites the first 2 bytes of s_addr, which are rewritten below.
         */
        *((uint64_t *)tmp) = 0x000000000002 + ((uint64_t) dst_port << 40);

        /* src addr */

        ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);

        l2fwd_send_packet(m, (uint8_t) dst_port);
    }

Then, the packet is sent using the l2fwd_send_packet(m, dst_port) function.
For this test application, the processing is exactly the same for all packets arriving on the same RX port.
Therefore, it would have been possible to call the l2fwd_send_burst() function directly from the main loop
to send all the received packets on the same TX port,
using the burst-oriented send function, which is more efficient.

However, in real-life applications (such as L3 routing),
packet N is not necessarily forwarded on the same port as packet N-1.
The application is implemented to illustrate that, so the same approach can be reused in a more complex application.

The l2fwd_send_packet() function stores the packet in a per-lcore and per-TX-port table.
If the table is full, the whole packets table is transmitted using the l2fwd_send_burst() function:

.. code-block:: c

    /* Send the packet on an output interface */

    static int
    l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
    {
        unsigned lcore_id, len;
        struct lcore_queue_conf *qconf;

        lcore_id = rte_lcore_id();
        qconf = &lcore_queue_conf[lcore_id];
        len = qconf->tx_mbufs[port].len;
        qconf->tx_mbufs[port].m_table[len] = m;
        len++;

        /* enough pkts to be sent */

        if (unlikely(len == MAX_PKT_BURST)) {
            l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
            len = 0;
        }

        qconf->tx_mbufs[port].len = len;
        return 0;
    }

To ensure that no packets remain in the tables, each lcore periodically drains its TX tables in its main loop.
Buffering packets into bursts introduces some latency when there are not many packets to send;
however, it improves performance:

.. code-block:: c

    cur_tsc = rte_rdtsc();

    /*
     * TX burst queue drain
     */

    diff_tsc = cur_tsc - prev_tsc;

    if (unlikely(diff_tsc > drain_tsc)) {
        for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
            if (qconf->tx_mbufs[portid].len == 0)
                continue;

            l2fwd_send_burst(&lcore_queue_conf[lcore_id], qconf->tx_mbufs[portid].len, (uint8_t) portid);

            qconf->tx_mbufs[portid].len = 0;
        }

        /* if timer is enabled */

        if (timer_period > 0) {
            /* advance the timer */

            timer_tsc += diff_tsc;

            /* if timer has reached its timeout */

            if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
                /* do this only on master core */

                if (lcore_id == rte_get_master_lcore()) {
                    print_stats();

                    /* reset the timer */
                    timer_tsc = 0;
                }
            }
        }

        prev_tsc = cur_tsc;
    }
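
The buffering and periodic draining described above can be modeled without DPDK.
The following is a minimal sketch under stated assumptions: struct tx_buffer, send_packet(), send_burst() and maybe_drain() are hypothetical stand-ins for struct mbuf_table, l2fwd_send_packet(), l2fwd_send_burst() and the drain block of the main loop.
Packets are plain integers, a caller-supplied tick counter replaces rte_rdtsc(), and MAX_PKT_BURST is reduced to 4 to keep the trace short.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PKT_BURST 4   /* small burst size chosen for this sketch */
    #define NB_PORTS      2

    /* Hypothetical per-port TX buffer, standing in for struct mbuf_table. */
    struct tx_buffer {
        unsigned len;
        int pkts[MAX_PKT_BURST];   /* packet IDs instead of struct rte_mbuf * */
    };

    static struct tx_buffer tx_bufs[NB_PORTS];
    static unsigned bursts_sent[NB_PORTS];   /* bursts "transmitted" per port */
    static unsigned pkts_sent[NB_PORTS];     /* packets "transmitted" per port */

    /* Stand-in for l2fwd_send_burst(); a real application would transmit here. */
    static void
    send_burst(unsigned port, unsigned n)
    {
        bursts_sent[port]++;
        pkts_sent[port] += n;
    }

    /* Buffer one packet; flush the buffer as soon as it holds a full burst. */
    static void
    send_packet(unsigned port, int pkt)
    {
        struct tx_buffer *b = &tx_bufs[port];

        b->pkts[b->len++] = pkt;
        if (b->len == MAX_PKT_BURST) {
            send_burst(port, b->len);
            b->len = 0;
        }
    }

    /* Flush partially filled buffers once more than drain_ticks have elapsed. */
    static void
    maybe_drain(uint64_t now, uint64_t *prev, uint64_t drain_ticks)
    {
        if (now - *prev <= drain_ticks)
            return;
        for (unsigned port = 0; port < NB_PORTS; port++) {
            if (tx_bufs[port].len == 0)
                continue;
            send_burst(port, tx_bufs[port].len);
            tx_bufs[port].len = 0;
        }
        *prev = now;
    }

    int
    main(void)
    {
        uint64_t prev = 0;

        /* 6 packets to port 0: a full burst of 4 leaves at once, 2 wait */
        for (int i = 0; i < 6; i++)
            send_packet(0, i);
        maybe_drain(5, &prev, 10);    /* too early: nothing is drained */
        maybe_drain(11, &prev, 10);   /* timeout: the 2 leftover packets go out */

        /* prints "port 0: 2 bursts, 6 packets" */
        printf("port 0: %u bursts, %u packets\n", bursts_sent[0], pkts_sent[0]);
        return 0;
    }

A full burst is sent as soon as it is complete, while the two leftover packets wait until the drain timeout expires, which is the latency trade-off mentioned above.
In the real application the threshold drain_tsc is expressed in TSC cycles, so it would typically be derived from the TSC frequency reported by rte_get_tsc_hz().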