..  BSD LICENSE
    Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

L2 Forwarding Sample Application (in Real and Virtualized Environments) with core load statistics.
====================================================================================================

The L2 Forwarding sample application is a simple example of packet processing using
the Data Plane Development Kit (DPDK) which
also takes advantage of Single Root I/O Virtualization (SR-IOV) features in a virtualized environment.

.. note::

    This application is a variation of the L2 Forwarding sample application. It demonstrates a possible
    scheme for using the job stats library; therefore, some parts of this document are identical to the
    documentation of the original L2 Forwarding application.

Overview
--------

The L2 Forwarding sample application, which can operate in real and virtualized environments,
performs L2 forwarding for each packet that is received.
The destination port is the adjacent port from the enabled portmask, that is,
if the first four ports are enabled (portmask 0xf),
ports 1 and 2 forward into each other, and ports 3 and 4 forward into each other.
Also, the MAC addresses are affected as follows:

*   The source MAC address is replaced by the TX port MAC address

*   The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID

This application can be used to benchmark performance using a traffic generator, as shown in Figure 3.

The application can also be used in a virtualized environment, as shown in Figure 4.

The L2 Forwarding application can also be used as a starting point for developing a new application based on the DPDK.

.. _figure_3:

**Figure 3. Performance Benchmark Setup (Basic Environment)**

.. image4_png has been replaced

|l2_fwd_benchmark_setup|

.. _figure_4:

**Figure 4. Performance Benchmark Setup (Virtualized Environment)**

.. image5_png has been renamed

|l2_fwd_virtenv_benchmark_setup|

Virtual Function Setup Instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This application can use the virtual functions available in the system and
therefore can be used in a virtual machine without passing through
the whole network device into a guest machine in a virtualized scenario.
The virtual functions can be enabled in the host machine or the hypervisor with the respective physical function driver.

For example, in a Linux* host machine, it is possible to enable a virtual function using the following command:

.. code-block:: console

    modprobe ixgbe max_vfs=2,2

This command enables two Virtual Functions on each Physical Function of the NIC,
with two physical ports in the PCI configuration space.
Note that Virtual Functions 0 and 2 belong to Physical Function 0
and Virtual Functions 1 and 3 belong to Physical Function 1,
for a total of four Virtual Functions.

Compiling the Application
-------------------------

#.  Go to the example directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/l2fwd-jobstats

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

Running the Application
-----------------------

The application requires a number of command line options:

.. code-block:: console

    ./build/l2fwd-jobstats [EAL options] -- -p PORTMASK [-q NQ] [-l]

where:

*   -p PORTMASK: A hexadecimal bitmask of the ports to configure

*   -q NQ: The number of queues (=ports) per lcore (default is 1)

*   -l: Use the locale thousands separator when formatting big numbers

To run the application in a linuxapp environment with 4 lcores, 16 ports, 8 RX queues per lcore and
thousands separator printing, issue the command:

.. code-block:: console

    $ ./build/l2fwd-jobstats -c f -n 4 -- -q 8 -p ffff -l

Refer to the *DPDK Getting Started Guide* for general information on running applications
and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide some explanation of the code.

Command Line Arguments
~~~~~~~~~~~~~~~~~~~~~~

The L2 Forwarding sample application takes specific parameters,
in addition to Environment Abstraction Layer (EAL) arguments (see Section 9.3).
The preferred way to parse parameters is to use the getopt() function,
since it is part of a well-defined and portable library.

The parsing of arguments is done in the l2fwd_parse_args() function.
The method of argument parsing is not described here.
Refer to the *glibc getopt(3)* man page for details.

EAL arguments are parsed first, then application-specific arguments.
This is done at the beginning of the main() function:

.. code-block:: c

    /* init EAL */

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");

    argc -= ret;
    argv += ret;

    /* parse application arguments (after the EAL ones) */

    ret = l2fwd_parse_args(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid L2FWD arguments\n");

Mbuf Pool Initialization
~~~~~~~~~~~~~~~~~~~~~~~~

Once the arguments are parsed, the mbuf pool is created.
The mbuf pool contains a set of mbuf objects that will be used by the driver
and the application to store network packet data:

.. code-block:: c

    /* create the mbuf pool */
    l2fwd_pktmbuf_pool =
        rte_mempool_create("mbuf_pool", NB_MBUF,
                   MBUF_SIZE, 32,
                   sizeof(struct rte_pktmbuf_pool_private),
                   rte_pktmbuf_pool_init, NULL,
                   rte_pktmbuf_init, NULL,
                   rte_socket_id(), 0);

    if (l2fwd_pktmbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

The rte_mempool is a generic structure used to handle pools of objects.
In this case, it is necessary to create a pool that will be used by the driver,
which expects to have some reserved space in the mempool structure,
sizeof(struct rte_pktmbuf_pool_private) bytes.
The number of allocated pkt mbufs is NB_MBUF, with a size of MBUF_SIZE each.
A per-lcore cache of 32 mbufs is kept.
The memory is allocated on the rte_socket_id() socket,
but it is possible to extend this code to allocate one mbuf pool per socket.

Two callback pointers are also given to the rte_mempool_create() function:

*   The first callback pointer is to rte_pktmbuf_pool_init() and is used
    to initialize the private data of the mempool, which is needed by the driver.
    This function is provided by the mbuf API, but can be copied and extended by the developer.

*   The second callback pointer given to rte_mempool_create() is the mbuf initializer.
    The default is used, that is, rte_pktmbuf_init(), which is provided in the rte_mbuf library.
    If a more complex application wants to extend the rte_pktmbuf structure for its own needs,
    a new function derived from rte_pktmbuf_init() can be created.

Driver Initialization
~~~~~~~~~~~~~~~~~~~~~

The main part of the code in the main() function relates to the initialization of the driver.
To fully understand this code, it is recommended to study the chapters that relate to the Poll Mode Driver
in the *DPDK Programmer's Guide* and the *DPDK API Reference*.

.. code-block:: c

    nb_ports = rte_eth_dev_count();

    if (nb_ports == 0)
        rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

    if (nb_ports > RTE_MAX_ETHPORTS)
        nb_ports = RTE_MAX_ETHPORTS;

    /* reset l2fwd_dst_ports */

    for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
        l2fwd_dst_ports[portid] = 0;

    last_port = 0;

    /*
     * Each logical core is assigned a dedicated TX queue on each port.
     */
    for (portid = 0; portid < nb_ports; portid++) {
        /* skip ports that are not enabled */
        if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
            continue;

        if (nb_ports_in_mask % 2) {
            l2fwd_dst_ports[portid] = last_port;
            l2fwd_dst_ports[last_port] = portid;
        }
        else
            last_port = portid;

        nb_ports_in_mask++;

        rte_eth_dev_info_get((uint8_t) portid, &dev_info);
    }

The next step is to configure the RX and TX queues.
Each port is configured with one RX queue (only one lcore is able to poll a given port)
and one TX queue.
The rte_eth_dev_configure() function is used to configure the number of queues for a port:

.. code-block:: c

    ret = rte_eth_dev_configure((uint8_t)portid, 1, 1, &port_conf);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Cannot configure device: "
            "err=%d, port=%u\n",
            ret, portid);

The global configuration is stored in a static structure:

.. code-block:: c

    static const struct rte_eth_conf port_conf = {
        .rxmode = {
            .split_hdr_size = 0,
            .header_split = 0,   /**< Header Split disabled */
            .hw_ip_checksum = 0, /**< IP checksum offload disabled */
            .hw_vlan_filter = 0, /**< VLAN filtering disabled */
            .jumbo_frame = 0,    /**< Jumbo Frame Support disabled */
            .hw_strip_crc = 0,   /**< CRC stripping by hardware disabled */
        },

        .txmode = {
            .mq_mode = ETH_DCB_NONE
        },
    };

RX Queue Initialization
~~~~~~~~~~~~~~~~~~~~~~~

The application uses one lcore to poll one or several ports, depending on the -q option,
which specifies the number of queues per lcore.

For example, if the user specifies -q 4, the application is able to poll four ports with one lcore.
If there are 16 ports on the target (and if the portmask argument is -p ffff),
the application will need four lcores to poll all the ports.

.. code-block:: c

    ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
                rte_eth_dev_socket_id(portid),
                NULL,
                l2fwd_pktmbuf_pool);

    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:err=%d, port=%u\n",
                ret, (unsigned) portid);

The list of queues that must be polled for a given lcore is stored in a private structure called struct lcore_queue_conf.

.. code-block:: c

    struct lcore_queue_conf {
        unsigned n_rx_port;
        unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
        struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];

        struct rte_timer rx_timers[MAX_RX_QUEUE_PER_LCORE];
        struct rte_jobstats port_fwd_jobs[MAX_RX_QUEUE_PER_LCORE];

        struct rte_timer flush_timer;
        struct rte_jobstats flush_job;
        struct rte_jobstats idle_job;
        struct rte_jobstats_context jobs_context;

        rte_atomic16_t stats_read_pending;
        rte_spinlock_t lock;
    } __rte_cache_aligned;

Fields of struct lcore_queue_conf:

*   n_rx_port and rx_port_list[] are used in the main packet processing loop
    (see Section 9.4.6 "Receive, Process and Transmit Packets" later in this chapter).

*   rx_timers and flush_timer are used to ensure forced TX at low packet rates.

*   port_fwd_jobs, flush_job, idle_job and jobs_context are librte_jobstats objects used for managing l2fwd jobs.

*   stats_read_pending and lock are used during the job stats read phase.

TX Queue Initialization
~~~~~~~~~~~~~~~~~~~~~~~

Each lcore should be able to transmit on any port. For every port, a single TX queue is initialized.

.. code-block:: c

    /* init one TX queue on each port */

    fflush(stdout);
    ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
            rte_eth_dev_socket_id(portid),
            NULL);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "rte_eth_tx_queue_setup:err=%d, port=%u\n",
                ret, (unsigned) portid);

Jobs statistics initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are several statistics objects available:

*   Flush job statistics

.. code-block:: c

    rte_jobstats_init(&qconf->flush_job, "flush", drain_tsc, drain_tsc,
            drain_tsc, 0);

    rte_timer_init(&qconf->flush_timer);
    ret = rte_timer_reset(&qconf->flush_timer, drain_tsc, PERIODICAL,
                lcore_id, &l2fwd_flush_job, NULL);

    if (ret < 0) {
        rte_exit(1, "Failed to reset flush job timer for lcore %u: %s",
                    lcore_id, rte_strerror(-ret));
    }

*   Statistics per RX port

.. code-block:: c

    rte_jobstats_init(job, name, 0, drain_tsc, 0, MAX_PKT_BURST);
    rte_jobstats_set_update_period_function(job, l2fwd_job_update_cb);

    rte_timer_init(&qconf->rx_timers[i]);
    ret = rte_timer_reset(&qconf->rx_timers[i], 0, PERIODICAL, lcore_id,
            l2fwd_fwd_job, (void *)(uintptr_t)i);

    if (ret < 0) {
        rte_exit(1, "Failed to reset lcore %u port %u job timer: %s",
                    lcore_id, qconf->rx_port_list[i], rte_strerror(-ret));
    }

The following parameters are passed to rte_jobstats_init() for each RX port job:

*   0 as the minimum poll period

*   drain_tsc as the maximum poll period

*   0 as the initial poll period

*   MAX_PKT_BURST as the desired target value (RX burst size)

Main loop
~~~~~~~~~

The forwarding path is reworked compared to the original L2 Forwarding application.
The l2fwd_main_loop() function contains three nested loops.

.. code-block:: c

    for (;;) {
        rte_spinlock_lock(&qconf->lock);

        do {
            rte_jobstats_context_start(&qconf->jobs_context);

            /* Do the Idle job:
             * - Read stats_read_pending flag
             * - check if some real job need to be executed
             */
            rte_jobstats_start(&qconf->jobs_context, &qconf->idle_job);

            do {
                uint8_t i;
                uint64_t now = rte_get_timer_cycles();

                need_manage = qconf->flush_timer.expire < now;
                /* Check if we were asked to give stats. */
                stats_read_pending =
                        rte_atomic16_read(&qconf->stats_read_pending);
                need_manage |= stats_read_pending;

                for (i = 0; i < qconf->n_rx_port && !need_manage; i++)
                    need_manage = qconf->rx_timers[i].expire < now;

            } while (!need_manage);
            rte_jobstats_finish(&qconf->idle_job, qconf->idle_job.target);

            rte_timer_manage();
            rte_jobstats_context_finish(&qconf->jobs_context);
        } while (likely(stats_read_pending == 0));

        rte_spinlock_unlock(&qconf->lock);
        rte_pause();
    }

The first, infinite for loop minimizes the impact of stats reading.
The lock is released and re-acquired only when a stats read has been requested.

The second loop, a do-while, does the whole job management.
When any job is ready, rte_timer_manage() is used to call the job handler.
This is where the l2fwd_fwd_job() and l2fwd_flush_job() functions are called when needed.
Then rte_jobstats_context_finish() is called to mark the end of the loop iteration, when no other jobs are ready to execute.
By this time the stats are ready to be read, and if stats_read_pending is set, the loop breaks, allowing the stats to be read.

The third, innermost do-while loop is the idle job (idle stats counter).
Its only purpose is to monitor whether any job is ready or a stats read is pending for this lcore.
Statistics from this part of the code are considered to be the headroom available for additional processing.

Receive, Process and Transmit Packets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The main task of the l2fwd_fwd_job() function is to read ingress packets from the RX queue of a particular port and forward them.
This is done using the following code:

.. code-block:: c

    total_nb_rx = rte_eth_rx_burst((uint8_t) portid, 0, pkts_burst,
            MAX_PKT_BURST);

    for (j = 0; j < total_nb_rx; j++) {
        m = pkts_burst[j];
        rte_prefetch0(rte_pktmbuf_mtod(m, void *));
        l2fwd_simple_forward(m, portid);
    }

Packets are read in a burst of size MAX_PKT_BURST.
Then, each mbuf in the table is processed by the l2fwd_simple_forward() function.
The processing is very simple: determine the TX port from the RX port, then replace the source and destination MAC addresses.

The rte_eth_rx_burst() function writes the mbuf pointers in a local table and returns the number of available mbufs in the table.

If the first read returns a full burst, a second read is attempted:

.. code-block:: c

    if (total_nb_rx == MAX_PKT_BURST) {
        const uint16_t nb_rx = rte_eth_rx_burst((uint8_t) portid, 0, pkts_burst,
                MAX_PKT_BURST);

        total_nb_rx += nb_rx;
        for (j = 0; j < nb_rx; j++) {
            m = pkts_burst[j];
            rte_prefetch0(rte_pktmbuf_mtod(m, void *));
            l2fwd_simple_forward(m, portid);
        }
    }

This second read is important because it gives the job stats library feedback on how many packets were processed.

.. code-block:: c

    /* Adjust period time in which we are running here. */
    if (rte_jobstats_finish(job, total_nb_rx) != 0) {
        rte_timer_reset(&qconf->rx_timers[port_idx], job->period, PERIODICAL,
                lcore_id, l2fwd_fwd_job, arg);
    }

To maximize performance, exactly MAX_PKT_BURST packets (the target value) are expected to be read for each l2fwd_fwd_job() call.
If total_nb_rx is smaller than the target value, job->period is increased. If it is greater, the period is decreased.

.. note::

    In the following code, one line for getting the output port requires some explanation.

During the initialization process, a static array of destination ports (l2fwd_dst_ports[]) is filled such that for each source port,
a destination port is assigned that is either the next or previous enabled port from the portmask.
Naturally, the number of ports in the portmask must be even; otherwise, the application exits.

.. code-block:: c

    static void
    l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
    {
        struct ether_hdr *eth;
        void *tmp;
        unsigned dst_port;

        dst_port = l2fwd_dst_ports[portid];

        eth = rte_pktmbuf_mtod(m, struct ether_hdr *);

        /* 02:00:00:00:00:xx */

        tmp = &eth->d_addr.addr_bytes[0];

        *((uint64_t *)tmp) = 0x000000000002 + ((uint64_t) dst_port << 40);

        /* src addr */

        ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);

        l2fwd_send_packet(m, (uint8_t) dst_port);
    }

Then, the packet is sent using the l2fwd_send_packet(m, dst_port) function.
For this test application, the processing is exactly the same for all packets arriving on the same RX port.
Therefore, it would have been possible to call the l2fwd_send_burst() function directly from the main loop
to send all the received packets on the same TX port,
using the burst-oriented send function, which is more efficient.

However, in real-life applications (such as L3 routing),
packet N is not necessarily forwarded on the same port as packet N-1.
The application is implemented to illustrate that, so the same approach can be reused in a more complex application.

The l2fwd_send_packet() function stores the packet in a per-lcore and per-txport table.
If the table is full, the whole packets table is transmitted using the l2fwd_send_burst() function:

.. code-block:: c

    /* Send the packet on an output interface */

    static int
    l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
    {
        unsigned lcore_id, len;
        struct lcore_queue_conf *qconf;

        lcore_id = rte_lcore_id();
        qconf = &lcore_queue_conf[lcore_id];
        len = qconf->tx_mbufs[port].len;
        qconf->tx_mbufs[port].m_table[len] = m;
        len++;

        /* enough pkts to be sent */

        if (unlikely(len == MAX_PKT_BURST)) {
            l2fwd_send_burst(qconf, MAX_PKT_BURST, port);
            len = 0;
        }

        qconf->tx_mbufs[port].len = len;
        return 0;
    }

To ensure that no packets remain in the tables, the flush job exists.
The l2fwd_flush_job() function is called periodically on each lcore to drain the TX queue of each port.
This technique introduces some latency when there are not many packets to send;
however, it improves performance:

.. code-block:: c

    static void
    l2fwd_flush_job(__rte_unused struct rte_timer *timer, __rte_unused void *arg)
    {
        uint64_t now;
        unsigned lcore_id;
        struct lcore_queue_conf *qconf;
        struct mbuf_table *m_table;
        uint8_t portid;

        lcore_id = rte_lcore_id();
        qconf = &lcore_queue_conf[lcore_id];

        rte_jobstats_start(&qconf->jobs_context, &qconf->flush_job);

        now = rte_get_timer_cycles();
        lcore_id = rte_lcore_id();
        qconf = &lcore_queue_conf[lcore_id];
        for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
            m_table = &qconf->tx_mbufs[portid];
            if (m_table->len == 0 || m_table->next_flush_time <= now)
                continue;

            l2fwd_send_burst(qconf, portid);
        }


        /* Pass target to indicate that this job is happy of time interval
         * in which it was called. */
        rte_jobstats_finish(&qconf->flush_job, qconf->flush_job.target);
    }

.. |l2_fwd_benchmark_setup| image:: img/l2_fwd_benchmark_setup.*

.. |l2_fwd_virtenv_benchmark_setup| image:: img/l2_fwd_virtenv_benchmark_setup.*
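
The pairing of enabled ports described in the Overview and performed in the driver initialization loop
can be tried outside of DPDK. The following is a minimal standalone sketch, not the application's code:
the function name ``sketch_pair_ports`` and the ``SKETCH_MAX_PORTS`` limit are hypothetical names introduced
for illustration only, and plain C types stand in for the DPDK port and mask variables.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 32 /* hypothetical limit, used only in this sketch */

/*
 * Pair enabled ports two by two, following the same logic as the
 * initialization loop shown in "Driver Initialization": the first enabled
 * port is paired with the second, the third with the fourth, and so on.
 * dst_ports[] must have room for SKETCH_MAX_PORTS entries.
 * Returns the number of enabled ports found in the mask.
 */
unsigned
sketch_pair_ports(uint32_t port_mask, unsigned nb_ports, unsigned *dst_ports)
{
    unsigned portid;
    unsigned last_port = 0;
    unsigned nb_ports_in_mask = 0;

    assert(nb_ports <= SKETCH_MAX_PORTS);

    /* reset the destination table */
    for (portid = 0; portid < SKETCH_MAX_PORTS; portid++)
        dst_ports[portid] = 0;

    for (portid = 0; portid < nb_ports; portid++) {
        /* skip ports that are not enabled */
        if ((port_mask & (UINT32_C(1) << portid)) == 0)
            continue;

        if (nb_ports_in_mask % 2) {
            /* second port of a pair: link it with the previous one */
            dst_ports[portid] = last_port;
            dst_ports[last_port] = portid;
        } else {
            /* first port of a pair: remember it */
            last_port = portid;
        }
        nb_ports_in_mask++;
    }

    return nb_ports_in_mask;
}
```

With a portmask of 0xf and four ports, the table becomes {1, 0, 3, 2}: using zero-based port IDs,
ports 0 and 1 forward into each other, as do ports 2 and 3. An odd return value indicates a portmask
that cannot be fully paired, which is the case in which the application exits.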