..  BSD LICENSE

    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, please refer to the
*DPDK Programmer's Guide*.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
where a specific target node is specified for each and every flow
(rather than assigning nodes in round-robin fashion, as the original
load balancing sample application does).

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows;
for each flow a target backend worker node is specified. The EFD table does not
store the flow key (unlike a regular hash table), and hence it can
individually load-balance millions of flows (the number of targets multiplied
by the maximum number of flows that fit in a flow table per target) while still
fitting in CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library) which is populated with flow
information and its intended target node.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

The <key, target> pair is then inserted into the flow distribution table.

The main loop of the server process receives a burst of packets, and for each
packet a flow key (the IP destination address) is extracted. The flow
distributor table is looked up and the target node id is returned. Packets are
then enqueued to the specified target node.

It should be noted that the flow distributor table is not a membership test
table. That is, if the key has already been inserted, the returned target node
id will be correct, but for keys that were never inserted the table will still
return a value (which may happen to look like a valid node id).

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initialization, the worker node (process) creates a flow table (a regular
hash table storing the flow keys, with a default size of 1 million flows) which
is populated with only the flow information that is serviced at this node.
This flow table makes it possible to identify new keys that have not been
inserted before.

The worker node's main loop simply receives packets and performs a hash table
lookup. If a match occurs, statistics are updated for the flows serviced by
this node. If no match is found in the local hash table, this indicates that
the packet belongs to a new flow, and it is dropped.


Compiling the Application
-------------------------

The sequence of steps used to build the application is:

#. Export the required environment variables:

   .. code-block:: console

      export RTE_SDK=/path/to/rte_sdk
      export RTE_TARGET=x86_64-native-linuxapp-gcc

#. Build the application executable file:

   .. code-block:: console

      cd ${RTE_SDK}/examples/server_node_efd/
      make

   For more details on how to build the DPDK libraries and sample
   applications, please refer to the *DPDK Getting Started Guide*.


Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES:`` Number of back-end nodes that will be used
* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID:`` Node ID, which cannot be equal to or higher than NUM_NODES


First, the server application must be launched with the number of nodes that
will be run. Once it has started, the node instances can be run, each with a
different NODE_ID. These instances have to be run as secondary processes, with
``--proc-type=secondary`` in the EAL options, so that they attach to the
primary process memory and can therefore access the queues created by the
primary process to distribute packets.
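
As an illustration, a possible launch sequence for two back-end nodes is
sketched below. The EAL core masks and the number of memory channels used here
are placeholders only and must be adapted to the target platform; the port mask
assumes two ports are available. The application options follow the syntax
described above.

.. code-block:: console

    # Front-end server (primary process), two ports and two nodes
    ./server -c 0x3 -n 4 -- -p 0x3 -n 2

    # Back-end nodes (secondary processes), one per NODE_ID
    ./node -c 0x4 -n 4 --proc-type=secondary -- -n 0
    ./node -c 0x8 -n 4 --proc-type=secondary -- -n 1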

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to the nodes, with the number of flows
specified in the command line (1 million, by default).

.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                    (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                        "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports, and the IPv4
destination address of each packet is used as a key to look up in the EFD
table, which returns the node the packet has to be distributed to.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header. */
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

Each received burst of packets is first staged in per-node temporary buffers
and then enqueued in the shared ring between the server and the corresponding
node. After this, a new burst of packets is received and the process is
repeated indefinitely.
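
The ``enqueue_rx_packet()`` call above does not write to the shared ring
directly; it only stages the mbuf in a small per-node buffer (``cl_rx_buf``).
Its implementation is not reproduced in this guide; a minimal sketch,
consistent with the buffering scheme used by ``flush_rx_queue()`` below,
could be:

.. code-block:: c

    /* Sketch only: stage an mbuf in the per-node buffer, flushed in bulk later */
    static void
    enqueue_rx_packet(uint8_t node, struct rte_mbuf *buf)
    {
        cl_rx_buf[node].buffer[cl_rx_buf[node].count++] = buf;
    }

The staged packets are then pushed to the shared ring of each node by
``flush_rx_queue()``: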

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count) != 0) {
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

The second process, the back-end node, receives packets from the shared ring
it has with the server and sends them out, if they belong to the node.

At initialization, it attaches to the server process memory, to gain access
to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;
        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

After initialization, packets are dequeued from the shared ring (filled by the
server) and, as in the server process, the IPv4 destination address of each
packet is used as a key to look up in the hash table. If there is a hit, the
packet is stored in a buffer, to be eventually transmitted on one of the
enabled ports. If the key is not present, the packet is dropped, since the
flow is not handled by the node.
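
The dequeue itself happens in the node's main loop, which is not reproduced in
this guide. A minimal sketch of such a loop, assuming the three-argument
``rte_ring_dequeue_burst()`` API available in the DPDK release this example
was written for, could look like this:

.. code-block:: c

    /*
     * Sketch only (not the exact example code): drain up to PKT_READ_SIZE
     * mbufs from the shared ring and hand them to handle_packets().
     */
    struct rte_mbuf *pkts[PKT_READ_SIZE];
    uint16_t rx_pkts;

    for (;;) {
        rx_pkts = (uint16_t)rte_ring_dequeue_burst(rx_ring,
                (void **)pkts, PKT_READ_SIZE);
        if (unlikely(rx_pkts == 0))
            continue;

        handle_packets(h, pkts, rx_pkts);
    }

The per-packet hash lookup and the pass/drop decision are done in
``handle_packets()``: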

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header. */
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted,
received and dropped packets, which are shown and refreshed by the server
application.

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }