..  BSD LICENSE

    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a
flow-level load balancer. For more information about the EFD library,
please refer to the DPDK Programmer's Guide.
This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`,
where a specific target node is specified for each flow
(rather than in round-robin fashion, as in the original load balancing
sample application).

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows;
for each flow, a target backend worker node is specified. The EFD table does
not store the flow key (unlike a regular hash table), and hence it can
individually load-balance millions of flows (number of targets * maximum
number of flows that fit in a flow table per target) while still fitting in
the CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with flow
information and its intended target node.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

Then each <key, target> pair is inserted into the flow distribution table.

The main loop of the server process receives a burst of packets, and for
each packet a flow key (the IP destination address) is extracted.
The flow distributor table is looked up and the target node id is returned.
Packets are then enqueued to the specified target node id.

It should be noted that the flow distributor table is not a membership test
table. That is, if the key has already been inserted, the returned target
node id will be correct; but for new keys the flow distributor table will
still return a value (which may or may not be valid).

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initialization, each worker node (process) creates a flow table (a
regular hash table storing the flow keys, with a default size of 1M flows)
which is populated with only the flow information that is serviced at this
node. This flow table is essential for identifying new keys that have not
been inserted before.

The worker node's main loop simply receives packets and then performs a hash
table lookup. If a match occurs, statistics are updated for the flows
serviced by this node. If no match is found in the local hash table, this
indicates that the packet belongs to a new flow, and it is dropped.


Compiling the Application
-------------------------

To compile the sample application see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.
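The command line options for each binary are described below. As an
illustration only, a two-node setup might be launched as follows (the EAL
core assignments and the port mask are hypothetical and must be adapted to
the target platform)::

    ./server -l 1-2 -- -p 0x3 -n 2 -f 1048576
    ./node -l 3 --proc-type=secondary -- -n 0
    ./node -l 4 --proc-type=secondary -- -n 1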
The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES:`` Number of back-end nodes that will be used
* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID:`` Node ID, which cannot be equal to or higher than NUM_NODES


First, the server app must be launched with the number of nodes that will be
run. Once it has been started, the node instances can be run, each with a
different NODE_ID. These instances have to be run as secondary processes,
with ``--proc-type=secondary`` in the EAL options, which will attach to the
primary process memory, and therefore they can access the queues created by
the primary process to distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the
traffic generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to nodes, with the number of flows
specified in the command line (1 million, by default).
.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                    (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                        "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports, and the
IPv4 destination address from each packet is used as a key to look up in the
EFD table, which tells the node where the packet has to be distributed.
.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

The received burst of packets is first enqueued in temporary buffers (one
per node), and then enqueued in the shared ring between the server and the
node. After this, a new burst of packets is received and the process is
repeated indefinitely.
.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

The second process, the back-end node, receives packets from the shared ring
with the server and sends them out, if they belong to the node.

At initialization, it attaches to the server process memory, in order to have
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then, the hash table that contains the flows that will be handled
by the node is created and populated.
.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

After initialization, packets are dequeued from the shared ring (filled by
the server) and, as in the server process, the IPv4 destination address from
each packet is used as a key to look up in the hash table. If there is a hit,
the packet is stored in a buffer, to be eventually transmitted on one of the
enabled ports. If the key is not there, the packet is dropped, since the
flow is not handled by the node.
.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted,
received and dropped packets, which are shown and refreshed by the server app.
.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }
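As a standalone illustration of the flow-partitioning rule shared by the
server (when populating the EFD table) and each node (when populating its
local hash table), the following sketch can be compiled outside of DPDK. The
``num_nodes``/``num_flows`` values are hypothetical examples, and the local
``cpu_to_be_32()`` helper merely stands in for ``rte_cpu_to_be_32()`` on a
little-endian host. It shows that every destination address maps to exactly
one node id and that the flows are split evenly across the nodes.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for rte_cpu_to_be_32() on a little-endian host */
    static uint32_t cpu_to_be_32(uint32_t x) { return __builtin_bswap32(x); }

    int main(void)
    {
        const unsigned int num_nodes = 4;        /* hypothetical -n value */
        const unsigned int num_flows = 1 << 20;  /* default -f value (1M) */
        unsigned int counts[4] = {0};
        unsigned int i;

        for (i = 0; i < num_flows; i++) {
            /* Same rule as populate_efd_table()/populate_hash_table() */
            unsigned int node_id = i % num_nodes;
            uint32_t ip_dst = cpu_to_be_32(i);   /* key stored in the tables */

            (void)ip_dst;
            counts[node_id]++;
        }

        /* Flows are split evenly across the nodes: 262144 each */
        for (i = 0; i < num_nodes; i++) {
            assert(counts[i] == num_flows / num_nodes);
            printf("node %u: %u flows\n", i, counts[i]);
        }
        return 0;
    }
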