.. SPDX-License-Identifier: BSD-3-Clause
   Copyright(c) 2016-2017 Intel Corporation.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD (Elastic Flow
Distributor) library as a flow-level load balancer. For more information about
the EFD library, please refer to the DPDK Programmer's Guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is assigned to each flow, instead of
distributing packets in round-robin fashion as the original load-balancing
sample application does.

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows,
specifying a target back-end worker node for each flow. Unlike a regular hash
table, the EFD table does not store the flow key, so it can individually
load-balance millions of flows (the number of targets multiplied by the maximum
number of flows that fit in a flow table per target) while still fitting in the
CPU cache.

Although they are referred to as nodes, the front-end server and the worker
nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library). The table is populated with flow
information and the intended target node for each flow.

The sample application assigns a specific target node_id (process) to each of
the IP destination addresses as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific IP destination address is
                                     assigned to this target node */

Then the ``<key, target>`` pair is inserted into the flow distribution table.

The main loop of the server process receives a burst of packets and, for
each packet, extracts a flow key (the IP destination address). The key is
looked up in the flow distributor table, which returns the target node ID, and
the packet is then enqueued to that node.

Note that the flow distributor table is not a membership test table. If a key
has already been inserted, the returned target node ID is correct. For a key
that was never inserted, the table still returns a value, which may or may not
be a valid node ID.

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initialization, the worker node (process) creates a flow table. This is a
regular hash table that stores the keys, with a default size of 1M flows. It is
populated only with the flows serviced by this node. Storing the flow keys is
what allows the node to identify new keys that have not been inserted before.

The worker node's main loop simply receives packets and performs a hash table
lookup for each of them. If a match occurs, the statistics for flows serviced
by this node are updated. If no match is found in the local hash table, the
packet belongs to a new flow and is dropped.


Compiling the Application
-------------------------

To compile the sample application, see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application consists of two binaries: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

where:

* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES``: Number of back-end nodes that will be used
* ``-f NUM_FLOWS``: Number of flows to be added to the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

where:

* ``-n NODE_ID``: Node ID, which must be lower than NUM_NODES


First, launch the server application, specifying the number of nodes that will
be run. Once it has started, launch the node instances, each with a different
NODE_ID. These instances must be run as secondary processes, with
``--proc-type=secondary`` in the EAL options. They then attach to the primary
process memory and can access the queues that the primary process creates to
distribute packets.

To run the application successfully, the command line used to start the
application must be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in the previous sections, there are two processes in this example.

The first process, the front-end server, creates the EFD table used to
distribute packets to the nodes. It populates the table with the number of
flows specified on the command line (1 million, by default).
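
Both the key and the target of each entry are derived from the flow index
``i`` alone. This is also how each back-end node can later rebuild the subset
of flows assigned to it. The following standalone sketch reproduces that
mapping without DPDK. It assumes POSIX ``htonl()``/``ntohl()`` as stand-ins for
``rte_cpu_to_be_32()`` and its inverse, and the helper names are hypothetical.

.. code-block:: c

    #include <arpa/inet.h>  /* htonl(), ntohl() */
    #include <stdint.h>

    /* Hypothetical helper: key for flow i, i.e. the IPv4 destination
     * address i in network byte order, as rte_cpu_to_be_32(i) produces. */
    static uint32_t
    flow_key(uint32_t i)
    {
        return htonl(i);
    }

    /* Hypothetical helper: target node for flow i (round-robin over nodes). */
    static uint32_t
    flow_target(uint32_t i, uint32_t num_nodes)
    {
        return i % num_nodes;
    }

    /* Hypothetical helper: recover the target from a key built by
     * flow_key(). Only meaningful for keys that were actually inserted;
     * the EFD table itself cannot tell inserted and unknown keys apart. */
    static uint32_t
    expected_target(uint32_t key, uint32_t num_nodes)
    {
        return ntohl(key) % num_nodes;
    }

For example, with three nodes, flow 4 is keyed by the destination address
0.0.0.4 and assigned to node 1 (4 % 3).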

.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports. The IPv4
destination address of each packet is used as the key to look up in the EFD
table, which returns the node to which the packet has to be distributed.
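
At the byte level, this key extraction assumes an untagged Ethernet frame
carrying IPv4. The IPv4 header starts after the 14-byte Ethernet header, and
its destination address field sits at byte offset 16 within that header.
The sample reaches the same field through ``rte_pktmbuf_mtod_offset()`` with
``sizeof(struct rte_ether_hdr)`` followed by ``dst_addr``. The standalone
sketch below performs the extraction on a plain byte buffer and keeps the
address in network byte order, matching the keys inserted into the EFD table.
The helper and macro names are hypothetical. Unlike the sample, the sketch
checks the frame length; neither checks the EtherType.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumed layout: untagged Ethernet header followed by an IPv4 header. */
    #define SKETCH_ETH_HDR_LEN   14  /* destination MAC + source MAC + EtherType */
    #define SKETCH_IPV4_DST_OFF  16  /* destination address offset in IPv4 header */

    /* Hypothetical helper: copy the IPv4 destination address out of a raw
     * frame, keeping network byte order as the EFD keys do.
     * Returns 0 on success, -1 if the frame is too short. */
    static int
    extract_dst_key(const uint8_t *frame, size_t len, uint32_t *key)
    {
        if (len < SKETCH_ETH_HDR_LEN + SKETCH_IPV4_DST_OFF + sizeof(*key))
            return -1;
        /* memcpy avoids an unaligned 32-bit load from the byte buffer */
        memcpy(key, frame + SKETCH_ETH_HDR_LEN + SKETCH_IPV4_DST_OFF,
                sizeof(*key));
        return 0;
    }

    /* Hypothetical helper mirroring the server's range check on the value
     * returned by the EFD lookup. */
    static int
    is_valid_target(uint32_t value, uint32_t num_nodes)
    {
        return value < num_nodes;
    }

The range check mirrors the server: a looked-up value at or above
``num_nodes`` cannot be a real target. A value below it does not prove that
the flow was inserted.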

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct rte_ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct rte_ipv4_hdr *,
                    sizeof(struct rte_ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

Each received packet is first stored in a temporary per-node buffer. Each
buffer is then enqueued in the ring shared between the server and that node.
After this, a new burst of packets is received and the process repeats
indefinitely.
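
The flush step relies on the all-or-nothing semantics of
``rte_ring_enqueue_bulk()``. That function enqueues either all of the objects
or none of them, and returns the number enqueued. The following
single-threaded sketch models that contract, and the resulting ``rx`` and
``rx_drop`` accounting, with a hypothetical fixed-size ring. All names and
sizes are hypothetical. Unlike ``rte_ring``, the sketch is not safe for
concurrent producers and consumers.

.. code-block:: c

    #include <stdint.h>

    #define SKETCH_RING_SIZE 8  /* hypothetical capacity, power of two */

    /* Hypothetical ring of opaque object pointers; the consumer side,
     * which would advance tail, is not shown. */
    struct sketch_ring {
        void *slots[SKETCH_RING_SIZE];
        uint32_t head;  /* total objects ever enqueued */
        uint32_t tail;  /* total objects ever dequeued */
    };

    /* All-or-nothing bulk enqueue: returns n if every object fits,
     * otherwise returns 0 and enqueues nothing. */
    static unsigned int
    sketch_enqueue_bulk(struct sketch_ring *r, void **objs, unsigned int n)
    {
        uint32_t free_slots = SKETCH_RING_SIZE - (r->head - r->tail);

        if (n > free_slots)
            return 0;
        for (unsigned int i = 0; i < n; i++)
            r->slots[(r->head + i) & (SKETCH_RING_SIZE - 1)] = objs[i];
        r->head += n;
        return n;
    }

    /* Hypothetical per-node staging buffer and counters, loosely modelled
     * on cl_rx_buf[] and the node statistics in the sample. */
    struct sketch_node {
        void *buffer[SKETCH_RING_SIZE];
        unsigned int count;
        uint64_t rx;
        uint64_t rx_drop;
    };

    /* Either the whole burst reaches the ring (counted as rx) or the whole
     * burst is dropped (counted as rx_drop). The sample also frees the
     * dropped mbufs at this point. */
    static void
    sketch_flush(struct sketch_ring *r, struct sketch_node *n)
    {
        if (n->count == 0)
            return;
        if (sketch_enqueue_bulk(r, n->buffer, n->count) != n->count)
            n->rx_drop += n->count;
        else
            n->rx += n->count;
        n->count = 0;
    }

Dropping the whole burst rather than part of it keeps the accounting simple:
each buffered packet is counted exactly once, as either ``rx`` or ``rx_drop``.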

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count) {
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

The second process, the back-end node, receives the packets from the ring it
shares with the server and sends them out if they belong to the node.

At initialization, it attaches to the server process memory to gain access to
the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then the node creates and populates the hash table that contains the flows it
will handle.
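
``create_hash_table()`` sizes the table from the average per-node flow count,
``num_flows / num_nodes``, doubled for a 50% load target. ``populate_hash_table()``
inserts every flow ``i`` with ``i % num_nodes == node_id``. When ``num_flows`` is
not a multiple of ``num_nodes``, the first ``num_flows % num_nodes`` nodes each
receive one extra flow. The small sketch below makes this arithmetic
explicit. The helper names are hypothetical.

.. code-block:: c

    #include <stdint.h>

    /* Number of flows i in [0, num_flows) with i % num_nodes == node_id,
     * i.e. how many keys the node inserts into its hash table. */
    static uint32_t
    flows_for_node(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        return num_flows / num_nodes + (node_id < num_flows % num_nodes ? 1 : 0);
    }

    /* Entries requested for the node's table: twice the average
     * per-node flow count, targeting roughly 50% load. */
    static uint32_t
    hash_entries_for_node(uint32_t num_flows, uint32_t num_nodes)
    {
        return (num_flows / num_nodes) * 2;
    }

For example, with 1,000,000 flows and three nodes, node 0 inserts 333,334 keys
and nodes 1 and 2 insert 333,333 each. Each table is requested with 666,666
entries, so the load stays at about 50%.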

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

After initialization, packets are dequeued from the ring shared with the
server. As in the server process, the IPv4 destination address of each packet
is used as the key to look up in the hash table. On a hit, the packet is
stored in a buffer, to be eventually transmitted on one of the enabled ports.
On a miss, the packet is dropped, since its flow is not handled by this node.
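
Unlike the EFD table, the node's table stores the keys themselves, so a lookup
can report a miss. As a rough model of that pass/drop decision, the sketch
below uses a small linear-probing set of 32-bit keys in place of ``rte_hash``.
This is a swapped-in technique, not how ``rte_hash`` is implemented. It keeps
the convention of the ``positions`` array filled by
``rte_hash_lookup_bulk()``: a non-negative value is a hit and a negative value
is a miss. All names and the capacity are hypothetical.

.. code-block:: c

    #include <stdint.h>

    #define SKETCH_SET_BITS 6
    #define SKETCH_SET_SIZE (1u << SKETCH_SET_BITS)  /* 64 slots */

    /* Hypothetical key set: storing the keys is what makes misses
     * detectable, which an EFD table cannot do. */
    struct sketch_set {
        uint32_t keys[SKETCH_SET_SIZE];
        uint8_t used[SKETCH_SET_SIZE];
    };

    /* Multiplicative (Fibonacci) hashing: keep the top SKETCH_SET_BITS bits. */
    static uint32_t
    sketch_hash(uint32_t key)
    {
        return (uint32_t)(key * 2654435761u) >> (32 - SKETCH_SET_BITS);
    }

    /* Insert a key (or find it if present); returns its slot, -1 if full. */
    static int32_t
    sketch_add(struct sketch_set *s, uint32_t key)
    {
        uint32_t idx = sketch_hash(key);

        for (uint32_t n = 0; n < SKETCH_SET_SIZE; n++) {
            if (!s->used[idx]) {
                s->used[idx] = 1;
                s->keys[idx] = key;
                return (int32_t)idx;
            }
            if (s->keys[idx] == key)
                return (int32_t)idx;
            idx = (idx + 1) & (SKETCH_SET_SIZE - 1);  /* linear probing */
        }
        return -1;
    }

    /* Look up one key; returns its slot, or -1 on a miss. */
    static int32_t
    sketch_lookup(const struct sketch_set *s, uint32_t key)
    {
        uint32_t idx = sketch_hash(key);

        for (uint32_t n = 0; n < SKETCH_SET_SIZE; n++) {
            if (!s->used[idx])
                return -1;  /* an empty slot ends the probe sequence */
            if (s->keys[idx] == key)
                return (int32_t)idx;
            idx = (idx + 1) & (SKETCH_SET_SIZE - 1);
        }
        return -1;
    }

    /* Bulk filter: positions[i] >= 0 means the flow is handled by this
     * node (pass); a negative value means the packet is dropped. */
    static void
    sketch_filter(const struct sketch_set *s, const uint32_t *keys,
            unsigned int n, int32_t *positions,
            uint64_t *passed, uint64_t *dropped)
    {
        for (unsigned int i = 0; i < n; i++) {
            positions[i] = sketch_lookup(s, keys[i]);
            if (positions[i] >= 0)
                (*passed)++;
            else
                (*dropped)++;
        }
    }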

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct rte_ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct rte_ipv4_hdr *,
                    sizeof(struct rte_ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as the numbers of
transmitted, received and dropped packets. The server application displays and
refreshes these statistics.
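
Each back-end node writes only its own entry in the shared ``tx_stats`` array,
which is indexed by port. To obtain totals, the server therefore sums that
per-node, per-port matrix both ways: by column for the per-port figures and by
row for the per-node figures. The sketch below isolates that summation with
plain arrays. The limits and names are hypothetical, and ports are indexed by
position rather than through ``info->id[]`` as in the sample.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define SKETCH_MAX_NODES 4  /* hypothetical limits for the sketch */
    #define SKETCH_MAX_PORTS 2

    /* tx[n][p]: packets node n transmitted on port p. Each row is written
     * by a single node; the display only reads and sums. */
    static void
    sketch_sum_tx(const uint64_t tx[SKETCH_MAX_NODES][SKETCH_MAX_PORTS],
            unsigned int num_nodes, unsigned int num_ports,
            uint64_t port_tx[SKETCH_MAX_PORTS],
            uint64_t node_tx[SKETCH_MAX_NODES])
    {
        memset(port_tx, 0, SKETCH_MAX_PORTS * sizeof(port_tx[0]));
        memset(node_tx, 0, SKETCH_MAX_NODES * sizeof(node_tx[0]));

        for (unsigned int n = 0; n < num_nodes; n++) {
            for (unsigned int p = 0; p < num_ports; p++) {
                port_tx[p] += tx[n][p];  /* column sum: per-port total */
                node_tx[n] += tx[n][p];  /* row sum: per-node total */
            }
        }
    }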

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "          tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "          filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }