..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2016-2017 Intel Corporation.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, refer to the
DPDK Programmer's Guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is specified for each flow
(rather than in a round-robin fashion, as in the original load balancing sample application).

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows.
For each flow, a target back-end worker node is specified. The EFD table does not
store the flow key (unlike a regular hash table), and hence it can
individually load-balance millions of flows (number of targets * maximum number
of flows that fit in a flow table per target) while still fitting in CPU cache.

Note that, although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initialization, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with flow
information and the intended target node of each flow.

The sample application assigns a specific target node_id (process) to each of
the IP destination addresses as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

The <key, target> pair is then inserted into the flow distributor table.

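The key and target assignment described above can be sketched in plain C,
which may help when configuring matching flows on a traffic generator.
This is a minimal sketch rather than the application's code: the helper names
are hypothetical, and ``htonl()`` is used here as a stand-in for
``rte_cpu_to_be_32()`` so that the example builds without DPDK.

.. code-block:: c

    #include <arpa/inet.h>
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: target node for flow index i. */
    static uint64_t
    flow_target_node(uint32_t i, uint32_t num_nodes)
    {
        assert(num_nodes > 0);
        return i % num_nodes;
    }

    /* Hypothetical helper: flow key for index i, stored in network byte
     * order. htonl() stands in for rte_cpu_to_be_32(). */
    static uint32_t
    flow_key(uint32_t i)
    {
        return htonl(i);
    }

    /* Hypothetical helper: print a network-order key as a dotted quad. */
    static void
    flow_key_to_str(uint32_t key_be, char *buf, size_t len)
    {
        uint8_t b[4];

        memcpy(b, &key_be, sizeof(b));
        snprintf(buf, len, "%u.%u.%u.%u", (unsigned int)b[0],
                (unsigned int)b[1], (unsigned int)b[2], (unsigned int)b[3]);
    }

Under these assumptions, flow index 258 corresponds to the destination address
``0.0.1.2``, and with four nodes flow index 5 is assigned to node 1.
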
The main loop of the server process receives a burst of packets, then for
each packet, a flow key (IP destination address) is extracted. The flow
distributor table is looked up and the target node ID is returned. Packets are
then enqueued to the specified target node.

Note that the flow distributor table is not a membership test table.
That is, if a key has already been inserted, the returned target node ID is correct,
but for a key that was never inserted the table still returns a value (which may be a
valid node ID).

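The consequence of not storing keys can be illustrated with a deliberately
simplified model. The sketch below is not the EFD algorithm: ``toy_keyless``
is a hypothetical table that stores only values in slots selected by
``key % TOY_SLOTS``, and ``toy_exact`` is a hypothetical table that stores the
keys themselves. It only shows why a key-less lookup cannot tell an inserted
key from an unknown one, while a key-storing table can.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TOY_SLOTS 8

    /* Toy key-less table (NOT the EFD algorithm): only values are stored. */
    struct toy_keyless {
        uint8_t value[TOY_SLOTS];
    };

    /* Toy exact-match table: keys are stored, so misses can be detected. */
    struct toy_exact {
        uint32_t keys[TOY_SLOTS];
        size_t count;
    };

    static void
    toy_keyless_update(struct toy_keyless *t, uint32_t key, uint8_t node)
    {
        assert(t != NULL);
        t->value[key % TOY_SLOTS] = node;
    }

    /* Always returns a value, whether or not the key was ever inserted. */
    static uint8_t
    toy_keyless_lookup(const struct toy_keyless *t, uint32_t key)
    {
        return t->value[key % TOY_SLOTS];
    }

    static bool
    toy_exact_add(struct toy_exact *t, uint32_t key)
    {
        if (t->count >= TOY_SLOTS)
            return false;
        t->keys[t->count++] = key;
        return true;
    }

    static bool
    toy_exact_contains(const struct toy_exact *t, uint32_t key)
    {
        size_t i;

        for (i = 0; i < t->count; i++)
            if (t->keys[i] == key)
                return true;
        return false;
    }

In this model, after key 3 is inserted with node 1, a lookup of key 11 (never
inserted) also returns node 1, whereas the exact-match table reports key 11 as
absent. This is the role played by the hash table in the worker nodes.
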
Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initialization, the worker node (process) creates a flow table (a regular
hash table that stores the keys, with a default size of 1M flows), which is populated with
only the flow information that is serviced at this node. Storing the flow key is
essential for identifying new keys that have not been inserted before.

The worker node's main loop simply receives packets and then performs a hash table
lookup. If a match occurs, statistics are updated for flows serviced by
this node. If no match is found in the local hash table, this indicates
a new flow, and the packet is dropped.


Compiling the Application
-------------------------

To compile the sample application, see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The frontend server (server) has the following command line options::

    ./<build_dir>/examples/dpdk-server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK:`` Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES:`` Number of back-end nodes that will be used
* ``-f NUM_FLOWS:`` Number of flows to be added in the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID:`` Node ID, which must be lower than NUM_NODES


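The constraints on these options can be sketched as follows. This is an
illustrative sketch, not the application's own argument parser: the helper
names are hypothetical, and the 32-bit mask width is an assumption made for
the example.

.. code-block:: c

    #include <assert.h>
    #include <ctype.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a hexadecimal PORTMASK argument.
     * Returns 0 on success, -1 on a malformed or empty mask. */
    static int
    parse_portmask_sketch(const char *arg, uint32_t *mask)
    {
        char *end = NULL;
        unsigned long val;

        assert(mask != NULL);
        if (arg == NULL || !isxdigit((unsigned char)arg[0]))
            return -1;

        errno = 0;
        val = strtoul(arg, &end, 16);
        if (errno != 0 || *end != '\0' || val == 0 || val > UINT32_MAX)
            return -1;

        *mask = (uint32_t)val;
        return 0;
    }

    /* Bit N of the mask selects port N. */
    static bool
    port_enabled(uint32_t mask, unsigned int port)
    {
        return port < 32 && ((mask >> port) & 1u) != 0;
    }

    /* NODE_ID must be lower than NUM_NODES. */
    static bool
    node_id_valid(unsigned int node_id, unsigned int num_nodes)
    {
        return node_id < num_nodes;
    }

For instance, a mask of ``3`` selects ports 0 and 1, and with ``-n 2`` on the
server the only valid node IDs are 0 and 1.
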
First, the server app must be launched with the number of nodes that will be run.
Once it has started, the node instances can be run, each with a different NODE_ID.
These instances have to be run as secondary processes, with ``--proc-type=secondary``
in the EAL options. They then attach to the primary process memory and can
therefore access the queues created by the primary process to distribute packets.

To run the application successfully, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates the EFD table,
which is used to distribute packets to nodes, and populates it with the number of flows
specified in the command line (1 million, by default).


.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports, and the
destination IPv4 address of each packet is used as the key for a lookup in the EFD table,
which returns the node to which the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct rte_ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct rte_ipv4_hdr *,
                    sizeof(struct rte_ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

Each packet of the received burst is first stored in a temporary per-node buffer,
and the buffered packets are then enqueued in the ring shared between the server and the node.
After this, a new burst of packets is received and the process
repeats indefinitely.

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }

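The stage-then-flush pattern used by ``flush_rx_queue()`` relies on the bulk
enqueue being all-or-nothing: either the whole staged batch enters the ring,
or none of it does and the batch is counted as dropped. The following sketch
models that behavior without DPDK. ``toy_ring``, ``stage_buf`` and the
helper functions are hypothetical stand-ins, and the capacities are arbitrary
values chosen for the example.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define STAGE_MAX 4 /* assumed staging buffer size */
    #define RING_CAP  6 /* assumed ring capacity */

    struct toy_ring { void *slots[RING_CAP]; size_t used; };
    struct stage_buf { void *pkts[STAGE_MAX]; size_t count; };
    struct stage_stats { unsigned long rx, rx_drop; };

    /* All-or-nothing enqueue: returns n on success, 0 if n does not fit. */
    static size_t
    toy_ring_enqueue_bulk(struct toy_ring *r, void *const *objs, size_t n)
    {
        if (n > RING_CAP - r->used)
            return 0;
        memcpy(&r->slots[r->used], objs, n * sizeof(*objs));
        r->used += n;
        return n;
    }

    /* Stage one packet for a node; fails if the staging buffer is full. */
    static bool
    stage_add(struct stage_buf *b, void *pkt)
    {
        if (b->count >= STAGE_MAX)
            return false;
        b->pkts[b->count++] = pkt;
        return true;
    }

    /* Flush the staged batch; the real application also frees the mbufs
     * of a batch that could not be enqueued. */
    static void
    stage_flush(struct stage_buf *b, struct toy_ring *r, struct stage_stats *s)
    {
        assert(b->count <= STAGE_MAX);
        if (b->count == 0)
            return;
        if (toy_ring_enqueue_bulk(r, b->pkts, b->count) != b->count)
            s->rx_drop += b->count;
        else
            s->rx += b->count;
        b->count = 0;
    }

With these example sizes, a first batch of four packets is enqueued, while a
second batch of four is dropped as a whole because only two slots remain.
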
The second process, the back-end node, receives the packets from the ring
shared with the server and sends them out if they belong to the node.

At initialization, it attaches to the server process memory to gain
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }

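The sizing arithmetic in ``create_hash_table()`` and the ownership test in
``populate_hash_table()`` can be checked with a small stand-alone sketch.
The helper names below are hypothetical. The first function is a closed form
of the loop condition ``i % num_nodes == node_id``, and the second mirrors
the ``entries`` expression, including its integer division.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Number of indices i in [0, num_flows) with i % num_nodes == node_id. */
    static uint32_t
    flows_owned_by(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        assert(num_nodes > 0 && node_id < num_nodes);
        return num_flows / num_nodes +
                (node_id < num_flows % num_nodes ? 1 : 0);
    }

    /* Entries requested for the hash table: twice the per-node share,
     * where the share is rounded down by the integer division. */
    static uint32_t
    hash_entries_for(uint32_t num_flows, uint32_t num_nodes)
    {
        assert(num_nodes > 0);
        return (num_flows / num_nodes) * 2;
    }

For example, with 10 flows spread over 3 nodes, node 0 owns 4 flows and
nodes 1 and 2 own 3 each, while each node requests 6 table entries. When the
number of flows is not a multiple of the number of nodes, some nodes therefore
run slightly above the nominal 50% load.
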
After initialization, packets are dequeued from the shared ring
(from the server) and, as in the server process,
the destination IPv4 address of each packet is used as the key for a lookup in the hash table.
If there is a hit, the packet is stored in a buffer, to be eventually transmitted
on one of the enabled ports. If the key is not found, the packet is dropped, since the
flow is not handled by the node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct rte_ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct rte_ipv4_hdr *,
                    sizeof(struct rte_ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belongs to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted, received
and dropped packets, which are shown and refreshed by the server app.

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }
