..  BSD LICENSE
    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, please refer to the
DPDK programmer's guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
in which a specific target node is assigned to each flow,
instead of distributing packets in a round-robin fashion as the original
load-balancing sample application does.

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows,
in which a target back-end worker node is specified for each flow.
Unlike a regular hash table, the EFD table does not store the flow key, and
hence it can individually load-balance millions of flows (the number of targets
multiplied by the maximum number of flows that fit in a flow table per target)
while still fitting in CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initializing, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with flow
information and the intended target node of each flow.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

The <key,target> pair is then inserted into the flow distributor table.
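
The key is the loop index converted to big-endian (network) byte order.
Flow ``i`` therefore corresponds to the IPv4 destination address whose 32-bit
value is ``i``. For example, ``i = 0x0A000001`` is the address 10.0.0.1.

The portable sketch below reproduces that mapping without DPDK. It makes these
assumptions:

* ``flow_assign()`` and ``store_be32()`` are hypothetical helpers written for
  illustration.
* ``store_be32()`` writes the bytes explicitly, standing in for
  ``rte_cpu_to_be_32()``.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helper: write a 32-bit value in big-endian (network) byte
     * order, i.e. the in-memory layout produced by rte_cpu_to_be_32(). */
    static void
    store_be32(uint8_t out[4], uint32_t v)
    {
        out[0] = (uint8_t)(v >> 24);
        out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);
        out[3] = (uint8_t)v;
    }

    /* Hypothetical helper: compute the key bytes and the target node of flow i,
     * using the same round-robin rule as the sample (i % num_nodes).
     * num_nodes must be non-zero. */
    static uint32_t
    flow_assign(uint32_t i, uint32_t num_nodes, uint8_t key[4])
    {
        store_be32(key, i);
        return i % num_nodes;
    }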

The main loop of the server process receives a burst of packets and, for
each packet, extracts a flow key (the IP destination address). The flow
distributor table is looked up and the target node id is returned. Packets are
then enqueued to the specified target node.

It should be noted that the flow distributor table is not a membership test
table. If a key has already been inserted, the returned target node id is
correct. For a key that has never been inserted, however, the table still
returns a value, which may happen to be a valid node id.

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initializing, the worker node (process) creates a flow table. This is a
regular hash table that stores the keys (default size 1M flows), and it is
populated with only the information of the flows serviced by this node.
Storing the flow key is essential to identify new keys that have not been
inserted before.

The worker node's main loop simply receives packets and then performs a hash
table lookup. If a match occurs, the statistics for flows serviced by this node
are updated. If no match is found in the local hash table, the packet belongs
to a new flow and is dropped.
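
The two behaviors described above form a two-stage filter:

* The server drops a packet only when the EFD lookup returns a value outside
  the range of node ids.
* A forwarded packet of a never-inserted flow is then rejected by the worker's
  own hash table.

The sketch below models only that decision logic. The enum and function names
are hypothetical, and the worker's hash table lookup is reduced to a boolean
argument.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical outcome of the two filtering stages. */
    enum flow_verdict {
        DROPPED_BY_SERVER,  /* EFD value is not a valid node id */
        DROPPED_BY_NODE,    /* forwarded, but key absent from the node's hash table */
        DELIVERED           /* forwarded and found in the node's hash table */
    };

    /* efd_value:     value returned by the EFD lookup for the packet's key.
     * in_node_table: result of the target node's hash table lookup
     *                (only meaningful when the packet is forwarded). */
    static enum flow_verdict
    classify_packet(uint32_t efd_value, uint32_t num_nodes, bool in_node_table)
    {
        if (efd_value >= num_nodes)
            return DROPPED_BY_SERVER;
        return in_node_table ? DELIVERED : DROPPED_BY_NODE;
    }

An unknown flow whose EFD value happens to be a valid node id therefore costs
one ring transfer before it is discarded by the worker.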


Compiling the Application
-------------------------

To compile the sample application, see :doc:`compiling`.

The application is located in the ``server_node_efd`` sub-directory.

Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES``: Number of back-end nodes that will be used
* ``-f NUM_FLOWS``: Number of flows to be added in the EFD table (1 million, by default)
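
``PORTMASK`` follows the convention used by DPDK sample applications: bit *n*
set means that port *n* is enabled. For example, ``-p 0x3`` enables ports 0
and 1.

The sketch below shows one way to parse and expand such a mask. It makes these
assumptions:

* ``parse_portmask()`` and ``ports_in_mask()`` are hypothetical helpers, not
  the sample's own parser.
* An empty, malformed or zero mask is treated as an error.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a hexadecimal port mask such as "0x3" or "5".
     * Returns the mask, or 0 if the string is empty or malformed (a zero
     * mask is also treated as invalid, since it enables no ports). */
    static uint64_t
    parse_portmask(const char *arg)
    {
        char *end = NULL;
        unsigned long long pm;

        if (arg == NULL || arg[0] == '\0')
            return 0;
        errno = 0;
        pm = strtoull(arg, &end, 16); /* base 16 also accepts a "0x" prefix */
        if (errno != 0 || end == NULL || *end != '\0')
            return 0;
        return (uint64_t)pm;
    }

    /* Hypothetical helper: store the ids of the enabled ports in ports[]
     * and return how many ports are enabled. */
    static unsigned int
    ports_in_mask(uint64_t mask, unsigned int ports[64])
    {
        unsigned int n = 0, p;

        for (p = 0; p < 64; p++)
            if (mask & (UINT64_C(1) << p))
                ports[n++] = p;
        return n;
    }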

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID``: Node ID, which must be lower than NUM_NODES (valid IDs are 0 to NUM_NODES - 1)
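
Flows are assigned with ``i % num_nodes``, so a node started with an id of
NUM_NODES or higher would never receive traffic. A minimal validation sketch
follows. It makes these assumptions:

* ``parse_node_id()`` is a hypothetical helper.
* ``num_nodes`` is passed in as a parameter. The node code shown later instead
  reads the node count from the shared ``info`` structure
  (``info->num_nodes``).

.. code-block:: c

    #include <errno.h>
    #include <stdlib.h>

    /* Hypothetical helper: parse a decimal NODE_ID and check it against the
     * number of nodes configured on the server. Returns 0 on success, or -1
     * for a malformed value or an id that is not lower than num_nodes. */
    static int
    parse_node_id(const char *arg, unsigned int num_nodes, unsigned int *node_id)
    {
        char *end = NULL;
        unsigned long v;

        if (arg == NULL || arg[0] == '\0' || arg[0] == '-')
            return -1;
        errno = 0;
        v = strtoul(arg, &end, 10);
        if (errno != 0 || *end != '\0' || v >= num_nodes)
            return -1;
        *node_id = (unsigned int)v;
        return 0;
    }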


First, the server application must be launched with the number of nodes that
will be run. Once it has started, the node instances can be run, each with a
different NODE_ID. These instances have to be run as secondary processes, with
``--proc-type=secondary`` in the EAL options. This makes them attach to the
memory of the primary process, so they can access the queues it created to
distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side. In particular, the destination IPv4 addresses of the generated
traffic must be among the NUM_FLOWS addresses inserted into the tables, that
is, addresses whose 32-bit value is lower than NUM_FLOWS. Packets sent to any
other address are dropped.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to nodes. The table is populated with the
number of flows specified on the command line (1 million, by default).


.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }
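
Node ids are assigned with ``i % num_nodes``, so every node receives either
``num_flows / num_nodes`` flows or one more. When the division is not exact,
the extra flows go to the lowest node ids.

The short sketch below computes that count directly, which can help when
checking the per-node statistics. ``flows_for_node()`` is a hypothetical
helper, not part of the sample.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helper: number of flows i in [0, num_flows) for which
     * i % num_nodes == node_id, i.e. the flows assigned to node_id.
     * num_nodes must be non-zero. */
    static uint32_t
    flows_for_node(uint32_t num_flows, uint32_t num_nodes, uint32_t node_id)
    {
        uint32_t base = num_flows / num_nodes;

        /* The first (num_flows % num_nodes) node ids get one extra flow. */
        return base + (node_id < num_flows % num_nodes ? 1 : 0);
    }

For example, with 10 flows and 3 nodes, node 0 handles flows 0, 3, 6 and 9,
while nodes 1 and 2 handle three flows each.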

After initialization, packets are received from the enabled ports. The
destination IPv4 address of each packet is used as a key to look up in the
EFD table, which returns the node to which the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }
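
``process_packets()`` reads the destination address at a fixed offset. It
assumes an untagged Ethernet frame carrying IPv4:

* The Ethernet header is 14 bytes long.
* The destination address sits 16 bytes into the IPv4 header, which places it
  at byte 30 of the frame.

The portable sketch below performs the same extraction on a raw buffer and
keeps the address in network byte order, as the sample does. It makes these
assumptions:

* ``extract_ipv4_dst()`` is a hypothetical helper.
* Its EtherType and length checks are additions that the sample code does not
  perform.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_HDR_LEN     14      /* dst MAC + src MAC + EtherType */
    #define ETH_TYPE_OFF    12      /* offset of the EtherType field */
    #define IPV4_DST_OFF    16      /* offset of dst_addr within the IPv4 header */
    #define ETHERTYPE_IPV4  0x0800

    /* Hypothetical helper: copy the IPv4 destination address of an untagged
     * Ethernet frame into *dst_be, leaving it in network byte order so it can
     * be used directly as a lookup key. Returns 0 on success, -1 otherwise. */
    static int
    extract_ipv4_dst(const uint8_t *frame, size_t len, uint32_t *dst_be)
    {
        uint16_t ethertype;

        if (len < ETH_HDR_LEN + IPV4_DST_OFF + sizeof(*dst_be))
            return -1;
        ethertype = (uint16_t)((frame[ETH_TYPE_OFF] << 8) | frame[ETH_TYPE_OFF + 1]);
        if (ethertype != ETHERTYPE_IPV4)
            return -1;
        /* memcpy avoids an unaligned 32-bit load from the packet buffer. */
        memcpy(dst_be, frame + ETH_HDR_LEN + IPV4_DST_OFF, sizeof(*dst_be));
        return 0;
    }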

The packets of the received burst are first stored in temporary per-node
buffers and then enqueued in the ring shared between the server and each node.
If the ring does not have room for the whole buffer, ``rte_ring_enqueue_bulk()``
enqueues nothing. In that case all the buffered packets are freed and counted
as dropped. After this, a new burst of packets is received and the process
repeats indefinitely.

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count, NULL) != cl_rx_buf[node].count){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }
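
The flush logic relies on the all-or-nothing behavior of
``rte_ring_enqueue_bulk()``. The sketch below models it without DPDK and makes
these assumptions:

* A hypothetical fixed-capacity queue (``toy_ring``) stands in for
  ``struct rte_ring``.
* Plain counters stand in for the node statistics.
* Packets are represented by integers, and freeing a packet is only counted.

.. code-block:: c

    #include <stdint.h>

    #define RING_CAP  4     /* hypothetical ring capacity */
    #define BUF_CAP   32    /* hypothetical per-node buffer size */

    /* Hypothetical stand-ins for the ring, the per-node buffer and the stats. */
    struct toy_ring  { unsigned int used; int slots[RING_CAP]; };
    struct toy_buf   { unsigned int count; int pkts[BUF_CAP]; };
    struct toy_stats { uint64_t rx, rx_drop; };

    /* All-or-nothing bulk enqueue: returns n on success, 0 if n do not fit. */
    static unsigned int
    toy_enqueue_bulk(struct toy_ring *r, const int *objs, unsigned int n)
    {
        unsigned int i;

        if (n > RING_CAP - r->used)
            return 0;
        for (i = 0; i < n; i++)
            r->slots[r->used++] = objs[i];
        return n;
    }

    /* Mirrors flush_rx_queue(): enqueue the whole buffer or drop all of it. */
    static void
    toy_flush(struct toy_ring *r, struct toy_buf *b, struct toy_stats *st)
    {
        if (b->count == 0)
            return;
        if (toy_enqueue_bulk(r, b->pkts, b->count) != b->count)
            st->rx_drop += b->count;    /* the sample frees each mbuf here */
        else
            st->rx += b->count;
        b->count = 0;
    }

With a capacity of 4, a first flush of 3 packets succeeds. A second flush of
3 packets is dropped entirely, even though one slot is still free.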

The second process, the back-end node, receives the packets from the ring
shared with the server and sends them out if they belong to flows handled by
the node.

At initialization, it attaches to the server process memory, to have
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);

Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }
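
``create_hash_table()`` sizes each node's table for twice the average number of
flows per node, which targets a 50% load. Because of the integer division,
nodes with low ids can hold one flow more than ``num_flows / num_nodes``. The
2x margin easily absorbs that extra flow.

The arithmetic is sketched below. ``node_table_entries()`` and
``node_table_load_pct()`` are hypothetical helpers that reproduce the sizing
rule. The load is computed for the busiest node.

.. code-block:: c

    #include <stdint.h>

    /* Same rule as create_hash_table(): (num_flows / num_nodes) * 2 entries.
     * num_nodes must be non-zero. */
    static uint32_t
    node_table_entries(uint32_t num_flows, uint32_t num_nodes)
    {
        return (num_flows / num_nodes) * 2;
    }

    /* Load, in percent rounded down, of the busiest node's table, whose flow
     * count is num_flows / num_nodes rounded up. */
    static uint32_t
    node_table_load_pct(uint32_t num_flows, uint32_t num_nodes)
    {
        uint32_t entries = node_table_entries(num_flows, num_nodes);
        uint32_t max_flows = num_flows / num_nodes + (num_flows % num_nodes != 0);

        if (entries == 0)
            return UINT32_MAX;  /* table too small to hold any flow */
        return (uint32_t)((uint64_t)max_flows * 100 / entries);
    }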

After initialization, packets are dequeued from the ring shared with the
server. As in the server process, the destination IPv4 address of each packet
is used as a key to look up in the hash table. If there is a hit, the packet is
stored in a buffer, to be eventually transmitted on one of the enabled ports.
If the key is not found, the packet is dropped, since the flow is not handled
by the node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted,
received and dropped packets. These statistics are shown and refreshed by the
server application.

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }
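
Transmit statistics are kept per node and per port, so ``do_stats_display()``
sums the same counters twice:

* across nodes, to obtain per-port totals;
* across ports, to obtain per-node totals.

The sketch below isolates that summation on plain arrays. It makes these
assumptions:

* The array layout and the ``sum_tx_stats()`` name are hypothetical
  simplifications of the shared ``tx_stats`` structure.
* The port index is used directly instead of going through ``info->id[]``.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define MAX_NODES_SKETCH 4  /* hypothetical limits for the sketch */
    #define MAX_PORTS_SKETCH 2

    /* tx[n][p]: packets transmitted by node n on port p.
     * port_tx receives the column sums, node_tx the row sums. */
    static void
    sum_tx_stats(uint64_t tx[MAX_NODES_SKETCH][MAX_PORTS_SKETCH],
                 unsigned int num_nodes, unsigned int num_ports,
                 uint64_t port_tx[MAX_PORTS_SKETCH],
                 uint64_t node_tx[MAX_NODES_SKETCH])
    {
        unsigned int i, j;

        /* Array parameters decay to pointers, so size the memset explicitly. */
        memset(port_tx, 0, sizeof(uint64_t) * MAX_PORTS_SKETCH);
        memset(node_tx, 0, sizeof(uint64_t) * MAX_NODES_SKETCH);
        for (i = 0; i < num_nodes; i++)
            for (j = 0; j < num_ports; j++) {
                port_tx[j] += tx[i][j];     /* per-port total */
                node_tx[i] += tx[i][j];     /* per-node total */
            }
    }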
478