..  BSD LICENSE
    Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Server-Node EFD Sample Application
==================================

This sample application demonstrates the use of the EFD library as a flow-level
load balancer. For more information about the EFD library, please refer to the
DPDK Programmer's Guide.

This sample application is a variant of the
:ref:`client-server sample application <multi_process_app>`
where a specific target node is assigned to each flow
(rather than in a round-robin fashion, as in the original load balancing sample application).

Overview
--------

The architecture of the EFD flow-based load balancer sample application is
presented in the following figure.

.. _figure_efd_sample_app_overview:

.. figure:: img/server_node_efd.*

   Using EFD as a Flow-Level Load Balancer

As shown in :numref:`figure_efd_sample_app_overview`,
the sample application consists of a front-end node (server)
that uses the EFD library to create a load-balancing table for flows,
in which a target back-end worker node is specified for each flow.
The EFD table does not store the flow key (unlike a regular hash table),
and hence it can individually load-balance millions of flows
(number of targets * maximum number of flows that fit in a flow table per target)
while still fitting in CPU cache.

It should be noted that although they are referred to as nodes, the front-end
server and worker nodes are processes running on the same platform.

Front-end Server
~~~~~~~~~~~~~~~~

Upon initializing, the front-end server node (process) creates a flow
distributor table (based on the EFD library), which is populated with flow
information and its intended target node.

The sample application assigns a specific target node_id (process) to each
IP destination address as follows:

.. code-block:: c

    node_id = i % num_nodes; /* Target node id is generated */
    ip_dst = rte_cpu_to_be_32(i); /* Specific ip destination address is
                                     assigned to this target node */

Then the <key,target> pair is inserted into the flow distributor table.
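
The assignment above can be sketched without DPDK as follows. This is an
illustrative sketch, not code from the sample: ``to_be32()`` is a hypothetical
stand-in for ``rte_cpu_to_be_32()``, and ``struct flow_rule`` and ``make_rule()``
are hypothetical names used only here.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical stand-in for rte_cpu_to_be_32(): lays out v with its most
     * significant byte first in memory (network byte order) on any host. */
    static uint32_t to_be32(uint32_t v)
    {
        const uint8_t b[4] = {
            (uint8_t)(v >> 24), (uint8_t)(v >> 16),
            (uint8_t)(v >> 8), (uint8_t)v
        };
        uint32_t out;

        memcpy(&out, b, sizeof(out));
        return out;
    }

    /* Hypothetical <key,target> pair for flow number i. */
    struct flow_rule {
        uint32_t ip_dst;   /* key: IPv4 destination, network byte order */
        uint32_t node_id;  /* target back-end node */
    };

    static struct flow_rule make_rule(uint32_t i, uint32_t num_nodes)
    {
        struct flow_rule r = {
            .ip_dst = to_be32(i),
            .node_id = i % num_nodes,  /* flows are spread evenly over nodes */
        };

        return r;
    }

Because the key is kept in network byte order, it compares equal to the raw
``dst_addr`` field of a received IPv4 header, which the lookup code uses without
any conversion.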

The main loop of the server process receives a burst of packets; then, for
each packet, a flow key (the IP destination address) is extracted. The flow
distributor table is looked up and the target node id is returned. Packets are
then enqueued to the specified target node.
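
Extracting the flow key amounts to reading the 4-byte destination address,
which sits 16 bytes into the IPv4 header, right after a 14-byte Ethernet header.
The following is a minimal sketch over a raw frame buffer, assuming an untagged
Ethernet II frame carrying IPv4; ``extract_flow_key()`` is a hypothetical helper,
and its length check is an addition of this sketch (the sample reads the header
through ``rte_pktmbuf_mtod_offset()`` instead).

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ETH_HDR_LEN     14  /* untagged Ethernet II header (assumption) */
    #define IPV4_DST_OFFSET 16  /* destination address offset in the IPv4 header */

    /* Hypothetical helper: copies the IPv4 destination address out of a raw
     * frame, keeping network byte order so it can be used directly as a key.
     * Returns 0 on success, -1 if the frame is too short. */
    static int extract_flow_key(const uint8_t *frame, size_t len, uint32_t *key)
    {
        const size_t off = ETH_HDR_LEN + IPV4_DST_OFFSET;

        if (len < off + sizeof(*key))
            return -1;
        memcpy(key, frame + off, sizeof(*key));  /* no byte swapping */
        return 0;
    }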

It should be noted that the flow distributor table is not a membership test table:
if the key has already been inserted, the returned target node id will be correct,
but for new keys the flow distributor table will still return a value (which may
happen to be a valid node id).
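
The following toy table shows why a table that keeps values but no keys cannot
report a miss. It is deliberately *not* the EFD algorithm; it simply indexes a
value array by the low bits of the key, and ``struct toy_keyless_table``,
``toy_insert()`` and ``toy_lookup()`` are hypothetical names.

.. code-block:: c

    #include <stdint.h>

    #define TOY_SLOTS 8u  /* power of two */

    /* Only values are stored, never the keys themselves. */
    struct toy_keyless_table {
        uint8_t value[TOY_SLOTS];
    };

    static void toy_insert(struct toy_keyless_table *t, uint32_t key, uint8_t v)
    {
        t->value[key & (TOY_SLOTS - 1)] = v;
    }

    /* There is no "not found" result: every key maps to some stored value. */
    static uint8_t toy_lookup(const struct toy_keyless_table *t, uint32_t key)
    {
        return t->value[key & (TOY_SLOTS - 1)];
    }

If keys 0 to 7 are inserted with target ``key % 3``, looking up key 12, which was
never inserted, returns the value stored for key 4 (node 1). The server can only
discard values that are out of range (``node >= num_nodes``); any other wrong
answer has to be caught by the back-end node.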

Backend Worker Nodes
~~~~~~~~~~~~~~~~~~~~

Upon initializing, the worker node (process) creates a flow table (a regular
hash table that stores the keys; default size 1M flows), which is populated with
only the flow information that is serviced at this node. Storing the flow key is
essential to detect new keys that have not been inserted before.

The worker node's main loop simply receives packets and then performs a hash table
lookup. If a match occurs, statistics are updated for flows serviced by
this node. If no match is found in the local hash table, this indicates
that this is a new flow, which is dropped.
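
A worker-side flow table only needs to answer "is this key one of mine?". The
following is a minimal sketch of that check using an open-addressing hash set
with linear probing. It is a different technique from ``rte_hash``, which uses
its own cuckoo-based scheme, and ``struct flow_set``, ``flow_set_add()``,
``flow_set_contains()`` and ``populate_node_flows()`` are hypothetical names.
Keys are kept in host byte order here for brevity, whereas the sample stores
``rte_cpu_to_be_32(i)``.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define SET_BITS  6
    #define SET_SLOTS (1u << SET_BITS)  /* 64 slots; keep load well below 100% */

    struct flow_set {
        uint32_t key[SET_SLOTS];
        bool used[SET_SLOTS];
    };

    /* Multiplicative hashing: take the top SET_BITS bits of the product. */
    static uint32_t slot_of(uint32_t key)
    {
        return (uint32_t)(key * 2654435761u) >> (32 - SET_BITS);
    }

    static bool flow_set_add(struct flow_set *s, uint32_t key)
    {
        uint32_t i = slot_of(key);

        for (uint32_t n = 0; n < SET_SLOTS; n++, i = (i + 1) & (SET_SLOTS - 1)) {
            if (!s->used[i]) {
                s->used[i] = true;
                s->key[i] = key;
                return true;
            }
            if (s->key[i] == key)
                return true;  /* already present */
        }
        return false;  /* table full */
    }

    /* Unlike the keyless server table, a miss here is reliable. */
    static bool flow_set_contains(const struct flow_set *s, uint32_t key)
    {
        uint32_t i = slot_of(key);

        for (uint32_t n = 0; n < SET_SLOTS; n++, i = (i + 1) & (SET_SLOTS - 1)) {
            if (!s->used[i])
                return false;  /* an empty slot ends the probe sequence */
            if (s->key[i] == key)
                return true;
        }
        return false;
    }

    /* Insert only the flows served by this node: i % num_nodes == node_id. */
    static void populate_node_flows(struct flow_set *s, uint32_t num_flows,
                                    uint32_t num_nodes, uint32_t node_id)
    {
        for (uint32_t i = 0; i < num_flows; i++)
            if (i % num_nodes == node_id)
                flow_set_add(s, i);
    }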


Compiling the Application
-------------------------

The sequence of steps used to build the application is:

#.  Export the required environment variables:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        export RTE_TARGET=x86_64-native-linuxapp-gcc

#.  Build the application executable file:

    .. code-block:: console

        cd ${RTE_SDK}/examples/server_node_efd/
        make

    For more details on how to build the DPDK libraries and sample
    applications, please refer to the *DPDK Getting Started Guide*.


Running the Application
-----------------------

The application has two binaries to be run: the front-end server
and the back-end node.

The front-end server (server) has the following command line options::

    ./server [EAL options] -- -p PORTMASK -n NUM_NODES -f NUM_FLOWS

Where,

* ``-p PORTMASK``: Hexadecimal bitmask of ports to configure
* ``-n NUM_NODES``: Number of back-end nodes that will be used
* ``-f NUM_FLOWS``: Number of flows to be added in the EFD table (1 million, by default)

The back-end node (node) has the following command line options::

    ./node [EAL options] -- -n NODE_ID

Where,

* ``-n NODE_ID``: Node ID, which must be lower than NUM_NODES
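
The constraints above can be expressed as two small checks. This is a minimal
sketch, not the sample's own argument parser: ``parse_portmask()`` and
``node_id_is_valid()`` are hypothetical helpers, and treating a zero mask as an
error is an assumption of the sketch.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Parses a hexadecimal port mask such as "3" or "0x3" (strtoull() accepts
     * the optional 0x prefix in base 16). Returns 0 on error, since a mask
     * with no ports enabled is not usable either. */
    static uint64_t parse_portmask(const char *s)
    {
        char *end = NULL;
        unsigned long long v;

        errno = 0;
        v = strtoull(s, &end, 16);
        if (errno != 0 || end == s || *end != '\0')
            return 0;
        return (uint64_t)v;
    }

    /* A back-end node ID is valid only if it is lower than NUM_NODES. */
    static int node_id_is_valid(unsigned int node_id, unsigned int num_nodes)
    {
        return node_id < num_nodes;
    }

For example, ``-n 2`` on the server allows node instances started with
``-n 0`` and ``-n 1`` only.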


First, the server application must be launched with the number of nodes that will be run.
Once it has started, the node instances can be run, each with a different NODE_ID.
These instances have to be run as secondary processes, with ``--proc-type=secondary``
in the EAL options, so that they attach to the primary process memory and can therefore
access the queues created by the primary process to distribute packets.

To successfully run the application, the command line used to start the
application has to be in sync with the traffic flows configured on the traffic
generator side.

For examples of application command lines and traffic generator flows, please
refer to the DPDK Test Report. For more details on how to set up and run the
sample applications provided with the DPDK package, please refer to the
:ref:`DPDK Getting Started Guide for Linux <linux_gsg>` and
:ref:`DPDK Getting Started Guide for FreeBSD <freebsd_gsg>`.


Explanation
-----------

As described in previous sections, there are two processes in this example.

The first process, the front-end server, creates and populates the EFD table,
which is used to distribute packets to nodes, with the number of flows
specified on the command line (1 million, by default).


.. code-block:: c

    static void
    create_efd_table(void)
    {
        uint8_t socket_id = rte_socket_id();

        /* create table */
        efd_table = rte_efd_create("flow table", num_flows * 2, sizeof(uint32_t),
                        1 << socket_id, socket_id);

        if (efd_table == NULL)
            rte_exit(EXIT_FAILURE, "Problem creating the flow table\n");
    }

    static void
    populate_efd_table(void)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint8_t socket_id = rte_socket_id();
        uint64_t node_id;

        /* Add flows in table */
        for (i = 0; i < num_flows; i++) {
            node_id = i % num_nodes;

            /* Keys are stored in network byte order, as read from packets */
            ip_dst = rte_cpu_to_be_32(i);
            ret = rte_efd_update(efd_table, socket_id,
                            (void *)&ip_dst, (efd_value_t)node_id);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u in "
                                    "EFD table\n", i);
        }

        printf("EFD table: Adding 0x%x keys\n", num_flows);
    }

After initialization, packets are received from the enabled ports, and the IPv4
destination address of each packet is used as a key to look up in the EFD table,
which tells the server to which node the packet has to be distributed.

.. code-block:: c

    static void
    process_packets(uint32_t port_num __rte_unused, struct rte_mbuf *pkts[],
            uint16_t rx_count, unsigned int socket_id)
    {
        uint16_t i;
        uint8_t node;
        efd_value_t data[EFD_BURST_MAX];
        const void *key_ptrs[EFD_BURST_MAX];

        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[EFD_BURST_MAX];

        for (i = 0; i < rx_count; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(pkts[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = (void *)&ipv4_dst_ip[i];
        }

        rte_efd_lookup_bulk(efd_table, socket_id, rx_count,
                    (const void **) key_ptrs, data);
        for (i = 0; i < rx_count; i++) {
            node = (uint8_t) ((uintptr_t)data[i]);

            if (node >= num_nodes) {
                /*
                 * Node is out of range, which means that
                 * flow has not been inserted
                 */
                flow_dist_stats.drop++;
                rte_pktmbuf_free(pkts[i]);
            } else {
                flow_dist_stats.distributed++;
                enqueue_rx_packet(node, pkts[i]);
            }
        }

        for (i = 0; i < num_nodes; i++)
            flush_rx_queue(i);
    }

The received packets are first enqueued in temporary buffers (one per node),
which are then flushed into the ring shared between the server and each node.
After this, a new burst of packets is received and the process is
repeated indefinitely.

.. code-block:: c

    static void
    flush_rx_queue(uint16_t node)
    {
        uint16_t j;
        struct node *cl;

        if (cl_rx_buf[node].count == 0)
            return;

        cl = &nodes[node];
        if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
                cl_rx_buf[node].count) != 0){
            for (j = 0; j < cl_rx_buf[node].count; j++)
                rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
            cl->stats.rx_drop += cl_rx_buf[node].count;
        } else
            cl->stats.rx += cl_rx_buf[node].count;

        cl_rx_buf[node].count = 0;
    }
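
The key property of the bulk enqueue used above is that it is all-or-nothing:
either the whole per-node buffer is placed on the ring, or nothing is, and the
whole burst is then counted as dropped. The toy model below illustrates that
accounting; ``struct toy_ring``, ``toy_enqueue_bulk()``, ``struct toy_node_stats``
and ``toy_flush()`` are hypothetical names, and the ring is simplified (no
dequeue, no wrap-around).

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    #define RING_CAP 4u

    /* Toy bounded queue, not rte_ring: it only grows, for illustration. */
    struct toy_ring {
        void *obj[RING_CAP];
        size_t count;
    };

    /* All-or-nothing: enqueue all n objects, or none if they do not fit. */
    static bool toy_enqueue_bulk(struct toy_ring *r, void *const *objs, size_t n)
    {
        if (RING_CAP - r->count < n)
            return false;
        for (size_t i = 0; i < n; i++)
            r->obj[r->count + i] = objs[i];
        r->count += n;
        return true;
    }

    struct toy_node_stats {
        size_t rx;
        size_t rx_drop;
    };

    /* Mirrors the accounting in flush_rx_queue(); the sample also frees
     * every mbuf of a rejected burst. */
    static void toy_flush(struct toy_ring *r, void *const *buf, size_t count,
                          struct toy_node_stats *st)
    {
        if (count == 0)
            return;
        if (toy_enqueue_bulk(r, buf, count))
            st->rx += count;
        else
            st->rx_drop += count;
    }

With a capacity of 4, a first burst of 3 packets is accepted, while a second
burst of 3 is rejected as a whole (only 1 slot is left), so ``rx_drop`` grows by 3.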

The second process, the back-end node, receives the packets from the ring it
shares with the server and sends them out if they belong to a flow handled by the node.

At initialization, it attaches to the server process memory to have
access to the shared ring, parameters and statistics.

.. code-block:: c

    rx_ring = rte_ring_lookup(get_rx_queue_name(node_id));
    if (rx_ring == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get RX ring - "
                "is server process running?\n");

    mp = rte_mempool_lookup(PKTMBUF_POOL_NAME);
    if (mp == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get mempool for mbufs\n");

    mz = rte_memzone_lookup(MZ_SHARED_INFO);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "Cannot get port info structure\n");
    info = mz->addr;
    tx_stats = &(info->tx_stats[node_id]);
    filter_stats = &(info->filter_stats[node_id]);
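
The node finds the shared objects purely by name, so the server and the node
must derive identical names. A minimal sketch of that idea follows; the format
string, ``toy_rx_queue_name()`` and ``names_match()`` are hypothetical, and in
the sample this role is played by ``get_rx_queue_name()``.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical ring name format. */
    #define TOY_RXQ_NAME_FMT "node_%u_rx"

    /* Builds the RX ring name for a node into buf and returns buf. */
    static const char *toy_rx_queue_name(unsigned int node_id, char *buf, size_t len)
    {
        snprintf(buf, len, TOY_RXQ_NAME_FMT, node_id);
        return buf;
    }

    /* The server creates the ring under one name and the node looks it up
     * under another; the lookup only succeeds if both names are equal. */
    static int names_match(unsigned int server_side_id, unsigned int node_side_id)
    {
        char a[32], b[32];

        return strcmp(toy_rx_queue_name(server_side_id, a, sizeof(a)),
                      toy_rx_queue_name(node_side_id, b, sizeof(b))) == 0;
    }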

Then, the hash table that contains the flows that will be handled
by the node is created and populated.

.. code-block:: c

    static struct rte_hash *
    create_hash_table(const struct shared_info *info)
    {
        uint32_t num_flows_node = info->num_flows / info->num_nodes;
        char name[RTE_HASH_NAMESIZE];
        struct rte_hash *h;

        /* create table */
        struct rte_hash_parameters hash_params = {
            .entries = num_flows_node * 2, /* table load = 50% */
            .key_len = sizeof(uint32_t), /* Store IPv4 dest IP address */
            .socket_id = rte_socket_id(),
            .hash_func_init_val = 0,
        };

        snprintf(name, sizeof(name), "hash_table_%d", node_id);
        hash_params.name = name;
        h = rte_hash_create(&hash_params);

        if (h == NULL)
            rte_exit(EXIT_FAILURE,
                    "Problem creating the hash table for node %d\n",
                    node_id);
        return h;
    }

    static void
    populate_hash_table(const struct rte_hash *h, const struct shared_info *info)
    {
        unsigned int i;
        int32_t ret;
        uint32_t ip_dst;
        uint32_t num_flows_node = 0;
        uint64_t target_node;

        /* Add flows in table */
        for (i = 0; i < info->num_flows; i++) {
            target_node = i % info->num_nodes;
            if (target_node != node_id)
                continue;

            ip_dst = rte_cpu_to_be_32(i);

            ret = rte_hash_add_key(h, (void *) &ip_dst);
            if (ret < 0)
                rte_exit(EXIT_FAILURE, "Unable to add entry %u "
                        "in hash table\n", i);
            else
                num_flows_node++;

        }

        printf("Hash table: Adding 0x%x keys\n", num_flows_node);
    }
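
The sizing in ``create_hash_table()`` relies on integer division, while
``populate_hash_table()`` inserts every flow with ``i % num_nodes == node_id``,
so some nodes receive one flow more than ``num_flows_node``. The arithmetic can
be checked with the following sketch; ``hash_entries()`` and ``flows_for_node()``
are hypothetical helpers that only reproduce the computations shown above.

.. code-block:: c

    #include <stdint.h>

    /* Entries requested by create_hash_table(): twice the per-node estimate. */
    static uint32_t hash_entries(uint32_t num_flows, uint32_t num_nodes)
    {
        uint32_t num_flows_node = num_flows / num_nodes;  /* rounds down */

        return num_flows_node * 2;
    }

    /* Keys actually added by populate_hash_table() for node_id: the first
     * num_flows % num_nodes nodes get one extra flow. */
    static uint32_t flows_for_node(uint32_t num_flows, uint32_t num_nodes,
                                   uint32_t node_id)
    {
        uint32_t n = num_flows / num_nodes;

        if (node_id < num_flows % num_nodes)
            n++;
        return n;
    }

For 1,000,000 flows and 4 nodes, every node holds 250,000 keys in a
500,000-entry table (50% load). With 10 flows and 3 nodes, node 0 holds 4 keys
in a 6-entry table, so its load is about 67% rather than 50%; the difference
becomes negligible for large flow counts.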

After initialization, packets are dequeued from the ring shared with the server
and, as in the server process, the IPv4 destination address of each packet is
used as a key to look up in the hash table.
If there is a hit, the packet is stored in a buffer, to be eventually transmitted
on one of the enabled ports. If the key is not found, the packet is dropped, since
the flow is not handled by the node.

.. code-block:: c

    static inline void
    handle_packets(struct rte_hash *h, struct rte_mbuf **bufs, uint16_t num_packets)
    {
        struct ipv4_hdr *ipv4_hdr;
        uint32_t ipv4_dst_ip[PKT_READ_SIZE];
        const void *key_ptrs[PKT_READ_SIZE];
        unsigned int i;
        int32_t positions[PKT_READ_SIZE] = {0};

        for (i = 0; i < num_packets; i++) {
            /* Handle IPv4 header.*/
            ipv4_hdr = rte_pktmbuf_mtod_offset(bufs[i], struct ipv4_hdr *,
                    sizeof(struct ether_hdr));
            ipv4_dst_ip[i] = ipv4_hdr->dst_addr;
            key_ptrs[i] = &ipv4_dst_ip[i];
        }
        /* Check if packets belong to any flows handled by this node */
        rte_hash_lookup_bulk(h, key_ptrs, num_packets, positions);

        for (i = 0; i < num_packets; i++) {
            if (likely(positions[i] >= 0)) {
                filter_stats->passed++;
                transmit_packet(bufs[i]);
            } else {
                filter_stats->drop++;
                /* Drop packet, as flow is not handled by this node */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Finally, note that both processes update statistics, such as transmitted, received
and dropped packets, which are displayed and refreshed by the server application.

.. code-block:: c

    static void
    do_stats_display(void)
    {
        unsigned int i, j;
        const char clr[] = {27, '[', '2', 'J', '\0'};
        const char topLeft[] = {27, '[', '1', ';', '1', 'H', '\0'};
        uint64_t port_tx[RTE_MAX_ETHPORTS], port_tx_drop[RTE_MAX_ETHPORTS];
        uint64_t node_tx[MAX_NODES], node_tx_drop[MAX_NODES];

        /* to get TX stats, we need to do some summing calculations */
        memset(port_tx, 0, sizeof(port_tx));
        memset(port_tx_drop, 0, sizeof(port_tx_drop));
        memset(node_tx, 0, sizeof(node_tx));
        memset(node_tx_drop, 0, sizeof(node_tx_drop));

        for (i = 0; i < num_nodes; i++) {
            const struct tx_stats *tx = &info->tx_stats[i];

            for (j = 0; j < info->num_ports; j++) {
                const uint64_t tx_val = tx->tx[info->id[j]];
                const uint64_t drop_val = tx->tx_drop[info->id[j]];

                port_tx[j] += tx_val;
                port_tx_drop[j] += drop_val;
                node_tx[i] += tx_val;
                node_tx_drop[i] += drop_val;
            }
        }

        /* Clear screen and move to top left */
        printf("%s%s", clr, topLeft);

        printf("PORTS\n");
        printf("-----\n");
        for (i = 0; i < info->num_ports; i++)
            printf("Port %u: '%s'\t", (unsigned int)info->id[i],
                    get_printable_mac_addr(info->id[i]));
        printf("\n\n");
        for (i = 0; i < info->num_ports; i++) {
            printf("Port %u - rx: %9"PRIu64"\t"
                    "tx: %9"PRIu64"\n",
                    (unsigned int)info->id[i], info->rx_stats.rx[i],
                    port_tx[i]);
        }

        printf("\nSERVER\n");
        printf("-----\n");
        printf("distributed: %9"PRIu64", drop: %9"PRIu64"\n",
                flow_dist_stats.distributed, flow_dist_stats.drop);

        printf("\nNODES\n");
        printf("-------\n");
        for (i = 0; i < num_nodes; i++) {
            const unsigned long long rx = nodes[i].stats.rx;
            const unsigned long long rx_drop = nodes[i].stats.rx_drop;
            const struct filter_stats *filter = &info->filter_stats[i];

            printf("Node %2u - rx: %9llu, rx_drop: %9llu\n"
                    "            tx: %9"PRIu64", tx_drop: %9"PRIu64"\n"
                    "            filter_passed: %9"PRIu64", "
                    "filter_drop: %9"PRIu64"\n",
                    i, rx, rx_drop, node_tx[i], node_tx_drop[i],
                    filter->passed, filter->drop);
        }

        printf("\n");
    }
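
The TX figures above are obtained by summing a per-node, per-port matrix of
counters in two directions: per port (over all nodes) and per node (over all
ports). A minimal sketch of that summation follows, with dense port indices and
fixed sizes as simplifying assumptions; ``sum_tx()``, ``TOY_NODES`` and
``TOY_PORTS`` are hypothetical, whereas the sample indexes the counters by port
ID through ``info->id[j]``.

.. code-block:: c

    #include <stdint.h>

    #define TOY_NODES 2
    #define TOY_PORTS 2

    /* tx[n][p]: packets that node n transmitted on port p. */
    static void sum_tx(const uint64_t tx[TOY_NODES][TOY_PORTS],
                       uint64_t port_tx[TOY_PORTS], uint64_t node_tx[TOY_NODES])
    {
        for (int p = 0; p < TOY_PORTS; p++)
            port_tx[p] = 0;
        for (int n = 0; n < TOY_NODES; n++) {
            node_tx[n] = 0;
            for (int p = 0; p < TOY_PORTS; p++) {
                port_tx[p] += tx[n][p];  /* column sum: total per port */
                node_tx[n] += tx[n][p];  /* row sum: total per node */
            }
        }
    }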