..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Reassembly Sample Application
================================

The IP Reassembly sample application is a simple example of packet processing using the DPDK.
The application performs L3 forwarding with reassembly for fragmented IPv4 and IPv6 packets.

Overview
--------

The application demonstrates the use of the DPDK libraries to implement packet forwarding
with reassembly for IPv4 and IPv6 fragmented packets.
The initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
The main difference from the L2 Forwarding sample application is that
it reassembles fragmented IPv4 and IPv6 packets before forwarding.
The maximum allowed size of a reassembled packet is 9.5 KB.

There are two further differences from the L2 Forwarding sample application:

*   The first difference is that the forwarding decision is taken based on information read from the input packet's IP header.

*   The second difference is that the application differentiates between IP and non-IP traffic by means of offload flags.

The Longest Prefix Match (LPM for IPv4, LPM6 for IPv6) table is used to store and look up the outgoing port number
associated with the packet's destination IP address.
Any unmatched packets are forwarded to the originating port.

Compiling the Application
-------------------------

To compile the application:

#.  Go to the sample application directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/ip_reassembly

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

Running the Application
-----------------------

The application has a number of command line options:

.. code-block:: console

    ./build/ip_reassembly [EAL options] -- -p PORTMASK [-q NQ] [--maxflows=FLOWS] [--flowttl=TTL[(s|ms)]]

where:

*   -p PORTMASK: Hexadecimal bitmask of ports to configure

*   -q NQ: Number of RX queues per lcore

*   --maxflows=FLOWS: determines the maximum number of active fragmented flows (1-65535). Default value: 4096.

*   --flowttl=TTL[(s|ms)]: determines the maximum Time To Live for a fragmented packet.
    If not all fragments of a packet arrive within the given time-out,
    the fragments received so far are considered invalid and are dropped.
    Valid range is 1ms - 3600s. Default value: 1s.

To run the example in a linuxapp environment with 2 lcores (2,4) over 2 ports (0,2) with 1 RX queue per lcore:

.. code-block:: console

    ./build/ip_reassembly -l 2,4 -n 3 -- -p 5
    EAL: coremask set to 14
    EAL: Detected lcore 0 on socket 0
    EAL: Detected lcore 1 on socket 1
    EAL: Detected lcore 2 on socket 0
    EAL: Detected lcore 3 on socket 1
    EAL: Detected lcore 4 on socket 0
    ...

    Initializing port 0 on lcore 2... Address:00:1B:21:76:FA:2C, rxq=0 txq=2,0 txq=4,1
    done: Link Up - speed 10000 Mbps - full-duplex
    Skipping disabled port 1
    Initializing port 2 on lcore 4... Address:00:1B:21:5C:FF:54, rxq=0 txq=2,0 txq=4,1
    done: Link Up - speed 10000 Mbps - full-duplex
    Skipping disabled port 3
    IP_FRAG: Socket 0: adding route 100.10.0.0/16 (port 0)
    IP_RSMBL: Socket 0: adding route 100.20.0.0/16 (port 1)
    ...

    IP_RSMBL: Socket 0: adding route 0101:0101:0101:0101:0101:0101:0101:0101/48 (port 0)
    IP_RSMBL: Socket 0: adding route 0201:0101:0101:0101:0101:0101:0101:0101/48 (port 1)
    ...

    IP_RSMBL: entering main loop on lcore 4
    IP_RSMBL: -- lcoreid=4 portid=2
    IP_RSMBL: entering main loop on lcore 2
    IP_RSMBL: -- lcoreid=2 portid=0

To run the example in a linuxapp environment with 1 lcore (4) over 2 ports (0,2) with 2 RX queues per lcore:

.. code-block:: console

    ./build/ip_reassembly -l 4 -n 3 -- -p 5 -q 2

To test the application, flows should be set up in the flow generator that match the values in the
l3fwd_ipv4_route_array and/or l3fwd_ipv6_route_array table.

Please note that in order to test this application,
the traffic generator should be generating valid fragmented IP packets.
For IPv6, the only supported case is when the packet carries no extension headers
other than the fragment extension header.

The default l3fwd_ipv4_route_array table is:

.. code-block:: c

    struct l3fwd_ipv4_route l3fwd_ipv4_route_array[] = {
        {IPv4(100, 10, 0, 0), 16, 0},
        {IPv4(100, 20, 0, 0), 16, 1},
        {IPv4(100, 30, 0, 0), 16, 2},
        {IPv4(100, 40, 0, 0), 16, 3},
        {IPv4(100, 50, 0, 0), 16, 4},
        {IPv4(100, 60, 0, 0), 16, 5},
        {IPv4(100, 70, 0, 0), 16, 6},
        {IPv4(100, 80, 0, 0), 16, 7},
    };

The default l3fwd_ipv6_route_array table is:

.. code-block:: c

    struct l3fwd_ipv6_route l3fwd_ipv6_route_array[] = {
        {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 0},
        {{2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 1},
        {{3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 2},
        {{4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 3},
        {{5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 4},
        {{6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 5},
        {{7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 6},
        {{8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, 48, 7},
    };

For example, for a fragmented input IPv4 packet with destination address 100.10.1.1,
a reassembled IPv4 packet will be sent out from port #0 to the destination address 100.10.1.1
once all the fragments are collected.

Explanation
-----------

The following sections provide some explanation of the sample application code.
As mentioned in the overview section, the initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
The following sections describe aspects that are specific to the IP Reassembly sample application.

IPv4 Fragment Table Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This application uses the rte_ip_frag library. Please refer to the Programmer's Guide for a more detailed explanation of how to use this library.
The Fragment Table maintains information about already received fragments of a packet.
Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.
To avoid lock contention, each RX queue has its own Fragment Table.
As a consequence, the application can't handle the situation when different fragments of the same packet arrive through different RX queues.
Each table entry can hold information about a packet consisting of up to RTE_LIBRTE_IP_FRAG_MAX_FRAGS fragments.

.. code-block:: c

    /* Convert the flow TTL (in milliseconds) into TSC cycles. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;

    /* One Fragment Table per RX queue, allocated on the queue's socket. */
    if ((qconf->frag_tbl[queue] = rte_ip_frag_tbl_create(max_flow_num,
            IPV4_FRAG_TBL_BUCKET_ENTRIES, max_flow_num, frag_cycles,
            socket)) == NULL)
    {
        RTE_LOG(ERR, IP_RSMBL, "ip_frag_tbl_create(%u) on "
            "lcore: %u for queue: %u failed\n",
            max_flow_num, lcore, queue);
        return -1;
    }

Mempools Initialization
~~~~~~~~~~~~~~~~~~~~~~~

The reassembly application requires a large number of mbufs to be allocated.
At any given time, up to (2 \* max_flow_num \* RTE_LIBRTE_IP_FRAG_MAX_FRAGS \* <maximum number of mbufs per packet>)
mbufs can be stored inside the Fragment Table waiting for remaining fragments.
To keep the mempool size under reasonable limits and to avoid a situation where one RX queue can starve other queues,
each RX queue uses its own mempool.

.. code-block:: c

    /* Worst-case number of fragments held by the table for one IP version. */
    nb_mbuf = RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) * RTE_LIBRTE_IP_FRAG_MAX_FRAGS;
    /* Number of mbufs needed to hold one maximum-size packet. */
    nb_mbuf *= (port_conf.rxmode.max_rx_pkt_len + BUF_SIZE - 1) / BUF_SIZE;
    nb_mbuf *= 2; /* ipv4 and ipv6 */
    nb_mbuf += RTE_TEST_RX_DESC_DEFAULT + RTE_TEST_TX_DESC_DEFAULT;
    nb_mbuf = RTE_MAX(nb_mbuf, (uint32_t)NB_MBUF);

    snprintf(buf, sizeof(buf), "mbuf_pool_%u_%u", lcore, queue);

    if ((rxq->pool = rte_mempool_create(buf, nb_mbuf, MBUF_SIZE, 0,
            sizeof(struct rte_pktmbuf_pool_private),
            rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL,
            socket, MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET)) == NULL) {

        RTE_LOG(ERR, IP_RSMBL, "mempool_create(%s) failed", buf);
        return -1;
    }

Packet Reassembly and Forwarding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For each input packet, the packet forwarding operation is done by the l3fwd_simple_forward() function.
If the packet is an IPv4 or IPv6 fragment, then it calls rte_ipv4_reassemble_packet() for IPv4 packets,
or rte_ipv6_reassemble_packet() for IPv6 packets.
These functions either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).
Then l3fwd_simple_forward() continues with the code for the packet forwarding decision
(that is, the identification of the output interface for the packet) and
the actual transmission of the packet.

The rte_ipv4_reassemble_packet() and rte_ipv6_reassemble_packet() functions are responsible for:

#.  Searching the Fragment Table for an entry with the packet's <IP Source Address, IP Destination Address, Packet ID>.

#.  If the entry is found, checking whether that entry has already timed out.
    If yes, then free all previously received fragments,
    and remove information about them from the entry.

#.  If no entry with such a key is found, trying to create a new one in one of two ways:

    #.  Use an empty entry.

    #.  Delete a timed-out entry, free the mbufs associated with it and store a new entry with the specified key in it.

#.  Updating the entry with the new fragment information and checking
    whether the packet can be reassembled (the packet's entry contains all fragments).

    #.  If yes, then reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    #.  If no, then just return NULL to the caller.

If at any stage of packet processing a reassembly function encounters an error
(can't insert a new entry into the Fragment Table, or an invalid/timed-out fragment),
then it will free all fragments associated with the packet,
mark the table entry as invalid and return NULL to the caller.

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT macro controls statistics collection for the IP Fragment Table.
This macro is disabled by default.
To make ip_reassembly print the statistics to the standard output,
the user must send a USR1, INT or TERM signal to the process.
For all of these signals, the ip_reassembly process prints Fragment Table statistics for each RX queue;
in addition, INT and TERM cause process termination as usual.
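
The signal behaviour described above could be wired up roughly as in the following sketch.
It is not the application's actual handler: the names ``on_signal``, ``install_signal_handlers``,
``stats_requested`` and ``quit_requested`` are hypothetical, and the sketch assumes a design in which
the handler only records the request in flags, so that a main loop can print the statistics
and exit outside of signal context.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical flags, polled by the main loop (not part of the sample application). */
static volatile sig_atomic_t stats_requested;
static volatile sig_atomic_t quit_requested;

/*
 * USR1, INT and TERM all request a statistics dump;
 * INT and TERM additionally request termination.
 */
static void
on_signal(int signum)
{
	stats_requested = 1;
	if (signum != SIGUSR1)
		quit_requested = 1;
}

/* Register the same handler for the three signals; returns 0 on success, -1 on failure. */
static int
install_signal_handlers(void)
{
	static const int signals[] = { SIGUSR1, SIGINT, SIGTERM };
	size_t i;

	for (i = 0; i < sizeof(signals) / sizeof(signals[0]); i++) {
		if (signal(signals[i], on_signal) == SIG_ERR)
			return -1;
	}
	return 0;
}
```

A main loop built on this sketch would poll ``stats_requested``, print the per-queue Fragment Table statistics,
clear the flag, and leave the loop once ``quit_requested`` is set.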