..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED.
    IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

Packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header has already been stripped).
To avoid copying the packet's actual data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

* Direct mbuf -- an mbuf that will contain the L3 header of the new fragment.

* Indirect mbuf -- an mbuf that is attached to the mbuf with the original packet.
  Its data field points to the start of the original packet's data plus the fragment offset.

Then the L3 header is copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.

Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.

The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs.

Note that the configuration macro RTE_MBUF_SCATTER_GATHER has to be enabled for the fragmentation library to build and work correctly.
For more information about direct and indirect mbufs, refer to the *Intel DPDK Programmers guide 7.7 Direct and Indirect Buffers.*

Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about the already received fragments of a packet.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.

Note that update/lookup operations on the Fragment Table are not thread safe.
So if different execution contexts (threads/processes) access the same table simultaneously,
then an external synchronization mechanism has to be provided.

Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 4) fragments.

The following code example demonstrates the creation of a new Fragment Table:

.. code-block:: c

    /* Maximum lifetime of an entry: max_flow_ttl (in milliseconds) converted to TSC cycles. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    /* 25% more buckets than the expected number of flows. */
    bucket_num = max_flow_num + max_flow_num / 4;
    frag_tbl = rte_ip_frag_table_create(bucket_num, bucket_entries, max_flow_num, frag_cycles, socket_id);

Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries> \* associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
instead of reinserting existing keys into alternative locations, ip_frag_tbl_add() just returns a failure.

Also, entries that reside in the table longer than <max_cycles> are considered invalid,
and can be removed or replaced by new ones.

Note that reassembly demands a lot of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>) mbufs
can be stored inside the Fragment Table waiting for the remaining fragments.

Packet Reassembly
~~~~~~~~~~~~~~~~~

Processing and reassembly of fragmented packets is done by the rte_ipv4_frag_reassemble_packet()/rte_ipv6_frag_reassemble_packet() functions.
They either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions perform the following steps:

#.  Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#.  If the entry is found, then check whether that entry has already timed out.
    If yes, then free all previously received fragments, and remove information about them from the entry.

#.  If no entry with such a key is found, then try to create a new one in one of two ways:

    a) Use an empty entry.

    b) Delete a timed-out entry, free the mbufs associated with it and store a new entry with the specified key in it.

#.  Update the entry with the new fragment information and check whether the packet can be reassembled
    (the packet's entry contains all fragments).

    a) If yes, then reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    b) If no, then return NULL to the caller.

If an error is encountered at any stage of packet processing
(e.g. a new entry can't be inserted into the Fragment Table, or a fragment is invalid or timed out),
then the function frees all fragments associated with the packet,
marks the table entry as invalid and returns NULL to the caller.

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is not enabled by default.

The RTE_LIBRTE_IP_FRAG_DEBUG config macro controls debug logging of IP fragment processing and reassembly.
This macro is disabled by default.
Note that while the logging contains a lot of detailed information,
it slows down packet processing and might cause the loss of a lot of packets.
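
As an illustration of the lookup, timeout and update steps listed under Packet Reassembly, the following is a minimal, self-contained sketch in plain C.
It is not the library's implementation and uses no DPDK calls: all ``toy_*`` names, the table size and the linear search (instead of two hash functions) are assumptions made for brevity.
It also tracks only byte counts rather than mbufs, and assumes that fragments do not overlap or repeat.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define TOY_MAX_FRAGS   4  /* same value as the default RTE_LIBRTE_IP_FRAG_MAX */
    #define TOY_TBL_ENTRIES 8  /* hypothetical table size */

    struct toy_key { uint32_t src, dst; uint16_t id; };

    struct toy_entry {
        struct toy_key key;
        uint64_t start;     /* time the entry was (re)created */
        uint32_t total_len; /* full payload length, 0 until the last fragment is seen */
        uint32_t recv_len;  /* payload bytes recorded so far */
        uint32_t nfrags;    /* fragments recorded so far */
        int used;
    };

    struct toy_tbl {
        uint64_t max_cycles;
        struct toy_entry e[TOY_TBL_ENTRIES];
    };

    void toy_tbl_init(struct toy_tbl *t, uint64_t max_cycles)
    {
        memset(t, 0, sizeof(*t));
        t->max_cycles = max_cycles;
    }

    int toy_expired(const struct toy_tbl *t, const struct toy_entry *e, uint64_t now)
    {
        return now - e->start > t->max_cycles;
    }

    void toy_reset(struct toy_entry *e, struct toy_key key, uint64_t now)
    {
        memset(e, 0, sizeof(*e));
        e->key = key;
        e->start = now;
        e->used = 1;
    }

    /* Returns 1 when the packet is complete, 0 when more fragments are needed, -1 on error. */
    int toy_add_frag(struct toy_tbl *t, struct toy_key key, uint32_t ofs,
                     uint32_t len, int more_frags, uint64_t now)
    {
        struct toy_entry *hit = NULL, *spare = NULL;
        int i;

        assert(len > 0);
        for (i = 0; i < TOY_TBL_ENTRIES; i++) {
            struct toy_entry *e = &t->e[i];

            if (e->used && e->key.src == key.src &&
                e->key.dst == key.dst && e->key.id == key.id)
                hit = e;                  /* step 1: entry with a matching key */
            else if (spare == NULL && (!e->used || toy_expired(t, e, now)))
                spare = e;                /* step 3: empty or timed-out entry */
        }
        if (hit != NULL) {
            if (toy_expired(t, hit, now))
                toy_reset(hit, key, now); /* step 2: forget stale fragments */
        } else if (spare != NULL) {
            hit = spare;
            toy_reset(hit, key, now);
        } else {
            return -1;                    /* no room for a new entry */
        }

        /* step 4: record the fragment and check for completeness */
        if (hit->nfrags == TOY_MAX_FRAGS) {
            hit->used = 0;
            return -1;
        }
        hit->nfrags++;
        hit->recv_len += len;
        if (!more_frags)
            hit->total_len = ofs + len;
        if (hit->total_len == 0 || hit->recv_len < hit->total_len)
            return 0;
        hit->used = 0;                    /* the entry is released either way */
        return hit->recv_len == hit->total_len ? 1 : -1;
    }

With a table initialized by ``toy_tbl_init(&t, 100)``, recording a 1000-byte first fragment and then a 480-byte last fragment at offset 1000 within the timeout returns 0 and then 1.
If the second fragment arrives after the timeout, the stale first fragment is discarded and the call returns 0 again.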