..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

Packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header has already been stripped).
To avoid copying the actual packet data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

* Direct mbuf -- an mbuf that will contain the L3 header of the new fragment.

* Indirect mbuf -- an mbuf that is attached to the mbuf holding the original packet.
  Its data field points to the start of the original packet's data plus the fragment offset.

The L3 header is then copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.

Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.
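
The fragment offsets mentioned above can be illustrated with a small standalone sketch.
It does not call the library; it only computes, for an IPv4 packet, where each fragment's payload would start
(the position an 'indirect' mbuf's data pointer would refer to) and how long it would be.
The names ``frag_plan`` and ``plan_ipv4_fragments`` are hypothetical, and a 20-byte IPv4 header without options is assumed.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define IPV4_HDR_LEN 20u /* assumption: IPv4 header without options */

    /* Hypothetical description of one fragment; not a DPDK type. */
    struct frag_plan {
        uint16_t offset;        /* payload offset in the original packet, bytes */
        uint16_t len;           /* payload bytes carried by this fragment */
        int more_fragments;     /* 1 if further fragments follow, else 0 */
    };

    /*
     * Split payload_len bytes of L3 payload into fragments that fit into mtu.
     * Returns the number of fragments written to plan, or -1 if the MTU is
     * too small or more than max_frags fragments would be needed.
     */
    static int
    plan_ipv4_fragments(uint16_t payload_len, uint16_t mtu,
            struct frag_plan *plan, int max_frags)
    {
        uint16_t frag_payload, off = 0;
        int n = 0;

        assert(plan != NULL);
        if (mtu < IPV4_HDR_LEN + 8)
            return -1;

        /* All fragments but the last carry a multiple of 8 payload bytes. */
        frag_payload = (uint16_t)((mtu - IPV4_HDR_LEN) & ~7u);

        while (off < payload_len) {
            uint16_t left = (uint16_t)(payload_len - off);

            if (n == max_frags)
                return -1;
            plan[n].offset = off;
            plan[n].len = left < frag_payload ? left : frag_payload;
            plan[n].more_fragments = left > frag_payload;
            off = (uint16_t)(off + plan[n].len);
            n++;
        }
        return n;
    }

With a 3000-byte payload and an MTU of 1500, this sketch yields three fragments at payload offsets 0, 1480 and 2960,
with only the last one having ``more_fragments`` cleared.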

The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs.

For more information about direct and indirect mbufs, refer to :ref:`direct_indirect_buffer`.

Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about already received fragments of a packet.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.

Note that update/lookup operations on the Fragment Table are not thread safe.
If different execution contexts (threads/processes) access the same table simultaneously,
an external synchronization mechanism has to be provided.

Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 8) fragments.

The following code example demonstrates the creation of a new Fragment Table:

.. code-block:: c

    /* Convert the flow TTL (in milliseconds) into TSC cycles. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    bucket_num = max_flow_num + max_flow_num / 4;
    frag_tbl = rte_ip_frag_table_create(max_flow_num, bucket_entries, max_flow_num, frag_cycles, socket_id);

Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries> \* associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
ip_frag_tbl_add() simply returns a failure instead of reinserting existing keys into alternative locations.

Also, entries that reside in the table longer than <max_cycles> are considered invalid
and can be removed or replaced by new ones.

Note that reassembly requires a lot of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>)
mbufs can be stored inside the Fragment Table waiting for the remaining fragments.

Packet Reassembly
~~~~~~~~~~~~~~~~~

Processing and reassembly of fragmented packets is done by the
rte_ipv4_frag_reassemble_packet()/rte_ipv6_frag_reassemble_packet() functions.
They either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions are responsible for the following:

#. Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#. If the entry is found, check whether that entry has already timed out.
   If it has, free all previously received fragments and remove information about them from the entry.

#. If no entry with such a key is found, try to create a new one in one of two ways:

   a) Use an empty entry.

   b) Delete a timed-out entry, free the mbufs associated with it and store a new entry with the specified key in it.

#. Update the entry with the new fragment information and check whether the packet can be reassembled
   (the packet's entry contains all fragments).

   a) If yes, reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

   b) If no, return NULL to the caller.
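
The steps above can be sketched as a toy model in plain C.
This is not the library's implementation: it uses a small linear table instead of a two-function hash table,
tracks only byte counts instead of mbufs, and ignores duplicate or overlapping fragments.
All names (``frag_key``, ``frag_entry``, ``frag_table``, ``frag_table_add``) are hypothetical,
and timestamps are assumed to be monotonically increasing.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_ENTRIES 4 /* toy table size, chosen for illustration only */

    struct frag_key { uint32_t src, dst; uint16_t id; };

    struct frag_entry {
        int used;
        struct frag_key key;
        uint64_t start;     /* time the entry was (re)started */
        uint32_t total_len; /* known once the last fragment arrives, else 0 */
        uint32_t recv_len;  /* payload bytes received so far */
    };

    struct frag_table {
        uint64_t max_cycles;
        struct frag_entry e[MAX_ENTRIES];
    };

    enum { FRAG_ERROR = -1, FRAG_PENDING = 0, FRAG_COMPLETE = 1 };

    static int
    key_eq(const struct frag_key *a, const struct frag_key *b)
    {
        return a->src == b->src && a->dst == b->dst && a->id == b->id;
    }

    static int
    frag_table_add(struct frag_table *t, struct frag_key key, uint64_t now,
            uint32_t ofs, uint32_t len, int more_frags)
    {
        struct frag_entry *hit = NULL, *empty = NULL, *stale = NULL;
        int i;

        assert(t != NULL);
        /* Step 1: search for the key, noting empty and timed-out entries. */
        for (i = 0; i < MAX_ENTRIES; i++) {
            struct frag_entry *e = &t->e[i];

            if (!e->used) {
                if (empty == NULL)
                    empty = e;
            } else if (key_eq(&e->key, &key)) {
                hit = e;
            } else if (stale == NULL && now - e->start > t->max_cycles) {
                stale = e;
            }
        }
        /* Step 2: entry found but timed out, so forget its old fragments. */
        if (hit != NULL && now - hit->start > t->max_cycles) {
            hit->total_len = 0;
            hit->recv_len = 0;
            hit->start = now;
        }
        /* Step 3: no entry found, so take an empty or a timed-out one. */
        if (hit == NULL) {
            hit = (empty != NULL) ? empty : stale;
            if (hit == NULL)
                return FRAG_ERROR;
            memset(hit, 0, sizeof(*hit));
            hit->used = 1;
            hit->key = key;
            hit->start = now;
        }
        /* Step 4: record the fragment and check for completeness. */
        if (!more_frags)
            hit->total_len = ofs + len;
        hit->recv_len += len;
        if (hit->total_len != 0 && hit->recv_len == hit->total_len) {
            hit->used = 0; /* mark the entry as empty again */
            return FRAG_COMPLETE;
        }
        return FRAG_PENDING;
    }

In this model, ``FRAG_COMPLETE`` stands in for returning the reassembled mbuf,
while ``FRAG_PENDING`` and ``FRAG_ERROR`` both correspond to the NULL return of the real functions.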

If an error is encountered at any stage of packet processing
(e.g. a new entry can't be inserted into the Fragment Table, or a fragment is invalid or timed out),
the function frees all fragments associated with the packet,
marks the table entry as invalid and returns NULL to the caller.

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is not enabled by default.