..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

The packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header is already stripped out).
To avoid copying the actual packet data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

*   Direct mbuf -- an mbuf that will contain the L3 header of the new fragment.

*   Indirect mbuf -- an mbuf that is attached to the mbuf holding the original packet.
    Its data field points to the start of the original packet's data plus the fragment offset.

Then the L3 header is copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.
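
Because the IPv4 checksum is left at zero, it presumably has to be filled in before the fragment is transmitted, either by NIC checksum offload or in software.
The following is a minimal software sketch of the standard Internet checksum (RFC 1071) over a raw header buffer; the function name ipv4_hdr_cksum() is hypothetical and is not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over an IPv4 header.
 * 'hdr' points to the header bytes in network order, with the checksum
 * field (bytes 10 and 11) set to zero, as the fragmentation code leaves it.
 * 'len' is the header length in bytes (IHL * 4, so always even).
 * Returns the checksum as a host-order value; store it big-endian.
 */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(hdr != NULL && len % 2 == 0);

    /* Sum the header as a sequence of big-endian 16-bit words. */
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Running the same function over a header that already carries a correct checksum yields zero, which gives a cheap way to verify a received header.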

Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.

The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs from.

Note that the configuration macro RTE_MBUF_SCATTER_GATHER has to be enabled for the fragmentation library to build and work correctly.

For more information about direct and indirect mbufs, refer to the *DPDK Programmer's Guide, section 7.7, Direct and Indirect Buffers*.

Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about already received fragments of a packet.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.
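
As an illustration of that triple, an IPv4-only key and its comparison could look like the sketch below.
The struct layout and names are hypothetical and are not the library's internal representation; an IPv6 key would need 128-bit addresses and a 32-bit identification value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 flavour of the lookup key; not the library's layout. */
struct frag_key_v4 {
    uint32_t src_addr;  /* source IPv4 address */
    uint32_t dst_addr;  /* destination IPv4 address */
    uint16_t id;        /* IPv4 Identification field */
};

/* Two fragments belong to the same packet only if all three fields match. */
static bool frag_key_v4_equal(const struct frag_key_v4 *a,
                              const struct frag_key_v4 *b)
{
    assert(a != NULL && b != NULL);
    return a->src_addr == b->src_addr &&
           a->dst_addr == b->dst_addr &&
           a->id == b->id;
}
```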

Note that update/lookup operations on the Fragment Table are not thread safe.
So if different execution contexts (threads/processes) access the same table simultaneously,
then some external synchronization mechanism has to be provided.
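
One possible form of such external synchronization is a lock taken around every table access.
The sketch below uses a C11 atomic_flag spinlock and a counter that stands in for the table state; struct locked_tbl and the tbl_*() helpers are hypothetical, and a DPDK application would more likely use the locking primitives of its own environment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: one lock per fragment table. */
struct locked_tbl {
    atomic_flag lock;   /* clear means unlocked */
    uint64_t updates;   /* stands in for the table state */
};

static void tbl_lock(struct locked_tbl *t)
{
    /* Spin until the flag was previously clear, i.e. we acquired it. */
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ;
}

static void tbl_unlock(struct locked_tbl *t)
{
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Every lookup/update of the shared table goes through the lock. */
static uint64_t tbl_update(struct locked_tbl *t)
{
    uint64_t seen;

    assert(t != NULL);
    tbl_lock(t);
    seen = ++t->updates;   /* the reassembly call would go here */
    tbl_unlock(t);
    return seen;
}
```

Giving each thread its own table avoids the lock entirely, at the cost of requiring that all fragments of a packet reach the same thread.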

Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 4) fragments.

The following code example demonstrates the creation of a new Fragment Table:

.. code-block:: c

    /* Convert the flow TTL from milliseconds to TSC cycles (cycles per ms rounded up). */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    /* Provision 25% more buckets than flows to lower the chance of a failed insert. */
    bucket_num = max_flow_num + max_flow_num / 4;
    /* Arguments: bucket_num, bucket_entries, max_entries, max_cycles, socket_id. */
    frag_tbl = rte_ip_frag_table_create(bucket_num, bucket_entries, max_flow_num, frag_cycles, socket_id);
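
The frag_cycles expression converts a time-to-live in milliseconds into timestamp-counter cycles.
The helper below restates it as a standalone function so the arithmetic can be checked in isolation; ttl_ms_to_cycles() is a hypothetical name, and MS_PER_S is assumed to be 1000.

```c
#include <assert.h>
#include <stdint.h>

#define MS_PER_S 1000   /* assumed value: milliseconds per second */

/* Convert a TTL in milliseconds to timestamp-counter cycles. */
static uint64_t ttl_ms_to_cycles(uint64_t tsc_hz, uint64_t ttl_ms)
{
    assert(tsc_hz > 0);
    /* Cycles per millisecond, rounded up, times the TTL. */
    return (tsc_hz + MS_PER_S - 1) / MS_PER_S * ttl_ms;
}
```

With a 2 GHz counter, a TTL of 1000 ms corresponds to 2,000,000,000 cycles. Rounding the per-millisecond rate up means the timeout is never shorter than requested.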

Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries>-way associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
instead of reinserting existing keys into alternative locations, ip_frag_tbl_add() just returns a failure.

Also, entries that reside in the table longer than <max_cycles> are considered invalid,
and can be removed/replaced by new ones.
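
The scheme described above can be sketched as a small, self-contained table.
Everything in this sketch is illustrative: the toy hash functions, the fixed sizes, the 64-bit key and the tbl_find_or_add() name are assumptions made for the example and do not reflect the library's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_BUCKETS     8   /* number of buckets in this sketch */
#define BUCKET_ENTRIES 2   /* associativity of each bucket */

struct entry {
    uint64_t key;     /* 0 means "empty" in this sketch */
    uint64_t start;   /* insertion timestamp, in cycles */
};

struct table {
    struct entry e[NB_BUCKETS][BUCKET_ENTRIES];
    uint64_t max_cycles;
};

/* Two toy hash functions (placeholders chosen for predictability). */
static uint32_t hash1(uint64_t k) { return (uint32_t)(k % NB_BUCKETS); }
static uint32_t hash2(uint64_t k) { return (uint32_t)((k / NB_BUCKETS) % NB_BUCKETS); }

/* Look for 'key' in bucket 'b'; remember the first empty or expired slot. */
static struct entry *scan_bucket(struct table *t, uint32_t b, uint64_t key,
                                 uint64_t now, struct entry **reuse)
{
    for (int i = 0; i < BUCKET_ENTRIES; i++) {
        struct entry *e = &t->e[b][i];

        if (e->key == key)
            return e;
        if (*reuse == NULL &&
            (e->key == 0 || now - e->start > t->max_cycles))
            *reuse = e;
    }
    return NULL;
}

/* Return the entry for 'key', adding it if possible; NULL means failure. */
static struct entry *tbl_find_or_add(struct table *t, uint64_t key, uint64_t now)
{
    struct entry *reuse = NULL, *e;

    assert(t != NULL && key != 0);
    e = scan_bucket(t, hash1(key), key, now, &reuse);
    if (e == NULL)
        e = scan_bucket(t, hash2(key), key, now, &reuse);
    if (e != NULL) {
        if (now - e->start > t->max_cycles)
            e->start = now;   /* stale hit: drop old state, start over */
        return e;
    }
    if (reuse == NULL)
        return NULL;          /* all 2 * BUCKET_ENTRIES slots are live */
    reuse->key = key;         /* no relocation of other keys is attempted */
    reuse->start = now;
    return reuse;
}
```

Returning a failure instead of relocating keys keeps the worst-case cost of an insert bounded at 2 \* BUCKET_ENTRIES comparisons.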

Note that reassembly demands a lot of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>)
mbufs can be stored inside the Fragment Table waiting for remaining fragments.
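
When sizing a mempool, the bound quoted above can be evaluated directly.
The helper below only restates that expression; max_mbufs_in_table() is a hypothetical name and all three parameters are supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate the upper bound from the text for caller-supplied parameters. */
static uint64_t max_mbufs_in_table(uint32_t bucket_entries, uint32_t frag_max,
                                   uint32_t mbufs_per_packet)
{
    assert(bucket_entries > 0 && frag_max > 0 && mbufs_per_packet > 0);
    /* Promote to 64 bits first so the product cannot overflow. */
    return UINT64_C(2) * bucket_entries * frag_max * mbufs_per_packet;
}
```

For instance, 16 entries per bucket, 4 fragments per packet and 2 mbufs per packet give a bound of 256 mbufs.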

Packet Reassembly
~~~~~~~~~~~~~~~~~

Fragmented packet processing and reassembly is done by the rte_ipv4_frag_reassemble_packet()/rte_ipv6_frag_reassemble_packet() functions.
They either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions perform the following steps:

#.  Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#.  If the entry is found, then check if that entry has already timed out.
    If yes, then free all previously received fragments, and remove information about them from the entry.

#.  If no entry with such a key is found, then try to create a new one in one of two ways:

    a) Use an empty entry.

    b) Delete a timed-out entry, free the mbufs associated with it, and store a new entry with the specified key in it.

#.  Update the entry with the new fragment information and check if the packet can be reassembled
    (the packet's entry contains all fragments).

    a) If yes, then reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    b) If no, then return NULL to the caller.

If at any stage of packet processing an error is encountered
(e.g. can't insert a new entry into the Fragment Table, or an invalid/timed-out fragment),
then the function will free all fragments associated with the packet,
mark the table entry as invalid and return NULL to the caller.
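
The "can the packet be reassembled" check in the steps above can be sketched with simple size bookkeeping.
This is a hypothetical model rather than the library's implementation: struct reasm_state, reasm_add() and FRAG_MAX (standing in for RTE_LIBRTE_IP_FRAG_MAX) are names invented for the example, and only sizes are tracked, not the mbufs themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAG_MAX 4   /* stands in for RTE_LIBRTE_IP_FRAG_MAX */

enum reasm_status { REASM_ERROR = -1, REASM_PENDING = 0, REASM_DONE = 1 };

/* Hypothetical per-packet bookkeeping kept in a table entry. */
struct reasm_state {
    uint32_t total_size;   /* UINT32_MAX until the last fragment is seen */
    uint32_t recv_size;    /* payload bytes received so far */
    uint32_t nb_frags;     /* fragments stored so far */
};

static void reasm_reset(struct reasm_state *s)
{
    s->total_size = UINT32_MAX;
    s->recv_size = 0;
    s->nb_frags = 0;
}

/* Record one fragment and report whether the packet is now complete. */
static enum reasm_status reasm_add(struct reasm_state *s, uint32_t offset,
                                   uint32_t len, int more_frags)
{
    assert(s != NULL);

    /* Too many fragments, or a second "last" fragment: give up. */
    if (s->nb_frags == FRAG_MAX ||
        (!more_frags && s->total_size != UINT32_MAX)) {
        reasm_reset(s);
        return REASM_ERROR;
    }
    if (!more_frags)
        s->total_size = offset + len;   /* the last fragment fixes the size */
    s->recv_size += len;
    s->nb_frags++;

    if (s->recv_size < s->total_size)
        return REASM_PENDING;
    if (s->recv_size > s->total_size) { /* overlapping or duplicate data */
        reasm_reset(s);
        return REASM_ERROR;
    }
    reasm_reset(s);                     /* the entry becomes empty again */
    return REASM_DONE;
}
```

Fragments may arrive in any order; the packet is reported complete only once the last fragment has fixed the total size and the received byte count matches it.
A size-only check like this is not sufficient on its own, since a real implementation also has to confirm that the fragments are contiguous when it chains them.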

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is not enabled by default.

The RTE_LIBRTE_IP_FRAG_DEBUG macro controls debug logging of IP fragment processing and reassembly.
This macro is disabled by default.
Note that while the logging contains a lot of detailed information,
it slows down packet processing and might cause the loss of a lot of packets.
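
Compile-time switches of this kind are commonly implemented so that the disabled case costs nothing at run time.
The sketch below shows the general pattern only; SKETCH_IP_FRAG_TBL_STAT, TBL_STAT_UPDATE(), struct tbl_stat and lookup() are hypothetical names and are not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tbl_stat {
    uint64_t find_num;   /* total lookups */
    uint64_t fail_num;   /* failed lookups */
};

/* Uncomment to compile the counters in (hypothetical macro name). */
/* #define SKETCH_IP_FRAG_TBL_STAT */

#ifdef SKETCH_IP_FRAG_TBL_STAT
#define TBL_STAT_UPDATE(s, field, n) ((s)->field += (n))
#else
#define TBL_STAT_UPDATE(s, field, n) ((void)0)   /* expands to nothing useful */
#endif

/* Stand-in for a table lookup that updates the counters when enabled. */
static int lookup(struct tbl_stat *st, int found)
{
    assert(st != NULL);
    TBL_STAT_UPDATE(st, find_num, 1);
    if (!found)
        TBL_STAT_UPDATE(st, fail_num, 1);
    return found;
}
```

With the macro left undefined, as in this sketch, the counters are never touched and the update sites compile to no-ops.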