..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

Packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header is already stripped out).
To avoid copying the actual packet data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

*   Direct mbuf -- an mbuf that will contain the L3 header of the new fragment.

*   Indirect mbuf -- an mbuf that is attached to the mbuf holding the original packet.
    Its data field points to the start of the original packet's data plus the fragment offset.

Then the L3 header is copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.

Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.

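The resulting fragment layout can be modelled without any DPDK calls.
The following is a minimal sketch, not the library's implementation: it assumes a 20-byte IPv4 header without options,
and the names frag_layout and ipv4_split are hypothetical.
It relies only on the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes.

.. code-block:: c

    #include <stdint.h>
    #include <stddef.h>

    #define IPV4_HDR_LEN 20u   /* assumption: header without options */

    /* Hypothetical description of one fragment (not a DPDK type). */
    struct frag_layout {
        uint16_t offset;        /* byte offset into the original L3 payload */
        uint16_t len;           /* payload bytes carried by this fragment */
        int      more_fragments; /* 1 for every fragment except the last */
    };

    /*
     * Split payload_len bytes of L3 payload into fragments that fit into mtu.
     * Returns the number of fragments written to out, or -1 if the MTU is
     * too small or out has too few slots.
     */
    static int
    ipv4_split(uint16_t payload_len, uint16_t mtu,
               struct frag_layout *out, size_t nb_out)
    {
        uint16_t per_frag, off = 0;
        size_t n = 0;

        if (mtu < IPV4_HDR_LEN + 8)
            return -1;
        /* Non-final fragments must carry a multiple of 8 payload bytes. */
        per_frag = (uint16_t)((mtu - IPV4_HDR_LEN) & ~7u);

        while (off < payload_len) {
            uint16_t left = (uint16_t)(payload_len - off);

            if (n == nb_out)
                return -1;
            out[n].offset = off;
            out[n].len = left > per_frag ? per_frag : left;
            out[n].more_fragments = left > per_frag;
            off = (uint16_t)(off + out[n].len);
            n++;
        }
        return (int)n;
    }

In this model, each element of the output array corresponds to one direct/indirect mbuf pair:
the offset and length describe the region of the original packet that the indirect mbuf would reference.
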
The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs.

Note that the configuration macro RTE_MBUF_SCATTER_GATHER has to be enabled for the fragmentation library to build and work correctly.
For more information about direct and indirect mbufs, refer to the *Intel DPDK Programmer's Guide 7.7 Direct and Indirect Buffers.*

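Because the IPv4 header checksum of each fragment is left at zero, it has to be filled in before transmission,
either in software or, where the NIC supports it, by checksum offload.
The following is a minimal software sketch of the standard Internet checksum (RFC 1071) over a header buffer.
ipv4_hdr_cksum is a hypothetical helper name, not a library function,
and the sketch assumes that the checksum bytes in the buffer are zero on entry.

.. code-block:: c

    #include <stdint.h>
    #include <stddef.h>

    /*
     * One's complement sum of the header, read as big-endian 16-bit words.
     * hdr_len is the header length in bytes (20 without options, always even).
     * The result is returned in host byte order; store it big-endian.
     */
    static uint16_t
    ipv4_hdr_cksum(const uint8_t *hdr, size_t hdr_len)
    {
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i + 1 < hdr_len; i += 2)
            sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);

        /* Fold the carries back into the low 16 bits. */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;
    }

The checksum occupies bytes 10 and 11 of the IPv4 header.
Running the same function over a header whose checksum bytes are already filled in correctly returns zero, which gives a simple way to verify the result.
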
Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about already received fragments of a packet.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.

Note that update/lookup operations on the Fragment Table are not thread safe.
So if different execution contexts (threads/processes) access the same table simultaneously,
then some external synchronization mechanism has to be provided.

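One way to provide that synchronization between threads is to wrap every table operation in a lock.
The sketch below uses a C11 atomic_flag spin lock around a placeholder table type;
struct shared_tbl and the helper names are hypothetical and only stand in for the real Fragment Table calls.
An alternative that avoids locking altogether is to give each execution context its own table.

.. code-block:: c

    #include <stdatomic.h>
    #include <stdint.h>

    /* Hypothetical wrapper: a lock plus whatever state the table holds. */
    struct shared_tbl {
        atomic_flag lock;
        uint64_t    nb_updates;   /* stands in for the real table state */
    };

    static void
    shared_tbl_init(struct shared_tbl *t)
    {
        atomic_flag_clear(&t->lock);
        t->nb_updates = 0;
    }

    static void
    shared_tbl_lock(struct shared_tbl *t)
    {
        /* Spin until the flag was previously clear. */
        while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
            ;
    }

    static void
    shared_tbl_unlock(struct shared_tbl *t)
    {
        atomic_flag_clear_explicit(&t->lock, memory_order_release);
    }

    /* Every lookup/update of the table goes between lock and unlock. */
    static uint64_t
    shared_tbl_update(struct shared_tbl *t)
    {
        uint64_t n;

        shared_tbl_lock(t);
        n = ++t->nb_updates;   /* real code would call the reassembly function here */
        shared_tbl_unlock(t);
        return n;
    }
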
Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 4) fragments.

Code example that demonstrates creation of a new Fragment Table:

.. code-block:: c

    /* Entry lifetime in TSC cycles: cycles per millisecond (rounded up) times the TTL in ms. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    /* Allocate 25% more buckets than flows. */
    bucket_num = max_flow_num + max_flow_num / 4;
    frag_tbl = rte_ip_frag_table_create(bucket_num, bucket_entries, max_flow_num, frag_cycles, socket_id);

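The timeout computation in the example above converts a time-to-live in milliseconds into TSC cycles.
The following standalone sketch reproduces that arithmetic with the TSC frequency passed in as a parameter
instead of being read from rte_get_tsc_hz(); ttl_ms_to_cycles is a hypothetical helper name, and MS_PER_S is assumed to be 1000.

.. code-block:: c

    #include <stdint.h>

    #define MS_PER_S 1000u   /* assumption: milliseconds per second */

    /* Round the per-millisecond cycle count up, then scale by the TTL in ms. */
    static uint64_t
    ttl_ms_to_cycles(uint64_t tsc_hz, uint32_t ttl_ms)
    {
        return (tsc_hz + MS_PER_S - 1) / MS_PER_S * ttl_ms;
    }

Because the per-millisecond count is rounded up before the multiplication, the resulting timeout is never shorter than the requested TTL.
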
Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries>-way associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
instead of reinserting existing keys into alternative locations, ip_frag_tbl_add() just returns a failure.

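The placement policy described above can be illustrated with a small self-contained model.
This is a sketch under stated assumptions, not the library's code: the key is reduced to a single 64-bit value,
the two hash functions are deliberately trivial so that collisions are easy to reproduce,
and toy_tbl, toy_lookup and toy_add are hypothetical names.
As in the description, an insert only considers the two candidate buckets and fails when both are full.

.. code-block:: c

    #include <stdint.h>
    #include <stddef.h>

    #define BUCKET_NUM     8u   /* number of buckets, a power of two */
    #define BUCKET_ENTRIES 2u   /* associativity of each bucket */

    struct toy_entry {
        uint64_t key;
        int      used;
    };

    struct toy_tbl {
        struct toy_entry bkt[BUCKET_NUM][BUCKET_ENTRIES];
    };

    /* Two illustrative hash functions (assumption, not the library's). */
    static uint32_t hash1(uint64_t k) { return (uint32_t)k; }
    static uint32_t hash2(uint64_t k) { return (uint32_t)(k >> 8); }

    /* Scan one bucket for key; remember the first free slot seen so far. */
    static struct toy_entry *
    scan_bucket(struct toy_entry *b, uint64_t key, struct toy_entry **free_slot)
    {
        size_t i;

        for (i = 0; i < BUCKET_ENTRIES; i++) {
            if (b[i].used && b[i].key == key)
                return &b[i];
            if (!b[i].used && *free_slot == NULL)
                *free_slot = &b[i];
        }
        return NULL;
    }

    /* Look in both candidate buckets: 2 * BUCKET_ENTRIES locations in total. */
    static struct toy_entry *
    toy_lookup(struct toy_tbl *t, uint64_t key, struct toy_entry **free_slot)
    {
        struct toy_entry *e;

        *free_slot = NULL;
        e = scan_bucket(t->bkt[hash1(key) & (BUCKET_NUM - 1)], key, free_slot);
        if (e == NULL)
            e = scan_bucket(t->bkt[hash2(key) & (BUCKET_NUM - 1)], key, free_slot);
        return e;
    }

    /*
     * Returns 0 on success (or if the key is already present) and -1 if all
     * candidate locations are occupied by other keys.
     */
    static int
    toy_add(struct toy_tbl *t, uint64_t key)
    {
        struct toy_entry *free_slot;

        if (toy_lookup(t, key, &free_slot) != NULL)
            return 0;
        if (free_slot == NULL)
            return -1;   /* existing keys are never moved to make room */
        free_slot->key = key;
        free_slot->used = 1;
        return 0;
    }

With these hash functions, the keys 0x201, 0x209, 0x211 and 0x219 all map to buckets 1 and 2 and fill their four slots, so adding a fifth such key (for example 0x221) fails.
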
Also, entries that reside in the table longer than <max_cycles> are considered invalid,
and can be removed or replaced by new ones.

Note that reassembly demands a lot of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>) mbufs
can be stored inside the Fragment Table waiting for the remaining fragments.

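When sizing a mempool, the bound quoted above can be evaluated directly.
In the sketch below, frag_tbl_mbuf_bound is a hypothetical helper and IP_FRAG_MAX is a local stand-in for RTE_LIBRTE_IP_FRAG_MAX, set to its default of 4.

.. code-block:: c

    #include <stdint.h>

    #define IP_FRAG_MAX 4u   /* stands in for RTE_LIBRTE_IP_FRAG_MAX (default) */

    /* 2 * bucket_entries * IP_FRAG_MAX * mbufs_per_pkt, as stated in the text. */
    static uint64_t
    frag_tbl_mbuf_bound(uint32_t bucket_entries, uint32_t mbufs_per_pkt)
    {
        return 2ull * bucket_entries * IP_FRAG_MAX * mbufs_per_pkt;
    }

For example, with 16 entries per bucket and packets that each fit into a single mbuf, the expression evaluates to 128 mbufs.
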
Packet Reassembly
~~~~~~~~~~~~~~~~~

Fragmented packet processing and reassembly is done by the rte_ipv4_frag_reassemble_packet()/rte_ipv6_frag_reassemble_packet()
functions. They either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions are responsible for the following:

#.  Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#.  If the entry is found, then check whether that entry has already timed out.
    If yes, then free all previously received fragments, and remove information about them from the entry.

#.  If no entry with such a key is found, then try to create a new one in one of two ways:

    a) Use an empty entry.

    b) Delete a timed-out entry, free the mbufs associated with it, and store a new entry with the specified key in it.

#.  Update the entry with the new fragment information and check whether the packet can be reassembled
    (the packet's entry contains all fragments).

    a) If yes, then reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    b) If no, then return NULL to the caller.

If at any stage of packet processing an error is encountered
(e.g. can't insert a new entry into the Fragment Table, or an invalid/timed-out fragment),
then the function will free all fragments associated with the packet,
mark the table entry as invalid and return NULL to the caller.

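The per-entry part of the steps above (timeout check, entry reuse, fragment accounting and the completeness test) can be sketched as a small state machine.
This is a simplified model, not the library's code: the table search is left out, mbufs are not tracked,
overlapping or duplicate fragments are only detected when the byte count exceeds the packet size,
and toy_flow, toy_flow_process and FRAG_MAX are hypothetical names.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define FRAG_MAX 4u   /* stands in for RTE_LIBRTE_IP_FRAG_MAX */

    /* Hypothetical model of one table entry. */
    struct toy_flow {
        int      used;
        uint64_t start;       /* timestamp of the first fragment */
        uint32_t total_size;  /* payload size, 0 until the last fragment is seen */
        uint32_t frag_size;   /* payload bytes received so far */
        uint32_t nb_frags;    /* fragments received so far */
    };

    enum { FRAG_ERROR = -1, FRAG_PENDING = 0, FRAG_COMPLETE = 1 };

    static int
    toy_flow_process(struct toy_flow *fl, uint64_t now, uint64_t max_cycles,
                     uint32_t ofs, uint32_t len, int more_fragments)
    {
        /* A timed-out entry loses everything collected so far. */
        if (fl->used && now - fl->start > max_cycles)
            memset(fl, 0, sizeof(*fl));

        /* An empty (or just expired) entry starts a new packet. */
        if (!fl->used) {
            fl->used = 1;
            fl->start = now;
        }

        /* Error path: no room for another fragment, invalidate the entry. */
        if (fl->nb_frags == FRAG_MAX) {
            memset(fl, 0, sizeof(*fl));
            return FRAG_ERROR;
        }

        /* Record the fragment; the last one reveals the total size. */
        fl->nb_frags++;
        fl->frag_size += len;
        if (!more_fragments)
            fl->total_size = ofs + len;

        if (fl->total_size != 0 && fl->frag_size > fl->total_size) {
            memset(fl, 0, sizeof(*fl));
            return FRAG_ERROR;
        }
        if (fl->total_size != 0 && fl->frag_size == fl->total_size) {
            memset(fl, 0, sizeof(*fl));   /* mark the entry as empty */
            return FRAG_COMPLETE;
        }
        return FRAG_PENDING;
    }

In this model, FRAG_COMPLETE corresponds to returning the reassembled mbuf, while FRAG_PENDING and FRAG_ERROR both correspond to returning NULL to the caller.
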
Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is not enabled by default.

The RTE_LIBRTE_IP_FRAG_DEBUG config macro controls debug logging of IP fragment processing and reassembly.
This macro is disabled by default.
Note that while logging contains a lot of detailed information,
it slows down packet processing and might cause the loss of a lot of packets.