..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

The packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header has already been stripped).
To avoid copying the packet's data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

*   Direct mbuf -- the mbuf that will contain the L3 header of the new fragment.

*   Indirect mbuf -- an mbuf that is attached to the mbuf holding the original packet.
    Its data field points to the start of the original packet's data plus the fragment offset.

The L3 header is then copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.

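Because the IPv4 header checksum is left at zero, it presumably has to be filled in before the fragment is transmitted,
either by hardware checksum offload or in software.
The following is a minimal software sketch of the standard Internet checksum (RFC 1071) over a header.
The function name ``ipv4_hdr_cksum`` is hypothetical and the code is not taken from the library.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper, not part of the library.
     * hdr points to the IPv4 header bytes in network order, with the
     * checksum field (bytes 10-11) set to zero; len is the header length.
     */
    static uint16_t
    ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
    {
        uint32_t sum = 0;
        size_t i;

        /* Sum the header as big-endian 16-bit words. */
        for (i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

        /* Fold the carries back into the low 16 bits. */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);

        /* One's complement of the sum; store the result big-endian. */
        return (uint16_t)~sum;
    }

The returned value is in host order, so its high byte goes into header byte 10 and its low byte into byte 11.
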
Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.

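The split itself can be illustrated without any mbuf handling.
The sketch below only plans the fragments: for each one it records the byte offset into the original payload
(where the 'indirect' mbuf's data would point), the payload length and whether more fragments follow.
It assumes IPv4, where every fragment except the last must carry a multiple of 8 payload bytes.
The type ``frag_desc`` and the function ``plan_fragments`` are hypothetical names, not library API.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical description of one fragment's payload. */
    struct frag_desc {
        uint16_t offset;     /* byte offset into the original payload */
        uint16_t len;        /* payload bytes carried by this fragment */
        uint8_t  more_frags; /* 1 if further fragments follow (MF flag) */
    };

    /*
     * Split payload_len bytes into fragments that fit mtu_size once a
     * hdr_len-byte IP header is prepended. Returns the number of entries
     * written to out[], or -1 if max_frags is too small or the MTU
     * leaves no room for payload.
     */
    static int
    plan_fragments(uint16_t payload_len, uint16_t mtu_size, uint16_t hdr_len,
                   struct frag_desc *out, int max_frags)
    {
        uint16_t per_frag, off = 0;
        int n = 0;

        if (mtu_size <= hdr_len)
            return -1;
        /* Non-final fragments carry a multiple of 8 payload bytes. */
        per_frag = (uint16_t)((mtu_size - hdr_len) & ~7u);
        if (per_frag == 0)
            return -1;

        while (off < payload_len) {
            uint16_t left = (uint16_t)(payload_len - off);

            if (n == max_frags)
                return -1;
            out[n].offset = off;
            out[n].len = left > per_frag ? per_frag : left;
            out[n].more_frags = left > per_frag;
            off = (uint16_t)(off + out[n].len);
            n++;
        }
        return n;
    }

For example, a 3000-byte payload with a 1500-byte MTU and a 20-byte header yields three fragments of 1480, 1480 and 40 payload bytes.
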
The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs.

For more information about direct and indirect mbufs, refer to the *DPDK Programmer's Guide, section 7.7, Direct and Indirect Buffers*.

Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about the fragments of a packet that have already been received.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.

Note that update and lookup operations on the Fragment Table are not thread safe.
If different execution contexts (threads/processes) access the same table simultaneously,
an external synchronization mechanism has to be provided.

Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 4) fragments.

The following code example demonstrates the creation of a new Fragment Table:

.. code-block:: c

    /* Convert the flow TTL into TSC cycles, rounding cycles-per-millisecond up. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    bucket_num = max_flow_num + max_flow_num / 4;
    /* Parameters: bucket_num, bucket_entries, max_entries, max_cycles, socket_id. */
    frag_tbl = rte_ip_frag_table_create(max_flow_num, bucket_entries, max_flow_num, frag_cycles, socket_id);

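The ``frag_cycles`` expression can be checked in isolation.
The sketch below restates it as a standalone function, assuming that ``max_flow_ttl`` is expressed in milliseconds
(as the ``MS_PER_S`` scaling suggests) and passing the TSC frequency as a parameter instead of calling rte_get_tsc_hz().
The function name ``ttl_ms_to_cycles`` is hypothetical.

.. code-block:: c

    #include <stdint.h>

    #define MS_PER_S 1000

    /*
     * Hypothetical helper: tsc_hz stands in for rte_get_tsc_hz(),
     * ttl_ms for max_flow_ttl (assumed to be in milliseconds).
     */
    static uint64_t
    ttl_ms_to_cycles(uint64_t tsc_hz, uint64_t ttl_ms)
    {
        /* Cycles per millisecond, rounded up, times the TTL. */
        return (tsc_hz + MS_PER_S - 1) / MS_PER_S * ttl_ms;
    }

With a 2 GHz TSC and a TTL of 1000 ms this gives 2,000,000,000 cycles, that is, one second's worth of ticks.
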
Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries>-way associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
ip_frag_tbl_add() simply returns a failure instead of reinserting existing keys into alternative locations.

Also, entries that reside in the table longer than <max_cycles> are considered invalid
and can be removed or replaced by new ones.

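The lookup scheme described above can be modelled in a few lines of standalone C.
This is a simplified sketch, not the library's implementation: the table dimensions, the type names,
the function ``tbl_find_or_add`` and both hash functions are placeholders chosen for illustration
(the hashes are deliberately trivial), and no mbufs are stored or freed.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>

    #define NB_BUCKETS     8   /* illustrative, must be a power of two */
    #define BUCKET_ENTRIES 2   /* illustrative associativity */

    struct key { uint32_t src, dst, id; };

    struct entry {
        struct key key;
        uint64_t start;        /* time the entry was created */
        int used;
    };

    struct table {
        uint64_t max_cycles;   /* entries older than this are stale */
        struct entry e[NB_BUCKETS][BUCKET_ENTRIES];
    };

    /* Placeholder hash functions, not the ones used by the library. */
    static uint32_t hash1(const struct key *k)
    {
        return (k->src ^ k->dst ^ k->id) & (NB_BUCKETS - 1);
    }

    static uint32_t hash2(const struct key *k)
    {
        return ((k->src ^ k->dst ^ k->id) >> 3) & (NB_BUCKETS - 1);
    }

    /*
     * Return the entry for k, or claim an empty or timed-out slot in one
     * of the two candidate buckets. Returns NULL when every candidate
     * slot holds a live entry for another key; nothing is relocated.
     */
    static struct entry *
    tbl_find_or_add(struct table *t, const struct key *k, uint64_t now)
    {
        uint32_t b[2] = { hash1(k), hash2(k) };
        struct entry *slot = NULL, *stale = NULL;
        int i, j;

        for (i = 0; i < 2; i++) {
            for (j = 0; j < BUCKET_ENTRIES; j++) {
                struct entry *p = &t->e[b[i]][j];

                if (!p->used) {
                    if (slot == NULL)
                        slot = p;
                } else if (memcmp(&p->key, k, sizeof(*k)) == 0) {
                    return p;
                } else if (stale == NULL && now - p->start > t->max_cycles) {
                    stale = p;
                }
            }
        }
        if (slot == NULL)
            slot = stale;      /* reuse a timed-out entry */
        if (slot == NULL)
            return NULL;       /* all candidate slots are occupied */
        slot->key = *k;
        slot->start = now;
        slot->used = 1;
        return slot;
    }

Skipping relocation keeps insertion time bounded by 2 \* BUCKET_ENTRIES probes, at the cost of occasionally rejecting a new flow while the table still has free slots elsewhere.
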
Note that reassembly requires a large number of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>) mbufs
can be stored inside the Fragment Table waiting for the remaining fragments.

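As a worked example of the bound above, the sketch below evaluates the expression for given parameters.
The function and parameter names are illustrative only; ``max_frags`` stands in for RTE_LIBRTE_IP_FRAG_MAX.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helper that evaluates the bound quoted in the text. */
    static uint64_t
    frag_tbl_max_mbufs(uint32_t bucket_entries, uint32_t max_frags,
                       uint32_t mbufs_per_pkt)
    {
        return 2ULL * bucket_entries * max_frags * mbufs_per_pkt;
    }

With 16 entries per bucket, 4 fragments per packet and 2 mbufs per fragment, the expression evaluates to 256 mbufs.
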
Packet Reassembly
~~~~~~~~~~~~~~~~~

Processing and reassembly of fragmented packets is done by the rte_ipv4_frag_reassemble_packet() and rte_ipv6_frag_reassemble_packet() functions.
They return either a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions are responsible for the following:

#.  Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#.  If the entry is found, check whether it has already timed out.
    If so, free all previously received fragments and remove the information about them from the entry.

#.  If no entry with such a key is found, try to create a new one in one of two ways:

    a) Use an empty entry.

    b) Delete a timed-out entry, free the mbufs associated with it, and store a new entry with the specified key in it.

#.  Update the entry with the new fragment information and check whether the packet can be reassembled
    (that is, whether the packet's entry contains all fragments).

    a) If yes, reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    b) If no, return NULL to the caller.

If an error is encountered at any stage of packet processing
(e.g. a new entry can't be inserted into the Fragment Table, or the fragment is invalid or timed out),
the function frees all fragments associated with the packet,
marks the table entry as invalid and returns NULL to the caller.

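The per-entry bookkeeping behind these steps can be sketched as a small state machine.
The model below is an assumption-laden simplification rather than the library's code:
it tracks only sizes and counts (no mbufs), does not detect overlapping or duplicate fragments,
and all names (``reasm_entry``, ``reasm_add``, ``MAX_FRAGS``) are hypothetical.
A packet is treated as complete once the last fragment has revealed the total size and the collected bytes add up to it.

.. code-block:: c

    #include <stdint.h>

    #define MAX_FRAGS 4   /* stands in for RTE_LIBRTE_IP_FRAG_MAX */

    enum frag_status { FRAG_PENDING, FRAG_COMPLETE, FRAG_ERROR };

    struct reasm_entry {
        uint64_t start;       /* time the first fragment arrived */
        uint32_t total_size;  /* payload size; 0 until the last fragment is seen */
        uint32_t frag_size;   /* payload bytes collected so far */
        uint32_t nb_frags;    /* fragments recorded in the entry */
    };

    /* Mark the entry as empty. */
    static void
    reasm_reset(struct reasm_entry *e, uint64_t now)
    {
        e->start = now;
        e->total_size = 0;
        e->frag_size = 0;
        e->nb_frags = 0;
    }

    /* Record one fragment (payload offset, length, MF flag) in the entry. */
    static enum frag_status
    reasm_add(struct reasm_entry *e, uint64_t now, uint64_t max_cycles,
              uint32_t ofs, uint32_t len, int more_frags)
    {
        /* A timed-out entry drops everything it collected so far. */
        if (e->nb_frags != 0 && now - e->start > max_cycles)
            reasm_reset(e, now);
        if (e->nb_frags == 0)
            e->start = now;

        /* Error path: too many fragments, or a second "last" fragment. */
        if (e->nb_frags == MAX_FRAGS || (!more_frags && e->total_size != 0)) {
            reasm_reset(e, now);
            return FRAG_ERROR;
        }

        /* The last fragment (MF clear) reveals the total payload size. */
        if (!more_frags)
            e->total_size = ofs + len;
        e->frag_size += len;
        e->nb_frags++;

        if (e->total_size != 0 && e->frag_size == e->total_size) {
            reasm_reset(e, now);   /* packet complete, entry becomes empty */
            return FRAG_COMPLETE;
        }
        return FRAG_PENDING;
    }

Fragments may arrive in any order: in this model the result only becomes FRAG_COMPLETE when the byte count matches, regardless of which fragment arrives last.
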
Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is disabled by default.

The RTE_LIBRTE_IP_FRAG_DEBUG config macro controls debug logging of IP fragment processing and reassembly.
This macro is disabled by default.
Note that while the logging contains a lot of detailed information,
it slows down packet processing and might cause many packets to be lost.