..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

IP Fragmentation and Reassembly Library
=======================================

The IP Fragmentation and Reassembly Library implements IPv4 and IPv6 packet fragmentation and reassembly.

Packet fragmentation
--------------------

Packet fragmentation routines divide an input packet into a number of fragments.
Both the rte_ipv4_fragment_packet() and rte_ipv6_fragment_packet() functions assume that the input mbuf data
points to the start of the IP header of the packet (i.e. the L2 header is already stripped out).
To avoid copying the actual packet data, a zero-copy technique is used (rte_pktmbuf_attach).
For each fragment, two new mbufs are created:

*   Direct mbuf -- an mbuf that will contain the L3 header of the new fragment.

*   Indirect mbuf -- an mbuf that is attached to the mbuf with the original packet.
    Its data field points to the start of the original packet's data plus the fragment offset.

Then the L3 header is copied from the original mbuf into the 'direct' mbuf and updated to reflect the new fragmented status.
Note that for IPv4, the header checksum is not recalculated and is set to zero.

Finally, the 'direct' and 'indirect' mbufs for each fragment are linked together via the mbuf's next field to compose a packet for the new fragment.
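The way a payload is divided into fragments can be illustrated with a small standalone sketch.
It is not the library's code: the frag_slice structure and the plan_fragments() helper are hypothetical names introduced here,
and the sketch only computes offsets and lengths, without touching mbufs.
It assumes the usual IP rule that every fragment except the last carries a multiple of 8 payload bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one fragment's payload slice. */
struct frag_slice {
    uint16_t offset;    /* byte offset of this slice in the original payload */
    uint16_t len;       /* payload bytes carried by this fragment */
    int more_frags;     /* 1 if further fragments follow, 0 for the last one */
};

/*
 * Split payload_len bytes of L3 payload into slices that fit into mtu_size
 * together with an hdr_len-byte IP header. Every slice except the last
 * carries a multiple of 8 bytes, because the IP fragment offset field
 * counts 8-byte units.
 * Returns the number of slices written, or -1 if max_out is too small.
 */
static int
plan_fragments(uint16_t payload_len, uint16_t mtu_size, uint16_t hdr_len,
               struct frag_slice *out, size_t max_out)
{
    uint16_t per_frag, off = 0;
    size_t n = 0;

    assert(mtu_size >= hdr_len + 8);
    per_frag = (uint16_t)((mtu_size - hdr_len) & ~7);

    while (off < payload_len) {
        uint16_t left = (uint16_t)(payload_len - off);

        if (n == max_out)
            return -1;
        out[n].offset = off;
        out[n].len = left > per_frag ? per_frag : left;
        out[n].more_frags = left > per_frag;
        off = (uint16_t)(off + out[n].len);
        n++;
    }
    return (int)n;
}
```

In terms of the description above, each slice would correspond to one 'indirect' mbuf whose data pointer is the original payload start plus the slice offset.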

The caller can explicitly specify which mempools should be used to allocate the 'direct' and 'indirect' mbufs.

For more information about direct and indirect mbufs, refer to :ref:`direct_indirect_buffer`.
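Because the IPv4 header checksum of each fragment is left at zero, it has to be filled in before transmission,
either by hardware offload or in software.
The following is a minimal software sketch of the standard one's-complement Internet checksum (RFC 1071).
The helper name ipv4_hdr_cksum() is hypothetical and the code is independent of DPDK;
in a DPDK application, rte_ipv4_cksum() from rte_ip.h would be the more likely choice.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum (RFC 1071) over an IPv4 header.
 * hdr points to hdr_len bytes (an even number, since the header length
 * is counted in 32-bit words) with the checksum field, bytes 10-11,
 * set to zero. The result is returned in host byte order and has to be
 * stored big-endian into bytes 10-11.
 */
static uint16_t
ipv4_hdr_cksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the header up as a sequence of big-endian 16-bit words. */
    for (i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```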

Packet reassembly
-----------------

IP Fragment Table
~~~~~~~~~~~~~~~~~

The Fragment Table maintains information about already received fragments of a packet.

Each IP packet is uniquely identified by the triple <Source IP address>, <Destination IP address>, <ID>.
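As an illustration, the identifying triple can be pictured as a small key structure with an equality test.
The frag_key_v4 type below is a hypothetical, IPv4-only sketch and does not reproduce the library's internal key layout.

```c
#include <stdint.h>

/* Hypothetical IPv4 flavour of the <source, destination, ID> triple. */
struct frag_key_v4 {
    uint32_t src_addr;  /* source IP address */
    uint32_t dst_addr;  /* destination IP address */
    uint16_t id;        /* IP identification field shared by all fragments */
};

/* Two fragments belong to the same packet only if all three fields match. */
static int
frag_key_v4_equal(const struct frag_key_v4 *a, const struct frag_key_v4 *b)
{
    return a->src_addr == b->src_addr &&
           a->dst_addr == b->dst_addr &&
           a->id == b->id;
}
```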

Note that update/lookup operations on the Fragment Table are not thread safe.
So if different execution contexts (threads/processes) access the same table simultaneously,
then some external synchronization mechanism has to be provided.

Each table entry can hold information about packets consisting of up to RTE_LIBRTE_IP_FRAG_MAX (by default: 8) fragments.

The following code example demonstrates the creation of a new Fragment Table:

.. code-block:: c

    /* Convert the flow TTL from milliseconds to TSC cycles. */
    frag_cycles = (rte_get_tsc_hz() + MS_PER_S - 1) / MS_PER_S * max_flow_ttl;
    bucket_num = max_flow_num + max_flow_num / 4;
    /* Arguments: number of buckets, entries per bucket, maximum entries, entry lifetime in cycles, socket. */
    frag_tbl = rte_ip_frag_table_create(max_flow_num, bucket_entries, max_flow_num, frag_cycles, socket_id);
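
The frag_cycles expression above can be checked in isolation.
The sketch below repeats the same arithmetic in a hypothetical helper, ttl_ms_to_cycles(),
with the TSC frequency passed in as a parameter instead of being read from rte_get_tsc_hz(),
and assumes that MS_PER_S is 1000 and that the TTL is given in milliseconds.

```c
#include <stdint.h>

#define MS_PER_S 1000

/*
 * Round the number of TSC cycles per millisecond up, then scale it by the
 * TTL in milliseconds. Hypothetical helper mirroring the expression above.
 */
static uint64_t
ttl_ms_to_cycles(uint64_t tsc_hz, uint64_t ttl_ms)
{
    return (tsc_hz + MS_PER_S - 1) / MS_PER_S * ttl_ms;
}
```

For example, with a 2 GHz TSC and a TTL of 1000 ms, the result is 2,000,000,000 cycles, i.e. one second.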

Internally, the Fragment Table is a simple hash table.
The basic idea is to use two hash functions and <bucket_entries>-way associativity.
This provides 2 \* <bucket_entries> possible locations in the hash table for each key.
When a collision occurs and all 2 \* <bucket_entries> locations are occupied,
instead of reinserting existing keys into alternative locations, ip_frag_tbl_add() just returns a failure.
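The scheme can be illustrated with a toy table.
This is only a sketch of the idea (two candidate buckets per key, no relocation on overflow), not the library's implementation:
the type and function names, the table dimensions and the deliberately trivial hash functions are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define BUCKET_NUM     8   /* hypothetical number of buckets */
#define BUCKET_ENTRIES 2   /* associativity of each bucket */

struct toy_entry { uint32_t key; int used; };
struct toy_tbl { struct toy_entry slot[BUCKET_NUM][BUCKET_ENTRIES]; };

/* Two placeholder hash functions; a real table would use stronger ones. */
static uint32_t hash1(uint32_t k) { return k % BUCKET_NUM; }
static uint32_t hash2(uint32_t k) { return (k / BUCKET_NUM) % BUCKET_NUM; }

/* Return the entry holding key, or NULL. At most 2 * BUCKET_ENTRIES slots are probed. */
static struct toy_entry *
toy_lookup(struct toy_tbl *t, uint32_t key)
{
    uint32_t b[2] = { hash1(key), hash2(key) };

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < BUCKET_ENTRIES; j++)
            if (t->slot[b[i]][j].used && t->slot[b[i]][j].key == key)
                return &t->slot[b[i]][j];
    return NULL;
}

/*
 * Store key in the first free candidate slot. The caller is expected to
 * have looked the key up first. Returns 0 on success, -1 if all candidate
 * slots are taken; existing keys are never moved to make room.
 */
static int
toy_add(struct toy_tbl *t, uint32_t key)
{
    uint32_t b[2] = { hash1(key), hash2(key) };

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < BUCKET_ENTRIES; j++)
            if (!t->slot[b[i]][j].used) {
                t->slot[b[i]][j].key = key;
                t->slot[b[i]][j].used = 1;
                return 0;
            }
    return -1;
}
```

Giving up instead of relocating keys keeps the insertion cost bounded, at the price of occasionally rejecting a new flow.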

Also, entries that reside in the table longer than <max_cycles> are considered invalid
and can be removed or replaced by new ones.

Note that reassembly requires a lot of mbufs to be allocated.
At any given time, up to (2 \* bucket_entries \* RTE_LIBRTE_IP_FRAG_MAX \* <maximum number of mbufs per packet>) mbufs
can be stored inside the Fragment Table waiting for the remaining fragments.
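To get a feel for the numbers, the bound above can be evaluated directly.
The helper below is a hypothetical sketch that simply multiplies out the formula; the sample values used afterwards are assumptions chosen for illustration.

```c
#include <stdint.h>

/*
 * Upper bound on mbufs parked in the table, following the formula above:
 * 2 * bucket_entries * <fragment limit> * <mbufs per packet>.
 */
static uint64_t
max_pending_mbufs(uint32_t bucket_entries, uint32_t max_frags,
                  uint32_t mbufs_per_pkt)
{
    return 2ULL * bucket_entries * max_frags * mbufs_per_pkt;
}
```

For instance, with 16 entries per bucket, a limit of 8 fragments and 2 mbufs per packet, the bound is 2 \* 16 \* 8 \* 2 = 512 mbufs, which the mempool has to be sized for.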

Packet Reassembly
~~~~~~~~~~~~~~~~~

Processing and reassembly of fragmented packets is done by the rte_ipv4_frag_reassemble_packet() and rte_ipv6_frag_reassemble_packet()
functions. They either return a pointer to a valid mbuf that contains the reassembled packet,
or NULL (if the packet can't be reassembled for some reason).

These functions are responsible for the following:

#.  Search the Fragment Table for an entry with the packet's <IPv4 Source Address, IPv4 Destination Address, Packet ID>.

#.  If the entry is found, then check if that entry has already timed out.
    If yes, then free all previously received fragments, and remove information about them from the entry.

#.  If no entry with such a key is found, then try to create a new one in one of two ways:

    a) Use an empty entry.

    b) Delete a timed-out entry, free the mbufs associated with it and store a new entry with the specified key in it.

#.  Update the entry with the new fragment information and check if the packet can be reassembled
    (the packet's entry contains all fragments).

    a) If yes, then reassemble the packet, mark the table's entry as empty and return the reassembled mbuf to the caller.

    b) If no, then return NULL to the caller.

If at any stage of packet processing an error is encountered
(e.g. can't insert a new entry into the Fragment Table, or an invalid/timed-out fragment),
then the function will free all fragments associated with the packet,
mark the table entry as invalid and return NULL to the caller.
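The "update the entry and check for completeness" step can be sketched as follows.
This is a simplified illustration under stated assumptions, not the library's code:
the toy_reasm structure, the TOY_MAX_FRAGS limit and the return codes are hypothetical,
only sizes are tracked (no mbufs), and overlapping or duplicate fragments are not detected.

```c
#include <stdint.h>

#define TOY_MAX_FRAGS 8   /* hypothetical per-entry fragment limit */

enum { REASM_ERROR = -1, REASM_PENDING = 0, REASM_DONE = 1 };

struct toy_reasm {
    uint32_t total_size;  /* payload size; 0 until the last fragment is seen */
    uint32_t frag_size;   /* payload bytes collected so far */
    uint32_t nb_frags;    /* fragments collected so far */
};

/*
 * Account for one fragment covering [offset, offset + len) of the payload.
 * more_frags is the IP "more fragments" flag. On completion or error the
 * entry is reset, mirroring "mark the entry as empty/invalid".
 */
static int
toy_reasm_add(struct toy_reasm *e, uint32_t offset, uint32_t len, int more_frags)
{
    int ret = REASM_ERROR;

    if (e->nb_frags == TOY_MAX_FRAGS || len == 0)
        goto reset;
    if (!more_frags) {
        if (e->total_size != 0)          /* a second "last" fragment */
            goto reset;
        e->total_size = offset + len;    /* total size is now known */
    }
    e->frag_size += len;
    e->nb_frags++;

    if (e->total_size == 0 || e->frag_size < e->total_size)
        return REASM_PENDING;            /* still waiting for fragments */
    if (e->frag_size == e->total_size)
        ret = REASM_DONE;                /* everything has arrived */
reset:
    *e = (struct toy_reasm){0};
    return ret;
}
```

Note that fragments may arrive in any order; completeness is decided only by comparing the collected size with the total size announced by the last fragment.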

Debug logging and Statistics Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RTE_LIBRTE_IP_FRAG_TBL_STAT config macro controls statistics collection for the Fragment Table.
This macro is not enabled by default.