..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2015-2016 Intel Corporation.

FM10K Poll Mode Driver
======================

The FM10K poll mode driver library provides support for the Intel FM10000
(FM10K) family of 40GbE/100GbE adapters.

FTAG Based Forwarding of FM10K
------------------------------

FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
supports the addition of a Fabric Tag (FTAG) to carry special information.
The FTAG is placed at the beginning of the frame and contains information
such as the packet's source and destination and the VLAN tag. In FTAG based
forwarding mode, the switch logic forwards packets according to glort (global
resource tag) information rather than the MAC and VLAN tables. Currently this
feature works only on the PF.

To enable this feature, pass a devargs parameter to the EAL, for example
``-a 84:00.0,enable_ftag=1``. The application must ensure that an appropriate
FTAG is inserted into every frame on the TX side.

Vector PMD for FM10K
--------------------

Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
It improves load/store bandwidth efficiency of the L1 data cache by using
wider SSE/AVX registers.
The wider registers can hold multiple packet buffers, which reduces the
number of instructions needed when processing packets in bulk.

There is no change to the PMD API. The RX/TX handlers are the only two entry
points for vPMD packet I/O. They are transparently registered at runtime for
RX/TX execution if all required conditions are met.

Some constraints apply as pre-conditions for specific optimizations on bulk
packet transfers. The following sections explain RX and TX constraints in the
vPMD.


RX Constraints
~~~~~~~~~~~~~~


Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For Vector RX it is assumed that the number of descriptors in the ring is a
power of 2. With this pre-condition, the ring pointer can wrap back to the
head after reaching the tail without a conditional check: Vector RX applies a
bit mask of ``ring_size - 1`` to the index.
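
A minimal sketch of the wrap-around described above, in plain C with no DPDK
dependency. The names ``is_power_of_two()`` and ``ring_next()`` are
hypothetical and are not part of the fm10k driver. They only illustrate why a
power-of-2 ring size lets the index wrap with a mask instead of a branch.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* True when n has exactly one bit set, i.e. n is a power of 2. */
   bool is_power_of_two(uint16_t n)
   {
       return n != 0 && (n & (n - 1)) == 0;
   }

   /*
    * Advance a ring index by one slot. Because ring_size is a power of 2,
    * (ring_size - 1) is an all-ones mask, so the AND wraps the index back
    * to 0 after the last slot without a conditional check.
    */
   uint16_t ring_next(uint16_t idx, uint16_t ring_size)
   {
       return (uint16_t)((idx + 1) & (ring_size - 1));
   }

For example, with a ring size of 512, ``ring_next(510, 512)`` yields 511 and
``ring_next(511, 512)`` wraps to 0.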


Features not Supported by Vector RX PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some features are not supported by vPMD, which gives them up in favor of
higher throughput. They are:

*   IEEE1588

*   Flow director

*   RX checksum offload

Other features are supported using optional MACRO configuration. They include:

*   HW VLAN strip

*   L3/L4 packet type

To enable these features through ``RX_OLFLAGS``, set
``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.

To guarantee these constraints, the following capabilities in
``dev_conf.rxmode.offloads`` are checked:

*   ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND``

*   ``RTE_ETH_RX_OFFLOAD_CHECKSUM``


RX Burst Size
^^^^^^^^^^^^^

As vPMD is focused on high throughput, it processes 4 packets at a time, so it
assumes that the RX burst is at least 4 packets. The receive handler returns
zero if ``nb_pkt`` < 4. If ``nb_pkt`` is not a multiple of 4, it is rounded
down to the nearest multiple of 4.
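
The rounding rule above can be sketched in plain C. The macro ``VEC_RX_STEP``
and the function ``vec_rx_effective_burst()`` are hypothetical names used only
for illustration. They are not taken from the fm10k driver source.

.. code-block:: c

   #include <stdint.h>

   /* Packets handled per vector iteration, as stated in the text above. */
   #define VEC_RX_STEP 4

   /*
    * Number of packets a vector receive call would attempt for a requested
    * burst: 0 when the request is below one step, otherwise the request
    * rounded down to a multiple of VEC_RX_STEP.
    */
   uint16_t vec_rx_effective_burst(uint16_t nb_pkt)
   {
       /* VEC_RX_STEP is a power of 2, so clearing the low bits floors nb_pkt. */
       return (uint16_t)(nb_pkt & ~(VEC_RX_STEP - 1));
   }

Under this rule a request for 3 packets yields 0, a request for 7 yields 4,
and a request for 32 is unchanged, so an application would normally pass a
burst size that is a multiple of 4.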


TX Constraint
~~~~~~~~~~~~~

Features not Supported by TX Vector PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TX vPMD only works when ``offloads`` is set to 0.

This means that it does not support any TX offload.

Limitations
-----------


Switch manager
~~~~~~~~~~~~~~

The Intel FM10000 family of NICs integrates a hardware switch and multiple host
interfaces. The FM10000 PMD only manages host interfaces. For the
switch component, another switch driver has to be loaded prior to the
FM10000 PMD. The switch driver can be acquired from Intel support.
Only Testpoint is validated with DPDK; the latest version that has been
validated with DPDK is 4.1.6.

Support for Switch Restart
~~~~~~~~~~~~~~~~~~~~~~~~~~

For FM10000 multi-host based designs, a DPDK app running in the VM or host needs
to be aware of the switch's state, since the switch may undergo a quit-restart.
When the switch goes down, the DPDK app receives an LSC event indicating link
status down, and the app should stop the worker threads that are polling on
the Rx/Tx queues. When the switch comes up, an LSC event indicating ``LINK_UP``
is sent to the app, which can then restart the FM10000 port to resume network
processing.

CRC stripping
~~~~~~~~~~~~~

The FM10000 family of NICs strips the CRC from every packet coming into the
host interface, so keeping the CRC is not supported.

Maximum packet length
~~~~~~~~~~~~~~~~~~~~~

The FM10000 family of NICs supports a maximum of a 15K jumbo frame. The value
is fixed and cannot be changed. So, even when the ``rxmode.mtu``
member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
up to 15364 bytes can still reach the host interface.

Statistic Polling Frequency
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FM10000 NICs expose a set of statistics via the PCI BARs. These statistics
are read from the hardware registers when ``rte_eth_stats_get()`` or
``rte_eth_xstats_get()`` is called. The packet counting registers are 32 bits
wide, while the byte counting registers are 48 bits wide. As a result, the
statistics must be polled regularly to keep the returned values consistent.

Over a PCIe Gen3 x8 link, about 50 Gbps of traffic can occur. With 64-byte
packets this gives almost 100 million packets/second, which overflows a 32-bit
counter after approximately 40 seconds. To ensure these overflows are detected
and accounted for in the statistics, the statistics must be read regularly.
Reading them every 20 seconds is suggested, which ensures that they are
accurate.
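
The arithmetic above, and the reason a wrap can only be accounted for when
polling is frequent enough, can be sketched in plain C. This is an
illustration under stated assumptions and not the fm10k driver's
implementation: ``wrap_seconds_32()``, ``struct soft_counter`` and
``soft_counter_update()`` are hypothetical names.

.. code-block:: c

   #include <stdint.h>

   /* Seconds until a 32-bit packet counter wraps at a given packet rate. */
   double wrap_seconds_32(double packets_per_second)
   {
       return 4294967296.0 / packets_per_second; /* 2^32 counts */
   }

   /* Software 64-bit counter extended from a wrapping 32-bit hardware value. */
   struct soft_counter {
       uint64_t total; /* accumulated count */
       uint32_t last;  /* raw value seen at the previous poll */
   };

   /*
    * Fold a new raw reading into the 64-bit total. Unsigned subtraction is
    * modulo 2^32, so the delta is correct across a single wrap, but only if
    * the counter is polled more often than it can wrap.
    */
   void soft_counter_update(struct soft_counter *c, uint32_t raw)
   {
       c->total += (uint32_t)(raw - c->last);
       c->last = raw;
   }

At 100 million packets/second, ``wrap_seconds_32(100e6)`` is about 43 seconds,
which matches the approximate figure above. Polling every 20 seconds therefore
leaves roughly a factor of two of margin before a second wrap could go
unnoticed.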


Interrupt mode
~~~~~~~~~~~~~~

The FM10000 family of NICs needs a separate interrupt for the mailbox, so only
drivers that support multiple interrupt vectors, e.g. vfio-pci, can be used
for fm10k interrupt mode.