..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2015-2016 Intel Corporation.

FM10K Poll Mode Driver
======================

The FM10K poll mode driver library provides support for the Intel FM10000
(FM10K) family of 40GbE/100GbE adapters.

FTAG Based Forwarding of FM10K
------------------------------

FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
supports the addition of a Fabric Tag (FTAG) to carry special information.
The FTAG is placed at the beginning of the frame. It contains information
such as the packet's source and destination, and the VLAN tag. In FTAG based
forwarding mode, the switch logic forwards packets according to glort (global
resource tag) information, rather than the MAC and VLAN table. Currently this
feature works only on the PF.

To enable this feature, the user should pass a devargs parameter to the EAL,
such as ``-a 84:00.0,enable_ftag=1``, and the application should make sure an
appropriate FTAG is inserted for every frame on the TX side.

Vector PMD for FM10K
--------------------

Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
It improves load/store bandwidth efficiency of the L1 data cache by using a
wider SSE/AVX register.
The wider register gives space to hold multiple packet buffers so as to save
on the number of instructions when bulk processing packets.

There is no change to the PMD API. The RX/TX handlers are the only two entries for
vPMD packet I/O. They are transparently registered at runtime for RX/TX execution
if all required conditions are met.

Some constraints apply as pre-conditions for specific optimizations on bulk
packet transfers. The following sections explain RX and TX constraints in the
vPMD.


RX Constraints
~~~~~~~~~~~~~~


Prerequisites and Pre-conditions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For Vector RX it is assumed that the number of descriptors in the ring is a
power of 2. With this pre-condition, the ring pointer can easily scroll back to
the head after hitting the tail without a conditional check. In addition, Vector
RX can use this assumption to do a bit mask using ``ring_size - 1``.


Features not Supported by Vector RX PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some features are not supported when trying to increase the throughput in
vPMD.
They are:

* IEEE1588

* Flow director

* RX checksum offload

Other features are supported using optional MACRO configuration. They include:

* HW VLAN strip

* L3/L4 packet type

To enable via ``RX_OLFLAGS`` use ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.

To guarantee the constraint, the following capabilities in ``dev_conf.rxmode.offloads``
will be checked:

* ``RTE_ETH_RX_OFFLOAD_VLAN_EXTEND``

* ``RTE_ETH_RX_OFFLOAD_CHECKSUM``


RX Burst Size
^^^^^^^^^^^^^

As vPMD is focused on high throughput, it processes 4 packets at a time. So it assumes
that the RX burst is at least 4 packets per burst. It returns zero if
``nb_pkt`` < 4 in the receive handler. If ``nb_pkt`` is not a multiple of 4, a
floor alignment will be applied.
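
The two arithmetic rules described for Vector RX (floor alignment of the burst
size to a multiple of 4, and index wrap-around with ``ring_size - 1`` as a bit
mask) can be sketched as follows. This is only an illustration under stated
assumptions: the names ``VPMD_PKTS_PER_LOOP``, ``vpmd_floor_align_burst`` and
``vpmd_ring_advance`` are hypothetical and are not taken from the driver source.

```c
#include <stdint.h>

/* Hypothetical constant: the text states that vPMD handles 4 packets at a time. */
#define VPMD_PKTS_PER_LOOP 4u

/*
 * Floor-align a requested burst size to a multiple of 4.
 * A request smaller than 4 becomes 0, matching the "returns zero" rule.
 */
static inline uint16_t vpmd_floor_align_burst(uint16_t nb_pkt)
{
    return (uint16_t)(nb_pkt & ~(VPMD_PKTS_PER_LOOP - 1u));
}

/*
 * Advance a ring index by n entries. Because ring_size is assumed to be a
 * power of 2, masking with (ring_size - 1) wraps the index back to the head
 * without a conditional check.
 */
static inline uint16_t vpmd_ring_advance(uint16_t idx, uint16_t n,
                                         uint16_t ring_size)
{
    return (uint16_t)((idx + n) & (ring_size - 1u));
}
```

For example, a request for 7 packets is treated as a request for 4, and
advancing index 510 by 4 in a 512-entry ring yields index 2. An application
that wants every requested slot to be usable would typically pass a burst size
that is already a multiple of 4.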


TX Constraint
~~~~~~~~~~~~~

Features not Supported by TX Vector PMD
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TX vPMD only works when ``offloads`` is set to 0.

This means that it does not support any TX offload.

Limitations
-----------


Switch manager
~~~~~~~~~~~~~~

The Intel FM10000 family of NICs integrates a hardware switch and multiple host
interfaces. The FM10000 PMD only manages host interfaces. For the
switch component, another switch driver has to be loaded prior to the
FM10000 PMD. The switch driver can be acquired from Intel support.
Only Testpoint is validated with DPDK; the latest version that has been
validated with DPDK is 4.1.6.

Support for Switch Restart
~~~~~~~~~~~~~~~~~~~~~~~~~~

In an FM10000 multi-host design, a DPDK app running in the VM or host needs
to be aware of the switch's state, since the switch may quit and restart. When
the switch goes down, the DPDK app will receive an LSC event indicating link
status down, and the app should stop the worker threads that are polling on
the Rx/Tx queues.
When the switch comes back up, an LSC event indicating ``LINK_UP`` is
sent to the app, which can then restart the FM10000 port to resume network
processing.

CRC stripping
~~~~~~~~~~~~~

The FM10000 family of NICs strips the CRC from every packet coming into the
host interface. So, keeping the CRC is not supported.

Maximum packet length
~~~~~~~~~~~~~~~~~~~~~

The FM10000 family of NICs supports a maximum frame size of 15K (jumbo frame).
The value is fixed and cannot be changed. So, even when the ``rxmode.mtu``
member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
up to 15364 bytes can still reach the host interface.

Statistic Polling Frequency
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FM10000 NICs expose a set of statistics via the PCI BARs. These statistics
are read from the hardware registers when ``rte_eth_stats_get()`` or
``rte_eth_xstats_get()`` is called. The packet counting registers are 32 bits
while the byte counting registers are 48 bits. As a result, the statistics must
be polled regularly in order to ensure the consistency of the returned reads.

Given the PCIe Gen3 x8, about 50Gbps of traffic can occur.
With 64 byte packets
this gives almost 100 million packets/second, causing a 32 bit integer overflow
after approximately 40 seconds. To ensure these overflows are detected and
accounted for in the statistics, it is necessary to read the statistics
regularly. It is suggested to read stats every 20 seconds, which will ensure
the statistics are accurate.


Interrupt mode
~~~~~~~~~~~~~~

The FM10000 family of NICs needs one separate interrupt for the mailbox. So only
drivers which support multiple interrupt vectors, e.g. vfio-pci, can work
for fm10k interrupt mode.
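
As a worked check of the figures given under Statistic Polling Frequency, the
sketch below computes how long a 32 bit packet counter takes to wrap at a given
rate, and shows one way a wrapped reading can be folded into a 64 bit total.
The helper names are hypothetical and this is not the driver's implementation.
The accumulation is only correct if the counter wraps at most once between two
reads, which is the reason for polling more often than the wrap period.

```c
#include <stdint.h>

/*
 * Seconds until a 32-bit packet counter wraps, for a given bit rate and
 * frame size. Framing overhead is ignored, as in the estimate in the text.
 */
static inline double counter_wrap_seconds(double bits_per_sec,
                                          unsigned int frame_bytes)
{
    double pkts_per_sec = bits_per_sec / (8.0 * frame_bytes);

    return 4294967296.0 / pkts_per_sec; /* 2^32 packets */
}

/*
 * Fold a new 32-bit hardware reading into a 64-bit software total.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the
 * previous and current reading still yields the correct delta.
 */
static inline uint64_t accumulate_u32(uint64_t total, uint32_t prev,
                                      uint32_t now)
{
    return total + (uint32_t)(now - prev);
}
```

With 50 Gbps and 64 byte packets, ``counter_wrap_seconds(50e9, 64)`` evaluates
to roughly 44 seconds, consistent with the "approximately 40 seconds" quoted
above, so the suggested 20 second polling interval leaves about a factor of two
in margin.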