..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2015 Intel Corporation.

Link Bonding Poll Mode Driver Library
=====================================

In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
DPDK also includes a pure-software library that
allows physical PMDs to be bonded together to create a single logical PMD.

.. figure:: img/bond-overview.*

   Bonding PMDs


The Link Bonding PMD library (librte_net_bond) supports bonding of groups of
``rte_eth_dev`` ports of the same speed and duplex. It provides capabilities
similar to those of the Linux bonding driver, allowing the aggregation
of multiple (member) NICs into a single logical interface between a server
and a switch. The new bonding PMD then processes these interfaces based on
the specified mode of operation to provide features such as
redundant links, fault tolerance and/or load balancing.

The librte_net_bond library exports a C API for the
creation of bonding devices as well as the configuration and management of the
bonding device and its member devices.

.. note::

    The Link Bonding PMD Library is enabled by default in the build
    configuration. The library can be disabled using the meson option
    "-Ddisable_drivers=net/bonding".


Link Bonding Modes Overview
---------------------------

Currently the Link Bonding PMD library supports the following modes of operation:

*   **Round-Robin (Mode 0):**

.. figure:: img/bond-mode-0.*

   Round-Robin (Mode 0)


    This mode provides load balancing and fault tolerance by transmission of
    packets in sequential order from the first available member device through
    the last. Packets are bulk dequeued from devices then serviced in a
    round-robin manner. This mode does not guarantee in-order reception of
    packets, and downstream should be able to handle out-of-order packets.

*   **Active Backup (Mode 1):**

.. figure:: img/bond-mode-1.*

   Active Backup (Mode 1)


    In this mode only one member in the bond is active at any time. A different
    member becomes active if, and only if, the primary active member fails,
    thereby providing fault tolerance to member failure.
    The single logical bonding interface's MAC address is externally visible
    on only one NIC (port) to avoid confusing the network switch.

*   **Balance XOR (Mode 2):**

.. figure:: img/bond-mode-2.*

   Balance XOR (Mode 2)


    This mode provides transmit load balancing (based on the selected
    transmission policy) and fault tolerance. The default policy (layer 2) uses
    a simple calculation based on the packet flow source and destination MAC
    addresses as well as the number of active members available to the bonding
    device to classify the packet to a specific member to transmit on. The
    alternative transmission policies are layer 2+3, which also takes the IP
    source and destination addresses into the calculation of the transmit
    member port, and layer 3+4, which uses the IP source and destination
    addresses as well as the TCP/UDP source and destination ports.

.. note::
    The different packet colors identify the flow classification calculated
    by the selected transmit policy.


*   **Broadcast (Mode 3):**

.. figure:: img/bond-mode-3.*

   Broadcast (Mode 3)


    This mode provides fault tolerance by transmission of packets on all member
    ports.

*   **Link Aggregation 802.3AD (Mode 4):**

.. figure:: img/bond-mode-4.*

   Link Aggregation 802.3AD (Mode 4)


    This mode provides dynamic link aggregation according to the 802.3ad
    specification. It negotiates and monitors aggregation groups that share the
    same speed and duplex settings using the selected balance transmit policy
    for balancing outgoing traffic.

    The DPDK implementation of this mode places some additional requirements
    on the application.

    #. It needs to call ``rte_eth_tx_burst`` and ``rte_eth_rx_burst`` at
       intervals of less than 100ms.

    #. Calls to ``rte_eth_tx_burst`` must have a buffer size of at least 2xN,
       where N is the number of members. This space is required for LACP
       frames. Additionally, LACP packets are included in the statistics, but
       they are not returned to the application.

*   **Transmit Load Balancing (Mode 5):**

.. figure:: img/bond-mode-5.*

   Transmit Load Balancing (Mode 5)


    This mode provides adaptive transmit load balancing. It dynamically
    changes the transmitting member according to the computed load. Statistics
    are collected in 100ms intervals and scheduled every 10ms.


Implementation Details
----------------------

The librte_net_bond bonding device is compatible with the Ethernet device API
exported by the Ethernet PMDs described in the *DPDK API Reference*.

The Link Bonding Library supports the creation of bonding devices at application
startup time during EAL initialization using the ``--vdev`` option as well as
programmatically via the C API ``rte_eth_bond_create`` function.

Bonding devices support the dynamic addition and removal of member devices using
the ``rte_eth_bond_member_add`` / ``rte_eth_bond_member_remove`` APIs.

After a member device is added to a bonding device, the member is stopped using
``rte_eth_dev_stop`` and then reconfigured using ``rte_eth_dev_configure``.
The RX and TX queues are also reconfigured using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup`` with the parameters used to configure the bonding
device. If RSS is enabled for the bonding device, this mode is also enabled on
the new member and configured as well.
Any flow configured on the bond device is also configured on the added
member.

Setting the multi-queue mode of a bonding device to RSS makes it fully
RSS-capable, so all members are synchronized with its configuration. This mode is
intended to provide RSS configuration on members transparently to the client
application implementation.

The bonding device stores its own version of the RSS settings, i.e. RETA, RSS
hash function and RSS key, used to set up its members. This defines the RSS
configuration of the bonding device as the desired configuration of the whole
bond (as one unit), without referring to any individual member. This is
required to ensure consistency and makes the configuration less error-prone.

The RSS hash function set for the bonding device is the maximal set of RSS hash
functions supported by all bonding members. The RETA size is the GCD of all
member RETA sizes, so it can be easily used as a pattern providing the expected
behavior, even if member RETA sizes are different. If the RSS key is not set
for the bonding device, it is not changed on the members and the default key
for each device is used.

As with the RSS configuration, flow consistency is maintained across the
bonding members for the following rte flow operations:

Validate:
    - Validate the flow for each member; failure for at least one member
      causes the bond validation to fail.

Create:
    - Create the flow in all members.
    - Save all the flow objects created on the members in the bonding internal
      flow structure.
    - Failure in flow creation for an existing member rejects the flow.
    - Failure in flow creation for a new member at member adding time rejects
      the member.

Destroy:
    - Destroy the flow in all members and release the bond internal flow
      memory.

Flush:
    - Destroy all the bonding PMD flows in all the members.

.. note::

    Do not call flush on members directly. It destroys all the member flows,
    which may include external flows or the bond internal LACP flow.

Query:
    - Summarize flow counters from all the members, relevant only for
      ``RTE_FLOW_ACTION_TYPE_COUNT``.

Isolate:
    - Call flow isolate for all members.
    - Failure in flow isolation for an existing member rejects the isolate mode.
    - Failure in flow isolation for a new member at member adding time rejects
      the member.

All settings are managed through the bonding port API and are always propagated
in one direction (from bonding to members).
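
The RSS rules described earlier in this section (hash functions supported by
all members, RETA size as the GCD of the member RETA sizes) can be illustrated
with a minimal, self-contained sketch. It is not the driver's code:
``struct member_rss_caps`` and the function names are hypothetical, and the
hash function set is modeled as a plain bitmask.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical per-member RSS capabilities; not a DPDK structure. */
    struct member_rss_caps {
        uint64_t hash_funcs; /* bitmask of supported RSS hash functions */
        uint16_t reta_size;  /* redirection table (RETA) size */
    };

    /* Greatest common divisor (Euclid's algorithm). */
    static uint16_t gcd_u16(uint16_t a, uint16_t b)
    {
        while (b != 0) {
            uint16_t t = (uint16_t)(a % b);
            a = b;
            b = t;
        }
        return a;
    }

    /* Hash functions usable by the bond: those supported by every member. */
    static uint64_t bond_common_hash_funcs(const struct member_rss_caps *m,
                                           size_t n)
    {
        uint64_t common = UINT64_MAX;

        for (size_t i = 0; i < n; i++)
            common &= m[i].hash_funcs;
        return n == 0 ? 0 : common;
    }

    /* RETA size of the bond: GCD of all member RETA sizes. */
    static uint16_t bond_reta_size(const struct member_rss_caps *m, size_t n)
    {
        uint16_t size = 0;

        for (size_t i = 0; i < n; i++)
            size = gcd_u16(size, m[i].reta_size);
        return size;
    }

Because the resulting RETA size divides every member's RETA size, the bond's
table can be repeated as a pattern to fill each member's table.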

Link Status Change Interrupts / Polling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices support the registration of a link status change callback
using the ``rte_eth_dev_callback_register`` API. The callback is called when the
status of the bonding device changes. For example, in the case of a bonding
device which has 3 members, the link status will change to up when one member
becomes active or change to down when all members become inactive. There is no
callback notification when a single member changes state and the previous
conditions are not met. If a user wishes to monitor individual members then they
must register callbacks with those members directly.

The link bonding library also supports devices which do not implement link
status change interrupts. This is achieved by polling the device's link status at
a defined period which is set using the ``rte_eth_bond_link_monitoring_set``
API. The default polling interval is 10ms. When a device is added as a member to
a bonding device, the ``RTE_PCI_DRV_INTR_LSC`` flag is used to determine
whether the device supports interrupts or whether the link status should be
monitored by polling it.
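
The notification rule above (up when the first member becomes active, down when
the last one becomes inactive, nothing in between) can be modeled with a small
sketch. This is an illustration under simplifying assumptions, not the PMD's
implementation; ``struct bond_link_model`` and ``bond_member_link_change`` are
hypothetical names.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model of a bonding device's link state; not DPDK code. */
    struct bond_link_model {
        bool member_up[8];  /* per-member link state */
        size_t n_members;   /* number of members in use (at most 8) */
        bool bond_up;       /* aggregated state seen by the application */
        unsigned callbacks; /* callbacks that would have been invoked */
    };

    /*
     * Record a member link change. Returns true only when the aggregated
     * state changes, which is when a bond-level callback would fire.
     */
    static bool bond_member_link_change(struct bond_link_model *b,
                                        size_t idx, bool up)
    {
        bool any_up = false;

        if (idx >= b->n_members)
            return false;
        b->member_up[idx] = up;
        for (size_t i = 0; i < b->n_members; i++)
            any_up = any_up || b->member_up[i];

        if (any_up == b->bond_up)
            return false; /* aggregate unchanged: no notification */
        b->bond_up = any_up;
        b->callbacks++;
        return true;
    }

With three members, bringing up a second member or taking down one of two
active members returns ``false`` in this model, matching the case where no
callback is delivered.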

Requirements / Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~

The current implementation only allows devices that support the same speed
and duplex to be added as members to the same bonding device. The bonding device
inherits these attributes from the first active member added to the bonding
device, and all further members added to the bonding device must support
these parameters.

A bonding device must have a minimum of one member before the bonding device
itself can be started.

To use the dynamic RSS configuration feature of a bonding device effectively,
all members must also be RSS-capable and support at least one
hash function common to all of them. Changing the RSS key is only
possible when all member devices support the same key size.

To prevent inconsistency in how members process packets, once a device is added
to a bonding device, RSS and rte flow configurations should be managed through
the bonding device API, and not directly on the member.

As with all other PMDs, all functions exported by a PMD are lock-free functions
that are assumed not to be invoked in parallel on different logical cores to
work on the same target object.

It should also be noted that the PMD receive function should not be invoked
directly on member devices after they have been added to a bonding device, since
packets read directly from the member device will no longer be available to the
bonding device to read.

Configuration
~~~~~~~~~~~~~

Link bonding devices are created using the ``rte_eth_bond_create`` API
which requires a unique device name, the bonding mode,
and the socket Id to allocate the bonding device's resources on.
The other configurable parameters for a bonding device are its member devices,
its primary member, a user defined MAC address and the transmission policy to
use if the device is in balance XOR mode.

Member Devices
^^^^^^^^^^^^^^

Bonding devices support up to a maximum of ``RTE_MAX_ETHPORTS`` member devices
of the same speed and duplex. An Ethernet device can be added as a member to a
maximum of one bonding device. Member devices are reconfigured with the
configuration of the bonding device on being added to a bonding device.

The bonding device also guarantees to return the MAC address of the member
device to its original value on removal of a member from it.
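
The member constraints above can be summarized as a simple admission check.
The sketch below is a simplified model, not the library's logic: the types,
the function name and the small ``MODEL_MAX_MEMBERS`` limit (a stand-in for
``RTE_MAX_ETHPORTS``) are all hypothetical, and the first member added is
assumed to define the speed and duplex of the bond.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MODEL_MAX_MEMBERS 4 /* stand-in for RTE_MAX_ETHPORTS */

    /* Hypothetical port description; not a DPDK structure. */
    struct port_model {
        uint32_t speed_mbps;
        bool full_duplex;
        bool in_bond; /* a port may belong to at most one bond */
    };

    struct bond_model {
        struct port_model *members[MODEL_MAX_MEMBERS];
        size_t n_members;
    };

    /* Accept a port only if it satisfies the member constraints. */
    static bool bond_model_add(struct bond_model *b, struct port_model *p)
    {
        if (b->n_members >= MODEL_MAX_MEMBERS || p->in_bond)
            return false;
        if (b->n_members > 0) {
            const struct port_model *first = b->members[0];

            /* Later members must match the first member's attributes. */
            if (p->speed_mbps != first->speed_mbps ||
                p->full_duplex != first->full_duplex)
                return false;
        }
        b->members[b->n_members++] = p;
        p->in_bond = true;
        return true;
    }

In this model a 1G port is rejected once a 10G port is in the bond, and a port
that already belongs to a bond cannot be added again.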

Primary Member
^^^^^^^^^^^^^^

The primary member is used to define the default port to use when a bonding
device is in active backup mode. A different port will be used if, and
only if, the current primary port goes down. If the user does not specify a
primary port, it defaults to the first port added to the bonding device.

MAC Address
^^^^^^^^^^^

The bonding device can be configured with a user specified MAC address. This
address will be inherited by some or all member devices depending on the
operating mode. If the device is in active backup mode then only the primary
device will have the user specified MAC; all other members will retain their
original MAC address. In modes 0, 2, 3 and 4 all member devices are configured
with the bonding device's MAC address.

If a user defined MAC address is not defined then the bonding device will
default to using the primary member's MAC address.

Balance XOR Transmit Policies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are 3 supported transmission policies for a bonding device running in
Balance XOR mode: Layer 2, Layer 2+3 and Layer 3+4.

*   **Layer 2:** Ethernet MAC address based balancing is the default
    transmission policy for Balance XOR bonding mode. It uses a simple XOR
    calculation on the source MAC address and destination MAC address of the
    packet and then takes the modulus of this value to select the member
    device to transmit the packet on.

*   **Layer 2 + 3:** Ethernet MAC address & IP Address based balancing uses a
    combination of source/destination MAC addresses and the source/destination
    IP addresses of the data packet to decide which member port the packet will
    be transmitted on.

*   **Layer 3 + 4:** IP Address & UDP Port based balancing uses a combination
    of source/destination IP addresses and the source/destination UDP ports of
    the data packet to decide which member port the packet will be
    transmitted on.

All these policies support 802.1Q VLAN Ethernet packets, as well as IPv4, IPv6
and UDP protocols for load balancing.

Using Link Bonding Devices
--------------------------

The librte_net_bond library supports two modes of device creation: using the
library's full C API, or using the EAL command line to statically configure link
bonding devices at application startup.
Using the EAL option, it is possible to
use link bonding functionality transparently without specific knowledge of the
library's API. This can be used, for example, to add bonding functionality,
such as active backup, to an existing application which has no knowledge of
the link bonding C API.

Using the Poll Mode Driver from an Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the librte_net_bond library's API it is possible to dynamically create
and manage link bonding devices from within any application. Link bonding
devices are created using the ``rte_eth_bond_create`` API which requires a
unique device name, the link bonding mode to initialize the device in and
finally the socket Id on which to allocate the device's resources. After
successful creation, a bonding device must be configured using the generic
Ethernet device configure API ``rte_eth_dev_configure``, and then the RX and TX
queues which will be used must be set up using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup``.

Member devices can be dynamically added to and removed from a link bonding
device using the ``rte_eth_bond_member_add`` / ``rte_eth_bond_member_remove``
APIs, but at least one member device must be added to the link bonding device
before it can be started using ``rte_eth_dev_start``.

The link status of a bonding device is dictated by that of its members. If the
link status of all member devices is down, or if all members are removed from
the link bonding device, then the link status of the bonding device will go down.

It is also possible to configure / query the configuration of the control
parameters of a bonding device using the provided APIs
``rte_eth_bond_mode_set/get``, ``rte_eth_bond_primary_set/get``,
``rte_eth_bond_mac_set/reset`` and ``rte_eth_bond_xmit_policy_set/get``.

Using Link Bonding Devices from the EAL Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices can be created at application startup time using the
``--vdev`` EAL command line option. The device name must start with the
net_bonding prefix followed by numbers or letters. The name must be unique for
each device. Each device can have multiple options arranged in a comma
separated list. Multiple device definitions can be arranged by calling the
``--vdev`` option multiple times.

Device names and bonding options must be separated by commas as shown below:

.. code-block:: console

    ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 --vdev 'net_bonding0,bond_opt0=..,bond_opt1=..' --vdev 'net_bonding1,bond_opt0=..,bond_opt1=..'
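
An application that launches DPDK programs may want to assemble such a
``--vdev`` argument itself. The following is a minimal sketch of one way to do
that with standard C string formatting. The helper ``bond_vdev_arg`` is
hypothetical and not part of DPDK; it only emits the ``mode`` and ``member``
options documented under Link Bonding EAL Options.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Build "name,mode=M,member=A,member=B,..." into buf.
     * Returns the string length, or -1 if buf is too small.
     */
    static int bond_vdev_arg(char *buf, size_t len, const char *name, int mode,
                             const char *const *members, size_t n_members)
    {
        int n = snprintf(buf, len, "%s,mode=%d", name, mode);
        size_t used;

        if (n < 0 || (size_t)n >= len)
            return -1;
        used = (size_t)n;

        /* Append one member=<PCI address> option per member device. */
        for (size_t i = 0; i < n_members; i++) {
            n = snprintf(buf + used, len - used, ",member=%s", members[i]);
            if (n < 0 || (size_t)n >= len - used)
                return -1;
            used += (size_t)n;
        }
        return (int)used;
    }

Called with the name ``net_bonding0``, mode 1 and the members
``0000:0a:00.0`` and ``0000:0a:00.1``, the helper produces
``net_bonding0,mode=1,member=0000:0a:00.0,member=0000:0a:00.1``, which can be
passed after ``--vdev``.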

Link Bonding EAL Options
^^^^^^^^^^^^^^^^^^^^^^^^

The options can be combined in multiple ways as long as the following rules
are respected:

*   A unique device name, in the format of net_bondingX is provided,
    where X can be any combination of numbers and/or letters,
    and the name is no greater than 32 characters long.

*   At least one member device is provided for each bonding device definition.

*   The operation mode of the bonding device being created is provided.

The different options are:

*   mode: Integer value defining the bonding mode of the device.
    Currently supports modes 0,1,2,3,4,5 (round-robin, active backup, balance,
    broadcast, link aggregation, transmit load balancing).

.. code-block:: console

    mode=2

*   member: Defines the PMD device which will be added as a member to the
    bonding device. This option can be given multiple times, once for each
    device to be added as a member. Physical devices should be specified using
    their PCI address, in the format domain:bus:devid.function

.. code-block:: console

    member=0000:0a:00.0,member=0000:0a:00.1

*   primary: Optional parameter which defines the primary member port. It
    is used in active backup mode to select the primary member for data TX/RX if
    it is available. The primary port is also used to select the MAC address to
    use when it is not defined by the user. This defaults to the first member
    added to the device if it is not specified. The primary device must be a
    member of the bonding device.

.. code-block:: console

    primary=0000:0a:00.0

*   socket_id: Optional parameter used to select which socket on a NUMA device
    the bonding device's resources will be allocated on.

.. code-block:: console

    socket_id=0

*   mac: Optional parameter to select a MAC address for the link bonding
    device. This overrides the value of the primary member device.

.. code-block:: console

    mac=00:1e:67:1d:fd:1d

*   xmit_policy: Optional parameter which defines the transmission policy when
    the bonding device is in balance mode. If not specified by the user this
    defaults to l2 (layer 2) forwarding. The other transmission policies
    available are l23 (layer 2+3) and l34 (layer 3+4).

.. code-block:: console

    xmit_policy=l23

*   lsc_poll_period_ms: Optional parameter which defines the polling interval
    in milliseconds at which devices which don't support LSC interrupts are
    checked for a change in the device's link status.

.. code-block:: console

    lsc_poll_period_ms=100

*   up_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to up. By default this
    parameter is zero.

.. code-block:: console

    up_delay=10

*   down_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to down. By default this
    parameter is zero.

.. code-block:: console

    down_delay=50

Examples of Usage
^^^^^^^^^^^^^^^^^

Create a bonding device in round robin mode with two members specified by their PCI address:

.. code-block:: console

    ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 --vdev 'net_bonding0,mode=0,member=0000:0a:00.01,member=0000:04:00.00' -- --port-topology=chained

Create a bonding device in round robin mode with two members specified by their PCI address and an overriding MAC address:

.. code-block:: console

    ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 --vdev 'net_bonding0,mode=0,member=0000:0a:00.01,member=0000:04:00.00,mac=00:1e:67:1d:fd:1d' -- --port-topology=chained

Create a bonding device in active backup mode with two members and a primary member, all specified by their PCI addresses:

.. code-block:: console

    ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 --vdev 'net_bonding0,mode=1,member=0000:0a:00.01,member=0000:04:00.00,primary=0000:0a:00.01' -- --port-topology=chained

Create a bonding device in balance mode with two members specified by their PCI addresses, and a transmission policy of layer 3 + 4 forwarding:

.. code-block:: console

    ./<build_dir>/app/dpdk-testpmd -l 0-3 -n 4 --vdev 'net_bonding0,mode=2,member=0000:0a:00.01,member=0000:04:00.00,xmit_policy=l34' -- --port-topology=chained

.. _bonding_testpmd_commands:

Testpmd driver specific commands
--------------------------------

Some bonding driver specific features are integrated in testpmd.

create bonding device
~~~~~~~~~~~~~~~~~~~~~

Create a new bonding device::

   testpmd> create bonding device (mode) (socket)

For example, to create a bonding device in mode 1 on socket 0::

   testpmd> create bonding device 1 0
   created new bonding device (port X)

add bonding member
~~~~~~~~~~~~~~~~~~

Add an Ethernet device to a Link Bonding device::

   testpmd> add bonding member (member id) (port id)

For example, to add an Ethernet device (port 6) to a Link Bonding device (port 10)::

   testpmd> add bonding member 6 10


remove bonding member
~~~~~~~~~~~~~~~~~~~~~

Remove an Ethernet member device from a Link Bonding device::

   testpmd> remove bonding member (member id) (port id)

For example, to remove an Ethernet member device (port 6) from a Link Bonding device (port 10)::

   testpmd> remove bonding member 6 10

set bonding mode
~~~~~~~~~~~~~~~~

Set the Link Bonding mode of a Link Bonding device::

   testpmd> set bonding mode (value) (port id)

For example, to set the bonding mode of a Link Bonding device (port 10) to broadcast (mode 3)::

   testpmd> set bonding mode 3 10

set bonding primary
~~~~~~~~~~~~~~~~~~~

Set an Ethernet member device as the primary device on a Link Bonding device::

   testpmd> set bonding primary (member id) (port id)

For example, to set the Ethernet member device (port 6) as the primary port of a Link Bonding device (port 10)::

   testpmd> set bonding primary 6 10

set bonding mac
~~~~~~~~~~~~~~~

Set the MAC address of a Link Bonding device::

   testpmd> set bonding mac (port id) (mac)

For example, to set the MAC address of a Link Bonding device (port 10) to 00:00:00:00:00:01::

   testpmd> set bonding mac 10 00:00:00:00:00:01

set bonding balance_xmit_policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Set the transmission policy for a Link Bonding device when it is in Balance XOR mode::

   testpmd> set bonding balance_xmit_policy (port_id) (l2|l23|l34)

For example, to set a Link Bonding device (port 10) to use a balance policy of layer 3+4 (IP addresses and UDP ports)::

   testpmd> set bonding balance_xmit_policy 10 l34
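
As a rough illustration of what a layer 3+4 balance policy means for traffic,
the sketch below hashes the IPv4 addresses and L4 ports of a flow and reduces
the result modulo the number of members. It is a simplified Python model, not
the driver's code: the function name ``l34_member_index`` and the exact folding
steps are assumptions made for this example, and the hash used by the bonding
PMD may differ.

.. code-block:: python

    import ipaddress


    def l34_member_index(src_ip, dst_ip, src_port, dst_port, member_count):
        """Pick a member index for a flow from its IPv4 addresses and L4 ports.

        Hypothetical helper for illustration only; the folding steps are an
        assumption and may differ from what the bonding driver computes.
        """
        if member_count <= 0:
            raise ValueError("member_count must be positive")
        # Layer 3 contribution: XOR of both IPv4 addresses as 32-bit integers.
        l3 = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
        # Layer 4 contribution: XOR of the source and destination ports.
        l4 = src_port ^ dst_port
        flow_hash = l3 ^ l4
        # Fold the upper bits down so they influence the bits used by the modulo.
        flow_hash ^= flow_hash >> 16
        flow_hash ^= flow_hash >> 8
        return flow_hash % member_count


    if __name__ == "__main__":
        members = [1, 3, 4]  # example member port ids
        idx = l34_member_index("192.168.0.10", "192.168.0.20", 5000, 53, len(members))
        print("flow is transmitted on member port", members[idx])

In this sketch the result depends only on header fields that are identical for
every packet of a flow, so all packets of one flow map to the same member,
while different flows may be spread across different members.
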


set bonding mon_period
~~~~~~~~~~~~~~~~~~~~~~

Set the link status monitoring polling period in milliseconds for a bonding device.

This adds support for PMD member devices which do not support link status interrupts.
When the mon_period is set to a value greater than 0, all PMDs which do not support
link status interrupts are queried every polling interval to check whether their link status has changed::

   testpmd> set bonding mon_period (port_id) (value)

For example, to set the link status monitoring polling period of a bonding device (port 5) to 150ms::

   testpmd> set bonding mon_period 5 150


set bonding lacp dedicated_queues
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Enable dedicated tx/rx queues on the members of a bonding device to handle LACP control plane traffic
when in mode 4 (link-aggregation-802.3ad)::

   testpmd> set bonding lacp dedicated_queues (port_id) (enable|disable)


set bonding agg_mode
~~~~~~~~~~~~~~~~~~~~

Select one of the aggregator modes when in mode 4 (link-aggregation-802.3ad)::

   testpmd> set bonding agg_mode (port_id) (bandwidth|count|stable)


show bonding config
~~~~~~~~~~~~~~~~~~~

Show the current configuration of a Link Bonding device.
Link-aggregation-802.3ad information is also shown if the bonding mode is mode 4::

   testpmd> show bonding config (port id)

For example,
to show the configuration of a Link Bonding device (port 9) with 3 member devices (1, 3, 4)
in balance mode with a transmission policy of layer 2+3::

   testpmd> show bonding config 9
     - Dev basic:
        Bonding mode: BALANCE(2)
        Balance Xmit Policy: BALANCE_XMIT_POLICY_LAYER23
        Members (3): [1 3 4]
        Active Members (3): [1 3 4]
        Primary: [3]