..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Link Bonding Poll Mode Driver Library
=====================================

In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
DPDK also includes a pure-software library that
allows physical PMDs to be bonded together to create a single logical PMD.

|bond-overview|

The Link Bonding PMD library (librte_pmd_bond) supports bonding of groups of
``rte_eth_dev`` ports of the same speed and duplex to provide capabilities
similar to those found in the Linux bonding driver, allowing the
aggregation of multiple (slave) NICs into a single logical interface between a
server and a switch. The new bonded PMD will then process these interfaces
based on the mode of operation specified to provide support for features such
as redundant links, fault tolerance and/or load balancing.

The librte_pmd_bond library exports a C API for the
creation of bonded devices as well as the configuration and management of the
bonded device and its slave devices.

.. note::

    The Link Bonding PMD Library is enabled by default in the build
    configuration files. The library can be disabled by setting
    ``CONFIG_RTE_LIBRTE_PMD_BOND=n`` and recompiling the DPDK.

Link Bonding Modes Overview
---------------------------

Currently the Link Bonding PMD library supports the following modes of
operation:

*   **Round-Robin (Mode 0):**

|bond-mode-0|

    This mode provides load balancing and fault tolerance by transmission of
    packets in sequential order from the first available slave device through
    the last. Packets are bulk dequeued from devices then serviced in a
    round-robin manner. This mode does not guarantee in-order reception of
    packets, so downstream should be able to handle out-of-order packets.

*   **Active Backup (Mode 1):**

|bond-mode-1|

    In this mode only one slave in the bond is active at any time. A different
    slave becomes active if, and only if, the primary active slave fails,
    thereby providing fault tolerance to slave failure. The single logical
    bonded interface's MAC address is externally visible on only one NIC (port)
    to avoid confusing the network switch.

*   **Balance XOR (Mode 2):**

|bond-mode-2|

    This mode provides transmit load balancing (based on the selected
    transmission policy) and fault tolerance. The default policy (layer 2) uses
    a simple calculation based on the packet flow source and destination MAC
    addresses, as well as the number of active slaves available to the bonded
    device, to classify the packet to a specific slave to transmit on. Two
    alternate transmission policies are supported: layer 2+3, which also takes
    the IP source and destination addresses into the calculation of the
    transmit slave port, and layer 3+4, which uses the IP source and
    destination addresses as well as the TCP/UDP source and destination ports.

.. note::
    The colouring differences of the packets are used to identify the different
    flow classifications calculated by the selected transmit policy.


*   **Broadcast (Mode 3):**

|bond-mode-3|

    This mode provides fault tolerance by transmission of packets on all slave
    ports.

*   **Link Aggregation 802.3AD (Mode 4):**

|bond-mode-4|

    This mode provides dynamic link aggregation according to the 802.3ad
    specification. It negotiates and monitors aggregation groups that share the
    same speed and duplex settings using the selected balance transmit policy
    for balancing outgoing traffic.

    The DPDK implementation of this mode places some additional requirements
    on the application.

    #. It needs to call ``rte_eth_tx_burst`` and ``rte_eth_rx_burst`` at
       intervals of less than 100ms.

    #. Calls to ``rte_eth_tx_burst`` must have a buffer size of at least 2xN,
       where N is the number of slaves. This space is required for LACP
       frames. Additionally, LACP packets are included in the statistics, but
       they are not returned to the application.

*   **Transmit Load Balancing (Mode 5):**

|bond-mode-5|

    This mode provides adaptive transmit load balancing. It dynamically
    changes the transmitting slave according to the computed load. Statistics
    are collected in 100ms intervals and scheduled every 10ms.


Implementation Details
----------------------

The librte_pmd_bond bonded devices are compatible with the Ethernet device API
exported by the Ethernet PMDs described in the *DPDK API Reference*.

The Link Bonding Library supports the creation of bonded devices at application
startup time during EAL initialization using the ``--vdev`` option as well as
programmatically via the C API ``rte_eth_bond_create`` function.

Bonded devices support the dynamic addition and removal of slave devices using
the ``rte_eth_bond_slave_add`` / ``rte_eth_bond_slave_remove`` APIs.

After a slave device is added to a bonded device, the slave is stopped using
``rte_eth_dev_stop`` and then reconfigured using ``rte_eth_dev_configure``.
The RX and TX queues are also reconfigured using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup`` with the parameters used to configure the bonding
device.

Link Status Change Interrupts / Polling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices support the registration of a link status change callback
using the ``rte_eth_dev_callback_register`` API. The callback is called when the
status of the bonding device changes. For example, in the case of a bonding
device which has 3 slaves, the link status will change to up when one slave
becomes active, or change to down when all slaves become inactive. There is no
callback notification when a single slave changes state and the previous
conditions are not met. If a user wishes to monitor individual slaves, then they
must register callbacks with those slaves directly.
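
The aggregation rule described above can be illustrated with a small,
self-contained model. This is only a sketch of the described behaviour, not the
library's implementation: ``struct bond_model``, ``bond_lsc_cb``,
``count_notification`` and ``bond_slave_link_change`` are hypothetical names
introduced for illustration and are not part of the DPDK API.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MODEL_MAX_SLAVES 8  /* arbitrary limit chosen for this sketch */

    /* Hypothetical model: tracks only the link state of each slave. */
    struct bond_model {
        bool slave_up[MODEL_MAX_SLAVES];
        size_t nb_slaves;
        bool link_up;           /* aggregated link state of the bond */
    };

    /* Invoked only when the aggregated bond state changes. */
    typedef void (*bond_lsc_cb)(bool link_up, void *arg);

    /* Example callback: counts notifications in the int that arg points to. */
    static void count_notification(bool link_up, void *arg)
    {
        (void)link_up;
        (*(int *)arg)++;
    }

    /* The bond is considered up if at least one slave is up. */
    static bool any_slave_up(const struct bond_model *b)
    {
        for (size_t i = 0; i < b->nb_slaves; i++)
            if (b->slave_up[i])
                return true;
        return false;
    }

    /*
     * Record a slave link change. The callback fires only if the aggregated
     * state flips; returns true when the aggregated state changed.
     */
    static bool bond_slave_link_change(struct bond_model *b, size_t slave,
                                       bool up, bond_lsc_cb cb, void *arg)
    {
        assert(b != NULL && slave < b->nb_slaves);
        b->slave_up[slave] = up;

        bool now_up = any_slave_up(b);
        if (now_up == b->link_up)
            return false;       /* nothing visible at the bond level */

        b->link_up = now_up;
        if (cb != NULL)
            cb(now_up, arg);
        return true;
    }

In this model, with three slaves that all start down, bringing the first slave
up triggers one notification, further slave changes are silent while any slave
remains up, and a second notification is delivered only when the last active
slave goes down.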

The link bonding library also supports devices which do not implement link
status change interrupts. This is achieved by polling the device's link status
at a defined period which is set using the ``rte_eth_bond_link_monitoring_set``
API. The default polling interval is 10ms. When a device is added as a slave to
a bonding device, the ``RTE_PCI_DRV_INTR_LSC`` flag is used to determine
whether the device supports interrupts or whether its link status should be
monitored by polling.

Requirements / Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~

The current implementation only allows devices that support the same speed
and duplex to be added as slaves to the same bonded device. The bonded device
inherits these attributes from the first active slave added to the bonded
device, and all further slaves added to the bonded device must support
these parameters.

A bonding device must have a minimum of one slave before the bonding device
itself can be started.

As with all other PMDs, all functions exported by the bonding PMD are lock-free
functions that are assumed not to be invoked in parallel on different logical
cores to work on the same target object.

It should also be noted that the PMD receive function should not be invoked
directly on a slave device after it has been added to a bonded device, since
packets read directly from the slave device will no longer be available to the
bonded device to read.

Configuration
~~~~~~~~~~~~~

Link bonding devices are created using the ``rte_eth_bond_create`` API,
which requires a unique device name, the bonding mode,
and the socket ID to allocate the bonding device's resources on.
The other configurable parameters for a bonded device are its slave devices,
its primary slave, a user defined MAC address and the transmission policy to
use if the device is in balance XOR mode.

Slave Devices
^^^^^^^^^^^^^

Bonding devices support up to a maximum of ``RTE_MAX_ETHPORTS`` slave devices
of the same speed and duplex. An Ethernet device can be added as a slave to a
maximum of one bonded device. Slave devices are reconfigured with the
configuration of the bonded device on being added to a bonded device.

The bonded device also guarantees to restore the MAC address of a slave device
to its original value when the slave is removed from it.

Primary Slave
^^^^^^^^^^^^^

The primary slave is used to define the default port to use when a bonded
device is in active backup mode. A different port will be used if, and
only if, the current primary port goes down. If the user does not specify a
primary port, it will default to being the first port added to the bonded device.

MAC Address
^^^^^^^^^^^

The bonded device can be configured with a user specified MAC address. This
address will be inherited by some or all of the slave devices, depending on the
operating mode. If the device is in active backup mode, then only the primary
device will have the user specified MAC; all other slaves will retain their
original MAC address. In modes 0, 2, 3 and 4, all slave devices are configured
with the bonded device's MAC address.

If a user defined MAC address is not defined, then the bonded device will
default to using the primary slave's MAC address.

Balance XOR Transmit Policies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are 3 supported transmission policies for bonded devices running in
Balance XOR mode: Layer 2, Layer 2+3 and Layer 3+4.

*   **Layer 2:**   Ethernet MAC address based balancing is the default
    transmission policy for Balance XOR bonding mode. It uses a simple XOR
    calculation on the source MAC address and destination MAC address of the
    packet and then takes the modulus of this value to select the slave
    device to transmit the packet on.

*   **Layer 2 + 3:** Ethernet MAC address & IP Address based balancing uses a
    combination of source/destination MAC addresses and the source/destination
    IP addresses of the data packet to decide which slave port the packet will
    be transmitted on.

*   **Layer 3 + 4:**  IP Address & UDP Port based balancing uses a combination
    of source/destination IP addresses and the source/destination UDP ports of
    the data packet to decide which slave port the packet will be
    transmitted on.

All these policies support 802.1Q VLAN Ethernet packets, as well as IPv4, IPv6
and UDP protocols for load balancing.

Using Link Bonding Devices
--------------------------

The librte_pmd_bond library supports two modes of device creation: using the
library's full C API, or using the EAL command line to statically configure
link bonding devices at application startup. Using the EAL option, it is
possible to use link bonding functionality transparently without specific
knowledge of the library's API. This can be used, for example, to add bonding
functionality, such as active backup, to an existing application which has no
knowledge of the link bonding C API.

Using the Poll Mode Driver from an Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the librte_pmd_bond library's API, it is possible to dynamically create
and manage link bonding devices from within any application. Link bonding
devices are created using the ``rte_eth_bond_create`` API, which requires a
unique device name, the link bonding mode to initialize the device in and
finally the socket ID on which to allocate the device's resources. After
successful creation of a bonding device, it must be configured using the generic
Ethernet device configure API ``rte_eth_dev_configure``, and then the RX and TX
queues which will be used must be set up using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup``.

Slave devices can be dynamically added and removed from a link bonding device
using the ``rte_eth_bond_slave_add`` / ``rte_eth_bond_slave_remove``
APIs, but at least one slave device must be added to the link bonding device
before it can be started using ``rte_eth_dev_start``.

The link status of a bonded device is dictated by that of its slaves. If the
link status of every slave device is down, or if all slaves are removed from
the link bonding device, then the link status of the bonding device will go
down.

It is also possible to configure / query the configuration of the control
parameters of a bonded device using the provided APIs
``rte_eth_bond_mode_set/get``, ``rte_eth_bond_primary_set/get``,
``rte_eth_bond_mac_set/reset`` and ``rte_eth_bond_xmit_policy_set/get``.

Using Link Bonding Devices from the EAL Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices can be created at application startup time using the
``--vdev`` EAL command line option. The device name must start with the
eth_bond prefix followed by numbers or letters. The name must be unique for
each device. Each device can have multiple options arranged in a comma
separated list. Multiple device definitions can be arranged by calling the
``--vdev`` option multiple times.

Device names and bonding options must be separated by commas as shown below:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c f -n 4 --vdev 'eth_bond0,bond_opt0=..,bond_opt1=..' --vdev 'eth_bond1,bond_opt0=..,bond_opt1=..'

Link Bonding EAL Options
^^^^^^^^^^^^^^^^^^^^^^^^

The options can be combined in multiple ways, as long as the following rules
are respected:

*   A unique device name, in the format of eth_bondX is provided,
    where X can be any combination of numbers and/or letters,
    and the name is no longer than 32 characters.

*   At least one slave device is provided for each bonded device definition.

*   The operation mode of the bonded device being created is provided.

The different options are:

*   mode: Integer value defining the bonding mode of the device.
    Currently supports modes 0,1,2,3,4,5 (round-robin, active backup, balance,
    broadcast, link aggregation, transmit load balancing).

.. code-block:: console

        mode=2

*   slave: Defines the PMD device which will be added as slave to the bonded
    device. This option can be specified multiple times, once for each device
    to be added as a slave. Physical devices should be specified using their
    PCI address, in the format domain:bus:devid.function.

.. code-block:: console

        slave=0000:0a:00.0,slave=0000:0a:00.1

*   primary: Optional parameter which defines the primary slave port.
    It is used in active backup mode to select the primary slave for data TX/RX
    if it is available. The primary port is also used to select the MAC address
    to use when it is not defined by the user. This defaults to the first slave
    added to the device if it is not specified. The primary device must be a
    slave of the bonded device.

.. code-block:: console

        primary=0000:0a:00.0

*   socket_id: Optional parameter used to select which socket on a NUMA device
    the bonded device's resources will be allocated on.

.. code-block:: console

        socket_id=0

*   mac: Optional parameter to select a MAC address for the link bonding
    device. This overrides the value of the primary slave device.

.. code-block:: console

        mac=00:1e:67:1d:fd:1d

*   xmit_policy: Optional parameter which defines the transmission policy when
    the bonded device is in balance mode. If not specified by the user, this
    defaults to l2 (layer 2) forwarding. The other transmission policies
    available are l23 (layer 2+3) and l34 (layer 3+4).

.. code-block:: console

        xmit_policy=l23

*   lsc_poll_period_ms: Optional parameter which defines the polling interval
    in milliseconds at which devices which don't support LSC interrupts are
    checked for a change in the device's link status.

.. code-block:: console

        lsc_poll_period_ms=100

*   up_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to up. By default this
    parameter is zero.

.. code-block:: console

        up_delay=10

*   down_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to down. By default this
    parameter is zero.

.. code-block:: console

        down_delay=50

Examples of Usage
^^^^^^^^^^^^^^^^^

Create a bonded device in round robin mode with two slaves specified by their PCI address:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=0,slave=0000:00a:00.01,slave=0000:004:00.00' -- --port-topology=chained

Create a bonded device in round robin mode with two slaves specified by their PCI address and an overriding MAC address:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=0,slave=0000:00a:00.01,slave=0000:004:00.00,mac=00:1e:67:1d:fd:1d' -- --port-topology=chained

Create a bonded device in active backup mode with two slaves and a primary slave, all specified by their PCI addresses:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=1,slave=0000:00a:00.01,slave=0000:004:00.00,primary=0000:00a:00.01' -- --port-topology=chained

Create a bonded device in balance mode with two slaves specified by their PCI addresses, and a transmission policy of layer 3 + 4 forwarding:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=2,slave=0000:00a:00.01,slave=0000:004:00.00,xmit_policy=l34' -- --port-topology=chained

.. |bond-overview| image:: img/bond-overview.svg
.. |bond-mode-0| image:: img/bond-mode-0.svg
.. |bond-mode-1| image:: img/bond-mode-1.svg
.. |bond-mode-2| image:: img/bond-mode-2.svg
.. |bond-mode-3| image:: img/bond-mode-3.svg
.. |bond-mode-4| image:: img/bond-mode-4.svg
.. |bond-mode-5| image:: img/bond-mode-5.svg