..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Link Bonding Poll Mode Driver Library
=====================================

In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware,
DPDK also includes a pure-software library that
allows physical PMDs to be bonded together to create a single logical PMD.

|bond-overview|

The Link Bonding PMD library (librte_pmd_bond) supports bonding of groups of
``rte_eth_dev`` ports of the same speed and duplex to provide capabilities
similar to those found in the Linux bonding driver, allowing the
aggregation of multiple (slave) NICs into a single logical interface between a
server and a switch. The new bonded PMD then processes these interfaces
based on the specified mode of operation to provide features such
as redundant links, fault tolerance and/or load balancing.

The librte_pmd_bond library exports a C API for the
creation of bonded devices as well as the configuration and management of the
bonded device and its slave devices.

.. note::

    The Link Bonding PMD Library is enabled by default in the build
    configuration files. The library can be disabled by setting
    ``CONFIG_RTE_LIBRTE_PMD_BOND=n`` and recompiling DPDK.

Link Bonding Modes Overview
---------------------------

Currently the Link Bonding PMD library supports 6 modes of operation:

*   **Round-Robin (Mode 0):**

|bond-mode-0|

    This mode provides load balancing and fault tolerance by transmission of
    packets in sequential order from the first available slave device through
    the last. Packets are bulk dequeued from devices then serviced in a
    round-robin manner. This mode does not guarantee in-order reception of
    packets, and downstream devices should be able to handle out-of-order
    packets.

*   **Active Backup (Mode 1):**

|bond-mode-1|

    In this mode only one slave in the bond is active at any time. A different
    slave becomes active if, and only if, the primary active slave fails,
    thereby providing fault tolerance to slave failure. The single logical
    bonded interface's MAC address is externally visible on only one NIC (port)
    to avoid confusing the network switch.

*   **Balance XOR (Mode 2):**

|bond-mode-2|

    This mode provides transmit load balancing (based on the selected
    transmission policy) and fault tolerance. The default policy (layer2) uses
    a simple calculation based on the packet flow source and destination MAC
    addresses as well as the number of active slaves available to the bonded
    device to classify the packet to a specific slave to transmit on. Two
    alternate transmission policies are supported: layer 2+3, which also takes
    the IP source and destination addresses into the calculation of the
    transmit slave port, and layer 3+4, which uses the IP source and
    destination addresses as well as the TCP/UDP source and destination ports.

.. note::
    The colouring of the packets in the figure identifies the different flow
    classifications calculated by the selected transmit policy.


*   **Broadcast (Mode 3):**

|bond-mode-3|

    This mode provides fault tolerance by transmission of packets on all slave
    ports.

*   **Link Aggregation 802.3AD (Mode 4):**

|bond-mode-4|

    This mode provides dynamic link aggregation according to the 802.3ad
    specification. It negotiates and monitors aggregation groups that share the
    same speed and duplex settings using the selected balance transmit policy
    for balancing outgoing traffic.

    The DPDK implementation of this mode places some additional requirements
    on the application.

    #. It needs to call ``rte_eth_tx_burst`` and ``rte_eth_rx_burst`` at
       intervals of less than 100ms.

    #. Calls to ``rte_eth_tx_burst`` must have a buffer size of at least 2xN,
       where N is the number of slaves. This space is required for LACP
       frames. Additionally, LACP packets are included in the statistics, but
       they are not returned to the application.

*   **Transmit Load Balancing (Mode 5):**

|bond-mode-5|

    This mode provides adaptive transmit load balancing. It dynamically
    changes the transmitting slave according to the computed load. Statistics
    are collected in 100ms intervals and scheduled every 10ms.
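
As an illustration of the Round-Robin (Mode 0) behaviour described above, the
following is a minimal sketch of how a burst of packets could be spread over
the active slaves in sequential order. The helper ``rr_assign`` and its
parameters are hypothetical and are not part of librte_pmd_bond; the actual
driver code may differ.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper (not a librte_pmd_bond function): assign each
     * packet of a burst to an active slave index in round-robin order.
     * *next carries the position across bursts so that a new burst
     * continues where the previous one stopped.
     * Returns the number of packets assigned (0 if no slave is active).
     */
    static uint16_t
    rr_assign(uint16_t nb_pkts, uint8_t nb_active_slaves, uint8_t *next,
              uint8_t *slave_idx_out)
    {
        uint16_t i;
        uint8_t idx;

        if (nb_active_slaves == 0)
            return 0;

        /* Clamp the saved position in case slaves were removed. */
        idx = (uint8_t)(*next % nb_active_slaves);

        for (i = 0; i < nb_pkts; i++) {
            slave_idx_out[i] = idx;
            idx = (uint8_t)((idx + 1) % nb_active_slaves);
        }

        *next = idx;
        return nb_pkts;
    }

Because consecutive packets of one flow land on different slaves, this scheme
is the reason in-order reception cannot be guaranteed in this mode.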


Implementation Details
----------------------

The librte_pmd_bond bonded devices are compatible with the Ethernet device API
exported by the Ethernet PMDs described in the *DPDK API Reference*.

The Link Bonding Library supports the creation of bonded devices at application
startup time during EAL initialization using the ``--vdev`` option as well as
programmatically via the C API ``rte_eth_bond_create`` function.

Bonded devices support the dynamic addition and removal of slave devices using
the ``rte_eth_bond_slave_add`` / ``rte_eth_bond_slave_remove`` APIs.

After a slave device is added to a bonded device, the slave is stopped using
``rte_eth_dev_stop`` and then reconfigured using ``rte_eth_dev_configure``.
The RX and TX queues are also reconfigured using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup`` with the parameters used to configure the bonding
device.

Link Status Change Interrupts / Polling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices support the registration of a link status change callback
using the ``rte_eth_dev_callback_register`` API. The callback is called when the
status of the bonding device changes. For example, in the case of a bonding
device which has 3 slaves, the link status will change to up when one slave
becomes active or change to down when all slaves become inactive. There is no
callback notification when a single slave changes state and the previous
conditions are not met. If a user wishes to monitor individual slaves then they
must register callbacks with that slave directly.
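
The rule above can be modelled in a few lines. This is only a sketch of the
described behaviour, not the driver's code: ``bond_link_is_up`` and
``bond_update_link`` are hypothetical names, and the slave states are passed in
as a plain array.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* The bonded link is up when at least one slave is active. */
    static bool
    bond_link_is_up(const bool *slave_active, size_t nb_slaves)
    {
        size_t i;

        for (i = 0; i < nb_slaves; i++)
            if (slave_active[i])
                return true;
        return false;
    }

    /*
     * Hypothetical helper: re-evaluate the bonded link after a slave
     * changed state. Updates *bond_up and returns true only when the
     * bonded link status itself changed, which is the only case in
     * which the bonded device's callback would be notified.
     */
    static bool
    bond_update_link(const bool *slave_active, size_t nb_slaves,
                     bool *bond_up)
    {
        bool now_up = bond_link_is_up(slave_active, nb_slaves);
        bool changed = (now_up != *bond_up);

        *bond_up = now_up;
        return changed;
    }

With 3 slaves, the first slave coming up reports a change, a second slave
coming up does not, and a change is reported again only when the last active
slave goes down.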

The link bonding library also supports devices which do not implement link
status change interrupts. This is achieved by polling the device's link status
at a defined period which is set using the ``rte_eth_bond_link_monitoring_set``
API. The default polling interval is 10ms. When a device is added as a slave to
a bonding device, the ``RTE_PCI_DRV_INTR_LSC`` flag is used to determine
whether the device supports interrupts or whether the link status should be
monitored by polling it.

Requirements / Limitations
~~~~~~~~~~~~~~~~~~~~~~~~~~

The current implementation only allows devices that support the same speed
and duplex to be added as slaves to the same bonded device. The bonded device
inherits these attributes from the first active slave added to the bonded
device and then all further slaves added to the bonded device must support
these parameters.

A bonding device must have a minimum of one slave before the bonding device
itself can be started.

As with all other PMDs, all functions exported by the bonding PMD are lock-free
functions that are assumed not to be invoked in parallel on different logical
cores to work on the same target object.

It should also be noted that the PMD receive function should not be invoked
directly on a slave device after it has been added to a bonded device, since
packets read directly from the slave device will no longer be available to the
bonded device to read.

Configuration
~~~~~~~~~~~~~

Link bonding devices are created using the ``rte_eth_bond_create`` API
which requires a unique device name, the bonding mode,
and the socket id to allocate the bonding device's resources on.
The other configurable parameters for a bonded device are its slave devices,
its primary slave, a user defined MAC address and the transmission policy to
use if the device is in balance XOR mode.

Slave Devices
^^^^^^^^^^^^^

Bonding devices support up to a maximum of ``RTE_MAX_ETHPORTS`` slave devices
of the same speed and duplex. Ethernet devices can be added as a slave to a
maximum of one bonded device. Slave devices are reconfigured with the
configuration of the bonded device on being added to a bonded device.

The bonded device also guarantees to return the MAC address of a slave device
to its original value on removal of the slave from it.

Primary Slave
^^^^^^^^^^^^^

The primary slave is used to define the default port to use when a bonded
device is in active backup mode. A different port will be used if, and
only if, the current primary port goes down. If the user does not specify a
primary port it will default to being the first port added to the bonded device.
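
A minimal sketch of this selection rule is shown below. The function name
``active_backup_select`` is hypothetical, and the choice of fallback (the first
slave whose link is up) is an assumption made for illustration; the text above
only states that another port is used when the primary goes down.

.. code-block:: c

    #include <stdbool.h>

    /*
     * Hypothetical helper: pick the slave index to use in active backup
     * mode. The primary is used whenever its link is up; otherwise the
     * first slave with link up is chosen (assumed fallback order).
     * Returns -1 if no slave has link.
     */
    static int
    active_backup_select(const bool *link_up, int nb_slaves, int primary)
    {
        int i;

        if (primary >= 0 && primary < nb_slaves && link_up[primary])
            return primary;

        for (i = 0; i < nb_slaves; i++)
            if (link_up[i])
                return i;

        return -1;
    }

Passing ``primary = 0`` reproduces the default described above, where the first
port added to the bonded device acts as the primary.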

MAC Address
^^^^^^^^^^^

The bonded device can be configured with a user specified MAC address. This
address will be inherited by some or all slave devices depending on the
operating mode. If the device is in active backup mode then only the primary
device will have the user specified MAC; all other slaves will retain their
original MAC address. In modes 0, 2, 3 and 4 all slave devices are configured
with the bonded device's MAC address.

If a user defined MAC address is not set then the bonded device will
default to using the primary slave's MAC address.
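
The per-mode rule can be summarised as a small decision function. This is a
sketch of the rule as stated above, with a hypothetical helper name; it returns
-1 for any mode the text above does not cover rather than guessing.

.. code-block:: c

    /* Bonding mode numbers as used in this guide. */
    enum {
        MODE_ROUND_ROBIN = 0,
        MODE_ACTIVE_BACKUP = 1,
        MODE_BALANCE = 2,
        MODE_BROADCAST = 3,
        MODE_8023AD = 4
    };

    /*
     * Hypothetical helper: returns 1 if the slave is configured with the
     * bonded device's MAC address, 0 if it keeps its original address,
     * and -1 for modes not covered by the description above.
     */
    static int
    slave_takes_bonded_mac(int mode, int slave_idx, int primary_idx)
    {
        switch (mode) {
        case MODE_ACTIVE_BACKUP:
            /* Only the primary carries the bonded MAC. */
            return slave_idx == primary_idx;
        case MODE_ROUND_ROBIN:
        case MODE_BALANCE:
        case MODE_BROADCAST:
        case MODE_8023AD:
            return 1;
        default:
            return -1;
        }
    }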

Balance XOR Transmit Policies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are 3 supported transmission policies for a bonded device running in
Balance XOR mode: Layer 2, Layer 2+3 and Layer 3+4.

*   **Layer 2:**   Ethernet MAC address based balancing is the default
    transmission policy for Balance XOR bonding mode. It uses a simple XOR
    calculation on the source MAC address and destination MAC address of the
    packet and then takes this value modulo the number of active slaves to
    select the slave device to transmit the packet on.

*   **Layer 2 + 3:** Ethernet MAC address & IP Address based balancing uses a
    combination of source/destination MAC addresses and the source/destination
    IP addresses of the data packet to decide which slave port the packet will
    be transmitted on.

*   **Layer 3 + 4:**  IP Address & UDP Port based balancing uses a combination
    of source/destination IP addresses and the source/destination UDP ports of
    the data packet to decide which slave port the packet will be
    transmitted on.

All these policies support 802.1Q VLAN Ethernet packets, as well as IPv4, IPv6
and UDP protocols for load balancing.
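
To make the Layer 2 policy concrete, the following is a minimal sketch of an
XOR-and-modulus calculation over the two MAC addresses. It illustrates the idea
only: the helper ``xmit_l2_slave`` is hypothetical, and the way the bytes are
folded together is an assumption, so the slave chosen by the real driver for a
given packet may differ.

.. code-block:: c

    #include <stdint.h>

    #define MAC_LEN 6

    /*
     * Hypothetical helper: XOR the source and destination MAC addresses
     * byte by byte, fold the result into one byte, and reduce it modulo
     * the number of active slaves to obtain a slave index.
     * Returns 0 if there are no active slaves.
     */
    static uint8_t
    xmit_l2_slave(const uint8_t src[MAC_LEN], const uint8_t dst[MAC_LEN],
                  uint8_t nb_active_slaves)
    {
        uint8_t hash = 0;
        int i;

        if (nb_active_slaves == 0)
            return 0;

        for (i = 0; i < MAC_LEN; i++)
            hash ^= (uint8_t)(src[i] ^ dst[i]);

        return (uint8_t)(hash % nb_active_slaves);
    }

Because XOR is symmetric, both directions of a flow between the same pair of
MAC addresses map to the same slave index, and every packet of a flow follows
the same path as long as the number of active slaves does not change.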

Using Link Bonding Devices
--------------------------

The librte_pmd_bond library supports two modes of device creation: using the
library's full C API, or using the EAL command line to statically configure
link bonding devices at application startup. Using the EAL option it is
possible to use link bonding functionality transparently without specific
knowledge of the library's API. This can be used, for example, to add bonding
functionality, such as active backup, to an existing application which has no
knowledge of the link bonding C API.

Using the Poll Mode Driver from an Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the librte_pmd_bond library's API it is possible to dynamically create
and manage link bonding devices from within any application. Link bonding
devices are created using the ``rte_eth_bond_create`` API which requires a
unique device name, the link bonding mode to initialize the device in and
finally the socket id on which to allocate the device's resources. After
successful creation of a bonding device it must be configured using the generic
Ethernet device configure API ``rte_eth_dev_configure`` and then the RX and TX
queues which will be used must be set up using ``rte_eth_tx_queue_setup`` /
``rte_eth_rx_queue_setup``.

Slave devices can be dynamically added to and removed from a link bonding
device using the ``rte_eth_bond_slave_add`` / ``rte_eth_bond_slave_remove``
APIs, but at least one slave device must be added to the link bonding device
before it can be started using ``rte_eth_dev_start``.

The link status of a bonded device is dictated by that of its slaves. If the
link status of all slave devices is down, or if all slaves are removed from the
link bonding device, then the link status of the bonding device will go down.

It is also possible to configure / query the configuration of the control
parameters of a bonded device using the provided APIs
``rte_eth_bond_mode_set/get``, ``rte_eth_bond_primary_set/get``,
``rte_eth_bond_mac_set/reset`` and ``rte_eth_bond_xmit_policy_set/get``.

Using Link Bonding Devices from the EAL Command Line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link bonding devices can be created at application startup time using the
``--vdev`` EAL command line option. The device name must start with the
eth_bond prefix followed by numbers or letters. The name must be unique for
each device. Each device can have multiple options arranged in a comma
separated list. Multiple device definitions can be provided by specifying the
``--vdev`` option multiple times.

Device names and bonding options must be separated by commas as shown below:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c f -n 4 --vdev 'eth_bond0,bond_opt0=..,bond_opt1=..' --vdev 'eth_bond1,bond_opt0=..,bond_opt1=..'

Link Bonding EAL Options
^^^^^^^^^^^^^^^^^^^^^^^^

Bonded device definitions can be combined in multiple ways as long as the
following rules are respected:

*   A unique device name, in the format of eth_bondX is provided,
    where X can be any combination of numbers and/or letters,
    and the name is no greater than 32 characters long.

*   At least one slave device is provided for each bonded device definition.

*   The operation mode of the bonded device being created is provided.
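
An application that assembles its own ``--vdev`` arguments might check these
rules before handing them to the EAL. The sketch below is illustrative only:
``bond_name_is_valid`` and ``bond_def_is_valid`` are hypothetical helpers, not
EAL functions, and requiring X to be non-empty is an assumption.

.. code-block:: c

    #include <ctype.h>
    #include <stdbool.h>
    #include <string.h>

    #define BOND_PREFIX   "eth_bond"
    #define BOND_NAME_MAX 32

    /* Check the eth_bondX naming rule (X assumed to be non-empty). */
    static bool
    bond_name_is_valid(const char *name)
    {
        size_t prefix_len = strlen(BOND_PREFIX);
        size_t len = strlen(name);
        size_t i;

        if (len <= prefix_len || len > BOND_NAME_MAX)
            return false;
        if (strncmp(name, BOND_PREFIX, prefix_len) != 0)
            return false;

        /* X may only contain numbers and/or letters. */
        for (i = prefix_len; i < len; i++)
            if (!isalnum((unsigned char)name[i]))
                return false;

        return true;
    }

    /* A definition needs a valid name, at least one slave and a mode. */
    static bool
    bond_def_is_valid(const char *name, int nb_slaves, bool mode_given)
    {
        return bond_name_is_valid(name) && nb_slaves >= 1 && mode_given;
    }

Uniqueness of the name across all definitions is not checked here and would
need a comparison against the other device names in use.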

The different options are:

*   mode: Integer value defining the bonding mode of the device.
    Currently supports modes 0,1,2,3,4,5 (round-robin, active backup, balance,
    broadcast, link aggregation, transmit load balancing).

.. code-block:: console

        mode=2

*   slave: Defines the PMD device which will be added as slave to the bonded
    device. This option can be specified multiple times, once for each device
    to be added as a slave. Physical devices should be specified using their
    PCI address, in the format domain:bus:devid.function

.. code-block:: console

        slave=0000:0a:00.0,slave=0000:0a:00.1

*   primary: Optional parameter which defines the primary slave port. It
    is used in active backup mode to select the primary slave for data TX/RX if
    it is available. The primary port is also used to select the MAC address to
    use when it is not defined by the user. This defaults to the first slave
    added to the device if it is not specified. The primary device must be a
    slave of the bonded device.

.. code-block:: console

        primary=0000:0a:00.0

*   socket_id: Optional parameter used to select which socket on a NUMA device
    the bonded device's resources will be allocated on.

.. code-block:: console

        socket_id=0

*   mac: Optional parameter to select a MAC address for the link bonding
    device. This overrides the value of the primary slave device.

.. code-block:: console

        mac=00:1e:67:1d:fd:1d

*   xmit_policy: Optional parameter which defines the transmission policy when
    the bonded device is in balance mode. If not user specified this defaults
    to l2 (layer 2) forwarding. The other transmission policies available are
    l23 (layer 2+3) and l34 (layer 3+4).

.. code-block:: console

        xmit_policy=l23

*   lsc_poll_period_ms: Optional parameter which defines the polling interval
    in milliseconds at which devices which don't support LSC interrupts are
    checked for a change in the device's link status.

.. code-block:: console

        lsc_poll_period_ms=100

*   up_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to up. By default this
    parameter is zero.

.. code-block:: console

        up_delay=10

*   down_delay: Optional parameter which adds a delay in milliseconds to the
    propagation of a device's link status changing to down. By default this
    parameter is zero.

.. code-block:: console

        down_delay=50

Examples of Usage
^^^^^^^^^^^^^^^^^

Create a bonded device in round robin mode with two slaves specified by their PCI address:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=0,slave=0000:0a:00.1,slave=0000:04:00.0' -- --port-topology=chained

Create a bonded device in round robin mode with two slaves specified by their PCI address and an overriding MAC address:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=0,slave=0000:0a:00.1,slave=0000:04:00.0,mac=00:1e:67:1d:fd:1d' -- --port-topology=chained

Create a bonded device in active backup mode with two slaves and a primary slave, all specified by their PCI addresses:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=1,slave=0000:0a:00.1,slave=0000:04:00.0,primary=0000:0a:00.1' -- --port-topology=chained

Create a bonded device in balance mode with two slaves specified by their PCI addresses, and a transmission policy of layer 3 + 4 forwarding:

.. code-block:: console

    $RTE_TARGET/app/testpmd -c '0xf' -n 4 --vdev 'eth_bond0,mode=2,slave=0000:0a:00.1,slave=0000:04:00.0,xmit_policy=l34' -- --port-topology=chained
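
An application that builds its own EAL argument vector could format such a
device definition with ``snprintf``. The helper below is a hypothetical
sketch limited to two slaves; only the option names ``mode`` and ``slave``
are taken from the option list above.

.. code-block:: c

    #include <stdio.h>

    /*
     * Hypothetical helper: format the value of a --vdev option for a
     * bonded device with two slaves given by PCI address.
     * Returns the number of characters written, or -1 if buf is too
     * small to hold the complete string.
     */
    static int
    bond_vdev_arg(char *buf, size_t len, const char *name, int mode,
                  const char *slave0, const char *slave1)
    {
        int n = snprintf(buf, len, "%s,mode=%d,slave=%s,slave=%s",
                         name, mode, slave0, slave1);

        if (n < 0 || (size_t)n >= len)
            return -1;
        return n;
    }

The resulting string is passed as the argument following ``--vdev``; further
options such as ``primary`` or ``xmit_policy`` would be appended in the same
comma separated form.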

.. |bond-overview| image:: img/bond-overview.svg
.. |bond-mode-0| image:: img/bond-mode-0.svg
.. |bond-mode-1| image:: img/bond-mode-1.svg
.. |bond-mode-2| image:: img/bond-mode-2.svg
.. |bond-mode-3| image:: img/bond-mode-3.svg
.. |bond-mode-4| image:: img/bond-mode-4.svg
.. |bond-mode-5| image:: img/bond-mode-5.svg
