..  BSD LICENSE
    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

IPv4 Multicast Sample Application
=================================

The IPv4 Multicast application is a simple example of packet processing
using the Data Plane Development Kit (DPDK).
The application performs L3 multicasting.

Overview
--------

The application demonstrates the use of zero-copy buffers for packet forwarding.
The initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
This guide highlights the differences between the two applications.
There are two key differences from the L2 Forwarding sample application:

*   The IPv4 Multicast sample application makes use of indirect buffers.

*   The forwarding decision is taken based on information read from the input packet's IPv4 header.

The lookup method is the Four-byte Key (FBK) hash-based method.
The lookup table is composed of pairs of a destination IPv4 address (the FBK)
and a port mask associated with that IPv4 address.

.. note::

    The maximum port mask supported in the given hash table is 0xf, so only the first
    four ports can be supported.
    If using non-consecutive ports, use the destination IPv4 address accordingly.

For convenience and simplicity, this sample application does not take IANA-assigned multicast addresses into account,
but instead equates the last four bytes of the multicast group (that is, the last four bytes of the destination IP address)
with the mask of ports to multicast packets to.
Also, the application does not consider the Ethernet addresses;
it looks only at the IPv4 destination address for any given packet.

Building the Application
------------------------

To compile the application:

#.  Go to the sample application directory:

    .. code-block:: console

        export RTE_SDK=/path/to/rte_sdk
        cd ${RTE_SDK}/examples/ipv4_multicast

#.  Set the target (a default target is used if not specified). For example:

    .. code-block:: console

        export RTE_TARGET=x86_64-native-linuxapp-gcc

    See the *DPDK Getting Started Guide* for possible RTE_TARGET values.

#.  Build the application:

    .. code-block:: console

        make

.. note::

    The compiled application is written to the build subdirectory.
    To have the application written to a different location,
    the O=/path/to/build/directory option may be specified in the make command.

Running the Application
-----------------------

The application has a number of command line options:

.. code-block:: console

    ./build/ipv4_multicast [EAL options] -- -p PORTMASK [-q NQ]

where,

*   -p PORTMASK: Hexadecimal bitmask of ports to configure

*   -q NQ: determines the number of queues per lcore

.. note::

    Unlike the basic L2/L3 Forwarding sample applications,
    NUMA support is not provided in the IPv4 Multicast sample application.

Typically, to run the IPv4 Multicast sample application, issue the following command (as root):

.. code-block:: console

    ./build/ipv4_multicast -l 0-3 -n 3 -- -p 0x3 -q 1

In this command:

*   The -l option enables cores 0, 1, 2 and 3

*   The -n option specifies 3 memory channels

*   The -p option enables ports 0 and 1

*   The -q option assigns 1 queue to each lcore

Refer to the *DPDK Getting Started Guide* for general information on running applications
and the Environment Abstraction Layer (EAL) options.

Explanation
-----------

The following sections provide some explanation of the code.
As mentioned in the overview section,
the initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual`.
The following sections describe aspects that are specific to the IPv4 Multicast sample application.

Memory Pool Initialization
~~~~~~~~~~~~~~~~~~~~~~~~~~

The IPv4 Multicast sample application uses three memory pools.
Two of the pools are for indirect buffers used for packet duplication purposes.
Memory pools for indirect buffers are initialized differently from the memory pool for direct buffers:

.. code-block:: c

    packet_pool = rte_pktmbuf_pool_create("packet_pool", NB_PKT_MBUF, 32,
                        0, PKT_MBUF_DATA_SIZE, rte_socket_id());
    header_pool = rte_pktmbuf_pool_create("header_pool", NB_HDR_MBUF, 32,
                        0, HDR_MBUF_DATA_SIZE, rte_socket_id());
    clone_pool = rte_pktmbuf_pool_create("clone_pool", NB_CLONE_MBUF, 32,
                        0, 0, rte_socket_id());

This is because indirect buffers are not supposed to hold any packet data and
can therefore be initialized with a lower amount of reserved memory for each buffer.

Hash Initialization
~~~~~~~~~~~~~~~~~~~

The hash object is created and loaded with the pre-configured entries read from a global array:

.. code-block:: c

    static int
    init_mcast_hash(void)
    {
        uint32_t i;

        mcast_hash_params.socket_id = rte_socket_id();

        mcast_hash = rte_fbk_hash_create(&mcast_hash_params);
        if (mcast_hash == NULL) {
            return -1;
        }

        for (i = 0; i < N_MCAST_GROUPS; i++) {
            if (rte_fbk_hash_add_key(mcast_hash, mcast_group_table[i].ip, mcast_group_table[i].port_mask) < 0) {
                return -1;
            }
        }
        return 0;
    }

Forwarding
~~~~~~~~~~

All forwarding is done inside the mcast_forward() function.
First, the Ethernet header is removed from the packet and the IPv4 address is extracted from the IPv4 header:

.. code-block:: c

    /* Remove the Ethernet header from the input packet */
    iphdr = (struct ipv4_hdr *)rte_pktmbuf_adj(m, sizeof(struct ether_hdr));
    RTE_ASSERT(iphdr != NULL);
    dest_addr = rte_be_to_cpu_32(iphdr->dst_addr);

Then, the packet is checked to see if it has a multicast destination address and
if the routing table has any ports assigned to the destination address:

.. code-block:: c

    if (!IS_IPV4_MCAST(dest_addr) ||
        (hash = rte_fbk_hash_lookup(mcast_hash, dest_addr)) <= 0 ||
        (port_mask = hash & enabled_port_mask) == 0) {
        rte_pktmbuf_free(m);
        return;
    }

Then, the number of ports in the destination portmask is calculated with the help of the bitcnt() function:

.. code-block:: c

    /* Get number of bits set. */
    static inline uint32_t bitcnt(uint32_t v)
    {
        uint32_t n;

        for (n = 0; v != 0; v &= v - 1, n++)
            ;
        return n;
    }

The result is used to determine which forwarding algorithm to use,
as explained in the Buffer Cloning section.

Thereafter, a destination Ethernet address is constructed:

.. code-block:: c

    /* construct destination Ethernet address */
    dst_eth_addr = ETHER_ADDR_FOR_IPV4_MCAST(dest_addr);

Since Ethernet addresses are also part of the multicast process, each outgoing packet carries the same destination Ethernet address.
The destination Ethernet address is constructed from the lower 23 bits of the multicast group OR-ed
with the Ethernet address 01:00:5e:00:00:00, as per RFC 1112:

.. code-block:: c

    #define ETHER_ADDR_FOR_IPV4_MCAST(x) \
        (rte_cpu_to_be_64(0x01005e000000ULL | ((x) & 0x7fffff)) >> 16)

Then, packets are dispatched to the destination ports according to the portmask associated with a multicast group:

.. code-block:: c

    for (port = 0; use_clone != port_mask; port_mask >>= 1, port++) {
        /* Prepare output packet and send it out. */
        if ((port_mask & 1) != 0) {
            if (likely ((mc = mcast_out_pkt(m, use_clone)) != NULL))
                mcast_send_pkt(mc, &dst_eth_addr.as_addr, qconf, port);
            else if (use_clone == 0)
                rte_pktmbuf_free(m);
        }
    }

The actual packet transmission is done in the mcast_send_pkt() function:

.. code-block:: c

    static inline void
    mcast_send_pkt(struct rte_mbuf *pkt, struct ether_addr *dest_addr,
                   struct lcore_queue_conf *qconf, uint16_t port)
    {
        struct ether_hdr *ethdr;
        uint16_t len;

        /* Construct Ethernet header. */
        ethdr = (struct ether_hdr *)rte_pktmbuf_prepend(pkt, (uint16_t) sizeof(*ethdr));
        RTE_ASSERT(ethdr != NULL);

        ether_addr_copy(dest_addr, &ethdr->d_addr);
        ether_addr_copy(&ports_eth_addr[port], &ethdr->s_addr);
        ethdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);

        /* Put new packet into the output queue */
        len = qconf->tx_mbufs[port].len;
        qconf->tx_mbufs[port].m_table[len] = pkt;
        qconf->tx_mbufs[port].len = ++len;

        /* Transmit packets */
        if (unlikely(MAX_PKT_BURST == len))
            send_burst(qconf, port);
    }

Buffer Cloning
~~~~~~~~~~~~~~

This is the most important part of the application since it demonstrates the use of zero-copy buffer cloning.
There are two approaches for creating the outgoing packet.
Both are based on the zero-copy idea, but they differ in the details.

The first approach creates a clone of the input packet, that is,
it walks through all segments of the input packet and, for each segment,
creates a new buffer and attaches that new buffer to the segment
(refer to rte_pktmbuf_clone() in the rte_mbuf library for more details).
A new buffer is then allocated for the packet header and is prepended to the cloned buffer.

The second approach does not make a clone. It just increments the reference counter for all input packet segments,
allocates a new buffer for the packet header and prepends it to the input packet.

Basically, the first approach reuses only the input packet's data, but creates its own copy of the packet's metadata.
The second approach reuses both the input packet's data and metadata.
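
The difference between the two approaches can be illustrated with a toy model.
The sketch below is plain C and does not use the DPDK API:
struct toy_buf, toy_clone(), toy_share() and toy_adj() are hypothetical names invented for this illustration,
and a real mbuf carries far more state than this.
It only aims to show that a clone has its own data pointer, while a shared reference does not:

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Toy stand-in for a packet buffer. This is NOT the rte_mbuf layout. */
    struct toy_buf {
        uint8_t *data;      /* where this buffer's view of the packet starts */
        uint16_t data_len;  /* bytes visible through this view */
        uint16_t refcnt;    /* users of the underlying storage */
    };

    /* First approach: fill in new metadata that points at the same bytes. */
    static void toy_clone(struct toy_buf *clone, struct toy_buf *src)
    {
        assert(src->refcnt > 0);
        clone->data = src->data;          /* payload is shared, not copied */
        clone->data_len = src->data_len;
        clone->refcnt = 1;
        src->refcnt++;                    /* the storage has one more user */
    }

    /* Second approach: no new metadata, only one more reference. */
    static struct toy_buf *toy_share(struct toy_buf *src)
    {
        assert(src->refcnt > 0);
        src->refcnt++;
        return src;
    }

    /* Strip len bytes from the front of a view; returns NULL if too short. */
    static uint8_t *toy_adj(struct toy_buf *b, uint16_t len)
    {
        if (len > b->data_len)
            return NULL;
        b->data += len;
        b->data_len -= len;
        return b->data;
    }

With these helpers, stripping 14 bytes from a clone with toy_adj() leaves the data pointer of the original buffer where it was,
whereas doing the same through the pointer returned by toy_share() moves it for every holder of that buffer.
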

The advantage of the first approach is that each outgoing packet has its own copy of the metadata,
so we can safely modify the data pointer of the input packet.
That allows us to skip creation of the output packet for the last destination port
and instead modify the input packet's header in place.
For example, for N destination ports, we need to invoke mcast_out_pkt() (N-1) times.

The advantage of the second approach is that there is less work to be done for each outgoing packet,
that is, the "clone" operation is skipped completely.
However, there is a price to pay.
The input packet's metadata must remain intact, so for N destination ports,
we need to invoke mcast_out_pkt() N times.

Therefore, for a small number of outgoing ports (and segments in the input packet),
the first approach is faster.
As the number of outgoing ports (and/or input segments) grows, the second approach becomes preferable.

Depending on the number of segments and the number of ports in the outgoing portmask,
either the first (with cloning) or the second (without cloning) approach is taken:

.. code-block:: c

    use_clone = (port_num <= MCAST_CLONE_PORTS && m->pkt.nb_segs <= MCAST_CLONE_SEGS);

It is the mcast_out_pkt() function that performs the packet duplication (either with or without actually cloning the buffers):

.. code-block:: c

    static inline struct rte_mbuf *mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
    {
        struct rte_mbuf *hdr;

        /* Create new mbuf for the header. */
        if (unlikely ((hdr = rte_pktmbuf_alloc(header_pool)) == NULL))
            return NULL;

        /* If requested, then make a new clone packet. */
        if (use_clone != 0 && unlikely ((pkt = rte_pktmbuf_clone(pkt, clone_pool)) == NULL)) {
            rte_pktmbuf_free(hdr);
            return NULL;
        }

        /* prepend new header */
        hdr->pkt.next = pkt;

        /* update header's fields */
        hdr->pkt.pkt_len = (uint16_t)(hdr->pkt.data_len + pkt->pkt.pkt_len);
        hdr->pkt.nb_segs = (uint8_t)(pkt->pkt.nb_segs + 1);

        /* copy metadata from source packet */
        hdr->pkt.in_port = pkt->pkt.in_port;
        hdr->pkt.vlan_macip = pkt->pkt.vlan_macip;
        hdr->pkt.hash = pkt->pkt.hash;
        hdr->ol_flags = pkt->ol_flags;
        rte_mbuf_sanity_check(hdr, RTE_MBUF_PKT, 1);

        return hdr;
    }
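
To make the (N-1) versus N call counts from the discussion above concrete,
the following standalone sketch mimics the shape of the dispatch loop from the Forwarding section without touching any mbufs.
It is only an illustration: count_out_pkt_calls() is a hypothetical helper that does not exist in the sample application,
and it assumes, as the application does after the hash lookup, that the port mask is non-zero:

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Number of bits set, written the same way as the sample's bitcnt(). */
    static inline uint32_t bitcnt(uint32_t v)
    {
        uint32_t n;

        for (n = 0; v != 0; v &= v - 1, n++)
            ;
        return n;
    }

    /*
     * Hypothetical helper: walk the port mask the way the dispatch loop does
     * and count how many times a new output packet would have to be built.
     */
    static uint32_t count_out_pkt_calls(uint32_t port_mask, uint32_t use_clone)
    {
        uint32_t calls = 0;

        assert(port_mask != 0);   /* an empty mask is rejected before dispatch */
        assert(use_clone <= 1);

        /*
         * With use_clone == 1 the loop stops while the bit of the last
         * destination port is still set, so that port gets no new packet here.
         * With use_clone == 0 it runs until the mask is empty.
         */
        for (; use_clone != port_mask; port_mask >>= 1) {
            if ((port_mask & 1) != 0)
                calls++;
        }
        return calls;
    }

For a port mask of 0xb (three destination ports), the helper returns 2 when use_clone is 1 and 3 when use_clone is 0,
which matches bitcnt(0xb) - 1 and bitcnt(0xb) respectively.
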