.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: src/share/man/man4/multicast.4,v 1.4 2004/07/09 09:22:36 ru Exp $
.\" $OpenBSD: multicast.4,v 1.11 2016/12/22 11:04:44 rzalamena Exp $
.\" $NetBSD: multicast.4,v 1.3 2004/09/12 13:12:26 wiz Exp $
.\"
.Dd $Mdocdate: December 22 2016 $
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF) and
Core Based Trees (CBT)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding via the
.Xr sysctl 8
variables
.Va net.inet.ip.mforwarding
and/or
.Va net.inet.ip6.mforwarding .
The user must also run a multicast routing capable user-level process,
such as
.Xr mrouted 8 .
From a developer's point of view,
the API described in the
.Sx Programming Guide
section should be used to control the multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is then used
to control the multicast forwarding in the kernel.
Note that most operations below require
root privilege:
.Bd -literal -offset indent
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal -offset indent
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(IPv4 or IPv6, respectively)
for sending or receiving IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
for those IGMP or MLD messages, respectively.
In the case of
.Bx -derived
kernels,
it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g.,
.Tn Linux )
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket has been opened, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal -offset 5n
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal -offset 5n
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
    sizeof(filter));
.Ed
.Pp
For each network interface (e.g., a physical interface or a virtual tunnel)
that is used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal -offset 3n
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
        sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
    sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it would be 1.
The
.Va max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address for DVMRP multicast tunnels.
.Bd -literal -offset indent
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
    sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal -offset indent
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
    sizeof(vifi));
.Ed
.Bd -literal -offset indent
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
    sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled, and the multicast virtual
interfaces have been
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the
.Va im_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the
.Va im6_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the
.Va im_msgtype
or
.Va im6_msgtype
field, which holds the upcall type
.Dv IGMPMSG_*
or
.Dv MRT6MSG_*
for IPv4 or IPv6, respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
.Pp
An MFC entry is added by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
fields are the source and group address of the multicast packet (as set
in the upcall message).
The
.Va iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified.
.Pp
An MFC entry is deleted by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various
statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API)
while still allowing user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One mechanism that preserves backward
compatibility is a form of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only that set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in
.Vt uint32_t ,
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
An example:
.Bd -literal -offset 3n
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
This would set
.Va v
to the pre-defined bits that the kernel API supports.
The eight least significant bits in
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the new definition of
.Vt "struct mfcctl"
(see below about those flags), which leaves 24 flags for other new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal -offset 3n
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);    /* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
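.Pp
When more than one feature is requested, each requested bit has to be
checked against the value left in
.Va v .
The following is a minimal sketch of such a check and is not part of the
kernel API:
.Fn features_granted
is a hypothetical helper name, and the flag values are repeated from
.In netinet/ip_mroute.h
under
.Li #ifndef
guards only so that the fragment compiles on its own.
.Bd -literal -offset indent
#include <stdint.h>

/*
 * Values as documented for <netinet/ip_mroute.h>; guarded so that
 * this sketch also compiles where that header is not included.
 */
#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)
#endif
#ifndef MRT_MFC_BW_UPCALL
#define MRT_MFC_BW_UPCALL (1 << 9)
#endif

/*
 * Hypothetical helper: return 1 if every feature bit in "wanted" is
 * also set in "granted" (the value left in v after MRT_API_CONFIG),
 * and 0 if at least one requested feature was refused.
 */
static int
features_granted(uint32_t wanted, uint32_t granted)
{
    return ((granted & wanted) == wanted);
}
.Ed
.Pp
A caller would pass the bitset it requested together with the value of
.Va v
after
.Fn setsockopt MRT_API_CONFIG
returned, and restrict itself to the basic API if the helper returns 0.
.Pp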
To obtain the same set of enabled features later, use:
.Bd -literal -offset indent
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
    /* the mfcctl fields */
    struct in_addr  mfcc_origin;        /* ip origin of mcasts */
    struct in_addr  mfcc_mcastgrp;      /* multicast group associated */
    vifi_t          mfcc_parent;        /* incoming vif */
    u_char          mfcc_ttls[MAXVIFS]; /* forwarding ttls on vifs */

    /* extension fields */
    uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags */
    struct in_addr  mfcc_rp;            /* the RP address */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
That signal should not be delivered for interfaces that are not in
the outgoing interface set and that are not expected to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on such an interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution does not scale, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >=), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Furthermore, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added and deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
options, respectively (with the appropriate
.Vt "struct bw_upcall"
argument).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
    struct timeval  b_time;
    uint64_t        b_packets;
    uint64_t        b_bytes;
};

struct bw_upcall {
    struct in_addr  bu_src;         /* source address */
    struct in_addr  bu_dst;         /* destination address */
    uint32_t        bu_flags;       /* misc flags (see below) */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets) */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes) */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d */
    struct bw_data  bu_threshold;   /* the bw threshold */
    struct bw_data  bu_measured;    /* the measured bw */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX                          128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
 if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is >= threshold");
  }
  if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is <= threshold");
  }
.Ed
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, the bandwidth
estimation could be less accurate, or the potentially very high frequency
of the generated upcalls could introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation, however, we must wait
until the threshold interval has expired to know the answer.
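.Pp
The pseudo-algorithm above can also be written as a small C predicate.
The following is a minimal user-level sketch for illustration only and is
not the kernel implementation:
.Vt "struct bw_sample" ,
.Fn bw_upcall_due
and the
.Va interval_done
argument are hypothetical names, and the flag values are repeated from the
.Vt "struct bw_upcall"
definition under
.Li #ifndef
guards so that the fragment compiles on its own.
.Bd -literal -offset indent
#include <stdint.h>

/* Flag values as listed with struct bw_upcall in this page. */
#ifndef BW_UPCALL_UNIT_PACKETS
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)
#endif

/* Hypothetical, simplified view of measured or threshold counters. */
struct bw_sample {
    uint64_t packets;
    uint64_t bytes;
};

/*
 * Return 1 if an upcall would be due for measured values "m" and
 * threshold "t".  "interval_done" is nonzero once the measurement
 * interval has expired; it only matters for <= filters.
 */
static int
bw_upcall_due(uint32_t flags, const struct bw_sample *m,
    const struct bw_sample *t, int interval_done)
{
    int geq = (flags & BW_UPCALL_GEQ) != 0;
    int leq = (flags & BW_UPCALL_LEQ) != 0;

    if (geq == leq)             /* GEQ and LEQ are mutually exclusive */
        return (0);
    if (leq && !interval_done)  /* <= needs the whole interval */
        return (0);
    if ((flags & BW_UPCALL_UNIT_PACKETS) &&
        (geq ? m->packets >= t->packets : m->packets <= t->packets))
        return (1);
    if ((flags & BW_UPCALL_UNIT_BYTES) &&
        (geq ? m->bytes >= t->bytes : m->bytes <= t->bytes))
        return (1);
    return (0);
}
.Ed
.Pp
With a threshold of 10 packets, a >= filter that has counted 12 packets is
due immediately, whereas a <= filter that has counted 3 packets becomes due
only once
.Va interval_done
is set.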
.Sh EXAMPLES
The following example installs a bandwidth-upcall filter:
.Bd -literal -offset indent
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
/* Exactly one of GEQ and LEQ must be selected */
if (is_geq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_GEQ;
else if (is_leq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_LEQ;
else
    return (ERROR);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
    (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL
with the fields of
.Vt "struct bw_upcall"
set to exactly the same values as when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall"
need to be set, together with the
.Dv BW_UPCALL_DELETE_ALL
flag in the
.Va bw_upcall.bu_flags
field.
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal -offset indent
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured.b_time
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mrouted 8 ,
.Xr sysctl 8
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
.Pp
The IPv6 multicast support was implemented by the KAME project
.Pq Lk http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).