.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: /repoman/r/ncvs/src/share/man/man4/multicast.4,v 1.1 2003/10/17 15:12:01 bmah Exp $
.\" $DragonFly: src/share/man/man4/multicast.4,v 1.7 2008/05/02 02:05:05 swildner Exp $
.\"
.Dd September 4, 2003
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In net/ip_mroute/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF),
Core Based Trees (CBT),
Protocol Independent Multicast - Sparse Mode (PIM-SM),
and Protocol Independent Multicast - Dense Mode (PIM-DM)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
about the kernel configuration options),
and must run a multicast routing capable user-level process.
From a developer's point of view,
the programming guide described in the
.Sx "Programming Guide"
section should be used to control the multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is then used
to control the multicast forwarding in the kernel.
Note that most operations below require certain privilege
(i.e., root privilege):
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(in case of IPv4 and IPv6 respectively)
for sending or receiving of IGMP or MLD multicast group membership messages,
then the same mrouter_s4 or mrouter_s6 sockets should be used
for sending and receiving respectively IGMP or MLD messages.
In the case of a BSD-derived kernel, it may be possible to open separate
sockets for IGMP or MLD messages only.
However, some other kernels (e.g., Linux) require that the multicast
routing socket must be used for sending and receiving of IGMP or MLD
messages.
Therefore, for portability reasons the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
.Pp
After multicast forwarding is enabled, the multicast routing socket
can be used to enable PIM processing in the kernel if we are running PIM-SM or
PIM-DM
(see
.Xr pim 4 ) .
.Pp
For each network interface (e.g., physical or a virtual tunnel)
that would be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Dq vif_index
must be unique per vif.
The
.Dq vif_flags
contains the
.Dq VIFF_*
flags as defined in
.In net/ip_mroute/ip_mroute.h .
The
.Dq min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it has a value of 1.
The
.Dq max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Dq vif_local_address
contains the local IP address of the corresponding local interface.
The
.Dq vif_remote_address
contains the remote IP address in case of DVMRP multicast tunnels.
.Bd -literal
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Dq mif_index
must be unique per vif.
The
.Dq mif_flags
contains the
.Dq MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Dq pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After the multicast forwarding is enabled, and the multicast virtual
interfaces are
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
earlier with
.Dq MRT_INIT
or
.Dq MRT6_INIT .
The IPv4 upcalls have a
.Dq struct igmpmsg
header (see
.In net/ip_mroute/ip_mroute.h )
with field
.Dq im_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip
with the protocol field
.Dq ip_p
set to zero.
The IPv6 upcalls have a
.Dq struct mrt6msg
header (see
.In netinet6/ip6_mroute.h )
with field
.Dq im6_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip6_hdr
with the next header field
.Dq ip6_nxt
set to zero.
.Pp
The upcall header contains the field
.Dq im_msgtype
or
.Dq im6_msgtype
with the type of the upcall
.Dq IGMPMSG_*
or
.Dq MRT6MSG_*
for IPv4 and IPv6 respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dq IGMPMSG_NOCACHE
or
.Dq MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
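.Pp
As a hedged illustration of the upcall check described above, the
following sketch classifies a buffer read from the IPv4 multicast
routing socket.
The
.Dq struct upcall_hdr
type and the
.Dq is_upcall
helper are hypothetical names; the structure is a local stand-in that
only mirrors the idea that the must-be-zero byte overlays the
.Dq ip_p
field, and its exact layout is an assumption.
Real code should use
.Dq struct igmpmsg
from the header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the leading fields of struct igmpmsg. */
struct upcall_hdr {
    uint32_t unused1;
    uint32_t unused2;
    uint8_t  msgtype;   /* plays the role of im_msgtype */
    uint8_t  mbz;       /* plays the role of im_mbz, overlays ip_p */
    uint8_t  vif;
    uint8_t  unused3;
};

/*
 * Return 1 and store the message type if buf looks like a kernel
 * upcall (the must-be-zero byte is 0); otherwise return 0 and leave
 * *msgtype untouched.
 */
static int
is_upcall(const void *buf, size_t len, uint8_t *msgtype)
{
    struct upcall_hdr h;

    if (len < sizeof(h))
        return (0);
    memcpy(&h, buf, sizeof(h));     /* avoid unaligned access */
    if (h.mbz != 0)
        return (0);                 /* an ordinary IP packet */
    *msgtype = h.msgtype;
    return (1);
}
```

A caller would typically run this check first on every message read
from the routing socket, and hand anything that is not an upcall to
its regular IGMP processing.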
.Pp
An MFC entry is added by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Dq source_addr
and
.Dq group_addr
are the source and group address of the multicast packet (as set
in the upcall message).
The
.Dq iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Dq oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that in case of IPv6 only the set of outgoing interfaces can
be specified.
.Pp
An MFC entry is deleted by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
If we want to add new features in the kernel, it becomes difficult
to preserve backward compatibility (binary and API),
and at the same time to allow user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows us to preserve the backward
compatibility is a sort of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only that set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process doesn't
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In net/ip_mroute/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new get/setsockopt() options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is open.
The set of desired/allowed features is stored in a bitset
(currently, in uint32_t; i.e., maximum of 32 new features).
The new get/setsockopt() options are
.Dq MRT_API_SUPPORT
and
.Dq MRT_API_CONFIG .
Example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
would set in
.Dq v
the pre-defined bits that the kernel API supports.
The eight least significant bits in uint32_t are the same as the
eight possible flags
.Dq MRT_MFC_FLAGS_*
that can be used in
.Dq mfcc_flags
as part of the new definition of
.Dq struct mfcctl
(see below about those flags), which leaves 24 flags for other new features.
The value returned by getsockopt(MRT_API_SUPPORT) is read-only; in other
words, setsockopt(MRT_API_SUPPORT) would fail.
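.Pp
To illustrate the bit layout just described, the following sketch
splits a feature word into its eight low per-entry flag bits and the
remaining 24 capability bits.
The mask name and both helper names are hypothetical and are not
defined by the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the eight least significant bits of the bitset. */
#define MFC_FLAGS_MASK 0x000000ffu

/* Bits that correspond to the per-entry MRT_MFC_FLAGS_* flags. */
static uint32_t
api_mfc_flags(uint32_t v)
{
    return (v & MFC_FLAGS_MASK);
}

/* The remaining 24 bits, left for other new features. */
static uint32_t
api_other_features(uint32_t v)
{
    return (v & ~MFC_FLAGS_MASK);
}
```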
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);        /* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when setsockopt(MRT_API_CONFIG) is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Dq v
is the actual (sub)set of features that were enabled in the kernel.
To later obtain the set of features that were enabled:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
In other words, setsockopt(MRT_API_CONFIG)
should be called right after setsockopt(MRT_INIT).
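.Pp
The negotiation can be modelled without a kernel.
The sketch below assumes, as the text above suggests, that the kernel
grants the intersection of the requested and the supported feature
bits; this is an assumption made for illustration and is not taken
from the kernel sources.
Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the kernel side of the negotiation: grant only those
 * requested features that are also supported (assumed behaviour).
 */
static uint32_t
negotiate_features(uint32_t wanted, uint32_t supported)
{
    return (wanted & supported);
}

/* Return 1 if every bit in needed is present in granted. */
static int
features_granted(uint32_t granted, uint32_t needed)
{
    return ((granted & needed) == needed);
}
```

A user-level process would apply a check like features_granted() to
the value left in
.Dq v
and fall back to the basic API for any feature that was not granted.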
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Dq struct mfcctl2
instead of the traditional
.Dq struct mfcctl .
The original
.Dq struct mfcctl
is kept as is.
The new
.Dq struct mfcctl2
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
    /* the mfcctl fields */
    struct in_addr  mfcc_origin;        /* ip origin of mcasts       */
    struct in_addr  mfcc_mcastgrp;      /* multicast group associated*/
    vifi_t          mfcc_parent;        /* incoming vif              */
    u_char          mfcc_ttls[MAXVIFS]; /* forwarding ttls on vifs   */

    /* extension fields */
    uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags */
    struct in_addr  mfcc_rp;            /* the RP address            */
};
.Ed
.Pp
The new fields are
.Dq mfcc_flags[MAXVIFS]
and
.Dq mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Dq mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
.Ed
.Pp
The
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dq IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
Usually, this signal is used to
complete the shortest-path switch in case of PIM-SM multicast routing,
or to trigger a PIM assert message.
However, it should not be delivered for interfaces that are not in
the outgoing interface set, and that are not expecting to
become an incoming interface.
Hence, if the
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on that interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
.Pp
The
.Dq MRT_MFC_FLAGS_BORDER_VIF
flag is used to specify whether the Border-bit in PIM
Register messages should be set (in case when the Register encapsulation
is performed inside the kernel).
If it is set for the special PIM Register kernel virtual interface
(see
.Xr pim 4 ) ,
the Border-bit in the Register messages sent to the RP will be set.
.Pp
The remaining six bits are reserved for future usage.
.Pp
The
.Dq mfcc_rp
field is used to specify the RP address (in case of PIM-SM multicast routing)
for a multicast
group G if we want to perform kernel-level PIM Register encapsulation.
The
.Dq mfcc_rp
field is used only if the
.Dq MRT_MFC_RP
advanced API flag/capability has been successfully set by
setsockopt(MRT_API_CONFIG).
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dq MRT_MFC_RP
flag was successfully set by
setsockopt(MRT_API_CONFIG), then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address would be taken from the
.Dq mfcc_rp
field
inside the new
.Dq struct mfcctl2 .
However, even if the
.Dq MRT_MFC_RP
flag was successfully set, if the
.Dq mfcc_rp
field was set to
.Dq INADDR_ANY ,
then the
kernel will still deliver an IGMPMSG_WHOLEPKT upcall with the
multicast data packet to the user-level process.
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size was on the order of 65500 bytes), the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally from the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over Ethernet, for example, would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
For example, the multicast routing process may want to time out idle MFC
entries, or in case of PIM-SM it can initiate (S,G) shortest-path switch if
the bandwidth rate is above a threshold.
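.Pp
As a rough illustration of the two decisions mentioned above (expiring an
idle entry, or starting a shortest-path switch once the rate crosses a
threshold), the following is a minimal user-level sketch.
It is not part of the kernel API: the names
.Dq flow_sample ,
.Dq flow_action
and
.Dq classify_flow
are hypothetical, and the threshold is an assumed, daemon-chosen rate in
bits per second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the cumulative per-(S,G) byte counter. */
struct flow_sample {
	double   t_sec;		/* time of the sample, in seconds */
	uint64_t bytes;		/* bytes forwarded so far */
};

/* Hypothetical outcomes a routing daemon might act on. */
enum flow_action { FLOW_KEEP, FLOW_EXPIRE_IDLE, FLOW_SPT_SWITCH };

/*
 * Decide what to do with a flow from two consecutive samples.
 * spt_threshold_bps is an assumed rate threshold in bits per second.
 */
static enum flow_action
classify_flow(const struct flow_sample *prev, const struct flow_sample *cur,
    double spt_threshold_bps)
{
	double dt, rate_bps;

	assert(prev != NULL && cur != NULL);

	dt = cur->t_sec - prev->t_sec;
	/* Ignore unusable samples: no elapsed time or a counter reset. */
	if (dt <= 0.0 || cur->bytes < prev->bytes)
		return FLOW_KEEP;
	/* Nothing was forwarded during the interval: the flow is idle. */
	if (cur->bytes == prev->bytes)
		return FLOW_EXPIRE_IDLE;
	/* Convert the byte delta into bits per second and compare. */
	rate_bps = (double)(cur->bytes - prev->bytes) * 8.0 / dt;
	return rate_bps >= spt_threshold_bps ? FLOW_SPT_SWITCH : FLOW_KEEP;
}
```

A daemon would call such a function with two samples of the same (S,G)
taken some seconds apart; how those samples are obtained is deliberately
left out of the sketch.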
.Pp
The original solution for measuring the bandwidth of a dataflow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution is far from being scalable, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because practically we need only those two.
Further, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, we need to install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, we need to check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
setsockopt(MRT_API_CONFIG) for the MRT_MFC_BW_UPCALL flag.
.It
The bandwidth-upcall filters are added/deleted by the new
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL)
respectively (with the appropriate
.Dq struct bw_upcall
argument of course).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 *	The first packet marks the start of a measurement interval.
 *	During an interval we count packets and bytes, and when we
 *	pass the threshold we deliver an upcall and we are done.
 *	The first packet after the end of the interval resets the
 *	count and restarts the measurement.
 *
 * For <= measurement:
 *	We start a timer to fire at the end of the interval, and
 *	then for each incoming packet we count packets and bytes.
 *	When the timer fires, we compare the value with the threshold,
 *	schedule an upcall if we are below, and restart the measurement
 *	(reschedule timer and zero counters).
 */

struct bw_data {
	struct timeval	b_time;
	uint64_t	b_packets;
	uint64_t	b_bytes;
};

struct bw_upcall {
	struct in_addr	bu_src;		/* source address            */
	struct in_addr	bu_dst;		/* destination address       */
	uint32_t	bu_flags;	/* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0)	/* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1)	/* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2)	/* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3)	/* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4)	/* delete all upcalls for s,d*/
	struct bw_data	bu_threshold;	/* the bw threshold          */
	struct bw_data	bu_measured;	/* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Dq bw_upcall
structure is used as an argument to
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL).
Each setsockopt(MRT_ADD_BW_UPCALL) installs a filter in the kernel
for the source and destination address in the
.Dq bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
 if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is >= threshold");
  }
  if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is <= threshold");
  }
.Ed
.Pp
In the same
.Dq bw_upcall
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values are allowed, then the bandwidth
estimation may be less accurate, or the potentially very high frequency
of the generated upcalls may introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Dq threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
/* The measurement interval is stored in the b_time field */
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
/* Exactly one of GEQ and LEQ must be selected */
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
          (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use MRT_DEL_BW_UPCALL,
and the fields of bw_upcall must be set
exactly the same as when MRT_ADD_BW_UPCALL was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Dq bu_src
and
.Dq bu_dst
fields in
.Dq struct bw_upcall
need to be set, and then just set only the
.Dq BW_UPCALL_DELETE_ALL
flag inside field
.Dq bw_upcall.bu_flags .
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Dq struct bw_upcall
elements (up to BW_UPCALLS_MAX = 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Dq struct bw_upcall
element, the
.Dq bu_measured
field is filled-in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
bu_measured.b_time is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Dq bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the ioctl(SIOCGETSGCNT)
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Dq bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Dq bu_threshold.b_time
after the previous upcall.
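.Pp
The filter rule given earlier as a pseudo-algorithm can also be written as
a small C predicate, which may be handy for secondary user-level filtering
of received upcalls.
This is only a sketch under stated assumptions: the
.Dq ex_
and
.Dq EX_
names are hypothetical local mirrors of the structures and flags shown
above, declared here so that the fragment compiles without the kernel
header, and treating a filter with both GEQ and LEQ set as never matching
is a choice made for this sketch, not documented kernel behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <netinet/in.h>

/* Hypothetical local mirrors of the BW_UPCALL_* flags. */
#define EX_UNIT_PACKETS	(1 << 0)
#define EX_UNIT_BYTES	(1 << 1)
#define EX_GEQ		(1 << 2)
#define EX_LEQ		(1 << 3)

/* Hypothetical local mirrors of struct bw_data and struct bw_upcall. */
struct ex_bw_data {
	struct timeval	b_time;
	uint64_t	b_packets;
	uint64_t	b_bytes;
};

struct ex_bw_upcall {
	struct in_addr		bu_src;
	struct in_addr		bu_dst;
	uint32_t		bu_flags;
	struct ex_bw_data	bu_threshold;
	struct ex_bw_data	bu_measured;
};

/* Return 1 if interval a is at least as long as interval b, else 0. */
static int
ex_tv_geq(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec > b->tv_sec;
	return a->tv_usec >= b->tv_usec;
}

/* Return 1 if the measured values satisfy the filter, else 0. */
static int
ex_bw_filter_matches(const struct ex_bw_upcall *f)
{
	const struct ex_bw_data *m, *t;
	int pkts, bytes;

	assert(f != NULL);
	m = &f->bu_measured;
	t = &f->bu_threshold;
	pkts = (f->bu_flags & EX_UNIT_PACKETS) != 0;
	bytes = (f->bu_flags & EX_UNIT_BYTES) != 0;

	/* GEQ and LEQ are mutually exclusive; reject filters with both. */
	if ((f->bu_flags & EX_GEQ) && (f->bu_flags & EX_LEQ))
		return 0;
	/* ">=" may match before the measurement interval has elapsed. */
	if (f->bu_flags & EX_GEQ)
		return (pkts && m->b_packets >= t->b_packets) ||
		    (bytes && m->b_bytes >= t->b_bytes);
	/* "<=" is decided only once the full interval has elapsed. */
	if ((f->bu_flags & EX_LEQ) && ex_tv_geq(&m->b_time, &t->b_time))
		return (pkts && m->b_packets <= t->b_packets) ||
		    (bytes && m->b_bytes <= t->b_bytes);
	return 0;
}
```

A user-level process could apply such a predicate, or a stricter comparison
such as !=, to each element of a received batch before acting on it.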
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr pim 4
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs), and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
.Pa ( http://www.kame.net ) ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI) in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).