.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: /repoman/r/ncvs/src/share/man/man4/multicast.4,v 1.1 2003/10/17 15:12:01 bmah Exp $
.\" $DragonFly: src/share/man/man4/multicast.4,v 1.7 2008/05/02 02:05:05 swildner Exp $
.\"
.Dd September 4, 2003
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In net/ip_mroute/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF),
Core Based Trees (CBT),
Protocol Independent Multicast - Sparse Mode (PIM-SM),
and Protocol Independent Multicast - Dense Mode (PIM-DM)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
for the kernel configuration options),
and must run a multicast routing capable user-level process.
From a developer's point of view,
the API described in the
.Sx "Programming Guide"
section should be used to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is then used
to control multicast forwarding in the kernel.
Note that most operations below require elevated privilege
(i.e., root privilege):
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(in case of IPv4 and IPv6 respectively)
for sending or receiving IGMP or MLD multicast group membership messages,
then the same mrouter_s4 or mrouter_s6 socket should be used
for sending and receiving IGMP or MLD messages respectively.
On a BSD-derived kernel, it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux) require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons the multicast
routing socket should be reused for IGMP and MLD messages as well.
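.Pp
As a minimal sketch of the step above, the two
.Fn socket
calls can be wrapped in one error-checked helper.
The name
.Fn open_mrouter_socket
is hypothetical and not part of any system API;
the call is expected to fail (typically with
.Er EPERM )
when made without sufficient privilege.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * Hypothetical helper: open the raw socket that doubles as the
 * multicast routing socket for the given address family.
 * Returns the descriptor, or -1 with errno set.
 */
int
open_mrouter_socket(int family)
{
	switch (family) {
	case AF_INET:
		/* IPv4: the IGMP raw socket is the routing socket. */
		return (socket(AF_INET, SOCK_RAW, IPPROTO_IGMP));
	case AF_INET6:
		/* IPv6: the ICMPv6 raw socket is the routing socket. */
		return (socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6));
	default:
		/* Reject other families without calling socket(). */
		errno = EAFNOSUPPORT;
		return (-1);
	}
}
```
.Ed
.Pp
The returned descriptor would then be used both for the MRT_* or MRT6_*
socket options and for IGMP or MLD traffic, as recommended above.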
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
.Pp
After multicast forwarding is enabled, the multicast routing socket
can be used to enable PIM processing in the kernel if we are running PIM-SM or
PIM-DM
(see
.Xr pim 4 ) .
.Pp
For each network interface (e.g., physical or a virtual tunnel)
that would be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Dq vif_index
must be unique per vif.
The
.Dq vif_flags
contains the
.Dq VIFF_*
flags as defined in
.In net/ip_mroute/ip_mroute.h .
The
.Dq min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it has a value of 1.
The
.Dq max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Dq vif_local_address
contains the local IP address of the corresponding local interface.
The
.Dq vif_remote_address
contains the remote IP address in case of DVMRP multicast tunnels.
.Bd -literal
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Dq mif_index
must be unique per multicast interface.
The
.Dq mif_flags
contains the
.Dq MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Dq pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled and the multicast virtual
interfaces are
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was
enabled earlier with
.Dq MRT_INIT
or
.Dq MRT6_INIT .
The IPv4 upcalls have a
.Dq struct igmpmsg
header (see
.In net/ip_mroute/ip_mroute.h )
with the field
.Dq im_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip
with the protocol field
.Dq ip_p
set to zero.
The IPv6 upcalls have a
.Dq struct mrt6msg
header (see
.In netinet6/ip6_mroute.h )
with the field
.Dq im6_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip6_hdr
with the next header field
.Dq ip6_nxt
set to zero.
.Pp
The upcall header contains the field
.Dq im_msgtype
or
.Dq im6_msgtype
with the type of the upcall
.Dq IGMPMSG_*
or
.Dq MRT6MSG_*
for IPv4 and IPv6 respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dq IGMPMSG_NOCACHE
or
.Dq MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
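.Pp
The following sketch shows one way a user-level process might tell an
IPv4 upcall apart from an ordinary IGMP packet read from the same socket,
by testing the byte that overlays
.Dq ip_p .
It works on a raw byte buffer rather than on
.Dq struct igmpmsg
so that it stays self-contained.
The byte offsets and the UPCALL_* values are assumptions made for
illustration; real code should use the structure and the
.Dq IGMPMSG_*
constants from
.In net/ip_mroute/ip_mroute.h .
.Bd -literal
```c
#include <stddef.h>

/* Stand-in values; assumed to mirror the IGMPMSG_* constants. */
#define UPCALL_NOT_UPCALL	0	/* ordinary packet, or too short */
#define UPCALL_NOCACHE		1
#define UPCALL_WRONGVIF		2
#define UPCALL_WHOLEPKT		3

/* Assumed byte offsets; the header is laid out like struct ip. */
#define OFF_MSGTYPE	8	/* im_msgtype */
#define OFF_MBZ		9	/* im_mbz, overlays ip_p */
#define OFF_VIF		10	/* im_vif */
#define MIN_HDR_LEN	20	/* IPv4 header without options */

/*
 * Hypothetical helper: return the upcall type byte, or
 * UPCALL_NOT_UPCALL when the buffer is too short or the
 * must-be-zero byte is non-zero (a real IP packet such as IGMP).
 * On success the vif index is stored through "vif" if non-NULL.
 */
int
classify_upcall(const unsigned char *buf, size_t len, unsigned int *vif)
{
	if (buf == NULL || len < MIN_HDR_LEN)
		return (UPCALL_NOT_UPCALL);
	if (buf[OFF_MBZ] != 0)
		return (UPCALL_NOT_UPCALL);
	if (vif != NULL)
		*vif = buf[OFF_VIF];
	return (buf[OFF_MSGTYPE]);
}
```
.Ed
.Pp
A result of UPCALL_NOCACHE would then lead the process to install an
MFC entry for the source and group carried in the upcall.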
.Pp
An MFC entry is added by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Dq source_addr
and
.Dq group_addr
are the source and group address of the multicast packet (as set
in the upcall message).
The
.Dq iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Dq oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that in case of IPv6 only the set of outgoing interfaces can
be specified.
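.Pp
The relationship between the per-interface TTL array and the set of
outgoing interfaces can be sketched as follows.
The function name is hypothetical, and a plain 32-bit mask is used as a
stand-in for the kernel interface set that
.Fn IF_SET
operates on.
.Bd -literal
```c
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask of outgoing interfaces from
 * a per-interface TTL threshold array.  Bit i is set when
 * oifs_ttl[i] is non-zero.  At most 32 interfaces are considered.
 */
uint32_t
oifs_from_ttls(const unsigned char *oifs_ttl, unsigned int n)
{
	uint32_t set = 0;
	unsigned int i;

	for (i = 0; i < n && i < 32; i++)
		if (oifs_ttl[i] > 0)
			set |= (uint32_t)1 << i;
	return (set);
}
```
.Ed
.Pp
For IPv4 the thresholds themselves are passed to the kernel; for IPv6
only the membership information that this mask models is used.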
.Pp
An MFC entry is deleted by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
When new features are added to the kernel, it becomes difficult
to preserve backward compatibility (binary and API),
and at the same time to allow user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows us to preserve backward
compatibility is a form of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only the set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In net/ip_mroute/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new get/setsockopt() options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in uint32_t; i.e., maximum of 32 new features).
The new get/setsockopt() options are
.Dq MRT_API_SUPPORT
and
.Dq MRT_API_CONFIG .
Example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
would set in
.Dq v
the pre-defined bits that the kernel API supports.
The eight least significant bits in uint32_t are the same as the
eight possible flags
.Dq MRT_MFC_FLAGS_*
that can be used in
.Dq mfcc_flags
as part of the new
.Dq struct mfcctl2
(see below about those flags), which leaves 24 flags for other new features.
The value returned by getsockopt(MRT_API_SUPPORT) is read-only; in other
words, setsockopt(MRT_API_SUPPORT) would fail.
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when setsockopt(MRT_API_CONFIG) is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Dq v
is the actual (sub)set of features that were enabled in the kernel.
To later obtain the set of features that were enabled:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore, setsockopt(MRT_API_CONFIG)
should be called right after setsockopt(MRT_INIT).
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define	MRT_MFC_FLAGS_BORDER_VIF   (1 << 1)  /* border vif              */
#define MRT_MFC_RP                 (1 << 8)  /* enable RP address	*/
#define MRT_MFC_BW_UPCALL          (1 << 9)  /* enable bw upcalls	*/
.Ed
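.Pp
The negotiation can be modelled in user space as a pair of bitset
operations.
This is only an illustrative sketch: the FEAT_* names are local copies of
the bit positions listed above, and
.Fn negotiate_features
merely stands in for what the kernel is described as doing when
setsockopt(MRT_API_CONFIG) is called.
.Bd -literal
```c
#include <stdint.h>

/* Local copies of the feature bits listed above. */
#define FEAT_DISABLE_WRONGVIF	(UINT32_C(1) << 0)
#define FEAT_BORDER_VIF		(UINT32_C(1) << 1)
#define FEAT_RP			(UINT32_C(1) << 8)
#define FEAT_BW_UPCALL		(UINT32_C(1) << 9)

/* Model of the kernel side: grant what is both wanted and supported. */
uint32_t
negotiate_features(uint32_t wanted, uint32_t supported)
{
	return (wanted & supported);
}

/* Returns 1 when every bit in "required" was granted, otherwise 0. */
int
features_granted(uint32_t granted, uint32_t required)
{
	return ((granted & required) == required);
}
```
.Ed
.Pp
A process that cannot work without a given feature would check the
granted set in this way and fall back to the basic API, or give up,
when the check fails.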
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Dq struct mfcctl2
instead of the traditional
.Dq struct mfcctl .
The original
.Dq struct mfcctl
is kept as is.
The new
.Dq struct mfcctl2
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Dq mfcc_flags[MAXVIFS]
and
.Dq mfcc_rp .
Note that for compatibility reasons they are added at the end.
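.Pp
Why appending the new fields preserves compatibility can be checked with
.Fn offsetof .
The sketch below uses local stand-in structures (prefixed with ex_) rather
than the real headers; the value 32 for MAXVIFS and the
.Vt unsigned short
type for the vif index are assumptions made for illustration.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

#define EX_MAXVIFS	32		/* assumed value of MAXVIFS */
typedef unsigned short	ex_vifi_t;	/* assumed type of vifi_t */

struct ex_mfcctl {			/* stand-in for struct mfcctl */
	struct in_addr	mfcc_origin;
	struct in_addr	mfcc_mcastgrp;
	ex_vifi_t	mfcc_parent;
	unsigned char	mfcc_ttls[EX_MAXVIFS];
};

struct ex_mfcctl2 {			/* stand-in for struct mfcctl2 */
	struct in_addr	mfcc_origin;
	struct in_addr	mfcc_mcastgrp;
	ex_vifi_t	mfcc_parent;
	unsigned char	mfcc_ttls[EX_MAXVIFS];
	uint8_t		mfcc_flags[EX_MAXVIFS];	/* extension field */
	struct in_addr	mfcc_rp;		/* extension field */
};

/* Returns 1 when every old field keeps its offset in the new layout. */
int
mfcctl2_extends_mfcctl(void)
{
	return (offsetof(struct ex_mfcctl, mfcc_origin) ==
	    offsetof(struct ex_mfcctl2, mfcc_origin) &&
	    offsetof(struct ex_mfcctl, mfcc_mcastgrp) ==
	    offsetof(struct ex_mfcctl2, mfcc_mcastgrp) &&
	    offsetof(struct ex_mfcctl, mfcc_parent) ==
	    offsetof(struct ex_mfcctl2, mfcc_parent) &&
	    offsetof(struct ex_mfcctl, mfcc_ttls) ==
	    offsetof(struct ex_mfcctl2, mfcc_ttls) &&
	    offsetof(struct ex_mfcctl2, mfcc_flags) >=
	    offsetof(struct ex_mfcctl, mfcc_ttls) + EX_MAXVIFS);
}
```
.Ed
.Pp
Because the old fields sit at the same offsets, code that only reads the
original fields sees the same values in either structure.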
.Pp
The
.Dq mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define	MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif          */
.Ed
.Pp
The
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dq IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
Usually, this signal is used to
complete the shortest-path switch in case of PIM-SM multicast routing,
or to trigger a PIM assert message.
However, it should not be delivered for interfaces that are not in
the outgoing interface set, and that are not expecting to
become an incoming interface.
Hence, if the
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on that interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
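.Pp
One possible way to derive the per-interface flags from the rule above
is sketched here.
The helper name, the EX_DISABLE_WRONGVIF copy of the flag bit, and the
.Va may_become_iif
array (non-zero for interfaces that might later become the incoming
interface) are all hypothetical.
.Bd -literal
```c
#include <stdint.h>

/* Local copy of the MRT_MFC_FLAGS_DISABLE_WRONGVIF bit. */
#define EX_DISABLE_WRONGVIF	(1 << 0)

/*
 * Hypothetical helper: for each of n interfaces, set the disable
 * flag when the interface is not an outgoing interface
 * (ttls[i] == 0) and is not expected to become an incoming
 * interface; clear the flag otherwise.
 */
void
fill_wrongvif_flags(uint8_t *flags, const unsigned char *ttls,
    const unsigned char *may_become_iif, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (ttls[i] == 0 && !may_become_iif[i])
			flags[i] |= EX_DISABLE_WRONGVIF;
		else
			flags[i] &= (uint8_t)~EX_DISABLE_WRONGVIF;
	}
}
```
.Ed
.Pp
The resulting array would be copied into
.Dq mfcc_flags
before the entry is installed.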
.Pp
The
.Dq MRT_MFC_FLAGS_BORDER_VIF
flag is used to specify whether the Border-bit in PIM
Register messages should be set (in the case when the Register
encapsulation is performed inside the kernel).
If it is set for the special PIM Register kernel virtual interface
(see
.Xr pim 4 ) ,
the Border-bit in the Register messages sent to the RP will be set.
.Pp
The remaining six bits are reserved for future usage.
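.Pp
The following is a minimal sketch of how the two flags might be set for
individual interfaces.
It does not use the real
.Dq struct mfcctl2 ;
the
.Dq mfc_flags_sketch
structure, both helper functions, and the
.Dq MAXVIFS
value of 32 are assumptions made for illustration only.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the flag definitions quoted above. */
#define MAXVIFS 32	/* assumption: value used by <netinet/ip_mroute.h> */
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1)

/* Hypothetical cut-down view of the per-vif flags array. */
struct mfc_flags_sketch {
        uint8_t mfcc_flags[MAXVIFS];
};

/* Suppress WRONGVIF signals for one vif of this (S,G) entry. */
static void
disable_wrongvif(struct mfc_flags_sketch *mc, unsigned int vif)
{
        assert(vif < MAXVIFS);
        mc->mfcc_flags[vif] |= MRT_MFC_FLAGS_DISABLE_WRONGVIF;
}

/* Request the Border-bit on the PIM Register vif. */
static void
mark_border_vif(struct mfc_flags_sketch *mc, unsigned int register_vif)
{
        assert(register_vif < MAXVIFS);
        mc->mfcc_flags[register_vif] |= MRT_MFC_FLAGS_BORDER_VIF;
}
```
.Ed
.Pp
In a real program the same bit operations would be applied to the
.Dq mfcc_flags
array of the structure passed to setsockopt(MRT_ADD_MFC).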
.Pp
The
.Dq mfcc_rp
field is used to specify the RP address (in case of PIM-SM multicast routing)
for a multicast
group G if we want to perform kernel-level PIM Register encapsulation.
The
.Dq mfcc_rp
field is used only if the
.Dq MRT_MFC_RP
advanced API flag/capability has been successfully set by
setsockopt(MRT_API_CONFIG).
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dq MRT_MFC_RP
flag was successfully set by
setsockopt(MRT_API_CONFIG), then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address is taken from the
.Dq mfcc_rp
field
inside the new
.Dq struct mfcctl2 .
However, even if the
.Dq MRT_MFC_RP
flag was successfully set, if the
.Dq mfcc_rp
field was set to
.Dq INADDR_ANY ,
then the
kernel will still deliver an IGMPMSG_WHOLEPKT upcall with the
multicast data packet to the user-level process.
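.Pp
The rule above can be summarized by the following sketch.
It only models the decision and is not kernel code; the
.Dq register_path_for
function and its enum are hypothetical, and the value used for
.Dq MRT_MFC_RP
is an assumption copied locally so that the fragment is self-contained.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define MRT_MFC_RP (1 << 8) /* assumption: mirrors <netinet/ip_mroute.h> */

enum register_path { REGISTER_IN_KERNEL, REGISTER_WHOLEPKT_UPCALL };

/*
 * Kernel-level encapsulation is attempted only if the MRT_MFC_RP
 * capability is enabled and the entry carries a real RP address.
 */
static enum register_path
register_path_for(uint32_t api_flags, struct in_addr mfcc_rp)
{
        if ((api_flags & MRT_MFC_RP) != 0 &&
            mfcc_rp.s_addr != htonl(INADDR_ANY))
                return (REGISTER_IN_KERNEL);
        return (REGISTER_WHOLEPKT_UPCALL);
}
```
.Ed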
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size was on the order of 65500 bytes), the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally from the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over Ethernet, for example, would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
For example, the multicast routing process may want to time out idle MFC
entries, or in case of PIM-SM it can initiate (S,G) shortest-path switch if
the bandwidth rate is above a threshold.
.Pp
The original solution for measuring the bandwidth of a dataflow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution does not scale well, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because practically we need only those two.
Further, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, we need to install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, we need to check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
setsockopt(MRT_API_CONFIG) for the MRT_MFC_BW_UPCALL flag.
.It
The bandwidth-upcall filters are added/deleted by the new
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL)
respectively (with the appropriate
.Dq struct bw_upcall
argument).
.El
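.Pp
The secondary user-level filtering mentioned in the list above could be
sketched as follows.
This is only an illustration: the
.Dq cmp_op
enum and both functions are hypothetical names that are not part of the
kernel API, and only the != case is taken directly from the text.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Comparison the application actually wants (hypothetical enum). */
enum cmp_op { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GE, OP_GT };

/* Threshold to hand to the kernel filter for a wanted comparison. */
static uint64_t
kernel_threshold(enum cmp_op op, uint64_t wanted)
{
        /* "!=" uses the catch-all "bw <= 0xffffffff" filter. */
        return (op == OP_NE ? 0xffffffffULL : wanted);
}

/* Secondary check applied in user space once an upcall arrives. */
static bool
user_level_match(enum cmp_op op, uint64_t measured, uint64_t wanted)
{
        switch (op) {
        case OP_LT: return (measured < wanted);
        case OP_LE: return (measured <= wanted);
        case OP_EQ: return (measured == wanted);
        case OP_NE: return (measured != wanted);
        case OP_GE: return (measured >= wanted);
        case OP_GT: return (measured > wanted);
        }
        return (false);
}
```
.Ed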
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Dq bw_upcall
structure is used as an argument to
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL).
Each setsockopt(MRT_ADD_BW_UPCALL) installs a filter in the kernel
for the source and destination address in the
.Dq bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
.Pp
In the same
.Dq bw_upcall
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, then the bandwidth
estimation might be less accurate, or the potentially very high frequency
of the generated upcalls might introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Dq threshold_interval ,
so the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
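.Pp
As a rough, compilable rendering of the filter rule described above, the
following sketch evaluates one filter against a set of measured values.
It is a user-space model and not the kernel implementation; the flag
values and
.Dq struct bw_data
are copied locally from the definitions in this page, and the function
names are hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

/* True if the measured interval is at least the threshold interval. */
static bool
interval_elapsed(const struct timeval *m, const struct timeval *t)
{
        if (m->tv_sec != t->tv_sec)
                return (m->tv_sec > t->tv_sec);
        return (m->tv_usec >= t->tv_usec);
}

/* True if a filter with these flags would trigger an upcall. */
static bool
bw_filter_matches(uint32_t flags, const struct bw_data *m,
    const struct bw_data *t)
{
        bool pk = (flags & BW_UPCALL_UNIT_PACKETS) != 0;
        bool by = (flags & BW_UPCALL_UNIT_BYTES) != 0;

        /* ">=" may fire before the interval has expired. */
        if (flags & BW_UPCALL_GEQ)
                return ((pk && m->b_packets >= t->b_packets) ||
                    (by && m->b_bytes >= t->b_bytes));
        /* "<=" is only decided once the interval has expired. */
        if ((flags & BW_UPCALL_LEQ) &&
            interval_elapsed(&m->b_time, &t->b_time))
                return ((pk && m->b_packets <= t->b_packets) ||
                    (by && m->b_bytes <= t->b_bytes));
        return (false);
}
```
.Ed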
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
/* threshold_interval is a struct timeval (at least 3 seconds) */
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
/* GEQ and LEQ are mutually exclusive: set exactly one of them */
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
          (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use MRT_DEL_BW_UPCALL
with the fields of bw_upcall set
exactly the same as when MRT_ADD_BW_UPCALL was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Dq bu_src
and
.Dq bu_dst
fields in
.Dq struct bw_upcall
need to be set, together with only the
.Dq BW_UPCALL_DELETE_ALL
flag inside field
.Dq bw_upcall.bu_flags .
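.Pp
A possible way to prepare the argument for deleting all filters of an (S,G)
is sketched below.
The structures and the flag value are local copies of the definitions shown
in this page so that the fragment compiles on its own, and
.Dq bw_upcall_delete_all_init
is a hypothetical helper name.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define BW_UPCALL_DELETE_ALL (1 << 4)

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};

/* Fill in only the (S,G) pair and the DELETE_ALL flag. */
static void
bw_upcall_delete_all_init(struct bw_upcall *bu, struct in_addr src,
    struct in_addr grp)
{
        assert(bu != NULL);
        memset(bu, 0, sizeof(*bu));
        bu->bu_src = src;
        bu->bu_dst = grp;
        bu->bu_flags = BW_UPCALL_DELETE_ALL;
}
```
.Ed
.Pp
The initialized structure would then be passed to
setsockopt(MRT_DEL_BW_UPCALL) on the multicast routing socket, in the same
way as in the MRT_ADD_BW_UPCALL example.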
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Dq struct bw_upcall
elements (up to BW_UPCALLS_MAX = 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Dq struct bw_upcall
element, the
.Dq bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
bu_measured.b_time is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Dq bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the ioctl(SIOCGETSGCNT)
mechanism
(see the
.Sx Programming Guide
section).
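.Pp
A user-level process might walk the received array along the following
lines.
This sketch assumes that the caller has already skipped any message header,
so that the buffer holds only the
.Dq struct bw_upcall
elements; the structures are local copies of the definitions in this page
and
.Dq count_geq_upcalls
is a hypothetical helper.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <netinet/in.h>

#define BW_UPCALL_GEQ  (1 << 2)
#define BW_UPCALLS_MAX 128

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};

/* Count the ">=" upcalls in a buffer holding only bw_upcall elements. */
static size_t
count_geq_upcalls(const unsigned char *buf, size_t len)
{
        size_t n = len / sizeof(struct bw_upcall);
        size_t i, hits = 0;
        struct bw_upcall bu;

        /* Never trust more elements than one message can carry. */
        if (n > BW_UPCALLS_MAX)
                n = BW_UPCALLS_MAX;
        for (i = 0; i < n; i++) {
                /* memcpy avoids unaligned access into the raw buffer. */
                memcpy(&bu, buf + i * sizeof(bu), sizeof(bu));
                if (bu.bu_flags & BW_UPCALL_GEQ)
                        hits++;
        }
        return (hits);
}
```
.Ed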
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Dq bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Dq bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr pim 4
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs), and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
.Pa ( http://www.kame.net ) ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI) in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).