.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: src/share/man/man4/multicast.4,v 1.4 2004/07/09 09:22:36 ru Exp $
.\" $OpenBSD: multicast.4,v 1.12 2018/03/07 09:54:23 jmc Exp $
.\" $NetBSD: multicast.4,v 1.3 2004/09/12 13:12:26 wiz Exp $
.\"
.Dd $Mdocdate: March 7 2018 $
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd multicast routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
Multicast routing is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF) and
Core Based Trees (CBT)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding via the
.Xr sysctl 8
variables
.Va net.inet.ip.mforwarding
and/or
.Va net.inet.ip6.mforwarding ,
and set
.Va multicast
to
.Dq YES
in
.Xr rc.conf.local 8 .
The user must also run a multicast routing capable user-level process,
such as
.Xr mrouted 8 .
Developers should use the API described in the
.Sx Programming Guide
section to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control the multicast forwarding in the kernel.
Note that most operations below require certain privilege
(i.e., root privilege):
.Bd -literal -offset indent
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal -offset indent
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(IPv4 or IPv6, respectively)
to send or receive IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
for those IGMP or MLD messages, respectively.
In the case of
.Bx -derived
kernels,
it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux)
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal -offset 5n
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal -offset 5n
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
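.Pp
The fragments in this section omit error handling for brevity.
As a minimal sketch, a small wrapper can report why an option could not be set;
the helper name
.Fn set_int_option
is hypothetical and is not part of the multicast API:
.Bd -literal -offset indent
```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: set an int-valued socket option.
 * Returns 0 on success, or the errno value reported by setsockopt(2).
 */
static int
set_int_option(int s, int level, int optname, int value)
{
    if (setsockopt(s, level, optname, &value, sizeof(value)) == -1)
        return (errno);
    return (0);
}
```
.Ed
.Pp
With such a helper, enabling IPv4 forwarding would read
.Li set_int_option(mrouter_s4, IPPROTO_IP, MRT_INIT, 1) ,
and a non-zero result can be passed to
.Xr strerror 3 .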
.Pp
For each network interface (e.g., physical or a virtual tunnel)
that would be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal -offset 3n
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it would be 1.
The
.Va max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address for DVMRP multicast tunnels.
.Bd -literal -offset indent
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal -offset indent
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal -offset indent
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled, and the multicast virtual
interfaces have been
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the
.Va im_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the
.Va im6_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the
.Va im_msgtype
and
.Va im6_msgtype
fields, with the type of the upcall
.Dv IGMPMSG_*
and
.Dv MRT6MSG_*
for IPv4 and IPv6, respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
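.Pp
Because IPv4 upcalls and ordinary IGMP packets arrive on the same socket,
the reader has to tell them apart by the zero protocol field.
The following is a minimal sketch of that check on a raw receive buffer.
The byte offsets and the
.Dv SK_*
names are assumptions made for illustration, based on the layout of
.Vt "struct ip" ;
real code should use
.Vt "struct igmpmsg"
and the
.Dv IGMPMSG_*
constants from
.In netinet/ip_mroute.h
instead:
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values and offsets; verify against netinet/ip_mroute.h. */
#define SK_IGMPMSG_NOCACHE  1   /* assumed value of IGMPMSG_NOCACHE */
#define SK_OFF_MSGTYPE      8   /* assumed offset of im_msgtype */
#define SK_OFF_MBZ          9   /* offset of ip_p in struct ip */
#define SK_HDR_LEN          20  /* IPv4 header without options */

/*
 * Return the upcall type stored in the header, or -1 if the buffer
 * is too short or holds an ordinary packet (non-zero protocol field).
 */
static int
upcall_type(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < SK_HDR_LEN)
        return (-1);
    if (buf[SK_OFF_MBZ] != 0)
        return (-1);
    return (buf[SK_OFF_MSGTYPE]);
}
```
.Ed
.Pp
A caller would install an MFC entry when the returned type equals the
no-cache value, and hand the buffer to its IGMP code when the result is -1.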
.Pp
An MFC entry is added by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
fields are the source and group address of the multicast packet (as set
in the upcall message).
The
.Va iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified.
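.Pp
As an illustration of how the
.Va oifs_ttl[]
array relates to a set of outgoing interfaces,
the following sketch derives the array from a bit mask.
The function name, the mask representation, and
.Dv SK_MAXVIFS
(assumed to equal
.Dv MAXVIFS )
are hypothetical and chosen for this example only:
.Bd -literal -offset indent
```c
#include <stdint.h>
#include <string.h>

#define SK_MAXVIFS 32  /* assumed to equal MAXVIFS in ip_mroute.h */

/*
 * Give every vif whose bit is set in oif_mask the TTL threshold
 * min_ttl, and every other vif a TTL of 0, which excludes it from
 * the outgoing set.  Returns the number of outgoing vifs.
 */
static int
fill_oif_ttls(uint8_t ttls[SK_MAXVIFS], uint32_t oif_mask, uint8_t min_ttl)
{
    int i, n = 0;

    memset(ttls, 0, SK_MAXVIFS);
    if (min_ttl == 0)
        return (0);     /* a zero TTL would exclude every vif */
    for (i = 0; i < SK_MAXVIFS; i++) {
        if ((oif_mask & (UINT32_C(1) << i)) != 0) {
            ttls[i] = min_ttl;
            n++;
        }
    }
    return (n);
}
```
.Ed
.Pp
The resulting array can be copied into
.Va mfcc_ttls
for IPv4; for IPv6 only the non-zero positions matter,
since only membership in the outgoing set can be expressed.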
.Pp
An MFC entry is deleted by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API),
and at the same time to allow user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows preserving the backward
compatibility is a sort of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only that set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in
.Vt uint32_t ,
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
An example:
.Bd -literal -offset 3n
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
This would set
.Va v
to the pre-defined bits that the kernel API supports.
The eight least significant bits in
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the new definition of
.Vt "struct mfcctl"
(see below about those flags), which leaves 24 flags for other new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal -offset 3n
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
To obtain later the same set of features that were enabled, use:
.Bd -literal -offset indent
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
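.Pp
Since the kernel may grant only a subset of the requested features,
a caller usually wants to know which ones are missing.
The following sketch shows one way to compute that;
the function name and the
.Dv SK_*
names are hypothetical, and their values are assumed to match the
.Dv MRT_MFC_*
feature bits defined in
.In netinet/ip_mroute.h :
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Assumed to match the MRT_MFC_* feature bits in ip_mroute.h. */
#define SK_FLAGS_DISABLE_WRONGVIF  (1 << 0)
#define SK_FLAGS_BORDER_VIF        (1 << 1)
#define SK_RP                      (1 << 8)
#define SK_BW_UPCALL               (1 << 9)

/*
 * Return the requested feature bits that the kernel did not grant,
 * or 0 if every requested feature was enabled.
 */
static uint32_t
missing_features(uint32_t requested, uint32_t granted)
{
    return (requested & ~granted);
}
```
.Ed
.Pp
Here
.Va requested
is the value passed to
.Fn setsockopt MRT_API_CONFIG
and
.Va granted
is the value found in
.Va v
afterwards; a process would then avoid any API that depends on a missing bit.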
.Pp
The set of enabled features is global.
In other words,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)/*disable WRONGVIF signals*/
#define	MRT_MFC_FLAGS_BORDER_VIF   (1 << 1)  /* border vif              */
#define MRT_MFC_RP                 (1 << 8)  /* enable RP address	*/
#define MRT_MFC_BW_UPCALL          (1 << 9)  /* enable bw upcalls	*/
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)/*disable WRONGVIF signals*/
#define	MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif          */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
The signal should not be delivered for interfaces that are not in
the outgoing interface set, and that are not expected to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on one of those interfaces for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
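.Pp
One way to apply this rule is to compute the per-interface flags from two
sets: the outgoing interfaces and the interfaces that may later become
the incoming interface.
The sketch below is illustrative only; the function name, the bit-mask
representation of the two sets, and the
.Dv SK_*
names are hypothetical, with values assumed to match
.Dv MAXVIFS
and
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF :
.Bd -literal -offset indent
```c
#include <stdint.h>

#define SK_MAXVIFS           32        /* assumed to equal MAXVIFS */
#define SK_DISABLE_WRONGVIF  (1 << 0)  /* assumed flag value */

/*
 * Set the disable flag on every vif that is neither an outgoing
 * interface nor a candidate incoming interface, and clear it on
 * the others.  Returns the number of vifs with the flag set.
 */
static int
fill_wrongvif_flags(uint8_t flags[SK_MAXVIFS], uint32_t oif_mask,
    uint32_t iif_candidates)
{
    uint32_t keep = oif_mask | iif_candidates;
    int i, n = 0;

    for (i = 0; i < SK_MAXVIFS; i++) {
        if ((keep & (UINT32_C(1) << i)) != 0) {
            flags[i] = 0;
        } else {
            flags[i] = SK_DISABLE_WRONGVIF;
            n++;
        }
    }
    return (n);
}
```
.Ed
.Pp
The resulting array would be copied into
.Va mfcc_flags
before the entry is installed with
.Dv MRT_ADD_MFC .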
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
.Pp
The original solution for measuring the bandwidth of a dataflow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution is far from being scalable, hence the need for a new
mechanism for bandwidth monitoring.
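.Pp
For comparison, the polling approach amounts to sampling a byte counter
twice and dividing by the elapsed time.
The following sketch shows that arithmetic;
the structure and function names are hypothetical, and the byte counts are
assumed to come from two successive per-(S,G) statistics queries:
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Hypothetical sample: a byte counter and the time it was read. */
struct sk_sample {
    uint64_t bytes;   /* forwarded bytes reported for the (S,G) */
    double   t_sec;   /* time of the query, in seconds */
};

/*
 * Estimate the bandwidth in bits per second between two samples.
 * Returns 0 if time did not advance or the counter went backwards.
 */
static double
estimate_bps(struct sk_sample older, struct sk_sample newer)
{
    double dt = newer.t_sec - older.t_sec;

    if (dt <= 0.0 || newer.bytes < older.bytes)
        return (0.0);
    return ((double)(newer.bytes - older.bytes) * 8.0 / dt);
}
```
.Ed
.Pp
Every monitored (S,G) needs its own pair of queries per interval,
which is the scaling problem described above.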
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Furthermore, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added/deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument of course).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (3s).
 * The threshold is set in packets and/or bytes per interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
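.Pp
The pseudo-algorithm can be restated as compilable C, which may help when
writing a user-level simulation or test of filter behaviour.
This is a sketch of the decision rule only, not the kernel implementation;
the function, the
.Vt "struct sk_counts"
type, and the
.Dv SK_*
names are hypothetical, with flag values assumed to match the
.Dv BW_UPCALL_*
definitions:
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the BW_UPCALL_* flag values. */
#define SK_UNIT_PACKETS (1 << 0)
#define SK_UNIT_BYTES   (1 << 1)
#define SK_GEQ          (1 << 2)
#define SK_LEQ          (1 << 3)

struct sk_counts {
    uint64_t packets;
    uint64_t bytes;
};

/*
 * Decide whether a filter with the given flags would fire.
 * interval_expired says whether the measurement interval has ended;
 * it matters only for <= filters.
 */
static bool
should_upcall(uint32_t flags, struct sk_counts measured,
    struct sk_counts threshold, bool interval_expired)
{
    bool by_pkts = (flags & SK_UNIT_PACKETS) != 0;
    bool by_bytes = (flags & SK_UNIT_BYTES) != 0;

    if ((flags & SK_GEQ) != 0)
        return ((by_pkts && measured.packets >= threshold.packets) ||
            (by_bytes && measured.bytes >= threshold.bytes));
    if ((flags & SK_LEQ) != 0 && interval_expired)
        return ((by_pkts && measured.packets <= threshold.packets) ||
            (by_bytes && measured.bytes <= threshold.bytes));
    return (false);
}
```
.Ed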
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values are allowed, then the bandwidth
estimation may be less accurate, or the potentially very high frequency
of the generated upcalls may introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
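.Pp
A user-level process may want to check these constraints before handing a
filter to the kernel.
The sketch below encodes the rules stated in this section: exactly one of
GEQ and LEQ, at least one unit, and an interval of no less than 3 seconds.
The function and the
.Dv SK_*
names are hypothetical, and whether the kernel rejects exactly the same
cases is an assumption:
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Assumed to match the BW_UPCALL_* values in ip_mroute.h. */
#define SK_UNIT_PACKETS  (1 << 0)
#define SK_UNIT_BYTES    (1 << 1)
#define SK_GEQ           (1 << 2)
#define SK_LEQ           (1 << 3)
#define SK_MIN_SEC       3
#define SK_MIN_USEC      0

/* Return true if the flags and interval describe a usable filter. */
static bool
filter_is_valid(uint32_t flags, struct timeval interval)
{
    bool geq = (flags & SK_GEQ) != 0;
    bool leq = (flags & SK_LEQ) != 0;

    if (geq == leq)
        return (false);   /* need exactly one comparison */
    if ((flags & (SK_UNIT_PACKETS | SK_UNIT_BYTES)) == 0)
        return (false);   /* need at least one unit */
    if (interval.tv_sec < SK_MIN_SEC)
        return (false);   /* shorter than the minimum */
    if (interval.tv_sec == SK_MIN_SEC && interval.tv_usec < SK_MIN_USEC)
        return (false);
    return (true);
}
```
.Ed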
.Sh EXAMPLES
.Bd -literal -offset indent
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
          (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL ,
with the fields of
.Va bw_upcall
set exactly as they were when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G),
set only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall" ,
and set only the
.Dv BW_UPCALL_DELETE_ALL
flag in the
.Va bw_upcall.bu_flags
field.
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal -offset indent
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
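.Pp
Since one message carries several
.Vt "struct bw_upcall"
elements, a receiver has to work out how many are present.
The sketch below does this from a length.
It assumes that
.Va payload_len
counts only the array, with any upcall header already skipped,
and that the caller passes
.Li sizeof(struct bw_upcall)
as
.Va elem_size ;
the function and the
.Dv SK_BW_UPCALLS_MAX
name are hypothetical:
.Bd -literal -offset indent
```c
#include <stddef.h>

#define SK_BW_UPCALLS_MAX 128  /* assumed to equal BW_UPCALLS_MAX */

/*
 * Return the number of elements in a bandwidth-upcall payload,
 * or -1 if the length is not a whole number of elements or lies
 * outside the range 1..SK_BW_UPCALLS_MAX.
 */
static int
bw_upcall_count(size_t payload_len, size_t elem_size)
{
    size_t n;

    if (elem_size == 0 || payload_len % elem_size != 0)
        return (-1);
    n = payload_len / elem_size;
    if (n == 0 || n > SK_BW_UPCALLS_MAX)
        return (-1);
    return ((int)n);
}
```
.Ed
.Pp
A receiver would then loop over that many elements and read
.Va bu_measured
from each one.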
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mrouted 8 ,
.Xr sysctl 8
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
.Pp
The IPv6 multicast support was implemented by the KAME project
.Pq Lk http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).
896(ICSI).
897