.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: src/share/man/man4/multicast.4,v 1.4 2004/07/09 09:22:36 ru Exp $
.\" $OpenBSD: multicast.4,v 1.11 2016/12/22 11:04:44 rzalamena Exp $
.\" $NetBSD: multicast.4,v 1.3 2004/09/12 13:12:26 wiz Exp $
.\"
.Dd $Mdocdate: December 22 2016 $
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF) and
Core Based Trees (CBT)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding via the
.Xr sysctl 8
variables
.Va net.inet.ip.mforwarding
and/or
.Va net.inet.ip6.mforwarding .
The user must also run a multicast routing capable user-level process,
such as
.Xr mrouted 8 .
From a developer's point of view,
the API described in the
.Sx Programming Guide
section should be used to control the multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is then used
to control the multicast forwarding in the kernel.
Note that most operations below require root privilege:
.Bd -literal -offset indent
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal -offset indent
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(IPv4 or IPv6, respectively)
for sending or receiving IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
for sending and receiving IGMP or MLD messages, respectively.
In the case of
.Bx -derived
kernels,
it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g.,
.Tn Linux )
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
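.Pp
The socket calls above can be wrapped so that the caller always sees a
clear error.
The following is a minimal sketch, not a required interface:
open_mrouter_socket() is a hypothetical helper name, and rejecting other
address families with EAFNOSUPPORT is a choice made for this illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Open the raw socket that will serve as the multicast routing socket.
 * Returns the descriptor, or -1 with errno set.
 * open_mrouter_socket() is a hypothetical helper, not part of the API.
 */
int
open_mrouter_socket(int family)
{
    switch (family) {
    case AF_INET:
        /* IPv4: the IGMP raw socket doubles as the routing socket */
        return (socket(AF_INET, SOCK_RAW, IPPROTO_IGMP));
    case AF_INET6:
        /* IPv6: the ICMPv6 raw socket doubles as the routing socket */
        return (socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6));
    default:
        /* Only IPv4 and IPv6 have a multicast routing socket */
        errno = EAFNOSUPPORT;
        return (-1);
    }
}
```

Without root privilege the AF_INET and AF_INET6 calls are expected to fail,
so the return value should always be checked before the socket is used.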
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal -offset 5n
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal -offset 5n
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
.Pp
For each network interface (e.g., physical or a virtual tunnel)
that would be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal -offset 3n
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it would be 1.
The
.Va max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address for DVMRP multicast tunnels.
.Bd -literal -offset indent
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal -offset indent
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal -offset indent
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled, and the multicast virtual
interfaces have been
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was enabled
earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the
.Va im_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the
.Va im6_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the
.Va im_msgtype
and
.Va im6_msgtype
fields, with the type of the upcall
.Dv IGMPMSG_*
and
.Dv MRT6MSG_*
for IPv4 and IPv6, respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
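.Pp
Because the routing socket also carries ordinary IGMP packets, a receiver
has to tell upcalls apart from them.
The sketch below does this for IPv4 on a raw byte buffer instead of the real
.Vt "struct igmpmsg" ,
so that it stays self-contained.
It assumes that
.Va im_mbz
sits at byte offset 9 (the position of
.Va ip_p
in
.Vt "struct ip" )
and that
.Va im_msgtype
sits at offset 8; the latter, the value 1 for
.Dv IGMPMSG_NOCACHE ,
and the name classify_upcall() are assumptions of this illustration and
should be checked against
.In netinet/ip_mroute.h .

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, mirroring struct ip: check against the real header. */
#define SK_UPCALL_HDR_LEN     20  /* size of a minimal IPv4 header      */
#define SK_UPCALL_OFF_MSGTYPE  8  /* im_msgtype (assumed offset)        */
#define SK_UPCALL_OFF_MBZ      9  /* im_mbz, overlays ip_p              */

#define SK_IGMPMSG_NOCACHE     1  /* assumed value of IGMPMSG_NOCACHE   */
#define SK_IGMPMSG_BW_UPCALL   4  /* value given later in this page     */

/*
 * Return the upcall type of a message read from the routing socket,
 * or -1 if the buffer is too short or is a regular IGMP packet
 * (non-zero protocol byte) rather than a kernel upcall.
 */
int
classify_upcall(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < SK_UPCALL_HDR_LEN)
        return (-1);
    if (buf[SK_UPCALL_OFF_MBZ] != 0)
        return (-1);    /* real packet: protocol field is set */
    return (buf[SK_UPCALL_OFF_MSGTYPE]);
}
```

A caller would typically install an MFC entry when the result equals
SK_IGMPMSG_NOCACHE and pass non-upcall packets on to its IGMP handling.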
.Pp
An MFC entry is added by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
fields are the source and group address of the multicast packet (as set
in the upcall message).
The
.Va iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified.
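.Pp
The relation between the per-interface TTL array and the IPv6 outgoing
interface set can be illustrated without the kernel headers.
The sketch below uses a plain 64-bit mask as a stand-in for the real
interface set type; the function name and the limit of 64 interfaces are
assumptions made for the example, not values taken from
.In netinet6/ip6_mroute.h .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAXIFS 64    /* sketch-local limit, one bit per interface */

/*
 * Build the set of outgoing interfaces from per-interface TTLs:
 * interface i is included if and only if oifs_ttl[i] is non-zero.
 */
uint64_t
oifs_from_ttls(const unsigned char *oifs_ttl, size_t n)
{
    uint64_t set = 0;
    size_t i;

    assert(oifs_ttl != NULL && n <= SK_MAXIFS);
    for (i = 0; i < n; i++)
        if (oifs_ttl[i] > 0)
            set |= (uint64_t)1 << i;
    return (set);
}
```

The TTL values themselves are lost in the conversion, which mirrors the
IPv6 case where only interface membership can be expressed.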
.Pp
An MFC entry is deleted by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API),
and at the same time to allow user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows backward
compatibility to be preserved is a sort of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only that set of features
the kernel has agreed on.
.El
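.Pp
The three negotiation steps can be modelled in user space as simple
bitset operations.
This is only a sketch of the idea, not the kernel's code: the SK_ names are
local to the example (their values follow the feature list given in this
page), and sk_kernel_grant() merely stands in for what the kernel is
described as doing.

```c
#include <stdint.h>

/* Feature bits, with the values listed in this page (sketch-local names). */
#define SK_DISABLE_WRONGVIF (1u << 0)
#define SK_BORDER_VIF       (1u << 1)
#define SK_RP               (1u << 8)
#define SK_BW_UPCALL        (1u << 9)

/*
 * Steps 1 and 2: the process asks for `requested`; the (simulated) kernel
 * answers with the subset it knows about and is willing to enable.
 */
uint32_t
sk_kernel_grant(uint32_t requested, uint32_t supported)
{
    return (requested & supported);
}

/* Step 3: the process checks a feature before relying on it. */
int
sk_feature_enabled(uint32_t granted, uint32_t feature)
{
    return ((granted & feature) == feature);
}
```

For example, a process that asks for SK_DISABLE_WRONGVIF and SK_BW_UPCALL
from a kernel that lacks bandwidth upcalls is granted only
SK_DISABLE_WRONGVIF and has to fall back to polling for bandwidth.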
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently a
.Vt uint32_t ,
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
An example:
.Bd -literal -offset 3n
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
This would set
.Va v
to the pre-defined bits that the kernel API supports.
The eight least significant bits in
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the new definition of
.Vt "struct mfcctl"
(see below about those flags), which leaves 24 flags for other new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
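.Pp
The split of the 32-bit capability word into eight per-MFC flag bits and
24 bits for other features can be expressed with two masks.
The helper names below are hypothetical and exist only for this
illustration.

```c
#include <stdint.h>

/* The eight least significant bits mirror the MRT_MFC_FLAGS_* flags. */
#define SK_MFC_FLAGS_MASK 0x000000ffu

/* Bits that may also appear in the per-interface mfcc_flags[] entries. */
uint8_t
sk_mfc_flag_bits(uint32_t v)
{
    return ((uint8_t)(v & SK_MFC_FLAGS_MASK));
}

/* The remaining 24 bits, reserved for other new features. */
uint32_t
sk_other_feature_bits(uint32_t v)
{
    return (v & ~SK_MFC_FLAGS_MASK);
}
```

Applied to a word with bits 0 and 9 set, the first helper yields 0x01 and
the second yields 0x200.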
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal -offset 3n
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
To obtain the same set of enabled features later, use:
.Bd -literal -offset indent
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif               */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address        */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls        */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif               */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
That signal should not be delivered for interfaces that are not in
the outgoing interface set and that are not expected to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on that interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
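.Pp
The effect of the flag can be summarized as a small decision function.
The sketch below is a user-space model of the rule described above, not the
kernel's implementation; the function name and the SK_ constant are local to
the example, and the flags array stands in for
.Va mfcc_flags[] .

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MFC_FLAGS_DISABLE_WRONGVIF (1u << 0)  /* value from this page */

/*
 * Decide whether a packet for an MFC entry should raise a WRONGVIF
 * signal.  `parent` is the expected incoming vif, `arrived` is the vif
 * the packet actually came in on, and `flags` holds the per-vif flags
 * of that entry (at least arrived + 1 elements).
 * Returns 1 if the signal is raised, 0 otherwise.
 */
int
sk_wrongvif_signal(const uint8_t *flags, unsigned int parent,
    unsigned int arrived)
{
    if (arrived == parent)
        return (0);     /* packet is on the correct interface */
    if (flags[arrived] & SK_MFC_FLAGS_DISABLE_WRONGVIF)
        return (0);     /* signal explicitly disabled for this vif */
    return (1);         /* default action: deliver the signal */
}
```

With the flag set only on vif 1, a packet for an entry whose parent is vif 0
raises no signal when it arrives on vif 0 or vif 1, but does when it arrives
on vif 2.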
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution is far from being scalable, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Furthermore, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added/deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
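.Pp
The pseudo-algorithm above can be turned into a small C function for
experimentation in user space.
This is a sketch under stated assumptions: struct sk_bw is a simplified
stand-in for
.Vt "struct bw_data"
with the interval expressed in microseconds, the SK_ flag names are local
copies of the
.Dv BW_UPCALL_*
values shown in this page, and the function models only the decision, not
the kernel's timers or counters.

```c
#include <stdint.h>

/* Flag values as listed in this page (sketch-local names). */
#define SK_BW_UPCALL_UNIT_PACKETS (1u << 0)
#define SK_BW_UPCALL_UNIT_BYTES   (1u << 1)
#define SK_BW_UPCALL_GEQ          (1u << 2)
#define SK_BW_UPCALL_LEQ          (1u << 3)

/* Simplified stand-in for struct bw_data. */
struct sk_bw {
    uint64_t usec;      /* interval length, in microseconds */
    uint64_t packets;
    uint64_t bytes;
};

/* Return 1 if the filter described by `flags` would send an upcall. */
int
sk_bw_filter_fires(uint32_t flags, const struct sk_bw *measured,
    const struct sk_bw *threshold)
{
    int pk = (flags & SK_BW_UPCALL_UNIT_PACKETS) != 0;
    int by = (flags & SK_BW_UPCALL_UNIT_BYTES) != 0;

    if (flags & SK_BW_UPCALL_GEQ) {
        /* ">=" may fire before the interval has expired */
        return ((pk && measured->packets >= threshold->packets) ||
            (by && measured->bytes >= threshold->bytes));
    }
    if (flags & SK_BW_UPCALL_LEQ) {
        /* "<=" is only decided once the interval has expired */
        if (measured->usec < threshold->usec)
            return (0);
        return ((pk && measured->packets <= threshold->packets) ||
            (by && measured->bytes <= threshold->bytes));
    }
    return (0);         /* neither operation requested */
}
```

The asymmetry between the two branches is the point of the example: a
>= filter can fire as soon as the count is reached, while a <= filter has
to wait for the end of the interval.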
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values are allowed, then the bandwidth
estimation may be less accurate, or the potentially very high frequency
of the generated upcalls may introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
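.Pp
A process may want to check a filter request against these constraints
before handing it to the kernel.
The following sketch does so for the 3 second minimum interval and the
mutually exclusive GEQ and LEQ flags.
The function name and SK_ constants are local to the example, and requiring
at least one unit flag is an assumption of the sketch (a filter without a
unit could never match the pseudo-algorithm); the kernel's own checks may
differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Values as listed in this page (sketch-local names). */
#define SK_BW_UPCALL_UNIT_PACKETS (1u << 0)
#define SK_BW_UPCALL_UNIT_BYTES   (1u << 1)
#define SK_BW_UPCALL_GEQ          (1u << 2)
#define SK_BW_UPCALL_LEQ          (1u << 3)
#define SK_INTERVAL_MIN_SEC       3
#define SK_INTERVAL_MIN_USEC      0

/* Return 1 if the request looks acceptable, 0 otherwise. */
int
sk_bw_request_valid(uint32_t flags, const struct timeval *interval)
{
    int geq = (flags & SK_BW_UPCALL_GEQ) != 0;
    int leq = (flags & SK_BW_UPCALL_LEQ) != 0;

    if (interval == NULL)
        return (0);
    /* Sketch assumption: at least one unit must be selected */
    if (!(flags & (SK_BW_UPCALL_UNIT_PACKETS | SK_BW_UPCALL_UNIT_BYTES)))
        return (0);
    /* Exactly one of GEQ and LEQ: they are mutually exclusive */
    if (geq == leq)
        return (0);
    /* The measurement interval must be at least the minimum */
    if (interval->tv_sec < SK_INTERVAL_MIN_SEC ||
        (interval->tv_sec == SK_INTERVAL_MIN_SEC &&
        interval->tv_usec < SK_INTERVAL_MIN_USEC))
        return (0);
    return (1);
}
```

Rejecting a request early in user space gives a clearer diagnostic than a
failed setsockopt() call, at the cost of duplicating rules the kernel
enforces itself.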
.Sh EXAMPLES
An example of installing a bandwidth-upcall filter:
.Bd -literal -offset indent
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
           (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL ,
with the fields of
.Vt bw_upcall
set to exactly the same values as when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G),
set only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall" ,
and set only the
.Dv BW_UPCALL_DELETE_ALL
flag in the
.Va bu_flags
field.
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal -offset indent
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In each
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
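.Pp
The pacing rule in the last paragraph amounts to a comparison of time
values.
The sketch below is a user-space model only, with a hypothetical function
name; it shows how a process could predict the earliest time at which the
next upcall for a filter may arrive, given the time of the previous upcall
and the filter's
.Va bu_threshold.b_time .

```c
#include <sys/time.h>

/*
 * Return 1 if `now` is at or past `last + interval`, i.e. the earliest
 * moment at which another upcall for the same filter may be triggered.
 * All tv_usec values are expected to be in the range 0..999999.
 */
int
sk_upcall_due(const struct timeval *now, const struct timeval *last,
    const struct timeval *interval)
{
    struct timeval next;

    next.tv_sec = last->tv_sec + interval->tv_sec;
    next.tv_usec = last->tv_usec + interval->tv_usec;
    if (next.tv_usec >= 1000000) {
        /* carry the overflowing microseconds into seconds */
        next.tv_sec++;
        next.tv_usec -= 1000000;
    }
    if (now->tv_sec != next.tv_sec)
        return (now->tv_sec > next.tv_sec);
    return (now->tv_usec >= next.tv_usec);
}
```

A daemon can use such a check to decide how long a flow may be considered
active after the most recent >= upcall before it needs fresh evidence.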
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mrouted 8 ,
.Xr sysctl 8
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
.Pp
The IPv6 multicast support was implemented by the KAME project
.Pq Lk http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).