.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: /repoman/r/ncvs/src/share/man/man4/multicast.4,v 1.1 2003/10/17 15:12:01 bmah Exp $
.\" $DragonFly: src/share/man/man4/multicast.4,v 1.7 2008/05/02 02:05:05 swildner Exp $
.\"
.Dd September 4, 2003
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In net/ip_mroute/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF),
Core Based Trees (CBT),
Protocol Independent Multicast - Sparse Mode (PIM-SM),
and Protocol Independent Multicast - Dense Mode (PIM-DM)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
for the kernel configuration options),
and must run a multicast routing capable user-level process.
From a developer's point of view,
the API described in the
.Sx "Programming Guide"
section should be used to control the multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control the multicast forwarding in the kernel.
Note that most operations below require appropriate privilege
(i.e., root privilege):
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
If the router needs to send or receive IGMP or MLD
multicast group membership messages
(for IPv4 and IPv6 respectively),
then the same mrouter_s4 or mrouter_s6 socket should be used
for those messages.
On a BSD-derived kernel, it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux) require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons the multicast
routing socket should be reused for IGMP and MLD messages as well.
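.Pp
The IPv4 socket setup above can be sketched with basic error reporting
as follows.
This is only an illustration: the helper name
open_mrouter_socket4 is hypothetical and is not provided by the system.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a system API): open the raw IGMP socket
 * that also serves as the IPv4 multicast routing socket.
 * Returns the descriptor, or -1 with errno as set by socket(2).
 */
int
open_mrouter_socket4(void)
{
	int s, saved_errno;

	s = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
	if (s < 0) {
		/* Raw sockets normally require root privilege. */
		saved_errno = errno;
		fprintf(stderr, "socket(IPPROTO_IGMP): %s\n",
		    strerror(saved_errno));
		errno = saved_errno;
		return (-1);
	}
	return (s);
}
```

The returned descriptor would then be used both for the MRT_* socket
options and for IGMP traffic, as recommended above.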
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
.Pp
After multicast forwarding is enabled, the multicast routing socket
can be used to enable PIM processing in the kernel if PIM-SM or
PIM-DM is in use
(see
.Xr pim 4 ) .
.Pp
For each network interface (e.g., a physical interface or a virtual tunnel)
that is to be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Dq vif_index
must be unique per vif.
The
.Dq vif_flags
contains the
.Dq VIFF_*
flags as defined in
.In net/ip_mroute/ip_mroute.h .
The
.Dq min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it has a value of 1.
The
.Dq max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Dq vif_local_address
contains the local IP address of the corresponding local interface.
The
.Dq vif_remote_address
contains the remote IP address in case of DVMRP multicast tunnels.
.Bd -literal
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Dq mif_index
must be unique per mif.
The
.Dq mif_flags
contains the
.Dq MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Dq pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled and the multicast virtual
interfaces are
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was enabled
earlier with
.Dq MRT_INIT
or
.Dq MRT6_INIT .
The IPv4 upcalls have a
.Dq struct igmpmsg
header (see
.In net/ip_mroute/ip_mroute.h )
with the field
.Dq im_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip
with the protocol field
.Dq ip_p
set to zero.
The IPv6 upcalls have a
.Dq struct mrt6msg
header (see
.In netinet6/ip6_mroute.h )
with the field
.Dq im6_mbz
set to zero.
Note that this header follows the structure of
.Dq struct ip6_hdr
with the next header field
.Dq ip6_nxt
set to zero.
.Pp
The upcall header contains the field
.Dq im_msgtype
or
.Dq im6_msgtype
holding the type of the upcall, one of
.Dq IGMPMSG_*
or
.Dq MRT6MSG_*
for IPv4 and IPv6 respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dq IGMPMSG_NOCACHE
or
.Dq MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
.Pp
An MFC entry is added by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Dq source_addr
and
.Dq group_addr
are the source and group address of the multicast packet (as set
in the upcall message).
The
.Dq iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Dq oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that in case of IPv6 only the set of outgoing interfaces can
be specified.
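.Pp
The relation between the per-interface TTL array and the set of
outgoing interfaces can be illustrated with a small stand-alone sketch.
It is not the kernel interface: a plain 32-bit mask stands in for the
interface set that
.Dq IF_SET
manipulates, and the helper name oifs_from_ttls is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: derive the outgoing interface set from a TTL array.
 * Bit i of the result is set when oifs_ttl[i] is non-zero, that is,
 * when interface i belongs to the outgoing set.  Only the first 32
 * interfaces are considered because the result is a 32-bit mask.
 */
uint32_t
oifs_from_ttls(const uint8_t *oifs_ttl, size_t n)
{
	uint32_t set = 0;
	size_t i;

	for (i = 0; i < n && i < 32; i++)
		if (oifs_ttl[i] > 0)
			set |= (uint32_t)1 << i;
	return (set);
}
```

For example, a TTL array of {0, 1, 0, 64} selects interfaces 1 and 3.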
.Pp
An MFC entry is deleted by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
           (void *)&mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
If we want to add new features in the kernel, it becomes difficult
to preserve backward compatibility (binary and API),
and at the same time to allow user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows us to preserve backward
compatibility is a form of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only that set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process doesn't
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In net/ip_mroute/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new get/setsockopt() options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in uint32_t; i.e., maximum of 32 new features).
The new get/setsockopt() options are
.Dq MRT_API_SUPPORT
and
.Dq MRT_API_CONFIG .
Example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
would set in
.Dq v
the pre-defined bits that the kernel API supports.
The eight least significant bits in uint32_t are the same as the
eight possible flags
.Dq MRT_MFC_FLAGS_*
that can be used in
.Dq mfcc_flags
as part of the new definition of
.Dq struct mfcctl
(see below about those flags), which leaves 24 flags for other new features.
The value returned by getsockopt(MRT_API_SUPPORT) is read-only; in other
words, setsockopt(MRT_API_SUPPORT) would fail.
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);        /* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when setsockopt(MRT_API_CONFIG) is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Dq v
is the actual (sub)set of features that were enabled in the kernel.
To retrieve the set of enabled features later:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore, setsockopt(MRT_API_CONFIG)
should be called right after setsockopt(MRT_INIT).
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif               */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address        */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls        */
.Ed
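.Pp
The bit arithmetic behind the negotiation described above can be
sketched without touching the kernel.
The flag values are copied from the list above; the helper names
mrt_api_request and mrt_api_granted are hypothetical.
Requesting only the bits reported as supported is a conservative
choice made for this sketch, not a requirement stated by the API.

```c
#include <stdint.h>

/* Flag values as listed above; guarded in case the real header is used. */
#ifndef MRT_MFC_RP
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF	(1 << 0)
#define MRT_MFC_FLAGS_BORDER_VIF	(1 << 1)
#define MRT_MFC_RP			(1 << 8)
#define MRT_MFC_BW_UPCALL		(1 << 9)
#endif

/*
 * Hypothetical helper: given the mask reported by
 * getsockopt(MRT_API_SUPPORT) and the features the routing daemon
 * would like, return the mask to pass to setsockopt(MRT_API_CONFIG).
 */
uint32_t
mrt_api_request(uint32_t supported, uint32_t wanted)
{
	return (supported & wanted);
}

/*
 * Hypothetical helper: given the mask written back by
 * setsockopt(MRT_API_CONFIG), report whether every feature the
 * daemon cannot work without was actually enabled.
 */
int
mrt_api_granted(uint32_t enabled, uint32_t required)
{
	return ((enabled & required) == required);
}
```

A daemon would use only the features for which mrt_api_granted()
reports success and fall back to the basic API for the rest.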
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Dq struct mfcctl2
instead of the traditional
.Dq struct mfcctl .
The original
.Dq struct mfcctl
is kept as is.
The new
.Dq struct mfcctl2
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Dq mfcc_flags[MAXVIFS]
and
.Dq mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Dq mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif               */
.Ed
.Pp
The
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dq IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
Usually, this signal is used to
complete the shortest-path switch in case of PIM-SM multicast routing,
or to trigger a PIM assert message.
However, it should not be delivered for interfaces that are not in
the outgoing interface set, and that are not expecting to
become an incoming interface.
Hence, if the
.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on that interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
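.Pp
One way to apply the rule above when preparing an (S,G) entry is
sketched below.
It operates on plain arrays shaped like
.Dq mfcc_ttls[]
and
.Dq mfcc_flags[]
rather than on the real structure.
The helper name fill_wrongvif_flags and the may_become_iif[] array
(a non-zero value marks an interface that may later become the incoming
interface) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF	(1 << 0)	/* value as listed above */
#endif

/*
 * Hypothetical helper: compute the per-interface flags for one (S,G)
 * entry.  WRONGVIF signals are disabled on every interface that is
 * neither in the outgoing set (oifs_ttl[i] == 0) nor expected to
 * become an incoming interface (may_become_iif[i] == 0).
 * The parent (incoming) interface is never flagged.
 */
void
fill_wrongvif_flags(uint8_t *flags, const uint8_t *oifs_ttl,
    const uint8_t *may_become_iif, size_t nvifs, size_t parent)
{
	size_t i;

	for (i = 0; i < nvifs; i++) {
		flags[i] = 0;
		if (i == parent)
			continue;
		if (oifs_ttl[i] == 0 && !may_become_iif[i])
			flags[i] |= MRT_MFC_FLAGS_DISABLE_WRONGVIF;
	}
}
```

The resulting array would be copied into
.Dq mfcc_flags
before the entry is installed with MRT_ADD_MFC.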
56886d7f5d3SJohn Marino.Pp
56986d7f5d3SJohn MarinoThe
57086d7f5d3SJohn Marino.Dq MRT_MFC_FLAGS_BORDER_VIF
57186d7f5d3SJohn Marinoflag is used to specify whether the Border-bit in PIM
57286d7f5d3SJohn MarinoRegister messages should be set (in case when the Register encapsulation
57386d7f5d3SJohn Marinois performed inside the kernel).
57486d7f5d3SJohn MarinoIf it is set for the special PIM Register kernel virtual interface
57586d7f5d3SJohn Marino(see
57686d7f5d3SJohn Marino.Xr pim 4 ) ,
57786d7f5d3SJohn Marinothe Border-bit in the Register messages sent to the RP will be set.
57886d7f5d3SJohn Marino.Pp
57986d7f5d3SJohn MarinoThe remaining six bits are reserved for future usage.
.Pp
The
.Dq mfcc_rp
field is used to specify the RP address (in the case of PIM-SM
multicast routing)
for a multicast
group G if we want to perform kernel-level PIM Register encapsulation.
The
.Dq mfcc_rp
field is used only if the
.Dq MRT_MFC_RP
advanced API flag/capability has been successfully set by
setsockopt(MRT_API_CONFIG).
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dq MRT_MFC_RP
flag was successfully set by
setsockopt(MRT_API_CONFIG), then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address is taken from the
.Dq mfcc_rp
field
inside the new
.Dq struct mfcctl2 .
However, even if the
.Dq MRT_MFC_RP
flag was successfully set, if the
.Dq mfcc_rp
field was set to
.Dq INADDR_ANY ,
then the
kernel will still deliver an IGMPMSG_WHOLEPKT upcall with the
multicast data packet to the user-level process.
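.Pp
The two conditions above can be summarized by a small predicate.
This is a sketch of the documented behavior, not the kernel code: the
function name and the SKETCH_MRT_MFC_RP bit value are assumptions made for
illustration, and a real program should use the definitions from the system
header.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define SKETCH_MRT_MFC_RP (1 << 8)	/* assumed bit; use the header's */

/*
 * Returns 1 if, per the text above, the kernel is expected to do the
 * PIM Register encapsulation itself, and 0 if the packet is handed to
 * user level in an IGMPMSG_WHOLEPKT upcall instead.
 * enabled_api is the capability mask that MRT_API_CONFIG accepted.
 */
static int
kernel_does_register_encap(uint32_t enabled_api, struct in_addr mfcc_rp)
{
	assert(sizeof(mfcc_rp.s_addr) == 4);	/* IPv4 only */
	if ((enabled_api & SKETCH_MRT_MFC_RP) == 0)
		return 0;	/* capability was not enabled */
	return mfcc_rp.s_addr != htonl(INADDR_ANY);
}
```
.Ed
.Pp
A routing daemon can therefore keep user-level encapsulation for selected
entries simply by leaving the RP address of those entries unset.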
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size was on the order of 65500 bytes), the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally from the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over Ethernet, for example, would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
For example, the multicast routing process may want to time out idle MFC
entries, or in the case of PIM-SM it can initiate an (S,G) shortest-path
switch if the bandwidth rate is above a threshold.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution is far from scalable, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice we need only those two.
Further, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, we need to install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, we need to check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
setsockopt(MRT_API_CONFIG) for the MRT_MFC_BW_UPCALL flag.
.It
The bandwidth-upcall filters are added/deleted by the new
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL)
respectively (with the appropriate
.Dq struct bw_upcall
argument of course).
.El
.Pp
From an application's point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. A flow is
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Dq bw_upcall
structure is used as an argument to
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL).
Each setsockopt(MRT_ADD_BW_UPCALL) installs a filter in the kernel
for the source and destination address in the
.Dq bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
 if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is >= threshold");
 }
 if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
       SEND_UPCALL("measured bandwidth is <= threshold");
 }
.Ed
.Pp
In the same
.Dq bw_upcall
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, then the bandwidth
estimation might be less accurate, or the potentially very high frequency
of the generated upcalls might introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Dq threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
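.Pp
The pseudo-algorithm and the rules above can be written as compilable C.
This is a minimal user-level sketch of the documented decision, not the
kernel implementation: the function names tv_geq and bw_filter_matches are
hypothetical, and the flag values and
.Dq struct bw_data
are repeated locally (copied from the listing above) only so that the
fragment builds without the multicast routing header.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Local copies of the definitions listed above. */
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)

struct bw_data {
	struct timeval	b_time;
	uint64_t	b_packets;
	uint64_t	b_bytes;
};

/* Returns nonzero if interval a is at least as long as interval b. */
static int
tv_geq(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec > b->tv_sec;
	return a->tv_usec >= b->tv_usec;
}

/* Returns 1 if a filter with these flags would send an upcall. */
static int
bw_filter_matches(uint32_t flags, const struct bw_data *thr,
    const struct bw_data *m)
{
	int geq = (flags & BW_UPCALL_GEQ) != 0;
	int leq = (flags & BW_UPCALL_LEQ) != 0;

	assert(thr != 0 && m != 0);
	if (geq == leq)
		return 0;	/* exactly one of GEQ and LEQ is required */
	if (leq && !tv_geq(&m->b_time, &thr->b_time))
		return 0;	/* "<=" is decided only after the interval */
	if ((flags & BW_UPCALL_UNIT_PACKETS) != 0 &&
	    (geq ? m->b_packets >= thr->b_packets :
	    m->b_packets <= thr->b_packets))
		return 1;
	if ((flags & BW_UPCALL_UNIT_BYTES) != 0 &&
	    (geq ? m->b_bytes >= thr->b_bytes :
	    m->b_bytes <= thr->b_bytes))
		return 1;
	return 0;
}
```
.Ed
.Pp
When both units are set, either comparison is enough to match, which mirrors
the logical OR in the pseudo-algorithm.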
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
/* threshold_interval is a struct timeval of at least 3 seconds */
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
/* Exactly one of GEQ and LEQ must be selected */
if (is_geq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_GEQ;
else if (is_leq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_LEQ;
else
    return (ERROR);
if (setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
               (void *)&bw_upcall, sizeof(bw_upcall)) < 0)
    return (ERROR);
.Ed
.Pp
To delete a single filter, use MRT_DEL_BW_UPCALL,
with the fields of bw_upcall set
exactly the same as when MRT_ADD_BW_UPCALL was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Dq bu_src
and
.Dq bu_dst
fields in
.Dq struct bw_upcall
need to be set, together with only the
.Dq BW_UPCALL_DELETE_ALL
flag inside field
.Dq bw_upcall.bu_flags .
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Dq struct bw_upcall
elements (up to BW_UPCALLS_MAX = 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Dq struct bw_upcall
element, the
.Dq bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
bu_measured.b_time is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Dq bu_measured.b_time
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the ioctl(SIOCGETSGCNT)
mechanism
(see the
.Sx Programming Guide
section).
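.Pp
When walking the elements of such a message, a receiver may want a rough
rate estimate that tolerates the zero interval mentioned above.
The following is a sketch under stated assumptions: the helper names
bw_upcall_count and bw_bytes_per_sec are hypothetical, payload_len is assumed
to be the number of bytes occupied by the array of elements only (how that
length is obtained from the received message is not shown here), and the
structures are repeated locally from the listing above so that the fragment
compiles on its own.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <netinet/in.h>

#define BW_UPCALLS_MAX 128

struct bw_data {
	struct timeval	b_time;
	uint64_t	b_packets;
	uint64_t	b_bytes;
};

struct bw_upcall {
	struct in_addr	bu_src;
	struct in_addr	bu_dst;
	uint32_t	bu_flags;
	struct bw_data	bu_threshold;
	struct bw_data	bu_measured;
};

/* Number of whole elements in payload_len bytes, capped at the maximum. */
static size_t
bw_upcall_count(size_t payload_len)
{
	size_t n = payload_len / sizeof(struct bw_upcall);

	return n > BW_UPCALLS_MAX ? BW_UPCALLS_MAX : n;
}

/*
 * Rough average rate in bytes per second over the measured interval.
 * Returns 0 when b_time is zero, so the caller never divides by zero.
 * This is an estimate only; see the caveat in the text above.
 */
static uint64_t
bw_bytes_per_sec(const struct bw_data *m)
{
	uint64_t usec;

	assert(m != 0);
	usec = (uint64_t)m->b_time.tv_sec * 1000000u +
	    (uint64_t)m->b_time.tv_usec;
	if (usec == 0)
		return 0;
	return m->b_bytes * 1000000u / usec;
}
```
.Ed
.Pp
For exact accounting the per-(S,G) counters remain the better source, as
noted above.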
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Dq bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Dq bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr pim 4
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs), and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
.Pa ( http://www.kame.net ) ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI) in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).