Leap Second Smearing with NTP
-----------------------------

By Martin Burnicki
with some edits by Harlan Stenn

The NTP software protocol and its reference implementation, ntpd, were
originally designed to distribute UTC time over a network as accurately as
possible.

Unfortunately, leap seconds are scheduled to be inserted into or deleted
from the UTC time scale at irregular intervals to keep the UTC time scale
synchronized with the Earth's rotation.  Deletions haven't happened yet, but
insertions have happened more than two dozen times.

The problem is that POSIX requires 86400 seconds in a day, and there is no
prescribed way to handle leap seconds in POSIX.

Whenever a leap second is to be handled, ntpd either:

- passes the leap second announcement down to the OS kernel (if the OS
supports this) and the kernel handles the leap second automatically, or

- applies the leap second correction itself.

NTP servers also pass a leap second warning flag down to their clients via
the normal NTP packet exchange, so clients also become aware of an
approaching leap second, and can handle the leap second appropriately.
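The warning flag mentioned above is the 2-bit Leap Indicator (LI) carried in
the top two bits of the first byte of every NTP packet.  As an illustration
only, here is a minimal Python sketch that reads it from a raw packet.  The
function and table names are made up for this example and are not part of
ntpd.

```python
# Leap Indicator (LI) values as defined for the NTP packet header.
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds (insertion)",
    2: "last minute of the day has 59 seconds (deletion)",
    3: "unknown (clock not synchronized)",
}


def leap_indicator(packet: bytes) -> int:
    """Return the LI field: the top two bits of the first header byte."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] >> 6


if __name__ == "__main__":
    # Hypothetical reply: first byte 0x64 = LI 1, version 4, mode 4 (server);
    # the remaining 47 header bytes are zeroed for brevity.
    reply = bytes([0x64]) + bytes(47)
    print(LEAP_INDICATOR[leap_indicator(reply)])
```

The value 1 corresponds to the 'leap=01' shown by 'ntpq -c rv' on a server
that has a leap second insertion pending.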


The Problem on Unix-like Systems
--------------------------------
If a leap second is to be inserted, then in most Unix-like systems the OS
kernel just steps the time back by 1 second at the beginning of the leap
second, so the last second of the UTC day is repeated and thus duplicate
timestamps can occur.
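To make the effect concrete, here is a small Python sketch (an illustration
only, not kernel code) that models the step described above: the clock is set
back by 1 s when the leap second begins, so readings taken during the leap
second repeat the values of the previous second.  The function name and the
half-second sampling step are arbitrary choices for this example.

```python
import calendar


def readings_across_leap(next_midnight_utc, step=0.5, span=3.0):
    """Return (elapsed, posix_time) pairs starting at 23:59:59 UTC.

    'elapsed' counts real seconds since 23:59:59.  The inserted leap second
    begins 1 s later; at that point the modelled clock is stepped back by 1 s.
    """
    midnight = calendar.timegm(next_midnight_utc)
    readings = []
    elapsed = 0.0
    while elapsed < span:
        posix = (midnight - 1) + elapsed
        if elapsed >= 1.0:
            posix -= 1.0  # step back at the start of the leap second
        readings.append((elapsed, posix))
        elapsed += step
    return readings


if __name__ == "__main__":
    # Leap second inserted at the end of 30 June 2015.
    for elapsed, posix in readings_across_leap((2015, 7, 1, 0, 0, 0)):
        print(f"{elapsed:4.1f} s of real time -> clock reads {posix:.1f}")
```

An application that compares two successive readings around the step sees
time run backwards, which is exactly what confuses it.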

Unfortunately there are lots of applications which get confused if the
system time is stepped back, e.g. due to a leap second insertion.  Thus,
many users have been looking for ways to avoid this, and have tried to
introduce workarounds which may or may not work properly.

So even though these Unix kernels normally can handle leap seconds, the way
they do this is not optimal for applications.

One good way to handle the leap second is to use ntp_gettime() instead of
the usual calls, because ntp_gettime() includes a "clock state" variable
that will actually tell you if the time you are receiving is OK or not, and
if it is OK, whether the current second is an in-progress leap second.  But
even though this mechanism has been available for about 20 years, almost
nobody uses it.


NTP Client for Windows Contains a Workaround
--------------------------------------------
The Windows system time knows nothing about leap seconds, so for many years
the Windows port of ntpd has provided a workaround where the system time is
slewed by the client to compensate for the leap second.

Thus it is not required to use a smearing NTP server for Windows clients,
but of course the smearing server approach also works.


The Leap Smear Approach
-----------------------
For the reasons mentioned above, some support for leap smearing has
recently been implemented in ntpd.  This means that to insert a leap second
an NTP server adds a gradually increasing "smear" offset to the real UTC
time sent to its clients, so that after some predefined interval the leap
second offset is compensated.  The smear interval should be long enough,
e.g. several hours, so that NTP clients can easily follow the clock drift
caused by the smeared time.

During the period while the leap smear is being performed, ntpd will include
a specially-formatted 'refid' in time packets that contain "smeared" time.
This refid is of the form 254.x.y.z, where x.y.z are 24 encoded bits of the
smear value.

With this approach the time an NTP server sends to its clients still matches
UTC before the leap second, up to the beginning of the smear interval, and
again corresponds to UTC after the insertion of the leap second has
finished, at the end of the smear interval.  By examining the first byte of
the refid, one can also determine if the server is offering smeared time or
not.
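As a hedged illustration of that check: in the NTP packet header the 4-byte
reference ID occupies bytes 12 to 15, so a monitoring script could look at
the first of those bytes.  The Python sketch below assumes a raw 48-byte
server reply is already at hand; the helper names are invented for this
example.

```python
import struct

NTP_HEADER_LEN = 48   # minimum NTP packet size without extensions
REFID_OFFSET = 12     # the reference ID occupies header bytes 12..15
SMEAR_MARKER = 254    # first refid byte used while smearing


def refid_octets(packet: bytes) -> tuple:
    """Return the four reference ID bytes of an NTP packet."""
    if len(packet) < NTP_HEADER_LEN:
        raise ValueError("packet too short for an NTP header")
    return struct.unpack_from("!4B", packet, REFID_OFFSET)


def is_smeared(packet: bytes) -> bool:
    """True if the refid has the 254.x.y.z form described above."""
    return refid_octets(packet)[0] == SMEAR_MARKER


if __name__ == "__main__":
    # Hypothetical reply with every field zeroed except the refid.
    reply = bytearray(NTP_HEADER_LEN)
    reply[REFID_OFFSET:REFID_OFFSET + 4] = bytes([254, 196, 88, 176])
    print(is_smeared(bytes(reply)))
```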

Of course, clients which receive the "smeared" time from an NTP server don't
have to (and in fact must not) care about the leap second anymore.  Smearing
is transparent to the clients, and the clients don't even notice there's a
leap second.


Pros and Cons of the Smearing Approach
--------------------------------------
The disadvantages of this approach are:

- During the smear interval the time provided by smearing NTP servers
differs significantly from UTC, and thus from the time provided by normal,
non-smearing NTP servers.  The difference can be up to 1 second, depending
on the smear algorithm.

- Since smeared time differs from true UTC, and many applications require
correct legal time (UTC), there may be legal consequences to using smeared
time.  Make sure you check to see if this requirement affects you.

However, for applications where it's only important that all computers have
the same time and a temporary offset of up to 1 s from UTC is acceptable, a
better approach than stepping may be to slew the time in a well-defined way,
over a certain interval, which is what we call smearing the leap second.


The Motivation to Implement Leap Smearing
-----------------------------------------
Here is some historical background for ntpd, related to smearing/slewing
time.

Up to ntpd 4.2.4, if kernel support for leap seconds was either not
available or was not enabled, ntpd didn't care about the leap second at all.
So if ntpd was run with -x and thus kernel support wasn't used, ntpd saw a
sudden 1 s offset after the leap second and normally would have stepped the
time by -1 s a few minutes later.  However, 'ntpd -x' does not step the time
but "slews" the 1-second correction, which takes 33 minutes and 20 seconds
to complete.  This could be considered a bug, but it was certainly only
accidental behavior.

However, as we learned in the discussion in http://bugs.ntp.org/2745, this
behavior was very much appreciated since the time was never stepped back,
even though the start of the slewing was somewhat undefined and depended on
the poll interval, and the system time was off by 1 second for several
minutes before slewing even started.

In ntpd 4.2.6 some code was added which let ntpd step the time at UTC
midnight to insert a leap second, if kernel support was not used.
Unfortunately this also happened if ntpd was started with -x, so the folks
who expected that the time was never stepped when ntpd was run with -x found
this wasn't true anymore.  Again from the discussion in NTP bug 2745, we
learn that there were even some folks who patched ntpd to get the 4.2.4
behavior back.

In 4.2.8 the leap second code was rewritten and some enhancements were
introduced, but the resulting code still showed the behavior of 4.2.6,
i.e. ntpd with -x would still step the time.  This has only recently been
fixed in the current ntpd stable code, but this fix is only available with a
certain patch level of ntpd 4.2.8.

So a possible solution for users who were looking for a way to get past the
leap second without the time being stepped could have been to check the
version of ntpd installed on each of their systems.  If it's still 4.2.4, be
sure to start the client ntpd with -x.  If it's 4.2.6 or 4.2.8, -x won't
help unless you have a patched ntpd version instead of the original
version, so you'd need to upgrade to the current -stable code to be able to
run ntpd with -x and get the desired result.  Either way you'd still have
the requirement to check/update/configure every single machine in your
network that runs ntpd.

Google's leap smear approach is a very efficient solution for this, for
sites that do not require correct timestamps for legal purposes.  You just
have to take care that your NTP servers support leap smearing and configure
those few servers accordingly.  If the smear interval is long enough so that
NTP clients can follow the smeared time, it doesn't matter at all which
version of ntpd is installed on a client machine.  It just works, and it
even works around kernel bugs due to the leap second.

Since all clients follow the same smeared time, the time difference between
the clients during the smear interval is as small as possible, compared to
the -x approach.  The current leap second code in ntpd determines the point
in system time when the leap second is to be inserted, and given a
particular smear interval it's easy to determine the start point of the
smearing.  The smearing is finished when the leap second ends, i.e. when the
next UTC day begins.
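The timing described above can be sketched in a few lines of Python.  This
is only a model under stated assumptions: the offset is assumed to ramp
linearly (the curve ntpd actually uses may differ), times are plain seconds
on a continuous scale, and the function and parameter names are
hypothetical.  The offset is negative for an inserted leap second.

```python
def smear_offset(now, leap_end, interval, leap_seconds=1.0):
    """Offset in seconds applied to timestamps sent to clients.

    now       current time in seconds
    leap_end  instant at which the leap second ends (next UTC day begins)
    interval  configured smear interval in seconds
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    start = leap_end - interval          # smearing starts this early
    if now < start or now >= leap_end:
        return 0.0                       # outside the smear interval
    return -leap_seconds * (now - start) / interval


if __name__ == "__main__":
    leap_end = 86400.0                   # arbitrary example instant
    for now in (21600.0, 43200.0, 64800.0, 86400.0):
        print(now, smear_offset(now, leap_end, 86400.0))
```

The offset returns to 0 at the end of the interval because by then the
system time itself has absorbed the leap second, so the time sent to clients
matches UTC again.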

The maximum error doesn't exceed what you'd get with the old smearing caused
by -x in ntpd 4.2.4, so users who could accept the old behavior should also
be able to accept the smearing at the server side.

In order to affect the local timekeeping as little as possible, the leap
smear support currently implemented in ntpd does not affect the internal
system time at all.  Only the timestamps and refid in outgoing reply packets
*to clients* are modified by the smear offset, so this makes sure the basic
functionality of ntpd is not accidentally broken.  Also peer packets
exchanged with other NTP servers are based on the real UTC system time and
the normal refid, as usual.

The leap smear implementation is optionally available in ntp-4.2.8p3 and
later, and the changes can be tracked via http://bugs.ntp.org/2855.


Using NTP's Leap Second Smearing
--------------------------------
- Leap Second Smearing MUST NOT be used for public servers, e.g. servers
provided by metrology institutes, or servers participating in the NTP pool
project.  There would be a high risk that NTP clients get the time from a
mixture of smearing and non-smearing NTP servers which could result in
undefined client behavior.  Instead, leap second smearing should only be
configured on time servers providing dedicated clients with time, if all
those clients can accept smeared time.

- Leap Second Smearing is NOT configured by default.  The only way to get
this behavior is to invoke the ./configure script from the NTP source code
package with the --enable-leap-smear parameter before the executables are
built.

- Even if ntpd has been compiled to enable leap smearing support, leap
smearing is only done if explicitly configured.

- The leap smear interval should be at least several hours long, and up to
1 day (86400s).  If the interval is too short then the smear offset is
applied too quickly for clients to follow.  86400s (1 day) is a good
choice.

- If several NTP servers are set up for leap smearing then the *same* smear
interval should be configured on each server.

- Smearing NTP servers DO NOT send a leap second warning flag in response to
client time requests.  Since the leap second is applied gradually the
clients don't even notice there's a leap second being inserted, and thus no
log message or similar related to the leap second will be visible on the
clients.

- Since clients don't (and must not) become aware of the leap second at all,
clients getting the time from a smearing NTP server MUST NOT be configured
to use a leap second file.  If they had a leap second file they would apply
the leap second twice: the smeared one from the server, plus another one
inserted by themselves due to the leap second file.  As a result, the
additional correction would soon be detected and corrected/adjusted.

- Clients MUST NOT be configured to poll both smearing and non-smearing NTP
servers at the same time.  During the smear interval they would get
different times from different servers and wouldn't know which server(s) to
accept.


Setting Up A Smearing NTP Server
--------------------------------
If an NTP server should perform leap smearing, then the leap smear interval
(in seconds) needs to be specified in the NTP configuration file ntp.conf,
e.g.:

 leapsmearinterval 86400

Please keep in mind the leap smear interval should be between several hours
and 24 hours long.  With shorter values clients may not be able to follow
the drift caused by the smeared time, and with longer values the discrepancy
between system time and UTC will cause more problems when reconciling
timestamp differences.
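A quick calculation shows why short intervals are hard to follow.  Assuming
the 1 s offset is spread evenly over the interval (a simplification; a
non-linear smear has a higher peak rate), the server's time appears to drift
at 1 s / interval.  The 33 minutes and 20 seconds that 'ntpd -x' needs to
slew 1 s corresponds to 500 ppm, which is used below as a reference point.
The function name is made up for this sketch.

```python
REFERENCE_SLEW_PPM = 500.0  # 1 s in 2000 s, the 'ntpd -x' figure quoted above


def smear_rate_ppm(interval_s, leap_s=1.0):
    """Average apparent frequency offset, in ppm, of an evenly spread smear."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return leap_s / interval_s * 1e6


if __name__ == "__main__":
    for interval in (2000, 7200, 43200, 86400):
        rate = smear_rate_ppm(interval)
        share = rate / REFERENCE_SLEW_PPM
        print(f"{interval:6d} s -> {rate:7.2f} ppm ({share:.1%} of 500 ppm)")
```

With 'leapsmearinterval 86400' the average rate is roughly 11.6 ppm, a small
fraction of the 500 ppm reference.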

When ntpd starts and a smear interval has been specified, a log message
is generated, e.g.:

 ntpd[31120]: config: leap smear interval 86400 s

While ntpd is running with a leap smear interval specified, the command:

 ntpq -c rv

reports the smear status, e.g.:

# ntpq -c rv
associd=0 status=4419 leap_add_sec, sync_uhf_radio, 1 event, leap_armed,
version="ntpd 4.2.8p3-RC1@1.3349-o Mon Jun 22 14:24:09 UTC 2015 (26)",
processor="i586", system="Linux/3.7.1", leap=01, stratum=1,
precision=-18, rootdelay=0.000, rootdisp=1.075, refid=MRS,
reftime=d93dab96.09666671 Tue, Jun 30 2015 23:58:14.036,
clock=d93dab9b.3386a8d5 Tue, Jun 30 2015 23:58:19.201, peer=2335,
tc=3, mintc=3, offset=-0.097015, frequency=44.627, sys_jitter=0.003815,
clk_jitter=0.451, clk_wander=0.035, tai=35, leapsec=201507010000,
expire=201512280000, leapsmearinterval=86400, leapsmearoffset=-932.087

In the example above 'leapsmearinterval' reports the configured leap smear
interval all the time, while the 'leapsmearoffset' value is 0 outside the
interval and goes from 0 to -1000 ms over the interval.  So this can be
used to monitor if and how the time sent to clients is smeared.  With a
leapsmearoffset of -932.087 ms (-0.932087 s), the refid reported in smeared
packets would be 254.196.88.176.
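The conversion in that example can be reproduced with a short Python sketch.
It rests on two assumptions: that 'leapsmearoffset' is reported in
milliseconds, and that the 24 bits are the offset in seconds as a signed
fixed-point number with 2 integer and 22 fraction bits.  That layout
reproduces the refid quoted above, but treat it as an illustration and check
the ntpd sources before relying on it.  All function names are hypothetical.

```python
import re

# Last line of the 'ntpq -c rv' example output shown above.
SAMPLE_RV = ("expire=201512280000, leapsmearinterval=86400, "
             "leapsmearoffset=-932.087")


def parse_smear_vars(rv_text):
    """Extract the leapsmear* variables from 'ntpq -c rv' output as floats."""
    pairs = re.findall(r"(leapsmear\w+)=(-?\d+(?:\.\d+)?)", rv_text)
    return {name: float(value) for name, value in pairs}


def smear_to_refid(offset_s):
    """Encode an offset in seconds as 254.x.y.z (assumed 2.22 fixed point)."""
    raw = round(offset_s * (1 << 22)) & 0xFFFFFF   # 24-bit two's complement
    return f"254.{raw >> 16}.{(raw >> 8) & 0xFF}.{raw & 0xFF}"


def refid_to_smear(refid):
    """Decode a 254.x.y.z refid back to an offset in seconds."""
    first, x, y, z = (int(part) for part in refid.split("."))
    if first != 254:
        raise ValueError("not a leap smear refid")
    raw = (x << 16) | (y << 8) | z
    if raw & 0x800000:                             # sign bit set
        raw -= 1 << 24
    return raw / (1 << 22)


if __name__ == "__main__":
    smear = parse_smear_vars(SAMPLE_RV)
    offset_s = smear["leapsmearoffset"] / 1000.0   # ms -> s
    print(smear_to_refid(offset_s))
    print(refid_to_smear("254.196.88.176"))
```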