xref: /llvm-project/libc/docs/headers/math/log.rst (revision a9aff440d9dde3a636a4ccff3d5f7eaf25836c34)
1*a9aff440SNick Desaulniers.. _log_algorithm:
2*a9aff440SNick Desaulniers
3*a9aff440SNick Desaulniers========================
4*a9aff440SNick DesaulniersLog/Log10/Log2 Algorithm
5*a9aff440SNick Desaulniers========================
6*a9aff440SNick Desaulniers
7*a9aff440SNick Desaulniers.. default-role:: math
8*a9aff440SNick Desaulniers
9*a9aff440SNick DesaulniersIn this short note, we will discuss in detail about the computation of
10*a9aff440SNick Desaulniers:math:`\log(x)` function, with double precision inputs, in particular, the range
11*a9aff440SNick Desaulniersreduction steps and error analysis.  The algorithm is broken down into 2 main
12*a9aff440SNick Desaulniersphases as follow:
13*a9aff440SNick Desaulniers
14*a9aff440SNick Desaulniers1. Fast phase:
15*a9aff440SNick Desaulniers
16*a9aff440SNick Desaulniers  a. Range reduction
17*a9aff440SNick Desaulniers  b. Polynomial approximation
18*a9aff440SNick Desaulniers  c. Ziv's test
19*a9aff440SNick Desaulniers
20*a9aff440SNick Desaulniers2. Accurate phase (if Ziv's test failed):
21*a9aff440SNick Desaulniers
22*a9aff440SNick Desaulniers  a. Further range reduction
23*a9aff440SNick Desaulniers  b. Polynomial approximation
24*a9aff440SNick Desaulniers
25*a9aff440SNick Desaulniers
26*a9aff440SNick DesaulniersFast phase
27*a9aff440SNick Desaulniers==========
28*a9aff440SNick Desaulniers
29*a9aff440SNick DesaulniersRange reduction
30*a9aff440SNick Desaulniers---------------
31*a9aff440SNick Desaulniers
32*a9aff440SNick DesaulniersLet `x = 2^{e_x} (1 + m_x)` be a normalized double precision number, in which
33*a9aff440SNick Desaulniers`-1074 \leq e_x \leq 1022` and `0 \leq m_x < 1` such that
34*a9aff440SNick Desaulniers`2^{52} m_x \in \mathbb{Z}`.
35*a9aff440SNick Desaulniers
36*a9aff440SNick DesaulniersThen from the properties of logarithm:
37*a9aff440SNick Desaulniers
38*a9aff440SNick Desaulniers.. math::
39*a9aff440SNick Desaulniers  \log(x) &= \log\left( 2^{e_x} (1 + m_x) \right) \\
40*a9aff440SNick Desaulniers          &= \log\left( 2^{e_x} \right) + \log(1 + m_x) \\
41*a9aff440SNick Desaulniers          &= e_x \log(2) + \log(1 + m_x)
42*a9aff440SNick Desaulniers
43*a9aff440SNick Desaulniersthe computation of `\log(x)` can be reduced to:
44*a9aff440SNick Desaulniers
45*a9aff440SNick Desaulniers1. compute the product of `e_x` and `\log(2)`,
46*a9aff440SNick Desaulniers2. compute `\log(1 + m_x)` for `0 \leq m_x < 1`,
47*a9aff440SNick Desaulniers3. add step 1 and 2.
48*a9aff440SNick Desaulniers
49*a9aff440SNick DesaulniersTo compute `\log(1 + m_x)` in step 2, we can reduce the range further by finding
50*a9aff440SNick Desaulniers`r > 0` such that:
51*a9aff440SNick Desaulniers
52*a9aff440SNick Desaulniers.. math::
53*a9aff440SNick Desaulniers  | r(1 + m_x) - 1 | < C \quad \quad \text{(R1)}
54*a9aff440SNick Desaulniers
55*a9aff440SNick Desaulniersfor small `0 < C < 1`.  Then if we let `u = r(1 + m_x) - 1`, `|u| < C`:
56*a9aff440SNick Desaulniers
57*a9aff440SNick Desaulniers.. math::
58*a9aff440SNick Desaulniers  \log(1 + m_x) &= \log \left( \frac{r (1 + m_x)}{r} \right) \\
59*a9aff440SNick Desaulniers                &= \log(r (1 + m_x) ) - \log(r) \\
60*a9aff440SNick Desaulniers                &= \log(1 + u) - \log(r)
61*a9aff440SNick Desaulniers
62*a9aff440SNick Desaulniersand step 2 can be computed with:
63*a9aff440SNick Desaulniers
64*a9aff440SNick Desaulniersa. extract `r` and `-\log(r)` from look-up tables,
65*a9aff440SNick Desaulniersb. compute the reduced argument `u = r(1 + m_x) - 1`,
66*a9aff440SNick Desaulniersc. compute `\log(1 + u)` by polynomial approximation or further range reduction,
67*a9aff440SNick Desaulniersd. add step a and step c results.
68*a9aff440SNick Desaulniers
69*a9aff440SNick Desaulniers
70*a9aff440SNick DesaulniersHow to derive `r`
71*a9aff440SNick Desaulniers-----------------
72*a9aff440SNick Desaulniers
73*a9aff440SNick DesaulniersFor an efficient implementation, we would like to use the first `M` significant
74*a9aff440SNick Desaulniersbits of `m_x` to look up for `r`.  In particular, we would like to find a value
75*a9aff440SNick Desaulniersof `r` that works for all `m_x` satisfying:
76*a9aff440SNick Desaulniers
77*a9aff440SNick Desaulniers.. math::
78*a9aff440SNick Desaulniers  k 2^{-M} \leq m_x < (k + 1) 2^{-M} \quad \text{for some} \quad
79*a9aff440SNick Desaulniers  k = 0..2^{M} - 1. \quad\quad \text{(M1)}
80*a9aff440SNick Desaulniers
81*a9aff440SNick DesaulniersLet `r = 1 + s`, then `u` can be expressed in terms of `s` as:
82*a9aff440SNick Desaulniers
83*a9aff440SNick Desaulniers.. math::
84*a9aff440SNick Desaulniers  u &= r(1 + m_x) - 1 \\
85*a9aff440SNick Desaulniers    &= (1 + s)(1 + m_x) - 1 \\
86*a9aff440SNick Desaulniers    &= s m_x + s + m_x  &\quad\quad \text{(U1)} \\
87*a9aff440SNick Desaulniers    &= s (1 + m_x) + m_x \\
88*a9aff440SNick Desaulniers    &= m_x (1 + s) + s.
89*a9aff440SNick Desaulniers
90*a9aff440SNick DesaulniersFrom the condition `\text{(R1)}`, `s` is bounded by:
91*a9aff440SNick Desaulniers
92*a9aff440SNick Desaulniers.. math::
93*a9aff440SNick Desaulniers  \frac{-C - m_x}{1 + m_x} < s < \frac{C - m_x}{1 + m_x} \quad\quad \text{(S1)}.
94*a9aff440SNick Desaulniers
95*a9aff440SNick DesaulniersSince our reduction constant `s` must work for all `m_x` in the interval
96*a9aff440SNick Desaulniers`I = \{ v: k 2^{-M} \leq v < (k + 1) 2^{-M} \}`, `s` is bounded by:
97*a9aff440SNick Desaulniers
98*a9aff440SNick Desaulniers.. math::
99*a9aff440SNick Desaulniers  \sup_{v \in I} \frac{-C - v}{1 + v} < s < \inf_{v \in I} \frac{C - v}{1 + v}
100*a9aff440SNick Desaulniers
101*a9aff440SNick DesaulniersFor a fixed constant `|c| < 1`, let `f(v) = \frac{c - v}{1 + v}`, then its
102*a9aff440SNick Desaulniersderivative is:
103*a9aff440SNick Desaulniers
104*a9aff440SNick Desaulniers.. math::
105*a9aff440SNick Desaulniers  f'(v) = \frac{(-1)(1 + v) - (1)(c - v)}{(1 + v)^2} = \frac{-1 - c}{(1 + v)^2}.
106*a9aff440SNick Desaulniers
107*a9aff440SNick DesaulniersSince `|c| < 1`, `f'(v) < 0` for all `v \neq -1`, so:
108*a9aff440SNick Desaulniers
109*a9aff440SNick Desaulniers.. math::
110*a9aff440SNick Desaulniers  \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right)
111*a9aff440SNick Desaulniers                       = f \left( k 2^{-M} \right) \\
112*a9aff440SNick Desaulniers  \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right)
113*a9aff440SNick Desaulniers                       = f \left( (k + 1) 2^{-M} \right)
114*a9aff440SNick Desaulniers
115*a9aff440SNick DesaulniersHence we have the following bound on `s`:
116*a9aff440SNick Desaulniers
117*a9aff440SNick Desaulniers.. math::
118*a9aff440SNick Desaulniers  \frac{-C - k 2^{-M}}{1 + k 2^{-M}} < s \leq
119*a9aff440SNick Desaulniers  \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}. \quad\quad \text{(S2)}
120*a9aff440SNick Desaulniers
121*a9aff440SNick DesaulniersIn order for `s` to exist, we need that:
122*a9aff440SNick Desaulniers
123*a9aff440SNick Desaulniers.. math::
124*a9aff440SNick Desaulniers  \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} >
125*a9aff440SNick Desaulniers  \frac{-C - k 2^{-M}}{1 + k 2^{-M}}
126*a9aff440SNick Desaulniers
127*a9aff440SNick Desaulnierswhich is equivalent to:
128*a9aff440SNick Desaulniers
129*a9aff440SNick Desaulniers.. math::
130*a9aff440SNick Desaulniers  \quad\quad 2C - 2^{-M} + (2k + 1) 2^{-M} C > 0
131*a9aff440SNick Desaulniers  \iff C > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \quad\quad \text{(C1)}.
132*a9aff440SNick Desaulniers
133*a9aff440SNick DesaulniersConsider the case `C = 2^{-N}`.  Since `0 \leq k \leq 2^M - 1,` the right hand
134*a9aff440SNick Desaulniersside of `\text{(C1)}` is bounded by:
135*a9aff440SNick Desaulniers
136*a9aff440SNick Desaulniers.. math::
137*a9aff440SNick Desaulniers  2^{-M - 1} > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \geq
138*a9aff440SNick Desaulniers  \frac{2^{-M - 1}}{1 + (2^{M + 1} - 1) 2^{-M - 1}} > 2^{-M - 2}.
139*a9aff440SNick Desaulniers
140*a9aff440SNick DesaulniersHence, from `\text{(C1)}`, being an exact power of 2, `C = 2^{-N}` is bounded below
141*a9aff440SNick Desaulniersby:
142*a9aff440SNick Desaulniers
143*a9aff440SNick Desaulniers.. math::
144*a9aff440SNick Desaulniers  C = 2^{-N} \geq 2^{-M - 1}.
145*a9aff440SNick Desaulniers
146*a9aff440SNick DesaulniersTo make the range reduction efficient, we will want to minimize `C` (maximize
147*a9aff440SNick Desaulniers`N`) while keeping the required precision of `s`(`r`) as low as possible.  And
148*a9aff440SNick Desaulniersfor that, we will consider the following two cases: `N = M + 1` and `N = M`.
149*a9aff440SNick Desaulniers
150*a9aff440SNick DesaulniersCase 1 - `N = M + 1`
151*a9aff440SNick Desaulniers~~~~~~~~~~~~~~~~~~~~
152*a9aff440SNick Desaulniers
153*a9aff440SNick DesaulniersWhen `N = M + 1`, `\text{(S2)}` becomes:
154*a9aff440SNick Desaulniers
155*a9aff440SNick Desaulniers.. math::
156*a9aff440SNick Desaulniers  \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} < s <
157*a9aff440SNick Desaulniers  \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}.
158*a9aff440SNick Desaulniers  \quad\quad \text{(S2')}
159*a9aff440SNick Desaulniers
160*a9aff440SNick DesaulniersThis is an interval of length:
161*a9aff440SNick Desaulniers
162*a9aff440SNick Desaulniers.. math::
163*a9aff440SNick Desaulniers  l &= \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} -
164*a9aff440SNick Desaulniers       \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} \\
165*a9aff440SNick Desaulniers    &= \frac{(2k + 1)2^{-2M - 1}}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})}
166*a9aff440SNick Desaulniers    \quad\quad \text{(L1)}
167*a9aff440SNick Desaulniers
168*a9aff440SNick DesaulniersAs a function of `k`, the length `l` has its derivative with respect to `k`:
169*a9aff440SNick Desaulniers
170*a9aff440SNick Desaulniers.. math::
171*a9aff440SNick Desaulniers  \frac{dl}{dk} =
172*a9aff440SNick Desaulniers  \frac{2^{2M + 1} - 2k(k + 1) - 1}
173*a9aff440SNick Desaulniers       {2^{4M}(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2}
174*a9aff440SNick Desaulniers
175*a9aff440SNick Desaulnierswhich is always positive for `0 \leq k \leq 2^M - 1`.  So for all
176*a9aff440SNick Desaulniers`0 < k < 2^{-M}` (`k = 0` will be treated differently in edge cases), and for
177*a9aff440SNick Desaulniers`M > 2`, `l` is bounded below by:
178*a9aff440SNick Desaulniers
179*a9aff440SNick Desaulniers.. math::
180*a9aff440SNick Desaulniers  l > 2^{-2M}.
181*a9aff440SNick Desaulniers
182*a9aff440SNick DesaulniersIt implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-2M}`.
183*a9aff440SNick DesaulniersAnd from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is:
184*a9aff440SNick Desaulniers
185*a9aff440SNick Desaulniers.. math::
186*a9aff440SNick Desaulniers  \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\
187*a9aff440SNick Desaulniers                        &= 2^{-2M} \operatorname{ulp}(m_x).
188*a9aff440SNick Desaulniers
189*a9aff440SNick DesaulniersSince:
190*a9aff440SNick Desaulniers
191*a9aff440SNick Desaulniers.. math::
192*a9aff440SNick Desaulniers  |u| < C = 2^{-N} = 2^{-M - 1},
193*a9aff440SNick Desaulniers
194*a9aff440SNick DesaulniersIts required precision is:
195*a9aff440SNick Desaulniers
196*a9aff440SNick Desaulniers.. math::
197*a9aff440SNick Desaulniers  \operatorname{prec}(u) &= \log_2(2^{-M-1} / \operatorname{ulp}(u)) \\
198*a9aff440SNick Desaulniers                         &= \log_2(2^{M - 1} / \operatorname{ulp}(m_x)) \\
199*a9aff440SNick Desaulniers                         &= M - 1 - \log_2(\operatorname{ulp}(m_x)).
200*a9aff440SNick Desaulniers
201*a9aff440SNick DesaulniersThis means that in this case, we cannot restrict `u` to be exactly representable
202*a9aff440SNick Desaulniersin double precision for double precision input `x` with `M > 2`.  Nonetheless,
203*a9aff440SNick Desaulniersfor a reasonable value of `M`, we can have `u` exactly representable in double
204*a9aff440SNick Desaulniersprecision for single precision input `x` (`\operatorname{ulp}(m_x) = 2^{-23}`)
205*a9aff440SNick Desaulnierssuch that `|u| < 2^{-M - 1}` using a look-up table of size `2^M`.
206*a9aff440SNick Desaulniers
207*a9aff440SNick DesaulniersA particular formula for `s` can be derived from `\text{(S2')}` by the midpoint
208*a9aff440SNick Desaulniersformula:
209*a9aff440SNick Desaulniers
210*a9aff440SNick Desaulniers.. math::
211*a9aff440SNick Desaulniers  s &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \operatorname{midpoint}
212*a9aff440SNick Desaulniers       \left(-\frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}},
213*a9aff440SNick Desaulniers       \frac{2^{-M-1} - (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\
214*a9aff440SNick Desaulniers    &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left(
215*a9aff440SNick Desaulniers       \frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}} +
216*a9aff440SNick Desaulniers       \frac{2^{-M - 1} + (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}}
217*a9aff440SNick Desaulniers    \right) \right) \\
218*a9aff440SNick Desaulniers    &= 2^{-2M} \operatorname{round}\left( \frac{
219*a9aff440SNick Desaulniers       - \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) }
220*a9aff440SNick Desaulniers       {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \\
221*a9aff440SNick Desaulniers    &= - 2^{-2M} \operatorname{round}\left( \frac{
222*a9aff440SNick Desaulniers       \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) }
223*a9aff440SNick Desaulniers       {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right)  \quad\quad \text{(S3)}
224*a9aff440SNick Desaulniers
225*a9aff440SNick DesaulniersThe corresponding range and formula for `r = 1 + s` are:
226*a9aff440SNick Desaulniers
227*a9aff440SNick Desaulniers.. math::
228*a9aff440SNick Desaulniers  \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}} < r \leq
229*a9aff440SNick Desaulniers  \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}}
230*a9aff440SNick Desaulniers
231*a9aff440SNick Desaulniers.. math::
232*a9aff440SNick Desaulniers  r &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot
233*a9aff440SNick Desaulniers       \operatorname{midpoint}\left( \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}},
234*a9aff440SNick Desaulniers          \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}}\right) \right) \\
235*a9aff440SNick Desaulniers    &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left(
236*a9aff440SNick Desaulniers       \frac{1 + 2^{-M-1}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M-1}}{1 + k 2^{-M}}
237*a9aff440SNick Desaulniers    \right) \right) \\
238*a9aff440SNick Desaulniers    &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{
239*a9aff440SNick Desaulniers       1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-2} }{(1 + k 2^{-M})
240*a9aff440SNick Desaulniers       (1 + (k + 1) 2^{-M})} \right)
241*a9aff440SNick Desaulniers
242*a9aff440SNick DesaulniersCase 1 - `N = M`
243*a9aff440SNick Desaulniers~~~~~~~~~~~~~~~~
244*a9aff440SNick Desaulniers
245*a9aff440SNick DesaulniersWhen `N = M`, `\text{(S2)}` becomes:
246*a9aff440SNick Desaulniers
247*a9aff440SNick Desaulniers.. math::
248*a9aff440SNick Desaulniers  \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} < s < \frac{-k 2^{-M}}{1 + (k + 1) 2^{-M}}
249*a9aff440SNick Desaulniers  \quad\quad \text{(S2")}
250*a9aff440SNick Desaulniers
251*a9aff440SNick DesaulniersThis is an interval of length:
252*a9aff440SNick Desaulniers
253*a9aff440SNick Desaulniers.. math::
254*a9aff440SNick Desaulniers  l &= \frac{- k 2^{-M}}{1 + (k + 1) 2^{-M}} -
255*a9aff440SNick Desaulniers       \frac{- (k + 1) 2^{-M}}{1 + k 2^{-M}} \\
256*a9aff440SNick Desaulniers    &= \frac{2^{-M} (1 + (2k + 1) 2^{-M})}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})}
257*a9aff440SNick Desaulniers    \quad\quad \text{(L1')}
258*a9aff440SNick Desaulniers
259*a9aff440SNick DesaulniersAs a function of `k`, its derivative with respect to `k`:
260*a9aff440SNick Desaulniers
261*a9aff440SNick Desaulniers.. math::
262*a9aff440SNick Desaulniers  \frac{dl}{dk} =
263*a9aff440SNick Desaulniers  -\frac{2^{-2M}(k(k + 1)2^{-M + 1} + 2^{-M} + 2k + 1)}
264*a9aff440SNick Desaulniers        {(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2}
265*a9aff440SNick Desaulniers
266*a9aff440SNick Desaulnierswhich is always negative for `0 \leq k \leq 2^M - 1`.  So for `M > 1`, `l` is
267*a9aff440SNick Desaulniersbounded below by:
268*a9aff440SNick Desaulniers
269*a9aff440SNick Desaulniers.. math::
270*a9aff440SNick Desaulniers  l > \frac{2^{-M - 1} (3 - 2^{-M})}{2 - 2^{-M}} > 2^{-M - 1}.
271*a9aff440SNick Desaulniers
272*a9aff440SNick DesaulniersIt implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-M-1}`.
273*a9aff440SNick DesaulniersAnd from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is:
274*a9aff440SNick Desaulniers
275*a9aff440SNick Desaulniers.. math::
276*a9aff440SNick Desaulniers  \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\
277*a9aff440SNick Desaulniers                        &= 2^{-M - 1} \operatorname{ulp}(m_x).
278*a9aff440SNick Desaulniers
279*a9aff440SNick DesaulniersSince:
280*a9aff440SNick Desaulniers
281*a9aff440SNick Desaulniers.. math::
282*a9aff440SNick Desaulniers  |u| < C = 2^{-N} = 2^{-M},
283*a9aff440SNick Desaulniers
284*a9aff440SNick DesaulniersIts required precision is:
285*a9aff440SNick Desaulniers
286*a9aff440SNick Desaulniers.. math::
287*a9aff440SNick Desaulniers  \operatorname{prec}(u) &= \log_2(2^{-M} / \operatorname{ulp}(u)) \\
288*a9aff440SNick Desaulniers                         &= \log_2(2 / \operatorname{ulp}(m_x)) \\
289*a9aff440SNick Desaulniers                         &= 1 - \log_2(\operatorname{ulp}(m_x)).
290*a9aff440SNick Desaulniers
291*a9aff440SNick DesaulniersHence, for double precision `x`, `\operatorname{ulp}(m_x) = 2^{-52}`, and the
292*a9aff440SNick Desaulniersprecision needed for `u` is `\operatorname{prec}(u) = 53`, i.e., `u` can be
293*a9aff440SNick Desaulniersexactly representable in double precision.  And in this case, `s` can be
294*a9aff440SNick Desaulniersderived from `\text{(S2")}` by the midpoint formula:
295*a9aff440SNick Desaulniers
296*a9aff440SNick Desaulniers.. math::
297*a9aff440SNick Desaulniers  s &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot
298*a9aff440SNick Desaulniers       \operatorname{midpoint} \left(-\frac{-(k + 1)2^{-M}}{1 + k 2^{-M}},
299*a9aff440SNick Desaulniers       \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\
300*a9aff440SNick Desaulniers    &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left(
301*a9aff440SNick Desaulniers       \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} + \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}}
302*a9aff440SNick Desaulniers       \right) \right) \\
303*a9aff440SNick Desaulniers    &= -2^{-M - 1} \operatorname{round}\left( \frac{
304*a9aff440SNick Desaulniers       (2k + 1) + (2k^2 + 2k + 1) 2^{-M} }
305*a9aff440SNick Desaulniers       {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right)  \quad\quad \text{(S3')}
306*a9aff440SNick Desaulniers
307*a9aff440SNick DesaulniersThe corresponding range and formula for `r = 1 + s` are:
308*a9aff440SNick Desaulniers
309*a9aff440SNick Desaulniers.. math::
310*a9aff440SNick Desaulniers  \frac{1 - 2^{-M}}{1 + k 2^{-M}} < r \leq \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}}
311*a9aff440SNick Desaulniers
312*a9aff440SNick Desaulniers.. math::
313*a9aff440SNick Desaulniers  r &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot
314*a9aff440SNick Desaulniers       \operatorname{midpoint}\left( \frac{1 - 2^{-M}}{1 + k 2^{-M}},
315*a9aff440SNick Desaulniers          \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\
316*a9aff440SNick Desaulniers    &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left(
317*a9aff440SNick Desaulniers       \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M}}{1 + k 2^{-M}}
318*a9aff440SNick Desaulniers    \right) \right) \\
319*a9aff440SNick Desaulniers    &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{
320*a9aff440SNick Desaulniers       1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-1} }{(1 + k 2^{-M})
321*a9aff440SNick Desaulniers       (1 + (k + 1) 2^{-M})} \right)
322*a9aff440SNick Desaulniers
323*a9aff440SNick DesaulniersEdge cases
324*a9aff440SNick Desaulniers----------
325*a9aff440SNick Desaulniers
326*a9aff440SNick Desaulniers1. When `k = 0`, notice that:
327*a9aff440SNick Desaulniers
328*a9aff440SNick Desaulniers.. math::
329*a9aff440SNick Desaulniers  0 = k 2^{-N} \leq m_x < (k + 1) 2^{-N} = 2^{-N} = C,
330*a9aff440SNick Desaulniers
331*a9aff440SNick Desaulniersso we can simply choose `r = 1` so that `\log(r) = 0` is exact, then `u = m_x`.
332*a9aff440SNick DesaulniersThis will help reduce the accumulated errors when `m_x` is close to 0 while
333*a9aff440SNick Desaulniersmaintaining the range reduction output's requirements.
334*a9aff440SNick Desaulniers
335*a9aff440SNick Desaulniers2. When `k = 2^{N} - 1`, `\text{(S2)}` becomes:
336*a9aff440SNick Desaulniers
337*a9aff440SNick Desaulniers.. math::
338*a9aff440SNick Desaulniers  -\frac{1}{2} - \frac{C - 2^{-M-1}}{2 - 2^{-M}} <> s \leq
339*a9aff440SNick Desaulniers  -\frac{1}{2} + \frac{C}{2}.
340*a9aff440SNick Desaulniers
341*a9aff440SNick Desaulniersso when `C > 2^{-M - 1}` is a power of 2, we can always choose:
342*a9aff440SNick Desaulniers
343*a9aff440SNick Desaulniers.. math::
344*a9aff440SNick Desaulniers  s = -\frac{1}{2}, \quad \text{i.e.} \quad r = \frac{1}{2}.
345*a9aff440SNick Desaulniers
346*a9aff440SNick DesaulniersThis reduction works well to avoid catastrophic cancellation happening when
347*a9aff440SNick Desaulniers`e_x = -1`.
348*a9aff440SNick Desaulniers
349*a9aff440SNick DesaulniersThis also works when `C = 2^{-M - 1}` if we relax the condition on `u` to
350*a9aff440SNick Desaulniers`|u| \leq C = 2^{-M-1}`.
351*a9aff440SNick Desaulniers
352*a9aff440SNick DesaulniersIntermediate precision, and Ziv's test
353*a9aff440SNick Desaulniers--------------------------------------
354*a9aff440SNick Desaulniers
355*a9aff440SNick DesaulniersIn the fast phase, we want extra precision while performant, so we use
356*a9aff440SNick Desaulniersdouble-double precision for most intermediate computation steps, and employ Ziv
357*a9aff440SNick Desaulnierstest to see if the result is accurate or not.  In our case, the Ziv's test can
358*a9aff440SNick Desaulniersbe described as follow:
359*a9aff440SNick Desaulniers
360*a9aff440SNick Desaulniers1. Let `re = re.hi + re.lo` be the double-double output of the fast phase
361*a9aff440SNick Desaulniers   computation.
362*a9aff440SNick Desaulniers2. Let `err` be an estimated upper bound of the errors of `re`.
363*a9aff440SNick Desaulniers3. If `\circ(re.hi + (re.lo - err)) == \circ(re.hi + (r.lo + err))` then the
364*a9aff440SNick Desaulniers   result is correctly rounded to double precision for the current rounding mode
365*a9aff440SNick Desaulniers   `\circ`.  Otherwise, the accurate phase with extra precision is needed.
366*a9aff440SNick Desaulniers
367*a9aff440SNick DesaulniersFor an easy and cheap estimation of the error bound `err`, since the range
368*a9aff440SNick Desaulniersreduction step described above is accurate, the errors of the result:
369*a9aff440SNick Desaulniers
370*a9aff440SNick Desaulniers.. math::
371*a9aff440SNick Desaulniers  \log(x) &= e_x \log(2) - \log(r) + \log(1 + u) \\
372*a9aff440SNick Desaulniers          &\approx e_x \log(2) - \log(r) + u P(u)
373*a9aff440SNick Desaulniers
374*a9aff440SNick Desaulnierscome from 2 parts:
375*a9aff440SNick Desaulniers
376*a9aff440SNick Desaulniers1. the look-up part: `e_x \log(2) - \log(r)`
377*a9aff440SNick Desaulniers2. the polynomial approximation part: `u P(u)`
378*a9aff440SNick Desaulniers
379*a9aff440SNick DesaulniersThe errors of the first part can be computed with a single `\operatorname{fma}`
380*a9aff440SNick Desaulniersoperation:
381*a9aff440SNick Desaulniers
382*a9aff440SNick Desaulniers.. math::
383*a9aff440SNick Desaulniers  err_1 = \operatorname{fma}(e_x, err(\log(2)), err(\log(r))),
384*a9aff440SNick Desaulniers
385*a9aff440SNick Desaulniersand then combining with the errors of the second part for another
386*a9aff440SNick Desaulniers`\operatorname{fma}` operation:
387*a9aff440SNick Desaulniers
388*a9aff440SNick Desaulniers.. math::
389*a9aff440SNick Desaulniers  err = \operatorname{fma}(u, err(P), err_1)
390*a9aff440SNick Desaulniers
391*a9aff440SNick Desaulniers
392*a9aff440SNick DesaulniersAccurate phase
393*a9aff440SNick Desaulniers==============
394*a9aff440SNick Desaulniers
395*a9aff440SNick DesaulniersExtending range reduction
396*a9aff440SNick Desaulniers-------------------------
397*a9aff440SNick Desaulniers
398*a9aff440SNick DesaulniersSince the output `u = r(1 + m_x) - 1` of the fast phase's range reduction
399*a9aff440SNick Desaulniersis computed exactly, we can apply further range reduction steps by
400*a9aff440SNick Desaulniersusing the following formula:
401*a9aff440SNick Desaulniers
402*a9aff440SNick Desaulniers.. math::
403*a9aff440SNick Desaulniers  u_{i + 1} = r_i(1 + u_i) - 1 = u_i \cdot r_i + (r_i - 1),
404*a9aff440SNick Desaulniers
405*a9aff440SNick Desaulnierswhere `|u_i| < 2^{-N_i}` and `u_0 = u` is representable in double precision.
406*a9aff440SNick Desaulniers
407*a9aff440SNick DesaulniersLet `s_i = r_i - 1`, then we can rewrite it as:
408*a9aff440SNick Desaulniers
409*a9aff440SNick Desaulniers.. math::
410*a9aff440SNick Desaulniers  u_{i + 1} &= (1 + s_i)(1 + u_i) - 1 \\
411*a9aff440SNick Desaulniers            &= s_i u_i + u_i + s_i \\
412*a9aff440SNick Desaulniers            &= u_i (1 + s_i) + s_i
413*a9aff440SNick Desaulniers            &= s_i (1 + u_i) + u_i.
414*a9aff440SNick Desaulniers
415*a9aff440SNick DesaulniersThen the bound on `u_{i + 1}` is translated to `s_i` as:
416*a9aff440SNick Desaulniers
417*a9aff440SNick Desaulniers.. math::
418*a9aff440SNick Desaulniers  \frac{-2^{-N_{i + 1}} - u_i}{1 + u_i} < s_i < \frac{2^{-N_{i + 1}} - u_i}{1 + u_i}.
419*a9aff440SNick Desaulniers
420*a9aff440SNick DesaulniersLet say we divide the interval `[0, 2^-{N_i})` into `2^{M_i}` subintervals
421*a9aff440SNick Desaulniersevenly and use the index `k` such that:
422*a9aff440SNick Desaulniers
423*a9aff440SNick Desaulniers.. math::
424*a9aff440SNick Desaulniers  k 2^{-N_i - M_i} \leq u_i < (k + 1) 2^{-N_i - M_i},
425*a9aff440SNick Desaulniers
426*a9aff440SNick Desaulniersto look-up for the reduction constant `s_{i, k}`.  In other word, `k` is given
427*a9aff440SNick Desaulniersby the formula:
428*a9aff440SNick Desaulniers
429*a9aff440SNick Desaulniers.. math::
430*a9aff440SNick Desaulniers  k = \left\lfloor 2^{N_i + M_i} u_i \right\rfloor
431*a9aff440SNick Desaulniers
432*a9aff440SNick DesaulniersNotice that our reduction constant `s_{i, k}` must work for all `u_i` in the
433*a9aff440SNick Desaulniersinterval `I = \{ v: k 2^{-N_i - M_i} \leq v < (k + 1) 2^{-N_i - M_i} \}`,
434*a9aff440SNick Desaulniersso it is bounded by:
435*a9aff440SNick Desaulniers
436*a9aff440SNick Desaulniers.. math::
437*a9aff440SNick Desaulniers  \sup_{v \in I} \frac{-2^{-N_{i + 1}} - v}{1 + v} < s_{i, k} < \inf_{v \in I} \frac{2^{-N_{i + 1}} - v}{1 + v}
438*a9aff440SNick Desaulniers
439*a9aff440SNick DesaulniersFor a fixed constant `|C| < 1`, let `f(v) = \frac{C - v}{1 + v}`, then its derivative
440*a9aff440SNick Desaulniersis:
441*a9aff440SNick Desaulniers
442*a9aff440SNick Desaulniers.. math::
443*a9aff440SNick Desaulniers  f'(v) = \frac{(-1)(1 + v) - (1)(C - v)}{(1 + v)^2} = \frac{-1 - C}{(1 + v)^2}.
444*a9aff440SNick Desaulniers
445*a9aff440SNick DesaulniersSince `|C| < 1`, `f'(v) < 0` for all `v \neq -1`, so:
446*a9aff440SNick Desaulniers
447*a9aff440SNick Desaulniers.. math::
448*a9aff440SNick Desaulniers  \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right)
449*a9aff440SNick Desaulniers                       = f \left( k 2^{-N_i - M_i} \right) \\
450*a9aff440SNick Desaulniers  \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right)
451*a9aff440SNick Desaulniers                       = f \left( (k + 1) 2^{-N_i - M_i} \right)
452*a9aff440SNick Desaulniers
453*a9aff440SNick DesaulniersHence we have the following bound on `s_{i, k}`:
454*a9aff440SNick Desaulniers
455*a9aff440SNick Desaulniers.. math::
456*a9aff440SNick Desaulniers  \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} < s_{i, k}
457*a9aff440SNick Desaulniers  \leq \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}}
458*a9aff440SNick Desaulniers
459*a9aff440SNick DesaulniersThis interval is of length:
460*a9aff440SNick Desaulniers
461*a9aff440SNick Desaulniers.. math::
462*a9aff440SNick Desaulniers  l &= \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}} -
463*a9aff440SNick Desaulniers  \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} \\
464*a9aff440SNick Desaulniers  &= \frac{2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i}}
465*a9aff440SNick Desaulniers      {(1 + k 2^{-N_i - M_i})(1 + (k + 1) 2^{-N_i -M_i})}
466*a9aff440SNick Desaulniers
467*a9aff440SNick DesaulniersSo in order to be able to find `s_{i, k}`, we need that:
468*a9aff440SNick Desaulniers
469*a9aff440SNick Desaulniers.. math::
470*a9aff440SNick Desaulniers  2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i} > 0
471*a9aff440SNick Desaulniers
472*a9aff440SNick DesaulniersThis give us the following bound on `N_{i + 1}`:
473*a9aff440SNick Desaulniers
474*a9aff440SNick Desaulniers.. math::
475*a9aff440SNick Desaulniers  N_{i + 1} \leq N_i + M_i + 1.
476*a9aff440SNick Desaulniers
477*a9aff440SNick DesaulniersTo make the range reduction effective, we will want to maximize `N_{i + 1}`, so
478*a9aff440SNick Desaulnierslet consider the two cases: `N_{i + 1} = N_i + M_i + 1` and
479*a9aff440SNick Desaulniers`N_{i + 1} = N_i + M_i`.
480*a9aff440SNick Desaulniers
481*a9aff440SNick Desaulniers
482*a9aff440SNick Desaulniers
483*a9aff440SNick DesaulniersThe optimal choice to balance between maximizing `N_{i + 1}` and minimizing the
484*a9aff440SNick Desaulniersprecision needed for `s_{i, k}` is:
485*a9aff440SNick Desaulniers
486*a9aff440SNick Desaulniers.. math::
487*a9aff440SNick Desaulniers  N_{i + 1} = N_i + M_i,
488*a9aff440SNick Desaulniers
489*a9aff440SNick Desaulniersand in this case, the optimal `\operatorname{ulp}(s_{i, k})` is:
490*a9aff440SNick Desaulniers
491*a9aff440SNick Desaulniers.. math::
492*a9aff440SNick Desaulniers  \operatorname{ulp}(s_{i, k}) = 2^{-N_i - M_i}
493*a9aff440SNick Desaulniers
494*a9aff440SNick Desaulniersand the corresponding `\operatorname{ulp}(u_{i + 1})` is:
495*a9aff440SNick Desaulniers
496*a9aff440SNick Desaulniers.. math::
497*a9aff440SNick Desaulniers  \operatorname{ulp}(u_{i + 1}) &= \operatorname{ulp}(u_i) \operatorname{ulp}(s_{i, k}) \\
498*a9aff440SNick Desaulniers    &= \operatorname{ulp}(u_i) \cdot 2^{-N_i - M_i} \\
499*a9aff440SNick Desaulniers    &= \operatorname{ulp}(u_0) \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i} \\
500*a9aff440SNick Desaulniers    &= 2^{-N_0 - 53} \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i}
501*a9aff440SNick Desaulniers
502*a9aff440SNick DesaulniersSince `|u_{i + 1}| < 2^{-N_{i + 1}} = 2^{-N_0 - M_1 - ... -M_i}`, the precision
503*a9aff440SNick Desaulniersof `u_{i + 1}` is:
504*a9aff440SNick Desaulniers
505*a9aff440SNick Desaulniers.. math::
506*a9aff440SNick Desaulniers  \operatorname{prec}(u_{i + 1}) &= (N_0 + 53) + (N_0 + M_0) + \cdots +
507*a9aff440SNick Desaulniers    (N_0 + M_0 + \cdots + M_i) - (N_0 + M_0 + \cdots + M_i) \\
508*a9aff440SNick Desaulniers    &= (i + 1) N_0 + i M_0 + (i - 1) M_1 + \cdots + M_{i - 1} + 53
509*a9aff440SNick Desaulniers
510*a9aff440SNick DesaulniersIf we choose to have the same `M_0 = M_1 = \cdots = M_i = M`, this can be
511*a9aff440SNick Desaulnierssimplified to:
512*a9aff440SNick Desaulniers
513*a9aff440SNick Desaulniers.. math::
514*a9aff440SNick Desaulniers  \operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53.
515*a9aff440SNick Desaulniers
516*a9aff440SNick DesaulniersWe summarize the precision analysis for extending the range reduction in the
517*a9aff440SNick Desaulnierstable below:
518*a9aff440SNick Desaulniers
519*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+
520*a9aff440SNick Desaulniers| `N_0` | `M` | No. steps | Table size | Output bound | ulp(`s_{i, k}`) | prec(`u_{i + 1}`) |
521*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+
522*a9aff440SNick Desaulniers| 7     |  4  |         1 |         32 | `2^{-11}`    | `2^{-12}`       |  60               |
523*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
524*a9aff440SNick Desaulniers|       |     |         2 |         64 | `2^{-15}`    | `2^{-16}`       |  71               |
525*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
526*a9aff440SNick Desaulniers|       |     |         3 |         96 | `2^{-19}`    | `2^{-20}`       |  86               |
527*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
528*a9aff440SNick Desaulniers|       |     |         4 |        128 | `2^{-23}`    | `2^{-24}`       | 105               |
529*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
530*a9aff440SNick Desaulniers|       |     |         5 |        160 | `2^{-27}`    | `2^{-28}`       | 128               |
531*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
532*a9aff440SNick Desaulniers|       |     |         6 |        192 | `2^{-31}`    | `2^{-32}`       | 155               |
533*a9aff440SNick Desaulniers|       +-----+-----------+------------+--------------+-----------------+-------------------+
534*a9aff440SNick Desaulniers|       |  5  |         3 |        192 | `2^{-22}`    | `2^{-23}`       |  89               |
535*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
536*a9aff440SNick Desaulniers|       |     |         4 |        256 | `2^{-27}`    | `2^{-28}`       | 111               |
537*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
538*a9aff440SNick Desaulniers|       |     |         5 |        320 | `2^{-32}`    | `2^{-33}`       | 138               |
539*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
540*a9aff440SNick Desaulniers|       |     |         6 |        384 | `2^{-37}`    | `2^{-38}`       | 170               |
541*a9aff440SNick Desaulniers|       +-----+-----------+------------+--------------+-----------------+-------------------+
542*a9aff440SNick Desaulniers|       |  6  |         3 |        384 | `2^{-25}`    | `2^{-26}`       |  92               |
543*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
544*a9aff440SNick Desaulniers|       |     |         4 |        512 | `2^{-31}`    | `2^{-32}`       | 117               |
545*a9aff440SNick Desaulniers|       +-----+-----------+------------+--------------+-----------------+-------------------+
546*a9aff440SNick Desaulniers|       |  7  |         1 |        256 | `2^{-24}`    | `2^{-15}`       |  60               |
547*a9aff440SNick Desaulniers|       |     +-----------+------------+--------------+-----------------+-------------------+
548*a9aff440SNick Desaulniers|       |     |         2 |        512 | `2^{-21}`    | `2^{-22}`       |  74               |
549*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+
550*a9aff440SNick Desaulniers
551*a9aff440SNick Desaulnierswhere:
552*a9aff440SNick Desaulniers
553*a9aff440SNick Desaulniers- Number of steps = `i + 1`
554*a9aff440SNick Desaulniers- Table size = `(i + 1) 2^{M + 1}`
555*a9aff440SNick Desaulniers- Output bound = `2^{-N_{i + 1}} = 2^{-N_0 - (i + 1) M}`
556*a9aff440SNick Desaulniers- `\operatorname{ulp}(s_{i, k}) = 2^{-N_{i + 1} - 1}`
557*a9aff440SNick Desaulniers- `\operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53`
558