1*a9aff440SNick Desaulniers.. _log_algorithm: 2*a9aff440SNick Desaulniers 3*a9aff440SNick Desaulniers======================== 4*a9aff440SNick DesaulniersLog/Log10/Log2 Algorithm 5*a9aff440SNick Desaulniers======================== 6*a9aff440SNick Desaulniers 7*a9aff440SNick Desaulniers.. default-role:: math 8*a9aff440SNick Desaulniers 9*a9aff440SNick DesaulniersIn this short note, we will discuss in detail about the computation of 10*a9aff440SNick Desaulniers:math:`\log(x)` function, with double precision inputs, in particular, the range 11*a9aff440SNick Desaulniersreduction steps and error analysis. The algorithm is broken down into 2 main 12*a9aff440SNick Desaulniersphases as follow: 13*a9aff440SNick Desaulniers 14*a9aff440SNick Desaulniers1. Fast phase: 15*a9aff440SNick Desaulniers 16*a9aff440SNick Desaulniers a. Range reduction 17*a9aff440SNick Desaulniers b. Polynomial approximation 18*a9aff440SNick Desaulniers c. Ziv's test 19*a9aff440SNick Desaulniers 20*a9aff440SNick Desaulniers2. Accurate phase (if Ziv's test failed): 21*a9aff440SNick Desaulniers 22*a9aff440SNick Desaulniers a. Further range reduction 23*a9aff440SNick Desaulniers b. Polynomial approximation 24*a9aff440SNick Desaulniers 25*a9aff440SNick Desaulniers 26*a9aff440SNick DesaulniersFast phase 27*a9aff440SNick Desaulniers========== 28*a9aff440SNick Desaulniers 29*a9aff440SNick DesaulniersRange reduction 30*a9aff440SNick Desaulniers--------------- 31*a9aff440SNick Desaulniers 32*a9aff440SNick DesaulniersLet `x = 2^{e_x} (1 + m_x)` be a normalized double precision number, in which 33*a9aff440SNick Desaulniers`-1074 \leq e_x \leq 1022` and `0 \leq m_x < 1` such that 34*a9aff440SNick Desaulniers`2^{52} m_x \in \mathbb{Z}`. 35*a9aff440SNick Desaulniers 36*a9aff440SNick DesaulniersThen from the properties of logarithm: 37*a9aff440SNick Desaulniers 38*a9aff440SNick Desaulniers.. math:: 39*a9aff440SNick Desaulniers \log(x) &= \log\left( 2^{e_x} (1 + m_x) \right) \\ 40*a9aff440SNick Desaulniers &= \log\left( 2^{e_x} \right) + \log(1 + m_x) \\ 41*a9aff440SNick Desaulniers &= e_x \log(2) + \log(1 + m_x) 42*a9aff440SNick Desaulniers 43*a9aff440SNick Desaulniersthe computation of `\log(x)` can be reduced to: 44*a9aff440SNick Desaulniers 45*a9aff440SNick Desaulniers1. compute the product of `e_x` and `\log(2)`, 46*a9aff440SNick Desaulniers2. compute `\log(1 + m_x)` for `0 \leq m_x < 1`, 47*a9aff440SNick Desaulniers3. add step 1 and 2. 48*a9aff440SNick Desaulniers 49*a9aff440SNick DesaulniersTo compute `\log(1 + m_x)` in step 2, we can reduce the range further by finding 50*a9aff440SNick Desaulniers`r > 0` such that: 51*a9aff440SNick Desaulniers 52*a9aff440SNick Desaulniers.. math:: 53*a9aff440SNick Desaulniers | r(1 + m_x) - 1 | < C \quad \quad \text{(R1)} 54*a9aff440SNick Desaulniers 55*a9aff440SNick Desaulniersfor small `0 < C < 1`. Then if we let `u = r(1 + m_x) - 1`, `|u| < C`: 56*a9aff440SNick Desaulniers 57*a9aff440SNick Desaulniers.. math:: 58*a9aff440SNick Desaulniers \log(1 + m_x) &= \log \left( \frac{r (1 + m_x)}{r} \right) \\ 59*a9aff440SNick Desaulniers &= \log(r (1 + m_x) ) - \log(r) \\ 60*a9aff440SNick Desaulniers &= \log(1 + u) - \log(r) 61*a9aff440SNick Desaulniers 62*a9aff440SNick Desaulniersand step 2 can be computed with: 63*a9aff440SNick Desaulniers 64*a9aff440SNick Desaulniersa. extract `r` and `-\log(r)` from look-up tables, 65*a9aff440SNick Desaulniersb. compute the reduced argument `u = r(1 + m_x) - 1`, 66*a9aff440SNick Desaulniersc. compute `\log(1 + u)` by polynomial approximation or further range reduction, 67*a9aff440SNick Desaulniersd. add step a and step c results. 68*a9aff440SNick Desaulniers 69*a9aff440SNick Desaulniers 70*a9aff440SNick DesaulniersHow to derive `r` 71*a9aff440SNick Desaulniers----------------- 72*a9aff440SNick Desaulniers 73*a9aff440SNick DesaulniersFor an efficient implementation, we would like to use the first `M` significant 74*a9aff440SNick Desaulniersbits of `m_x` to look up for `r`. In particular, we would like to find a value 75*a9aff440SNick Desaulniersof `r` that works for all `m_x` satisfying: 76*a9aff440SNick Desaulniers 77*a9aff440SNick Desaulniers.. math:: 78*a9aff440SNick Desaulniers k 2^{-M} \leq m_x < (k + 1) 2^{-M} \quad \text{for some} \quad 79*a9aff440SNick Desaulniers k = 0..2^{M} - 1. \quad\quad \text{(M1)} 80*a9aff440SNick Desaulniers 81*a9aff440SNick DesaulniersLet `r = 1 + s`, then `u` can be expressed in terms of `s` as: 82*a9aff440SNick Desaulniers 83*a9aff440SNick Desaulniers.. math:: 84*a9aff440SNick Desaulniers u &= r(1 + m_x) - 1 \\ 85*a9aff440SNick Desaulniers &= (1 + s)(1 + m_x) - 1 \\ 86*a9aff440SNick Desaulniers &= s m_x + s + m_x &\quad\quad \text{(U1)} \\ 87*a9aff440SNick Desaulniers &= s (1 + m_x) + m_x \\ 88*a9aff440SNick Desaulniers &= m_x (1 + s) + s. 89*a9aff440SNick Desaulniers 90*a9aff440SNick DesaulniersFrom the condition `\text{(R1)}`, `s` is bounded by: 91*a9aff440SNick Desaulniers 92*a9aff440SNick Desaulniers.. math:: 93*a9aff440SNick Desaulniers \frac{-C - m_x}{1 + m_x} < s < \frac{C - m_x}{1 + m_x} \quad\quad \text{(S1)}. 94*a9aff440SNick Desaulniers 95*a9aff440SNick DesaulniersSince our reduction constant `s` must work for all `m_x` in the interval 96*a9aff440SNick Desaulniers`I = \{ v: k 2^{-M} \leq v < (k + 1) 2^{-M} \}`, `s` is bounded by: 97*a9aff440SNick Desaulniers 98*a9aff440SNick Desaulniers.. math:: 99*a9aff440SNick Desaulniers \sup_{v \in I} \frac{-C - v}{1 + v} < s < \inf_{v \in I} \frac{C - v}{1 + v} 100*a9aff440SNick Desaulniers 101*a9aff440SNick DesaulniersFor a fixed constant `|c| < 1`, let `f(v) = \frac{c - v}{1 + v}`, then its 102*a9aff440SNick Desaulniersderivative is: 103*a9aff440SNick Desaulniers 104*a9aff440SNick Desaulniers.. math:: 105*a9aff440SNick Desaulniers f'(v) = \frac{(-1)(1 + v) - (1)(c - v)}{(1 + v)^2} = \frac{-1 - c}{(1 + v)^2}. 106*a9aff440SNick Desaulniers 107*a9aff440SNick DesaulniersSince `|c| < 1`, `f'(v) < 0` for all `v \neq -1`, so: 108*a9aff440SNick Desaulniers 109*a9aff440SNick Desaulniers.. math:: 110*a9aff440SNick Desaulniers \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right) 111*a9aff440SNick Desaulniers = f \left( k 2^{-M} \right) \\ 112*a9aff440SNick Desaulniers \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right) 113*a9aff440SNick Desaulniers = f \left( (k + 1) 2^{-M} \right) 114*a9aff440SNick Desaulniers 115*a9aff440SNick DesaulniersHence we have the following bound on `s`: 116*a9aff440SNick Desaulniers 117*a9aff440SNick Desaulniers.. math:: 118*a9aff440SNick Desaulniers \frac{-C - k 2^{-M}}{1 + k 2^{-M}} < s \leq 119*a9aff440SNick Desaulniers \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}. \quad\quad \text{(S2)} 120*a9aff440SNick Desaulniers 121*a9aff440SNick DesaulniersIn order for `s` to exist, we need that: 122*a9aff440SNick Desaulniers 123*a9aff440SNick Desaulniers.. math:: 124*a9aff440SNick Desaulniers \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} > 125*a9aff440SNick Desaulniers \frac{-C - k 2^{-M}}{1 + k 2^{-M}} 126*a9aff440SNick Desaulniers 127*a9aff440SNick Desaulnierswhich is equivalent to: 128*a9aff440SNick Desaulniers 129*a9aff440SNick Desaulniers.. math:: 130*a9aff440SNick Desaulniers \quad\quad 2C - 2^{-M} + (2k + 1) 2^{-M} C > 0 131*a9aff440SNick Desaulniers \iff C > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \quad\quad \text{(C1)}. 132*a9aff440SNick Desaulniers 133*a9aff440SNick DesaulniersConsider the case `C = 2^{-N}`. Since `0 \leq k \leq 2^M - 1,` the right hand 134*a9aff440SNick Desaulniersside of `\text{(C1)}` is bounded by: 135*a9aff440SNick Desaulniers 136*a9aff440SNick Desaulniers.. math:: 137*a9aff440SNick Desaulniers 2^{-M - 1} > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \geq 138*a9aff440SNick Desaulniers \frac{2^{-M - 1}}{1 + (2^{M + 1} - 1) 2^{-M - 1}} > 2^{-M - 2}. 139*a9aff440SNick Desaulniers 140*a9aff440SNick DesaulniersHence, from `\text{(C1)}`, being an exact power of 2, `C = 2^{-N}` is bounded below 141*a9aff440SNick Desaulniersby: 142*a9aff440SNick Desaulniers 143*a9aff440SNick Desaulniers.. math:: 144*a9aff440SNick Desaulniers C = 2^{-N} \geq 2^{-M - 1}. 145*a9aff440SNick Desaulniers 146*a9aff440SNick DesaulniersTo make the range reduction efficient, we will want to minimize `C` (maximize 147*a9aff440SNick Desaulniers`N`) while keeping the required precision of `s`(`r`) as low as possible. And 148*a9aff440SNick Desaulniersfor that, we will consider the following two cases: `N = M + 1` and `N = M`. 149*a9aff440SNick Desaulniers 150*a9aff440SNick DesaulniersCase 1 - `N = M + 1` 151*a9aff440SNick Desaulniers~~~~~~~~~~~~~~~~~~~~ 152*a9aff440SNick Desaulniers 153*a9aff440SNick DesaulniersWhen `N = M + 1`, `\text{(S2)}` becomes: 154*a9aff440SNick Desaulniers 155*a9aff440SNick Desaulniers.. math:: 156*a9aff440SNick Desaulniers \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} < s < 157*a9aff440SNick Desaulniers \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}. 158*a9aff440SNick Desaulniers \quad\quad \text{(S2')} 159*a9aff440SNick Desaulniers 160*a9aff440SNick DesaulniersThis is an interval of length: 161*a9aff440SNick Desaulniers 162*a9aff440SNick Desaulniers.. math:: 163*a9aff440SNick Desaulniers l &= \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} - 164*a9aff440SNick Desaulniers \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} \\ 165*a9aff440SNick Desaulniers &= \frac{(2k + 1)2^{-2M - 1}}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})} 166*a9aff440SNick Desaulniers \quad\quad \text{(L1)} 167*a9aff440SNick Desaulniers 168*a9aff440SNick DesaulniersAs a function of `k`, the length `l` has its derivative with respect to `k`: 169*a9aff440SNick Desaulniers 170*a9aff440SNick Desaulniers.. math:: 171*a9aff440SNick Desaulniers \frac{dl}{dk} = 172*a9aff440SNick Desaulniers \frac{2^{2M + 1} - 2k(k + 1) - 1} 173*a9aff440SNick Desaulniers {2^{4M}(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2} 174*a9aff440SNick Desaulniers 175*a9aff440SNick Desaulnierswhich is always positive for `0 \leq k \leq 2^M - 1`. So for all 176*a9aff440SNick Desaulniers`0 < k < 2^{-M}` (`k = 0` will be treated differently in edge cases), and for 177*a9aff440SNick Desaulniers`M > 2`, `l` is bounded below by: 178*a9aff440SNick Desaulniers 179*a9aff440SNick Desaulniers.. math:: 180*a9aff440SNick Desaulniers l > 2^{-2M}. 181*a9aff440SNick Desaulniers 182*a9aff440SNick DesaulniersIt implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-2M}`. 183*a9aff440SNick DesaulniersAnd from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is: 184*a9aff440SNick Desaulniers 185*a9aff440SNick Desaulniers.. math:: 186*a9aff440SNick Desaulniers \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\ 187*a9aff440SNick Desaulniers &= 2^{-2M} \operatorname{ulp}(m_x). 188*a9aff440SNick Desaulniers 189*a9aff440SNick DesaulniersSince: 190*a9aff440SNick Desaulniers 191*a9aff440SNick Desaulniers.. math:: 192*a9aff440SNick Desaulniers |u| < C = 2^{-N} = 2^{-M - 1}, 193*a9aff440SNick Desaulniers 194*a9aff440SNick DesaulniersIts required precision is: 195*a9aff440SNick Desaulniers 196*a9aff440SNick Desaulniers.. math:: 197*a9aff440SNick Desaulniers \operatorname{prec}(u) &= \log_2(2^{-M-1} / \operatorname{ulp}(u)) \\ 198*a9aff440SNick Desaulniers &= \log_2(2^{M - 1} / \operatorname{ulp}(m_x)) \\ 199*a9aff440SNick Desaulniers &= M - 1 - \log_2(\operatorname{ulp}(m_x)). 200*a9aff440SNick Desaulniers 201*a9aff440SNick DesaulniersThis means that in this case, we cannot restrict `u` to be exactly representable 202*a9aff440SNick Desaulniersin double precision for double precision input `x` with `M > 2`. Nonetheless, 203*a9aff440SNick Desaulniersfor a reasonable value of `M`, we can have `u` exactly representable in double 204*a9aff440SNick Desaulniersprecision for single precision input `x` (`\operatorname{ulp}(m_x) = 2^{-23}`) 205*a9aff440SNick Desaulnierssuch that `|u| < 2^{-M - 1}` using a look-up table of size `2^M`. 206*a9aff440SNick Desaulniers 207*a9aff440SNick DesaulniersA particular formula for `s` can be derived from `\text{(S2')}` by the midpoint 208*a9aff440SNick Desaulniersformula: 209*a9aff440SNick Desaulniers 210*a9aff440SNick Desaulniers.. math:: 211*a9aff440SNick Desaulniers s &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \operatorname{midpoint} 212*a9aff440SNick Desaulniers \left(-\frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}}, 213*a9aff440SNick Desaulniers \frac{2^{-M-1} - (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 214*a9aff440SNick Desaulniers &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left( 215*a9aff440SNick Desaulniers \frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}} + 216*a9aff440SNick Desaulniers \frac{2^{-M - 1} + (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}} 217*a9aff440SNick Desaulniers \right) \right) \\ 218*a9aff440SNick Desaulniers &= 2^{-2M} \operatorname{round}\left( \frac{ 219*a9aff440SNick Desaulniers - \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) } 220*a9aff440SNick Desaulniers {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \\ 221*a9aff440SNick Desaulniers &= - 2^{-2M} \operatorname{round}\left( \frac{ 222*a9aff440SNick Desaulniers \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) } 223*a9aff440SNick Desaulniers {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \quad\quad \text{(S3)} 224*a9aff440SNick Desaulniers 225*a9aff440SNick DesaulniersThe corresponding range and formula for `r = 1 + s` are: 226*a9aff440SNick Desaulniers 227*a9aff440SNick Desaulniers.. math:: 228*a9aff440SNick Desaulniers \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}} < r \leq 229*a9aff440SNick Desaulniers \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}} 230*a9aff440SNick Desaulniers 231*a9aff440SNick Desaulniers.. math:: 232*a9aff440SNick Desaulniers r &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot 233*a9aff440SNick Desaulniers \operatorname{midpoint}\left( \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}}, 234*a9aff440SNick Desaulniers \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 235*a9aff440SNick Desaulniers &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left( 236*a9aff440SNick Desaulniers \frac{1 + 2^{-M-1}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M-1}}{1 + k 2^{-M}} 237*a9aff440SNick Desaulniers \right) \right) \\ 238*a9aff440SNick Desaulniers &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{ 239*a9aff440SNick Desaulniers 1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-2} }{(1 + k 2^{-M}) 240*a9aff440SNick Desaulniers (1 + (k + 1) 2^{-M})} \right) 241*a9aff440SNick Desaulniers 242*a9aff440SNick DesaulniersCase 1 - `N = M` 243*a9aff440SNick Desaulniers~~~~~~~~~~~~~~~~ 244*a9aff440SNick Desaulniers 245*a9aff440SNick DesaulniersWhen `N = M`, `\text{(S2)}` becomes: 246*a9aff440SNick Desaulniers 247*a9aff440SNick Desaulniers.. math:: 248*a9aff440SNick Desaulniers \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} < s < \frac{-k 2^{-M}}{1 + (k + 1) 2^{-M}} 249*a9aff440SNick Desaulniers \quad\quad \text{(S2")} 250*a9aff440SNick Desaulniers 251*a9aff440SNick DesaulniersThis is an interval of length: 252*a9aff440SNick Desaulniers 253*a9aff440SNick Desaulniers.. math:: 254*a9aff440SNick Desaulniers l &= \frac{- k 2^{-M}}{1 + (k + 1) 2^{-M}} - 255*a9aff440SNick Desaulniers \frac{- (k + 1) 2^{-M}}{1 + k 2^{-M}} \\ 256*a9aff440SNick Desaulniers &= \frac{2^{-M} (1 + (2k + 1) 2^{-M})}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})} 257*a9aff440SNick Desaulniers \quad\quad \text{(L1')} 258*a9aff440SNick Desaulniers 259*a9aff440SNick DesaulniersAs a function of `k`, its derivative with respect to `k`: 260*a9aff440SNick Desaulniers 261*a9aff440SNick Desaulniers.. math:: 262*a9aff440SNick Desaulniers \frac{dl}{dk} = 263*a9aff440SNick Desaulniers -\frac{2^{-2M}(k(k + 1)2^{-M + 1} + 2^{-M} + 2k + 1)} 264*a9aff440SNick Desaulniers {(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2} 265*a9aff440SNick Desaulniers 266*a9aff440SNick Desaulnierswhich is always negative for `0 \leq k \leq 2^M - 1`. So for `M > 1`, `l` is 267*a9aff440SNick Desaulniersbounded below by: 268*a9aff440SNick Desaulniers 269*a9aff440SNick Desaulniers.. math:: 270*a9aff440SNick Desaulniers l > \frac{2^{-M - 1} (3 - 2^{-M})}{2 - 2^{-M}} > 2^{-M - 1}. 271*a9aff440SNick Desaulniers 272*a9aff440SNick DesaulniersIt implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-M-1}`. 273*a9aff440SNick DesaulniersAnd from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is: 274*a9aff440SNick Desaulniers 275*a9aff440SNick Desaulniers.. math:: 276*a9aff440SNick Desaulniers \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\ 277*a9aff440SNick Desaulniers &= 2^{-M - 1} \operatorname{ulp}(m_x). 278*a9aff440SNick Desaulniers 279*a9aff440SNick DesaulniersSince: 280*a9aff440SNick Desaulniers 281*a9aff440SNick Desaulniers.. math:: 282*a9aff440SNick Desaulniers |u| < C = 2^{-N} = 2^{-M}, 283*a9aff440SNick Desaulniers 284*a9aff440SNick DesaulniersIts required precision is: 285*a9aff440SNick Desaulniers 286*a9aff440SNick Desaulniers.. math:: 287*a9aff440SNick Desaulniers \operatorname{prec}(u) &= \log_2(2^{-M} / \operatorname{ulp}(u)) \\ 288*a9aff440SNick Desaulniers &= \log_2(2 / \operatorname{ulp}(m_x)) \\ 289*a9aff440SNick Desaulniers &= 1 - \log_2(\operatorname{ulp}(m_x)). 290*a9aff440SNick Desaulniers 291*a9aff440SNick DesaulniersHence, for double precision `x`, `\operatorname{ulp}(m_x) = 2^{-52}`, and the 292*a9aff440SNick Desaulniersprecision needed for `u` is `\operatorname{prec}(u) = 53`, i.e., `u` can be 293*a9aff440SNick Desaulniersexactly representable in double precision. And in this case, `s` can be 294*a9aff440SNick Desaulniersderived from `\text{(S2")}` by the midpoint formula: 295*a9aff440SNick Desaulniers 296*a9aff440SNick Desaulniers.. math:: 297*a9aff440SNick Desaulniers s &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot 298*a9aff440SNick Desaulniers \operatorname{midpoint} \left(-\frac{-(k + 1)2^{-M}}{1 + k 2^{-M}}, 299*a9aff440SNick Desaulniers \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 300*a9aff440SNick Desaulniers &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left( 301*a9aff440SNick Desaulniers \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} + \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}} 302*a9aff440SNick Desaulniers \right) \right) \\ 303*a9aff440SNick Desaulniers &= -2^{-M - 1} \operatorname{round}\left( \frac{ 304*a9aff440SNick Desaulniers (2k + 1) + (2k^2 + 2k + 1) 2^{-M} } 305*a9aff440SNick Desaulniers {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \quad\quad \text{(S3')} 306*a9aff440SNick Desaulniers 307*a9aff440SNick DesaulniersThe corresponding range and formula for `r = 1 + s` are: 308*a9aff440SNick Desaulniers 309*a9aff440SNick Desaulniers.. math:: 310*a9aff440SNick Desaulniers \frac{1 - 2^{-M}}{1 + k 2^{-M}} < r \leq \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}} 311*a9aff440SNick Desaulniers 312*a9aff440SNick Desaulniers.. math:: 313*a9aff440SNick Desaulniers r &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot 314*a9aff440SNick Desaulniers \operatorname{midpoint}\left( \frac{1 - 2^{-M}}{1 + k 2^{-M}}, 315*a9aff440SNick Desaulniers \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 316*a9aff440SNick Desaulniers &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left( 317*a9aff440SNick Desaulniers \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M}}{1 + k 2^{-M}} 318*a9aff440SNick Desaulniers \right) \right) \\ 319*a9aff440SNick Desaulniers &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{ 320*a9aff440SNick Desaulniers 1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-1} }{(1 + k 2^{-M}) 321*a9aff440SNick Desaulniers (1 + (k + 1) 2^{-M})} \right) 322*a9aff440SNick Desaulniers 323*a9aff440SNick DesaulniersEdge cases 324*a9aff440SNick Desaulniers---------- 325*a9aff440SNick Desaulniers 326*a9aff440SNick Desaulniers1. When `k = 0`, notice that: 327*a9aff440SNick Desaulniers 328*a9aff440SNick Desaulniers.. math:: 329*a9aff440SNick Desaulniers 0 = k 2^{-N} \leq m_x < (k + 1) 2^{-N} = 2^{-N} = C, 330*a9aff440SNick Desaulniers 331*a9aff440SNick Desaulniersso we can simply choose `r = 1` so that `\log(r) = 0` is exact, then `u = m_x`. 332*a9aff440SNick DesaulniersThis will help reduce the accumulated errors when `m_x` is close to 0 while 333*a9aff440SNick Desaulniersmaintaining the range reduction output's requirements. 334*a9aff440SNick Desaulniers 335*a9aff440SNick Desaulniers2. When `k = 2^{N} - 1`, `\text{(S2)}` becomes: 336*a9aff440SNick Desaulniers 337*a9aff440SNick Desaulniers.. math:: 338*a9aff440SNick Desaulniers -\frac{1}{2} - \frac{C - 2^{-M-1}}{2 - 2^{-M}} <> s \leq 339*a9aff440SNick Desaulniers -\frac{1}{2} + \frac{C}{2}. 340*a9aff440SNick Desaulniers 341*a9aff440SNick Desaulniersso when `C > 2^{-M - 1}` is a power of 2, we can always choose: 342*a9aff440SNick Desaulniers 343*a9aff440SNick Desaulniers.. math:: 344*a9aff440SNick Desaulniers s = -\frac{1}{2}, \quad \text{i.e.} \quad r = \frac{1}{2}. 345*a9aff440SNick Desaulniers 346*a9aff440SNick DesaulniersThis reduction works well to avoid catastrophic cancellation happening when 347*a9aff440SNick Desaulniers`e_x = -1`. 348*a9aff440SNick Desaulniers 349*a9aff440SNick DesaulniersThis also works when `C = 2^{-M - 1}` if we relax the condition on `u` to 350*a9aff440SNick Desaulniers`|u| \leq C = 2^{-M-1}`. 351*a9aff440SNick Desaulniers 352*a9aff440SNick DesaulniersIntermediate precision, and Ziv's test 353*a9aff440SNick Desaulniers-------------------------------------- 354*a9aff440SNick Desaulniers 355*a9aff440SNick DesaulniersIn the fast phase, we want extra precision while performant, so we use 356*a9aff440SNick Desaulniersdouble-double precision for most intermediate computation steps, and employ Ziv 357*a9aff440SNick Desaulnierstest to see if the result is accurate or not. In our case, the Ziv's test can 358*a9aff440SNick Desaulniersbe described as follow: 359*a9aff440SNick Desaulniers 360*a9aff440SNick Desaulniers1. Let `re = re.hi + re.lo` be the double-double output of the fast phase 361*a9aff440SNick Desaulniers computation. 362*a9aff440SNick Desaulniers2. Let `err` be an estimated upper bound of the errors of `re`. 363*a9aff440SNick Desaulniers3. If `\circ(re.hi + (re.lo - err)) == \circ(re.hi + (r.lo + err))` then the 364*a9aff440SNick Desaulniers result is correctly rounded to double precision for the current rounding mode 365*a9aff440SNick Desaulniers `\circ`. Otherwise, the accurate phase with extra precision is needed. 366*a9aff440SNick Desaulniers 367*a9aff440SNick DesaulniersFor an easy and cheap estimation of the error bound `err`, since the range 368*a9aff440SNick Desaulniersreduction step described above is accurate, the errors of the result: 369*a9aff440SNick Desaulniers 370*a9aff440SNick Desaulniers.. math:: 371*a9aff440SNick Desaulniers \log(x) &= e_x \log(2) - \log(r) + \log(1 + u) \\ 372*a9aff440SNick Desaulniers &\approx e_x \log(2) - \log(r) + u P(u) 373*a9aff440SNick Desaulniers 374*a9aff440SNick Desaulnierscome from 2 parts: 375*a9aff440SNick Desaulniers 376*a9aff440SNick Desaulniers1. the look-up part: `e_x \log(2) - \log(r)` 377*a9aff440SNick Desaulniers2. the polynomial approximation part: `u P(u)` 378*a9aff440SNick Desaulniers 379*a9aff440SNick DesaulniersThe errors of the first part can be computed with a single `\operatorname{fma}` 380*a9aff440SNick Desaulniersoperation: 381*a9aff440SNick Desaulniers 382*a9aff440SNick Desaulniers.. math:: 383*a9aff440SNick Desaulniers err_1 = \operatorname{fma}(e_x, err(\log(2)), err(\log(r))), 384*a9aff440SNick Desaulniers 385*a9aff440SNick Desaulniersand then combining with the errors of the second part for another 386*a9aff440SNick Desaulniers`\operatorname{fma}` operation: 387*a9aff440SNick Desaulniers 388*a9aff440SNick Desaulniers.. math:: 389*a9aff440SNick Desaulniers err = \operatorname{fma}(u, err(P), err_1) 390*a9aff440SNick Desaulniers 391*a9aff440SNick Desaulniers 392*a9aff440SNick DesaulniersAccurate phase 393*a9aff440SNick Desaulniers============== 394*a9aff440SNick Desaulniers 395*a9aff440SNick DesaulniersExtending range reduction 396*a9aff440SNick Desaulniers------------------------- 397*a9aff440SNick Desaulniers 398*a9aff440SNick DesaulniersSince the output `u = r(1 + m_x) - 1` of the fast phase's range reduction 399*a9aff440SNick Desaulniersis computed exactly, we can apply further range reduction steps by 400*a9aff440SNick Desaulniersusing the following formula: 401*a9aff440SNick Desaulniers 402*a9aff440SNick Desaulniers.. math:: 403*a9aff440SNick Desaulniers u_{i + 1} = r_i(1 + u_i) - 1 = u_i \cdot r_i + (r_i - 1), 404*a9aff440SNick Desaulniers 405*a9aff440SNick Desaulnierswhere `|u_i| < 2^{-N_i}` and `u_0 = u` is representable in double precision. 406*a9aff440SNick Desaulniers 407*a9aff440SNick DesaulniersLet `s_i = r_i - 1`, then we can rewrite it as: 408*a9aff440SNick Desaulniers 409*a9aff440SNick Desaulniers.. math:: 410*a9aff440SNick Desaulniers u_{i + 1} &= (1 + s_i)(1 + u_i) - 1 \\ 411*a9aff440SNick Desaulniers &= s_i u_i + u_i + s_i \\ 412*a9aff440SNick Desaulniers &= u_i (1 + s_i) + s_i 413*a9aff440SNick Desaulniers &= s_i (1 + u_i) + u_i. 414*a9aff440SNick Desaulniers 415*a9aff440SNick DesaulniersThen the bound on `u_{i + 1}` is translated to `s_i` as: 416*a9aff440SNick Desaulniers 417*a9aff440SNick Desaulniers.. math:: 418*a9aff440SNick Desaulniers \frac{-2^{-N_{i + 1}} - u_i}{1 + u_i} < s_i < \frac{2^{-N_{i + 1}} - u_i}{1 + u_i}. 419*a9aff440SNick Desaulniers 420*a9aff440SNick DesaulniersLet say we divide the interval `[0, 2^-{N_i})` into `2^{M_i}` subintervals 421*a9aff440SNick Desaulniersevenly and use the index `k` such that: 422*a9aff440SNick Desaulniers 423*a9aff440SNick Desaulniers.. math:: 424*a9aff440SNick Desaulniers k 2^{-N_i - M_i} \leq u_i < (k + 1) 2^{-N_i - M_i}, 425*a9aff440SNick Desaulniers 426*a9aff440SNick Desaulniersto look-up for the reduction constant `s_{i, k}`. In other word, `k` is given 427*a9aff440SNick Desaulniersby the formula: 428*a9aff440SNick Desaulniers 429*a9aff440SNick Desaulniers.. math:: 430*a9aff440SNick Desaulniers k = \left\lfloor 2^{N_i + M_i} u_i \right\rfloor 431*a9aff440SNick Desaulniers 432*a9aff440SNick DesaulniersNotice that our reduction constant `s_{i, k}` must work for all `u_i` in the 433*a9aff440SNick Desaulniersinterval `I = \{ v: k 2^{-N_i - M_i} \leq v < (k + 1) 2^{-N_i - M_i} \}`, 434*a9aff440SNick Desaulniersso it is bounded by: 435*a9aff440SNick Desaulniers 436*a9aff440SNick Desaulniers.. math:: 437*a9aff440SNick Desaulniers \sup_{v \in I} \frac{-2^{-N_{i + 1}} - v}{1 + v} < s_{i, k} < \inf_{v \in I} \frac{2^{-N_{i + 1}} - v}{1 + v} 438*a9aff440SNick Desaulniers 439*a9aff440SNick DesaulniersFor a fixed constant `|C| < 1`, let `f(v) = \frac{C - v}{1 + v}`, then its derivative 440*a9aff440SNick Desaulniersis: 441*a9aff440SNick Desaulniers 442*a9aff440SNick Desaulniers.. math:: 443*a9aff440SNick Desaulniers f'(v) = \frac{(-1)(1 + v) - (1)(C - v)}{(1 + v)^2} = \frac{-1 - C}{(1 + v)^2}. 444*a9aff440SNick Desaulniers 445*a9aff440SNick DesaulniersSince `|C| < 1`, `f'(v) < 0` for all `v \neq -1`, so: 446*a9aff440SNick Desaulniers 447*a9aff440SNick Desaulniers.. math:: 448*a9aff440SNick Desaulniers \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right) 449*a9aff440SNick Desaulniers = f \left( k 2^{-N_i - M_i} \right) \\ 450*a9aff440SNick Desaulniers \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right) 451*a9aff440SNick Desaulniers = f \left( (k + 1) 2^{-N_i - M_i} \right) 452*a9aff440SNick Desaulniers 453*a9aff440SNick DesaulniersHence we have the following bound on `s_{i, k}`: 454*a9aff440SNick Desaulniers 455*a9aff440SNick Desaulniers.. math:: 456*a9aff440SNick Desaulniers \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} < s_{i, k} 457*a9aff440SNick Desaulniers \leq \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}} 458*a9aff440SNick Desaulniers 459*a9aff440SNick DesaulniersThis interval is of length: 460*a9aff440SNick Desaulniers 461*a9aff440SNick Desaulniers.. math:: 462*a9aff440SNick Desaulniers l &= \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}} - 463*a9aff440SNick Desaulniers \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} \\ 464*a9aff440SNick Desaulniers &= \frac{2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i}} 465*a9aff440SNick Desaulniers {(1 + k 2^{-N_i - M_i})(1 + (k + 1) 2^{-N_i -M_i})} 466*a9aff440SNick Desaulniers 467*a9aff440SNick DesaulniersSo in order to be able to find `s_{i, k}`, we need that: 468*a9aff440SNick Desaulniers 469*a9aff440SNick Desaulniers.. math:: 470*a9aff440SNick Desaulniers 2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i} > 0 471*a9aff440SNick Desaulniers 472*a9aff440SNick DesaulniersThis give us the following bound on `N_{i + 1}`: 473*a9aff440SNick Desaulniers 474*a9aff440SNick Desaulniers.. math:: 475*a9aff440SNick Desaulniers N_{i + 1} \leq N_i + M_i + 1. 476*a9aff440SNick Desaulniers 477*a9aff440SNick DesaulniersTo make the range reduction effective, we will want to maximize `N_{i + 1}`, so 478*a9aff440SNick Desaulnierslet consider the two cases: `N_{i + 1} = N_i + M_i + 1` and 479*a9aff440SNick Desaulniers`N_{i + 1} = N_i + M_i`. 480*a9aff440SNick Desaulniers 481*a9aff440SNick Desaulniers 482*a9aff440SNick Desaulniers 483*a9aff440SNick DesaulniersThe optimal choice to balance between maximizing `N_{i + 1}` and minimizing the 484*a9aff440SNick Desaulniersprecision needed for `s_{i, k}` is: 485*a9aff440SNick Desaulniers 486*a9aff440SNick Desaulniers.. math:: 487*a9aff440SNick Desaulniers N_{i + 1} = N_i + M_i, 488*a9aff440SNick Desaulniers 489*a9aff440SNick Desaulniersand in this case, the optimal `\operatorname{ulp}(s_{i, k})` is: 490*a9aff440SNick Desaulniers 491*a9aff440SNick Desaulniers.. math:: 492*a9aff440SNick Desaulniers \operatorname{ulp}(s_{i, k}) = 2^{-N_i - M_i} 493*a9aff440SNick Desaulniers 494*a9aff440SNick Desaulniersand the corresponding `\operatorname{ulp}(u_{i + 1})` is: 495*a9aff440SNick Desaulniers 496*a9aff440SNick Desaulniers.. math:: 497*a9aff440SNick Desaulniers \operatorname{ulp}(u_{i + 1}) &= \operatorname{ulp}(u_i) \operatorname{ulp}(s_{i, k}) \\ 498*a9aff440SNick Desaulniers &= \operatorname{ulp}(u_i) \cdot 2^{-N_i - M_i} \\ 499*a9aff440SNick Desaulniers &= \operatorname{ulp}(u_0) \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i} \\ 500*a9aff440SNick Desaulniers &= 2^{-N_0 - 53} \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i} 501*a9aff440SNick Desaulniers 502*a9aff440SNick DesaulniersSince `|u_{i + 1}| < 2^{-N_{i + 1}} = 2^{-N_0 - M_1 - ... -M_i}`, the precision 503*a9aff440SNick Desaulniersof `u_{i + 1}` is: 504*a9aff440SNick Desaulniers 505*a9aff440SNick Desaulniers.. math:: 506*a9aff440SNick Desaulniers \operatorname{prec}(u_{i + 1}) &= (N_0 + 53) + (N_0 + M_0) + \cdots + 507*a9aff440SNick Desaulniers (N_0 + M_0 + \cdots + M_i) - (N_0 + M_0 + \cdots + M_i) \\ 508*a9aff440SNick Desaulniers &= (i + 1) N_0 + i M_0 + (i - 1) M_1 + \cdots + M_{i - 1} + 53 509*a9aff440SNick Desaulniers 510*a9aff440SNick DesaulniersIf we choose to have the same `M_0 = M_1 = \cdots = M_i = M`, this can be 511*a9aff440SNick Desaulnierssimplified to: 512*a9aff440SNick Desaulniers 513*a9aff440SNick Desaulniers.. math:: 514*a9aff440SNick Desaulniers \operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53. 515*a9aff440SNick Desaulniers 516*a9aff440SNick DesaulniersWe summarize the precision analysis for extending the range reduction in the 517*a9aff440SNick Desaulnierstable below: 518*a9aff440SNick Desaulniers 519*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 520*a9aff440SNick Desaulniers| `N_0` | `M` | No. steps | Table size | Output bound | ulp(`s_{i, k}`) | prec(`u_{i + 1}`) | 521*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 522*a9aff440SNick Desaulniers| 7 | 4 | 1 | 32 | `2^{-11}` | `2^{-12}` | 60 | 523*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 524*a9aff440SNick Desaulniers| | | 2 | 64 | `2^{-15}` | `2^{-16}` | 71 | 525*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 526*a9aff440SNick Desaulniers| | | 3 | 96 | `2^{-19}` | `2^{-20}` | 86 | 527*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 528*a9aff440SNick Desaulniers| | | 4 | 128 | `2^{-23}` | `2^{-24}` | 105 | 529*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 530*a9aff440SNick Desaulniers| | | 5 | 160 | `2^{-27}` | `2^{-28}` | 128 | 531*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 532*a9aff440SNick Desaulniers| | | 6 | 192 | `2^{-31}` | `2^{-32}` | 155 | 533*a9aff440SNick Desaulniers| +-----+-----------+------------+--------------+-----------------+-------------------+ 534*a9aff440SNick Desaulniers| | 5 | 3 | 192 | `2^{-22}` | `2^{-23}` | 89 | 535*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 536*a9aff440SNick Desaulniers| | | 4 | 256 | `2^{-27}` | `2^{-28}` | 111 | 537*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 538*a9aff440SNick Desaulniers| | | 5 | 320 | `2^{-32}` | `2^{-33}` | 138 | 539*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 540*a9aff440SNick Desaulniers| | | 6 | 384 | `2^{-37}` | `2^{-38}` | 170 | 541*a9aff440SNick Desaulniers| +-----+-----------+------------+--------------+-----------------+-------------------+ 542*a9aff440SNick Desaulniers| | 6 | 3 | 384 | `2^{-25}` | `2^{-26}` | 92 | 543*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 544*a9aff440SNick Desaulniers| | | 4 | 512 | `2^{-31}` | `2^{-32}` | 117 | 545*a9aff440SNick Desaulniers| +-----+-----------+------------+--------------+-----------------+-------------------+ 546*a9aff440SNick Desaulniers| | 7 | 1 | 256 | `2^{-24}` | `2^{-15}` | 60 | 547*a9aff440SNick Desaulniers| | +-----------+------------+--------------+-----------------+-------------------+ 548*a9aff440SNick Desaulniers| | | 2 | 512 | `2^{-21}` | `2^{-22}` | 74 | 549*a9aff440SNick Desaulniers+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 550*a9aff440SNick Desaulniers 551*a9aff440SNick Desaulnierswhere: 552*a9aff440SNick Desaulniers 553*a9aff440SNick Desaulniers- Number of steps = `i + 1` 554*a9aff440SNick Desaulniers- Table size = `(i + 1) 2^{M + 1}` 555*a9aff440SNick Desaulniers- Output bound = `2^{-N_{i + 1}} = 2^{-N_0 - (i + 1) M}` 556*a9aff440SNick Desaulniers- `\operatorname{ulp}(s_{i, k}) = 2^{-N_{i + 1} - 1}` 557*a9aff440SNick Desaulniers- `\operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53` 558