1.. _log_algorithm: 2 3======================== 4Log/Log10/Log2 Algorithm 5======================== 6 7.. default-role:: math 8 9In this short note, we will discuss in detail about the computation of 10:math:`\log(x)` function, with double precision inputs, in particular, the range 11reduction steps and error analysis. The algorithm is broken down into 2 main 12phases as follow: 13 141. Fast phase: 15 16 a. Range reduction 17 b. Polynomial approximation 18 c. Ziv's test 19 202. Accurate phase (if Ziv's test failed): 21 22 a. Further range reduction 23 b. Polynomial approximation 24 25 26Fast phase 27========== 28 29Range reduction 30--------------- 31 32Let `x = 2^{e_x} (1 + m_x)` be a normalized double precision number, in which 33`-1074 \leq e_x \leq 1022` and `0 \leq m_x < 1` such that 34`2^{52} m_x \in \mathbb{Z}`. 35 36Then from the properties of logarithm: 37 38.. math:: 39 \log(x) &= \log\left( 2^{e_x} (1 + m_x) \right) \\ 40 &= \log\left( 2^{e_x} \right) + \log(1 + m_x) \\ 41 &= e_x \log(2) + \log(1 + m_x) 42 43the computation of `\log(x)` can be reduced to: 44 451. compute the product of `e_x` and `\log(2)`, 462. compute `\log(1 + m_x)` for `0 \leq m_x < 1`, 473. add step 1 and 2. 48 49To compute `\log(1 + m_x)` in step 2, we can reduce the range further by finding 50`r > 0` such that: 51 52.. math:: 53 | r(1 + m_x) - 1 | < C \quad \quad \text{(R1)} 54 55for small `0 < C < 1`. Then if we let `u = r(1 + m_x) - 1`, `|u| < C`: 56 57.. math:: 58 \log(1 + m_x) &= \log \left( \frac{r (1 + m_x)}{r} \right) \\ 59 &= \log(r (1 + m_x) ) - \log(r) \\ 60 &= \log(1 + u) - \log(r) 61 62and step 2 can be computed with: 63 64a. extract `r` and `-\log(r)` from look-up tables, 65b. compute the reduced argument `u = r(1 + m_x) - 1`, 66c. compute `\log(1 + u)` by polynomial approximation or further range reduction, 67d. add step a and step c results. 68 69 70How to derive `r` 71----------------- 72 73For an efficient implementation, we would like to use the first `M` significant 74bits of `m_x` to look up for `r`. In particular, we would like to find a value 75of `r` that works for all `m_x` satisfying: 76 77.. math:: 78 k 2^{-M} \leq m_x < (k + 1) 2^{-M} \quad \text{for some} \quad 79 k = 0..2^{M} - 1. \quad\quad \text{(M1)} 80 81Let `r = 1 + s`, then `u` can be expressed in terms of `s` as: 82 83.. math:: 84 u &= r(1 + m_x) - 1 \\ 85 &= (1 + s)(1 + m_x) - 1 \\ 86 &= s m_x + s + m_x &\quad\quad \text{(U1)} \\ 87 &= s (1 + m_x) + m_x \\ 88 &= m_x (1 + s) + s. 89 90From the condition `\text{(R1)}`, `s` is bounded by: 91 92.. math:: 93 \frac{-C - m_x}{1 + m_x} < s < \frac{C - m_x}{1 + m_x} \quad\quad \text{(S1)}. 94 95Since our reduction constant `s` must work for all `m_x` in the interval 96`I = \{ v: k 2^{-M} \leq v < (k + 1) 2^{-M} \}`, `s` is bounded by: 97 98.. math:: 99 \sup_{v \in I} \frac{-C - v}{1 + v} < s < \inf_{v \in I} \frac{C - v}{1 + v} 100 101For a fixed constant `|c| < 1`, let `f(v) = \frac{c - v}{1 + v}`, then its 102derivative is: 103 104.. math:: 105 f'(v) = \frac{(-1)(1 + v) - (1)(c - v)}{(1 + v)^2} = \frac{-1 - c}{(1 + v)^2}. 106 107Since `|c| < 1`, `f'(v) < 0` for all `v \neq -1`, so: 108 109.. math:: 110 \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right) 111 = f \left( k 2^{-M} \right) \\ 112 \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right) 113 = f \left( (k + 1) 2^{-M} \right) 114 115Hence we have the following bound on `s`: 116 117.. math:: 118 \frac{-C - k 2^{-M}}{1 + k 2^{-M}} < s \leq 119 \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}. \quad\quad \text{(S2)} 120 121In order for `s` to exist, we need that: 122 123.. math:: 124 \frac{C - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} > 125 \frac{-C - k 2^{-M}}{1 + k 2^{-M}} 126 127which is equivalent to: 128 129.. math:: 130 \quad\quad 2C - 2^{-M} + (2k + 1) 2^{-M} C > 0 131 \iff C > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \quad\quad \text{(C1)}. 132 133Consider the case `C = 2^{-N}`. Since `0 \leq k \leq 2^M - 1,` the right hand 134side of `\text{(C1)}` is bounded by: 135 136.. math:: 137 2^{-M - 1} > \frac{2^{-M - 1}}{1 + (2k + 1) 2^{-M - 1}} \geq 138 \frac{2^{-M - 1}}{1 + (2^{M + 1} - 1) 2^{-M - 1}} > 2^{-M - 2}. 139 140Hence, from `\text{(C1)}`, being an exact power of 2, `C = 2^{-N}` is bounded below 141by: 142 143.. math:: 144 C = 2^{-N} \geq 2^{-M - 1}. 145 146To make the range reduction efficient, we will want to minimize `C` (maximize 147`N`) while keeping the required precision of `s`(`r`) as low as possible. And 148for that, we will consider the following two cases: `N = M + 1` and `N = M`. 149 150Case 1 - `N = M + 1` 151~~~~~~~~~~~~~~~~~~~~ 152 153When `N = M + 1`, `\text{(S2)}` becomes: 154 155.. math:: 156 \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} < s < 157 \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}}. 158 \quad\quad \text{(S2')} 159 160This is an interval of length: 161 162.. math:: 163 l &= \frac{2^{-M - 1} - (k + 1) 2^{-M}}{1 + (k + 1) 2^{-M}} - 164 \frac{-2^{-M - 1} - k 2^{-M}}{1 + k 2^{-M}} \\ 165 &= \frac{(2k + 1)2^{-2M - 1}}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})} 166 \quad\quad \text{(L1)} 167 168As a function of `k`, the length `l` has its derivative with respect to `k`: 169 170.. math:: 171 \frac{dl}{dk} = 172 \frac{2^{2M + 1} - 2k(k + 1) - 1} 173 {2^{4M}(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2} 174 175which is always positive for `0 \leq k \leq 2^M - 1`. So for all 176`0 < k < 2^{-M}` (`k = 0` will be treated differently in edge cases), and for 177`M > 2`, `l` is bounded below by: 178 179.. math:: 180 l > 2^{-2M}. 181 182It implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-2M}`. 183And from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is: 184 185.. math:: 186 \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\ 187 &= 2^{-2M} \operatorname{ulp}(m_x). 188 189Since: 190 191.. math:: 192 |u| < C = 2^{-N} = 2^{-M - 1}, 193 194Its required precision is: 195 196.. math:: 197 \operatorname{prec}(u) &= \log_2(2^{-M-1} / \operatorname{ulp}(u)) \\ 198 &= \log_2(2^{M - 1} / \operatorname{ulp}(m_x)) \\ 199 &= M - 1 - \log_2(\operatorname{ulp}(m_x)). 200 201This means that in this case, we cannot restrict `u` to be exactly representable 202in double precision for double precision input `x` with `M > 2`. Nonetheless, 203for a reasonable value of `M`, we can have `u` exactly representable in double 204precision for single precision input `x` (`\operatorname{ulp}(m_x) = 2^{-23}`) 205such that `|u| < 2^{-M - 1}` using a look-up table of size `2^M`. 206 207A particular formula for `s` can be derived from `\text{(S2')}` by the midpoint 208formula: 209 210.. math:: 211 s &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \operatorname{midpoint} 212 \left(-\frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}}, 213 \frac{2^{-M-1} - (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 214 &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left( 215 \frac{-2^{-M - 1} - k2^{-M}}{1 + k 2^{-M}} + 216 \frac{2^{-M - 1} + (k + 1)2^{-M}}{1 + (k + 1) 2^{-M}} 217 \right) \right) \\ 218 &= 2^{-2M} \operatorname{round}\left( \frac{ 219 - \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) } 220 {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \\ 221 &= - 2^{-2M} \operatorname{round}\left( \frac{ 222 \left(k + \frac{1}{2} \right) \left(2^M - k - \frac{1}{2} \right) } 223 {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \quad\quad \text{(S3)} 224 225The corresponding range and formula for `r = 1 + s` are: 226 227.. math:: 228 \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}} < r \leq 229 \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}} 230 231.. math:: 232 r &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot 233 \operatorname{midpoint}\left( \frac{1 - 2^{-M - 1}}{1 + k 2^{-M}}, 234 \frac{1 + 2^{-M - 1}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 235 &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{1}{2} \left( 236 \frac{1 + 2^{-M-1}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M-1}}{1 + k 2^{-M}} 237 \right) \right) \\ 238 &= 2^{-2M} \operatorname{round}\left( 2^{2M} \cdot \frac{ 239 1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-2} }{(1 + k 2^{-M}) 240 (1 + (k + 1) 2^{-M})} \right) 241 242Case 1 - `N = M` 243~~~~~~~~~~~~~~~~ 244 245When `N = M`, `\text{(S2)}` becomes: 246 247.. math:: 248 \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} < s < \frac{-k 2^{-M}}{1 + (k + 1) 2^{-M}} 249 \quad\quad \text{(S2")} 250 251This is an interval of length: 252 253.. math:: 254 l &= \frac{- k 2^{-M}}{1 + (k + 1) 2^{-M}} - 255 \frac{- (k + 1) 2^{-M}}{1 + k 2^{-M}} \\ 256 &= \frac{2^{-M} (1 + (2k + 1) 2^{-M})}{(1 + k 2^{-M})(1 + (k + 1)2^{-M})} 257 \quad\quad \text{(L1')} 258 259As a function of `k`, its derivative with respect to `k`: 260 261.. math:: 262 \frac{dl}{dk} = 263 -\frac{2^{-2M}(k(k + 1)2^{-M + 1} + 2^{-M} + 2k + 1)} 264 {(1 + k 2^{-M})^2 (1 + (k + 1) 2^{-M})^2} 265 266which is always negative for `0 \leq k \leq 2^M - 1`. So for `M > 1`, `l` is 267bounded below by: 268 269.. math:: 270 l > \frac{2^{-M - 1} (3 - 2^{-M})}{2 - 2^{-M}} > 2^{-M - 1}. 271 272It implies that we can always find `s` with `\operatorname{ulp}(s) = 2^{-M-1}`. 273And from `\text{(U1)}`, `u = s(1 + m_x) + m_x`, its `ulp` is: 274 275.. math:: 276 \operatorname{ulp}(u) &= \operatorname{ulp}(s) \cdot \operatorname{ulp}(m_x) \\ 277 &= 2^{-M - 1} \operatorname{ulp}(m_x). 278 279Since: 280 281.. math:: 282 |u| < C = 2^{-N} = 2^{-M}, 283 284Its required precision is: 285 286.. math:: 287 \operatorname{prec}(u) &= \log_2(2^{-M} / \operatorname{ulp}(u)) \\ 288 &= \log_2(2 / \operatorname{ulp}(m_x)) \\ 289 &= 1 - \log_2(\operatorname{ulp}(m_x)). 290 291Hence, for double precision `x`, `\operatorname{ulp}(m_x) = 2^{-52}`, and the 292precision needed for `u` is `\operatorname{prec}(u) = 53`, i.e., `u` can be 293exactly representable in double precision. And in this case, `s` can be 294derived from `\text{(S2")}` by the midpoint formula: 295 296.. math:: 297 s &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot 298 \operatorname{midpoint} \left(-\frac{-(k + 1)2^{-M}}{1 + k 2^{-M}}, 299 \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 300 &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left( 301 \frac{-(k + 1)2^{-M}}{1 + k 2^{-M}} + \frac{-k2^{-M}}{1 + (k + 1) 2^{-M}} 302 \right) \right) \\ 303 &= -2^{-M - 1} \operatorname{round}\left( \frac{ 304 (2k + 1) + (2k^2 + 2k + 1) 2^{-M} } 305 {(1 + k 2^{-N})(1 + (k + 1) 2^{-N})} \right) \quad\quad \text{(S3')} 306 307The corresponding range and formula for `r = 1 + s` are: 308 309.. math:: 310 \frac{1 - 2^{-M}}{1 + k 2^{-M}} < r \leq \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}} 311 312.. math:: 313 r &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot 314 \operatorname{midpoint}\left( \frac{1 - 2^{-M}}{1 + k 2^{-M}}, 315 \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}}\right) \right) \\ 316 &= 2^{-M-1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{1}{2} \left( 317 \frac{1 + 2^{-M}}{1 + (k + 1) 2^{-M}} + \frac{1 - 2^{-M}}{1 + k 2^{-M}} 318 \right) \right) \\ 319 &= 2^{-M - 1} \operatorname{round}\left( 2^{M + 1} \cdot \frac{ 320 1 + \left(k + \frac{1}{2} \right) 2^{-M} - 2^{-2M-1} }{(1 + k 2^{-M}) 321 (1 + (k + 1) 2^{-M})} \right) 322 323Edge cases 324---------- 325 3261. When `k = 0`, notice that: 327 328.. math:: 329 0 = k 2^{-N} \leq m_x < (k + 1) 2^{-N} = 2^{-N} = C, 330 331so we can simply choose `r = 1` so that `\log(r) = 0` is exact, then `u = m_x`. 332This will help reduce the accumulated errors when `m_x` is close to 0 while 333maintaining the range reduction output's requirements. 334 3352. When `k = 2^{N} - 1`, `\text{(S2)}` becomes: 336 337.. math:: 338 -\frac{1}{2} - \frac{C - 2^{-M-1}}{2 - 2^{-M}} <> s \leq 339 -\frac{1}{2} + \frac{C}{2}. 340 341so when `C > 2^{-M - 1}` is a power of 2, we can always choose: 342 343.. math:: 344 s = -\frac{1}{2}, \quad \text{i.e.} \quad r = \frac{1}{2}. 345 346This reduction works well to avoid catastrophic cancellation happening when 347`e_x = -1`. 348 349This also works when `C = 2^{-M - 1}` if we relax the condition on `u` to 350`|u| \leq C = 2^{-M-1}`. 351 352Intermediate precision, and Ziv's test 353-------------------------------------- 354 355In the fast phase, we want extra precision while performant, so we use 356double-double precision for most intermediate computation steps, and employ Ziv 357test to see if the result is accurate or not. In our case, the Ziv's test can 358be described as follow: 359 3601. Let `re = re.hi + re.lo` be the double-double output of the fast phase 361 computation. 3622. Let `err` be an estimated upper bound of the errors of `re`. 3633. If `\circ(re.hi + (re.lo - err)) == \circ(re.hi + (r.lo + err))` then the 364 result is correctly rounded to double precision for the current rounding mode 365 `\circ`. Otherwise, the accurate phase with extra precision is needed. 366 367For an easy and cheap estimation of the error bound `err`, since the range 368reduction step described above is accurate, the errors of the result: 369 370.. math:: 371 \log(x) &= e_x \log(2) - \log(r) + \log(1 + u) \\ 372 &\approx e_x \log(2) - \log(r) + u P(u) 373 374come from 2 parts: 375 3761. the look-up part: `e_x \log(2) - \log(r)` 3772. the polynomial approximation part: `u P(u)` 378 379The errors of the first part can be computed with a single `\operatorname{fma}` 380operation: 381 382.. math:: 383 err_1 = \operatorname{fma}(e_x, err(\log(2)), err(\log(r))), 384 385and then combining with the errors of the second part for another 386`\operatorname{fma}` operation: 387 388.. math:: 389 err = \operatorname{fma}(u, err(P), err_1) 390 391 392Accurate phase 393============== 394 395Extending range reduction 396------------------------- 397 398Since the output `u = r(1 + m_x) - 1` of the fast phase's range reduction 399is computed exactly, we can apply further range reduction steps by 400using the following formula: 401 402.. math:: 403 u_{i + 1} = r_i(1 + u_i) - 1 = u_i \cdot r_i + (r_i - 1), 404 405where `|u_i| < 2^{-N_i}` and `u_0 = u` is representable in double precision. 406 407Let `s_i = r_i - 1`, then we can rewrite it as: 408 409.. math:: 410 u_{i + 1} &= (1 + s_i)(1 + u_i) - 1 \\ 411 &= s_i u_i + u_i + s_i \\ 412 &= u_i (1 + s_i) + s_i 413 &= s_i (1 + u_i) + u_i. 414 415Then the bound on `u_{i + 1}` is translated to `s_i` as: 416 417.. math:: 418 \frac{-2^{-N_{i + 1}} - u_i}{1 + u_i} < s_i < \frac{2^{-N_{i + 1}} - u_i}{1 + u_i}. 419 420Let say we divide the interval `[0, 2^-{N_i})` into `2^{M_i}` subintervals 421evenly and use the index `k` such that: 422 423.. math:: 424 k 2^{-N_i - M_i} \leq u_i < (k + 1) 2^{-N_i - M_i}, 425 426to look-up for the reduction constant `s_{i, k}`. In other word, `k` is given 427by the formula: 428 429.. math:: 430 k = \left\lfloor 2^{N_i + M_i} u_i \right\rfloor 431 432Notice that our reduction constant `s_{i, k}` must work for all `u_i` in the 433interval `I = \{ v: k 2^{-N_i - M_i} \leq v < (k + 1) 2^{-N_i - M_i} \}`, 434so it is bounded by: 435 436.. math:: 437 \sup_{v \in I} \frac{-2^{-N_{i + 1}} - v}{1 + v} < s_{i, k} < \inf_{v \in I} \frac{2^{-N_{i + 1}} - v}{1 + v} 438 439For a fixed constant `|C| < 1`, let `f(v) = \frac{C - v}{1 + v}`, then its derivative 440is: 441 442.. math:: 443 f'(v) = \frac{(-1)(1 + v) - (1)(C - v)}{(1 + v)^2} = \frac{-1 - C}{(1 + v)^2}. 444 445Since `|C| < 1`, `f'(v) < 0` for all `v \neq -1`, so: 446 447.. math:: 448 \sup_{v \in I} f(v) &= f \left( \inf\{ v: v \in I \} \right) 449 = f \left( k 2^{-N_i - M_i} \right) \\ 450 \inf_{v \in I} f(v) &= f \left( \sup\{ v: v \in I \} \right) 451 = f \left( (k + 1) 2^{-N_i - M_i} \right) 452 453Hence we have the following bound on `s_{i, k}`: 454 455.. math:: 456 \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} < s_{i, k} 457 \leq \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}} 458 459This interval is of length: 460 461.. math:: 462 l &= \frac{2^{-N_{i + 1}} - (k + 1) 2^{-N_i - M_i}}{1 + (k + 1) 2^{-N_i - M_i}} - 463 \frac{-2^{-N_{i + 1}} - k 2^{-N_i - M_i}}{1 + k 2^{-N_i - M_i}} \\ 464 &= \frac{2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i}} 465 {(1 + k 2^{-N_i - M_i})(1 + (k + 1) 2^{-N_i -M_i})} 466 467So in order to be able to find `s_{i, k}`, we need that: 468 469.. math:: 470 2^{-N_{i + 1} + 1} - 2^{-N_i - M_i} + (2k + 1) 2^{-N_{i + 1} - N_i - M_i} > 0 471 472This give us the following bound on `N_{i + 1}`: 473 474.. math:: 475 N_{i + 1} \leq N_i + M_i + 1. 476 477To make the range reduction effective, we will want to maximize `N_{i + 1}`, so 478let consider the two cases: `N_{i + 1} = N_i + M_i + 1` and 479`N_{i + 1} = N_i + M_i`. 480 481 482 483The optimal choice to balance between maximizing `N_{i + 1}` and minimizing the 484precision needed for `s_{i, k}` is: 485 486.. math:: 487 N_{i + 1} = N_i + M_i, 488 489and in this case, the optimal `\operatorname{ulp}(s_{i, k})` is: 490 491.. math:: 492 \operatorname{ulp}(s_{i, k}) = 2^{-N_i - M_i} 493 494and the corresponding `\operatorname{ulp}(u_{i + 1})` is: 495 496.. math:: 497 \operatorname{ulp}(u_{i + 1}) &= \operatorname{ulp}(u_i) \operatorname{ulp}(s_{i, k}) \\ 498 &= \operatorname{ulp}(u_i) \cdot 2^{-N_i - M_i} \\ 499 &= \operatorname{ulp}(u_0) \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i} \\ 500 &= 2^{-N_0 - 53} \cdot 2^{-N_0 - M_0} \cdot 2^{-N_0 - M_0 - M_1} \cdots 2^{-N_0 - M_0 - M_1 - \cdots - M_i} 501 502Since `|u_{i + 1}| < 2^{-N_{i + 1}} = 2^{-N_0 - M_1 - ... -M_i}`, the precision 503of `u_{i + 1}` is: 504 505.. math:: 506 \operatorname{prec}(u_{i + 1}) &= (N_0 + 53) + (N_0 + M_0) + \cdots + 507 (N_0 + M_0 + \cdots + M_i) - (N_0 + M_0 + \cdots + M_i) \\ 508 &= (i + 1) N_0 + i M_0 + (i - 1) M_1 + \cdots + M_{i - 1} + 53 509 510If we choose to have the same `M_0 = M_1 = \cdots = M_i = M`, this can be 511simplified to: 512 513.. math:: 514 \operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53. 515 516We summarize the precision analysis for extending the range reduction in the 517table below: 518 519+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 520| `N_0` | `M` | No. steps | Table size | Output bound | ulp(`s_{i, k}`) | prec(`u_{i + 1}`) | 521+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 522| 7 | 4 | 1 | 32 | `2^{-11}` | `2^{-12}` | 60 | 523| | +-----------+------------+--------------+-----------------+-------------------+ 524| | | 2 | 64 | `2^{-15}` | `2^{-16}` | 71 | 525| | +-----------+------------+--------------+-----------------+-------------------+ 526| | | 3 | 96 | `2^{-19}` | `2^{-20}` | 86 | 527| | +-----------+------------+--------------+-----------------+-------------------+ 528| | | 4 | 128 | `2^{-23}` | `2^{-24}` | 105 | 529| | +-----------+------------+--------------+-----------------+-------------------+ 530| | | 5 | 160 | `2^{-27}` | `2^{-28}` | 128 | 531| | +-----------+------------+--------------+-----------------+-------------------+ 532| | | 6 | 192 | `2^{-31}` | `2^{-32}` | 155 | 533| +-----+-----------+------------+--------------+-----------------+-------------------+ 534| | 5 | 3 | 192 | `2^{-22}` | `2^{-23}` | 89 | 535| | +-----------+------------+--------------+-----------------+-------------------+ 536| | | 4 | 256 | `2^{-27}` | `2^{-28}` | 111 | 537| | +-----------+------------+--------------+-----------------+-------------------+ 538| | | 5 | 320 | `2^{-32}` | `2^{-33}` | 138 | 539| | +-----------+------------+--------------+-----------------+-------------------+ 540| | | 6 | 384 | `2^{-37}` | `2^{-38}` | 170 | 541| +-----+-----------+------------+--------------+-----------------+-------------------+ 542| | 6 | 3 | 384 | `2^{-25}` | `2^{-26}` | 92 | 543| | +-----------+------------+--------------+-----------------+-------------------+ 544| | | 4 | 512 | `2^{-31}` | `2^{-32}` | 117 | 545| +-----+-----------+------------+--------------+-----------------+-------------------+ 546| | 7 | 1 | 256 | `2^{-24}` | `2^{-15}` | 60 | 547| | +-----------+------------+--------------+-----------------+-------------------+ 548| | | 2 | 512 | `2^{-21}` | `2^{-22}` | 74 | 549+-------+-----+-----------+------------+--------------+-----------------+-------------------+ 550 551where: 552 553- Number of steps = `i + 1` 554- Table size = `(i + 1) 2^{M + 1}` 555- Output bound = `2^{-N_{i + 1}} = 2^{-N_0 - (i + 1) M}` 556- `\operatorname{ulp}(s_{i, k}) = 2^{-N_{i + 1} - 1}` 557- `\operatorname{prec}(u_{i + 1}) = (i + 1) N_0 + \frac{i(i + 1)}{2} \cdot M + 53` 558