common_constants.cpp - OpenGrok history log for /llvm-project/libc/src/math/generic/common

Revision	Date	Author	Comments
# 5ff3ff33	12-Jul-2024	Petr Hosek <phosek@google.com>	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597) This is a part of #97655.
# ce9035f5	12-Jul-2024	Mehdi Amini <joker.eph@gmail.com>	Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593) Reverts llvm/llvm-project#98075 bots are broken
# 3f30effe	11-Jul-2024	Petr Hosek <phosek@google.com>	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075) This is a part of #97655.
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5
# bc7a3bd8	06-Nov-2023	lntue <35648136+lntue@users.noreply.github.com>	[libc][math] Implement powf function correctly rounded to all rounding modes. (#71188)
We compute `pow(x, y)` using the formula
```
  pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking down into `hi + mid + lo` parts, in which `hi` parts are computed using the exponent field directly, `mid` parts will use look-up tables, and `lo` parts are approximated by polynomials.
We add some speedup for common use-cases:
```
  pow(2, y) = exp2(y)
  pow(10, y) = exp10(y)
  pow(x, 2) = x * x
  pow(x, 1/2) = sqrt(x)
  pow(x, -1/2) = rsqrt(x) - to be added
```
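The formula and shortcuts in the powf entry can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `powf_sketch` and `powf_sketch_demo` are hypothetical names, the input is assumed finite with `x > 0`, and plain double arithmetic stands in for the `hi + mid + lo` decomposition, so the result is not guaranteed to be correctly rounded. The `pow(10, y)` shortcut is left out because standard C++ has no `exp10`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not the LLVM libc implementation.
// Assumes x is finite and x > 0; no handling of NaN, infinities,
// zeros, or negative bases.
inline float powf_sketch(float x, float y) {
  // Shortcuts for common cases listed in the commit message.
  if (y == 2.0f) return x * x;          // pow(x, 2) = x * x
  if (y == 0.5f) return std::sqrt(x);   // pow(x, 1/2) = sqrt(x)
  if (x == 2.0f) return std::exp2(y);   // pow(2, y) = exp2(y)

  // General case: x^y = 2^(y * log2(x)), evaluated in double precision.
  const double t = static_cast<double>(y) * std::log2(static_cast<double>(x));
  return static_cast<float>(std::exp2(t));
}

// Small usage example.
inline void powf_sketch_demo() {
  assert(powf_sketch(3.0f, 2.0f) == 9.0f);
  assert(powf_sketch(4.0f, 0.5f) == 2.0f);
  assert(std::fabs(powf_sketch(3.0f, 3.0f) - 27.0f) < 1e-4f);
}
```

Evaluating the general case in double gives a few extra bits over float, which is usually enough for a faithful result but not for correct rounding in every rounding mode.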
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2
# b6bc9d72	26-Sep-2023	Guillaume Chatelet <gchatelet@google.com>	[libc] Mass replace enclosing namespace (#67032) This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# 8ca614aa	25-Aug-2023	Tue Ly <lntue@google.com>	[libc][math] Implement double precision exp2 function correctly rounded for all rounding modes. Implement double precision exp2 function correctly rounded for all rounding modes. Using the same algorithm as double precision exp function in https://reviews.llvm.org/D158551. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D158812
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# a68bbf42	08-May-2023	Tue Ly <lntue@google.com>	[libc][math] Implement double precision log function correctly rounded to all rounding modes.
Implement double precision log function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
Performance
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
598.306
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925
-- LIBC latency -- with FMA
455.632
-- LIBC latency -- without FMA
488.564
```
Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150131
# a0c92a38	06-May-2023	Tue Ly <lntue@google.com>	[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Make log10 correctly rounded for non-FMA targets and improve its performance.
Implemented fast pass and accurate pass:
Fast Pass:
- Range reduction step 0: Extract exponent and mantissa
```
  x = 2^(e_x) * m_x
```
- Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
  -2^-8 <= v = r * m_x - 1 < 2^-7
where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
and k = trunc( (m_x - 1) * 2^7 )
```
- Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
> P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
- Combine the results:
```
  log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
- Perform additive Ziv's test with errors bounded by `P_ERR * v^2`. Return the result if Ziv's test passed.
Accurate Pass:
- Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
- Perform 3 more range reduction steps:
  - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
  v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
and k = trunc( v * 2^14 + 0.5 ).
```
  - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
  v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
and k = trunc( v * 2^21 + 0.5 ).
```
  - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
  v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
and k = trunc( v * 2^28 + 0.5 ).
```
- Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
> P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
- Combine the results:
```
  log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v)
```
- The combined results are computed using floating points of 128-bit precision.
Performance
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
640.688
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354
-- LIBC latency -- with FMA
495.593
-- LIBC latency -- without FMA
504.143
```
Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150014
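Range reduction steps 0 and 1 of the log10 fast pass can be illustrated with a short C++ sketch. The names `Reduced`, `reduce_step1`, `v_in_range`, and `log10_sketch` are hypothetical. The sketch computes `r` on the fly instead of reading it from a 128-entry table, and it substitutes `std::log` and `std::log1p` for the tabulated `log(r)` and the Sollya polynomial, so it demonstrates the reduction identity only and makes no correct-rounding claim. It assumes a finite, positive, normal input.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: x = 2^e_x * m_x and v = r * m_x - 1.
struct Reduced {
  int e_x;
  double r;
  double v;
};

// Steps 0 and 1 of the fast pass, assuming x is finite, positive, normal.
inline Reduced reduce_step1(double x) {
  int e = 0;
  const double m = std::frexp(x, &e);  // x = m * 2^e with m in [0.5, 1)
  const double m_x = 2.0 * m;          // rescale so that m_x is in [1, 2)
  const int e_x = e - 1;

  // k = trunc((m_x - 1) * 2^7) is the index into the 128-entry table.
  const double k = std::trunc((m_x - 1.0) * 128.0);
  // r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7)); note 2^8 * (1 - 2^-8) = 255.
  const double r = std::ceil(255.0 / (1.0 + k / 128.0)) / 256.0;
  return {e_x, r, r * m_x - 1.0};
}

// Checks the bound stated in the commit message: -2^-8 <= v < 2^-7.
inline bool v_in_range(double x) {
  const double v = reduce_step1(x).v;
  return -1.0 / 256.0 <= v && v < 1.0 / 128.0;
}

// log10(x) = (e_x * log(2) - log(r) + log(1 + v)) * log10(e).
// std::log and std::log1p are stand-ins for the table and polynomial.
inline double log10_sketch(double x) {
  const Reduced red = reduce_step1(x);
  const double LOG10_E = 0.4342944819032518;  // log10(e)
  return (red.e_x * std::log(2.0) - std::log(red.r) + std::log1p(red.v)) * LOG10_E;
}
```

Because `r` has only 8 significant bits, the product `r * m_x` needs at most 61 bits in general; the later logf and log10f commits in this log choose the constants so that `v` is exact in double precision even without FMA.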
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2
# bc8e87ef	07-Apr-2023	Tue Ly <lntue@google.com>	[libc][math] Update range reduction step for logf and reduce its latency. Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r * m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently. Reviewed By: santoshn, zimmermann6 Differential Revision: https://reviews.llvm.org/D147755
# 9af8dca7	06-Apr-2023	Tue Ly <lntue.h@gmail.com>	[libc][math] Update range reduction step for log10f and reduce its latency. Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r * m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently. Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D147676
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 89ed5b7c	28-Aug-2022	Kirill Okhotnikov <okir@google.com>	[libc][math] Added auxiliary function log2_eval for asinhf/acoshf/atanhf. 1) Added a `double log2_eval(double)` function with better-than-float precision. 2) Some refactoring done to put all auxiliary functions and corresponding data in one place to reuse the code. 3) Added tests for new functions. 4) Performance and precision tests show that the function is more precise than the existing log2 (no exceptional cases), but its timing is ~5% higher than the current one. Differential Revision: https://reviews.llvm.org/D132809
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2
# 131dda9a	01-Aug-2022	Tue Ly <lntue@google.com>	[libc] Implement sincosf function correctly rounded to all rounding modes.
Refactor common range reductions and evaluations for sinf, cosf, and sincosf. Added exhaustive tests for sincosf.
Performance before the patch:
```
System LIBC reciprocal throughput : 30.205
LIBC reciprocal throughput        : 30.533
System LIBC latency               : 67.961
LIBC latency                      : 61.564
```
Performance after the patch:
```
System LIBC reciprocal throughput : 30.409
LIBC reciprocal throughput        : 20.273
System LIBC latency               : 67.527
LIBC latency                      : 61.959
```
Reviewed By: orex Differential Revision: https://reviews.llvm.org/D130901
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init
# c78144e1	01-Jul-2022	Kirill Okhotnikov <okir@google.com>	[libc][math] Improved performance of exp2f function. New exp2 function algorithm: 1) Improved performance: 8.176 vs 15.270 by core-math perf tool. 2) Improved accuracy. Only two special values left. 3) Lookup table size reduced by a factor of two. Differential Revision: https://reviews.llvm.org/D129005
# 15b9380d	27-Jul-2022	Tue Ly <lntue@google.com>	[libc] Change sinf range reduction to mod pi/16 to be shared with cosf.
Change `sinf` range reduction to mod pi/16 to be shared with `cosf`. Previously, `sinf` used range reduction `mod pi`, but this cannot be used to implement `cosf` since the minimax algorithm for `cosf` does not converge due to critical points at `pi/2`. In order to be able to share the same range reduction functions for both `sinf` and `cosf`, we change the range reduction to `mod pi/16` for the following reasons:
- The table size is sufficiently small: 32 entries for `sin(k * pi/16)` with `k = 0..31`. It could be reduced to 16 entries if we treat the final sign separately, with an extra multiplication at the end.
- The polynomials' degrees are reduced to 7/8 from 15, with extra computations to combine `sin` and `cos` with trig sum equality.
- The number of exceptional cases reduced to 2 (with FMA) and 3 (without FMA).
- The latency is reduced while maintaining similar throughput as before.
Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D130629
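The `mod pi/16` reduction described in this entry can be sketched as follows. This is an illustration under assumptions, not the libc code: `sin_table` and `sin_sketch` are hypothetical names, the 32-entry table is filled at startup with `std::sin` rather than stored as precomputed constants, Taylor polynomials of degree 7 and 8 replace the minimax polynomials, and the reduction `x - k * pi/16` is done naively in double, which is only adequate for moderate `|x|`.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr double PI = 3.14159265358979323846;

// 32-entry table of sin(k * pi/16) for k = 0..31. Built with std::sin here
// purely for illustration; a real implementation would store constants.
inline const std::array<double, 32> &sin_table() {
  static const std::array<double, 32> table = [] {
    std::array<double, 32> t{};
    for (int k = 0; k < 32; ++k)
      t[k] = std::sin(k * PI / 16.0);
    return t;
  }();
  return table;
}

// sin(x) = sin(k*pi/16 + y) = sin(k*pi/16) * cos(y) + cos(k*pi/16) * sin(y),
// where k = round(x * 16/pi) and |y| is at most about pi/32.
inline double sin_sketch(double x) {
  const long long k = std::llround(x * (16.0 / PI));
  const double y = x - static_cast<double>(k) * (PI / 16.0);
  const double y2 = y * y;

  // Taylor polynomials (a stand-in for minimax): degree 7 for sin, 8 for cos.
  const double sin_y =
      y * (1.0 + y2 * (-1.0 / 6 + y2 * (1.0 / 120 + y2 * (-1.0 / 5040))));
  const double cos_y =
      1.0 + y2 * (-0.5 + y2 * (1.0 / 24 + y2 * (-1.0 / 720 + y2 * (1.0 / 40320))));

  // The table has period 32 in k; cos(k*pi/16) = sin((k + 8) * pi/16).
  const int idx = static_cast<int>(((k % 32) + 32) % 32);
  const double sin_k = sin_table()[idx];
  const double cos_k = sin_table()[(idx + 8) % 32];
  return sin_k * cos_y + cos_k * sin_y;
}
```

Reusing the same table for the cosine term, shifted by 8 entries, is one way to keep a single table for both functions; the entry does not say whether the implementation does this.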
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 64af346b	14-Mar-2022	Tue Ly <lntue@google.com>	[libc] Implement expm1f function that is correctly rounded for all rounding modes. Implement expm1f function that is correctly rounded for all rounding modes. This is based on expf implementation. From exhaustive testing, using the expf implementation and subtracting 1.0 before rounding the final result to single precision gives correctly rounded results for all |x| > 2^-4 with 1 exception. When |x| < 2^-25, we use x + x^2 (implemented with a single fma). And for 2^-25 <= |x| <= 2^-4, we use a single degree-8 minimax polynomial generated by Sollya. Reviewed By: sivachandra, zimmermann6 Differential Revision: https://reviews.llvm.org/D121574
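The three input ranges named in the expm1f entry can be sketched as a simple dispatch. `expm1f_sketch` is a hypothetical name. The tiny-input branch uses `x + x^2` through a single `std::fma`, as the entry states; the other two branches are stand-ins, since the degree-8 polynomial coefficients and the expf internals are not given here, so `std::expm1` and `std::exp` in double are used instead and no correct-rounding guarantee is implied.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of the range dispatch, not the LLVM libc code.
inline float expm1f_sketch(float x) {
  const float ax = std::fabs(x);
  const float tiny = std::ldexp(1.0f, -25);  // 2^-25
  const float small = std::ldexp(1.0f, -4);  // 2^-4

  if (ax < tiny) {
    // x + x^2 with one fused multiply-add, as described in the entry.
    return std::fma(x, x, x);
  }
  if (ax <= small) {
    // Stand-in for the degree-8 minimax polynomial (coefficients not shown).
    return static_cast<float>(std::expm1(static_cast<double>(x)));
  }
  // Stand-in for "expf implementation, then subtract 1.0 before rounding".
  return static_cast<float>(std::exp(static_cast<double>(x)) - 1.0);
}
```

In round-to-nearest the `x^2` term is far below half an ulp of `x` in the tiny range, so the result equals `x`; the fused operation presumably matters for the directed rounding modes, where the sign of the tiny correction decides the result.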
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1
# 9e7688c7	04-Feb-2022	Tue Ly <lntue@google.com>	[libc] Implement log1pf correctly rounded to all rounding modes. Implement log1pf correctly rounded to all rounding modes relying on logf implementation for exponent > 2^(-8). Reviewed By: sivachandra, zimmermann6 Differential Revision: https://reviews.llvm.org/D118962
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 63d2df00	15-Dec-2021	Tue Ly <lntue@google.com>	[libc] Implement correctly rounded log2f based on RLIBM library. Implement log2f based on RLIBM library correctly rounded for all rounding modes. Reviewed By: sivachandra, michaelrj, santoshn, jpl169, zimmermann6 Differential Revision: https://reviews.llvm.org/D115828