#
5ff3ff33 |
| 12-Jul-2024 |
Petr Hosek <phosek@google.com> |
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)
This is a part of #97655.
|
#
ce9035f5 |
| 12-Jul-2024 |
Mehdi Amini <joker.eph@gmail.com> |
Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
#
3f30effe |
| 11-Jul-2024 |
Petr Hosek <phosek@google.com> |
[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)
This is a part of #97655.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
bc7a3bd8 |
| 06-Nov-2023 |
lntue <35648136+lntue@users.noreply.github.com> |
[libc][math] Implement powf function correctly rounded to all rounding modes. (#71188)
We compute `pow(x, y)` using the formula
```
pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking
down into `hi + mid + lo` parts, in which `hi` parts are computed using
the exponent field directly, `mid` parts will use look-up tables, and
`lo` parts are approximated by polynomials.
We add some speedup for common use-cases:
```
pow(2, y) = exp2(y)
pow(10, y) = exp10(y)
pow(x, 2) = x * x
pow(x, 1/2) = sqrt(x)
pow(x, -1/2) = rsqrt(x) - to be added
```
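The identity and two of the shortcuts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the libc implementation: `pow_sketch` is a hypothetical helper, it works in double precision with round-to-nearest only, it handles only finite `x > 0`, and it replaces the `hi + mid + lo` table and polynomial scheme with a plain integer/fraction split of the exponent followed by a library `exp`.

```python
import math

def pow_sketch(x: float, y: float) -> float:
    """Illustrative pow(x, y) for finite x > 0 and finite y (hypothetical helper)."""
    if not (x > 0.0 and math.isfinite(x) and math.isfinite(y)):
        raise ValueError("sketch only handles finite x > 0 and finite y")
    # Shortcuts mirroring two of the special cases listed in the commit message.
    if y == 2.0:
        return x * x
    if y == 0.5:
        return math.sqrt(x)
    # General path: x^y = 2^(y * log2(x)).
    t = y * math.log2(x)
    hi = round(t)   # integer part, applied directly to the exponent
    lo = t - hi     # small remainder, |lo| <= 0.5
    # 2^t = 2^hi * 2^lo, with 2^lo evaluated as exp(lo * ln 2).
    return math.ldexp(math.exp(lo * math.log(2.0)), hi)
```

The sketch is not correctly rounded and does not treat overflow, underflow, or special values; it only shows how the exponent of the result is separated from the part that needs a real approximation.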
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
b6bc9d72 |
| 26-Sep-2023 |
Guillaume Chatelet <gchatelet@google.com> |
[libc] Mass replace enclosing namespace (#67032)
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
8ca614aa |
| 25-Aug-2023 |
Tue Ly <lntue@google.com> |
[libc][math] Implement double precision exp2 function correctly rounded for all rounding modes.
Implement double precision exp2 function correctly rounded for all rounding modes, using the same algorithm as the double precision exp function in https://reviews.llvm.org/D158551.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D158812
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
#
a68bbf42 |
| 08-May-2023 |
Tue Ly <lntue@google.com> |
[libc][math] Implement double precision log function correctly rounded to all rounding modes.
Implement double precision log function correctly rounded to all rounding modes.
See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
598.306
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925
-- LIBC latency -- with FMA
455.632
-- LIBC latency -- without FMA
488.564
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150131
|
#
a0c92a38 |
| 06-May-2023 |
Tue Ly <lntue@google.com> |
[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Make log10 correctly rounded for non-FMA targets and improve its performance.
Implemented fast pass and accurate pass:
**Fast Pass**:
- Range reduction step 0: Extract exponent and mantissa
```
x = 2^(e_x) * m_x
```
- Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
-2^-8 <= v = r * m_x - 1 < 2^-7
where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
and k = trunc( (m_x - 1) * 2^7 )
```
- Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
> P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
- Combine the results:
```
log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
- Perform additive Ziv's test with errors bounded by `P_ERR * v^2`. Return the result if Ziv's test passed.
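The fast-pass range reduction can be illustrated with a small Python sketch. It is a rough model under stated assumptions rather than the libc code: `log10_fast_sketch` is a hypothetical name, `r` is computed on the fly instead of being read from a 128-entry table, `math.log(r)` and `math.log1p(v)` stand in for the tabulated `log(r)` and the Sollya polynomial `v + v^2 * P(v)`, and Ziv's test is omitted.

```python
import math

def log10_fast_sketch(x: float):
    """Return (approximate log10(x), reduced argument v) for finite x > 0."""
    m, e = math.frexp(x)            # x = m * 2^e with 0.5 <= m < 1
    m_x, e_x = 2.0 * m, e - 1       # rescale so that 1 <= m_x < 2
    k = int((m_x - 1.0) * 128.0)    # k = trunc((m_x - 1) * 2^7), 0 <= k <= 127
    # r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7)); the ceiling is
    # evaluated in exact integer arithmetic as ceil(255 * 128 / (128 + k)).
    r = -((-255 * 128) // (128 + k)) / 256.0
    v = r * m_x - 1.0
    # math.log1p(v) stands in for the polynomial v + v^2 * P(v).
    result = (e_x * math.log(2.0) - math.log(r) + math.log1p(v)) * math.log10(math.e)
    return result, v
```

For example, `x = 10.0` gives `e_x = 3`, `m_x = 1.25`, `k = 32`, `r = 204/256`, and `v = -2^-8`, which sits exactly on the lower end of the stated interval.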
**Accurate Pass**:
- Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
- Perform 3 more range reduction steps:
  - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
and k = trunc( v * 2^14 + 0.5 ).
```
  - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
and k = trunc( v * 2^21 + 0.5 ).
```
  - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
and k = trunc( v * 2^28 + 0.5 ).
```
- Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
> P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
- Combine the results:
```
log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v)
```
- The combined results are computed using floating points of 128-bit precision.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
640.688
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354
-- LIBC latency -- with FMA
495.593
-- LIBC latency -- without FMA
504.143
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150014
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2 |
|
#
bc8e87ef |
| 07-Apr-2023 |
Tue Ly <lntue@google.com> |
[libc][math] Update range reduction step for logf and reduce its latency.
Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently.
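The exactness claim can be checked numerically with rational arithmetic. The sketch below rests on an assumption: this commit message does not list the reduction constants, so the code borrows the 8-bit `r` formula from the fast pass of the log10 commit (a0c92a38) elsewhere in this log. The helper names are hypothetical. With a 24-bit `m_x` and an 8-bit `r`, the product `r * m_x` needs at most 32 significant bits, and `v` is a multiple of `2^-31` smaller than `2^-7` in magnitude, so `v^2` needs fewer than 53 bits.

```python
from fractions import Fraction

def reduce_f32_mantissa(j: int):
    """Reduce m_x = 1 + j * 2^-23 (a float32 mantissa) and return (r, m_x, v)."""
    m_x = 1.0 + j * 2.0 ** -23                  # exactly representable in double
    k = j >> 16                                 # trunc((m_x - 1) * 2^7)
    # Assumed constant: r = 2^-8 * ceil(255 * 128 / (128 + k)), an 8-bit value.
    r = -((-255 * 128) // (128 + k)) / 256.0
    v = r * m_x - 1.0                           # plain double arithmetic, no FMA
    return r, m_x, v

def is_exact(j: int) -> bool:
    """True if both v and v*v were computed without any rounding error."""
    r, m_x, v = reduce_f32_mantissa(j)
    exact_v = Fraction(r) * Fraction(m_x) - 1
    return Fraction(v) == exact_v and Fraction(v * v) == exact_v ** 2
```

Calling `is_exact(j)` over a sample of mantissa indices `j` in `[0, 2^23)` is a quick way to confirm that neither the multiplication nor the squaring rounds under these assumed constants.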
Reviewed By: santoshn, zimmermann6
Differential Revision: https://reviews.llvm.org/D147755
|
#
9af8dca7 |
| 06-Apr-2023 |
Tue Ly <lntue.h@gmail.com> |
[libc][math] Update range reduction step for log10f and reduce its latency.
Simplify the range reduction steps by choosing the reduction constants carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the polynomial evaluations to be parallelized more efficiently.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D147676
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0 |
|
#
89ed5b7c |
| 28-Aug-2022 |
Kirill Okhotnikov <okir@google.com> |
[libc][math] Added auxiliary function log2_eval for asinhf/acoshf/atanhf.
1) Added a `double log2_eval(double)` function with better-than-float precision.
2) Refactored all auxiliary functions and their data into one place so the code can be reused.
3) Added tests for the new functions.
4) Performance and precision tests show that the function is more precise than the existing log2 (no exceptional cases), but its timing is ~5% higher than the current one.
Differential Revision: https://reviews.llvm.org/D132809
|
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
131dda9a |
| 01-Aug-2022 |
Tue Ly <lntue@google.com> |
[libc] Implement sincosf function correctly rounded to all rounding modes.
Refactor common range reductions and evaluations for sinf, cosf, and sincosf. Added exhaustive tests for sincosf.
Performance before the patch:
```
System LIBC reciprocal throughput : 30.205
LIBC reciprocal throughput : 30.533

System LIBC latency : 67.961
LIBC latency : 61.564
```
Performance after the patch:
```
System LIBC reciprocal throughput : 30.409
LIBC reciprocal throughput : 20.273

System LIBC latency : 67.527
LIBC latency : 61.959
```
Reviewed By: orex
Differential Revision: https://reviews.llvm.org/D130901
|
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
c78144e1 |
| 01-Jul-2022 |
Kirill Okhotnikov <okir@google.com> |
[libc][math] Improved performance of exp2f function.
New exp2 function algorithm:
1) Improved performance: 8.176 vs 15.270 by core-math perf tool.
2) Improved accuracy. Only two special values left.
3) Lookup table size reduced twice.
Differential Revision: https://reviews.llvm.org/D129005
|
#
15b9380d |
| 27-Jul-2022 |
Tue Ly <lntue@google.com> |
[libc] Change sinf range reduction to mod pi/16 to be shared with cosf.
Change `sinf` range reduction to mod pi/16 to be shared with `cosf`.
Previously, `sinf` used range reduction `mod pi`, but this cannot be used to implement `cosf` since the minimax algorithm for `cosf` does not converge due to critical points at `pi/2`. In order to be able to share the same range reduction functions for both `sinf` and `cosf`, we change the range reduction to `mod pi/16` for the following reasons:
- The table size is sufficiently small: 32 entries for `sin(k * pi/16)` with `k = 0..31`. It could be reduced to 16 entries if we treat the final sign separately, with an extra multiplication at the end.
- The polynomials' degrees are reduced to 7/8 from 15, with extra computations to combine `sin` and `cos` with trig sum equality.
- The number of exceptional cases reduced to 2 (with FMA) and 3 (without FMA).
- The latency is reduced while maintaining similar throughput as before.
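A minimal Python sketch of the `mod pi/16` scheme follows. It is illustrative only and makes several assumptions: `sin_sketch` and the table name are hypothetical, the arithmetic is in double precision, Taylor polynomials of degree 7 and 8 stand in for the minimax polynomials, and the naive reduction `x - k * pi/16` is only adequate for moderate `|x|`.

```python
import math

# 32-entry table of sin(k * pi/16) for k = 0..31, as described in the commit.
SIN_K_PI_OVER_16 = [math.sin(k * math.pi / 16.0) for k in range(32)]

def sin_sketch(x: float) -> float:
    """Illustrative sin(x) for moderate |x| using range reduction mod pi/16."""
    k = round(x * 16.0 / math.pi)      # nearest multiple of pi/16
    y = x - k * (math.pi / 16.0)       # reduced argument, |y| is about pi/32 at most
    y2 = y * y
    # Taylor polynomials used here as stand-ins for the minimax polynomials.
    sin_y = y * (1.0 + y2 * (-1.0 / 6 + y2 * (1.0 / 120 - y2 / 5040)))
    cos_y = 1.0 + y2 * (-0.5 + y2 * (1.0 / 24 + y2 * (-1.0 / 720 + y2 / 40320)))
    sin_k = SIN_K_PI_OVER_16[k % 32]
    cos_k = SIN_K_PI_OVER_16[(k + 8) % 32]   # cos(t) = sin(t + pi/2)
    # Angle-sum identity: sin(k*pi/16 + y) = sin_k * cos(y) + cos_k * sin(y).
    return sin_k * cos_y + cos_k * sin_y
```

Reading the cosine entry from the same table at index `k + 8` is what lets a single 32-entry table serve both `sin(k * pi/16)` and `cos(k * pi/16)`.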
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D130629
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
64af346b |
| 14-Mar-2022 |
Tue Ly <lntue@google.com> |
[libc] Implement expm1f function that is correctly rounded for all rounding modes.
Implement expm1f function that is correctly rounded for all rounding modes. This is based on expf implementation.
Exhaustive testing shows that using the expf implementation and subtracting 1.0 before rounding the final result to single precision gives correctly rounded results for all |x| > 2^-4, with 1 exception. When |x| < 2^-25, we use x + x^2 (implemented with a single fma). And for 2^-25 <= |x| <= 2^-4, we use a single degree-8 minimax polynomial generated by Sollya.
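The three-way split can be sketched in Python as below. This is a rough model under stated assumptions, not the libc code: `to_f32` and `expm1f_sketch` are hypothetical helpers, a degree-8 Taylor polynomial stands in for the Sollya minimax polynomial, `x + x * x` is evaluated in double precision instead of with a single-precision fma, and only round-to-nearest is modelled, so the sketch makes no claim of correct rounding.

```python
import math
import struct

def to_f32(v: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def expm1f_sketch(x: float) -> float:
    """Illustrative expm1 for single-precision inputs of moderate size."""
    x = to_f32(x)
    ax = abs(x)
    if ax < 2.0 ** -25:
        # Tiny inputs: x + x^2, as in the commit message.
        return to_f32(x + x * x)
    if ax <= 2.0 ** -4:
        # Degree-8 Taylor polynomial x * (1 + x/2 * (1 + x/3 * (... (1 + x/8)))).
        p = 1.0
        for n in range(8, 1, -1):
            p = 1.0 + (x / n) * p
        return to_f32(x * p)
    # Larger inputs: evaluate exp in higher precision, subtract 1, round once.
    return to_f32(math.exp(x) - 1.0)
```

The branch boundaries follow the thresholds quoted in the commit message; the sketch does not handle overflow for large positive `x`.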
Reviewed By: sivachandra, zimmermann6
Differential Revision: https://reviews.llvm.org/D121574
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1 |
|
#
9e7688c7 |
| 04-Feb-2022 |
Tue Ly <lntue@google.com> |
[libc] Implement log1pf correctly rounded to all rounding modes.
Implement log1pf correctly rounded to all rounding modes relying on logf implementation for exponent > 2^(-8).
Reviewed By: sivachandra, zimmermann6
Differential Revision: https://reviews.llvm.org/D118962
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
63d2df00 |
| 15-Dec-2021 |
Tue Ly <lntue@google.com> |
[libc] Implement correctly rounded log2f based on RLIBM library.
Implement log2f based on RLIBM library correctly rounded for all rounding modes.
Reviewed By: sivachandra, michaelrj, santoshn, jpl169, zimmermann6
Differential Revision: https://reviews.llvm.org/D115828
|