Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6

ca79ff07 | 14-Dec-2024 | Chandler Carruth <chandlerc@gmail.com>
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.
This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.

be2df95e | 09-Dec-2024 | Chandler Carruth <chandlerc@gmail.com>
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.
Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.
The largest pattern I identified was the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.
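A minimal C sketch of the two layouts this message compares, using three hypothetical builtin names. Under PIE, every element of a pointer array holds an absolute address that the dynamic loader must relocate at start-up; an offset array holds plain integers, so only the single table base is address-dependent. The actual patch builds its tables with `constexpr` C++ and the existing macros, so treat this as an illustration of the idea, not that code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointer-array layout: under PIE each element stores the address of a
 * string literal, so each entry needs its own dynamic relocation. */
static const char *const PointerNames[] = {"add", "sub", "mul"};

/* String-table layout: one blob of NUL-separated strings plus integer
 * offsets into it. The offsets are position-independent, so this array
 * needs no dynamic relocations. */
static const char NameTable[] = "add\0sub\0mul";
static const unsigned NameOffsets[] = {0, 4, 8};

/* Look up the i-th name by adding its offset to the table's base address. */
static const char *get_name(size_t i) {
  return NameTable + NameOffsets[i];
}
```

Each lookup costs one extra addition, which is negligible next to the relocations saved.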
This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:
```
FILE SIZE VM SIZE
-------------- --------------
+1.4% +653Ki +1.4% +653Ki .rodata
+0.0% +960 +0.0% +960 .text
+0.0% +197 +0.0% +197 .dynstr
+0.0% +184 +0.0% +184 .eh_frame
+0.0% +96 +0.0% +96 .dynsym
+0.0% +40 +0.0% +40 .eh_frame_hdr
+114% +32 [ = ] 0 [Unmapped]
+0.0% +20 +0.0% +20 .gnu.hash
+0.0% +8 +0.0% +8 .gnu.version
+0.9% +7 +0.9% +7 [LOAD #2 [R]]
[ = ] 0 -75.4% -3.00Ki .relro_padding
-16.1% -802Ki -16.1% -802Ki .data.rel.ro
-27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn
-1.6% -2.66Mi -1.6% -2.66Mi TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those relocations are stored.
This is also visible in my benchmarking of binary start-up overhead at
least:
```
Benchmark 1: ./old_clang --version
Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms]
Range (min … max): 14.2 ms … 22.8 ms 162 runs
Benchmark 2: ./new_clang --version
Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms]
Range (min … max): 12.4 ms … 20.3 ms 216 runs
Summary
'./new_clang --version' ran
1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.
----
This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.
I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into a set of consistent patterns that avoids a
significant regression in overall boilerplate.

Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1

0013f94b | 19-Sep-2024 | Simon Pilgrim <llvm-dev@redking.me.uk>
[clang][powerpc][wasm][systemz][x86] Replace target vector popcount intrinsics with __builtin_elementwise_popcount (#109160)
Now that we have the C/C++ `__builtin_elementwise_popcount` intrinsic (#108121) - remove custom target intrinsics that just immediately map to Intrinsic::ctpop and use the generic intrinsic directly.

Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3

64510c14 | 07-Aug-2024 | Lei Huang <lei@ca.ibm.com>
[PPC] Implement BCD assist builtins (#101390)
Implement BCD assist builtins for XL and GCC compatibility.
GCC compat:
```
unsigned int __builtin_cdtbcd (unsigned int);
unsigned int __builtin_cbcdtd (unsigned int);
unsigned int __builtin_addg6s (unsigned int, unsigned int);
```
64-bit XL compat:
```
long long __cdtbcd (long long);
long long __cbcdtd (long long);
long long __addg6s (long long source1, long long source2);
```

Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

85071a3c | 15-Jan-2024 | Qiu Chaofan <qiucofan@cn.ibm.com>
[PowerPC] Implement fence builtin (#76495)

Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4

de7c0068 | 26-Oct-2023 | Qiu Chaofan <qiucofan@cn.ibm.com>
[PowerPC] Fix use of FPSCR builtins in smmintrin.h (#67299)
smmintrin.h uses __builtin_mffs, __builtin_mffsl, __builtin_mtfsf and
__builtin_set_fpscr_rn. This patch replaces those uses with the ppc-prefixed
versions and implements the missing ones.

Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4

082c5d7f | 05-Sep-2023 | Qiu Chaofan <qiucofan@cn.ibm.com>
[PowerPC] Implement builtin for mffsl
mffsl is available since ISA 3.0. The builtin is named with the ppc prefix to follow our convention. For targets earlier than Power9, GCC generates extra code to support the functionality; this patch does not implement that behavior.
Reviewed By: nemanjai, tuliom
Differential Revision: https://reviews.llvm.org/D158065

Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4

fa1f88cd | 10-May-2023 | Qiu Chaofan <qiucofan@cn.ibm.com>
Reland "[PowerPC] Add target feature requirement to builtins"
This relands D143467 after fixing build failure with GCC.

af88d34f | 08-May-2023 | Vitaly Buka <vitalybuka@google.com>
Revert "[PowerPC] Add target feature requirement to builtins"
Breaks PPC bots, see D143467.
This reverts commit 651b0e2e7afca926c3d4f8d7f988db40b9832676.

651b0e2e | 08-May-2023 | Qiu Chaofan <qiucofan@cn.ibm.com>
[PowerPC] Add target feature requirement to builtins
Clang has a mechanism to specify the required target features of a built-in function. This patch adds such definitions to Altivec, VSX, HTM, PairedVec and MMA builtins.
This will help the frontend detect incompatible target features of builtins when using the target attribute syntax.
Reviewed By: nemanjai, kamaub
Differential Revision: https://reviews.llvm.org/D143467

Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3

6897dbc4 | 14-Oct-2022 | Stefan Pintilie <stefanp@ca.ibm.com>
[PowerPC] Fix parameters for __builtin_crypto_vsbox
The documentation specifies that the input and output for the builtin __builtin_crypto_vsbox should be vector unsigned char.
This patch fixes this type for the builtin.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D135834

0e2e1fc9 | 06-Oct-2022 | Stefan Pintilie <stefanp@ca.ibm.com>
[PowerPC] Fix types for vcipher builtins.
The documentation specifies that the parameters for the vcipher builtins are `vector unsigned char`. The code used `vector unsigned long long`.
This patch fixes the types for the vcipher builtins.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D135300

Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6

a9ddb7d5 | 16-Jun-2022 | Maryam Moghadas <maryammo@ca.ibm.com>
[PowerPC] Fixing implicit castings in altivec for -fno-lax-vector-conversions
XL considers different vector types to be incompatible with each other. For example, assignments between variables of types vector float and vector long long, or even vector signed int and vector unsigned int, are diagnosed. Clang, however, does not diagnose such cases and does a simple bitcast between the two types. This could easily result in program errors. This patch fixes the implicit casts in altivec.h so that there are no incompatible vector type errors with -fno-lax-vector-conversions. It is the prerequisite patch for switching the default to -fno-lax-vector-conversions later.
Reviewed By: nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D124093

Revision tags: llvmorg-14.0.5

10139674 | 01-Jun-2022 | Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
[PowerPC] Remove const from paired vector store builtins
For some reason, we implemented the xx_stxvp intrinsics to require a const pointer. This absolutely doesn't make sense for a store. Remove the const from the definition.

Revision tags: llvmorg-14.0.4

c35ca3a1 | 19-May-2022 | Amy Kwan <amy.kwan1@ibm.com>
[PowerPC] Implement XL compat __fnabs and __fnabss builtins.
This patch implements the following floating-point negative absolute value builtins, which are required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit:
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
Differential Revision: https://reviews.llvm.org/D125506
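To make the semantics concrete, here is a portable C reference model of what the two builtins compute: the negated absolute value, so the result always has its sign bit set. The `_ref` names are hypothetical; on PowerPC you would call `__fnabs`/`__fnabss` directly and let the compiler choose `fnabs` or `xsnabsdp` as described above.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical portable model of __fnabs: -|x| for doubles. */
static double fnabs_ref(double x) { return -fabs(x); }

/* Hypothetical portable model of __fnabss: -|x| for floats. */
static float fnabss_ref(float x) { return -fabsf(x); }
```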

Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1

b389354b | 06-Apr-2022 | Ting Wang <Ting.Wang.SH@ibm.com>
[Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend
Add support for builtin_[max|min], which has the following prototype: `A builtin_max (A1, A2, A3, ...)`. All arguments must have the same type; they must all be float, double, or long double. Internally, SelectCC is used to get the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478
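A portable C sketch of the variadic max semantics, assuming every argument is a `double`. The function name and the explicit count parameter are hypothetical: standard C variadics cannot discover how many arguments were passed, whereas the builtin needs no count. Each step is a compare-and-select, similar in spirit to the SelectCC the commit mentions; NaN handling is deliberately not modeled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical model: returns the largest of n (n >= 1) double arguments.
 * Callers must pass genuine doubles (e.g. 1.0, not 1). */
static double max_of_doubles(size_t n, ...) {
  va_list ap;
  va_start(ap, n);
  double result = va_arg(ap, double);
  for (size_t i = 1; i < n; ++i) {
    double next = va_arg(ap, double);
    result = (next > result) ? next : result; /* compare, then select */
  }
  va_end(ap);
  return result;
}
```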

Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2

0850655d | 30-Nov-2021 | Tarique Islam <tislam@ca.ibm.com>
Big-endian version of vpermxor
A big-endian version of vpermxor, named vpermxor_be, is added to LLVM and Clang. vpermxor_be can be called directly on both the little-endian and the big-endian platforms.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114540
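A scalar C model of what vpermxor computes, written from the Power ISA description of the instruction. It uses big-endian byte numbering (byte 0 is the leftmost, most significant byte of the register), as the ISA does; how `vpermxor_be` and the plain builtin map this onto little-endian element order is not modeled. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of vpermxor over 16-byte vectors in big-endian byte
 * order. For each result byte i, the high nibble of c[i] selects a byte of
 * a, the low nibble selects a byte of b, and the two bytes are XORed. */
static void vpermxor_model(const uint8_t a[16], const uint8_t b[16],
                           const uint8_t c[16], uint8_t out[16]) {
  for (int i = 0; i < 16; ++i) {
    uint8_t index_a = c[i] >> 4;   /* bits 0:3 of byte i (ISA numbering) */
    uint8_t index_b = c[i] & 0x0F; /* bits 4:7 of byte i */
    out[i] = a[index_a] ^ b[index_b];
  }
}
```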

dc1aa8ea | 24-Nov-2021 | Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
[PowerPC] Add missed clang portion of c933c2eb3346
The clang portion of c933c2eb3346 was missed as I made some kind of mistake squashing the commits with git. This patch just adds those.
The original review: https://reviews.llvm.org/D114088

Revision tags: llvmorg-13.0.1-rc1

741aeda9 | 03-Nov-2021 | Qiu Chaofan <qiucofan@cn.ibm.com>
[PowerPC] Implement longdouble pack/unpack builtins
Implement two builtins to pack/unpack IBM extended-precision long double values, according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112055
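The IBM extended-precision long double is a "double-double": the value is the unevaluated sum of a high-order and a low-order `double`, which is why packing takes two doubles and unpacking takes an index selecting one of them. A C model of that representation follows; the struct and function names are hypothetical and are not the builtins' real names or signatures.

```c
#include <assert.h>

/* Hypothetical model of the double-double layout: value = hi + lo. */
typedef struct {
  double hi; /* high-order part */
  double lo; /* low-order part */
} double_double;

/* Model of packing: combine two doubles into one extended value. */
static double_double pack_model(double hi, double lo) {
  double_double result = {hi, lo};
  return result;
}

/* Model of unpacking: index 0 yields the high part, index 1 the low part. */
static double unpack_model(double_double x, int index) {
  return index == 0 ? x.hi : x.lo;
}
```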

89ec99c7 | 06-Oct-2021 | Kamau Bridgeman <kamau.bridgeman@ibm.com>
[PowerPC][Builtin] Allowing __rlwnm to accept a variable as a shift parameter
The builtin __rlwnm is currently constrained to accept only constants for the shift parameter, but the instructions emitted for it have no such constraint. This patch allows the builtin to accept a variable shift.
Reviewed By: NeHuang, amyk
Differential Revision: https://reviews.llvm.org/D111229
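Why a variable shift is fine: rlwnm rotates the 32-bit word left by the low 5 bits of a register operand and then ANDs the result with a mask, so any runtime value is a valid rotate amount. A portable C model, assuming the XL-style argument order `(rs, shift, mask)`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, using only the low 5 bits of n. */
static uint32_t rotl32(uint32_t x, uint32_t n) {
  n &= 31;
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Hypothetical model of __rlwnm: rotate left by a (possibly runtime)
 * amount, then AND with the mask. */
static uint32_t rlwnm_model(uint32_t rs, uint32_t shift, uint32_t mask) {
  return rotl32(rs, shift) & mask;
}
```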

4fc2f497 | 29-Sep-2021 | Stefan Pintilie <stefanp@ca.ibm.com>
[PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
This patch fixes the return value of the builtin __builtin_ppc_load2r to correctly return short instead of int.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110771

29bb8774 | 30-Sep-2021 | Albion Fung <albionapc@gmail.com>
[PowerPC] Fix lharx and lbarx builtin signatures
The signatures for the PowerPC builtins lharx and lbarx are incorrect and cause issues when used in a function that requires the return value of the builtin to be promoted. This patch fixes these signatures.
Differential revision: https://reviews.llvm.org/D110273

Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init

67a3d1e2 | 22-Jul-2021 | Quinn Pham <quinn.pham@ibm.com>
[PowerPC] swdiv builtins for XL compatibility
This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch implements the software divide builtin as wrappers for a floating point divide. XL provided these builtins because it didn't produce software estimates by default at `-Ofast`. When compiled with `-Ofast` these builtins will produce the software estimate for divide.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D106959
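A sketch of the general "software estimate" technique for division, under stated assumptions: start from a low-precision reciprocal estimate of `b` (here a classic linear approximation, standing in for a hardware estimate instruction), refine it with Newton-Raphson steps that roughly double the correct bits each time, then multiply by `a`. This is not the instruction sequence LLVM emits, and it ignores zero, infinite, NaN and subnormal divisors.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of an estimate-and-refine divide a / b. */
static double swdiv_sketch(double a, double b) {
  int exp;
  double m = frexp(fabs(b), &exp);             /* |b| = m * 2^exp, m in [0.5, 1) */
  double x = 48.0 / 17.0 - (32.0 / 17.0) * m;  /* ~4-bit estimate of 1/m */
  for (int i = 0; i < 4; ++i)
    x = x * (2.0 - m * x);                     /* Newton-Raphson step for 1/m */
  double recip = copysign(ldexp(x, -exp), b);  /* 1/b = (1/m) * 2^-exp, signed */
  return a * recip;
}
```

Four refinement steps suffice here because the initial relative error is at most 1/17 and each step squares it, which quickly falls below double precision.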

09b67aa1 | 29-Sep-2021 | Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
[PowerPC] Implement builtin for vbpermd
The instruction has similar semantics to vbpermq but for doublewords. It was added in Power9 and the ABI documents the builtin.
Differential revision: https://reviews.llvm.org/D107899

70391b34 | 08-Sep-2021 | Quinn Pham <Quinn.Pham@ibm.com>
[PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins for the compare exponent and test data class operations on floating-point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
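"Compare exponent" compares only the biased exponent fields of two doubles, ignoring sign and significand. A portable C model of an equality variant follows; the function names are hypothetical, and returning 0 when either operand is a NaN (treating the comparison as unordered) is an assumption of this sketch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Extract the 11-bit biased exponent field of an IEEE-754 double. */
static unsigned biased_exponent(double x) {
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);
  return (unsigned)((bits >> 52) & 0x7FF);
}

/* Hypothetical "exponents equal" model: compares raw exponent fields only.
 * NaN operands are treated as unordered and yield 0 (an assumption). */
static int compare_exp_eq_model(double a, double b) {
  if (isnan(a) || isnan(b))
    return 0;
  return biased_exponent(a) == biased_exponent(b);
}
```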