Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
ca79ff07 |
| 14-Dec-2024 |
Chandler Carruth <chandlerc@gmail.com> |
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.
This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
show more ...
|
#
be2df95e |
| 09-Dec-2024 |
Chandler Carruth <chandlerc@gmail.com> |
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocation
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.
Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.
The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.
This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:
```
FILE SIZE VM SIZE
-------------- --------------
+1.4% +653Ki +1.4% +653Ki .rodata
+0.0% +960 +0.0% +960 .text
+0.0% +197 +0.0% +197 .dynstr
+0.0% +184 +0.0% +184 .eh_frame
+0.0% +96 +0.0% +96 .dynsym
+0.0% +40 +0.0% +40 .eh_frame_hdr
+114% +32 [ = ] 0 [Unmapped]
+0.0% +20 +0.0% +20 .gnu.hash
+0.0% +8 +0.0% +8 .gnu.version
+0.9% +7 +0.9% +7 [LOAD #2 [R]]
[ = ] 0 -75.4% -3.00Ki .relro_padding
-16.1% -802Ki -16.1% -802Ki .data.rel.ro
-27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn
-1.6% -2.66Mi -1.6% -2.66Mi TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.
This is also visible in my benchmarking of binary start-up overhead at
least:
```
Benchmark 1: ./old_clang --version
Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms]
Range (min … max): 14.2 ms … 22.8 ms 162 runs
Benchmark 2: ./new_clang --version
Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms]
Range (min … max): 12.4 ms … 20.3 ms 216 runs
Summary
'./new_clang --version' ran
1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.
----
This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.
I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.
show more ...
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
b55186ee |
| 04-Sep-2024 |
Alex Rønne Petersen <alex@alexrp.com> |
[clang][Driver] Define soft float macros for PPC. (#106012)
Fixes #105972.
Co-authored-by: Qiu Chaofan <qcf@ecnelises.com>
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
3e713575 |
| 08-Aug-2024 |
Zaara Syeda <syzaara@ca.ibm.com> |
[PPC] Disable vsx and altivec when -msoft-float is used (#100450)
We emit an error when -msoft-float and -maltivec/-mvsx is used together,
but when -msoft-float is used on its own, there is still +
[PPC] Disable vsx and altivec when -msoft-float is used (#100450)
We emit an error when -msoft-float and -maltivec/-mvsx is used together,
but when -msoft-float is used on its own, there is still +altivec and
+vsx in the IR attributes. This patch disables altivec and vsx and all
related sub features when -msoft-float is used.
show more ...
|
#
64510c14 |
| 07-Aug-2024 |
Lei Huang <lei@ca.ibm.com> |
[PPC] Implement BCD assist builtins (#101390)
Implement BCD assist builtins for XL and GCC compatibility.
GCC compat:
```
unsigned int __builtin_cdtbcd (unsigned int);
unsigned int __builtin_c
[PPC] Implement BCD assist builtins (#101390)
Implement BCD assist builtins for XL and GCC compatibility.
GCC compat:
```
unsigned int __builtin_cdtbcd (unsigned int);
unsigned int __builtin_cbcdtd (unsigned int);
unsigned int __builtin_addg6s (unsigned int, unsigned int);
```
64BIT XL compat:
```
long long __cdtbcd (long long);
long long __cbcdtd (long long);
long long __addg6s (long long source1, long long source2)
```
show more ...
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1 |
|
#
25482b35 |
| 25-Jul-2024 |
Chen Zheng <czhengsz@cn.ibm.com> |
[PowerPC] add TargetParser for PPC target (#97541)
For now only focus on the CPU type, will work on the CPU features part
later.
With the CPU handling in TargetParser, clang and llc/opt are able
[PowerPC] add TargetParser for PPC target (#97541)
For now only focus on the CPU type, will work on the CPU features part
later.
With the CPU handling in TargetParser, clang and llc/opt are able to
query common interfaces.
So we can set same default CPU and CPU features with same interfaces.
show more ...
|
#
1df4d866 |
| 23-Jul-2024 |
azhan92 <alisonxzhang@gmail.com> |
[PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511)
This PR adds support for -mcpu=pwr11/power11 and -mtune=pwr11/power11 in
clang and llvm.
|
Revision tags: llvmorg-20-init |
|
#
afd0e6d0 |
| 08-Jul-2024 |
Chen Zheng <czhengsz@cn.ibm.com> |
[PowerPC] Diagnose musttail instead of crash inside backend (#93267)
musttail is not often possible to be generated on PPC targets as when
calling to a function defined in another module, PPC needs
[PowerPC] Diagnose musttail instead of crash inside backend (#93267)
musttail is not often possible to be generated on PPC targets as when
calling to a function defined in another module, PPC needs to restore
the TOC pointer. To restore the TOC pointer, compiler needs to emit a
nop after the call to let linker generate codes to restore TOC pointer.
Tail call cannot generate expected call sequence for this case.
To avoid the crash inside the compiler backend, a diagnosis is added in
the frontend.
Fixes #63214
show more ...
|
#
6a992bc8 |
| 03-Jul-2024 |
Chen Zheng <czhengsz@cn.ibm.com> |
[PowerPC] refactor CPU info in PPCTargetParser.def, NFC
CPU features will be done in follow up patches.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
ea126aeb |
| 09-May-2024 |
Felix (Ting Wang) <Ting.Wang.SH@ibm.com> |
[PowerPC] Tune AIX shared library TLS model at function level (#84132)
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(
[PowerPC] Tune AIX shared library TLS model at function level (#84132)
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
show more ...
|
#
d4a25976 |
| 02-May-2024 |
zhijian lin <zhijian@ca.ibm.com> |
Implement a subset of builtin_cpu_supports() features (#82809)
The PR implements a subset of features of function
__builtin_cpu_support() for AIX OS based on the information which AIX
kernel runti
Implement a subset of builtin_cpu_supports() features (#82809)
The PR implements a subset of features of function
__builtin_cpu_support() for AIX OS based on the information which AIX
kernel runtime variable `_system_configuration` and function call `getsystemcfg()` of
/usr/include/sys/systemcfg.h in AIX OS can provide.
Following subset of features are supported in the PR
"arch_3_00", "arch_3_1","booke","cellbe","darn","dfp","dscr" ,"ebb","efpsingle","efpdouble","fpu","htm","isel",
"mma","mmu","pa6t","power4","power5","power5+","power6x","ppc32","ppc601","ppc64","ppcle","smt",
"spe","tar","true_le","ucache","vsx"
show more ...
|
Revision tags: llvmorg-18.1.5 |
|
#
16efd2a4 |
| 23-Apr-2024 |
Felix (Ting Wang) <Ting.Wang.SH@ibm.com> |
[AIX][TLS][clang] Add -maix-small-local-dynamic-tls clang option (#88829)
This patch adds the clang portion of an AIX-specific option to inform
the
compiler that it can use a faster access sequen
[AIX][TLS][clang] Add -maix-small-local-dynamic-tls clang option (#88829)
This patch adds the clang portion of an AIX-specific option to inform
the
compiler that it can use a faster access sequence for the local-dynamic
TLS model (formally named aix-small-local-dynamic-tls).
This patch mainly references Amy's work on small local-exec TLS support.
show more ...
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
5b8e5604 |
| 22-Feb-2024 |
zhijian lin <zhijian@ca.ibm.com> |
[AIX] Lower intrinsic __builtin_cpu_is into AIX platform-specific code. (#80069)
On AIX OS, __builtin_cpu_is() references the runtime external variable
_system_configuration from /usr/include/sys/s
[AIX] Lower intrinsic __builtin_cpu_is into AIX platform-specific code. (#80069)
On AIX OS, __builtin_cpu_is() references the runtime external variable
_system_configuration from /usr/include/sys/systemcfg.h.
ref issue: https://github.com/llvm/llvm-project/issues/80042
show more ...
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
d5fe1bd0 |
| 26-Jan-2024 |
Amy Kwan <amy.kwan1@ibm.com> |
[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252)
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang,
[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252)
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.
This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
used for variables that may be replaced.
Note, that on AIX, data sections is turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
show more ...
|
#
67c1c1db |
| 26-Jan-2024 |
Nemanja Ivanovic <nemanja.i.ibm@gmail.com> |
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to suppo
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
show more ...
|
#
4792f912 |
| 26-Jan-2024 |
Chen Zheng <czhengsz@cn.ibm.com> |
[PowerPC] Diagnose invalid combination with Altivec, VSX and soft-float (#79109)
Moved from https://reviews.llvm.org/D126302
The current behaviour with these three options is quite undesirable:
[PowerPC] Diagnose invalid combination with Altivec, VSX and soft-float (#79109)
Moved from https://reviews.llvm.org/D126302
The current behaviour with these three options is quite undesirable:
-mno-altivec -mvsx allows VSX to override no Altivec, thereby turning on
both
-msoft-float -maltivec causes a crash if an actual Altivec instruction
is required because soft float turns of Altivec
-msoft-float -mvsx is also accepted with both Altivec and VSX turned off
(potentially causing crashes as above)
This patch diagnoses these impossible combinations in the driver so the
user does not end up with surprises in terms of their options being
ignored or silently overridden.
Fixes https://github.com/llvm/llvm-project/issues/55556
---------
Co-authored-by: Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
show more ...
|
Revision tags: llvmorg-19-init |
|
#
85071a3c |
| 15-Jan-2024 |
Qiu Chaofan <qiucofan@cn.ibm.com> |
[PowerPC] Implement fence builtin (#76495)
|
Revision tags: llvmorg-17.0.6 |
|
#
d572c4cd |
| 20-Nov-2023 |
Qiu Chaofan <qiucofan@cn.ibm.com> |
[PowerPC] Disable float128 on AIX in Clang (#67298)
PowerPC AIX backend does not support float128 at all. Diagnose even when
specifying -mfloat128 to avoid backend crash.
---------
Co-authore
[PowerPC] Disable float128 on AIX in Clang (#67298)
PowerPC AIX backend does not support float128 at all. Diagnose even when
specifying -mfloat128 to avoid backend crash.
---------
Co-authored-by: Kai Luo <gluokai@gmail.com>
show more ...
|
Revision tags: llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
de7c0068 |
| 26-Oct-2023 |
Qiu Chaofan <qiucofan@cn.ibm.com> |
[PowerPC] Fix use of FPSCR builtins in smmintrin.h (#67299)
smmintrin.h uses __builtin_mffs, __builtin_mffsl, __builtin_mtfsf and
__builtin_set_fpscr_rn. This patch replaces the uses with ppc prefi
[PowerPC] Fix use of FPSCR builtins in smmintrin.h (#67299)
smmintrin.h uses __builtin_mffs, __builtin_mffsl, __builtin_mtfsf and
__builtin_set_fpscr_rn. This patch replaces the uses with ppc prefix
and implement the missing ones.
show more ...
|
Revision tags: llvmorg-17.0.3 |
|
#
c661c4f5 |
| 12-Oct-2023 |
Chen Zheng <czhengsz@cn.ibm.com> |
[AIX] recognize vsr in inline asm for AIX (#68476)
Extend `PPCTargetInfo::getGCCAddlRegNames()` to aix as well. The
definition should be common between Linux PPC and AIX PPC.
|
#
635eb5f3 |
| 07-Oct-2023 |
Hubert Tong <hubert-reinterpretcast@users.noreply.github.com> |
[clang][NFC] Typo fix in PPC.cpp
s/Definitoin/Definition/
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
b1922e55 |
| 07-Sep-2023 |
Amy Kwan <amy.kwan1@ibm.com> |
[AIX][TLS][clang] Add -maix-small-local-exec-tls clang option.
This patch adds the clang portion of an AIX-specific option to inform the compiler that it can use a faster access sequence for the loc
[AIX][TLS][clang] Add -maix-small-local-exec-tls clang option.
This patch adds the clang portion of an AIX-specific option to inform the compiler that it can use a faster access sequence for the local-exec TLS model (formally named aix-small-local-exec-tls).
This patch only adds the frontend portion of the option, building upon:
Backend portion of the option (D156203) Backend patch that utilizes this option to actually produce the faster access sequence (D155600)
Differential Revision: https://reviews.llvm.org/D155544
show more ...
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
f6d557ee |
| 26-Jun-2023 |
Nikolas Klauser <nikolasklauser@berlin.de> |
[clang][NFC] Remove trailing whitespaces and enforce it in lib, include and docs
A lot of editors remove trailing whitespaces. This patch removes any trailing whitespaces and makes sure that no new
[clang][NFC] Remove trailing whitespaces and enforce it in lib, include and docs
A lot of editors remove trailing whitespaces. This patch removes any trailing whitespaces and makes sure that no new ones are added.
Reviewed By: erichkeane, paulkirth, #libc, philnik
Spies: wangpc, aheejin, MaskRay, pcwang-thead, cfe-commits, libcxx-commits, dschuff, nemanjai, arichardson, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, s.egerton, sameer.abuasal, apazos, luismarques, martong, frasercrmck, steakhal, luke
Differential Revision: https://reviews.llvm.org/D151963
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
#
fa1f88cd |
| 10-May-2023 |
Qiu Chaofan <qiucofan@cn.ibm.com> |
Reland "[PowerPC] Add target feature requirement to builtins"
This relands D143467 after fixing build failure with GCC.
|
#
af88d34f |
| 08-May-2023 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[PowerPC] Add target feature requirement to builtins"
Breaks PPC bots, see D143467.
This reverts commit 651b0e2e7afca926c3d4f8d7f988db40b9832676.
|