Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
8e659401 |
| 08-Jan-2025 |
Alexandros Lamprineas <alexandros.lamprineas@arm.com> |
[FMV][AArch64] Simplify version selection according to ACLE. (#121921)
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/a
[FMV][AArch64] Simplify version selection according to ACLE. (#121921)
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
show more ...
|
#
a774adb0 |
| 05-Jan-2025 |
Chandler Carruth <chandlerc@gmail.com> |
Bulk port 64-bit x86 builtins to TableGen (#121043)
This PR follows https://github.com/llvm/llvm-project/pull/120831 for
x86-64.
Similar to that PR, this does a very mechanical port of X86 built
Bulk port 64-bit x86 builtins to TableGen (#121043)
This PR follows https://github.com/llvm/llvm-project/pull/120831 for
x86-64.
Similar to that PR, this does a very mechanical port of X86 builtins to
TableGen. There is a *lot* of improvement available here to use TableGen
more effectively and collapse repeated structures. But those can now be
follow-up PRs that restructure *within* the `.td` file.
The current structure produces a file that exactly matches the original
X-macros except for the differences outlined in
https://github.com/llvm/llvm-project/pull/120831:
- Horizontal whitespace
- `long long` types now use `long long` outside of OpenCL, but switch to
`long` in OpenCL where relevant.
Otherwise, only the order of builtins change, and no tests regress.
show more ...
|
#
2529a8df |
| 04-Jan-2025 |
Chandler Carruth <chandlerc@gmail.com> |
Mechanically port bulk of x86 builtins to TableGen (#120831)
The goal is to make incremental (if small) progress towards fully
TableGen'ed builtins, and to unblock #120534 by gaining access to more
Mechanically port bulk of x86 builtins to TableGen (#120831)
The goal is to make incremental (if small) progress towards fully
TableGen'ed builtins, and to unblock #120534 by gaining access to more
powerful TableGen-based representations.
The bulk `.td` file addition was generated with the help of a very rough
Python script. That script made no attempt to be robust or reusable, it
specifically handled only the cases in the X86 `.def` file.
Four entries from the `.def` file were not handled automatically as they
used `BUILTIN` rather than `TARGET_BUILTIN`. These were ported by hand
to an empty-feature `TargetBuiltin` entry, which seems like a better
match.
For all the automatically ported entries, the results were compared by
sorting and diffing the `.def` file and the generated `.inc` file. The
only differences were:
- Different horizontal whitespace
- Additional entries that had already been ported to the `.td` file.
- More systematically using `Oi` instead of `LLi` for the type `long
long int` in the fully general `__builtin_ia32_...` builtins for OpenCL
support. The `.def` file was only partially moved to this it seems, and
the systematic migration has updated a few missed builtins.
show more ...
|
Revision tags: llvmorg-19.1.6 |
|
#
ca79ff07 |
| 14-Dec-2024 |
Chandler Carruth <chandlerc@gmail.com> |
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734
There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.
This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
show more ...
|
#
be2df95e |
| 09-Dec-2024 |
Chandler Carruth <chandlerc@gmail.com> |
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocation
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.
Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.
The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.
This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:
```
FILE SIZE VM SIZE
-------------- --------------
+1.4% +653Ki +1.4% +653Ki .rodata
+0.0% +960 +0.0% +960 .text
+0.0% +197 +0.0% +197 .dynstr
+0.0% +184 +0.0% +184 .eh_frame
+0.0% +96 +0.0% +96 .dynsym
+0.0% +40 +0.0% +40 .eh_frame_hdr
+114% +32 [ = ] 0 [Unmapped]
+0.0% +20 +0.0% +20 .gnu.hash
+0.0% +8 +0.0% +8 .gnu.version
+0.9% +7 +0.9% +7 [LOAD #2 [R]]
[ = ] 0 -75.4% -3.00Ki .relro_padding
-16.1% -802Ki -16.1% -802Ki .data.rel.ro
-27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn
-1.6% -2.66Mi -1.6% -2.66Mi TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.
This is also visible in my benchmarking of binary start-up overhead at
least:
```
Benchmark 1: ./old_clang --version
Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms]
Range (min … max): 14.2 ms … 22.8 ms 162 runs
Benchmark 2: ./new_clang --version
Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms]
Range (min … max): 12.4 ms … 20.3 ms 216 runs
Summary
'./new_clang --version' ran
1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.
----
This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.
I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.
show more ...
|
#
ea6cdb9a |
| 03-Dec-2024 |
Matthias Braun <matze@braunis.de> |
allow prefer 256 bit attribute target (#117092)
This allows
`__attribute__((target("prefer-256-bit")))` /
`__attribute__((target("no-prefer-256-bit")))` to create variants of a
functions with 256
allow prefer 256 bit attribute target (#117092)
This allows
`__attribute__((target("prefer-256-bit")))` /
`__attribute__((target("no-prefer-256-bit")))` to create variants of a
functions with 256/512 bit vector sizes within the same application.
show more ...
|
Revision tags: llvmorg-19.1.5 |
|
#
b869f1bd |
| 28-Nov-2024 |
Jie Fu <jiefu@tencent.com> |
[clang] Remove unused lambda capture (NFC)
/llvm-project/clang/lib/Basic/Targets/X86.cpp:1368:23: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] auto getPriority = [thi
[clang] Remove unused lambda capture (NFC)
/llvm-project/clang/lib/Basic/Targets/X86.cpp:1368:23: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] auto getPriority = [this](StringRef Feature) -> unsigned { ^~~~ 1 error generated.
show more ...
|
#
88c2af80 |
| 28-Nov-2024 |
Alexandros Lamprineas <alexandros.lamprineas@arm.com> |
[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257)
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResol
[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257)
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.
I am moving the FMV priority logic in TargetInfo, so that it can be
implemented by the TargetParser which then makes it possible to query it
from llvm. Here is an example why this is handy:
https://github.com/llvm/llvm-project/pull/87939
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
97836bed |
| 18-Nov-2024 |
Freddy Ye <freddy.ye@intel.com> |
Reland "[X86] Support -march=diamondrapids (#113881)" (#116564)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
90e92239 |
| 18-Nov-2024 |
Freddy Ye <freddy.ye@intel.com> |
Revert "[X86] Support -march=diamondrapids (#113881)" (#116563)
This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.
|
#
826b845c |
| 18-Nov-2024 |
Freddy Ye <freddy.ye@intel.com> |
[X86] Support -march=diamondrapids (#113881)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
f77101ea |
| 12-Nov-2024 |
Malay Sanghi <malay.sanghi@intel.com> |
[X86][AMX] Support AMX-MOVRS (#115151)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
eddb79d5 |
| 11-Nov-2024 |
Feng Zou <feng.zou@intel.com> |
[X86][AMX] Support AMX-TF32 (#115625)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
8f440137 |
| 09-Nov-2024 |
Phoebe Wang <phoebe.wang@intel.com> |
Reland "[X86][AMX] Support AMX-AVX512" (#115581)
Resolve compile fail without SSE2.
|
#
ff225154 |
| 09-Nov-2024 |
Alan Zhao <ayzhao@google.com> |
Revert "[X86][AMX] Support AMX-AVX512" (#115570)
Reverts llvm/llvm-project#114070
Reason: Causes `immintrin.h` to fail to compile if `-msse` and
`-mno-sse2` are passed to clang:
https://github.
Revert "[X86][AMX] Support AMX-AVX512" (#115570)
Reverts llvm/llvm-project#114070
Reason: Causes `immintrin.h` to fail to compile if `-msse` and
`-mno-sse2` are passed to clang:
https://github.com/llvm/llvm-project/pull/114070#issuecomment-2465926700
show more ...
|
#
58a17e1b |
| 08-Nov-2024 |
Phoebe Wang <phoebe.wang@intel.com> |
[X86][AMX] Support AMX-AVX512 (#114070)
|
#
c72a751d |
| 01-Nov-2024 |
Phoebe Wang <phoebe.wang@intel.com> |
[X86][AMX] Support AMX-TRANSPOSE (#113532)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
81271624 |
| 31-Oct-2024 |
Feng Zou <feng.zou@intel.com> |
[X86][AMX] Support AMX-FP8 (#113850)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
#
50826382 |
| 30-Oct-2024 |
Nikolas Klauser <nikolasklauser@berlin.de> |
[Clang] Start moving X86Builtins.def to X86Builtins.td (#106005)
This starts moving `X86Builtins.def` to be a tablegen file. It's quite
large, so I think it'd be good to move things in multiple ste
[Clang] Start moving X86Builtins.def to X86Builtins.td (#106005)
This starts moving `X86Builtins.def` to be a tablegen file. It's quite
large, so I think it'd be good to move things in multiple steps to avoid
a bunch of merge conflicts due to the amount of time this takes to
complete.
show more ...
|
Revision tags: llvmorg-19.1.3 |
|
#
7bd8a165 |
| 28-Oct-2024 |
Craig Topper <craig.topper@sifive.com> |
[X86] Don't allow '+f' as an inline asm constraint. (#113871)
f cannot be used as an output constraint. We already errored for '=f'
but not '+f'.
Fixes #113692.
|
#
c4248fa3 |
| 25-Oct-2024 |
Freddy Ye <freddy.ye@intel.com> |
[X86] Support MOVRS and AVX10.2 instructions. (#113274)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
02e4186d |
| 13-Sep-2024 |
Ganesh <Ganesh.Gopalasubramanian@amd.com> |
[X86] AMD Zen 5 Initial enablement (#107964)
This patch enables the basic skeleton enablement of AMD next gen zen5 CPUs.
|
#
83ad644a |
| 04-Sep-2024 |
Freddy Ye <freddy.ye@intel.com> |
[X86][AVX10.2] Support AVX10.2-BF16 new instructions. (#101603)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
3f25f23a |
| 20-Aug-2024 |
Phoebe Wang <phoebe.wang@intel.com> |
[X86][AVX10] Fix unexpected error and warning when using intrinsic (#104781)
E.g.: https://godbolt.org/z/G8zK5svjK
Based on Evgenii's work.
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
259ca9ee |
| 03-Aug-2024 |
Phoebe Wang <phoebe.wang@intel.com> |
Reland "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions (#101452)" (#101616)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
|