Revision tags: llvmorg-16.0.0-rc3
# 7e6e636f | 16-Feb-2023 | Kazu Hirata <kazu@google.com>
Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.
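A minimal illustration of the swap (the call site below is hypothetical, not taken from the patch): both forms truncate a wider argument to 32 bits, but the replacement names that width explicitly at the call site.

```cpp
#include <cstdint>
#include "llvm/ADT/bit.h"             // llvm::has_single_bit
#include "llvm/Support/MathExtras.h"  // llvm::isPowerOf2_32

// Hypothetical call site: the argument is 64 bits wide, so the 32-bit
// truncation matters; has_single_bit<uint32_t> spells it out.
bool isPow2Low32(uint64_t Mask) {
  // return llvm::isPowerOf2_32(Mask);          // before: truncation is implicit
  return llvm::has_single_bit<uint32_t>(Mask);  // after: width is explicit
}
```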
# 08533f8b | 14-Feb-2023 | Jake Egan <jakeegan10@gmail.com>
Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.ll
Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation"
These commits are causing a test-suite build failure on AIX. Revert for now for time to investigate. https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio
This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.
show more ...
# c5085c91 | 14-Feb-2023 | Jay Foad <jay.foad@amd.com>
[CodeGen] Trivial simplification of some getRegisterType calls. NFC.
# 4c722668 | 09-Feb-2023 | Alex Richardson <alexrichardson@google.com>
Fix call to deprecated API in bd87a2449da0c82e63cebdf9c131c54a5472e3a7
# bd87a244 | 09-Feb-2023 | Alex Richardson <alexrichardson@google.com>
[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134282
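The heuristic reduces to a small predicate. Below is a self-contained sketch of that logic under illustrative names and thresholds (`TargetInfo` and `shouldAlignPointerArg` are inventions for this example, not the committed TargetLowering hook):

```cpp
#include <cstdint>

// Assumption: the target tells us whether misaligned accesses are slow
// and how wide its pointers are (e.g. 4 bytes on RV32).
struct TargetInfo {
  bool UnalignedIsSlow;
  uint32_t PointerBytes;
};

// Bump the preferred alignment of a mem* pointer argument up to the
// pointer width when misaligned accesses are slow; otherwise leave it.
bool shouldAlignPointerArg(const TargetInfo &TI, uint32_t CurrentAlign,
                           uint32_t &PrefAlign) {
  if (!TI.UnalignedIsSlow || CurrentAlign >= TI.PointerBytes)
    return false; // alignment is free, or already aligned enough
  PrefAlign = TI.PointerBytes;
  return true;
}
```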
Revision tags: llvmorg-16.0.0-rc2
# 62c7f035 | 07-Feb-2023 | Archibald Elliott <archibald.elliott@arm.com>
[NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
# 526966d0 | 29-Jan-2023 | Kazu Hirata <kazu@google.com>
Use llvm::bit_ceil (NFC)
Note that:
std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);
is equivalent to:
std::bit_ceil(X)
even for input 0.
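The claimed equivalence, including the zero case, can be spot-checked with C++20's `std::bit_ceil` (`llvm::NextPowerOf2(0)` is `0 + 1 == 1`, matching `bit_ceil(0) == 1`):

```cpp
#include <bit>

// bit_ceil rounds up to the next power of two and maps 0 to 1, which is
// exactly what the has_single_bit/NextPowerOf2 ternary produced.
static_assert(std::bit_ceil(0u) == 1u);
static_assert(std::bit_ceil(1u) == 1u); // powers of two are unchanged
static_assert(std::bit_ceil(5u) == 8u); // everything else rounds up
```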
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# e70ae0f4 | 07-Jan-2023 | Matt Arsenault <Matthew.Arsenault@amd.com>
DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable
This was incorrectly setting dereferenceable on unaligned operands. getLoadMemOperandFlags did the dereferenceability check without alignment, and then both paths went on to check isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check isDereferenceableAndAlignedPointer, and remove the second call.
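A sketch of the corrected flag computation, assuming the `isDereferenceableAndAlignedPointer` helper from llvm/Analysis/Loads.h (the function below is illustrative, not the committed `getLoadMemOperandFlags`):

```cpp
#include "llvm/Analysis/Loads.h"            // isDereferenceableAndAlignedPointer
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

// Illustrative: only set MODereferenceable once alignment has been taken
// into account, so unaligned operands no longer pick up the flag.
llvm::MachineMemOperand::Flags loadFlagsSketch(const llvm::LoadInst &LI,
                                               const llvm::DataLayout &DL) {
  auto Flags = llvm::MachineMemOperand::MOLoad;
  if (llvm::isDereferenceableAndAlignedPointer(
          LI.getPointerOperand(), LI.getType(), LI.getAlign(), DL))
    Flags |= llvm::MachineMemOperand::MODereferenceable;
  return Flags;
}
```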
# 48f5d77e | 11-Jan-2023 | Guillaume Chatelet <gchatelet@google.com>
[NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.
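The rename in one line (a sketch; `TS` stands for an arbitrary TypeSize value):

```cpp
#include <cstdint>
#include "llvm/Support/TypeSize.h"

// Same value, clearer name: for scalable vectors the size is only known
// as a minimum, which "known min value" states directly.
uint64_t minSize(llvm::TypeSize TS) {
  return TS.getKnownMinValue(); // was: TS.getKnownMinSize()
}
```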
# 16facf1c | 31-Dec-2022 | Roman Lebedev <lebedev.ri@gmail.com>
[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector
Single-element vectors are legalized by splitting, so the memory operations would also get scalarized. While we do have some support to reconstruct scalarized loads, we clearly don't catch everything.
The comment for the affected AArch64 store suggests that having two stores was the desired outcome in the first place.
This was showing up as a source of *many* regressions with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
# 603e8490 | 30-Dec-2022 | Roman Lebedev <lebedev.ri@gmail.com>
[NFC][TLI] Move `isLoadBitCastBeneficial()` implementation into source file
... so any change to it does not cause 700 source files to be recompiled.
# 89f36dd8 | 01-Dec-2022 | Freddy Ye <freddy.ye@intel.com>
[X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem: it expands 'fptoui .. to', 'fptosi .. to', 'uitofp .. to', and 'sitofp .. to' instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded code follows the IR generated from `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc.
Corner cases:
1. fp16: compiler-rt has no related builtins, so the implementation mainly uses the fp32 <-> fp16 lib calls.
2. fp80: this pass is soft-fp emulation and no fp80 instructions can help here, so this usage is discouraged; for now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/fpext at the top/end of the function.
3. bf16: clang's frontend currently doesn't support bf16 arithmetic (convert to int, float, +, -, *, ...), so this patch doesn't consider bf16 for now.
4. unsigned FPToI: both default hardware behavior and libgcc ignore the "returns 0 for negative input" spec, and this pass follows that old behavior. See this example: https://gcc.godbolt.org/z/bnv3jqW1M
The end-to-end tests are uploaded at https://reviews.llvm.org/D138261
Reviewed By: LuoYuanke, mgehre-amd
Differential Revision: https://reviews.llvm.org/D137241
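For a feel of what such auto-generated functions compute, here is a self-contained sketch of a double to 128-bit signed conversion in the spirit of `fixdfti.c`. This illustrates the bit-level algorithm only, not the pass's actual output; overflow (and NaN) is saturated here for definiteness, whereas the real helpers leave it undefined.

```cpp
#include <cstdint>
#include <cstring>

// Decode the IEEE-754 double by hand and shift the significand into
// place, the way a soft-float fptosi helper does.
__int128 fptosiSketch(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  const bool Neg = Bits >> 63;
  const int Exp = static_cast<int>((Bits >> 52) & 0x7FF) - 1023; // unbias
  if (Exp < 0)
    return 0; // |D| < 1 truncates to 0
  const __int128 Max = ~(unsigned __int128)0 >> 1; // 2^127 - 1
  if (Exp >= 127)
    return Neg ? -Max - 1 : Max; // out of range: saturate (illustrative)
  const uint64_t Sig = (Bits & ((1ULL << 52) - 1)) | (1ULL << 52);
  const unsigned __int128 Mag =
      Exp < 52 ? (unsigned __int128)(Sig >> (52 - Exp))  // drop fraction bits
               : (unsigned __int128)Sig << (Exp - 52);   // scale up
  return Neg ? -(__int128)Mag : (__int128)Mag;
}
```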
Revision tags: llvmorg-15.0.6
# b39b76f2 | 22-Nov-2022 | Phoebe Wang <phoebe.wang@intel.com>
[X86] Allow no X87 on 32-bit
This patch is an alternative to D100091. It solves the problems in `f80` type lowering.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D137946
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# bcaf31ec | 21-Apr-2022 | Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
[AMDGPU] Allow finer grain control of an unaligned access speed
A target can report whether a misaligned access is 'fast', as defined by the target, or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation of the current code uses 0 and 1 for the current false and true. This makes the change an NFC.
A subsequent patch will start using an actual speed value in the load/store vectorizer to check whether a vectorized access will be not just fast, but also no slower than before.
Differential Revision: https://reviews.llvm.org/D124217
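A self-contained sketch of the new contract (names and speed grades here are illustrative stand-ins, not the LLVM signature): the query still answers "is this allowed?", and now reports a speed through the out-parameter, with 0 translating the old `false` and any nonzero value the old `true`.

```cpp
// Callers that only care about fast-vs-slow keep testing for nonzero,
// while new callers can compare the speeds of two candidate accesses.
bool allowsMisalignedAccess(unsigned AlignBytes, unsigned *Speed) {
  if (AlignBytes == 0)
    return false;                       // not legal at all
  *Speed = (AlignBytes >= 4) ? 2 : 1;   // hypothetical speed grades
  return true;
}
```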
# 1121eca6 | 16-Sep-2022 | Craig Topper <craig.topper@sifive.com>
[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These two were blocking because VE doesn't support them and the tests were expecting them to fail a specific way; using Expand caused them to fail differently. It seemed better to emulate them using operations that are supported.
@simoll mentioned on Discord that VE has some expansion downstream. Not sure if it's done like this or in the VE target.
Reviewed By: frasercrmck, efocht
Differential Revision: https://reviews.llvm.org/D133514
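The expansion rests on the remainder identity; here is a scalar spot-check (the VP form applies the same div/mul/sub recipe per lane, under the mask and explicit vector length):

```cpp
#include <cassert>

// srem rebuilt from sdiv + mul + sub, mirroring VP_SDIV/VP_MUL/VP_SUB.
int sremExpanded(int A, int B) { return A - (A / B) * B; }

int main() {
  assert(sremExpanded(7, 3) == 7 % 3);
  assert(sremExpanded(-7, 3) == -7 % 3); // C division truncates, like sdiv
}
```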
# c1502425 | 12-Sep-2022 | Matthias Gehre <matthias.gehre@xilinx.com>
Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove the new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager.
Differential Revision: https://reviews.llvm.org/D133691
# 5e96cea1 | 07-Sep-2022 | Joe Loser <joeloser@fastmail.com>
[llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
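The replacement in miniature (the array contents are made up):

```cpp
#include <cstddef>
#include <iterator> // std::size

static const char *Names[] = {"add", "sub", "mul"};
// C++17: std::size handles C-style arrays directly.
constexpr std::size_t NumNames = std::size(Names); // was: llvm::array_lengthof(Names)
```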
# c349d7f4 | 02-Sep-2022 | Benjamin Kramer <benny.kra@googlemail.com>
[SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path
The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets.
I didn't understand the difference between these softening approaches when I first added the bfloat lowerings; it would be nice if we only had one of them.
Based on @pengfei 's D131502
Differential Revision: https://reviews.llvm.org/D133207
# 7ed3d813 | 17-Aug-2022 | Daniil Fukalov <1671137+dfukalov@users.noreply.github.com>
[NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost-related members, `getTypeLegalizationCost()` and `getScalingFactorCost()`, while all other costs are processed in TTI. For example, it is awkward to use other TTI members from these two functions when they are overridden in a target.
Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
# de9d80c1 | 08-Aug-2022 | Fangrui Song <i@maskray.me>
[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
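What the mechanical change looks like at a call site (the switch below is made up):

```cpp
int classify(int Kind) {
  switch (Kind) {
  case 0:
    // ... set-up work shared with case 1 ...
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 1:
    return 1;
  default:
    return 0;
  }
}
```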
# 65246d3e | 27-Jul-2022 | Amara Emerson <amara@apple.com>
Use hasNItemsOrLess() in MRI::hasAtMostUserInstrs().
# 19cdd190 | 26-Jul-2022 | Amara Emerson <amara@apple.com>
[AArch64][GlobalISel] Add heuristics for localizing G_CONSTANT.
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of materializing a specific constant in code size. Doing so prevents us from sinking constants which require multiple instructions to generate into use blocks.
Code size savings on CTMark -Os:
Program                             size.__text
                                    before      after       diff
ClamAV/clamscan                     381940.00   382052.00    0.0%
lencod/lencod                       428408.00   428428.00    0.0%
SPASS/SPASS                         411868.00   411876.00    0.0%
kimwitu++/kc                        449944.00   449944.00    0.0%
Bullet/bullet                       463588.00   463556.00   -0.0%
sqlite3/sqlite3                     284696.00   284668.00   -0.0%
consumer-typeset/consumer-typeset   414492.00   414424.00   -0.0%
7zip/7zip-benchmark                 595244.00   594972.00   -0.0%
mafft/pairlocalalign                247512.00   247368.00   -0.1%
tramp3d-v4/tramp3d-v4               372884.00   372044.00   -0.2%
Geomean difference                                          -0.0%
Differential Revision: https://reviews.llvm.org/D130554
# 9e6d1f4b | 17-Jul-2022 | Kazu Hirata <kazu@google.com>
[CodeGen] Qualify auto variables in for loops (NFC)
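The shape of the cleanup (a made-up loop; the point is that pointer-ness and constness become visible in the loop header):

```cpp
#include <vector>

void visitAll(const std::vector<int *> &Ptrs) {
  for (const auto *P : Ptrs) { // was: for (auto P : Ptrs)
    (void)P;
  }
}
```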
# ac2ad3b7 | 15-Jun-2022 | Paul Robinson <paul.robinson@sony.com>
[PS5] Support sin+cos->sincos optimization
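The source pattern the optimization targets, sketched (the function is made up): a sin and a cos of the same argument can be fused into one `sincos` runtime call on targets whose libm provides it.

```cpp
#include <cmath>

// Two libm calls on the same angle; with the optimization enabled the
// backend can emit a single sincos() call for the pair instead.
void polarToXY(double R, double Theta, double &X, double &Y) {
  X = R * std::cos(Theta);
  Y = R * std::sin(Theta);
}
```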
# 8bc0bb95 | 07-Jun-2022 | Benjamin Kramer <benny.kra@googlemail.com>
Add a conversion from double to bf16
This introduces a new compiler-rt function `__truncdfbf2`.
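To make the flavor of such a helper concrete, here is the classic round-to-nearest-even float to bfloat16 truncation. This is illustrative only: the actual `__truncdfbf2` narrows from double and has to get double rounding right, so this is a simplified cousin, not its source.

```cpp
#include <cstdint>
#include <cstring>

uint16_t truncToBF16(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  if ((Bits & 0x7FFFFFFFu) > 0x7F800000u)   // NaN: force a quiet NaN payload
    return static_cast<uint16_t>((Bits >> 16) | 0x0040);
  Bits += 0x7FFFu + ((Bits >> 16) & 1);     // round to nearest, ties to even
  return static_cast<uint16_t>(Bits >> 16); // sign, exponent, top 7 mantissa bits
}
```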