|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
| #
7b3bbd83 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.
Reverted due to various buildbot failures.
|
| #
2501ae58 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
|
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
a1cdb323 |
| 14-Jul-2023 |
Maurice Heumann <maurice.heumann@wibu.com> |
[ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted.
This change here adopts the calculation for the remaining non-volatile occurrences.
Recommitting after undefined behavior fix in D155093.
Differential Revision: https://reviews.llvm.org/D153800
|
| #
ab3bb86d |
| 03-Jul-2023 |
David Spickett <david.spickett@linaro.org> |
Revert "[ARM] Adjust strd/ldrd codegen alignment requirements"
This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9.
This has caused a test failure in the 2nd stage of Linaro's Arm 32 bit buildbots.
LLVM::simplified-template-names.s
7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8: original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I suspect a load/store is slightly off.
|
| #
92a9c30c |
| 02-Jul-2023 |
Maurice Heumann <MauriceHeumann@gmail.com> |
[ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted.
This change here adopts the calculation for the remaining non-volatile occurrences.
Differential Revision: https://reviews.llvm.org/D153800
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
c4a60c9d |
| 25-May-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default
This is an attempt to reland D42600 and enable this optimisation by default.
This also resolves the issue pointed out in the context of PGO build.
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.4 |
|
| #
f4999d35 |
| 08-May-2023 |
Alan Zhao <ayzhao@google.com> |
Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with ThinLTO: https://crbug.com/1443635
|
| #
1ddfd1c8 |
| 08-May-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Split restore point
Try to reland D42600
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2 |
|
| #
bb5befef |
| 13-Apr-2023 |
sgokhale <sgokhale@nvidia.com> |
Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.
An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833
|
| #
5f0bccc3 |
| 11-Apr-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Split restore point
This patch splits a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs (Callee Saved Registers) or FI (Frame Index).
Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change for others.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
b5b663aa |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[Thumb2] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
dcfc1fd2 |
| 14-Jul-2022 |
Craig Topper <craig.topper@sifive.com> |
[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount.
If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules.
This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example.
Reviewed By: spatel, foad
Differential Revision: https://reviews.llvm.org/D121833
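The bit-level argument in this commit message can be checked with a small model. The following is only an illustrative sketch, not LLVM's implementation: the helper names (`shl`, `demanded_bits_of_operand0`, `same_on_demanded`) and the fixed 32-bit width are assumptions made for the example.

```python
WIDTH = 32                    # assumed register width for this sketch
FULL = (1 << WIDTH) - 1


def shl(x, amt):
    """Logical shift left on a WIDTH-bit value, amt in [0, WIDTH)."""
    return (x << amt) & FULL


def demanded_bits_of_operand0(demanded):
    """Bits of x that can reach a demanded bit of `x << amt` for some amt.

    Result bit i is x bit (i - amt), so no x bit above the highest
    demanded result bit can matter, whatever the shift amount is.
    """
    return (1 << demanded.bit_length()) - 1


def same_on_demanded(x, y, demanded):
    """True if x and y give identical demanded bits for every shift amount."""
    return all(
        (shl(x, a) & demanded) == (shl(y, a) & demanded) for a in range(WIDTH)
    )


# A 16-bit value promoted to 32 bits, once sign-extended and once
# zero-extended. Only the low 16 result bits are demanded.
demanded = 0x0000FFFF
sign_extended = 0xFFFF8234
zero_extended = 0x00008234
op0_mask = demanded_bits_of_operand0(demanded)
```

With `demanded = 0x0000FFFF` the mask for operand 0 is also `0x0000FFFF`, and `same_on_demanded(sign_extended, zero_extended, demanded)` holds: the upper bits produced by the extension never reach a demanded bit, which is why the extension kind can be relaxed.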
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
08d7eec0 |
| 24-Sep-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distinct performance regression reports.
This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
92c1fd19 |
| 19-Aug-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Allow rematerialization of virtual reg uses
Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable.
The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists.
Differential Revision: https://reviews.llvm.org/D106408
|
| #
2d4470ab |
| 18-Aug-2021 |
Petr Hosek <phosek@google.com> |
Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc193a470f310eec46a7ce793a6cc97c2f which introduced PR51516.
|
| #
877572cc |
| 09-Aug-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Allow rematerialization of virtual reg uses
Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable.
The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists.
Differential Revision: https://reviews.llvm.org/D106408
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
1011d4ed |
| 13-May-2021 |
David Green <david.green@arm.com> |
[ARM] Constrain CMPZ shift combine to a single use
We currently prefer t2CMPrs over t2CMPri when the node contains a shift. This can introduce more nodes if the shift has multiple uses though, as the value from the shift will be needed anyway, and in the case of a t2CMPri compared with zero will more readily be removed entirely.
Differential Revision: https://reviews.llvm.org/D101688
|
| #
ea817d79 |
| 28-Apr-2021 |
Teresa Johnson <tejohnson@google.com> |
[SimplifyCFG] Look for control flow changes instead of side effects.
When passingValueIsAlwaysUndefined scans for an instruction between an inst with a null or undef argument and its first use, it was checking for instructions that may have side effects, which is a superset of the instructions it intended to find (as per the comments, control flow changing instructions that would prevent reaching the uses). Switch to using isGuaranteedToTransferExecutionToSuccessor() instead.
Without this change, when enabling -fwhole-program-vtables, which causes assumes to be inserted by clang, we can get different simplification decisions. In particular, when building with instrumentation FDO it can affect the optimization decisions before FDO matching, leading to some mismatches.
I had to modify d83507-knowledge-retention-bug.ll since this fix enables more aggressive optimization of that code such that it no longer tested the original bug it was meant to test. I removed the undef which still provokes the original failure (confirmed by temporarily reverting the fix) and also changed it to just invoke the passes of interest to narrow the testing.
Similarly I needed to adjust code for UnreachableEliminate.ll to avoid an undef which was causing the function body to get optimized away with this fix.
Differential Revision: https://reviews.llvm.org/D101507
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
d9718960 |
| 25-Mar-2021 |
David Green <david.green@arm.com> |
[ARM] Revert WhileLoopStartLR to DoLoopStart
If a WhileLoopStartLR is reverted due to calls in the preheader, we may still be able to instead create a DoLoopStart, preserving the low overhead loop. This adds code for that, only reverting the WhileLoopStartLR to a Br/Cmp, leaving the rest of the low overhead loop in place.
Differential Revision: https://reviews.llvm.org/D98413
|
| #
fad70c30 |
| 11-Mar-2021 |
David Green <david.green@arm.com> |
[ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully.
Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together:
    %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
    %wls0 = extractvalue { i32, i1 } %wls, 0
    %wls1 = extractvalue { i32, i1 } %wls, 1
    br i1 %wls1, label %loop.ph, label %loop.exit
    ...
  loop:
    %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
    ..
    %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
    %cmp = icmp ne i32 %iv.next, 0
    br i1 %cmp, label %loop, label %loop.exit
The llvm.test.start.loop.iterations needs to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).
These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction.
    %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
    t2B %bb.1
    ...
  bb.2.loop:
    %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
    ...
    %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
    t2B %bb.3
The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range.
Differential Revision: https://reviews.llvm.org/D97729
|
|
Revision tags: llvmorg-12.0.0-rc3 |
|
| #
54e28761 |
| 01-Mar-2021 |
David Green <david.green@arm.com> |
[ARM] Update and add extra WLS testing. NFC
|
|
Revision tags: llvmorg-12.0.0-rc2 |
|
| #
7786ac83 |
| 11-Feb-2021 |
David Green <david.green@arm.com> |
[ARM] Remove dead mov's in preheader of tail predicated loops
With t2DoLoopDec we can be left with some extra MOV's in the preheaders of tail predicated loops. This removes them, in the same way we remove other dead variables.
Differential Revision: https://reviews.llvm.org/D91857
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
e7dc083a |
| 18-Jan-2021 |
David Green <david.green@arm.com> |
[ARM] Don't handle low overhead branches in AnalyzeBranch
It turns out that the BranchFolder and IfCvt do not like unanalyzable branches that fall through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues.
This effectively reverts 372eb2bbb6fb903ce76266e659dfefbaee67722b, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped.
|
| #
372eb2bb |
| 16-Jan-2021 |
David Green <david.green@arm.com> |
[ARM] Add low overhead loops terminators to AnalyzeBranch
This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases).
Differential Revision: https://reviews.llvm.org/D94392
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
3f571be1 |
| 11-Dec-2020 |
David Green <david.green@arm.com> |
[ARM] Make t2DoLoopStartTP a terminator
Although this was something that I was hoping we would not have to do, this patch makes t2DoLoopStartTP a terminator in order to keep it at the end of its block, so not allowing extra MVE instructions between it and the end. With t2DoLoopStartTP's also starting tail predication regions, it also marks them as having side effects. The t2DoLoopStart is still not a terminator, giving it the extra scheduling freedom that can be helpful, but now that we have a TP version they can be treated differently.
Differential Revision: https://reviews.llvm.org/D91887
|