varying-outer-2d-reduction.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/Thumb2/LowOverheadLoops/varying-outer-2d-reduction.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83	09-Oct-2023	Jay Foad <jay.foad@amd.com>	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.
# 2501ae58	09-Oct-2023	Jay Foad <jay.foad@amd.com>	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries [CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps. show more ...
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6
# 03de1cb7	09-Jun-2023	Nikita Popov <npopov@redhat.com>	[InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really m [InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really make sense to perform this transform in the middle-end, as we cannot actually CSE the instructions there. The backend already performs this fold in https://github.com/llvm/llvm-project/blob/18f5446a45da5a61dbfb1b7667d27fb441ac62db/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L4236 on the SDAG level, however this only works within a single basic block. To handle cross-BB cases, we do need to handle this in the IR layer. This patch moves the fold from InstCombine to CGP in the backend, while keeping the same (somewhat dubious) heuristic. Differential Revision: https://reviews.llvm.org/D152541 show more ...
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# eb64450a	29-Mar-2023	David Green <david.green@arm.com>	[ARM] Convert active.lane.masks to vctp with non-zero starts This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces to vctp instru [ARM] Convert active.lane.masks to vctp with non-zero starts This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces to vctp instructions that the backend can later turn into tail predicated loops. Especially for addrecs with non-zero starts that can be created from epilog vectorization. There is some adjustment to the logic to handle this, moving some of the code to check the addrec earlier so that we can get the start value. This start value is then incorporated into the logic of checkin the new vctp is valid, and there is a newly added check that it is known to be a multiple of the VF as we expect. Differential Revision: https://reviews.llvm.org/D146517 show more ...
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# b5b663aa	19-Dec-2022	Nikita Popov <npopov@redhat.com>	[Thumb2] Convert some tests to opaque pointers (NFC)
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 09124402	04-Nov-2021	David Green <david.green@arm.com>	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. [ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094 show more ...
# 08d7eec0	24-Sep-2021	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distcint performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 92c1fd19	19-Aug-2021	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a tr Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408 show more ...
# 2d4470ab	18-Aug-2021	Petr Hosek <phosek@google.com>	Revert "Allow rematerialization of virtual reg uses" This reverts commit 877572cc193a470f310eec46a7ce793a6cc97c2f which introduced PR51516.
# 877572cc	09-Aug-2021	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a tr Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408 show more ...
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 211ce51f	22-May-2021	David Green <david.green@arm.com>	[ARM] Clean up some tests, removing dead instructions. NFC
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# e7dc083a	18-Jan-2021	David Green <david.green@arm.com>	[ARM] Don't handle low overhead branches in AnalyzeBranch It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the uncondition [ARM] Don't handle low overhead branches in AnalyzeBranch It turns our that the BranchFolder and IfCvt does not like unanalyzable branches that fall-through. This means that removing the unconditional branches from the end of tail predicated instruction can run into asserts and verifier issues. This effectively reverts 372eb2bbb6fb903ce76266e659dfefbaee67722b, but adds handling to t2DoLoopEndDec which are not branches, so can be safely skipped. show more ...
# 372eb2bb	16-Jan-2021	David Green <david.green@arm.com>	[ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the dire [ARM] Add low overhead loops terminators to AnalyzeBranch This treats low overhead loop branches the same as jump tables and indirect branches in analyzeBranch - they cannot be analyzed but the direct branches on the end of the block may be removed. This helps remove the unnecessary branches earlier, which can help produce better codegen (and change block layout in a number of cases). Differential Revision: https://reviews.llvm.org/D94392 show more ...
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# b2ac9681	10-Nov-2020	David Green <david.green@arm.com>	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more t [ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881 show more ...
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6
# 322d0afd	03-Oct-2020	Amara Emerson <amara@apple.com>	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle lega [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787 show more ...
Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# b30adfb5	28-Aug-2020	Sam Parker <sam.parker@arm.com>	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the [ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613 show more ...
# c352e7fb	25-Aug-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the B [ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303 show more ...
Revision tags: llvmorg-11.0.0-rc2
# 0f14b2e6	17-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 50c743fa713002fe4e0c76d23043e6c1f9e9fe6f. Patch will be split to smaller ones.
# 50c743fa	13-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revi [BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781 show more ...
# f9264995	13-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 44587e2f7e732604cd6340061d40ac21e7e188e5. Sanitizer tests need to be updated.
# 44587e2f	13-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revi [BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781 show more ...
# a0485421	13-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit 385c9d673f217e176b18e7bf6fe055154ac589c6.
# 385c9d67	13-Aug-2020	Dávid Bolvanský <david.bolvansky@gmail.com>	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revi [BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781 show more ...
# 6716e786	11-Aug-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] tail-predication: overflow checks for backedge taken count. This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrini [ARM][MVE] tail-predication: overflow checks for backedge taken count. This pick ups the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j: M = Size - i; for (j = 0; j < M; j++) Sum += Input[j] * Input[j+i]; For this inner loop, the SCEV backedge taken count (BTC) expression is: (-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737 show more ...
# a680c797	11-Aug-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] Added extra tail-predication runs for auto-correlation test case. NFC
12