MVETailPredication.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/ARM/MVETailPredication.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9571cc2b	13-Nov-2024	Kazu Hirata <kazu@google.com>	[ARM] Remove unused includes (NFC) (#115995) Identified with misc-include-cleaner.
Revision tags: llvmorg-19.1.3
# 85c17e40	17-Oct-2024	Jay Foad <jay.foad@amd.com>	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.
Revision tags: llvmorg-19.1.2
# fa789dff	11-Oct-2024	Rahul Joshi <rjoshi@nvidia.com>	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# edf46f36	03-Aug-2024	Florian Hahn <flo@fhahn.com>	[SCEV] Use const SCEV * explicitly in more places. Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# 2d209d96	27-Jun-2024	Nikita Popov <npopov@redhat.com>	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
# d75f9dd1	24-Jun-2024	Stephen Tozer <stephen.tozer@sony.com>	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
# 6481dc57	24-Jun-2024	Stephen Tozer <stephen.tozer@sony.com>	[IR][NFC] Update IRBuilder to use InsertPosition (#96497) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# e54277fa	11-Sep-2023	Jeremy Morse <jeremy.morse@sony.com>	[NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0]: we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# 9eef6d9a	07-May-2023	Kazu Hirata <kazu@google.com>	[ARM] Remove unused declaration RematerializeIterCount The corresponding function definition was removed by: commit af45907653fd312264632b616eff0fad1ae1eb2e Author: Sjoerd Meijer <sjoerd.meijer@arm.com> Date: Mon Jun 29 15:40:03 2020 +0100
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# eb64450a	29-Mar-2023	David Green <david.green@arm.com>	[ARM] Convert active.lane.masks to vctp with non-zero starts This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces to vctp instructions that the backend can later turn into tail predicated loops, especially for addrecs with non-zero starts that can be created from epilog vectorization. There is some adjustment to the logic to handle this, moving some of the code to check the addrec earlier so that we can get the start value. This start value is then incorporated into the logic of checking that the new vctp is valid, and there is a newly added check that it is known to be a multiple of the VF as we expect. Differential Revision: https://reviews.llvm.org/D146517
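As a rough illustration of the conversion described in the entry above, the following C++ sketch models the lane-wise meaning of an active lane mask and of a VCTP-style predicate, and checks that they agree when the VCTP operand is the number of remaining elements. All names here (`VF`, `Mask`, `activeLaneMask`, `vctp`, `masksAgree`) are hypothetical and are not taken from MVETailPredication.cpp; the sketch assumes 4 lanes and does not model the pass's overflow or multiple-of-VF checks.

```cpp
#include <array>
#include <cstdint>

// Assumed vector factor: 4 lanes, as for a vector of four 32-bit elements.
constexpr unsigned VF = 4;

using Mask = std::array<bool, VF>;

// Models an active lane mask: lane i is active when (base + i) < elementCount,
// with the addition evaluated in 64 bits so it cannot wrap around.
Mask activeLaneMask(std::uint32_t base, std::uint32_t elementCount) {
  Mask m{};
  for (unsigned i = 0; i < VF; ++i)
    m[i] = static_cast<std::uint64_t>(base) + i < elementCount;
  return m;
}

// Models a VCTP-style predicate: the first min(remaining, VF) lanes are active.
Mask vctp(std::uint32_t remaining) {
  Mask m{};
  for (unsigned i = 0; i < VF; ++i)
    m[i] = i < remaining;
  return m;
}

// Walks an induction variable start, start + VF, ... and checks that both
// predicates agree when the VCTP operand is elementCount - base.
bool masksAgree(std::uint32_t start, std::uint32_t elementCount) {
  for (std::uint64_t base = start; base < elementCount; base += VF) {
    const auto b = static_cast<std::uint32_t>(base); // base < elementCount, so it fits
    if (activeLaneMask(b, elementCount) != vctp(elementCount - b))
      return false;
  }
  return true;
}
```

For example, `masksAgree(4, 10)` walks a loop whose induction variable starts at 4 rather than 0, which is the non-zero-start shape the entry mentions.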
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 2833760c	29-Aug-2022	Kazu Hirata <kazu@google.com>	[Target] Qualify auto in range-based for loops (NFC)
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# ab0c5cea	03-Dec-2021	David Green <david.green@arm.com>	[ARM] Use v2i1 for MVE and CDE intrinsics This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 by converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 29fa37ec	01-Sep-2021	Philip Reames <listmail@philipreames.com>	[SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 258e2e9a	26-Apr-2021	David Green <david.green@arm.com>	[ARM] Ensure loop invariant active.lane.mask operands CGP can move instructions like a ptrtoint into a loop, but the MVETailPredication pass, when converting them, currently assumes invariant trip counts. This tries to ensure the operands are loop invariant, and bails if not. Differential Revision: https://reviews.llvm.org/D100550
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# fad70c30	11-Mar-2021	David Green <david.green@arm.com>	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations needs to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# f5abf0bd	15-Jan-2021	David Green <david.green@arm.com>	[ARM] Tail predication with constant loop bounds The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCTP intrinsic to match. Differential Revision: https://reviews.llvm.org/D94608
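The ceil(ElementCount/Width) relation in the entry above can be sketched in plain C++ as follows. The function name `predicatedTripCount` and its parameters are hypothetical and are not taken from the pass; this is only an arithmetic illustration, assuming 32-bit unsigned counts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of iterations of a predicated vector loop that
// processes `elementCount` elements `width` lanes at a time, i.e.
// ceil(elementCount / width).
std::uint32_t predicatedTripCount(std::uint32_t elementCount,
                                  std::uint32_t width) {
  assert(width != 0 && "vector width must be non-zero");
  // Divide first and add one for a partial final iteration, so the result
  // cannot wrap the way (elementCount + width - 1) / width could.
  return elementCount / width + (elementCount % width != 0 ? 1u : 0u);
}
```

With 10 elements and a width of 4, this gives 3 iterations: two full ones and a final one in which only 2 lanes are active.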
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 0e49a40d	26-Nov-2020	David Green <david.green@arm.com>	[ARM] Cleanup for the MVETailPrediction pass This strips out a lot of the code that should no longer be needed from the MVETailPredication pass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866
Revision tags: llvmorg-11.0.1-rc1
# b2ac9681	10-Nov-2020	David Green <david.green@arm.com>	[ARM] Alter t2DoLoopStart to define lr This changes the definition of t2DoLoopStart from `t2DoLoopStart rGPR` to `GPRlr = t2DoLoopStart rGPR`. This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations. - The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new intrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them! This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6
# 322d0afd	03-Oct-2020	Amara Emerson <amara@apple.com>	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787
Revision tags: llvmorg-11.0.0-rc5
# 509fba75	28-Sep-2020	Tres Popp <tpopp@google.com>	[llvm] Fix unused variable in non-debug configurations
Revision tags: llvmorg-11.0.0-rc4
# 1696dd27	28-Sep-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] Enable tail-predication by default We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093
# f39f92c1	24-Sep-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] tail-predication: overflow checks for elementcount, cont'd This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086
# 2fc690ac	24-Sep-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM] LowoverheadLoops: add an option to disable tail-predication This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212
Revision tags: llvmorg-11.0.0-rc3
# b5c3efeb	16-Sep-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769
# 635b8751	15-Sep-2020	Sjoerd Meijer <sjoerd.meijer@arm.com>	[ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608