|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2 |
|
| #
b31fffbc |
| 05-Feb-2024 |
Nikita Popov <npopov@redhat.com> |
[ARM] Convert tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
| #
7b3bbd83 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.
Reverted due to various buildbot failures.
|
| #
2501ae58 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
show more ...
|
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
211ce51f |
| 22-May-2021 |
David Green <david.green@arm.com> |
[ARM] Clean up some tests, removing dead instructions. NFC
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
bd516d24 |
| 11-Mar-2021 |
David Green <david.green@arm.com> |
[ARM] Move t2DoLoopStart reg alloc hint
This adjusts the place that the t2DoLoopStart reg allocation hint is inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place as other tail pr
[ARM] Move t2DoLoopStart reg alloc hint
This adjusts the place that the t2DoLoopStart reg allocation hint is inserted, adding it in the ARMTPAndVPTOptimizaionPass in a similar place as other tail predicated loop optimizations. This removes the need for doing so in a custom inserter, and should make the hint more accurate, only adding it where we expect to create a DLS (not DLSTP or WLS).
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
48230355 |
| 02-Feb-2021 |
David Green <david.green@arm.com> |
[ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader.
Differential Revision: https://reviews.llvm
[ARM] Remove DLS lr, lr
A DLS lr, lr instruction only moves lr to itself. It need not be emitted on it's own to save a instruction in the loop preheader.
Differential Revision: https://reviews.llvm.org/D78916
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
73a6cd4b |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it eas
[ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can help make it more likely to become t2DLS lr, lr. This makes it easier to move if needed (as the input is the same as the output), or potentially remove entirely.
The hint is added after others (from COPY's etc) which still take precedence. It needed to find a place to add the hint, which currently uses the post isel custom inserter.
Differential Revision: https://reviews.llvm.org/D89883
show more ...
|
| #
b2ac9681 |
| 10-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more t
[ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from t2DoLoopStart rGPR to GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the form: %start = llvm.start.loop.iterations(%N) loop: %p = phi [%start], [%dec] %dec = llvm.loop.decrement.reg(%p, 1) %c = icmp ne %dec, 0 br %c, loop, exit - For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but produces a value as seen above, gluing the loop together more through def-use chains. - This new instrinsic conceptually produces the same output as input, which is taught to SCEV so that the checks in MVETailPredication are not affected. - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same. - And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
show more ...
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6 |
|
| #
322d0afd |
| 03-Oct-2020 |
Amara Emerson <amara@apple.com> |
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle lega
[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.
The autoupgrader will handle legacy intrinsics.
Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
Differential Revision: https://reviews.llvm.org/D88787
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
| #
b30adfb5 |
| 28-Aug-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the
[ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
show more ...
|
| #
a3e41d45 |
| 27-Aug-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict.
Diff
[ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict.
Differential Revision: https://reviews.llvm.org/D40061
show more ...
|
| #
c352e7fb |
| 25-Aug-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the B
[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1.
Differential Revision: https://reviews.llvm.org/D86303
show more ...
|
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init |
|
| #
595270ae |
| 13-Jul-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather t
[ARM][MVE] Refactor option -disable-mve-tail-predication
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones.
This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option.
Differential Revision: https://reviews.llvm.org/D83133
show more ...
|
|
Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
| #
3ee580d0 |
| 01-Jul-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like a reduction. This will comprise of a vector op (for now only vadd), a vorr (vmov) which
[ARM][LowOverheadLoops] Handle reductions
While validating live-out values, record instructions that look like a reduction. This will comprise of a vector op (for now only vadd), a vorr (vmov) which store the previous value of vadd and then a vpsel in the exit block which is predicated upon a vctp. This vctp will combine the last two iterations using the vmov and vadd into a vector which can then be consumed by a vaddv.
Once we have determined that it's safe to perform tail-predication, we need to change this sequence of instructions so that the predication doesn't produce incorrect code. This involves changing the register allocation of the vadd so it updates itself and the predication on the final iteration will not update the falsely predicated lanes. This mimics what the vmov, vctp and vpsel do and so we then don't need any of those instructions.
Differential Revision: https://reviews.llvm.org/D75533
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc2 |
|
| #
d1522513 |
| 17-Jun-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use in
[ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask
To set up a tail-predicated loop, we need to to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before.
Many thanks to Eli Friedman and Sam Parker for all their help with this work.
This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but is not entirely correct yet, it's conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV or we may want revise the definition of llvm.get.active.lane.mask.
Differential Revision: https://reviews.llvm.org/D79175
show more ...
|
|
Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
| #
de3e65e6 |
| 18-Feb-2020 |
Sam Parker <sam.parker@arm.com> |
[ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp.
Differential Revision: https://reviews.l
[ARM][LowOverheadLoops] Check loop liveouts
Check that no Q-regs are live out of the loop, unless the instruction within the loop is predicated on the vctp.
Differential Revision: https://reviews.llvm.org/D72713
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
| #
8cba99e2 |
| 20-Jan-2020 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in e
[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks
This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops, which is necessary to get actual performance gains for tail-predicated loops.
Differential Revision: https://reviews.llvm.org/D72714
show more ...
|
| #
5e51f755 |
| 17-Jan-2020 |
David Green <david.green@arm.com> |
[ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads and stores, leading to extra code prior to the loop to set up the preinc. MVE in general ca
[ARM] Favour post inc for MVE loops
We were previously not necessarily favouring postinc for the MVE loads and stores, leading to extra code prior to the loop to set up the preinc. MVE in general can benefit from postinc (as we don't have unrolled loops), and certain instructions like the VLD2's only post-inc versions are available.
Differential Revision: https://reviews.llvm.org/D70790
show more ...
|
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3 |
|
| #
d97cf1f8 |
| 11-Dec-2019 |
Sjoerd Meijer <sjoerd.meijer@arm.com> |
[ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop u
[ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet:
.. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 ..
which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this.
Differential Revision: https://reviews.llvm.org/D71007
show more ...
|
| #
b1aba037 |
| 08-Dec-2019 |
David Green <david.green@arm.com> |
[ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does.
Differential Revision: https://revi
[ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968
show more ...
|
|
Revision tags: llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
b5315ae8 |
| 21-Nov-2019 |
David Green <david.green@arm.com> |
[Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores
[Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do.
To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores.
This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
show more ...
|
| #
28166816 |
| 26-Nov-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def
[ARM][ReachingDefs] Remove dead code in loloops.
Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register.
For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations.
Differential Revision: https://reviews.llvm.org/D70240
show more ...
|
| #
8978c12b |
| 18-Nov-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicatd one. This is currently a very basic conversion,
[ARM][MVE] Tail predication conversion
This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicatd one. This is currently a very basic conversion, with the following restrictions: - Operates only on single block loops. - The loop can only contain a single vctp instruction. - No other instructions can write to the vpr. - We only allow a subset of the mve instructions in the loop.
TODO: Pass the number of elements, not the number of iterations to dlstp/wlstp.
Differential Revision: https://reviews.llvm.org/D69945
show more ...
|
| #
3ff961ca |
| 17-Oct-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Change VPST to use, not def, VPR
Unlike VPT, VPST just uses the current value of VPR.P0.
Differential Revision: https://reviews.llvm.org/D69037
llvm-svn: 375087
|
| #
aac03ae0 |
| 30-Sep-2019 |
Sam Parker <sam.parker@arm.com> |
[ARM][MVE] Change VCTP operand
The VCTP instruction will calculate the predicate masked based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic
[ARM][MVE] Change VCTP operand
The VCTP instruction will calculate the predicate masked based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect as the phi should directly feed the vctp. The sub is calculating the value for the next iteration.
Differential Revision: https://reviews.llvm.org/D67921
llvm-svn: 373188
show more ...
|