|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
| #
b3995aa3 |
| 19-Nov-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624)
In graphics shaders it is better overall to use NSA encoding for IMAGE
instructions, because the benefit of less constrained register
[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624)
In graphics shaders it is better overall to use NSA encoding for IMAGE
instructions, because the benefit of less constrained register
allocation outweighs the cost of larger encoding. In particular NSA form
often avoids the need for extra V_MOV_B32 instructions between IMAGE
instructions, which can allow the IMAGE instructions to be claused.
Note that in GFX12 there is no longer a bit in the encoding to choose
between NSA and non-NSA forms, so this only affects GFX10 and GFX11.
show more ...
|
|
Revision tags: llvmorg-19.1.4 |
|
| #
2b5b57c5 |
| 12-Nov-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)
Currently all implicit-def instructions are part of bb prolog. We should only include the wwm-register's implicit definitions into the
[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)
Currently all implicit-def instructions are part of bb prolog. We should only include the wwm-register's implicit definitions into the BB prolog. The other vector class registers' implicit defs when exist at the bb top might cause interference when pushed the LR_split copy insertion downwards. The SplitKit is very strict on altering the insertion points and will assert such instances.
show more ...
|
|
Revision tags: llvmorg-19.1.3 |
|
| #
3c5cea65 |
| 21-Oct-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU]: Add implicit-def to the BB prolog (#112872)
IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a blo
[AMDGPU]: Add implicit-def to the BB prolog (#112872)
IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a block begin that requires spill-insertion during per-lane VGPR regalloc phase. The presence of the IMPLICIT_DEF currently breaks the BB prolog.
Fixes: SWDEV-490717
show more ...
|
|
Revision tags: llvmorg-19.1.2 |
|
| #
6636f326 |
| 08-Oct-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Include WWM register spill into BB Prolog (#111496)
With #93526 we split the regalloc pipeline further
to have a standalone allocation for wwm registers
and per-lane VGPRs. Currently the
[AMDGPU] Include WWM register spill into BB Prolog (#111496)
With #93526 we split the regalloc pipeline further
to have a standalone allocation for wwm registers
and per-lane VGPRs. Currently the presence of the
wwm-spill reloads inserted at the bb-top limits the
isBasicPrologue function during the per-lane vgpr
regalloc to skip past the exec manipulation instruction
and ended up causing incorrect codegen. The wmm-spill
inserted during the wwm-regalloc pipeline should also
be included in the bb-prolog so that the per-lane vgpr
regalloc pipeline can identify the appropriate insertion
points for their spills and copies.
show more ...
|
|
Revision tags: llvmorg-19.1.1 |
|
| #
ac0f64f0 |
| 30-Sep-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.
This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.
show more ...
|
| #
4f90e75b |
| 25-Sep-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049)
When generating waitcounts before a use or def skip VGPRs. We never have
a real implicit VGPR operands on memory instructions, it
[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049)
When generating waitcounts before a use or def skip VGPRs. We never have
a real implicit VGPR operands on memory instructions, it is only for
super-reg liveness accounting.
Some other instructions (MOVRELS as an example) may have real implicit
VGPR uses though.
This is less then ideal but most of the problems observed with spills.
show more ...
|
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
| #
a0eb6b88 |
| 27-Oct-2023 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924)
The insertion point determined by RA while attempting spills and liverange
split at the beginning of a block goes w
[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924)
The insertion point determined by RA while attempting spills and liverange
split at the beginning of a block goes wrong at times, and the newly
inserted vector instructions are placed before the exec-mask restore
instruction which is wrong. It occurs mainly due to the dependency on
isBasicBlockPrologue that doesn't account early inserted instructions
(spills and splits) during RA and causes the block prolog break.
A better approach for deciding the insertion point should be worked out.
For now, improving the helper function to consider all possible early
insertions. This patch includes the spill instructions. The copies
associated with liverange split should also be included in the block
prolog.
show more ...
|
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2 |
|
| #
7ac532ef |
| 29-Sep-2023 |
Yashwant Singh <114088807+yashssh@users.noreply.github.com> |
[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)
Use this flag to give more context to implicit def comments in assembly.
Reviewed on phabricator:
https://reviews.llvm.org/D153754
|
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
| #
4d42e8b5 |
| 28-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc2
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
show more ...
|
| #
a496c8be |
| 26-Jul-2023 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048
Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
show more ...
|
|
Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
7a98f084 |
| 17-May-2023 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case.
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
| #
f2c164c8 |
| 21-Jun-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in t
[AMDGPU] Do not wait for vscnt on function entry and return
SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns.
Differential Revision: https://reviews.llvm.org/D153537
show more ...
|
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
| #
d9a6fc82 |
| 24-Jan-2023 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Run unmerge combines post regbankselect
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often go
[AMDGPU] Run unmerge combines post regbankselect
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often got in the way of pattern matching and made codegen worse.
This patch: - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).
This seems to be mostly beneficial for code quality.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142192
show more ...
|
|
Revision tags: llvmorg-15.0.7 |
|
| #
a3028239 |
| 21-Dec-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"
This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
40ba0942 |
| 14-Apr-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time.
This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196
show more ...
|
| #
7a72a935 |
| 23-Sep-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Preserve only the inactive lanes of scratch vgprs
In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can
[AMDGPU] Preserve only the inactive lanes of scratch vgprs
In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately.
The current scheme to preserve the inactive lanes of such scratch VGPRs is not done rightly. It preserves all lanes and causes the outgoing values (if any) getting overwritten by the epilog restores. It then corrupts the return value.
To avoid such situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134526
show more ...
|
| #
5692a7e8 |
| 13-Jun-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if
[AMDGPU] Callee must always spill writelane VGPRs
Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved.
Reviewed By: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D124192
show more ...
|
| #
5159be3c |
| 18-Aug-2022 |
Luo, Yuanke <yuanke.luo@intel.com> |
(Reland) [fastalloc] Support allocating specific register class in fastalloc
This reverts commit 853bb192c407f5d9e75a5fd55cc089151530cbd3.
|
| #
853bb192 |
| 15-Aug-2022 |
Luo, Yuanke <yuanke.luo@intel.com> |
Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.
|
| #
30f9e6eb |
| 23-Jun-2022 |
Luo, Yuanke <yuanke.luo@intel.com> |
(Reland) [fastalloc] Support allocating specific register class in fastalloc
Reland commit 719658d078c4
The base RA support infrastructure that only allow a specific register class be allocated in
(Reland) [fastalloc] Support allocating specific register class in fastalloc
Reland commit 719658d078c4
The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA derived from base RA, they all allow allocating specific register class. Fast RA doesn't support allocating register for specific register class. This patch is to enable ShouldAllocateClass in fast RA, so that it can support allocating register for specific register class.
Differential Revision: https://reviews.llvm.org/D131825
show more ...
|
| #
851a5efe |
| 23-Jun-2022 |
Nico Weber <thakis@chromium.org> |
Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things, see comments on https://reviews.llvm.o
Revert "[fastalloc] Support allocating specific register class in fastalloc"
This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things, see comments on https://reviews.llvm.org/D128437 There's disagreement about the best fix. So let's keep HEAD green while discussions are happening.
show more ...
|
| #
719658d0 |
| 23-Jun-2022 |
Luo, Yuanke <yuanke.luo@intel.com> |
[fastalloc] Support allocating specific register class in fastalloc
The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA der
[fastalloc] Support allocating specific register class in fastalloc
The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA derived from base RA, they all allow allocating specific register class. Fast RA doesn't support allocating register for specific register class. This patch is to enable ShouldAllocateClass in fast RA, so that it can support allocating register for specific register class.
Differential Revision: https://reviews.llvm.org/D126771
show more ...
|
| #
affa1b1c |
| 10-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place).
(Resubmit with missing test added.)
Differential Revision: https://reviews.llvm.org/D125324
show more ...
|
| #
afc90101 |
| 25-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane"
This reverts commit 2a28467e5389c4d741d1825fadd39ae84ecaa5dc.
|
| #
2a28467e |
| 10-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place).
Differential Revision: https://reviews.llvm.org/D125324
show more ...
|