image-waterfall-loop-O0.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/image-waterfall-loop-O0.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# b3995aa3	19-Nov-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624) In graphics shaders it is better overall to use NSA encoding for IMAGE instructions, because the benefit of less constrained register allocation outweighs the cost of larger encoding. In particular NSA form often avoids the need for extra V_MOV_B32 instructions between IMAGE instructions, which can allow the IMAGE instructions to be claused. Note that in GFX12 there is no longer a bit in the encoding to choose between NSA and non-NSA forms, so this only affects GFX10 and GFX11.
Revision tags: llvmorg-19.1.4
# 2b5b57c5	12-Nov-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834) Currently all implicit-def instructions are part of bb prolog. We should only include the wwm-register's implicit definitions into the BB prolog. The other vector class registers' implicit defs when exist at the bb top might cause interference when pushed the LR_split copy insertion downwards. The SplitKit is very strict on altering the insertion points and will assert such instances.
Revision tags: llvmorg-19.1.3
# 3c5cea65	21-Oct-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU]: Add implicit-def to the BB prolog (#112872) IMPLICIT_DEF inserted for a wwm-register at the very first block or the predecessor block where it is used for sgpr spilling can appear at a block begin that requires spill-insertion during per-lane VGPR regalloc phase. The presence of the IMPLICIT_DEF currently breaks the BB prolog. Fixes: SWDEV-490717
Revision tags: llvmorg-19.1.2
# 6636f326	08-Oct-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Include WWM register spill into BB Prolog (#111496) With #93526 we split the regalloc pipeline further to have a standalone allocation for wwm registers and per-lane VGPRs. Currently the presence of the wwm-spill reloads inserted at the bb-top limits the isBasicPrologue function during the per-lane vgpr regalloc to skip past the exec manipulation instruction and ended up causing incorrect codegen. The wmm-spill inserted during the wwm-regalloc pipeline should also be included in the bb-prolog so that the per-lane vgpr regalloc pipeline can identify the appropriate insertion points for their spills and copies.
Revision tags: llvmorg-19.1.1
# ac0f64f0	30-Sep-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Split vgpr regalloc pipeline (#93526) Allocating wwm-registers and per-thread VGPR operands together imposes many challenges in the way the registers are reused during allocation. There are times when regalloc reuses the registers of regular VGPRs operations for wwm-operations in a small range leading to unwantedly clobbering their inactive lanes causing correctness issues that are hard to trace. This patch splits the VGPR allocation pipeline further to allocate wwm-registers first and the regular VGPR operands in a separate pipeline. The splitting would ensure that the physical registers used for wwm allocations won't take part in the next allocation pipeline to avoid any such clobbering.
# 4f90e75b	25-Sep-2024	Stanislav Mekhanoshin <rampitec@users.noreply.github.com>	[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049) When generating waitcounts before a use or def skip VGPRs. We never have a real implicit VGPR operands on memory instructions, it is only for super-reg liveness accounting. Some other instructions (MOVRELS as an example) may have real implicit VGPR uses though. This is less then ideal but most of the problems observed with spills.
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# a0eb6b88	27-Oct-2023	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue that doesn't account early inserted instructions (spills and splits) during RA and causes the block prolog break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2
# 7ac532ef	29-Sep-2023	Yashwant Singh <114088807+yashssh@users.noreply.github.com>	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 4d42e8b5	28-Jul-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
# a496c8be	26-Jul-2023	Vitaly Buka <vitalybuka@google.com>	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381
Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5
# 7a98f084	17-May-2023	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which is an unproblematic case. Differential Revision: https://reviews.llvm.org/D124196
# f2c164c8	21-Jun-2023	Jay Foad <jay.foad@amd.com>	[AMDGPU] Do not wait for vscnt on function entry and return SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns. Differential Revision: https://reviews.llvm.org/D153537
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# d9a6fc82	24-Jan-2023	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU] Run unmerge combines post regbankselect RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often got in the way of pattern matching and made codegen worse. This patch: - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change). This seems to be mostly beneficial for code quality. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D142192
Revision tags: llvmorg-15.0.7
# a3028239	21-Dec-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# 40ba0942	14-Apr-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions that we ended up with unsuccessful SGPR spilling when there won't be enough VGPRs and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and often breaks the compiler time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in the high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer which isn an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196
# 7a72a935	23-Sep-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU] Preserve only the inactive lanes of scratch vgprs In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately. The current scheme to preserve the inactive lanes of such scratch VGPRs is not done rightly. It preserves all lanes and causes the outgoing values (if any) getting overwritten by the epilog restores. It then corrupts the return value. To avoid such situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134526
# 5692a7e8	13-Jun-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU] Callee must always spill writelane VGPRs Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved. Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D124192
# 5159be3c	18-Aug-2022	Luo, Yuanke <yuanke.luo@intel.com>	(Reland) [fastalloc] Support allocating specific register class in fastalloc This reverts commit 853bb192c407f5d9e75a5fd55cc089151530cbd3.
# 853bb192	15-Aug-2022	Luo, Yuanke <yuanke.luo@intel.com>	Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc" This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.
# 30f9e6eb	23-Jun-2022	Luo, Yuanke <yuanke.luo@intel.com>	(Reland) [fastalloc] Support allocating specific register class in fastalloc Reland commit 719658d078c4 The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA derived from base RA, they all allow allocating specific register class. Fast RA doesn't support allocating register for specific register class. This patch is to enable ShouldAllocateClass in fast RA, so that it can support allocating register for specific register class. Differential Revision: https://reviews.llvm.org/D131825
# 851a5efe	23-Jun-2022	Nico Weber <thakis@chromium.org>	Revert "[fastalloc] Support allocating specific register class in fastalloc" This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b. Breaks a few things, see comments on https://reviews.llvm.org/D128437 There's disagreement about the best fix. So let's keep HEAD green while discussions are happening.
# 719658d0	23-Jun-2022	Luo, Yuanke <yuanke.luo@intel.com>	[fastalloc] Support allocating specific register class in fastalloc The base RA support infrastructure that only allow a specific register class be allocated in RA pss. Since greedy RA, basic RA derived from base RA, they all allow allocating specific register class. Fast RA doesn't support allocating register for specific register class. This patch is to enable ShouldAllocateClass in fast RA, so that it can support allocating register for specific register class. Differential Revision: https://reviews.llvm.org/D126771
# affa1b1c	10-May-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane A later change will add a 3rd user, so factoring out the common code seems useful. Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place). (Resubmit with missing test added.) Differential Revision: https://reviews.llvm.org/D125324
# afc90101	25-May-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane" This reverts commit 2a28467e5389c4d741d1825fadd39ae84ecaa5dc.
# 2a28467e	10-May-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane A later change will add a 3rd user, so factoring out the common code seems useful. Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place). Differential Revision: https://reviews.llvm.org/D125324