History log of /llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/image-waterfall-loop-O0.ll (Results 1 – 25 of 28)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5
# b3995aa3 19-Nov-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624)

In graphics shaders it is better overall to use NSA encoding for IMAGE
instructions, because the benefit of less constrained register

[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624)

In graphics shaders it is better overall to use NSA encoding for IMAGE
instructions, because the benefit of less constrained register
allocation outweighs the cost of larger encoding. In particular NSA form
often avoids the need for extra V_MOV_B32 instructions between IMAGE
instructions, which can allow the IMAGE instructions to be claused.

Note that in GFX12 there is no longer a bit in the encoding to choose
between NSA and non-NSA forms, so this only affects GFX10 and GFX11.

show more ...


Revision tags: llvmorg-19.1.4
# 2b5b57c5 12-Nov-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)

Currently all implicit-def instructions are part of
bb prolog. We should only include the wwm-register's
implicit definitions into the

[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)

Currently all implicit-def instructions are part of
bb prolog. We should only include the wwm-register's
implicit definitions into the BB prolog. The other
vector class registers' implicit defs when exist at
the bb top might cause interference when pushed the
LR_split copy insertion downwards. The SplitKit is
very strict on altering the insertion points and will
assert such instances.

show more ...


Revision tags: llvmorg-19.1.3
# 3c5cea65 21-Oct-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU]: Add implicit-def to the BB prolog (#112872)

IMPLICIT_DEF inserted for a wwm-register at the
very first block or the predecessor block where
it is used for sgpr spilling can appear at a blo

[AMDGPU]: Add implicit-def to the BB prolog (#112872)

IMPLICIT_DEF inserted for a wwm-register at the
very first block or the predecessor block where
it is used for sgpr spilling can appear at a block
begin that requires spill-insertion during per-lane
VGPR regalloc phase. The presence of the IMPLICIT_DEF
currently breaks the BB prolog.

Fixes: SWDEV-490717

show more ...


Revision tags: llvmorg-19.1.2
# 6636f326 08-Oct-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Include WWM register spill into BB Prolog (#111496)

With #93526 we split the regalloc pipeline further
to have a standalone allocation for wwm registers
and per-lane VGPRs. Currently the

[AMDGPU] Include WWM register spill into BB Prolog (#111496)

With #93526 we split the regalloc pipeline further
to have a standalone allocation for wwm registers
and per-lane VGPRs. Currently the presence of the
wwm-spill reloads inserted at the bb-top limits the
isBasicPrologue function during the per-lane vgpr
regalloc to skip past the exec manipulation instruction
and ended up causing incorrect codegen. The wmm-spill
inserted during the wwm-regalloc pipeline should also
be included in the bb-prolog so that the per-lane vgpr
regalloc pipeline can identify the appropriate insertion
points for their spills and copies.

show more ...


Revision tags: llvmorg-19.1.1
# ac0f64f0 30-Sep-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Split vgpr regalloc pipeline (#93526)

Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a

[AMDGPU] Split vgpr regalloc pipeline (#93526)

Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.

This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.

show more ...


# 4f90e75b 25-Sep-2024 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049)

When generating waitcounts before a use or def skip VGPRs. We never have
a real implicit VGPR operands on memory instructions, it

[AMDGPU] Do not count implicit VGPRs in SIInsertWaitcnts (#109049)

When generating waitcounts before a use or def skip VGPRs. We never have
a real implicit VGPR operands on memory instructions, it is only for
super-reg liveness accounting.

Some other instructions (MOVRELS as an example) may have real implicit
VGPR uses though.

This is less then ideal but most of the problems observed with spills.

show more ...


Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# a0eb6b88 27-Oct-2023 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924)

The insertion point determined by RA while attempting spills and liverange
split at the beginning of a block goes w

[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924)

The insertion point determined by RA while attempting spills and liverange
split at the beginning of a block goes wrong at times, and the newly
inserted vector instructions are placed before the exec-mask restore
instruction which is wrong. It occurs mainly due to the dependency on
isBasicBlockPrologue that doesn't account early inserted instructions
(spills and splits) during RA and causes the block prolog break.

A better approach for deciding the insertion point should be worked out.
For now, improving the helper function to consider all possible early
insertions. This patch includes the spill instructions. The copies
associated with liverange split should also be included in the block
prolog.

show more ...


Revision tags: llvmorg-17.0.3, llvmorg-17.0.2
# 7ac532ef 29-Sep-2023 Yashwant Singh <114088807+yashssh@users.noreply.github.com>

[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091)

Use this flag to give more context to implicit def comments in assembly.

Reviewed on phabricator:
https://reviews.llvm.org/D153754


Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 4d42e8b5 28-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc2

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.

show more ...


# a496c8be 26-Jul-2023 Vitaly Buka <vitalybuka@google.com>

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381

show more ...


Revision tags: llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5
# 7a98f084 17-May-2023 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196

show more ...


# f2c164c8 21-Jun-2023 Jay Foad <jay.foad@amd.com>

[AMDGPU] Do not wait for vscnt on function entry and return

SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in t

[AMDGPU] Do not wait for vscnt on function entry and return

SIInsertWaitcnts inserts waitcnt instructions to resolve data
dependencies. The GFX10+ vscnt (VMEM store count) counter is never used
in this way. It is only used to resolve memory dependencies, and that is
handled by SIMemoryLegalizer. Hence there is no need to conservatively
wait for vscnt to be 0 on function entry and before returns.

Differential Revision: https://reviews.llvm.org/D153537

show more ...


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# d9a6fc82 24-Jan-2023 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Run unmerge combines post regbankselect

RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which
left us with a lot of unmerge/merge pairs that could be simplified.
These often go

[AMDGPU] Run unmerge combines post regbankselect

RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which
left us with a lot of unmerge/merge pairs that could be simplified.
These often got in the way of pattern matching and made codegen
worse.

This patch:
- Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect
- Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU
- Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).

This seems to be mostly beneficial for code quality.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D142192

show more ...


Revision tags: llvmorg-15.0.7
# a3028239 21-Dec-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs"

This reverts commit 40ba0942e2ab1107f83aa5a0ee5ae2980bf47b1a.


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2
# 40ba0942 14-Apr-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which isn an unproblematic case.

This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D124196

show more ...


# 7a72a935 23-Sep-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU] Preserve only the inactive lanes of scratch vgprs

In general, a callee is free to use a scratch register without
preserving its previous state. However, the VGPR used for SGPR
spilling can

[AMDGPU] Preserve only the inactive lanes of scratch vgprs

In general, a callee is free to use a scratch register without
preserving its previous state. However, the VGPR used for SGPR
spilling can potentially have its inactive lanes overwritten by
the writelane instructions. When the function returns, it can
cause unexpected behavior if the VGPR value is not preserved
appropriately.

The current scheme to preserve the inactive lanes of such
scratch VGPRs is not done rightly. It preserves all lanes
and causes the outgoing values (if any) getting overwritten
by the epilog restores. It then corrupts the return value.

To avoid such situation with scratch VGPRs, this patch ensures
we preserve only their inactive lanes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134526

show more ...


# 5692a7e8 13-Jun-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU] Callee must always spill writelane VGPRs

Since the writelane instruction used for SGPR spills can
modify inactive lanes, the callee must preserve the VGPR
this instruction modifies even if

[AMDGPU] Callee must always spill writelane VGPRs

Since the writelane instruction used for SGPR spills can
modify inactive lanes, the callee must preserve the VGPR
this instruction modifies even if it was marked Caller-saved.

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D124192

show more ...


# 5159be3c 18-Aug-2022 Luo, Yuanke <yuanke.luo@intel.com>

(Reland) [fastalloc] Support allocating specific register class in fastalloc

This reverts commit 853bb192c407f5d9e75a5fd55cc089151530cbd3.


# 853bb192 15-Aug-2022 Luo, Yuanke <yuanke.luo@intel.com>

Revert "(Reland) [fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 30f9e6ebd30b79d13f99eaca4d829e0da07186b3.


# 30f9e6eb 23-Jun-2022 Luo, Yuanke <yuanke.luo@intel.com>

(Reland) [fastalloc] Support allocating specific register class in fastalloc

Reland commit 719658d078c4

The base RA support infrastructure that only allow a specific register
class be allocated in

(Reland) [fastalloc] Support allocating specific register class in fastalloc

Reland commit 719658d078c4

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D131825

show more ...


# 851a5efe 23-Jun-2022 Nico Weber <thakis@chromium.org>

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.o

Revert "[fastalloc] Support allocating specific register class in fastalloc"

This reverts commit 719658d078c4093d1ee716fb65ae94673df7b22b.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.

show more ...


# 719658d0 23-Jun-2022 Luo, Yuanke <yuanke.luo@intel.com>

[fastalloc] Support allocating specific register class in fastalloc

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA der

[fastalloc] Support allocating specific register class in fastalloc

The base RA support infrastructure that only allow a specific register
class be allocated in RA pss. Since greedy RA, basic RA derived from
base RA, they all allow allocating specific register class. Fast RA
doesn't support allocating register for specific register class. This
patch is to enable ShouldAllocateClass in fast RA, so that it can
support allocating register for specific register class.

Differential Revision: https://reviews.llvm.org/D126771

show more ...


# affa1b1c 10-May-2022 Nicolai Hähnle <nicolai.haehnle@amd.com>

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes some more COPYs to be
generated, but those all fold away during instruction selection.
Generating the comparisons uses generic instructions over machine
instructions now which admittedly shouldn't make a difference
(though it should make it easier to move the waterfall loop generation
to another place).

(Resubmit with missing test added.)

Differential Revision: https://reviews.llvm.org/D125324

show more ...


# afc90101 25-May-2022 Nicolai Hähnle <nicolai.haehnle@amd.com>

Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane"

This reverts commit 2a28467e5389c4d741d1825fadd39ae84ecaa5dc.


# 2a28467e 10-May-2022 Nicolai Hähnle <nicolai.haehnle@amd.com>

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes

AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane

A later change will add a 3rd user, so factoring out the common code
seems useful.

Reorganizing the executeInWaterfallLoop causes some more COPYs to be
generated, but those all fold away during instruction selection.
Generating the comparisons uses generic instructions over machine
instructions now which admittedly shouldn't make a difference
(though it should make it easier to move the waterfall loop generation
to another place).

Differential Revision: https://reviews.llvm.org/D125324

show more ...


12