History log of /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (Results 101 – 125 of 229)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# d6ff5cf9 25-Dec-2020 Kazu Hirata <kazu@google.com>

[Target] Use llvm::any_of (NFC)


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# ae8f4b21 18-Dec-2020 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Folding of FI operand with flat scratch

Differential Revision: https://reviews.llvm.org/D93501


# 1fd1f638 13-Dec-2020 Michael Liao <michael.hliao@gmail.com>

[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.

- Once an instruction is simplified, foldable candidates from it should
be invalidated or skipped as the operand index is no longer

[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.

- Once an instruction is simplified, foldable candidates from it should
be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174

show more ...


Revision tags: llvmorg-11.0.1-rc1
# 038d884a 21-Oct-2020 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Use flat scratch instructions where available

The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled

[AMDGPU] Use flat scratch instructions where available

The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 892ef2e3 16-Sep-2020 Jay Foad <jay.foad@amd.com>

[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector

It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revisi

[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector

It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028

show more ...


# 27df1652 17-Sep-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."

This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2.

I think this is the wrong strategy and wrong place to do this
tra

Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."

This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2.

I think this is the wrong strategy and wrong place to do this
transform anyway. Also reverts follow up commit
7d593d0d6905b55ca1124fca5e4d1ebb17203138.

show more ...


# c3492a1a 09-Sep-2020 Michael Liao <michael.hliao@gmail.com>

[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.

- Need to lower COPY from SGPR to VGPR to a real instruction as the
standard COPY is used where the source and destination are from the

[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.

- Need to lower COPY from SGPR to VGPR to a real instruction as the
standard COPY is used where the source and destination are from the
same register bank so that we potentially coalesc them together and
save one COPY. Considering that, backend optimizations, such as CSE,
won't handle them. However, the copy from SGPR to VGPR always needs
materializing to a native instruction, it should be lowered into a
real one before other backend optimizations.

Differential Revision: https://reviews.llvm.org/D87556

show more ...


# 90777e29 09-Sep-2020 Jay Foad <jay.foad@amd.com>

[AMDGPU] Enable scheduling around FP MODE-setting instructions

Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
sch

[AMDGPU] Enable scheduling around FP MODE-setting instructions

Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446

show more ...


# c259d3a0 04-Sep-2020 dfukalov <daniil.fukalov@amd.com>

[AMDGPU] Fix for folding v2.16 literals.

It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new fu

[AMDGPU] Fix for folding v2.16 literals.

It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158

show more ...


Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# bf41c4d2 23-Oct-2019 Michael Liao <michael.hliao@gmail.com>

[codegen] Ensure target flags are cleared/set properly. NFC.

- When an operand is changed into an immediate value or like, ensure their
target flags being cleared or set properly.

Differential Re

[codegen] Ensure target flags are cleared/set properly. NFC.

- When an operand is changed into an immediate value or like, ensure their
target flags being cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109

show more ...


# 34978602 20-Aug-2020 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister

... in favour of the isPhysical/isVirtual methods.


# da3f357d 22-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't look at dbg users for foldable operands

These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.


# 68fab44a 09-Aug-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix visiting physreg dest users when folding immediate copies

This can fold the immediate into the physical destination, but this
should not look for further users of the register. Fixes reg

AMDGPU: Fix visiting physreg dest users when folding immediate copies

This can fold the immediate into the physical destination, but this
should not look for further users of the register. Fixes regression
introduced by 766cb615a3b96025192707f4670cdf171da84034.

show more ...


# 5b32518f 30-Jul-2020 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Do not use undef on indirect source

We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the re

[AMDGPU] Do not use undef on indirect source

We are using undef on the indirect move source subreg and then
using implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change undef source because
it is really does not matter. The fix is to stop lying to RA and
drop undef flag.

This has also hit a problem in SIFoldOperands as it can fold
immediate into an indirect move since there is no undef flag
anymore. That results in multiple test failures, so added the
check for this case.

Differential Revision: https://reviews.llvm.org/D84899

show more ...


# 766cb615 20-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Relax restriction on folding immediates into physregs

I never completed the work on the patches referenced by
f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to
avoid folding

AMDGPU: Relax restriction on folding immediates into physregs

I never completed the work on the patches referenced by
f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to
avoid folding immediate writes into m0 which the coalescer doesn't
understand very well. Relax this to allow simple SGPR immediates to
fold directly into VGPR copies. This pattern shows up routinely in
current GlobalISel code since nothing is smart enough to emit VGPR
constants yet.

show more ...


# b9c644ec 22-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix failures from overflowing uint8_t number of operands

If the operand index exceeded the limit of unsigned char, it wrapped
and would point to the wrong operand. Increase the size of the o

AMDGPU: Fix failures from overflowing uint8_t number of operands

If the operand index exceeded the limit of unsigned char, it wrapped
and would point to the wrong operand. Increase the size of the operand
index field to avoid this, and also don't bother trying to fold into
implicit operands.

show more ...


# 79f67cae 14-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Rename add/sub with carry out instructions

The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to

AMDGPU: Rename add/sub with carry out instructions

The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).

show more ...


# 16ea23ff 01-Jul-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Clear subreg when folding immediate copies

This was getting reinterpreted as operand target flags, and appearing
as as <unknown target flag>, resulting in unparseable MIR.


# a19a56f6 07-Apr-2020 Graham Sellers <graham.sellers@amd.com>

[AMDGPU] Extend constant folding for logical operations

This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_

[AMDGPU] Extend constant folding for logical operations

This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.

show more ...


# 60b1967c 21-Jan-2020 Scott Linder <Scott.Linder@amd.com>

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us t

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138

show more ...


# 1024b73e 03-Dec-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Split denormal mode tracking bits

Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mec

AMDGPU: Split denormal mode tracking bits

Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled

show more ...


# 07a569a0 08-Jan-2020 Michael Liao <michael.hliao@gmail.com>

[amdgpu] Remove unused header. NFC.


# 46db6068 02-Dec-2019 David Stuttard <david.stuttard@amd.com>

AMDGPU: Avoid folding 2 constant operands into an SALU operation

Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation,

AMDGPU: Avoid folding 2 constant operands into an SALU operation

Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation, with neither operand able to be
encoded as an inline constant.

Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70896

show more ...


# db0ed3e4 01-Nov-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Refactor treatment of denormal mode

Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the sub

AMDGPU: Refactor treatment of denormal mode

Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.

show more ...


# 1bfcc608 04-Nov-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.


12345678910