History log of /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (Results 51 – 75 of 229)
Revision Date Author Comments
# 96dfa523 07-Sep-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Refactor SIFoldOperands. NFC.

Refactor static functions into class methods so they have access to TII, MRI
etc.


# 9861a68a 28-Aug-2022 Kazu Hirata <kazu@google.com>

[Target] Qualify auto in range-based for loops (NFC)


# dbda30e2 28-Jul-2022 Carl Ritson <carl.ritson@amd.com>

[AMDGPU][SIFoldOperands] Clear kills when folding COPY

Clear all kill flags on source register when folding a COPY.
This is necessary because the kills may now be out of order with the uses.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130622


# 3eb2281b 16-May-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Aggressively fold immediates in SIFoldOperands

Previously SIFoldOperands::foldInstOperand would only fold a
non-inlinable immediate into a single user, so as not to increase code
size by adding the same 32-bit literal operand to many instructions.

This patch removes that restriction, so that a non-inlinable immediate
will be folded into any number of users. The rationale is:
- It reduces the number of registers used for holding constant values,
  which might increase occupancy. (On the other hand, many of these
  registers are SGPRs which no longer affect occupancy on GFX10+.)
- It reduces ALU stalls between the instruction that loads a constant
  into a register, and the instruction that uses it.
- The above benefits are expected to outweigh any increase in code size.

Differential Revision: https://reviews.llvm.org/D114643


# 6dd21d1d 16-Mar-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SIFoldOperands] Consider the alignment constraints

Enforced an alignment check while folding the operands.


# 37b37838 16-Mar-2022 Shengchen Kan <shengchen.kan@intel.com>

[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments


# c4500de2 14-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions

Differential Revision: https://reviews.llvm.org/D121634


# 36fe3f13 08-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] flat scratch SVS addressing mode for gfx940

Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254


# 0d849b82 02-Mar-2022 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users

Use TII::getRegClass to return a valid regclass or a nullptr
if the RC is unknown for a given OpIdx. This fixes a potential
crash that occurred while getting the RC from a variadic instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D120813


# e7b362d7 04-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Add v_mov_b64 gfx940 opcode

Differential Revision: https://reviews.llvm.org/D121023


Revision tags: llvmorg-14.0.0-rc2
# 69ab233a 16-Feb-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Return better Changed status from SIFoldOperands

Differential Revision: https://reviews.llvm.org/D120023


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# dbf278b9 21-Jan-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Prevent aliasing of SrcC and Dst in MAI

From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoided, as a new value could be written to the
accumulator input before it gets read.

Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.

This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.

Fixes: SWDEV-318900

Differential Revision: https://reviews.llvm.org/D117844


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# f108c7f5 05-Dec-2021 Jack Andersen <jackoalan@gmail.com>

[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.

Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot, yielding a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vreg uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 654c89d8 06-Sep-2021 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU] Make vector superclasses allocatable

The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300


# 5b8bbbec 18-Nov-2021 Zarko Todorovski <zarko@ca.ibm.com>

[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target

Reworded and removed code comments that contain `sanity check` and `sanity
test`.


# 3264e959 09-Nov-2021 Jay Foad <jay.foad@amd.com>

[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress

Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493


# d1f45ed5 11-Nov-2021 Neubauer, Sebastian <Sebastian.Neubauer@amd.com>

[AMDGPU][NFC] Fix typos

Differential Revision: https://reviews.llvm.org/D113672


# 6cef28ed 22-Sep-2021 Jay Foad <jay.foad@amd.com>

[TII] Remove the MFI argument to convertToThreeAddress. NFC.

This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229


# e7b169a8 23-Sep-2021 Mikael Holmen <mikael.holmen@ericsson.com>

[AMDGPU] Fix gcc warnings about unused variables [NFC]


# 0205806d 21-Sep-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers

Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.

With this change, the only way we should emit VOP3-encoded mac/fmac is
if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs
for both src0 and src1. In all other cases the mac/fmac should either be
converted to mad/fma or shrunk to VOP2 encoding.

Differential Revision: https://reviews.llvm.org/D110156


Revision tags: llvmorg-13.0.0-rc2
# f3fe44fa 19-Aug-2021 Sebastian Neubauer <Sebastian.Neubauer@amd.com>

[AMDGPU] Fix too many constants with flat scratch

Prevent SIFoldOperands from creating SALU instructions with a constant
and a frame index. Previously, only one operand was checked to be a
frame index, leading to too many constants when flat scratch is enabled
and stack offsets are large.

Differential Revision: https://reviews.llvm.org/D108368


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 39f8a792 15-Jun-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions

These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.


Revision tags: llvmorg-12.0.1-rc1
# 7c706af0 07-Apr-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp

First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.

Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.

This is NFCI but there are some minor changes caused by the order in
which things are folded.

Differential Revision: https://reviews.llvm.org/D100031


# b5833277 23-Apr-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix assert on inline asm on gfx90a

This was assuming all mayLoad instructions have one def.


# 987e5285 20-Apr-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies

