#
96dfa523 |
| 07-Sep-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Refactor SIFoldOperands. NFC.
Refactor static functions into class methods so they have access to TII, MRI etc.
|
#
9861a68a |
| 28-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Qualify auto in range-based for loops (NFC)
|
#
dbda30e2 |
| 28-Jul-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU][SIFoldOperands] Clear kills when folding COPY
Clear all kill flags on source register when folding a COPY. This is necessary because the kills may now be out of order with the uses.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D130622
|
#
3eb2281b |
| 16-May-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions.
This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is:
- It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.)
- It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it.
- The above benefits are expected to outweigh any increase in code size.
Differential Revision: https://reviews.llvm.org/D114643
|
#
6dd21d1d |
| 16-Mar-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU][SIFoldOperands] Consider the alignment constraints
Enforced an alignment check while folding the operands.
|
#
37b37838 |
| 16-Mar-2022 |
Shengchen Kan <shengchen.kan@intel.com> |
[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
|
#
c4500de2 |
| 14-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
|
#
36fe3f13 |
| 08-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.
Differential Revision: https://reviews.llvm.org/D121254
|
#
0d849b82 |
| 02-Mar-2022 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users
Use TII::getRegClass to return a valid regclass, or a nullptr if the RC is unknown for a given OpIdx. This fixes a potential crash that occurred while getting the RC from a variadic instruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D120813
|
#
e7b362d7 |
| 04-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Add v_mov_b64 gfx940 opcode
Differential Revision: https://reviews.llvm.org/D121023
|
Revision tags: llvmorg-14.0.0-rc2 |
|
#
69ab233a |
| 16-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Return better Changed status from SIFoldOperands
Differential Revision: https://reviews.llvm.org/D120023
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
dbf278b9 |
| 21-Jan-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Prevent aliasing of SrcC and Dst in MAI
From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided, as a new value could be written to the accumulator input before it gets read.
Note that this inevitably increases register pressure to the point where some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
f108c7f5 |
| 05-Dec-2021 |
Jack Andersen <jackoalan@gmail.com> |
[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.
Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot, yielding a substantial performance improvement.
The only necessary changes involve making relevant passes consider invalid DBG_VALUE vreg uses as valid.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D112852
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
654c89d8 |
| 06-Sep-2021 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch makes them allocatable as a prerequisite to enabling copies between VGPR and AGPR registers during regalloc.
Also added the missing AV register classes from 192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
|
#
5b8bbbec |
| 18-Nov-2021 |
Zarko Todorovski <zarko@ca.ibm.com> |
[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded or removed code comments that contain `sanity check` and `sanity test`.
|
#
3264e959 |
| 09-Nov-2021 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D113493
|
#
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <Sebastian.Neubauer@amd.com> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
#
6cef28ed |
| 22-Sep-2021 |
Jay Foad <jay.foad@amd.com> |
[TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D110229
|
#
e7b169a8 |
| 23-Sep-2021 |
Mikael Holmen <mikael.holmen@ericsson.com> |
[AMDGPU] Fix gcc warnings about unused variables [NFC]
|
#
0205806d |
| 21-Sep-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3-only mad/fma form.
With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding.
Differential Revision: https://reviews.llvm.org/D110156
|
Revision tags: llvmorg-13.0.0-rc2 |
|
#
f3fe44fa |
| 19-Aug-2021 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU] Fix too many constants with flat scratch
Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame index, leading to too many constants when flat scratch is enabled and stack offsets are large.
Differential Revision: https://reviews.llvm.org/D108368
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
#
39f8a792 |
| 15-Jun-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions
These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away.
We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.
|
Revision tags: llvmorg-12.0.1-rc1 |
|
#
7c706af0 |
| 07-Apr-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp
First, clean up the strange API of tryConstantFoldOp, where it took an immediate operand value but no indication of which operand it was the value for.
Second, clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction.
This is NFCI, but there are some minor changes caused by the order in which things are folded.
Differential Revision: https://reviews.llvm.org/D100031
|
#
b5833277 |
| 23-Apr-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
|
#
987e5285 |
| 20-Apr-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies
|