SIFoldOperands.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Revision	Date	Author	Comments
# 96dfa523	07-Sep-2022	Jay Foad <jay.foad@amd.com>	[AMDGPU] Refactor SIFoldOperands. NFC. Refactor static functions into class methods so they have access to TII, MRI etc.
# 9861a68a	28-Aug-2022	Kazu Hirata <kazu@google.com>	[Target] Qualify auto in range-based for loops (NFC)
# dbda30e2	28-Jul-2022	Carl Ritson <carl.ritson@amd.com>	[AMDGPU][SIFoldOperands] Clear kills when folding COPY Clear all kill flags on source register when folding a COPY. This is necessary because the kills may now be out of order with the uses. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130622
# 3eb2281b	16-May-2022	Jay Foad <jay.foad@amd.com>	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643
# 6dd21d1d	16-Mar-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.
# 37b37838	16-Mar-2022	Shengchen Kan <shengchen.kan@intel.com>	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments
# c4500de2	14-Mar-2022	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634
# 36fe3f13	08-Mar-2022	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254
# 0d849b82	02-Mar-2022	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users Use TII::getRegClass to return a valid regclass or a nullptr if the RC is unknown for a given OpIdx. This fixes a potential crash occurred while getting the RC from a variadic instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120813
# e7b362d7	04-Mar-2022	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Add v_mov_b64 gfx940 opcode Differential Revision: https://reviews.llvm.org/D121023
Revision tags: llvmorg-14.0.0-rc2
# 69ab233a	16-Feb-2022	Jay Foad <jay.foad@amd.com>	[AMDGPU] Return better Changed status from SIFoldOperands Differential Revision: https://reviews.llvm.org/D120023
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# dbf278b9	21-Jan-2022	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoid as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# f108c7f5	05-Dec-2021	Jack Andersen <jackoalan@gmail.com>	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 654c89d8	06-Sep-2021	Christudasan Devadasan <Christudasan.Devadasan@amd.com>	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300
# 5b8bbbec	18-Nov-2021	Zarko Todorovski <zarko@ca.ibm.com>	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded removed code comments that contain `sanity check` and `sanity test`.
# 3264e959	09-Nov-2021	Jay Foad <jay.foad@amd.com>	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493
# d1f45ed5	11-Nov-2021	Neubauer, Sebastian <Sebastian.Neubauer@amd.com>	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672
# 6cef28ed	22-Sep-2021	Jay Foad <jay.foad@amd.com>	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229
# e7b169a8	23-Sep-2021	Mikael Holmen <mikael.holmen@ericsson.com>	[AMDGPU] Fix gcc warnings about unused variables [NFC]
# 0205806d	21-Sep-2021	Jay Foad <jay.foad@amd.com>	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3-only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156
Revision tags: llvmorg-13.0.0-rc2
# f3fe44fa	19-Aug-2021	Sebastian Neubauer <Sebastian.Neubauer@amd.com>	[AMDGPU] Fix too many constants with flat scratch Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame index, leading to too many constants when flat scratch is enabled and stack offsets are large. Differential Revision: https://reviews.llvm.org/D108368
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 39f8a792	15-Jun-2021	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.
Revision tags: llvmorg-12.0.1-rc1
# 7c706af0	07-Apr-2021	Jay Foad <jay.foad@amd.com>	[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp First clean up the strange API of tryConstantFoldOp where it took an immediate operand value, but no indication of which operand it was the value for. Second clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction. This is NFCI but there are some minor changes caused by the order in which things are folded. Differential Revision: https://reviews.llvm.org/D100031
# b5833277	23-Apr-2021	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.
# 987e5285	20-Apr-2021	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies