SIFoldOperands.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Revision	Date	Author	Comments
# d6ff5cf9	25-Dec-2020	Kazu Hirata <kazu@google.com>	[Target] Use llvm::any_of (NFC)
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# ae8f4b21	18-Dec-2020	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501
# 1fd1f638	13-Dec-2020	Michael Liao <michael.hliao@gmail.com>	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174
Revision tags: llvmorg-11.0.1-rc1
# 038d884a	21-Oct-2020	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 892ef2e3	16-Sep-2020	Jay Foad <jay.foad@amd.com>	[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector It's simpler to do this at codegen time than to do ad-hoc constant folding of machine instructions in SIFoldOperands. Differential Revision: https://reviews.llvm.org/D88028
# 27df1652	17-Sep-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit 7d593d0d6905b55ca1124fca5e4d1ebb17203138.
# c3492a1a	09-Sep-2020	Michael Liao <michael.hliao@gmail.com>	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556
# 90777e29	09-Sep-2020	Jay Foad <jay.foad@amd.com>	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446
# c259d3a0	04-Sep-2020	dfukalov <daniil.fukalov@amd.com>	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# bf41c4d2	23-Oct-2019	Michael Liao <michael.hliao@gmail.com>	[codegen] Ensure target flags are cleared/set properly. NFC. - When an operand is changed into an immediate value or like, ensure their target flags being cleared or set properly. Differential Revision: https://reviews.llvm.org/D87109
# 34978602	20-Aug-2020	Jay Foad <jay.foad@amd.com>	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.
# da3f357d	22-Jul-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.
# 68fab44a	09-Aug-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix visiting physreg dest users when folding immediate copies This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by 766cb615a3b96025192707f4670cdf171da84034.
# 5b32518f	30-Jul-2020	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899
# 766cb615	20-Jul-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Relax restriction on folding immediates into physregs I never completed the work on the patches referenced by f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.
# b9c644ec	22-Jul-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix failures from overflowing uint8_t number of operands If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.
# 79f67cae	14-Jul-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).
# 16ea23ff	01-Jul-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Clear subreg when folding immediate copies This was getting reinterpreted as operand target flags, and appearing as <unknown target flag>, resulting in unparseable MIR.
# a19a56f6	07-Apr-2020	Graham Sellers <graham.sellers@amd.com>	[AMDGPU] Extend constant folding for logical operations This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_OR_B32. Also added a couple of tests for existing folds.
# 60b1967c	21-Jan-2020	Scott Linder <Scott.Linder@amd.com>	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138
# 1024b73e	03-Dec-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled
# 07a569a0	08-Jan-2020	Michael Liao <michael.hliao@gmail.com>	[amdgpu] Remove unused header. NFC.
# 46db6068	02-Dec-2019	David Stuttard <david.stuttard@amd.com>	AMDGPU: Avoid folding 2 constant operands into an SALU operation Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant. Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70896
# db0ed3e4	01-Nov-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
# 1bfcc608	04-Nov-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.