#
d6ff5cf9 |
| 25-Dec-2020 |
Kazu Hirata <kazu@google.com> |
[Target] Use llvm::any_of (NFC)
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
ae8f4b21 |
| 18-Dec-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Folding of FI operand with flat scratch
Differential Revision: https://reviews.llvm.org/D93501
|
#
1fd1f638 |
| 13-Dec-2020 |
Michael Liao <michael.hliao@gmail.com> |
[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
- Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer
[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
- Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid.
Differential Revision: https://reviews.llvm.org/D93174
show more ...
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
038d884a |
| 21-Oct-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled
[AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
892ef2e3 |
| 16-Sep-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant folding of machine instructions in SIFoldOperands.
Differential Revisi
[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant folding of machine instructions in SIFoldOperands.
Differential Revision: https://reviews.llvm.org/D88028
show more ...
|
#
27df1652 |
| 17-Sep-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2.
I think this is the wrong strategy and wrong place to do this tra
Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
This reverts commit c3492a1aa1b98c8d81b0969d52cea7681f0624c2.
I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit 7d593d0d6905b55ca1124fca5e4d1ebb17203138.
show more ...
|
#
c3492a1a |
| 09-Sep-2020 |
Michael Liao <michael.hliao@gmail.com> |
[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
- Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the
[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
- Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesc them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations.
Differential Revision: https://reviews.llvm.org/D87556
show more ...
|
#
90777e29 |
| 09-Sep-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine sch
[AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom.
Differential Revision: https://reviews.llvm.org/D87446
show more ...
|
#
c259d3a0 |
| 04-Sep-2020 |
dfukalov <daniil.fukalov@amd.com> |
[AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost.
Introduced new fu
[AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost.
Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through.
Fixes: SWDEV-247595
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D87158
show more ...
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
bf41c4d2 |
| 23-Oct-2019 |
Michael Liao <michael.hliao@gmail.com> |
[codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their target flags being cleared or set properly.
Differential Re
[codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their target flags being cleared or set properly.
Differential Revision: https://reviews.llvm.org/D87109
show more ...
|
#
34978602 |
| 20-Aug-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
|
#
da3f357d |
| 22-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't look at dbg users for foldable operands
These would have always failed to fold, so checking them or adding them to the fold candidates is useless.
|
#
68fab44a |
| 09-Aug-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix visiting physreg dest users when folding immediate copies
This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes reg
AMDGPU: Fix visiting physreg dest users when folding immediate copies
This can fold the immediate into the physical destination, but this should not look for further users of the register. Fixes regression introduced by 766cb615a3b96025192707f4670cdf171da84034.
show more ...
|
#
5b32518f |
| 30-Jul-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the re
[AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag.
This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case.
Differential Revision: https://reviews.llvm.org/D84899
show more ...
|
#
766cb615 |
| 20-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Relax restriction on folding immediates into physregs
I never completed the work on the patches referenced by f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to avoid folding
AMDGPU: Relax restriction on folding immediates into physregs
I never completed the work on the patches referenced by f8bf7d7f42f28fa18144091022236208e199f331, but this was intended to avoid folding immediate writes into m0 which the coalescer doesn't understand very well. Relax this to allow simple SGPR immediates to fold directly into VGPR copies. This pattern shows up routinely in current GlobalISel code since nothing is smart enough to emit VGPR constants yet.
show more ...
|
#
b9c644ec |
| 22-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix failures from overflowing uint8_t number of operands
If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the o
AMDGPU: Fix failures from overflowing uint8_t number of operands
If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.
show more ...
|
#
79f67cae |
| 14-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to
AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions.
The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous.
This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).
show more ...
|
#
16ea23ff |
| 01-Jul-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Clear subreg when folding immediate copies
This was getting reinterpreted as operand target flags, and appearing as as <unknown target flag>, resulting in unparseable MIR.
|
#
a19a56f6 |
| 07-Apr-2020 |
Graham Sellers <graham.sellers@amd.com> |
[AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_
[AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and V_AND_OR_B32. Also added a couple of tests for existing folds.
show more ...
|
#
60b1967c |
| 21-Jan-2020 |
Scott Linder <Scott.Linder@amd.com> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
#
1024b73e |
| 03-Dec-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet.
This is just a mec
AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet.
This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled
show more ...
|
#
07a569a0 |
| 08-Jan-2020 |
Michael Liao <michael.hliao@gmail.com> |
[amdgpu] Remove unused header. NFC.
|
#
46db6068 |
| 02-Dec-2019 |
David Stuttard <david.stuttard@amd.com> |
AMDGPU: Avoid folding 2 constant operands into an SALU operation
Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation,
AMDGPU: Avoid folding 2 constant operands into an SALU operation
Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant.
Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70896
show more ...
|
#
db0ed3e4 |
| 01-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the sub
AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute.
This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
show more ...
|
#
1bfcc608 |
| 04-Nov-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.
|