#
149ed9d2 |
| 24-Jan-2024 |
Petar Avramovic <Petar.Avramovic@amd.com> |
AMDGPU: update GFX11 wmma hazards (#76143)
One V_NOP or unrelated VALU instruction in between is required for
correctness when matrix A or B of current WMMA instruction overlaps with
matrix D of p
AMDGPU: update GFX11 wmma hazards (#76143)
One V_NOP or unrelated VALU instruction in between is required for
correctness when matrix A or B of current WMMA instruction overlaps with
matrix D of previous WMMA instruction.
Remaining cases of WMMA operand overlaps are handled by the hardware and
do not require handling in hazard recognizer.
Hardware may stall in cases where:
- matrix C of current WMMA instruction overlaps with matrix D of
previous WMMA instruction
- VALU instruction reads matrix D of previous WMMA instruction
- matrix A,B or C of WMMA instruction reads result of previous VALU
instruction
show more ...
|
Revision tags: llvmorg-19-init |
|
#
42b08842 |
| 23-Jan-2024 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125)
Fixes #78856
|
#
97747467 |
| 19-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Update hazard recognition for new GFX12 wait counters (#78722)
In most cases the hazards no longer apply, so just assert that we are
not on GFX12.
|
#
ba52f06f |
| 18-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
sep
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
show more ...
|
#
b120dae9 |
| 11-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628)
Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc
operand
in gfx12 DS_PARAM_LOAD or
[AMDGPU] Support GFX12 VDSDIR instructions WAITVMSRC operand in GCNHazardRecognizer (#77628)
Modify GCNHazardRecognizer::fixLdsDirectVMEMHazard() so the waitvsrc
operand
in gfx12 DS_PARAM_LOAD or DS_DIRECT_LOAD instructions is set
appropriately
depending on whether a hazard is found or not, rather than inserting an
S_WAITCNT_DEPCTR instruction if a hazard needs to be mitigated.
Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
show more ...
|
#
966416b9 |
| 15-Dec-2023 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
85e3875a |
| 23-Aug-2023 |
Michael Maitland <michaeltmaitland@gmail.com> |
[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` r
[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
show more ...
|
#
71bfec76 |
| 24-Aug-2023 |
Michael Maitland <michaeltmaitland@gmail.com> |
Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics"
This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43.
Build still failing.
|
#
5b854f2c |
| 23-Aug-2023 |
Michael Maitland <michaeltmaitland@gmail.com> |
[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` r
[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
show more ...
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
2dfb4b56 |
| 04-Jul-2023 |
Stephen Thomas <Stephen.Thomas@amd.com> |
[AMDGPU] Fix incorrect hazard mitigation
GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunatel
[AMDGPU] Fix incorrect hazard mitigation
GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunately, the check for an existing wait incorrectly checks for one that doesn't actually care about sa_sdst itself, but requires that no other counters are waited for.
Once the check is performed correctly, a lit test needs to be updated, since it is currently testing for the incorrect behaviour.
Differential Revision: https://reviews.llvm.org/D154438
show more ...
|
#
8aedad0f |
| 04-Jul-2023 |
Stephen Thomas <Stephen.Thomas@amd.com> |
[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*() for each of vm_vsrc, va_vdst and sa_sdst.
[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable.
Differential Revision: https://reviews.llvm.org/D154424
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
aa2d0fbc |
| 21-May-2023 |
Sergei Barannikov <barannikov88@gmail.com> |
[MC] Add MCRegisterInfo::regunits for iteration over register units
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D152098
|
#
890c76a9 |
| 19-May-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix odd implicit operand handling in clause breaking
By inspection. Because of the strange behaviour of MI.uses(), this was adding implicit defs to the clause *uses* set, and then wrongly d
[AMDGPU] Fix odd implicit operand handling in clause breaking
By inspection. Because of the strange behaviour of MI.uses(), this was adding implicit defs to the clause *uses* set, and then wrongly detecting a conflict between explicit defs and implicit defs.
For example it would detect a conflict on this pair of instructions:
$vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4088, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1, addrspace 5) $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4092, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.1 + 4, addrspace 5)
Differential Revision: https://reviews.llvm.org/D150947
show more ...
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2 |
|
#
4241d890 |
| 15-Apr-2023 |
Kazu Hirata <kazu@google.com> |
[Target] Use range-based for loops (NFC)
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
#
a07584d5 |
| 03-Feb-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Make more use of MachineOperand::getOperandNo. NFC.
Differential Revision: https://reviews.llvm.org/D143252
|
#
8e3d7cf5 |
| 07-Feb-2023 |
Archibald Elliott <archibald.elliott@arm.com> |
[NFC][TargetParser] Remove llvm/Support/TargetParser.h
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
768aed13 |
| 13-Jan-2023 |
Jay Foad <jay.foad@amd.com> |
[MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A futu
[MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end.
Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer.
Differential Revision: https://reviews.llvm.org/D142213
show more ...
|
Revision tags: llvmorg-15.0.7 |
|
#
5bc703f7 |
| 20-Dec-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes usi
[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass
Accelerate finding the base class for a physical register by building a statically mapping table from physical registers to base classes using TableGen.
Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D139422
show more ...
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
49762162 |
| 08-Mar-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike
isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands.
To av
[AMDGPU] Remove isLiteralConstant and isLiteralConstantLike
isLiteralConstant and isLiteralConstantLike were similar to !isInlineConstant with slight differences like handling isReg operands.
To avoid a profusion of similar functions with undocumented differences, this patch removes all the isLiteralConstant* variants. Callers are responsible for handling the isReg case.
Differential Revision: https://reviews.llvm.org/D125759
show more ...
|
#
7425077e |
| 07-Nov-2022 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn'
[AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
show more ...
|
#
c8a90316 |
| 28-Oct-2022 |
Stephen Thomas <Stephen.Thomas@amd.com> |
[AMDGPU] Small cleanups in wait counter code
A small number of cleanups and refactors intended to enhance readability in two passes that deal with s_waitcnt instructions.
Differential Revision: htt
[AMDGPU] Small cleanups in wait counter code
A small number of cleanups and refactors intended to enhance readability in two passes that deal with s_waitcnt instructions.
Differential Revision: https://reviews.llvm.org/D136677
show more ...
|
#
9bb1e21f |
| 27-Oct-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC.
|
#
575eed3d |
| 11-Oct-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to al
AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to all read after write hazards.
show more ...
|
#
a35013be |
| 01-Oct-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR.
Review
[AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D134151
show more ...
|
#
f19cc793 |
| 20-Sep-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Disable fp atomic to s_denorm_mode hazard for GFX11
This hazard only exists on GFX10.
Differential Revision: https://reviews.llvm.org/D134276
|