History log of /llvm-project/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (Results 1 – 25 of 153)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# 9ca1323d 21-Jan-2025 Chinmay Deshpande <chdeshpa@amd.com>

[AMDGPU] Fix crash due to missing check for FLAT instructions that dont use vector registers when computing VALU hazard (#123627)


# 8a0c2e75 16-Jan-2025 Brox Chen <guochen2@amd.com>

[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736)

Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.

Since we are replacing `v_cndmask_b16` to `v

[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736)

Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.

Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we
have to at least update the fake16 codeGen to get codeGen test passing.
For this case, we have to update the true16 and with fake16 together,
otherwise some of the true16 tests will fail

show more ...


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 5e007afa 11-Dec-2024 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)

Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-4

[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)

Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop
for above cases irrespective of opsel values.

Note: We might need to add few others into the same table.

show more ...


Revision tags: llvmorg-19.1.5
# 39337ff2 02-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)

gfx950 SP changes doc says:
No 4 clk forwarding on opcodes that convert from
F32/F16->F8 or F32/F16->F4. Must insert a NOP or
instruct

AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)

gfx950 SP changes doc says:
No 4 clk forwarding on opcodes that convert from
F32/F16->F8 or F32/F16->F4. Must insert a NOP or
instruction writing some other destination VREG
after a conversion to F4/F8 since it writes either
low/high half or bytes.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>

show more ...


# 27a8afa3 25-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287)


# c3fe5ad6 25-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286)

Confusingly, this is a different hazard to the one on gfx10
with a subtarget feature.


# 3db4f5b0 25-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285)

The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case
was 1 wait state overly conservative. Previously, for gfx940,
the XDL/non-XDL

AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285)

The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case
was 1 wait state overly conservative. Previously, for gfx940,
the XDL/non-XDL cases happened to have the same number of cycles
in all cases. Now the XDL consumer case has an additional state for
2 pass sources.

show more ...


# 85601fd7 23-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284)

Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.


# db08d78c 23-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283)

Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17
wait states.


# 8cb6c990 23-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263)

These have an additional wait state compared to gfx940.


# b078b882 23-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262)

Increase from 11 wait states to 19


Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2
# d6173713 02-Oct-2024 Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>

[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859)

Instead of allocating an initializing a new instance in
`GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 86627149 04-Sep-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)

Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a

[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)

Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.

Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.

To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.

Co-authored-by: Shilei Tian <shilei.tian@amd.com>

show more ...


Revision tags: llvmorg-19.1.0-rc4
# 987ffc31 23-Aug-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI)

As suggested in review for PR #100067.
Refactor code for S_GETPC_B64 bundle updates for use with multiple
hazard mitigations.


# 7bcf4d63 22-Aug-2024 Jeffrey Byrnes <jeffrey.byrnes@amd.com>

[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)

MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position"

[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)

MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"

This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.

We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.

MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 --
this PR adds support opsel[2] != 0 as well

show more ...


Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 939a6624 23-Jul-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)

On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add co

[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)

On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add code to maintain normal priority of 2 for shaders that export and
drop to priority 0 after exports.

show more ...


# aeafdc21 16-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use using instead of typedef. NFC.


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5
# 0606747c 01-May-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove some pointless fallthrough annotations


# f6d431f2 24-Apr-2024 Xu Zhang <simonzgx@gmail.com>

[CodeGen] Make the parameter TRI required in some functions. (#85968)

Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many

[CodeGen] Make the parameter TRI required in some functions. (#85968)

Fixes #82659

There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.

Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.

After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.

show more ...


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# 0234d90d 31-Mar-2024 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768)

It was shown experimentally that this may have some benefit on newer HW.


Revision tags: llvmorg-18.1.2, llvmorg-18.1.1
# a6382de3 07-Mar-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Refactor mfma hazard handling [NFC] (#84276)

Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
haza

AMDGPU: Refactor mfma hazard handling [NFC] (#84276)

Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
hazard test coverage is comprehensive.

This could probably use another round to further simplify it.
Alternatively, I believe this could all be expressed in a constant table
indexed by an instruction classify function and number of passes.

show more ...


# 0f3628a9 06-Mar-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782)


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4
# dfa1d9b0 23-Feb-2024 Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)

These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined consta

[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)

These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.

show more ...


Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
# 659ce8f6 29-Jan-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Simplify else if to else in GCNHazardRecognizer

Fixes #79736


# 7fdf608c 24-Jan-2024 Mirko Brkušanin <Mirko.Brkusanin@amd.com>

[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>


1234567