Revision tags: llvmorg-21-init |
|
#
9ca1323d |
| 21-Jan-2025 |
Chinmay Deshpande <chdeshpa@amd.com> |
[AMDGPU] Fix crash due to missing check for FLAT instructions that dont use vector registers when computing VALU hazard (#123627)
|
#
8a0c2e75 |
| 16-Jan-2025 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736)
Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.
Since we are replacing `v_cndmask_b16` to `v
[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736)
Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and
fake16 flow.
Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we
have to at least update the fake16 codeGen to get codeGen test passing.
For this case, we have to update the true16 and with fake16 together,
otherwise some of the true16 tests will fail
show more ...
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
5e007afa |
| 11-Dec-2024 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-4
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop
for above cases irrespective of opsel values.
Note: We might need to add few others into the same table.
show more ...
|
Revision tags: llvmorg-19.1.5 |
|
#
39337ff2 |
| 02-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruct
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
show more ...
|
#
27a8afa3 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287)
|
#
c3fe5ad6 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286)
Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature.
|
#
3db4f5b0 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285)
The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL
AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285)
The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources.
show more ...
|
#
85601fd7 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284)
Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.
|
#
db08d78c |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283)
Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.
|
#
8cb6c990 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263)
These have an additional wait state compared to gfx940.
|
#
b078b882 |
| 23-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262)
Increase from 11 wait states to 19
|
Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
d6173713 |
| 02-Oct-2024 |
Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> |
[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859)
Instead of allocating an initializing a new instance in
`GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
86627149 |
| 04-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.
Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.
To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
show more ...
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
987ffc31 |
| 23-Aug-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI)
As suggested in review for PR #100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.
|
#
7bcf4d63 |
| 22-Aug-2024 |
Jeffrey Byrnes <jeffrey.byrnes@amd.com> |
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position"
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"
This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.
We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.
MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 --
this PR adds support opsel[2] != 0 as well
show more ...
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
939a6624 |
| 23-Jul-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add co
[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add code to maintain normal priority of 2 for shaders that export and
drop to priority 0 after exports.
show more ...
|
#
aeafdc21 |
| 16-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Use using instead of typedef. NFC.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
0606747c |
| 01-May-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove some pointless fallthrough annotations
|
#
f6d431f2 |
| 24-Apr-2024 |
Xu Zhang <simonzgx@gmail.com> |
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many
[CodeGen] Make the parameter TRI required in some functions. (#85968)
Fixes #82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
show more ...
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3 |
|
#
0234d90d |
| 31-Mar-2024 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768)
It was shown experimentally that this may have some benefit on newer HW.
|
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
a6382de3 |
| 07-Mar-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor mfma hazard handling [NFC] (#84276)
Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
haza
AMDGPU: Refactor mfma hazard handling [NFC] (#84276)
Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
hazard test coverage is comprehensive.
This could probably use another round to further simplify it.
Alternatively, I believe this could all be expressed in a constant table
indexed by an instruction classify function and number of passes.
show more ...
|
#
0f3628a9 |
| 06-Mar-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782)
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
dfa1d9b0 |
| 23-Feb-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined consta
[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772)
These are hoped to provide more convenient and less error prone
facilities to encode and decode fields than manually defined constants
and functions.
show more ...
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
659ce8f6 |
| 29-Jan-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Simplify else if to else in GCNHazardRecognizer
Fixes #79736
|
#
7fdf608c |
| 24-Jan-2024 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
|