GCNHazardRecognizer.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# 9ca1323d	21-Jan-2025	Chinmay Deshpande <chdeshpa@amd.com>	[AMDGPU] Fix crash due to missing check for FLAT instructions that dont use vector registers when computing VALU hazard (#123627)
# 8a0c2e75	16-Jan-2025	Brox Chen <guochen2@amd.com>	[AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` to `v [AMDGPU][True16][MC][CodeGen] true16 for v_cndmask_b16 (#119736) Support true16 format for v_cndmask_b16 in MC and CodeGen in true16 and fake16 flow. Since we are replacing `v_cndmask_b16` to `v_cndmask_b16_t16/fake16`, we have to at least update the fake16 codeGen to get codeGen test passing. For this case, we have to update the true16 and with fake16 together, otherwise some of the true16 tests will fail show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 5e007afa	11-Dec-2024	Pravin Jagtap <Pravin.Jagtap@amd.com>	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589) Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when partially writing to high bytes. Experiments in SWDEV-4 [AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589) Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop for above cases irrespective of opsel values. Note: We might need to add few others into the same table. show more ...
Revision tags: llvmorg-19.1.5
# 39337ff2	02-Dec-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruct AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> show more ...
# 27a8afa3	25-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (#117287)
# c3fe5ad6	25-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286) Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature.
# 3db4f5b0	25-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285) The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL AMDGPU: Refine gfx950 xdl-write-vgpr hazard cases (#117285) The 2-pass XDL write VGPR, read by non-XDL SGEMM/DGEMM case was 1 wait state overly conservative. Previously, for gfx940, the XDL/non-XDL cases happened to have the same number of cycles in all cases. Now the XDL consumer case has an additional state for 2 pass sources. show more ...
# 85601fd7	23-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle v_mfma_f64_16x16x4_f64 write VGPR read srca/srcb hazard change for gfx950 (#117284) Increase in wait states from 11 to 19. The index for smfmac counts as like srcA/srcB.
# db08d78c	23-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle v_mfma_f64_16x16x4_f64 srcc write VGPR hazard change for gfx950 (#117283) Read by sgemm/dgemm in srcc after v_mfma_f64_16x16x4_f64 increases from 9 to 17 wait states.
# 8cb6c990	23-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle gfx950 XDL-write-overlapped-smfma-src-c wait state change (#117263) These have an additional wait state compared to gfx940.
# b078b882	23-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle gfx950 change in mfma_f64_16x16x4 + valu hazard (#117262) Increase from 11 wait states to 19
Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2
# d6173713	02-Oct-2024	Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>	[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859) Instead of allocating an initializing a new instance in `GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 86627149	04-Sep-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on a [AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067) Any SGPR read by a VALU can potentially obscure SALU writes to the same register. Insert s_wait_alu instructions to mitigate the hazard on affected paths. Compute a global cache of SGPRs with any VALU reads and use this to avoid inserting mitigation for SGPRs never accessed by VALUs. To avoid excessive search when compile time is priority implement secondary mode where all SALU writes are mitigated. Co-authored-by: Shilei Tian <shilei.tian@amd.com> show more ...
Revision tags: llvmorg-19.1.0-rc4
# 987ffc31	23-Aug-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Refactor code for GETPC bundle updates in hazards (NFCI) As suggested in review for PR #100067. Refactor code for S_GETPC_B64 bundle updates for use with multiple hazard mitigations.
# 7bcf4d63	22-Aug-2024	Jeffrey Byrnes <jeffrey.byrnes@amd.com>	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" [AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op" This includes the case where the second op is SDWA with same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue. Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 -- this PR adds support opsel[2] != 0 as well show more ...
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 939a6624	23-Jul-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Implement workaround for GFX11.5 export priority (#99273) On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add co [AMDGPU] Implement workaround for GFX11.5 export priority (#99273) On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports. show more ...
# aeafdc21	16-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Use using instead of typedef. NFC.
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5
# 0606747c	01-May-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Remove some pointless fallthrough annotations
# f6d431f2	24-Apr-2024	Xu Zhang <simonzgx@gmail.com>	[CodeGen] Make the parameter TRI required in some functions. (#85968) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many [CodeGen] Make the parameter TRI required in some functions. (#85968) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`. show more ...
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# 0234d90d	31-Mar-2024	Austin Kerbow <Austin.Kerbow@amd.com>	[AMDGPU] Extend MFMA padding option to gfx90a+ (#86768) It was shown experimentally that this may have some benefit on newer HW.
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1
# a6382de3	07-Mar-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Refactor mfma hazard handling [NFC] (#84276) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current haza AMDGPU: Refactor mfma hazard handling [NFC] (#84276) Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes. show more ...
# 0f3628a9	06-Mar-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Correct cycle counts for f64 mfma on gfx940 (#83782)
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4
# dfa1d9b0	23-Feb-2024	Ivan Kosarev <ivan.kosarev@amd.com>	[AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined consta [AMDGPU][NFC] Have helpers to deal with encoding fields. (#82772) These are hoped to provide more convenient and less error prone facilities to encode and decode fields than manually defined constants and functions. show more ...
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
# 659ce8f6	29-Jan-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Simplify else if to else in GCNHazardRecognizer Fixes #79736
# 7fdf608c	24-Jan-2024	Mirko Brkušanin <Mirko.Brkusanin@amd.com>	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
12 3 4 5 6 7