|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
| #
39337ff2 |
| 02-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruct
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
show more ...
|
|
Revision tags: llvmorg-19.1.4 |
|
| #
1bf385f1 |
| 09-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Default to selecting frame indexes to SGPRs (#115060)
Only select to a VGPR if it's trivally used in VGPR only contexts. This fixes mishandling frame indexes used in SGPR only contexts, like
AMDGPU: Default to selecting frame indexes to SGPRs (#115060)
Only select to a VGPR if it's trivally used in VGPR only contexts. This fixes mishandling frame indexes used in SGPR only contexts, like inline assembly constraints.
This is suboptimal in the common case where the frame index is transitively used by only VALU ops. We make up for this by later folding the copy to VALU plus scalar op in SIFoldOperands.
show more ...
|
|
Revision tags: llvmorg-19.1.3 |
|
| #
ef91cd3f |
| 19-Oct-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle folding frame indexes into add with immediate (#110738)
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
| #
ac0f64f0 |
| 30-Sep-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.
This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.
show more ...
|
|
Revision tags: llvmorg-19.1.0 |
|
| #
86627149 |
| 04-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.
Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.
To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
show more ...
|
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
| #
42bae9c5 |
| 05-Aug-2024 |
Pankaj Dwivedi <pankajkumar.divedi@amd.com> |
[AMDGPU] Optimize the register uses if offset inlinable (#101676)
Fold the frame index offset into v_mad if inlinable.
|
| #
adac04ff |
| 02-Aug-2024 |
Pankaj Dwivedi <divedi.pk.117@gmail.com> |
[AMDGPU] Fix using wrong register in frame index shift (#101649)
In case of v_mad we have materialized the offset in vgpr and mad is
performed in wave space, later vgpr have to be shifted back in l
[AMDGPU] Fix using wrong register in frame index shift (#101649)
In case of v_mad we have materialized the offset in vgpr and mad is
performed in wave space, later vgpr have to be shifted back in lane
space. [#99556](https://github.com/llvm/llvm-project/pull/99556)
introduces a bug.
Co-authored-by: Pankajdwivedi-25 <pankajkumar.divedi@amd.com>
show more ...
|
| #
ef67664d |
| 31-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add testcase for materializing sgpr frame indexes (#101306)
These add some IR tests for 57d10b4fc9142d12fbdec578a0cc6f78deb67ef4.
These do rely on some lucky MIR placement to test the scc i
AMDGPU: Add testcase for materializing sgpr frame indexes (#101306)
These add some IR tests for 57d10b4fc9142d12fbdec578a0cc6f78deb67ef4.
These do rely on some lucky MIR placement to test the scc input, but I
haven't found a better way to do it. Also, scc handling in inline asm
is extremely buggy.
show more ...
|