Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
c3fe5ad6 |
| 25-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle vcmpx+permalane gfx950 hazard (#117286)
Confusingly, this is a different hazard to the one on gfx10 with a subtarget feature.
|
Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
d6173713 |
| 02-Oct-2024 |
Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> |
[AMDGPU] Use the SchedModel available in SIInstrInfo (#110859)
Instead of allocating an initializing a new instance in
`GCNHazardRecognizer` and `AMDGPUInsertDelayAlu`.
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
86627149 |
| 04-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.
Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.
To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
939a6624 |
| 23-Jul-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add co
[AMDGPU] Implement workaround for GFX11.5 export priority (#99273)
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add code to maintain normal priority of 2 for shaders that export and
drop to priority 0 after exports.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
#
a35013be |
| 01-Oct-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR.
Review
[AMDGPU][GFX11] Mitigate VALU mask write hazard
VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D134151
show more ...
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0 |
|
#
95d497ff |
| 31-Aug-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it
[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back.
Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement.
Differential Revision: https://reviews.llvm.org/D133067
show more ...
|
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D1287
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
show more ...
|
Revision tags: llvmorg-14.0.6 |
|
#
13107c27 |
| 16-Jun-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to a
[AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution.
Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr.
Differential Revision: https://reviews.llvm.org/D127963
show more ...
|
#
9dff14be |
| 15-Jun-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add support for GFX11 hazards
Add support for partial stall over EXEC hazard and trans use hazard.
Differential Revision: https://reviews.llvm.org/D127872
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
1e15adba |
| 04-Mar-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad s
[AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad some portion of these stall cycles with s_nops.
Fixes: SWDEV-326925
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121437
show more ...
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
41bfac6a |
| 02-Jan-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Remove unused forward declarations (NFC)
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
#
e0c382a9 |
| 14-Jun-2021 |
Piotr Sobczak <Piotr.Sobczak@amd.com> |
[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds access followed by a branch, followed by an lds/vmem access.
The handling of
[AMDGPU] Limit runs of fixLdsBranchVmemWARHazard
The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds access followed by a branch, followed by an lds/vmem access.
The handling of the hazard requires an arbitrary number of instructions to process. In the worst case where a function has a vmem access, but no lds accesses, all instructions are examined only to conclude that the hazard cannot occur.
Add the pre-processing stage which detects if there is both lds and vmem present in the function and only then does the more costly search.
This patch significantly improves compilation time in the cases the hazard cannot happen. In one pathological case I looked at IsHazardInst is needlesly called 88.6 milions times.
The numbers could also be improved by introducing a map around the inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases detected by shouldRunLdsBranchVmemWARHazardFixup().
Differential Revision: https://reviews.llvm.org/D104219
show more ...
|
Revision tags: llvmorg-12.0.1-rc1 |
|
#
424f1f6f |
| 30-Apr-2021 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the i
[AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D101430
show more ...
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
de518673 |
| 28-Oct-2020 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new region.
Reviewed By: rampitec
Differential Revision: https://reviews.llv
[AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new region.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90347
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
#
13b63be4 |
| 29-Jul-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] prefer non-mfma in post-RA schedule
MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead.
Dif
[AMDGPU] prefer non-mfma in post-RA schedule
MFMA instructions shall not be scheduled back to back to avoid MAI SIMD stall. Tell post-RA schedule we would prefer some other instruction instead.
Differential Revision: https://reviews.llvm.org/D84883
show more ...
|
Revision tags: llvmorg-11.0.0-rc1 |
|
#
2e87acac |
| 17-Jul-2020 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU] Removed s_mov_regrd and mov_fed opcodes
These opcodes are not intended for public use.
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D81659
|
Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
29067aac |
| 06-May-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Don't implement GCNHazardRecognizer::PreEmitNoops(SUnit *)
When called from the post-RA scheduler, hazards have already been handled by getHazardType returning NoopHazard, so PreEmitNoops a
[AMDGPU] Don't implement GCNHazardRecognizer::PreEmitNoops(SUnit *)
When called from the post-RA scheduler, hazards have already been handled by getHazardType returning NoopHazard, so PreEmitNoops always returns zero. Remove it. NFC.
Historical note: PreEmitNoops was added to the hazard recognizer interface as an optional feature to support dispatch group formation on the POWER target: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20131202/197470.html So it seems right that we shouldn't need to implement it.
We do still implement the other overload PreEmitNoops(MachineInstr *) because that is used by the PostRAHazardRecognizer pass.
Differential Revision: https://reviews.llvm.org/D79476
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
7d2019bb |
| 11-Jul-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx908 hazard recognizer
Differential Revision: https://reviews.llvm.org/D64593
llvm-svn: 365829
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
#
bdf7f81b |
| 21-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in between.
Differential Revision: https://reviews.llvm.org/D63619
llvm-svn: 36
[AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in between.
Differential Revision: https://reviews.llvm.org/D63619
llvm-svn: 364074
show more ...
|
#
5f581c9f |
| 12-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx1010 premlane instructions
Differential Revision: https://reviews.llvm.org/D63202
llvm-svn: 363185
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
8a3d3a9a |
| 07-May-2019 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions
[AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.
Reviewers: arsenm, msearles, rampitec
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61564
llvm-svn: 360199
show more ...
|
#
51d1415a |
| 04-May-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
AMDGPU] gfx1010 hazard recognizer
Differential Revision: https://reviews.llvm.org/D61536
llvm-svn: 359961
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
f92ed696 |
| 21-Jan-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fixed hazard recognizer to walk predecessors
Fixes two problems with GCNHazardRecognizer: 1. It only scans up to 5 instructions emitted earlier. 2. It does not take control flow into accoun
[AMDGPU] Fixed hazard recognizer to walk predecessors
Fixes two problems with GCNHazardRecognizer: 1. It only scans up to 5 instructions emitted earlier. 2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor. At the same time a real predecessor block is not scanned.
The patch provides a way to distinguish between scheduler and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler because we do not really know what will be emitted later and its order. However, when pass works as a hazard recognizer the schedule is already finalized, and we have full access to the instructions for the whole function, so we can properly traverse predecessors and their instructions.
Differential Revision: https://reviews.llvm.org/D56923
llvm-svn: 351759
show more ...
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|