#
fb28bf3f |
| 07-Sep-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix liveness verifier error in hazard recognizer
After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses i
[AMDGPU] Fix liveness verifier error in hazard recognizer
After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses if we are tracking liveness post RA.
We have no access to LIS at this point, so mark new register uses as undef to calm down the verifier. Liveness should not matter at this point anyway.
Note the description of the RegState::Undef: "Value of the register doesn't matter." I.e. it does not say it is strictly undefined. In fact that is what we really need: this value does not matter.
I also had to modify the test a bit since with tracking enabled it does not pass verification even before the recognizer.
Differential Revision: https://reviews.llvm.org/D133459
show more ...
|
#
95d497ff |
| 31-Aug-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it
[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back.
Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement.
Differential Revision: https://reviews.llvm.org/D133067
show more ...
|
#
de9d80c1 |
| 08-Aug-2022 |
Fangrui Song <i@maskray.me> |
[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
|
#
7fc52d7c |
| 27-Jul-2022 |
Vang Thao <Vang.Thao@amd.com> |
[AMDGPU] Fix DGEMM hazard for GFX90a
For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert wait-states to avoid data hazard. However when there is a DGEMM instruction in-between t
[AMDGPU] Fix DGEMM hazard for GFX90a
For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert wait-states to avoid data hazard. However when there is a DGEMM instruction in-between them, SQ incorrectly disables the wait-states thus the data hazard needs to be handled with this workaround.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130677
show more ...
|
#
f29a19b0 |
| 01-Aug-2022 |
Piotr Sobczak <Piotr.Sobczak@amd.com> |
[AMDGPU] Extend cases for ReadM0MovRelInterpHazard
Extend hazard recognizer of ReadM0MovRelInterpHazard with DS_READ_ADDTID and DS_WRITE_ADDTID, as they also require a manually inserted S_NOP after
[AMDGPU] Extend cases for ReadM0MovRelInterpHazard
Extend hazard recognizer of ReadM0MovRelInterpHazard with DS_READ_ADDTID and DS_WRITE_ADDTID, as they also require a manually inserted S_NOP after SALU writing m0.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130783
show more ...
|
#
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D1287
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
show more ...
|
#
13107c27 |
| 16-Jun-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to a
[AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution.
Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr.
Differential Revision: https://reviews.llvm.org/D127963
show more ...
|
#
9dff14be |
| 15-Jun-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add support for GFX11 hazards
Add support for partial stall over EXEC hazard and trans use hazard.
Differential Revision: https://reviews.llvm.org/D127872
|
#
bd9eed3a |
| 22-May-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add isMFMA helper function. NFC
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127124
|
#
5c974d08 |
| 08-Jun-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix hazard handling of v_cmpx to permlane
- VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane.
Diffe
[AMDGPU] Fix hazard handling of v_cmpx to permlane
- VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane.
Differential Revision: https://reviews.llvm.org/D127344
show more ...
|
#
63f21f4c |
| 27-Apr-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Handle LDS DMA and LDS_DIRECT hazards
There shall be 1 wait state between M0 write and LDS DMA/LDS_DIRECT use.
Differential Revision: https://reviews.llvm.org/D124550
|
#
d951d937 |
| 13-Apr-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Increate hazard for store dwordx3/4 to 2 waitstates on gfx940
Fixes: SWDEV-327053
Differential Revision: https://reviews.llvm.org/D123687
|
#
f311f934 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940 VALU hazard recognizer
Differntial Revision: https://reviews.llvm.org/D122339
|
#
64838ba3 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use GenericTable to classify DGEMM
Since there is a table introduced for MAI instructions extend it to use for DGEMM classification.
Differential Revision: https://reviews.llvm.org/D122337
|
#
cad9de71 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
|
#
1e15adba |
| 04-Mar-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad s
[AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may lead to increased inter-wave performance. Add option to request to pad some portion of these stall cycles with s_nops.
Fixes: SWDEV-326925
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121437
show more ...
|
#
e9a49c64 |
| 17-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940 basic speed model
This is incomplete and will handle more instructions as they are added.
Differential Revision: https://reviews.llvm.org/D121966
|
Revision tags: llvmorg-14.0.0-rc2 |
|
#
380ff31d |
| 22-Feb-2022 |
Thomas Symalla <5754458+tsymalla@users.noreply.github.com> |
[AMDGPU] Fix typo in comment [NFC]
This replaces "V_MOB_B32" with "V_MOV_B32" in some comment.
|
#
6527b2a4 |
| 18-Feb-2022 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D119235
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
dbf278b9 |
| 21-Jan-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Prevent aliasing of SrcC and Dst in MAI
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are
[AMDGPU] Prevent aliasing of SrcC and Dst in MAI
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoid as new value could be written to accumulator input before it gets read.
Note that this inevitably increases register pressure to the point where some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
show more ...
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
#
d6b07348 |
| 19-Jan-2022 |
Jim Lin <jim@andestech.com> |
[NFC] Use Register instead of unsigned
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
#
661a232e |
| 19-Nov-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Remove a no-op check in the gfx90a hazard recognizer
Also rename helper function accordingly.
Differential Revision: https://reviews.llvm.org/D114289
|
#
d1f45ed5 |
| 11-Nov-2021 |
Neubauer, Sebastian <Sebastian.Neubauer@amd.com> |
[AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
4f5ba46e |
| 17-Aug-2021 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to zero for all non-instructions. With that we can avoid the special handling for them in `getWa
[AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to zero for all non-instructions. With that we can avoid the special handling for them in `getWaitStatesSince` and `AdvanceCycle`. This NFC patch makes the handling more generic.
show more ...
|
#
68660767 |
| 13-Aug-2021 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during
[AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during assembly emission. They should not be counted during nop insertion.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108022
show more ...
|