#
aad2d272 |
| 26-Nov-2022 |
Kazu Hirata <kazu@google.com> |
[Utils] Use std::optional in AMDGPUBaseInfo.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-ge
[Utils] Use std::optional in AMDGPUBaseInfo.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
show more ...
|
#
96155bf4 |
| 18-Nov-2022 |
Dmitry Preobrazhensky <dmitri.preobrazhenski@gmail.com> |
[AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2)
Rename interface functions and operands to make code clearer.
Differential Revision: https://reviews.llvm.org/D138133
|
#
e468b1b7 |
| 16-Nov-2022 |
Dmitry Preobrazhensky <dmitri.preobrazhenski@gmail.com> |
[AMDGPU][GFX11] Refactor VOPD operands handling
Differential Revision: https://reviews.llvm.org/D137952
|
Revision tags: llvmorg-15.0.5 |
|
#
7425077e |
| 07-Nov-2022 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn'
[AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
show more ...
|
Revision tags: llvmorg-15.0.4 |
|
#
01b8140d |
| 20-Oct-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of t
[AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of these instructions lacked the dummy operand, which was a problem for inserting s_delay_alu. Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand tracking logic to account for it.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D136629
show more ...
|
Revision tags: llvmorg-15.0.3 |
|
#
fd7b0eea |
| 07-Oct-2022 |
Dmitry Preobrazhensky <dmitri.preobrazhenski@gmail.com> |
[AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation
Differential Revision: https://reviews.llvm.org/D134960
|
Revision tags: working, llvmorg-15.0.2 |
|
#
ddfa0f62 |
| 23-Sep-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affec
[AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations.
Differential Revision: https://reviews.llvm.org/D134522
show more ...
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
b982ba2a |
| 13-Jul-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that
[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack.
We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256.
We introduce new pseduo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class.
Reviewed By: foad, rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D133723
show more ...
|
#
c89e60bf |
| 15-Sep-2022 |
Dmitry Preobrazhensky <dmitri.preobrazhenski@gmail.com> |
[AMDGPU][MC][GFX11] Add VOPD literals validation
Differential Revision: https://reviews.llvm.org/D133864
|
#
a80116ef |
| 13-Sep-2022 |
Dmitry Preobrazhensky <dmitri.preobrazhenski@gmail.com> |
[AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions
Differential Revision: https://reviews.llvm.org/D133608
|
#
7094ab4e |
| 18-Jul-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
|
#
d1af09ad |
| 23-Jun-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a
[AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
show more ...
|
#
4874838a |
| 28-Jun-2022 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D1287
[AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D128756
show more ...
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
#
07b7fada |
| 25-May-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only.
A VOPD instruction is constituted of an
[AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only.
A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D128218
show more ...
|
#
485e8b4f |
| 20-Jun-2022 |
Dmitry Preobrazhensky <d-pre@mail.ru> |
[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.
Differential
[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.
Differential Revision: https://reviews.llvm.org/D128075
show more ...
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
7050d5b9 |
| 17-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructio
[AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructions.
Differential Revision: https://reviews.llvm.org/D128054
show more ...
|
#
cb9ae937 |
| 10-Jun-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen.
Differential Revision:
[AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
show more ...
|
#
95a13425 |
| 05-Jun-2022 |
Fangrui Song <i@maskray.me> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
#
835e09c4 |
| 10-May-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions
Contributors: Sebastian Neubauer <sebastian.neubauer@amd.com>
Patch 12/N for upstreaming of AMDGPU gfx11 architectur
[AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions
Contributors: Sebastian Neubauer <sebastian.neubauer@amd.com>
Patch 12/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125989
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D125992
show more ...
|
#
1a51ab76 |
| 25-Apr-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 export instructions
Contributors: Jay Foad <jay.foad@amd.com> Dmitry Preobrazhensky <d-pre@mail.ru>
Patch 10/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125822
Revi
[AMDGPU] gfx11 export instructions
Contributors: Jay Foad <jay.foad@amd.com> Dmitry Preobrazhensky <d-pre@mail.ru>
Patch 10/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125822
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D125824
show more ...
|
#
d21b9b49 |
| 21-Apr-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <jay.foad@amd.com>
Patch
[AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn.
Contributors: Jay Foad <jay.foad@amd.com>
Patch 7/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125319
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D125498
show more ...
|
#
c7025940 |
| 19-Apr-2022 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits.
[AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits.
Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>
Depends on D125316
Patch 6/N for upstreaming of AMDGPU gfx11 architecture.
Reviewed By: dp, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D125319
show more ...
|
#
e01dbabd |
| 19-Apr-2022 |
Dmitry Preobrazhensky <d-pre@mail.ru> |
[AMDGPU][MC] Corrected error message "image data size does not match dmask and tfe"
Differential Revision: https://reviews.llvm.org/D123929
|
#
8edaf259 |
| 12-Apr-2022 |
Changpeng Fang <Changpeng.Fang@amd.com> |
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset t
AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
show more ...
|
#
1f6aa903 |
| 07-Apr-2022 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:
depctr_hold_cnt(...) depctr_sa_sdst(...) depctr_va_vdst(...) depctr_va_sdst(...)
[AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:
depctr_hold_cnt(...) depctr_sa_sdst(...) depctr_va_vdst(...) depctr_va_sdst(...) depctr_va_ssrc(...) depctr_va_vcc(...) depctr_vm_vsrc(...)
Differential Revision: https://reviews.llvm.org/D123022
show more ...
|