#
64838ba3 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use GenericTable to classify DGEMM
Since there is a table introduced for MAI instructions extend it to use for DGEMM classification.
Differential Revision: https://reviews.llvm.org/D122337
|
#
cad9de71 |
| 23-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
|
#
1d817a14 |
| 21-Mar-2022 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][MC][NFC] Refactored sendmsg(...) handling
Differential Revision: https://reviews.llvm.org/D121995
|
#
dd5895cc |
| 17-Mar-2022 |
Changpeng Fang <Changpeng.Fang@amd.com> |
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg
AMDGPU: Use the implicit kernargs for code object version 5
Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
show more ...
|
#
5977dfba |
| 16-Mar-2022 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][MC][NFC] Refactored custom operands handling
The original design of custom operands support assumed that most GPUs have the same or very similar operand names end encodings. This is no long
[AMDGPU][MC][NFC] Refactored custom operands handling
The original design of custom operands support assumed that most GPUs have the same or very similar operand names end encodings. This is no longer the case. As a result the support code becomes over-complicated and difficult to maintain.
This change implements a different design with the following benefits:
- support of aliases; - support of operands with overlapped encodings; - identification of defined but unsupported operands.
Differential Revision: https://reviews.llvm.org/D121696
show more ...
|
#
8dd3d1cf |
| 10-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the patch is longer than necessary to just support new names. It also need to handle proper
[AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the patch is longer than necessary to just support new names. It also need to handle proper error messages, i.e. to issue a "specified hardware register is not supported on this GPU" message.
This may need a major refactoring in the future.
Differential Revision: https://reviews.llvm.org/D121418
show more ...
|
#
0f20a35b |
| 09-Mar-2022 |
Changpeng Fang <Changpeng.Fang@amd.com> |
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In curr
AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary: In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Reviewers: sameerds, arsenm
Fixes: SWDEV-307189
Differential Revision: https://reviews.llvm.org/D119762
show more ...
|
#
8992b50e |
| 02-Mar-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx940 uses new names for coherency bits
Differential Revision: https://reviews.llvm.org/D120855
|
#
ca62b1db |
| 25-Feb-2022 |
Changpeng Fang <Changpeng.Fang@amd.com> |
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://
[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary: Emit metadata for hidden_heap_v1 kernarg
Reviewers: sameerds, b-sumner
Fixes: SWDEV-307188
Differential Revision: https://reviews.llvm.org/D119027
show more ...
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
74702444 |
| 02-Feb-2022 |
Jacob Lambert <jacob.lambert@amd.com> |
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "a
[AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set independently to the VGPR registers. For both HSA and PAL metadata, we now include an "agpr_count" key to report the number of AGPRs set for supported devices (gfx90a, gfx908, as determined by hasMAIInsts()). This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL. The AsmParser also now recognizes ".kernel.agpr_count" for supported devices.
Differential Revision: https://reviews.llvm.org/D116140
show more ...
|
#
0bad7cb5 |
| 16-Feb-2022 |
Jacob Lambert <jacob.lambert@amd.com> |
Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D119912
|
#
6655c5a6 |
| 16-Feb-2022 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.
Differential Revision: https://review
[AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.
Differential Revision: https://reviews.llvm.org/D119939
show more ...
|
#
cb199e0f |
| 11-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[MC] Define and use MCRegisterInfo::regsOverlap
Separate MCRegisterInfo::regsOverlap out from TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser where we only have access to MCR
[MC] Define and use MCRegisterInfo::regsOverlap
Separate MCRegisterInfo::regsOverlap out from TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser where we only have access to MCRegisterInfo.
Differential Revision: https://reviews.llvm.org/D119533
show more ...
|
#
c7eb8463 |
| 11-Feb-2022 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils
Differential Revision: https://reviews.llvm.org/D119502
|
#
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|
#
1194b9cd |
| 01-Feb-2022 |
Changpeng Fang <Changpeng.Fang@amd.com> |
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for t
AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary: Add code object v5 support (deafult is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2
Reviewers: t-tye, b-sumner, arsenm, and bcahoon
Fixes: SWDEV-307188, SWDEV-307189
Differential Revision: https://reviews.llvm.org/D118272
show more ...
|
#
80532ebb |
| 24-Jan-2022 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.
Differential Revision: https://reviews.llvm.org/D117876
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
603d1803 |
| 21-Dec-2021 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines.
Differential Rev
[AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines.
Differential Revision: https://reviews.llvm.org/D116042
show more ...
|
#
c7ca4c63 |
| 17-Jan-2022 |
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers
GFX10 no longer support HW_ID. It has been replaced with HW_ID1 and HW_ID2. See bug 52904: https://github.com/llvm/llvm-project/is
[AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers
GFX10 no longer support HW_ID. It has been replaced with HW_ID1 and HW_ID2. See bug 52904: https://github.com/llvm/llvm-project/issues/52904
Differential Revision: https://reviews.llvm.org/D117313
show more ...
|
#
0542d152 |
| 26-Dec-2021 |
Kazu Hirata <kazu@google.com> |
Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
654c89d8 |
| 06-Sep-2021 |
Christudasan Devadasan <Christudasan.Devadasan@amd.com> |
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to
[AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc.
Also, added the missing AV register classes from 192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
show more ...
|
#
b4b7e605 |
| 04-Oct-2021 |
Joe Nash <Joseph.Nash@amd.com> |
[AMDGPU] Support shared literals in FMAMK/FMAAK
These instructions should allow src0 to be a literal with the same value as the mandatory other literal. Enable it by introducing an operand that defe
[AMDGPU] Support shared literals in FMAMK/FMAAK
These instructions should allow src0 to be a literal with the same value as the mandatory other literal. Enable it by introducing an operand that defers adding its value to the MI when decoding till the mandatory literal is parsed.
Reviewed By: dp, foad
Differential Revision: https://reviews.llvm.org/D111067
Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7
show more ...
|
#
21661607 |
| 06-Oct-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of rep
[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
show more ...
|
#
082e22f3 |
| 24-Sep-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all
[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation.
This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs.
Differential Revision: https://reviews.llvm.org/D110376
show more ...
|
Revision tags: llvmorg-13.0.0-rc2 |
|
#
a0c42ca5 |
| 13-Aug-2021 |
Arthur Eubanks <aeubanks@google.com> |
[NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
|