History log of /llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (Results 151 – 175 of 364)
Revision Date Author Comments
# 64838ba3 23-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Use GenericTable to classify DGEMM

Since a table has been introduced for MAI instructions, extend it
for use in DGEMM classification.

Differential Revision: https://reviews.llvm.org/D122337


# cad9de71 23-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx940 MAI hazard recognizer

Differential Revision: https://reviews.llvm.org/D122263


# 1d817a14 21-Mar-2022 Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

[AMDGPU][MC][NFC] Refactored sendmsg(...) handling

Differential Revision: https://reviews.llvm.org/D121995


# dd5895cc 17-Mar-2022 Changpeng Fang <Changpeng.Fang@amd.com>

AMDGPU: Use the implicit kernargs for code object version 5

Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265


# 5977dfba 16-Mar-2022 Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

[AMDGPU][MC][NFC] Refactored custom operands handling

The original design of custom operand support assumed that most GPUs
have the same or very similar operand names and encodings. This is
no longer the case. As a result, the support code has become overcomplicated
and difficult to maintain.

This change implements a different design with the following benefits:

- support of aliases;
- support of operands with overlapped encodings;
- identification of defined but unsupported operands.

Differential Revision: https://reviews.llvm.org/D121696


# 8dd3d1cf 10-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Add symbolic names for gfx940 HWREGs

The namespace of HWREGs now overlaps with gfx10. Thus the
patch is longer than necessary to just support new names. It also
needs to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.

This may need a major refactoring in the future.

Differential Revision: https://reviews.llvm.org/D121418


# 0f20a35b 09-Mar-2022 Changpeng Fang <Changpeng.Fang@amd.com>

AMDGPU: Set up User SGPRs for queue_ptr only when necessary

Summary:
In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In the current implementation,
user SGPRs are set up unnecessarily in some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if the target supports getDoorbellID, queue_ptr is also not necessary.
Further, code object version 5 introduces a new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, so we remove queue_ptr support for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762


# 8992b50e 02-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx940 uses new names for coherency bits

Differential Revision: https://reviews.llvm.org/D120855


# ca62b1db 25-Feb-2022 Changpeng Fang <Changpeng.Fang@amd.com>

[AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg

Summary:
Emit metadata for hidden_heap_v1 kernarg

Reviewers:
sameerds, b-sumner

Fixes:
SWDEV-307188

Differential Revision:
https://reviews.llvm.org/D119027


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# 74702444 02-Feb-2022 Jacob Lambert <jacob.lambert@amd.com>

[AMDGPU] Add agpr_count to metadata and AsmParser

gfx90a allows the number of ACC registers (AGPRs) to be set
independently of the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140


# 0bad7cb5 16-Feb-2022 Jacob Lambert <jacob.lambert@amd.com>

Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119912


# 6655c5a6 16-Feb-2022 Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

[AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1

Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.

Differential Revision: https://reviews.llvm.org/D119939


# cb199e0f 11-Feb-2022 Jay Foad <jay.foad@amd.com>

[MC] Define and use MCRegisterInfo::regsOverlap

Separate MCRegisterInfo::regsOverlap out from
TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser
where we only have access to MCRegisterInfo.

Differential Revision: https://reviews.llvm.org/D119533


# c7eb8463 11-Feb-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils

Differential Revision: https://reviews.llvm.org/D119502


# d8f99bb6 11-Feb-2022 Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

[AMDGPU] replace hostcall module flag with function attribute

The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implicitarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216


# 1194b9cd 01-Feb-2022 Changpeng Fang <Changpeng.Fang@amd.com>

AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args

Summary:
Add code object v5 support (default is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2

Reviewers:
t-tye, b-sumner, arsenm, and bcahoon

Fixes:
SWDEV-307188, SWDEV-307189

Differential Revision:
https://reviews.llvm.org/D118272


# 80532ebb 24-Jan-2022 Sebastian Neubauer <Sebastian.Neubauer@amd.com>

[AMDGPU][InstCombine] Remove zero image offset

Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 603d1803 21-Dec-2021 Sebastian Neubauer <Sebastian.Neubauer@amd.com>

[AMDGPU][InstCombine] Remove zero LOD bias

If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042


# c7ca4c63 17-Jan-2022 Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

[AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers

GFX10 no longer supports HW_ID. It has been replaced with HW_ID1 and HW_ID2.
See bug 52904: https://github.com/llvm/llvm-project/issues/52904

Differential Revision: https://reviews.llvm.org/D117313


# 0542d152 26-Dec-2021 Kazu Hirata <kazu@google.com>

Remove redundant string initialization (NFC)

Identified with readability-redundant-string-init.


Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 654c89d8 06-Sep-2021 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU] Make vector superclasses allocatable

The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch makes them allocatable as a
prerequisite to enabling copies between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300


# b4b7e605 04-Oct-2021 Joe Nash <Joseph.Nash@amd.com>

[AMDGPU] Support shared literals in FMAMK/FMAAK

These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.

Reviewed By: dp, foad

Differential Revision: https://reviews.llvm.org/D111067

Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7


# 21661607 06-Oct-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)

As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.


# 082e22f3 24-Sep-2021 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch

With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.

This is not needed on GFX10 and later with architected flat scratch, though,
since special SGPRs do not carve space out of the normal SGPRs.

Differential Revision: https://reviews.llvm.org/D110376


Revision tags: llvmorg-13.0.0-rc2
# a0c42ca5 13-Aug-2021 Arthur Eubanks <aeubanks@google.com>

[NFC] Remove AttributeList::hasParamAttribute()

It's the same as AttributeList::hasParamAttr().

