History log of /llvm-project/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h (Results 1 – 25 of 186)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# 67c55b1f 18-Dec-2024 Ruiling, Song <ruiling.song@amd.com>

[AMDGPU] Make max dwords of memory cluster configurable (#119342)

We find it helpful to increase the value for graphics workload. Make it
configurable so we can experiment with a different value.


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 2b5b57c5 12-Nov-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)

Currently all implicit-def instructions are part of
bb prolog. We should only include the wwm-register's
implicit definitions into the

[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)

Currently all implicit-def instructions are part of
bb prolog. We should only include the wwm-register's
implicit definitions into the BB prolog. The other
vector class registers' implicit defs when exist at
the bb top might cause interference when pushed the
LR_split copy insertion downwards. The SplitKit is
very strict on altering the insertion points and will
assert such instances.

show more ...


# bc7e099a 07-Nov-2024 dyung <douglas.yung@sony.com>

Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)

Reverts llvm/llvm-project#115291

Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builde

Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)

Reverts llvm/llvm-project#115291

Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builders/174/builds/8049

show more ...


# 21835ee2 07-Nov-2024 Akshat Oke <Akshat.Oke@amd.com>

[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes (#115291)


# 3495d045 05-Nov-2024 Akshat Oke <Akshat.Oke@amd.com>

[AMDGPU][MIR] Serialize SpillPhysVGPRs (#113129)


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 8d13e7b8 03-Oct-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Qualify auto. NFC. (#110878)

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)


Revision tags: llvmorg-19.1.1
# ac0f64f0 30-Sep-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Split vgpr regalloc pipeline (#93526)

Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a

[AMDGPU] Split vgpr regalloc pipeline (#93526)

Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.

This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.

show more ...


Revision tags: llvmorg-19.1.0
# 33562085 13-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.

show more ...


# 7792b4ae 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
bo

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.

show more ...


# 703ebca8 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The bui

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.

show more ...


# f4dd1bc8 11-Sep-2024 Fraser Cormack <fraser@codeplay.com>

[AMDGPU] Fix leak and self-assignment in copy assignment operator (#107847)

A static analyzer identified that this operator was unsafe in the case
of self-assignment.

In the placement new statem

[AMDGPU] Fix leak and self-assignment in copy assignment operator (#107847)

A static analyzer identified that this operator was unsafe in the case
of self-assignment.

In the placement new statement, StringValue's copy constructor was being
implicitly called, which received a reference to "itself". In fact, it
was being passed an old StringValue at the same address - one whose
lifetime had already ended. The copy constructor was thus copying fields
from a dead object.

We need to be careful when switching active union members, and calling
the destructor on the old StringValue will avoid memory leaks which I
believe the old code exhibited.

show more ...


# c7a7767f 10-Sep-2024 Vitaly Buka <vitalybuka@google.com>

Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)

Breaks bots, see #105822.

Reverts llvm/llvm-project#105822


# 44556e64 10-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail

func:
; ... stuff that should run with the actual EXEC mask
br label %tail

tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 7e9b49f6 25-Jun-2024 Nicolai Hähnle <nicolai.haehnle@amd.com>

AMDGPU: Add plumbing for private segment size argument (#96445)

The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent cha

AMDGPU: Add plumbing for private segment size argument (#96445)

The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent change.

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# 197c3a3e 02-Jun-2024 Kazu Hirata <kazu@google.com>

Use llvm::less_first (NFC) (#94136)


Revision tags: llvmorg-18.1.6
# cd4287bc 03-May-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Convert PrologEpilogSGPRSpills from DenseMap to sorted vector (#90957)

In practice PrologEpilogSGPRSpills never has more than 3 entries so
DenseMap is overkill. In addition this means that

[AMDGPU] Convert PrologEpilogSGPRSpills from DenseMap to sorted vector (#90957)

In practice PrologEpilogSGPRSpills never has more than 3 entries so
DenseMap is overkill. In addition this means that iteration happens in
register number order, instead of DenseMap's hashed order, so it will
not be affected by future patches that define new physical registers.
This should reduce future test case churn.

show more ...


Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# c4e517f5 12-Mar-2024 Jun Wang <jwang86@yahoo.com>

[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)

A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows progr

[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)

A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>

show more ...


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
# 70fc9703 24-Jan-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Move architected SGPR implementation into isel (#79120)


Revision tags: llvmorg-19-init
# 230c13d5 24-Jan-2024 Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)

CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to a

[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)

CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.

This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.

show more ...


# 818f13fc 23-Jan-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Remove getWorkGroupIDSGPR, unused since aa6fb4c45e01


# 51392996 17-Dec-2023 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)

Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registe

[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)

Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registers
allocated during WWM pre-allocation pass.

This causes SGPR spills allocated after WWM pre-allocation to overlap
with WWM register usage, e.g. if frame pointer is spilt during
prologue/epilog insertion.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2
# 0455596e 02-Aug-2023 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add DAG ISel support for preloaded kernel arguments

This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patc

[AMDGPU] Add DAG ISel support for preloaded kernel arguments

This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments until the
amount of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579

show more ...


# 343be513 19-Aug-2023 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add utilities to track number of user SGPRs. NFC.

Factor out and unify some common code that calculates and tracks the
number of user SGRPs.

Reviewed By: arsenm

Differential Revision: htt

[AMDGPU] Add utilities to track number of user SGPRs. NFC.

Factor out and unify some common code that calculates and tracks the
number of user SGRPs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159439

show more ...


Revision tags: llvmorg-17.0.0-rc1
# 5272ae66 27-Jul-2023 Diana Picus <Diana-Magda.Picus@amd.com>

[AMDGPU] Add IsChainFunction to the MachineFunctionInfo

This will represent functions with the amdgpu_cs_chain or
amdgpu_cs_chain_preserve calling conventions.

Differential Revision: https://review

[AMDGPU] Add IsChainFunction to the MachineFunctionInfo

This will represent functions with the amdgpu_cs_chain or
amdgpu_cs_chain_preserve calling conventions.

Differential Revision: https://reviews.llvm.org/D156410

show more ...


# 4d42e8b5 28-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc2

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.

show more ...


12345678