Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
67c55b1f |
| 18-Dec-2024 |
Ruiling, Song <ruiling.song@amd.com> |
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workload. Make it
configurable so we can experiment with a different value.
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
2b5b57c5 |
| 12-Nov-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)
Currently all implicit-def instructions are part of bb prolog. We should only include the wwm-register's implicit definitions into the
[AMDGPU] Skip non-wwm reg implicit-def from bb prolog (#115834)
Currently all implicit-def instructions are part of bb prolog. We should only include the wwm-register's implicit definitions into the BB prolog. The other vector class registers' implicit defs when exist at the bb top might cause interference when pushed the LR_split copy insertion downwards. The SplitKit is very strict on altering the insertion points and will assert such instances.
show more ...
|
#
bc7e099a |
| 07-Nov-2024 |
dyung <douglas.yung@sony.com> |
Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)
Reverts llvm/llvm-project#115291
Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builde
Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)
Reverts llvm/llvm-project#115291
Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builders/174/builds/8049
show more ...
|
#
21835ee2 |
| 07-Nov-2024 |
Akshat Oke <Akshat.Oke@amd.com> |
[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes (#115291)
|
#
3495d045 |
| 05-Nov-2024 |
Akshat Oke <Akshat.Oke@amd.com> |
[AMDGPU][MIR] Serialize SpillPhysVGPRs (#113129)
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1 |
|
#
ac0f64f0 |
| 30-Sep-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There a
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands
together imposes many challenges in the way the
registers are reused during allocation. There are
times when regalloc reuses the registers of regular
VGPRs operations for wwm-operations in a small range
leading to unwantedly clobbering their inactive lanes
causing correctness issues that are hard to trace.
This patch splits the VGPR allocation pipeline further
to allocate wwm-registers first and the regular VGPR
operands in a separate pipeline. The splitting would
ensure that the physical registers used for wwm
allocations won't take part in the next allocation
pipeline to avoid any such clobbering.
show more ...
|
Revision tags: llvmorg-19.1.0 |
|
#
33562085 |
| 13-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)
This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.
The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).
...if only we had a merge queue.
show more ...
|
#
7792b4ae |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
bo
Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)
Reverts llvm/llvm-project#108173
si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.
show more ...
|
#
703ebca8 |
| 12-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The bui
Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)
This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.
The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.
show more ...
|
#
f4dd1bc8 |
| 11-Sep-2024 |
Fraser Cormack <fraser@codeplay.com> |
[AMDGPU] Fix leak and self-assignment in copy assignment operator (#107847)
A static analyzer identified that this operator was unsafe in the case
of self-assignment.
In the placement new statem
[AMDGPU] Fix leak and self-assignment in copy assignment operator (#107847)
A static analyzer identified that this operator was unsafe in the case
of self-assignment.
In the placement new statement, StringValue's copy constructor was being
implicitly called, which received a reference to "itself". In fact, it
was being passed an old StringValue at the same address - one whose
lifetime had already ended. The copy constructor was thus copying fields
from a dead object.
We need to be careful when switching active union members, and calling
the destructor on the old StringValue will avoid memory leaks which I
believe the old code exhibited.
show more ...
|
#
c7a7767f |
| 10-Sep-2024 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)
Breaks bots, see #105822.
Reverts llvm/llvm-project#105822
|
#
44556e64 |
| 10-Sep-2024 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta
[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)
This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.
As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):
```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail
func:
; ... stuff that should run with the actual EXEC mask
br label %tail
tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```
It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).
The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
7e9b49f6 |
| 25-Jun-2024 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent cha
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent change.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
197c3a3e |
| 02-Jun-2024 |
Kazu Hirata <kazu@google.com> |
Use llvm::less_first (NFC) (#94136)
|
Revision tags: llvmorg-18.1.6 |
|
#
cd4287bc |
| 03-May-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Convert PrologEpilogSGPRSpills from DenseMap to sorted vector (#90957)
In practice PrologEpilogSGPRSpills never has more than 3 entries so
DenseMap is overkill. In addition this means that
[AMDGPU] Convert PrologEpilogSGPRSpills from DenseMap to sorted vector (#90957)
In practice PrologEpilogSGPRSpills never has more than 3 entries so
DenseMap is overkill. In addition this means that iteration happens in
register number order, instead of DenseMap's hashed order, so it will
not be affected by future patches that define new physical registers.
This should reduce future test case churn.
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
c4e517f5 |
| 12-Mar-2024 |
Jun Wang <jwang86@yahoo.com> |
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows progr
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
show more ...
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
70fc9703 |
| 24-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Move architected SGPR implementation into isel (#79120)
|
Revision tags: llvmorg-19-init |
|
#
230c13d5 |
| 24-Jan-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to a
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.
This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
show more ...
|
#
818f13fc |
| 23-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove getWorkGroupIDSGPR, unused since aa6fb4c45e01
|
#
51392996 |
| 17-Dec-2023 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registe
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independent of
WWM reserved registers. The WWM reserved set contains extra registers
allocated during WWM pre-allocation pass.
This causes SGPR spills allocated after WWM pre-allocation to overlap
with WWM register usage, e.g. if frame pointer is spilt during
prologue/epilog insertion.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
#
0455596e |
| 02-Aug-2023 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patc
[AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch.
Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count.
Aggregates and arguments passed by-ref are not supported.
Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D158579
show more ...
|
#
343be513 |
| 19-Aug-2023 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the number of user SGRPs.
Reviewed By: arsenm
Differential Revision: htt
[AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the number of user SGRPs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159439
show more ...
|
Revision tags: llvmorg-17.0.0-rc1 |
|
#
5272ae66 |
| 27-Jul-2023 |
Diana Picus <Diana-Magda.Picus@amd.com> |
[AMDGPU] Add IsChainFunction to the MachineFunctionInfo
This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions.
Differential Revision: https://review
[AMDGPU] Add IsChainFunction to the MachineFunctionInfo
This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions.
Differential Revision: https://reviews.llvm.org/D156410
show more ...
|
#
4d42e8b5 |
| 28-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc2
Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
show more ...
|