History log of /llvm-project/llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (Results 1 – 25 of 39)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# eeac0ffa 10-Jan-2025 Nikita Popov <npopov@redhat.com>

Revert "[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826)"

This reverts commit b4e17d4a314ed87ff6b40b4b05397d4b25b6636a.

This causes a large compile-time regression.


# b4e17d4a 09-Jan-2025 Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826)

`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
a

[MachineLICM] Use `RegisterClassInfo::getRegPressureSetLimit` (#119826)

`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.

It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.

Separate from https://github.com/llvm/llvm-project/pull/118787

show more ...


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 6548b635 09-Nov-2024 Shilei Tian <i@tianshilei.me>

Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.


# ca33649a 08-Nov-2024 Shilei Tian <i@tianshilei.me>

Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both
hip and openmp buildbots.


# e215a1e2 08-Nov-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a 06-Mar-2024 Emma Pilkington <emma.pilkington95@gmail.com>

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.

show more ...


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# f6610578 09-Feb-2024 Jan Patrick Lehr <JanPatrick.Lehr@amd.com>

Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)

Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The

Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)

Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyong the
largest legal address.

Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.

show more ...


# 88e52511 08-Feb-2024 alex-t <alex-t@users.noreply.github.com>

[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)

This change implements synthesizing the private buffer resource
descriptor in the kernel prolo

[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init (#79586)

This change implements synthesizing the private buffer resource
descriptor in the kernel prolog instead of using the preloaded kernel
argument.

show more ...


Revision tags: llvmorg-18.1.0-rc2
# 1f3c3091 01-Feb-2024 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[AMDGPU] Mark PC_ADD_REL_OFFSET rematerializable (#79674)

Currently machine LICM hoist PC_ADD_REL_OFFSET out of loops, causes
register pressure when function calls are deep in loops. This is a main

[AMDGPU] Mark PC_ADD_REL_OFFSET rematerializable (#79674)

Currently machine LICM hoist PC_ADD_REL_OFFSET out of loops, causes
register pressure when function calls are deep in loops. This is a main
cause of sgpr spill for programs containing large number of function
calls in loops.

This patch marks PC_ADD_REL_OFFSET as rematerializable, which eliminates
sgpr spills due to function calls in loops.

show more ...


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init
# e21b7e21 15-Dec-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219)

Regenerate a few more llc tests to check for COV5 instead of the default
ABI version.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# fe8335ba 30-Oct-2023 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)

This allows folding of 64-bit operands if fit into 32-bit. Fixes
https://github.com/llvm/llvm-project/issues/67781


# e3cf80c5 25-Oct-2023 Matthias Braun <matze@braunis.de>

BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads

BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit

BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads

BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.

show more ...


Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# ff68e43c 20-Jul-2023 Jingu Kang <jingu.kang@arm.com>

[MachineLICM] Handle Subloops

It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500.

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle

[MachineLICM] Handle Subloops

It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500.

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205

show more ...


# 3454cf67 15-Sep-2023 Benjamin Kramer <benny.kra@googlemail.com>

Revert "[MachineLICM] Handle Subloops"

This reverts commit 5ec9699c4d1f165364586d825baef434e2c110b4. It
accesses MI after it has been hoisted.


# 5ec9699c 20-Jul-2023 Jingu Kang <jingu.kang@arm.com>

[MachineLICM] Handle Subloops

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision:

[MachineLICM] Handle Subloops

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205

show more ...


# 466a8149 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)

This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.


# 0a8d17e7 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed B

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818

show more ...


# 4d42e8b5 28-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc2

Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.

The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.

show more ...


# a496c8be 26-Jul-2023 Vitaly Buka <vitalybuka@google.com>

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048

Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"

And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381

show more ...


# 351b4c17 20-Jul-2023 Jingu Kang <jingu.kang@arm.com>

Revert "[MachineLICM] Handle Subloops"

This reverts commit 50dd383d08670960540fecb4b48c0f0429fbfba3.


# 50dd383d 19-Jul-2023 Jingu Kang <jingu.kang@arm.com>

[MachineLICM] Handle Subloops

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outmost loop's blocks once.

Differential Revision: h

[MachineLICM] Handle Subloops

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outmost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205

show more ...


# 62ed3ff4 19-Jul-2023 Jingu Kang <jingu.kang@arm.com>

Revert "[MachineLICM] Handle Subloops"

This reverts commit 33e60484d750291e99301e29e60fe72c8fa48ccd.


# 33e60484 30-Jun-2023 Jingu Kang <jingu.kang@arm.com>

[MachineLICM] Handle Subloops

MachineLICM pass handles inner loops only when outmost loop does not have unique
predecessor. If the loop has preheader and there is loop invariant code, the
invariant

[MachineLICM] Handle Subloops

MachineLICM pass handles inner loops only when outmost loop does not have unique
predecessor. If the loop has preheader and there is loop invariant code, the
invariant code can be hoisted to the preheader in general. This patch makes the
pass handle inner loops in general.

Differential Revision: https://reviews.llvm.org/D154205

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# 7a98f084 17-May-2023 Christudasan Devadasan <Christudasan.Devadasan@amd.com>

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector

[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs

Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.

This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.

Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which is an unproblematic case.

Differential Revision: https://reviews.llvm.org/D124196

show more ...


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# d8925210 13-Feb-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Break-up large PHIs for DAGISel

DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen.
This is because it introduces a need to have a build

[AMDGPU] Break-up large PHIs for DAGISel

DAGISel uses CopyToReg/CopyFromReg to lower PHI nodes. With large PHIs, this can result in poor codegen.
This is because it introduces a need to have a build_vector before copying the PHI value, and that build_vector may have many undef elements. This can cause very high register pressure and abnormal stack usage in some cases.

This scalarization/phi "break-up" can be easily tuned/disabled through CL options in case it's not beneficial for some users.
It's also only enabled for DAGIsel and GlobalISel handles PHIs much better (as it works on the whole function).

This can both scalarize (break a vector into its elements) and simplify (break a vector into smaller, more manageable subvectors) PHIs.

Fixes SWDEV-321581

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D143731

show more ...


12