Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
6548b635 |
| 09-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
|
#
ca33649a |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
|
#
e215a1e2 |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
|
Revision tags: llvmorg-19.1.3 |
|
#
3277c7cd |
| 21-Oct-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
show more ...
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
86627149 |
| 04-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a
[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)
Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.
Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.
To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1 |
|
#
a8203291 |
| 26-Jul-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)
They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.c
[AMDGPU] Remove -wavefrontsize32 and -wavefrontsize64 from GFX10+ tests (NFC) (#100711)
They are no longer needed after the patch: [AMDGPU] Remove wavefrontsize
feature from GFX10: https://github.com/llvm/llvm-project/pull/98400
The exception is when "target-features" are set to "+wavefrontsize32" or
"+wavefrontsize64", we still need to remove a wavefrontsize feature
before add a different one to make sure only one of them are present.
show more ...
|
Revision tags: llvmorg-20-init |
|
#
229e1185 |
| 23-Jul-2024 |
Christudasan Devadasan <christudasan.devadasan@amd.com> |
[AMDGPU] Codegen support for constrained multi-dword sloads (#96163)
For targets that support xnack replay feature (gfx8+), the multi-dword scalar loads shouldn't clobber any register that holds the
[AMDGPU] Codegen support for constrained multi-dword sloads (#96163)
For targets that support xnack replay feature (gfx8+), the multi-dword scalar loads shouldn't clobber any register that holds the src address. The constrained version of the scalar loads have the early clobber flag attached to the dst operand to restrict RA from re-allocating any of the src regs for its dst operand.
show more ...
|
#
cf230e77 |
| 15-Jul-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Enable atomic optimizer for divergent i64 and double values (#96934)
|
#
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
show more ...
|
#
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
show more ...
|
#
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
ba52f06f |
| 18-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
sep
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
show more ...
|
#
e9e9d1b0 |
| 17-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX12 (#77927)
|
#
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
show more ...
|
#
48f36c6e |
| 25-Dec-2023 |
Acim Maravic <Acim.Maravic@Syrmia.com> |
[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)
Update DAG ISel to support 64bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B664
---------
Co-authored-by: Acim Maravic <Acim.Maravic
[LLVM] Make use of s_flbit_i32_b64 and s_ff1_i32_b64 (#75158)
Update DAG ISel to support 64bit versions S_FF1_I32_B64 and
S_FLBIT_I32_B664
---------
Co-authored-by: Acim Maravic <Acim.Maravic@amd.com>
show more ...
|
#
5879162f |
| 15-Dec-2023 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492)
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
#
7b3bbd83 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.
Reverted due to various buildbot failures.
|
#
2501ae58 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
show more ...
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
7fa7a08f |
| 19-Jul-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
Differential Revision: https://reviews.llvm.org/D155681
|
#
7a101798 |
| 30-Jun-2023 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
Revert "[AMDGPU] Mark mbcnt as convergent"
This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb.
The output of mbcnt does not depend on other active lanes, and hence it is not convergent. T
Revert "[AMDGPU] Mark mbcnt as convergent"
This reverts commit 37114036aa57e53217a57afacd7f47b36114edfb.
The output of mbcnt does not depend on other active lanes, and hence it is not convergent. The original change was made as a possible fix for
https://github.com/ROCm-Developer-Tools/HIP/issues/3172
But changing mbcnt does not fix that issue.
Reviewed By: ruiling, foad, yaxunl
Differential Revision: https://reviews.llvm.org/D153953
show more ...
|
#
597fb7fb |
| 22-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.
Atomic optimizer is turned on by default through D152649. This patch removes the usage of old command line option amdgpu-atomic
[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy.
Atomic optimizer is turned on by default through D152649. This patch removes the usage of old command line option amdgpu-atomic-optimizations and transfer the responsibility to `amdgpu-atomic-optimizer-strategy`.
We can safely remove old option when LLPC remove its all usage.
Reviewed By: foad, arsenm, #amdgpu, cdevadas
Differential Revision: https://reviews.llvm.org/D153007
show more ...
|
Revision tags: llvmorg-16.0.6 |
|
#
f6c8a8e9 |
| 09-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.
An alternative implementation iterates over all active l
[AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.
An alternative implementation iterates over all active lanes of Wavefront using llvm.cttz and performs the following steps: 1. Read the value that needs to be atomically incremented using llvm.amdgcn.readlane intrinsic 2. Accumulate the result. 3. Update the scan result using llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D147408
show more ...
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
faa2c678 |
| 04-Apr-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
[AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backen
[AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer.
In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions.
Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
show more ...
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
bdf2fbba |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Convert some tests to opaque pointers (NFC)
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5 |
|
#
11e86868 |
| 14-Nov-2022 |
Guozhi Wei <carrot@google.com> |
[MachineCSE] Allow CSE for instructions with ignorable operands
Ignorable operands don't impact instruction's behavior, we can safely do CSE on the instruction.
It is split from D130919. It has big
[MachineCSE] Allow CSE for instructions with ignorable operands
Ignorable operands don't impact instruction's behavior, we can safely do CSE on the instruction.
It is split from D130919. It has big impact to some AMDGPU test cases. For example in atomic_optimizations_raw_buffer.ll, when trying to check if the following instruction can be CSEed
%37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
Function isCallerPreservedOrConstPhysReg is called on operand "implicit $exec", this function is implemented as
- return TRI.isCallerPreservedPhysReg(Reg, MF) || + return TRI.isCallerPreservedPhysReg(Reg, MF) || TII.isIgnorableUse(MO) || (MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg));
Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on this operand, so isCallerPreservedOrConstPhysReg is also false, it causes LLVM failed to CSE this instruction.
With this patch TII.isIgnorableUse returns true for the operand $exec, so isCallerPreservedOrConstPhysReg also returns true, it causes this instruction to be CSEed with previous instruction
%14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
So I got different result from here. AMDGPU's implementation of isIgnorableUse is
bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const { // Any implicit use of exec by VALU is not a real register read. return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() && isVALU(*MO.getParent()) && !resultDependsOnExec(*MO.getParent()); }
Since the operand $exec is not a real register read, my understanding is it's reasonable to do CSE on such instructions.
Because more instructions are CSEed, so I get less instructions generated for these tests.
Differential Revision: https://reviews.llvm.org/D137222
show more ...
|