Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 229e1185 | 23-Jul-2024 | Christudasan Devadasan <christudasan.devadasan@amd.com>

[AMDGPU] Codegen support for constrained multi-dword sloads (#96163)

For targets that support the xnack replay feature (gfx8+), the multi-dword scalar loads shouldn't clobber any register that holds the src address. The constrained versions of the scalar loads have the early-clobber flag attached to the dst operand to restrict RA from re-allocating any of the src regs as the dst.
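As a hedged illustration (the `_ec` opcode spelling and operand details here are assumptions, not quoted from the patch), the constrained form in MIR looks roughly like:

```
; early-clobber forbids RA from assigning the 64-bit result any register
; that overlaps the address pair in %0, so an xnack replay can re-execute
; the load with the source address intact.
early-clobber %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM_ec %0:sreg_64, 0, 0 :: (load (s64))
```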

Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1 | 17-Jan-2024 | Fangrui Song <i@maskray.me>

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full target triple, while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign, as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed:

```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
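For illustration, the difference at the RUN-line level (a hedged sketch; the -mcpu choice is arbitrary):

```
; Old, error-prone: only the architecture is set, so OS/environment
; components leak in from the host's default triple:
; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck %s
; New: the full target triple is pinned, independent of the host:
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s
```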

Revision tags: llvmorg-17.0.6
# 01c1c7a1 | 14-Nov-2023 | Acim Maravic <119684637+Acim-Maravic@users.noreply.github.com>

[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302)

getBaseWithConstantOffset() is used for scalar and non-scalar buffer loads. The difference between the s_load and load instructions is that s_load extends its 32-bit offset to 64 bits, so a 32-bit (address + offset) sum must not cause unsigned 32-bit integer wraparound, because the hardware performs the addition in 64 bits.
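A minimal IR sketch of the pattern in question, assuming the llvm.amdgcn.s.buffer.load signature; the fold of the add into the (soffset + offset) form is only sound when the 32-bit add cannot wrap:

```
define amdgpu_ps i32 @soffset_plus_imm(<4 x i32> inreg %rsrc, i32 inreg %soffset) {
  ; nuw guarantees the 32-bit sum does not wrap, so folding it into the
  ; instruction's 64-bit address computation preserves the semantics.
  %off = add nuw i32 %soffset, 64
  %val = call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> %rsrc, i32 %off, i32 0)
  ret i32 %val
}

declare i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32>, i32, i32)
```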

Revision tags: llvmorg-17.0.5
# 86f2e092 | 01-Nov-2023 | Jay Foad <jay.foad@amd.com>

[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960)

When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.
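The expanded sequence looks roughly like this (a hedged sketch; register choice and symbol name are arbitrary):

```
s_getpc_b64 s[4:5]                  ; s[4:5] = PC at the *end* of this instruction
s_add_u32   s4, s4, sym@rel32@lo+4  ; lo literal sits 4 bytes past that point
s_addc_u32  s5, s5, sym@rel32@hi+12 ; hi literal sits 12 bytes past it
```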

Revision tags: llvmorg-17.0.4
# d96529af | 29-Oct-2023 | Simon Pilgrim <llvm-dev@redking.me.uk>

[DAG] Attempt shl narrowing in SimplifyDemandedBits (REAPPLIED)

If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.

Followup to D146121.

Reapplied: moved after the ShrinkDemandedOp call; reuse the existing KnownBits result; ensure that we only attempt this if all the upper bits are demanded; 547dc461225ba should address the remaining regressions that were noticed in the previous commit.

Differential Revision: https://reviews.llvm.org/D155472
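The identity behind the fold, sketched at the IR level for clarity (the combine itself runs on SelectionDAG nodes):

```
; Only the low 32 bits of the 64-bit shift are demanded...
define i32 @wide(i64 %x) {
  %s = shl i64 %x, 3
  %t = trunc i64 %s to i32
  ret i32 %t
}

; ...so the shift can be performed at half width, with the trunc free on
; most targets:
define i32 @narrow(i64 %x) {
  %lo = trunc i64 %x to i32
  %s = shl i32 %lo, 3
  ret i32 %s
}
```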

Revision tags: llvmorg-17.0.3
# 0a776996 | 04-Oct-2023 | Kirill Stoimenov <kstoimenov@google.com>

Revert "[DAG] Attempt shl narrowing in SimplifyDemandedBits"
This reverts commit 7a8c04ef84ecdab4390b451d4c2fe17bc45a7b63.

# 7a8c04ef | 04-Oct-2023 | Simon Pilgrim <llvm-dev@redking.me.uk>

[DAG] Attempt shl narrowing in SimplifyDemandedBits

If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.

Followup to D146121.

Differential Revision: https://reviews.llvm.org/D155472

Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# faa2c678 | 04-Apr-2023 | Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[AMDGPU] Add buffer intrinsics that take resources as pointers

In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage of these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer.

In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions.

Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
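A hedged IR sketch of the two forms, assuming the raw.ptr.buffer.load spelling that landed in-tree:

```
; Legacy form: the resource is an opaque <4 x i32>, invisible to AA.
declare float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32>, i32, i32, i32)
; Pointer form: the resource is a ptr addrspace(8), so noalias and other
; pointer-based analyses apply to the memory the load touches.
declare float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8), i32, i32, i32)

define float @load_elem(ptr addrspace(8) noalias %rsrc, i32 %voffset) {
  %v = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %rsrc, i32 %voffset, i32 0, i32 0)
  ret float %v
}
```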

Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 2be31896 | 10-Mar-2023 | Jay Foad <jay.foad@amd.com>

[AMDGPU] Don't select _SGPR forms of SMEM instructions on GFX9+

On GFX9+, SMEM instructions have an _SGPR_IMM form which is strictly more powerful than the _SGPR form. It simplifies codegen if we always select the _SGPR_IMM form with an immediate offset of 0 instead of the _SGPR form.

Note that this patch just makes minimal changes to the selection patterns to prove the concept. Further simplifications are possible to reduce the number of selection patterns.

On GFX9 the _SGPR form of the Real instruction is still required for assembly/disassembly, but on GFX10+ it can be removed completely.

Differential Revision: https://reviews.llvm.org/D147334
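In assembly terms (a hedged sketch of the GFX9+ SMEM syntax):

```
s_load_dword s0, s[2:3], s4            ; _SGPR form: SGPR offset only
s_load_dword s0, s[2:3], s4 offset:0x0 ; _SGPR_IMM form subsumes it with imm 0
```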

Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# d85e849f | 02-Dec-2022 | Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Convert some assorted tests to opaque pointers

Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# bb70b5d4 | 18-Jul-2022 | Matt Arsenault <Matthew.Arsenault@amd.com>

CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer

Previously this was assuming pointsToConstantMemory implies dereferenceable.

# 5db8d6fd | 05-Sep-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][CodeGen] Support (base | offset) SMEM loads.

Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
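A hedged IR sketch of the shape this matches; the names are illustrative:

```
define amdgpu_ps i32 @or_as_offset(i64 inreg %base) {
  ; Low 4 bits of %aligned are known zero, so the `or` acts as an add of 4
  ; and can fold into the s_load's immediate offset rather than being
  ; materialized as an s_or_b32.
  %aligned = and i64 %base, -16
  %addr = or i64 %aligned, 4
  %p = inttoptr i64 %addr to ptr addrspace(4)
  %v = load i32, ptr addrspace(4) %p
  ret i32 %v
}
```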

# 1f550d86 | 05-Sep-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][CodeGen] Pre-commit a test on (base | offset) SMEM loads for D132552.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D133021

# f3364530 | 05-Sep-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130263

# 8d0383eb | 24-Jun-2022 | Matt Arsenault <Matthew.Arsenault@amd.com>

CodeGen: Remove AliasAnalysis from regalloc

This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for whether a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implies dereferenceable for now, but maybe this should be changed.
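A hedged MIR sketch of what remat now keys off; the opcode and operands are illustrative:

```
; Both facts live on the memory operand itself, so no AA query is needed
; at remat time:
%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM %0:sreg_64, 0, 0 :: (dereferenceable invariant load (s32) from %ir.ptr, addrspace 4)
```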

# 432cbd78 | 18-Jul-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381

# 9c66c02e | 18-Jul-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets.

Saves some add instructions on a couple of Rage 2 shaders and is also a prerequisite for a coming-soon change matching (register + immediate) offsets.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D129095

# 8cd79bc1 | 05-Jul-2022 | Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][GlobalISel] Support register offsets for SMRDs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128836

Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# fae05692 | 20-May-2021 | Matt Arsenault <Matthew.Arsenault@amd.com>

CodeGen: Print/parse LLTs in MachineMemOperands

This will currently accept the old number-of-bytes syntax and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few).

Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions.

This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly convert them to scalars. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.
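The change in the printed memory operand, as a before/after sketch:

```
:: (load 4 from %ir.p)      ; old syntax: a size in bytes
:: (load (s32) from %ir.p)  ; new syntax: an LLT, printed in parentheses
```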

Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 3bffb1cd | 09-Feb-2021 | Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Use single cache policy operand

Replace the individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and, hopefully, the amount of code. These operands are mostly 0 anyway.

An additional advantage is that the parser will accept these flags in any order, unlike now.

Differential Revision: https://reviews.llvm.org/D96469
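A hedged sketch of the operand change; the bit values are assumptions based on the usual CPol encoding (GLC=1, SLC=2, DLC=4), not quoted from the patch:

```
; Before: three separate immediate operands
;   BUFFER_LOAD_DWORD_OFFSET ..., 1, 1, 0   ; glc, slc, dlc
; After: one cache_policy bitmask
;   BUFFER_LOAD_DWORD_OFFSET ..., 3         ; cache_policy = glc|slc
```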

Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2
# b8c8d1b3 | 30-Jul-2020 | Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Convert some tests to use new buffer intrinsics

Uses of the legacy (non-struct, non-raw) buffer intrinsics should now all be consolidated into the tests specifically for those intrinsics.

Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# 77ce2e21 | 30-Mar-2020 | Jakub Kuderski <kubak@google.com>

[AMDGPU] Add Relocation Constant Support

Summary: This change adds the amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting ELF.

The intrinsic takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.

`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands to be passed through to ISel.

Author: csyonghe <yonghe@google.com>

Reviewers: tpr, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76440
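A minimal IR sketch of the intrinsic's use; the symbol name here is arbitrary:

```
define amdgpu_ps i32 @get_reloc_value() {
  %v = call i32 @llvm.amdgcn.reloc.constant(metadata !0)
  ret i32 %v
}

declare i32 @llvm.amdgcn.reloc.constant(metadata)

; The string becomes the symbol of the relocation entry in the emitted ELF.
!0 = !{!"my_reloc_symbol"}
```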