Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. Non
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
12409024 |
| 31-Oct-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721)
Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary
AMDGPU/GlobalISel: Handle atomic sextload and zextload (#111721)
Atomic loads are handled differently from the DAG, and have separate opcodes and explicit control over the extensions, like ordinary loads. Add new patterns for these.
There's room for cleanup and improvement. d16 cases aren't handled.
Fixes #111645
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
c198f775 |
| 09-Oct-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove flat/global fmin/fmax intrinsics (#105642)
These have been replaced with atomicrmw
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
ee08d9cb |
| 22-Aug-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove global/flat atomic fadd intrinics (#97051)
These have been replaced with atomicrmw.
|
#
9d364286 |
| 21-Aug-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (#97050)
These are now fully covered by atomicrmw.
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
70c8b9c2 |
| 23-Jun-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove ds atomic fadd intrinsics (#95396)
These have been replaced with atomicrmw fadd
|
#
85200612 |
| 18-Jun-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously
AMDGPU: Support local atomicrmw fmin/fmax for float/double (#95590)
This has always been supported. Somehow, we ended up with 2 copies of clang builtins for this case, and the newer one erroneously requires gfx8-insts.
show more ...
|
#
16238669 |
| 18-Jun-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][MC] Support UC_VERSION_* constants. (#95618)
Our other tools support them, so we want them in LLVM
assembler/disassembler too.
|
#
eda9ff89 |
| 18-Jun-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Flat instructions do not have signed offsets gfx7-gfx11 (#95852)
Fixes some atomicrmw fadd and intrinsic cases
|
Revision tags: llvmorg-18.1.8 |
|
#
5c9352eb |
| 13-Jun-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
DAG: Replace bitwidth with type in suffix in atomic tablegen ops (#94845)
|
Revision tags: llvmorg-18.1.7 |
|
#
4ae896fe |
| 21-May-2024 |
jofrn <jofernau@amd.com> |
[AMDGPU] HasOneUse uses (#92534)
#91578 implements `HasOneUse` predicate on `PatFrag`, so this commit
uses it within AMDGPU.
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
ceb744eb |
| 13-Mar-2024 |
Harald van Dijk <harald@gigawatt.nl> |
[AMDGPU] Fix canonicalization of truncated values. (#83054)
We were relying on roundings to implicitly canonicalize, which is
generally safe, except with roundings that may be optimized away.
Fi
[AMDGPU] Fix canonicalization of truncated values. (#83054)
We were relying on roundings to implicitly canonicalize, which is
generally safe, except with roundings that may be optimized away.
Fixes #82937.
show more ...
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3 |
|
#
f122268c |
| 20-Feb-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Extend PredicateControl to support True16 predicates. (#82245)
Using OtherPredicates for True16 predicates is often problematic due to
interference with other kinds of predicates, par
[AMDGPU][NFC] Extend PredicateControl to support True16 predicates. (#82245)
Using OtherPredicates for True16 predicates is often problematic due to
interference with other kinds of predicates, particularly when this
overrides predicates inherited from pseudo instructions.
show more ...
|
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
c99da46f |
| 17-Jan-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
|
#
cf025c76 |
| 02-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] GFX12 global_atomic_ordered_add_b64 instruction and intrinsic (#76149)
|
#
07a6d736 |
| 15-Dec-2023 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493)
|
#
6eec8013 |
| 13-Dec-2023 |
Piotr Sobczak <piotr.sobczak@amd.com> |
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
c9cdaffe |
| 10-Nov-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU] Fix operand definitions for atomic scalar memory instructions. (#71799)
CPol and CPol_GLC1 operand classes have identical predicates, which
means AsmParser cannot differentiate between the
[AMDGPU] Fix operand definitions for atomic scalar memory instructions. (#71799)
CPol and CPol_GLC1 operand classes have identical predicates, which
means AsmParser cannot differentiate between the RTN and non-RTN
variants of the instructions. When it currently selects the wrong
instruction, a hack in cvtSMEMAtomic() corrects the op-code. Using the
new predicated-value operands makes this hack and the whole conversion
function not needed.
Other uses of CPol_GLC1 operands are to be addressed separately.
Resolves about half of the remaining ~1000 pairs of ambiguous
instructions.
Part of <https://github.com/llvm/llvm-project/issues/69256>.
show more ...
|
Revision tags: llvmorg-17.0.4 |
|
#
509b5708 |
| 17-Oct-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][AsmParser] Eliminate custom predicates for named-bit operands. (#69243)
isGDS() and isTFE() need special treatment, because they may be both
named-bit and token operands.
Part of #62629.
|
Revision tags: llvmorg-17.0.3 |
|
#
720be6c5 |
| 11-Oct-2023 |
Stephen Thomas <104134586+stepthomas@users.noreply.github.com> |
[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684)
The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have
encodings for
non-value-returning
[AMDGPU] Add encoding/decoding support for non-result-returning ATOMIC_CSUB instructions (#68684)
The BUFFER_ATOMIC_CSUB and GLOBAL_ATOMIC_CSUB instructions have
encodings for
non-value-returning forms, although actually using them isn't supported
by
hardware. However, these encodings aren't supported by the backend,
meaning
that they can't even be assembled or disassembled.
Add support for the non-returning encodings, but gate actually using
them
in instruction selection behind a new feature
FeatureAtomicCSubNoRtnInsts,
which no target uses. This does allow the non-returning instructions to
be
tested manually and llvm.amdgcn.atomic.csub.ll is extended to cover
them.
The feature does not gate assembling or disassembling them, this is now
not an error, and encoding and decoding tests have been adapted
accordingly.
show more ...
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
ad9d13d5 |
| 04-Apr-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from regular store. This made it difficult to share patterns between regular and atomic stor
SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from regular store. This made it difficult to share patterns between regular and atomic stores.
There was a previous incomplete attempt to move atomic_store into the regular StoreSDNode which would be better.
I think it was a mistake for all atomicrmw to swap the operand order, so maybe it's better to take this one step further.
https://reviews.llvm.org/D123143
show more ...
|
#
fbe4ff81 |
| 14-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still s
AMDGPU: Partially fix not respecting dynamic denormal mode
The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.
show more ...
|
#
12460cf9 |
| 05-Jul-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][AsmParser] Simplify the implementation of SWZ operands.
Those are implicit helper operands and therefore don't need any parsers or printers.
Part of <https://github.com/llvm/llvm-project/i
[AMDGPU][AsmParser] Simplify the implementation of SWZ operands.
Those are implicit helper operands and therefore don't need any parsers or printers.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
Reviewed By: piotr, foad
Differential Revision: https://reviews.llvm.org/D154432
show more ...
|
#
e95457d2 |
| 15-Jun-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][AsmParser][NFC] Simplify v_interp-related operand definitions.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm
[AMDGPU][AsmParser][NFC] Simplify v_interp-related operand definitions.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152897
show more ...
|
#
024e5408 |
| 14-Jun-2023 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][AsmParser][NFC] Get rid of custom default operand handlers.
Removes the need to add and remove them manually depending on whether they are used in cvt*() functions. Also removes the compile
[AMDGPU][AsmParser][NFC] Get rid of custom default operand handlers.
Removes the need to add and remove them manually depending on whether they are used in cvt*() functions. Also removes the compiler warnings about unused handlers when it happens to be the case.
Part of <https://github.com/llvm/llvm-project/issues/62629>.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D151688
show more ...
|