Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
ea33af63 |
| 01-Nov-2024 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v3 (#114443)
This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.
It seems I missed a spot when trying to ensure the code in the
instruction selection tests was actually legalized MIR.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8a849a2a |
| 10-Oct-2024 |
Mikhail Goncharov <goncharov.mikhail@gmail.com> |
Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)"
This reverts commit 4b4a0d419c81b8b12a7dbb33dae1f7e9be91a88f.
New test fails on buildbots https://lab.llvm.org/buildbot/#/builders/63/builds/2039 https://lab.llvm.org/buildbot/#/builders/127/builds/1055
|
#
4b4a0d41 |
| 09-Oct-2024 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds.
The first attempted reapply was #111059
This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.
|
#
e075dcf7 |
| 06-Oct-2024 |
NAKAMURA Takumi <geek4civic@gmail.com> |
Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)"
This reverts commit 98a15c7b0c6ec129d371f0c121dbe9396c4f5609. (llvmorg-20-init-8051-g98a15c7b0c6e)
|
#
98a15c7b |
| 04-Oct-2024 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)
This reverts commit 650c41aad2eb43c634a05b2b5799a0c13a73b92f.
The test failures appear to be from conflicts with other PRs that landed around this time.
|
#
650c41aa |
| 03-Oct-2024 |
NAKAMURA Takumi <geek4civic@gmail.com> |
Revert "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)"
Some builders have been failing tests.
```
Failed Tests (2):
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir
  LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir
```
This reverts commit ae5bd2a9f292037c605b2ec0ee31200581bd8701. (llvmorg-20-init-7805-gae5bd2a9f292)
|
#
ae5bd2a9 |
| 02-Oct-2024 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)
Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.
1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types.
The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.
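The bitcast described in point 2 can be pictured with a small, self-contained model. This is only a sketch: `SketchType`, `isPointerVector` and `bitcastPointerVectorToScalars` are hypothetical names invented here for illustration and are not LLVM's `LLT` API or the code in the commit. The idea shown is that a vector of pointers is retyped as a vector of same-width integer scalars, so existing scalar-vector patterns can apply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a low-level type; this is NOT LLVM's LLT class.
struct SketchType {
  bool IsPointer;     // true if each element is a pointer
  unsigned AddrSpace; // only meaningful when IsPointer is true
  unsigned NumElts;   // 1 for a scalar, >1 for a vector
  unsigned EltBits;   // width of one element in bits
};

// A vector whose elements are pointers, e.g. <2 x ptr addrspace(1)>.
inline bool isPointerVector(const SketchType &T) {
  return T.IsPointer && T.NumElts > 1;
}

// Retype a pointer vector as a vector of integer scalars of the same width.
// Any other type is returned unchanged.
inline SketchType bitcastPointerVectorToScalars(const SketchType &T) {
  if (!isPointerVector(T))
    return T;
  return SketchType{/*IsPointer=*/false, /*AddrSpace=*/0, T.NumElts, T.EltBits};
}
```

Under this model, a two-element vector of 64-bit addrspace(1) pointers maps to a two-element vector of 64-bit scalars, and the total bit width is unchanged.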
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
0745219d |
| 06-Sep-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293)
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
26b0bef1 |
| 29-Aug-2024 |
Changpeng Fang <changpeng.fang@amd.com> |
AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761)
Use GCNPat instead of Custom Lowering to select instructions for
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is developed to translate the round
modes to the corresponding ones that hardware recognizes.
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
4477ff68 |
| 27-Jun-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove ds_fmin/ds_fmax intrinsics (#96739)
These have been replaced with atomicrmw.
|
#
5feb32ba |
| 25-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane and readfirstlane ops pre-ISel.
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
fb2c6597 |
| 19-May-2024 |
Leon Clark <PeddleSpam@users.noreply.github.com> |
[AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (#88512)
Use LSH to lower ctlz_zero_undef instead of subtracting leading zeros
for i8 and i16.
Related to [77615](https://github.com/llvm/llvm-project/pull/77615).
---------
Co-authored-by: Leon Clark <leoclark@amd.com>
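As a rough illustration of the two lowering strategies named in this message, the sketch below compares a subtract-based and a shift-based way of computing a narrow count-leading-zeros on top of a 32-bit count. The function names are hypothetical and the portable loop stands in for the hardware instruction. It is not the backend code, only the arithmetic identity behind it, which holds for nonzero inputs (the `zero_undef` case).

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit count-leading-zeros. Returns 32 for an input of 0, which
// callers of the "zero_undef" forms below must not rely on.
inline unsigned ctlz32(uint32_t V) {
  unsigned N = 0;
  while (N < 32 && (V & 0x80000000u) == 0) {
    V <<= 1;
    ++N;
  }
  return N;
}

// Subtract-based form: zero-extend, count, then remove the extra leading
// zeros that the widening introduced (24 for i8, 16 for i16).
inline unsigned ctlz8BySubtract(uint8_t X) { return ctlz32(X) - 24; }
inline unsigned ctlz16BySubtract(uint16_t X) { return ctlz32(X) - 16; }

// Shift-based form: move the payload bits to the top of the 32-bit value so
// the 32-bit count is already the narrow count. Valid only for nonzero X.
inline unsigned ctlz8ByShift(uint8_t X) {
  return ctlz32(static_cast<uint32_t>(X) << 24);
}
inline unsigned ctlz16ByShift(uint16_t X) {
  return ctlz32(static_cast<uint32_t>(X) << 16);
}
```

For every nonzero input the two forms agree; the shift form replaces the subtraction after the count with a shift before it.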
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3 |
|
#
d365a45c |
| 23-Mar-2024 |
Evgenii Kudriashov <evgenii.kudriashov@intel.com> |
[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.
These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However, AMDGPU is an exception: it
selects traps during legalization regardless of whether SelectionDAG or GlobalISel is used.
Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
|
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
1fc5e50c |
| 06-Mar-2024 |
Joseph Huber <huberjn@outlook.com> |
[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)
Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.
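The "concatenated 64-bit value" can be sketched as follows. The message does not say which half holds which register, so the layout below (mode bits in the low word, trap-status bits in the high word) is an assumption made purely for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED layout for illustration only: mode-register bits in the low 32 bits,
// trap-status bits in the high 32 bits. The commit message only states that
// the two values are concatenated into one 64-bit environment.
inline uint64_t packFPEnv(uint32_t ModeBits, uint32_t TrapStatusBits) {
  return (static_cast<uint64_t>(TrapStatusBits) << 32) | ModeBits;
}

// Recover each half from the packed environment value.
inline uint32_t modeBitsOf(uint64_t Env) {
  return static_cast<uint32_t>(Env);
}
inline uint32_t trapStatusBitsOf(uint64_t Env) {
  return static_cast<uint32_t>(Env >> 32);
}
```

Packing and unpacking round-trip without loss, which is the property a get/set pair of this kind needs.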
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
45d2d775 |
| 25-Jan-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)
This is only valid on targets with architected SGPRs.
|
Revision tags: llvmorg-19-init |
|
#
f22cde10 |
| 08-Jan-2024 |
Ningning Shi(史宁宁) <shiningning@iscas.ac.cn> |
[GlobalISel][NFC]Delete the comments of XXLegalizerInfo (#76918)
Delete the LegalizerInfo comments of AArch64/AMD64/ARM/M68k/RISCV/x86;
they are copied from register bank.
|
#
d659bd16 |
| 03-Jan-2024 |
David Green <david.green@arm.com> |
[GlobalISel][AArch64] Tail call libcalls. (#74929)
This tries to allow libcalls to be tail called, using a similar method
to DAG where the type is checked to make sure they match, and if so the
backend, through lowerCall, checks that the tailcall is valid for all
arguments.
|
Revision tags: llvmorg-17.0.6 |
|
#
f3138524 |
| 14-Nov-2023 |
Acim-Maravic <119684637+Acim-Maravic@users.noreply.github.com> |
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are brought down to
the same instruction.
Co-authored-by: Acim Maravic <acim.maravic@amd.com>
|
Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
#
aa5158cd |
| 10-Oct-2023 |
Thomas Symalla <5754458+tsymalla@users.noreply.github.com> |
[AMDGPU] Use absolute relocations when compiling for AMDPAL and Mesa3D (#67791)
The primary ISA-independent justification for using PC-relative
addressing is that it makes code position-independent and therefore
allows sharing of .text pages between processes.
When not sharing .text pages, we can use absolute relocations instead,
which will possibly prevent a bubble introduced by s_getpc_b64.
Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3 |
|
#
72a7024a |
| 16-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correctly lower llvm.sqrt.f32
Make codegen emit correctly rounded sqrt by default.
Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner.
https://reviews.llvm.org/D158129
|
#
4b7b4b94 |
| 14-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix fast f32 log/log10
OpenCL conformance didn't like interpreting afn as permission to ignore the denormal handling.
https://reviews.llvm.org/D157940
|
Revision tags: llvmorg-17.0.0-rc2 |
|
#
10304835 |
| 30-Jul-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Handle stacksave/stackrestore
https://reviews.llvm.org/D156670
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6 |
|
#
e3fd8f83 |
| 20-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correctly expand f64 sqrt intrinsic
rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead.
The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply.
Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant.
The libraries have another fast version of sqrt which will be handled separately.
I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.
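The ldexp-versus-multiply choice discussed above can be sketched in plain C++. Scaling by a power of two gives the same result either way; the difference the message points to is only the cost of materializing the constant (a small integer exponent versus a 64-bit double). The function names, the use of `std::sqrt` as a stand-in for the hardware sequence, and the exponent passed by the caller are all illustrative assumptions, not the backend's actual expansion.

```cpp
#include <cassert>
#include <cmath>

// Scale by 2^Exp using an integer exponent (the ldexp form).
inline double scaleWithLdexp(double X, int Exp) {
  return std::ldexp(X, Exp);
}

// Scale by 2^Exp using a floating-point multiplier (the multiply form).
inline double scaleWithMultiply(double X, int Exp) {
  const double Factor = std::ldexp(1.0, Exp); // the double constant 2^Exp
  return X * Factor;
}

// Scaled square root built on the identity sqrt(x * 2^(2k)) == sqrt(x) * 2^k.
// Scaling the input up before the root and the result down afterwards keeps
// very small inputs away from the denormal range during the computation.
inline double sqrtViaScaling(double X, int HalfExp) {
  const double Scaled = std::ldexp(X, 2 * HalfExp);
  return std::ldexp(std::sqrt(Scaled), -HalfExp);
}
```

For example, `sqrtViaScaling(4.0, 128)` scales the input to 2^258, takes the root to get 2^129, and scales back down to 2.0.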
|
#
54916662 |
| 14-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correctly lower llvm.exp.f32
The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out.
Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately.
Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.
|
#
ed556a1a |
| 14-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Correctly lower llvm.exp2.f32
Previously this did a fast math expansion only.
|