History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h (Results 1 – 25 of 118)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# ea33af63 01-Nov-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v3 (#114443)

This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.

It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 8a849a2a 10-Oct-2024 Mikhail Goncharov <goncharov.mikhail@gmail.com>

Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)"

This reverts commit 4b4a0d419c81b8b12a7dbb33dae1f7e9be91a88f.

New test fails on buildbots https://lab.llvm.org/buildbot/#/builders/63/builds/2039 https://lab.llvm.org/buildbot/#/builders/127/builds/1055


# 4b4a0d41 09-Oct-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)

This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds.

The first attempted reapply was #111059

This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.


# e075dcf7 06-Oct-2024 NAKAMURA Takumi <geek4civic@gmail.com>

Revert "Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)"

This reverts commit 98a15c7b0c6ec129d371f0c121dbe9396c4f5609.
(llvmorg-20-init-8051-g98a15c7b0c6e)


# 98a15c7b 04-Oct-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

Reapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" (#111059)

This reverts commit 650c41aad2eb43c634a05b2b5799a0c13a73b92f.

The test failures appear to be from conflicts with other PRs that landed around this time.


# 650c41aa 03-Oct-2024 NAKAMURA Takumi <geek4civic@gmail.com>

Revert "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)"

Some builders have been failing tests.
```
Failed Tests (2):
LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-global-old-legalization.mir
LLVM :: CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir
```

This reverts commit ae5bd2a9f292037c605b2ec0ee31200581bd8701.
(llvmorg-20-init-7805-gae5bd2a9f292)


# ae5bd2a9 02-Oct-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)

Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.

1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types

The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 0745219d 06-Sep-2024 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293)


Revision tags: llvmorg-19.1.0-rc4
# 26b0bef1 29-Aug-2024 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Use pattern to select instruction for intrinsic llvm.fptrunc.round (#105761)

Use GCNPat instead of Custom Lowering to select instructions for
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is developed to translate the round
modes to the corresponding ones that hardware recognizes.


Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 4477ff68 27-Jun-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove ds_fmin/ds_fmax intrinsics (#96739)

These have been replaced with atomicrmw.


# 5feb32ba 25-Jun-2024 Vikram Hegde <115221833+vikramRH@users.noreply.github.com>

[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)

This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations
(along with removing all unnecessary bitcasts). This legalizes 64-bit
readlane, writelane and readfirstlane ops pre-ISel.

---------

Co-authored-by: vikramRH <vikhegde@amd.com>


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# fb2c6597 19-May-2024 Leon Clark <PeddleSpam@users.noreply.github.com>

[AMDGPU] Use LSH for lowering ctlz_zero_undef.i8/i16 (#88512)

Use LSH to lower ctlz_zero_undef instead of subtracting leading zeros
for i8 and i16.

Related to [77615](https://github.com/llvm/llvm-project/pull/77615).

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
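
The two lowerings named in this commit message can be compared with a small model. This is only a sketch, assuming that "LSH" means shifting the narrow value into the top bits of a 32-bit register before counting leading zeros; the function names are hypothetical and do not come from the LLVM sources.

```python
def ctlz32(x):
    """Count leading zeros of a nonzero 32-bit value (input must not be zero)."""
    assert 0 < x < 2**32
    return 32 - x.bit_length()

def ctlz_narrow_by_subtract(x, bits):
    # Zero-extend to 32 bits, count, then subtract the extra leading zeros.
    return ctlz32(x) - (32 - bits)

def ctlz_narrow_by_shift(x, bits):
    # Shift the narrow value into the top bits so no correction is needed.
    return ctlz32((x << (32 - bits)) & 0xFFFFFFFF)
```

Both forms agree for every nonzero i8 and i16 input; the shifted form avoids the trailing subtraction.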


Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3
# d365a45c 23-Mar-2024 Evgenii Kudriashov <evgenii.kudriashov@intel.com>

[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)

Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.

These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However, AMDGPU is an exception. It
selects traps during legalization regardless of whether SelectionDAG or
GlobalISel is used.

Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.


Revision tags: llvmorg-18.1.2, llvmorg-18.1.1
# 1fc5e50c 06-Mar-2024 Joseph Huber <huberjn@outlook.com>

[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906)

Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
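
The "concatenates" wording can be pictured as packing two 32-bit fields into one 64-bit integer. The sketch below is illustrative only: the commit message does not say which half holds which register, so placing the mode bits in the low half and the trap status in the high half is an assumption, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def pack_fpenv(mode_bits, trap_bits):
    # Assumed layout (not confirmed by the commit message):
    # mode register bits in the low 32 bits, trap status in the high 32 bits.
    return ((trap_bits & MASK32) << 32) | (mode_bits & MASK32)

def unpack_fpenv(env):
    # Split the 64-bit environment back into (mode_bits, trap_bits).
    return env & MASK32, (env >> 32) & MASK32
```

A round trip such as `unpack_fpenv(pack_fpenv(0x2F0, 0x1))` returns the two original fields unchanged.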


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
# 45d2d775 25-Jan-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.


Revision tags: llvmorg-19-init
# f22cde10 08-Jan-2024 Ningning Shi(史宁宁) <shiningning@iscas.ac.cn>

[GlobalISel][NFC]Delete the comments of XXLegalizerInfo (#76918)

Delete the LegalizerInfo comments of AArch64/AMD64/ARM/M68k/RISCV/x86,
they are copied from register bank.


# d659bd16 03-Jan-2024 David Green <david.green@arm.com>

[GlobalISel][AArch64] Tail call libcalls. (#74929)

This tries to allow libcalls to be tail called, using a similar method
to DAG where the type is checked to make sure they match, and if so the
backend, through lowerCall, checks that the tailcall is valid for all
arguments.


Revision tags: llvmorg-17.0.6
# f3138524 14-Nov-2023 Acim-Maravic <119684637+Acim-Maravic@users.noreply.github.com>

[AMDGPU] Generic lowering for rint and nearbyint (#69596)

There are three different rounding intrinsics that are brought down to
the same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>


Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# aa5158cd 10-Oct-2023 Thomas Symalla <5754458+tsymalla@users.noreply.github.com>

[AMDGPU] Use absolute relocations when compiling for AMDPAL and Mesa3D (#67791)

The primary ISA-independent justification for using PC-relative
addressing is that it makes code position-independent and therefore
allows sharing of .text pages between processes.

When not sharing .text pages, we can use absolute relocations instead,
which will possibly prevent a bubble introduced by s_getpc_b64.

Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>


Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# 72a7024a 16-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Correctly lower llvm.sqrt.f32

Make codegen emit correctly rounded sqrt by default.

Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems from AMDGPUCodeGenPrepare using forward iteration instead of
a well behaved combiner.

https://reviews.llvm.org/D158129


# 4b7b4b94 14-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix fast f32 log/log10

OpenCL conformance didn't like interpreting afn as permission to ignore
denormal handling.

https://reviews.llvm.org/D157940


Revision tags: llvmorg-17.0.0-rc2
# 10304835 30-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU/GlobalISel: Handle stacksave/stackrestore

https://reviews.llvm.org/D156670


Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6
# e3fd8f83 20-Nov-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Correctly expand f64 sqrt intrinsic

rocm-device-libs and llpc were avoiding using f64 sqrt
intrinsics in favor of their own expansions. Port the
expansion into the backend. Both of these users should be
updated to call the intrinsic instead.

The library and llpc expansions are slightly different.
llpc uses an ldexp to do the scale; the library uses a multiply.

Use ldexp to do the scale instead of the multiply.
I believe v_ldexp_f64 and v_mul_f64 are always the same number of
cycles, but it's cheaper to materialize the 32-bit integer constant
than the 64-bit double constant.

The libraries have another fast version of sqrt which will
be handled separately.

I am tempted to do this in an IR expansion instead. In the IR
we could take advantage of computeKnownFPClass to avoid
the 0-or-inf argument check.
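
The ldexp-versus-multiply choice described above can be modeled in a few lines. This is a rough sketch, not the backend expansion: the scale exponents (256 going in, -128 coming out) are illustrative assumptions, the real expansion presumably scales only inputs that need it, and the function names are hypothetical.

```python
import math

SCALE_UP = 256     # assumed exponent applied before the square root
SCALE_DOWN = -128  # half of SCALE_UP, negated, applied to the result

def sqrt_scaled_ldexp(x):
    # Scale with ldexp: only a small integer exponent has to be materialized.
    scaled = math.ldexp(x, SCALE_UP)
    return math.ldexp(math.sqrt(scaled), SCALE_DOWN)

def sqrt_scaled_mul(x):
    # Scale with multiplies: needs full double-precision constants instead.
    return math.sqrt(x * 2.0**SCALE_UP) * 2.0**SCALE_DOWN
```

Because sqrt(x * 2^(2k)) equals sqrt(x) * 2^k, both forms return the same value for inputs small enough that the scaled value does not overflow; the scaling moves denormal inputs into the normal range before the square root is taken.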


# 54916662 14-Jun-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Correctly lower llvm.exp.f32

The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow, which this handles optimizing
out.

Some of the <3 x half> fast tests regress due to vector widening
dropping flags which will be fixed separately.

Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.


# ed556a1a 14-Jun-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Correctly lower llvm.exp2.f32

Previously this did a fast math expansion only.

