History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp (Results 1 – 25 of 116)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# b446c208 16-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Verify function type matches when matching libcalls (#119043)

Previously this would recognize a call to a mangled ldexp(float, float)
as a candidate to replace with the intrinsic. We need to verify the second
parameter is in fact an integer.

Fixes: SWDEV-501389
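
As a rough illustration of the check described in this entry, the sketch below contrasts name-only matching with matching on both name and parameter types. `Ty`, `Signature`, and the helper functions are hypothetical stand-ins, not LLVM classes or the code from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for IR types; not LLVM's real classes.
enum class Ty { Float, Int };

struct Signature {
    std::string name;        // demangled library function name
    std::vector<Ty> params;  // parameter types of the callee
};

// Name-only matching: accepts any callee whose name is the library name.
bool matchesByName(const Signature &callee, const std::string &libName) {
    return callee.name == libName;
}

// Name plus type matching: also requires the parameter types to agree.
bool matchesByNameAndType(const Signature &callee, const Signature &expected) {
    return callee.name == expected.name && callee.params == expected.params;
}

// Small usage check for the two matchers.
void checkLdexpMatching() {
    const Signature expected{"ldexp", {Ty::Float, Ty::Int}};
    const Signature wrong{"ldexp", {Ty::Float, Ty::Float}};
    assert(matchesByName(wrong, "ldexp"));            // name alone accepts it
    assert(!matchesByNameAndType(wrong, expected));   // type check rejects it
    assert(matchesByNameAndType(expected, expected));
}
```

In this toy model, only the second matcher rejects an `ldexp` whose second parameter is a float.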


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# be187369 14-Nov-2024 Kazu Hirata <kazu@google.com>

[AMDGPU] Remove unused includes (NFC) (#116154)

Identified with misc-include-cleaner.


Revision tags: llvmorg-19.1.3
# c85611e8 17-Oct-2024 goldsteinn <35538541+goldsteinn@users.noreply.github.com>

[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649)

In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr associated it will cause an
error.

Fixes #112633


Revision tags: llvmorg-19.1.2
# fa789dff 11-Oct-2024 Rahul Joshi <rjoshi@nvidia.com>

[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752)

Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
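
The difference between "lookup only" and "get or insert" can be sketched with a toy symbol table. `ToyModule` and its members are hypothetical and only model the two behaviours named in this entry; they are not LLVM's `Module` or `Intrinsic` API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy symbol table: maps a declaration name to an arbitrary integer id.
struct ToyModule {
    std::map<std::string, int> decls;

    // Lookup only: returns nullptr when the name is absent, never mutates.
    const int *lookup(const std::string &name) const {
        auto it = decls.find(name);
        return it == decls.end() ? nullptr : &it->second;
    }

    // Get or insert: creates the declaration on first use.
    int getOrInsert(const std::string &name) {
        auto it = decls.find(name);
        if (it != decls.end())
            return it->second;
        int id = static_cast<int>(decls.size());
        decls[name] = id;
        return id;
    }
};

// Small usage check showing which call has a side effect.
void checkLookupVsInsert() {
    ToyModule m;
    assert(m.lookup("llvm.sqrt.f32") == nullptr);  // absent, table unchanged
    assert(m.decls.empty());
    int id = m.getOrInsert("llvm.sqrt.f32");       // creates the entry
    assert(m.decls.size() == 1);
    assert(m.getOrInsert("llvm.sqrt.f32") == id);  // second call reuses it
}
```

The name `getOrInsert` makes the side effect visible at the call site, which appears to be the point of the rename.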


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# c7309dad 17-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use range-based for loops. NFC. (#99047)


# 74b87b02 16-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fix and add namespace closing comments. NFC.


# aeafdc21 16-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use using instead of typedef. NFC.


# 63a1242a 16-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] clang-tidy: define trivial constructors with = default. NFC.


# bff619f9 01-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

Revert "AMDGPU: Use real copysign in fast pow (#97152)"

This reverts commit d3e7c4ce7a3d7f08cea02cba8f34c590a349688b.


# d3e7c4ce 01-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use real copysign in fast pow (#97152)

Previously this would introduce some codegen regressions, but
those have been avoided by simplifying demanded bits on copysign
operations.


# d75f9dd1 24-Jun-2024 Stephen Tozer <stephen.tozer@sony.com>

Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"

Reverts the above commit, as it updates a common header function and
did not update all callsites:

https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.


# 6481dc57 24-Jun-2024 Stephen Tozer <stephen.tozer@sony.com>

[IR][NFC] Update IRBuilder to use InsertPosition (#96497)

Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.


# b932da16 18-Jun-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix vector handling in pown libcall simplification (#95832)

The isIntegerTy check would not work as you would hope in
the vector case.
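
A toy model of why a scalar-only integer check misses vectors is sketched below. `ToyType` is hypothetical; its two predicates only mirror the names of LLVM's `Type::isIntegerTy` and `Type::isIntOrIntVectorTy`. Whether the patch uses exactly those calls is an assumption, not something this log states.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR type: an element kind plus a vector flag.
enum class Kind { Integer, Float };

struct ToyType {
    Kind element;
    bool isVector;

    // True only for a scalar integer, so a vector of integers fails it.
    bool isIntegerTy() const { return !isVector && element == Kind::Integer; }

    // True for a scalar integer or a vector whose elements are integers.
    bool isIntOrIntVectorTy() const { return element == Kind::Integer; }
};

// Small usage check covering the scalar and vector cases.
void checkVectorIntegerTest() {
    const ToyType scalarInt{Kind::Integer, false};
    const ToyType vectorInt{Kind::Integer, true};
    const ToyType vectorFloat{Kind::Float, true};
    assert(scalarInt.isIntegerTy());
    assert(!vectorInt.isIntegerTy());        // the scalar-only check misses this
    assert(vectorInt.isIntOrIntVectorTy());  // the element-wise check accepts it
    assert(!vectorFloat.isIntOrIntVectorTy());
}
```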


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# dab1f7c8 21-May-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Emit 1/llvm.sqrt(x) instead of rsqrt calls in libcall handling (#92863)

With the contract flag we should end up codegening to the rsqrt
instruction, or denormal corrected rsqrt sequence present in the
library.


# 66b76faf 21-May-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Directly emit sqrt intrinsic when folding rootn(x, 2) (#92598)

This avoids depending on pre/post link runs.

Depends #92595
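
The arithmetic behind the rootn folds in this and the neighbouring entries can be sketched on host floats. `foldRootn` is a hypothetical helper, the `std::pow` fallback is only a placeholder for the unfolded library call, and the sketch ignores the strictfp handling mentioned in the log.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: evaluates rootn(x, n) using the folds named in the log.
float foldRootn(float x, int n) {
    assert(n != 0 && "rootn(x, 0) is not handled in this sketch");
    if (n == 1)
        return x;             // rootn(x, 1) is x itself
    if (n == 2)
        return std::sqrt(x);  // rootn(x, 2) is the square root
    // Placeholder for the general case; not accurate for negative x.
    return std::pow(x, 1.0f / static_cast<float>(n));
}
```

For example, `foldRootn(9.0f, 2)` takes the square-root path and `foldRootn(5.0f, 1)` returns its input unchanged.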


# 3cb1fe60 20-May-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't fold rootn(x, 1) to input for strictfp functions (#92595)

We need to insert a constrained canonicalize.

Depends #92594


# 586ecd75 20-May-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Relax vector restriction for rootn libcall folds (#92594)

We could try harder for nonsplat vectors but probably not worth the
effort.


Revision tags: llvmorg-18.1.6
# 48b23c09 17-May-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle undef correctly in isKnownIntegral (#92566)


Revision tags: llvmorg-18.1.5
# 1baa3850 18-Apr-2024 Nikita Popov <npopov@redhat.com>

[IR][PatternMatch] Only accept poison in getSplatValue() (#89159)

In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.

This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.

As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.

There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# f5296df9 27-Mar-2024 Kevin P. Neal <52762977+kpneal@users.noreply.github.com>

[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705)

The AMDGPUSimplifyLibCalls pass was lowering function calls with the
strictfp attribute to sequences that included function calls incorrectly
lacking the attribute. This patch corrects that.

The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.
Replacing non-constrained calls with constrained calls when required
is still on the IRBuilder's TODO list.


Revision tags: llvmorg-18.1.2
# b9d83eff 19-Mar-2024 Jeremy Morse <jeremy.morse@sony.com>

[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)

These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 930996e9 05-Feb-2024 Yingwei Zheng <dtcxzyw2333@gmail.com>

[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657)

This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.

Example (extracted from
[fmt/format.h](https://github.com/fmtlib/fmt/blob/e17bc67547a66cdd378ca6a90c56b865d30d6168/include/fmt/format.h#L3555-L3566)):
```
define float @test(float %x, i1 %cond) {
  %i32 = bitcast float %x to i32
  %cmp = icmp slt i32 %i32, 0
  br i1 %cmp, label %if.then1, label %if.else

if.then1:
  %fneg = fneg float %x
  br label %if.end

if.else:
  br i1 %cond, label %if.then2, label %if.end

if.then2:
  br label %if.end

if.end:
  %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
  %ret = call float @llvm.fabs.f32(float %value)
  ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
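
The same reasoning can be checked on the host with a small C++ mirror of the IR. `selectValue` is a hypothetical name; `%cond` is omitted because both of its paths carry `%x` into the phi. This is a sketch of the argument, not of how `computeKnownFPClass` is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Mirrors %value in the IR: negate x when its sign bit is set, else keep x.
float selectValue(float x) {
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // bitcast float to i32
    return bits < 0 ? -x : x;             // fneg on the sign-bit-set path
}

// Small usage check: the sign bit is clear, so fabs changes nothing.
void checkFabsIsRedundant() {
    const float inputs[] = {-1.5f, 2.0f, -0.0f, 0.0f};
    for (float x : inputs) {
        float v = selectValue(x);
        assert(!std::signbit(v));
        assert(std::fabs(v) == v);
    }
}
```

On the path where the integer view is negative the sign bit is set and the negation clears it; on the other path it was already clear, so the result never has its sign bit set.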


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init
# daecc303 09-Jan-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)

The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.

Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.


# a34db9bd 18-Dec-2023 Jakub Chlanda <jakub@codeplay.com>

[AMDGPU][NFC] Simplify needcopysign logic (#75176)

This was caught by coverity, reported as: `dead_error_condition`.
Since the conditional revolves around `CF`, it is guaranteed to be null
in the else clause, hence making the second part of the statement
redundant.


# 67aec2f5 15-Dec-2023 Youngsuk Kim <youngsuk.kim@hpe.com>

[llvm] Remove no-op ptr-to-ptr casts (NFC)

Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr
bitcasts.

Opaque ptr cleanup effort (NFC).

