History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp (Results 26 – 50 of 116)
Revision Date Author Comments
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# ee795fd1 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle rounding intrinsic exponents in isKnownIntegral

https://reviews.llvm.org/D158999


# def22855 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use pown instead of pow if known integral

https://reviews.llvm.org/D158998


# deefda70 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32

These codegen correctly but f64 doesn't. This prevents losing fast
math flags on the way to the underlying intrinsic.

https://reviews.llvm.org/D158997


# dac8f974 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion

https://reviews.llvm.org/D158996


# 699685b7 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Enable assumptions in AMDGPULibCalls

https://reviews.llvm.org/D159006


# a45b787c 25-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Turn pow libcalls into powr

powr is just pow with the assumption that x >= 0, otherwise nan. This
fires at least 6 times in luxmark

https://reviews.llvm.org/D158908
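
The relationship between pow and powr described in this commit message can be pictured with a minimal C++ sketch. The name `powr_sketch` is hypothetical and is not part of LLVM or the device libraries. The sketch only models the "x >= 0, otherwise nan" wording above; the OpenCL specification also gives powr its own special-case results that may differ from pow, and those are not modeled here.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper for illustration only (not LLVM or device-library code).
// Models powr as pow restricted to non-negative bases: a negative base
// yields NaN, everything else is forwarded to std::pow.
// Assumption: powr's other special cases are deliberately ignored.
double powr_sketch(double x, double y) {
  if (x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::pow(x, y);
}
```

For example, `powr_sketch(-2.0, 2.0)` is NaN even though `std::pow(-2.0, 2.0)` is 4, which is why rewriting pow as powr is only valid when the base is known to be non-negative.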


# f5d8a9b1 25-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Simplify handling of constant vectors in libcalls

Also fixes not handling the partially undef case.

https://reviews.llvm.org/D158905


# afb24cbb 25-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't require all flags to expand fast powr

This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn nnan and ninf.

https://reviews.llvm.org/D158904


# bfe6bc05 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Cleanup check for integral exponents in pow folds

Also improves undef handling

https://reviews.llvm.org/D159006


# 80e5b46e 26-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix assertion on half typed pow with constant exponents

https://reviews.llvm.org/D158993


# 35c2a754 25-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix asserting on fast f16 pown

https://reviews.llvm.org/D158903


# b24dab0e 25-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Trim dead includes


Revision tags: llvmorg-17.0.0-rc3
# 66ee7940 16-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls

Apparently the spec has overloads for fmin/fmax and ldexp with one of
the operands as scalar. We need to broadcast the scalars to the vector
type.

https://reviews.llvm.org/D158077


# d2517616 14-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace log libcalls with log intrinsics


# d45022b0 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove special case constant folding of divide

We should probably just swap this out for the fdiv, but that's what
the implementation is anyway.


# 483cc218 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove special case folding of sqrt


# 416f6af9 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove special case folding of fma/mad

These just get replaced with an intrinsic now. This was also
introducing host dependence on the result since it relied on the
compiler choice to contract or not.


# 0eabe65b 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace ldexp libcalls with intrinsic


# f337a77c 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace rounding libcalls with intrinsics


# c7876c55 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace fabs and copysign libcalls with intrinsics

Preserves flags and metadata like the other cases.


# a70006c4 12-Aug-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace some libcalls with intrinsics

OpenCL loses fast math information by going through libcall wrappers
around intrinsics.

Do this to preserve call site flags which are lost when inlining. It's
not safe in general to propagate flags during inline, so avoid dealing
with this by just special casing some of the useful calls.


Revision tags: llvmorg-17.0.0-rc2
# f44beecb 31-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Try to use private version of sincos if available

The comment was out of date, the device libs build does provide all
the pointer overloads. An extremely pedantic interpretation of the
spec would suggest only the flat version exists, but the overloads do
exist in the implementation.

https://reviews.llvm.org/D156720


# 42c6e420 30-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle multiple uses when matching sincos

Match how the generic implementation handles this. We now will leave
behind the dead other user for later passes to deal with.

https://reviews.llvm.org/D156707


# 6dbd4581 30-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove pointless libcall optimization of fma/mad

After the library is linked and trivially inlined, the generic fma and
fmuladd intrinsics already handle these cases, and with precise flag
handling. This was requiring all fast math flags when we really just
need nsz for the fma(a, b, 0) case.

https://reviews.llvm.org/D156677
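
The nsz requirement mentioned for the fma(a, b, 0) case comes down to signed zeros, which a small C++ sketch can show. The helper names below are hypothetical and exist only for illustration; the sketch assumes IEEE-754 doubles, the default round-to-nearest mode, and a build without fast-math options.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only (not LLVM code).

// The original form: the product a * b is added to +0.0.
double fma_with_zero(double a, double b) { return std::fma(a, b, 0.0); }

// The folded form: the addition of zero is dropped.
double folded_mul(double a, double b) { return a * b; }
```

With `a = -0.0` and `b = 1.0`, the product is -0.0. Adding +0.0 to it gives +0.0, so `fma_with_zero` returns +0.0 while `folded_mul` returns -0.0. The two values compare equal and differ only in sign, so the fold is valid whenever the sign of zero may be ignored.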


# 6448d5ba 30-Jul-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove pointless libcall recognition of native_{divide|recip}

This was trying to constant fold these calls, and also turn some of
them into a regular fmul/fdiv. There's no point to doing that, the
underlying library implementation should be using those in the first
place. Even when the library does use the rcp intrinsics, the backend
handles constant folding of those. This was also only performing the
folds under overly strict fast-everything-is-required conditions.

The one possible plus this gained over linking in the library is if
you were using all fast math flags, it would propagate them to the new
instructions. We could address this in the library by adding more fast
math flags to the native implementations.

The constant fold case also had no test coverage.

https://reviews.llvm.org/D156676

