History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (Results 76 – 100 of 136)
Revision Date Author Comments
# 5fa87ec0 07-May-2020 Nikita Popov <nikita.ppv@gmail.com>

[AMDGPU] Try to determine sign bit during div/rem expansion

This is preparation for D79294, which removes an expensive
InstSimplify optimization, on the assumption that it will be
picked up by InstCombine instead. Of course, this does not hold
up if a backend performs non-trivial IR expansions without running
a canonicalization pipeline afterwards, which turned up as an
issue in the context of AMDGPU div/rem expansion.

This patch mitigates the issue by explicitly performing a known
bits calculation where it matters. No test changes, as those would
only be visible after the other patch lands.

Differential Revision: https://reviews.llvm.org/D79596
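A minimal C++ sketch of why knowing the sign bit helps, assuming a simplified known-bits model. The `KnownBits32` struct and all function names here are hypothetical stand-ins, not LLVM's `KnownBits` class or the code in AMDGPUCodeGenPrepare. When both operands of a signed division are known non-negative, the signed quotient equals the unsigned one, so the sign fix-up steps of the expansion can be skipped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified known-bits record for a 32-bit value.
struct KnownBits32 {
  uint32_t zero = 0; // bits known to be 0
  uint32_t one = 0;  // bits known to be 1

  bool isNonNegative() const { return ((zero >> 31) & 1u) != 0; }
};

// Known bits of (x & mask) for an unknown x: bits cleared by the mask are
// known to be zero; nothing is known to be one.
KnownBits32 knownBitsOfAndWithMask(uint32_t mask) {
  KnownBits32 kb;
  kb.zero = ~mask;
  return kb;
}

// True when a signed division may be expanded as an unsigned one.
bool canTreatSDivAsUDiv(const KnownBits32 &lhs, const KnownBits32 &rhs) {
  return lhs.isNonNegative() && rhs.isNonNegative();
}

// Signed division that skips sign handling when the known bits allow it.
// Assumes b != 0 and that a / b does not overflow (INT32_MIN / -1).
int32_t sdivUsingKnownBits(int32_t a, int32_t b, const KnownBits32 &ka,
                           const KnownBits32 &kb) {
  if (canTreatSDivAsUDiv(ka, kb))
    return static_cast<int32_t>(static_cast<uint32_t>(a) /
                                static_cast<uint32_t>(b));
  // General path: divide the magnitudes, then restore the sign.
  int64_t absA = a < 0 ? -static_cast<int64_t>(a) : a;
  int64_t absB = b < 0 ? -static_cast<int64_t>(b) : b;
  int64_t q = absA / absB;
  bool negative = (a < 0) != (b < 0);
  return static_cast<int32_t>(negative ? -q : q);
}
```

The fast path is only sound if the known bits are true for the runtime values; the sketch derives them from an AND with a constant mask, the simplest case a known-bits analysis handles.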


# a7aaadc1 19-Apr-2020 Florian Hahn <flo@fhahn.com>

[TTI] Clean up includes (NFC).

Remove some unnecessary includes, replace some with forward
declarations.

This also exposed a few places that were missing some includes.


# 44920e85 09-Apr-2020 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Disable sub-dword scalar loads IR widening

These will be widened in the DAG. Meanwhile, early widening
prevents otherwise possible vectorization of such loads.

Differential Revision: https://reviews.llvm.org/D77835


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 5660bb6b 18-Nov-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove denormal subtarget features

Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.


# 98ed613c 17-Feb-2020 Nikita Popov <nikita.ppv@gmail.com>

[IRBuilder] Avoid passing IRBuilder by value; NFC

I've fixed most of these before, but missed some occurrences
in targets I don't usually build.


# 65dbdc32 15-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't preserve analyses with div64 IR expansion

The dominator tree needs to be updated, but that isn't handled now.


# 9ec66860 11-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add option to disable CGP division expansion

The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to write tests for the
legalizer versions of the expansion. This is mostly for GlobalISel, since
the optimizations these expansions expect aren't implemented there.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.


# 34d9a16e 19-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add option to expand 64-bit integer division in IR

I didn't realize we were already expanding 24/32-bit division here.
Use the available IntegerDivision utilities. This uses loops, so it
produces significantly smaller code than the inline DAG expansion.

This now requires width reductions of 64-bit divisions before
introducing the expanded loops.

This helps work around missing legalization in GlobalISel for the
division instructions, which were the only remaining core instructions
that didn't work at all.

I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
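For illustration, a loop-based unsigned 64-bit division in C++. This is a minimal sketch of the general restoring (shift-and-subtract) technique, assuming it is representative of what a loop expansion does; it is not a transcription of the IR produced by LLVM's IntegerDivision utilities, and the names are hypothetical. Each iteration produces one quotient bit, so the code stays small at the cost of up to 64 iterations.

```cpp
#include <cassert>
#include <cstdint>

struct UDivRem64 {
  uint64_t quot;
  uint64_t rem;
};

// Restoring binary long division, one quotient bit per iteration.
// Precondition: d != 0.
UDivRem64 udivrem64(uint64_t n, uint64_t d) {
  uint64_t q = 0;
  uint64_t r = 0;
  for (int i = 63; i >= 0; --i) {
    // Remember the bit shifted out of r: if it is set, the true partial
    // remainder is >= 2^64 > d, so a subtraction is always needed.
    bool carry = (r >> 63) != 0;
    // Bring down the next dividend bit.
    r = (r << 1) | ((n >> i) & 1u);
    if (carry || r >= d) {
      r -= d; // wraps correctly because the true difference is < d
      q |= uint64_t{1} << i;
    }
  }
  return {q, r};
}
```

The carry check matters only for divisors above 2^63, where shifting the partial remainder left would otherwise lose its top bit.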


# 6d4ebada 11-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use conditions directly in division expansion

This was creating a select on true/false values, and then comparing
that later. This produced more work for later combines, which can be
avoided by just using the boolean values. This was copied from the
original DAG expansion, which also has the same problem. This doesn't
have an observable change using SelectionDAG, but since GlobalISel is
missing these optimizations, the final code was noticeably longer.


# b30e1223 11-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't expand more special div cases in IR

These have nicer expansions implemented in the DAG. Ideally we would
either directly implement all of these special expansions, or stop
expanding division in the IR.


# 92c62582 11-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Directly use rcp intrinsic in idiv expansions

Since natural fdiv lowering is now more conservative even with
denormals disabled, we get a slower expansion from just a plain
1.0/fdiv. Directly emit the rcp intrinsic when using it to implement
integer division to avoid a pointlessly complex sequence.
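A minimal C++ sketch of reciprocal-based integer division, assuming both operands are below 2^24 so they convert to `float` exactly. Here `1.0f / y` stands in for the hardware rcp instruction (which is only approximate), and the correction uses simple loops rather than the fix-up sequence the AMDGPU expansion emits; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned division via a float reciprocal estimate.
// Assumes x < 2^24 and 0 < y < 2^24, so both convert to float exactly.
uint32_t udiv24ViaRcp(uint32_t x, uint32_t y) {
  float fx = static_cast<float>(x);
  float fy = static_cast<float>(y);
  float rcp = 1.0f / fy;                         // stand-in for rcp
  uint32_t q = static_cast<uint32_t>(fx * rcp);  // truncating estimate
  // The estimate may be slightly off; correct it in exact integer math.
  uint64_t x64 = x;
  uint64_t y64 = y;
  while (uint64_t{q} * y64 > x64)
    --q;
  while ((uint64_t{q} + 1) * y64 <= x64)
    ++q;
  return q;
}
```

The correction loops make the sketch robust to the estimate's rounding error without relying on a precise error bound.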


# b87e3e2d 11-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't create potentially dead rcp declarations

This will introduce unused declarations if the expansion doesn't reach
any of the paths that actually use them.


# 884acbb9 07-Feb-2020 Changpeng Fang <changpeng.fang@gmail.com>

AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare

Summary:
The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp.
Also, afn instead of arcp is used to allow inaccurate rcp to be used.

Reviewers:
arsenm

Differential Revision: https://reviews.llvm.org/D73588


# 25315359 24-Jan-2020 Changpeng Fang <changpeng.fang@gmail.com>

AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare

Summary:
RCP has limited accuracy. If an FDIV's !fpmath requires high accuracy, rcp may
not meet the requirement. However, in DAG lowering, fpmath information gets lost,
and thus we may generate either inaccurate rcp-related computation or slow code
for fdiv.

This patch implements fdiv optimizations in AMDGPUCodeGenPrepare, where the
exact !fpmath value is known.

FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on
unsafe-fp-math, fast math flags, denormals and fpmath
accuracy request.

RCP Optimizations:
1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with
denormals flushed.
a/b -> a*rcp(b) when fast unsafe rcp is legal.

Use fdiv.fast:
a/b -> fdiv.fast(a, b) when RCP optimization is not performed and
fpmath >= 2.5ULP with denormals flushed.

1/x -> fdiv.fast(1,x) when RCP optimization is not performed and
fpmath >= 2.5ULP with denormals.
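The rules above can be restated as a small decision function. This C++ sketch paraphrases the commit message only, under stated assumptions: the enum, struct, and function names are hypothetical; `fastUnsafeRcpLegal` is taken as an input rather than computed from unsafe-fp-math, fast-math flags, denormals, and the fpmath request; a missing !fpmath is modeled as 0 ULP; the last 1/x rule is read literally as applying when denormals are not flushed; and the 2.5 ULP threshold is the one this commit uses (a later change in this log adjusts the rcp limit to 1.0 ULP).

```cpp
#include <cassert>

// Hypothetical names; a paraphrase of the rules in the commit message.
enum class FDivLowering { Rcp, MulByRcp, FDivFast, FDivFastOfOne, Plain };

struct FDivQuery {
  bool numeratorIsOne;     // the division has the form 1/x
  bool fastUnsafeRcpLegal; // taken as given in this sketch
  bool denormalsFlushed;   // f32 denormals flushed to zero
  float fpmathUlp;         // accuracy allowed by !fpmath; 0 if absent
};

FDivLowering chooseFDivLowering(const FDivQuery &q) {
  const float kRcpUlp = 2.5f; // threshold stated in this commit
  bool looseEnough = q.fpmathUlp >= kRcpUlp;
  if (q.numeratorIsOne) {
    if (q.fastUnsafeRcpLegal || (looseEnough && q.denormalsFlushed))
      return FDivLowering::Rcp;           // 1/x -> rcp(x)
    if (looseEnough)
      return FDivLowering::FDivFastOfOne; // 1/x -> fdiv.fast(1, x)
    return FDivLowering::Plain;
  }
  if (q.fastUnsafeRcpLegal)
    return FDivLowering::MulByRcp;        // a/b -> a * rcp(b)
  if (looseEnough && q.denormalsFlushed)
    return FDivLowering::FDivFast;        // a/b -> fdiv.fast(a, b)
  return FDivLowering::Plain;
}
```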

Reviewers:
arsenm

Differential Revision:
https://reviews.llvm.org/D71293


# dfec7022 23-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Check for other uses when looking through casted select

Fixes a Mesa regression on ext_transform_feedback-max-varyings.


# e93e1b62 22-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix typo


# 2fe500ab 21-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Look through casted selects to constant fold bin ops

The promotion of the uniform select to i32 interfered with this fold.


# bcd91778 19-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare

DAGCombiner does this, but divisions expanded here miss this
optimization. Since 67aa18f165640374cf0e0a6226dc793bbda6e74f,
divisions have been expanded here and missed out on this
optimization. Avoids test regressions in a future patch.
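The fold distributes a binary operator over a select of constants, turning `op(select(c, C1, C2), C3)` into `select(c, op(C1, C3), op(C2, C3))`, which removes the operator once both arms are constant-folded. The C++ sketch below models only that shape on a hypothetical value type; it is not LLVM's IR API and does not cover the select-on-the-right or casted-select variants mentioned elsewhere in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical model: a select between two constants on an opaque condition.
struct SelectOfConstants {
  int64_t ifTrue;
  int64_t ifFalse;
};

// A constant folder for a binary operator; nullopt means "cannot fold".
using BinOp = std::function<std::optional<int64_t>(int64_t, int64_t)>;

// op(select(c, t, f), rhs) -> select(c, op(t, rhs), op(f, rhs)).
// Gives up if either arm fails to fold (e.g. division by zero).
std::optional<SelectOfConstants>
foldBinOpOfSelect(const SelectOfConstants &sel, int64_t rhs, const BinOp &op) {
  std::optional<int64_t> t = op(sel.ifTrue, rhs);
  std::optional<int64_t> f = op(sel.ifFalse, rhs);
  if (!t || !f)
    return std::nullopt;
  return SelectOfConstants{*t, *f};
}

// Evaluates the select for a concrete condition value.
int64_t evaluate(const SelectOfConstants &sel, bool cond) {
  return cond ? sel.ifTrue : sel.ifFalse;
}
```

Refusing to fold when an arm cannot be folded keeps the rewrite from introducing a division that was never executed on that path.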


# 5721483b 21-Jan-2020 Fangrui Song <i@maskray.me>

[AMDGPU] Fix -Wunused-variable after e5823bf806ca9fa6f87583065b3898a2edabce57


# e5823bf8 19-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't create weird sized integers

There's no reason to introduce a new, unnaturally sized value
here. This has a chance to produce worse code with
legalization. Avoids regression in a future patch.


# db0ed3e4 01-Nov-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Refactor treatment of denormal mode

Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget; it should be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains a subtarget
feature for now, but this makes the necessary changes to switch to using an
attribute.


# 05da2fe5 13-Nov-2019 Reid Kleckner <rnk@google.com>

Sink all InitializePasses.h includes

This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles  touches  affected_files  header
      342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
      314730      234            1345  llvm/include/llvm/InitializePasses.h
      307036      118            2602  llvm/include/llvm/ADT/APInt.h
      213049       59            3611  llvm/include/llvm/Support/MathExtras.h
      170422       47            3626  llvm/include/llvm/Support/Compiler.h
      162225       45            3605  llvm/include/llvm/ADT/Optional.h
      158319       63            2513  llvm/include/llvm/ADT/Triple.h
      140322       39            3598  llvm/include/llvm/ADT/StringRef.h
      137647       59            2333  llvm/include/llvm/Support/Error.h
      131619       73            1803  llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
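The ranking column is the product of the other two, recompiles = touches * affected_files; for example, 234 * 1345 = 314730 for InitializePasses.h. A small C++ check, using three rows copied from the table and a hypothetical `HeaderStats` type, reproduces the products and their order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one row of the table.
struct HeaderStats {
  std::string header;
  long touches;
  long affectedFiles;
  long recompiles() const { return touches * affectedFiles; }
};

// Sorts headers by estimated recompilation cost, highest first.
std::vector<HeaderStats> rankByRecompiles(std::vector<HeaderStats> rows) {
  std::sort(rows.begin(), rows.end(),
            [](const HeaderStats &a, const HeaderStats &b) {
              return a.recompiles() > b.recompiles();
            });
  return rows;
}
```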

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# c6ab2b4f 24-Aug-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Preserve value name when inserting mul24 intrinsic

llvm-svn: 369857


# b3dd381a 24-Aug-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Introduce a flag to disable mul24 intrinsic formation

llvm-svn: 369856


Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init
# 49169a96 15-Jul-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add 24-bit mul intrinsics

Insert these during codegenprepare.

This works around a DAG issue where generic combines eliminate the AND
that asserts the high bits are zero, which then exposes an unknown read
source to the mul combine. It isn't worth the hassle of trying to
insert an AssertZext or something similar to deal with it.

llvm-svn: 366094
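A 24-bit multiply can replace an ordinary 32-bit mul only when both operands are known to fit in 24 bits; then the low 32 bits of the product are unchanged. The C++ sketch below models an unsigned 24-bit multiply (using the low 24 bits of each operand and returning the low 32 bits of the product) and checks when the substitution is safe. The function names are hypothetical, and this is a behavioral model rather than the intrinsic's definition from the LLVM sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an unsigned 24-bit multiply: only the low 24 bits
// of each operand participate; the low 32 bits of the product are returned.
uint32_t mulU24(uint32_t a, uint32_t b) {
  uint64_t prod = uint64_t{a & 0xFFFFFFu} * uint64_t{b & 0xFFFFFFu};
  return static_cast<uint32_t>(prod);
}

// True when the high 8 bits are zero, i.e. the value fits in 24 bits.
bool fitsInU24(uint32_t v) { return (v >> 24) == 0; }

// Uses the 24-bit multiply only when it matches a 32-bit (mod 2^32) mul.
uint32_t mulMaybeU24(uint32_t a, uint32_t b) {
  if (fitsInU24(a) && fitsInU24(b))
    return mulU24(a, b);
  return a * b;
}
```

If the proof that the high bits are zero is lost, as in the DAG issue described above, the substitution is no longer justified, which is why the sketch checks `fitsInU24` before using `mulU24`.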
