#
5fa87ec0 |
| 07-May-2020 |
Nikita Popov <nikita.ppv@gmail.com> |
[AMDGPU] Try to determine sign bit during div/rem expansion
This is preparation for D79294, which removes an expensive InstSimplify optimization, on the assumption that it will be picked up by InstCombine instead. Of course, this does not hold up if a backend performs non-trivial IR expansions without running a canonicalization pipeline afterwards, which turned up as an issue in the context of AMDGPU div/rem expansion.
This patch mitigates the issue by explicitly performing a known bits calculation where it matters. No test changes, as those would only be visible after the other patch lands.
Differential Revision: https://reviews.llvm.org/D79596
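To illustrate the idea of "performing a known bits calculation where it matters", here is a minimal, self-contained sketch. It does not use LLVM's real KnownBits class; `KnownBits32`, `knownBitsOfAndWithConstant`, `knownBitsOfLShr` and `classifySign` are hypothetical names invented for this example. It only shows how knowing the sign bit lets an expansion skip sign handling.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a known-bits result on a 32-bit value.
struct KnownBits32 {
    uint32_t zero = 0; // bits known to be 0
    uint32_t one = 0;  // bits known to be 1

    bool signKnownZero() const { return (zero & 0x80000000u) != 0; }
    bool signKnownOne() const { return (one & 0x80000000u) != 0; }
};

// Known bits of (x & mask) when nothing is known about x:
// every bit cleared in the mask is known to be zero in the result.
KnownBits32 knownBitsOfAndWithConstant(uint32_t mask) {
    KnownBits32 k;
    k.zero = ~mask;
    return k;
}

// Known bits of a logical shift right by a constant amount (0 <= shift < 32):
// the vacated high bits become known zeros.
KnownBits32 knownBitsOfLShr(const KnownBits32 &x, unsigned shift) {
    KnownBits32 k;
    uint32_t vacated = shift ? ~(0xFFFFFFFFu >> shift) : 0u;
    k.zero = (x.zero >> shift) | vacated;
    k.one = x.one >> shift;
    return k;
}

enum class SignHandling { NonNegative, Negative, Unknown };

// A div/rem expansion could consult this to avoid emitting sign fix-up code
// when the sign of an operand is already known.
SignHandling classifySign(const KnownBits32 &k) {
    if (k.signKnownZero()) return SignHandling::NonNegative;
    if (k.signKnownOne()) return SignHandling::Negative;
    return SignHandling::Unknown;
}
```

For example, an operand produced by masking with 0x00FFFFFF is classified as non-negative, so a signed division on it could be expanded like an unsigned one.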
|
#
a7aaadc1 |
| 19-Apr-2020 |
Florian Hahn <flo@fhahn.com> |
[TTI] Clean up includes (NFC).
Remove some unnecessary includes, replace some with forward declarations.
This also exposed a few places that were missing some includes.
|
#
44920e85 |
| 09-Apr-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Disable sub-dword scalar loads IR widening
These will be widened in the DAG. In the meantime, early widening prevents otherwise possible vectorization of such loads.
Differential Revision: https://reviews.llvm.org/D77835
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
5660bb6b |
| 18-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
|
#
98ed613c |
| 17-Feb-2020 |
Nikita Popov <nikita.ppv@gmail.com> |
[IRBuilder] Avoid passing IRBuilder by value; NFC
I've fixed most of these before, but missed some occurrences in targets I don't usually build.
|
#
65dbdc32 |
| 15-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't preserve analyses with div64 IR expansion
The dominator tree needs to be updated, but that isn't handled now.
|
#
9ec66860 |
| 11-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add option to disable CGP division expansion
The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the optimizations the expansion expects aren't implemented there.
The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.
|
#
34d9a16e |
| 19-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add option to expand 64-bit integer division in IR
I didn't realize we were already expanding 24/32-bit division here. Use the available IntegerDivision utilities. This uses loops, so it produces significantly smaller code than the inline DAG expansion.
This now requires width reductions of 64-bit divisions before introducing the expanded loops.
This helps work around missing legalization in GlobalISel for division, the only remaining core instructions that didn't work at all.
I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.
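As a rough illustration of why a loop-based expansion is compact, the following sketch shows a classic shift-subtract unsigned division in plain C++. It is not the code emitted by the IntegerDivision utilities; `UDivRem64` and `udivrem64ShiftSubtract` are hypothetical names, and the routine only demonstrates the general technique of producing one quotient bit per loop iteration.

```cpp
#include <cstdint>

struct UDivRem64 {
    uint64_t quotient;
    uint64_t remainder;
};

// Shift-subtract (restoring) division. Precondition: divisor != 0.
UDivRem64 udivrem64ShiftSubtract(uint64_t dividend, uint64_t divisor) {
    uint64_t q = 0;
    uint64_t r = 0;
    for (int bit = 63; bit >= 0; --bit) {
        // Bring down the next dividend bit. Before this shift r is built from
        // at most 63 dividend bits, so r < 2^63 and the shift cannot overflow.
        r = (r << 1) | ((dividend >> bit) & 1u);
        if (r >= divisor) {
            r -= divisor;
            q |= uint64_t{1} << bit;
        }
    }
    return {q, r};
}
```

The loop body is a handful of operations repeated 64 times at run time, whereas a fully inlined expansion pays for its whole instruction sequence in code size at every division site.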
|
#
6d4ebada |
| 11-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use conditions directly in division expansion
This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which has the same problem. This doesn't produce an observable change with SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.
|
#
b30e1223 |
| 11-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't expand more special div cases in IR
These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.
|
#
92c62582 |
| 11-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Directly use rcp intrinsic in idiv expansions
Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.
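For readers unfamiliar with reciprocal-based integer division, here is a small sketch of the general technique in plain C++. It is not the sequence AMDGPUCodeGenPrepare emits: `udiv24ViaReciprocal` is a hypothetical name, `1.0f / b` stands in for a hardware rcp instruction, and the fix-up loops are a simple substitute for whatever correction the real expansion performs.

```cpp
#include <cstdint>

// Unsigned division of 24-bit values using a floating-point reciprocal
// estimate followed by an integer correction.
// Preconditions: a < 2^24, b < 2^24, b != 0.
uint32_t udiv24ViaReciprocal(uint32_t a, uint32_t b) {
    // Both operands fit in a float's 24-bit significand, so the conversions
    // are exact; only the reciprocal and the multiply introduce rounding.
    float rcp = 1.0f / static_cast<float>(b); // stand-in for an rcp instruction
    uint32_t q = static_cast<uint32_t>(static_cast<float>(a) * rcp);

    // The estimate may be slightly off; nudge it until q*b <= a < (q+1)*b.
    while (static_cast<uint64_t>(q) * b > a) --q;
    while ((static_cast<uint64_t>(q) + 1) * b <= a) ++q;
    return q;
}
```

The point of the commit is that the reciprocal is requested directly rather than written as a generic 1.0/x division, which a more conservative fdiv lowering would expand into a longer sequence.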
|
#
b87e3e2d |
| 11-Feb-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't create potentially dead rcp declarations
This will introduce unused declarations if this doesn't reach any of the paths that will really use it.
|
#
884acbb9 |
| 07-Feb-2020 |
Changpeng Fang <changpeng.fang@gmail.com> |
AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare
Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D73588
|
#
25315359 |
| 24-Jan-2020 |
Changpeng Fang <changpeng.fang@gmail.com> |
AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare
Summary: RCP has an accuracy limit. If the FDIV !fpmath metadata requires high accuracy, rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp-related computation or slow code for fdiv.
This patch implements fdiv optimizations in AMDGPUCodeGenPrepare, where the !fpmath metadata is known exactly.
FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and the fpmath accuracy request.
RCP optimizations:
  1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed.
  a/b -> a*rcp(b) when fast unsafe rcp is legal.
Use fdiv.fast:
  a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed.
  1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D71293
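The rules listed in this commit message can be read as a small decision table. The sketch below encodes that reading in plain C++; it is not the pass's actual code. `FDivContext`, `FDivLowering` and `chooseFDivLowering` are hypothetical names, and the last rule ("with denormals") is interpreted here as "denormals not flushed", which is an assumption rather than something the message states.

```cpp
enum class FDivLowering { Rcp, MulByRcp, FDivFast, Unchanged };

// Hypothetical summary of the facts the rules depend on.
struct FDivContext {
    bool numeratorIsOne;     // the division has the form 1/x
    bool fastUnsafeRcpLegal; // result of the FastUnsafeRcpLegal check
    float fpmathUlp;         // accuracy allowed by !fpmath, in ULP
    bool denormalsFlushed;   // denormals are flushed for this operation
};

FDivLowering chooseFDivLowering(const FDivContext &ctx) {
    const bool relaxed = ctx.fpmathUlp >= 2.5f;

    if (ctx.numeratorIsOne) {
        // 1/x -> rcp(x)
        if (ctx.fastUnsafeRcpLegal || (relaxed && ctx.denormalsFlushed))
            return FDivLowering::Rcp;
        // 1/x -> fdiv.fast(1, x); reaching this point with relaxed accuracy
        // implies denormals are not flushed (assumed meaning of the rule).
        if (relaxed)
            return FDivLowering::FDivFast;
        return FDivLowering::Unchanged;
    }

    // a/b -> a * rcp(b)
    if (ctx.fastUnsafeRcpLegal)
        return FDivLowering::MulByRcp;
    // a/b -> fdiv.fast(a, b)
    if (relaxed && ctx.denormalsFlushed)
        return FDivLowering::FDivFast;
    return FDivLowering::Unchanged;
}
```

For instance, a general a/b with a 2.5 ULP allowance and flushed denormals, but without fast unsafe rcp being legal, maps to fdiv.fast under this reading.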
|
#
dfec7022 |
| 23-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Check for other uses when looking through casted select
Fixes mesa regression on ext_transform_feedback-max-varyings
|
#
e93e1b62 |
| 22-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix typo
|
#
2fe500ab |
| 21-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Look through casted selects to constant fold bin ops
The promotion of the uniform select to i32 interfered with this fold.
|
#
bcd91778 |
| 19-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare
DAGCombiner does this, but divisions expanded here miss this optimization. Since 67aa18f165640374cf0e0a6226dc793bbda6e74f, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.
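The fold in question rewrites a binary operation on a select of two constants into a select of two pre-computed constants. A minimal model of that rewrite in plain C++ follows; `SelectOfConstants` and `foldBinOpIntoSelect` are hypothetical names for this illustration, and the condition is stored as a plain bool only so the result can be evaluated here (in IR it would be a run-time value).

```cpp
#include <cstdint>
#include <functional>

// Models "select cond, ifTrue, ifFalse" where both arms are constants.
struct SelectOfConstants {
    bool cond;
    int32_t ifTrue;
    int32_t ifFalse;

    int32_t eval() const { return cond ? ifTrue : ifFalse; }
};

// binop (select c, K1, K2), K3  ->  select c, (binop K1, K3), (binop K2, K3)
// Both arms are folded ahead of time, so no binop remains at run time.
SelectOfConstants foldBinOpIntoSelect(
    const SelectOfConstants &sel, int32_t rhs,
    const std::function<int32_t(int32_t, int32_t)> &op) {
    return {sel.cond, op(sel.ifTrue, rhs), op(sel.ifFalse, rhs)};
}
```

With a division as the operation, the folded form needs no division at all, which is why applying it before the IR division expansion matters.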
|
#
5721483b |
| 21-Jan-2020 |
Fangrui Song <i@maskray.me> |
[AMDGPU] Fix -Wunused-variable after e5823bf806ca9fa6f87583065b3898a2edabce57
|
#
e5823bf8 |
| 19-Jan-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't create weird sized integers
There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.
|
#
db0ed3e4 |
| 01-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and should be moved into a separate function attribute.
This patch is still NFC. The denormal mode remains a subtarget feature for now, but this makes the necessary changes to switch to using an attribute.
|
#
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <rnk@google.com> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it causes lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout:
  recompiles  touches  affected_files  header
      342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
      314730      234            1345  llvm/include/llvm/InitializePasses.h
      307036      118            2602  llvm/include/llvm/ADT/APInt.h
      213049       59            3611  llvm/include/llvm/Support/MathExtras.h
      170422       47            3626  llvm/include/llvm/Support/Compiler.h
      162225       45            3605  llvm/include/llvm/ADT/Optional.h
      158319       63            2513  llvm/include/llvm/ADT/Triple.h
      140322       39            3598  llvm/include/llvm/ADT/StringRef.h
      137647       59            2333  llvm/include/llvm/Support/Error.h
      131619       73            1803  llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
c6ab2b4f |
| 24-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Preserve value name when inserting mul24 intrinsic
llvm-svn: 369857
|
#
b3dd381a |
| 24-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Introduce a flag to disable mul24 intrinsic formation
llvm-svn: 369856
|
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
49169a96 |
| 15-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add 24-bit mul intrinsics
Insert these during codegenprepare.
This works around a DAG issue where generic combines eliminate the 'and' asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it.
llvm-svn: 366094
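To make the role of the "high bits are zero" assertion concrete, the sketch below models a 24-bit unsigned multiply in plain C++. It is an illustration under assumptions, not the intrinsic's definition: `fitsInU24` and `mulU24` are hypothetical helpers, and the model assumes the operation multiplies the low 24 bits of each operand and returns the low 32 bits of the product.

```cpp
#include <cstdint>

// True when the value's upper 8 bits are zero, i.e. it is a 24-bit quantity.
bool fitsInU24(uint32_t v) { return (v >> 24) == 0; }

// Assumed model of a 24-bit unsigned multiply: only the low 24 bits of each
// operand take part, and the low 32 bits of the product are returned.
uint32_t mulU24(uint32_t a, uint32_t b) {
    uint64_t product =
        static_cast<uint64_t>(a & 0xFFFFFFu) * static_cast<uint64_t>(b & 0xFFFFFFu);
    return static_cast<uint32_t>(product);
}
```

When both operands are known to fit in 24 bits, this gives the same result as an ordinary 32-bit multiply, which is the condition a transform must prove before substituting the cheaper operation. If that knowledge is lost, the substitution is no longer justified.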
|