AMDGPUCodeGenPrepare.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

Revision	Date	Author	Comments
# 5fa87ec0	07-May-2020	Nikita Popov <nikita.ppv@gmail.com>	[AMDGPU] Try to determine sign bit during div/rem expansion This is preparation for D79294, which removes an expensive InstSimplify optimization, on the assumption that it will be picked up by InstCombine instead. Of course, this does not hold up if a backend performs non-trivial IR expansions without running a canonicalization pipeline afterwards, which turned up as an issue in the context of AMDGPU div/rem expansion. This patch mitigates the issue by explicitly performing a known bits calculation where it matters. No test changes, as those would only be visible after the other patch lands. Differential Revision: https://reviews.llvm.org/D79596
# a7aaadc1	19-Apr-2020	Florian Hahn <flo@fhahn.com>	[TTI] Clean up includes (NFC). Remove some unnecessary includes, replace some with forward declarations. This also exposed a few places that were missing some includes.
# 44920e85	09-Apr-2020	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Disable sub-dword scralar loads IR widening These will be widened in the DAG. In the meanwhile early widening prevents otherwise possible vectorization of such loads. Differential Revision: https://reviews.llvm.org/D77835
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 5660bb6b	18-Nov-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
# 98ed613c	17-Feb-2020	Nikita Popov <nikita.ppv@gmail.com>	[IRBuilder] Avoid passing IRBuilder by value; NFC I've fixed most of these before, but missed some occurrences in targets I don't usually build.
# 65dbdc32	15-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't preserve analyses with div64 IR expansion The dominator tree needs to be updated, but that isn't handled now.
# 9ec66860	11-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add option to disable CGP division expansion The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the expected optimzations is expecting aren't implemented. The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.
# 34d9a16e	19-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add option to expand 64-bit integer division in IR I didn't realize we were already expanding 24/32-bit division here already. Use the available IntegerDivision utilities. This uses loops, so produces significantly smaller code than the inline DAG expansion. This now requires width reductions of 64-bit divisions before introducing the expanded loops. This helps work around missing legalization in GlobalISel for division, which are the only remaining core instructions that didn't work at all. I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.
# 6d4ebada	11-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Use conditions directly in division expansion This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which also has the same problem. This doesn't have a observable change using SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.
# b30e1223	11-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't expand more special div cases in IR These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.
# 92c62582	11-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Directly use rcp intrinsic in idiv expansions Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.
# b87e3e2d	11-Feb-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't create potentially dead rcp declarations This will introduce unused declarations if this doesn't reach any of the paths that will really use it.
# 884acbb9	07-Feb-2020	Changpeng Fang <changpeng.fang@gmail.com>	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588
# 25315359	24-Jan-2020	Changpeng Fang <changpeng.fang@gmail.com>	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293
# dfec7022	23-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Check for other uses when looking through casted select Fixes mesa regression on ext_transform_feedback-max-varyings
# e93e1b62	22-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix typo
# 2fe500ab	21-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Look through casted selects to constant fold bin ops The promotion of the uniform select to i32 interfered with this fold.
# bcd91778	19-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare DAGCombiner does this, but divisions expanded here miss this optimization. Since 67aa18f165640374cf0e0a6226dc793bbda6e74f, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.
# 5721483b	21-Jan-2020	Fangrui Song <i@maskray.me>	[AMDGPU] Fix -Wunused-variable after e5823bf806ca9fa6f87583065b3898a2edabce57
# e5823bf8	19-Jan-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't create weird sized integers There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.
# db0ed3e4	01-Nov-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
# 05da2fe5	13-Nov-2019	Reid Kleckner <rnk@google.com>	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout:
    recompiles  touches  affected_files  header
        342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
        314730      234            1345  llvm/include/llvm/InitializePasses.h
        307036      118            2602  llvm/include/llvm/ADT/APInt.h
        213049       59            3611  llvm/include/llvm/Support/MathExtras.h
        170422       47            3626  llvm/include/llvm/Support/Compiler.h
        162225       45            3605  llvm/include/llvm/ADT/Optional.h
        158319       63            2513  llvm/include/llvm/ADT/Triple.h
        140322       39            3598  llvm/include/llvm/ADT/StringRef.h
        137647       59            2333  llvm/include/llvm/Support/Error.h
        131619       73            1803  llvm/include/llvm/Support/FileSystem.h
    Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# c6ab2b4f	24-Aug-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Preserve value name when inserting mul24 intrinsic llvm-svn: 369857
# b3dd381a	24-Aug-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Introduce a flag to disable mul24 intrinsic formation llvm-svn: 369856
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init
# 49169a96	15-Jul-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add 24-bit mul intrinsics Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It doesn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094