amdgpu-simplify-libcall-pow-codegen.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init
# 11b04019	24-Jan-2025	Aaditya <115080342+easyonaadit@users.noreply.github.com>	[AMDGPU] Restore SP from saved-FP or saved-BP (#124007) Currently, the AMDGPU backend bumps the Stack Pointer by fixed size offsets in the prolog of device functions, and restores it by the same amount in the epilog. Prolog: sp += frameSize Epilog: sp -= frameSize If a function has dynamic stack realignment, Prolog: sp += frameSize + max_alignment Epilog: sp -= frameSize + max_alignment These calculations are not optimal in case of dynamic stack realignment, and completely fail in case of dynamic stack readjustment. This patch uses the saved Frame Pointer to restore SP. Prolog: fp = sp sp += frameSize Epilog: sp = fp In case of dynamic stack realignment, SP is restored from the saved Base Pointer. Prolog: fp = sp + (max_alignment - 1) fp = fp & (-max_alignment) bp = sp sp += frameSize + max_alignment Epilog: sp = bp (Note: The presence of BP has been enforced in case of any dynamic stack realignment.) --------- Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 0b0d9a3b	09-Dec-2024	Vikash Gupta <Vikash.Gupta@amd.com>	[CodeGen] [AMDGPU] Attempt DAGCombine for fmul with select to ldexp (#111109) The materialization cost of 32-bit non-inline in case of fmul is quite relatively more, rather than if possible to combine it into ldexp instruction for specific scenarios (for datatypes like f64, f32 and f16) as this is being handled here : The dag combine for any pair of select values which are exact exponent of 2. ``` fmul x, select(y, A, B) -> ldexp (x, select i32 (y, a, b)) fmul x, select(y, -A, -B) -> ldexp ((fneg x), select i32 (y, a, b)) where, A=2^a & B=2^b ; a and b are integers. ``` This dagCombine is handled separately in fmulCombine (newly defined in SIIselLowering), targeting fmul fusing it with select type operand into ldexp. Thus, it fixes #104900.
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 6548b635	09-Nov-2024	Shilei Tian <i@tianshilei.me>	Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)" This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
# ca33649a	08-Nov-2024	Shilei Tian <i@tianshilei.me>	Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)" This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
# e215a1e2	08-Nov-2024	Shilei Tian <i@tianshilei.me>	[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# 528bcf3a	20-Sep-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Restore deleted test checks from test These were accidentally removed in 758444ca3e7163a1504eeced3383af861d01d761
# 758444ca	19-Sep-2024	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU] Promote uniform ops to I32 in DAGISel (#106383) Promote uniform binops, selects and setcc between 2 and 16 bits to 32 bits in DAGISel Solves #64591
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4
# a1058776	21-Aug-2024	Nikita Popov <npopov@redhat.com>	[InstCombine] Remove some of the complexity-based canonicalization (#91185) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# b455edbc	31-Jul-2024	Yingwei Zheng <dtcxzyw2333@gmail.com>	[InstCombine] Recognize copysign idioms (#101324) This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern exists in some graphics applications/math libraries. Alive2: https://alive2.llvm.org/ce/z/ggQZV2
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a	06-Mar-2024	Emma Pilkington <emma.pilkington95@gmail.com>	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in bc82cfb.
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 230c13d5	24-Jan-2024	Christudasan Devadasan <christudasan.devadasan@amd.com>	[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669) CSR SGPR spilling currently uses the early available physical VGPRs. It currently imposes a high register pressure while trying to allocate large VGPR tuples within the default register budget. This patch changes the spilling strategy by picking the VGPRs in the reverse order, the highest available VGPR first and later after regalloc shift them back to the lowest available range. With that, the initial VGPRs would be available for allocation and possibility of finding large number of contiguous registers will be more.
# af4f1766	17-Jan-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Allocate special SGPRs before user SGPR arguments (#78234)
# 777b6de7	12-Dec-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	[AMDGPU][NFC] Test autogenerated llc tests for COV5 (#74339) Regenerate a few llc tests to test for COV5 instead of the default ABI version.
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 466a8149	12-Sep-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
# 0a8d17e7	12-Sep-2023	Saiyedul Islam <Saiyedul.Islam@amd.com>	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818
Revision tags: llvmorg-17.0.0-rc4
# def22855	26-Aug-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Use pown instead of pow if known integral https://reviews.llvm.org/D158998
# deefda70	26-Aug-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Use exp2 and log2 intrinsics directly for f16/f32 These codegen correctly but f64 doesn't. This prevents losing fast math flags on the way to the underlying intrinsic. https://reviews.llvm.org/D158997
# dac8f974	26-Aug-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle sitofp and uitofp exponents in fast pow expansion https://reviews.llvm.org/D158996
Revision tags: llvmorg-17.0.0-rc3
# aa539b12	20-Aug-2023	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add baseline tests for libcall recognition of pow/powr/pown
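
Note on 0b0d9a3b: the fold described in that commit message rests on a simple identity, namely that multiplying by an exact power of two equals scaling the exponent. The following is a minimal numeric sketch of that identity in Python, not the LLVM implementation. The function names (`fmul_select`, `ldexp_select`, `ldexp_select_neg`) are hypothetical and chosen only for illustration, and the check assumes no overflow or underflow occurs.

```python
import math


def fmul_select(x: float, cond: bool, A: float, B: float) -> float:
    """Original pattern: fmul x, select(cond, A, B)."""
    return x * (A if cond else B)


def ldexp_select(x: float, cond: bool, a: int, b: int) -> float:
    """Folded pattern: ldexp(x, select(cond, a, b)), where A = 2**a and B = 2**b."""
    return math.ldexp(x, a if cond else b)


def ldexp_select_neg(x: float, cond: bool, a: int, b: int) -> float:
    """Folded pattern for negated constants: ldexp(fneg x, select(cond, a, b))."""
    return math.ldexp(-x, a if cond else b)


if __name__ == "__main__":
    # A = 2**3 = 8.0 and B = 2**-1 = 0.5 are illustrative values only.
    for x in (1.5, -3.25, 0.0, 1e10):
        for cond in (True, False):
            assert fmul_select(x, cond, 8.0, 0.5) == ldexp_select(x, cond, 3, -1)
            assert fmul_select(x, cond, -8.0, -0.5) == ldexp_select_neg(x, cond, 3, -1)
    print("identity holds for the sampled inputs")
```

The equality is exact here because scaling by a power of two only changes the exponent field of a binary floating-point value; inputs near the overflow or subnormal range are outside the scope of this sketch.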