Revision tags: llvmorg-21-init
416f1c46 | 20-Jan-2025 | Mats Jun Larsen <mats@jun.codes>
[IR] Replace PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569
In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.
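A minimal sketch of the mechanical rewrite this commit performs, assuming the opaque overload `PointerType::get(LLVMContext &, unsigned)`; the helper function is illustrative, not taken from the patch:
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

// Illustrative helper (hypothetical name): with opaque pointers the pointee
// type no longer matters, only the context and the address space.
PointerType *makeOpaquePtrTy(LLVMContext &Ctx) {
  // Before: PointerType::get(SomePointeeTy, /*AddressSpace=*/0)
  // After:
  return PointerType::get(Ctx, /*AddressSpace=*/0);
}
```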
Revision tags: llvmorg-19.1.7
f8d27047 | 18-Dec-2024 | Vladi Krapp <vladi.krapp@arm.com>
[ARM] Reduce loop unroll when low overhead branching is available (#120065)
For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.
This is particularly noticeable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.
In these cases we prefer not to unroll the innermost loop, with the
intention that it be executed as a low overhead loop.
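A hedged illustration of the shape being described: a triangular loop nest whose inner trip count changes every outer iteration, so an unrolled remainder's compare-and-jump blocks would execute again and again (example code, not from the patch):
```
// Inner trip count varies with i, as in a triangular matrix decomposition;
// keeping the inner loop whole lets it run as a low overhead loop.
void scaleLowerTriangle(float *A, int N) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j <= i; ++j) // trip count differs per outer iteration
      A[i * N + j] *= 2.0f;
}
```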
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
c3260c65 | 29-Oct-2024 | Benjamin Maxwell <benjamin.maxwell@arm.com>
[IR] Add `llvm.sincos` intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
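For illustration, a hedged C++ sketch of emitting a call to the new intrinsic with IRBuilder and unpacking the result struct; it assumes this patch's `Intrinsic::sincos` ID, and the helper name is made up:
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include <utility>

using namespace llvm;

// Returns {sin(X), cos(X)} as separate SSA values.
std::pair<Value *, Value *> emitSinCos(IRBuilder<> &B, Value *X) {
  CallInst *SC = B.CreateIntrinsic(Intrinsic::sincos, {X->getType()}, {X});
  return {B.CreateExtractValue(SC, 0),  // sine
          B.CreateExtractValue(SC, 1)}; // cosine
}
```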
Revision tags: llvmorg-19.1.3
e37d736d | 24-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
Recommit: [llvm][ARM][GlobalOpt] Add widen global arrays pass (#113289)
This is a recommit of #107120. The original PR was approved but failed a
buildbot: the newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.
- The pass optimizes memcpy calls by padding out destinations and sources
to a full word, so the ARM backend generates full-word loads instead of
single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the
destination when it is a stack-allocated, constant-size array, and the
source when it is a constant string. The heuristic that decides whether
to pad is very basic and could be improved to let more cases be padded
(see the sketch after this list).
- The pass works at the mid-end level.
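A hedged sketch of the source pattern the pass targets (illustrative code, not taken from the pass or its tests):
```
#include <cstring>

void consume(const char *);

void copySmall() {
  char Buf[3];               // stack-allocated, constant-size destination
  std::memcpy(Buf, "ab", 3); // constant-string source, 3-byte copy
  consume(Buf);
}
// On ARM the 3-byte copy becomes ldrb + ldrh; conceptually, padding both
// Buf and the string out to 4 bytes allows a single word load and store.
```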
370fd743 | 17-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
Revert "[llvm][ARM]Add widen global arrays pass" (#112701)
Reverts llvm/llvm-project#107120
Unexpected build failures in post-commit pipelines; needs investigation.
ab90d279 | 17-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
[llvm][ARM] Add widen global arrays pass (#107120)
- The pass optimizes memcpy calls by padding out destinations and sources
to a full word, so the backend generates full-word loads instead of
single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the
destination when it is a stack-allocated, constant-size array, and the
source when it is a constant array. The heuristic that decides whether
to pad is very basic and could be improved to let more cases be padded.
- The pass works within GlobalOpt but is disabled by default on all
targets except ARM.
Revision tags: llvmorg-19.1.2
853c43d0 | 09-Oct-2024 | Jeffrey Byrnes <jeffrey.byrnes@amd.com>
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
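A hedged sketch of how a transform might query the ported hook; it assumes the TTI side kept the TLI `shouldSinkOperands` shape under the name `isProfitableToSinkOperands`, so treat the name and signature as assumptions:
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

// Ask the target whether sinking I's operands next to I is profitable.
bool wantsOperandSinking(const TargetTransformInfo &TTI, Instruction *I) {
  SmallVector<Use *, 4> OpsToSink;
  // Assumed hook name; the target fills OpsToSink with uses worth sinking.
  return TTI.isProfitableToSinkOperands(I, OpsToSink) && !OpsToSink.empty();
}
```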
Revision tags: llvmorg-19.1.1
d2885743 | 25-Sep-2024 | Philip Reames <preames@rivosinc.com>
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.
This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
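A hedged sketch of querying the extended API so both select arms are marked as constants; the `OperandValueInfo` parameters follow the description above, and the exact enum choices here are assumptions:
```
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

// Price a vector select whose two arms are materialized constants.
InstructionCost costSelectOfConstants(const TargetTransformInfo &TTI,
                                      Type *VecTy, Type *CondTy) {
  TargetTransformInfo::OperandValueInfo ConstArm = {
      TargetTransformInfo::OK_UniformConstantValue,
      TargetTransformInfo::OP_None};
  return TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
                                CmpInst::BAD_ICMP_PREDICATE,
                                TargetTransformInfo::TCK_RecipThroughput,
                                ConstArm, ConstArm);
}
```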
Revision tags: llvmorg-19.1.0
a7697c86 | 05-Sep-2024 | Nikita Popov <npopov@redhat.com>
[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)
These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.
If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.
Fixes https://github.com/llvm/llvm-project/issues/59081.
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
dcd246cb | 05-Aug-2024 | David Green <david.green@arm.com>
[ARM] Add scalar add_sat costs. (#100988)
These can usually generate:
- qadd / qsub for signed i32 scalars
- uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned
i8/i16
- an add + cmp + sel expansion otherwise
This can lead to differences in unrolling etc., but should give a better
cost for these instructions.
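For reference, the saturating-add semantics that qadd implements for signed i32, as a scalar C++ model (illustrative only):
```
#include <cstdint>
#include <limits>

// Clamp the wide sum to the i32 range instead of wrapping.
int32_t saturatingAdd(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) + int64_t(B);
  if (Wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return int32_t(Wide); // in range: behaves like a plain add
}
```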
Revision tags: llvmorg-19.1.0-rc2
ea7cc12f | 28-Jul-2024 | David Green <david.green@arm.com>
[ARM] Add fallback fptoi_sat costs.
This makes sure that the custom operations get a fallback cost, even if they are not perfect.
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
11484cb8 | 18-Jun-2024 | Nikita Popov <npopov@redhat.com>
[InstCombine] Pass SimplifyQuery to SimplifyDemandedBits()
This will enable calling SimplifyDemandedBits() with a SimplifyQuery that has CondContext set in the future.
Additionally, this marginally strengthens the analysis by retaining the original context instruction for one-use chains.
34a2889e | 19-Jun-2024 | Andreas Jonson <andjo403@hotmail.com>
[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847)
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
2e8d8155 | 10-May-2024 | Graham Hunter <graham.hunter@arm.com>
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4
4ac2721e | 09-Apr-2024 | David Green <david.green@arm.com>
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added; LD2 should be a
zip1 shuffle, which will be added in another patch.
It should help fix some of the regressions from #87510.
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
7bc079c8 | 12-Feb-2024 | Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
[TTI] Fall back to SingleSrcPermute shuffle kind if there is no direct
estimation for extract subvector.
Many targets do not have a cost for the extract-subvector shuffle kind,
but do have costs for single-source permutes. If there is no cost
estimation for extract subvector, it is better to switch to single-source
permute to get a better cost estimate.
Reviewers: RKSimon, davemgreen, arsenm
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/79837
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
184ca395 | 25-Jan-2024 | Nico Weber <thakis@chromium.org>
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)
No behavior change.
Revision tags: llvmorg-19-init
586ecdf2 | 12-Dec-2023 | Kazu Hirata <kazu@google.com>
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
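The change is mechanical; a small illustrative example of the new spelling (the function and the flag it checks are made up):
```
#include "llvm/ADT/StringRef.h"

// Before this patch: Arg.startswith("-march=")
// After: the C++20-style spelling.
bool isMArchOption(llvm::StringRef Arg) {
  return Arg.starts_with("-march=");
}
```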
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
fee2953f | 02-Nov-2023 | David Green <david.green@arm.com>
[ARM] Fix for undef elements from demanded elements (#70504)
I think this is right: the undef bits should be the undef bits from
the passthrough (operand 0), with the top/bottom lanes cleared, as they
come from the second arg (operand 1). We don't yet attempt to look for
undef elements in the second operand, but this should fix the bug with
all elements being marked as undef and the instruction being optimized
away.
75b3c3d2 | 31-Oct-2023 | David Green <david.green@arm.com>
[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709)
For MVE tail predicated loops, better code can be generated by keeping
the loop whole than to unroll to an upper bound, which requires the
expansion of active lane masks that can be difficult to generate good
code for. This patch disables UpperBound unrolling when we find an
active_lane_mask in the loop.
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
233fb987 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve bitwise reduction costs
This adds some basic and/or/xor reduction costs for NEON/MVE, handling them like other reductions where vector operations are used to reduce to legal sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from there.
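The reduction shape being costed, as a hedged scalar stand-in; on the vector form each halving step corresponds to one full-width VAND/VORR/VEOR (illustrative only):
```
#include <cstddef>
#include <cstdint>

// In-place AND-reduction by repeated halving; N must be a power of two.
uint32_t reduceAnd(uint32_t *V, size_t N) {
  for (size_t Half = N / 2; Half > 0; Half /= 2)
    for (size_t I = 0; I < Half; ++I)
      V[I] &= V[I + Half]; // one combining op per lane pair
  return V[0];
}
```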
4cef24a8 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve reduction integer min/max costs
This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar to the existing Add reduction costs. They follow the same style as Add reductions, but include a higher cost, as the costs tend to be dependent on the element size for vminv/vmaxv. These costs may not be precise, but will be more in line with reality than the default, which extracts each element.
2955cc15 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve costs for FMin/Max reductions
Similar to the other reductions, this changes the cost of fmin/fmax reductions under MVE/NEON to perform vector operations until the types need to be scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the reduction, and otherwise need a lanewise extract from the top lanes.
4530f029 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve reduction fadd/fmul costs
This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by halving the vector size until it gets scalarized, with some additional costs for fp16, which may require extracting the top lanes.
Differential Revision: https://reviews.llvm.org/D159367
Revision tags: llvmorg-17.0.0-rc3
9a207578 | 08-Aug-2023 | Alexey Bataev <a.bataev@outlook.com>
[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instruction cost estimation and the vectorization outcome.
Differential Revision: https://reviews.llvm.org/D157425