reduction.ll - OpenGrok history log for /llvm-project/llvm/test/Analysis/CostModel/X86/reduction.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 89ca3e72	29-Jan-2025	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789) These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbe [CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789) These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worse expanded shuffles we see in the vector-shuffle-128-v* codegen tests. show more ...
Revision tags: llvmorg-21-init
# 178f4714	27-Jan-2025	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412) If we're just moving a single element around inside a 128-bit lane (probably as an alternative to [CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412) If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS. I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :\| show more ...
Revision tags: llvmorg-19.1.7
# db88071a	06-Jan-2025	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778) Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possibl [CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778) Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register. show more ...
# 9bb1d036	19-Dec-2024	Simon Pilgrim <llvm-dev@redking.me.uk>	[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561) If the shuffle split results in referencing a single legalised whole v [X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561) If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free. We already do something similar for broadcasts / whole subvector insertion + extraction - its purely an issue for register allocation. show more ...
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# ac1869aa	04-Nov-2024	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680) Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs strug [CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680) Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fallback to worst case shuffles that reference elements from any lane. This patch detects shuffle masks that we know are "inlane" and enable us to assume a cheaper shuffle cost. show more ...
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3
# 8b56da5e	26-Apr-2023	ManuelJBrito <manuel.brito@tecnico.ulisboa.pt>	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for unde [IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210 show more ...
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 75e1cf4a	14-Apr-2021	Alexey Bataev <a.bataev@outlook.com>	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. [COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486 show more ...
# 9861ca0c	28-Apr-2022	Alexey Bataev <a.bataev@outlook.com>	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit 29a470e3804ca216d4e76c88a38086eb61c200f9 to fix a crash reported in https://reviews.llvm.org/D100486#3479989.
# 29a470e3	14-Apr-2021	Alexey Bataev <a.bataev@outlook.com>	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. [COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486 show more ...
# 4455c5cd	18-Mar-2022	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows
# 15ba588d	09-Feb-2022	Arthur Eubanks <aeubanks@google.com>	[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'
# 38c9a406	09-Jul-2021	David Green <david.green@arm.com>	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reduc [TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484 show more ...
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6
# f4c67dfa	22-Mar-2020	Craig Topper <craig.topper@gmail.com>	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account [X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occur as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when type the type wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478 show more ...
Revision tags: llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# a091f706	06-Nov-2019	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM As noted on D59710 we weren't handling the high costs of these operations on SLM.
# 1b986b41	06-Nov-2019	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Add add/fadd reduction tests for SLM
# 1b59a16c	12-Oct-2019	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] Improve sum reduction costs. I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also a [CostModel][X86] Improve sum reduction costs. I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655 show more ...
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# c6bfb057	13-Dec-2018	Craig Topper <craig.topper@intel.com>	[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction This is split from D55452 with the correct patch this time. Pairwise reductions require two s [CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction This is split from D55452 with the correct patch this time. Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle. Differential Revision: https://reviews.llvm.org/D55615 llvm-svn: 349072 show more ...
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3
# d1498ed8	07-Dec-2018	Craig Topper <craig.topper@intel.com>	[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost We were overcounting the number of arithmetic operations needed at each level [CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width. So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops. There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles. Differential Revision: https://reviews.llvm.org/D55397 llvm-svn: 348621 show more ...
# 102854f4	01-Dec-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extractin [TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076 show more ...
# 8cd9d1b5	26-Nov-2018	Fedor Sergeev <fedor.sergeev@azul.com>	Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a b Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541 show more ...
# 924f1934	15-Nov-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every elem [TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970 show more ...
# fc8f1d7d	09-Nov-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# 44a9a71d	30-Oct-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the re [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617 show more ...
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 2a9cde02	21-Jun-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216 show more ...
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3
# 07839219	12-Jun-2018	Simon Pilgrim <llvm-dev@redking.me.uk>	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all matc [CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506 show more ...
12