| #
89ca3e72 |
| 29-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789)
These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbe
[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789)
These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worse expanded shuffles we see in the vector-shuffle-128-v* codegen tests.
show more ...
|
|
Revision tags: llvmorg-21-init |
|
| #
178f4714 |
| 27-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to
[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.
I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
show more ...
|
|
Revision tags: llvmorg-19.1.7 |
|
| #
db88071a |
| 06-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778)
Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possibl
[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778)
Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.
show more ...
|
| #
9bb1d036 |
| 19-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)
If the shuffle split results in referencing a single legalised whole v
[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)
If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.
We already do something similar for broadcasts / whole subvector insertion + extraction - its purely an issue for register allocation.
show more ...
|
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
ac1869aa |
| 04-Nov-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680)
Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs strug
[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680)
Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fallback to worst case shuffles that reference elements from any lane.
This patch detects shuffle masks that we know are "inlane" and enable us to assume a cheaper shuffle cost.
show more ...
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
| #
8b56da5e |
| 26-Apr-2023 |
ManuelJBrito <manuel.brito@tecnico.ulisboa.pt> |
[IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for unde
[IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
show more ...
|
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
75e1cf4a |
| 14-Apr-2021 |
Alexey Bataev <a.bataev@outlook.com> |
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
show more ...
|
| #
9861ca0c |
| 28-Apr-2022 |
Alexey Bataev <a.bataev@outlook.com> |
Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e3804ca216d4e76c88a38086eb61c200f9 to fix a crash reported in https://reviews.llvm.org/D100486#3479989.
|
| #
29a470e3 |
| 14-Apr-2021 |
Alexey Bataev <a.bataev@outlook.com> |
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
[COST]Improve cost model for shuffles in SLP.
Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
show more ...
|
| #
4455c5cd |
| 18-Mar-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows
|
| #
15ba588d |
| 09-Feb-2022 |
Arthur Eubanks <aeubanks@google.com> |
[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'
|
| #
38c9a406 |
| 09-Jul-2021 |
David Green <david.green@arm.com> |
[TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reduc
[TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions.
This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.
Differential Revision: https://reviews.llvm.org/D105484
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6 |
|
| #
f4c67dfa |
| 22-Mar-2020 |
Craig Topper <craig.topper@gmail.com> |
[X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account
[X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occur as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction.
The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when type the type wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now.
I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type.
Differential Revision: https://reviews.llvm.org/D76478
show more ...
|
|
Revision tags: llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
| #
a091f706 |
| 06-Nov-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM
As noted on D59710 we weren't handling the high costs of these operations on SLM.
|
| #
1b986b41 |
| 06-Nov-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add add/fadd reduction tests for SLM
|
| #
1b59a16c |
| 12-Oct-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Improve sum reduction costs.
I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.
I've also a
[CostModel][X86] Improve sum reduction costs.
I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.
I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.
llvm-svn: 374655
show more ...
|
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
| #
c6bfb057 |
| 13-Dec-2018 |
Craig Topper <craig.topper@intel.com> |
[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction
This is split from D55452 with the correct patch this time.
Pairwise reductions require two s
[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction
This is split from D55452 with the correct patch this time.
Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
Differential Revision: https://reviews.llvm.org/D55615
llvm-svn: 349072
show more ...
|
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
| #
d1498ed8 |
| 07-Dec-2018 |
Craig Topper <craig.topper@intel.com> |
[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level
[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost
We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is half the width.
So for example on 8i32 on an sse target. Were were calculating the cost of an 8i32 op which is likely 2 for basic integer. Then after the loop we count 2 more v4i32 ops. For a total arith cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
There are still more bugs in this code that I'm going to work on next. The non pairwise code shouldn't count extract subvectors in the loop. There are no extracts, the types are split in registers. For pairwise we need to use 2 two src permute shuffles.
Differential Revision: https://reviews.llvm.org/D55397
llvm-svn: 348621
show more ...
|
| #
102854f4 |
| 01-Dec-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extractin
[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
Fixes PR37731
Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997
Differential Revision: https://reviews.llvm.org/D54585
llvm-svn: 348076
show more ...
|
| #
8cd9d1b5 |
| 26-Nov-2018 |
Fedor Sergeev <fedor.sergeev@azul.com> |
Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a b
Revert "[TTI] Reduction costs only need to include a single extract element cost"
This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body.
llvm-svn: 347541
show more ...
|
| #
924f1934 |
| 15-Nov-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every elem
[TTI] Reduction costs only need to include a single extract element cost
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
Fixes PR37731
Differential Revision: https://reviews.llvm.org/D54585
llvm-svn: 346970
show more ...
|
| #
fc8f1d7d |
| 09-Nov-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector
llvm-svn: 346538
|
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
| #
44a9a71d |
| 30-Oct-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
Unlike the re
[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!
I've done my best to fix a number of vectorizer uses:
SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.
LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.
Differential Revision: https://reviews.llvm.org/D53573
llvm-svn: 345617
show more ...
|
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
| #
2a9cde02 |
| 21-Jun-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like
[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
show more ...
|
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
| #
07839219 |
| 12-Jun-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel] Treat Identity shuffle masks as zero cost
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all matc
[CostModel] Treat Identity shuffle masks as zero cost
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all match - but we could probably accept all cases.
Differential Revision: https://reviews.llvm.org/D47986
llvm-svn: 334506
show more ...
|