History log of /llvm-project/llvm/test/Analysis/CostModel/X86/reduction.ll (Results 1 – 25 of 35)
Revision Date Author Comments
# 89ca3e72 29-Jan-2025 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789)

These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.


Revision tags: llvmorg-21-init
# 178f4714 27-Jan-2025 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)

If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.

I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|


Revision tags: llvmorg-19.1.7
# db88071a 06-Jan-2025 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778)

Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.
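
The SHUFPS condition described above can be sketched as a small mask check. This is a minimal illustration, not the code from the patch: the helper names `pairFromSameSource` and `looksLikeShufps` are hypothetical, and the mask convention (0..3 select from the first source, 4..7 from the second, -1 is undefined) is assumed to follow the usual shufflevector numbering for two v4f32 inputs.

```cpp
#include <array>

// Hypothetical helper: true if both elements of an output pair can be
// taken from one source register. Indices 0..3 refer to the first source,
// 4..7 to the second, and -1 marks an undefined element.
static bool pairFromSameSource(int A, int B) {
  if (A < 0 || B < 0)
    return true; // an undefined element places no constraint on the pair
  return (A < 4) == (B < 4);
}

// Hypothetical check: a v4f32 two-input mask can map to a single SHUFPS
// when the low pair (elements 0,1) comes from one source and the high
// pair (elements 2,3) comes from one source.
bool looksLikeShufps(const std::array<int, 4> &Mask) {
  for (int M : Mask)
    if (M < -1 || M > 7)
      return false; // out-of-range index, not a valid two-input mask
  return pairFromSameSource(Mask[0], Mask[1]) &&
         pairFromSameSource(Mask[2], Mask[3]);
}
```

Under these assumptions, a mask such as <0, 1, 4, 5> passes the check, while an interleave such as <0, 4, 1, 5> does not, because each output pair mixes both sources.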


# 9bb1d036 19-Dec-2024 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)

If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.

We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# ac1869aa 04-Nov-2024 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680)

Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst case shuffles that reference elements from any lane.

This patch detects shuffle masks that we know are "inlane", enabling us to assume a cheaper shuffle cost.
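
As a rough illustration of what an "inlane" mask means, the sketch below checks that every defined output element is sourced from the same 128-bit lane position it occupies. The function name `isInLaneMask` and its parameters are hypothetical and not taken from the patch; the mask is assumed to use shufflevector numbering, where indices at or above the element count refer to the second input and -1 is undefined.

```cpp
#include <vector>

// Hypothetical check: true if no defined mask element crosses a 128-bit
// lane. Mask.size() is the number of elements in the result vector and
// EltBits is the width of one element in bits.
bool isInLaneMask(const std::vector<int> &Mask, unsigned EltBits) {
  if (EltBits == 0 || EltBits > 128 || 128 % EltBits != 0)
    return false;
  const int NumElts = static_cast<int>(Mask.size());
  const int EltsPerLane = static_cast<int>(128 / EltBits);
  for (int I = 0; I < NumElts; ++I) {
    const int M = Mask[I];
    if (M < 0)
      continue; // undefined elements never cross a lane
    if (M >= 2 * NumElts)
      return false; // not a valid one/two input mask
    // Reduce second-input indices to a position within their own vector,
    // then compare the source lane with the destination lane.
    if ((M % NumElts) / EltsPerLane != I / EltsPerLane)
      return false;
  }
  return true;
}
```

For a v8f32 shuffle, a mask like <1, 0, 3, 2, 5, 4, 7, 6> stays within each lane under this definition, while <4, 5, 6, 7, 0, 1, 2, 3> swaps the two lanes and is rejected.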


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3
# 8b56da5e 26-Apr-2023 ManuelJBrito <manuel.brito@tecnico.ulisboa.pt>

[IR] Change shufflevector undef mask to poison

With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210


Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 75e1cf4a 14-Apr-2021 Alexey Bataev <a.bataev@outlook.com>

[COST]Improve cost model for shuffles in SLP.

Introduced masks where they were not added and improved target-dependent
cost models to avoid returning incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486


# 9861ca0c 28-Apr-2022 Alexey Bataev <a.bataev@outlook.com>

Revert "[COST]Improve cost model for shuffles in SLP."

This reverts commit 29a470e3804ca216d4e76c88a38086eb61c200f9 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.


# 29a470e3 14-Apr-2021 Alexey Bataev <a.bataev@outlook.com>

[COST]Improve cost model for shuffles in SLP.

Introduced masks where they were not added and improved target-dependent
cost models to avoid returning incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486


# 4455c5cd 18-Mar-2022 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows


# 15ba588d 09-Feb-2022 Arthur Eubanks <aeubanks@google.com>

[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'


# 38c9a406 09-Jul-2021 David Green <david.green@arm.com>

[TTI] Remove IsPairwiseForm from getArithmeticReductionCost

This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between pairwise and
non-pairwise reductions.

This also removes some code that was detecting reduction trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6
# f4c67dfa 22-Mar-2020 Craig Topper <craig.topper@gmail.com>

[X86] More accurately model the cost of horizontal reductions.

This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occurs as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits, and uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels, and uses the single source permute cost of the
legalized type for all levels. This penalizes things like the
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need a v16i8 shuffle for a reduction and we only need split AVX1 ops
when the type is wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478


Revision tags: llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# a091f706 06-Nov-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM

As noted on D59710 we weren't handling the high costs of these operations on SLM.


# 1b986b41 06-Nov-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Add add/fadd reduction tests for SLM


# 1b59a16c 12-Oct-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve sum reduction costs.

I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.

I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.

llvm-svn: 374655


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# c6bfb057 13-Dec-2018 Craig Topper <craig.topper@intel.com>

[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction

This is split from D55452 with the correct patch this time.

Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
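
The shuffle count implied by this description can be written down directly. The following is only a sketch of the counting rule, assuming a power-of-2 element count and counting shuffles rather than target costs; the function name `pairwiseReductionShuffles` is hypothetical.

```cpp
// Hypothetical helper: number of shuffles for a pairwise reduction of
// NumElts elements, with two shuffles on every level except the last,
// where the identity shuffle <0, u, u, u...> is dropped.
// Returns 0 for inputs that are not a power of 2 or have fewer than 2
// elements.
unsigned pairwiseReductionShuffles(unsigned NumElts) {
  if (NumElts < 2 || (NumElts & (NumElts - 1)) != 0)
    return 0;
  unsigned Levels = 0;
  for (unsigned N = NumElts; N > 1; N >>= 1)
    ++Levels; // one level per halving, i.e. log2(NumElts)
  return 2 * Levels - 1;
}
```

For a 4-element vector there are two levels, so this gives 3 shuffles instead of the 4 that were counted before.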

Differential Revision: https://reviews.llvm.org/D55615

llvm-svn: 349072


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3
# d1498ed8 07-Dec-2018 Craig Topper <craig.topper@intel.com>

[CostModel][X86] Fix overcounting arithmetic cost in illegal types in getArithmeticReductionCost/getMinMaxReductionCost

We were overcounting the number of arithmetic operations needed at each level before we reach a legal type. We were using the full vector type for that level, but we are going to split the input vector at that level in half. So the effective arithmetic operation cost at that level is that of a vector of half the width.

So, for example, for 8i32 on an SSE target, we were calculating the cost of an 8i32 op, which is likely 2 for basic integer ops. Then after the loop we count 2 more v4i32 ops, for a total arithmetic cost of 4. But if you look at the assembly there would only be 3 arithmetic ops.
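
The 8i32 arithmetic can be reproduced with a small counting sketch. It is not the TTI implementation: the names `ArithCount` and `countReductionArithOps` are hypothetical, element counts are assumed to be powers of 2, and every legal-width operation is assumed to cost 1.

```cpp
// Hypothetical result type: arithmetic op counts before and after the fix.
struct ArithCount {
  unsigned Old;   // each illegal level costed at its full vector width
  unsigned Fixed; // each illegal level costed at half its vector width
};

// NumElts is the element count of the vector being reduced and LegalElts
// is the element count of the widest legal vector (4 for i32 on SSE).
ArithCount countReductionArithOps(unsigned NumElts, unsigned LegalElts) {
  ArithCount Count = {0, 0};
  if (LegalElts == 0)
    return Count;
  unsigned N = NumElts;
  // Levels above the legal width: the input is split in half, so only
  // half-width arithmetic is actually performed at each of them.
  while (N > LegalElts) {
    Count.Old += N / LegalElts;
    Count.Fixed += (N / 2) / LegalElts;
    N /= 2;
  }
  // Remaining levels at the legal width: one op per halving step.
  for (unsigned M = N; M > 1; M /= 2) {
    ++Count.Old;
    ++Count.Fixed;
  }
  return Count;
}
```

With `countReductionArithOps(8, 4)` the old count is 4 and the fixed count is 3, matching the numbers in the commit message.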

There are still more bugs in this code that I'm going to work on next. The non-pairwise code shouldn't count extract subvectors in the loop. There are no extracts; the types are split in registers. For pairwise we need to use 2 two-source permute shuffles.

Differential Revision: https://reviews.llvm.org/D55397

llvm-svn: 348621


# 102854f4 01-Dec-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED)

We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 348076


# 8cd9d1b5 26-Nov-2018 Fedor Sergeev <fedor.sergeev@azul.com>

Revert "[TTI] Reduction costs only need to include a single extract element cost"

This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.

llvm-svn: 347541


# 924f1934 15-Nov-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[TTI] Reduction costs only need to include a single extract element cost

We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.

For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.

Fixes PR37731

Differential Revision: https://reviews.llvm.org/D54585

llvm-svn: 346970


# fc8f1d7d 09-Nov-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector

llvm-svn: 346538


Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# 44a9a71d 30-Oct-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)

Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617


Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 2a9cde02 21-Jun-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)

These were being overcautious with costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.

llvm-svn: 335216


Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3
# 07839219 12-Jun-2018 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel] Treat Identity shuffle masks as zero cost

As discussed on D47985, identity shuffle masks should probably be free.

I've limited this to the case where the input and output types all match - but we could probably accept all cases.
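
A minimal sketch of the identity check, restricted (as in the commit) to the case where the result has the same number of elements as the source. The function name `isFreeIdentityShuffle` and its parameters are hypothetical; -1 is assumed to mark an undefined mask element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical check: true if the shuffle leaves every defined element in
// place and the result has as many elements as the source, in which case
// the shuffle could be treated as zero cost.
bool isFreeIdentityShuffle(const std::vector<int> &Mask,
                           std::size_t NumSrcElts) {
  if (Mask.size() != NumSrcElts)
    return false; // input and output types differ, not handled here
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] >= 0 && Mask[I] != static_cast<int>(I))
      return false; // a defined element moves to a different position
  return true;
}
```

Here <0, 1, 2, 3> and <0, u, 2, u> on a 4-element source both count as identity, while <1, 0, 2, 3> does not.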

Differential Revision: https://reviews.llvm.org/D47986

llvm-svn: 334506

