#
89ca3e72 |
| 29-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Reduce worst case v8i16/v16i8 SSE2 shuffle costs (#124789)
These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worst expanded shuffles we see in the vector-shuffle-128-v* codegen tests.
|
Revision tags: llvmorg-21-init |
|
#
7d172f96 |
| 28-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCosts - convert all shuffle cost tables to be CostKind compatible. NFC. (#124753)
No change in actual costs yet, but split the costs per cost kind to make it easier to tweak the numbers in future patches.
|
#
1782168c |
| 27-Jan-2025 |
Kazu Hirata <kazu@google.com> |
[X86] Fix a warning
This patch fixes:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp:1583:47: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'typename iterator_traits<const int *>::difference_type' (aka 'long') [-Werror,-Wsign-compare]
|
#
178f4714 |
| 27-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCost - shuffles with only one defined element are always cheap (#124412)
If we're just moving a single element around inside a 128-bit lane (probably as an alternative to extracting it), we can assume this is cheap as a single PSRLDQ/PSHUFD/SHUFPS.
I've got the horrid feeling we're moving towards matching all SSE shuffle patterns inside the cost model, but I'm going to do my best to avoid this for now :|
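As a rough illustration of the condition described in this commit, the check could be sketched as below. This is not the code from the patch: the helper name isSingleElementInLaneMove, the single-source mask layout, and the use of -1 for undefined elements are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not the LLVM implementation). Mask entries index a
// single source vector; -1 marks an undefined output element.
bool isSingleElementInLaneMove(const std::vector<int> &Mask, unsigned EltBits) {
  assert(EltBits > 0 && 128 % EltBits == 0 && "element size must divide a lane");
  const int EltsPerLane = static_cast<int>(128 / EltBits);
  int NumDefined = 0;
  bool SameLane = true;
  for (int I = 0, E = static_cast<int>(Mask.size()); I != E; ++I) {
    if (Mask[I] < 0)
      continue; // undefined output element, ignore it
    ++NumDefined;
    // The destination and the source must sit in the same 128-bit lane.
    SameLane = SameLane && (Mask[I] / EltsPerLane) == (I / EltsPerLane);
  }
  return NumDefined == 1 && SameLane;
}
```

A mask that passes this check only moves one element within a lane, which is the case the commit message argues can be costed as a single instruction.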
|
#
dec47b76 |
| 26-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Update baseline CTTZ/CTLZ costs for x86_64 (#124312)
Followup to #123623 - now that the CMOV has been removed, the throughput has improved, reducing the benefit of vectorization on pre-x86-64-v3 CPUs
|
#
a12d7e4b |
| 24-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[SLP] getVectorCallCosts - don't provide scalar argument data for vector IntrinsicCostAttributes (#124254)
getVectorCallCosts determines the cost of a vector intrinsic, based off
an existing scalar intrinsic call - but we were including the scalar
argument data to the IntrinsicCostAttributes, which meant that not only
was the cost calculation not type-only based, it was making incorrect
assumptions about constant values etc.
This also exposed an issue that x86 relied on fallback calculations for
funnel shift costs - this is great when we have the argument data as
that improves the accuracy of uniform shift amounts etc., but meant that
type-only costs would default to Cost=2 for all custom lowered funnel
shifts, which was far too cheap.
This is the reverse of #124129 where we weren't including argument data
when we could.
Fixes #63980
|
#
1fa56038 |
| 24-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getIntrinsicInstrCost - lrint/llrint costs can use getCastInstrCost without argument data
We don't use the IntrinsicCostAttributes arguments, which allows us to use this in type-only analysis in a future patch.
|
Revision tags: llvmorg-19.1.7 |
|
#
bab7920f |
| 13-Jan-2025 |
Alexey Bataev <a.bataev@outlook.com> |
[RISCV][CG]Use processShuffleMasks for per-register shuffles
Patch adds usage of processShuffleMasks in codegen in lowerShuffleViaVRegSplitting. This function is already used for X86 shuffle cost estimation and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE, so this unifies the code.
Reviewers: topperc, wangpc-pp, lukel97, preames
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/121765
|
#
a5e129cc |
| 07-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getVectorInstrCost - correctly cost v4f32 insertelement into index 0
This is just the MOVSS instruction (SSE41 INSERTPS is still necessary for index != 0)
This exposed an issue in VectorCombine::foldInsExtFNeg - we need to use the more general SK_PermuteTwoSrc shuffle kind to allow getShuffleCost to match other shuffle kinds (not just SK_Select).
|
#
5a7dfb46 |
| 07-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Attempt to match v4f32 shuffles that map to MOVSS/INSERTPS instruction
improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index 0.
|
#
db88071a |
| 06-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Attempt to match cheap v4f32 shuffles that map to SHUFPS instruction (#121778)
Avoid always assuming the worst for v4f32 2 input shuffles, and match the SHUFPS pattern where possible - each pair of output elements must come from the same source register.
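The pairing rule stated in this commit can be sketched as a small standalone check. This is a simplified reading of the description, not the patch itself: the name looksLikeShufps and the mask encoding (0..3 for the first source, 4..7 for the second, -1 for undefined) are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: true if two mask entries can be served by one source.
// Entries 0..3 select from the first source, 4..7 from the second, and -1 is
// an undefined element that is compatible with either source.
static bool pairFromOneSource(int M0, int M1) {
  if (M0 < 0 || M1 < 0)
    return true;
  return (M0 < 4) == (M1 < 4);
}

// Hypothetical check for a v4f32 two-input mask: the low pair of outputs and
// the high pair of outputs must each read from a single source register.
bool looksLikeShufps(const std::array<int, 4> &Mask) {
  for (int M : Mask)
    assert(M >= -1 && M < 8 && "mask entry out of range");
  return pairFromOneSource(Mask[0], Mask[1]) &&
         pairFromOneSource(Mask[2], Mask[3]);
}
```

For example, the mask {0, 1, 4, 5} satisfies the rule, while the interleaving mask {0, 4, 1, 5} does not, because each output pair mixes both sources.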
|
#
7cdbde70 |
| 06-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCost - use processShuffleMasks for all shuffle kinds to legal types (#120599) (#121760)
Now that processShuffleMasks can correctly handle 2 src shuffles, we can completely remove the shuffle kind limits and correctly recognize the number of active subvectors per legalized shuffle - improveShuffleKindFromMask will determine the shuffle kind for each split subvector.
|
#
611401c1 |
| 20-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599)
processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.
|
#
091448e3 |
| 20-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Revert "[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types" (#120707)
Reverts llvm/llvm-project#120599 - some recent tests are currently failing
|
#
81e63f9e |
| 20-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getShuffleCost - use processShuffleMasks to split SK_PermuteTwoSrc shuffles to legal types (#120599)
processShuffleMasks can now correctly handle 2 src shuffles, so we can use the existing SK_PermuteSingleSrc splitting cost logic to handle SK_PermuteTwoSrc as well and correctly recognise the number of active subvectors per legalised shuffle.
|
#
9bb1d036 |
| 19-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561)
If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free.
We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.
|
Revision tags: llvmorg-19.1.6 |
|
#
df2356b4 |
| 17-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] getShuffleCost - ensure we treat constant folded shuffles as free
|
#
4f933277 |
| 10-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Improve cost estimation of insert_subvector shuffle patterns of legalized types (#119363)
In cases where the base/sub vector type in an insert_subvector pattern legalize to the same width through splitting, we can assume that the shuffle becomes free as the legalized vectors will not overlap.
Note this isn't true if the vectors have been widened during legalization (e.g. v2f32 insertion into v4f32 would legalize to v4f32 into v4f32).
Noticed while working on adding processShuffleMasks handling for SK_PermuteTwoSrc.
|
#
53e9eee0 |
| 10-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][TTI] Use TargetCostConstants Free/Basic values instead of hard coded 0/1 to make the costs calculation more obvious. NFC.
|
#
85d15bd1 |
| 04-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[TTI][X86] getMemoryOpCost - reduced costs when loading uniform values due to value reuse (#118642)
Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers.
Fixes #111126
|
#
041e5c96 |
| 04-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] getMemoryOpCost - ensure we pass through OpInfo / Instruction args to base getMemoryOpCost calls
Nothing really uses these yet, but we shouldn't be losing the info.
We can also pass on the OpInfo arg to the getMemoryOpCost constant load call to indicate if it's constant/uniform/pow2 etc.
Prep cleanup for #111126
|
Revision tags: llvmorg-19.1.5 |
|
#
94df95de |
| 01-Dec-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[TTI][X86] getShuffleCosts - for SK_PermuteTwoSrc, if the masks are known to be "inlane" no need to scale the costs by worst-case legalization (#117999)
SK_PermuteTwoSrc legalization has to assume any of the legalised source registers could be referenced in split shuffles, but if we already know that each 128-bit lane only references elements from the same lane of the source operands, then this scaling won't occur.
Hopefully this can help with #113356 without us having to get full processShuffleMasks canonicalization finished first.
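The "inlane" property described here can be sketched as a standalone predicate. This is a minimal sketch under stated assumptions, not the LLVM implementation: the name isInLaneMask is hypothetical, mask values in [0, NumElts) are assumed to index the first source, values in [NumElts, 2*NumElts) the second, and -1 marks an undefined element.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate: true if every defined output element reads from the
// same 128-bit lane (of either source operand) that it is written to.
bool isInLaneMask(const std::vector<int> &Mask, unsigned EltBits) {
  assert(EltBits > 0 && 128 % EltBits == 0 && "element size must divide a lane");
  const int EltsPerLane = static_cast<int>(128 / EltBits);
  const int NumElts = static_cast<int>(Mask.size());
  for (int I = 0; I < NumElts; ++I) {
    const int M = Mask[I];
    if (M < 0)
      continue; // undefined elements never cross a lane
    assert(M < 2 * NumElts && "mask entry out of range");
    const int SrcElt = M % NumElts; // position within whichever source it reads
    if (SrcElt / EltsPerLane != I / EltsPerLane)
      return false; // this element crosses a 128-bit lane boundary
  }
  return true;
}
```

With 32-bit elements in a 256-bit vector, the two-source interleave {0, 8, 1, 9, 4, 12, 5, 13} stays in lane, whereas {4, 5, 6, 7, 0, 1, 2, 3} swaps the two lanes and does not.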
|
#
0ad6be19 |
| 29-Nov-2024 |
Jonas Paulsson <paulson1@linux.ibm.com> |
[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491)
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.
getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().
SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
|
Revision tags: llvmorg-19.1.4 |
|
#
dfe43bd1 |
| 09-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[X86] Remove unused includes (NFC) (#115593)
Identified with misc-include-cleaner.
|
#
ac1869aa |
| 04-Nov-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680)
Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst case shuffles that reference elements from any lane.
This patch detects shuffle masks that we know are "inlane" and enables us to assume a cheaper shuffle cost.
|