|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
| #
3ef92208 |
| 29-Jul-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add missing AVX512 vector mul overflow intrinsic costs
Fix regressions in #100519
|
| #
f2a0f97f |
| 26-Jul-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Improve vector mul overflow intrinsic costs
|
|
Revision tags: llvmorg-19.1.0-rc1 |
|
| #
010dcfd8 |
| 25-Jul-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Improve add/sub/mul overflow intrinsic costs
Noticed due to x86 changes in #97463
|
|
Revision tags: llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
595a7439 |
| 14-Jun-2023 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Tweak SSE2 v2i64 multiply costs based off D46276 script
It looks like we were trying to account for SLM costs, which are actually handled separately
Fixes #62969
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
| #
faff990e |
| 25-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] Fix Icelake VPMULLQ zmm pipes and adjust AVX512DQ v8i64 mul costs to match worse case
Icelake PMULLQ throughput regressed cf SkylakeServer as its Pipe0 only
Confirmed with Intel SOM, Agner and instlatx64
|
|
Revision tags: llvmorg-15.0.1 |
|
| #
f8fa0429 |
| 16-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add CostKinds handling for vector integer comparisons
These were based off a mixture of vector integer add/sub costs and the numbers from the 'cost-tables vs llvm-mca' script from D103695 - the extra costs for different predicates are still proving tricky to implement, but I've gotten most costs to within +/-1 now - the AVX512 are tricky as we still don't handle predicate results properly, so most of these were done by hand.
|
| #
0ec028fe |
| 15-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops
Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount.
This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)
|
| #
40ab7875 |
| 14-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts
Fixes regression from a931dbfbd30754cf39897037a223eee60ae9e855
|
| #
a931dbfb |
| 10-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Merge AVX512BW vXi8/vXi16 shifts into default AVX512BW cost table
We only need to handle the uniform cases early
|
|
Revision tags: llvmorg-15.0.0 |
|
| #
c444af1c |
| 04-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Add CostKinds handling for mul ops
This was achieved using the 'cost-tables vs llvm-mca' script D103695
Also fix a missing pmullw v16i16 half-rate throughput as znver1 double-pumps - matches numbers from AMD SoG + Agner
|
| #
444685de |
| 03-Sep-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Adjust mul v4i32/v8i32 throughput cost
Based off the numbers from AMD SoG + Agner - vXi32 are both half-rate, and znver1 double-pumps the v8i32 op
We should have caught this earlier as many Intel models have half-rate pmulld already :-(
|
| #
78304450 |
| 30-Aug-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Account for add/sub 512-bit vector splitting costs on non-AVX512BW targets
|
|
Revision tags: llvmorg-15.0.0-rc3 |
|
| #
1ad18b59 |
| 19-Aug-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Cleanup arithmetic test triples
Just specify the triple inside the RUN command (to make i686 support much easier!), and consistently use x86_64-- generic triple
|
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
d5198cf9 |
| 01-May-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Check for 'null op' truncations
If the legalized src/dst types are the same, assume the "truncation" is free.
This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
d663166a |
| 25-Mar-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Reduce cost of v2i64 icmp base cost on SSE2 targets
Based off the script from D103695, we were exaggerating the cost of the v2i64 comparison expansion using instruction count instead of effective throughput
|
| #
4455c5cd |
| 18-Mar-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
15ba588d |
| 09-Feb-2022 |
Arthur Eubanks <aeubanks@google.com> |
[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
d2c093e7 |
| 07-Dec-2021 |
Haohai Wen <haohai.wen@intel.com> |
[CostModel][X86] Add i64 mul cost for avx512 as 1cy
i64 mul cost is 1cy for all cpu that support avx512. Currently all X86 cpu uses i64 mul cost in X64 cost table which is not true for cpu that support avx512 (skx, icx).
Reviewed By: pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D115016
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
a468c39c |
| 15-Nov-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[X86][Costmodel] `trunc v32i16 to v64i8` can appear after legalization, cost is same as for `trunc v32i16 to v32i8`
Some of the costs get larger here, but i suppose that makes sense since we'd previously query scalarization costs that may not be really representative of the reality.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D113852
|
| #
a5f2fdca |
| 14-Nov-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[X86][Costmodel] `trunc v16i32 to v32i16` can appear after legalization, cost is same as for `trunc v16i32 to v16i16`
This was noticed in D113609, hopefully it unblocks that patch. There are likely other similar problems.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D113842
|
| #
2ced9a42 |
| 06-Oct-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][TTI] Replace BAD_ICMP_PREDICATE with ICMP_NE for generic smulo/umulo cost expansion
Match the predicate used in TargetLowering::expandMULO to detect overflow
|
| #
7bd097fd |
| 06-Oct-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][TTI] Fix ops used for generic smulo/umulo cost expansion
Fix copy+pasta that was checking for smul_fix instead of smul_with_overflow to detect signed values.
The LShr is performed on the extended type as we use it to truncate+extract the upper/hi bits of the extended multiply.
More closely matches the default expansion from TargetLowering::expandMULO
|
| #
81b5da8c |
| 06-Oct-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][TTI] Replace BAD_ICMP_PREDICATE with ICMP_ULT/UGT for generic uadd/usubo cost expansion
Match the predicates used in TargetLowering::expandUADDSUBO
|
| #
0776924a |
| 05-Oct-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel][X86] getCmpSelInstrCost - treat BAD_PREDICATEs the same as the worst case cost predicates for ICMP/FCMP instructions
As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost.
These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.
|
| #
76534829 |
| 29-Sep-2021 |
Craig Topper <craig.topper@sifive.com> |
[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.
I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate their signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681
Differential Revision: https://reviews.llvm.org/D110739
|