|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
| #
960c975a |
| 16-Sep-2024 |
David Green <david.green@arm.com> |
[AArch64] Expand scmp/ucmp vector operations with sub (#108830)
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
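The difference between the two expansions can be sketched in scalar terms. This is a minimal illustration, not LLVM's lowering code: it assumes a three-way compare that returns -1, 0 or 1, and it assumes the arithmetic form subtracts one compare result from the other. All function names are hypothetical.

```python
def scmp_select(a: int, b: int) -> int:
    """Select-style expansion: two compares feeding two selects."""
    return -1 if a < b else (1 if a > b else 0)


def scmp_sub(a: int, b: int) -> int:
    """Arithmetic expansion: subtract the two compare results."""
    return int(a > b) - int(a < b)


def scmp_sub_lanes(xs, ys):
    """Lane-wise variant, modelling vector compares that yield all-ones (-1) masks."""
    lt = [-1 if x < y else 0 for x, y in zip(xs, ys)]
    gt = [-1 if x > y else 0 for x, y in zip(xs, ys)]
    # With -1 masks the operands swap: lt_mask - gt_mask gives -1, 0 or 1.
    return [l - g for l, g in zip(lt, gt)]
```

In the lane-wise form each lane needs only two compares and one subtract, with no per-lane select.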
|
|
Revision tags: llvmorg-19.1.0-rc4 |
|
| #
140e80a2 |
| 31-Aug-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[TTI] Add cost model support for [u|s]cmp (#106824)
This patch adds cost model support for [u|s]cmp.
|
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
| #
2a859b20 |
| 28-Jul-2023 |
David Green <david.green@arm.com> |
[AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core.
This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 CPUs. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing).
The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments to ScalarizationOverhead will increase the cost for integer types, especially for small vectors. The end result will be a lower cost for float and long-integer types, and a somewhat higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizers.
We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments.
Differential Revision: https://reviews.llvm.org/D155459
|
|
Revision tags: llvmorg-18-init |
|
| #
5106b221 |
| 01-Jul-2023 |
David Green <david.green@arm.com> |
[AArch64] Treat the icmp in icmp(and(..), 0) as free
As in https://godbolt.org/z/4dafd9Geq, the icmp from an And may use an Ands to set flags, meaning the icmp is free.
This could also be done for add/sub, but those patterns often happen in the induction variable of a loop, making them quite performance sensitive.
Differential Revision: https://reviews.llvm.org/D153611
|
| #
46ef3337 |
| 29-Jun-2023 |
David Green <david.green@arm.com> |
[AArch64] Add and cmp cost model tests. NFC
See D153611. Tests for the cost of icmp(and, 0) are added, in addition to expanding the extractelements-to-shuffle.ll test, which has always been a bit simple, to include a more complete example with both a vector and scalar version. The icmp(and, 0) costs are targeted at improving the latter when the cost of vector inserts and extracts is lowered.
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0 |
|
| #
75f1b328 |
| 29-Aug-2022 |
Mingming Liu <mingmingl@google.com> |
[AArch64][CostModel][NFC] Specify target datalayout explicitly for cost analysis test.
- Use linux little endian data layout string.
Differential Revision: https://reviews.llvm.org/D132889
|
|
Revision tags: llvmorg-15.0.0-rc3 |
|
| #
4178e334 |
| 10-Aug-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[CostModel] Update RUN -passes=* to double quotes to appease update scripts on windows
DOS really doesn't like '' quotes to be used in command lines
Some prep work as I'm intending to resurrect D79483 soon
|
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
15ba588d |
| 09-Feb-2022 |
Arthur Eubanks <aeubanks@google.com> |
[test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>'
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
| #
1ccc4992 |
| 30-Jun-2020 |
Florian Hahn <flo@fhahn.com> |
[AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 by default. AArch64 did not have its own implementation, hence the throughput cost of control-flow instructions is overestimated. On most cores, most branches should be predicted and essentially free throughput-wise.
This fixes a 9% performance regression on a SPEC2006 benchmark on AArch64 with -O3 LTO & PGO.
This patch effectively restores pre-2596da31740f behavior for AArch64 and undoes the AArch64 test changes of that patch.
Reviewers: samparker, dmgreen, anemet
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D82755
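The behaviour described in this commit can be pictured with a toy model. This is a sketch under stated assumptions, not the LLVM C++ implementation: the enum, the function names and the fallback for non-throughput cost kinds are all hypothetical, chosen only to show a target override returning 0 for throughput while the default returns 1.

```python
from enum import Enum, auto


class CostKind(Enum):
    # Hypothetical stand-ins for the cost kinds a cost model distinguishes.
    RECIP_THROUGHPUT = auto()
    CODE_SIZE = auto()


def default_cf_instr_cost(kind: CostKind) -> int:
    """Toy default: every control-flow instruction costs 1."""
    return 1


def aarch64_cf_instr_cost(kind: CostKind) -> int:
    """Toy target override: branches are assumed predicted, so free for throughput."""
    if kind is CostKind.RECIP_THROUGHPUT:
        return 0
    # Assumption: other cost kinds fall back to the default.
    return default_cf_instr_cost(kind)
```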
|
|
Revision tags: llvmorg-10.0.1-rc2 |
|
| #
2596da31 |
| 15-Jun-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86-specific implementation, as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that its implementations distinguish between cost kinds, so that unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test.
The cost model test changes now reflect that ret instructions are not generally free.
Differential Revision: https://reviews.llvm.org/D79164
|
| #
792575ff |
| 26-May-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][ARM][AArch64] More code size tests
Add analysis runs for icmp, fcmp and select instructions.
|