History log of /llvm-project/llvm/test/Transforms/SLPVectorizer/X86/shift-lshr.ll (Results 1 – 23 of 23)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 38511864 03-Jul-2024 Alexey Bataev <a.bataev@outlook.com>

[SLP]Remove operands upon marking instruction for deletion.

If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have

[SLP]Remove operands upon marking instruction for deletion.

If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409

show more ...


# 873c3f7e 03-Jul-2024 Alexey Bataev <a.bataev@outlook.com>

Revert "[SLP]Remove operands upon marking instruction for deletion."

This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix
a crash revealed in https://lab.llvm.org/buildbot/#/builders/

Revert "[SLP]Remove operands upon marking instruction for deletion."

This reverts commit bbd52dd44ceee80e3b6ba6a9b2bd8ee9a9713833 to fix
a crash revealed in https://lab.llvm.org/buildbot/#/builders/4/builds/505

show more ...


# bbd52dd4 03-Jul-2024 Alexey Bataev <a.bataev@outlook.com>

[SLP]Remove operands upon marking instruction for deletion.

If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have

[SLP]Remove operands upon marking instruction for deletion.

If the instruction is marked for deletion, better to drop all its
operands and mark them for deletion too (if allowed). It allows to have
more vectorizable patterns and generate less useless extractelement
instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/97409

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 580210a0 23-Dec-2022 Nikita Popov <npopov@redhat.com>

[SLP] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# 3be72f40 12-Oct-2022 Bjorn Pettersson <bjorn.a.pettersson@ericsson.com>

[test][SLPVectorizer] Use -passes syntax in RUN lines. NFC


# f3a928e2 07-Oct-2022 Arthur Eubanks <aeubanks@google.com>

[opt] Don't translate legacy -analysis flag to require<analysis>

Tests relying on this should explicitly use -passes='require<analysis>,foo'.


Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 48cc9287 18-Mar-2022 Philip Reames <listmail@philipreames.com>

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)

The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)

The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# deae979a 03-Mar-2022 Philip Reames <listmail@philipreames.com>

Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""

This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on

Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""

This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.

show more ...


# 73804271 02-Mar-2022 Philip Reames <listmail@philipreames.com>

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""

Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch.

Original commit message follows:

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""

Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538

show more ...


Revision tags: llvmorg-14.0.0-rc2
# 9c6250ee 01-Mar-2022 Arthur Eubanks <aeubanks@google.com>

Revert "[SLP] Schedule only sub-graph of vectorizable instructions"

This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bo

Revert "[SLP] Schedule only sub-graph of vectorizable instructions"

This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.

show more ...


# 0539a26d 22-Feb-2022 Philip Reames <listmail@philipreames.com>

[SLP] Schedule only sub-graph of vectorizable instructions

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to th

[SLP] Schedule only sub-graph of vectorizable instructions

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# e1bdb579 22-Jul-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Adjust shift SSE legalized costs based on llvm-mca reports.

Update shl/lshr/ashr costs based on the worst case costs from the script in D103695.


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# def62697 25-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve accuracy of 256-bit non-uniform vector shifts on AVX1

Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making

[CostModel][X86] Improve accuracy of 256-bit non-uniform vector shifts on AVX1

Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.

show more ...


Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# 119e4550 08-Nov-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[SLPVectorizer][X86] Remove unused check-prefixes


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3
# 691c086d 26-Jun-2020 Arthur Eubanks <aeubanks@google.com>

[NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer

Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82681


Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# bfa32573 14-Nov-2019 Alexey Bataev <a.bataev@hotmail.com>

Revert "Temporarily Revert:"

This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:

Temporarily Revert:

"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-var

Revert "Temporarily Revert:"

This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:

Temporarily Revert:

"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."

after fixing the problem with compile time.

show more ...


# e511c4b0 06-Nov-2019 Eric Christopher <echristo@gmail.com>

Temporarily Revert:

"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regr

Temporarily Revert:

"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regressions on
vectorizable code.

The primary cause of the compile-time regression is f228b5371647f471853c5fb3e6719823a42fe451.

This reverts commits:

f228b5371647f471853c5fb3e6719823a42fe451
5503455ccb3f5fcedced158332c016c8d3a7fa81
21d498c9c0f32dcab5bc89ac593aa813b533b43a

show more ...


# f228b537 28-Oct-2019 Alexey Bataev <a.bataev@hotmail.com>

[SLP] Generalization of stores vectorization.

Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.

Reviewers: sp

[SLP] Generalization of stores vectorization.

Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Differential Revision: https://reviews.llvm.org/D43582

show more ...


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# cee313d2 17-Apr-2019 Eric Christopher <echristo@gmail.com>

Revert "Temporarily Revert "Add basic loop fusion pass.""

The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3
# 2534592b 20-Feb-2019 Eric Christopher <echristo@gmail.com>

Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"

As this has broken the lto bootstrap build for 3 days and is
showing a significant regress

Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"

As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434

show more ...


# ca9aff93 13-Feb-2019 Anton Afanasyev <anton.a.afanasyev@gmail.com>

[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)

Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for pa

[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)

Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923

show more ...


Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3
# 502fc1bd 30-Nov-2018 Craig Topper <craig.topper@intel.com>

[X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer=vector-width=256 and -mprefer-vector-width=512.

This will make these tests immune if we ever change the default behavio

[X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer=vector-width=256 and -mprefer-vector-width=512.

This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors.

llvm-svn: 348046

show more ...


Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2
# a12d6597 15-May-2017 Simon Pilgrim <llvm-dev@redking.me.uk>

[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shifts

llvm-svn: 303069