History log of /llvm-project/llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll (Results 1 – 25 of 51)
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 38fffa63 06-Nov-2024 Paul Walker <paul.walker@arm.com>

[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 2c7786e9 03-Sep-2024 Philip Reames <preames@rivosinc.com>

Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)

This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reassoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 2d69827c 05-Feb-2024 Nikita Popov <npopov@redhat.com>

[Transforms] Convert tests to opaque pointers (NFC)


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init
# eecb99c5 05-Dec-2023 Nikita Popov <npopov@redhat.com>

[Tests] Add disjoint flag to some tests (NFC)

These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
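
The equivalence these tests rely on can be checked with a small Python sketch. This is not LLVM code, and the helper names `is_disjoint` and `or_equals_add` are made up for illustration. When two operands share no set bits, no carries occur, so "or" and "add" give the same result.

```python
def is_disjoint(a: int, b: int) -> bool:
    """Return True when a and b have no set bits in common."""
    return (a & b) == 0


def or_equals_add(a: int, b: int) -> bool:
    """Return True when bitwise or and integer add agree for a and b."""
    return (a | b) == (a + b)


# A base that is a multiple of 4 combined with an offset below 4:
# the bit patterns cannot overlap, so or behaves like add.
base, offset = 8, 3
print(is_disjoint(base, offset), or_equals_add(base, offset))

# With overlapping bits the equivalence breaks (6 | 3 is 7, 6 + 3 is 9).
print(is_disjoint(6, 3), or_equals_add(6, 3))
```

The disjoint flag records this "no common bits" fact directly on the instruction, so a consumer does not have to re-derive it.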


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# b061159e 05-Jan-2023 Nikita Popov <npopov@redhat.com>

[SLPVectorizer] Convert test to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# 3be72f40 12-Oct-2022 Bjorn Pettersson <bjorn.a.pettersson@ericsson.com>

[test][SLPVectorizer] Use -passes syntax in RUN lines. NFC


Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5
# 982d9ef1 01-Jun-2022 Alexey Bataev <a.bataev@outlook.com>

[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.

Need to either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.
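
Why operand order matters here can be illustrated with a toy Python model. It is only a sketch: `POISON`, `bitwise_and`, `logical_and` and `freeze` are hypothetical stand-ins for the IR concepts, and `freeze` is assumed to pick `False` as its arbitrary value.

```python
POISON = object()  # hypothetical stand-in for an LLVM poison value


def bitwise_and(a, b):
    """Model of `and i1 a, b`: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def logical_and(a, b):
    """Model of `select i1 a, i1 b, i1 false`: b is only observed when a is true."""
    if a is POISON:
        return POISON
    return b if a else False


def freeze(v):
    """Model of `freeze`: poison becomes a fixed value (False is assumed here)."""
    return False if v is POISON else v


# Original order: the false first operand shields the poison second operand.
print(logical_and(False, POISON) is POISON)          # False
# Swapped order, or a plain bitwise and, lets the poison through.
print(logical_and(POISON, False) is POISON)          # True
print(bitwise_and(False, POISON) is POISON)          # True
# Freezing the operand first makes the bitwise form safe again.
print(bitwise_and(False, freeze(POISON)) is POISON)  # False
```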

Differential Revision: https://reviews.llvm.org/D126877


# 65c7cecb 15-Aug-2022 Alexey Bataev <a.bataev@outlook.com>

[SLP]Fix PR51320: Try to vectorize single store operands.

Currently, we try to vectorize values, feeding into stores, only if
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block,
if the operand value is used only in store.
It should enable extra vectorization and should not increase compile
time significantly.
Fixes https://github.com/llvm/llvm-project/issues/51320

Differential Revision: https://reviews.llvm.org/D131894


# 10f41a21 25-May-2022 Alexey Bataev <a.bataev@outlook.com>

[SLP]Fix PR55688: Miscompile due to incorrect nuw/nsw handling.

Need to use all ReductionOps when propagating flags for the reduction
ops, otherwise transformation is not correct. Plus, need to drop nuw/nsw
flags.
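
A small Python sketch shows why nuw/nsw cannot simply be carried over when a reduction reorders its operands. The helper `add_i8` and the chosen values are illustrative assumptions, not taken from the patch. The wrapped result is the same in either order, but the reordered form overflows in an intermediate step, which a no-signed-wrap flag would turn into poison.

```python
def add_i8(x: int, y: int):
    """Signed 8-bit add; returns (wrapped result, whether signed overflow occurred)."""
    exact = x + y
    wrapped = (exact + 128) % 256 - 128
    return wrapped, wrapped != exact


a, b, c = 100, -100, 100

# Original scalar order: (a + b) + c never leaves the i8 range.
t1, ovf1 = add_i8(a, b)
r1, ovf2 = add_i8(t1, c)

# Reassociated order, as a reduction may produce: (a + c) + b.
t2, ovf3 = add_i8(a, c)
r2, ovf4 = add_i8(t2, b)

print(r1, ovf1 or ovf2)  # same final value, no overflow
print(r2, ovf3 or ovf4)  # same final value, but an intermediate add overflowed
```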

Differential Revision: https://reviews.llvm.org/D126371


# d3187dd5 25-May-2022 Sanjay Patel <spatel@rotateright.com>

[SLP] add minimum test for miscompile (PR55688); NFC


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 7d6e8f2a 29-Mar-2022 Philip Reames <listmail@philipreames.com>

[slp] Delete dead scalar instructions feeding vectorized instructions

If we vectorize, e.g., a store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493


# 48cc9287 18-Mar-2022 Philip Reames <listmail@philipreames.com>

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)

The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore the difference.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.
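
The idea of scheduling only "the instructions being vectorized and their transitive users" can be sketched as a plain graph traversal. This is a toy Python model under stated assumptions: the function `schedulable_subgraph`, the `users` map and the instruction names are hypothetical and do not reflect the actual SLPVectorizer data structures.

```python
from collections import deque


def schedulable_subgraph(users, seeds):
    """Return the seed instructions plus everything that transitively uses them."""
    seen = set(seeds)
    work = deque(seeds)
    while work:
        inst = work.popleft()
        for user in users.get(inst, ()):
            if user not in seen:
                seen.add(user)
                work.append(user)
    return seen


# Def-use edges inside one scheduling window: instruction -> its users.
users = {
    "load0": ["add0"],
    "load1": ["add1"],
    "add0": ["store0"],
    "add1": ["store1"],
    "x": ["y"],  # unrelated work that merely sits inside the window
    "y": [],
}

sub = schedulable_subgraph(users, ["load0", "load1"])
print(sorted(sub))
```

In this model the unrelated instructions `x` and `y` are never visited, which corresponds to the reduction in calcDeps and schedule calls reported above.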

Differential Revision: https://reviews.llvm.org/D118538


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# deae979a 03-Mar-2022 Philip Reames <listmail@philipreames.com>

Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""

This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.


# 73804271 02-Mar-2022 Philip Reames <listmail@philipreames.com>

Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""

Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore the difference.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538


Revision tags: llvmorg-14.0.0-rc2
# 9c6250ee 01-Mar-2022 Arthur Eubanks <aeubanks@google.com>

Revert "[SLP] Schedule only sub-graph of vectorizable instructions"

This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.


# 0539a26d 22-Feb-2022 Philip Reames <listmail@philipreames.com>

[SLP] Schedule only sub-graph of vectorizable instructions

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP - Number of calcDeps actions
699021 SLP - Number of schedule calls
5598 SLP - Number of ReSchedule actions
59 SLP - Number of ReScheduleOnFail actions
10084 SLP - Number of schedule resets
8523 SLP - Number of vector instructions generated

After this patch:

102895 SLP - Number of calcDeps actions
161916 SLP - Number of schedule calls
5637 SLP - Number of ReSchedule actions
55 SLP - Number of ReScheduleOnFail actions
10083 SLP - Number of schedule resets
8403 SLP - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the ReScheduleOnFail counter change. Given that, I think we can safely ignore the difference.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538


# ea071884 12-Feb-2022 Simon Pilgrim <llvm-dev@redking.me.uk>

[SLP][X86] Add common check prefix for horizontal reduction tests


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init
# a6f02212 23-Jan-2021 Sanjay Patel <spatel@rotateright.com>

[SLP] fix fast-math-flag propagation on FP reductions

As shown in the test diffs, we could miscompile by
propagating flags that did not exist in the original
code.

The flags required for fmin/fmax reductions will be
fixed in a follow-up patch.
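
One conservative way to avoid inventing flags, sketched here in Python, is to keep only the fast-math flags that every scalar operation in the reduction carries. The function `reduction_flags` and the example flag sets are illustrative assumptions; the log does not show how the patch itself computes the flags.

```python
def reduction_flags(op_flags):
    """Return the flags present on every scalar op (empty if there are no ops)."""
    sets = [set(flags) for flags in op_flags]
    if not sets:
        return set()
    return set.intersection(*sets)


# Three scalar fadds feeding one reduction, with mixed fast-math flags.
scalar_ops = [
    {"reassoc", "nsz"},
    {"reassoc", "nsz", "nnan"},
    {"reassoc"},
]

print(sorted(reduction_flags(scalar_ops)))
```

Only `reassoc` survives in this example, so the vector reduction claims nothing that some original scalar operation did not already promise.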


# 39e1e53a 23-Jan-2021 Sanjay Patel <spatel@rotateright.com>

[SLP] add reduction test with mixed fast-math-flags; NFC


Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 3567908d 30-Dec-2020 Sanjay Patel <spatel@rotateright.com>

[SLP] add fadd reduction test to show broken FMF propagation; NFC


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# 08834979 17-Nov-2020 Sanjay Patel <spatel@rotateright.com>

[SLP] avoid unreachable code crash/infloop

Example based on the post-commit comments for D88735.


# d8d1cc64 06-Nov-2020 Florian Hahn <flo@fhahn.com>

[SLP] Also try to vectorize incoming values of PHIs.

Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straight-forward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

Same hash: 185 (filtered out)
Remaining: 52
Metric: SLP.NumVectorInstructions

Program                                         base    patch    diff
test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00    27.00  200.0%
test-suite...C/CFP2000/179.art/179.art.test     8.00    22.00  175.0%
test-suite...T2006/458.sjeng/458.sjeng.test    14.00    30.00  114.3%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00    18.00   63.6%
test-suite...s/FreeBench/neural/neural.test    12.00    18.00   50.0%
test-suite...rimaran/enc-3des/enc-3des.test    65.00    95.00   46.2%
test-suite...006/450.soplex/450.soplex.test    63.00    89.00   41.3%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00   250.00   41.2%
test-suite...nchmarks/McCat/18-imp/imp.test    13.00    18.00   38.5%
test-suite.../Applications/sgefa/sgefa.test    26.00    35.00   34.6%
test-suite...pplications/oggenc/oggenc.test   100.00   133.00   33.0%
test-suite...6/482.sphinx3/482.sphinx3.test   103.00   134.00   30.1%
test-suite...oxyApps-C++/miniFE/miniFE.test   169.00   213.00   26.0%
test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00    73.00   23.7%
test-suite...TimberWolfMC/timberwolfmc.test   503.00   622.00   23.7%
test-suite...T2006/456.hmmer/456.hmmer.test    65.00    79.00   21.5%
test-suite...libquantum/462.libquantum.test    58.00    68.00   17.2%
test-suite...ternal/HMMER/hmmcalibrate.test    84.00    98.00   16.7%
test-suite...ications/JM/ldecod/ldecod.test   351.00   401.00   14.2%
test-suite...arks/VersaBench/dbms/dbms.test    52.00    57.00    9.6%
test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00   128.00    8.5%
test-suite.../Benchmarks/Bullet/bullet.test  6355.00  6880.00    8.3%
test-suite...nsumer-lame/consumer-lame.test   480.00   519.00    8.1%
test-suite...000/183.equake/183.equake.test   226.00   244.00    8.0%
test-suite...chmarks/Olden/power/power.test   105.00   113.00    7.6%
test-suite...6/471.omnetpp/471.omnetpp.test    92.00    99.00    7.6%
test-suite...ications/JM/lencod/lencod.test  1173.00  1261.00    7.5%
test-suite...0/253.perlbmk/253.perlbmk.test    55.00    59.00    7.3%
test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00    98.00    6.5%
test-suite...chmarks/MallocBench/gs/gs.test   446.00   473.00    6.1%
test-suite.../CINT2006/403.gcc/403.gcc.test   464.00   491.00    5.8%
test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00    5.7%
test-suite...006/453.povray/453.povray.test  5711.00  6007.00    5.2%
test-suite...FreeBench/distray/distray.test   102.00   107.00    4.9%
test-suite...:: External/Povray/povray.test  4184.00  4378.00    4.6%
test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00   117.00    4.5%
test-suite...T2006/445.gobmk/445.gobmk.test   104.00   108.00    3.8%
test-suite...CI_Purple/SMG2000/smg2000.test   789.00   819.00    3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test   233.00   241.00    3.4%
test-suite...marks/7zip/7zip-benchmark.test   417.00   428.00    2.6%
test-suite...arks/mafft/pairlocalalign.test   627.00   643.00    2.6%
test-suite.../Benchmarks/nbench/nbench.test   259.00   265.00    2.3%
test-suite...006/447.dealII/447.dealII.test  4641.00  4732.00    2.0%
test-suite...lications/ClamAV/clamscan.test   106.00   108.00    1.9%
test-suite...CFP2000/177.mesa/177.mesa.test  1639.00  1664.00    1.5%
test-suite...oxyApps-C/RSBench/rsbench.test    66.00    65.00   -1.5%
test-suite.../CINT2000/252.eon/252.eon.test  3416.00  3444.00    0.8%
test-suite...CFP2000/188.ammp/188.ammp.test  1846.00  1861.00    0.8%
test-suite.../CINT2000/176.gcc/176.gcc.test   152.00   153.00    0.7%
test-suite...CFP2006/444.namd/444.namd.test  3528.00  3544.00    0.5%
test-suite...T2006/473.astar/473.astar.test    98.00    98.00    0.0%
test-suite...frame_layout/frame_layout.test      NaN    39.00    nan%
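
The diff column appears to be the relative change of the patch value against the base value. A minimal Python sketch under that assumption (the helper `pct_diff` is hypothetical, and the three rows are copied from the table above):

```python
def pct_diff(base: float, patch: float) -> float:
    """Relative change of patch versus base, in percent."""
    return (patch - base) / base * 100.0


# (base, patch) pairs for SLP.NumVectorInstructions, taken from the table.
rows = {
    "HPCCG": (9.0, 27.0),
    "179.art": (8.0, 22.0),
    "rsbench": (66.0, 65.0),
}

for name, (base, patch) in rows.items():
    print(f"{name}: {pct_diff(base, patch):.1f}%")
```

A row with a missing base value, such as frame_layout, has no meaningful relative change, which is consistent with the nan% shown there.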

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

test-suite...T2006/473.astar/473.astar.test 0.9%

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D88735


# 20b386aa 29-Oct-2020 Nikita Popov <nikita.ppv@gmail.com>

[LoopUtils] Fix neutral value for vector.reduce.fadd

Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
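
The difference between the two start values only shows up in the sign of a zero result. A short Python sketch of this, using IEEE 754 doubles (the helpers `is_neg_zero` and `fadd_reduce` are made up for illustration and model an ordered reduction):

```python
import math


def is_neg_zero(x: float) -> bool:
    """Return True only for negative zero."""
    return x == 0.0 and math.copysign(1.0, x) < 0


def fadd_reduce(start: float, values) -> float:
    """Ordered floating-point add reduction beginning at `start`."""
    acc = start
    for v in values:
        acc = acc + v
    return acc


all_neg_zero = [-0.0, -0.0]

# Starting from -0.0 leaves the sign of the result intact.
print(is_neg_zero(fadd_reduce(-0.0, all_neg_zero)))  # True
# Starting from +0.0 turns the result into +0.0.
print(is_neg_zero(fadd_reduce(0.0, all_neg_zero)))   # False
```

With nsz the sign of a zero result is not significant, which is why 0.0 was acceptable for the existing fast-math uses.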


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6
# 322d0afd 03-Oct-2020 Amara Emerson <amara@apple.com>

[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.

This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787


# bb448a24 02-Oct-2020 Florian Hahn <flo@fhahn.com>

[SLP] Add test where reduction result is used in PHI.

Test case for PR47670.

