#
cd0ba9dc |
| 12-Oct-2021 |
Florian Hahn <flo@fhahn.com> |
[LoopPeel] Peel if it turns invariant loads dereferenceable.
This patch adds a new cost heuristic that allows peeling a single iteration off read-only loops, if the loop contains a load that
1.
[LoopPeel] Peel if it turns invariant loads dereferenceable.
This patch adds a new cost heuristic that allows peeling a single iteration off read-only loops, if the loop contains a load that
1. is feeding an exit condition, 2. dominates the latch, 3. is not already known to be dereferenceable, 4. and has a loop invariant address.
If all non-latch exits are terminated with unreachable, such loads in the loop are guaranteed to be dereferenceable after peeling, enabling hoisting/CSE'ing them.
This enables vectorization of loops with certain runtime-checks, like multiple calls to `std::vector::at` if the vector is passed as pointer.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D108114
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
#
1ac209ed |
| 15-Sep-2021 |
Markus Lavin <markus.lavin@ericsson.com> |
[NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes declared with *_WITH_PARAMS macro in PassRegistry.def.
Note
[NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def).
LoopExtractorPass - loop-extract HWAddressSanitizerPass - hwsan EarlyCSEPass - early-cse EntryExitInstrumenterPass - ee-instrument LowerMatrixIntrinsicsPass - lower-matrix-intrinsics LoopUnrollPass - loop-unroll AddressSanitizerPass - asan MemorySanitizerPass - msan SimplifyCFGPass - simplifycfg LoopVectorizePass - loop-vectorize MergedLoadStoreMotionPass - mldst-motion GVN - gvn StackLifetimePrinterPass - print<stack-lifetime> SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
show more ...
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
cc7bcef3 |
| 18-Aug-2021 |
Ali Sedaghati <asedagha@sfu.ca> |
Reapply: [NFC] factor out unrolling decision logic
reverting ffd8a268bdc518f87e9ba7524aba0458f4b9979c (reapplying 4d559837e887c278d7c27274f4f6b1b78b97c00d) - removed spurious inclusion of <optional>
Reapply: [NFC] factor out unrolling decision logic
reverting ffd8a268bdc518f87e9ba7524aba0458f4b9979c (reapplying 4d559837e887c278d7c27274f4f6b1b78b97c00d) - removed spurious inclusion of <optional>
Differential Revision: https://reviews.llvm.org/D106001
show more ...
|
#
ffd8a268 |
| 18-Aug-2021 |
Geoffrey Martin-Noble <gcmn@google.com> |
Revert "[NFC] factor out unrolling decision logic"
This patch added a requirement for C++17, while LLVM is supposed to build with C++14 (https://llvm.org/docs/CodingStandards.html#c-standard-version
Revert "[NFC] factor out unrolling decision logic"
This patch added a requirement for C++17, while LLVM is supposed to build with C++14 (https://llvm.org/docs/CodingStandards.html#c-standard-versions). Posted a note to the original review thread (https://reviews.llvm.org/D106001).
This reverts commit 4d559837e887c278d7c27274f4f6b1b78b97c00d.
Differential Revision: https://reviews.llvm.org/D108314
show more ...
|
#
4d559837 |
| 18-Aug-2021 |
Ali Sedaghati <asedagha@sfu.ca> |
[NFC] factor out unrolling decision logic
Decoupling the unrolling logic into three different functions. The shouldPragmaUnroll() covers the 1st and 2nd priorities of the previous code, the shouldFu
[NFC] factor out unrolling decision logic
Decoupling the unrolling logic into three different functions. The shouldPragmaUnroll() covers the 1st and 2nd priorities of the previous code, the shouldFullUnroll() covers the 3rd, and the shouldPartialUnroll() covers the 5th. The output of each function, Optional<unsigned>, could be a value for UP.Count, which means unrolling factor has been set, or None, which means decision hasn't been made yet and should try the next priority.
Reviewed By: mtrofin, jdoerfert
Differential Revision: https://reviews.llvm.org/D106001
show more ...
|
Revision tags: llvmorg-13.0.0-rc1 |
|
#
6f6e9a86 |
| 02-Aug-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this, but the situation is
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
show more ...
|
#
c7770574 |
| 02-Aug-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
Revert "[unroll] Move multiple exit costing into consumer pass [NFC]"
This reverts commit 76940577e4bf9c63a8a4ebd32b556bd7feb8cad3.
This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.
|
#
76940577 |
| 02-Aug-2021 |
Philip Reames <listmail@philipreames.com> |
[unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the o
[unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
show more ...
|
Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
#
862313cf |
| 19-Jun-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to modify them in computeUnrollCount(). Make them non-
[LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to modify them in computeUnrollCount(). Make them non-reference parameters.
Differential Revision: https://reviews.llvm.org/D104590
show more ...
|
#
1ae266f4 |
| 19-Jun-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this
[LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as priviledged when it comes to full unrolling.
The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches.
Differential Revision: https://reviews.llvm.org/D102982
show more ...
|
#
1bd4085e |
| 17-Jun-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides itself whether runtime unrolling should be used or not. This
[LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides itself whether runtime unrolling should be used or not. This patch pushes the decision into the caller and allows us to eliminate the ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
show more ...
|
Revision tags: llvmorg-12.0.1-rc2 |
|
#
db457468 |
| 29-May-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that un
[LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification.
When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939.
This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a susequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden.
In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop.
Differential Revision: https://reviews.llvm.org/D103362
show more ...
|
#
5c0d1b2f |
| 03-Jun-2021 |
Philip Reames <listmail@philipreames.com> |
[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing whe
[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.
The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.
Differential Revision: https://reviews.llvm.org/D103620
show more ...
|
#
44d70d29 |
| 03-Jun-2021 |
Philip Reames <listmail@philipreames.com> |
[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc]
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since
[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc]
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.
Differential Revision: https://reviews.llvm.org/D103584
show more ...
|
#
9cc2181e |
| 26-May-2021 |
Philip Reames <listmail@philipreames.com> |
[unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently si
[unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
show more ...
|
Revision tags: llvmorg-12.0.1-rc1 |
|
#
6ce7b2f0 |
| 15-May-2021 |
Vitaly Buka <vitalybuka@google.com> |
Fix "is not used" warning
|
#
fcd12fed |
| 15-May-2021 |
Philip Reames <listmail@philipreames.com> |
Extract a helper routine to simplify D91481 [NFC]
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
75b2555d |
| 04-Feb-2021 |
Sander de Smalen <sander.desmalen@arm.com> |
NFC: Migrate LoopUnrollPass to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any Instruct
NFC: Migrate LoopUnrollPass to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Reviewed By: david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D95817
show more ...
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init |
|
#
35b3989a |
| 26-Jan-2021 |
Florian Hahn <flo@fhahn.com> |
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to f
[Passes] Run peeling as part of simple/full loop unrolling.
Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling.
For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well.
This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example
Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized
Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0%
Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed
Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0%
Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88471
show more ...
|
Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
33bf1cad |
| 08-Jan-2021 |
Kazu Hirata <kazu@google.com> |
[llvm] Use *Set::contains (NFC)
|
#
cf5415c7 |
| 06-Jan-2021 |
Hiroshi Yamauchi <yamauchi@google.com> |
[PGO][PGSO] Let unroll hints take precedence over PGSO.
Differential Revision: https://reviews.llvm.org/D94199
|
#
76f6b125 |
| 07-Jan-2021 |
Oliver Stannard <oliver.stannard@linaro.org> |
Revert "[llvm] Use BasicBlock::phis() (NFC)"
Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140.
This reverts commit 9b228
Revert "[llvm] Use BasicBlock::phis() (NFC)"
Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140.
This reverts commit 9b228f107d43341ef73af92865f73a9a076c5a76.
show more ...
|
#
9b228f10 |
| 07-Jan-2021 |
Kazu Hirata <kazu@google.com> |
[llvm] Use BasicBlock::phis() (NFC)
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
60b19424 |
| 17-Oct-2020 |
Pedro Tammela <pctammela@gmail.com> |
[NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments
Differential Revision: https://reviews.llvm.org/D89603
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
#
89c1e35f |
| 22-Sep-2020 |
Stefanos Baziotis <sdi1600105@di.uoa.gr> |
[LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
|