Revision tags: llvmorg-4.0.0-rc3 |
|
#
c2af82b4 |
| 22-Feb-2017 |
Michael Kuperstein <mkuper@google.com> |
[LoopUnroll] Enable PGO-based loop peeling by default.
This enables peeling of loops with low dynamic iteration count by default, when profile information is available.
Differential Revision: https://reviews.llvm.org/D27734
llvm-svn: 295796
|
#
7d230325 |
| 18-Feb-2017 |
Dehao Chen <dehao@google.com> |
Increases full-unroll threshold.
Summary: The default threshold for full unrolling is too conservative. This patch doubles the full-unroll threshold.
This change affects the following speccpu2006 benchmarks (performance numbers were collected on Intel Sandybridge):
Benchmark  Performance  Code size
403        0.11%        0.56%
433        0.51%        0.96%
445        0.48%        2.16%
447        3.50%        2.96%
453        1.49%        0.94%
464        0.75%        8.02%
The compile-time overhead is similar to the code-size overhead.
Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc
Reviewed By: hfinkel, chandlerc
Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D28368
llvm-svn: 295538
|
Revision tags: llvmorg-4.0.0-rc2 |
|
#
eab3b90a |
| 26-Jan-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Simplify the new PM interface to the loop unroller and expose two factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling.
I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline.
Differential Revision: https://reviews.llvm.org/D28897
llvm-svn: 293136
|
#
5dd55e84 |
| 26-Jan-2017 |
Michael Kuperstein <mkuper@google.com> |
[LoopUnroll] Properly update loopinfo for runtime unrolling by 2
Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact that the remainder may itself be inserted either into an outer loop or at the top level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
|
#
ce40fa13 |
| 25-Jan-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls loops.
We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane.
Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
|
Revision tags: llvmorg-4.0.0-rc1 |
|
#
c3f87f02 |
| 17-Jan-2017 |
Dehao Chen <dehao@google.com> |
Introduce -unroll-partial-threshold to separate PartialThreshold from Threshold in the loop unroller.
Summary: Partial unrolling should have a threshold separate from that of full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
|
#
ca68a3ec |
| 15-Jan-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Introduce an analysis set used to preserve all analyses over a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG and analysis passes to check for the CFG being preserved to remove the fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
|
#
3bab7e1a |
| 11-Jan-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Separate the LoopAnalysisManager from the LoopPassManager and move the latter to the Transforms library.
While the loop PM uses an analysis to form the IR units, the current plan is to have the PM itself establish and enforce both loop simplified form and LCSSA. This would be a layering violation in the analysis library.
Fundamentally, the idea behind the loop PM is to *transform* loops in addition to running passes over them, so it really seemed like the most natural place to sink this was into the transforms library.
We can't just move *everything* because we also have loop analyses that rely on a subset of the invariants. So this patch splits the loop infrastructure into the analysis management that has to be part of the analysis library, and the transform-aware pass manager.
This also required splitting the loop analyses' printer passes out to the transforms library, which makes sense to me as running these will transform the code into LCSSA in theory.
I haven't split the unittest though because testing one component without the other seems nearly intractable.
Differential Revision: https://reviews.llvm.org/D28452
llvm-svn: 291662
|
#
410eaeb0 |
| 11-Jan-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Rewrite the loop pass manager to use a worklist and augmented run arguments much like the CGSCC pass manager.
This is a major redesign following the pattern established for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses.
An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unwieldy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it.
This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them.
While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact.
I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place.
Differential Revision: https://reviews.llvm.org/D28292
llvm-svn: 291651
|
#
cc76344e |
| 30-Dec-2016 |
Dehao Chen <dehao@google.com> |
Use continuous boosting factor for complete unroll.
Summary: The current complete-unroll algorithm checks whether completely unrolling a loop will reduce the runtime by a certain percentage. If so, it applies a fixed boosting factor to the threshold (by discounting cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of the runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
The patch also simplifies the code by removing one parameter from UP.
The patch only affects code generation for two speccpu2006 benchmarks:
445.gobmk: binary size decreases 0.08%, no performance change.
464.h264ref: binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
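The difference between a fixed and a continuous boost can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the message says the factor is "a function of runtime reduction percentage, capped by a fixed threshold" but does not give the formula, so the ratio used here, the 400% cap, the lower clamp at 100%, and the names boostPercent, boostedThreshold, and kMaxPercentBoost are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical cap on the boost, in percent (400 means at most 4x).
constexpr std::uint64_t kMaxPercentBoost = 400;

// Returns a boost in percent, in the range [100, kMaxPercentBoost].
// Assumption: the boost is the ratio of the rolled cost to the unrolled
// cost, so a larger estimated saving gives a proportionally larger boost.
std::uint64_t boostPercent(std::uint64_t rolledCost,
                           std::uint64_t unrolledCost) {
  // Guard against division by zero and overflow in 100 * rolledCost.
  if (unrolledCost == 0 || rolledCost > UINT64_MAX / 100)
    return kMaxPercentBoost;
  const std::uint64_t ratio = 100 * rolledCost / unrolledCost;
  // Clamp: never shrink the threshold, never exceed the cap.
  return std::min(std::max<std::uint64_t>(ratio, 100), kMaxPercentBoost);
}

// Scales a base threshold by the continuous boost.
std::uint64_t boostedThreshold(std::uint64_t baseThreshold,
                               std::uint64_t rolledCost,
                               std::uint64_t unrolledCost) {
  return baseThreshold * boostPercent(rolledCost, unrolledCost) / 100;
}
```

With a fixed factor, a loop just below the percentage cutoff gets no boost and a loop just above it gets the full boost. In this sketch, nearby cost ratios always produce nearby thresholds until the cap is reached.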
|
#
aec2fa35 |
| 19-Dec-2016 |
Daniel Jasper <djasper@google.com> |
Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in r289755's commit thread).
llvm-svn: 290086
|
#
3ca4a6bc |
| 15-Dec-2016 |
Hal Finkel <hfinkel@anl.gov> |
Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code...
llvm-svn: 289756
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3 |
|
#
c3be2258 |
| 02-Dec-2016 |
Dehao Chen <dehao@google.com> |
Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC)
llvm-svn: 288463
|
Revision tags: llvmorg-3.9.1-rc2 |
|
#
b151a641 |
| 30-Nov-2016 |
Michael Kuperstein <mkuper@google.com> |
[LoopUnroll] Implement profile-based loop peeling
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop.
This is currently disabled by default.
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
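The peeling idea described in this message can be illustrated by hand in C++. This is a minimal sketch, not the transformation LLVM performs on IR: the function names sumRolled and sumPeeled and the peel count of 2 are assumptions chosen for illustration. Each peeled copy keeps its loop test, as the message notes, but becomes straight-line code that later passes can simplify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: sums the first n elements of v.
int sumRolled(const std::vector<int> &v, std::size_t n) {
  assert(n <= v.size());
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += v[i];
  return sum;
}

// Hand-peeled equivalent with a hypothetical peel count of 2.
int sumPeeled(const std::vector<int> &v, std::size_t n) {
  assert(n <= v.size());
  int sum = 0;
  std::size_t i = 0;
  // Peeled iteration 0: the loop test is still needed.
  if (i < n) {
    sum += v[i];
    ++i;
    // Peeled iteration 1.
    if (i < n) {
      sum += v[i];
      ++i;
      // Residual loop: only reached when the trip count exceeds 2.
      for (; i < n; ++i)
        sum += v[i];
    }
  }
  return sum;
}
```

If the profile says the trip count is usually 1 or 2, most executions stay in the peeled copies and never enter the residual loop.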
|
Revision tags: llvmorg-3.9.1-rc1 |
|
#
731b04ca |
| 23-Nov-2016 |
Haicheng Wu <haicheng@codeaurora.org> |
[LoopUnroll] Move code to exit early. NFC.
Just to save some compilation time.
Differential Revision: https://reviews.llvm.org/D26784
llvm-svn: 287800
|
#
41d72a86 |
| 17-Nov-2016 |
Dehao Chen <dehao@google.com> |
Use profile info to adjust loop unroll threshold.
Summary: For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold. For a hot loop, we set a higher unroll threshold and allow expensive trip count computation to enable more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
llvm-svn: 287186
|
#
c2698cd9 |
| 09-Nov-2016 |
Evgeny Stupachenko <evstupac@gmail.com> |
Minor unroll pass refactoring.
Summary: Unrolled Loop Size calculations moved to a function. Constant representing number of optimized instructions when "back edge" becomes "fall through" replaced with variable. Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
|
#
430b3e48 |
| 27-Oct-2016 |
Haicheng Wu <haicheng@codeaurora.org> |
[LoopUnroll] Check partial unrolling is enabled before initialization. NFC.
Differential Revision: https://reviews.llvm.org/D23891
llvm-svn: 285330
|
#
cffedc4a |
| 25-Oct-2016 |
Michael Kuperstein <mkuper@google.com> |
Fix 80-char violations. NFC.
llvm-svn: 285092
|
#
84b21835 |
| 21-Oct-2016 |
John Brawn <john.brawn@arm.com> |
[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops
When we have a loop with a known upper bound on the number of iterations, and furthermore know that the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound, keeping only the first loop test to check for the zero-iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
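A hand-written C++ analogue may make the max-or-zero case concrete. This is a sketch under assumptions: the loop shape, the bound of 4, and the names sumMaxOrZeroRolled and sumMaxOrZeroUnrolled are invented for illustration, and the example is not claimed to match the patterns scalar evolution actually recognizes.

```cpp
#include <cassert>

// The trip count is either exactly 4 or exactly 0, never in between.
int sumMaxOrZeroRolled(const int *a, bool enabled) {
  assert(a != nullptr);
  const int n = enabled ? 4 : 0;
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

// Fully unrolled by the upper bound. Only the first loop test survives,
// because passing it once implies that all four iterations run.
int sumMaxOrZeroUnrolled(const int *a, bool enabled) {
  assert(a != nullptr);
  const int n = enabled ? 4 : 0;
  int sum = 0;
  if (0 < n) { // the single remaining test: the zero-iteration check
    sum += a[0];
    sum += a[1];
    sum += a[2];
    sum += a[3];
  }
  return sum;
}
```

Without the max-or-zero knowledge, each unrolled copy would need its own exit test, since the loop might stop after any iteration.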
|
#
1ef17e90 |
| 12-Oct-2016 |
Haicheng Wu <haicheng@codeaurora.org> |
Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop"
Reapply r284044 after the revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops that contain a break statement, like this:
for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } }
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count.
llvm-svn: 284053
|
#
45e4ef73 |
| 12-Oct-2016 |
Haicheng Wu <haicheng@codeaurora.org> |
Revert "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop"
This reverts commit r284044.
llvm-svn: 284051
|
#
6cac34fd |
| 12-Oct-2016 |
Haicheng Wu <haicheng@codeaurora.org> |
[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop
This patch tries to fully unroll loops that contain a break statement, like this:
for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } }
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
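For illustration, here is what fully unrolling the search loop from this message by its upper bound of 8 looks like when written out by hand in C++. This is a sketch, not compiler output: the function names containsRolled and containsUnrolled and the use of std::array are assumptions. Because the exact trip count is unknown, every unrolled copy keeps its early exit.

```cpp
#include <array>

// The loop from the commit message: at most 8 iterations, but the exact
// trip count depends on where (and whether) value occurs.
bool containsRolled(const std::array<int, 8> &a, int value) {
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Fully unrolled using the upper bound. The induction variable and the
// i < 8 test disappear; each copy retains the data-dependent exit.
bool containsUnrolled(const std::array<int, 8> &a, int value) {
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}
```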
|
#
977853b7 |
| 30-Sep-2016 |
Dehao Chen <dehao@google.com> |
Update loop unroller cost model to make sure debug info does not affect optimization decisions.
Summary: Debug info should *not* affect optimization decisions. This patch updates the loop unroller cost model so that it is not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
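The principle can be shown with a toy size estimate in C++. This sketch does not use LLVM's IR classes: ToyInst, its fields, and loopSize are hypothetical stand-ins, and the only point is that debug-only entries are skipped so that a build with debug info gets the same size estimate, and hence the same unrolling decision, as a build without it.

```cpp
#include <vector>

// Toy instruction model (hypothetical, not an LLVM type).
struct ToyInst {
  bool isDebugInfo; // true for debug-only pseudo-instructions
  unsigned cost;    // size contribution of a real instruction
};

// Estimates the loop body size while ignoring debug-only entries.
unsigned loopSize(const std::vector<ToyInst> &body) {
  unsigned size = 0;
  for (const ToyInst &inst : body) {
    if (inst.isDebugInfo)
      continue; // debug info must not influence the estimate
    size += inst.cost;
  }
  return size;
}
```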
|
#
f57cc62a |
| 30-Sep-2016 |
Adam Nemet <anemet@apple.com> |
[LoopUnroll] Port to the new streaming interface for opt remarks.
llvm-svn: 282834
|