LoopUnrollPass.cpp - OpenGrok history log for /llvm-project/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

Revision	Date	Author	Comments
# 58c5a7f5	28-Sep-2016	Jonas Paulsson <paulsson@linux.vnet.ibm.com>	[SystemZ] Implementation of getUnrollingPreferences(). This commit enables more unrolling for SystemZ by implementing the SystemZTargetTransformInfo::getUnrollingPreferences() method. It has been found that it is better to only unroll moderately, so the DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order to set this to a lower value for SystemZ (4). Reviewers: Evgeny Stupachenko, Ulrich Weigand. https://reviews.llvm.org/D24451 llvm-svn: 282570
# 109f4f35	07-Sep-2016	Haicheng Wu <haicheng@codeaurora.org>	[LoopUnroll] Correct a debug message. NFC. Differential Revision: https://reviews.llvm.org/D24299 llvm-svn: 280865
# 4f155b6e	26-Aug-2016	Adam Nemet <anemet@apple.com>	[LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass We can't mark ORE (a function pass) preserved as required by the loop passes because that is how we ensure that the required passes like LazyBFI are all available any time ORE is used. See the new comments in the patch. Instead we use it directly just like the inliner does in D22694. As expected there is some additional overhead after removing the caching provided by analysis passes. The worst case I measured was LNT/CINT2006_ref/401.bzip2, which regresses by 12%. As before, this only affects -Rpass-with-hotness and not default compilation. llvm-svn: 279829
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3
# bd63d436	23-Aug-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[LoopUnroll] By default disable unrolling when optimizing for size. Summary: In clang commit r268509 we started to invoke loop-unroll pass from the driver even under -Os. However, we happen to not initialize optsize thresholds properly, which is fixed with this change. r268509 led to some big compile time regressions, because we started to unroll some loops that we didn't unroll before. With this change I hope to recover most of the regressions. We still are slightly slower than before, because we do some checks here and there in loop-unrolling before we bail out, but at least the slowdown is not that huge now. Reviewers: hfinkel, chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23388 llvm-svn: 279585
Revision tags: llvmorg-3.9.0-rc2
# e7877632	17-Aug-2016	Haicheng Wu <haicheng@codeaurora.org>	[LoopUnroll] Move a simple check earlier. NFC. Move the check of CallInst earlier to skip expensive recursive operations. Differential Revision: https://reviews.llvm.org/D23611 llvm-svn: 278998
# 0746f3bf	09-Aug-2016	Sean Silva <chisophugis@gmail.com>	Consistently use LoopAnalysisManager One exception here is LoopInfo which must forward-declare it (because the typedef is in LoopPassManager.h which depends on LoopInfo). Also, some includes for LoopPassManager.h were needed since that file provides the typedef. Besides a general consistency benefit, the extra layer of indirection allows the mechanical part of https://reviews.llvm.org/D23256 that requires touching every transformation and analysis to be factored out cleanly. Thanks to David for the suggestion. llvm-svn: 278079
Revision tags: llvmorg-3.9.0-rc1
# 12937c36	29-Jul-2016	Adam Nemet <anemet@apple.com>	[LoopUnroll] Include hotness of region in opt remark LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline (lines marked + are added): Dominator Tree Construction; Natural Loop Information; Canonicalize natural loops; Loop-Closed SSA Form Pass; Basic Alias Analysis (stateless AA impl); Function Alias Analysis Results; Scalar Evolution Analysis; + Lazy Branch Probability Analysis; + Lazy Block Frequency Analysis; + Optimization Remark Emitter; Loop Pass Manager; Rotate Loops; Loop Invariant Code Motion; Unswitch loops; Simplify the CFG; Dominator Tree Construction; Basic Alias Analysis (stateless AA impl); Function Alias Analysis Results; Combine redundant instructions; Natural Loop Information; Canonicalize natural loops; Loop-Closed SSA Form Pass; Scalar Evolution Analysis; + Lazy Branch Probability Analysis; + Lazy Block Frequency Analysis; + Optimization Remark Emitter; Loop Pass Manager; Induction Variable Simplification; Recognize loop idioms; Delete dead loops; Unroll loops; ... llvm-svn: 277203
# e3c18a5a	19-Jul-2016	Sean Silva <chisophugis@gmail.com>	[PM] Port LoopUnroll. We just set PreserveLCSSA to always true since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`. Also port LoopInfo verifier pass to test LoopUnrollPass. llvm-svn: 276063
# 4a697c31	15-Jun-2016	David Majnemer <david.majnemer@gmail.com>	[LoopUnroll] Don't crash trying to unroll loop with EH pad exit We do not support splitting cleanuppad or catchswitches. This is problematic for passes which assume that a loop is in loop simplify form (the loop would have a dedicated exit block instead of sharing it). While it isn't great that we don't support this for cleanups, we still cannot make loop-simplify form an assertable precondition because indirectbr will also disable these sorts of CFG cleanups. This fixes PR28132. llvm-svn: 272739
# 3e2f389a	08-Jun-2016	Evgeny Stupachenko <evstupac@gmail.com>	The patch set unroll disable pragma when unroll with user specified count has been applied. Summary: Previously SetLoopAlreadyUnrolled() set the disable pragma only if there was some loop metadata. Now it sets the pragma in all cases. This helps to prevent multiple unrolling when -unroll-count=N is given. Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D20765 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 272195
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1
# 58564989	03-Jun-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[LoopUnroll] Set correct thresholds for new recently enabled unrolling heuristic. In r270478, where I enabled the new heuristic, I posted testing results which I got when I explicitly passed the threshold values via CL options. However, setting the CL options' init-values is not enough to change the default values of thresholds, so I'm changing them in another place now. llvm-svn: 271615
# b787522d	28-May-2016	Evgeny Stupachenko <evstupac@gmail.com>	The patch fixes r271071 Summary: unused variables in Release mode: BasicBlock *Header unsigned OrigCount put under DEBUG From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271076
# ea2aef4a	27-May-2016	Evgeny Stupachenko <evstupac@gmail.com>	The patch refactors unroll pass. Summary: Unroll factor (Count) calculations moved to a new function. Early exits on pragma and "-unroll-count" defined factor added. New type of unrolling "Force" introduced (previously used implicitly). New unroll preference "AllowRemainder" introduced and set "true" by default (should be set to false for architectures that suffer from it). Reviewers: hfinkel, mzolotukhin, zzheng Differential Revision: http://reviews.llvm.org/D19553 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271071
# 82de7d32	27-May-2016	Benjamin Kramer <benny.kra@googlemail.com>	Apply clang-tidy's misc-move-constructor-init throughout LLVM. No functionality change intended, maybe a tiny performance improvement. llvm-svn: 270997
# 1ecdedad	26-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[LoopUnrollAnalyzer] Fix a crash in analyzeLoopUnrollCost. Condition might be simplified to a Constant, but it doesn't have to be ConstantInt, so we should dyn_cast, instead of cast. This fixes PR27886. llvm-svn: 270924
# 8f7a242c	24-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one more time. This reverts commit r270577. llvm-svn: 270630
# b64e4390	24-May-2016	Hans Wennborg <hans@hanshq.net>	Revert r270518, which re-enabled "[LoopUnroll] Enable advanced unrolling analysis by default." Chromium builds are still hitting the assert in PR27874. llvm-svn: 270577
# 96c150d1	24-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	Revert "Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default."" This reverts commit r270512 and reapplies r270478. Originally it caused PR27847, but it was fixed in r270517. llvm-svn: 270518
# 6951028b	23-May-2016	Hans Wennborg <hans@hanshq.net>	Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default." This caused PR27847. llvm-svn: 270512
# be080fc5	23-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[LoopUnroll] Enable advanced unrolling analysis by default. Summary: This patch turns on LoopUnrollAnalyzer by default. To mitigate compile time regressions, I chose very conservative thresholds for now. Later we can make them more aggressive, but it might require being smarter in which loops we're optimizing. E.g. currently the biggest issue is that with more aggressive thresholds we unroll many cold loops, which increases compile time for no performance benefit (performance of those loops is improved, but it doesn't matter since they are cold). Test results for compile time (using 4 samples to reduce noise): ``` MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19% SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect 4.19% MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow 3.39% MultiSource/Applications/JM/lencod/lencod 1.47% MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06% ``` I didn't see any performance changes in the testsuite, but it improves some internal tests. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20482 llvm-svn: 270478
# d2268a73	18-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[LoopUnrollAnalyzer] Take into account cost of instructions controlling branches, along with their operands. Previously, we didn't add their cost or their operands' cost, which could've resulted in unrolling loops for no actual benefit. llvm-svn: 269985
# 963a6d9c	13-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."" This reverts commit r269395. Try to reapply with a fix from chapuni. llvm-svn: 269486
# 9be3b8b9	13-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..." This reverts commit r269388. It caused some bots to fail, I'm reverting it until I investigate the issue. llvm-svn: 269395
# b7b80529	13-May-2016	Michael Zolotukhin <mzolotukhin@apple.com>	[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the... Summary: ...loop after the last iteration. This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes. This patch implements this by storing the forward-propagating analysis of each instruction in a cache to recall whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice -- once when we analyze them and put them into the cache, and once when we count their cost towards the unrolled loop. Also, because the cache is only two bits and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to keep this from becoming an excessively slow analysis. The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up; I've stared too long at the algorithmic side to really have a good sense of what the design should probably look like. Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. Specifically, the simplification power is much greater working top-down. We can forward propagate very effectively, even across strange and interesting recurrences around the backedge. Because we use data to propagate, this doesn't cause a state space explosion. Doing this level of constant folding, etc, would be very expensive to do bottom-up because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with a bottom-up cost accounting which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we know it is needed. Finally, a core property of this approach is its monotonicity. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when, had we continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard to track down random changes to behavior. We could use a technique similar (but much simpler) within the inliner as well to avoid considering speculated code in the inline cost. Reviewers: chandlerc Subscribers: sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D11758 llvm-svn: 269388
# 719b26ba	10-May-2016	Hans Wennborg <hans@hanshq.net>	Loop unroller: set thresholds for optsize and minsize functions to zero Before r268509, Clang would disable the loop unroll pass when optimizing for size. That commit enabled it to be able to support unroll pragmas in -Os builds. However, this regressed binary size in one of Chromium's DLLs by ~100 KB. This restores the original behaviour of no unrolling at -Os, but doing it in LLVM instead of Clang makes more sense, and also allows the pragmas to keep working. Differential revision: http://reviews.llvm.org/D20115 llvm-svn: 269124