# b7b80529 | 13-May-2016 | Michael Zolotukhin <mzolotukhin@apple.com>
[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the loop after the last iteration.
Summary:
This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis of each instruction in a cache, recording whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice: once when we analyze them and put them into the cache, and once when we count their cost toward the unrolled loop. Also, because the cache is only two bits per entry and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to keep this from becoming an excessively slow analysis.
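To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the idea (not the actual LLVM code; the class and member names are illustrative): two bits of state per (instruction, iteration) pair stored densely, a forward pass that records free instructions, and a backward walk from each sink that counts every live, non-free instruction exactly once.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch only: two bits of state per (instruction, iteration)
// pair, packed into a dense array indexed by iteration and instruction.
enum : uint8_t { IsFree = 1, IsCounted = 2 };

struct UnrollCostTracker {
  unsigned NumInsts, NumIters;
  std::vector<uint8_t> State; // two bits per entry; a byte here for clarity

  UnrollCostTracker(unsigned Insts, unsigned Iters)
      : NumInsts(Insts), NumIters(Iters), State(Insts * Iters, 0) {}

  uint8_t &at(unsigned Inst, unsigned Iter) {
    return State[Iter * NumInsts + Inst];
  }

  // The forward pass records whether the simplifier proved an instruction
  // free on a given iteration.
  void recordFree(unsigned Inst, unsigned Iter) { at(Inst, Iter) |= IsFree; }

  // At each sink (side effect or value live after the loop), walk the
  // operand graph backwards and count each non-free instruction once.
  // A real implementation would redirect the walk to Iter - 1 when it
  // crosses a PHI's backedge operand.
  unsigned countLive(unsigned Inst, unsigned Iter,
                     const std::vector<std::vector<unsigned>> &Operands) {
    uint8_t &S = at(Inst, Iter);
    if (S & IsCounted)
      return 0; // already visited: at most two touches per instruction
    S |= IsCounted;
    unsigned Cost = (S & IsFree) ? 0 : 1;
    for (unsigned Op : Operands[Inst])
      Cost += countLive(Op, Iter, Operands);
    return Cost;
  }
};
```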
The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up; I've stared too long at the algorithmic side to really have a good sense of what the design should probably look like.
Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. Specifically, the simplification power is *much* greater working top-down. We can forward-propagate very effectively, even across strange and interesting recurrences around the backedge. Because we use data to propagate, this doesn't cause a state-space explosion. Doing this level of constant folding, etc., bottom-up would be very expensive because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with bottom-up cost accounting, which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we *know* it is needed.
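As a toy illustration of why the top-down direction is powerful (purely illustrative, not the analyzer's code): once the value of an induction PHI is known at each simulated iteration, branches on it fold to constants iteration by iteration, something a bottom-up pass could only discover after collapsing the whole graph.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Purely illustrative: simulate the loop top-down, carrying known constant
// values across the backedge so each iteration can fold independently.
int main() {
  std::map<std::string, int> Known; // stand-in for a simplified-values map
  Known["x"] = 0;                   // the induction PHI's initial value
  for (int Iter = 0; Iter < 4; ++Iter) {
    int X = Known["x"];
    // A branch like `if (x < 2)` folds to a constant on every iteration
    // once x's value is known, so the code it guards is dead on every
    // iteration where the condition is false.
    bool Taken = X < 2;
    std::printf("iteration %d: branch folds to %s\n", Iter,
                Taken ? "taken" : "not taken");
    Known["x"] = X + 1;             // propagate through the recurrence
  }
  return 0;
}
```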
Finally, a core property of this approach is its *monotonicity*. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when, had we continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard-to-track-down random changes in behavior.
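A minimal sketch of the invariant (illustrative names, not the pass's code): because the running estimate only ever increases, crossing the threshold is always a safe point to give up.

```cpp
#include <vector>

// `Costs` stands in for the per-instruction costs discovered by the walk.
// The running estimate is monotone: we only ever add, never subtract, so
// an early exit on the threshold can never be premature.
bool fitsUnderThreshold(const std::vector<unsigned> &Costs,
                        unsigned Threshold) {
  unsigned UnrolledCost = 0; // conservatively low at all times
  for (unsigned C : Costs) {
    UnrolledCost += C;
    if (UnrolledCost > Threshold)
      return false; // safe: the true cost is at least this large
  }
  return true;
}
```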
We could use a similar (but much simpler) technique within the inliner as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
# a59a308e | 13-May-2016 | Michael Zolotukhin <mzolotukhin@apple.com>
[LoopUnrollAnalyzer] Don't treat gep-instructions with simplified offset as simplified.
Summary: Currently we consider such instructions as simplified, which is incorrect, because if their user isn't simplified, we can't actually simplify them either. This biases our estimates of profitability: for instance, the analyzer expects much greater gains from unrolling memcpy loops than there actually are.
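A toy sketch of the distinction (all names hypothetical, not the analyzer's API): a GEP whose offset folds to a constant still costs an address computation unless its users fold away as well, so marking it simplified on the basis of the offset alone over-counts the savings.

```cpp
// Hypothetical model, not LLVM code: a GEP only becomes free after
// unrolling when its users simplify away too; a constant-folded offset
// by itself still leaves a real address computation behind.
struct GEPSummary {
  bool OffsetFoldsToConstant;
  bool AllUsersSimplified;
};

bool gepIsSimplified(const GEPSummary &G) {
  return G.OffsetFoldsToConstant && G.AllUsersSimplified;
}
```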
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17365
llvm-svn: 269387
# 9f520ebc | 26-Feb-2016 | Michael Zolotukhin <mzolotukhin@apple.com>
[LoopUnrollAnalyzer] Check that we're using SCEV for the same loop we're simulating.
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for the IVs of inner or outer loops, which is clearly incorrect.
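A minimal sketch of the guard (the structs here are stand-ins; in LLVM the loop a recurrence varies in is available via SCEVAddRecExpr::getLoop()): the simulated iteration number may only be substituted into recurrences that evolve in the loop being simulated.

```cpp
// Stand-ins for illustration; in LLVM this check would compare
// SCEVAddRecExpr::getLoop() against the loop being simulated.
struct Loop {};
struct AddRec {
  const Loop *L; // the loop this recurrence evolves in
};

// Substituting the simulated iteration number is only valid for
// recurrences belonging to the simulated loop itself; the IVs of inner
// or outer loops must be left untouched.
bool canSubstituteIteration(const AddRec &AR, const Loop *SimulatedLoop) {
  return AR.L == SimulatedLoop;
}
```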
Reviewers: chandlerc, hfinkel
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17632
llvm-svn: 261958
# 1da4afdf | 08-Feb-2016 | Michael Zolotukhin <mzolotukhin@apple.com>
Factor out UnrollAnalyzer to Analysis, and add unit tests for it.
Summary: The unrolling analyzer is already pretty complicated, and it is becoming harder and harder to exercise with the usual IR tests, since those can only check the final decision: whether the loop is unrolled or not. This change factors the framework out of LoopUnrollPass into Analysis, which makes it possible to cover it with unit tests. The change itself is supposed to be NFC, apart from adding a couple of tests.
I plan to add more tests as I add new functionality and find/fix bugs.
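A hypothetical shape for such a unit test (using a toy stand-in for the analyzer; the fixture and helper names are invented, not LLVM's): the point is that a test can now assert on per-instruction simplification results rather than only on the pass's final unroll decision.

```cpp
#include <gtest/gtest.h>
#include <map>
#include <string>

// Toy stand-in for the real analyzer, to show the kind of querying the
// refactoring enables. All names here are illustrative.
struct ToyUnrollAnalyzer {
  std::map<std::string, bool> Simplified;
  bool isSimplified(const std::string &Inst) const {
    auto It = Simplified.find(Inst);
    return It != Simplified.end() && It->second;
  }
};

TEST(UnrollAnalyzerTest, QueriesPerInstructionResults) {
  ToyUnrollAnalyzer A;
  A.Simplified["%cmp"] = true; // pretend the analyzer folded the compare
  EXPECT_TRUE(A.isSimplified("%cmp"));
  EXPECT_FALSE(A.isSimplified("%load"));
}
```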
Reviewers: chandlerc, hfinkel, sanjoy
Subscribers: zzheng, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16623
llvm-svn: 260169