Revision tags: llvmorg-3.6.1 |
|
#
a6ae877a |
| 12-May-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[Unrolling] Refactor the start and step offsets to simplify overflow checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think I was just wr
[Unrolling] Refactor the start and step offsets to simplify overflow checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think I was just wrong. Notably, we don't have to do any fancy overflow checking, we can just bound the values as quite small and do the math in a higher precision integer. I've switched to a signed integer so that UBSan will even point out if we ever have integer overflow. I've added various asserts to try to catch things as well and hoisted the overflow checks so that we just leave the too-large offsets out of the SCEV-GEP cache. This makes the value in the cache quite a bit smaller which is probably worthwhile.
No functionality changed here (for trip counts under 1 billion).
llvm-svn: 237209
show more ...
|
#
8c68171f |
| 12-May-2015 |
Michael Zolotukhin <mzolotukhin@apple.com> |
Reimplement heuristic for estimating complete-unroll optimization effects.
Summary: This patch reimplements heuristic that tries to estimate optimization beneftis from complete loop unrolling.
In t
Reimplement heuristic for estimating complete-unroll optimization effects.
Summary: This patch reimplements heuristic that tries to estimate optimization beneftis from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling branches and folding compares. That's a promising area, but now there are too many questions to discuss before we can enable it.
Test Plan: Tests are included in the patch.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8816
llvm-svn: 237156
show more ...
|
Revision tags: llvmorg-3.6.1-rc1 |
|
#
e178f469 |
| 14-Apr-2015 |
Sanjoy Das <sanjoy@playingwithpointers.com> |
[LoopUnrollRuntime] Avoid high-cost trip count computation.
Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this
[LoopUnrollRuntime] Avoid high-cost trip count computation.
Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
show more ...
|
#
799003bf |
| 23-Mar-2015 |
Benjamin Kramer <benny.kra@googlemail.com> |
Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
|
#
51f6096c |
| 23-Mar-2015 |
Benjamin Kramer <benny.kra@googlemail.com> |
Move private classes into anonymous namespaces
NFC.
llvm-svn: 232944
|
Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1 |
|
#
a28d91d8 |
| 10-Mar-2015 |
Mehdi Amini <mehdi.amini@apple.com> |
DataLayout is mandatory, update the API to reflect it with references.
Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first at
DataLayout is mandatory, update the API to reflect it with references.
Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation.
I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
show more ...
|
#
715b01e9 |
| 09-Mar-2015 |
Kevin Qin <Kevin.Qin@arm.com> |
Introduce runtime unrolling disable matadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and
Introduce runtime unrolling disable matadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and iteration number is relatively large enough. For some loops, we know they are not worth to be runtime unrolled. The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
show more ...
|
Revision tags: llvmorg-3.6.0, llvmorg-3.6.0-rc4 |
|
#
2c79ad97 |
| 14-Feb-2015 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
Transforms: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => g
Transforms: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind)
llvm-svn: 229202
show more ...
|
#
1fbc3165 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an em
[unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state.
llvm-svn: 229064
show more ...
|
#
6c03dff7 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Merge the simplification and DCE estimation methods on the UnrollAnalyzer.
Now they share a single worklist and have less implicit state between them. There was no real benefit to separatin
[unroll] Merge the simplification and DCE estimation methods on the UnrollAnalyzer.
Now they share a single worklist and have less implicit state between them. There was no real benefit to separating these two things out.
I'm going to subsequently refactor things to share even more code.
llvm-svn: 229062
show more ...
|
#
d9591d89 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Remove pointless dyn_cast<>s to Instruction - the users of an instruction must by definition be instructions.
llvm-svn: 229061
|
#
5457e20d |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Don't check the loop set for whether an instruction is contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most
[unroll] Don't check the loop set for whether an instruction is contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most once per instruction with the cost of the worklist set we would need to pay anyways.
llvm-svn: 229060
show more ...
|
#
e5c30e4e |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Change the other worklist in the unroll analyzer to be a set vector.
In addition to dramatically reducing the work required for contrived example loops, this also has to correct some seriou
[unroll] Change the other worklist in the unroll analyzer to be a set vector.
In addition to dramatically reducing the work required for contrived example loops, this also has to correct some serious latent bugs in the cost computation. Previously, we might add an instruction onto the worklist once for every load which it used and was simplified. Then we would visit it many times and accumulate "savings" each time.
I mean, fortunately this couldn't matter for things like calls with 100s of operands, but even for binary operators this code seems like it must be double counting the savings.
I just noticed this by inspection and due to the runtime problems it can introduce, I don't have any test cases for cases where the cost produced by this routine is unacceptable.
llvm-svn: 229059
show more ...
|
#
7824bc92 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Replace a boolean, for loop, condition, and break with std::all_of and a lambda. Much cleaner, no functionality changed.
llvm-svn: 229058
|
#
06d537cd |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Directly query for dead instructions.
In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simp
[unroll] Directly query for dead instructions.
In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simplified values map, and then if was also missing from the dead instructions set. We add everything from the simplified values map to the dead instructions set, so the first step is completely subsumed by the second. Moreover, the first step requires *inserting* something into the simplified value map which isn't what we want at all.
This also replaces a dyn_cast with a cast as an instruction cannot be used by a non-instruction.
llvm-svn: 229057
show more ...
|
#
82cb30f1 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Replace a linear time check for no uses with a constant time check.
Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filt
[unroll] Replace a linear time check for no uses with a constant time check.
Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filter these out much like we filter out constants and such.
llvm-svn: 229056
show more ...
|
#
3b057b32 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Rather than an operand set, use a setvector for the worklist.
We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instruc
[unroll] Rather than an operand set, use a setvector for the worklist.
We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instructions. I should have gone straight to this, but I had convinced myself that it wasn't going to be necessary briefly. I've come to my senses after chatting more with Nick, and am now happier here.
llvm-svn: 229054
show more ...
|
#
17a0496b |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Extract the code to enqueue operansd for the worklist in the unroll analysis into a lambda and call it. That's much simpler than duplicating all the code.
llvm-svn: 229053
|
#
8c86375a |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Use a small set to de-duplicate operands prior to putting them into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands.
[unroll] Use a small set to de-duplicate operands prior to putting them into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands.
llvm-svn: 229052
show more ...
|
#
93063e61 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Make the unroll cost analysis terminate deterministically and reasonably quickly.
I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing
[unroll] Make the unroll cost analysis terminate deterministically and reasonably quickly.
I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing at all (after over 15 minutes of running, it hadn't terminated for me, no idea if it was a true infloop or just exponential work).
The key thing here is to check the DeadInstructions set when pulling things off the worklist. Without this, we would re-walk the user list of already dead instructions again and again and again. Consider phi nodes with many, many operands and other patterns.
The other important aspect of this is that because we would keep re-visiting instructions that were already known dead, we kept adding their cost savings to this! This would cause our cost savings to be *insanely* inflated from this.
While I was here, I also rotated the operand walk out of the worklist loop to make the code easier to read. There is still work to be done to minimize worklist traffic because we don't de-duplicate operands. This means we may add the same instruction onto the worklist 1000s of times if it shows up in 1000s of operansd to a PHI node for example.
Still, with this patch, the ffmpeg testcase I have finishes quickly and I can't measure the runtime impact of the unroll analysis any more. I'll probably try to do a few more cleanups to this code, but not sure how much cleanup I can justify right now.
llvm-svn: 229038
show more ...
|
Revision tags: llvmorg-3.6.0-rc3 |
|
#
dd6029fc |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Make range based for loops a bit more explicit and more readable.
The biggest thing that was causing me problems is recognizing the references vs. poniters here. I also found that for maps
[unroll] Make range based for loops a bit more explicit and more readable.
The biggest thing that was causing me problems is recognizing the references vs. poniters here. I also found that for maps naming the loop variable as KeyValue helps make it obvious why you don't actually use it directly. Finally, using 'auto' instead of 'User *' doesn't seem like a good tradeoff. Much like with the other cases, I like to know its a pointer, and 'User' is just as long and tells the reader a lot more.
llvm-svn: 229033
show more ...
|
#
415f4125 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where t
[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where they are used widely) I prefer just spelling it out as Instruction. I've changed two of the short-lived variables to use "Inst" to match the base class.
llvm-svn: 229028
show more ...
|
#
302a133b |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Tidy up the integer we use to accumululate the number of instructions optimized. NFC, just separating this out from the functionality changing commit.
llvm-svn: 229026
|
#
10a9926a |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Don't use a map from pointer to bool. Use a set.
This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the
[unroll] Don't use a map from pointer to bool. Use a set.
This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the set. This is just a cleanup a long the way to fixing the underlying algorithm problems here.
llvm-svn: 228994
show more ...
|
#
1b480197 |
| 13-Feb-2015 |
Michael Zolotukhin <mzolotukhin@apple.com> |
Prevent division by 0.
When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should
Prevent division by 0.
When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should bail out early if N is 0.
llvm-svn: 228988
show more ...
|