#
186ad608 |
| 13-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[unroll] Update the new analysis logic from r228265 to use modern coding conventions for function names consistently. Some were already using this but not all.
llvm-svn: 228987
|
#
7af83c1f |
| 06-Feb-2015 |
Michael Zolotukhin <mzolotukhin@apple.com> |
Use estimated number of optimized insns in unroll-threshold computation.
If complete-unroll could help us to optimize away N% of instructions, we might want to do this even if the final size would e
Use estimated number of optimized insns in unroll-threshold computation.
If complete-unroll could help us to optimize away N% of instructions, we might want to do this even if the final size would exceed loop-unroll threshold. However, we don't want to unroll huge loop, and we are add AbsoluteThreshold to avoid that - this threshold will never be crossed, even if we expect to optimize 99% instructions after that.
llvm-svn: 228434
show more ...
|
#
4e8598ee |
| 06-Feb-2015 |
Michael Zolotukhin <mzolotukhin@apple.com> |
[InstSimplify] Add SimplifyFPBinOp function.
It is a variation of SimplifyBinOp, but it takes into account FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the transfo
[InstSimplify] Add SimplifyFPBinOp function.
It is a variation of SimplifyBinOp, but it takes into account FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the transformation's outcome (previously we dropped the flags and were too conservative in some cases).
Example: float foo(float *a, float b) { float r; if (a[1] * b) r = /* a lot of expensive computations */; else r = 1; return r; } float boo(float *a) { return foo(a, 0.0); }
Without this patch, we don't inline 'foo' into 'boo'.
llvm-svn: 228432
show more ...
|
#
a9aadd29 |
| 05-Feb-2015 |
Michael Zolotukhin <mzolotukhin@apple.com> |
Implement new heuristic for complete loop unrolling.
Complete loop unrolling can make some loads constant, thus enabling a lot of other optimizations. To catch such cases, we look for loads that mig
Implement new heuristic for complete loop unrolling.
Complete loop unrolling can make some loads constant, thus enabling a lot of other optimizations. To catch such cases, we look for loads that might become constants and estimate number of instructions that would be simplified or become dead after substitution.
Example: Suppose we have: int a[] = {0, 1, 0}; v = 0; for (i = 0; i < 3; i ++) v += b[i]*a[i];
If we completely unroll the loop, we would get: v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]
Which then will be simplified to: v = b[0]* 0 + b[1]* 1 + b[2]* 0
And finally: v = b[1]
llvm-svn: 228265
show more ...
|
#
49a766e4 |
| 02-Feb-2015 |
Jingyue Wu <jingyue@google.com> |
Resurrect the assertion removed by r227717
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.
Test Plan: no regression
Reviewers: mkuper
Subscribers: jholewinski,
Resurrect the assertion removed by r227717
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.
Test Plan: no regression
Reviewers: mkuper
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7327
llvm-svn: 227853
show more ...
|
#
21fc195c |
| 01-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[multiversion] Kill FunctionTargetTransformInfo, TTI itself is now per-function and supports the exact desired interface.
llvm-svn: 227743
|
#
fdb9c573 |
| 01-Feb-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[multiversion] Thread a function argument through all the callers of the getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures code lik
[multiversion] Thread a function argument through all the callers of the getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures code like the inliner can correctly look up the callee's TTI rather than using a fixed one.
The next change will use this to implement per-function subtarget usage by TTI. The changes after that should eliminate the need for FTTI as that will have become the default.
llvm-svn: 227730
show more ...
|
#
0220df0d |
| 01-Feb-2015 |
Jingyue Wu <jingyue@google.com> |
[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll
Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll
[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll
Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop.
This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function.
Test Plan: test/CodeGen/NVPTX/nounroll.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7041
llvm-svn: 227703
show more ...
|
#
705b185f |
| 31-Jan-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group.
The end result is that the TTI
[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it.
Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below.
The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
show more ...
|
Revision tags: llvmorg-3.6.0-rc2 |
|
#
4f8f307c |
| 17-Jan-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves t
[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration.
llvm-svn: 226373
show more ...
|
Revision tags: llvmorg-3.6.0-rc1 |
|
#
38dd5908 |
| 10-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
[LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Und
[LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.
llvm-svn: 225565
show more ...
|
#
66b3130c |
| 04-Jan-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches.
The motivation for this change
[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches.
The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the *many* utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
show more ...
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2 |
|
#
5bf8fef5 |
| 09-Dec-2014 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the I
IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do *not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was:
MDNode *N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.)
llvm-svn: 223802
show more ...
|
Revision tags: llvmorg-3.5.1-rc1 |
|
#
6666c27e |
| 11-Oct-2014 |
Chandler Carruth <chandlerc@gmail.com> |
[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing a non-e
[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers.
The core problem seems to stem from an assumption that the latch *is* the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a *generic* fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually *wants* to select the latch when it has to chose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case.
I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples.
Thansk to Mark for the help debugging and tracking down the *right* fix here!
llvm-svn: 219550
show more ...
|
#
d85ffb1f |
| 18-Sep-2014 |
Eric Christopher <echristo@gmail.com> |
Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from
Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget.
No functional change.
llvm-svn: 218004
show more ...
|
#
57f03dda |
| 07-Sep-2014 |
Hal Finkel <hfinkel@anl.gov> |
Add functions for finding ephemeral values
This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or
Add functions for finding ephemeral values
This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality.
This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost).
llvm-svn: 217335
show more ...
|
#
74c2f355 |
| 07-Sep-2014 |
Hal Finkel <hfinkel@anl.gov> |
Add an Assumption-Tracking Pass
This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale fu
Add an Assumption-Tracking Pass
This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present.
The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks.
There are no new direct test cases for this pass, but because it calls it validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls.
This pass will be used by follow-up commits that make use of @llvm.assume.
llvm-svn: 217334
show more ...
|
#
89854ebe |
| 03-Sep-2014 |
Benjamin Kramer <benny.kra@googlemail.com> |
Make some helpers static or move into the llvm namespace.
llvm-svn: 217077
|
Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2 |
|
#
8ec1474f |
| 24-Jul-2014 |
Mark Heffernan <meheff@google.com> |
After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor hint) the loop unroller replaces the llvm.loop.unroll.count metadata with llvm.loop.unroll.disable metadata to prevent any s
After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor hint) the loop unroller replaces the llvm.loop.unroll.count metadata with llvm.loop.unroll.disable metadata to prevent any subsequent unrolling passes from unrolling more than the hint indicates. This patch fixes an issue where loop unrolling could be disabled for other loops as well which share the same llvm.loop metadata.
llvm-svn: 213900
show more ...
|
#
9e112443 |
| 23-Jul-2014 |
Mark Heffernan <meheff@google.com> |
Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full).
llvm-svn: 213789
|
#
e6b4ba1c |
| 23-Jul-2014 |
Mark Heffernan <meheff@google.com> |
In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
llvm-svn: 213772
|
Revision tags: llvmorg-3.5.0-rc1 |
|
#
f3764da8 |
| 18-Jul-2014 |
Mark Heffernan <meheff@google.com> |
Fix build breakage introduced with r213412.
llvm-svn: 213414
|
#
053a6868 |
| 18-Jul-2014 |
Mark Heffernan <meheff@google.com> |
Remove unroll pragma metadata after it is used.
llvm-svn: 213412
|
#
5d5e18da |
| 25-Jun-2014 |
Eli Bendersky <eliben@google.com> |
Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]
These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.
Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]
These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes:
llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.*
This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata.
Patch by Mark Heffernan.
llvm-svn: 211710
show more ...
|
#
ff903245 |
| 16-Jun-2014 |
Eli Bendersky <eliben@google.com> |
Teach LoopUnrollPass to respect loop unrolling hints in metadata.
[This is resubmitting r210721, which was reverted due to suspected breakage which turned out to be unrelated].
Some extra review co
Teach LoopUnrollPass to respect loop unrolling hints in metadata.
[This is resubmitting r210721, which was reverted due to suspected breakage which turned out to be unrelated].
Some extra review comments were addressed. See D4090 and D4147 for more details.
The Clang change that produces this metadata was committed in r210667
Patch by Mark Heffernan.
llvm-svn: 211076
show more ...
|