171817ee | 07-Aug-2013 | Hal Finkel <hfinkel@anl.gov>
Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well.
For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround).
This will be used by the PowerPC backend in a follow-up commit.
llvm-svn: 187926
|
a7cd6bf3 | 06-Aug-2013 | Arnold Schwaighofer <aschwaighofer@apple.com>
LoopVectorize: Allow vectorization of loops with lifetime markers
Patch by Marc Jessome!
llvm-svn: 187825
|
8b1e021e | 27-Jul-2013 | Tom Stellard <thomas.stellard@amd.com>
SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions
Merge consecutive if-regions if they contain identical statements. Both transformations reduce the number of branches. The transformation is guarded by a target hook and is currently enabled only for R600, but correctness has been tested on the X86 target using a variety of CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
|
9da9a43a | 12-Jul-2013 | Arnold Schwaighofer <aschwaighofer@apple.com>
TargetTransformInfo: address calculation parameter for gather/scatter
Address calculation for gather/scatter in vectorized code can incur a significant cost, making vectorization unprofitable. Add infrastructure to account for this cost. Tests and cost models for targets will follow in subsequent commits.
radar://14351991
llvm-svn: 186187
|
Revision tags: llvmorg-3.3.1-rc1 |
|
ec474f28 | 08-Jul-2013 | Hal Finkel <hfinkel@anl.gov>
Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo
This fixes an oversight that Intrinsic::nearbyint was not being mapped to ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to nearbyint calls for some targets).
llvm-svn: 185783
|
afc1036f | 19-Jun-2013 | Bill Wendling <isanbard@gmail.com>
Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change.
llvm-svn: 184349
|
Revision tags: llvmorg-3.3.0, llvmorg-3.3.0-rc3 |
|
bf490d4a | 31-May-2013 | Quentin Colombet <qcolombet@apple.com>
Loop Strength Reduce: Scaling factor cost.
Account for the cost of scaling factor in Loop Strength Reduce when rating the formulae. This uses a target hook.
The default implementation of the hook is: if the addressing mode is legal, the scaling factor is free.
<rdar://problem/13806271>
llvm-svn: 183045
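A minimal sketch of the default hook policy described above ("if the addressing mode is legal, the scaling factor is free"). The struct, the function names, the x86-like set of legal scales {1, 2, 4, 8}, and the convention of returning -1 for an illegal mode are all assumptions made for illustration; this is not LLVM's actual TargetLowering API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified addressing mode: BaseReg + Scale * IndexReg + BaseOffs.
struct AddrModeSketch {
  int64_t BaseOffs = 0;
  bool HasBaseReg = false;
  int64_t Scale = 0;  // 0 means no scaled index register
};

// Toy legality rule (assumption): the scale must be 0, 1, 2, 4, or 8 and the
// offset must fit in a signed 32-bit immediate.
bool isLegalAddressingModeSketch(const AddrModeSketch &AM) {
  bool ScaleOK = AM.Scale == 0 || AM.Scale == 1 || AM.Scale == 2 ||
                 AM.Scale == 4 || AM.Scale == 8;
  bool OffsOK = AM.BaseOffs >= INT32_MIN && AM.BaseOffs <= INT32_MAX;
  return ScaleOK && OffsOK;
}

// Default policy from the commit message: a legal mode makes the scaling
// factor free (cost 0). Returning -1 for an illegal mode is an assumption
// of this sketch, meaning "not supported".
int getScalingFactorCostSketch(const AddrModeSketch &AM) {
  return isLegalAddressingModeSketch(AM) ? 0 : -1;
}
```

A target with a more expensive scaled-index form would override only this hook, so LSR can prefer formulae without a scale when rating them.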
|
Revision tags: llvmorg-3.3.0-rc2, llvmorg-3.3.0-rc1 |
|
0db0690a | 14-Apr-2013 | Nadav Rotem <nrotem@apple.com>
Document the decision to assume that the cost of floating-point operations is twice that of integer operations.
llvm-svn: 179478
|
87a0af6e | 12-Apr-2013 | Nadav Rotem <nrotem@apple.com>
CostModel: increase the default cost of supported floating-point operations from one to two. Fixed a few tests that changed because the cost of one insert plus a vector operation on two doubles is now lower than the cost of two scalar operations on doubles.
llvm-svn: 179413
|
b9773871 | 04-Apr-2013 | Arnold Schwaighofer <aschwaighofer@apple.com>
CostModel: Add parameter to instruction cost to further classify operand values
On certain architectures we can support efficient vectorized versions of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86.
We can efficiently support
for (i = 0 ; i < ; i += 4) w[0:3] = v[0:3] << <2, 2, 2, 2>
but not
for (i = 0; i < ; i += 4) w[0:3] = v[0:3] << x[0:3]
This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant.
Targets can then choose to return a different cost for instructions with such operand values.
A follow-up commit will test this feature on x86.
radar://13576547
llvm-svn: 178807
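A minimal sketch of how a target might use such an operand classification to return different costs. The enum, the opcode list, the function name, and every cost number are hypothetical and chosen only to mirror the two loops above: a shift by a uniform amount is cheap, while a shift by a per-lane varying amount is modeled as scalarized.

```cpp
#include <cassert>

// Hypothetical classification of the second operand, following the commit's
// idea: an arbitrary value, a uniform (splat) value, or a uniform constant.
enum class OperandValueKind { AnyValue, UniformValue, UniformConstantValue };

enum class Opcode { Add, Shl };

// Toy x86-like cost function (all numbers are assumptions).
unsigned getArithmeticInstrCostSketch(Opcode Op, unsigned NumLanes,
                                      OperandValueKind Op2Kind) {
  if (Op != Opcode::Shl)
    return 1;  // assume other vector ops lower to a single instruction
  switch (Op2Kind) {
  case OperandValueKind::UniformConstantValue:
  case OperandValueKind::UniformValue:
    // e.g. w[0:3] = v[0:3] << <2, 2, 2, 2>: one vector shift.
    return 1;
  case OperandValueKind::AnyValue:
    // e.g. w[0:3] = v[0:3] << x[0:3]: model as one extract, one scalar
    // shift, and one insert per lane.
    return NumLanes * 3;
  }
  return NumLanes * 3;  // not reached; avoids a missing-return warning
}
```

With these assumed numbers a 4-lane shift by a splat costs 1, while the varying-amount shift costs 12, which is the kind of gap that lets the vectorizer accept the first loop and reject the second.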
|
f7cfac7a | 28-Feb-2013 | Benjamin Kramer <benny.kra@googlemail.com>
Cost model support for lowered math builtins.
We make the cost of calling libm functions extremely high, as emitting the calls is expensive and causes spills (on x86), so performance suffers. We still vectorize important calls such as ceilf and friends on SSE4.1, as well as fabs.
Differential Revision: http://llvm-reviews.chandlerc.com/D466
llvm-svn: 176287
|
594fa2dc | 08-Feb-2013 | Arnold Schwaighofer <aschwaighofer@apple.com>
ARM cost model: Address computation in vector mem ops not free
Adds a function to TargetTransformInfo to query the cost of address computation. The cost model analysis pass now also queries this interface. LoopVectorize adds the cost of address computation as part of the memory instruction cost calculation, because only there do we know whether the instruction will be scalarized. Increase the penalty for inserting into D registers on Swift. This becomes necessary because we now always assume that address computation has a cost, and three is closer to the actual cost on that architecture.
radar://13097204
llvm-svn: 174713
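A minimal sketch of why the address-computation cost belongs in the vectorizer's memory-instruction costing: a consecutive access needs one wide load or store and one address, while a scalarized access needs VF of each. The struct, the function name, and the unit costs are hypothetical, and the sketch deliberately ignores the insert/extract overhead a real scalarized access would also pay.

```cpp
#include <cassert>

// Hypothetical per-target cost parameters (assumed values, not LLVM's).
struct TargetCostsSketch {
  unsigned ScalarMemOpCost = 1;
  unsigned VectorMemOpCost = 1;
  unsigned AddressComputationCost = 1;
};

// Cost of one memory instruction at vectorization factor VF.
// Consecutive: one wide access plus one address computation.
// Scalarized: VF separate accesses, each with its own address computation.
// Insert/extract costs for scalarization are omitted in this sketch.
unsigned getMemInstCostSketch(const TargetCostsSketch &T, unsigned VF,
                              bool Consecutive) {
  if (Consecutive)
    return T.VectorMemOpCost + T.AddressComputationCost;
  return VF * (T.ScalarMemOpCost + T.AddressComputationCost);
}
```

Treating address computation as free would make the scalarized case look VF units cheaper than it is, which is the over-optimism this commit addresses.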
|
56b31bd9 | 11-Jan-2013 | Benjamin Kramer <benny.kra@googlemail.com>
Split TargetLowering into a CodeGen and a SelectionDAG part.
This fixes some of the cycles between libCodeGen and libSelectionDAG. It's still a complete mess, but as long as the edges consist of virtual calls it doesn't cause breakage. BasicTTI made static calls and thus broke some build configurations.
llvm-svn: 172246
|
e55aa3c8 | 11-Jan-2013 | Nadav Rotem <nrotem@apple.com>
ARM Cost Model: Modify the target independent cost model to ask the target if it supports the different CAST types. We didn't do this on X86 because of the different register sizes and types, but on ARM this makes sense.
llvm-svn: 172245
|
b1791a75 | 09-Jan-2013 | Nadav Rotem <nrotem@apple.com>
ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
|
b696c36f | 09-Jan-2013 | Nadav Rotem <nrotem@apple.com>
Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
|
95f83e01 | 07-Jan-2013 | Chandler Carruth <chandlerc@gmail.com>
Sink AddrMode back into TargetLowering, removing one of the most peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense down in TargetLowering than it did before.
llvm-svn: 171739
|
d3e73556 | 07-Jan-2013 | Chandler Carruth <chandlerc@gmail.com>
Move TargetTransformInfo to live under the Analysis library. This no longer would violate any dependency layering and it is in fact an analysis. =]
llvm-svn: 171686
|
664e354d | 07-Jan-2013 | Chandler Carruth <chandlerc@gmail.com>
Switch TargetTransformInfo from an immutable analysis pass that requires a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTransformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least one implementation.
The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but the NoFoo analysis passes to implement only the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API.
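A heavily simplified sketch of the two tricks just described: delegation to the previous pass, and a pointer to the top-most pass so that one API can be implemented in terms of another. All class and method names here (TTISketch, NoTTISketch, TargetTTISketch, link, getLoadAndAddCost) are hypothetical; the real analysis-group machinery is wired up by the pass manager, not by hand as below.

```cpp
#include <cassert>

// Base of the hypothetical analysis group. By default every query is
// forwarded to the previous pass in the stack (PrevTTI).
class TTISketch {
public:
  virtual ~TTISketch() = default;

  // Stand-in for pass-manager initialization (assumption).
  void link(TTISketch *Prev, TTISketch *Top) {
    PrevTTI = Prev;
    TopTTI = Top;
  }

  virtual unsigned getArithmeticCost() const { return PrevTTI->getArithmeticCost(); }
  virtual unsigned getMemoryOpCost() const { return PrevTTI->getMemoryOpCost(); }
  virtual unsigned getLoadAndAddCost() const { return PrevTTI->getLoadAndAddCost(); }

protected:
  TTISketch *PrevTTI = nullptr;  // next pass down the stack
  TTISketch *TopTTI = nullptr;   // top-most initialized pass
};

// Bottom of the stack ("NoFoo"): answers every query itself and never
// delegates. The composite query is asked through TopTTI, so more precise
// answers from passes higher in the stack are picked up automatically.
class NoTTISketch : public TTISketch {
public:
  unsigned getArithmeticCost() const override { return 1; }
  unsigned getMemoryOpCost() const override { return 1; }
  unsigned getLoadAndAddCost() const override {
    return TopTTI->getMemoryOpCost() + TopTTI->getArithmeticCost();
  }
};

// A target pass that overrides only the query it knows better.
class TargetTTISketch : public TTISketch {
public:
  unsigned getMemoryOpCost() const override { return 4; }
};
```

Usage: after `No.link(nullptr, &Target)` and `Target.link(&No, &Target)`, calling `Target.getLoadAndAddCost()` falls through to the bottom pass, which asks the top of the stack and so combines the target's memory cost (4) with the default arithmetic cost (1), giving 5.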
The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations.
The fourth step required logic that was shared between the target-independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract.
The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis, which is a more natural layer for it to live in. Second, clients of this interface can depend on it *always* being available, which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits; this one is clearly big enough.
Finally, I'm very aware that many of the comments and much of the documentation need to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
|