#
2596da31 |
| 15-Jun-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86-specific implementation, as X86 seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that its implementations distinguish between cost kinds, so that unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test.
The cost model test changes now reflect that ret instructions are not generally free.
Differential Revision: https://reviews.llvm.org/D79164
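A minimal sketch of the default described above, using hypothetical stand-in types rather than LLVM's real TargetTransformInfo classes: the base implementation charges 1 for every control-flow instruction (so ret is no longer free), and a target that relies on free control flow overrides it. The override policy shown is invented for illustration and is not X86's actual logic.

```cpp
#include <cassert>

// Hypothetical stand-in for TargetTransformInfo::TargetCostKind.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

// Hypothetical control-flow opcodes.
enum class CFOpcode { Br, PHI, Ret };

// Base implementation shared by the generic getUserCost path and BasicTTI:
// every control-flow instruction, including ret, costs 1 by default.
struct BaseCostModel {
  virtual ~BaseCostModel() = default;
  virtual int getCFInstrCost(CFOpcode, CostKind) const { return 1; }
};

// A target that relies on control flow being free overrides the default.
// Invented policy: free for throughput queries, base cost otherwise.
struct FreeCFTarget : BaseCostModel {
  int getCFInstrCost(CFOpcode Op, CostKind Kind) const override {
    if (Kind == CostKind::RecipThroughput)
      return 0;
    return BaseCostModel::getCFInstrCost(Op, Kind);
  }
};
```

Because the base class owns the default, any target that does not override the hook automatically agrees with the generic cost of 1.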
|
#
fa8bff0c |
| 05-Jun-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] Unify getArithmeticInstrCost
Add the remaining arithmetic opcodes into the generic implementation of getUserCost and then call this from getInstructionThroughput. Most of the backends have been modified to return the base implementation for cost kinds other than RecipThroughput. The outlier here is AMDGPU, which already uses getArithmeticInstrCost for all the cost kinds. This change means that most of the opcodes can be removed from that backend's implementation of getUserCost.
Differential Revision: https://reviews.llvm.org/D80992
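The pattern described for most backends, returning the generic result for any cost kind other than RecipThroughput, can be sketched as below. All names and the throughput numbers are hypothetical, and the generic cost is simplified to a flat 1.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cost kinds and a few arithmetic opcodes.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };
enum class ArithOp { Add, Mul, SDiv, FAdd };

// Generic (base) arithmetic cost; simplified to a flat 1 for every kind.
int baseArithmeticCost(ArithOp, CostKind) { return 1; }

// Backend hook: only RecipThroughput gets target-specific numbers;
// every other kind defers to the generic implementation.
int targetArithmeticCost(ArithOp Op, CostKind Kind) {
  if (Kind != CostKind::RecipThroughput)
    return baseArithmeticCost(Op, Kind);
  switch (Op) {
  case ArithOp::SDiv:
    return 20; // invented throughput figure
  case ArithOp::Mul:
    return 2; // invented throughput figure
  default:
    return 1;
  }
}
```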
|
#
37289615 |
| 26-May-2020 |
Sam Parker <sam.parker@arm.com> |
[NFCI][CostModel] Unify getCmpSelInstrCost
Add cases for icmp, fcmp and select into the switch statement of the generic getUserCost implementation, with getInstructionThroughput then calling into it. The BasicTTI and backend implementations have been set to return a default value (1) when a cost other than throughput is being queried.
Differential Revision: https://reviews.llvm.org/D80550
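A sketch of the unified dispatch, with hypothetical types: icmp, fcmp and select are routed through one case of getUserCost's switch into getCmpSelInstrCost, getInstructionThroughput simply asks getUserCost for the throughput kind, and a backend returns the default of 1 for any other kind. The select throughput cost of 2 is invented.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cost kinds and the opcodes involved.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };
enum class Opcode { ICmp, FCmp, Select, Other };

struct CostModel {
  virtual ~CostModel() = default;

  // Generic compare/select cost; backends may refine it.
  virtual int getCmpSelInstrCost(Opcode, CostKind) const { return 1; }

  // Generic entry point: compares and selects share one switch case.
  int getUserCost(Opcode Op, CostKind Kind) const {
    switch (Op) {
    case Opcode::ICmp:
    case Opcode::FCmp:
    case Opcode::Select:
      return getCmpSelInstrCost(Op, Kind);
    default:
      return 1;
    }
  }

  // Throughput queries reuse the same path instead of a separate switch.
  int getInstructionThroughput(Opcode Op) const {
    return getUserCost(Op, CostKind::RecipThroughput);
  }
};

// Hypothetical backend: refines throughput only, default 1 otherwise.
struct ExampleTarget : CostModel {
  int getCmpSelInstrCost(Opcode Op, CostKind Kind) const override {
    if (Kind != CostKind::RecipThroughput)
      return 1;
    return Op == Opcode::Select ? 2 : 1;
  }
};
```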
|
#
772349de |
| 08-Jun-2020 |
Sam Parker <sam.parker@arm.com> |
[PPC] Try to fix buildbots
Attempt to handle unsupported types, such as structs, in getMemoryOpCost. The backend now checks for a supported type and calls into BasicTTI as a fallback. BasicTTI will now also perform the same check and will default to an expensive cost of 4 for 'Other' MVTs.
Differential Revision: https://reviews.llvm.org/D80984
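The fallback described above can be sketched as follows. SimpleVT is a hypothetical stand-in for LLVM's MVT, with Other representing types such as structs; the fallback cost of 4 is the value the commit message gives, while the target's vector cost of 2 is invented.

```cpp
#include <cassert>

// Hypothetical subset of machine value types; Other stands for types
// (such as structs) that have no simple value type.
enum class SimpleVT { i32, i64, f64, v4i32, Other };

// Generic fallback: an expensive flat cost for 'Other', otherwise one
// memory operation (simplified: no type legalization is modelled).
int basicMemoryOpCost(SimpleVT VT) {
  if (VT == SimpleVT::Other)
    return 4;
  return 1;
}

// Hypothetical "supported type" check done by the target first.
bool isSupportedByTarget(SimpleVT VT) { return VT != SimpleVT::Other; }

// Target hook: cost the types it understands, fall back otherwise.
int targetMemoryOpCost(SimpleVT VT) {
  if (!isSupportedByTarget(VT))
    return basicMemoryOpCost(VT);
  return VT == SimpleVT::v4i32 ? 2 : 1; // invented target numbers
}
```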
|
#
9303546b |
| 05-Jun-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] Unify getMemoryOpCost
Use getMemoryOpCost from the generic implementation of getUserCost and have getInstructionThroughput return the result of that for loads and stores.
This also means that the X86 implementation of getUserCost can be removed with the functionality folded into its getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D80984
|
#
2368bf52 |
| 27-May-2020 |
Lei Huang <lei@ca.ibm.com> |
[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
|
#
559845f8 |
| 27-May-2020 |
Lei Huang <lei@ca.ibm.com> |
Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm"
This reverts commit 7eb666b1556b86503f2f386bf921186cdbb2d22a.
|
Revision tags: llvmorg-10.0.1-rc1 |
|
#
7eb666b1 |
| 15-May-2020 |
Lei Huang <lei@ca.ibm.com> |
[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
|
#
8aaabade |
| 26-May-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] Unify getCastInstrCost
Add the remaining cast instruction opcodes to the base implementation of getUserCost and directly return the result. This allows getInstructionThroughput to return getUserCost for the casts. This has required changes to PPC and SystemZ because they implement getUserCost and/or getCastInstrCost with adjustments for vector operations. Adjustments have also been made in the remaining backends that implement the method so that they still produce a cost of zero or one for cost kinds other than throughput.
Differential Revision: https://reviews.llvm.org/D79848
|
#
7392820f |
| 23-May-2020 |
Craig Topper <craig.topper@gmail.com> |
[Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to be responsible for making sure the MaybeAlign has a value, then we should just make the caller convert it to an Align with operator*.
I explicitly deleted the relational comparison operators that were being inherited from Optional. It's unclear what comparing two MaybeAligns should mean when one is defined and the other isn't, so make the caller responsible for defining the behavior.
I left the ==/!= operators from Optional. That exposed a quirk: ==/!= between Align and MaybeAlign required the MaybeAlign to be defined. Now we use the operator== from Optional that takes an Optional and the value.
Differential Revision: https://reviews.llvm.org/D80455
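A minimal sketch of the resulting usage pattern, using std::optional as a stand-in for MaybeAlign (LLVM's own Align and MaybeAlign are not used; the names below are hypothetical). The caller decides what an undefined alignment means before ordering, while equality against a defined Align simply yields false for an empty optional, which is how std::optional's value comparison behaves.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical minimal stand-in for llvm::Align: a non-zero power of two.
struct Align {
  std::uint64_t Value;
  explicit Align(std::uint64_t V) : Value(V) {
    assert(V != 0 && (V & (V - 1)) == 0);
  }
  friend bool operator==(Align A, Align B) { return A.Value == B.Value; }
  friend bool operator<(Align A, Align B) { return A.Value < B.Value; }
};

// Hypothetical stand-in for MaybeAlign: an alignment that may be unknown.
using MaybeAlign = std::optional<Align>;

// Ordering is the caller's decision: "unknown" is first resolved to a
// caller-supplied default, then two defined Aligns are compared.
bool isLessAligned(MaybeAlign MA, Align Other, Align DefaultIfUnknown) {
  Align A = MA ? *MA : DefaultIfUnknown;
  return A < Other;
}
```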
|
#
fb3ba380 |
| 12-May-2020 |
Sam Parker <sam.parker@arm.com> |
[CostModel] Remove getExtCost
This has not been implemented by any backend; the backends appear to cover the functionality through getCastInstrCost instead. Sink what there is in the default implementation into BasicTTI.
Differential Revision: https://reviews.llvm.org/D78922
|
#
8cc911fa |
| 20-May-2020 |
Sam Parker <sam.parker@arm.com> |
[NFCI][CostModel] Refactor getIntrinsicInstrCost
Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boilerplate code for arguments and flags into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee.
Differential Revision: https://reviews.llvm.org/D79941
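The single-query shape described above can be sketched as a small struct whose constructors do the argument and flag gathering. Everything here (the struct, its fields, the CallInfo input and the costing rule) is a hypothetical simplification, not the interface that landed in LLVM.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR entities.
struct TypeDesc { std::string Name; };
struct CallInfo {
  int IntrinsicID;
  TypeDesc RetTy;
  std::vector<TypeDesc> ArgTys;
  bool FastMath;
};

// One object carries everything a cost query needs, so a single entry
// point replaces two separate API calls.
struct IntrinsicCostQuery {
  int ID;
  TypeDesc RetTy;
  std::vector<TypeDesc> ParamTys;
  bool FastMath = false;

  // From a concrete call: types and flags are gathered here, once,
  // instead of by every caller.
  explicit IntrinsicCostQuery(const CallInfo &Call)
      : ID(Call.IntrinsicID), RetTy(Call.RetTy), ParamTys(Call.ArgTys),
        FastMath(Call.FastMath) {}

  // From types alone, e.g. for a call that has not been created yet.
  IntrinsicCostQuery(int IntrinsicID, TypeDesc Ret,
                     std::vector<TypeDesc> Params)
      : ID(IntrinsicID), RetTy(std::move(Ret)), ParamTys(std::move(Params)) {}
};

// Single entry point; the costing rule is invented for illustration.
int getIntrinsicInstrCost(const IntrinsicCostQuery &Q) {
  int Cost = 1 + static_cast<int>(Q.ParamTys.size());
  return Q.FastMath ? Cost - 1 : Cost;
}
```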
|
#
0d5d5a75 |
| 15-May-2020 |
Christopher Tetreault <ctetreau@quicinc.com> |
[SVE] Remove usages of VectorType::getNumElements() from PowerPC
Reviewers: efriedma, sdesmalen, c-rhodes, hfinkel
Reviewed By: c-rhodes
Subscribers: wuzish, nemanjai, tschuett, hiraditya, kbarton, rkruppe, psnobl, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79821
|
#
0138cc01 |
| 19-Apr-2020 |
Justin Hibbits <jrh29@alumni.cwru.edu> |
PowerPC: Treat llvm.fma.f* intrinsic as using CTR with SPE
Summary: The SPE doesn't have an 'fma' instruction, so the intrinsic becomes a libcall. It really should become an expansion to two instructions, but for some reason the compiler doesn't think that's as optimal as a branch. Since this lowering is done after CTR is allocated for loops, tell the optimizer that CTR may be used in this case. This prevents an "Invalid PPC CTR loop!" assertion in the case that an fma() function call is used in a C/C++ file and clang converts it into an intrinsic.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D78668
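A sketch of the check, under the assumption (consistent with the message above) that an intrinsic lowered to a library call makes the loop ineligible for a CTR loop, since the call may clobber the count register. The enums, the subtarget struct and canFormCTRLoop are hypothetical; the real PowerPC logic covers many more cases.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for intrinsic kinds and subtarget features.
enum class IntrinsicKind { Fma, Other };
struct Subtarget { bool HasSPE; };

// True when the intrinsic will be lowered to a call, which may clobber CTR.
bool intrinsicMayUseCTR(IntrinsicKind K, const Subtarget &ST) {
  switch (K) {
  case IntrinsicKind::Fma:
    // SPE has no fma instruction, so llvm.fma.* becomes a libcall there.
    return ST.HasSPE;
  default:
    return false; // simplification: other intrinsics assumed inline
  }
}

// A loop may use CTR as its counter only if nothing in it may use CTR.
bool canFormCTRLoop(const std::vector<IntrinsicKind> &Body,
                    const Subtarget &ST) {
  return std::none_of(Body.begin(), Body.end(), [&](IntrinsicKind K) {
    return intrinsicMayUseCTR(K, ST);
  });
}
```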
|
#
40574fef |
| 28-Apr-2020 |
Sam Parker <sam.parker@arm.com> |
[NFC][CostModel] Add TargetCostKind to relevant APIs
Make the kind of cost explicit throughout the cost model which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will also help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
|
#
e9c9329a |
| 27-Apr-2020 |
Sam Parker <sam.parker@arm.com> |
[TTI] Add TargetCostKind argument to getUserCost
There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, code size along with a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API.
getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use) as well as a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one evident diverging cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
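For illustration, the four kinds listed above and the div/rem divergence can be sketched like this. The enumerator names follow LLVM's TargetTransformInfo::TargetCostKind; the opcode enum, the free function and all cost values are invented.

```cpp
#include <cassert>

// The four kinds described in the commit message.
enum TargetCostKind {
  TCK_RecipThroughput, // reciprocal throughput
  TCK_Latency,         // instruction latency
  TCK_CodeSize,        // instruction size
  TCK_SizeAndLatency   // mix of code size and execution cost
};

// Hypothetical opcode subset.
enum class Opcode { Add, SDiv, SRem };

// Division is a single instruction but slow to execute, so the answer
// depends on which kind of cost the caller asked for.
int getUserCost(Opcode Op, TargetCostKind Kind) {
  const bool IsDivRem = Op == Opcode::SDiv || Op == Opcode::SRem;
  if (!IsDivRem)
    return 1;
  switch (Kind) {
  case TCK_CodeSize:
    return 1; // one instruction
  case TCK_SizeAndLatency:
    return 4; // invented "expensive" marker
  case TCK_Latency:
  case TCK_RecipThroughput:
    return 20; // invented: slow to execute
  }
  return 1; // not reached for valid kinds
}
```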
|
#
a58b62b4 |
| 28-Apr-2020 |
Craig Topper <craig.topper@gmail.com> |
[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups here, for example removing a use of getElementType on a pointer where we could just use getFunctionType from the call.
Differential Revision: https://reviews.llvm.org/D78882
|
#
49fd24fe |
| 08-Apr-2020 |
Christopher Tetreault <ctetreau@quicinc.com> |
Clean up usages of asserting vector getters in Type
Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value.
Reviewers: hfinkel, efriedma, sdesmalen
Reviewed By: efriedma
Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77266
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
#
8f5e3c74 |
| 07-Mar-2020 |
Teresa Johnson <tejohnson@google.com> |
[PowerPC] Fix compile time issue in recursive CTR analysis code
Summary: Avoid re-examining operands on recursive walk looking for CTR. This was causing huge compile time after some earlier optimization created a large expression.
The start of the expression (created by IndVarSimplify) looked like:
%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 
8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...
with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
Reviewers: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75790
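The fix amounts to memoizing the recursive walk. A constant expression like the one above is a DAG in which the same operands are reachable along many paths, so a walk that re-examines operands does work proportional to the number of paths rather than the number of nodes. The sketch below uses a hypothetical Node type rather than LLVM's Constant/User classes; NeedsCall stands in for whatever makes an operation use CTR (for example an i128 udiv lowered to a libcall).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical expression node in an acyclic expression DAG.
struct Node {
  bool NeedsCall = false;
  std::vector<const Node *> Ops;
};

// Naive walk: revisits a shared operand once per path that reaches it.
bool mayUseCTRNaive(const Node *N, std::size_t &Visits) {
  ++Visits;
  if (N->NeedsCall)
    return true;
  for (const Node *Op : N->Ops)
    if (mayUseCTRNaive(Op, Visits))
      return true;
  return false;
}

// Memoized walk: each node is examined at most once. Skipping a visited
// node is safe: had it returned true, the walk would already have stopped.
bool mayUseCTR(const Node *N, std::unordered_set<const Node *> &Visited,
               std::size_t &Visits) {
  if (!Visited.insert(N).second)
    return false;
  ++Visits;
  if (N->NeedsCall)
    return true;
  for (const Node *Op : N->Ops)
    if (mayUseCTR(Op, Visited, Visits))
      return true;
  return false;
}
```

On a chain where every node uses its successor twice, the naive walk's visit count doubles with each level, while the memoized walk visits each node once.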
|
#
a6d3bec8 |
| 11-Mar-2020 |
Anna Welker <anna.welker@arm.com> |
[TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
|
Revision tags: llvmorg-10.0.0-rc3 |
|
#
04377a81 |
| 14-Feb-2020 |
Zheng Chen <czhengsz@cn.ibm.com> |
[PowerPC] Set instruction count as the first priority of LSR.
On PowerPC, make instruction count the first priority of the LSR cost model by default. Add an option, ppc-lsr-no-insns-cost, to revert to the default LSR cost model.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D72683
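The change can be pictured as a change in lexicographic comparison order. The struct below is a hypothetical, trimmed-down version of the cost record LSR compares; the default ordering here puts register count first, and the PowerPC behaviour described above puts instruction count first.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical, trimmed-down LSR cost record.
struct LSRCost {
  unsigned Insns;
  unsigned NumRegs;
  unsigned AddRecCost;
  unsigned SetupCost;
};

// Default-style ordering: fewest registers wins first.
bool isLSRCostLessDefault(const LSRCost &A, const LSRCost &B) {
  return std::tie(A.NumRegs, A.AddRecCost, A.SetupCost) <
         std::tie(B.NumRegs, B.AddRecCost, B.SetupCost);
}

// Instruction count as first priority, then the remaining criteria.
bool isLSRCostLessInsnsFirst(const LSRCost &A, const LSRCost &B) {
  return std::tie(A.Insns, A.NumRegs, A.AddRecCost, A.SetupCost) <
         std::tie(B.Insns, B.NumRegs, B.AddRecCost, B.SetupCost);
}
```

In this sketch, the ppc-lsr-no-insns-cost option would correspond to choosing isLSRCostLessDefault instead of isLSRCostLessInsnsFirst.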
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
e29a2e6b |
| 03-Jan-2020 |
Jinsong Ji <jji@us.ibm.com> |
[PowerPC][LoopVectorize] Extend getRegisterClassForType to consider double and other floating point type
In https://reviews.llvm.org/D67148, we used isFloatTy to test for a floating-point type and otherwise returned GPRRC, so 'double' was classified as GPRRC, which is not accurate.
This patch covers other floating point types.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D71946
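The corrected classification can be sketched as follows. The type and register-class enums are hypothetical and the mapping is simplified; the real hook may also depend on subtarget features such as VSX.

```cpp
#include <cassert>

// Hypothetical scalar type kinds and register classes.
enum class ScalarKind { Half, Float, Double, FP128, PPC_FP128, Integer, Pointer };
enum class RegClass { GPR, FPR, VR };

// Every scalar floating-point kind, not just float.
bool isFloatingPoint(ScalarKind K) {
  switch (K) {
  case ScalarKind::Half:
  case ScalarKind::Float:
  case ScalarKind::Double:
  case ScalarKind::FP128:
  case ScalarKind::PPC_FP128:
    return true;
  default:
    return false;
  }
}

RegClass getRegisterClassForType(bool Vector, ScalarKind K) {
  if (Vector)
    return RegClass::VR;
  if (isFloatingPoint(K)) // previously only Float was recognised here
    return RegClass::FPR;
  return RegClass::GPR;
}
```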
|
#
7a733466 |
| 27-Dec-2019 |
Fangrui Song <maskray@google.com> |
Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type! i32 (i32*)* @llvm.setjmp
*wipes tear*
|
#
a5da8d90 |
| 18-Dec-2019 |
Nemanja Ivanovic <nemanja.i.ibm@gmail.com> |
[PowerPC] Add missing legalization for vector BSWAP
We somehow missed doing this when we were working on Power9 exploitation. This just adds the missing legalization and cost for producing the vector intrinsics.
Differential revision: https://reviews.llvm.org/D70436
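For reference, the operation being legalized is a per-element byte reversal: llvm.bswap on a <4 x i32> vector reverses the bytes of each 32-bit lane independently. A plain C++ model of those semantics, with hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reverse the byte order of one 32-bit value.
std::uint32_t bswap32(std::uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0x0000FF00u) | ((X << 8) & 0x00FF0000u) |
         (X << 24);
}

// Apply the byte swap to each lane of a 4 x i32 vector independently.
std::array<std::uint32_t, 4> bswapV4I32(std::array<std::uint32_t, 4> V) {
  for (std::uint32_t &Lane : V)
    Lane = bswap32(Lane);
  return V;
}
```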
|
#
85ba5f63 |
| 11-Dec-2019 |
Reid Kleckner <rnk@google.com> |
Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be possible.
Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
|