#
bd79f73f |
| 03-Jun-2017 |
Galina Kistanova <gkistanova@gmail.com> |
Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304635
|
#
251ea8a4 |
| 01-Jun-2017 |
Amaury Sechet <deadalnix@gmail.com> |
Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary: This is a continuation of the work started in D29872. Passing the carry down as a value rather than as glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unnecessary and we can start the removal process.
This patch only introduces the optimizations strictly required to reach the same level of optimization as was available before, nothing more.
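For illustration, a minimal sketch of the expansion pattern this enables (assuming the usual legalizer context where DAG, DL, the split operand halves, and the result types are in scope; the variable names are illustrative, not the patch's actual code):
```cpp
// Compare two 2*N-bit values: subtract the low halves with USUBO, which
// produces a borrow/carry as an ordinary second result (no glue), then feed
// that carry into a SETCCCARRY on the high halves.
SDVTList VTs = DAG.getVTList(LoVT, MVT::i1);
SDValue LowCmp = DAG.getNode(ISD::USUBO, DL, VTs, LHSLo, RHSLo);
SDValue Res = DAG.getNode(ISD::SETCCCARRY, DL, ResVT, LHSHi, RHSHi,
                          LowCmp.getValue(1), DAG.getCondCode(ISD::SETULT));
```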
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33374
llvm-svn: 304404
|
#
3a7578c6 |
| 31-May-2017 |
Zaara Syeda <syzaara@ca.ibm.com> |
[PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target-specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight-line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare-to-zero equality.
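Conceptually, for a call like memcmp(a, b, 16) with an 8-byte maximum load size, the expansion corresponds to straight-line code along these lines (a hand-written C++ illustration, not the backend implementation; in the generated code the loop below is fully unrolled):
```cpp
#include <cstdint>
#include <cstring>

// Illustrative equivalent of the expanded memcmp(a, b, 16) using 8-byte loads.
int memcmp16(const void *A, const void *B) {
  for (unsigned Off = 0; Off < 16; Off += 8) { // unrolled in the real expansion
    uint64_t LA, LB;
    std::memcpy(&LA, static_cast<const char *>(A) + Off, 8);
    std::memcpy(&LB, static_cast<const char *>(B) + Off, 8);
    if (LA != LB) {
      // Byte-swap so an unsigned compare matches memcmp's byte-wise
      // (lexicographic) ordering on a little-endian host.
      LA = __builtin_bswap64(LA);
      LB = __builtin_bswap64(LB);
      return LA < LB ? -1 : 1; // exit early on the first differing block
    }
  }
  return 0; // all blocks equal
}
```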
Differential Revision: https://reviews.llvm.org/D28637
llvm-svn: 304313
|
#
f6d4dc5b |
| 30-May-2017 |
Craig Topper <craig.topper@gmail.com> |
[SelectionDAG] Set ISD::FPOWI to Expand by default
Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".
This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.
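In practice this removes per-target boilerplate of the following shape (a hedged sketch; the exact lines vary per backend), since Expand is now the built-in default for ISD::FPOWI:
```cpp
// No longer needed in a target's *ISelLowering constructor after this change:
for (MVT VT : MVT::fp_valuetypes())
  setOperationAction(ISD::FPOWI, VT, Expand);
```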
Reviewers: spatel, RKSimon, efriedma
Reviewed By: RKSimon
Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits
Differential Revision: https://reviews.llvm.org/D33530
llvm-svn: 304215
|
#
b52e0366 |
| 17-May-2017 |
Francis Visoiu Mistrih <fvisoiumistrih@apple.com> |
BitVector: add iterators for set bits
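A minimal usage sketch of the new iterators (assuming this is the set_bits() range the patch adds):
```cpp
#include "llvm/ADT/BitVector.h"

// Iterate only over the indices of set bits instead of scanning every bit.
void visitSetBits(const llvm::BitVector &BV) {
  for (unsigned Idx : BV.set_bits())
    (void)Idx; // process the set bit at position Idx
}
```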
Differential revision: https://reviews.llvm.org/D32060
llvm-svn: 303227
|
#
8ac81f39 |
| 30-Apr-2017 |
Amaury Sechet <deadalnix@gmail.com> |
Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry
Summary: As per discussion on how to get better codegen for large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allows for more flexibility.
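As a hedged sketch of the resulting expansion of a 2*N-bit add (variable names and types are illustrative, assuming the usual legalizer context):
```cpp
// Low halves: UADDO produces the sum and a carry-out value (no glue).
SDVTList VTs = DAG.getVTList(LoVT, MVT::i1);
SDValue Lo = DAG.getNode(ISD::UADDO, DL, VTs, LHSLo, RHSLo);
// High halves: ADDCARRY consumes that carry as an ordinary value operand.
SDValue Hi = DAG.getNode(ISD::ADDCARRY, DL, VTs, LHSHi, RHSHi, Lo.getValue(1));
```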
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29872
llvm-svn: 301775
|
#
744c215e |
| 28-Apr-2017 |
Matthias Braun <matze@braunis.de> |
TargetLowering: Add finalizeLowering() function; NFC
Adds a new method finalizeLowering to TargetLoweringBase. This is in preparation for an upcoming commit.
This function is meant for target specific adjustments to MachineFrameInfo or register reservations.
Move the freezeRegisters() and the hasCopyImplyingStackAdjustment() handling into the new function to prove the concept. As an added bonus, GlobalISel no longer misses the hasCopyImplyingStackAdjustment() handling with this.
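A hedged sketch of how a target is expected to use the new hook (the target class name is made up):
```cpp
// Called once after instruction selection; a place for target-specific
// MachineFrameInfo adjustments or register reservations.
void MyTargetLowering::finalizeLowering(MachineFunction &MF) const {
  // ... adjust MF.getFrameInfo() or reserve registers here ...
  TargetLoweringBase::finalizeLowering(MF);
}
```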
Differential Revision: https://reviews.llvm.org/D32621
llvm-svn: 301679
|
#
919f9e8d |
| 28-Apr-2017 |
Jun Bum Lim <junbuml@codeaurora.org> |
[InlineCost] Improve the cost heuristic for Switch
Summary: The motivating example below has 13 cases but only 2 distinct targets:
```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp     w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp     w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp     w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add     w8, w19, w8
        cmp     w8, #9                  // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and     w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add     w8, w19, w8
        cmp     w8, #2                  // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp     w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. The function containing this switch is then inlined about 900 times.
This change uses the general way of switch lowering for the inline heuristic. It estimates the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now: -inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
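As a rough illustration of the shape of the new model (a hypothetical sketch, not the code from D31085; the helper name, signature, and constants are made up), the cost charged for a switch grows with the number of case clusters rather than the number of individual cases:
```cpp
#include <cstdint>

// Hypothetical sketch: NumClusters is the estimated number of case clusters
// after the jump-table / bit-test suitability check described above.
uint64_t switchCost(unsigned NumClusters, bool JumpTableOrBitTest,
                    uint64_t InstrCost) {
  if (JumpTableOrBitTest || NumClusters <= 1)
    return InstrCost; // roughly constant: a single dispatch
  // A balanced binary search tree over N clusters has about 2*N - 1 nodes,
  // so the cost is linear in the tree size rather than in the case count.
  return (2 * uint64_t(NumClusters) - 1) * InstrCost;
}
```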
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
|
#
c8e8e2a0 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937
llvm-svn: 301234
|
#
98ab4c64 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.
llvm-svn: 301232
|
#
c0197066 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937
llvm-svn: 301231
|
#
44e25f37 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Move size and alignment information of regclass to TargetRegisterInfo
1. RegisterClass::getSize() is split into two functions:
   - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
   - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
   - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the future.
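A hedged sketch of the corresponding call-site update (TRI and RC stand for an in-scope TargetRegisterInfo pointer and register class):
```cpp
// Before: unsigned Size = RC->getSize();  unsigned Align = RC->getAlignment();
unsigned RegBits    = TRI->getRegSizeInBits(*RC);   // register width in bits
unsigned SpillSize  = TRI->getSpillSize(*RC);       // spill slot size in bytes
unsigned SpillAlign = TRI->getSpillAlignment(*RC);  // spill slot alignment
```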
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
|
#
59a2d7b9 |
| 11-Apr-2017 |
Serge Guelton <sguelton@quarkslab.com> |
Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues.
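A hedged before/after sketch of a call site (the function name and types are illustrative):
```cpp
// Old C-style vararg form: a nullptr sentinel terminates the argument types,
// and mistakes in the type list are not caught at compile time.
//   M.getOrInsertFunction("foo", RetTy, ArgTy0, ArgTy1, nullptr);
// Variadic-template form: no sentinel, and the argument types are checked.
llvm::Constant *F = M.getOrInsertFunction("foo", RetTy, ArgTy0, ArgTy1);
```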
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299949
|
#
b050c7fb |
| 11-Apr-2017 |
Diana Picus <diana.picus@linaro.org> |
Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299925 because it broke the buildbots. See e.g. http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008
llvm-svn: 299928
|
#
5fd75fb7 |
| 11-Apr-2017 |
Serge Guelton <sguelton@quarkslab.com> |
Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues.
llvm-svn: 299925
|
#
f7e4388e |
| 07-Apr-2017 |
Simon Dardis <simon.dardis@imgtec.com> |
Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
This reverts commit r299766. This change appears to have broken the MIPS buildbots. Reverting while I investigate.
Revert "[mips] Remove usage of debug only variable (NFC)"
This reverts commit r299769. Follow up commit.
llvm-svn: 299788
|
#
6470ff0b |
| 07-Apr-2017 |
Simon Dardis <simon.dardis@imgtec.com> |
[SelectionDAG] Enable target specific vector scalarization of calls and returns
By target-hookifying getRegisterType, getNumRegisters and getVectorBreakdown, backends can request that LLVM scalarize vector types for calls and returns.
The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'.
Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128-bit vector argument would be assigned to a single 32-bit / 64-bit integer register.
By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors.
Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)".
This patch enables the MIPS backend to take either form for vector types.
Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D27845
llvm-svn: 299766
|
#
db11fdfd |
| 06-Apr-2017 |
Mehdi Amini <mehdi.amini@apple.com> |
Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299699; the examples need to be updated.
llvm-svn: 299702
|
#
579540a8 |
| 06-Apr-2017 |
Mehdi Amini <mehdi.amini@apple.com> |
Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
|
#
b518054b |
| 21-Mar-2017 |
Reid Kleckner <rnk@google.com> |
Rename AttributeSet to AttributeList
Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
|
#
cf2da96c |
| 14-Mar-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.
ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions, allowing us to make this a generic opcode and move away from the hard-coded tablegen patterns, which make it tricky to match more complex patterns.
At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom.
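A hedged sketch of the kind of DAG combine this opcode unblocks (the surrounding combine context, variable names, and legality check placement are assumed, not taken from the patch):
```cpp
// Fold the classic (x ^ (x >>s BW-1)) - (x >>s BW-1) absolute-value idiom
// into a single generic node, but only when the target handles ISD::ABS.
if (TLI.isOperationLegalOrCustom(ISD::ABS, VT))
  return DAG.getNode(ISD::ABS, DL, VT, X);
```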
Differential Revision: https://reviews.llvm.org/D29639
llvm-svn: 297780
|
#
54e22f33 |
| 14-Mar-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting with compile-time improvements.
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down from a load and a TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of memory operations on CopyFromReg nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
|
#
ce52b807 |
| 03-Mar-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this.
llvm-svn: 296862
|
#
f830dec3 |
| 28-Feb-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down from a load and a TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of memory operations on CopyFromReg nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
|
#
73cd0194 |
| 26-Feb-2017 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
llvm-svn: 296279
|