Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
bed1c7f0 |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[ARM] Convert some tests to opaque pointers (NFC)
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
dc8a41de |
| 08-Sep-2021 |
Andrew Savonichev <andrew.savonichev@gmail.com> |
[ARM] Simplify address calculation for NEON load/store
The patch attempts to optimize a sequence of SIMD loads from the same base pointer:
  %0 = gep float*, float* base, i32 4
  %1 = bitcast float* %0 to <4 x float>*
  %2 = load <4 x float>, <4 x float>* %1
  ...
  %n1 = gep float*, float* base, i32 N
  %n2 = bitcast float* %n1 to <4 x float>*
  %n3 = load <4 x float>, <4 x float>* %n2
For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16]. However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so the address is computed before every ld/st instruction:
  add r2, r0, #32
  add r0, r0, #16
  vld1.32 {d18, d19}, [r2]
  vld1.32 {d22, d23}, [r0]
This can be improved by computing the address for the first load and then using a post-indexed form of VLD1/VST1 to load the rest:
  add r0, r0, #16
  vld1.32 {d18, d19}, [r0]!
  vld1.32 {d22, d23}, [r0]
In order to do that, the patch adds more patterns to DAGCombine:
- (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1 and inc2 are constants.
- (or ptr inc) is now recognized as a pointer increment if ptr is sufficiently aligned.
In addition to that, we now search for all possible base updates and then pick the best one.
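As a rough illustration (hand-written, not taken from the patch itself), the second pattern matters because earlier folds can express an increment of a sufficiently aligned pointer with an 'or' instead of an 'add', since the affected low bits are known to be zero:
  define <4 x float> @or_as_increment(float* align 32 %base) {
    ; With %base 32-byte aligned, bit 4 of the address is zero, so
    ; or-ing in 16 is equivalent to adding 16.
    %addr = ptrtoint float* %base to i32
    %next.int = or i32 %addr, 16
    %next = inttoptr i32 %next.int to <4 x float>*
    %v = load <4 x float>, <4 x float>* %next, align 16
    ret <4 x float> %v
  }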
Differential Revision: https://reviews.llvm.org/D108988
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
20c43d6b |
| 20-Nov-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
OpaquePtr: Bulk update tests to use typed sret
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
c9c930ae |
| 06-May-2020 |
Eli Friedman <efriedma@quicinc.com> |
[SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
Allocas in LLVM IR have a specified alignment. When that alignment is specified, the alloca has at least that alignment at runtime.
If the alloca's type has a higher preferred alignment, SelectionDAG currently ignores the specified alignment and increases it, even if doing so would trigger stack realignment. I don't think this makes sense, so this patch changes that.
I was looking into this for SVE in particular: for SVE, overaligning vscale'ed types is extra expensive because it requires realigning the stack multiple times, or using dynamic allocation. (This currently isn't implemented.)
I updated the expected assembly for a couple tests; in particular, for arg-copy-elide.ll, the optimization in question does not increase the alignment the way SelectionDAG normally would. For the rest, I just increased the specified alignment on the allocas to match what SelectionDAG was inferring.
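A minimal hand-written IR sketch of the situation (the function and callee names are made up): the alloca asks for 8-byte alignment, but the preferred alignment of the vector type is 16 bytes, so SelectionDAG used to bump the alignment and could thereby force stack realignment on a target whose stack is only guaranteed 8-byte alignment.
  define void @small_align() {
    %buf = alloca <4 x i32>, align 8   ; specified alignment: 8 bytes
    call void @use(<4 x i32>* %buf)
    ret void
  }
  declare void @use(<4 x i32>*)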
Differential Revision: https://reviews.llvm.org/D79532
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2 |
|
#
3aeb1a54 |
| 28-Nov-2017 |
Simon Dardis <simon.dardis@mips.com> |
[DAGCombine] Disable finding better chains for stores at O0
Unoptimized IR can have linear sequences of stores to an array, where the initial GEP for the first store is formed from the pointer to the array, and the GEP for each store after the first is formed from the previous GEP with some offset in an inductive fashion.
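A hand-written sketch of that shape (not the actual test case from the report), where each GEP is derived from the previous one rather than from the array base:
  define void @inductive_stores(i32* %base) {
    %p0 = getelementptr inbounds i32, i32* %base, i32 0
    store i32 0, i32* %p0
    %p1 = getelementptr inbounds i32, i32* %p0, i32 1
    store i32 1, i32* %p1
    %p2 = getelementptr inbounds i32, i32* %p1, i32 1
    store i32 2, i32* %p2
    ; ...and so on for every element of the array
    ret void
  }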
The (large) resulting DAG, when analyzed by DAGCombine, undergoes an excessive number of combines, as each store node is examined every time its offset node is combined with any child of the offset. One of the transformations is findBetterNeighborChains, which assists MergeConsecutiveStores. The former relies on repeated chain walking to do its work; however, MergeConsecutiveStores is disabled at O0, which makes the transformation redundant.
At any optimization level other than O0, InstCombine would resolve the chain of GEPs into a flat base + offset GEP for each store, which does not exhibit the repeated examination of each store to the array.
Disabling this optimization fixes an excessive compile time issue (roughly 30 minutes for the provided test case) at O0.
Reviewers: niravd, craig.topper, t.p.northover
Differential Revision: https://reviews.llvm.org/D40193
llvm-svn: 319142
|
Revision tags: llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1 |
|
#
8b1240b0 |
| 20-Apr-2017 |
Tim Northover <tnorthover@apple.com> |
ARM: handle post-indexed NEON ops where the offset isn't the access width.
Before, we assumed that any ConstantInt offset was precisely the access width, so we could use the "[rN]!" form. ISelLowering only ever created that kind, but further simplification during combining could lead to unexpected constants and incorrect codegen.
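For illustration (a hand-written sketch, not code from the fix): the "[rN]!" form implicitly advances the base register by the access width, so it is only correct when the folded constant offset equals that width; any other increment has to use the register post-index form.
  vld1.32 {d16, d17}, [r0]!       @ r0 advances by 16 bytes, the access width
  vld1.32 {d18, d19}, [r0], r1    @ r0 advances by whatever r1 holds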
Should fix PR32658.
llvm-svn: 300878
|
#
54e22f33 |
| 14-Mar-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting with compile-time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which require more expensive constant generation).
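As a simple illustration (hand-written, not from the patch), this is the kind of pattern the merging search targets: adjacent narrow stores whose only relationship is through the chain can be combined into one wider store once the chain analysis shows no intervening aliasing memory operation.
  define void @merge_adjacent(i8* %p) {
    %p1 = getelementptr inbounds i8, i8* %p, i32 1
    %p2 = getelementptr inbounds i8, i8* %p, i32 2
    %p3 = getelementptr inbounds i8, i8* %p, i32 3
    store i8 0, i8* %p
    store i8 0, i8* %p1
    store i8 0, i8* %p2
    store i8 0, i8* %p3
    ret void
  }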
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4 |
|
#
ce52b807 |
| 03-Mar-2017 |
Chandler Carruth <chandlerc@gmail.com> |
[SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this.
llvm-svn: 296862
|
Revision tags: llvmorg-4.0.0-rc3 |
|
#
f830dec3 |
| 28-Feb-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
|
#
73cd0194 |
| 26-Feb-2017 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
llvm-svn: 296279
|
#
beabf456 |
| 25-Feb-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
|
Revision tags: llvmorg-4.0.0-rc2 |
|
#
93f9d5ce |
| 02-Feb-2017 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893, which is miscompiling lua on ARM and breaking bootstrapping for x86-windows.
llvm-svn: 293915
|
#
4442667f |
| 02-Feb-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
|
#
d32a421f |
| 26-Jan-2017 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds
llvm-svn: 293188
|
#
de6516c4 |
| 26-Jan-2017 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited.
This improves the quality of the output SelectionDAG and the output codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293184
|
Revision tags: llvmorg-4.0.0-rc1 |
|
#
f5bf03c7 |
| 14-Dec-2016 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.
This reverts commit r289659.
llvm-svn: 289667
|
#
8527ab0a |
| 14-Dec-2016 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 289659
|
#
bedb5d90 |
| 09-Dec-2016 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion
llvm-svn: 289226
|
#
fd51ff4f |
| 09-Dec-2016 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 289221
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
#
a81682aa |
| 13-Oct-2016 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering LTO failures on Hexagon
llvm-svn: 284157
|
#
4b369572 |
| 13-Oct-2016 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
|
#
081385a7 |
| 12-Oct-2016 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
[DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks the debugging experience and should not be run when optimizations are disabled.
For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at the line "j += 2;" and edit the value of j. It should change the return value of the function.
Differential Revision: https://reviews.llvm.org/D19268
llvm-svn: 284014
|
#
e524f508 |
| 28-Sep-2016 |
Nirav Dave <niravd@google.com> |
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT
llvm-svn: 282604
|
#
e17e055b |
| 28-Sep-2016 |
Nirav Dave <niravd@google.com> |
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the handling of non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all the stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
|
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1, llvmorg-3.6.1, llvmorg-3.6.1-rc1, llvmorg-3.5.2, llvmorg-3.5.2-rc1 |
|
#
a79ac14f |
| 27-Feb-2015 |
David Blaikie <dblaikie@gmail.com> |
[opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278)
  import fileinput
  import sys
  import re

  pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

  for line in sys.stdin:
      sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
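A usage sketch, assuming the script above is saved under the hypothetical name migrate-load-syntax.py (it reads a test on stdin and writes the updated text to stdout):
  python migrate-load-syntax.py < old-test.ll > new-test.ll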
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
|