History log of /llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp (Results 326 – 350 of 500)
Revision Date Author Comments
# beabf456 25-Feb-2017 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

* Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.

This improves the quality of the output SelectionDAG and the output
CodeGen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which require
more expensive constant generation).
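
For illustration (not part of the original commit message), here is a
minimal LLVM IR sketch of the kind of pattern the merging search
targets; the function and value names are hypothetical:

    define void @merge_consecutive(ptr %p) {
      ; Two adjacent 32-bit stores reached through the same chain...
      store i32 1, ptr %p, align 8
      %q = getelementptr inbounds i32, ptr %p, i64 1
      store i32 2, ptr %q, align 4
      ret void
    }
    ; ...can be merged into a single 64-bit store; on a little-endian
    ; 64-bit target the combined constant is 2 * 2^32 + 1:
    ;   store i64 8589934593, ptr %p, align 8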

Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

Additional Minor Changes:

1. Finishes removing unused AliasLoad code

2. Unifies the chain aggregation in the merged stores across code
paths

3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.

4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.

5. Remove Chain dependencies of memory operations on CopyFromReg
nodes, as these are captured by data dependence.

6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.

7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see CodeGen/AArch64/store-merge.ll
and the sketch after this list).

8. Store merging for the ARM target is restricted to 32-bit, as in
some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.
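
A hypothetical IR-level view of the peephole in item 7; the actual
transform operates on SelectionDAG nodes, and the names here are
invented:

    define <2 x i32> @low_half(<4 x i32> %v) {
      ; Building a <2 x i32> from the low two lanes of %v...
      %e0  = extractelement <4 x i32> %v, i32 0
      %e1  = extractelement <4 x i32> %v, i32 1
      %lo0 = insertelement <2 x i32> undef, i32 %e0, i32 0
      %lo  = insertelement <2 x i32> %lo0, i32 %e1, i32 1
      ret <2 x i32> %lo
    }
    ; ...is equivalent to taking the low subvector directly, which the
    ; DAG expresses as a single extract_subvector:
    ;   %lo = shufflevector <4 x i32> %v, <4 x i32> undef,
    ;                       <2 x i32> <i32 0, i32 1>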

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:

CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252



# 5d534b6a 21-Feb-2017 Geoff Berry <gberry@codeaurora.org>

[CodeGenPrepare] Sink and duplicate more 'and' instructions.

Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.
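
A hypothetical before/after sketch in LLVM IR of the sinking described
above (block and value names are invented; not from the commit):

    ; Before: one 'and' in the entry block feeds icmp-eq-0 users in
    ; later blocks.
    define i64 @sink_and(i64 %x, i64 %y) {
    entry:
      %a = and i64 %x, 4
      br label %bb1
    bb1:
      %c1 = icmp eq i64 %a, 0
      br i1 %c1, label %early, label %bb2
    bb2:
      %c2 = icmp eq i64 %a, 0
      br i1 %c2, label %early, label %late
    early:
      ret i64 0
    late:
      ret i64 %y
    }
    ; After sinking/duplicating, bb1 and bb2 each get their own copy of
    ; the 'and', so AArch64 ISel can fold each (and; icmp eq 0; br)
    ; triple into a single tbz instruction.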

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact that
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set, to avoid an infinite loop of hoisting and sinking
the same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746



# 93f9d5ce 02-Feb-2017 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn:

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r293893, which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915



# 4442667f 02-Feb-2017 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Recommiting after fixing X86 inc/dec chain bug.

* Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.

This improves the quality of the output SelectionDAG and the output
CodeGen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which require
more expensive constant generation).

Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

Additional Minor Changes:

1. Finishes removing unused AliasLoad code

2. Unifies the chain aggregation in the merged stores across code
paths

3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.

4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.

5. Remove Chain dependencies of memory operations on CopyFromReg
nodes, as these are captured by data dependence.

6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.

7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)

8. Store merging for the ARM target is restricted to 32-bit, as in
some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:

CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893



# d32a421f 26-Jan-2017 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r293184, which is failing in LTO builds.

llvm-svn: 293188


# de6516c4 26-Jan-2017 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

* Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.

This improves the quality of the output SelectionDAG and the output
CodeGen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which require
more expensive constant generation).

Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

Additional Minor Changes:

1. Finishes removing unused AliasLoad code

2. Unifies the chain aggregation in the merged stores across code
paths

3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.

4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.

5. Remove Chain dependencies of memory operations on CopyFromReg
nodes, as these are captured by data dependence.

6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.

7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)

8. Store merging for the ARM target is restricted to 32-bit, as in
some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:

CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184



# e0ebdf4b 13-Jan-2017 David Majnemer <david.majnemer@gmail.com>

[CodeGen] Simplify getRecipEstimateForFunc

It used two attribute lookups when only one was needed.

llvm-svn: 291965


# 116bbab4 13-Jan-2017 Diana Picus <diana.picus@linaro.org>

[CodeGen] Rename MachineInstrBuilder::addOperand. NFC

Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891



# 4f31e52f 29-Dec-2016 Igor Laevsky <igmyrj@gmail.com>

Introduce element-wise atomic memcpy intrinsic

This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.
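
For reference, a sketch of the intrinsic in its present-day form; the
exact name and signature at the time of this commit may have differed,
so treat this as illustrative rather than authoritative:

    declare void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(
        ptr nocapture writeonly, ptr nocapture readonly, i64, i32 immarg)

    define void @copy64(ptr align 4 %dst, ptr align 4 %src) {
      ; Copy 64 bytes as unordered-atomic 4-byte elements; both
      ; pointers must be aligned to at least the element size.
      call void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(
          ptr align 4 %dst, ptr align 4 %src, i64 64, i32 4)
      ret void
    }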

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708



# f5bf03c7 14-Dec-2016 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667


# 8527ab0a 14-Dec-2016 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Retrying after removing load-store factoring through token factors in
favor of improved token factor operand pruning.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.

CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -

The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.

CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?

CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.

CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores

CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls

CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores

CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.

CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.

CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659



# bedb5d90 09-Dec-2016 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r289221, which appears to be triggering an assertion.

llvm-svn: 289226


# fd51ff4f 09-Dec-2016 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.

CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -

The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.

CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?

CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.

CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores

CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls

CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores

CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.

CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.

CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221



# eb97e355 25-Oct-2016 Evandro Menezes <e.menezes@samsung.com>

Add option to specify minimum number of entries for jump tables

Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
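
As an illustration (not from the commit), a switch whose lowering
depends on such a threshold: with the minimum set above four entries,
a four-case switch like this would be lowered as compares rather than
a jump table. The function is hypothetical, and -min-jump-table-entries
is the flag spelling used by later LLVM releases:

    define i32 @lookup(i32 %k) {
    entry:
      switch i32 %k, label %def [
        i32 0, label %a
        i32 1, label %b
        i32 2, label %c
        i32 3, label %d
      ]
    a:   ret i32 10
    b:   ret i32 20
    c:   ret i32 30
    d:   ret i32 40
    def: ret i32 -1
    }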

Differential revision: https://reviews.llvm.org/D25883

llvm-svn: 285104



# 501be9b3 21-Oct-2016 Sanjay Patel <spatel@rotateright.com>

fix variable names; NFCI

Because we're just 'or-ing' these 2 variables later in the code, I
don't think there's a logical bug here, but of course the string with
"no size" is the one that should have the size suffix stripped off.

llvm-svn: 284826



# 0051efcf 20-Oct-2016 Sanjay Patel <spatel@rotateright.com>

[Target] remove TargetRecip class; 2nd try

This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
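
A hedged sketch of how such an attribute can look in IR; the exact
token grammar is defined by the implementation, and the tokens below
are illustrative only:

    define float @fast_rsqrt(float %x) #0 {
      %s = call fast float @llvm.sqrt.f32(float %x)
      %r = fdiv fast float 1.0, %s
      ret float %r
    }
    declare float @llvm.sqrt.f32(float)

    ; Request estimate codegen for scalar sqrt with one refinement
    ; step, and disable it for division, on this function only:
    attributes #0 = { "reciprocal-estimates"="sqrtf:1,!div" }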

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746



# 19601fa5 18-Oct-2016 Sanjay Patel <spatel@rotateright.com>

revert r284495: [Target] remove TargetRecip class

There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513


# 08fff9ca 18-Oct-2016 Sanjay Patel <spatel@rotateright.com>

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495



# d5c6755d 14-Oct-2016 David L Kreitzer <david.l.kreitzer@intel.com>

[safestack] Use non-thread-local unsafe stack pointer for Contiki OS

Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D19852

llvm-svn: 284254


# a81682aa 13-Oct-2016 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r284151, which appears to be triggering LTO
failures on Hexagon.

llvm-svn: 284157


# 4b369572 13-Oct-2016 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Retrying after upstream changes.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.

CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -

The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.

CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?

CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.

CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores

CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls

CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores

CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.

CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.

CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.

CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151



# 5c924d71 09-Oct-2016 Peter Collingbourne <peter@pcc.me.uk>

Target: Remove unused entities.

llvm-svn: 283690


# bfdbea64 04-Oct-2016 Sanjay Patel <spatel@rotateright.com>

[Target] move reciprocal estimate settings from TargetOptions to TargetLowering

The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252



# e524f508 28-Sep-2016 Nirav Dave <niravd@google.com>

Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."

This reverts commit r282600 due to test failures with MCJIT.

llvm-svn: 282604


# e17e055b 28-Sep-2016 Nirav Dave <niravd@google.com>

In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.

CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.

CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?

CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.

CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores

CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls

CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores

CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.

CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.

CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.

CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600


