History log of /llvm-project/llvm/lib/CodeGen/MachineBlockPlacement.cpp (Results 126 – 150 of 331)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 8426d134 23-Aug-2017 Matthias Braun <matze@braunis.de>

Add test case for r311511

This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work

Add test case for r311511

This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work reliably.

llvm-svn: 311520

show more ...


# c0541dfa 17-Aug-2017 Richard Smith <richard-llvm@metafoo.co.uk>

Increase tail dup threshold for -O3 from 3 to 4.

We see a modest performance improvement from this slightly higher tail dup threshold.

Differential Revision: https://reviews.llvm.org/D36775

llvm-s

Increase tail dup threshold for -O3 from 3 to 4.

We see a modest performance improvement from this slightly higher tail dup threshold.

Differential Revision: https://reviews.llvm.org/D36775

llvm-svn: 311139

show more ...


Revision tags: llvmorg-5.0.0-rc2
# 74f61dd8 04-Aug-2017 Kyle Butt <kyle+llvm@iteratee.net>

BlockPlacement: add a flag to force cold block outlining w/o a profile.

NFC.

llvm-svn: 310129


Revision tags: llvmorg-5.0.0-rc1
# 0e831c99 11-Jul-2017 Serguei Katkov <serguei.katkov@azul.com>

Revert Revert [MBP] do not rotate loop if it creates extra branch

This is a second attempt to land this patch.

The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and reg

Revert Revert [MBP] do not rotate loop if it creates extra branch

This is a second attempt to land this patch.

The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.

This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
Best Exit has next element in chain as successor.
Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur, sammccall, chandlerc
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745

llvm-svn: 307631

show more ...


# 0e70206c 26-Jun-2017 Serguei Katkov <serguei.katkov@azul.com>

This reverts commit r306272.

Revert "[MBP] do not rotate loop if it creates extra branch"

It breaks the sanitizer build bots. Need to fix this.

llvm-svn: 306276


# b01fff06 26-Jun-2017 Serguei Katkov <serguei.katkov@azul.com>

[MBP] do not rotate loop if it creates extra branch

This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we cre

[MBP] do not rotate loop if it creates extra branch

This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271

llvm-svn: 306272

show more ...


# 3c358f8c 16-Jun-2017 Hiroshi Inoue <inouehrs@jp.ibm.com>

[MachineBlockPlacement] trivial fix in comments, NFC

- Topologocal is abbreviated as "topo" in comments, but "top" is used in only one comment. Modify it for consistency.
- Capitalize "succ" and "pr

[MachineBlockPlacement] trivial fix in comments, NFC

- Topologocal is abbreviated as "topo" in comments, but "top" is used in only one comment. Modify it for consistency.
- Capitalize "succ" and "pred" for consistency in one figure.
- Other trivial fixes.

llvm-svn: 305552

show more ...


Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 6bda14b3 06-Jun-2017 Chandler Carruth <chandlerc@gmail.com>

Sort the remaining #include lines in include/... and lib/....

I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line

Sort the remaining #include lines in include/... and lib/....

I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787

show more ...


Revision tags: llvmorg-4.0.1-rc2
# 1527baab 25-May-2017 Matthias Braun <matze@braunis.de>

CodeGen: Rename DEBUG_TYPE to match passnames

Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE inste

CodeGen: Rename DEBUG_TYPE to match passnames

Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

llvm-svn: 303921

show more ...


# 0cf5b2f8 17-May-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Add Message strings to asserts. NFC

Add message strings to all the unlabeled asserts in the file.

Differential Revision: https://reviews.llvm.org/D33078

llvm-svn: 303316


# 7d531dae 15-May-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Increase tail duplication size for O3.

At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2

CodeGen: BlockPlacement: Increase tail duplication size for O3.

At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.

Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon

2% slowdown in lpbench. Seems related, but couldn't completely diagnose.

Internal google benchmark:
Produces 4% improvement on internal google protocol buffer serialization
benchmarks.

Differential-Revision: https://reviews.llvm.org/D32324
llvm-svn: 303084

show more ...


Revision tags: llvmorg-4.0.1-rc1
# 336c78fd 12-Apr-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Add comment about DenseMap Safety.

The use of a DenseMap in precomputeTriangleChains does not cause
non-determinism, even though it is iterated over, as the only thing the
i

CodeGen: BlockPlacement: Add comment about DenseMap Safety.

The use of a DenseMap in precomputeTriangleChains does not cause
non-determinism, even though it is iterated over, as the only thing the
iteration does is to insert entries into a new DenseMap, which is not iterated.
Comment only change.

llvm-svn: 300088

show more ...


# 33580692 12-Apr-2017 Benjamin Kramer <benny.kra@googlemail.com>

[MachineBlockPlacment] Add an assert to ensure there is no order dependency on DenseMap iteration order.

llvm-svn: 300060


# d71461c2 12-Apr-2017 Benjamin Kramer <benny.kra@googlemail.com>

[MachineBlockPlacement] Clean up data structures a bit.

No functionality change intended.

llvm-svn: 300059


# 04300b03 12-Apr-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Clear ComputedEdges between functions.

Not clearing was causing non-deterministic compiles for large files. Addresses
for MachineBasicBlocks would end up colliding and we wo

CodeGen: BlockPlacement: Clear ComputedEdges between functions.

Not clearing was causing non-deterministic compiles for large files. Addresses
for MachineBasicBlocks would end up colliding and we would lay out a block that
we assumed had been pre-computed when it had not been.

llvm-svn: 300022

show more ...


# 7e8be286 10-Apr-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Don't always tail-duplicate with no other successor.

The math works out where it can actually be counter-productive. The probability
calculations correctly handle the case w

CodeGen: BlockPlacement: Don't always tail-duplicate with no other successor.

The math works out where it can actually be counter-productive. The probability
calculations correctly handle the case where the alternative is 0 probability,
rely on those calculations.

Includes a test case that demonstrates the problem.

llvm-svn: 299892

show more ...


# ee51a201 10-Apr-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Minor probability changes.

Qin may be large, and Succ may be more frequent than BB. Take these both into
account when deciding if tail-duplication is profitable.

llvm-svn:

CodeGen: BlockPlacement: Minor probability changes.

Qin may be large, and Succ may be more frequent than BB. Take these both into
account when deciding if tail-duplication is profitable.

llvm-svn: 299891

show more ...


# b197d5b0 23-Mar-2017 Dehao Chen <dehao@google.com>

Fix trellis layout to avoid mis-identify triangle.

Summary:
For the following CFG:

A->B
B->C
A->C

If there is another edge B->D, then ABC should not be considered as triangle.

Reviewers: davidxl,

Fix trellis layout to avoid mis-identify triangle.

Summary:
For the following CFG:

A->B
B->C
A->C

If there is another edge B->D, then ABC should not be considered as triangle.

Reviewers: davidxl, iteratee

Reviewed By: iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D31310

llvm-svn: 298661

show more ...


# 08655997 16-Mar-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Reduce TriangleChainCount to 2

This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
te

CodeGen: BlockPlacement: Reduce TriangleChainCount to 2

This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925

show more ...


Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4
# 1fa60307 03-Mar-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: BlockPlacement: Precompute layout for chains of triangles.

For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient.

CodeGen: BlockPlacement: Precompute layout for chains of triangles.

For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
longer chains)
2) The chains are statically correlated. Branch probabilities have a very
U-shaped distribution.
[http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
If the branches in a chain are likely to be from the same side of the
distribution as their predecessor, but are independent at runtime, this
transformation is profitable. (Because the cost of being wrong is a small
fixed cost, unlike the standard triangle layout where the cost of being
wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
branch was taken positively influences whether the next branch will be
taken
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845

show more ...


# 1393761e 02-Mar-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.

Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn:

CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.

Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818

show more ...


Revision tags: llvmorg-4.0.0-rc3
# ebe6cc4d 23-Feb-2017 Kyle Butt <kyle+llvm@iteratee.net>

CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.

Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https

CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.

Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https://reviews.llvm.org/D30308

llvm-svn: 296018

show more ...


# 7fbec9bd 15-Feb-2017 Kyle Butt <kyle+llvm@iteratee.net>

Codegen: Make chains from trellis-shaped CFGs

Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

A B
|\ /|
| \ / |
| X |
| / \ |
|/ \|
C D

would be la

Codegen: Make chains from trellis-shaped CFGs

Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

A B
|\ /|
| \ / |
| X |
| / \ |
|/ \|
C D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.

because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

B D F H
/ \ / \ / \ / \
A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
br label %test1
test1: ; A
%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
call void @a()
br label %test2
test2: ; C
%tagbit2 = and i32 %tag, 2
%tagbit2eq0 = icmp eq i32 %tagbit2, 0
br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
call void @b()
br label %test3
test3: ; E
%tagbit3 = and i32 %tag, 4
%tagbit3eq0 = icmp eq i32 %tagbit3, 0
br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
call void @c()
br label %test4
test4: ; G
%tagbit4 = and i32 %tag, 8
%tagbit4eq0 = icmp eq i32 %tagbit4, 0
br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
call void @d()
br label %exit
exit:
ret void
}

here is the layout after D27742:
straight_test: # @straight_test
; ... Prologue elided
; BB#0: # %entry ; A (merged with test1)
; ... More prologue elided
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_2
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_3
b .LBB0_4
.LBB0_2: # %optional1 ; B (copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_4
.LBB0_3: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_5
b .LBB0_6
.LBB0_4: # %optional2 ; D (copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_5: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
b .LBB0_7
.LBB0_6: # %optional3 ; F (copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit ; Ret
ld 30, 96(1) # 8-byte Folded Reload
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test: # @straight_test
; BB#0: # %entry ; A (merged with test1)
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_4
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_5
.LBB0_2: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_3: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
bne 0, .LBB0_7
b .LBB0_8
.LBB0_4: # %optional1 ; B (Copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_2
.LBB0_5: # %optional2 ; D (Copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_3
.LBB0_6: # %optional3 ; F (Copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit

Differential Revision: https://reviews.llvm.org/D28522

llvm-svn: 295223

show more ...


# 538d6668 15-Feb-2017 Xinliang David Li <davidxl@google.com>

include function name in dot filename

Differential Revision: http://reviews.llvm.org/D29975

llvm-svn: 295220


Revision tags: llvmorg-4.0.0-rc2
# c7d67eef 04-Feb-2017 Kyle Butt <kyle+llvm@iteratee.net>

[CodeGen]: BlockPlacement: Skip extraneous logging.

Move a check for blocks that are not candidates for tail duplication up before
the logging. Reduces logging noise. No non-logging changes intended

[CodeGen]: BlockPlacement: Skip extraneous logging.

Move a check for blocks that are not candidates for tail duplication up before
the logging. Reduces logging noise. No non-logging changes intended.

llvm-svn: 294086

show more ...


12345678910>>...14