Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
e6bf48d1 |
| 02-Oct-2024 |
Jeremy Morse <jeremy.morse@sony.com> |
[X86] Don't request 0x90 nop filling in p2align directives (#110134)
As of rev ea222be0d, LLVMs assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these
[X86] Don't request 0x90 nop filling in p2align directives (#110134)
As of rev ea222be0d, LLVMs assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be0d but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.
To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, updated
numerous other tests to not contain check-lines for the explicit padding.
show more ...
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
#
2f448bf5 |
| 22-Jun-2022 |
Nikita Popov <npopov@redhat.com> |
[X86] Migrate tests to use opaque pointers (NFC)
Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
These are only the test updates where the test pas
[X86] Migrate tests to use opaque pointers (NFC)
Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).
show more ...
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
2f215cf3 |
| 20-Feb-2020 |
Djordje Todorovic <djordje.todorovic@rt-rk.com> |
Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.
|
#
faff707d |
| 19-Feb-2020 |
Djordje Todorovic <djordje.todorovic@rt-rk.com> |
Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
|
#
2bf44d11 |
| 18-Feb-2020 |
Djordje Todorovic <djordje.todorovic@rt-rk.com> |
Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
|
#
a82d3e8a |
| 18-Feb-2020 |
Djordje Todorovic <djordje.todorovic@rt-rk.com> |
Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option - Enabl
Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
show more ...
|
#
a5ac8ca3 |
| 18-Feb-2020 |
Djordje Todorovic <djordje.todorovic@rt-rk.com> |
[CSInfo][TailDuplicator] Delete the call site info when removing dead MBBs
This is needed for the debug entry values feature.
Differential Revision: https://reviews.llvm.org/D74702
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2 |
|
#
72942459 |
| 05-Dec-2019 |
Guozhi Wei <carrot@google.com> |
[MBP] Avoid tail duplication if it can't bring benefit
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have
[MBP] Avoid tail duplication if it can't bring benefit
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.
To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:
make sure there is at least one duplication in current work set. the number of duplication should not exceed the number of successors.
The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.
Differential Revision: https://reviews.llvm.org/D64376
show more ...
|
Revision tags: llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
f9f81289 |
| 29-Aug-2019 |
Jordan Rupprecht <rupprecht@google.com> |
Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)
It causes many benchmark regressions, internally and in llvm's b
Revert [MBP] Disable aggressive loop rotate in plain mode
This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)
It causes many benchmark regressions, internally and in llvm's benchmark suite.
llvm-svn: 370398
show more ...
|
#
51f48295 |
| 22-Aug-2019 |
Guozhi Wei <carrot@google.com> |
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile informat
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 369664
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
a45f301f |
| 12-Aug-2019 |
Hans Wennborg <hans@hanshq.net> |
Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"
It caused assertions to fire when building Chromium:
lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::
Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode"
It caused assertions to fire when building Chromium:
lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.
See https://crbug.com/992871#c3 for how to reproduce.
> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368579
show more ...
|
#
80347c3a |
| 08-Aug-2019 |
Guozhi Wei <carrot@google.com> |
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile informat
[MBP] Disable aggressive loop rotate in plain mode
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368339
show more ...
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3 |
|
#
d2210af3 |
| 14-Jun-2019 |
Guozhi Wei <carrot@google.com> |
[MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another comm
[MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
show more ...
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
4465a765 |
| 24-Mar-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] Remove icmp undef from reduced tests
Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC)
Approved by @spatel (Sanjay Patel)
llvm-svn: 356859
|
Revision tags: llvmorg-8.0.0 |
|
#
8f1b8250 |
| 13-Mar-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] Regenerate tail call tests
llvm-svn: 356083
|
Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3 |
|
#
c0541dfa |
| 17-Aug-2017 |
Richard Smith <richard-llvm@metafoo.co.uk> |
Increase tail dup threshold for -O3 from 3 to 4.
We see a modest performance improvement from this slightly higher tail dup threshold.
Differential Revision: https://reviews.llvm.org/D36775
llvm-s
Increase tail dup threshold for -O3 from 3 to 4.
We see a modest performance improvement from this slightly higher tail dup threshold.
Differential Revision: https://reviews.llvm.org/D36775
llvm-svn: 311139
show more ...
|
Revision tags: llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3 |
|
#
7fbec9bd |
| 15-Feb-2017 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally. A trellis of the shape below:
A B |\ /| | \ / | | X | | / \ | |/ \| C D
would be la
Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally. A trellis of the shape below:
A B |\ /| | \ / | | X | | / \ | |/ \| C D
would be laid out A; B->C ; D by the current layout algorithm. Now we identify trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an increasing number of predecessors. A trellis is a a group of 2 or more predecessor blocks that all have the same successors.
because of this we can tail duplicate to extend existing trellises.
As an example consider the following CFG:
B D F H / \ / \ / \ / \ A---C---E---G---Ret
Where A,C,E,G are all small (Currently 2 instructions).
The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.
The current code will copy C into B, E into D and G into F and yield the layout A,C,B(C),E,D(E),F(G),G,H,ret
define void @straight_test(i32 %tag) { entry: br label %test1 test1: ; A %tagbit1 = and i32 %tag, 1 %tagbit1eq0 = icmp eq i32 %tagbit1, 0 br i1 %tagbit1eq0, label %test2, label %optional1 optional1: ; B call void @a() br label %test2 test2: ; C %tagbit2 = and i32 %tag, 2 %tagbit2eq0 = icmp eq i32 %tagbit2, 0 br i1 %tagbit2eq0, label %test3, label %optional2 optional2: ; D call void @b() br label %test3 test3: ; E %tagbit3 = and i32 %tag, 4 %tagbit3eq0 = icmp eq i32 %tagbit3, 0 br i1 %tagbit3eq0, label %test4, label %optional3 optional3: ; F call void @c() br label %test4 test4: ; G %tagbit4 = and i32 %tag, 8 %tagbit4eq0 = icmp eq i32 %tagbit4, 0 br i1 %tagbit4eq0, label %exit, label %optional4 optional4: ; H call void @d() br label %exit exit: ret void }
here is the layout after D27742: straight_test: # @straight_test ; ... Prologue elided ; BB#0: # %entry ; A (merged with test1) ; ... More prologue elided mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_2 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_3 b .LBB0_4 .LBB0_2: # %optional1 ; B (copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_4 .LBB0_3: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_5 b .LBB0_6 .LBB0_4: # %optional2 ; D (copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_5: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 b .LBB0_7 .LBB0_6: # %optional3 ; F (copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit ; Ret ld 30, 96(1) # 8-byte Folded Reload addi 1, 1, 112 ld 0, 16(1) mtlr 0 blr
The tail-duplication has produced some benefit, but it has also produced a trellis which is not laid out optimally. With this patch, we improve the layouts of such trellises, and decrease the cost calculation for tail-duplication accordingly.
This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have back edges, which is a negative, but it has a bigger compensating positive, which is that it handles the case where there are long strings of skipped blocks much better than the original layout. Both layouts handle runs of executed blocks equally well. Branch prediction also improves if there is any correlation between subsequent optional blocks.
Here is the resulting concrete layout:
straight_test: # @straight_test ; BB#0: # %entry ; A (merged with test1) mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_4 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_5 .LBB0_2: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_3: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 bne 0, .LBB0_7 b .LBB0_8 .LBB0_4: # %optional1 ; B (Copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_2 .LBB0_5: # %optional2 ; D (Copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_3 .LBB0_6: # %optional3 ; F (Copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit
Differential Revision: https://reviews.llvm.org/D28522
llvm-svn: 295223
show more ...
|
Revision tags: llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
#
0846e56e |
| 11-Oct-2016 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise whe
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, esepecially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283934
show more ...
|
#
ae068a32 |
| 11-Oct-2016 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise whe
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, esepecially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283842
show more ...
|
#
37e676d8 |
| 07-Oct-2016 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise whe
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, esepecially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was tail merging/loop info/tail-duplication causing issue with loops that share a header block.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283619
show more ...
|
#
adabac2d |
| 04-Oct-2016 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise whe
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, esepecially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that case as well.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283274
show more ...
|
#
396bfdd7 |
| 04-Oct-2016 |
Kyle Butt <kyle+llvm@iteratee.net> |
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise whe
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication decisions. This is fine, but passes up duplication opportunities that may arise when blocks are outlined. Because we want the updated CFG to affect subsequent placement decisions, this change must occur during placement.
In order to achieve this goal, TailDuplicationPass is split into a utility class, TailDuplicator, and the pass itself. The pass delegates nearly everything to the TailDuplicator object, except for looping over the blocks in a function. This allows the same code to be used for tail duplication in both places.
This change, in concert with outlining optional branches, allows triangle shaped code to perform much better, esepecially when the taken/untaken branches are correlated, as it creates a second spine when the tests are small enough.
llvm-svn: 283164
show more ...
|