#
1ccca9e6 |
| 01-Dec-2015 |
Cong Hou <congh@google.com> |
Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction.
The root cause is the rounding behavior in BranchProbability construction. We may consider
Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction.
The root cause is the rounding behavior in BranchProbability construction. We may consider to use truncation instead in the future.
llvm-svn: 254356
show more ...
|
Revision tags: llvmorg-3.7.1 |
|
#
fa1917c6 |
| 01-Dec-2015 |
Cong Hou <congh@google.com> |
Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New in
Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908). 2. Use new interfaces in SelectionDAG, while in other passes treat probabilities as weights (http://reviews.llvm.org/D14361). 3. Use new interfaces in all other passes. 4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based interfaces any more, which are totally replaced by probability-based ones. The interface addSuccessor() is redesigned so that the default probability is unknown. We allow unknown probabilities but don't allow using it together with known probabilities in successor list. That is to say, we either have a list of successors with all known probabilities, or all unknown probabilities. In the latter case, we assume each successor has 1/N probability where N is the number of successors. An assertion checks if the user is attempting to add a successor with the disallowed mixed use as stated above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based ones.
Differential revision: http://reviews.llvm.org/D14973
llvm-svn: 254348
show more ...
|
Revision tags: llvmorg-3.7.1-rc2 |
|
#
41cf1a5d |
| 18-Nov-2015 |
Cong Hou <congh@google.com> |
Improving edge probabilities computation when choosing the best successor in machine block placement.
When looking for the best successor from the outer loop for a block belonging to an inner loop,
Improving edge probabilities computation when choosing the best successor in machine block placement.
When looking for the best successor from the outer loop for a block belonging to an inner loop, the edge probability computation can be improved so that edges in the inner loop are ignored. For example, suppose we are building chains for the non-loop part of the following code, and looking for B1's best successor. Assume the true body is very hot, then B3 should be the best candidate. However, because of the existence of the back edge from B1 to B0, the probability from B1 to B3 can be very small, preventing B3 to be its successor. In this patch, when computing the probability of the edge from B1 to B3, the weight on the back edge B1->B0 is ignored, so that B1->B3 will have 100% probability.
if (...) do { B0; ... // some branches B1; } while(...); else B2; B3;
Differential revision: http://reviews.llvm.org/D10825
llvm-svn: 253414
show more ...
|
Revision tags: llvmorg-3.7.1-rc1 |
|
#
b90b9e05 |
| 02-Nov-2015 |
Cong Hou <congh@google.com> |
In MachineBlockPlacement, filter cold blocks off the loop chain when profile data is available.
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawba
In MachineBlockPlacement, filter cold blocks off the loop chain when profile data is available.
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.
Consider a simple example shown below:
A | B⇆C | D
When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.
This patch filters those cold blocks off from the loop chain by comparing the ratio:
LoopBBFreq / LoopFreq
to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.
Differential revision: http://reviews.llvm.org/D11662
llvm-svn: 251833
show more ...
|
#
7745dbc5 |
| 19-Oct-2015 |
Cong Hou <congh@google.com> |
Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop ch
Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:
1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header. 2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop. 3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.
Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.
Differential revision: http://reviews.llvm.org/D10717
llvm-svn: 250754
show more ...
|
#
6ac07fd2 |
| 09-Oct-2015 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
CodeGen: Remove implicit iterator conversions from MBB.cpp
Remove implicit ilist iterator conversions from MachineBasicBlock.cpp.
I've also added an overload of `splice()` that takes a pointer, sin
CodeGen: Remove implicit iterator conversions from MBB.cpp
Remove implicit ilist iterator conversions from MachineBasicBlock.cpp.
I've also added an overload of `splice()` that takes a pointer, since it's a natural API. This is similar to the overloads I added for `remove()` and `erase()` in r249867.
llvm-svn: 249883
show more ...
|
#
77ec0770 |
| 16-Sep-2015 |
Craig Topper <craig.topper@gmail.com> |
Fix a spelling error in the description of a statistic. NFC
llvm-svn: 247771
|
#
0e288234 |
| 27-Aug-2015 |
Reid Kleckner <rnk@google.com> |
[WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues.
llvm-sv
[WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies. The next step is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
show more ...
|
Revision tags: llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4 |
|
#
ec105872 |
| 06-Aug-2015 |
Cong Hou <congh@google.com> |
Revert r244154 which causes some build failure. See https://llvm.org/bugs/show_bug.cgi?id=24377.
llvm-svn: 244239
|
#
36e7e52a |
| 05-Aug-2015 |
Cong Hou <congh@google.com> |
Record whether the weights on out-edges from a MBB are normalized.
1. Create a utility function normalizeEdgeWeights() in MachineBranchProbabilityInfo that normalizes a list of edge weights so that
Record whether the weights on out-edges from a MBB are normalized.
1. Create a utility function normalizeEdgeWeights() in MachineBranchProbabilityInfo that normalizes a list of edge weights so that the sum of then can fit in uint32_t. 2. Provide an interface in MachineBasicBlock to normalize its successors' weights. 3. Add a flag in MachineBasicBlock that tracks whether its successors' weights are normalized. 4. Provide an overload of getSumForBlock that accepts a non-const pointer to a MBB so that it can force normalizing this MBB's successors' weights. 5. Update several uses of getSumForBlock() by eliminating the once needed weight scale.
Differential Revision: http://reviews.llvm.org/D11442
llvm-svn: 244154
show more ...
|
#
924879ad |
| 04-Aug-2015 |
Sanjay Patel <spatel@rotateright.com> |
wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of
wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
show more ...
|
Revision tags: llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1 |
|
#
0881fc11 |
| 15-Jul-2015 |
Cong Hou <congh@google.com> |
Test commit.
This is a test commit (one blank line deleted).
llvm-svn: 242308
|
Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1 |
|
#
f00654e3 |
| 23-Jun-2015 |
Alexander Kornienko <alexfh@google.com> |
Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.
llvm-svn: 240390
|
#
70bc5f13 |
| 19-Jun-2015 |
Alexander Kornienko <alexfh@google.com> |
Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-c
Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
show more ...
|
Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1 |
|
#
26d3017b |
| 15-Apr-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Spell the conditions the same way through out this if statement. NFC.
llvm-svn: 235009
|
#
cfb2b9d7 |
| 15-Apr-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Sink a comment into the if block to which it pertains. This makes the content of the comment make much more sense.
llvm-svn: 235007
|
#
9a512a48 |
| 15-Apr-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Fix a really misleading typo in a comment.
llvm-svn: 235006
|
#
799003bf |
| 23-Mar-2015 |
Benjamin Kramer <benny.kra@googlemail.com> |
Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
|
#
214997c6 |
| 20-Mar-2015 |
Daniel Jasper <djasper@google.com> |
[MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional branches out of line (more details on r231230).
With this patch, this is not done fo
[MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short optional branch is a branch containing a single block with an instruction count below a certain threshold (defaulting to 3). Still everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It can however decrease performance because of the additional jmp and in cases where the optional branch is hot. This fixes a compile time regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108 llvm-svn: 232802
show more ...
|
Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1 |
|
#
7a715dae |
| 05-Mar-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Use range based for-loops throughout this code. Several had already been added and the inconsistency made choosing names and changing code more annoying. Plus, wow are they better for this code
[MBP] Use range based for-loops throughout this code. Several had already been added and the inconsistency made choosing names and changing code more annoying. Plus, wow are they better for this code!
llvm-svn: 231347
show more ...
|
#
2fc3fe12 |
| 05-Mar-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] NFC, run clang-format over this code and tweak things to make the result reasonable.
This code predated clang-format and so there was a reasonable amount of crufty formatting that had accumula
[MBP] NFC, run clang-format over this code and tweak things to make the result reasonable.
This code predated clang-format and so there was a reasonable amount of crufty formatting that had accumulated. This should ensure that neither myself nor others end up with formatting-only changes sneaking into other fixes.
llvm-svn: 231341
show more ...
|
#
d0dced58 |
| 05-Mar-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] This is no longer 'block-placement2'. ;] The old variants are long gone, update this code to reflect that.
llvm-svn: 231340
|
#
af7e99f2 |
| 05-Mar-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Revert r231238 which attempted to fix a nasty bug where MBP is just arbitrarily interleaving unrelated control flows once they get moved "out-of-line" (both outside of natural CFG ordering and
[MBP] Revert r231238 which attempted to fix a nasty bug where MBP is just arbitrarily interleaving unrelated control flows once they get moved "out-of-line" (both outside of natural CFG ordering and with diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small bug. It looks like a very different strategy will be required. I'm working on that now, and it'll again go behind some flag so that everyone can experiment and make sure it is working well for them.
llvm-svn: 231332
show more ...
|
#
9a53fbe2 |
| 04-Mar-2015 |
Chandler Carruth <chandlerc@gmail.com> |
[MBP] Fix a really horrible bug in MachineBlockPlacement, but behind a flag for now.
First off, thanks to Daniel Jasper for really pointing out the issue here. It's been here forever (at least, I th
[MBP] Fix a really horrible bug in MachineBlockPlacement, but behind a flag for now.
First off, thanks to Daniel Jasper for really pointing out the issue here. It's been here forever (at least, I think it was there when I first wrote this code) without getting really noticed or fixed.
The key problem is what happens when two reasonably common patterns happen at the same time: we outline multiple cold regions of code, and those regions in turn have diamonds or other CFGs for which we can't just topologically lay them out. Consider some C code that looks like:
if (a1()) { if (b1()) c1(); else d1(); f1(); } if (a2()) { if (b2()) c2(); else d2(); f2(); } done();
Now consider the case where a1() and a2() are unlikely to be true. In that case, we might lay out the first part of the function like:
a1, a2, done;
And then we will be out of successors in which to build the chain. We go to find the best block to continue the chain with, which is perfectly reasonable here, and find "b1" let's say. Laying out successors gets us to:
a1, a2, done; b1, c1;
At this point, we will refuse to lay out the successor to c1 (f1) because there are still un-placed predecessors of f1 and we want to try to preserve the CFG structure. So we go get the next best block, d1.
... wait for it ...
Except that the next best block *isn't* d1. It is b2! d1 is waaay down inside these conditionals. It is much less important than b2. Except that this is exactly what we didn't want. If we keep going we get the entire set of the rest of the CFG *interleaved*!!!
a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
So we clearly need a better strategy here. =] My current favorite strategy is to actually try to place the block whose predecessor is closest. This very simply ensures that we unwind these kinds of CFGs the way that is natural and fitting, and should minimize the number of cache lines instructions are spread across.
It also happens to be *dead simple*. It's like the datastructure was specifically set up for this use case or something. We only push blocks onto the work list when the last predecessor for them is placed into the chain. So the back of the worklist *is* the nearest next block.
Unfortunately, a change like this is going to cause *soooo* many benchmarks to swing wildly. So for now I'm adding this under a flag so that we and others can validate that this is fixing the problems described, that it seems possible to enable, and hopefully that it fixes more of our problems long term.
llvm-svn: 231238
show more ...
|
#
471e856f |
| 04-Mar-2015 |
Daniel Jasper <djasper@google.com> |
Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.
LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
Add a flag to experiment with outlining optional branches.
In a CFG with the edges A->B->C and A->C, B is an optional branch.
LLVM's default behavior is to lay the blocks out naturally, i.e. A, B, C, in order to improve code locality and fallthroughs. However, if a function contains many of those optional branches only a few of which are taken, this leads to a lot of unnecessary icache misses. Moving B out of line can work around this.
Review: http://reviews.llvm.org/D7719 llvm-svn: 231230
show more ...
|