Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1 |
|
#
50b62731 |
| 29-Jul-2021 |
Guozhi Wei <carrot@google.com> |
[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header
Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately.
Differential Revision: https://reviews.llvm.org/D106329
|
Revision tags: llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
#
d8aba75a |
| 07-May-2021 |
Fangrui Song <i@maskray.me> |
Internalize some cl::opt global variables or move them under namespace llvm
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
#
cd880442 |
| 28-Jan-2021 |
Nicholas Guy <nicholas.guy@arm.com> |
[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold
Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget.
Differential Revision: https://reviews.llvm.org/D95631
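A per-target threshold hook of the kind this commit describes could be sketched as follows. This is only an illustration: the names `TargetHooks`, `WideBranchTarget`, `tailDuplicateSize` and `isTailDupCandidate`, and the values 2 and 4, are hypothetical and are not taken from the patch.

```cpp
// Hypothetical optimization levels; not LLVM's real enum.
enum class OptLevel { None, Less, Default, Aggressive };

// Hypothetical base class holding the default threshold.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Largest block size (in instructions) that may still be tail duplicated.
  // The value 2 is an assumed default for this sketch.
  virtual unsigned tailDuplicateSize(OptLevel) const { return 2; }
};

// Hypothetical target that tolerates larger duplicated blocks when
// optimizing aggressively. The value 4 is an assumption.
struct WideBranchTarget : TargetHooks {
  unsigned tailDuplicateSize(OptLevel L) const override {
    return L >= OptLevel::Aggressive ? 4 : 2;
  }
};

// A block qualifies if its instruction count is within the target's limit.
bool isTailDupCandidate(const TargetHooks &T, OptLevel L, unsigned NumInstrs) {
  return NumInstrs <= T.tailDuplicateSize(L);
}
```

The pass only ever talks to the base interface, so a target that has no opinion inherits the default unchanged.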
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
7bc76fd0 |
| 31-Dec-2020 |
Kazu Hirata <kazu@google.com> |
[CodeGen] Construct SmallVector with iterator ranges (NFC)
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
687e80be |
| 16-Dec-2020 |
Guozhi Wei <carrot@google.com> |
[MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add individual BB to BlockFilterSet if its frequency satisfies
LoopFreq / Freq <= LoopToColdBlockRatio
LoopFreq is the edge frequency from outside the loop to the loop header. LoopToColdBlockRatio is a command line parameter.
This doesn't make sense, since we always lay out whole chains, not individual BBs.
It can also cause a subtle problem. The LoopFreq of an inner loop can be smaller than the LoopFreq of its outer loop, so a BB can be in the BlockFilterSet of the inner loop but not in the BlockFilterSet of the outer loop, like .cold in the test case. Such a block is added to the chain of the inner loop. When the outer loop is processed, .cold is not added to BlockFilterSet, so its edge to the successor .problem is not counted in the UnscheduledPredecessors of the .problem chain.
The other blocks of the inner loop are added to BlockFilterSet, however, so the whole inner loop chain can be laid out, and markChainSuccessors is called to decrease the UnscheduledPredecessors of the following chains. markChainSuccessors calls markBlockSuccessors for every BB in the chain, even one that is not in BlockFilterSet, like .cold. The .problem chain's UnscheduledPredecessors is therefore decreased for an edge that fillWorkLists never counted, and it becomes 0 while the chain still has an unscheduled predecessor, .pred. This causes problems in the successor BB selection algorithms that run afterwards.
Differential Revision: https://reviews.llvm.org/D89088
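The chain-level filter described above can be sketched roughly as follows. This is a simplified model, not the code from the patch: `Block`, `Chain`, `isHotEnough` and `collectLoopBlocks` are hypothetical names, frequencies are plain integers, and the chain frequency is assumed to be the sum of its block frequencies.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical block record: an id plus a relative frequency.
struct Block {
  int Id;
  std::uint64_t Freq;
};
using Chain = std::vector<Block>;

// The condition from the commit message: LoopFreq / Freq <= Ratio.
// A zero frequency is treated as cold to avoid dividing by zero.
bool isHotEnough(std::uint64_t LoopFreq, std::uint64_t Freq,
                 std::uint64_t Ratio) {
  return Freq != 0 && LoopFreq / Freq <= Ratio;
}

// Apply the test to a whole chain and add either all of its blocks or none.
std::set<int> collectLoopBlocks(const std::vector<Chain> &Chains,
                                std::uint64_t LoopFreq, std::uint64_t Ratio) {
  std::set<int> Filter;
  for (const Chain &C : Chains) {
    std::uint64_t ChainFreq = 0;
    for (const Block &B : C)
      ChainFreq += B.Freq;
    if (!isHotEnough(LoopFreq, ChainFreq, Ratio))
      continue;
    for (const Block &B : C)
      Filter.insert(B.Id); // cold members of a hot chain come along too
  }
  return Filter;
}
```

Because the decision is made once per chain, a cold block that already belongs to a hot chain can no longer end up outside the filter while its chain mates are inside it.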
|
#
d50d7c37 |
| 14-Dec-2020 |
Guozhi Wei <carrot@google.com> |
[MBP] Prevent rotating a chain contains entry block
The entry block should always be the first BB in a function, so we should not rotate a chain that contains the entry block.
Differential Revision: https://reviews.llvm.org/D92882
|
#
ee5b5b7a |
| 14-Dec-2020 |
Kazu Hirata <kazu@google.com> |
[CodeGen] Use llvm::erase_value (NFC)
|
#
a553ac97 |
| 05-Dec-2020 |
Kazu Hirata <kazu@google.com> |
[CodeGen] llvm::erase_if (NFC)
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
68403af0 |
| 22-Nov-2020 |
Kazu Hirata <kazu@google.com> |
[MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC)
The function was introduced on Jun 12, 2016 in commit 071d0f180794f7819c44026815614ce8fa00a3bd. Its definition was removed on Mar 2, 2017 in commit 1393761e0ca3fe8271245762f78daf4d5208cd77.
|
#
e42f6c0a |
| 23-Oct-2020 |
Han Shen <shenhan@google.com> |
Revert "[MBP] Add whole chain to BlockFilterSet instead of individual BB"
This reverts commit adfb5415010fbbc009a4a6298cfda7a6ed4fa6d4.
This is reverted because it caused a Chrome error: https://crbug.com/1140168
|
#
adfb5415 |
| 14-Oct-2020 |
Guozhi Wei <carrot@google.com> |
[MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add individual BB to BlockFilterSet if its frequency satisfies
LoopFreq / Freq <= LoopToColdBlockRatio
LoopFreq is the edge frequency from outside the loop to the loop header. LoopToColdBlockRatio is a command line parameter.
This doesn't make sense, since we always lay out whole chains, not individual BBs.
It can also cause a subtle problem. The LoopFreq of an inner loop can be smaller than the LoopFreq of its outer loop, so a BB can be in the BlockFilterSet of the inner loop but not in the BlockFilterSet of the outer loop, like .cold in the test case. Such a block is added to the chain of the inner loop. When the outer loop is processed, .cold is not added to BlockFilterSet, so its edge to the successor .problem is not counted in the UnscheduledPredecessors of the .problem chain.
The other blocks of the inner loop are added to BlockFilterSet, however, so the whole inner loop chain can be laid out, and markChainSuccessors is called to decrease the UnscheduledPredecessors of the following chains. markChainSuccessors calls markBlockSuccessors for every BB in the chain, even one that is not in BlockFilterSet, like .cold. The .problem chain's UnscheduledPredecessors is therefore decreased for an edge that fillWorkLists never counted, and it becomes 0 while the chain still has an unscheduled predecessor, .pred. This causes problems in the successor BB selection algorithms that run afterwards.
Differential Revision: https://reviews.llvm.org/D89088
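The counter mismatch can be reproduced in a toy model. The sketch below is an assumption-laden simplification, not the pass itself: `countUnscheduledPreds` stands in for the counting done in fillWorkLists, `afterPlacingChain` stands in for markChainSuccessors, and the block named .inner is invented to give the inner chain a second member.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

using BlockName = std::string;
using Edge = std::pair<BlockName, BlockName>; // predecessor -> successor

// Toy counterpart of the initial count: only edges whose source block is in
// the filter are counted as unscheduled predecessors of Target.
int countUnscheduledPreds(const std::vector<Edge> &Edges,
                          const std::set<BlockName> &Filter,
                          const BlockName &Target) {
  int N = 0;
  for (const Edge &E : Edges)
    if (E.second == Target && Filter.count(E.first))
      ++N;
  return N;
}

// Toy counterpart of placing a chain: every block of the chain decrements
// the counter for its edges into Target, without consulting the filter.
int afterPlacingChain(int Count, const std::vector<Edge> &Edges,
                      const std::vector<BlockName> &Chain,
                      const BlockName &Target) {
  for (const BlockName &B : Chain)
    for (const Edge &E : Edges)
      if (E.first == B && E.second == Target)
        --Count;
  return Count;
}
```

With the edges .cold -> .problem and .pred -> .problem and an inner chain of .inner and .cold, a per-block filter that omits .cold starts the counter at 1 and placing the inner chain drops it to 0 even though .pred is still unscheduled. Adding the whole chain to the filter starts the counter at 2, so it correctly ends at 1.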
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
#
fd75ad86 |
| 23-Sep-2020 |
Guozhi Wei <carrot@google.com> |
[MBFIWrapper] Add a new function getBlockProfileCount
MBFIWrapper keeps track of the block frequencies of newly created and modified blocks, and modified block frequencies should also affect the block profile count. The class doesn't provide a getBlockProfileCount interface, so users can only query the underlying MBFI for the profile count. The underlying MBFI doesn't know about the modifications made in MBFIWrapper, so it either provides a stale profile count for a modified block or simply crashes on a new block.
So this patch adds the function getBlockProfileCount to class MBFIWrapper to handle new or modified blocks.
Differential Revision: https://reviews.llvm.org/D87802
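The idea can be sketched with a small wrapper that consults its own overrides before falling back to the base analysis. Everything here is a hypothetical model: `BaseFreqInfo`, `FreqWrapper` and `countFromFreq` are invented names, the count is assumed to scale linearly with frequency relative to the entry block, and the toy base returns 0 for an unknown block where the commit message says the real one may crash.

```cpp
#include <cstdint>
#include <map>
#include <utility>

using BlockId = int;

// Toy stand-in for the underlying analysis. It only knows the blocks that
// existed when it was computed. EntryFreq is assumed to be non-zero.
class BaseFreqInfo {
public:
  BaseFreqInfo(std::uint64_t EntryFreq, std::uint64_t EntryCount,
               std::map<BlockId, std::uint64_t> Freqs)
      : EntryFreq(EntryFreq), EntryCount(EntryCount),
        Freqs(std::move(Freqs)) {}

  std::uint64_t getBlockFreq(BlockId B) const {
    auto It = Freqs.find(B);
    return It == Freqs.end() ? 0 : It->second; // unknown block: zero
  }
  // Assumed scaling: count = entry count * freq / entry freq.
  // No overflow handling; this is a sketch for small numbers.
  std::uint64_t countFromFreq(std::uint64_t Freq) const {
    return EntryCount * Freq / EntryFreq;
  }
  std::uint64_t getBlockProfileCount(BlockId B) const {
    return countFromFreq(getBlockFreq(B));
  }

private:
  std::uint64_t EntryFreq;
  std::uint64_t EntryCount;
  std::map<BlockId, std::uint64_t> Freqs;
};

// Toy wrapper: remembers frequencies of new or modified blocks and derives
// the profile count from the overridden frequency when there is one.
class FreqWrapper {
public:
  explicit FreqWrapper(const BaseFreqInfo &Base) : Base(Base) {}

  void setBlockFreq(BlockId B, std::uint64_t F) { Overrides[B] = F; }

  std::uint64_t getBlockFreq(BlockId B) const {
    auto It = Overrides.find(B);
    return It == Overrides.end() ? Base.getBlockFreq(B) : It->second;
  }
  std::uint64_t getBlockProfileCount(BlockId B) const {
    return Base.countFromFreq(getBlockFreq(B));
  }

private:
  const BaseFreqInfo &Base;
  std::map<BlockId, std::uint64_t> Overrides;
};
```

Querying the base object directly after `setBlockFreq` still yields the old count, which is the stale-data problem the commit message describes.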
|
Revision tags: llvmorg-11.0.0-rc3 |
|
#
6913812a |
| 20-Sep-2020 |
Fangrui Song <i@maskray.me> |
Fix some clang-tidy bugprone-argument-comment issues
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1 |
|
#
28759e9f |
| 21-Jul-2020 |
Guozhi Wei <carrot@google.com> |
[MBP] Use profile count to compute tail dup cost if it is available
Tail duplication in the machine block placement pass currently uses block frequency information in its cost model. But a frequency number is only meaningful relative to other basic blocks in the same function. A large frequency number doesn't mean a block is hot, and a small frequency number doesn't mean it is cold.
To overcome this problem, this patch uses the profile count in the cost model if it's available, so we can tail duplicate basic blocks that are really hot.
Differential Revision: https://reviews.llvm.org/D83265
|
Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
#
78c69a00 |
| 01-Jul-2020 |
Yuanfang Chen <yuanfang.chen@sony.com> |
[NFC] Clean up uses of MachineModuleInfoWrapperPass
|
Revision tags: llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
1978309d |
| 19-Feb-2020 |
James Y Knight <jyknight@google.com> |
MachineBasicBlock::updateTerminator now requires an explicit layout successor.
Previously, it tried to infer the correct destination block from the successor list, but this is a rather tricky prospect, given the existence of successors that occur mid-block, such as invoke, and potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in particular, would be problematic, because its successor blocks are not distinct from "normal" successors, as EHPads are.)
Instead, require the caller to pass in the expected fallthrough successor explicitly. In most callers, the correct block is immediately clear. But, in MachineBlockPlacement, we do need to record the original ordering, before starting to reorder blocks.
Unfortunately, the goal of decoupling the behavior of end-of-block jumps from the successor list has not been fully accomplished in this patch, as there is currently no other way to determine whether a block is intended to fall-through, or end as unreachable. Further work is needed there.
Differential Revision: https://reviews.llvm.org/D79605
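Recording the original ordering before any reordering starts could look roughly like the sketch below. It is a toy model under stated assumptions: blocks are plain integers, every block except the last is assumed to fall through to the next one, and `recordLayoutSuccessors` and `needsExplicitJump` are hypothetical helpers, not functions from the patch.

```cpp
#include <cstddef>
#include <map>
#include <vector>

constexpr int NoSuccessor = -1;

// Snapshot each block's fallthrough successor in the original layout.
std::map<int, int> recordLayoutSuccessors(const std::vector<int> &Layout) {
  std::map<int, int> Succ;
  for (std::size_t I = 0; I < Layout.size(); ++I)
    Succ[Layout[I]] = I + 1 < Layout.size() ? Layout[I + 1] : NoSuccessor;
  return Succ;
}

// After reordering, a block needs an explicit jump when the block that now
// follows it is not the one it originally fell through to.
bool needsExplicitJump(const std::map<int, int> &Orig,
                       const std::vector<int> &NewLayout, std::size_t Index) {
  int Expected = Orig.at(NewLayout[Index]);
  if (Expected == NoSuccessor)
    return false; // originally last: nothing to fall through to
  int Next =
      Index + 1 < NewLayout.size() ? NewLayout[Index + 1] : NoSuccessor;
  return Next != Expected;
}
```

The snapshot has to be taken up front because, once blocks have moved, the new order no longer says which neighbor a block was meant to fall into.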
|
#
97f92261 |
| 02-May-2020 |
Benjamin Kramer <benny.kra@googlemail.com> |
[MBP] tuple->pair. NFC.
std::pair has a trivial copy ctor, std::tuple doesn't.
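The std::pair half of that claim can be checked at compile time with a standard type trait. The sketch below assumes a hypothetical element type `Entry`; it makes no assertion about std::tuple, whose behavior the commit message describes and which this example does not test.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical work-list entry; the real element types are not shown here.
using Entry = std::pair<int, unsigned>;

// std::pair's copy constructor is defaulted, so with trivially copyable
// members it is itself trivial.
static_assert(std::is_trivially_copy_constructible<Entry>::value,
              "pair of scalars should have a trivial copy constructor");

// Same check as a function, usable in run-time assertions.
constexpr bool pairCopyIsTrivial() {
  return std::is_trivially_copy_constructible<Entry>::value;
}
```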
|
#
11455a79 |
| 11-Apr-2020 |
Hongtao Yu <hoy@fb.com> |
[CodeGen] Allow partial tail duplication in Machine Block Placement.
Summary: A count profile may affect tail duplication's heuristic, causing a block to be duplicated into only some of its predecessors. This is not allowed in the Machine Block Placement pass, where an assert will go off. I'm removing the assert and making the optimization bail out when such a case happens.
Reviewers: wenlei, davidxl, Carrot
Reviewed By: Carrot
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77748
|
#
c3417592 |
| 24-Mar-2020 |
Hiroshi Yamauchi <yamauchi@google.com> |
Revert "Include static prof data when collecting loop BBs"
This reverts commit 129c911efaa492790c251b3eb18e4db36b55cbc5.
Due to an internal benchmark regression.
|
#
129c911e |
| 19-Feb-2020 |
Bill Wendling <isanbard@gmail.com> |
Include static prof data when collecting loop BBs
Summary: If the programmer adds static profile data to a branch---i.e. uses "__builtin_expect()" or similar---then we should honor it. Otherwise, "__builtin_expect()" is ignored in crucial situations. So we trust that the programmer knows what they're doing until proven wrong.
Subscribers: hiraditya, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74809
|
Revision tags: llvmorg-10.0.0-rc2 |
|
#
369d086d |
| 12-Feb-2020 |
Guozhi Wei <carrot@google.com> |
[MBP] Partial tail duplication into hot predecessors
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.
This patch improves tail duplication in 3 aspects:
1. A successor can be duplicated into part of its predecessors.
2. With a more fine-grained benefit analysis, combined with 1, a successor is now duplicated into hot predecessors only.
3. If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.
Differential Revision: https://reviews.llvm.org/D73387
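The three aspects listed in the commit message can be illustrated with a deliberately simple selection routine. This is not the patch's cost model: `Pred`, `selectDupPreds` and the fixed `HotThreshold` are hypothetical, and "hot" is reduced to a single frequency comparison for the sake of the sketch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical predecessor record.
struct Pred {
  int Id;
  std::uint64_t Freq; // how often this predecessor reaches the successor
  bool CanDup;        // whether duplication into it is possible at all
};

// Pick the predecessors that should receive a copy of the successor.
std::vector<int> selectDupPreds(const std::vector<Pred> &Preds,
                                std::uint64_t HotThreshold) {
  std::vector<int> Selected;
  for (const Pred &P : Preds) {
    if (!P.CanDup)
      continue; // one impossible predecessor does not veto the others
    if (P.Freq < HotThreshold)
      continue; // cold predecessors are skipped
    Selected.push_back(P.Id); // partial duplication: only this subset
  }
  return Selected;
}
```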
|
Revision tags: llvmorg-10.0.0-rc1 |
|
#
ac8da31a |
| 29-Jan-2020 |
Hiroshi Yamauchi <yamauchi@google.com> |
[PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new blocks. This was not taken into account and could lead to incorrect frequencies as MBFI silently returns zero frequency for unknown/new blocks.
Add a variant for MBFIWrapper in the PGSO query interface.
Depends on D73494.
|
#
2c03c899 |
| 27-Jan-2020 |
Hiroshi Yamauchi <yamauchi@google.com> |
[MBFI] Move BranchFolding::MBFIWrapper to its own files. NFC.
Summary: To avoid header file circular dependency issues in passing updated MBFI (in MBFIWrapper) to the interface of profile guided size optimizations.
A prep step for (and split off of) D73381.
Reviewers: davidxl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73494
|
#
020041d9 |
| 21-Jan-2020 |
Krzysztof Parzyszek <kparzysz@quicinc.com> |
Update spelling of {analyze,insert,remove}Branch in strings and comments
These names have been changed from CamelCase to camelCase, but there were many places (comments mostly) that still used the old names.
This change is NFC.
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3 |
|
#
d9ae4939 |
| 05-Dec-2019 |
Hiroshi Yamauchi <yamauchi@google.com> |
[PGO][PGSO] Instrument the code gen / target passes.
Summary: Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries).
A second try after D71072 was reverted.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71149
|