Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
#
d0641826 |
| 24-Jun-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] Tail-merging all blocks with `resume` terminator
Similar to what we already do for `ret` terminators. As noted by @rnk, clang seems to already generate a single `ret`/`resume`, so this
[SimplifyCFG] Tail-merging all blocks with `resume` terminator
Similar to what we already do for `ret` terminators. As noted by @rnk, clang seems to already generate a single `ret`/`resume`, so this isn't likely to cause widespread changes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104849
show more ...
|
#
9c4c2f24 |
| 24-Jun-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] Tail-merging all blocks with `ret` terminator
Based ontop of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed B
[SimplifyCFG] Tail-merging all blocks with `ret` terminator
Based ontop of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104597
show more ...
|
#
ff4b1d37 |
| 23-Jun-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging
This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block
[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging
This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing not with just the `ret` in the future.
This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.
Other restrictions of the transform remain.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104598
show more ...
|
Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
#
729e18cb |
| 19-May-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()
In this case, it does the same thing as the original pattern does.
SimplifyCFG has a few lurking miscompilations about delet
[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()
In this case, it does the same thing as the original pattern does.
SimplifyCFG has a few lurking miscompilations about deleting blocks that have their address taken, and consistently using DeleteDeadBlocks() instead of a hand-rolled pattern will allow to weed those cases out easierly.
show more ...
|
#
6b9524a0 |
| 06-May-2021 |
Arthur Eubanks <aeubanks@google.com> |
[NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserve
[NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
show more ...
|
#
13fca9d8 |
| 11-Apr-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating
Same as with previous patches.
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init |
|
#
022da61f |
| 23-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail, which means we don
[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally.
There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea behind LoopHeaders is sound.
show more ...
|
Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
ec8a6c11 |
| 11-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater
This boils down to how we deal with early-increment iterator over function's basic blocks: not only we need to early-increment
[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater
This boils down to how we deal with early-increment iterator over function's basic blocks: not only we need to early-increment, after that we also need to skip all the blocks that are scheduled for removal, as per DomTreeUpdater.
show more ...
|
#
81afeacd |
| 11-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater
Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren
[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater
Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren't applied immediately, but are delayed until last possible moment.
show more ...
|
#
8e8d214c |
| 10-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable
This has a measurable impact on the number of DomTree recalculations. While this doesn't handle
[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable
This has a measurable impact on the number of DomTree recalculations. While this doesn't handle all the cases, it deals with the most obvious ones.
show more ...
|
#
ed9de61c |
| 04-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFGPass] mergeEmptyReturnBlocks(): switch to non-permissive DomTree updates
... which requires not inserting an edge that already exists.
|
#
3fb57222 |
| 04-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible
Notably, this doesn't switch *every* case, remaining cases don't actually pass sanity checks in non-permissve mode, and t
[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible
Notably, this doesn't switch *every* case, remaining cases don't actually pass sanity checks in non-permissve mode, and therefore require further analysis.
Note that SimplifyCFG still defaults to not preserving DomTree by default, so this is effectively a NFC change.
show more ...
|
#
e08fea3b |
| 01-Jan-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFGPass] Ensure that DominatorTreeWrapperPass is init'd before SimplifyCFG
It's probably better than hoping that it will happen to be already initialized.
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
d22a47e9 |
| 16-Dec-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that, but it's pretty tame - either change successor of all the predec
[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that, but it's pretty tame - either change successor of all the predecessors of a block and carefully delay deletion of the block until afterwards the DomTree updates are appled, or add a successor to the block.
There wasn't a great test coverage for this, so i added extra, to be sure.
show more ...
|
#
49dac4ac |
| 16-Dec-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests, al
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.
show more ...
|
#
4fc169f6 |
| 16-Dec-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.
Apparently, there were no dedicated tests just for that func
[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.
Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.
show more ...
|
#
e1133179 |
| 15-Dec-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations: 1. Unavailability of DomTree makes it impossible to make `FoldBranchToCommonDest()` transfor
[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware
Two observations: 1. Unavailability of DomTree makes it impossible to make `FoldBranchToCommonDest()` transform in certain cases, where the successor is dominated by predecessor, because we then don't have PHI's, and can't recreate them, well, without handrolling 'is dominated by' check, which doesn't really look like a great solution to me. 2. Avoiding invalidating DomTree in SimplifyCFG will decrease the number of `Dominator Tree Construction` by 5 (from 28 now, i.e. -18%) in `-O3` old-pm pipeline (as per `llvm/test/Other/opt-O3-pipeline.ll`) This might or might not be beneficial for compile time.
So the plan is to make SimplifyCFG preserve DomTree, and then eventually make DomTree fully required and preserved by the pass.
Now, SimplifyCFG is ~7KLOC. I don't think it will be nice to do all this uplifting in a single mega-commit, nor would it be possible to review it in any meaningful way.
But, i believe, it should be possible to do this in smaller steps, introducing the new behavior, in an optional way, off-by-default, opt-in option, and gradually fixing transforms one-by-one and adding the flag to appropriate test coverage.
Then, eventually, the default should be flipped, and eventually^2 the flag removed.
And that is what is happening here - when the new off-by-default option is specified, DomTree is required and is claimed to be preserved, and SimplifyCFG-internal assertions verify that the DomTree is still OK.
show more ...
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
aeb0fdff |
| 14-Nov-2020 |
Arthur Eubanks <aeubanks@google.com> |
[SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in cdd006eec9409923f9a56b9026ce2cb72e7b71dc.
See discussion in https://reviews.llvm.org/D89917.
Reviewed By: arsenm
[SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in cdd006eec9409923f9a56b9026ce2cb72e7b71dc.
See discussion in https://reviews.llvm.org/D89917.
Reviewed By: arsenm, morehouse
Differential Revision: https://reviews.llvm.org/D91473
show more ...
|
#
2f293411 |
| 22-Oct-2020 |
Zequan Wu <zequanwu@google.com> |
Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1ec7880a6d2f2205f54f65191cf8f9a.
|
#
716f7636 |
| 21-Oct-2020 |
Zequan Wu <zequanwu@google.com> |
Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590 This reverts commit cdd006eec9409923f9a56b9026ce2cb72e7b71dc.
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
1747f777 |
| 15-Sep-2020 |
Arthur Eubanks <aeubanks@google.com> |
[SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags, but they mistakenly weren't in the default constructor.
Reviewed By: y
[SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags, but they mistakenly weren't in the default constructor.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87718
show more ...
|
#
bb7d3af1 |
| 07-Sep-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec2183d466dad64b763bab4e15fd8804239 because it caused
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec2183d466dad64b763bab4e15fd8804239 because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365.
It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric
So let's just reland this. Original commit message:
I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure.
Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens.
Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late.
As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy elimination, it spends *much* more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time.
If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions
It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.
show more ...
|
#
503deec2 |
| 21-Aug-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with https://reviews
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code pattens in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPU's (i personally can't reproduce).
So until the behaviour is better understood, and a path forward is mapped, let's back this out for now.
This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.
show more ...
|
Revision tags: llvmorg-11.0.0-rc2 |
|
#
1d51dc38 |
| 29-Jul-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase. One particular thing that s
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure.
Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens.
Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late.
As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy elimination, it spends *much* more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time.
If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions
It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
show more ...
|
Revision tags: llvmorg-11.0.0-rc1 |
|
#
04b729d0 |
| 20-Jul-2020 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (with default-off!) flag, so add a flag for hoisting, too.
D84108 will hopefull
[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (with default-off!) flag, so add a flag for hoisting, too.
D84108 will hopefully make hoisting off-by-default too.
show more ...
|