History log of /llvm-project/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp (Results 26 – 50 of 174)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3
# d0641826 24-Jun-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] Tail-merging all blocks with `resume` terminator

Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this

[SimplifyCFG] Tail-merging all blocks with `resume` terminator

Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104849

show more ...


# 9c4c2f24 24-Jun-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] Tail-merging all blocks with `ret` terminator

Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed B

[SimplifyCFG] Tail-merging all blocks with `ret` terminator

Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597

show more ...


# ff4b1d37 23-Jun-2021 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging

This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block

[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging

This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598

show more ...


Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 729e18cb 19-May-2021 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()

In this case, it does the same thing as the original pattern does.

SimplifyCFG has a few lurking miscompilations about delet

[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks()

In this case, it does the same thing as the original pattern does.

SimplifyCFG has a few lurking miscompilations about deleting blocks that
have their address taken, and consistently using DeleteDeadBlocks() instead
of a hand-rolled pattern will allow to weed those cases out easierly.

show more ...


# 6b9524a0 06-May-2021 Arthur Eubanks <aeubanks@google.com>

[NewPM] Don't mark AA analyses as preserved

Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserve

[NewPM] Don't mark AA analyses as preserved

Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032

show more ...


# 13fca9d8 11-Apr-2021 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating

Same as with previous patches.


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init
# 022da61f 23-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers

If i change it to AssertingVH instead, a number of existing tests fail,
which means we don

[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers

If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.

There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.

Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.

But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.

show more ...


Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# ec8a6c11 11-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater

This boils down to how we deal with early-increment iterator
over function's basic blocks: not only we need to early-increment

[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater

This boils down to how we deal with early-increment iterator
over function's basic blocks: not only we need to early-increment,
after that we also need to skip all the blocks
that are scheduled for removal, as per DomTreeUpdater.

show more ...


# 81afeacd 11-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater

Thus supporting lazy DomTreeUpdater mode,
where the domtree updates (and thus block removals)
aren

[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater

Thus supporting lazy DomTreeUpdater mode,
where the domtree updates (and thus block removals)
aren't applied immediately, but are delayed
until last possible moment.

show more ...


# 8e8d214c 10-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable

This has a measurable impact on the number of DomTree recalculations.
While this doesn't handle

[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable

This has a measurable impact on the number of DomTree recalculations.
While this doesn't handle all the cases,
it deals with the most obvious ones.

show more ...


# ed9de61c 04-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFGPass] mergeEmptyReturnBlocks(): switch to non-permissive DomTree updates

... which requires not inserting an edge that already exists.


# 3fb57222 04-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible

Notably, this doesn't switch *every* case, remaining cases
don't actually pass sanity checks in non-permissve mode,
and t

[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible

Notably, this doesn't switch *every* case, remaining cases
don't actually pass sanity checks in non-permissve mode,
and therefore require further analysis.

Note that SimplifyCFG still defaults to not preserving DomTree by default,
so this is effectively a NFC change.

show more ...


# e08fea3b 01-Jan-2021 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFGPass] Ensure that DominatorTreeWrapperPass is init'd before SimplifyCFG

It's probably better than hoping that it will happen to be
already initialized.


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# d22a47e9 16-Dec-2020 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree

A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predec

[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree

A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.

There wasn't a great test coverage for this, so i added extra, to be sure.

show more ...


# 49dac4ac 16-Dec-2020 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
al

[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.

show more ...


# 4fc169f6 16-Dec-2020 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that func

[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree

... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.

show more ...


# e1133179 15-Dec-2020 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware

Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transfor

[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware

Two observations:
1. Unavailability of DomTree makes it impossible to make
`FoldBranchToCommonDest()` transform in certain cases,
where the successor is dominated by predecessor,
because we then don't have PHI's, and can't recreate them,
well, without handrolling 'is dominated by' check,
which doesn't really look like a great solution to me.
2. Avoiding invalidating DomTree in SimplifyCFG will
decrease the number of `Dominator Tree Construction` by 5
(from 28 now, i.e. -18%) in `-O3` old-pm pipeline
(as per `llvm/test/Other/opt-O3-pipeline.ll`)
This might or might not be beneficial for compile time.

So the plan is to make SimplifyCFG preserve DomTree, and then
eventually make DomTree fully required and preserved by the pass.

Now, SimplifyCFG is ~7KLOC. I don't think it will be nice
to do all this uplifting in a single mega-commit,
nor would it be possible to review it in any meaningful way.

But, i believe, it should be possible to do this in smaller steps,
introducing the new behavior, in an optional way, off-by-default,
opt-in option, and gradually fixing transforms one-by-one
and adding the flag to appropriate test coverage.

Then, eventually, the default should be flipped,
and eventually^2 the flag removed.

And that is what is happening here - when the new off-by-default option
is specified, DomTree is required and is claimed to be preserved,
and SimplifyCFG-internal assertions verify that the DomTree is still OK.

show more ...


Revision tags: llvmorg-11.0.1-rc1
# aeb0fdff 14-Nov-2020 Arthur Eubanks <aeubanks@google.com>

[SimplifyCFG] Respect optforfuzzing in NPM pass

Regression caused by refactoring in
cdd006eec9409923f9a56b9026ce2cb72e7b71dc.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm

[SimplifyCFG] Respect optforfuzzing in NPM pass

Regression caused by refactoring in
cdd006eec9409923f9a56b9026ce2cb72e7b71dc.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm, morehouse

Differential Revision: https://reviews.llvm.org/D91473

show more ...


# 2f293411 22-Oct-2020 Zequan Wu <zequanwu@google.com>

Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""

This reverts commit 716f7636e1ec7880a6d2f2205f54f65191cf8f9a.


# 716f7636 21-Oct-2020 Zequan Wu <zequanwu@google.com>

Revert "SimplifyCFG: Clean up optforfuzzing implementation"

See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9409923f9a56b9026ce2cb72e7b71dc.


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 1747f777 15-Sep-2020 Arthur Eubanks <aeubanks@google.com>

[SimplifyCFG] Override options in default constructor

SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: y

[SimplifyCFG] Override options in default constructor

SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718

show more ...


# bb7d3af1 07-Sep-2020 Roman Lebedev <lebedev.ri@gmail.com>

Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

This was reverted in 503deec2183d466dad64b763bab4e15fd8804239
because it caused

Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

This was reverted in 503deec2183d466dad64b763bab4e15fd8804239
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.

show more ...


# 503deec2 21-Aug-2020 Roman Lebedev <lebedev.ri@gmail.com>

Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"

As disscussed in post-commit review starting with
https://reviews

Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"

As disscussed in post-commit review starting with
https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.

show more ...


Revision tags: llvmorg-11.0.0-rc2
# 1d51dc38 29-Jul-2020 Roman Lebedev <lebedev.ri@gmail.com>

[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

I've been looking at missed vectorizations in one codebase.
One particular thing that s

[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

show more ...


Revision tags: llvmorg-11.0.0-rc1
# 04b729d0 20-Jul-2020 Roman Lebedev <lebedev.ri@gmail.com>

[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag

Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefull

[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag

Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefully make hoisting off-by-default too.

show more ...


1234567