Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
5fb57131 |
| 25-Dec-2024 |
Usman Nadeem <mnadeem@quicinc.com> |
[DFAJumpThreading] Don't bail early after encountering unpredictable values (#119774)
After #96127 landed, mshockwave reported that the pass was no longer
threading SPEC2006/perlbench.
After 961
[DFAJumpThreading] Don't bail early after encountering unpredictable values (#119774)
After #96127 landed, mshockwave reported that the pass was no longer
threading SPEC2006/perlbench.
After 96127 we started bailing out in `getStateDefMap` and rejecting the
transformation because one of the unpredictable values was coming from
inside the loop. There was no fundamental change in that function except
that we started calling `Loop->contains(IncomingBB)` instead of
`LoopBBs.count(IncomingBB)`. After some analysis I came to the
conclusion that even before 96127 we would reject the transformation if
we provided large enough limits on the path traversal (large enough so
that LoopBBs contained blocks corresponding to that unpredictable
value).
In this patch I changed `getStateDefMap` to not terminate early on
finding an unpredictable value, this is because
`getPathsFromStateDefMap`, later, actually has checks to ensure that the
final list of paths only have predictable values. As a result we can now
partially thread functions like `negative6` in the tests that have some
predictable paths.
This patch does not really have any compile-time impact on the test
suite without `-dfa-early-exit-heuristic=false` (early exit is enabled
by default).
Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3
show more ...
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
94f9cbbe |
| 02-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[Scalar] Remove unused includes (NFC) (#114645)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
d4a38c8f |
| 24-Sep-2024 |
Usman Nadeem <mnadeem@quicinc.com> |
[DFAJumpThreading] Handle select unfolding when user phi is not a dir… (#109511)
…ect successor
Previously the code assumed that the select instruction is defined in a
block that is a direct pre
[DFAJumpThreading] Handle select unfolding when user phi is not a dir… (#109511)
…ect successor
Previously the code assumed that the select instruction is defined in a
block that is a direct predecessor of the block where the PHINode uses
it. So, we were hitting an assertion when we tried to access the def
block as an incoming block for the user phi node.
This patch handles that case by using the correct end block and creating
a new phi node that aggregates both the values of the select in that end
block, and then using that new unfolded phi to overwrite the original
user phi node.
Fixes #106083
Change-Id: Ie471994cca232318f74a6e6438efa21e561c2dc0
show more ...
|
Revision tags: llvmorg-19.1.0 |
|
#
23a26e71 |
| 07-Sep-2024 |
Kazu Hirata <kazu@google.com> |
[DFAJumpThreading] Avoid repeated hash lookups (NFC) (#107670)
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
b167ada8 |
| 10-Aug-2024 |
Usman Nadeem <mnadeem@quicinc.com> |
[DFAJumpThreading] Rewrite the way paths are enumerated (#96127)
I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation covera
[DFAJumpThreading] Rewrite the way paths are enumerated (#96127)
I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation coverage was
being reduced.
After looking at the code it seemed that the function was trying to
create paths of the form
`SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient
because a lot of nodes in those paths (nodes before DeterminatorBB)
would be irrelevant to the optimization. We only care about paths of the
form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a
lot of visited nodes.
In this patch I have added a hard limit to the number of nodes visited
and changed the algorithm for path calculation. Primarily I am
traversing the use-def chain for the PHI nodes that define the state. If
we have a hole in the use-def chain (no immediate predecessors) then I
call the paths() function.
I also had to the change the select instruction unfolding code to insert
redundant one input PHIs to allow the use of the use-def chain in
calculating the paths.
The test suite coverage with this patch (including a limit on nodes
visited) is as follows:
Geomean diff:
dfa-jump-threading.NumTransforms: +13.4%
dfa-jump-threading.NumCloned: +34.1%
dfa-jump-threading.NumPaths: -80.7%
Compile time effect vs baseline (pass enabled by default) is mostly
positive:
https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u
Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed
show more ...
|
Revision tags: llvmorg-19.1.0-rc2 |
|
#
34d48279 |
| 29-Jul-2024 |
Kazu Hirata <kazu@google.com> |
[llvm] Initialize SmallVector with ranges (NFC) (#100948)
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8 |
|
#
e0ac087f |
| 06-Jun-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
the relevant tokens are defined and used w
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
the relevant tokens are defined and used within the loop.
- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
runtime checks.
- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
lying outside the loop. Such unrolling is disabled for now.
- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.
- Spell out the initializer for UnrollLoopOptions to improve
readability.
Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
show more ...
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
7b5b5214 |
| 27-Apr-2024 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading][NFC] Use const reference as range variable (#90342)
Fixes #90286
|
#
c229f767 |
| 27-Apr-2024 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Avoid exploring the paths that never come back (#85505)
This patch does:
- Preserve loop info when unfolding selects.
- Reduce the search space for loop paths.
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
c9325f8a |
| 16-Mar-2024 |
Usman Nadeem <mnadeem@quicinc.com> |
[DFAJumpThreading] Add an early exit heuristic for unpredictable values (#85015)
Right now the algorithm does not exit on unpredictable values. It
waits until all the paths have been enumerated to
[DFAJumpThreading] Add an early exit heuristic for unpredictable values (#85015)
Right now the algorithm does not exit on unpredictable values. It
waits until all the paths have been enumerated to see if any of
those paths have that value. Waiting this late leads to a lot of
wasteful computation and higher compile time.
In this patch I have added a heuristic that checks if the value
comes from the same inner loops as the switch, if so, then it is
likely that the value will also be seen on a threadable path and
the code in `getStateDefMap()` return an empty map.
I tested this on the llvm test suite and the only change in the
number of threaded switches was in 7zip (before 23, after 18).
In all of those cases the current algorithm was partially threading
the loop because it was hitting a limit on the number of paths to
be explored. On increasing this limit even the current algorithm
finds paths where the unpredictable value is seen.
Compile time(with pass enabled by default and this patch):
https://llvm-compile-time-tracker.com/compare.php?from=8c5e9cf737138aba22a4a8f64ef2c5efc80dd7f9&to=42c75d888058b35c6d15901b34e36251d8f766b9&stat=instructions:u
show more ...
|
#
6b53ada6 |
| 15-Mar-2024 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Early exit if switch is not in a loop (#85360)
This patch prevents taking non-loop switch as candidate.
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
2c0fc0f3 |
| 16-Jan-2024 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Handle circular determinator (#78177)
Fixes the buildbot failure in
https://github.com/llvm/llvm-project/pull/78134#issuecomment-1892195197
When we meet the path with single `de
[DFAJumpThreading] Handle circular determinator (#78177)
Fixes the buildbot failure in
https://github.com/llvm/llvm-project/pull/78134#issuecomment-1892195197
When we meet the path with single `determinator`, the determinator
actually takes itself as a predecessor. Thus, we need to let `Prev` be
the determinator when `PathBBs` has only one element.
show more ...
|
#
019ffbf3 |
| 15-Jan-2024 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Extends the bitwidth of state from uint64_t to APInt (#78134)
Fixes #78059
|
#
03dc806b |
| 22-Dec-2023 |
Kazu Hirata <kazu@google.com> |
[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
c880fdc0 |
| 03-Nov-2023 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082)
Fixes #65222.
When unfolding select into diamond-like control flow, we need to remove
the StartBlock from
[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082)
Fixes #65222.
When unfolding select into diamond-like control flow, we need to remove
the StartBlock from all phis in EndBlock.
show more ...
|
#
2fba4694 |
| 02-Nov-2023 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Don't thread switch without multiple successors (#71060)
Fixes #56882.
Fixes #60254.
When switch has only one successor, it make no sense to thread it. And
computing the cost
[DFAJumpThreading] Don't thread switch without multiple successors (#71060)
Fixes #56882.
Fixes #60254.
When switch has only one successor, it make no sense to thread it. And
computing the cost of it brings div-by-zero exception. We prevent it in
this patch.
show more ...
|
#
7fa41d8a |
| 02-Nov-2023 |
XChy <xxs_chy@outlook.com> |
[DFAJumpThreading] Only unfold select coming from directly where it is defined (#70966)
Fixes #64860.
When a select instruction comes in by PHINode, the phi's incoming block
for it can flow indire
[DFAJumpThreading] Only unfold select coming from directly where it is defined (#70966)
Fixes #64860.
When a select instruction comes in by PHINode, the phi's incoming block
for it can flow indirectly past other BasicBlock into it. In this case,
we cannot unfold select to the phi's BB.
show more ...
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2 |
|
#
a20f7efb |
| 15-Apr-2023 |
Bjorn Pettersson <bjorn.a.pettersson@ericsson.com> |
Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.
|
#
c83c4b58 |
| 16-Apr-2023 |
Kazu Hirata <kazu@google.com> |
[Transforms] Apply fixes from performance-for-range-copy (NFC)
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4 |
|
#
7c3c9814 |
| 08-Mar-2023 |
Arthur Eubanks <aeubanks@google.com> |
[Passes] Remove some legacy passes
DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify
These are p
[Passes] Remove some legacy passes
DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify
These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
show more ...
|
Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3 |
|
#
5ea31555 |
| 16-Oct-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use llvm::find (NFC)
|
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
#
9c710ebb |
| 17-Aug-2022 |
Daniil Fukalov <1671137+dfukalov@users.noreply.github.com> |
[TTI] NFC: Reduce InstructionCost::getValue() usage...
in order to propagate `InstructionCost` value upper.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D103406
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
53dc0f10 |
| 03-Jul-2022 |
Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt> |
[NFC] Switch a few uses of undef to poison as placeholders for unreachble code
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
#
f85c5079 |
| 09-Jun-2022 |
Philip Reames <preames@rivosinc.com> |
Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible t
Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.
On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.
I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.
Differential Revision: https://reviews.llvm.org/D127131
show more ...
|
#
f83a88a1 |
| 05-Jun-2022 |
Kazu Hirata <kazu@google.com> |
[Transforms] Use llvm::is_contained (NFC)
|