Revision tags: llvmorg-21-init |
|
#
ad9da92c |
| 27-Jan-2025 |
Florian Hahn <flo@fhahn.com> |
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loo
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.
This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.
PR: https://github.com/llvm/llvm-project/pull/124462
show more ...
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
4226e0a0 |
| 02-Dec-2024 |
Florian Hahn <flo@fhahn.com> |
[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends m
[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)
Add an extra know to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.
This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.
PR: https://github.com/llvm/llvm-project/pull/118316
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
92e0fb0c |
| 08-Nov-2024 |
Stephen Tozer <stephen.tozer@sony.com> |
[DebugInfo][LoopUnroll] Preserve DebugLocs on optimized cond branches (#114225)
This patch fixes a simple error where as part of loop unrolling we
optimize conditional loop-exiting branches into un
[DebugInfo][LoopUnroll] Preserve DebugLocs on optimized cond branches (#114225)
This patch fixes a simple error where as part of loop unrolling we
optimize conditional loop-exiting branches into unconditional branches
when we know that they will or won't exit the loop, but does not
propagate the source location of the original branch to the new one.
Found using https://github.com/llvm/llvm-project/pull/107279.
show more ...
|
#
665dd23a |
| 05-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[Utils] Simplify code with DenseMap::operator[] (NFC) (#114932)
|
#
013f4a46 |
| 05-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[Utils] Remove unused includes (NFC) (#114748)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
c8e06728 |
| 23-Sep-2024 |
Nikita Popov <npopov@redhat.com> |
[Loops] Use forgetLcssaPhiWithNewPredecessor() in more places
Use the more aggressive invalidation method in a number of places that add incoming values to lcssa phi nodes. It is likely that it's po
[Loops] Use forgetLcssaPhiWithNewPredecessor() in more places
Use the more aggressive invalidation method in a number of places that add incoming values to lcssa phi nodes. It is likely that it's possible to construct cases with incorrect SCEV preservation similar to https://github.com/llvm/llvm-project/issues/109333 for these.
show more ...
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
5c834989 |
| 21-Jul-2024 |
Kazu Hirata <kazu@google.com> |
[Transforms] Use range-based for loops (NFC) (#99607)
|
#
2d209d96 |
| 27-Jun-2024 |
Nikita Popov <npopov@redhat.com> |
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it does
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
show more ...
|
Revision tags: llvmorg-18.1.8 |
|
#
e0ac087f |
| 06-Jun-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
the relevant tokens are defined and used w
[LoopUnroll] Consider convergence control tokens when unrolling (#91715)
- There is no restriction on a loop with controlled convergent
operations when
the relevant tokens are defined and used within the loop.
- When a token defined outside a loop is used inside (also called a loop
convergence heart), unrolling is allowed only in the absence of
remainder or
runtime checks.
- When a token defined inside a loop is used outside, such a loop is
said to be
"extended". This loop can only be unrolled by also duplicating the
extended part
lying outside the loop. Such unrolling is disabled for now.
- Clean up loop hearts: When unrolling a loop with a heart, duplicating
the
heart will introduce multiple static uses of a convergence control token
in a
cycle that does not contain its definition. This violates the static
rules for
tokens, and needs to be cleaned up into a single occurrence of the
intrinsic.
- Spell out the initializer for UnrollLoopOptions to improve
readability.
Original implementation [D85605] by Nicolai Haehnle
<nicolai.haehnle@amd.com>.
show more ...
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
79643565 |
| 13-May-2024 |
chenlin <chenlin138@huawei.com> |
[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246)
Remove redundant debug instructions after blocks have been merged into
the predecessor, It can reduce some co
[LoopUnroll] Remove redundant debug instructions after blocks have been merged (#91246)
Remove redundant debug instructions after blocks have been merged into
the predecessor, It can reduce some compile time in some cases.
This change only fixes the situation of loop unrolling, and other
situations are not considered. "RemoveRedundantDbgInstrs" seems to be
very time-consuming. Thus, we just add here after the "Dest" has been
merged into the "Fold", this may be a more targeted solution!!!
fixes: https://github.com/llvm/llvm-project/issues/89073
show more ...
|
#
677dddeb |
| 04-May-2024 |
Kazu Hirata <kazu@google.com> |
[Transforms] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumbe
[Transforms] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
31 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
show more ...
|
#
175d2971 |
| 02-May-2024 |
Florian Hahn <flo@fhahn.com> |
[LoopUnroll] Add CSE to remove redundant loads after unrolling. (#83860)
This patch adds loadCSE support to simplifyLoopAfterUnroll. It is based
on EarlyCSE's implementation using ScopeHashTable an
[LoopUnroll] Add CSE to remove redundant loads after unrolling. (#83860)
This patch adds loadCSE support to simplifyLoopAfterUnroll. It is based
on EarlyCSE's implementation using ScopeHashTable and is using SCEV for
accessed pointers to check to find redundant loads after unrolling.
This applies to the late unroll pass only, for full unrolling those
redundant loads will be cleaned up by the regular pipeline.
The current approach constructs MSSA on-demand per-loop, but there is
still small but notable compile-time impact:
stage1-O3 +0.04%
stage1-ReleaseThinLTO +0.06%
stage1-ReleaseLTO-g +0.05%
stage1-O0-g +0.02%
stage2-O3 +0.09%
stage2-O0-g +0.04%
stage2-clang +0.02%
https://llvm-compile-time-tracker.com/compare.php?from=c089fa5a729e217d0c0d4647656386dac1a1b135&to=ec7c0f27cb5c12b600d9adfc8543d131765ec7be&stat=instructions:u
This benefits some workloads with runtime-unrolling disabled,
where users use pragmas to force unrolling, as well as with
runtime unrolling enabled.
On SPEC/MultiSource, this removes a number of loads after unrolling
on AArch64 with runtime unrolling enabled.
```
External/S...te/526.blender_r/526.blender_r 96
MultiSourc...rks/mediabench/gsm/toast/toast 39
SingleSource/Benchmarks/Misc/ffbench 4
External/SPEC/CINT2006/403.gcc/403.gcc 18
MultiSourc.../Applications/JM/ldecod/ldecod 4
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 6
MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG 9
MultiSourc...e/Applications/ClamAV/clamscan 4
MultiSourc.../MallocBench/espresso/espresso 3
MultiSourc...dence-flt/LinearDependence-flt 2
MultiSourc...ch/office-ispell/office-ispell 4
MultiSourc...ch/consumer-jpeg/consumer-jpeg 6
MultiSourc...ench/security-sha/security-sha 11
MultiSourc...chmarks/McCat/04-bisect/bisect 3
SingleSour...tTests/2020-01-06-coverage-009 12
MultiSourc...ench/telecomm-gsm/telecomm-gsm 39
MultiSourc...lds-flt/CrossingThresholds-flt 24
MultiSourc...dence-dbl/LinearDependence-dbl 2
External/S...C/CINT2006/445.gobmk/445.gobmk 6
MultiSourc...enchmarks/mafft/pairlocalalign 53
External/S...31.deepsjeng_r/531.deepsjeng_r 3
External/S...rate/510.parest_r/510.parest_r 58
External/S...NT2006/464.h264ref/464.h264ref 29
External/S...NT2017rate/502.gcc_r/502.gcc_r 45
External/S...C/CINT2006/456.hmmer/456.hmmer 6
External/S...te/538.imagick_r/538.imagick_r 18
External/S.../CFP2006/447.dealII/447.dealII 4
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 12
External/S...2017rate/525.x264_r/525.x264_r 36
MultiSourc...Benchmarks/7zip/7zip-benchmark 33
MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk 2
MultiSourc...chmarks/VersaBench/8b10b/8b10b 1
MultiSourc.../Applications/JM/lencod/lencod 116
MultiSourc...lds-dbl/CrossingThresholds-dbl 24
MultiSource/Benchmarks/McCat/05-eks/eks 15
```
PR: https://github.com/llvm/llvm-project/pull/83860
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
6b62a913 |
| 04-Mar-2024 |
Jeremy Morse <jeremy.morse@sony.com> |
[RemoveDIs] Reapply 3fda50d3915, insert instructions using iterators
I'd reverted this in 6c7805d5d1 after a bad stage. Original commit messsage follows:
[NFC][RemoveDIs] Bulk update utilities to i
[RemoveDIs] Reapply 3fda50d3915, insert instructions using iterators
I'd reverted this in 6c7805d5d1 after a bad stage. Original commit messsage follows:
[NFC][RemoveDIs] Bulk update utilities to insert with iterators
As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors.
There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way.
I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations.
All of this should be NFC.
show more ...
|
#
6c7805d5 |
| 29-Feb-2024 |
Jeremy Morse <jeremy.morse@sony.com> |
Revert "[NFC][RemoveDIs] Bulk update utilities to insert with iterators"
This reverts commit 3fda50d3915b2163a54a37b602be7783a89dd808.
Apparently I've missed a hunk while staging this; will back ou
Revert "[NFC][RemoveDIs] Bulk update utilities to insert with iterators"
This reverts commit 3fda50d3915b2163a54a37b602be7783a89dd808.
Apparently I've missed a hunk while staging this; will back out for now.
Picked up here: https://lab.llvm.org/buildbot/#/builders/139/builds/60429/steps/6/logs/stdio
show more ...
|
#
3fda50d3 |
| 29-Feb-2024 |
Jeremy Morse <jeremy.morse@sony.com> |
[NFC][RemoveDIs] Bulk update utilities to insert with iterators
As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carr
[NFC][RemoveDIs] Bulk update utilities to insert with iterators
As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors.
There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way.
I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations.
All of this should be NFC.
show more ...
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
f9306f6d |
| 25-Oct-2023 |
Kazu Hirata <kazu@google.com> |
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
show more ...
|
#
9c5a5a42 |
| 22-Oct-2023 |
Kazu Hirata <kazu@google.com> |
[llvm] Stop including llvm/ADT/iterator_range.h (NFC)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
b9808e56 |
| 14-Jun-2023 |
Nikita Popov <npopov@redhat.com> |
[LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of `%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled iteration. This patch simplifies these add
[LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of `%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled iteration. This patch simplifies these adds to `%xN = add %x0, N` directly during unrolling, rather than waiting for InstCombine to do so.
The motivation for this is that having a single add (rather than an add chain) on the induction variable makes it a simple recurrence, which we specially recognize in a number of places. This allows InstCombine to directly perform folds with that knowledge, instead of first folding the add chains, and then doing other folds in another InstCombine iteration.
Due to the reduced number of InstCombine iterations, this also results in a small compile-time improvement.
Differential Revision: https://reviews.llvm.org/D153540
show more ...
|
Revision tags: llvmorg-16.0.6 |
|
#
143ed21b |
| 05-Jun-2023 |
Nikita Popov <npopov@redhat.com> |
Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7.
In preparation for reverting a dependent revision.
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2 |
|
#
9272d0f0 |
| 17-Apr-2023 |
Hongtao Yu <hoy@fb.com> |
[PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dw
[PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.
I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D148569
show more ...
|
#
5362a0d8 |
| 02-May-2023 |
Nikita Popov <npopov@redhat.com> |
[LCSSA] Remove unused ScalarEvolution argument (NFC)
After D149435, LCSSA formation no longer needs access to ScalarEvolution, so remove the argument from the utilities.
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
8cf1524c |
| 13-Feb-2023 |
Mircea Trofin <mtrofin@google.com> |
[loop unroll] Fix `branch-weights` for unrolled loop.
The branch weights of the unrolled loop need to be reduced by the unroll factor.
Differential Revision: https://reviews.llvm.org/D143948
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
12dd3a7b |
| 20-Jan-2023 |
Florian Hahn <flo@fhahn.com> |
Recommit "[LoopUnroll] Directly update DT instead of DTU."
This reverts commit c5ea42bcf48c8f3d3e35a6bff620b06d2a499108.
Recommit the patch with a fix for loops where the exiting terminator is not
Recommit "[LoopUnroll] Directly update DT instead of DTU."
This reverts commit c5ea42bcf48c8f3d3e35a6bff620b06d2a499108.
Recommit the patch with a fix for loops where the exiting terminator is not a branch instruction. In that case, ExitInfos may be empty. In addition to checking if there's a single exiting block also check if there's a single ExitInfo.
A test case has been added in f92b35392ed8e4631.
show more ...
|
#
c5ea42bc |
| 20-Jan-2023 |
Arthur Eubanks <aeubanks@google.com> |
Revert "[LoopUnroll] Directly update DT instead of DTU."
This reverts commit d0907ce7ed9f159562ca3f4cfd8d87e89e93febe.
Causes `opt -passes=loop-unroll-full` to crash on
``` define void @foo() { bb
Revert "[LoopUnroll] Directly update DT instead of DTU."
This reverts commit d0907ce7ed9f159562ca3f4cfd8d87e89e93febe.
Causes `opt -passes=loop-unroll-full` to crash on
``` define void @foo() { bb: br label %bb1
bb1: ; preds = %bb1, %bb1, %bb switch i1 true, label %bb1 [ i1 true, label %bb2 i1 false, label %bb1 ]
bb2: ; preds = %bb1 ret void } ```
show more ...
|
#
d0907ce7 |
| 19-Jan-2023 |
Florian Hahn <flo@fhahn.com> |
[LoopUnroll] Directly update DT instead of DTU.
The scope of DT updates are very limited when unrolling loops: the DT should only need updating for * new blocks added * exiting blocks we simplified
[LoopUnroll] Directly update DT instead of DTU.
The scope of DT updates are very limited when unrolling loops: the DT should only need updating for * new blocks added * exiting blocks we simplified branches
This can be done manually without too much extra work. MergeBlockIntoPredecessor also needs to be updated to support direct DT updates.
This fixes excessive time spent in DTU for same cases. In an internal example, time spent in LoopUnroll with this patch goes from ~200s to 2s.
It also is slightly positive for CTMark: * NewPM-O3: -0.13% * NewPM-ReleaseThinLTO: -0.11% * NewPM-ReleaseLTO-g: -0.13%
Notable improvements are mafft (~ -0.50%) and lencod (~ -0.30%), with no workload regressed.
https://llvm-compile-time-tracker.com/compare.php?from=78a9ee7834331fb4360457cc565fa36f5452f7e0&to=687e08d011b0dc6d3edd223612761e44225c7537&stat=instructions:u
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D141487
show more ...
|