Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
#
62333468 |
| 29-Oct-2024 |
Chengjun <chengjunp@Nvidia.com> |
[GenericCycle] Add a Cache for getExitBlocks in GenericCycle (#112290)
In `UniformityAnalysis`, we need to get the exit blocks of cycles in the
`DivergencePropagator` and currently, we have to do a
[GenericCycle] Add a Cache for getExitBlocks in GenericCycle (#112290)
In `UniformityAnalysis`, we need to get the exit blocks of cycles in the
`DivergencePropagator` and currently, we have to do a search for the
exit blocks every time. In this change, we add a cache of the results in
the `GenericCycle` so that it can save the compile time. By testing, for
some large cases, this can save about 60% compile time in the
`UniformityAnalysis`.
show more ...
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
fa4cc9dd |
| 26-Aug-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[FixIrreducible] Use CycleInfo instead of a custom SCC traversal (#101386)
[FixIrreducible] Use CycleInfo instead of a custom SCC traversal
1. CycleInfo efficiently locates all cycles in a single
[FixIrreducible] Use CycleInfo instead of a custom SCC traversal (#101386)
[FixIrreducible] Use CycleInfo instead of a custom SCC traversal
1. CycleInfo efficiently locates all cycles in a single pass, while the
SCC is
repeated inside every natural loop.
2. CycleInfo provides a hierarchy of irreducible cycles, and the new
implementation transforms each cycle in this hierarchy separately
instead of
reducing an entire irreducible SCC in a single step. This reduces the
number
of control-flow paths that pass through the header of each newly created
loop. This is evidenced by the reduced number of predecessors on the
"guard"
blocks in the lit tests, and fewer operands on the corresponding PHI
nodes.
3. When an entry of an irreducible cycle is the header of a child
natural loop,
the original implementation destroyed that loop. This is now preserved,
since the incoming edges on non-header entries are not touched.
4. In the new implementation, if an irreducible cycle is a superset of a
natural
loop with the same header, then that natural loop is destroyed and
replaced
by the newly created loop.
show more ...
|
#
e6da78a3 |
| 20-Aug-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
Reapply "[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)"
This reverts commit 4aacc60fe7e1f7b3f788bba8382ea1fa5189ef3b.
The original implementation provided a simple method to
Reapply "[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)"
This reverts commit 4aacc60fe7e1f7b3f788bba8382ea1fa5189ef3b.
The original implementation provided a simple method to check whether the forest of nested cycles is well-formed. This is now augmented with other methods to check well-formedness of every cycle, either individually, or as the entire forest. These will be used by future transforms that modify CycleInfo.
show more ...
|
#
4aacc60f |
| 20-Aug-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
Revert "[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)"
This reverts commit b432afc28406b670a58933c2fe56c73e6f85911e.
Reverted due to linker failures in expensive-checks.
|
#
b432afc2 |
| 20-Aug-2024 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)
The original implementation provided a simple method to check whether
the forest of nested cycles is well-formed. This is now a
[CycleAnalysis] Methods to verify cycles and their nesting. (#102300)
The original implementation provided a simple method to check whether
the forest of nested cycles is well-formed. This is now augmented with
other methods to check well-formedness of all cycles, either
invdividually, or as the entire forest. These will be used by future
transforms that modify CycleInfo.
show more ...
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
f5e49279 |
| 10-May-2024 |
Petar Avramovic <Petar.Avramovic@amd.com> |
AMDGPU: fix isSafeToSink expecting exactly one predecessor (#89224)
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch
AMDGPU: fix isSafeToSink expecting exactly one predecessor (#89224)
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
Fixes #89200
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
#
29f37f81 |
| 11-Oct-2023 |
petar-avramovic <56677889+petar-avramovic@users.noreply.github.com> |
CycleInfo: Fix splitCriticalEdge (#68584)
There are cases when cycle that contains both Pred and Succ is not
found: Pred is in a smaller cycle that is contained in a larger cycle
that also contain
CycleInfo: Fix splitCriticalEdge (#68584)
There are cases when cycle that contains both Pred and Succ is not
found: Pred is in a smaller cycle that is contained in a larger cycle
that also contains Succ, or Pred and Succ are in disjointed innermost
cycles but there is a larger cycle that contains both.
Add efficient helper function that finds innermost cycle that contains
both Pred and Succ if it exists.
show more ...
|
#
2fa7d652 |
| 04-Oct-2023 |
Petar Avramovic <Petar.Avramovic@amd.com> |
AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)
Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCo
AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)
Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.
show more ...
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
#
b14e30f1 |
| 27-Jul-2023 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[LLVM] refactor GenericSSAContext and its specializations
Fix the GenericSSAContext template so that it actually declares all the necessary typenames and the methods that must be implemented by its
[LLVM] refactor GenericSSAContext and its specializations
Fix the GenericSSAContext template so that it actually declares all the necessary typenames and the methods that must be implemented by its specializations SSAContext and MachineSSAContext.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156288
show more ...
|
Revision tags: llvmorg-18-init |
|
#
da61c865 |
| 12-Jul-2023 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[RFC] Introduce convergence control intrinsics
This is a reboot of the original design and implementation by Nicolai Haehnle <nicolai.haehnle@amd.com>: https://reviews.llvm.org/D85603
This change a
[RFC] Introduce convergence control intrinsics
This is a reboot of the original design and implementation by Nicolai Haehnle <nicolai.haehnle@amd.com>: https://reviews.llvm.org/D85603
This change also obsoletes an earlier attempt at restarting the work on convergence tokens: https://reviews.llvm.org/D104504
Changes relative to D85603:
1. Clean up the definition of a "convergent operation", a convergent call and convergent function. 2. Clean up the relationship between dynamic instances, sets of threads and convergence tokens. 3. Redistribute the formal rules into the definitions of the convergence intrinsics. 4. Expand on the semantics of entering a function from outside LLVM, and the environment-defined outcome of the entry intrinsic. 5. Replace the term "cycle" with "closed path". The static rules are defined in terms of closed paths, and then a relation is established with cycles. 6. Specify that if a function contains a controlled convergent operation, then all convergent operations in that function must be controlled. 7. Describe an optional procedure to infer tokens for uncontrolled convergent operations. 8. Introduce controlled maximal convergence-before and controlled m-converged property as an update to the original properties in UniformityAnalysis. 9. Additional constraint that a cycle heart can only occur in the header of a reducible cycle (natural loop).
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D147116
show more ...
|
Revision tags: llvmorg-16.0.6 |
|
#
8e580b7e |
| 08-Jun-2023 |
Dhruv Chawla <44582521+dc03@users.noreply.github.com> |
[NFC][SetVector] Update some usages of SetVector to SmallSetVector
This patch is a continuation of D152497. It updates usages of SetVector that were found in llvm/ and clang/ which were originally s
[NFC][SetVector] Update some usages of SetVector to SmallSetVector
This patch is a continuation of D152497. It updates usages of SetVector that were found in llvm/ and clang/ which were originally specifying either SmallPtrSet or SmallVector to just using SmallSetVector, as the overhead of SetVector is reduced with D152497.
This also helps clean up the code a fair bit, and gives a decent speed boost at -O0 (~0.2%): https://llvm-compile-time-tracker.com/compare.php?from=9ffdabecabcddde298ff313f5353f9e06590af62&to=97f1c0cde42ba85eaa67cbe89bec8fe45b801f21&stat=instructions%3Au
Differential Revision: https://reviews.llvm.org/D152522
show more ...
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
98286a04 |
| 29-Mar-2023 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[llvm][CycleInfo] Quick look-up for block in cycle.
Use a SetVector to store blocks in a cycle to ensure a quick loop-up when querying whether the cycle contains a given block. This is along the sam
[llvm][CycleInfo] Quick look-up for block in cycle.
Use a SetVector to store blocks in a cycle to ensure a quick loop-up when querying whether the cycle contains a given block. This is along the same lines as the SmallPtrSet in LoopBase, introduced by commit be640b28c0cb81b77015baaef20ca2941fc61dea.
To make this work, we also enhance SetVector to support vector operations with pointers and set operations with const pointers in the same container.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D146136
show more ...
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
475ce4c2 |
| 20-Dec-2022 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
RFC: Uniformity Analysis for Irreducible Control Flow
Uniformity analysis is a generalization of divergence analysis to include irreducible control flow:
1. The proposed spec presents a notion of
RFC: Uniformity Analysis for Irreducible Control Flow
Uniformity analysis is a generalization of divergence analysis to include irreducible control flow:
1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natual loops.
2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo.
3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR.
Validation: - passes existing tests for divergence analysis - passes new tests with irreducible control flow - passes equivalent tests in MIR and GMIR
Based on concepts originally outlined by Nicolai Haehnle <nicolai.haehnle@amd.com>
With contributions from Ruiling Song <ruiling.song@amd.com> and Jay Foad <jay.foad@amd.com>.
Support for GMIR and lit tests for GMIR/MIR added by Yashwant Singh <yashwant.singh@amd.com>.
Differential Revision: https://reviews.llvm.org/D130746
show more ...
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1 |
|
#
c941d925 |
| 16-Sep-2022 |
Chen Zheng <czhengsz@cn.ibm.com> |
[MachineCycle][NFC] add a cache for block and its top level cycle
This solves https://github.com/llvm/llvm-project/issues/57664
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.or
[MachineCycle][NFC] add a cache for block and its top level cycle
This solves https://github.com/llvm/llvm-project/issues/57664
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D134019
show more ...
|
Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
d0bfebda |
| 01-Aug-2022 |
Jannik Silvanus <jannik.silvanus@amd.com> |
[Docs] Improve cycle and closed path definitions
Improve the cycle definition, by avoiding usage of not yet defined or only vaguely defined terminology inside definitions. More precisely, the existi
[Docs] Improve cycle and closed path definitions
Improve the cycle definition, by avoiding usage of not yet defined or only vaguely defined terminology inside definitions. More precisely, the existing definition defined "outermost cycles", and then proceeded to use the term "cycles" for further definitions, which in turn were used to actually define "cycles".
Now, instead only define "cycles". This does not change the meaning of a cycle, which depends on the chosen surrounding (subgraph) of a CFG.
Also mention the function CFG in the first definition, because later later definitions require it anyways.
Also slightly improve the definition of a closed path, by explicitly requiring the inner nodes to be distinct.
Differential Revision: https://reviews.llvm.org/D130891
show more ...
|
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2 |
|
#
d7927523 |
| 19-Apr-2022 |
Chen Zheng <czhengsz@cn.ibm.com> |
[MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build failue: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
[MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build failue: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is a anylysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file.
Otherwise, there are module conflicit for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after 62a9b36fcf728b104ea87e6eb84c0be69b779df7.
MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
show more ...
|
#
80c4910f |
| 25-May-2022 |
Chen Zheng <czhengsz@cn.ibm.com> |
Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.or
Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
show more ...
|
#
62a9b36f |
| 19-Apr-2022 |
Chen Zheng <czhengsz@cn.ibm.com> |
[MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. A
[MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo().
This patch tries to use MachineCycle so that we can handle irreducible loop better.
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
show more ...
|
#
cd5f3241 |
| 22-May-2022 |
NAKAMURA Takumi <geek4civic@gmail.com> |
ADT::GenericCycleInfo: Hide validateTree() in -Asserts.
validateTree() is instantiated with __FILE__. It will be pruned at link time due to -ffunction-sections but be left in object files. Its user
ADT::GenericCycleInfo: Hide validateTree() in -Asserts.
validateTree() is instantiated with __FILE__. It will be pruned at link time due to -ffunction-sections but be left in object files. Its user is only GenericCycleInfo::compute() with assert(validateTree()); Therefore I think validateTree() may be hidden with NDEBUG.
This is a fixup for https://reviews.llvm.org/D112696
show more ...
|
#
0ca2b93c |
| 13-May-2022 |
Chen Zheng <czhengsz@cn.ibm.com> |
[NFC] add the missing //@}
address code review comments for D123995
|
Revision tags: llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
c95df64c |
| 28-Jan-2022 |
Shivam Gupta <shivam98.tkg@gmail.com> |
[NFC] Add missing doxygen file tag in llvm/include/llvm/ADT/ headers
Few header file don't have file tag in them. This patch can be help in viewing doxygen documentation. When we hover on the includ
[NFC] Add missing doxygen file tag in llvm/include/llvm/ADT/ headers
Few header file don't have file tag in them. This patch can be help in viewing doxygen documentation. When we hover on the included header file, small description will display.
Reviewed By: aaron.ballman, xgupta
Differential Revision: https://reviews.llvm.org/D116004
show more ...
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
#
4b22ffe0 |
| 17-Jan-2022 |
Carl Ritson <carl.ritson@amd.com> |
CycleInfo: Fix trivial typo. NFC.
|
Revision tags: llvmorg-13.0.1-rc2 |
|
#
1d0244ae |
| 10-Dec-2021 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
Reapply CycleInfo: Introduce cycles as a generalization of loops
Reverts 02940d6d2202. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This chang
Reapply CycleInfo: Introduce cycles as a generalization of loops
Reverts 02940d6d2202. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then instatiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR.
This review is a restart of an older review request: https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
show more ...
|
#
0fe61ecc |
| 07-Dec-2021 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, alon
CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This change introduce the concept of cycles as a generalization of loops, along with a CycleInfo analysis that discovers a nested hierarchy of such cycles. This is based on Havlak (1997), Nesting of Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then instatiated for LLVM IR and Machine IR. The template relies on a new GenericSSAContext template which must be specialized when used for each IR.
This review is a restart of an older review request: https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>, with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
show more ...
|