|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
54d31bde |
| 01-Nov-2024 |
Ruiling, Song <ruiling.song@amd.com> |
Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)" (#114347)
This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.
|
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
| #
be40c723 |
| 08-Aug-2024 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
Revert "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)"
This reverts commit c62e2a2a4ed69d53a3c6ca5c24ee8d2504d6ba2b.
Since it caused regression in HIP buildbot:
https://lab.llvm.org/buildbot/#/builders/123/builds/3282
|
| #
c62e2a2a |
| 08-Aug-2024 |
Ruiling, Song <ruiling.song@amd.com> |
StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)
After investigating more while-break cases, I think we should try to
optimize the way we reconstruct phi nodes. Previously, we reconstructed
each phi node separately, but this is not optimal. For example:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %latch ]
br i1 %cc, label %if, label %latch
if:
%v.if = fadd float %v.1, 1.0
br i1 %cc2, label %latch, label %exit
latch:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
br i1 %cc3, label %exit, label %header
exit:
%v.3 = phi float [ %v.2, %latch ], [ %v.if, %if ]
```
For this case, we have different copies of value `v`, but at most one
copy of `v` is live at any program point shown above.
The existing SSA reconstruction uses the incoming values from the
old deleted phi. Below is a possible output after SSA reconstruction.
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.loop, %flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.exit.if = phi float [ %v.if, %if ], [ undef, %header ]
%v.latch = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
%v.loop = phi float [ %v.latch, %latch ], [ undef, %flow ]
%v.exit = phi float [ %v.latch, %latch ], [ %v.exit.if, %flow ]
exit:
%v.3 = phi float [ %v.exit, %flow1 ]
```
If we look closely, in order to reconstruct `v.1`, `v.2`, and `v.3`, we
keep two simultaneous copies of `v` live at `flow` and `flow1`.
We depend heavily on the register coalescer to coalesce them.
But the register coalescer may not always be able to do so
because of the complexity of the phi chain.
On the other hand, since only one copy of `v` is live at any
program point before the transform, why not simplify the phi network
as much as we can? Look at the incoming values of these PHIs:
```
        header    if      latch
v.1:    --        --      v.2
v.2:    v.1       v.if    --
v.3:    --        v.if    v.2
```
If we let them share the same incoming values for these three different
incoming blocks, then we would have only one live copy of `v` at any
program point after SSA reconstruction. Something like:
```
header:
%v.1 = phi float [ %v, %entry ], [ %v.2, %flow1 ]
br i1 %cc, label %if, label %flow
if:
%v.if = fadd float %v.1, 1.0
br label %flow
flow:
%v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
latch:
br label %flow1
flow1:
...
exit:
%v.3 = phi float [ %v.2, %flow1 ]
```
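The table of incoming values above suggests a simple compatibility check: the phis can share one set of incoming values when no incoming block is asked to supply two different values. The following is a minimal Python sketch of that check only, not the StructurizeCFG implementation; the function name `merge_incoming` and the dictionary layout are hypothetical.

```python
def merge_incoming(phis):
    """Merge per-phi incoming values into one shared map, block -> value.

    `phis` maps a phi name to {incoming block: incoming value}.
    Returns the shared map, or None if two phis disagree on the value
    coming from the same block (hypothetical helper for illustration).
    """
    merged = {}
    for incoming in phis.values():
        for block, value in incoming.items():
            # setdefault keeps the first value seen for this block.
            if merged.setdefault(block, value) != value:
                return None
    return merged


# Incoming values taken from the table in the commit message.
phis = {
    "v.1": {"latch": "v.2"},
    "v.2": {"header": "v.1", "if": "v.if"},
    "v.3": {"if": "v.if", "latch": "v.2"},
}
shared = merge_incoming(phis)
```

Here `shared` has one entry per incoming block (`header`, `if`, `latch`), which corresponds to a single live copy of `v` on each path. If any two phis disagreed on a block, the sketch returns `None` and the phis would have to be reconstructed separately.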
|
|
Revision tags: llvmorg-19.1.0-rc2 |
|
| #
dae7fb8c |
| 31-Jul-2024 |
Ruiling, Song <ruiling.song@amd.com> |
[AMDGPU,test] Add one more while-break case (#101300)
which suffers from v_mov issue.
|
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
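To make the difference between the two flags concrete, here is a toy Python model of the behaviour the message describes: `-march=` replaces only the architecture component of the default triple, while `-mtriple=` replaces the whole triple. This is an illustration under stated assumptions, not how `llc` parses triples; both function names and the example default triple are hypothetical.

```python
def apply_march(default_triple, arch):
    """Model of -march=: swap only the architecture (first) component."""
    _, _, rest = default_triple.partition("-")
    return f"{arch}-{rest}" if rest else arch


def apply_mtriple(default_triple, triple):
    """Model of -mtriple=: the given triple replaces the default entirely."""
    return triple


# Assumed host default triple, for illustration only.
host_default = "x86_64-apple-darwin"
via_march = apply_march(host_default, "amdgcn")
via_mtriple = apply_mtriple(host_default, "amdgcn")
```

In this model `via_march` keeps the host's vendor and OS components, which is the kind of nonsensical triple the message warns about, while `via_mtriple` does not depend on the host at all.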
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
| #
7b3bbd83 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.
Reverted due to various buildbot failures.
|
| #
2501ae58 |
| 09-Oct-2023 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
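The effect described above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the SlotIndexes implementation: the list of entries, the `None` marker for an erased instruction, and the spacing constant `DIST` are all hypothetical.

```python
DIST = 16  # hypothetical distance between consecutive slot indexes


def renumber(entries, skip_erased):
    """Assign evenly spaced indexes to the instructions in `entries`.

    `entries` is the ordered list of index entries; None stands for an
    entry whose instruction has been erased. When skip_erased is False,
    erased entries still consume an index, so the result depends on
    the history of insertions and removals.
    """
    numbering = {}
    index = 0
    for instr in entries:
        if instr is None:
            if not skip_erased:
                index += DIST  # stale entry still occupies a slot
            continue
        numbering[instr] = index
        index += DIST
    return numbering


entries = ["a", None, "b"]
stale = renumber(entries, skip_erased=False)
clean = renumber(entries, skip_erased=True)
```

With the stale entry counted, `b` ends up twice as far from `a` as it does when erased entries are skipped, which is the kind of history dependence that can skew live range length heuristics.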
|
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
| #
a5676a3a |
| 23-Aug-2022 |
Ruiling Song <ruiling.song@amd.com> |
StructurizeCFG: Set Undef for non-predecessors in setPhiValues()
During the structurization process, we may place non-predecessor blocks between the predecessors of a block in the structurized CFG. Take the typical while-break case as an example:
```
 /---A(v=...)
 |  / \
 ^ B   C
 |  \ /|
 \---L |
     \ /
      E (r = phi (v:C)...)
```
After structurization, the CFG would look like:
```
 /---A
 |   |\
 |   | C
 |   |/
 |   F1
 ^   |\
 |   | B
 |   |/
 |   F2
 |   |\
 |   | L
 \   |/
  \--F3
     |
     E
```
We can see that block B is placed between the predecessors (C/L) of E. During phi reconstruction, to achieve the same semantics as before, we reconstruct the PHIs as:
F1: v1 = phi (v:C), (undef:A)
F3: r = phi (v1:F2), ...
But this also says that `v1` is live through B, which is unnecessary. The idea of this change is to treat the incoming value from B as Undef for the PHI in E. With this change, the reconstructed PHIs would be:
F1: v1 = phi (v:C), (undef:A)
F2: v2 = phi (v1:F1), (undef:B)
F3: r = phi (v2:F2), ...
Reviewed by: sameerds
Differential Revision: https://reviews.llvm.org/D132450
|
| #
40e9284f |
| 22-Aug-2022 |
Ruiling Song <ruiling.song@amd.com> |
StructurizeCFG: prefer reduced number of live values
Instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example:
BB0:
 | \
 | BB1
 | /
BB2: phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v because v dominates BB2, but this increases the number of live values in BB1. By setting CanUseUndef to false, we avoid simplifying the phi in this way, which helps register pressure. This is a prerequisite for the later change that helps reduce VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
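The trade-off described in this message can be sketched in Python. The function below is a simplified model of phi simplification with an undef-handling switch, written for illustration only; it is not LLVM's implementation, and the name `simplify_phi`, its `can_use_undef` parameter, and the string encoding of values are assumptions.

```python
UNDEF = "undef"


def simplify_phi(incoming, can_use_undef):
    """Return the single value a phi folds to, or None if it must stay.

    `incoming` maps predecessor block -> incoming value. The phi folds
    when all non-undef inputs are the same value. Undef inputs are only
    ignored when can_use_undef is True; otherwise the phi is kept so the
    value is not made live along the undef path.
    """
    values = {v for v in incoming.values() if v != UNDEF}
    has_undef = UNDEF in incoming.values()
    if len(values) != 1:
        return None
    if has_undef and not can_use_undef:
        return None
    return next(iter(values))


phi_bb2 = {"BB0": "v", "BB1": UNDEF}
folded = simplify_phi(phi_bb2, can_use_undef=True)
kept = simplify_phi(phi_bb2, can_use_undef=False)
```

With `can_use_undef=True` the phi folds to `v`, which would extend the live range of `v` through BB1. With `can_use_undef=False` the phi is kept, so `v` need not be live in BB1.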
|
| #
66325d9b |
| 23-Aug-2022 |
Ruiling Song <ruiling.song@amd.com> |
AMDGPU: Add a test to show how later optimization works
Differential Revision: https://reviews.llvm.org/D132448
|