|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
| #
62ea799e |
| 03-Aug-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Break Large PHIs: Take whole PHI chains into account
Previous heuristics had a big flaw: they only looked at single PHI at a time, and didn't take into account the whole "chain". The concep
[AMDGPU] Break Large PHIs: Take whole PHI chains into account
Previous heuristics had a big flaw: they only looked at single PHI at a time, and didn't take into account the whole "chain". The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector. We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations.
The solution I found is to consider the whole "PHI chain" when looking at PHI. That is, we recursively look at the PHI's incoming value & users for other PHIs, then make a decision about the chain as a whole.
The currrent threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value. In simple terms, two-thirds (rounded up) of the PHIs should be breakable.
This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those case often causes performance issue.
Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156414
show more ...
|
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
e5296c52 |
| 13-Jul-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis
The previous heuristic rejected a PHI if one of its user was an unbreakable PHI, no matter what the other users were.
This wor
[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis
The previous heuristic rejected a PHI if one of its user was an unbreakable PHI, no matter what the other users were.
This worked well in most cases, but there's one case in rocRAND where it doesn't work. In that case, a PHI node has 2 PHI users where one is breakable but not the other. When that PHI node isn't broken performance falls by 35%.
Relaxing the restriction to "require that half of the PHI node users are breakable" fixes the issue, and seems like a sensible change.
Solves SWDEV-409648, SWDEV-398393
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D155184
show more ...
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
fa87dd52 |
| 22-May-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Handle multiple occurences of an incoming value in break large PHIs
We naively broke all incoming values, assuming they'd be unique. However it's not illegal to have multiple occurences of,
[AMDGPU] Handle multiple occurences of an incoming value in break large PHIs
We naively broke all incoming values, assuming they'd be unique. However it's not illegal to have multiple occurences of, e.g. `[BB0, V0]` in a PHI node. What's illegal though is having the same basic block multiple times but with different values, and it's exactly what the transform caused. This broke in some rare applications where the pattern arised.
Now we cache the `BasicBlock, Value` pairs we're breaking so we can reuse the values and preserve this invariant.
Solves SWDEV-399460
Reviewed By: #amdgpu, rovka
Differential Revision: https://reviews.llvm.org/D151069
show more ...
|
|
Revision tags: llvmorg-16.0.4 |
|
| #
52a2d07b |
| 10-May-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Improve PHI-breaking heuristics in CGP
D147786 made the transform more conservative by adding heuristics, which was a good idea. However, the transform got a bit too conservative at times.
[AMDGPU] Improve PHI-breaking heuristics in CGP
D147786 made the transform more conservative by adding heuristics, which was a good idea. However, the transform got a bit too conservative at times.
This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them. For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed.
This patch is an attempt at reaching a middleground that makes the pass a bit more permissive. It continues in the same spirit as D147786 but does the following changes: - PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had very negative effect in one app I tested) - `shufflevector` is now considered interesting, given that it satisfies a few trivial checks.
Reviewed By: arsenm, #amdgpu, jmmartinez
Differential Revision: https://reviews.llvm.org/D150266
show more ...
|
|
Revision tags: llvmorg-16.0.3 |
|
| #
8b56da5e |
| 26-Apr-2023 |
ManuelJBrito <manuel.brito@tecnico.ulisboa.pt> |
[IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for unde
[IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
show more ...
|
|
Revision tags: llvmorg-16.0.2 |
|
| #
b3b3cb2d |
| 07-Apr-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Less aggressively break large PHIs
In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case).
This patch adds some
[AMDGPU] Less aggressively break large PHIs
In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case).
This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases. e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason.
Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D147786
show more ...
|