Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2
# 62ea799e 03-Aug-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Break Large PHIs: Take whole PHI chains into account

Previous heuristics had a big flaw: they only looked at single PHI at a time, and didn't take into account the whole "chain".
The concep

[AMDGPU] Break Large PHIs: Take whole PHI chains into account

Previous heuristics had a big flaw: they only looked at single PHI at a time, and didn't take into account the whole "chain".
The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector.
We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations.

The solution I found is to consider the whole "PHI chain" when looking at PHI.
That is, we recursively look at the PHI's incoming value & users for other PHIs, then make a decision about the chain as a whole.

The currrent threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value.
In simple terms, two-thirds (rounded up) of the PHIs should be breakable.

This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those case often causes performance issue.

Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156414

show more ...


Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init
# e5296c52 13-Jul-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis

The previous heuristic rejected a PHI if one of its user was an unbreakable PHI, no matter what the other users were.

This wor

[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis

The previous heuristic rejected a PHI if one of its user was an unbreakable PHI, no matter what the other users were.

This worked well in most cases, but there's one case in rocRAND where
it doesn't work. In that case, a PHI node has 2 PHI users where one is
breakable but not the other. When that PHI node isn't broken performance falls by 35%.

Relaxing the restriction to "require that half of the PHI node users are breakable" fixes the issue, and seems like a sensible change.

Solves SWDEV-409648, SWDEV-398393

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155184

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# fa87dd52 22-May-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Handle multiple occurences of an incoming value in break large PHIs

We naively broke all incoming values, assuming they'd be unique.
However it's not illegal to have multiple occurences of,

[AMDGPU] Handle multiple occurences of an incoming value in break large PHIs

We naively broke all incoming values, assuming they'd be unique.
However it's not illegal to have multiple occurences of, e.g. `[BB0, V0]`
in a PHI node. What's illegal though is having the same basic block
multiple times but with different values, and it's exactly what the
transform caused. This broke in some rare applications where the pattern
arised.

Now we cache the `BasicBlock, Value` pairs we're breaking so we can reuse the values and preserve this invariant.

Solves SWDEV-399460

Reviewed By: #amdgpu, rovka

Differential Revision: https://reviews.llvm.org/D151069

show more ...


Revision tags: llvmorg-16.0.4
# 52a2d07b 10-May-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Improve PHI-breaking heuristics in CGP

D147786 made the transform more conservative by adding heuristics,
which was a good idea. However, the transform got a bit
too conservative at times.

[AMDGPU] Improve PHI-breaking heuristics in CGP

D147786 made the transform more conservative by adding heuristics,
which was a good idea. However, the transform got a bit
too conservative at times.

This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them.
For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed.

This patch is an attempt at reaching a middleground that makes
the pass a bit more permissive. It continues in the same spirit as
D147786 but does the following changes:
- PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had very negative effect in one app I tested)
- `shufflevector` is now considered interesting, given that it satisfies a few trivial checks.

Reviewed By: arsenm, #amdgpu, jmmartinez

Differential Revision: https://reviews.llvm.org/D150266

show more ...


Revision tags: llvmorg-16.0.3
# 8b56da5e 26-Apr-2023 ManuelJBrito <manuel.brito@tecnico.ulisboa.pt>

[IR] Change shufflevector undef mask to poison

With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for unde

[IR] Change shufflevector undef mask to poison

With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210

show more ...


Revision tags: llvmorg-16.0.2
# b3b3cb2d 07-Apr-2023 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Less aggressively break large PHIs

In some cases, breaking large PHIs can very negatively affect
performance (3x more instructions observed in a particular test case).

This patch adds some

[AMDGPU] Less aggressively break large PHIs

In some cases, breaking large PHIs can very negatively affect
performance (3x more instructions observed in a particular test case).

This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases.
e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason.

Fixes SWDEV-392803
Fixes SWDEV-393781
Fixes SWDEV-394228

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147786

show more ...