amdgpu-codegenprepare-break-large-phis-heuristics.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-break-large-phis-heuristics.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2
# 62ea799e	03-Aug-2023	pvanhout <pierre.vanhoutryve@amd.com>	[AMDGPU] Break Large PHIs: Take whole PHI chains into account. Previous heuristics had a big flaw: they only looked at a single PHI at a time, and didn't take into account the whole "chain". The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector. We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations. The solution I found is to consider the whole "PHI chain" when looking at a PHI. That is, we recursively look at the PHI's incoming values & users for other PHIs, then make a decision about the chain as a whole. The current threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value. In simple terms, two-thirds (rounded up) of the PHIs should be breakable. This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those cases often causes performance issues. Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156414
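The two-thirds rule in this commit message can be illustrated with a short Python sketch. It is not the pass's C++ code: the names `required_breakable` and `should_break_chain` are hypothetical, and the sketch assumes a chain is summarised by one boolean per PHI (whether that PHI has at least one interesting incoming value).

```python
from typing import Sequence


def required_breakable(chain_size: int) -> int:
    """Return ceil(chain_size * 2/3) using integer arithmetic only."""
    return (2 * chain_size + 2) // 3


def should_break_chain(has_interesting_incoming: Sequence[bool]) -> bool:
    """Decide for the chain as a whole (hypothetical helper).

    has_interesting_incoming[i] is True when PHI i of the chain has at
    least one interesting incoming value. An empty chain is never broken
    (an assumption of this sketch).
    """
    chain_size = len(has_interesting_incoming)
    if chain_size == 0:
        return False
    breakable = sum(1 for flag in has_interesting_incoming if flag)
    return breakable >= required_breakable(chain_size)
```

For chains of 7 and 9 PHIs, this rule requires 5 and 6 breakable PHIs respectively.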
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init
# e5296c52	13-Jul-2023	pvanhout <pierre.vanhoutryve@amd.com>	[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHIs. The previous heuristic rejected a PHI if one of its users was an unbreakable PHI, no matter what the other users were. This worked well in most cases, but there's one case in rocRAND where it doesn't work. In that case, a PHI node has 2 PHI users where one is breakable but not the other. When that PHI node isn't broken, performance falls by 35%. Relaxing the restriction to "require that half of the PHI node users are breakable" fixes the issue, and seems like a sensible change. Solves SWDEV-409648, SWDEV-398393 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155184
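The difference between the previous and the relaxed rule can be sketched in Python as follows. The function names are hypothetical, and treating "half" as an inclusive bound that rounds up for odd user counts is an assumption of this sketch; the commit message only confirms the case of 2 PHI users with one breakable.

```python
from typing import Sequence


def previous_rule(user_is_breakable: Sequence[bool]) -> bool:
    """Reject the PHI as soon as one PHI user is unbreakable."""
    return all(user_is_breakable)


def relaxed_rule(user_is_breakable: Sequence[bool]) -> bool:
    """Accept the PHI when at least half of its PHI users are breakable.

    Assumption: the bound is inclusive, so 1 breakable user out of 2 passes.
    """
    breakable = sum(1 for flag in user_is_breakable if flag)
    return 2 * breakable >= len(user_is_breakable)
```

With the rocRAND pattern from the message (two PHI users, one breakable), `previous_rule([True, False])` rejects the PHI while `relaxed_rule([True, False])` accepts it.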
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# fa87dd52	22-May-2023	pvanhout <pierre.vanhoutryve@amd.com>	[AMDGPU] Handle multiple occurrences of an incoming value in break large PHIs. We naively broke all incoming values, assuming they'd be unique. However, it's not illegal to have multiple occurrences of, e.g. `[BB0, V0]` in a PHI node. What's illegal though is having the same basic block multiple times but with different values, and it's exactly what the transform caused. This broke in some rare applications where the pattern arose. Now we cache the `BasicBlock, Value` pairs we're breaking so we can reuse the values and preserve this invariant. Solves SWDEV-399460 Reviewed By: #amdgpu, rovka Differential Revision: https://reviews.llvm.org/D151069
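The caching idea can be sketched in Python with strings standing in for basic blocks and values. This is an illustration only, not the LLVM implementation: `break_incoming`, `is_valid_phi`, and the `%e<n>.<i>` naming of extracted scalars are all hypothetical.

```python
from typing import Dict, List, Tuple

Incoming = Tuple[str, str]  # (basic block, value)


def break_incoming(incoming: List[Incoming], num_elts: int) -> List[List[Incoming]]:
    """Split a vector PHI's incoming list into one list per element.

    Each distinct (block, value) pair is broken once; repeated pairs
    reuse the cached scalars instead of creating fresh ones.
    """
    cache: Dict[Incoming, List[str]] = {}
    per_element: List[List[Incoming]] = [[] for _ in range(num_elts)]
    for block, value in incoming:
        key = (block, value)
        if key not in cache:
            n = len(cache)  # fresh id for this newly broken pair
            cache[key] = [f"%e{n}.{i}" for i in range(num_elts)]
        for i, scalar in enumerate(cache[key]):
            per_element[i].append((block, scalar))
    return per_element


def is_valid_phi(incoming: List[Incoming]) -> bool:
    """Check the invariant: a block never appears with two different values."""
    seen: Dict[str, str] = {}
    for block, value in incoming:
        if seen.setdefault(block, value) != value:
            return False
    return True
```

Without the cache, the second `(BB0, V0)` entry would get freshly created scalars, so the same block would appear with two different values in each scalar PHI.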
Revision tags: llvmorg-16.0.4
# 52a2d07b	10-May-2023	pvanhout <pierre.vanhoutryve@amd.com>	[AMDGPU] Improve PHI-breaking heuristics in CGP. D147786 made the transform more conservative by adding heuristics, which was a good idea. However, the transform got a bit too conservative at times. This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them. For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed. This patch is an attempt at reaching a middle ground that makes the pass a bit more permissive. It continues in the same spirit as D147786 but makes the following changes: - PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had a very negative effect in one app I tested) - `shufflevector` is now considered interesting, given that it satisfies a few trivial checks. Reviewed By: arsenm, #amdgpu, jmmartinez Differential Revision: https://reviews.llvm.org/D150266
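The recursive check of PHI users with loop detection might look roughly like the Python sketch below. It is only a model of the description above: the name `is_breakable`, the `phi_users` map, and the `interesting` set are hypothetical, and requiring every PHI user to be breakable is an assumption of this sketch. The `shufflevector` checks are not modelled.

```python
from typing import Dict, List, Optional, Set


def is_breakable(
    phi: str,
    phi_users: Dict[str, List[str]],
    interesting: Set[str],
    on_path: Optional[Set[str]] = None,
) -> bool:
    """Recursively check a PHI and the PHIs that use it.

    phi_users maps a PHI to the PHIs using it; interesting holds the PHIs
    that have at least one interesting incoming value.
    """
    if on_path is None:
        on_path = set()
    if phi in on_path:
        return False  # loop encountered: treat as non-breakable
    if phi not in interesting:
        return False
    on_path.add(phi)
    try:
        return all(
            is_breakable(user, phi_users, interesting, on_path)
            for user in phi_users.get(phi, [])
        )
    finally:
        on_path.discard(phi)
```

Tracking only the PHIs on the current recursion path, rather than every visited PHI, means a PHI reached through two separate paths is not mistaken for a loop.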
Revision tags: llvmorg-16.0.3
# 8b56da5e	26-Apr-2023	ManuelJBrito <manuel.brito@tecnico.ulisboa.pt>	[IR] Change shufflevector undef mask to poison. With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210
Revision tags: llvmorg-16.0.2
# b3b3cb2d	07-Apr-2023	pvanhout <pierre.vanhoutryve@amd.com>	[AMDGPU] Less aggressively break large PHIs. In some cases, breaking large PHIs can very negatively affect performance (3x more instructions observed in a particular test case). This patch adds some basic profitability heuristics to help with some of these issues without affecting the "good" cases, e.g. avoid breaking PHIs if it causes back-and-forth between vector/scalar form for no good reason. Fixes SWDEV-392803 Fixes SWDEV-393781 Fixes SWDEV-394228 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D147786