History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp (Results 1 – 21 of 21)
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 83ad90d8 14-Nov-2024 Siu Chi Chan <siuchi.chan@amd.com>

[AMDGPU] Fix module split's assumption on kernels

Module split assumes that a kernel function must have an external
linkage; however, that isn't the case. For example, a static kernel
function will have a weak_odr linkage

Change-Id: I1e5dee0de1fd866b365f4090a574e1b2961f8dca



# 345b3319 27-Nov-2024 Fraser Cormack <fraser@codeplay.com>

[AMDGPU][SplitModule] Fix unintentional integer division (#117586)

A static analysis tool warned that a division was always being performed
as integer division, so the result was either 0.0 or 1.0.

This doesn't seem intentional, so has been fixed to return a true ratio
using floating-point division. This in turn showed a bug where a
comparison against this ratio was incorrect.
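
As a rough illustration of the bug class this commit describes, the sketch below contrasts a ratio computed with integer division against one computed in floating point. The function names are hypothetical and are not taken from AMDGPUSplitModule.cpp; the actual fix may look different.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these names do not come
// from AMDGPUSplitModule.cpp.

// Problematic form: both operands are integers, so the quotient is
// truncated before it is converted to double. For part <= total the
// result can only be 0.0 or 1.0.
double truncatedRatio(std::uint64_t part, std::uint64_t total) {
  return static_cast<double>(part / total);
}

// Corrected form: convert the operands first so the division itself is
// performed in floating point and yields a true ratio.
double trueRatio(std::uint64_t part, std::uint64_t total) {
  return static_cast<double>(part) / static_cast<double>(total);
}
```

Both helpers assume a non-zero `total`. Any comparison written against the truncated value would need to be revisited once the ratio becomes fractional, which matches the follow-on bug the commit message mentions.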



# 3414993e 26-Nov-2024 Fraser Cormack <fraser@codeplay.com>

[AMDGPU][SplitModule] Fix potential divide by zero (#117602)

A static analysis tool found that ModuleCost could be zero, so would
perform divide by zero when being printed. Perhaps this is unreachable
in practice, but the fix is straightforward enough and unlikely to be a
performance concern.
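
A guard of the kind described here might look like the following minimal sketch. The helper name `costPercent` and the choice to report 0 for an empty module are assumptions made for illustration; the real code may print something else in that case.

```cpp
#include <cstdint>

// Hypothetical helper: express `cost` as a percentage of `moduleCost`
// for logging. Returning 0.0 for a zero module cost is an assumption of
// this sketch, chosen only to avoid dividing by zero.
double costPercent(std::uint64_t cost, std::uint64_t moduleCost) {
  if (moduleCost == 0)
    return 0.0;
  return 100.0 * static_cast<double>(cost) / static_cast<double>(moduleCost);
}
```

The check is a single comparison on a path that only formats output, which is in line with the commit's remark that the fix is unlikely to be a performance concern.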



# be187369 14-Nov-2024 Kazu Hirata <kazu@google.com>

[AMDGPU] Remove unused includes (NFC) (#116154)

Identified with misc-include-cleaner.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# b3a8400a 14-Oct-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

(reland) [AMDGPU][SplitModule] Handle !callees metadata (#108802)

(reland with fixed sed command for macos)

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.



# 140cbca8 14-Oct-2024 Nico Weber <thakis@chromium.org>

Revert "[AMDGPU][SplitModule] Handle !callees metadata (#108802)"

This reverts commit 4a0dc3ef36ceff20787ff277a1fb6a1b513c4934.
Breaks tests, see comments on
https://github.com/llvm/llvm-project/pull/108802



# 4a0dc3ef 14-Oct-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][SplitModule] Handle !callees metadata (#108802)

See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.



# d656b206 11-Oct-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][SplitModule] Cleanup CallsExternal Handling (#106528)

- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.



Revision tags: llvmorg-19.1.1, llvmorg-19.1.0
# 959d8404 09-Sep-2024 pvanhout <pierre.vanhoutryve@amd.com>

[AMDGPU] Remove unused SplitGraph::Node::getFullCost


# 9347b66c 09-Sep-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

Reland "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)" (#107076)

Relands #104763 with
- Fixes for EXPENSIVE_CHECKS test failure (due to sorting operator
failing if the input is shuffled first)
- Fix for broken proposal selection
- c3cb27370af40e491446164840766478d3258429 included

Original commit description below
---

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
  and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing 2 sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
  and compared.
- Graph-search algorithm that can explore multiple branches/assignments
  for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions
    to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.
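
The node-set idea in the highlights above can be sketched with the standard library. This is only an illustration: it uses `std::bitset` with a fixed capacity in place of LLVM's BitVector, and every name in it (`NodeSet`, `commonDependencies`, `mergeClusters`, `maxProposals`) is hypothetical. The `maxProposals` helper additionally assumes that each level of the search branches two ways, which is consistent with the quoted figure of 256 for a depth of 8 but is not stated in the commit message.

```cpp
#include <bitset>
#include <cstddef>

// Assumption for this sketch: at most 64 nodes, each identified by an ID
// starting from 0. The pass itself is described as using a BitVector.
constexpr std::size_t MaxNodes = 64;
using NodeSet = std::bitset<MaxNodes>;

// Nodes present in both sets: a single bitwise AND.
NodeSet commonDependencies(const NodeSet &a, const NodeSet &b) {
  return a & b;
}

// Union of two clusters: a single bitwise OR.
NodeSet mergeClusters(const NodeSet &a, const NodeSet &b) {
  return a | b;
}

// Upper bound on explored proposals if every level branches two ways
// (an assumption of this sketch): 2^depth.
constexpr unsigned long long maxProposals(unsigned depth) {
  return 1ULL << depth;
}
```

With node IDs as bit positions, finding shared dependencies and merging clusters each reduce to one bitwise operation over the whole set, which is presumably why the commit calls both tasks trivial.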

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other types of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being <1% of the runtime.



Revision tags: llvmorg-19.1.0-rc4
# 6345604a 30-Aug-2024 Danial Klimkin <dklimkin@google.com>

Revert: [AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763) (#106707)

* Revert "Fix MSVC "not all control paths return a value" warning. NFC."
Dep to revert c9b6e01b2e4fc930dac91dd44c0592ad7e36d967

* Revert "[AMDGPU] Graph-based Module Splitting Rewrite (#104763)"
Breaks tests.



# c3cb2737 29-Aug-2024 Simon Pilgrim <llvm-dev@redking.me.uk>

Fix MSVC "not all control paths return a value" warning. NFC.


# c9b6e01b 29-Aug-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Graph-based Module Splitting Rewrite (#104763)

Major rewrite of the AMDGPUSplitModule pass in order to better support
it long-term.

Highlights:
- Removal of the "SML" logging system in favor of just using CL options
  and LLVM_DEBUG, like any other pass in LLVM.
  - The SML system started from good intentions, but it was too flawed and
    messy to be of any real use. It was also a real pain to use and made the
    code more annoying to maintain.
- Graph-based module representation with DOTGraph printing support
  - The graph represents the module accurately, with bidirectional, typed
    edges between nodes (a node usually represents one function).
  - Nodes are assigned IDs starting from 0, which allows us to represent a
    set of nodes as a BitVector. This makes comparing 2 sets of nodes to
    find common dependencies a trivial task. Merging two clusters of nodes
    together is also really trivial.
- No more defaulting to "P0" for external calls
  - Roots that can reach non-copyable dependencies (such as external
    calls) are now grouped together in a single "cluster" that can go into
    any partition.
- No more defaulting to "P0" for indirect calls
- New representation for module splitting proposals that can be graded
  and compared.
- Graph-search algorithm that can explore multiple branches/assignments
  for a cluster of functions, up to a maximum depth.
  - With the default max depth of 8, we can create up to 256 propositions
    to try and find the best one.
  - We can still fall back to a greedy approach upon reaching max depth.
    That greedy approach uses almost identical heuristics to the previous
    version of the pass.

All of this gives us a lot of room to experiment with new heuristics or
even entirely different splitting strategies if we need to. For
instance, the graph representation has room for abstract nodes, e.g. if
we need to represent some global variables or external constraints. We
could also introduce more edge types to model other types of relations
between nodes, etc.

I also designed the graph representation & the splitting strategies to
be as fast as possible, and it seems to have paid off. Some quick tests
showed that we spend pretty much all of our time in the CloneModule
function, with the actual splitting logic being <1% of the runtime.



Revision tags: llvmorg-19.1.0-rc3
# 2e9f3f3b 15-Aug-2024 Fraser Cormack <fraser@codeplay.com>

[AMDGPU][llvm-split] Fix another division by zero (#104421)

Somehow I missed this in #98888. It requires a log file, or the debug
flag to be passed.


Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 5e338f1f 17-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.


# 075f7542 15-Jul-2024 Fraser Cormack <fraser@codeplay.com>

[AMDGPU][llvm-split] Fix division by zero (#98888)

An empty module, or one containing only declarations, would result in a
division by a zero cost.


# 1c025fb0 24-Jun-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][SplitModule] Allow non-kernels to be treated as roots (#95902)

I initially assumed only kernels could be roots, but that is wrong. A
function with no callers also needs to be a root to ensure it is
correctly handled.
They're very rare because we usually internalize everything, and
internal functions with no callers would be deleted.

When they are present, we need to also consider their dependencies and
act accordingly. Previously, we could put a function "by default" in P0,
but it could call another function with internal linkage defined in
another module which was of course incorrect.

Fixes SWDEV-467695
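
The root rule described in this commit (kernels are roots, and so is any function with no callers) can be sketched on a toy call graph. The `Fn` struct and `findRoots` function are hypothetical and much simpler than the pass, which works on LLVM IR; ignoring self-calls when looking for callers is also an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a function: whether it is a kernel, and the indices
// of the functions it calls. Hypothetical, for illustration only.
struct Fn {
  bool isKernel;
  std::vector<std::size_t> callees;
};

// Returns the indices of all roots: every kernel, plus every function
// that no other function calls.
std::vector<std::size_t> findRoots(const std::vector<Fn> &fns) {
  std::vector<bool> hasCaller(fns.size(), false);
  for (std::size_t i = 0; i < fns.size(); ++i)
    for (std::size_t callee : fns[i].callees)
      if (callee < fns.size() && callee != i) // skip bad indices, self-calls
        hasCaller[callee] = true;

  std::vector<std::size_t> roots;
  for (std::size_t i = 0; i < fns.size(); ++i)
    if (fns[i].isKernel || !hasCaller[i])
      roots.push_back(i);
  return roots;
}
```

Treating a caller-less non-kernel as a root means its dependencies are walked like a kernel's, so the internal-linkage functions it calls can be kept in the same partition.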



# d95b82c4 18-Jun-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[NFC][AMDGPU] Make AMDGPUSplitModule a ModulePass (#95773)

It allows it to access TTI correctly, and opens the door to accessing
more analysis in the future.

I went back and forth between this, and also making the default
SplitModule a Pass too to make it uniform, but I decided against it
because it's just needless complications. Neither llvm-split nor
LTOBackend have a PM ready to use so we need to create one anyway. Let's
keep all the mess hidden in the AMDGPU version for now to keep this
change more self-contained.



Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# 42c40277 28-May-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU][SplitModule] Keep looking for more dependencies after finding an indirect call (#93480)

This is just something I noticed while going over this pass logic one
more time and didn't cause issues (yet). If we find an indirect call, we
stop looking assuming we added all functions to the list, but if not all
functions in the module were indirectly callable, some may still be
missing.

Just to be safe, keep looking until we did everything we could to find
dependencies, so we don't accidentally miss one.



# 43fd244b 23-May-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

Reland "[AMDGPU] Add AMDGPU-specific module splitting (#89245)"

(with fix for ubsan)

This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of them to split modules in a way that
avoids compilation issues (such as resource usage being incorrectly
represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows printing logs to uniquely named files, and
optionally with all value names hidden so they can be safely shared
without leaking information about the source. Logs can also be enabled
through an environment variable, which avoids the sometimes complicated
process of passing a -mllvm option all the way from clang driver to the
offload linker that handles full LTO codegen.



# d7c37130 23-May-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Add AMDGPU-specific module splitting (#89245)

This enables the --lto-partitions option to work more consistently.

This module splitting logic is fully aware of AMDGPU modules and their
specificities and takes advantage of them to split modules in a way that
avoids compilation issues (such as resource usage being incorrectly
represented).

This also includes a logging system that's more elaborate than just
LLVM_DEBUG which allows printing logs to uniquely named files, and
optionally with all value names hidden so they can be safely shared
without leaking information about the source. Logs can also be enabled
through an environment variable, which avoids the sometimes complicated
process of passing a -mllvm option all the way from clang driver to the
offload linker that handles full LTO codegen.
