History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp (Results 26 – 47 of 47)
Revision Date Author Comments
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3
# 6b7805fc 28-Apr-2023 Jeffrey Byrnes <JeffreyByrnes@amd.com>

[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms

This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.

The kernels of interest have these specific phases:
NT: 1, 2a, 2c
NN: 1, 2a, 2b
TT: 1, 2b, 2c
TN: 1, 2b

The general approach taken was to have a long SchedGroup pipeline; this gives the scheduler less room to do the wrong thing. To resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.

Differential Revision: https://reviews.llvm.org/D149773

Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
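
As a hedged illustration (not taken from the patch), the strategy would be requested from device code with the existing __builtin_amdgcn_iglp_opt builtin, whose single argument selects the strategy id:

// Hypothetical placement inside the GEMM kernel's scheduling region of interest.
// Strategy id 1 selects the single-wave GEMM pipeline added here; id 0 remains
// the small-GEMM strategy added in D132079.
__builtin_amdgcn_iglp_opt(1);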



# db619279 28-Apr-2023 Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>

[AMDGPU][IGLP]: Add rules to SchedGroups

Differential Revision: https://reviews.llvm.org/D146774

Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859


# 1721e72d 27-Apr-2023 Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>

[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver

Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability and makes usage a bit more intuitive. Importantly, it makes designing rules much easier.

Differential Revision: https://reviews.llvm.org/D149393

Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4



Revision tags: llvmorg-16.0.2
# 72e01ef1 06-Apr-2023 Nico Weber <thakis@chromium.org>

Revert "[AMDGPU] Add Lower Bound to PipelineSolver"

This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9.
Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c

Revert "[AMDGPU] Add Lower Bound to PipelineSolver"

This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9.
Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724



Revision tags: llvmorg-16.0.1
# 3c42a58c 23-Mar-2023 Jeff Byrnes <jeffrey.byrnes@amd.com>

[AMDGPU] Add Lower Bound to PipelineSolver


Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 63e7e9c8 18-Jan-2023 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Treat WMMA the same as MFMA for sched_barrier

MFMA and WMMA are essentially the same thing, but appear on different ASICs.

Differential Revision: https://reviews.llvm.org/D142062


Revision tags: llvmorg-15.0.7
# 6443c0ee 12-Dec-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Stop using make_pair and make_tuple. NFC.

C++17 allows us to call the pair and tuple constructors directly instead of the
helper functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
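
A minimal illustration of the C++17 class template argument deduction this relies on (the variable names are hypothetical and not from the patch):

// Before: the helper function deduces the element types.
auto Entry = std::make_pair(SU, ReadyCycle);
// After: C++17 deduces them directly from the constructor call.
auto Entry = std::pair(SU, ReadyCycle);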



# 67819a72 13-Dec-2022 Fangrui Song <i@maskray.me>

[CodeGen] llvm::Optional => std::optional


# f9c76a11 02-Dec-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Update MFMASmallGemmOpt with better performing strategy

Based on experiments this performs better on the targeted small GEMM kernels.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D139227



# 20cde154 03-Dec-2022 Kazu Hirata <kazu@google.com>

[Target] Use std::nullopt instead of None (NFC)

This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716



Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 7d8c2d17 04-Sep-2022 Kazu Hirata <kazu@google.com>

[llvm] Use range-based for loops (NFC)

Identified with modernize-loop-convert.


# 9861a68a 28-Aug-2022 Kazu Hirata <kazu@google.com>

[Target] Qualify auto in range-based for loops (NFC)


# ce9f007c 28-Aug-2022 Kazu Hirata <kazu@google.com>

[llvm] Use llvm::find_if (NFC)


Revision tags: llvmorg-15.0.0-rc3
# b0f4678b 17-Aug-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy

Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
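
A hedged usage sketch in HIP-style device code (the kernel is hypothetical; only the builtin name and its single strategy-id argument come from this change):

__global__ void small_gemm_kernel(/* ... */) {
  // Hint the scheduler to apply the MFMA small-GEMM strategy (id 0) to this
  // scheduling region, disabling other mutations/clustering that could interfere.
  __builtin_amdgcn_iglp_opt(0);
  // ... kernel body ...
}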



Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 1c8d7ea9 20-Jul-2022 Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>

[AMDGPU] Implement pipeline solver for non-trivial pipelines

Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems that minimizing the degree of misfit (as measured by the number of edges we cannot add) w.r.t. the choice we make when mapping an instruction to a SchedGroup is an NP-hard problem. This patch implements the PipelineSolver class, which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential-time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm.

Differential Revision: https://reviews.llvm.org/D130797
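
A toy, self-contained C++ sketch of the greedy idea on the {2 VMEM, 2 VALU, 2 VMEM} example above; this is illustrative only and not the actual PipelineSolver, which additionally scores candidate placements by the number of missed pipeline edges and offers an exhaustive exact mode:

#include <iostream>
#include <string>
#include <vector>

struct Group {
  std::string Class;           // instruction class the group accepts, e.g. "VMEM"
  size_t Capacity;             // requested number of instructions for the group
  std::vector<size_t> Members; // indices of assigned instructions
};

int main() {
  // Requested pipeline {2 VMEM, 2 VALU, 2 VMEM} and the original order of
  // SUnits {VMEM, VALU, VMEM, VALU, VMEM} from the example above.
  std::vector<Group> Pipeline = {{"VMEM", 2, {}}, {"VALU", 2, {}}, {"VMEM", 2, {}}};
  std::vector<std::string> SUnits = {"VMEM", "VALU", "VMEM", "VALU", "VMEM"};

  // Greedy pass: walk the instructions from the bottom and place each one into
  // the bottom-most matching group that still has room.
  for (size_t I = SUnits.size(); I-- > 0;) {
    for (size_t G = Pipeline.size(); G-- > 0;) {
      if (Pipeline[G].Class == SUnits[I] &&
          Pipeline[G].Members.size() < Pipeline[G].Capacity) {
        Pipeline[G].Members.push_back(I);
        break;
      }
    }
  }

  for (size_t G = 0; G < Pipeline.size(); ++G) {
    std::cout << "Group " << G << " (" << Pipeline[G].Class << "):";
    for (size_t I : Pipeline[G].Members)
      std::cout << " SU" << I;
    std::cout << "\n";
  }
  return 0;
}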



# 6d9cd919 14-Aug-2022 Kazu Hirata <kazu@google.com>

Use llvm::all_of (NFC)


# 7898426a 30-Jul-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Remove unused function


# 0f93a45b 28-Jul-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER


Revision tags: llvmorg-14.0.6
# f5b21680 13-Jun-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add amdgcn_sched_group_barrier builtin

This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.

The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.

As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D128158



# 064a08cd 21-Jun-2022 Kazu Hirata <kazu@google.com>

Don't use Optional::hasValue (NFC)


# 4bba8211 15-Jun-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Fix buildbot failures after 48ebc1af29

Some buildbots (lto, windows) were failing due to some function reference
variables being improperly initialized.


Revision tags: llvmorg-14.0.5
# 48ebc1af 03-Jun-2022 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Add more expressive sched_barrier controls

The sched_barrier builtin allows the scheduler's behavior to be shaped by users
when very specific codegen is needed in order to create highly optimized code.
This patch adds more granular control over the types of instructions that are
allowed to be reordered with respect to one or multiple sched_barriers. A mask
is used to specify groups of instructions that should be allowed to be scheduled
around a sched_barrier. Details about how this mask may be used can be found in
llvm/include/llvm/IR/IntrinsicsAMDGPU.td.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D127123
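
A hedged usage sketch (not from the patch): the builtin takes a single mask, and the values shown follow the encoding documented in llvm/include/llvm/IR/IntrinsicsAMDGPU.td, where 0 lets nothing be scheduled across the barrier and individual bits re-enable particular instruction classes; the surrounding kernel phases are hypothetical.

// ... first phase of the kernel ...
__builtin_amdgcn_sched_barrier(0);      // no instructions may be reordered across this point
// ... second phase ...
__builtin_amdgcn_sched_barrier(0x0002); // only VALU instructions may be reordered across
// ... third phase ...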


