Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
6b7805fc |
| 28-Apr-2023 |
Jeffrey Byrnes <JeffreyByrnes@amd.com> |
[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms
This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.
The kernels of interest have these specific phases:
  NT: 1, 2a, 2c
  NN: 1, 2a, 2b
  TT: 1, 2b, 2c
  TN: 1, 2b
The general approach is to use a long SchedGroup pipeline, which leaves the scheduler less room to make a bad decision. To resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.
Differential Revision: https://reviews.llvm.org/D149773
Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
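A hedged sketch of how a kernel might request this strategy; the builtin is the __builtin_amdgcn_iglp_opt hint added in D132079, while the GEMM loop shape, helper comments, and placement inside the main loop are assumptions for illustration only:

    // Illustrative HIP/OpenCL-C style device code, not taken from this commit.
    void gemmMainLoop(int K) {
      for (int k = 0; k < K; ++k) {
        __builtin_amdgcn_iglp_opt(1); // strategy 1: the single-wave GEMM pipeline described above
        // ... ds_read / MFMA / ds_write instructions forming phases 1, 2a, 2b, 2c ...
      }
    }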
|
#
db619279 |
| 28-Apr-2023 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU][IGLP]: Add rules to SchedGroups
Differential Revision: https://reviews.llvm.org/D146774
Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859
|
#
1721e72d |
| 27-Apr-2023 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver
Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability and makes usage a bit more intuitive. Importantly, it makes designing rules much easier.
Differential Revision: https://reviews.llvm.org/D149393
Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4
|
Revision tags: llvmorg-16.0.2 |
|
#
72e01ef1 |
| 06-Apr-2023 |
Nico Weber <thakis@chromium.org> |
Revert "[AMDGPU] Add Lower Bound to PipelineSolver"
This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9. Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724
|
Revision tags: llvmorg-16.0.1 |
|
#
3c42a58c |
| 23-Mar-2023 |
Jeff Byrnes <jeffrey.byrnes@amd.com> |
[AMDGPU] Add Lower Bound to PipelineSolver
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
63e7e9c8 |
| 18-Jan-2023 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Treat WMMA the same as MFMA for sched_barrier
MFMA and WMMA are essentially the same thing, but appear on different ASICs.
Differential Revision: https://reviews.llvm.org/D142062
|
Revision tags: llvmorg-15.0.7 |
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call the pair and tuple constructors directly instead of the make_pair and make_tuple helper functions.
Differential Revision: https://reviews.llvm.org/D139828
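A minimal illustration of the C++17 idiom this commit moves to (not the actual call sites changed here); class template argument deduction lets the constructors deduce the element types:

    #include <tuple>
    #include <utility>

    // Pre-C++17 helper style:
    std::pair<int, double> P = std::make_pair(1, 2.0);
    // C++17 CTAD: the constructors deduce the template arguments.
    std::pair Q(1, 2.0);        // std::pair<int, double>
    std::tuple T(1, 2.0, 'c');  // std::tuple<int, double, char>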
|
#
67819a72 |
| 13-Dec-2022 |
Fangrui Song <i@maskray.me> |
[CodeGen] llvm::Optional => std::optional
|
#
f9c76a11 |
| 02-Dec-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Update MFMASmallGemmOpt with better performing strategy
Based on experiments, this performs better with the targeted small GEMM kernels.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D139227
|
#
20cde154 |
| 03-Dec-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional.
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
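As an illustrative sketch of the mechanical replacement (the function and variable names are invented, not from the patch):

    #include <optional>

    std::optional<unsigned> pickReg(bool HasHint, unsigned Hint) {
      // Before this migration: llvm::Optional<unsigned> Reg = None;
      std::optional<unsigned> Reg = std::nullopt;
      if (HasHint)
        Reg = Hint;
      return Reg;
    }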
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0 |
|
#
7d8c2d17 |
| 04-Sep-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
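A generic example of the modernize-loop-convert rewrite (not one of the loops touched by this commit):

    #include <vector>

    int sumAll(const std::vector<int> &Values) {
      int Sum = 0;
      // Before: for (size_t I = 0, E = Values.size(); I != E; ++I) Sum += Values[I];
      for (int V : Values) // range-based for loop
        Sum += V;
      return Sum;
    }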
|
#
9861a68a |
| 28-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Qualify auto in range-based for loops (NFC)
|
#
ce9f007c |
| 28-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use llvm::find_if (NFC)
|
Revision tags: llvmorg-15.0.0-rc3 |
|
#
b0f4678b |
| 17-Aug-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D132079
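For illustration, requesting the strategy from a kernel might look like the sketch below; the value 0 selecting the first (small GEMM, gfx90a) strategy follows the commit description, and everything else is an assumption:

    // Illustrative device code, not taken from this commit.
    void smallGemmRegion() {
      __builtin_amdgcn_iglp_opt(0); // hint: apply the small-GEMM MFMA DAG mutation to this region
      // ... MFMA and memory instructions in this scheduling region ...
    }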
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
1c8d7ea9 |
| 20-Jul-2022 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU] Implement pipeline solver for non-trivial pipelines
Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems that minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t. the choice we make when mapping an instruction to a SchedGroup is an NP-hard problem. This patch implements the PipelineSolver class, which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential-time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm.
Differential Revision: https://reviews.llvm.org/D130797
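To make the exact-vs-greedy trade-off concrete, here is a toy C++ model, not the LLVM PipelineSolver: the cost function is invented, the data encodes the {2 VMEM, 2 VALU, 2 VMEM} example from the commit message, and a greedy variant would simply take the first legal group instead of backtracking.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Toy model: assign each instruction to one of its candidate SchedGroups so
    // that group capacities hold, minimizing a misfit cost (a stand-in for the
    // number of ordering edges the real solver cannot add).
    struct Problem {
      std::vector<std::vector<int>> Candidates; // per instruction: legal groups
      std::vector<int> Capacity;                // per group: max instructions
    };

    static int cost(const std::vector<int> &Assign) {
      // Invented cost: penalize putting consecutive instructions in the same
      // group (the real misfit metric depends on DAG dependencies).
      int C = 0;
      for (size_t I = 1; I < Assign.size(); ++I)
        C += Assign[I] == Assign[I - 1];
      return C;
    }

    static void exactSolve(const Problem &P, size_t I, std::vector<int> &Assign,
                           std::vector<int> &Used, int &Best) {
      if (I == P.Candidates.size()) {
        Best = std::min(Best, cost(Assign));
        return;
      }
      for (int G : P.Candidates[I]) {
        if (Used[G] == P.Capacity[G])
          continue;
        ++Used[G];
        Assign.push_back(G);
        exactSolve(P, I + 1, Assign, Used, Best); // exponential backtracking
        Assign.pop_back();
        --Used[G];
      }
    }

    int main() {
      // Pipeline {2 VMEM, 2 VALU, 2 VMEM}; bottom-up SUnit order VMEM VALU VMEM VALU VMEM.
      Problem P{{{0, 2}, {1}, {0, 2}, {1}, {0, 2}}, {2, 2, 2}};
      std::vector<int> Assign, Used(P.Capacity.size(), 0);
      int Best = 1 << 30;
      exactSolve(P, 0, Assign, Used, Best);
      std::printf("best misfit (exact search): %d\n", Best);
      return 0;
    }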
|
#
6d9cd919 |
| 14-Aug-2022 |
Kazu Hirata <kazu@google.com> |
Use llvm::all_of (NFC)
|
#
7898426a |
| 30-Jul-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Remove unused function
|
#
0f93a45b |
| 28-Jul-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER
|
Revision tags: llvmorg-14.0.6 |
|
#
f5b21680 |
| 13-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
  // 1 VMEM_READ
  __builtin_amdgcn_sched_group_barrier(32, 1, 0)
  // 1 VALU
  __builtin_amdgcn_sched_group_barrier(2, 1, 0)
  // 5 MFMA
  __builtin_amdgcn_sched_group_barrier(8, 5, 0)
  // 1 VMEM_READ
  __builtin_amdgcn_sched_group_barrier(32, 1, 0)
  // 3 VALU
  __builtin_amdgcn_sched_group_barrier(2, 3, 0)
  // 2 VMEM_WRITE
  __builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
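A small hedged sketch of the sync-ID (third) parameter; the mask values come from the example above, while the idea of placing two independently ordered pipelines in one region and the specific group sizes are assumptions:

    // Illustrative placement inside a GPU kernel's scheduling region, not from this commit.
    // Groups with ID 0 are ordered only against each other.
    __builtin_amdgcn_sched_group_barrier(32, 1, 0); // mask 32 = VMEM_READ,  size 1, ID 0
    __builtin_amdgcn_sched_group_barrier(8, 4, 0);  // mask 8  = MFMA,       size 4, ID 0
    // Groups with ID 1 form a separate pipeline with no enforced ordering against ID 0.
    __builtin_amdgcn_sched_group_barrier(2, 2, 1);  // mask 2  = VALU,       size 2, ID 1
    __builtin_amdgcn_sched_group_barrier(64, 1, 1); // mask 64 = VMEM_WRITE, size 1, ID 1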
|
#
064a08cd |
| 21-Jun-2022 |
Kazu Hirata <kazu@google.com> |
Don't use Optional::hasValue (NFC)
|
#
4bba8211 |
| 15-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Fix buildbot failures after 48ebc1af29
Some buildbots (lto, windows) were failing due to some function reference variables being improperly initialized.
|
Revision tags: llvmorg-14.0.5 |
|
#
48ebc1af |
| 03-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add more expressive sched_barrier controls
The sched_barrier builtin allows the scheduler's behavior to be shaped by users when very specific codegen is needed in order to create highly optimized code. This patch adds more granular control over the types of instructions that are allowed to be reordered with respect to one or multiple sched_barriers. A mask is used to specify groups of instructions that should be allowed to be scheduled around a sched_barrier. The details about how this mask may be used can be found in llvm/include/llvm/IR/IntrinsicsAMDGPU.td.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127123
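An illustrative use of the mask; the helpers and kernel context are hypothetical, the individual mask bits are documented in llvm/include/llvm/IR/IntrinsicsAMDGPU.td, and the values below are assumptions based on that table:

    extern void loadTile();    // hypothetical helper
    extern void computeTile(); // hypothetical helper

    void kernelBody() {
      loadTile();
      __builtin_amdgcn_sched_barrier(0);   // assumption: mask 0 lets nothing be scheduled across
      computeTile();
      __builtin_amdgcn_sched_barrier(0x2); // assumption: bit 0x2 allows VALU to cross this barrier
    }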
|