Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
6b7805fc |
| 28-Apr-2023 |
Jeffrey Byrnes <JeffreyByrnes@amd.com> |
[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms
This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering.
The kernels of interest have these specific phases:
  NT: 1, 2a, 2c
  NN: 1, 2a, 2b
  TT: 1, 2b, 2c
  TN: 1, 2b
The general approach is to use a long SchedGroup pipeline, which leaves the scheduler less room to make a bad decision. To resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver.
Differential Revision: https://reviews.llvm.org/D149773
Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96
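A hedged sketch of how a kernel might request this strategy; the builtin is the __builtin_amdgcn_iglp_opt hint added in D132079, while the GEMM loop shape, helper comments, and placement inside the main loop are assumptions for illustration only:

    // Illustrative HIP/OpenCL-C style device code, not taken from this commit.
    void gemmMainLoop(int K) {
      for (int k = 0; k < K; ++k) {
        __builtin_amdgcn_iglp_opt(1); // strategy 1: the single-wave GEMM pipeline described above
        // ... ds_read / MFMA / ds_write instructions forming phases 1, 2a, 2b, 2c ...
      }
    }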
|
#
db619279 |
| 28-Apr-2023 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU][IGLP]: Add rules to SchedGroups
Differential Revision: https://reviews.llvm.org/D146774
Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859
|
#
1721e72d |
| 27-Apr-2023 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU][IGLP] Parameterize the SchedGroup processing / linking order in Solver
Currently the PipelineSolver processes SchedGroups in a bottom-up manner. However, there is no compelling reason to require this. Providing the option to toggle this affords greater experimentation capability and makes usage a bit more intuitive. Importantly, it makes designing rules much easier.
Differential Revision: https://reviews.llvm.org/D149393
Change-Id: Ic4abd3408f9faa105c0eef72eab7873d46083ee4
|
Revision tags: llvmorg-16.0.2 |
|
#
72e01ef1 |
| 06-Apr-2023 |
Nico Weber <thakis@chromium.org> |
Revert "[AMDGPU] Add Lower Bound to PipelineSolver"
This reverts commit 3c42a58c4f20ae3b621733bf5ee6d57c912994a9. Breaks tests on mac, see https://reviews.llvm.org/rG3c42a58c4f20ae3b621733bf5ee6d57c912994a9#1191724
|
Revision tags: llvmorg-16.0.1 |
|
#
3c42a58c |
| 23-Mar-2023 |
Jeff Byrnes <jeffrey.byrnes@amd.com> |
[AMDGPU] Add Lower Bound to PipelineSolver
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
63e7e9c8 |
| 18-Jan-2023 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Treat WMMA the same as MFMA for sched_barrier
MFMA and WMMA are essentially the same thing, but appear on different ASICs.
Differential Revision: https://reviews.llvm.org/D142062
|
Revision tags: llvmorg-15.0.7 |
|
#
6443c0ee |
| 12-Dec-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call the pair and tuple constructors directly instead of the make_pair and make_tuple helper functions.
Differential Revision: https://reviews.llvm.org/D139828
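A minimal illustration of the C++17 idiom this commit moves to (not the actual call sites changed here); class template argument deduction lets the constructors deduce the element types:

    #include <tuple>
    #include <utility>

    // Pre-C++17 helper style:
    std::pair<int, double> P = std::make_pair(1, 2.0);
    // C++17 CTAD: the constructors deduce the template arguments.
    std::pair Q(1, 2.0);        // std::pair<int, double>
    std::tuple T(1, 2.0, 'c');  // std::tuple<int, double, char>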
|
#
67819a72 |
| 13-Dec-2022 |
Fangrui Song <i@maskray.me> |
[CodeGen] llvm::Optional => std::optional
|
#
f9c76a11 |
| 02-Dec-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Update MFMASmallGemmOpt with better performing strategy
Based on experiments, this performs better with the targeted small GEMM kernels.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D139227
|
#
20cde154 |
| 03-Dec-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional.
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
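As an illustrative sketch of the mechanical replacement (the function and variable names are invented, not from the patch):

    #include <optional>

    std::optional<unsigned> pickReg(bool HasHint, unsigned Hint) {
      // Before this migration: llvm::Optional<unsigned> Reg = None;
      std::optional<unsigned> Reg = std::nullopt;
      if (HasHint)
        Reg = Hint;
      return Reg;
    }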
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0 |
|
#
7d8c2d17 |
| 04-Sep-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
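A generic example of the modernize-loop-convert rewrite (not one of the loops touched by this commit):

    #include <vector>

    int sumAll(const std::vector<int> &Values) {
      int Sum = 0;
      // Before: for (size_t I = 0, E = Values.size(); I != E; ++I) Sum += Values[I];
      for (int V : Values) // range-based for loop
        Sum += V;
      return Sum;
    }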
|
#
9861a68a |
| 28-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[Target] Qualify auto in range-based for loops (NFC)
|
#
ce9f007c |
| 28-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use llvm::find_if (NFC)
|
Revision tags: llvmorg-15.0.0-rc3 |
|
#
b0f4678b |
| 17-Aug-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D132079
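For illustration, requesting the strategy from a kernel might look like the sketch below; the value 0 selecting the first (small GEMM, gfx90a) strategy follows the commit description, and everything else is an assumption:

    // Illustrative device code, not taken from this commit.
    void smallGemmRegion() {
      __builtin_amdgcn_iglp_opt(0); // hint: apply the small-GEMM MFMA DAG mutation to this region
      // ... MFMA and memory instructions in this scheduling region ...
    }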
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
1c8d7ea9 |
| 20-Jul-2022 |
Jeffrey Byrnes <Jeffrey.Byrnes@amd.com> |
[AMDGPU] Implement pipeline solver for non-trivial pipelines
Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems that minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t. the choice we make when mapping an instruction to a SchedGroup is an NP-hard problem. This patch implements the PipelineSolver class, which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential-time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm.
Differential Revision: https://reviews.llvm.org/D130797
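To make the exact-vs-greedy trade-off concrete, here is a toy C++ model, not the LLVM PipelineSolver: the cost function is invented, the data encodes the {2 VMEM, 2 VALU, 2 VMEM} example from the commit message, and a greedy variant would simply take the first legal group instead of backtracking.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Toy model: assign each instruction to one of its candidate SchedGroups so
    // that group capacities hold, minimizing a misfit cost (a stand-in for the
    // number of ordering edges the real solver cannot add).
    struct Problem {
      std::vector<std::vector<int>> Candidates; // per instruction: legal groups
      std::vector<int> Capacity;                // per group: max instructions
    };

    static int cost(const std::vector<int> &Assign) {
      // Invented cost: penalize putting consecutive instructions in the same
      // group (the real misfit metric depends on DAG dependencies).
      int C = 0;
      for (size_t I = 1; I < Assign.size(); ++I)
        C += Assign[I] == Assign[I - 1];
      return C;
    }

    static void exactSolve(const Problem &P, size_t I, std::vector<int> &Assign,
                           std::vector<int> &Used, int &Best) {
      if (I == P.Candidates.size()) {
        Best = std::min(Best, cost(Assign));
        return;
      }
      for (int G : P.Candidates[I]) {
        if (Used[G] == P.Capacity[G])
          continue;
        ++Used[G];
        Assign.push_back(G);
        exactSolve(P, I + 1, Assign, Used, Best); // exponential backtracking
        Assign.pop_back();
        --Used[G];
      }
    }

    int main() {
      // Pipeline {2 VMEM, 2 VALU, 2 VMEM}; bottom-up SUnit order VMEM VALU VMEM VALU VMEM.
      Problem P{{{0, 2}, {1}, {0, 2}, {1}, {0, 2}}, {2, 2, 2}};
      std::vector<int> Assign, Used(P.Capacity.size(), 0);
      int Best = 1 << 30;
      exactSolve(P, 0, Assign, Used, Best);
      std::printf("best misfit (exact search): %d\n", Best);
      return 0;
    }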
|
#
6d9cd919 |
| 14-Aug-2022 |
Kazu Hirata <kazu@google.com> |
Use llvm::all_of (NFC)
|
#
7898426a |
| 30-Jul-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Remove unused function
|
#
0f93a45b |
| 28-Jul-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER
|
Revision tags: llvmorg-14.0.6 |
|
#
f5b21680 |
| 13-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
  // 1 VMEM_READ
  __builtin_amdgcn_sched_group_barrier(32, 1, 0)
  // 1 VALU
  __builtin_amdgcn_sched_group_barrier(2, 1, 0)
  // 5 MFMA
  __builtin_amdgcn_sched_group_barrier(8, 5, 0)
  // 1 VMEM_READ
  __builtin_amdgcn_sched_group_barrier(32, 1, 0)
  // 3 VALU
  __builtin_amdgcn_sched_group_barrier(2, 3, 0)
  // 2 VMEM_WRITE
  __builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
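A small hedged sketch of the sync-ID (third) parameter; the mask values come from the example above, while the idea of placing two independently ordered pipelines in one region and the specific group sizes are assumptions:

    // Illustrative placement inside a GPU kernel's scheduling region, not from this commit.
    // Groups with ID 0 are ordered only against each other.
    __builtin_amdgcn_sched_group_barrier(32, 1, 0); // mask 32 = VMEM_READ,  size 1, ID 0
    __builtin_amdgcn_sched_group_barrier(8, 4, 0);  // mask 8  = MFMA,       size 4, ID 0
    // Groups with ID 1 form a separate pipeline with no enforced ordering against ID 0.
    __builtin_amdgcn_sched_group_barrier(2, 2, 1);  // mask 2  = VALU,       size 2, ID 1
    __builtin_amdgcn_sched_group_barrier(64, 1, 1); // mask 64 = VMEM_WRITE, size 1, ID 1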
|
#
064a08cd |
| 21-Jun-2022 |
Kazu Hirata <kazu@google.com> |
Don't use Optional::hasValue (NFC)
|
#
4bba8211 |
| 15-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Fix buildbot failures after 48ebc1af29
Some buildbots (lto, windows) were failing due to some function reference variables being improperly initialized.
|
Revision tags: llvmorg-14.0.5 |
|
#
48ebc1af |
| 03-Jun-2022 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Add more expressive sched_barrier controls
The sched_barrier builtin allows the scheduler's behavior to be shaped by users when very specific codegen is needed in order to create highly optimized code. This patch adds more granular control over the types of instructions that are allowed to be reordered with respect to one or multiple sched_barriers. A mask is used to specify groups of instructions that should be allowed to be scheduled around a sched_barrier. The details about how this mask may be used can be found in llvm/include/llvm/IR/IntrinsicsAMDGPU.td.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D127123
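An illustrative use of the mask; the helpers and kernel context are hypothetical, the individual mask bits are documented in llvm/include/llvm/IR/IntrinsicsAMDGPU.td, and the values below are assumptions based on that table:

    extern void loadTile();    // hypothetical helper
    extern void computeTile(); // hypothetical helper

    void kernelBody() {
      loadTile();
      __builtin_amdgcn_sched_barrier(0);   // assumption: mask 0 lets nothing be scheduled across
      computeTile();
      __builtin_amdgcn_sched_barrier(0x2); // assumption: bit 0x2 allows VALU to cross this barrier
    }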
|