43fd4c49 | 18-Jun-2024 | Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Improve handling of GPU bounds (#95166)

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.
1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)
When inferring bounds (either for range inference or for setting `range`
during lowering), these sources of information are consulted in order of
specificity: `upper_bound` > inherent attribute > discardable attribute,
except that the dimension-size operations first check the `known_*_size`
attributes to see if they can be constant-folded before consulting their
`upper_bound`.
This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
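
As an editorial illustration of the attributes described above (a hypothetical sketch; the exact `upper_bound` assembly syntax and the attribute encoding are assumptions, not copied from the landed patch):

```
// Discardable dialect attribute on an ordinary function, so the bound is
// still visible after the body has left a gpu.func context.
func.func @lowered_kernel() attributes {gpu.known_block_size = array<i32: 128, 1, 1>} {
  // Bounded by the surrounding function's known block size.
  %tid = gpu.thread_id x
  // Locally specified bound via the optional upper_bound attribute.
  %sub = gpu.thread_id x upper_bound 32
  return
}
```

Under the precedence described above, the local `upper_bound` would take priority over the function-level attribute.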

Revision tags: llvmorg-18.1.8

bd6568c9 | 14-Jun-2024 | Pradeep Kumar <pradeepisro49@gmail.com>

[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245)

This commit adds support for the `gpu.cluster_dim_blocks` and
`gpu.cluster_block_id` Ops to represent the number of blocks per cluster and
the block id within a cluster, respectively. It also fixes the description of
the `gpu.cluster_dim` Op and updates the `cga_cluster.mlir` test file to use
`gpu.cluster_dim_blocks`.
Co-authored-by: pradeepku <pradeepku@nvidia.com>
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
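
A minimal sketch of the new Ops (hypothetical usage; the form mirrors the existing cluster Ops shown in the `gpu.launch_func` commit below):

```
// Number of thread blocks per cluster along each dimension.
%cdbX = gpu.cluster_dim_blocks x
%cdbY = gpu.cluster_dim_blocks y
%cdbZ = gpu.cluster_dim_blocks z
// Id of the current thread block within its cluster.
%cbidX = gpu.cluster_block_id x
%cbidY = gpu.cluster_block_id y
%cbidZ = gpu.cluster_block_id z
```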

Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6

edf5cae7 | 27-Nov-2023 | Guray Ozen <guray.ozen@gmail.com>

[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)

The NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA),
a new level of parallelism that allows clusters of Cooperative Thread Arrays
(CTAs) to synchronize and communicate through shared memory while running
concurrently.
This PR enables support for CGAs within `gpu.launch_func` in the GPU dialect
by extending the Op to accommodate this functionality. The GPU dialect remains
architecture-agnostic, so the CGA functionality is added as optional
parameters. This lets us leverage existing GPU dialect mechanisms such as
outlining and kernel launching, making it a practical and convenient choice.
An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```
The PR also introduces cluster-specific index and dimension Ops, binding them
to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in the `gpu.launch` Op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.

Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init

0a81ace0 | 14-Jan-2023 | Kazu Hirata <kazu@google.com>

[mlir] Use std::optional instead of llvm::Optional (NFC)

This patch replaces (llvm::|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h".
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Revision tags: llvmorg-15.0.7

be575c5d | 29-Dec-2022 | Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

Re-land D139865 "Add known_block_size and known_grid_size to gpu.func"

This should fix the MSVC warning that caused the previous revert.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D140766

828b4762 | 24-Dec-2022 | Stella Stamenova <stilis@microsoft.com>

Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func"

This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8.
This broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio

85e38d7c | 02-Dec-2022 | Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Add known_block_size and known_grid_size to gpu.func

In many cases, the number of workgroups (the grid size) and the number of workitems within each group (the block size) that a GPU kernel will be launched with are known. For example, if gpu.launch is called with constant block and grid sizes, we know that those are the only possible sizes that will be used to launch that kernel. In other cases, a custom code-generation pipeline that eventually produces GPU kernels may know the launch dimensions of those kernels, or at least may be able to provide an upper bound on them.
Other GPU programming systems, such as OpenCL, allow capturing such information to enable compiler optimizations (see `reqd_work_group_size`), but MLIR currently has no mechanism for doing so.
This set of attributes is the first step in enabling optimizations based on the known launch dimensions of kernels. It extends the kernel outline pass to set these bounds on kernels with constant launch dimensions and extends integer range inference for GPU index operations to account for the bounds when they are known.
Subsequent revisions will use this data when lowering GPU operations to the ROCDL dialect.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139865
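
A hypothetical sketch of an outlined kernel carrying these bounds (the attribute names come from this commit; the `array<i32: ...>` encoding and the surrounding IR are assumptions):

```
gpu.module @kernels {
  // Outlined from a gpu.launch with constant grid size (8, 1, 1) and
  // constant block size (64, 1, 1).
  gpu.func @vec_add(%arg0: memref<512xf32>) kernel
      attributes {known_block_size = array<i32: 64, 1, 1>,
                  known_grid_size = array<i32: 8, 1, 1>} {
    // With the bounds attached, integer range inference can bound this
    // value to [0, 64).
    %tid = gpu.thread_id x
    gpu.return
  }
}
```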

Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2

10c04f46 | 30-Sep-2022 | River Riddle <riddleriver@gmail.com>

[mlir:GPU][NFC] Update GPU API to use prefixed accessors

This doesn't flip the switch for prefix generation yet; that'll be done in a followup.

Revision tags: llvmorg-15.0.1, llvmorg-15.0.0

28c17a4b | 29-Aug-2022 | Mehdi Amini <joker.eph@gmail.com>

Apply clang-tidy fixes for performance-unnecessary-value-param in InferIntRangeInterfaceImpls.cpp (NFC)

Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init

3e01af09 | 04-Jul-2022 | Christian Sigg <csigg@google.com>

[mlir] Add InferIntRangeInterface to gpu.launch

Infers block/grid dimensions/indices or ranges of such dimensions/indices.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D129036
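
A hedged illustration (hypothetical IR; the constants and names are assumptions): when `gpu.launch` is given constant sizes, the block and thread IDs it defines get correspondingly tight ranges.

```
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%c64 = arith.constant 64 : index
gpu.launch blocks(%bx, %by, %bz) in (%grid_x = %c4, %grid_y = %c1, %grid_z = %c1)
           threads(%tx, %ty, %tz) in (%blk_x = %c64, %blk_y = %c1, %blk_z = %c1) {
  // Range inference can bound %bx to [0, 4) and %tx to [0, 64).
  gpu.terminator
}
```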