History log of /llvm-project/mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp (Results 1 – 10 of 10)
Revision Date Author Comments
# 43fd4c49 18-Jun-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Improve handling of GPU bounds (#95166)

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).
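
As a rough sketch of how these sources interact (the function name and the bounds 128 and 64 are hypothetical, and the assembly is illustrative rather than taken from the patch):

```
// The discardable attribute supplies bounds even outside a gpu.func,
// enabling progressive lowering; a local upper_bound overrides it.
func.func @progressively_lowered() attributes {gpu.known_block_size = array<i32: 128, 1, 1>} {
  %a = gpu.thread_id x                  // bound from the attribute: [0, 127]
  %b = gpu.thread_id x upper_bound 64   // local bound wins: [0, 63]
  return
}
```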

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

Revision tags: llvmorg-18.1.8
# bd6568c9 14-Jun-2024 Pradeep Kumar <pradeepisro49@gmail.com>

[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245)

This commit adds support for `gpu.cluster_dim_blocks` and
`gpu.cluster_block_id` Ops to represent the number of blocks per cluster
and the block id within a cluster, respectively. It also fixes the
description of the `gpu.cluster_dim` Op and updates the `cga_cluster.mlir`
test file to use `gpu.cluster_dim_blocks`.

Co-authored-by: pradeepku <pradeepku@nvidia.com>
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
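
A minimal sketch of the two new ops (result names are hypothetical; both ops are assumed to take a dimension keyword and yield an `index`, mirroring the existing `gpu.cluster_dim` and `gpu.cluster_id`):

```
%cdimBlocksX = gpu.cluster_dim_blocks x   // number of blocks per cluster along x
%cblockIdX = gpu.cluster_block_id x       // this block's id within its cluster along x
```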

Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6
# edf5cae7 27-Nov-2023 Guray Ozen <guray.ozen@gmail.com>

[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871)

NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. This lets us leverage mechanisms
the GPU dialect already provides, such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z

%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR.

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.

Revision tags: llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 0a81ace0 14-Jan-2023 Kazu Hirata <kazu@google.com>

[mlir] Use std::optional instead of llvm::Optional (NFC)

This patch replaces (llvm::|)Optional< with std::optional<. I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Revision tags: llvmorg-15.0.7
# be575c5d 29-Dec-2022 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

Re-land D139865 "Add known_block_size and known_grid_size to gpu.func"

This should fix the MSVC warning that caused the previous revert.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D140766

# 828b4762 24-Dec-2022 Stella Stamenova <stilis@microsoft.com>

Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func"

This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8.

This broke the windows mlir buildbot:
https://lab.llvm.org/bu

Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func"

This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8.

This broke the windows mlir buildbot:
https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio

# 85e38d7c 02-Dec-2022 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Add known_block_size and known_grid_size to gpu.func

In many cases, the number of workgroups (the grid size) and the
number of workitems within each group (the block size) that a GPU
kernel will be launched with are known. For example, if gpu.launch is
called with constant block and grid sizes, we know that those are the
only possible sizes that will be used to launch that kernel. In other
cases, a custom code-generation pipeline that eventually produces GPU
kernels may know the launch dimensions of those kernels, or at least
may be able to provide an upper bound on them.

Other GPU programming systems, such as OpenCL, allow capturing such
information to enable compiler optimizations (see
`reqd_work_group_size`), but MLIR currently has no mechanism for doing so.

This set of attributes is the first step in enabling optimizations
based on the known launch dimensions of kernels. It extends the kernel
outline pass to set these bounds on kernels with constant launch
dimensions and extends integer range inference for GPU index
operations to account for the bounds when they are known.
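
As an illustrative sketch (hypothetical kernel name and sizes), an outlined kernel launched with constant sizes might carry these attributes, from which range inference can draw conclusions like the ones in the comments:

```
gpu.module @kernels {
  gpu.func @outlined_kernel() kernel
      attributes {known_block_size = array<i32: 64, 1, 1>,
                  known_grid_size = array<i32: 16, 1, 1>} {
    %bid = gpu.block_id x    // inferred range: [0, 15]
    %dim = gpu.block_dim x   // inferred to be exactly 64
    gpu.return
  }
}
```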

Subsequent revisions will use this data when lowering GPU operations
to the ROCDL dialect.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139865

Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2
# 10c04f46 30-Sep-2022 River Riddle <riddleriver@gmail.com>

[mlir:GPU][NFC] Update GPU API to use prefixed accessors

This doesn't flip the switch for prefix generation yet; that'll be
done in a followup.


Revision tags: llvmorg-15.0.1, llvmorg-15.0.0
# 28c17a4b 29-Aug-2022 Mehdi Amini <joker.eph@gmail.com>

Apply clang-tidy fixes for performance-unnecessary-value-param in InferIntRangeInterfaceImpls.cpp (NFC)


Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 3e01af09 04-Jul-2022 Christian Sigg <csigg@google.com>

[mlir] Add InferIntRangeInterface to gpu.launch

Infers block/grid dimensions/indices or ranges of such dimensions/indices.
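
For example (hypothetical constants; a sketch of the intended inference, not a test from the patch), a launch with constant sizes lets the region's index arguments be bounded tightly:

```
%c1 = arith.constant 1 : index
%c8 = arith.constant 8 : index
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c8, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%bdx = %c8, %bdy = %c1, %bdz = %c1) {
  // %bx and %tx are inferred to lie in [0, 7]; %gx and %bdx to be exactly 8.
  gpu.terminator
}
```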

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D129036
