History log of /llvm-project/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp (Results 1 – 25 of 109)
Revision Date Author Comments
# 599c7399 06-Jan-2025 Matthias Springer <me@m-sp.org>

[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431)

This commit add an NVIDIA-specific lowering of `cf.assert` to to
`__assertfail`.

Note: `getUniqueFormatGlobalName`, `getOrCreateForma

[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431)

This commit add an NVIDIA-specific lowering of `cf.assert` to to
`__assertfail`.

Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and
`getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can
be reused.

show more ...


# 0e23cb0c 31-Dec-2024 Ivan Butygin <ivan.butygin@gmail.com>

[mlir][nfc] GpuToROCDL: Remove some dead code (#121403)


# 018b32ca 31-Dec-2024 Ivan Butygin <ivan.butygin@gmail.com>

Revert "[mlir][nfc] GpuToROCDL: Remove some dead code" (#121402)

Reverts llvm/llvm-project#121395


# 0b08e095 31-Dec-2024 Ivan Butygin <ivan.butygin@gmail.com>

[mlir][nfc] GpuToROCDL: Remove some dead code (#121395)


# 09dfc571 20-Dec-2024 Jacques Pienaar <jpienaar@google.com>

[mlir] Enable decoupling two kinds of greedy behavior. (#104649)

The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracin

[mlir] Enable decoupling two kinds of greedy behavior. (#104649)

The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.

show more ...


# 596bfb80 20-Nov-2024 Dragan Mladjenovic <Dragan.Mladjenovic@syrmia.com>

[MLIR][AMDGPU] Support gpu::ShuffleMode::DOWN lowering in ROCDL (#106237)


# 206fad0e 05-Oct-2024 Matthias Springer <me@m-sp.org>

[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250)

This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.

Patterns

[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250)

This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.

Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate pattern now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.

Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.

show more ...


# 1c47fa9b 23-Sep-2024 Daniel Hernandez-Juarez <dhernandez0@gmail.com>

[mlir][AMDGPU] Add support for AMD f16 math library calls (#108809)

In this PR we add support for AMD f16 math library calls
(`__ocml_*_f16`)

CC: @krzysz00 @manupak


# a16164d0 12-Sep-2024 Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>

[MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302)

Similar to https://github.com/llvm/llvm-project/pull/108266
After https://github.com/llvm/llvm-project/pull/102971
It

[MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302)

Similar to https://github.com/llvm/llvm-project/pull/108266
After https://github.com/llvm/llvm-project/pull/102971
It is legal to generate `LLVM::ExpOp` and `LLVM::LogOp` if the type is
is a float16 or float32

show more ...


# c31d3438 11-Sep-2024 Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>

Update legalizations for LowerGpuOpsToROCDLOps (#108266)

LLVM::FAbsOp and LLVM::SqrtOp are legal after
https://github.com/llvm/llvm-project/pull/102971


# 70302803 13-Aug-2024 Matthias Springer <me@m-sp.org>

[mlir][GPU] Improve `gpu.module` op implementation (#102866)

- Replace hand-written parser/printer with auto-generated assembly
format.
- Remove implicit `gpu.module_end` terminator and use the `N

[mlir][GPU] Improve `gpu.module` op implementation (#102866)

- Replace hand-written parser/printer with auto-generated assembly
format.
- Remove implicit `gpu.module_end` terminator and use the `NoTerminator`
trait instead. (Same as `builtin.module`.)
- Turn the region into a graph region. (Same as `builtin.module`.)

show more ...


# d45de800 09-Aug-2024 Victor Perez <victor.perez@codeplay.com>

[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664)

Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.

- `spir_kernel`/`spir_func` calling conven

[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664)

Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for
kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>`
arguments.
- No attribute used to annotate kernels
- `reqd_work_group_size` attribute using to encode
`gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution
sizes. This will be attached to the pointer argument workgroup
attributions lower to.

**Note**: A notable missing feature that will be addressed in a
follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace
MemRef arguments with bare pointers to the MemRef element types instead
of the current MemRef descriptor approach.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>

show more ...


# 3fae5551 17-Jul-2024 Jan Leyonberg <jan_sjodin@yahoo.com>

[MLIR][ROCDL] Refactor conversion of math operations to ROCDL calls to a separate pass (#98653)

This patch refactors the conversion of math operations to ROCDL library
calls. This pass will also be

[MLIR][ROCDL] Refactor conversion of math operations to ROCDL calls to a separate pass (#98653)

This patch refactors the conversion of math operations to ROCDL library
calls. This pass will also be used in flang to lower Fortran
intrinsics/math functions for OpenMP target offloading codgen.

show more ...


# 43fd4c49 18-Jun-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Improve handling of GPU bounds (#95166)

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known

[mlir][GPU] Improve handling of GPU bounds (#95166)

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>

show more ...


# 94be8018 28-May-2024 stefankoncarevic <skoncare@amd.com>

[mlir][ROCDL] Update the LLVM data layout for ROCDL lowering. (#92127)

This change updates the dataLayout string to ensure alignment with the
latest LLVM TargetMachine configuration. The aim is to

[mlir][ROCDL] Update the LLVM data layout for ROCDL lowering. (#92127)

This change updates the dataLayout string to ensure alignment with the
latest LLVM TargetMachine configuration. The aim is to
maintain consistency and prevent potential compilation issues related to
memory address space handling.

show more ...


# 4cba5957 27-Feb-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501)

In order to ensure operations lower correctly (especially
memref.addrspacecast, which relies on the data layout benig set

[mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501)

In order to ensure operations lower correctly (especially
memref.addrspacecast, which relies on the data layout benig set
correctly then dealing with dynamic memrefs) and to prevent compilation
issues later down the line, set the `llvm.data_layout` attribute on GPU
modules when lowering their contents to a ROCDL / AMDGPU target.

If there's a good way to test the embedded string to prevent it from
going out of sync with the LLVM TargetMachine, I'd appreciate hearing
about it. (Or, alternatively, if there's a place I could farctor the
string out to).

show more ...


# 45c226d4 20-Feb-2024 Mehdi Amini <joker.eph@gmail.com>

[MLIR] Add ODS support for generating helpers for dialect (discardable) attributes (#77024)

This is a new ODS feature that allows dialects to define a list of
key/value pair representing an attribu

[MLIR] Add ODS support for generating helpers for dialect (discardable) attributes (#77024)

This is a new ODS feature that allows dialects to define a list of
key/value pair representing an attribute type and a name.
This will generate helper classes on the dialect to be able to
manage discardable attributes on operations in a type safe way.

For example the `test` dialect can define:

```
let discardableAttrs = (ins
"mlir::IntegerAttr":$discardable_attr_key,
);
```

And the following will be generated in the TestDialect class:

```
/// Helper to manage the discardable attribute `discardable_attr_key`.
class DiscardableAttrKeyAttrHelper {
::mlir::StringAttr name;
public:
static constexpr ::llvm::StringLiteral getNameStr() {
return "test.discardable_attr_key";
}
constexpr ::mlir::StringAttr getName() {
return name;
}

DiscardableAttrKeyAttrHelper(::mlir::MLIRContext *ctx)
: name(::mlir::StringAttr::get(ctx, getNameStr())) {}

mlir::IntegerAttr getAttr(::mlir::Operation *op) {
return op->getAttrOfType<mlir::IntegerAttr>(name);
}
void setAttr(::mlir::Operation *op, mlir::IntegerAttr val) {
op->setAttr(name, val);
}
bool isAttrPresent(::mlir::Operation *op) {
return op->hasAttrOfType<mlir::IntegerAttr>(name);
}
void removeAttr(::mlir::Operation *op) {
assert(op->hasAttrOfType<mlir::IntegerAttr>(name));
op->removeAttr(name);
}
};
DiscardableAttrKeyAttrHelper getDiscardableAttrKeyAttrHelper() {
return discardableAttrKeyAttrName;
}
```

User code having an instance of the TestDialect can then manipulate this
attribute on operation using:

```
auto helper = testDialect.getDiscardableAttrKeyAttrHelper();

helper.setAttr(op, value);
helper.isAttrPresent(op);
...
```

show more ...


# 65066c02 01-Feb-2024 Hugo Trachino <32955781+nujaa@users.noreply.github.com>

[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129)

This aims to clean-up confusing uses of
builder.createOrFold<ConstantOp> since folding of consta

[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129)

This aims to clean-up confusing uses of
builder.createOrFold<ConstantOp> since folding of constants fails.

show more ...


# 391a7577 05-Dec-2023 Guray Ozen <guray.ozen@gmail.com>

[mlir][gpu] Add lowering dynamic_shared_memory op for rocdl (#74473)

This PR adds lowering of `gpu.dynamic_shared_memory` to rocdl target.


# 4279a642 01-Nov-2023 Christian Ulmann <christian.ulmann@nextsilicon.com>

[MLIR][GPUToROCDL] Remove typed pointer support (#70908)

This commit removes the support for lowering GPU to ROCDL dialect with
typed pointers. Typed pointers have been deprecated for a while now a

[MLIR][GPUToROCDL] Remove typed pointer support (#70908)

This commit removes the support for lowering GPU to ROCDL dialect with
typed pointers. Typed pointers have been deprecated for a while now and
it's planned to soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502

show more ...


# baf2d135 04-Sep-2023 Adrian Kuegel <akuegel@google.com>

[mlir][GPUToROCDL] Lower arith.remf to GPU intrinsic.

Differential Revision: https://reviews.llvm.org/D159423


# 18960960 24-Aug-2023 Stanley Winata <stanley@nod-labs.com>

[mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM.

Reduction is heavily used for many DL workload especially with
softmax/Attention layers. Wave/Warp shuffle and reduction is known to be
a

[mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM.

Reduction is heavily used for many DL workload especially with
softmax/Attention layers. Wave/Warp shuffle and reduction is known to be
a speedy/efficient way to do these reductions.

In this patch we introduce AMD shuffle intrinsic Ops to ROCDL, along with it's corresponding lowering from gpu.shuffle. This should speed up a lot of DL workloads on ROCM backend. Currently, we have support for xor and idx, which are the more common ones. In the future, we plan on adding support for Down and Up, as well as using the ds_swizzle to further enhance it's performance when width and offsets are constant.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D158684

show more ...


# 51b65d08 20-Jul-2023 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][AMDGPU] Improve BF16 handling through AMDGPU compilation

Many previous sets of AMDGPU dialect code have been incorrect in the
presence of the bf16 type (when lowered to LLVM's bfloat) as they

[mlir][AMDGPU] Improve BF16 handling through AMDGPU compilation

Many previous sets of AMDGPU dialect code have been incorrect in the
presence of the bf16 type (when lowered to LLVM's bfloat) as they were
developed in a setting that run a custom bf16-to-i16 pass before LLVM
lowering.

An overall effect of this patch is that you should run
--arith-emulate-unsupported-floats="source-types=bf16 target-type=f32"
on your GPU module before calling --convert-gpu-to-rocdl if your code
performs bf16 arithmetic.

While LLVM now supports software bfloat, initial experiments showed
that using this support on AMDGPU inserted a large number of
conversions around loads and stores which had substantial performance
imparts. Furthermore, all of the native AMDGPU operations on bf16
types (like the WMMA operations) operate on 16-bit integers instead of
the bfloat type.

First, we make the following changes to preserve compatibility once
the LLVM bfloat type is reenabled.
1. The matrix multiplication operations (MFMA and WMMA) will bitcast
bfloat vectors to i16 vectors.
2. Buffer loads and stores will operate on the relevant integer
datatype and then cast to bfloat if needed.

Second, we add type conversions to convert bf16 and vectors of it to
equivalent i16 types.

Third, we add the bfloat <-> f32 expansion patterns to the set of
operations run before the main LLVM conversion so that MLIR's
implementation of these conversion routines is used.

Finally, we extend the "floats treated as integers" support in the
LLVM exporter to handle types other than fp8.

We also fix a bug in the unsupported floats emulation where it tried
to operate on `arith.bitcast` due to an oversight.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D156361

show more ...


# 888717e8 09-Aug-2023 Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>

[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD

This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsisten

[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD

This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass
in a declarative fashion that provides a much better understanding and control.

Differential Revision: https://reviews.llvm.org/D157617

show more ...


# cdf7ca6d 26-Jul-2023 SJW <swaters@amd.com>

[MLIR][ROCDL] Add conversion for gpu.lane_id to ROCDL

Creates rocdl.lane_id op with llvm conversion to:

__device__ static unsigned int __lane_id() {
return __builtin_amdgcn_mbcnt_hi(

[MLIR][ROCDL] Add conversion for gpu.lane_id to ROCDL

Creates rocdl.lane_id op with llvm conversion to:

__device__ static unsigned int __lane_id() {
return __builtin_amdgcn_mbcnt_hi(
-1, __builtin_amdgcn_mbcnt_lo(-1, 0));
}

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D154666

show more ...


12345