History log of /llvm-project/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp (Results 1 – 25 of 94)
Revision Date Author Comments
Revision tags: llvmorg-21-init
# 6aaa8f25 21-Jan-2025 Matthias Springer <me@m-sp.org>

[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465)

Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.


Revision tags: llvmorg-19.1.7
# 09dfc571 20-Dec-2024 Jacques Pienaar <jpienaar@google.com>

[mlir] Enable decoupling two kinds of greedy behavior. (#104649)

The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greediness, and combining them leads to
inefficiency. E.g., consider cases where one needs to create different
phases in lowering and is required to apply patterns in a specific order
split across different passes. Using the driver, one ends up needlessly
retrying folding or having multiple rounds of folding attempts where one
final run would have sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a commonly requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated function name should just be a find and replace
(e.g., `find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments haven't changed between the two.


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 927559d2 24-Oct-2024 Longsheng Mou <longshengmou@gmail.com>

[mlir][vector] Fix a crash in `VectorToGPU` (#113454)

This PR fixes a crash in `VectorToGPU` when the operand of `extOp` is a
function argument, which cannot be retrieved using `getDefiningOp`.
Fixes #107967.
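
The failure mode described above can be sketched with a toy model. This is not the code from the patch: `Operation`, `Value`, and `isProducedByExtF` below are hypothetical stand-ins, written only on the assumption (stated in the commit message) that a function argument has no defining op, so a lookup of the defining op yields null and must be checked before use.

```cpp
#include <string>

// Toy stand-ins for the MLIR concepts (hypothetical, NOT the real classes).
struct Operation {
  std::string name;
};

struct Value {
  // Null models a function/block argument, which no operation defines.
  explicit Value(Operation *op = nullptr) : definingOp(op) {}
  Operation *getDefiningOp() const { return definingOp; }

private:
  Operation *definingOp;
};

// Returns true only when `v` is the result of an op named "arith.extf".
// Dereferencing the lookup result without the null check would crash
// whenever `v` is a function argument.
bool isProducedByExtF(const Value &v) {
  Operation *def = v.getDefiningOp();
  if (!def)
    return false; // function argument: nothing to inspect
  return def->name == "arith.extf";
}
```

With this guard, a value that comes straight from a function argument is simply reported as "not produced by an extension op" instead of triggering a null dereference.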


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# 5262865a 04-Aug-2024 Kazu Hirata <kazu@google.com>

[mlir] Construct SmallVector with ArrayRef (NFC) (#101896)


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# db791b27 02-Jul-2024 Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>

mlir/LogicalResult: move into llvm (#97309)

This patch is part of a project to move the Presburger library into
LLVM.


# 43fd4c49 18-Jun-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][GPU] Improve handling of GPU bounds (#95166)

This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).
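
The lookup order described above can be sketched as a small standalone model. This is not the MLIR implementation: `BoundSources`, `limitForId`, and `limitForDim` are hypothetical names, and the sketch assumes each source is just an optional integer. It only illustrates the precedence stated in the commit message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical bundle of the three sources named in the commit message.
struct BoundSources {
  std::optional<int64_t> upperBound;      // `upper_bound` on the op itself
  std::optional<int64_t> inherentSize;    // inherent `known_*_size` attribute
  std::optional<int64_t> discardableSize; // discardable `gpu.known_*_size`
};

// ID ops (block IDs, thread IDs, ...): the most specific source wins.
std::optional<int64_t> limitForId(const BoundSources &s) {
  if (s.upperBound)
    return s.upperBound;
  if (s.inherentSize)
    return s.inherentSize;
  return s.discardableSize;
}

// Dimension ops: a known size is exact, so it is consulted first (the op
// could be constant-folded); `upper_bound` is only the fallback.
std::optional<int64_t> limitForDim(const BoundSources &s) {
  if (s.inherentSize)
    return s.inherentSize;
  if (s.discardableSize)
    return s.discardableSize;
  return s.upperBound;
}
```

For example, with `upper_bound` = 64 and an inherent known size of 128, this model picks 64 for an ID op but 128 for a dimension op.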

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# b00e0c16 22-May-2024 Christian Ulmann <christianulmann@gmail.com>

[MLIR][Analysis] Consolidate topological sort utilities (#92563)

This PR attempts to consolidate the different topological sort utilities
into one place. It adds them to the analysis folder because the
`SliceAnalysis` uses some of these.

There are now two different sorting strategies:
1. Sort only according to SSA use-def chains
2. Sort while taking regions into account. This requires a much more
elaborate traversal and cannot be applied on graph regions that easily.
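
As a rough illustration of the first strategy (sorting purely by use-def chains), here is a generic Kahn-style topological sort over a toy graph. It is not the utility added by this PR: `sortByUseDef` is a hypothetical name, and operations are modeled as plain indices whose operand lists name the operations that define their inputs. Regions are ignored entirely, which is exactly what distinguishes this strategy from the second one.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// operands[i] holds the indices of the nodes whose results node i uses.
// Returns the node indices ordered so that every definition precedes its
// users. If the graph has a cycle, the nodes on it are left out, so the
// result is shorter than the input.
std::vector<std::size_t>
sortByUseDef(const std::vector<std::vector<std::size_t>> &operands) {
  const std::size_t n = operands.size();
  std::vector<std::size_t> pending(n, 0);          // unscheduled operands
  std::vector<std::vector<std::size_t>> users(n);  // reverse edges
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t def : operands[i]) {
      assert(def < n && "operand index out of range");
      users[def].push_back(i);
      ++pending[i];
    }
  }

  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i)
    if (pending[i] == 0)
      ready.push(i);

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    std::size_t node = ready.front();
    ready.pop();
    order.push_back(node);
    // Scheduling a definition may make its users ready.
    for (std::size_t user : users[node])
      if (--pending[user] == 0)
        ready.push(user);
  }
  return order;
}
```

Time and space are linear in the number of nodes and use-def edges in this sketch.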

This additionally reimplements the region aware topological sorting
because the previous implementation had an exponential space complexity.

I'm open to suggestions on how to combine this further or how to fuse
the test passes.


Revision tags: llvmorg-18.1.6
# a037d889 13-May-2024 Lei Zhang <antiagainst@gmail.com>

[mlir][gpu] Support extf before contract when converting to MMA ops (#91988)

This commit allows `inferFragType` to see through all arith.ext ops
and other elementwise users before reaching the contract op when
figuring out the fragment type.


# baa5beec 13-May-2024 tyb0807 <sontuan.vu119@gmail.com>

[NFC] Make NVGPU casing consistent (#91903)


Revision tags: llvmorg-18.1.5
# a5757c5b 19-Apr-2024 Christian Sigg <chsigg@users.noreply.github.com>

Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356)

This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# 971b8525 01-Apr-2024 Jakub Kuderski <jakub@nod-labs.com>

[mlir][NFC] Simplify type checks with isa predicates (#87183)

For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.


Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# fe8a62c4 08-Feb-2024 Uday Bondhugula <uday@polymagelabs.com>

[MLIR] Fix crash in AffineMap::replace for zero result maps (#80930)

Fix obvious bug in AffineMap::replace for the case of zero result maps.
Extend/complete inferExprsFromList to work with empty expression lists.


Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 62bf7710 21-Jan-2024 Matthias Springer <me@m-sp.org>

[mlir][IR] Add `notifyBlockRemoved` callback to listener (#78306)

There is already a "block inserted" notification (in
`OpBuilder::Listener`), so there should also be a "block removed"
notification.

The purpose of this change is to make the listener API more mature.
There is currently a gap between what kind of IR changes can be made and
what IR changes can be listened to. At the moment, the only way to
inform listeners about "block removal" is to send a manual
`notifyOperationModified` for the parent op (e.g., by wrapping the
`eraseBlock(b)` method call in `updateRootInPlace(b->getParentOp())`).
This tells the listener that *something* has changed, but it is somewhat
of an API abuse.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# b984045d 20-Oct-2023 Mehdi Amini <joker.eph@gmail.com>

Apply clang-tidy fixes for readability-simplify-boolean-expr in VectorToGPU.cpp (NFC)


# 847d8457 20-Oct-2023 Mehdi Amini <joker.eph@gmail.com>

Apply clang-tidy fixes for performance-unnecessary-value-param in VectorToGPU.cpp (NFC)


# b8a3f0fd 20-Oct-2023 Mehdi Amini <joker.eph@gmail.com>

Apply clang-tidy fixes for llvm-qualified-auto in VectorToGPU.cpp (NFC)


# 42bba97f 07-Dec-2023 harsh-nod <harsh@nod-labs.com>

[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754)

This patch modifies the CombineTransferReadOpTranspose pattern to handle
extf ops. Also adds a test which shows the transpose getting folded into
the transfer_read.


# 32c3decb 20-Nov-2023 Matthias Springer <me@m-sp.org>

[mlir][vector] Modernize `vector.transpose` op (#72594)

* Declare arguments/results with `let` statements.
* Rename `transp` to `permutation`.
* Change type of `transp` from `I64ArrayAttr` to `DenseI64ArrayAttr`
(provides direct access to `ArrayRef<int64_t>` instead of `ArrayAttr`).


# 1609f1c2 14-Nov-2023 long.chen <lipracer@gmail.com>

[mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269)

For details, see the document: https://mlir.llvm.org/deprecation/

Not all changes were made manually; most of them were made with a clang
tool I wrote: https://github.com/lipracer/cpp-refactor.


Revision tags: llvmorg-17.0.3, llvmorg-17.0.2
# 9b5ef2be 19-Sep-2023 Matthias Springer <me@m-sp.org>

[mlir][Interfaces] `LoopLikeOpInterface`: Support ops with multiple regions (#66754)

This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.

`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.

Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.


Revision tags: llvmorg-17.0.1
# 5cf714bb 18-Sep-2023 Matthias Springer <me@m-sp.org>

[mlir][SCF] scf.for: Consistent API around `initArgs` (#66512)

* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.


Revision tags: llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# cafb6284 19-Jul-2023 Christopher Bate <cbate@nvidia.com>

[mlir][VectorToGPU] Update memref stride preconditions on `nvgpu.mma.sync` path

This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1 since this is assumed in the conversion logic. No other change should be
required since the generated `vector.load` operations are never created across
dimensions other than the last. The routines for checking preconditions on
`vector.transfer_read/write` are moved to under nvgpu utilities.

The change is NFC with respect to the GPU dialect lowering path.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D155753


# 8a6e54c9 12-Sep-2023 Daniil Dudkin <39276703+unterumarmung@users.noreply.github.com>

[mlir][arith] Rename operations: `maxf` → `maximumf`, `minf` → `minimumf` (#65800)

This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.


# a0119437 02-Aug-2023 Lei Zhang <antiagainst@google.com>

[mlir][gpu] Support arith.extf in subgroup MMA elementwise ops

This commit adds support for arith.extf in the supported list of
elementwise ops for subgroup MMA ops, and enables lowering to
SPIR-V.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D156847


# 16b75cd2 31-Jul-2023 Matthias Springer <me@m-sp.org>

[mlir][vector] Use DenseI64ArrayAttr for ExtractOp/InsertOp positions

`DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated.

Differential Revision: https://reviews.llvm.org/D156684
