VectorToGPU.cpp - OpenGrok history log for /llvm-project/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init
# 6aaa8f25	21-Jan-2025	Matthias Springer <me@m-sp.org>	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.
Revision tags: llvmorg-19.1.7
# 09dfc571	20-Dec-2024	Jacques Pienaar <jpienaar@google.com>	[mlir] Enable decoupling two kinds of greedy behavior. (#104649) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety) as the API arguments hasn't changed between the two.
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 927559d2	24-Oct-2024	Longsheng Mou <longshengmou@gmail.com>	[mlir][vector] Fix a crash in `VectorToGPU` (#113454) This PR fixes a crash in `VectorToGPU` when the operand of `extOp` is a function argument, which cannot be retrieved using `getDefiningOp`. Fixes #107967.
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# 5262865a	04-Aug-2024	Kazu Hirata <kazu@google.com>	[mlir] Construct SmallVector with ArrayRef (NFC) (#101896)
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# db791b27	02-Jul-2024	Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>	mlir/LogicalResult: move into llvm (#97309) This patch is part of a project to move the Presburger library into LLVM.
# 43fd4c49	18-Jun-2024	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[mlir][GPU] Improve handling of GPU bounds (#95166) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# b00e0c16	22-May-2024	Christian Ulmann <christianulmann@gmail.com>	[MLIR][Analysis] Consolidate topological sort utilities (#92563) This PR attempts to consolidate the different topological sort utilities into one place. It adds them to the analysis folder because the `SliceAnalysis` uses some of these. There are now two different sorting strategies: 1. Sort only according to SSA use-def chains 2. Sort while taking regions into account. This requires a much more elaborate traversal and cannot be applied on graph regions that easily. This additionally reimplements the region aware topological sorting because the previous implementation had an exponential space complexity. I'm open to suggestions on how to combine this further or how to fuse the test passes.
Revision tags: llvmorg-18.1.6
# a037d889	13-May-2024	Lei Zhang <antiagainst@gmail.com>	[mlir][gpu] Support extf before contract when converting to MMA ops (#91988) This commit allows `inferFragType` to see through all arith.ext op and other elementwise users before reaching contract op for figuring out the fragment type.
# baa5beec	13-May-2024	tyb0807 <sontuan.vu119@gmail.com>	[NFC] Make NVGPU casing consistent (#91903)
Revision tags: llvmorg-18.1.5
# a5757c5b	19-Apr-2024	Christian Sigg <chsigg@users.noreply.github.com>	Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356) This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# 971b8525	01-Apr-2024	Jakub Kuderski <jakub@nod-labs.com>	[mlir][NFC] Simplify type checks with isa predicates (#87183) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# fe8a62c4	08-Feb-2024	Uday Bondhugula <uday@polymagelabs.com>	[MLIR] Fix crash in AffineMap::replace for zero result maps (#80930) Fix obvious bug in AffineMap::replace for the case of zero result maps. Extend/complete inferExprsFromList to work with empty expression lists.
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 62bf7710	21-Jan-2024	Matthias Springer <me@m-sp.org>	[mlir][IR] Add `notifyBlockRemoved` callback to listener (#78306) There is already a "block inserted" notification (in `OpBuilder::Listener`), so there should also be a "block removed" notification. The purpose of this change is to make the listener API more mature. There is currently a gap between what kind of IR changes can be made and what IR changes can be listened to. At the moment, the only way to inform listeners about "block removal" is to send a manual `notifyOperationModified` for the parent op (e.g., by wrapping the `eraseBlock(b)` method call in `updateRootInPlace(b->getParentOp())`). This tells the listener that something has changed, but it is somewhat of an API abuse.
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# b984045d	20-Oct-2023	Mehdi Amini <joker.eph@gmail.com>	Apply clang-tidy fixes for readability-simplify-boolean-expr in VectorToGPU.cpp (NFC)
# 847d8457	20-Oct-2023	Mehdi Amini <joker.eph@gmail.com>	Apply clang-tidy fixes for performance-unnecessary-value-param in VectorToGPU.cpp (NFC)
# b8a3f0fd	20-Oct-2023	Mehdi Amini <joker.eph@gmail.com>	Apply clang-tidy fixes for llvm-qualified-auto in VectorToGPU.cpp (NFC)
# 42bba97f	07-Dec-2023	harsh-nod <harsh@nod-labs.com>	[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754) This patch modifies the CombineTransferReadOpTranspose pattern to handle extf ops. Also adds a test which shows the transpose getting folded into the transfer_read.
# 32c3decb	20-Nov-2023	Matthias Springer <me@m-sp.org>	[mlir][vector] Modernize `vector.transpose` op (#72594) * Declare arguments/results with `let` statements. * Rename `transp` to `permutation`. * Change type of `transp` from `I64ArrayAttr` to `DenseI64ArrayAttr` (provides direct access to `ArrayRef<int64_t>` instead of `ArrayAttr`).
# 1609f1c2	14-Nov-2023	long.chen <lipracer@gmail.com>	[mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269) detail see the docment: https://mlir.llvm.org/deprecation/ Not all changes are made manually, most of them are made through a clang tool I wrote https://github.com/lipracer/cpp-refactor.
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2
# 9b5ef2be	19-Sep-2023	Matthias Springer <me@m-sp.org>	[mlir][Interfaces] `LoopLikeOpInterface`: Support ops with multiple regions (#66754) This commit implements `LoopLikeOpInterface` on `scf.while`. This enables LICM (and potentially other transforms) on `scf.while`. `LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and can now return multiple regions. Also fix a bug in the default implementation of `LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false" for some values that are defined outside of the loop (in a nested op, in such a way that the value does not dominate the loop). This interface is currently only used for LICM and there is no way to trigger this bug, so no test is added.
Revision tags: llvmorg-17.0.1
# 5cf714bb	18-Sep-2023	Matthias Springer <me@m-sp.org>	[mlir][SCF] scf.for: Consistent API around `initArgs` (#66512) * Always use the auto-generated `getInitArgs` function. Remove the hand-written `getInitOperands` duplicate. * Remove `hasIterOperands` and `getNumIterOperands`. The names were inconsistent because the "arg" is called `initArgs` in TableGen. Use `getInitArgs().size()` instead. * Fix verification around ops with no results.
Revision tags: llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# cafb6284	19-Jul-2023	Christopher Bate <cbate@nvidia.com>	[mlir][VectorToGPU] Update memref stride preconditions on `nvgpu.mma.sync` path This change removes the requirement that the row stride be statically known when converting `vector.transfer_read` and `vector.transfer_write` to distributed SIMT operations in the `nvgpu` lowering path. It also adds a check to verify that the last dimension of the source memref is statically known to have stride 1 since this is assumed in the conversion logic. No other change should be required since the generated `vector.load` operations are never created across dimensions other than the last. The routines for checking preconditions on `vector.transfer_read/write` are moved to under nvgpu utilities. The change is NFC with respect to the GPU dialect lowering path. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D155753
# 8a6e54c9	12-Sep-2023	Daniil Dudkin <39276703+unterumarmung@users.noreply.github.com>	[mlir][arith] Rename operations: `maxf` → `maximumf`, `minf` → `minimumf` (#65800) This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
# a0119437	02-Aug-2023	Lei Zhang <antiagainst@google.com>	[mlir][gpu] Support arith.extf in subgroup MMA elementwise ops This commit adds support for arith.extf in the supported list of elementwise ops for subgroup MMA ops, and enables lowering to SPIR-V. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D156847
# 16b75cd2	31-Jul-2023	Matthias Springer <me@m-sp.org>	[mlir][vector] Use DenseI64ArrayAttr for ExtractOp/InsertOp positions `DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated. Differential Revision: https://reviews.llvm.org/D156684