Vectorization.cpp - OpenGrok history log for /llvm-project/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# 9cbc1f29	22-Jan-2025	Han-Chung Wang <hanhan0912@gmail.com>	[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714) In the LLVM style guide, we prefer not using braced initializer lists to call a constructor. Also, we prefer using [mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714) In the LLVM style guide, we prefer not using braced initializer lists to call a constructor. Also, we prefer using an equal before the open curly brace if we use a braced initializer list when initializing a variable. See https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor for more details. The style guide does not explain the reason well. There is an article from abseil, which mentions few benefits. E.g., we can avoid the most vexing parse, etc. See https://abseil.io/tips/88 for more details. Signed-off-by: hanhanW <hanhan0912@gmail.com> show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# be06c79c	11-Dec-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Enable Vectorization of 0-D tensor.extract (#119079) This patch removes an assert in `vectorizeTensorExtract` that was blocking the vectorization of 0-D tensor.extract operations, e [mlir][linalg] Enable Vectorization of 0-D tensor.extract (#119079) This patch removes an assert in `vectorizeTensorExtract` that was blocking the vectorization of 0-D tensor.extract operations, e.g.: ```mlir %1 = tensor.extract %src[] : tensor<f32> ``` As demonstrated by the included tests, this case is already effectively supported. Context The removed assert was introduced in #109580 as a guard, pending proper support and testing for 0-D tensors. This PR addresses that previously undocumented TODO. Apologies for the oversight! Updates and Tests * Revised the existing test `@negative_no_loop` to ensure the `vectorize_nd_extract` attribute is included, allowing the vectorizer to process it. The test was renamed and variables updated for clarity. * Added a new test `@extract_scalar_from_0d_into_1d` to cover "mixed" 0-D/1-D tensor extraction, e.g.: ```mlir %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel"] } outs(%init : tensor<1xf32>) { ^bb0(%in: f32): %1 = tensor.extract %src[] : tensor<f32> linalg.yield %1 : f32 } -> tensor<1xf32> return %res : tensor<1xf32> ``` Additional updates I also took the liberty and improved test coverage for 0-D tensor in the vectorizer tests: * Added a specific test for "0D linalg.generic" in "vectorization-with-patterns.mlir". * Renamed several tests in "vectorization-with-patterns.mlir" to clarify that the 0-D case is now covered. show more ...
# a2acb2ff	05-Dec-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Fix vectorization of tensor.extract (#118105) The example below demonstrates a "scalar read followed by a broadcast" pattern for `tensor.extract`: ```mlir #map = affine_map<(d0, [mlir][linalg] Fix vectorization of tensor.extract (#118105) The example below demonstrates a "scalar read followed by a broadcast" pattern for `tensor.extract`: ```mlir #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)> func.func @scalar_broadcast( %init : tensor<1x1x3xi32>, %src: tensor<1x3x2x4xi32>, %idx :index) -> tensor<1x1x3xi32> { %c0 = arith.constant 0 :index %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%init : tensor<1x1x3xi32>) { ^bb0(%out: i32): %val = tensor.extract %src[%idx, %idx, %idx, %idx] : tensor<1x3x2x4xi32> linalg.yield %val : i32 } -> tensor<1x1x3xi32> return %res : tensor<1x1x3xi32> } ``` The default masking path within the Linalg vectorizer, which assumes an identity masking map, is not suitable here. Indeed: * identity != broadcast. This patch ensures masking is handled in the `vectorizeTensorExtract` hook, which has the necessary context for proper handling. Fixes #116197 show more ...
Revision tags: llvmorg-19.1.5
# aa9d3686	29-Nov-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Relax scalable vectorization restrictions (#117991) Currently, the Linalg vectorizer disallows non-trailing parallel dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (), for [mlir][linalg] Relax scalable vectorization restrictions (#117991) Currently, the Linalg vectorizer disallows non-trailing parallel dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (), for cases like: ```mlir %0 = linalg.fill ins(%arg0 : f32) outs(%A : tensor<?x?xf32>) -> tensor<?x?xf32> ``` This restriction exists to avoid generating "scalable" arrays of aggregates, which LLVM does not support (multi-dim vectors are lowered into arrays of aggregates at the LLVM level). This patch relaxes that restriction when the trailing parallel vector dimension is `1`, e.g., for `vector_sizes [[8], 1]`. Such cases are safe since trailing unit dimensions can be collapsed. This relaxation is necessary to support scalable vectorization for tensor.pack, where inner tile sizes are `[8]` (scalable) and `1` (scalar). (*) Transform Dialect notation show more ...
# 1b2c8f10	26-Nov-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Extract `GeneralizePadOpPattern` into a standalone transformation (#117329) Currently, `GeneralizePadOpPattern` is grouped under `populatePadOpVectorizationPatterns`. However, as not [mlir][linalg] Extract `GeneralizePadOpPattern` into a standalone transformation (#117329) Currently, `GeneralizePadOpPattern` is grouped under `populatePadOpVectorizationPatterns`. However, as noted in #111349, this transformation "decomposes" rather than "vectorizes" `tensor.pad`. As such, it functions as: * a vectorization _pre-processing_ transformation, not * a vectorization transformation itself. To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a standalone transformation by: * introducing a dedicated `populateDecomposePadPatterns` method, * adding a `apply_patterns.linalg.decompose_pad` Transform Dialect Op, * removing it from `populatePadOpVectorizationPatterns`. In addition, to better reflect its role, it is renamed as "decomposition" rather then "generalization". This is in line with the recent renaming of similar ops, i.e. tensor.pack/tensor.unpack Ops in #116439. show more ...
Revision tags: llvmorg-19.1.4
# 8e663039	13-Nov-2024	Kunwar Grover <groverkss@gmail.com>	[mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053) This patch removes trivial usages of vector.extractelement/vector.insertelement. These operations ca [mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053) This patch removes trivial usages of vector.extractelement/vector.insertelement. These operations can be fully represented by vector.extract/vector.insert. See https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116 for more information. Further patches will remove more usages of these ops. show more ...
# 3ad01480	07-Nov-2024	Md Asghar Ahmad Shahid <md.asghar.ahmad.shahid@intel.com>	[MLIR][Linalg] Re-land linalg.matmul move to ODS. + Remove/update failing obsolete OpDSL tests. (#115319) The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which introduces transpos [MLIR][Linalg] Re-land linalg.matmul move to ODS. + Remove/update failing obsolete OpDSL tests. (#115319) The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which introduces transpose and broadcast semantic to linalg.matmul was reverted due to two failing OpDSL test for linalg.matmul. Since linalg.matmul is now defined using TableGen ODS instead of Python-based OpDSL, these test started failing and needs to be removed/updated. This commit removes/updates the failing obsolete tests from below files. All other files were part of earlier PR and just cherry picked. "mlir/test/python/integration/dialects/linalg/opsrun.py" "mlir/test/python/integration/dialects/transform.py" --------- Co-authored-by: Renato Golin <rengolin@systemcall.eu> show more ...
# 39ad84e4	29-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349) At the moment, `GenericPadOpVectorizationPattern` implements two orthogonal transformations: 1. Rewrites `tensor [mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349) At the moment, `GenericPadOpVectorizationPattern` implements two orthogonal transformations: 1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`, `linalg::FillOp` and `tensor::InsertSliceOp`. 2. Vectorizes (where possible) `tensor::InsertSliceOp` (see `tryVectorizeCopy`). This patch splits `GenericPadOpVectorizationPattern` into two separate patterns: 1. `GeneralizePadOpPattern` for the first transformation (note that currently `GenericPadOpVectorizationPattern` inherits from `GeneralizePadOpPattern`). 2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`. With this change, we gain the following: * a clear separation between pre-processing and vectorization transformations/stages, * a path to support masked vectorisation for `tensor.insert_slice` (with a dedicated pattern for vectorization, it is much easier to specify the input vector sizes used in masking), * more opportunities to vectorize `tensor.insert_slice`. Note for downstream users: -------------------------- If you were using `populatePadOpVectorizationPatterns`, following this change you will also have to add `populateInsertSliceVectorizationPatterns`. Finer implementation details: ----------------------------- 1. The majority of changes in this patch are copy & paste + some edits. 1.1. The only functional change is that the vectorization of `tensor.insert_slice` is now broadly available (as opposed to being constrained to the pad vectorization pattern: `GenericPadOpVectorizationPattern`). 1.2. Following-on from the above, `@pad_and_insert_slice_dest` is updated. As expected, the input `tensor.insert_slice` Op is no longer "preserved" and instead gets vectorized successfully. 2. The `linalg.fill` case in `getConstantPadVal` works under the assumption that only _scalar_ source values can be used. That's consistent with the definition of the Op, but it's not tested at the moment. Hence a test case in Linalg/invalid.mlir is added. 3. The behaviour of the two TD vectorization Ops, `transform.structured.vectorize_children_and_apply_patterns` and `transform.structured.vectorize` is preserved. show more ...
Revision tags: llvmorg-19.1.3
# ac4bd741	25-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504) This PR simply wraps `populatePadOpVectorizationPatterns` into a new Transform Dialect Op: `apply_patterns.linalg.pad_vectorizatio [mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504) This PR simply wraps `populatePadOpVectorizationPatterns` into a new Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`. This change makes it possible to run (and test) the corresponding patterns _without_: `transform.structured.vectorize_children_and_apply_patterns`. Note that the Op above only supports non-masked vectorisation (i.e. when the inputs are static), so, effectively, only fixed-width vectorisation (as opposed to scalable vectorisation). As such, this change is required to construct vectorization pipelines for tensor.pad targeting scalable vectors. To test the new Op and the corresponding patterns, I added "vectorization-pad-patterns.mlir" - most tests have been extracted from "vectorization-with-patterns.mlir". show more ...
# 0a3347dc	18-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Fix idx comparison in the vectorizer (#112900) Fixes loop comparison condition in the vectorizer. As that logic is used specifically for vectorising `tensor.extract`, I also added [mlir][linalg] Fix idx comparison in the vectorizer (#112900) Fixes loop comparison condition in the vectorizer. As that logic is used specifically for vectorising `tensor.extract`, I also added a test that violates the assumptions made inside `getTrailingNonUnitLoopDimIdx`, namely that Linalg loops are non-empty. Vectorizer pre-conditions will capture that much earlier making sure that `getTrailingNonUnitLoopDimIdx` is only run when all the assumptions are actually met. Thank you for pointing this out, @pfusik ! show more ...
# f7f51f2a	18-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][vector] Clarify the semantics of masking maps (nfc) (#111383) We use the term "masking map" throughout the Linalg vectorization logic, but we don't really define what it is and how it differ [mlir][vector] Clarify the semantics of masking maps (nfc) (#111383) We use the term "masking map" throughout the Linalg vectorization logic, but we don't really define what it is and how it differs from Linalg indexing maps. This PR clarifies the differnces, makes sure that the new terminology is used consistenty and improves code re-use. show more ...
# a24c4687	16-Oct-2024	Alexander Pivovarov <pivovaa@amazon.com>	[MLIR] Fix assert expressions (#112474) I noticed that several assertions in MLIR codebase have issues with operator precedence The issue with operator precedence in these assertions is due to t [MLIR] Fix assert expressions (#112474) I noticed that several assertions in MLIR codebase have issues with operator precedence The issue with operator precedence in these assertions is due to the way logical operators are evaluated. The `&&` operator has higher precedence than the `\|\|` operator, which means the assertion is currently evaluating incorrectly, like this: ``` assert((resType.getNumDynamicDims() == dynOutDims.size()) \|\| (dynOutDims.empty() && "Either none or all output dynamic dims must be specified!")); ``` We should add parentheses around the entire expression involving `dynOutDims.empty()` to ensure that the logical conditions are grouped correctly. Here’s the corrected version: ``` assert(((resType.getNumDynamicDims() == dynOutDims.size()) \|\| dynOutDims.empty()) && "Either none or all output dynamic dims must be specified!"); ``` show more ...
Revision tags: llvmorg-19.1.2
# 1276ce9e	11-Oct-2024	Emilio Cota <ecg@google.com>	Revert "[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)" This reverts commit 03483737a7a2d72a257a5ab6ff01748ad9cf0f75 and 99c8557, which is a fix-up on top of the forme Revert "[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)" This reverts commit 03483737a7a2d72a257a5ab6ff01748ad9cf0f75 and 99c8557, which is a fix-up on top of the former. I'm reverting because this commit broke two tests: mlir/test/python/integration/dialects/linalg/opsrun.py mlir/test/python/integration/dialects/transform.py See https://lab.llvm.org/buildbot/#/builders/138/builds/4872 I'm not familiar with the tests, so I'm leaving it to the original author to either remove or adapt the broken tests, as discussed here: https://github.com/llvm/llvm-project/pull/104783#issuecomment-2406390905 show more ...
# 03483737	10-Oct-2024	Md Asghar Ahmad Shahid <md.asghar.ahmad.shahid@intel.com>	[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783) The main goal of this patch is to extend the semantic of 'linalg.matmul' named op to include per operand transpose sema [mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783) The main goal of this patch is to extend the semantic of 'linalg.matmul' named op to include per operand transpose semantic while also laying out a way to move ops definition from OpDSL to tablegen. Hence, it is implemented in tablegen. Transpose semantic is as follows. By default 'linalg.matmul' behavior will remain as is. Transpose semantics can be appiled on per input operand by specifying the optional permutation attributes (namely 'permutationA' for 1st input and 'permutationB' for 2nd input) for each operand explicitly as needed. By default, no transpose is mandated for any of the input operand. Example: ``` %val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>, memref<5x7xf32>) outs(%arg2: memref<3x7xf32>) permutationA = [1, 0] permutationB = [0, 1] ``` show more ...
# d9d62331	04-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Add a new helper hook: `hasVectorizationImpl` (#110708) The newly added hook simply returns `false` for Ops for which there's no "vectorization logic" in the Linalg Vectorizer (i.e. [mlir][linalg] Add a new helper hook: `hasVectorizationImpl` (#110708) The newly added hook simply returns `false` for Ops for which there's no "vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()` method). It's added so that the following two TD ops expose identical level of functionality (that's not the case ATM): * `transform.structured.vectorize_children_and_apply_patterns` * `transform.structured.vectorize` Specifically, ATM, the former works only for Linalg Ops, while the latter works for all Ops that the vectorizer supports (). With this change, I am making sure that both TD will behave consistently. Note, this shouldn't affect any of the current uses of the vectorizer. () This is implemented via the `vectorize()` method in Vectorization.cpp. show more ...
# 56d6b567	04-Oct-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][vector] Relax the requirements on broadcast dims (#99341) NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute was made mandatory. This PR updates the semantics of the [mlir][vector] Relax the requirements on broadcast dims (#99341) NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute was made mandatory. This PR updates the semantics of the `in_bounds` attribute so that broadcast dimensions are no longer required to be "in bounds". Specifically, these xfer_read/xfer_write Ops become valid after this change: ```mlir %read = vector.transfer_read %A[%base1, %base2], %pad {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<?x?xf32>, vector<9xf32> vector.transfer_write %vec, %A[%base1, %base2], {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : vector<9xf32>, memref<?x?xf32> ``` Note that the value `false` merely means "may run out-of-bounds", i.e., the corresponding access can still be "in bounds". In fact, the folder for xfer Ops is also updated () and will update the attribute value corresponding to broadcast dims to `true` if all non-broadcast dims are marked as "in bounds". Note that this PR doesn't change any of the lowerings. The changes in "SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple reverts of recent changes in #97049. Those were only meant to facilitate making `in_bounds` mandatory and to work around the extra requirements for broadcast dims (those requirements ere removed in this PR). All changes in tests are also reverts of changes from #97049. For context, here's a PR in which "broadcast" dims where forced to always be "in-bounds": https://reviews.llvm.org/D102566 (*) See `foldTransferInBoundsAttribute`. show more ...
Revision tags: llvmorg-19.1.1
# 6d114944	26-Sep-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][Linalg] Refine how broadcast dims are treated (#99015) This PR fixes how broadcast dims (identified as "zero" results in permutation maps) corresponding to a reduction iterator are vectorise [mlir][Linalg] Refine how broadcast dims are treated (#99015) This PR fixes how broadcast dims (identified as "zero" results in permutation maps) corresponding to a reduction iterator are vectorised in the case of generic Ops. Here's an example: ```mlir #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)> func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) { %0 = tensor.empty() : tensor<1x12x197x1xf32> %1 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} ins(%arg0 : tensor<1x12x197x197xf32>) outs(%0 : tensor<1x12x197x1xf32>) { ^bb0(%in: f32, %out: f32): %818 = arith.addf %in, %out : f32 linalg.yield %818 : f32 } -> tensor<1x12x197x1xf32> return %1 : tensor<1x12x197x1xf32> } ``` This is a perfectly valid Generic Op, but currently triggers two issues in the vectoriser. The root cause is this map: ```mlir #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)> ``` This map triggers an assert in `reindexIndexingMap` - this hook incorrectly assumes that every result in the input map is a `dim` expression and that there are no constants. That's not the case in this example. `reindexIndexingMap` is extended to allow maps like the one above. For now, only constant "zero" results are allowed. This can be extended in the future once a good motivating example is available. Separately, the permutation map highlighted above "breaks" mask calculation (ATM masks are always computed, even in the presence of static shapes). When applying the following permutation: ```mlir (d0, d1, d2, d3) -> (d0, d1, d2, 0) ``` to these canonical shapes (corresponding to the example above): ``` (1, 12, 197, 197) ``` we end up with the following error: ```bash error: vector types must have positive constant sizes but got 1, 12, 197, 0 ``` The error makes sense and indicates that we should update the permutation map above to: ``` (d0, d1, d2, d3) -> (d0, d1, d2) ``` This would correctly give the following vector type: ``` vector<1x12x197xi1> ``` Fixes #97247 show more ...
# 234193ba	24-Sep-2024	Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>	[mlir][linalg] Vectorization support for convolution of i1 type (#109480) Normally convolutions present with the following linalg op region ``` ^bb0(%arg14: i4, %arg15: i4, %arg16: i4): %17 = a [mlir][linalg] Vectorization support for convolution of i1 type (#109480) Normally convolutions present with the following linalg op region ``` ^bb0(%arg14: i4, %arg15: i4, %arg16: i4): %17 = arith.muli %arg14, %arg15 : i4 %18 = arith.addi %arg16, %17 : i4 linalg.yield %18 : i4 ``` However, for i1 due to strength reduction we get something like ``` ^bb0(%arg14: i1, %arg15: i1, %arg16: i1): %17 = arith.andi %arg14, %arg15 : i1 %18 = arith.ori %arg16, %17 : i1 linalg.yield %18 : i1 ``` This PR updates the logic to support this region for i1 types. show more ...
# b47d1787	24-Sep-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][vector] Refine vectorisation of tensor.extract (#109580) This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the following case is vectorised as `vector.gather` (as opposed to a [mlir][vector] Refine vectorisation of tensor.extract (#109580) This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the following case is vectorised as `vector.gather` (as opposed to attempting a contiguous load): ```mlir func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> { %c0 = arith.constant 0 : index %0 = tensor.empty() : tensor<8x1xf32> %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel", "parallel"] } outs(%0 : tensor<8x1xf32>) { ^bb0(%arg1: f32): %1 = linalg.index 0 : index %extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32> linalg.yield %extracted : f32 } -> tensor<8x1xf32> return %res : tensor<8x1xf32> } ``` Specifically, when looking for loop-invariant indices in `tensor.extract` Ops, any `linalg.index` Op that's used in address colcluation should only access loop dims that are == 1. In the example above, the following does not meet that criteria: ```mlir %1 = linalg.index 0 : index ``` Note that this PR also effectively addresses the issue fixed in #107922, i.e. exercised by: * `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load` `getNonUnitLoopDim` introduced in #107922 is still valid though. In fact, it is required to identify that the following case is a contiguous load: ```mlir func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> { %c0 = arith.constant 0 : index %0 = tensor.empty() : tensor<8x1xf32> %res = linalg.generic { indexing_maps = [#map], iterator_types = ["parallel", "parallel"] } outs(%0 : tensor<8x1xf32>) { ^bb0(%arg1: f32): %1 = linalg.index 0 : index %extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32> linalg.yield %extracted : f32 } -> tensor<8x1xf32> return %res : tensor<8x1xf32> } ``` Some logic is still missing to lower the above to `vector.transfer_read`, so it is conservatively lowered to `vector.gather` instead (see TODO in `getTensorExtractMemoryAccessPattern`). There's a few additional changes: * `getNonUnitLoopDim` is simplified and renamed as `getTrailingNonUnitLoopDimIdx`, additional comments are added (note that the functionality didn't change); * extra comments in a few places, variable names in comments update to use Markdown (which is the preferred approach in MLIR). This is a follow-on for: * https://github.com/llvm/llvm-project/pull/107922 * https://github.com/llvm/llvm-project/pull/102321 show more ...
# f264d9a9	21-Sep-2024	Kazu Hirata <kazu@google.com>	[Linalg] Fix a warning This patch fixes: mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:821:12: error: variable 'countNonUnitDim' set but not used [-Werror,-Wunused-but-set-variable]
# e45fc514	21-Sep-2024	Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>	[Linalg][Vectorization] Add support for linalg vectorization of a tensor.extract case (#107922) In https://github.com/llvm/llvm-project/pull/102321 we relaxed the vectorizer so that when checking f [Linalg][Vectorization] Add support for linalg vectorization of a tensor.extract case (#107922) In https://github.com/llvm/llvm-project/pull/102321 we relaxed the vectorizer so that when checking for contiguous loads we dont always have a trailing non unit dim. For example in the test case added we have `tensor<8x1xf32>` which is now a valid candidate for contiguous load. However, the logic to check contiguous load assumed that only the trailing dim will be non unit so this PR just updates that logic to find the actual non unit dim. show more ...
# 315ba774	19-Sep-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Vectorisation of tensor.extract - dynamic shapes (#100582) This PR removes the assumption that reading from a dynamic tensor is always a gather load: ```mlir %extracted = tensor. [mlir][linalg] Vectorisation of tensor.extract - dynamic shapes (#100582) This PR removes the assumption that reading from a dynamic tensor is always a gather load: ```mlir %extracted = tensor.extract %src[%c79, %3] : tensor<?x?xf32> ``` That assumption was originally introduced to simplify the implementation and to reduce the number of cases to consider. Now that the vectorisation of `tensor.extract` has been around for > 1 year and has been quite stable, we can safely relax it. This is a relatively small change - rather than using the parent linalg Op to infer the target output shape (not possible with dynamic shapes), the vectorizer will use the (previously constructed) output vector shape instead. As expected, the following test required updating (`vector.gather` -> `vector.transfer_read`): * @masked_dynamic_vectorize_nd_tensor_extract_with_affine_apply_contiguous Similar test for scalable vectors is also added. show more ...
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# 62e5032c	08-Aug-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	Reapply "[mlir][linalg] Relax tensor.extract vectorization" (#102321) [This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f] Simplifies the vectorization of tensor.extract so that: * all Reapply "[mlir][linalg] Relax tensor.extract vectorization" (#102321) [This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f] Simplifies the vectorization of tensor.extract so that: * all cases that read into a genuinely multi-dim vector () are considered a gather load, all other cases are considered as potential contiguous loads. This change means that the following extraction from a "column" tensor is correctly identified as a scalar load followed by a broadcast (rather than a gather load). ```mlir func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> { %c4 = arith.constant 4 : index %c0 = arith.constant 0 : index %cst = arith.constant dense<[...]> : tensor<15x1xi32> %out = linalg.generic { indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} outs(%in : tensor<1x1x4xi32>) { ^bb0(%out: i32): %8 = linalg.index 0 : index %idx_0 = linalg.index 0 : index %extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32> linalg.yield %extracted : i32 } -> tensor<1x1x4xi32> return %out:tensor<1x1x4xi32> } ``` Overview of the delta compared to the original submission (#99299): * removed an assert representing a condition that is being relaxed here, * added a test (reading from a column tensor) based on a repro from @hanhanW. (*) `vector<1x4x1xf32>` is considered as 1D vector in this context. show more ...
# 28fa83f8	06-Aug-2024	Han-Chung Wang <hanhan0912@gmail.com>	Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232) Reverts llvm/llvm-project#99299 because it breaks the lowering. To repro: `mlir-opt -transform-interpreter ~/repro.mlir` ```m Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232) Reverts llvm/llvm-project#99299 because it breaks the lowering. To repro: `mlir-opt -transform-interpreter ~/repro.mlir` ```mlir #map = affine_map<(d0, d1) -> (d0)> #map1 = affine_map<(d0, d1) -> (d1)> #map2 = affine_map<(d0, d1) -> (d0, d1)> #map3 = affine_map<(d0, d1) -> (d0 + d1)> module { func.func @foo(%arg0: index, %arg1: tensor<2xf32>, %arg2: tensor<4xf32>, %arg3: tensor<1xf32>) -> tensor<4x1xf32> { %c0 = arith.constant 0 : index %cst = arith.constant 1.000000e+00 : f32 %cst_0 = arith.constant 0.000000e+00 : f32 %0 = tensor.empty() : tensor<4x1xf32> %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : tensor<4xf32>, tensor<1xf32>) outs(%0 : tensor<4x1xf32>) { ^bb0(%in: f32, %in_1: f32, %out: f32): %2 = linalg.index 0 : index %3 = linalg.index 1 : index %4 = affine.apply #map3(%3, %arg0) %extracted = tensor.extract %arg1[%c0] : tensor<2xf32> %5 = arith.cmpi eq, %2, %c0 : index %6 = arith.cmpi ult, %2, %c0 : index %7 = arith.select %5, %cst, %in : f32 %8 = arith.select %6, %cst_0, %7 : f32 %9 = arith.cmpi eq, %4, %c0 : index %10 = arith.cmpi ult, %4, %c0 : index %11 = arith.select %9, %cst, %in_1 : f32 %12 = arith.select %10, %cst_0, %11 : f32 %13 = arith.mulf %8, %12 : f32 %14 = arith.mulf %13, %extracted : f32 %15 = arith.cmpi eq, %2, %4 : index %16 = arith.select %15, %cst, %cst_0 : f32 %17 = arith.subf %16, %14 : f32 linalg.yield %17 : f32 } -> tensor<4x1xf32> return %1 : tensor<4x1xf32> } } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op transform.structured.vectorize %0 : !transform.any_op transform.yield } } ``` show more ...
# 8868c02c	06-Aug-2024	Andrzej Warzyński <andrzej.warzynski@arm.com>	[mlir][linalg] Relax tensor.extract vectorization (#99299) Simplifies the vectorization of tensor.extract so that: * all cases that read into a genuinely multi-dim vector () are considered a ga [mlir][linalg] Relax tensor.extract vectorization (#99299) Simplifies the vectorization of tensor.extract so that: all cases that read into a genuinely multi-dim vector () are considered a gather load, all other cases are considered as potential contiguous loads. This change means that the following extraction from a "column" tensor will be correctly identified as a scalar load followed by a broadcast (rather than a gather load). ```mlir func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> { %c4 = arith.constant 4 : index %c0 = arith.constant 0 : index %cst = arith.constant dense<[...]> : tensor<15x1xi32> %out = linalg.generic { indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} outs(%in : tensor<1x1x4xi32>) { ^bb0(%out: i32): %idx_0 = linalg.index 0 : index %extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32> linalg.yield %extracted : i32 } -> tensor<1x1x4xi32> return %out:tensor<1x1x4xi32> } ``` (*) `vector<1x4x1xf32>` is considered as 1D vector in this context. show more ...
12 3 4 5 6 7 8 9 10 >>...12