History log of /llvm-project/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (Results 1 – 25 of 294)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# 9cbc1f29 22-Jan-2025 Han-Chung Wang <hanhan0912@gmail.com>

[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714)

In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using

[mlir][NFC] Avoid using braced initializer lists to call a constructor. (#123714)

In the LLVM style guide, we prefer not using braced initializer lists to
call a constructor. Also, we prefer using an equal before the open curly
brace if we use a braced initializer list when initializing a variable.

See

https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
for more details.

The style guide does not explain the reason well. There is an article
from abseil, which mentions few benefits. E.g., we can avoid the most
vexing parse, etc. See https://abseil.io/tips/88 for more details.

Signed-off-by: hanhanW <hanhan0912@gmail.com>

show more ...


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# be06c79c 11-Dec-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Enable Vectorization of 0-D tensor.extract (#119079)

This patch removes an assert in `vectorizeTensorExtract` that was
blocking
the vectorization of 0-D tensor.extract operations, e

[mlir][linalg] Enable Vectorization of 0-D tensor.extract (#119079)

This patch removes an assert in `vectorizeTensorExtract` that was
blocking
the vectorization of 0-D tensor.extract operations, e.g.:

```mlir
%1 = tensor.extract %src[] : tensor<f32>
```

As demonstrated by the included tests, this case is already effectively
supported.

**Context**
The removed assert was introduced in #109580 as a guard, pending proper
support
and testing for 0-D tensors. This PR addresses that previously
undocumented
TODO. Apologies for the oversight!

**Updates and Tests**
* Revised the existing test `@negative_no_loop` to ensure the
`vectorize_nd_extract` attribute is included, allowing the vectorizer
to process it. The test was renamed and variables updated for clarity.
* Added a new test `@extract_scalar_from_0d_into_1d` to cover "mixed"
0-D/1-D tensor extraction, e.g.:
```mlir
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel"]
} outs(%init : tensor<1xf32>) {
^bb0(%in: f32):
%1 = tensor.extract %src[] : tensor<f32>
linalg.yield %1 : f32
} -> tensor<1xf32>

return %res : tensor<1xf32>
```

**Additional updates**
I also took the liberty and improved test coverage for 0-D tensor in the
vectorizer tests:
* Added a specific test for "0D linalg.generic" in
"vectorization-with-patterns.mlir".
* Renamed several tests in "vectorization-with-patterns.mlir" to clarify
that the 0-D case is now covered.

show more ...


# a2acb2ff 05-Dec-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Fix vectorization of tensor.extract (#118105)

The example below demonstrates a "scalar read followed by a broadcast"
pattern for `tensor.extract`:

```mlir
#map = affine_map<(d0,

[mlir][linalg] Fix vectorization of tensor.extract (#118105)

The example below demonstrates a "scalar read followed by a broadcast"
pattern for `tensor.extract`:

```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
func.func @scalar_broadcast(
%init : tensor<1x1x3xi32>,
%src: tensor<1x3x2x4xi32>,
%idx :index) -> tensor<1x1x3xi32> {

%c0 = arith.constant 0 :index

%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%init : tensor<1x1x3xi32>) {
^bb0(%out: i32):
%val = tensor.extract %src[%idx, %idx, %idx, %idx] : tensor<1x3x2x4xi32>
linalg.yield %val : i32
} -> tensor<1x1x3xi32>

return %res : tensor<1x1x3xi32>
}
```

The default masking path within the Linalg vectorizer, which assumes an
identity masking map, is not suitable here. Indeed:

* identity != broadcast.

This patch ensures masking is handled in the `vectorizeTensorExtract`
hook, which has the necessary context for proper handling.

Fixes #116197

show more ...


Revision tags: llvmorg-19.1.5
# aa9d3686 29-Nov-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Relax scalable vectorization restrictions (#117991)

Currently, the Linalg vectorizer disallows non-trailing parallel
dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (*), for

[mlir][linalg] Relax scalable vectorization restrictions (#117991)

Currently, the Linalg vectorizer disallows non-trailing parallel
dimensions to be scalable, e.g., `vector_sizes [[8], 1]` (*), for cases
like:

```mlir
%0 = linalg.fill ins(%arg0 : f32) outs(%A : tensor<?x?xf32>) -> tensor<?x?xf32>
```

This restriction exists to avoid generating "scalable" arrays of
aggregates, which LLVM does not support (multi-dim vectors are lowered
into arrays of aggregates at the LLVM level).

This patch relaxes that restriction when the trailing parallel vector
dimension is `1`, e.g., for `vector_sizes [[8], 1]`. Such cases are safe
since trailing unit dimensions can be collapsed. This relaxation is
necessary to support scalable vectorization for tensor.pack, where inner
tile sizes are `[8]` (scalable) and `1` (scalar).

(*) Transform Dialect notation

show more ...


# 1b2c8f10 26-Nov-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Extract `GeneralizePadOpPattern` into a standalone transformation (#117329)

Currently, `GeneralizePadOpPattern` is grouped under
`populatePadOpVectorizationPatterns`. However, as not

[mlir][linalg] Extract `GeneralizePadOpPattern` into a standalone transformation (#117329)

Currently, `GeneralizePadOpPattern` is grouped under
`populatePadOpVectorizationPatterns`. However, as noted in #111349, this
transformation "decomposes" rather than "vectorizes" `tensor.pad`. As
such, it functions as:
* a vectorization _pre-processing_ transformation, not
* a vectorization transformation itself.

To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a
standalone transformation by:
* introducing a dedicated `populateDecomposePadPatterns` method,
* adding a `apply_patterns.linalg.decompose_pad` Transform Dialect Op,
* removing it from `populatePadOpVectorizationPatterns`.

In addition, to better reflect its role, it is renamed as "decomposition"
rather then "generalization". This is in line with the recent renaming
of similar ops, i.e. tensor.pack/tensor.unpack Ops in #116439.

show more ...


Revision tags: llvmorg-19.1.4
# 8e663039 13-Nov-2024 Kunwar Grover <groverkss@gmail.com>

[mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053)

This patch removes trivial usages of
vector.extractelement/vector.insertelement. These operations ca

[mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053)

This patch removes trivial usages of
vector.extractelement/vector.insertelement. These operations can be
fully represented by vector.extract/vector.insert. See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116
for more information.

Further patches will remove more usages of these ops.

show more ...


# 3ad01480 07-Nov-2024 Md Asghar Ahmad Shahid <md.asghar.ahmad.shahid@intel.com>

[MLIR][Linalg] Re-land linalg.matmul move to ODS. + Remove/update failing obsolete OpDSL tests. (#115319)

The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which
introduces
transpos

[MLIR][Linalg] Re-land linalg.matmul move to ODS. + Remove/update failing obsolete OpDSL tests. (#115319)

The earlier PR(https://github.com/llvm/llvm-project/pull/104783) which
introduces
transpose and broadcast semantic to linalg.matmul was reverted due to
two failing
OpDSL test for linalg.matmul.

Since linalg.matmul is now defined using TableGen ODS instead of
Python-based OpDSL,
these test started failing and needs to be removed/updated.

This commit removes/updates the failing obsolete tests from below files.
All other files
were part of earlier PR and just cherry picked.
"mlir/test/python/integration/dialects/linalg/opsrun.py"
"mlir/test/python/integration/dialects/transform.py"

---------

Co-authored-by: Renato Golin <rengolin@systemcall.eu>

show more ...


# 39ad84e4 29-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349)

At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
1. Rewrites `tensor

[mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349)

At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
`linalg::FillOp` and `tensor::InsertSliceOp`.
2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
`tryVectorizeCopy`).

This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
1. `GeneralizePadOpPattern` for the first transformation (note that
currently `GenericPadOpVectorizationPattern` inherits from
`GeneralizePadOpPattern`).
2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.

With this change, we gain the following:
* a clear separation between pre-processing and vectorization
transformations/stages,
* a path to support masked vectorisation for `tensor.insert_slice`
(with a dedicated pattern for vectorization, it is much easier to
specify the input vector sizes used in masking),
* more opportunities to vectorize `tensor.insert_slice`.

Note for downstream users:
--------------------------

If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.

Finer implementation details:
-----------------------------

1. The majority of changes in this patch are copy & paste + some edits.
1.1. The only functional change is that the vectorization of
`tensor.insert_slice` is now broadly available (as opposed to being
constrained to the pad vectorization pattern:
`GenericPadOpVectorizationPattern`).
1.2. Following-on from the above, `@pad_and_insert_slice_dest` is
updated. As expected, the input `tensor.insert_slice` Op is no
longer "preserved" and instead gets vectorized successfully.

2. The `linalg.fill` case in `getConstantPadVal` works under the
assumption that only _scalar_ source values can be used. That's
consistent with the definition of the Op, but it's not tested at the
moment. Hence a test case in Linalg/invalid.mlir is added.

3. The behaviour of the two TD vectorization Ops,
`transform.structured.vectorize_children_and_apply_patterns` and
`transform.structured.vectorize` is preserved.

show more ...


Revision tags: llvmorg-19.1.3
# ac4bd741 25-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504)

This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorizatio

[mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504)

This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`.

This change makes it possible to run (and test) the corresponding
patterns _without_:

`transform.structured.vectorize_children_and_apply_patterns`.

Note that the Op above only supports non-masked vectorisation (i.e. when
the inputs are static), so, effectively, only fixed-width vectorisation
(as opposed to scalable vectorisation). As such, this change is required
to construct vectorization pipelines for tensor.pad targeting scalable
vectors.

To test the new Op and the corresponding patterns, I added
"vectorization-pad-patterns.mlir" - most tests have been extracted from
"vectorization-with-patterns.mlir".

show more ...


# 0a3347dc 18-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Fix idx comparison in the vectorizer (#112900)

Fixes loop comparison condition in the vectorizer.

As that logic is used specifically for vectorising `tensor.extract`, I
also added

[mlir][linalg] Fix idx comparison in the vectorizer (#112900)

Fixes loop comparison condition in the vectorizer.

As that logic is used specifically for vectorising `tensor.extract`, I
also added a test that violates the assumptions made inside
`getTrailingNonUnitLoopDimIdx`, namely that Linalg loops are non-empty.
Vectorizer pre-conditions will capture that much earlier making sure
that `getTrailingNonUnitLoopDimIdx` is only run when all the assumptions
are actually met.

Thank you for pointing this out, @pfusik !

show more ...


# f7f51f2a 18-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][vector] Clarify the semantics of masking maps (nfc) (#111383)

We use the term "masking map" throughout the Linalg vectorization logic,
but we don't really define what it is and how it differ

[mlir][vector] Clarify the semantics of masking maps (nfc) (#111383)

We use the term "masking map" throughout the Linalg vectorization logic,
but we don't really define what it is and how it differs from Linalg
indexing maps. This PR clarifies the differnces, makes sure that the new
terminology is used consistenty and improves code re-use.

show more ...


# a24c4687 16-Oct-2024 Alexander Pivovarov <pivovaa@amazon.com>

[MLIR] Fix assert expressions (#112474)

I noticed that several assertions in MLIR codebase have issues with
operator precedence

The issue with operator precedence in these assertions is due to t

[MLIR] Fix assert expressions (#112474)

I noticed that several assertions in MLIR codebase have issues with
operator precedence

The issue with operator precedence in these assertions is due to the way
logical operators are evaluated. The `&&` operator has higher precedence
than the `||` operator, which means the assertion is currently
evaluating incorrectly, like this:
```
assert((resType.getNumDynamicDims() == dynOutDims.size()) ||
(dynOutDims.empty() && "Either none or all output dynamic dims must be specified!"));
```

We should add parentheses around the entire expression involving
`dynOutDims.empty()` to ensure that the logical conditions are grouped
correctly. Here’s the corrected version:
```
assert(((resType.getNumDynamicDims() == dynOutDims.size()) || dynOutDims.empty()) &&
"Either none or all output dynamic dims must be specified!");

```

show more ...


Revision tags: llvmorg-19.1.2
# 1276ce9e 11-Oct-2024 Emilio Cota <ecg@google.com>

Revert "[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)"

This reverts commit 03483737a7a2d72a257a5ab6ff01748ad9cf0f75 and
99c8557, which is a fix-up on top of the forme

Revert "[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)"

This reverts commit 03483737a7a2d72a257a5ab6ff01748ad9cf0f75 and
99c8557, which is a fix-up on top of the former.

I'm reverting because this commit broke two tests:
mlir/test/python/integration/dialects/linalg/opsrun.py
mlir/test/python/integration/dialects/transform.py
See https://lab.llvm.org/buildbot/#/builders/138/builds/4872

I'm not familiar with the tests, so I'm leaving it to the original author
to either remove or adapt the broken tests, as discussed here:
https://github.com/llvm/llvm-project/pull/104783#issuecomment-2406390905

show more ...


# 03483737 10-Oct-2024 Md Asghar Ahmad Shahid <md.asghar.ahmad.shahid@intel.com>

[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)

The main goal of this patch is to extend the semantic of 'linalg.matmul'
named op to include per operand transpose sema

[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)

The main goal of this patch is to extend the semantic of 'linalg.matmul'
named op to include per operand transpose semantic while also laying out
a way to move ops definition from OpDSL to tablegen. Hence, it is
implemented in tablegen. Transpose semantic is as follows.

By default 'linalg.matmul' behavior will remain as is. Transpose
semantics can be appiled on per input operand by specifying the optional
permutation attributes (namely 'permutationA' for 1st input and
'permutationB' for 2nd input) for each operand explicitly as needed. By
default, no transpose is mandated for any of the input operand.

Example:
```
%val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>,
memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
permutationA = [1, 0]
permutationB = [0, 1]
```

show more ...


# d9d62331 04-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Add a new helper hook: `hasVectorizationImpl` (#110708)

The newly added hook simply returns `false` for Ops for which there's no
"vectorization logic" in the Linalg Vectorizer (i.e.

[mlir][linalg] Add a new helper hook: `hasVectorizationImpl` (#110708)

The newly added hook simply returns `false` for Ops for which there's no
"vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()`
method). It's added so that the following two TD ops expose identical
level of functionality (that's not the case ATM):

* `transform.structured.vectorize_children_and_apply_patterns`
* `transform.structured.vectorize`

Specifically, ATM, the former works only for Linalg Ops, while the
latter works for all Ops that the vectorizer supports (*). With this
change,
I am making sure that both TD will behave consistently.

Note, this shouldn't affect any of the current uses of the vectorizer.

(*) This is implemented via the `vectorize()` method in
Vectorization.cpp.

show more ...


# 56d6b567 04-Oct-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][vector] Relax the requirements on broadcast dims (#99341)

NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.

This PR updates the semantics of the

[mlir][vector] Relax the requirements on broadcast dims (#99341)

NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.

This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:

```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: memref<?x?xf32>, vector<9xf32>

vector.transfer_write %vec, %A[%base1, %base2],
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: vector<9xf32>, memref<?x?xf32>
```

Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".

Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements ere removed in this PR). All
changes in tests are also reverts of changes from #97049.

For context, here's a PR in which "broadcast" dims where forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566

(*) See `foldTransferInBoundsAttribute`.

show more ...


Revision tags: llvmorg-19.1.1
# 6d114944 26-Sep-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][Linalg] Refine how broadcast dims are treated (#99015)

This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorise

[mlir][Linalg] Refine how broadcast dims are treated (#99015)

This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorised
in the case of generic Ops. Here's an example:

```mlir
#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>

func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
%0 = tensor.empty() : tensor<1x12x197x1xf32>

%1 = linalg.generic {indexing_maps = [#map, #map1],
iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
ins(%arg0 : tensor<1x12x197x197xf32>)
outs(%0 : tensor<1x12x197x1xf32>) {

^bb0(%in: f32, %out: f32):
%818 = arith.addf %in, %out : f32
linalg.yield %818 : f32
} -> tensor<1x12x197x1xf32>
return %1 : tensor<1x12x197x1xf32>
}
```

This is a perfectly valid Generic Op, but currently triggers two issues
in the vectoriser. The root cause is this map:

```mlir
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```

This map triggers an assert in `reindexIndexingMap` - this hook
incorrectly assumes that every result in the input map is a `dim`
expression and that there are no constants. That's not the case in this
example. `reindexIndexingMap` is extended to allow maps like the one
above. For now, only constant "zero" results are allowed. This can be
extended in the future once a good motivating example is available.

Separately, the permutation map highlighted above "breaks" mask
calculation (ATM masks are always computed, even in the presence of
static shapes). When applying the following permutation:
```mlir
(d0, d1, d2, d3) -> (d0, d1, d2, 0)
```

to these canonical shapes (corresponding to the example above):
```
(1, 12, 197, 197)
```
we end up with the following error:
```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```

The error makes sense and indicates that we should update the
permutation map above to:
```
(d0, d1, d2, d3) -> (d0, d1, d2)
```

This would correctly give the following vector type:
```
vector<1x12x197xi1>
```

Fixes #97247

show more ...


# 234193ba 24-Sep-2024 Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>

[mlir][linalg] Vectorization support for convolution of i1 type (#109480)

Normally convolutions present with the following linalg op region
```
^bb0(%arg14: i4, %arg15: i4, %arg16: i4):
%17 = a

[mlir][linalg] Vectorization support for convolution of i1 type (#109480)

Normally convolutions present with the following linalg op region
```
^bb0(%arg14: i4, %arg15: i4, %arg16: i4):
%17 = arith.muli %arg14, %arg15 : i4
%18 = arith.addi %arg16, %17 : i4
linalg.yield %18 : i4
```
However, for i1 due to strength reduction we get something like
```
^bb0(%arg14: i1, %arg15: i1, %arg16: i1):
%17 = arith.andi %arg14, %arg15 : i1
%18 = arith.ori %arg16, %17 : i1
linalg.yield %18 : i1
```
This PR updates the logic to support this region for i1 types.

show more ...


# b47d1787 24-Sep-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][vector] Refine vectorisation of tensor.extract (#109580)

This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
a

[mlir][vector] Refine vectorisation of tensor.extract (#109580)

This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
attempting a contiguous load):
```mlir
func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```

Specifically, when looking for loop-invariant indices in
`tensor.extract` Ops, any `linalg.index` Op that's used in address
colcluation should only access loop dims that are == 1. In the example
above, the following does not meet that criteria:
```mlir
%1 = linalg.index 0 : index
```

Note that this PR also effectively addresses the issue fixed in #107922,
i.e. exercised by:
* `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load`

`getNonUnitLoopDim` introduced in #107922 is still valid though. In
fact, it is required to identify that the following case is a contiguous
load:
```mlir
func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```
Some logic is still missing to lower the above to
`vector.transfer_read`, so it is conservatively lowered to
`vector.gather` instead (see TODO in
`getTensorExtractMemoryAccessPattern`).

There's a few additional changes:
* `getNonUnitLoopDim` is simplified and renamed as
`getTrailingNonUnitLoopDimIdx`, additional comments are added (note
that the functionality didn't change);
* extra comments in a few places, variable names in comments update to
use Markdown (which is the preferred approach in MLIR).

This is a follow-on for:
* https://github.com/llvm/llvm-project/pull/107922
* https://github.com/llvm/llvm-project/pull/102321

show more ...


# f264d9a9 21-Sep-2024 Kazu Hirata <kazu@google.com>

[Linalg] Fix a warning

This patch fixes:

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:821:12: error:
variable 'countNonUnitDim' set but not used
[-Werror,-Wunused-but-set-variable]


# e45fc514 21-Sep-2024 Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>

[Linalg][Vectorization] Add support for linalg vectorization of a tensor.extract case (#107922)

In https://github.com/llvm/llvm-project/pull/102321 we relaxed the
vectorizer so that when checking f

[Linalg][Vectorization] Add support for linalg vectorization of a tensor.extract case (#107922)

In https://github.com/llvm/llvm-project/pull/102321 we relaxed the
vectorizer so that when checking for contiguous loads we dont always
have a trailing non unit dim. For example in the test case added we have
`tensor<8x1xf32>` which is now a valid candidate for contiguous load.
However, the logic to check contiguous load assumed that only the
trailing dim will be non unit so this PR just updates that logic to find
the actual non unit dim.

show more ...


# 315ba774 19-Sep-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Vectorisation of tensor.extract - dynamic shapes (#100582)

This PR removes the assumption that reading from a dynamic tensor is
always a gather load:

```mlir
%extracted = tensor.

[mlir][linalg] Vectorisation of tensor.extract - dynamic shapes (#100582)

This PR removes the assumption that reading from a dynamic tensor is
always a gather load:

```mlir
%extracted = tensor.extract %src[%c79, %3] : tensor<?x?xf32>
```

That assumption was originally introduced to simplify the implementation
and to reduce the number of cases to consider. Now that the
vectorisation of `tensor.extract` has been around for > 1 year and has
been quite stable, we can safely relax it.

This is a relatively small change - rather than using the parent linalg
Op to infer the target output shape (not possible with dynamic shapes),
the vectorizer will use the (previously constructed) output vector
shape instead.

As expected, the following test required updating (`vector.gather` ->
`vector.transfer_read`):
*
@masked_dynamic_vectorize_nd_tensor_extract_with_affine_apply_contiguous

Similar test for scalable vectors is also added.

show more ...


Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# 62e5032c 08-Aug-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

Reapply "[mlir][linalg] Relax tensor.extract vectorization" (#102321)

[This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f]

Simplifies the vectorization of tensor.extract so that:
* all

Reapply "[mlir][linalg] Relax tensor.extract vectorization" (#102321)

[This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f]

Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
considered a gather load,
* all other cases are considered as potential contiguous loads.

This change means that the following extraction from a "column" tensor
is correctly identified as a scalar load followed by a broadcast (rather
than a gather load).

```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%cst = arith.constant dense<[...]> : tensor<15x1xi32>

%out = linalg.generic {
indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%in : tensor<1x1x4xi32>) {

^bb0(%out: i32):
%8 = linalg.index 0 : index
%idx_0 = linalg.index 0 : index
%extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
linalg.yield %extracted : i32
} -> tensor<1x1x4xi32>

return %out:tensor<1x1x4xi32>
}
```

Overview of the delta compared to the original submission (#99299):
* removed an assert representing a condition that is being relaxed
here,
* added a test (reading from a column tensor) based on a repro from
@hanhanW.

(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.

show more ...


# 28fa83f8 06-Aug-2024 Han-Chung Wang <hanhan0912@gmail.com>

Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232)

Reverts llvm/llvm-project#99299 because it breaks the lowering. To
repro: `mlir-opt -transform-interpreter ~/repro.mlir`

```m

Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232)

Reverts llvm/llvm-project#99299 because it breaks the lowering. To
repro: `mlir-opt -transform-interpreter ~/repro.mlir`

```mlir
#map = affine_map<(d0, d1) -> (d0)>
#map1 = affine_map<(d0, d1) -> (d1)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d0 + d1)>
module {
func.func @foo(%arg0: index, %arg1: tensor<2xf32>, %arg2: tensor<4xf32>, %arg3: tensor<1xf32>) -> tensor<4x1xf32> {
%c0 = arith.constant 0 : index
%cst = arith.constant 1.000000e+00 : f32
%cst_0 = arith.constant 0.000000e+00 : f32
%0 = tensor.empty() : tensor<4x1xf32>
%1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : tensor<4xf32>, tensor<1xf32>) outs(%0 : tensor<4x1xf32>) {
^bb0(%in: f32, %in_1: f32, %out: f32):
%2 = linalg.index 0 : index
%3 = linalg.index 1 : index
%4 = affine.apply #map3(%3, %arg0)
%extracted = tensor.extract %arg1[%c0] : tensor<2xf32>
%5 = arith.cmpi eq, %2, %c0 : index
%6 = arith.cmpi ult, %2, %c0 : index
%7 = arith.select %5, %cst, %in : f32
%8 = arith.select %6, %cst_0, %7 : f32
%9 = arith.cmpi eq, %4, %c0 : index
%10 = arith.cmpi ult, %4, %c0 : index
%11 = arith.select %9, %cst, %in_1 : f32
%12 = arith.select %10, %cst_0, %11 : f32
%13 = arith.mulf %8, %12 : f32
%14 = arith.mulf %13, %extracted : f32
%15 = arith.cmpi eq, %2, %4 : index
%16 = arith.select %15, %cst, %cst_0 : f32
%17 = arith.subf %16, %14 : f32
linalg.yield %17 : f32
} -> tensor<4x1xf32>
return %1 : tensor<4x1xf32>
}
}

module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize %0 : !transform.any_op
transform.yield
}
}
```

show more ...


# 8868c02c 06-Aug-2024 Andrzej Warzyński <andrzej.warzynski@arm.com>

[mlir][linalg] Relax tensor.extract vectorization (#99299)

Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
considered a ga

[mlir][linalg] Relax tensor.extract vectorization (#99299)

Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
considered a gather load,
* all other cases are considered as potential contiguous loads.

This change means that the following extraction from a "column" tensor
will be correctly identified as a scalar load followed by a broadcast (rather
than a gather load).

```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%cst = arith.constant dense<[...]> : tensor<15x1xi32>

%out = linalg.generic {
indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%in : tensor<1x1x4xi32>) {

^bb0(%out: i32):
%idx_0 = linalg.index 0 : index
%extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
linalg.yield %extracted : i32
} -> tensor<1x1x4xi32>

return %out:tensor<1x1x4xi32>
}
```

(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.

show more ...


12345678910>>...12