Revision tags: llvmorg-21-init, llvmorg-19.1.7

09dfc571 | 20-Dec-2024 | Jacques Pienaar <jpienaar@google.com>
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
conveniences (work-list management, debugging actions, tracing, etc.). But
it combines two kinds of greedy behavior: 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency, e.g., cases where one needs to create different lowering
phases and apply patterns in a specific order split across different
passes. Using the driver, one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own driver, but this is also a commonly requested feature that folks keep
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find-and-replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.

Revision tags: llvmorg-19.1.6, llvmorg-19.1.5

017c75bf | 01-Dec-2024 | Kai Sasaki <lewuathe@gmail.com>
[mlir] Fix typo in test vector transform pass descriptions (#118194)
Fix some typos in the description of vector transform passes.

ecaf2c33 | 22-Nov-2024 | Petr Kurapov <petr.a.kurapov@intel.com>
[MLIR] Move warp_execute_on_lane_0 from vector to gpu (#116994)
Please see the related RFC here:
https://discourse.llvm.org/t/rfc-move-execute-on-lane-0-from-vector-to-gpu-dialect/82989.
This patch does exactly one thing: it moves the op to the `gpu` dialect.
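For illustration only (the types, shapes, body, and `gpu.yield` terminator spelling below are assumptions, not taken from the patch), this is roughly what the op looks like in its new home; with warp size 32, the warp-sized value yielded from the single-lane region is distributed as one element per lane:
```mlir
// Formerly vector.warp_execute_on_lane_0; now spelled in the gpu dialect.
func.func @sketch(%laneid: index) -> vector<1xf32> {
  %r = gpu.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>) {
    %v = arith.constant dense<1.0> : vector<32xf32>
    gpu.yield %v : vector<32xf32>
  }
  return %r : vector<1xf32>
}
```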

Revision tags: llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2

a9ebdbb5 | 09-Oct-2024 | Benoit Jacob <jacob.benoit.1@gmail.com>
[MLIR] Vector: turn the ExtractStridedSlice rewrite pattern from #111541 into a canonicalization (#111614)
This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slice`, so there is no loss of
semantics here, just lifting an op to a more special-cased/constrained
op. The additional `shape_cast` merely adds back leading unit dims
to match the original result type.
Context: discussion on #111541. I wasn't sure how this would turn out,
but in the process of writing this PR I discovered at least 2 bugs in
the pattern introduced in #111541, which shows the value of shared
canonicalization patterns that are exercised on a large number of
test cases.
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
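As an illustration of the before/after shape of this canonicalization (hypothetical types and offsets, not taken from the patch), a contiguous slice such as:
```mlir
%slice = vector.extract_strided_slice %v
  {offsets = [1, 0], sizes = [1, 4], strides = [1, 1]}
  : vector<2x4xf32> to vector<1x4xf32>
```
would canonicalize to a plain `extract` plus a `shape_cast` restoring the leading unit dim:
```mlir
%row = vector.extract %v[1] : vector<4xf32> from vector<2x4xf32>
%slice = vector.shape_cast %row : vector<4xf32> to vector<1x4xf32>
```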

10054ba4 | 08-Oct-2024 | Benoit Jacob <jacob.benoit.1@gmail.com>
[mlir][vector] Add pattern to rewrite contiguous ExtractStridedSlice into Extract (#111541)
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3

42944da5 | 16-Aug-2024 | Andrzej Warzyński <andrzej.warzynski@arm.com>
[mlir][vector] Group re-order patterns together (#102856)
Group all patterns that re-order vector.transpose and vector.broadcast
Ops (*) under `populateSinkVectorOpsPatterns`. These patterns are
normally used to "sink" redundant Vector Ops, hence grouping them together.
Example:
```mlir
%at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%r = arith.addf %at, %bt : vector<2x4xf32>
```
would get converted to:
```mlir
%0 = arith.addf %a, %b : vector<4x2xf32>
%r = vector.transpose %0, [1, 0] : vector<4x2xf32> to vector<2x4xf32>
```
This patch also moves all tests for these patterns so that all of them
are:
* run under one test-flag: `test-vector-sink-patterns`,
* located in one file: "vector-sink.mlir".
To facilitate this change:
* `-test-sink-vector-broadcast` is renamed as
`test-vector-sink-patterns`,
* "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir",
* tests for `ReorderCastOpsOnBroadcast` and
`ReorderElementwiseOpsOnTranspose` patterns are moved from
"vector-reduce-to-contract.mlir" to "vector-sink.mlir",
* `ReorderElementwiseOpsOnTranspose` patterns are removed from
`populateVectorReductionToContractPatterns` and added to (newly
created) `populateSinkVectorOpsPatterns`,
* `ReorderCastOpsOnBroadcast` patterns are removed from
`populateVectorReductionToContractPatterns` - these are already
present in `populateSinkVectorOpsPatterns`.
This should allow for better layering and more straightforward testing.
For the latter, the goal is to be able to easily identify which pattern
a particular test is exercising (especially when it's a specific
pattern).
NOTES FOR DOWNSTREAM USERS
In order to preserve the current functionality, please make sure to add
* `populateSinkVectorOpsPatterns`,
wherever you are using `populateVectorReductionToContractPatterns`.
Also, rename `populateSinkVectorBroadcastPatterns` to
`populateSinkVectorOpsPatterns`.
(*) I didn't notice any other re-order patterns.

9b06e25e | 09-Aug-2024 | Benjamin Maxwell <benjamin.maxwell@arm.com>
[mlir][vector] Add mask elimination transform (#99314)
This adds a new transform, `eliminateVectorMasks()`, which aims at
removing scalable `vector.create_mask` ops that will be all-true at
runtime. It attempts to do this by simply pattern-matching the mask
operands (similar to some canonicalizations); if that does not lead to
an answer (all-true? yes/no), then value-bounds analysis is used to find
the lower bound of the unknown operands. If the lower bound is greater
than or equal to the corresponding mask vector type dim, then that
dimension of the mask is all-true.
Note that the pattern matching prevents expensive value-bounds analysis
in cases where the mask won't be all true.
For example:
```mlir
%mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1>
```
From looking at `%c2` we can tell this is not going to be an all-true
mask, so we don't need to run the value-bounds analysis for
`%dynamicValue` (and can exit the transform early).
Note: Eliminating create_masks here means replacing them with all-true
constants (which will then lead to the masks folding away).
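Sketched on a hypothetical 1-D scalable mask (assuming the analysis can prove `%n` is large enough for the assumed vscale range), the net effect is:
```mlir
%mask = vector.create_mask %n : vector<[4]xi1>
```
becoming:
```mlir
%mask = arith.constant dense<true> : vector<[4]xi1>
```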

Revision tags: llvmorg-19.1.0-rc2

5262865a | 04-Aug-2024 | Kazu Hirata <kazu@google.com>
[mlir] Construct SmallVector with ArrayRef (NFC) (#101896)

Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5

c577f91d | 18-Apr-2024 | Charitha Saumya <136391709+charithaintc@users.noreply.github.com>
[mlir][vector] Add support for linearizing Extract, ExtractStridedSlice, Shuffle VectorOps in VectorLinearize (#88204)
This PR adds support for converting `vector.extract_strided_slice` and
`vector.extract` operations to equivalent `vector.shuffle` operations
that operate on linearized (1-D) vectors. `vector.shuffle` operations
operating on n-D (n > 1) vectors are also converted to equivalent shuffle
operations working on linearized vectors.
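As a hypothetical sketch (types and indices are illustrative, not from the patch), extracting row 1 of a 2x4 vector can be expressed on the linearized 1-D form as a shuffle that picks elements 4..7:
```mlir
%flat = vector.shape_cast %v : vector<2x4xf32> to vector<8xf32>
%row = vector.shuffle %flat, %flat [4, 5, 6, 7] : vector<8xf32>, vector<8xf32>
```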

Revision tags: llvmorg-18.1.4, llvmorg-18.1.3

d3aa92ed | 28-Mar-2024 | Andrzej Warzyński <andrzej.warzynski@arm.com>
[mlir][vector] Add support for scalable vectors to VectorLinearize (#86786)
Adds support for scalable vectors to patterns defined in
VectorLinearize.cpp.
Linearization is disabled in 2 notable cases:
* vectors with more than 1 scalable dimension (we cannot represent
vscale^2),
* vectors initialised with an arith.constant that's not a vector splat
(such arith.constant Ops cannot be flattened).

Revision tags: llvmorg-18.1.2, llvmorg-18.1.1

6f5c4f2e | 05-Mar-2024 | Balaji V. Iyer <43187390+bviyer@users.noreply.github.com>
[mlir][vector]Add Vector bitwidth target to Linearize Vectorizable and Constant Ops (#83314)
Added a new flag `targetVectorBitwidth` to capture bit-width input.

c2b95292 | 28-Feb-2024 | Quinn Dawkins <quinn.dawkins@gmail.com>
[mlir][vector] Fix n-d transfer write distribution (#83215)
Currently n-d transfer write distribution can be inconsistent with
distribution of reductions if a value has multiple users, one of which
is a transfer_write with a non-standard distribution map, and the other
of which is a vector.reduction.
We may want to consider removing the distribution map functionality in
the future for this reason.

Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4

71441ed1 | 21-Feb-2024 | Diego Caballero <diegocaballero@google.com>
[mlir][Vector] Add vector bitwidth target to xfer op flattening (#81966)
This PR adds an optional bitwidth parameter to the vector xfer op
flattening transformation so that the flattening doesn't happen if the
trailing dimension of the read/written vector is larger than this
bitwidth (i.e., we are already able to fill at least one vector register
with that size).

Revision tags: llvmorg-18.1.0-rc3

35ef3994 | 13-Feb-2024 | Ivan Butygin <ivan.butygin@gmail.com>
[mlir][vector] ND vectors linearization pass (#81159)
Common backends (LLVM, SPIR-V) only support 1D vectors: the LLVM conversion
handles ND vectors (N >= 2) as `array<array<... vector>>` and the SPIR-V
conversion doesn't handle them at all at the moment. Sometimes it's
preferable to treat multidim vectors as linearized 1D ones. Add a pass to do
this. Only constants and simple elementwise ops are supported for now.
@krzysz00 I've extracted your result-type conversion code from
LegalizeToF32 and moved it to a common place.
Also, add a ConversionPattern class operating on traits.
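A minimal before/after sketch of the idea (shapes are illustrative, and the explicit casts stand in for what the pass does via type conversion):
```mlir
%c = arith.constant dense<1.0> : vector<2x2xf32>
%r = arith.addf %a, %c : vector<2x2xf32>
```
is instead computed on the linearized 1-D form:
```mlir
%a1 = vector.shape_cast %a : vector<2x2xf32> to vector<4xf32>
%c1 = arith.constant dense<1.0> : vector<4xf32>
%r1 = arith.addf %a1, %c1 : vector<4xf32>
%r = vector.shape_cast %r1 : vector<4xf32> to vector<2x2xf32>
```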

Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

07677113 | 18-Dec-2023 | Jakub Kuderski <jakub@nod-labs.com>
[mlir][vector] Add pattern to break down reductions into arith ops (#75727)
The number of vector elements considered 'small' enough to extract is
parameterized.
This is to avoid going into specialized reduction lowering when a
single arith op (or a couple of them) will do. Targets without dedicated
reduction intrinsics can use this as an emulation path too.
Depends on https://github.com/llvm/llvm-project/pull/75846.
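A sketch of the breakdown for a small reduction (element count and types are illustrative, not from the patch):
```mlir
%r = vector.reduction <add>, %v : vector<2xf32> into f32
```
becomes:
```mlir
%e0 = vector.extract %v[0] : f32 from vector<2xf32>
%e1 = vector.extract %v[1] : f32 from vector<2xf32>
%r = arith.addf %e0, %e1 : f32
```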

f643eec8 | 15-Dec-2023 | Hsiangkai Wang <hsiangkai.wang@arm.com>
[mlir][vector] Add emulation patterns for vector masked load/store (#74834)
This patch converts
```
vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru
```
to
```
%ivalue = %pass_thru
%m = vector.extract %mask[0]
%result0 = scf.if %m {
%v = memref.load %base[%idx_0, %idx_1]
%combined = vector.insert %v, %ivalue[0]
scf.yield %combined
} else {
scf.yield %ivalue
}
%m = vector.extract %mask[1]
%result1 = scf.if %m {
%v = memref.load %base[%idx_0, %idx_1 + 1]
%combined = vector.insert %v, %result0[1]
scf.yield %combined
} else {
scf.yield %result0
}
...
```
It likewise converts
```
vector.maskedstore %base[%idx_0, %idx_1], %mask, %value
```
to
```
%m = vector.extract %mask[0]
scf.if %m {
%extracted = vector.extract %value[0]
memref.store %extracted, %base[%idx_0, %idx_1]
}
%m = vector.extract %mask[1]
scf.if %m {
%extracted = vector.extract %value[1]
memref.store %extracted, %base[%idx_0, %idx_1 + 1]
}
...
```

c02d07fd | 13-Dec-2023 | Andrzej Warzyński <andrzej.warzynski@arm.com>
[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817)
For vectors with either leading or trailing unit dim, replaces:
elementwise(a, b)
with:
sc_a = shape_cast(a)
sc_b = shape_cast(b)
res = elementwise(sc_a, sc_b)
return shape_cast(res)
The newly inserted shape_cast Ops fold (before elementwise Op) and then
restore (after elementwise Op) the unit dim. Vectors `a` and `b` are
required to be rank > 1.
Example:
```mlir
%mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32>
%cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32>
```
gets converted to:
```mlir
%B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32>
%A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32>
%mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32>
%mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32>
%cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32>
```
In practice, the bottom 2 shape_cast(s) will be folded away.

80636227 | 12-Dec-2023 | Jakub Kuderski <jakub@nod-labs.com>
[mlir][vector] Allow vector distribution with multiple written elements (#75122)
Add a configuration option to allow vector distribution with multiple
elements written by a single lane.
This is so that we can perform vector multi-reduction with multiple
results per workgroup.

2eb9e33c | 05-Dec-2023 | Andrzej Warzyński <andrzej.warzynski@arm.com>
[mlir][Vector] Update patterns for flattening vector.xfer Ops (2/N) (#73523)
Updates patterns for flattening `vector.transfer_read` by relaxing the
requirement that the "collapsed" indices are all zero. This enables
collapsing cases like this one:
```mlir
%2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... :
memref<1x43x4x6xi32>, vector<1x2x6xi32>
```
Previously, only the following case would be considered for collapsing
(all indices are 0):
```mlir
%2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... :
memref<1x43x4x6xi32>, vector<1x2x6xi32>
```
Also adds some new comments and renames the `firstContiguousInnerDim`
parameter as `firstDimToCollapse` (the latter better matches the actual
meaning).
Similar updates for `vector.transfer_write` will be implemented in a
follow-up patch.

Revision tags: llvmorg-17.0.6

d33bad66 | 22-Nov-2023 | Jakub Kuderski <jakub@nod-labs.com>
[mlir][vector] Add patterns to simplify chained reductions (#73048)
Chained reductions get created during vector unrolling. These patterns
simplify them into a series of adds followed by a final reduction.
This is preferred on GPU targets like SPIR-V/Vulkan where vector
reduction gets lowered into subgroup operations that are generally more
expensive than simple vector additions.
For now, only the `add` combining kind is handled.
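A sketch of the rewrite (illustrative types), where the accumulator `%acc` chains through two reductions:
```mlir
%r0 = vector.reduction <add>, %a, %acc : vector<4xf32> into f32
%r1 = vector.reduction <add>, %b, %r0 : vector<4xf32> into f32
```
is simplified to one vector add followed by a single final reduction:
```mlir
%ab = arith.addf %a, %b : vector<4xf32>
%r1 = vector.reduction <add>, %ab, %acc : vector<4xf32> into f32
```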

Revision tags: llvmorg-17.0.5

df49a97a | 10-Nov-2023 | Quinn Dawkins <quinn.dawkins@gmail.com>
[mlir][vector] Root the transfer write distribution pattern on the warp op (#71868)
Currently, when there is a mix of transfer read ops and transfer write
ops that need to be distributed, it is hard to guarantee that the write
gets distributed after the read when the two aren't directly connected
by SSA, because the pattern for write distribution is rooted on the
transfer write. This is likely still relatively unsafe when there are
undistributable ops, but structurally these patterns are a bit difficult
to work with. For now, pattern benefits give fairly good guarantees for
happy paths.

Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0

830b9b07 | 12-Sep-2023 | Mehdi Amini <joker.eph@gmail.com>
Update some uses of `getAttr()` to be explicit about Inherent vs Discardable (NFC)

Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3

576b184d | 15-Aug-2023 | Andrzej Warzynski <andrzej.warzynski@arm.com>
[mlir][vector] Add support for scalable vectors in `trimLeadingOneDims`
This patch updates one specific hook in "VectorDropLeadUnitDim.cpp" to make sure that "scalable dims" are handled correctly. While this change affects multiple patterns, I am only adding one regression test that captures one specific case that affects me right now.
I am also adding the Vector dialect to the list of dependencies of `-test-vector-to-vector-lowering`; otherwise my test case won't work as a standalone test.
Differential Revision: https://reviews.llvm.org/D157993

Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6

4d339ec9 | 02-Jun-2023 | Andrzej Warzynski <andrzej.warzynski@arm.com>
[mlir][Vector] Add pattern to reorder elementwise and broadcast ops
The new pattern will replace elementwise(broadcast) with broadcast(elementwise) when safe.
This change affects tests for vectorising nD-extract. In one case ("vectorize_nd_tensor_extract_with_tensor_extract") I just trimmed the test and only preserved the key parts (scalar and contiguous load from the original Op). We could do the same with some other tests if that helps maintainability.
Differential Revision: https://reviews.llvm.org/D152812
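A minimal sketch of the reorder (hypothetical types, not from the patch):
```mlir
%ba = vector.broadcast %a : f32 to vector<4xf32>
%bb = vector.broadcast %b : f32 to vector<4xf32>
%r = arith.addf %ba, %bb : vector<4xf32>
```
becomes:
```mlir
%s = arith.addf %a, %b : f32
%r = vector.broadcast %s : f32 to vector<4xf32>
```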

faae4d5d | 09-Jun-2023 | Matthias Springer <me@m-sp.org>
[mlir][vector][transform] Expose tensor slice -> transfer folding patterns
Add a new transform op to populate patterns: ApplyFoldTensorSliceIntoTransferPatternsOp.
Differential Revision: https://reviews.llvm.org/D152531