Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
91bbebc7 |
| 27-Dec-2024 |
Kunwar Grover <groverkss@gmail.com> |
[mlir][scf] Add getPartialResultTilePosition to PartialReductionOpInterface (#120465)
This PR adds a new interface method to PartialReductionOpInterface which allows it to query the result tile posi
[mlir][scf] Add getPartialResultTilePosition to PartialReductionOpInterface (#120465)
This PR adds a new interface method to PartialReductionOpInterface which allows it to query the result tile position for the partial result. Previously, tiling the reduction dimension with SplitReductionOuterReduction when the result has transposed parallel dimensions would produce wrong results.
Other fixes that were needed to make this PR work:
- Instead of ad-hoc logic to decide where to place the new reduction dimensions in the partial result based on the iteration space, the reduction dimensions are always appended to the partial result tensor. - Remove usage of PartialReductionOpInterface in Mesh dialect. The implementation was trying to just get a neutral element, but ended up trying to use PartialReductionOpInterface for it, which is not right. It was also passing the wrong sizes to it.
show more ...
|
#
6e3631d0 |
| 24-Dec-2024 |
Kunwar Grover <groverkss@gmail.com> |
[mlir][scf] Track replacements using a listener in TileAndFuse (#120999)
This PR makes TileAndFuse explicitly track replacements using a listener instead of assuming that the results always come fro
[mlir][scf] Track replacements using a listener in TileAndFuse (#120999)
This PR makes TileAndFuse explicitly track replacements using a listener instead of assuming that the results always come from the outer most tiling loop. scf::tileUsingInterface can introduce merge operations whose results are the actual replacements to use, instead of the outer most loop results.
show more ...
|
#
09dfc571 |
| 20-Dec-2024 |
Jacques Pienaar <jpienaar@google.com> |
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracin
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
show more ...
|
#
4b563458 |
| 18-Dec-2024 |
Kunwar Grover <groverkss@gmail.com> |
[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation (#120115)
This patch unifies the tiling implementation for tileUsingFor and tileReductionUsingFor. This is done by passing an
[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation (#120115)
This patch unifies the tiling implementation for tileUsingFor and tileReductionUsingFor. This is done by passing an addition option to SCFTilingOptions, allowing it to set how reduction dimensions should be tiled. Currently, there are 3 different options for reduction tiling: FullReduction (old tileUsingFor), PartialReductionOuterReduction (old tileReductionUsingFor) and PartialReductionOuterParallel (linalg::tileReductionUsingForall, this isn't implemented in this patch).
The patch makes tileReductionUsingFor use the tileUsingFor implementation with the new reduction tiling options.
There are no test changes because the implementation was doing almost the exactly same thing. This was also tested in IREE (which uses both these APIs heavily) and there were no test changes.
show more ...
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
8cc616bc |
| 13-Nov-2024 |
Max191 <44243577+Max191@users.noreply.github.com> |
[mlir] Clamp UnPackOp tiling sizes from operand tile (#112429)
The `getIterationDomainTileFromOperandTile` implementation for
tensor.unpack did not clamp sizes when the unpack op had extract_slice
[mlir] Clamp UnPackOp tiling sizes from operand tile (#112429)
The `getIterationDomainTileFromOperandTile` implementation for
tensor.unpack did not clamp sizes when the unpack op had extract_slice
semantics. This PR fixes the bug.
The PR also makes a minor change to `tileAndFuseConsumerOfSlice`. When
replacing DPS inits, the iteration domain is needed, and it is computed
from the tiled version of the operation after the initial tiling
transformation. This can result in some extra indexing computation, so
the PR changes it to use the original full sized cloned consumer op.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
show more ...
|
#
54ae9e7b |
| 11-Nov-2024 |
Quinn Dawkins <quinn.dawkins@gmail.com> |
[mlir][SCF] Fix condition for fusability in consumer fusion API (#115768)
It was previously allowing either a tilable or dps op to be fused. Both
are required for consumer fusion.
|
#
9bc3102b |
| 06-Nov-2024 |
Yun-Fly <yunfei.song@intel.com> |
[mlir][scf] Extend consumer fusion to multiple tilable users (#111955)
Before, consumer fusion expects single usage(or others are terminator
op). This patch supports multiple tilable consumers fusi
[mlir][scf] Extend consumer fusion to multiple tilable users (#111955)
Before, consumer fusion expects single usage(or others are terminator
op). This patch supports multiple tilable consumers fusion.
E.g.
```
%0 = scf.for {
...
%p = tiledProducer
...
}
%1 = tilableConsumer1 ins(%0 : ...)
%2 = tilableConsumer2 ins(%0 : ...)
```
===>
```
%0:3 = scf.for {
...
%p = tiledProducer
%1 = tiledConsumer1 ins(%p : ...)
%2 = tiledConsumer2 ins(%p : ...)
...
}
```
The key process is ensuring that the first user of loop
should not dominate any define of consumer operand(s).
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
9144fed3 |
| 04-Oct-2024 |
Quinn Dawkins <quinn.dawkins@gmail.com> |
[mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554)
The SCF helper for tiling an operation implementing the TilingInterface
and greedily fusing consumers requires an uninterr
[mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554)
The SCF helper for tiling an operation implementing the TilingInterface
and greedily fusing consumers requires an uninterrupted chain of
operations implementing the tiling interface to succeed. There can be
cases with intermediate ops that don't implement the interface but have
producers that could be fused if various canonicalization/simplification
patterns could run in between fusion steps.
This adds an option to SCFTileAndFuseOptions for a pattern set to run
between fusion steps to the ops that result from fusion/tiling. Removed
and newly inserted slices are tracked for continued fusion applications.
See this RFC for more discussion:
https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
show more ...
|
Revision tags: llvmorg-19.1.1 |
|
#
b8c974f0 |
| 30-Sep-2024 |
Abhishek Varma <avarma094@gmail.com> |
[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops (#110105)
-- This commit extends consumer fusion to take place even if the
producer has multiple us
[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops (#110105)
-- This commit extends consumer fusion to take place even if the
producer has multiple uses.
-- The multiple uses of the producer essentially means that besides the
consumer op in concern, the only other uses of the producer are
allowed in :-
1. scf.yield
2. tensor.parallel_insert_slice
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
show more ...
|
Revision tags: llvmorg-19.1.0 |
|
#
d5f0969c |
| 12-Sep-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of til
[mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.
This is a breaking change
- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementation of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.
Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
show more ...
|
#
a9ba1b6d |
| 12-Sep-2024 |
Yun-Fly <yunfei.song@intel.com> |
[mlir][scf] Extend consumer fuse to single nested `scf.for` (#108318)
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.
```
%0 = scf.
[mlir][scf] Extend consumer fuse to single nested `scf.for` (#108318)
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.
```
%0 = scf.for() {
%1 = scf.for() {
tiledProducer
}
yield %1
}
%2 = consumer ins(%0)
```
Compared with #94190, this PR fix build failure by making C++17 happy.
show more ...
|
#
335538c2 |
| 12-Sep-2024 |
Kazu Hirata <kazu@google.com> |
Revert "[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)"
This reverts commit 2d4bdfba96d4cf88b12226b2b511bf55ee5e6559.
A build breakage is reported at:
https://lab.llvm.org/bu
Revert "[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)"
This reverts commit 2d4bdfba96d4cf88b12226b2b511bf55ee5e6559.
A build breakage is reported at:
https://lab.llvm.org/buildbot/#/builders/138/builds/3524
show more ...
|
#
2d4bdfba |
| 12-Sep-2024 |
Yun-Fly <yunfei.song@intel.com> |
[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.
```
%0 = scf.f
[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190)
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.
```
%0 = scf.for() {
%1 = scf.for() {
tiledProducer
}
yield %1
}
%2 = consumer ins(%0)
```
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
6740d701 |
| 31-Jul-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (#91878)
The implementation of these methods are legacy and they are removed in
favor of using the `scf:
[mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (#91878)
The implementation of these methods are legacy and they are removed in
favor of using the `scf::tileUsingSCF` methods as replacements. To get
the latter on par with requirements of the deprecated methods, the
tiling allows one to specify the maximum number of tiles to use instead
of specifying the tile sizes. When tiling to `scf.forall` this
specification is used to generate the `num_threads` version of the
operation.
A slight deviation from previous implementation is that the deprecated
method always generated the `num_threads` variant of the `scf.forall`
operation. Instead now this is driven by the tiling options specified.
This reduces the indexing math generated when the tile sizes are
specified.
**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOp(b, op, numThreads, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.
```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```
**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
Also note that `linalg::tileToForallOpUsingTileSizes` would effectively
call the `linalg::tileToForallOp` by computing the `numThreads` from the
`op` and `tileSizes` and generate the `numThreads` version of the
`scf.forall`. That is not the case anymore. Instead this will directly
generate the `tileSizes` version of the `scf.forall` op
```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```
If you actually want to use the `numThreads` version, it is upto the
caller to compute the `numThreads` and set `options.setNumThreads`
instead of `options.setTileSizes`. Note that there is a slight
difference in the num threads version and tile size version. The former
requires an additional `affine.max` on the tile size to ensure
non-negative tile sizes. When lowering to `numThreads` version this
`affine.max` is not needed since by construction the tile sizes are
non-negative. In previous implementations, the `numThreads` version
generated when using the `linalg::tileToForallOpUsingTileSizes` method
would avoid generating the `affine.max` operation. To get the same
state, downstream users will have to additionally normalize the
`scf.forall` operation.
**Changes to `transform.structured.tile_using_forall`**
The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` have been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op. So when `tile_sizes` are
specified for the transform dialect op, first the `tile_sizes` version
of the `scf.forall` is generated by the `scf::tileUsingSCF` method which
is then further normalized to get back to the same state. So there is no
functional change to `transform.structured.tile_using_forall`. It always
generates the `numThreads` version of the `scf.forall` op (as it did
before this change).
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
show more ...
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
7ef08eac |
| 28-Jun-2024 |
Yun-Fly <yunfei.song@intel.com> |
[mlir][scf] Extend option to yield replacement for multiple results case (#93144)
This patch extends the functionality of yielding replacement for multiple
results case and adds another optional a
[mlir][scf] Extend option to yield replacement for multiple results case (#93144)
This patch extends the functionality of yielding replacement for multiple
results case and adds another optional argument called `yieldResultNumber`
indicating which result(s) need yield. If not given, all of results will be yield
by default.
show more ...
|
#
b99d0b34 |
| 18-Jun-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][TilingInterface] Update `PartialReductionOpInterface` to get it more in line with `TilingInterface`. (#95460)
The `TilingInterface` methods have return values that allow the
interface implem
[mlir][TilingInterface] Update `PartialReductionOpInterface` to get it more in line with `TilingInterface`. (#95460)
The `TilingInterface` methods have return values that allow the
interface implementation to return multiple operations, and also return
tiled values explicitly. This is to avoid the assumption that the
interface needs to return a single operation and this operations result
are the expected tiled values. Make the
`PartialReductionOpInterface::tileToPartialReduction` return
`TilingResult` as well for the same reason.
Similarly make the `PartialReductionOpInterface::mergeReductions` also
return a list of generated operations and values to use as replacements.
This is just a refactoring to allow for deprecation of
`linalg::tileReductionUsingForall` with `scf::tileReductionUsingSCF`
method.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
e7e6e1ec |
| 01-Jun-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][bazel] Add bazel build support for https://github.com/llvm/llvm-project/commit/2b2ce50fe843b5b550806a0ab15b06cd5c405d48 (#94126)
Also drop errant header include from `Linalg` dialect into
`D
[mlir][bazel] Add bazel build support for https://github.com/llvm/llvm-project/commit/2b2ce50fe843b5b550806a0ab15b06cd5c405d48 (#94126)
Also drop errant header include from `Linalg` dialect into
`Dialect/SCF/Transforms/TileUsingInterface.cpp`
show more ...
|
#
2b2ce50f |
| 01-Jun-2024 |
Abhishek Varma <abhvarma@amd.com> |
[MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.
[MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.
To support this two new methods are added to the `TilingInterface`
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.
Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.
Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
---------
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
show more ...
|
#
9329b20d |
| 22-May-2024 |
Kunwar Grover <groverkss@gmail.com> |
[mlir][TilingInterface] Allow multiple results in PartialReductionOpInterface (#92624)
This patch adds support for reducing operations with multiple results
using PartialReductionOpInterface. Also
[mlir][TilingInterface] Allow multiple results in PartialReductionOpInterface (#92624)
This patch adds support for reducing operations with multiple results
using PartialReductionOpInterface. Also adds an implementation of
PartialReductionOpInterface for multiple results for linalg.generic.
show more ...
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
76ead96c |
| 26-Jan-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to unify tiling with `scf.for` and `scf.forall`. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifie
[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to unify tiling with `scf.for` and `scf.forall`. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.
This is a breaking change with the following changes
1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
functionality is obtained by using `scf::tileUsingSCF` and setting
the loop type in `scf::SCFTilingOptions` passed into this method to
`scf::SCFTilingOptions::LoopType::ForallOp` (using the
`setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
`SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
`LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
method is replaced with `getRegionIterArgs()`
These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
show more ...
|
Revision tags: llvmorg-19-init |
|
#
5fcf907b |
| 17-Jan-2024 |
Matthias Springer <me@m-sp.org> |
[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `sta
[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
show more ...
|
#
aa2a96a2 |
| 12-Jan-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][TilingInterface] Move TilingInterface tests to use transform dialect ops. (#77204)
In the process a couple of test transform dialect ops are added just
for testing. These operations are not
[mlir][TilingInterface] Move TilingInterface tests to use transform dialect ops. (#77204)
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended to use as full flushed
out of transformation ops, but are rather operations added for testing.
A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)
show more ...
|
#
4435ced9 |
| 08-Jan-2024 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
[mlir][TilingInterface] Allow controlling what fusion is done within tile and fuse (#76871)
Currently the `tileConsumerAndFuseProducerGreedilyUsingSCFFor` method
greedily fuses through all slices t
[mlir][TilingInterface] Allow controlling what fusion is done within tile and fuse (#76871)
Currently the `tileConsumerAndFuseProducerGreedilyUsingSCFFor` method
greedily fuses through all slices that are generated during the tile and
fuse flow. That is not the normal use case. Ideally the caller would
like to control which slices get fused and which dont. This patch
introduces a new field to the `SCFTileAndFuseOptions` to specify this
control.
The contol function also allows the caller to specify if the replacement
for the fused producer needs to be yielded from within the tiled
computation. This allows replacing the fused producers in case they have
other uses. Without this the original producers still survive negating
the utility of the fusion.
The change here also means that the name of the function
`tileConsumerAndFuseProducerGreedily...` can be updated. Defering that
to a later stage to reduce the churn of API changes.
show more ...
|
#
899c2bed |
| 19-Dec-2023 |
Han-Chung Wang <hanhan0912@gmail.com> |
[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410)
It is a trivial early-return case. If the cloned ops are not returned,
it will generate `extract_slice` op that ext
[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410)
It is a trivial early-return case. If the cloned ops are not returned,
it will generate `extract_slice` op that extracts the whole slice.
However, it is not folded away. Early-return to avoid the case.
E.g.,
```mlir
func.func @matmul_tensors(
%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>)
-> tensor<?x?xf32> {
%0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>)
outs(%arg2: tensor<?x?xf32>)
-> tensor<?x?xf32>
return %0 : tensor<?x?xf32>
}
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op
%1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op)
transform.yield
}
}
```
Apply the transforms and canonicalize the IR:
```
mlir-opt --transform-interpreter -canonicalize input.mlir
```
we will get
```mlir
module {
func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
%dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
%dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32>
%extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32>
return %0 : tensor<?x?xf32>
}
}
```
The revision early-return the case so we can get:
```mlir
func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
%0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32>
return %0 : tensor<?x?xf32>
}
```
show more ...
|
Revision tags: llvmorg-17.0.6 |
|
#
ec1086f2 |
| 21-Nov-2023 |
MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> |
Fix build error from #72178 (#72905)
|