# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting ops with `tensor` semantics
to ops with `memref` semantics. There are multiple MLIR passes that are related
to bufferization. These passes typically run as one of the last steps in a
pass pipeline, right before lowering `memref` ops to LLVM. That is because
many transformations are easier or only supported in tensor land; e.g.,
[tile/fuse/… on tensors first](https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373),
then bufferize the remaining IR.

![bufferization passes](/includes/img/bufferization_passes.svg)

The most important bufferization pass is *One-Shot Bufferize*: This pass
rewrites `tensor` IR to `memref` IR. There are additional helper passes that
preprocess IR (e.g., so that IR can be bufferized more efficiently), perform
buffer-level optimizations such as allocation hoisting, and
[insert buffer deallocation ops](OwnershipBasedBufferDeallocation.md) so that
the resulting `memref` IR has no memory leaks.

## Deprecated Passes

The buffer deallocation pass has been deprecated in favor of the ownership-based
buffer deallocation pipeline. The deprecated pass has some limitations that may
cause memory leaks in the resulting IR.

## What is One-Shot Bufferize?

One-Shot Bufferize is a tensor bufferization pass designed for IR in
[destination-passing style](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/dps-fhpc17.pdf),
and with aggressive in-place bufferization.

One-Shot Bufferize is:

*   **Monolithic**: A single MLIR pass does the entire work.

*   **Extensible** via an op interface: All ops that implement
    `BufferizableOpInterface` can be bufferized.

*   A **whole-function-at-a-time analysis**. In-place bufferization decisions
    are made by analyzing SSA use-def chains on tensors. Op interface
    implementations not only provide the rewrite logic from tensor ops to memref
    ops, but also helper methods for One-Shot Bufferize's analysis to query
    information about an op's bufferization/memory semantics.

*   **2-Phase**: Bufferization is internally broken down into 2 steps: First,
    analyze the entire IR and make bufferization decisions. Then, bufferize
    (rewrite) the IR. The analysis has access to exact SSA use-def information.
    It incrementally builds alias and equivalence sets and does not rely on an
    a-posteriori alias analysis of preallocated memory.

*   **Greedy**: Operations are analyzed one-by-one and it is decided on the spot
    whether a tensor OpOperand must be copied or not. Heuristics determine the
    order of analysis.

*   **Modular**: The current One-Shot Analysis can be replaced with a different
    analysis. The results of the analysis are queried by the bufferization via
    `AnalysisState`, in particular `AnalysisState::isInPlace`. Any derived class
    of `AnalysisState` that implements a small number of virtual functions can
    serve as a custom analysis. It is even possible to run One-Shot Bufferize
    without any analysis (`AlwaysCopyAnalysisState`), in which case One-Shot
    Bufferize copies every buffer before writing to it.

Note that One-Shot Bufferize does not deallocate buffers. That is done by the
[Ownership-based Buffer Deallocation passes](OwnershipBasedBufferDeallocation.md).

## Goals of Bufferization

The high-level goal of every bufferization technique is to:

1. Use as little memory as possible.
2. Copy as little memory as possible.

This implies reusing already allocated buffers when possible, turning
bufferization into an algorithmically complex problem with similarities to
register allocation.

Depending on the concrete use case, there may be additional bufferization
requirements. If the contents of a buffer are expensive to compute, there could
be a tradeoff between *recomputation* and *compute once and copy*. Conversely,
it may not even be possible to allocate new buffers at runtime on some
architectures.

## Destination-Passing Style

Bufferization is an algorithmically complex problem. Given an op with a tensor
result, bufferization has to choose a memref buffer in which the result can be
stored. It is always safe to allocate a brand new buffer, but such a
bufferization strategy would be unacceptable for high-performance codegen. When
choosing an already existing buffer, we must be careful not to accidentally
overwrite data that is still needed later in the program.

To simplify this problem, One-Shot Bufferize was designed to take advantage of
*destination-passing style* (DPS). In MLIR, DPS ops should implement the
[`DestinationStyleOpInterface`](https://github.com/llvm/llvm-project/blob/792d437b56adfb3416daf8105942d4899fb82763/mlir/include/mlir/Interfaces/DestinationStyleOpInterface.td).
DPS exists independently of bufferization and is tied to SSA semantics: many
ops are "updating" a part of their input SSA variables. For example, the LLVM
instruction
[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
inserts an element into a vector. Since SSA values are immutable, the
operation returns a copy of the input vector with the element inserted.
Another example in MLIR is `linalg.generic` on tensors, which always has an
extra `outs` operand for each result that provides the initial values to
update (for example, when the operation is performing a reduction).

`outs` operands are referred to as "destinations" in the following (the quotes
are important, as this operand is not modified in place but copied) and come
into play in the context of bufferization as a possible "anchor" for the
bufferization algorithm. This allows the user to shape the input in a form that
guarantees a close-to-optimal bufferization result when carefully choosing the
SSA value used as the "destination".

For every tensor result, a DPS op has a corresponding tensor operand. If there
aren't any other conflicting uses of this tensor, the bufferization can alias
it with the op result and perform the operation "in-place" by reusing the buffer
allocated for this "destination" input.

As an example, consider the following op: `%r = tensor.insert %f into
%t[%idx] : tensor<5xf32>`

![tensor.insert example](/includes/img/bufferization_tensor_insert_dst.svg)

`%t` is the "destination" in this example. When choosing a buffer for the result
`%r`, denoted as `buffer(%r)`, One-Shot Bufferize considers only two options:

1.  `buffer(%r) = buffer(%t)`: store the result in the existing `buffer(%t)`.
    Note that this is not always possible, e.g., if the old contents of
    `buffer(%t)` are still needed. One-Shot Bufferize's main task is to detect
    such cases and fall back to the second option when necessary.
2.  `buffer(%r)` is a newly allocated buffer. (Both options are sketched below.)
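
To make this concrete, the following sketch shows what the two options could
bufferize to for the `tensor.insert` example above. `%t_buf` is a placeholder
name for `buffer(%t)`, not actual pass output:

```mlir
// Option 1: in-place. The result reuses buffer(%t).
memref.store %f, %t_buf[%idx] : memref<5xf32>

// Option 2: out-of-place. A new buffer is allocated and the old contents of
// buffer(%t) are copied first, so that they remain available.
%alloc = memref.alloc() : memref<5xf32>
memref.copy %t_buf, %alloc : memref<5xf32> to memref<5xf32>
memref.store %f, %alloc[%idx] : memref<5xf32>
```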

There may be other buffers in the same function that could potentially be used
for `buffer(%r)`, but those are not considered by One-Shot Bufferize to keep the
bufferization simple. One-Shot Bufferize could be extended to consider such
buffers in the future to achieve a better quality of bufferization.

Tensor ops that are not in destination-passing style always bufferize to a
memory allocation. E.g.:

```mlir
%0 = tensor.generate %sz {
^bb0(%i : index):
  %cst = arith.constant 0.0 : f32
  tensor.yield %cst : f32
} : tensor<?xf32>
```

The result of `tensor.generate` does not have a "destination" operand, so
bufferization allocates a new buffer. This could be avoided by instead using an
op such as `linalg.generic`, which can express the same computation with a
"destination" operand, specified via `outs`:

```mlir
#map = affine_map<(i) -> (i)>
%0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
                    outs(%t : tensor<?xf32>) {
  ^bb0(%arg0 : f32):
    %cst = arith.constant 0.0 : f32
    linalg.yield %cst : f32
} -> tensor<?xf32>
```

At first glance, the above `linalg.generic` op may not seem very useful because
the output tensor `%t` is entirely overwritten. Why pass the tensor `%t` as an
operand in the first place? As an example, this can be useful for overwriting a
slice of a tensor:

```mlir
%t = tensor.extract_slice %s [%idx] [%sz] [1] : tensor<?xf32> to tensor<?xf32>
%0 = linalg.generic ... outs(%t) { ... } -> tensor<?xf32>
%1 = tensor.insert_slice %0 into %s [%idx] [%sz] [1]
    : tensor<?xf32> into tensor<?xf32>
```

The above example bufferizes to a `memref.subview`, followed by a
"`linalg.generic` on memrefs" that overwrites the memory of the subview, assuming
that the slice `%t` has no other user. The `tensor.insert_slice` then bufferizes
to a no-op (in the absence of RaW conflicts such as a subsequent read of `%s`).
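
A rough sketch of the bufferized IR for this in-place case; buffer names such as
`%s_buf` and `%t_buf` are placeholders, not actual pass output:

```mlir
%t_buf = memref.subview %s_buf[%idx] [%sz] [1]
    : memref<?xf32> to memref<?xf32, strided<[1], offset: ?>>
// The "linalg.generic on memrefs" overwrites the memory of the subview.
linalg.generic ... outs(%t_buf : memref<?xf32, strided<[1], offset: ?>>) { ... }
// tensor.insert_slice becomes a no-op: the data is already in buffer(%s).
```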

RaW conflicts are detected with an analysis of SSA use-def chains (details
later). One-Shot Bufferize works best if there is a single SSA use-def chain,
where the result of a tensor op is the operand of the next tensor op, e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
%1 = "my_dialect.another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)
%2 = "my_dialect.yet_another_op"(%1) : (tensor<?xf32>) -> (tensor<?xf32>)
```

Buffer copies are likely inserted if the SSA use-def chain splits at some point,
e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
%1 = "my_dialect.another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)

// "yet_another_op" likely needs to read the data of %0, so "another_op" cannot
// write in-place to buffer(%0).
%2 = "my_dialect.yet_another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)
```
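
A sketch of what the bufferized IR could look like in that case. All names are
placeholders and the memref form of the unknown dialect ops is hypothetical:

```mlir
// buffer(%0), as produced by the bufferized "some_op".
%0_buf = ...
// Copy buffer(%0) so that "another_op" can write without clobbering it.
%alloc = memref.alloc(%size) : memref<?xf32>
memref.copy %0_buf, %alloc : memref<?xf32> to memref<?xf32>
"my_dialect.another_op"(%alloc) : (memref<?xf32>) -> ()
// "yet_another_op" still reads the unmodified buffer(%0).
"my_dialect.yet_another_op"(%0_buf) : (memref<?xf32>) -> ()
```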

## Tensor / MemRef Boundary

The bufferization dialect provides a few helper ops to connect tensor IR (that
should be bufferized) with existing buffers (that may be allocated/provided by
a different runtime/library/etc.).

`bufferization.to_memref %t` returns the future buffer of a tensor SSA value.
`bufferization.to_tensor %m` returns a tensor SSA value for a given MemRef
buffer. `bufferization.materialize_in_destination` indicates that a tensor value
should materialize in a certain buffer.

Consider the following example, where a TOSA matmul result should materialize in
an existing buffer `%C`:

```mlir
// Batched TOSA matrix multiplication. %A and %B are the
// inputs, %C is the output.
func.func @test_matmul(%A: memref<1x17x19xf32>,
                       %B: memref<1x19x29xf32>,
                       %C: memref<1x17x29xf32>) {

  %A_tensor = bufferization.to_tensor %A restrict : memref<1x17x19xf32> to tensor<1x17x19xf32>
  %B_tensor = bufferization.to_tensor %B restrict : memref<1x19x29xf32> to tensor<1x19x29xf32>

  %0 = tosa.matmul %A_tensor, %B_tensor
      : (tensor<1x17x19xf32>, tensor<1x19x29xf32>) ->
         tensor<1x17x29xf32>

  bufferization.materialize_in_destination
    %0 in restrict writable %C
      : (tensor<1x17x29xf32>, memref<1x17x29xf32>) -> ()

  return
}
```

Note that all bufferization ops in this example have the `restrict` unit
attribute set. This attribute is similar to the C restrict keyword and indicates
that there is no other `to_tensor` or `materialize_in_destination` op with
the same or an aliasing MemRef operand. Only such
`to_tensor`/`materialize_in_destination` ops are supported. The `restrict`
attribute gives strong aliasing guarantees to the bufferization analysis and
allows us to look only at the tensor IR in a program. (Ops that do not operate
on tensors are ignored by One-Shot Bufferize.)

Also note that `tosa.matmul` cannot be bufferized as is: there is no
`BufferizableOpInterface` implementation for that op. However, the op can be
lowered to a combination of `tensor.empty` and `linalg.matmul`, which can be
bufferized.
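
A rough sketch of such a lowering; for the batched TOSA op the Linalg op would
typically be `linalg.batch_matmul`, and the exact sequence depends on the
lowering pass:

```mlir
%empty = tensor.empty() : tensor<1x17x29xf32>
%zero = arith.constant 0.0 : f32
%init = linalg.fill ins(%zero : f32) outs(%empty : tensor<1x17x29xf32>)
    -> tensor<1x17x29xf32>
%0 = linalg.batch_matmul
    ins(%A_tensor, %B_tensor : tensor<1x17x19xf32>, tensor<1x19x29xf32>)
    outs(%init : tensor<1x17x29xf32>) -> tensor<1x17x29xf32>
```

Because `%init` is the matmul's "destination" and has no other uses, One-Shot
Bufferize can then bufferize the matmul in-place into that buffer.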

## Using One-Shot Bufferize

MLIR provides a pass
[`-one-shot-bufferize`](https://mlir.llvm.org/docs/Passes/#-one-shot-bufferize-one-shot-bufferize)
that performs an analysis and bufferizes all ops with tensor semantics that
implement `BufferizableOpInterface`. For modularity reasons, these op interface
implementations are typically external models that live in a dialect's
"Transforms" build unit. (External models are a mechanism for implementing an op
interface in a different build unit.) It is the user's responsibility to ensure
that all needed external models are registered before running One-Shot
Bufferize.


By default, One-Shot Bufferize fails when it encounters an op with tensor
semantics (i.e., tensor result or tensor operand) that is not bufferizable
(i.e., does not implement `BufferizableOpInterface`). This can be avoided with
`allow-unknown-ops`. In that case, One-Shot Bufferize inserts
`to_memref`/`to_tensor` ops around the bufferization boundary.

One-Shot Bufferize can be configured to bufferize only ops from a set of
dialects with `dialect-filter`.

One-Shot Bufferize can also be called programmatically with
[`bufferization::runOneShotBufferize`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L167).
Alternatively,
[`bufferization::bufferizeOp`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/Bufferize.h#L78)
skips the analysis and inserts a copy on every buffer write.

By default, function boundaries are not bufferized. This is because there are
currently limitations around function graph bufferization: recursive
calls are not supported. As long as there are no recursive calls, function
boundary bufferization can be enabled with `bufferize-function-boundaries`. Each
tensor function argument and tensor function result is then turned into a
memref. The layout map of the memref type can be controlled with
`function-boundary-type-conversion`.
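
As an illustration, a function signature with tensor types is rewritten roughly
as follows. This is a sketch; the exact layout maps depend on
`function-boundary-type-conversion`:

```mlir
// Before bufferization:
func.func @foo(%t: tensor<?xf32>) -> tensor<?xf32> {
  return %t : tensor<?xf32>
}

// After bufferization (sketch): tensor arguments and results become memrefs,
// here with fully dynamic layout maps.
func.func @foo(%m: memref<?xf32, strided<[?], offset: ?>>)
    -> memref<?xf32, strided<[?], offset: ?>> {
  return %m : memref<?xf32, strided<[?], offset: ?>>
}
```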

## Memory Layouts

One-Shot Bufferize bufferizes ops from top to bottom. This works well when all
ops are bufferizable. However, when encountering a non-bufferizable op with
`allow-unknown-ops`, One-Shot Bufferize must insert `to_memref` ops at the
bufferization boundary and decide on a memref type. By default, One-Shot
Bufferize chooses the most dynamic memref type wrt. layout maps. E.g.:

```mlir
%0 = "my_dialect.unbufferizable_op"(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
%1 = tensor.extract %0[%idx1, %idx2] : tensor<?x?xf32>
```

When bufferizing the above IR, One-Shot Bufferize inserts a `to_memref` op with
dynamic offset and strides:

```mlir
%0 = "my_dialect.unbufferizable_op"(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
%0_m = bufferization.to_memref %0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
%1 = memref.load %0_m[%idx1, %idx2] : memref<?x?xf32, strided<[?, ?], offset: ?>>
```

All users of `%0` have fully dynamic layout maps. This ensures that the
bufferized IR composes well with future bufferizations of `unbufferizable_op`
(maybe bufferized by another pass), regardless of the exact memref type of the
future bufferization. If the op turns out to be bufferized to an op with a
simpler memref type (e.g., identity layout map), we expect that canonicalization
patterns would clean up unnecessarily dynamic layout maps. (Some of these
canonicalization patterns may not be implemented yet.)

One-Shot Bufferize tries to infer the most precise memref type when bufferizing
an op. If the entire IR is bufferizable, we do not have to resort to
conservatively using fully dynamic layout maps. In that case, we also do not
have to rely on canonicalization patterns to clean up the bufferized IR.

Note: There are some bufferizable ops for which a precise layout map cannot be
inferred. E.g., a `tensor.cast` from a `tensor<*xf32>` to a `tensor<?x?xf32>`
must be bufferized to a `memref.cast` with a memref type that has a fully
dynamic layout map.
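
A sketch of that case (value names are placeholders):

```mlir
// Before bufferization:
%1 = tensor.cast %0 : tensor<*xf32> to tensor<?x?xf32>

// After bufferization (sketch): no static layout can be inferred from the
// unranked source, so the result uses a fully dynamic layout map.
%1_m = memref.cast %0_m
    : memref<*xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
```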

One-Shot Bufferize has an option `unknown-type-conversion` to control the
generation of layout maps when no precise layout can be inferred (both settings
are sketched below):

*   `fully-dynamic-layout-map` uses fully dynamic layout maps and is the default
    behavior. This composes well when IR is partially bufferized.
*   `identity-layout-map` uses static identity layout maps. This option can be
    useful for legacy code that cannot handle memref types with layout maps.
    Note that this setting can lead to additional buffer copies when folding a
    `to_tensor`/`to_memref` pair with memref types that are not cast-compatible.
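
A sketch of the difference, reusing the `unbufferizable_op` example from the
beginning of this section:

```mlir
// unknown-type-conversion=fully-dynamic-layout-map (default):
%m1 = bufferization.to_memref %0 : memref<?x?xf32, strided<[?, ?], offset: ?>>

// unknown-type-conversion=identity-layout-map:
%m2 = bufferization.to_memref %0 : memref<?x?xf32>
```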

Note: The `unknown-type-conversion` option does not affect layout maps of
function signatures. There is a separate `function-boundary-type-conversion`
option that controls layout maps of function parameters and function results.

## Extending One-Shot Bufferize

Custom ops can be bufferized if they implement `BufferizableOpInterface`. Users
must at least implement the following interface methods.

*   `bufferizesToMemoryRead`: Return `true` if the buffer of the given tensor
    OpOperand is read.
*   `bufferizesToMemoryWrite`: Return `true` if the buffer of the given tensor
    OpOperand is written (if bufferizing in-place).
*   `getAliasingOpResult`: Return the OpResults that may share the same buffer
    as the given OpOperand. This interface method describes the
    OpOperand-to-OpResult mapping wrt. destination-passing style.
*   `bufferRelation`: Return `BufferRelation::Equivalent` if the given OpResult
    is the exact same memref as the aliasing OpOperand after bufferization (in
    case of in-place bufferization). Otherwise (e.g., if they overlap but are
    not necessarily the exact same memrefs), `BufferRelation::Unknown` should be
    returned. Additional buffer relations will be added in the future, but
    `BufferRelation::Unknown` is always safe.
*   `bufferize`: Rewrite the op with the given rewriter. Ops should be replaced
    with `bufferization::replaceOpWithBufferizedValues`.

To get a better intuition of the interface methods, we invite users to take a
look at existing implementations in MLIR, e.g., the implementation of
`tensor.insert` or `tensor.extract`.

Interface implementations of DPS ops (that implement
`DestinationStyleOpInterface`) can derive from
`DstBufferizableOpInterfaceExternalModel`, which provides all necessary
method implementations except for `bufferize`.

## Debugging Buffer Copies

To get a better understanding of why One-Shot Bufferize introduced a buffer
copy, users can run the pass with `test-analysis-only print-conflicts`. Every
tensor op is then annotated with an attribute that has a boolean value for each
tensor OpOperand. `true` means that the OpOperand bufferizes in-place. `false`
means that the OpOperand bufferizes out-of-place and a buffer copy will be
inserted.

There are two reasons why a buffer copy may be inserted.

1.  Due to a RaW conflict, it is not safe to bufferize in-place. I.e., the
    overwritten data is still needed.
2.  The buffer is not writable. E.g., `memref.global` buffers that result from
    bufferizing `arith.constant` ops are never modified (see the sketch below).
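
A sketch of the second case: a constant tensor typically bufferizes to a
read-only `memref.global`, so an insertion must go into a copy. The global
symbol name and value names here are hypothetical:

```mlir
// Before bufferization:
%cst = arith.constant dense<0.0> : tensor<4xf32>
%0 = tensor.insert %f into %cst[%idx] : tensor<4xf32>

// After bufferization (sketch): the global buffer is never modified; the
// insertion is performed on a fresh copy instead.
%m = memref.get_global @__constant_4xf32 : memref<4xf32>
%alloc = memref.alloc() : memref<4xf32>
memref.copy %m, %alloc : memref<4xf32> to memref<4xf32>
memref.store %f, %alloc[%idx] : memref<4xf32>
```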

In the first case, `print-conflicts` illustrates the conflict in the form of a
("read", "conflicting write", "last write") tuple.

A RaW conflict consists of three parts, in the following order according to
op dominance:

1. **Definition:** A tensor `%t` is defined.
2. **Conflicting Write:** An operation writes to `buffer(%t)`.
3. **Read:** An operation reads `%t`.

When such a RaW conflict is detected during the analysis phase, One-Shot
Bufferize will insert a buffer copy for the conflicting write.

**Example**

```mlir
// RUN: mlir-opt %s -one-shot-bufferize="bufferize-function-boundaries test-analysis-only print-conflicts"
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  // Create a new tensor with [%arg0, %arg0, %arg0].
  %0 = tensor.from_elements %arg0, %arg0, %arg0 : tensor<3xf32>

  // Insert something into the new tensor.
  %1 = tensor.insert %arg1 into %0[%arg2] : tensor<3xf32>

  // Read from the old tensor.
  %r = tensor.extract %0[%arg3] : tensor<3xf32>

  // Return the extracted value and the result of the insertion.
  func.return %r, %1 : f32, tensor<3xf32>
}
```

The output IR is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  %from_elements = tensor.from_elements %arg0, %arg0, %arg0 {"C_0[DEF: result 0]"} : tensor<3xf32>
  %inserted = tensor.insert %arg1 into %from_elements[%arg2] {"C_0[CONFL-WRITE: 1]", __inplace_operands_attr__ = ["none", "false", "none"]} : tensor<3xf32>
  %extracted = tensor.extract %from_elements[%arg3] {"C_0[READ: 0]", __inplace_operands_attr__ = ["true", "none"]} : tensor<3xf32>
  return {__inplace_operands_attr__ = ["none", "true"]} %extracted, %inserted : f32, tensor<3xf32>
}
```

Note that the IR was not bufferized. It was merely annotated with the results
of the bufferization analysis. Every operation with tensor semantics has a
`__inplace_operands_attr__` attribute with one value per operand. If an operand
is not a tensor, the respective value is `none`. Otherwise, if the operand was
decided to be bufferized in-place, the value is `true`. A value of `false`
indicates a buffer copy. In the above example, a buffer copy would be inserted
for `tensor.insert`, so that it does not overwrite `buffer(%from_elements)`,
which is still needed for `tensor.extract`.

For each RaW conflict (there is only one in the example), three `C_i` attributes
were added:

* `C_0[DEF: result 0]`: A tensor is defined: 0th result of
  `tensor.from_elements`.
* `C_0[CONFL-WRITE: 1]`: An operation (if bufferized in-place) would write into
  the future buffer of the defined tensor: 1st operand of `tensor.insert`.
* `C_0[READ: 0]`: An operation reads the tensor definition: 0th operand of
  `tensor.extract`.

The fully bufferized IR (with the inserted buffer copy) is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, memref<3xf32>) {
  %c2 = arith.constant 2 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.store %arg0, %alloc[%c0] : memref<3xf32>
  memref.store %arg0, %alloc[%c1] : memref<3xf32>
  memref.store %arg0, %alloc[%c2] : memref<3xf32>
  %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.copy %alloc, %alloc_0 : memref<3xf32> to memref<3xf32>
  memref.store %arg1, %alloc_0[%arg2] : memref<3xf32>
  %0 = memref.load %alloc[%arg3] : memref<3xf32>
  return %0, %alloc_0 : f32, memref<3xf32>
}
```

To get a better understanding of the SSA Use-Def Chain Analysis and the RaW
conflict detection algorithm, interested users may want to refer to:

* [Original design document](https://discourse.llvm.org/uploads/short-url/5kckJ3DftYwQokG252teFgw3sYa.pdf)
* [ODM talk](https://youtu.be/TXEo59CYS9A) ([slides](https://mlir.llvm.org/OpenMeetings/2022-01-13-One-Shot-Bufferization.pdf))
* [LLVM Dev Meeting 2023 tutorial slides](https://m-sp.org/downloads/llvm_dev_2023.pdf)
477