# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting ops with `tensor` semantics
to ops with `memref` semantics. There are multiple MLIR passes that are related
to bufferization. These passes typically run as one of the last steps in a
pass pipeline, right before lowering `memref` ops to LLVM. That is because
many transformations are easier or only supported in tensor land; e.g.,
[tile/fuse/… on tensors first](https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373),
then bufferize the remaining IR.

The most important bufferization pass is *One-Shot Bufferize*: This pass
rewrites `tensor` IR to `memref` IR. There are additional helper passes that
preprocess IR (e.g., so that IR can be bufferized more efficiently), perform
buffer-level optimizations such as allocation hoisting, and
[insert buffer deallocation ops](OwnershipBasedBufferDeallocation.md) so that
the resulting `memref` IR has no memory leaks.

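As a minimal sketch (not the output of an actual pass run; `%t`, `%idx` and
`%t_buffer` are hypothetical names), this is what the rewrite looks like for a
single op: a `tensor.extract` on a tensor value becomes a `memref.load` from
the buffer chosen for that tensor.

```mlir
// Tensor IR before bufferization.
%0 = tensor.extract %t[%idx] : tensor<8xf32>

// MemRef IR after bufferization (sketch): %t_buffer is the buffer that was
// chosen for the tensor %t.
%1 = memref.load %t_buffer[%idx] : memref<8xf32>
```
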
## Deprecated Passes

The buffer deallocation pass has been deprecated in favor of the
ownership-based buffer deallocation pipeline. The deprecated pass has some
limitations that may cause memory leaks in the resulting IR.

## What is One-Shot Bufferize?

One-Shot Bufferize is a tensor bufferization pass designed for IR in
[destination-passing style](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/dps-fhpc17.pdf),
and with aggressive in-place bufferization.

One-Shot Bufferize is:

* **Monolithic**: A single MLIR pass does the entire work.

* **Extensible** via an op interface: All ops that implement
  `BufferizableOpInterface` can be bufferized.

* A **whole-function at a time analysis**. In-place bufferization decisions
  are made by analyzing SSA use-def chains on tensors. Op interface
  implementations not only provide the rewrite logic from tensor ops to memref
  ops, but also helper methods for One-Shot Bufferize's analysis to query
  information about an op's bufferization/memory semantics.

* **2-Phase**: Bufferization is internally broken down into 2 steps: First,
  analyze the entire IR and make bufferization decisions. Then, bufferize
  (rewrite) the IR. The analysis has access to exact SSA use-def information.
  It incrementally builds alias and equivalence sets and does not rely on a
  posteriori alias analysis of preallocated memory.

* **Greedy**: Operations are analyzed one-by-one and it is decided on the spot
  whether a tensor OpOperand must be copied or not. Heuristics determine the
  order of analysis.

* **Modular**: The current One-Shot Analysis can be replaced with a different
  analysis. The results of the analysis are queried by the bufferization via
  `AnalysisState`, in particular `AnalysisState::isInPlace`. Any derived class
  of `AnalysisState` that implements a small number of virtual functions can
  serve as a custom analysis. It is even possible to run One-Shot Bufferize
  without any analysis (`AlwaysCopyAnalysisState`), in which case One-Shot
  Bufferize copies every buffer before writing to it.

Note that One-Shot Bufferize does not deallocate buffers. That is done by the
[Ownership-based Buffer Deallocation passes](OwnershipBasedBufferDeallocation.md).

## Goals of Bufferization

The high-level goal of every bufferization technique is to:

1. Use as little memory as possible.
2. Copy as little memory as possible.

This implies reusing already allocated buffers when possible, turning
bufferization into an algorithmically complex problem with similarities to
register allocation.

Depending on the concrete use case, there may be additional bufferization
requirements. If the contents of a buffer are expensive to compute, there could
be a tradeoff between *recomputation* and *compute once and copy*. Conversely,
on some architectures it may not even be possible to allocate new buffers at
runtime.

## Destination-Passing Style

Bufferization is an algorithmically complex problem. Given an op with a tensor
result, bufferization has to choose a memref buffer in which the result can be
stored. It is always safe to allocate a brand new buffer, but such a
bufferization strategy would be unacceptable for high-performance codegen. When
choosing an already existing buffer, we must be careful not to accidentally
overwrite data that is still needed later in the program.

To simplify this problem, One-Shot Bufferize was designed to take advantage of
*destination-passing style* (DPS). In MLIR, DPS ops should implement the
[`DestinationStyleOpInterface`](https://github.com/llvm/llvm-project/blob/792d437b56adfb3416daf8105942d4899fb82763/mlir/include/mlir/Interfaces/DestinationStyleOpInterface.td).
DPS exists independently of bufferization and is tied to SSA semantics: many
ops are "updating" a part of their input SSA variables. For example, the LLVM
instruction
[`insertelement`](https://llvm.org/docs/LangRef.html#insertelement-instruction)
inserts an element into a vector. Since SSA values are immutable, the
operation returns a copy of the input vector with the element inserted.
Another example in MLIR is `linalg.generic` on tensors, which always has an
extra `outs` operand for each result, providing the initial values to update
(for example when the operation is performing a reduction).

`outs` operands are referred to as "destinations" in the following (the quotes
are important, as these operands are not modified in place but copied) and come
into play in the context of bufferization as possible "anchors" for the
bufferization algorithm. This allows the user to shape the input in a form that
guarantees a close-to-optimal bufferization result when carefully choosing the
SSA value used as the "destination".

For every tensor result, a DPS op has a corresponding tensor operand. If there
aren't any other conflicting uses of this tensor, the bufferization can alias
it with the op result and perform the operation "in-place" by reusing the
buffer allocated for this "destination" input.

As an example, consider the following op: `%r = tensor.insert %f into
%t[%idx] : tensor<5xf32>`

`%t` is the "destination" in this example. When choosing a buffer for the
result `%r`, denoted as `buffer(%r)`, One-Shot Bufferize considers only two
options:

1. `buffer(%r) = buffer(%t)`: store the result in the existing `buffer(%t)`.
   Note that this is not always possible, e.g., when the old contents of
   `buffer(%t)` are still needed. One-Shot Bufferize's main task is to detect
   such cases and fall back to the second option when necessary.
2. `buffer(%r)` is a newly allocated buffer.

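As a sketch (assuming `%t` bufferizes to a hypothetical buffer `%t_buffer`),
the two options for the `tensor.insert` example above could bufferize as
follows:

```mlir
// Option 1 (in-place): buffer(%r) = buffer(%t). The value is stored directly
// into the existing buffer of %t.
memref.store %f, %t_buffer[%idx] : memref<5xf32>

// Option 2 (out-of-place): buffer(%r) is a new allocation. The old contents
// of buffer(%t) are copied first and then updated.
%alloc = memref.alloc() : memref<5xf32>
memref.copy %t_buffer, %alloc : memref<5xf32> to memref<5xf32>
memref.store %f, %alloc[%idx] : memref<5xf32>
```
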
There may be other buffers in the same function that could potentially be used
for `buffer(%r)`, but those are not considered by One-Shot Bufferize to keep
the bufferization simple. One-Shot Bufferize could be extended to consider such
buffers in the future to achieve a better quality of bufferization.

Tensor ops that are not in destination-passing style always bufferize to a new
memory allocation. E.g.:

```mlir
%0 = tensor.generate %sz {
^bb0(%i : index):
  %cst = arith.constant 0.0 : f32
  tensor.yield %cst : f32
} : tensor<?xf32>
```

The result of `tensor.generate` does not have a "destination" operand, so
bufferization allocates a new buffer. This could be avoided by instead using an
op such as `linalg.generic`, which can express the same computation with a
"destination" operand, as specified behind outputs (`outs`):

```mlir
#map = affine_map<(i) -> (i)>
%0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
    outs(%t : tensor<?xf32>) {
  ^bb0(%arg0 : f32):
    %cst = arith.constant 0.0 : f32
    linalg.yield %cst : f32
} -> tensor<?xf32>
```

At first glance, the above `linalg.generic` op may not seem very useful because
the output tensor `%t` is entirely overwritten. Why pass the tensor `%t` as an
operand in the first place? As an example, this can be useful for overwriting a
slice of a tensor:

```mlir
%t = tensor.extract_slice %s [%idx] [%sz] [1] : tensor<?xf32> to tensor<?xf32>
%0 = linalg.generic ... outs(%t) { ... } -> tensor<?xf32>
%1 = tensor.insert_slice %0 into %s [%idx] [%sz] [1]
    : tensor<?xf32> into tensor<?xf32>
```

The above example bufferizes to a `memref.subview`, followed by a
"`linalg.generic` on memrefs" that overwrites the memory of the subview,
assuming that the slice `%t` has no other users. The `tensor.insert_slice` then
bufferizes to a no-op (in the absence of RaW conflicts such as a subsequent
read of `%s`).

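A sketch of that bufferized form (assuming `%s` bufferizes to a hypothetical
buffer `%s_buffer`, `#map` is the map from the earlier example, and the
analysis decides that everything can be in-place) could look as follows:

```mlir
// The tensor.extract_slice becomes a subview of the buffer of %s.
%t_buffer = memref.subview %s_buffer[%idx] [%sz] [1]
    : memref<?xf32> to memref<?xf32, strided<[1], offset: ?>>

// The "linalg.generic on memrefs" overwrites the subview in place.
linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]}
    outs(%t_buffer : memref<?xf32, strided<[1], offset: ?>>) {
  ^bb0(%arg0 : f32):
    %cst = arith.constant 0.0 : f32
    linalg.yield %cst : f32
}

// The tensor.insert_slice becomes a no-op: the data already resides in the
// right place within %s_buffer.
```
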
RaW conflicts are detected with an analysis of SSA use-def chains (details
later). One-Shot Bufferize works best if there is a single SSA use-def chain,
where the result of a tensor op is the operand of the next tensor op, e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
%1 = "my_dialect.another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)
%2 = "my_dialect.yet_another_op"(%1) : (tensor<?xf32>) -> (tensor<?xf32>)
```

Buffer copies are likely inserted if the SSA use-def chain splits at some
point, e.g.:

```mlir
%0 = "my_dialect.some_op"(%t) : (tensor<?xf32>) -> (tensor<?xf32>)
%1 = "my_dialect.another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)

// "yet_another_op" likely needs to read the data of %0, so "another_op" cannot
// in-place write to buffer(%0).
%2 = "my_dialect.yet_another_op"(%0) : (tensor<?xf32>) -> (tensor<?xf32>)
```

## Tensor / MemRef Boundary

The bufferization dialect provides a few helper ops to connect tensor IR (that
should be bufferized) with existing buffers (that may be allocated/provided by
a different runtime/library/etc.).

`bufferization.to_memref %t` returns the future buffer of a tensor SSA value.
`bufferization.to_tensor %m` returns a tensor SSA value for a given MemRef
buffer. `bufferization.materialize_in_destination` indicates that a tensor
value should materialize in a certain buffer.

Consider the following example, where a TOSA matmul result should materialize
in an existing buffer `%C`:

```mlir
// Batched TOSA matrix multiplication. %A and %B are the
// inputs, %C is the output.
func.func @test_matmul(%A: memref<1x17x19xf32>,
                       %B: memref<1x19x29xf32>,
                       %C: memref<1x17x29xf32>) {

  %A_tensor = bufferization.to_tensor %A restrict : memref<1x17x19xf32> to tensor<1x17x19xf32>
  %B_tensor = bufferization.to_tensor %B restrict : memref<1x19x29xf32> to tensor<1x19x29xf32>

  %0 = tosa.matmul %A_tensor, %B_tensor
      : (tensor<1x17x19xf32>, tensor<1x19x29xf32>) ->
        tensor<1x17x29xf32>

  bufferization.materialize_in_destination
    %0 in restrict writable %C
      : (tensor<1x17x29xf32>, memref<1x17x29xf32>) -> ()

  return
}
```

Note that all bufferization ops in this example have the `restrict` unit
attribute set. This attribute is similar to the C `restrict` keyword and
indicates that there is no other `to_tensor` or `materialize_in_destination` op
with the same or an aliasing MemRef operand. Only such
`to_tensor`/`materialize_in_destination` ops are supported. The `restrict`
attribute gives strong aliasing guarantees to the bufferization analysis and
allows us to look only at the tensor IR in a program. (Ops that do not operate
on tensors are ignored by One-Shot Bufferize.)

Also note that `tosa.matmul` cannot be bufferized as is: there is no
`BufferizableOpInterface` implementation for that op. However, the op can be
lowered to a combination of `tensor.empty` and `linalg.matmul`, which can be
bufferized.

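A hypothetical sketch of such a destination-passing-style form is shown below.
The exact ops depend on the lowering pipeline; e.g., the batched case may use
`linalg.batch_matmul` with a `linalg.fill` that initializes the accumulator.

```mlir
// A new "destination" tensor is created and passed as the outs operand, so
// the matmul result has an anchor for in-place bufferization.
%empty = tensor.empty() : tensor<1x17x29xf32>
%cst = arith.constant 0.0 : f32
%filled = linalg.fill ins(%cst : f32) outs(%empty : tensor<1x17x29xf32>)
    -> tensor<1x17x29xf32>
%0 = linalg.batch_matmul
    ins(%A_tensor, %B_tensor : tensor<1x17x19xf32>, tensor<1x19x29xf32>)
    outs(%filled : tensor<1x17x29xf32>) -> tensor<1x17x29xf32>
```
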
## Using One-Shot Bufferize

MLIR provides a pass
[`-one-shot-bufferize`](https://mlir.llvm.org/docs/Passes/#-one-shot-bufferize-one-shot-bufferize)
that performs an analysis and bufferizes all ops with tensor semantics that
implement `BufferizableOpInterface`. For modularity reasons, these op interface
implementations are typically external models that live in a dialect's
"Transforms" build unit. (External models are a mechanism for implementing an
op interface in a different build unit.) It is the user's responsibility to
ensure that all needed external models are registered before running One-Shot
Bufferize.

By default, One-Shot Bufferize fails when it encounters an op with tensor
semantics (i.e., tensor result or tensor operand) that is not bufferizable
(i.e., does not implement `BufferizableOpInterface`). This can be avoided with
`allow-unknown-ops`. In that case, One-Shot Bufferize inserts
`to_memref`/`to_tensor` ops around the bufferization boundary.

One-Shot Bufferize can be configured to bufferize only ops from a set of
dialects with `dialect-filter`.

One-Shot Bufferize can also be called programmatically with
[`bufferization::runOneShotBufferize`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/OneShotAnalysis.h#L167).
Alternatively,
[`bufferization::bufferizeOp`](https://github.com/llvm/llvm-project/blob/ae2764e835a26bad9774803eca0a6530df2a3e2d/mlir/include/mlir/Dialect/Bufferization/Transforms/Bufferize.h#L78)
skips the analysis and inserts a copy on every buffer write.

By default, function boundaries are not bufferized. This is because there are
currently limitations around function graph bufferization: recursive calls are
not supported. As long as there are no recursive calls, function boundary
bufferization can be enabled with `bufferize-function-boundaries`. Each tensor
function argument and tensor function result is then turned into a memref. The
layout map of the memref type can be controlled with
`function-boundary-type-conversion`.

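For illustration, a minimal sketch of function boundary bufferization with the
default, fully dynamic layout maps (the function `@foo` is hypothetical):

```mlir
// Before: tensors at the function boundary.
func.func @foo(%t: tensor<?xf32>) -> tensor<?xf32> {
  return %t : tensor<?xf32>
}

// After One-Shot Bufferize with bufferize-function-boundaries (sketch):
// arguments and results become memrefs with a fully dynamic layout map.
func.func @foo(%m: memref<?xf32, strided<[?], offset: ?>>)
    -> memref<?xf32, strided<[?], offset: ?>> {
  return %m : memref<?xf32, strided<[?], offset: ?>>
}
```
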
## Memory Layouts

One-Shot Bufferize bufferizes ops from top to bottom. This works well when all
ops are bufferizable. However, when encountering a non-bufferizable tensor op
with `allow-unknown-ops`, One-Shot Bufferize must insert `to_memref` ops at the
bufferization boundary and decide on a memref type. By default, One-Shot
Bufferize chooses the most dynamic memref type wrt. layout maps. E.g.:

```mlir
%0 = "my_dialect.unbufferizable_op"(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
%1 = tensor.extract %0[%idx1, %idx2] : tensor<?x?xf32>
```

When bufferizing the above IR, One-Shot Bufferize inserts a `to_memref` op with
dynamic offset and strides:

```mlir
%0 = "my_dialect.unbufferizable_op"(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
%0_m = bufferization.to_memref %0 : memref<?x?xf32, strided<[?, ?], offset: ?>>
%1 = memref.load %0_m[%idx1, %idx2] : memref<?x?xf32, strided<[?, ?], offset: ?>>
```

All users of `%0` have fully dynamic layout maps. This ensures that the
bufferized IR composes well with future bufferizations of `unbufferizable_op`
(maybe bufferized by another pass), regardless of the exact memref type of the
future bufferization. If the op turns out to be bufferized to an op with a
simpler memref type (e.g., identity layout map), we expect that
canonicalization patterns would clean up unnecessarily dynamic layout maps.
(Some of these canonicalization patterns may not be implemented yet.)

One-Shot Bufferize tries to infer the most precise memref type when bufferizing
an op. If the entire IR is bufferizable, we do not have to conservatively use
fully dynamic layout maps. In that case, we also do not have to rely on
canonicalization patterns to clean up the bufferized IR.

Note: There are some bufferizable ops for which a precise layout map cannot be
inferred. E.g., a `tensor.cast` from a `tensor<*xf32>` to a `tensor<?x?xf32>`
must be bufferized to a `memref.cast` with a memref type that has a fully
dynamic layout map.

One-Shot Bufferize has an option `unknown-type-conversion` to control the
generation of layout maps when no precise layout can be inferred:

* `fully-dynamic-layout-map` uses fully dynamic layout maps and is the default
  behavior. This composes well when IR is partially bufferized.
* `identity-layout-map` uses static identity layout maps. This option can be
  useful for legacy code that cannot handle memref types with layout maps.
  Note that this setting can lead to additional buffer copies when folding a
  `to_tensor`/`to_memref` pair with memref types that are not cast-compatible.

Note: The `unknown-type-conversion` option does not affect layout maps of
function signatures. There is a separate `function-boundary-type-conversion`
option that controls layout maps of function parameters and function results.

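For comparison, a sketch of the partially bufferized example from above when
running with `unknown-type-conversion=identity-layout-map`: the inserted
`to_memref` then uses a memref type without a layout map.

```mlir
%0 = "my_dialect.unbufferizable_op"(%t) : (tensor<?x?xf32>) -> (tensor<?x?xf32>)
// Identity layout map: no strided<...> layout on the memref type.
%0_m = bufferization.to_memref %0 : memref<?x?xf32>
%1 = memref.load %0_m[%idx1, %idx2] : memref<?x?xf32>
```
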
## Extending One-Shot Bufferize

Custom ops can be bufferized if they implement `BufferizableOpInterface`. Users
must at least implement the following interface methods.

* `bufferizesToMemoryRead`: Return `true` if the buffer of the given tensor
  OpOperand is read.
* `bufferizesToMemoryWrite`: Return `true` if the buffer of the given tensor
  OpOperand is written (if bufferizing in-place).
* `getAliasingOpResult`: Return the OpResults that may share the same buffer
  as the given OpOperand. This interface method describes the
  OpOperand-to-OpResult mapping wrt. destination-passing style.
* `bufferRelation`: Return `BufferRelation::Equivalent` if the given OpResult
  is the exact same memref as the aliasing OpOperand after bufferization (in
  case of in-place bufferization). Otherwise (e.g., if they overlap but are not
  necessarily the exact same memrefs), `BufferRelation::Unknown` should be
  returned. Additional buffer relations will be added in the future, but
  `BufferRelation::Unknown` is always safe.
* `bufferize`: Rewrite the op with the given rewriter. Ops should be replaced
  with `bufferization::replaceOpWithBufferizedValues`.

To get a better intuition of the interface methods, we invite users to take a
look at existing implementations in MLIR, e.g., the implementation of
`tensor.insert` or `tensor.extract`.

Interface implementations of DPS ops (that implement
`DestinationStyleOpInterface`) can derive from
`DstBufferizableOpInterfaceExternalModel`, which provides all necessary
method implementations except for `bufferize`.

## Debugging Buffer Copies

To get a better understanding of why One-Shot Bufferize introduced a buffer
copy, users can run the pass with `test-analysis-only print-conflicts`. Every
tensor op is then annotated with an attribute that has a boolean value for each
tensor OpOperand. `true` means that the OpOperand bufferizes in-place. `false`
means that the OpOperand bufferizes out-of-place and a buffer copy will be
inserted.

There are two reasons why a buffer copy may be inserted.

1. Due to a RaW conflict, it is not safe to bufferize in-place. I.e., the
   overwritten data is still needed.
2. The buffer is not writable. E.g., `memref.global` buffers that are the
   result of `arith.constant` ops are never modified.

In the first case, `print-conflicts` illustrates the conflict in the form of a
("read", "conflicting write", "last write") tuple.

A RaW conflict consists of three parts, in the following order according to
op dominance:

1. **Definition:** A tensor `%t` is defined.
2. **Conflicting Write:** An operation writes to `buffer(%t)`.
3. **Read:** An operation reads `%t`.

When such a RaW conflict is detected during the analysis phase, One-Shot
Bufferize will insert a buffer copy for the conflicting write.

**Example**

```mlir
// RUN: mlir-opt %s -one-shot-bufferize="bufferize-function-boundaries test-analysis-only print-conflicts"
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  // Create a new tensor with [%arg0, %arg0, %arg0].
  %0 = tensor.from_elements %arg0, %arg0, %arg0 : tensor<3xf32>

  // Insert something into the new tensor.
  %1 = tensor.insert %arg1 into %0[%arg2] : tensor<3xf32>

  // Read from the old tensor.
  %r = tensor.extract %0[%arg3] : tensor<3xf32>

  // Return the extracted value and the result of the insertion.
  func.return %r, %1 : f32, tensor<3xf32>
}
```

The output IR is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, tensor<3xf32>) {
  %from_elements = tensor.from_elements %arg0, %arg0, %arg0 {"C_0[DEF: result 0]"} : tensor<3xf32>
  %inserted = tensor.insert %arg1 into %from_elements[%arg2] {"C_0[CONFL-WRITE: 1]", __inplace_operands_attr__ = ["none", "false", "none"]} : tensor<3xf32>
  %extracted = tensor.extract %from_elements[%arg3] {"C_0[READ: 0]", __inplace_operands_attr__ = ["true", "none"]} : tensor<3xf32>
  return {__inplace_operands_attr__ = ["none", "true"]} %extracted, %inserted : f32, tensor<3xf32>
}
```

Note that the IR was not bufferized. It was merely annotated with the results
of the bufferization analysis. Every operation with tensor semantics has a
`__inplace_operands_attr__` attribute with one value per operand. If an operand
is not a tensor, the respective value is `none`. Otherwise, if the operand was
decided to be bufferized in-place, the value is `true`. A value of `false`
indicates a buffer copy. In the above example, a buffer copy would be inserted
for `tensor.insert`, so that it does not overwrite `buffer(%from_elements)`,
which is still needed for `tensor.extract`.

For each RaW conflict (there is only one in the example), three `C_i`
attributes were added:

* `C_0[DEF: result 0]`: A tensor is defined: 0th result of
  `tensor.from_elements`.
* `C_0[CONFL-WRITE: 1]`: An operation (if bufferized in-place) would write into
  the future buffer of the defined tensor: 1st operand of `tensor.insert`.
* `C_0[READ: 0]`: An operation reads the tensor definition: 0th operand of
  `tensor.extract`.

The fully bufferized IR (with the inserted buffer copy) is as follows:

```mlir
func.func @test(%arg0: f32, %arg1: f32, %arg2: index, %arg3: index) -> (f32, memref<3xf32>) {
  %c2 = arith.constant 2 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.store %arg0, %alloc[%c0] : memref<3xf32>
  memref.store %arg0, %alloc[%c1] : memref<3xf32>
  memref.store %arg0, %alloc[%c2] : memref<3xf32>
  %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<3xf32>
  memref.copy %alloc, %alloc_0 : memref<3xf32> to memref<3xf32>
  memref.store %arg1, %alloc_0[%arg2] : memref<3xf32>
  %0 = memref.load %alloc[%arg3] : memref<3xf32>
  return %0, %alloc_0 : f32, memref<3xf32>
}
```

To get a better understanding of the SSA use-def chain analysis and the RaW
conflict detection algorithm, interested users may want to refer to:

* [Original design document](https://discourse.llvm.org/uploads/short-url/5kckJ3DftYwQokG252teFgw3sYa.pdf)
* [ODM talk](https://youtu.be/TXEo59CYS9A) ([slides](https://mlir.llvm.org/OpenMeetings/2022-01-13-One-Shot-Bufferization.pdf))
* [LLVM Dev Meeting 2023 tutorial slides](https://m-sp.org/downloads/llvm_dev_2023.pdf)