# Ownership-based Buffer Deallocation

[TOC]

One-Shot Bufferize does not deallocate any buffers that it allocates. After
running One-Shot Bufferize, the resulting IR may have a number of `memref.alloc`
ops, but no `memref.dealloc` ops. Buffer deallocation is delegated to the
`-ownership-based-buffer-deallocation` pass. This pass supersedes the now
deprecated `-buffer-deallocation` pass, which does not work well with
One-Shot Bufferize.

On a high level, buffers are "owned" by a basic block. Ownership materializes
as an `i1` SSA value and can be thought of as "responsibility to deallocate". It
is conceptually similar to `std::unique_ptr` in C++.

There are a few additional preprocessing and postprocessing passes that should
be run together with the ownership-based buffer deallocation pass. The
recommended compilation pipeline is as follows:

```
one-shot-bufferize
       |     it is recommended to perform all bufferization here at the latest;
       |  <- any allocations inserted after this point have to be handled
       V     manually
expand-realloc
       V
ownership-based-buffer-deallocation
       V
  canonicalize <- mostly for scf.if simplifications
       V
buffer-deallocation-simplification
       V  <- from this point onwards no tensor values are allowed
lower-deallocations
       V
      CSE
       V
  canonicalize
```

The entire deallocation pipeline (excluding `-one-shot-bufferize`) is exposed
as `-buffer-deallocation-pipeline`.

The ownership-based buffer deallocation pass processes operations implementing
`FunctionOpInterface` one-by-one without analyzing the call graph.
This means that there have to be [some rules](#function-boundary-abi) on how
MemRefs are handled when being passed from one function to another. The rest of
the pass revolves heavily around the `bufferization.dealloc` operation, which is
inserted at the end of each basic block with appropriate operands and should be
optimized using the Buffer Deallocation Simplification pass
(`--buffer-deallocation-simplification`) and the regular canonicalizer
(`--canonicalize`). Lowering the result of the
`-ownership-based-buffer-deallocation` pass directly with
`--convert-bufferization-to-memref` without prior optimization is not
recommended, as it will lead to very inefficient code (the runtime cost of
`bufferization.dealloc` is `O(|memref|^2 + |memref| * |retained|)`).

## Function boundary ABI

The Buffer Deallocation pass operates on the level of operations implementing
the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
also return them. To ensure compatibility among all functions (including
external ones), some rules have to be enforced:
* When a MemRef is passed as a function argument, ownership is never acquired.
  It is always the caller's responsibility to deallocate such MemRefs.
* Returning a MemRef from a function always passes ownership to the caller,
  i.e., it is also the caller's responsibility to deallocate MemRefs returned
  from a called function.
* A function must not return a MemRef with the same allocated base buffer as
  one of its arguments (in this case a copy has to be created). Note that in
  this context two subviews of the same buffer that don't overlap are also
  considered to alias.

For external functions (e.g., library functions written externally in C), the
externally provided implementation has to adhere to these rules; the buffer
deallocation pass simply assumes that it does. Functions on which the
deallocation pass is applied and for which the implementation is accessible are
modified by the pass such that the ABI is respected (i.e., buffer copies are
inserted when necessary).
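
To illustrate the third rule, here is a minimal sketch. The function
`@return_arg` is hypothetical and the rewrite only indicates the kind of copy
the pass inserts; it is not verbatim pass output.

```mlir
// Hypothetical input: the returned MemRef shares its allocated base buffer
// with the function argument, which violates the third rule.
func.func @return_arg(%arg0: memref<?xf32>) -> memref<?xf32> {
  return %arg0 : memref<?xf32>
}
```

An ABI-respecting version returns a copy (e.g., materialized via
`bufferization.clone`), so the caller receives ownership of a buffer that is
distinct from its own argument:

```mlir
// Sketch of an ABI-respecting version.
func.func @return_arg(%arg0: memref<?xf32>) -> memref<?xf32> {
  %copy = bufferization.clone %arg0 : memref<?xf32> to memref<?xf32>
  return %copy : memref<?xf32>
}
```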

## Inserting `bufferization.dealloc` operations

`bufferization.dealloc` and ownership indicators are the main abstractions in
the ownership-based buffer deallocation pass. `bufferization.dealloc`
deallocates all given buffers if the respective ownership indicator is set and
there is no aliasing buffer in the retain list.

`bufferization.dealloc` operations are unconditionally inserted at the end of
each basic block (just before the terminator). The majority of the pass is about
finding the correct operands for this operation. There are three variadic
operand lists to be populated: the first contains all MemRef values that may
need to be deallocated, the second contains their associated ownership values
(of `i1` type), and the third contains MemRef values that are still needed at a
later point and should thus not be deallocated (e.g., yielded or returned
buffers).

`bufferization.dealloc` allows us to deal with any kind of aliasing behavior: it
lowers to runtime aliasing checks when not enough information can be collected
statically. When enough aliasing information is statically available, operands
or the entire op may fold away.

**Ownerships**

To determine the ownership operands, we associate an ownership indicator with
each SSA value of `memref` type: an `i1` value indicating whether the basic
block in which the value was defined has ownership of this MemRef. Ideally,
this is a constant `true` or `false`, but it might also be a non-constant SSA
value. To keep track of those ownership values without immediately
materializing them (which might require the insertion of `bufferization.clone`
operations or operations checking for aliasing at runtime at positions where we
don't actually need a materialized value), we use the `Ownership` class. This
class represents the ownership in three states forming a lattice on a partial
order:
```
forall X in SSA values. uninitialized < unique(X) < unknown
forall X, Y in SSA values.
  unique(X) == unique(Y) iff X and Y always evaluate to the same value
  unique(X) != unique(Y) otherwise
```
Intuitively, the states have the following meaning:
* Uninitialized: the ownership is not initialized yet; this is the default
  state. Once an operation has been processed, the ownership of all its results
  with MemRef type should not be uninitialized anymore.
* Unique: there is a specific SSA value that can be queried to check ownership
  without materializing any additional IR.
* Unknown: no specific SSA value is available without materializing additional
  IR. Typically, this is because two ownerships in 'Unique' state would have to
  be merged manually. For example, the result of an `arith.select` has either
  the ownership of the 'then' or the 'else' case depending on the condition
  value; inserting another `arith.select` on the ownership values can perform
  the merge and provide a 'Unique' ownership for the result (see the sketch
  after this list). In the general case, however, the 'Unknown' state has to be
  assigned.
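
As a small illustration of that `arith.select` merge, here is a sketch. The
value names are assumptions that mirror the worked example later in this
document; `%true` and `%false` stand for the materialized ownership indicators
of `%alloc` and `%alloca`.

```mlir
// %alloc is owned by the block (indicator %true), %alloca is not (%false).
%select = arith.select %cond, %alloc, %alloca : memref<?xi8>
// Selecting between the two i1 ownership indicators with the same condition
// merges the two 'Unique' ownerships into a 'Unique' ownership for %select.
%select_ownership = arith.select %cond, %true, %false : i1
```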

Implied by the above partial order, the pass combines two ownerships in the
following way:

| Ownership 1   | Ownership 2   | Combined Ownership |
|:--------------|:--------------|:-------------------|
| uninitialized | uninitialized | uninitialized      |
| unique(X)     | uninitialized | unique(X)          |
| unique(X)     | unique(X)     | unique(X)          |
| unique(X)     | unique(Y)     | unknown            |
| unknown       | unique(X)     | unknown            |
| unknown       | uninitialized | unknown            |

The remaining combinations follow by symmetry.

**Collecting the list of MemRefs that potentially need to be deallocated**

For a given block, the list of MemRefs that potentially need to be deallocated
at the end of that block is computed by keeping track of all values for which
the block potentially takes over ownership. This includes MemRefs provided as
basic block arguments and MemRefs for which ownership is acquired via the
interface handlers of operations like `memref.alloc` and `func.call`, but also
takes liveness information into account in regions with multiple basic blocks.
More concretely, the list is computed by taking the MemRefs in the 'in' set of
the liveness analysis of the current basic block B, adding the MemRef block
arguments and the set of MemRefs allocated in B itself (determined by the
interface handlers), and then subtracting the set of MemRefs deallocated in B
(also determined by the interface handlers).

Note that we don't have to take the intersection of the liveness 'in' set with
the 'out' set of the predecessor block because a value that is in the 'in' set
must be defined in an ancestor block that dominates all direct predecessors and
thus the 'in' set of this block is a subset of the 'out' sets of each
predecessor.

```
memrefs = filter((liveIn(block) U
  allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
```

The list of conditions for the second variadic operand list of
`bufferization.dealloc` is computed by querying the stored ownership value for
each of the MemRefs collected as described above. The ownership state is updated
by the interface handlers while processing the basic block.

**Collecting the list of MemRefs to retain**

Given a basic block B, the list of MemRefs that have to be retained can be
different for each successor block S. For the two basic blocks B and S and the
values passed via block arguments to the destination block S, we compute the
list of MemRefs that have to be retained in B by taking the MemRefs in the
successor operand list of the terminator and the MemRefs in the 'out' set of the
liveness analysis for B intersected with the 'in' set of the destination block
S.

This list of retained values makes sure that we cannot run into use-after-free
situations even if no aliasing information is present at compile time.

```
toRetain = filter(successorOperands + (liveOut(fromBlock) intersect
  liveIn(toBlock)), isMemRef)
```

## Supported interfaces

The pass uses liveness analysis and a few interfaces:
* `FunctionOpInterface`
* `CallOpInterface`
* `MemoryEffectOpInterface`
* `RegionBranchOpInterface`
* `RegionBranchTerminatorOpInterface`

Because these interfaces do not provide all the required information, the pass
also special-cases the `cf.cond_br` operation and makes some assumptions about
operations implementing the `RegionBranchOpInterface` at the moment; improving
the interfaces would allow us to remove those dependencies in the future.

## Limitations

The Buffer Deallocation pass has some requirements and limitations on the input
IR. These are checked at the beginning of the pass, and errors are emitted
accordingly:
* The set of interfaces the pass operates on must be implemented (correctly).
  E.g., if an operation with a nested region is present but does not implement
  the `RegionBranchOpInterface`, an error is emitted because the pass cannot
  know the semantics of the nested region (and does not make any default
  assumptions about it).
* No explicit control-flow loops are present. Currently, only loops using
  structured control flow are supported. However, this limitation could be
  lifted in the future.
* Deallocation operations should not be present already. The pass would likely
  handle them correctly in most cases, but this is not supported yet due to
  insufficient testing.
* Terminators must implement either `RegionBranchTerminatorOpInterface` or
  `BranchOpInterface`, but not both. Terminators with more than one successor
  are not supported (except `cf.cond_br`). This is not a fundamental
  limitation, but there is no use-case justifying the more complex
  implementation at the moment.

## Example

The following example contains a few interesting cases:
* Basic block arguments are modified to also pass along the ownership
  indicator, but not for entry blocks, where the function boundary ABI
  is applied instead.
* The result of `arith.select` initially has 'Unknown' assigned as ownership,
  but once the `bufferization.dealloc` operation is inserted, it is put in the
  'retained' list (since it has uses in a later basic block) and thus the
  'Unknown' ownership can be replaced with a 'Unique' ownership using the
  corresponding result of the dealloc operation.
* The `cf.cond_br` operation has more than one successor and thus has to
  insert two `bufferization.dealloc` operations (one for each successor).
  While they have the same list of MemRefs to deallocate (because they perform
  the deallocations for the same block), it must be taken into account that
  some MemRefs remain *live* for one branch but not the other (thus set
  intersection is performed on the *live-out* of the current block and the
  *live-in* of the target block). Also, `cf.cond_br` supports separate
  forwarding operands for each successor. To make sure that no MemRef is
  deallocated twice (because there are two `bufferization.dealloc` operations
  with the same MemRefs to deallocate), the condition operands are adjusted to
  take the branch condition into account. While a generic lowering for such
  terminator operations could be implemented, a specialized implementation can
  take all the semantics of this particular operation into account and thus
  generate a more efficient lowering.

```mlir
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
  %alloc = memref.alloc() : memref<?xi8>
  %alloca = memref.alloca() : memref<?xi8>
  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
  cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
^bb1(%bbarg: memref<?xi8>):
  test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
  return
}
```

After running `--ownership-based-buffer-deallocation`, it looks as follows:

```mlir
// Function boundary ABI: ownership of `%memref` will never be acquired.
func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
  %false = arith.constant false
  %true = arith.constant true

  // The ownership of a MemRef defined by the `memref.alloc` operation is
  // always assigned to be 'true'.
  %alloc = memref.alloc() : memref<?xi8>

  // The ownership of a MemRef defined by the `memref.alloca` operation is
  // always assigned to be 'false'.
  %alloca = memref.alloca() : memref<?xi8>

  // The ownership of %select will be the join of the ownership of %alloc and
  // the ownership of %alloca, i.e., of %true and %false. Because the pass does
  // not know about the semantics of the `arith.select` operation (unless a
  // custom handler is implemented), the ownership join will be 'Unknown'. If
  // the materialized ownership indicator of %select is needed, either a clone
  // has to be created for which %true is assigned as ownership, or the result
  // of a `bufferization.dealloc` where %select is in the retain list has to be
  // used.
  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>

  // We use `memref.extract_strided_metadata` to get the base memref since it
  // is not allowed to pass arbitrary memrefs to `memref.dealloc`. This
  // property is already enforced for `bufferization.dealloc`.
  %base_buffer_memref, ... = memref.extract_strided_metadata %memref
    : memref<?xi8> -> memref<i8>, index, index, index
  %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
    : memref<?xi8> -> memref<i8>, index, index, index
  %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
    : memref<?xi8> -> memref<i8>, index, index, index

  // The deallocation conditions need to be adjusted to incorporate the branch
  // condition. In this example, this requires only a single negation, but
  // might also require multiple arith.andi operations.
  %not_br_cond = arith.xori %true, %br_cond : i1

  // There are two dealloc operations inserted in this basic block, one per
  // successor. Both have the same list of MemRefs to deallocate and the
  // conditions only differ by the branch condition conjunct.
  // Note, however, that the retained list differs. Here, both contain the
  // %select value because it is used in both successors (since it's the same
  // block), but the value passed via block argument differs (%memref vs.
  // %alloc).
  %10:2 = bufferization.dealloc
             (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
               : memref<i8>, memref<i8>, memref<i8>)
          if (%false, %br_cond, %false)
      retain (%alloc, %select : memref<?xi8>, memref<?xi8>)

  %11:2 = bufferization.dealloc
             (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
               : memref<i8>, memref<i8>, memref<i8>)
          if (%false, %not_br_cond, %false)
      retain (%memref, %select : memref<?xi8>, memref<?xi8>)

  // Because %select is used in ^bb1 without passing it via block argument, we
  // need to update its ownership value here by merging the ownership values
  // returned by the dealloc operations.
  %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1

  // The terminator is modified to pass along the ownership indicator values
  // with each MemRef value.
  cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
                       ^bb1(%memref, %11#0 : memref<?xi8>, i1)

// All non-entry basic blocks are modified to have an additional i1 argument
// for each MemRef value in the argument list.
^bb1(%13: memref<?xi8>, %14: i1):  // 2 preds: ^bb0, ^bb0
  test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)

  %base_buffer_13, ... = memref.extract_strided_metadata %13
    : memref<?xi8> -> memref<i8>, index, index, index
  %base_buffer_select, ... = memref.extract_strided_metadata %select
    : memref<?xi8> -> memref<i8>, index, index, index

  // Here, we don't have a retained list, because the block has no successors
  // and the return has no operands.
  bufferization.dealloc (%base_buffer_13, %base_buffer_select
                          : memref<i8>, memref<i8>)
                     if (%14, %new_ownership)
  return
}
```

## Buffer Deallocation Simplification Pass

The [semantics of the `bufferization.dealloc` operation](#bufferizationdealloc-bufferizationdeallocop)
provide a lot of opportunities for optimizations which can be conveniently
split into patterns using the greedy pattern rewriter. Some of those patterns
need access to additional analyses, such as an analysis that can determine
whether two MemRef values must, may, or never originate from the same buffer
allocation. These patterns are collected in the Buffer Deallocation
Simplification pass, while patterns that don't need additional analyses are
registered as part of the regular canonicalizer pass. This pass is best run
after `--ownership-based-buffer-deallocation`, followed by `--canonicalize`.

The pass applies patterns for the following simplifications:
* Remove MemRefs from the retain list when they are guaranteed to not alias
  with any value in the 'memref' operand list. This avoids an additional
  aliasing check with the removed value.
* Split off values in the 'memref' list into new `bufferization.dealloc`
  operations that contain only this value in the 'memref' list when it is
  guaranteed to not alias with any other value in the 'memref' list. This
  avoids at least one aliasing check at runtime and enables a more efficient
  lowering for the new `bufferization.dealloc` operation.
* Remove values from the 'memref' operand list when they are guaranteed to
  alias with at least one value in the 'retain' list and cannot alias with any
  other value in that list.

## Lower Deallocations Pass

The `-lower-deallocations` pass transforms all `bufferization.dealloc`
operations into `memref.dealloc` operations and may also insert operations from
the `scf`, `func`, and `arith` dialects to make deallocations conditional and
to check at runtime whether two MemRef values come from the same allocation
(when the `buffer-deallocation-simplification` pass wasn't able to determine
this statically).

The same lowering of the `bufferization.dealloc` operation is also part of the
`-convert-bufferization-to-memref` conversion pass, which also lowers all the
other operations of the bufferization dialect.

We distinguish multiple cases in this lowering pass to provide an overall more
efficient lowering. In the general case, a library function is created to avoid
quadratic code-size explosion (relative to the number of operands of the
dealloc operation). The specialized lowerings aim to avoid this library
function because it requires allocating auxiliary MemRefs of index values.

### Generic Lowering

A library function is generated to avoid code-size blow-up. On a high level,
the base pointer of each operand is extracted as an index value, stored into
specifically allocated MemRefs, and passed to the library function, which then
determines whether any of them come from the same original allocation. This
information is needed to avoid double-free situations and to correctly retain
the MemRef values in the `retained` list.

**Dealloc Operation Lowering**

This lowering supports all features the dealloc operation has to offer. It
computes the base pointer of each memref (as an index), stores it in a
new memref helper structure, and passes it to the helper function generated
in `buildDeallocationLibraryFunction`. The results are stored in two lists
(represented as MemRefs) of booleans passed as arguments. The first list
stores whether the corresponding MemRef should actually be deallocated; the
second list stores the ownership of the retained values, which can be used
to replace the result values of the `bufferization.dealloc` operation.

Example:
```mlir
%0:2 = bufferization.dealloc (%m0, %m1 : memref<2xf32>, memref<5xf32>)
                          if (%cond0, %cond1)
                      retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
```
lowers to (simplified):
```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%dealloc_base_pointer_list = memref.alloc() : memref<2xindex>
%cond_list = memref.alloc() : memref<2xi1>
%retain_base_pointer_list = memref.alloc() : memref<2xindex>
%m0_base_pointer = memref.extract_aligned_pointer_as_index %m0
memref.store %m0_base_pointer, %dealloc_base_pointer_list[%c0]
%m1_base_pointer = memref.extract_aligned_pointer_as_index %m1
memref.store %m1_base_pointer, %dealloc_base_pointer_list[%c1]
memref.store %cond0, %cond_list[%c0]
memref.store %cond1, %cond_list[%c1]
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
memref.store %r0_base_pointer, %retain_base_pointer_list[%c0]
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
memref.store %r1_base_pointer, %retain_base_pointer_list[%c1]
%dyn_dealloc_base_pointer_list = memref.cast %dealloc_base_pointer_list :
    memref<2xindex> to memref<?xindex>
%dyn_cond_list = memref.cast %cond_list : memref<2xi1> to memref<?xi1>
%dyn_retain_base_pointer_list = memref.cast %retain_base_pointer_list :
    memref<2xindex> to memref<?xindex>
%dealloc_cond_out = memref.alloc() : memref<2xi1>
%ownership_out = memref.alloc() : memref<2xi1>
%dyn_dealloc_cond_out = memref.cast %dealloc_cond_out :
    memref<2xi1> to memref<?xi1>
%dyn_ownership_out = memref.cast %ownership_out :
    memref<2xi1> to memref<?xi1>
call @dealloc_helper(%dyn_dealloc_base_pointer_list,
                     %dyn_retain_base_pointer_list,
                     %dyn_cond_list,
                     %dyn_dealloc_cond_out,
                     %dyn_ownership_out) : (...)
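// The helper function (generated below under "Library function") fills
// %dyn_dealloc_cond_out with the final deallocation condition for each MemRef
// to deallocate and %dyn_ownership_out with the updated ownership for each
// retained value.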
%m0_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c0] : memref<2xi1>
scf.if %m0_dealloc_cond {
  memref.dealloc %m0 : memref<2xf32>
}
%m1_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c1] : memref<2xi1>
scf.if %m1_dealloc_cond {
  memref.dealloc %m1 : memref<5xf32>
}
%r0_ownership = memref.load %dyn_ownership_out[%c0] : memref<2xi1>
%r1_ownership = memref.load %dyn_ownership_out[%c1] : memref<2xi1>
memref.dealloc %dealloc_base_pointer_list : memref<2xindex>
memref.dealloc %retain_base_pointer_list : memref<2xindex>
memref.dealloc %cond_list : memref<2xi1>
memref.dealloc %dealloc_cond_out : memref<2xi1>
memref.dealloc %ownership_out : memref<2xi1>
// replace %0#0 with %r0_ownership
// replace %0#1 with %r1_ownership
```

**Library function**

A library function is built per compilation unit that can be called at
`bufferization.dealloc` sites to determine whether MemRefs come from the same
allocation and to compute the new ownerships of the retained values.

The generated function takes two MemRefs of indices and three MemRefs of
booleans as arguments:
* The first argument A should contain the result of the
  `memref.extract_aligned_pointer_as_index` operation applied to the MemRefs
  to be deallocated.
* The second argument B should contain the result of the
  `memref.extract_aligned_pointer_as_index` operation applied to the MemRefs
  to be retained.
* The third argument C should contain the conditions as passed directly
  to the deallocation operation.
* The fourth argument D is used to pass results to the caller. These
  represent the condition under which the MemRef at the corresponding
  position in A should be deallocated.
* The fifth argument E is used to pass results to the caller. It
  provides the ownership value corresponding to the MemRef at the same
  position in B.

This helper function is supposed to be called once for each
`bufferization.dealloc` operation to determine the deallocation need and the
new ownership indicators for the retained values, but it does not perform the
deallocation itself.

Generated code:
```mlir
func.func @dealloc_helper(
    %dyn_dealloc_base_pointer_list: memref<?xindex>,
    %dyn_retain_base_pointer_list: memref<?xindex>,
    %dyn_cond_list: memref<?xi1>,
    %dyn_dealloc_cond_out: memref<?xi1>,
    %dyn_ownership_out: memref<?xi1>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %true = arith.constant true
  %false = arith.constant false
  %num_dealloc_memrefs = memref.dim %dyn_dealloc_base_pointer_list, %c0
  %num_retain_memrefs = memref.dim %dyn_retain_base_pointer_list, %c0
  // Zero initialize result buffer.
  scf.for %i = %c0 to %num_retain_memrefs step %c1 {
    memref.store %false, %dyn_ownership_out[%i] : memref<?xi1>
  }
  scf.for %i = %c0 to %num_dealloc_memrefs step %c1 {
    %dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%i]
    %cond = memref.load %dyn_cond_list[%i]
    // Check for aliasing with retained memrefs.
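    // If the base pointers are equal, the dealloc condition is ORed into the
    // retained value's ownership; the loop also aggregates whether the current
    // MemRef aliases none of the retained values.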
    %does_not_alias_retained = scf.for %j = %c0 to %num_retain_memrefs
        step %c1 iter_args(%does_not_alias_aggregated = %true) -> (i1) {
      %retain_bp = memref.load %dyn_retain_base_pointer_list[%j]
      %does_alias = arith.cmpi eq, %retain_bp, %dealloc_bp : index
      scf.if %does_alias {
        %curr_ownership = memref.load %dyn_ownership_out[%j]
        %updated_ownership = arith.ori %curr_ownership, %cond : i1
        memref.store %updated_ownership, %dyn_ownership_out[%j]
      }
      %does_not_alias = arith.cmpi ne, %retain_bp, %dealloc_bp : index
      %updated_aggregate = arith.andi %does_not_alias_aggregated,
                                      %does_not_alias : i1
      scf.yield %updated_aggregate : i1
    }
    // Check for aliasing with dealloc memrefs in the list before the
    // current one, i.e.,
    // `fix i, forall j < i: check_aliasing(%dyn_dealloc_base_pointer[j],
    // %dyn_dealloc_base_pointer[i])`
    %does_not_alias_any = scf.for %j = %c0 to %i step %c1
        iter_args(%does_not_alias_agg = %does_not_alias_retained) -> (i1) {
      %prev_dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%j]
      %does_not_alias = arith.cmpi ne, %prev_dealloc_bp, %dealloc_bp
      %updated_alias_agg = arith.andi %does_not_alias_agg, %does_not_alias
      scf.yield %updated_alias_agg : i1
    }
    %dealloc_cond = arith.andi %does_not_alias_any, %cond : i1
    memref.store %dealloc_cond, %dyn_dealloc_cond_out[%i] : memref<?xi1>
  }
  return
}
```

### Specialized Lowerings

Currently, there are two special lowerings for common cases to avoid the
library function and thus unnecessary memory load and store operations and
function calls:

**One memref, no retained**

This lowers the simple case of a single MemRef without any retained values.
Ideally, static analysis can provide enough information such that the
`buffer-deallocation-simplification` pass is able to split the dealloc
operations up into this simple case as much as possible before running this
pass.

Example:
```mlir
bufferization.dealloc (%arg0 : memref<2xf32>) if (%arg1)
```
is lowered to
```mlir
scf.if %arg1 {
  memref.dealloc %arg0 : memref<2xf32>
}
```

In most cases, the branch condition is either constant 'true' or 'false' and
can thus be optimized away entirely by the canonicalizer pass.

**One memref, arbitrarily many retained**

A special-case lowering for the deallocation operation with exactly one MemRef,
but an arbitrary number of retained values. The size of the code produced by
this lowering is linear in the number of retained values.

Example:
```mlir
%0:2 = bufferization.dealloc (%m : memref<2xf32>) if (%cond)
                      retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
return %0#0, %0#1 : i1, i1
```
is lowered to
```mlir
%m_base_pointer = memref.extract_aligned_pointer_as_index %m
%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
%r0_does_not_alias = arith.cmpi ne, %m_base_pointer, %r0_base_pointer
%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
%r1_does_not_alias = arith.cmpi ne, %m_base_pointer, %r1_base_pointer
%not_retained = arith.andi %r0_does_not_alias, %r1_does_not_alias : i1
%should_dealloc = arith.andi %not_retained, %cond : i1
scf.if %should_dealloc {
  memref.dealloc %m : memref<2xf32>
}
%true = arith.constant true
%r0_does_alias = arith.xori %r0_does_not_alias, %true : i1
%r0_ownership = arith.andi %r0_does_alias, %cond : i1
%r1_does_alias = arith.xori %r1_does_not_alias, %true : i1
%r1_ownership = arith.andi %r1_does_alias, %cond : i1
return %r0_ownership, %r1_ownership : i1, i1
```