xref: /llvm-project/mlir/docs/OwnershipBasedBufferDeallocation.md (revision 2a4e61b342e7a19bcb229ef16dee083c58224a3f)
1# Ownership-based Buffer Deallocation
2
3[TOC]
4
5One-Shot Bufferize does not deallocate any buffers that it allocates. After
6running One-Shot Bufferize, the resulting IR may have a number of `memref.alloc`
7ops, but no `memref.dealloc` ops. Buffer dellocation is delegated to the
8`-ownership-based-buffer-deallocation` pass. This pass supersedes the now
9deprecated `-buffer-deallocation` pass, which does not work well with
10One-Shot Bufferize.
11
12On a high level, buffers are "owned" by a basic block. Ownership materializes
13as an `i1` SSA value and can be thought of as "responsibility to deallocate". It
14is conceptually similar to `std::unique_ptr` in C++.
15
16There are few additional preprocessing and postprocessing passes that should be
17run together with the ownership-based buffer deallocation pass. The recommended
18compilation pipeline is as follows:
19
20```
21one-shot-bufferize
22       |          it's recommended to perform all bufferization here at latest,
23       |       <- any allocations inserted after this point have to be handled
24       V          manually
25expand-realloc
26       V
27ownership-based-buffer-deallocation
28       V
29  canonicalize <- mostly for scf.if simplifications
30       V
31buffer-deallocation-simplification
32       V       <- from this point onwards no tensor values are allowed
33lower-deallocations
34       V
35      CSE
36       V
37  canonicalize
38```
39
40The entire deallocation pipeline (excluding `-one-shot-bufferize`) is exposed
41as `-buffer-deallocation-pipeline`.
42
43The ownership-based buffer deallocation pass processes operations implementing
44`FunctionOpInterface` one-by-one without analysing the call-graph.
45This means that there have to be [some rules](#function-boundary-abi) on how
46MemRefs are handled when being passed from one function to another. The rest of
47the pass revolves heavily around the `bufferization.dealloc` operation which is
48inserted at the end of each basic block with appropriate operands and should be
49optimized using the Buffer Deallocation Simplification pass
50(`--buffer-deallocation-simplification`) and the regular canonicalizer
51(`--canonicalize`). Lowering the result of the
52`-ownership-based-buffer-deallocation` pass directly using
53`--convert-bufferization-to-memref` without beforehand optimization is not
54recommended as it will lead to very inefficient code (the runtime-cost of
55`bufferization.dealloc` is `O(|memrefs|^2+|memref|*|retained|)`).
56
57## Function boundary ABI
58
59The Buffer Deallocation pass operates on the level of operations implementing
60the `FunctionOpInterface`. Such operations can take MemRefs as arguments, but
61also return them. To ensure compatibility among all functions (including
62external ones), some rules have to be enforced:
63*   When a MemRef is passed as a function argument, ownership is never acquired.
64    It is always the caller's responsibility to deallocate such MemRefs.
65*   Returning a MemRef from a function always passes ownership to the caller,
66    i.e., it is also the caller's responsibility to deallocate memrefs returned
67    from a called function.
68*   A function must not return a MemRef with the same allocated base buffer as
69    one of its arguments (in this case a copy has to be created). Note that in
70    this context two subviews of the same buffer that don't overlap are also
71    considered to alias.
72
73For external functions (e.g., library functions written externally in C), the
74externally provided implementation has to adhere to these rules and they are
75just assumed by the buffer deallocation pass. Functions on which the
76deallocation pass is applied and for which the implementation is accessible are
77modified by the pass such that the ABI is respected (i.e., buffer copies are
78inserted when necessary).
79
80## Inserting `bufferization.dealloc` operations
81
82`bufferization.dealloc` and ownership indicators are the main abstractions in
83the ownership-based buffer deallocation pass. `bufferization.dealloc`
84deallocates all given buffers if the respective ownership indicator is set and
85there is no aliasing buffer in the retain list.
86
87![branch_example_pre_move](/includes/img/bufferization_dealloc_op.svg)
88
89`bufferization.dealloc` operations are unconditionally inserted at the end of
90each basic block (just before the terminator). The majority of the pass is about
91finding the correct operands for this operation. There are three variadic
92operand lists to be populated, the first contains all MemRef values that may
93need to be deallocated, the second list contains their associated ownership
94values (of `i1` type), and the third list contains MemRef values that are still
95needed at a later point and should thus not be deallocated (e.g., yielded or
96returned buffers).
97
98`bufferization.dealloc` allows us to deal with any kind of aliasing behavior: it
99lowers to runtime aliasing checks when not enough information can be collected
100statically. When enough aliasing information is statically available, operands
101or the entire op may fold away.
102
103**Ownerships**
104
105To do so, we use a concept of ownership indicators of memrefs which materialize
106as an `i1` value for any SSA value of `memref` type, indicating whether the
107basic block in which it was materialized has ownership of this MemRef. Ideally,
108this is a constant `true` or `false`, but might also be a non-constant SSA
109value. To keep track of those ownership values without immediately materializing
110them (which might require insertion of `bufferization.clone` operations or
111operations checking for aliasing at runtime at positions where we don't actually
112need a materialized value), we use the `Ownership` class. This class represents
113the ownership in three states forming a lattice on a partial order:
114```
115forall X in SSA values. uninitialized < unique(X) < unknown
116forall X, Y in SSA values.
117  unique(X) == unique(Y) iff X and Y always evaluate to the same value
118  unique(X) != unique(Y) otherwise
119```
120Intuitively, the states have the following meaning:
121*   Uninitialized: the ownership is not initialized yet, this is the default
122    state; once an operation is finished processing the ownership of all
123    operation results with MemRef type should not be uninitialized anymore.
124*   Unique: there is a specific SSA value that can be queried to check ownership
125    without materializing any additional IR
126*   Unknown: no specific SSA value is available without materializing additional
127    IR, typically this is because two ownerships in 'Unique' state would have to
128    be merged manually (e.g., the result of an `arith.select` either has the
129    ownership of the then or else case depending on the condition value,
130    inserting another `arith.select` for the ownership values can perform the
131    merge and provide a 'Unique' ownership for the result), however, in the
132    general case this 'Unknown' state has to be assigned.
133
134Implied by the above partial order, the pass combines two ownerships in the
135following way:
136
137| Ownership 1   | Ownership 2   | Combined Ownership |
138|:--------------|:--------------|:-------------------|
139| uninitialized | uninitialized | uninitialized      |
140| unique(X)     | uninitialized | unique(X)          |
141| unique(X)     | unique(X)     | unique(X)          |
142| unique(X)     | unique(Y)     | unknown            |
143| unknown       | unique        | unknown            |
144| unknown       | uninitialized | unknown            |
145| <td colspan=3> + symmetric cases                   |
146
147**Collecting the list of MemRefs that potentially need to be deallocated**
148
149For a given block, the list of MemRefs that potentially need to be deallocated
150at the end of that block is computed by keeping track of all values for which
151the block potentially takes over ownership. This includes MemRefs provided as
152basic block arguments, interface handlers for operations like `memref.alloc` and
153`func.call`, but also liveness information in regions with multiple basic
154blocks.  More concretely, it is computed by taking the MemRefs in the 'in' set
155of the liveness analysis of the current basic block B, appended by the MemRef
156block arguments and by the set of MemRefs allocated in B itself (determined by
157the interface handlers), then subtracted (also determined by the interface
158handlers) by the set of MemRefs deallocated in B.
159
160Note that we don't have to take the intersection of the liveness 'in' set with
161the 'out' set of the predecessor block because a value that is in the 'in' set
162must be defined in an ancestor block that dominates all direct predecessors and
163thus the 'in' set of this block is a subset of the 'out' sets of each
164predecessor.
165
166```
167memrefs = filter((liveIn(block) U
168  allocated(block) U arguments(block)) \ deallocated(block), isMemRef)
169```
170
171The list of conditions for the second variadic operands list of
172`bufferization.dealloc` is computed by querying the stored ownership value for
173each of the MemRefs collected as described above. The ownership state is updated
174by the interface handlers while processing the basic block.
175
176**Collecting the list of MemRefs to retain**
177
178Given a basic block B, the list of MemRefs that have to be retained can be
179different for each successor block S.  For the two basic blocks B and S and the
180values passed via block arguments to the destination block S, we compute the
181list of MemRefs that have to be retained in B by taking the MemRefs in the
182successor operand list of the terminator and the MemRefs in the 'out' set of the
183liveness analysis for B intersected with the 'in' set of the destination block
184S.
185
186This list of retained values makes sure that we cannot run into use-after-free
187situations even if no aliasing information is present at compile-time.
188
189```
190toRetain = filter(successorOperands + (liveOut(fromBlock) insersect
191  liveIn(toBlock)), isMemRef)
192```
193
194## Supported interfaces
195
196The pass uses liveness analysis and a few interfaces:
197*   `FunctionOpInterface`
198*   `CallOpInterface`
199*   `MemoryEffectOpInterface`
200*   `RegionBranchOpInterface`
201*   `RegionBranchTerminatorOpInterface`
202
203Due to insufficient information provided by the interface, it also special-cases
204on the `cf.cond_br` operation and makes some assumptions about operations
205implementing the `RegionBranchOpInterface` at the moment, but improving the
206interfaces would allow us to remove those dependencies in the future.
207
208## Limitations
209
210The Buffer Deallocation pass has some requirements and limitations on the input
211IR. These are checked in the beginning of the pass and errors are emitted
212accordingly:
213*   The set of interfaces the pass operates on must be implemented (correctly).
214    E.g., if there is an operation present with a nested region, but does not
215    implement the `RegionBranchOpInterface`, an error is emitted because the
216    pass cannot know the semantics of the nested region (and does not make any
217    default assumptions on it).
218*   No explicit control-flow loops are present. Currently, only loops using
219    structural-control-flow are supported.  However, this limitation could be
220    lifted in the future.
221*   Deallocation operations should not be present already. The pass should
222    handle them correctly already (at least in most cases), but it's not
223    supported yet due to insufficient testing.
224*   Terminators must implement either `RegionBranchTerminatorOpInterface` or
225    `BranchOpInterface`, but not both. Terminators with more than one successor
226    are not supported (except `cf.cond_br`). This is not a fundamental
227    limitation, but there is no use-case justifying the more complex
228    implementation at the moment.
229
230## Example
231
232The following example contains a few interesting cases:
233*   Basic block arguments are modified to also pass along the ownership
234    indicator, but not for entry blocks, where the function boundary ABI
235    is applied instead.
236*   The result of `arith.select` initially has 'Unknown' assigned as ownership,
237    but once the `bufferization.dealloc` operation is inserted it is put in the
238    'retained' list (since it has uses in a later basic block) and thus the
239    'Unknown' ownership can be replaced with a 'Unique' ownership using the
240    corresponding result of the dealloc operation.
241*   The `cf.cond_br` operation has more than one successor and thus has to
242    insert two `bufferization.dealloc` operations (one for each successor).
243    While they have the same list of MemRefs to deallocate (because they perform
244    the deallocations for the same block), it must be taken into account that
245    some MemRefs remain *live* for one branch but not the other (thus set
246    intersection is performed on the *live-out* of the current block and the
247    *live-in* of the target block). Also, `cf.cond_br` supports separate
248    forwarding operands for each successor. To make sure that no MemRef is
249    deallocated twice (because there are two `bufferization.dealloc` operations
250    with the same MemRefs to deallocate), the condition operands are adjusted to
251    take the branch condition into account. While a generic lowering for such
252    terminator operations could be implemented, a specialized implementation can
253    take all the semantics of this particular operation into account and thus
254    generate a more efficient lowering.
255
256```mlir
257func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
258  %alloc = memref.alloc() : memref<?xi8>
259  %alloca = memref.alloca() : memref<?xi8>
260  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
261  cf.cond_br %br_cond, ^bb1(%alloc : memref<?xi8>), ^bb1(%memref : memref<?xi8>)
262^bb1(%bbarg: memref<?xi8>):
263  test.copy(%bbarg, %select) : (memref<?xi8>, memref<?xi8>)
264  return
265}
266```
267
268After running `--ownership-based-buffer-deallocation`, it looks as follows:
269
270```mlir
271// Function boundary ABI: ownership of `%memref` will never be acquired.
272func.func @example(%memref: memref<?xi8>, %select_cond: i1, %br_cond: i1) {
273  %false = arith.constant false
274  %true = arith.constant true
275
276  // The ownership of a MemRef defined by the `memref.alloc` operation is always
277  // assigned to be 'true'.
278  %alloc = memref.alloc() : memref<?xi8>
279
280  // The ownership of a MemRef defined by the `memref.alloca` operation is
281  // always assigned to be 'false'.
282  %alloca = memref.alloca() : memref<?xi8>
283
284  // The ownership of %select will be the join of the ownership of %alloc and
285  // the ownership of %alloca, i.e., of %true and %false. Because the pass does
286  // not know about the semantics of the `arith.select` operation (unless a
287  // custom handler is implemented), the ownership join will be 'Unknown'. If
288  // the materialized ownership indicator of %select is needed, either a clone
289  // has to be created for which %true is assigned as ownership or the result
290  // of a `bufferization.dealloc` where %select is in the retain list has to be
291  // used.
292  %select = arith.select %select_cond, %alloc, %alloca : memref<?xi8>
293
294  // We use `memref.extract_strided_metadata` to get the base memref since it is
295  // not allowed to pass arbitrary memrefs to `memref.dealloc`. This property is
296  // already enforced for `bufferization.dealloc`
297  %base_buffer_memref, ... = memref.extract_strided_metadata %memref
298    : memref<?xi8> -> memref<i8>, index, index, index
299  %base_buffer_alloc, ... = memref.extract_strided_metadata %alloc
300    : memref<?xi8> -> memref<i8>, index, index, index
301  %base_buffer_alloca, ... = memref.extract_strided_metadata %alloca
302    : memref<?xi8> -> memref<i8>, index, index, index
303
304  // The deallocation conditions need to be adjusted to incorporate the branch
305  // condition. In this example, this requires only a single negation, but might
306  // also require multiple arith.andi operations.
307  %not_br_cond = arith.xori %true, %br_cond : i1
308
309  // There are two dealloc operations inserted in this basic block, one per
310  // successor. Both have the same list of MemRefs to deallocate and the
311  // conditions only differ by the branch condition conjunct.
312  // Note, however, that the retained list differs. Here, both contain the
313  // %select value because it is used in both successors (since it's the same
314  // block), but the value passed via block argument differs (%memref vs.
315  // %alloc).
316  %10:2 = bufferization.dealloc
317           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
318             : memref<i8>, memref<i8>, memref<i8>)
319        if (%false, %br_cond, %false)
320    retain (%alloc, %select : memref<?xi8>, memref<?xi8>)
321
322  %11:2 = bufferization.dealloc
323           (%base_buffer_memref, %base_buffer_alloc, %base_buffer_alloca
324             : memref<i8>, memref<i8>, memref<i8>)
325        if (%false, %not_br_cond, %false)
326    retain (%memref, %select : memref<?xi8>, memref<?xi8>)
327
328  // Because %select is used in ^bb1 without passing it via block argument, we
329  // need to update it's ownership value here by merging the ownership values
330  // returned by the dealloc operations
331  %new_ownership = arith.select %br_cond, %10#1, %11#1 : i1
332
333  // The terminator is modified to pass along the ownership indicator values
334  // with each MemRef value.
335  cf.cond_br %br_cond, ^bb1(%alloc, %10#0 : memref<?xi8>, i1),
336                       ^bb1(%memref, %11#0 : memref<?xi8>, i1)
337
338// All non-entry basic blocks are modified to have an additional i1 argument for
339// each MemRef value in the argument list.
340^bb1(%13: memref<?xi8>, %14: i1):  // 2 preds: ^bb0, ^bb0
341  test.copy(%13, %select) : (memref<?xi8>, memref<?xi8>)
342
343  %base_buffer_13, ... = memref.extract_strided_metadata %13
344    : memref<?xi8> -> memref<i8>, index, index, index
345  %base_buffer_select, ... = memref.extract_strided_metadata %select
346    : memref<?xi8> -> memref<i8>, index, index, index
347
348  // Here, we don't have a retained list, because the block has no successors
349  // and the return has no operands.
350  bufferization.dealloc (%base_buffer_13, %base_buffer_select
351                          : memref<i8>, memref<i8>)
352                     if (%14, %new_ownership)
353  return
354}
355```
356
357## Buffer Deallocation Simplification Pass
358
359The [semantics of the `bufferization.dealloc` operation](#bufferizationdealloc-bufferizationdeallocop)
360provide a lot of opportunities for optimizations which can be conveniently split
361into patterns using the greedy pattern rewriter. Some of those patterns need
362access to additional analyses such as an analysis that can determine whether two
363MemRef values must, may, or never originate from the same buffer allocation.
364These patterns are collected in the Buffer Deallocation Simplification pass,
365while patterns that don't need additional analyses are registered as part of the
366regular canonicalizer pass. This pass is best run after
367`--ownership-based-buffer-deallocation` followed by `--canonicalize`.
368
369The pass applies patterns for the following simplifications:
370*   Remove MemRefs from retain list when guaranteed to not alias with any value
371    in the 'memref' operand list. This avoids an additional aliasing check with
372    the removed value.
373*   Split off values in the 'memref' list to new `bufferization.dealloc`
374    operations only containing this value in the 'memref' list when it is
375    guaranteed to not alias with any other value in the 'memref' list. This
376    avoids at least one aliasing check at runtime and enables using a more
377    efficient lowering for this new `bufferization.dealloc` operation.
378*   Remove values from the 'memref' operand list when it is guaranteed to alias
379    with at least one value in the 'retained' list and may not alias any other
380    value in the 'retain' list.
381
382## Lower Deallocations Pass
383
384The `-lower-deallocations` pass transforms all `bufferization.dealloc`
385operations to `memref.dealloc` operations and may also insert operations from
386the `scf`, `func`, and `arith` dialects to make deallocations conditional and
387check whether two MemRef values come from the same allocation at runtime (when
388the `buffer-deallocation-simplification` pass wasn't able to determine it
389statically).
390
391The same lowering of the `bufferization.dealloc` operation is also part of the
392`-convert-bufferization-to-memref` conversion pass which also lowers all the
393other operations of the bufferization dialect.
394
395We distinguish multiple cases in this lowering pass to provide an overall more
396efficient lowering. In the general case, a library function is created to avoid
397quadratic code size explosion (relative to the number of operands of the dealloc
398operation). The specialized lowerings aim to avoid this library function because
399it requires allocating auxiliary MemRefs of index values.
400
401### Generic Lowering
402
403A library function is generated to avoid code-size blow-up. On a high level, the
404base-memref of all operands is extracted as an index value and stored into
405specifically allocated MemRefs and passed to the library function which then
406determines whether they come from the same original allocation. This information
407is needed to avoid double-free situations and to correctly retain the MemRef
408values in the `retained` list.
409
410**Dealloc Operation Lowering**
411
412This lowering supports all features the dealloc operation has to offer. It
413computes the base pointer of each memref (as an index), stores it in a
414new memref helper structure and passes it to the helper function generated
415in `buildDeallocationLibraryFunction`. The results are stored in two lists
416(represented as MemRefs) of booleans passed as arguments. The first list
417stores whether the corresponding condition should be deallocated, the
418second list stores the ownership of the retained values which can be used
419to replace the result values of the `bufferization.dealloc` operation.
420
421Example:
422```mlir
423%0:2 = bufferization.dealloc (%m0, %m1 : memref<2xf32>, memref<5xf32>)
424                          if (%cond0, %cond1)
425                      retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
426```
427lowers to (simplified):
428```mlir
429%c0 = arith.constant 0 : index
430%c1 = arith.constant 1 : index
431%dealloc_base_pointer_list = memref.alloc() : memref<2xindex>
432%cond_list = memref.alloc() : memref<2xi1>
433%retain_base_pointer_list = memref.alloc() : memref<2xindex>
434%m0_base_pointer = memref.extract_aligned_pointer_as_index %m0
435memref.store %m0_base_pointer, %dealloc_base_pointer_list[%c0]
436%m1_base_pointer = memref.extract_aligned_pointer_as_index %m1
437memref.store %m1_base_pointer, %dealloc_base_pointer_list[%c1]
438memref.store %cond0, %cond_list[%c0]
439memref.store %cond1, %cond_list[%c1]
440%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
441memref.store %r0_base_pointer, %retain_base_pointer_list[%c0]
442%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
443memref.store %r1_base_pointer, %retain_base_pointer_list[%c1]
444%dyn_dealloc_base_pointer_list = memref.cast %dealloc_base_pointer_list :
445   memref<2xindex> to memref<?xindex>
446%dyn_cond_list = memref.cast %cond_list : memref<2xi1> to memref<?xi1>
447%dyn_retain_base_pointer_list = memref.cast %retain_base_pointer_list :
448   memref<2xindex> to memref<?xindex>
449%dealloc_cond_out = memref.alloc() : memref<2xi1>
450%ownership_out = memref.alloc() : memref<2xi1>
451%dyn_dealloc_cond_out = memref.cast %dealloc_cond_out :
452   memref<2xi1> to memref<?xi1>
453%dyn_ownership_out = memref.cast %ownership_out :
454   memref<2xi1> to memref<?xi1>
455call @dealloc_helper(%dyn_dealloc_base_pointer_list,
456                     %dyn_retain_base_pointer_list,
457                     %dyn_cond_list,
458                     %dyn_dealloc_cond_out,
459                     %dyn_ownership_out) : (...)
460%m0_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c0] : memref<2xi1>
461scf.if %m0_dealloc_cond {
462  memref.dealloc %m0 : memref<2xf32>
463}
464%m1_dealloc_cond = memref.load %dyn_dealloc_cond_out[%c1] : memref<2xi1>
465scf.if %m1_dealloc_cond {
466  memref.dealloc %m1 : memref<5xf32>
467}
468%r0_ownership = memref.load %dyn_ownership_out[%c0] : memref<2xi1>
469%r1_ownership = memref.load %dyn_ownership_out[%c1] : memref<2xi1>
470memref.dealloc %dealloc_base_pointer_list : memref<2xindex>
471memref.dealloc %retain_base_pointer_list : memref<2xindex>
472memref.dealloc %cond_list : memref<2xi1>
473memref.dealloc %dealloc_cond_out : memref<2xi1>
474memref.dealloc %ownership_out : memref<2xi1>
475// replace %0#0 with %r0_ownership
476// replace %0#1 with %r1_ownership
477```
478
479**Library function**
480
481A library function is built per compilation unit that can be called at
482bufferization dealloc sites to determine whether two MemRefs come from the same
483allocation and their new ownerships.
484
485The generated function takes two MemRefs of indices and three MemRefs of
486booleans as arguments:
487  * The first argument A should contain the result of the
488  extract_aligned_pointer_as_index operation applied to the MemRefs to be
489  deallocated
490  * The second argument B should contain the result of the
491  extract_aligned_pointer_as_index operation applied to the MemRefs to be
492  retained
493  * The third argument C should contain the conditions as passed directly
494  to the deallocation operation.
495  * The fourth argument D is used to pass results to the caller. Those
496  represent the condition under which the MemRef at the corresponding
497  position in A should be deallocated.
498  * The fifth argument E is used to pass results to the caller. It
499  provides the ownership value corresponding the the MemRef at the same
500  position in B
501
502This helper function is supposed to be called once for each
503`bufferization.dealloc` operation to determine the deallocation need and
504new ownership indicator for the retained values, but does not perform the
505deallocation itself.
506
507Generated code:
508```mlir
509func.func @dealloc_helper(
510    %dyn_dealloc_base_pointer_list: memref<?xindex>,
511    %dyn_retain_base_pointer_list: memref<?xindex>,
512    %dyn_cond_list: memref<?xi1>,
513    %dyn_dealloc_cond_out: memref<?xi1>,
514    %dyn_ownership_out: memref<?xi1>) {
515  %c0 = arith.constant 0 : index
516  %c1 = arith.constant 1 : index
517  %true = arith.constant true
518  %false = arith.constant false
519  %num_dealloc_memrefs = memref.dim %dyn_dealloc_base_pointer_list, %c0
520  %num_retain_memrefs = memref.dim %dyn_retain_base_pointer_list, %c0
521  // Zero initialize result buffer.
522  scf.for %i = %c0 to %num_retain_memrefs step %c1 {
523    memref.store %false, %dyn_ownership_out[%i] : memref<?xi1>
524  }
525  scf.for %i = %c0 to %num_dealloc_memrefs step %c1 {
526    %dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%i]
527    %cond = memref.load %dyn_cond_list[%i]
528    // Check for aliasing with retained memrefs.
529    %does_not_alias_retained = scf.for %j = %c0 to %num_retain_memrefs
530        step %c1 iter_args(%does_not_alias_aggregated = %true) -> (i1) {
531      %retain_bp = memref.load %dyn_retain_base_pointer_list[%j]
532      %does_alias = arith.cmpi eq, %retain_bp, %dealloc_bp : index
533      scf.if %does_alias {
534        %curr_ownership = memref.load %dyn_ownership_out[%j]
535        %updated_ownership = arith.ori %curr_ownership, %cond : i1
536        memref.store %updated_ownership, %dyn_ownership_out[%j]
537      }
538      %does_not_alias = arith.cmpi ne, %retain_bp, %dealloc_bp : index
539      %updated_aggregate = arith.andi %does_not_alias_aggregated,
540                                      %does_not_alias : i1
541      scf.yield %updated_aggregate : i1
542    }
543    // Check for aliasing with dealloc memrefs in the list before the
544    // current one, i.e.,
545    // `fix i, forall j < i: check_aliasing(%dyn_dealloc_base_pointer[j],
546    // %dyn_dealloc_base_pointer[i])`
547    %does_not_alias_any = scf.for %j = %c0 to %i step %c1
548       iter_args(%does_not_alias_agg = %does_not_alias_retained) -> (i1) {
549      %prev_dealloc_bp = memref.load %dyn_dealloc_base_pointer_list[%j]
550      %does_not_alias = arith.cmpi ne, %prev_dealloc_bp, %dealloc_bp
551      %updated_alias_agg = arith.andi %does_not_alias_agg, %does_not_alias
552      scf.yield %updated_alias_agg : i1
553    }
554    %dealloc_cond = arith.andi %does_not_alias_any, %cond : i1
555    memref.store %dealloc_cond, %dyn_dealloc_cond_out[%i] : memref<?xi1>
556  }
557  return
558}
559```
560
561### Specialized Lowerings
562
563Currently, there are two special lowerings for common cases to avoid the library
564function and thus unnecessary memory load and store operations and function
565calls:
566
567**One memref, no retained**
568
569Lower a simple case without any retained values and a single MemRef. Ideally,
570static analysis can provide enough information such that the
571`buffer-deallocation-simplification` pass is able to split the dealloc
572operations up into this simple case as much as possible before running this
573pass.
574
575Example:
576```mlir
577bufferization.dealloc (%arg0 : memref<2xf32>) if (%arg1)
578```
579is lowered to
580```mlir
581scf.if %arg1 {
582  memref.dealloc %arg0 : memref<2xf32>
583}
584```
585
586In most cases, the branch condition is either constant 'true' or 'false' and can
587thus be optimized away entirely by the canonicalizer pass.
588
589**One memref, arbitrarily many retained**
590
591A special case lowering for the deallocation operation with exactly one MemRef,
592but an arbitrary number of retained values. The size of the code produced by
593this lowering is linear to the number of retained values.
594
595Example:
596```mlir
597%0:2 = bufferization.dealloc (%m : memref<2xf32>) if (%cond)
598                      retain (%r0, %r1 : memref<1xf32>, memref<2xf32>)
599return %0#0, %0#1 : i1, i1
600```
601is lowered to
602```mlir
603%m_base_pointer = memref.extract_aligned_pointer_as_index %m
604%r0_base_pointer = memref.extract_aligned_pointer_as_index %r0
605%r0_does_not_alias = arith.cmpi ne, %m_base_pointer, %r0_base_pointer
606%r1_base_pointer = memref.extract_aligned_pointer_as_index %r1
607%r1_does_not_alias = arith.cmpi ne, %m_base_pointer, %r1_base_pointer
608%not_retained = arith.andi %r0_does_not_alias, %r1_does_not_alias : i1
609%should_dealloc = arith.andi %not_retained, %cond : i1
610scf.if %should_dealloc {
611  memref.dealloc %m : memref<2xf32>
612}
613%true = arith.constant true
614%r0_does_alias = arith.xori %r0_does_not_alias, %true : i1
615%r0_ownership = arith.andi %r0_does_alias, %cond : i1
616%r1_does_alias = arith.xori %r1_does_not_alias, %true : i1
617%r1_ownership = arith.andi %r1_does_alias, %cond : i1
618return %r0_ownership, %r1_ownership : i1, i1
619```
620