xref: /llvm-project/llvm/docs/MemoryModelRelaxationAnnotations.rst (revision cf328ff96daf5e676fb51ac86e550af7fd689fec)
1===================================
2Memory Model Relaxation Annotations
3===================================
4
5.. contents::
6   :local:
7
8Introduction
9============
10
11Memory Model Relaxation Annotations (MMRAs) are target-defined properties
12on instructions that can be used to selectively relax constraints placed
13by the memory model. For example:
14
15* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
16  memory operations to be reordered across ``acquire`` or ``release``
17  operations.
18* OpenCL APIs expose primitives to only fence a specific set of address
19  spaces. Carrying that information to the backend can enable the
20  use of faster synchronization instructions, rather than fencing all
21  address spaces everytime.
22
23MMRAs offer an opt-in system for targets to relax the default LLVM
24memory model.
25As such, they are attached to an operation using LLVM metadata which
26can always be dropped without affecting correctness.
27
28Definitions
29===========
30
31memory operation
32    A load, a store, an atomic, or a function call that is marked as
33    accessing memory.
34
35synchronizing operation
36    An instruction that synchronizes memory with other threads (e.g.
37    an atomic or a fence).
38
39tag
40    Metadata attached to a memory or synchronizing operation
41    that represents some target-defined property regarding memory
42    synchronization.
43
44    An operation may have multiple tags that each represent a different
45    property.
46
47    A tag is composed of a pair of metadata string: a *prefix* and a *suffix*.
48
49    In LLVM IR, the pair is represented using a metadata tuple.
50    In other cases (comments, documentation, etc.), we may use the
51    ``prefix:suffix`` notation.
52    For example:
53
54    .. code-block::
55      :caption: Example: Tags in Metadata
56
57      !0 = !{!"scope", !"workgroup"}  # scope:workgroup
58      !1 = !{!"scope", !"device"}     # scope:device
59      !2 = !{!"scope", !"system"}     # scope:system
60
61    .. note::
62
63      The only semantics relevant to the optimizer is the
64      "compatibility" relation defined below. All other
65      semantics are target defined.
66
67    Tags can also be organised in lists to allow operations
68    to specify all of the tags they belong to. Such a list
69    is referred to as a "set of tags".
70
71    .. code-block::
72      :caption: Example: Set of Tags in Metadata
73
74      !0 = !{!"scope", !"workgroup"}
75      !1 = !{!"sync-as", !"private"}
76      !2 = !{!0, !2}
77
78    .. note::
79
80      If an operation does not have MMRA metadata, it's treated as if
81      it has an empty list (``!{}``) of tags.
82
83    Note that it is not an error if a tag is not recognized by the
84    instruction it is applied to, or by the current target.
85    Such tags are simply ignored.
86
87    Both synchronizing operations and memory operations can have
88    zero or more tags attached to them using the ``!mmra`` syntax.
89
90    For the sake of readability in examples below,
91    we use a (non-functional) short syntax to represent MMMRA metadata:
92
93    .. code-block::
94      :caption: Short Syntax Example
95
96      store %ptr1 # foo:bar
97      store %ptr1 !mmra !{!"foo", !"bar"}
98
99    These two notations can be used in this document and are strictly
100    equivalent. However, only the second version is functional.
101
102compatibility
103    Two sets of tags are said to be *compatible* iff, for every unique
104    tag prefix P present in at least one set:
105
106    - the other set contains no tag with prefix P, or
107    - at least one tag with prefix P is common to both sets.
108
109    The above definition implies that an empty set is always compatible
110    with any other set. This is an important property as it ensures that
111    if a transform drops the metadata on an operation, it can never affect
112    correctness. In other words, the memory model cannot be relaxed further
113    by deleting metadata from instructions.
114
115.. _HappensBefore:
116
117The *happens-before* Relation
118==============================
119
120Compatibility checks can be used to opt out of the *happens-before* relation
121established between two instructions.
122
123Ordering
124    When two instructions' metadata are not compatible, any program order
125    between them are not in *happens-before*.
126
127    For example, consider two tags ``foo:bar`` and
128    ``foo:baz`` exposed by a target:
129
130    .. code-block::
131
132       A: store %ptr1                 # foo:bar
133       B: store %ptr2                 # foo:baz
134       X: store atomic release %ptr3  # foo:bar
135
136    In the above figure, ``A`` is compatible with ``X``, and hence ``A``
137    happens-before ``X``. But ``B`` is not compatible with
138    ``X``, and hence it is not happens-before ``X``.
139
140Synchronization
141    If an synchronizing operation has one or more tags, then whether it
142    synchronizes-with and participates in the  ``seq_cst`` order with
143    other operations is target dependent.
144
145    Whether the following example synchronizes with another sequence depends
146    on the target-defined semantics of ``foo:bar`` and ``foo:bux``.
147
148    .. code-block::
149
150       fence release               # foo:bar
151       store atomic %ptr1          # foo:bux
152
153Examples
154--------
155
156Example 1:
157    .. code-block::
158
159      A: store ptr addrspace(1) %ptr2                  # sync-as:1 vulkan:nonprivate
160      B: store atomic release ptr addrspace(1) %ptr3   # sync-as:0 vulkan:nonprivate
161
162    A and B are not ordered relative to each other
163    (no *happens-before*) because their sets of tags are not compatible.
164
165    Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
166    e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
167    only synchronize with operations happening in ``addrspace(0)``.
168
169Example 2:
170    .. code-block::
171
172      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
173      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate
174
175    The ordering of A and B is unaffected because their set of tags are
176    compatible.
177
178    Note that A and B may or may not be in *happens-before* due to other reasons.
179
180Example 3:
181    .. code-block::
182
183      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
184      B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate
185
186    The ordering of A and B is unaffected because their set of tags are
187    compatible.
188
189Example 4:
190    .. code-block::
191
192      A: store ptr addrspace(1) %ptr2                 # sync-as:1
193      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2
194
195    A and B do not have to be ordered relative to each other
196    (no *happens-before*) because their sets of tags are not compatible.
197
198Use-cases
199=========
200
201SPIRV ``NonPrivatePointer``
202---------------------------
203
204MMRAs can support the SPIRV capability
205``VulkanMemoryModel``, where synchronizing operations only affect
206memory operations that specify ``NonPrivatePointer`` semantics.
207
208The example below is generated from a SPIRV program using the
209following recipe:
210
211- Add ``vulkan:nonprivate`` to every synchronizing operation.
212- Add ``vulkan:nonprivate`` to every non-atomic memory operation
213  that is marked ``NonPrivatePointer``.
214- Add ``vulkan:private`` to tags of every non-atomic memory operation
215  that is not marked ``NonPrivatePointer``.
216
217.. code-block::
218
219   Thread T1:
220    A: store %ptr1                 # vulkan:nonprivate
221    B: store %ptr2                 # vulkan:private
222    X: store atomic release %ptr3  # vulkan:nonprivate
223
224   Thread T2:
225    Y: load atomic acquire %ptr3   # vulkan:nonprivate
226    C: load %ptr2                  # vulkan:private
227    D: load %ptr1                  # vulkan:nonprivate
228
229Compatibility ensures that operation ``A`` is ordered
230relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
231If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
232No such relation can be inferred about operations ``B`` and ``C``.
233
234.. note::
235   The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
236   considers all atomic operation non-private.
237
238   Whether ``vulkan:nonprivate`` would be specified on atomic operations is
239   an implementation detail, as an atomic operation is always ``nonprivate``.
240   The implementation may choose to be explicit and emit IR with
241   ``vulkan:nonprivate`` on every atomic operation, or it could choose to
242   only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
243   by default.
244
245Operations marked with ``vulkan:private`` effectively opt out of the
246happens-before order in a SPIRV program since they are incompatible
247with every synchronizing operation. Note that SPIRV operations that
248are not marked ``NonPrivatePointer`` are not entirely private to the
249thread --- they are implicitly synchronized at the start or end of a
250thread by the Vulkan *system-synchronizes-with* relationship. This
251example assumes that the target-defined semantics of
252``vulkan:private`` correctly implements this property.
253
254This scheme is general enough to express the interoperability of SPIRV
255programs with other environments.
256
257.. code-block::
258
259   Thread T1:
260   A: store %ptr1                 # vulkan:nonprivate
261   X: store atomic release %ptr2  # vulkan:nonprivate
262
263   Thread T2:
264   Y: load atomic acquire %ptr2   # foo:bar
265   B: load %ptr1
266
267In the above example, thread ``T1`` originates from a SPIRV program
268while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
269can synchronize with ``Y`` is target defined.  If ``X`` synchronizes
270with ``Y``, then ``A`` happens before ``B`` (because A/X and
271Y/B are compatible).
272
273Implementation Example
274~~~~~~~~~~~~~~~~~~~~~~
275
276Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
277where all memory operations are cached, and the entire cache is
278flushed or invalidated at a ``release`` or ``acquire`` respectively. A
279possible scheme is that when translating a SPIRV program, memory
280operations marked ``NonPrivatePointer`` should not be cached, and the
281cache contents should not be touched during an ``acquire`` and
282``release`` operation.
283
284This could be implemented using the tags that share the ``vulkan:`` prefix,
285as follows:
286
287- For memory operations:
288
289  - Operations with ``vulkan:nonprivate`` should bypass the cache.
290  - Operations with ``vulkan:private`` should be cached.
291  - Operations that specify neither or both should conservatively
292    bypass the cache to ensure correctness.
293
294- For synchronizing operations:
295
296  - Operations with ``vulkan:nonprivate`` should not flush or
297    invalidate the cache.
298  - Operations with ``vulkan:private`` should flush or invalidate the cache.
299  - Operations that specify neither or both should conservatively
300    flush or invalidate the cache to ensure correctness.
301
302.. note::
303   In such an implementation, dropping the metadata on an operation, while
304   not affecting correctness, may have big performance implications.
305   e.g. an operation bypasses the cache when it shouldn't.
306
307Memory Types
308------------
309
310MMRAs may express the selective synchronization of
311different memory types.
312
313As an example, a target may expose an ``sync-as:<N>`` tag to
314pass information about which address spaces are synchronized by the
315execution of a synchronizing operation.
316
317.. note::
318  Address spaces are used here as a common example, but this concept
319  can apply for other "memory types". What "memory types" means here is
320  up to the target.
321
322.. code-block::
323
324   # let 1 = global address space
325   # let 3 = local address space
326
327   Thread T1:
328   A: store %ptr1                                  # sync-as:1
329   B: store %ptr2                                  # sync-as:3
330   X: store atomic release ptr addrspace(0) %ptr3  # sync-as:3
331
332   Thread T2:
333   Y: load atomic acquire ptr addrspace(0) %ptr3   # sync-as:3
334   C: load %ptr2                                   # sync-as:3
335   D: load %ptr1                                   # sync-as:1
336
337In the above figure, ``X`` and ``Y`` are atomic operations on a
338location in the ``global``  address space. If ``X`` synchronizes with
339``Y``, then ``B`` happens-before ``C`` in the ``local`` address
340space. But no such statement can be made about operations ``A`` and
341``D``, although they are peformed on a location in the ``global``
342address space.
343
344Implementation Example: Adding Address Space Information to Fences
345~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
346
347Languages such as OpenCL C provide fence operations such as
348``atomic_work_item_fence`` that can take an explicit address
349space to fence.
350
351By default, LLVM has no means to carry that information in the IR, so
352the information is lost during lowering to LLVM IR. This means that
353targets such as AMDGPU have to conservatively emit instructions to
354fence all address spaces in all cases, which can have a noticeable
355performance impact in high-performance applications.
356
357MMRAs may be used to preserve that information at the IR level, all the
358way through code generation. For example, a fence that only affects the
359global address space ``addrspace(1)`` may be lowered as
360
361.. code-block::
362
363    fence release # sync-as:1
364
365and the target may use the presence of ``sync-as:1`` to infer that it
366must only emit instruction to fence the global address space.
367
368Note that as MMRAs are opt in, a fence that does not have MMRA metadata
369could still be lowered conservatively, so this optimization would only
370apply if the front-end emits the MMRA metadata on the fence instructions.
371
372Additional Topics
373=================
374
375.. note::
376
377  The following sections are informational.
378
379Performance Impact
380------------------
381
382MMRAs are a way to capture optimization opportunities in the program.
383But when an operation mentions no tags or conflicting tags,
384the target may need to produce conservative code to ensure correctness
385at the cost of performance. This can happen in the following situations:
386
3871. When a target first introduces MMRAs, the
388   frontend might not have been updated to emit them.
3892. An optimization may drop MMRA metadata.
3903. An optimization may add arbitrary tags to an operation.
391
392Note that targets can always choose to ignore (or even drop) MMRAs
393and revert to the default behavior/codegen heuristics without
394affecting correctness.
395
396Consequences of the Absence of *happens-before*
397-----------------------------------------------
398
399In the :ref:`happens-before<HappensBefore>` section, we defined how an
400*happens-before* relation between two instruction can be broken
401by leveraging compatibility between MMRAs. When the instructions
402are incompatible and there is no *happens-before* relation, we say
403that the instructions "do not have to be ordered relative to each
404other".
405
406"Ordering" in this context is a very broad term which covers both
407static and runtime aspects.
408
409When there is no ordering constraint, we *could* statically reorder
410the instructions in an optimizer transform if the reordering does
411not break other constraints as single location coherence.
412Static reordering is one consequence of breaking *happens-before*,
413but is not the most interesting one.
414
415Run-time consequences are more interesting. When there is an
416*happens-before* relation between instructions, the target has to emit
417synchronization code to ensure other threads will observe the effects of
418the instructions in the right order.
419
420For instance, the target may have to wait for previous loads & stores to
421finish before starting a fence-release, or there may be a need to flush a
422memory cache before executing the next instruction.
423In the absence of *happens-before*, there is no such requirement and
424no waiting or flushing is required. This may noticeably speed up
425execution in some cases.
426
427Combining Operations
428--------------------
429
430If a pass can combine multiple memory or synchronizing operations
431into one, it needs to be able to combine MMRAs. One possible way to
432achieve this is by doing a prefix-wise union of the tag sets.
433
434Let A and B be two tags set, and U be the prefix-wise union of A and B.
435For every unique tag prefix P present in A or B:
436
437* If either A or B has no tags with prefix P, no tags with prefix
438  P are added to U.
439* If both A and B have at least one tag with prefix P, all tags with prefix
440  P from both sets are added to U.
441
442Passes should avoid aggressively combining MMRAs, as this can result
443in significant losses of information. While this cannot affect
444correctness, it may affect performance.
445
446As a general rule of thumb, common passes such as SimplifyCFG that
447aggressively combine/reorder operations should only combine
448instructions that have identical sets of tags.
449Passes that combine less frequently, or that are well aware of the cost
450of combining the MMRAs can use the prefix-wise union described above.
451
452Examples:
453
454.. code-block::
455
456    A: store release %ptr1  # foo:x, foo:y, bar:x
457    B: store release %ptr2  # foo:x, bar:y
458
459    # Unique prefixes P = [foo, bar]
460    # "foo:x" is common to A and B so it's added to U.
461    # "bar:x" != "bar:y" so it's not added to U.
462    U: store release %ptr3  # foo:x
463
464.. code-block::
465
466    A: store release %ptr1  # foo:x, foo:y
467    B: store release %ptr2  # foo:x, bux:y
468
469    # Unique prefixes P = [foo, bux]
470    # "foo:x" is common to A and B so it's added to U.
471    # No tags have the prefix "bux" in A.
472    U: store release %ptr3  # foo:x
473
474.. code-block::
475
476    A: store release %ptr1
477    B: store release %ptr2  # foo:x, bar:y
478
479    # Unique prefixes P = [foo, bar]
480    # No tags with "foo" or "bar" in A, so no tags added.
481    U: store release %ptr3
482