===================================
Memory Model Relaxation Annotations
===================================

.. contents::
   :local:

Introduction
============

Memory Model Relaxation Annotations (MMRAs) are target-defined properties
on instructions that can be used to selectively relax constraints placed
by the memory model. For example:

* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
  memory operations to be reordered across ``acquire`` or ``release``
  operations.
* OpenCL APIs expose primitives to only fence a specific set of address
  spaces. Carrying that information to the backend can enable the
  use of faster synchronization instructions, rather than fencing all
  address spaces every time.

MMRAs offer an opt-in system for targets to relax the default LLVM
memory model.
As such, they are attached to an operation using LLVM metadata which
can always be dropped without affecting correctness.

Definitions
===========

memory operation
    A load, a store, an atomic, or a function call that is marked as
    accessing memory.

synchronizing operation
    An instruction that synchronizes memory with other threads (e.g.
    an atomic or a fence).

tag
    Metadata attached to a memory or synchronizing operation
    that represents some target-defined property regarding memory
    synchronization.

    An operation may have multiple tags that each represent a different
    property.

    A tag is composed of a pair of metadata strings: a *prefix* and a *suffix*.

    In LLVM IR, the pair is represented using a metadata tuple.
    In other cases (comments, documentation, etc.), we may use the
    ``prefix:suffix`` notation.
    For example:

    .. code-block::
      :caption: Example: Tags in Metadata

      !0 = !{!"scope", !"workgroup"}  # scope:workgroup
      !1 = !{!"scope", !"device"}     # scope:device
      !2 = !{!"scope", !"system"}     # scope:system

    .. note::

      The only semantics relevant to the optimizer is the
      "compatibility" relation defined below. All other
      semantics are target-defined.

    Tags can also be organized in lists to allow operations
    to specify all of the tags they belong to. Such a list
    is referred to as a "set of tags".

    .. code-block::
      :caption: Example: Set of Tags in Metadata

      !0 = !{!"scope", !"workgroup"}
      !1 = !{!"sync-as", !"private"}
      !2 = !{!0, !1}

    .. note::

      If an operation does not have MMRA metadata, it's treated as if
      it has an empty list (``!{}``) of tags.

    Note that it is not an error if a tag is not recognized by the
    instruction it is applied to, or by the current target.
    Such tags are simply ignored.

    Both synchronizing operations and memory operations can have
    zero or more tags attached to them using the ``!mmra`` syntax.

    For the sake of readability in examples below,
    we use a (non-functional) short syntax to represent MMRA metadata:

    .. code-block::
      :caption: Short Syntax Example

      store %ptr1 # foo:bar
      store %ptr1 !mmra !{!"foo", !"bar"}

    These two notations can be used in this document and are strictly
    equivalent. However, only the second version is functional.

compatibility
    Two sets of tags are said to be *compatible* iff, for every unique
    tag prefix P present in at least one set:

    - the other set contains no tag with prefix P, or
    - at least one tag with prefix P is common to both sets.

    The above definition implies that an empty set is always compatible
    with any other set. This is an important property as it ensures that
    if a transform drops the metadata on an operation, it can never affect
    correctness. In other words, the memory model cannot be relaxed further
    by deleting metadata from instructions.
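
The compatibility relation above can be sketched in a few lines of Python.
This is an illustration only, not LLVM's implementation: tags are modeled as
``(prefix, suffix)`` tuples, sets of tags as Python sets, and the helper name
``is_compatible`` is hypothetical.

.. code-block:: python

   def is_compatible(a, b):
       """Return True iff two sets of (prefix, suffix) tags are compatible."""
       prefixes_a = {prefix for prefix, _ in a}
       prefixes_b = {prefix for prefix, _ in b}
       # A prefix present in only one set never causes incompatibility,
       # so only the prefixes used by both sets need to be checked.
       for p in prefixes_a & prefixes_b:
           tags_a = {tag for tag in a if tag[0] == p}
           tags_b = {tag for tag in b if tag[0] == p}
           if not (tags_a & tags_b):
               return False
       return True

   workgroup = {("scope", "workgroup")}
   device = {("scope", "device")}

   print(is_compatible(workgroup, workgroup))  # True
   print(is_compatible(workgroup, device))     # False
   print(is_compatible(workgroup, set()))      # True

The last call shows why dropping metadata is always safe in this model: the
empty set shares no prefix with any other set, so no check can fail.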

.. _HappensBefore:

The *happens-before* Relation
==============================

Compatibility checks can be used to opt out of the *happens-before* relation
established between two instructions.

Ordering
    When two instructions' metadata are not compatible, any program order
    between them is not in *happens-before*.

    For example, consider two tags ``foo:bar`` and
    ``foo:baz`` exposed by a target:

    .. code-block::

       A: store %ptr1                 # foo:bar
       B: store %ptr2                 # foo:baz
       X: store atomic release %ptr3  # foo:bar

    In the above example, ``A`` is compatible with ``X``, and hence ``A``
    happens-before ``X``. But ``B`` is not compatible with
    ``X``, and hence ``B`` does not happen-before ``X``.

Synchronization
    If a synchronizing operation has one or more tags, then whether it
    synchronizes-with and participates in the ``seq_cst`` order with
    other operations is target dependent.

    Whether the following example synchronizes with another sequence depends
    on the target-defined semantics of ``foo:bar`` and ``foo:bux``.

    .. code-block::

       fence release               # foo:bar
       store atomic %ptr1          # foo:bux

Examples
--------

Example 1:
    .. code-block::

      A: store ptr addrspace(1) %ptr2                  # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3   # sync-as:0 vulkan:nonprivate

    A and B are not ordered relative to each other
    (no *happens-before*) because their sets of tags are not compatible.

    Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
    In this example, a store-release to a location in ``addrspace(1)`` only
    wants to synchronize with operations happening in ``addrspace(0)``.

Example 2:
    .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate

    The ordering of A and B is unaffected because their sets of tags are
    compatible.

    Note that A and B may or may not be in *happens-before* due to other reasons.

Example 3:
    .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate

    The ordering of A and B is unaffected because their sets of tags are
    compatible.

Example 4:
    .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1
      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2

    A and B do not have to be ordered relative to each other
    (no *happens-before*) because their sets of tags are not compatible.
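
The four examples above can be checked mechanically. The following Python
sketch is illustrative only: ``parse_tags`` and ``is_compatible`` are
hypothetical helpers that read the short ``prefix:suffix`` notation used in
this document, and are not part of LLVM.

.. code-block:: python

   def parse_tags(short):
       """Parse space-separated ``prefix:suffix`` tags into a set of tuples."""
       return {tuple(tag.split(":", 1)) for tag in short.split()}

   def is_compatible(a, b):
       """Every prefix used by both sets must have at least one common tag."""
       shared = {p for p, _ in a} & {p for p, _ in b}
       return all(any(t in b for t in a if t[0] == p) for p in shared)

   # (tags of A, tags of B) for Examples 1 to 4.
   examples = {
       1: ("sync-as:1 vulkan:nonprivate", "sync-as:0 vulkan:nonprivate"),
       2: ("sync-as:1 vulkan:nonprivate", "sync-as:1 vulkan:nonprivate"),
       3: ("sync-as:1 vulkan:nonprivate", "vulkan:nonprivate"),
       4: ("sync-as:1", "sync-as:2"),
   }

   results = {
       n: is_compatible(parse_tags(a), parse_tags(b))
       for n, (a, b) in examples.items()
   }
   print(results)  # {1: False, 2: True, 3: True, 4: False}

Example 3 is compatible because ``B`` has no ``sync-as`` tag at all, so only
the shared ``vulkan`` prefix is compared.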

Use-cases
=========

SPIRV ``NonPrivatePointer``
---------------------------

MMRAs can support the SPIRV capability
``VulkanMemoryModel``, where synchronizing operations only affect
memory operations that specify ``NonPrivatePointer`` semantics.

The example below is generated from a SPIRV program using the
following recipe:

- Add ``vulkan:nonprivate`` to every synchronizing operation.
- Add ``vulkan:nonprivate`` to every non-atomic memory operation
  that is marked ``NonPrivatePointer``.
- Add ``vulkan:private`` to every non-atomic memory operation
  that is not marked ``NonPrivatePointer``.

.. code-block::

   Thread T1:
   A: store %ptr1                 # vulkan:nonprivate
   B: store %ptr2                 # vulkan:private
   X: store atomic release %ptr3  # vulkan:nonprivate

   Thread T2:
   Y: load atomic acquire %ptr3   # vulkan:nonprivate
   C: load %ptr2                  # vulkan:private
   D: load %ptr1                  # vulkan:nonprivate

Compatibility ensures that operation ``A`` is ordered
relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
No such relation can be inferred about operations ``B`` and ``C``.
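
As a rough sketch, the recipe above could be modeled as follows. The function
``vulkan_tags`` and its parameters are hypothetical names chosen for this
illustration; a real frontend would attach ``!mmra`` metadata to IR
instructions rather than build Python sets.

.. code-block:: python

   NONPRIVATE = ("vulkan", "nonprivate")
   PRIVATE = ("vulkan", "private")

   def vulkan_tags(is_synchronizing, non_private_pointer=False):
       """Apply the recipe to one operation and return its set of tags."""
       if is_synchronizing or non_private_pointer:
           return {NONPRIVATE}
       return {PRIVATE}

   def is_compatible(a, b):
       """Every prefix used by both sets must have at least one common tag."""
       shared = {p for p, _ in a} & {p for p, _ in b}
       return all(any(t in b for t in a if t[0] == p) for p in shared)

   # Thread T1 from the example above.
   a = vulkan_tags(is_synchronizing=False, non_private_pointer=True)
   b = vulkan_tags(is_synchronizing=False, non_private_pointer=False)
   x = vulkan_tags(is_synchronizing=True)

   print(is_compatible(a, x))  # True: A stays ordered relative to X
   print(is_compatible(b, x))  # False: B opts out of the ordering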

.. note::
   The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
   considers all atomic operations non-private.

   Whether ``vulkan:nonprivate`` would be specified on atomic operations is
   an implementation detail, as an atomic operation is always ``nonprivate``.
   The implementation may choose to be explicit and emit IR with
   ``vulkan:nonprivate`` on every atomic operation, or it could choose to
   only emit ``vulkan:private`` and assume ``vulkan:nonprivate``
   by default.

Operations marked with ``vulkan:private`` effectively opt out of the
happens-before order in a SPIRV program since they are incompatible
with every synchronizing operation. Note that SPIRV operations that
are not marked ``NonPrivatePointer`` are not entirely private to the
thread: they are implicitly synchronized at the start or end of a
thread by the Vulkan *system-synchronizes-with* relationship. This
example assumes that the target-defined semantics of
``vulkan:private`` correctly implements this property.

This scheme is general enough to express the interoperability of SPIRV
programs with other environments.

.. code-block::

   Thread T1:
   A: store %ptr1                 # vulkan:nonprivate
   X: store atomic release %ptr2  # vulkan:nonprivate

   Thread T2:
   Y: load atomic acquire %ptr2   # foo:bar
   B: load %ptr1

In the above example, thread ``T1`` originates from a SPIRV program
while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
can synchronize with ``Y`` is target defined. If ``X`` synchronizes
with ``Y``, then ``A`` happens-before ``B`` (because A/X and
Y/B are compatible).

Implementation Example
~~~~~~~~~~~~~~~~~~~~~~

Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
where all memory operations are cached, and the entire cache is
flushed or invalidated at a ``release`` or ``acquire`` respectively. A
possible scheme is that when translating a SPIRV program, memory
operations marked ``NonPrivatePointer`` should not be cached, and the
cache contents should not be touched during an ``acquire`` and
``release`` operation.

This could be implemented using the tags that share the ``vulkan:`` prefix,
as follows:

- For memory operations:

  - Operations with ``vulkan:nonprivate`` should bypass the cache.
  - Operations with ``vulkan:private`` should be cached.
  - Operations that specify neither or both should conservatively
    bypass the cache to ensure correctness.

- For synchronizing operations:

  - Operations with ``vulkan:nonprivate`` should not flush or
    invalidate the cache.
  - Operations with ``vulkan:private`` should flush or invalidate the cache.
  - Operations that specify neither or both should conservatively
    flush or invalidate the cache to ensure correctness.

.. note::
   In such an implementation, dropping the metadata on an operation, while
   not affecting correctness, may have significant performance implications.
   For example, an operation may bypass the cache when it does not need to.
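
The two decision tables above could be encoded as a pair of predicates. This
is a minimal sketch for the hypothetical target described in this section; the
function names are assumptions made for the illustration and do not correspond
to any LLVM API.

.. code-block:: python

   NONPRIVATE = ("vulkan", "nonprivate")
   PRIVATE = ("vulkan", "private")

   def memory_op_bypasses_cache(tags):
       """A memory operation may be cached only if it is tagged
       ``vulkan:private`` and not ``vulkan:nonprivate``."""
       private_only = PRIVATE in tags and NONPRIVATE not in tags
       return not private_only

   def sync_op_flushes_cache(tags):
       """A synchronizing operation may skip the flush/invalidate only if
       it is tagged ``vulkan:nonprivate`` and not ``vulkan:private``."""
       nonprivate_only = NONPRIVATE in tags and PRIVATE not in tags
       return not nonprivate_only

   print(memory_op_bypasses_cache({PRIVATE}))  # False: safe to cache
   print(memory_op_bypasses_cache(set()))      # True: conservative default
   print(sync_op_flushes_cache({NONPRIVATE}))  # False: cache left untouched
   print(sync_op_flushes_cache(set()))         # True: conservative default

Both predicates return the conservative answer when the ``vulkan`` tags are
missing or conflicting, which is what keeps dropped metadata from affecting
correctness in this scheme.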

Memory Types
------------

MMRAs may express the selective synchronization of
different memory types.

As an example, a target may expose a ``sync-as:<N>`` tag to
pass information about which address spaces are synchronized by the
execution of a synchronizing operation.

.. note::
  Address spaces are used here as a common example, but this concept
  can apply to other "memory types". What "memory types" means here is
  up to the target.

.. code-block::

   # let 1 = global address space
   # let 3 = local address space

   Thread T1:
   A: store %ptr1                                  # sync-as:1
   B: store %ptr2                                  # sync-as:3
   X: store atomic release ptr addrspace(1) %ptr3  # sync-as:3

   Thread T2:
   Y: load atomic acquire ptr addrspace(1) %ptr3   # sync-as:3
   C: load %ptr2                                   # sync-as:3
   D: load %ptr1                                   # sync-as:1

In the above example, ``X`` and ``Y`` are atomic operations on a
location in the ``global`` address space. If ``X`` synchronizes with
``Y``, then ``B`` happens-before ``C`` in the ``local`` address
space. But no such statement can be made about operations ``A`` and
``D``, although they are performed on a location in the ``global``
address space.

Implementation Example: Adding Address Space Information to Fences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Languages such as OpenCL C provide fence operations such as
``atomic_work_item_fence`` that can take an explicit address
space to fence.

By default, LLVM has no means to carry that information in the IR, so
the information is lost during lowering to LLVM IR. This means that
targets such as AMDGPU have to conservatively emit instructions to
fence all address spaces in all cases, which can have a noticeable
performance impact in high-performance applications.

MMRAs may be used to preserve that information at the IR level, all the
way through code generation. For example, a fence that only affects the
global address space ``addrspace(1)`` may be lowered as:

.. code-block::

    fence release # sync-as:1

and the target may use the presence of ``sync-as:1`` to infer that it
only needs to emit instructions to fence the global address space.

Note that as MMRAs are opt-in, a fence that does not have MMRA metadata
could still be lowered conservatively, so this optimization would only
apply if the front-end emits the MMRA metadata on the fence instructions.
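
The selection logic a backend might apply when lowering such a fence can be
sketched as follows. The table ``KNOWN_ADDRESS_SPACES`` and the function
``address_spaces_to_fence`` are hypothetical and describe an imaginary target
with only two fenceable address spaces; they do not reflect what any existing
backend does.

.. code-block:: python

   # Hypothetical mapping from ``sync-as`` suffixes to the address spaces
   # that this imaginary target can fence individually.
   KNOWN_ADDRESS_SPACES = {"1": "global", "3": "local"}

   def address_spaces_to_fence(tags):
       """Return the names of the address spaces a fence must cover."""
       requested = {
           KNOWN_ADDRESS_SPACES[suffix]
           for prefix, suffix in tags
           # Tags that this target does not recognize are ignored.
           if prefix == "sync-as" and suffix in KNOWN_ADDRESS_SPACES
       }
       # Without a usable ``sync-as`` tag, fall back to fencing everything.
       return requested or set(KNOWN_ADDRESS_SPACES.values())

   print(sorted(address_spaces_to_fence({("sync-as", "1")})))  # ['global']
   print(sorted(address_spaces_to_fence(set())))               # ['global', 'local']

The second call models a fence whose metadata was never emitted or was
dropped: the lowering stays correct but loses the optimization.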

Additional Topics
=================

.. note::

  The following sections are informational.

Performance Impact
------------------

MMRAs are a way to capture optimization opportunities in the program.
But when an operation mentions no tags or conflicting tags,
the target may need to produce conservative code to ensure correctness
at the cost of performance. This can happen in the following situations:

1. When a target first introduces MMRAs, the
   frontend might not have been updated to emit them.
2. An optimization may drop MMRA metadata.
3. An optimization may add arbitrary tags to an operation.

Note that targets can always choose to ignore (or even drop) MMRAs
and revert to the default behavior/codegen heuristics without
affecting correctness.

Consequences of the Absence of *happens-before*
-----------------------------------------------

In the :ref:`happens-before<HappensBefore>` section, we defined how a
*happens-before* relation between two instructions can be broken
by leveraging compatibility between MMRAs. When the instructions
are incompatible and there is no *happens-before* relation, we say
that the instructions "do not have to be ordered relative to each
other".

"Ordering" in this context is a very broad term which covers both
static and runtime aspects.

When there is no ordering constraint, we *could* statically reorder
the instructions in an optimizer transform if the reordering does
not break other constraints such as single-location coherence.
Static reordering is one consequence of breaking *happens-before*,
but is not the most interesting one.

Run-time consequences are more interesting. When there is a
*happens-before* relation between instructions, the target has to emit
synchronization code to ensure other threads will observe the effects of
the instructions in the right order.

For instance, the target may have to wait for previous loads and stores to
finish before starting a fence-release, or there may be a need to flush a
memory cache before executing the next instruction.
In the absence of *happens-before*, there is no such requirement and
no waiting or flushing is required. This may noticeably speed up
execution in some cases.

Combining Operations
--------------------

If a pass can combine multiple memory or synchronizing operations
into one, it needs to be able to combine MMRAs. One possible way to
achieve this is by doing a prefix-wise union of the tag sets.

Let A and B be two sets of tags, and U be the prefix-wise union of A and B.
For every unique tag prefix P present in A or B:

* If either A or B has no tags with prefix P, no tags with prefix
  P are added to U.
* If both A and B have at least one tag with prefix P, all tags with prefix
  P from both sets are added to U.

Passes should avoid aggressively combining MMRAs, as this can result
in significant losses of information. While this cannot affect
correctness, it may affect performance.

As a general rule of thumb, common passes such as SimplifyCFG that
aggressively combine/reorder operations should only combine
instructions that have identical sets of tags.
Passes that combine less frequently, or that are well aware of the cost
of combining the MMRAs, can use the prefix-wise union described above.

Examples:

.. code-block::

    A: store release %ptr1  # foo:x, foo:y, bar:x
    B: store release %ptr2  # foo:x, bar:y

    # Unique prefixes P = [foo, bar]
    # Both A and B have "foo" tags, so all of them are added to U.
    # Both A and B have "bar" tags, so all of them are added to U.
    U: store release %ptr3  # foo:x, foo:y, bar:x, bar:y

.. code-block::

    A: store release %ptr1  # foo:x, foo:y
    B: store release %ptr2  # foo:x, bux:y

    # Unique prefixes P = [foo, bux]
    # Both A and B have "foo" tags, so all of them are added to U.
    # No tags have the prefix "bux" in A, so no "bux" tags are added to U.
    U: store release %ptr3  # foo:x, foo:y

.. code-block::

    A: store release %ptr1
    B: store release %ptr2  # foo:x, bar:y

    # Unique prefixes P = [foo, bar]
    # No tags with "foo" or "bar" in A, so no tags added.
    U: store release %ptr3
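
The two bullet rules that define the prefix-wise union can be written directly
as a short Python function. This is a sketch under the same modeling
assumptions as the rest of this document's short notation: tags are
``(prefix, suffix)`` tuples, and ``combine`` is a hypothetical name used only
for this illustration.

.. code-block:: python

   def combine(a, b):
       """Prefix-wise union of two sets of (prefix, suffix) tags."""
       shared = {p for p, _ in a} & {p for p, _ in b}
       # Keep every tag, from either set, whose prefix appears in both sets.
       return {tag for tag in a | b if tag[0] in shared}

   a = {("scope", "workgroup"), ("sync-as", "1")}
   b = {("scope", "device")}

   print(sorted(combine(a, b)))      # [('scope', 'device'), ('scope', 'workgroup')]
   print(sorted(combine(a, set())))  # []

Under these rules, any set of tags that is compatible with A or with B is
also compatible with U, because U only contains prefixes that both inputs use
and keeps every tag of those prefixes. That is why the combination can cost
performance but cannot relax ordering.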
481*cf328ff9SPierre van Houtryve    U: store release %ptr3
482