===================================
Memory Model Relaxation Annotations
===================================

.. contents::
   :local:

Introduction
============

Memory Model Relaxation Annotations (MMRAs) are target-defined properties
on instructions that can be used to selectively relax constraints placed
by the memory model. For example:

* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
  memory operations to be reordered across ``acquire`` or ``release``
  operations.
* OpenCL APIs expose primitives to only fence a specific set of address
  spaces. Carrying that information to the backend can enable the
  use of faster synchronization instructions, rather than fencing all
  address spaces every time.

MMRAs offer an opt-in system for targets to relax the default LLVM
memory model.
As such, they are attached to an operation using LLVM metadata which
can always be dropped without affecting correctness.

Definitions
===========

memory operation
    A load, a store, an atomic, or a function call that is marked as
    accessing memory.

synchronizing operation
    An instruction that synchronizes memory with other threads (e.g.
    an atomic or a fence).

tag
    Metadata attached to a memory or synchronizing operation
    that represents some target-defined property regarding memory
    synchronization.

    An operation may have multiple tags that each represent a different
    property.

    A tag is composed of a pair of metadata strings: a *prefix* and a *suffix*.

    In LLVM IR, the pair is represented using a metadata tuple.
    In other cases (comments, documentation, etc.), we may use the
    ``prefix:suffix`` notation.
    For example:

    .. code-block::
       :caption: Example: Tags in Metadata

       !0 = !{!"scope", !"workgroup"}  # scope:workgroup
       !1 = !{!"scope", !"device"}     # scope:device
       !2 = !{!"scope", !"system"}     # scope:system

    .. note::

       The only semantics relevant to the optimizer is the
       "compatibility" relation defined below. All other
       semantics are target defined.

    Tags can also be organised in lists to allow operations
    to specify all of the tags they belong to. Such a list
    is referred to as a "set of tags".

    .. code-block::
       :caption: Example: Set of Tags in Metadata

       !0 = !{!"scope", !"workgroup"}
       !1 = !{!"sync-as", !"private"}
       !2 = !{!0, !1}

    .. note::

       If an operation does not have MMRA metadata, it's treated as if
       it has an empty list (``!{}``) of tags.

    Note that it is not an error if a tag is not recognized by the
    instruction it is applied to, or by the current target.
    Such tags are simply ignored.

    Both synchronizing operations and memory operations can have
    zero or more tags attached to them using the ``!mmra`` syntax.

    For the sake of readability in examples below,
    we use a (non-functional) short syntax to represent MMRA metadata:

    .. code-block::
       :caption: Short Syntax Example

       store %ptr1 # foo:bar
       store %ptr1 !mmra !{!"foo", !"bar"}

    These two notations can be used in this document and are strictly
    equivalent. However, only the second version is functional.

compatibility
    Two sets of tags are said to be *compatible* iff, for every unique
    tag prefix P present in at least one set:

    - the other set contains no tag with prefix P, or
    - at least one tag with prefix P is common to both sets.

    The above definition implies that an empty set is always compatible
    with any other set. This is an important property as it ensures that
    if a transform drops the metadata on an operation, it can never affect
    correctness. In other words, the memory model cannot be relaxed further
    by deleting metadata from instructions.

.. _HappensBefore:

The *happens-before* Relation
==============================

Compatibility checks can be used to opt out of the *happens-before* relation
established between two instructions.
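
The compatibility relation from the Definitions section can be sketched in a
few lines of Python. This is only an illustration, not LLVM's implementation:
tags are modelled as ``(prefix, suffix)`` string pairs, and the helper names
``by_prefix`` and ``compatible`` are assumptions made for this sketch.

.. code-block:: python

   from collections import defaultdict


   def by_prefix(tags):
       """Group a set of (prefix, suffix) tags by prefix (hypothetical helper)."""
       groups = defaultdict(set)
       for prefix, suffix in tags:
           groups[prefix].add(suffix)
       return groups


   def compatible(a, b):
       """Return True iff the tag sets a and b are compatible."""
       ga, gb = by_prefix(a), by_prefix(b)
       # A prefix present in only one set never causes incompatibility, so
       # only prefixes present in both sets need at least one common tag.
       for prefix in ga.keys() & gb.keys():
           if not (ga[prefix] & gb[prefix]):
               return False
       return True


   workgroup = {("scope", "workgroup")}
   device = {("scope", "device")}

   print(compatible(workgroup, device))              # same prefix, no common tag
   print(compatible(workgroup | device, device))     # scope:device is common
   print(compatible(workgroup, set()))               # empty set: always compatible

Under this model, dropping all tags from an operation yields the empty set,
which is compatible with every other set, matching the property stated in
the definition.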

Ordering
    When two instructions' metadata are not compatible, any program order
    between them is not in *happens-before*.

    For example, consider two tags ``foo:bar`` and
    ``foo:baz`` exposed by a target:

    .. code-block::

       A: store %ptr1                 # foo:bar
       B: store %ptr2                 # foo:baz
       X: store atomic release %ptr3  # foo:bar

    In the above example, ``A`` is compatible with ``X``, and hence ``A``
    happens-before ``X``. But ``B`` is not compatible with
    ``X``, and hence it is not happens-before ``X``.

Synchronization
    If a synchronizing operation has one or more tags, then whether it
    synchronizes-with and participates in the ``seq_cst`` order with
    other operations is target dependent.

    Whether the following example synchronizes with another sequence depends
    on the target-defined semantics of ``foo:bar`` and ``foo:bux``.

    .. code-block::

       fence release       # foo:bar
       store atomic %ptr1  # foo:bux

Examples
--------

Example 1:
   .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:0 vulkan:nonprivate

   A and B are not ordered relative to each other
   (no *happens-before*) because their sets of tags are not compatible.

   Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
   For instance, in Example 1, a store-release to a location in ``addrspace(1)``
   wants to only synchronize with operations happening in ``addrspace(0)``.

Example 2:
   .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate

   The ordering of A and B is unaffected because their sets of tags are
   compatible.

   Note that A and B may or may not be in *happens-before* due to other reasons.

Example 3:
   .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
      B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate

   The ordering of A and B is unaffected because their sets of tags are
   compatible.

Example 4:
   .. code-block::

      A: store ptr addrspace(1) %ptr2                 # sync-as:1
      B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2

   A and B do not have to be ordered relative to each other
   (no *happens-before*) because their sets of tags are not compatible.

Use-cases
=========

SPIRV ``NonPrivatePointer``
---------------------------

MMRAs can support the SPIRV capability
``VulkanMemoryModel``, where synchronizing operations only affect
memory operations that specify ``NonPrivatePointer`` semantics.

The example below is generated from a SPIRV program using the
following recipe:

- Add ``vulkan:nonprivate`` to every synchronizing operation.
- Add ``vulkan:nonprivate`` to every non-atomic memory operation
  that is marked ``NonPrivatePointer``.
- Add ``vulkan:private`` to the tags of every non-atomic memory operation
  that is not marked ``NonPrivatePointer``.

.. code-block::

   Thread T1:
   A: store %ptr1                 # vulkan:nonprivate
   B: store %ptr2                 # vulkan:private
   X: store atomic release %ptr3  # vulkan:nonprivate

   Thread T2:
   Y: load atomic acquire %ptr3   # vulkan:nonprivate
   C: load %ptr2                  # vulkan:private
   D: load %ptr1                  # vulkan:nonprivate

Compatibility ensures that operation ``A`` is ordered
relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
No such relation can be inferred about operations ``B`` and ``C``.

.. note::
   The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
   considers all atomic operations non-private.

   Whether ``vulkan:nonprivate`` would be specified on atomic operations is
   an implementation detail, as an atomic operation is always ``nonprivate``.
   The implementation may choose to be explicit and emit IR with
   ``vulkan:nonprivate`` on every atomic operation, or it could choose to
   only emit ``vulkan:private`` and assume ``vulkan:nonprivate``
   by default.

Operations marked with ``vulkan:private`` effectively opt out of the
happens-before order in a SPIRV program since they are incompatible
with every synchronizing operation. Note that SPIRV operations that
are not marked ``NonPrivatePointer`` are not entirely private to the
thread --- they are implicitly synchronized at the start or end of a
thread by the Vulkan *system-synchronizes-with* relationship. This
example assumes that the target-defined semantics of
``vulkan:private`` correctly implements this property.

This scheme is general enough to express the interoperability of SPIRV
programs with other environments.

.. code-block::

   Thread T1:
   A: store %ptr1                 # vulkan:nonprivate
   X: store atomic release %ptr2  # vulkan:nonprivate

   Thread T2:
   Y: load atomic acquire %ptr2   # foo:bar
   B: load %ptr1

In the above example, thread ``T1`` originates from a SPIRV program
while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
can synchronize with ``Y`` is target defined. If ``X`` synchronizes
with ``Y``, then ``A`` happens-before ``B`` (because A/X and
Y/B are compatible).

Implementation Example
~~~~~~~~~~~~~~~~~~~~~~

Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
where all memory operations are cached, and the entire cache is
flushed or invalidated at a ``release`` or ``acquire`` respectively. A
possible scheme is that when translating a SPIRV program, memory
operations marked ``NonPrivatePointer`` should not be cached, and the
cache contents should not be touched during an ``acquire`` and
``release`` operation.

This could be implemented using the tags that share the ``vulkan:`` prefix,
as follows:

- For memory operations:

  - Operations with ``vulkan:nonprivate`` should bypass the cache.
  - Operations with ``vulkan:private`` should be cached.
  - Operations that specify neither or both should conservatively
    bypass the cache to ensure correctness.

- For synchronizing operations:

  - Operations with ``vulkan:nonprivate`` should not flush or
    invalidate the cache.
  - Operations with ``vulkan:private`` should flush or invalidate the cache.
  - Operations that specify neither or both should conservatively
    flush or invalidate the cache to ensure correctness.

.. note::
   In such an implementation, dropping the metadata on an operation, while
   not affecting correctness, may have big performance implications.
   e.g. an operation bypasses the cache when it shouldn't.

Memory Types
------------

MMRAs may express the selective synchronization of
different memory types.

As an example, a target may expose a ``sync-as:<N>`` tag to
pass information about which address spaces are synchronized by the
execution of a synchronizing operation.

.. note::
   Address spaces are used here as a common example, but this concept
   can apply to other "memory types". What "memory types" means here is
   up to the target.

.. code-block::

   # let 1 = global address space
   # let 3 = local address space

   Thread T1:
   A: store %ptr1                                  # sync-as:1
   B: store %ptr2                                  # sync-as:3
   X: store atomic release ptr addrspace(0) %ptr3  # sync-as:3

   Thread T2:
   Y: load atomic acquire ptr addrspace(0) %ptr3   # sync-as:3
   C: load %ptr2                                   # sync-as:3
   D: load %ptr1                                   # sync-as:1

In the above example, ``X`` and ``Y`` are atomic operations on a
location in the ``global`` address space. If ``X`` synchronizes with
``Y``, then ``B`` happens-before ``C`` in the ``local`` address
space.
But no such statement can be made about operations ``A`` and
``D``, although they are performed on a location in the ``global``
address space.

Implementation Example: Adding Address Space Information to Fences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Languages such as OpenCL C provide fence operations such as
``atomic_work_item_fence`` that can take an explicit address
space to fence.

By default, LLVM has no means to carry that information in the IR, so
the information is lost during lowering to LLVM IR. This means that
targets such as AMDGPU have to conservatively emit instructions to
fence all address spaces in all cases, which can have a noticeable
performance impact in high-performance applications.

MMRAs may be used to preserve that information at the IR level, all the
way through code generation. For example, a fence that only affects the
global address space ``addrspace(1)`` may be lowered as

.. code-block::

   fence release # sync-as:1

and the target may use the presence of ``sync-as:1`` to infer that it
only needs to emit instructions to fence the global address space.
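
How a backend might consume such tags can be sketched as follows. This is a
hypothetical Python model, not AMDGPU's or any other target's actual
lowering: the function name ``address_spaces_to_fence``, the numeric
``sync-as:<N>`` suffix convention, and the fallback policy are all
assumptions made for illustration.

.. code-block:: python

   def address_spaces_to_fence(tags, all_address_spaces):
       """Pick the address spaces a fence must cover (hypothetical target hook).

       tags is a set of (prefix, suffix) string pairs; all_address_spaces is
       the set of address-space numbers the target knows how to fence.
       """
       requested = set()
       for prefix, suffix in tags:
           # Only numeric sync-as tags are understood by this sketch; any
           # other tag is ignored, as unrecognized tags are not an error.
           if prefix == "sync-as" and suffix.isdigit():
               requested.add(int(suffix))

       known = requested & set(all_address_spaces)
       # No usable tag: stay conservative and fence every address space.
       return known if known else set(all_address_spaces)


   ALL_SPACES = {0, 1, 3}  # assumed numbering, for illustration only

   print(address_spaces_to_fence({("sync-as", "1")}, ALL_SPACES))  # global only
   print(address_spaces_to_fence(set(), ALL_SPACES))               # everything

The conservative fallback is what keeps a fence without MMRA metadata correct
in this sketch: it is lowered exactly as it would be without MMRAs.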

Note that as MMRAs are opt in, a fence that does not have MMRA metadata
could still be lowered conservatively, so this optimization would only
apply if the front-end emits the MMRA metadata on the fence instructions.

Additional Topics
=================

.. note::

   The following sections are informational.

Performance Impact
------------------

MMRAs are a way to capture optimization opportunities in the program.
But when an operation mentions no tags or conflicting tags,
the target may need to produce conservative code to ensure correctness
at the cost of performance. This can happen in the following situations:

1. When a target first introduces MMRAs, the
   frontend might not have been updated to emit them.
2. An optimization may drop MMRA metadata.
3. An optimization may add arbitrary tags to an operation.

Note that targets can always choose to ignore (or even drop) MMRAs
and revert to the default behavior/codegen heuristics without
affecting correctness.

Consequences of the Absence of *happens-before*
-----------------------------------------------

In the :ref:`happens-before<HappensBefore>` section, we defined how a
*happens-before* relation between two instructions can be broken
by leveraging compatibility between MMRAs. When the instructions
are incompatible and there is no *happens-before* relation, we say
that the instructions "do not have to be ordered relative to each
other".

"Ordering" in this context is a very broad term which covers both
static and runtime aspects.

When there is no ordering constraint, we *could* statically reorder
the instructions in an optimizer transform if the reordering does
not break other constraints such as single-location coherence.
Static reordering is one consequence of breaking *happens-before*,
but is not the most interesting one.

Run-time consequences are more interesting. When there is a
*happens-before* relation between instructions, the target has to emit
synchronization code to ensure other threads will observe the effects of
the instructions in the right order.

For instance, the target may have to wait for previous loads and stores to
finish before starting a fence-release, or there may be a need to flush a
memory cache before executing the next instruction.
In the absence of *happens-before*, there is no such requirement and
no waiting or flushing is required. This may noticeably speed up
execution in some cases.

Combining Operations
--------------------

If a pass can combine multiple memory or synchronizing operations
into one, it needs to be able to combine MMRAs. One possible way to
achieve this is by doing a prefix-wise union of the tag sets.

Let A and B be two tag sets, and U be the prefix-wise union of A and B.
For every unique tag prefix P present in A or B:

* If either A or B has no tags with prefix P, no tags with prefix
  P are added to U.
* If both A and B have at least one tag with prefix P, all tags with prefix
  P from both sets are added to U.

Passes should avoid aggressively combining MMRAs, as this can result
in significant losses of information. While this cannot affect
correctness, it may affect performance.

As a general rule of thumb, common passes such as SimplifyCFG that
aggressively combine/reorder operations should only combine
instructions that have identical sets of tags.
Passes that combine less frequently, or that are well aware of the cost
of combining the MMRAs, can use the prefix-wise union described above.

Examples:

.. code-block::

   A: store release %ptr1  # foo:x, foo:y, bar:x
   B: store release %ptr2  # foo:x, bar:y

   # Unique prefixes P = [foo, bar]
   # Both A and B have tags with prefix "foo", so all "foo" tags are added to U.
   # Both A and B have tags with prefix "bar", so all "bar" tags are added to U.
   U: store release %ptr3  # foo:x, foo:y, bar:x, bar:y

.. code-block::

   A: store release %ptr1  # foo:x, foo:y
   B: store release %ptr2  # foo:x, bux:y

   # Unique prefixes P = [foo, bux]
   # Both A and B have tags with prefix "foo", so all "foo" tags are added to U.
   # No tags have the prefix "bux" in A, so no "bux" tags are added to U.
   U: store release %ptr3  # foo:x, foo:y

.. code-block::

   A: store release %ptr1
   B: store release %ptr2  # foo:x, bar:y

   # Unique prefixes P = [foo, bar]
   # No tags with "foo" or "bar" in A, so no tags added.
   U: store release %ptr3
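
The two rules of the prefix-wise union can also be sketched in Python. This is
an illustrative model that applies the rules literally, not LLVM's
implementation; tags are ``(prefix, suffix)`` string pairs, and the helper
names ``parse``, ``prefixes`` and ``combine`` are assumptions made for this
sketch.

.. code-block:: python

   def parse(text):
       """Parse the short notation, e.g. "foo:x, bar:y" (hypothetical helper)."""
       items = (item.strip() for item in text.split(","))
       return {tuple(item.split(":", 1)) for item in items if item}


   def prefixes(tags):
       """Return the set of prefixes used by a tag set."""
       return {prefix for prefix, _ in tags}


   def combine(a, b):
       """Prefix-wise union of the tag sets a and b."""
       # Rule 1: a prefix missing from either set contributes nothing.
       shared = prefixes(a) & prefixes(b)
       # Rule 2: for shared prefixes, keep every tag from both sets.
       return {tag for tag in a | b if tag[0] in shared}


   a = parse("foo:x, foo:y")
   b = parse("foo:x, bux:y")
   print(sorted(combine(a, b)))  # only "foo" is shared, so both foo tags remain

Because the result keeps every tag of each shared prefix and drops prefixes
that are missing from one input, the combined set under this model stays
compatible with any set that either input was compatible with.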