1=================================== 2Memory Model Relaxation Annotations 3=================================== 4 5.. contents:: 6 :local: 7 8Introduction 9============ 10 11Memory Model Relaxation Annotations (MMRAs) are target-defined properties 12on instructions that can be used to selectively relax constraints placed 13by the memory model. For example: 14 15* The use of ``VulkanMemoryModel`` in a SPIRV program allows certain 16 memory operations to be reordered across ``acquire`` or ``release`` 17 operations. 18* OpenCL APIs expose primitives to only fence a specific set of address 19 spaces. Carrying that information to the backend can enable the 20 use of faster synchronization instructions, rather than fencing all 21 address spaces everytime. 22 23MMRAs offer an opt-in system for targets to relax the default LLVM 24memory model. 25As such, they are attached to an operation using LLVM metadata which 26can always be dropped without affecting correctness. 27 28Definitions 29=========== 30 31memory operation 32 A load, a store, an atomic, or a function call that is marked as 33 accessing memory. 34 35synchronizing operation 36 An instruction that synchronizes memory with other threads (e.g. 37 an atomic or a fence). 38 39tag 40 Metadata attached to a memory or synchronizing operation 41 that represents some target-defined property regarding memory 42 synchronization. 43 44 An operation may have multiple tags that each represent a different 45 property. 46 47 A tag is composed of a pair of metadata string: a *prefix* and a *suffix*. 48 49 In LLVM IR, the pair is represented using a metadata tuple. 50 In other cases (comments, documentation, etc.), we may use the 51 ``prefix:suffix`` notation. 52 For example: 53 54 .. code-block:: 55 :caption: Example: Tags in Metadata 56 57 !0 = !{!"scope", !"workgroup"} # scope:workgroup 58 !1 = !{!"scope", !"device"} # scope:device 59 !2 = !{!"scope", !"system"} # scope:system 60 61 .. note:: 62 63 The only semantics relevant to the optimizer is the 64 "compatibility" relation defined below. All other 65 semantics are target defined. 66 67 Tags can also be organised in lists to allow operations 68 to specify all of the tags they belong to. Such a list 69 is referred to as a "set of tags". 70 71 .. code-block:: 72 :caption: Example: Set of Tags in Metadata 73 74 !0 = !{!"scope", !"workgroup"} 75 !1 = !{!"sync-as", !"private"} 76 !2 = !{!0, !2} 77 78 .. note:: 79 80 If an operation does not have MMRA metadata, it's treated as if 81 it has an empty list (``!{}``) of tags. 82 83 Note that it is not an error if a tag is not recognized by the 84 instruction it is applied to, or by the current target. 85 Such tags are simply ignored. 86 87 Both synchronizing operations and memory operations can have 88 zero or more tags attached to them using the ``!mmra`` syntax. 89 90 For the sake of readability in examples below, 91 we use a (non-functional) short syntax to represent MMMRA metadata: 92 93 .. code-block:: 94 :caption: Short Syntax Example 95 96 store %ptr1 # foo:bar 97 store %ptr1 !mmra !{!"foo", !"bar"} 98 99 These two notations can be used in this document and are strictly 100 equivalent. However, only the second version is functional. 101 102compatibility 103 Two sets of tags are said to be *compatible* iff, for every unique 104 tag prefix P present in at least one set: 105 106 - the other set contains no tag with prefix P, or 107 - at least one tag with prefix P is common to both sets. 108 109 The above definition implies that an empty set is always compatible 110 with any other set. This is an important property as it ensures that 111 if a transform drops the metadata on an operation, it can never affect 112 correctness. In other words, the memory model cannot be relaxed further 113 by deleting metadata from instructions. 114 115.. _HappensBefore: 116 117The *happens-before* Relation 118============================== 119 120Compatibility checks can be used to opt out of the *happens-before* relation 121established between two instructions. 122 123Ordering 124 When two instructions' metadata are not compatible, any program order 125 between them are not in *happens-before*. 126 127 For example, consider two tags ``foo:bar`` and 128 ``foo:baz`` exposed by a target: 129 130 .. code-block:: 131 132 A: store %ptr1 # foo:bar 133 B: store %ptr2 # foo:baz 134 X: store atomic release %ptr3 # foo:bar 135 136 In the above figure, ``A`` is compatible with ``X``, and hence ``A`` 137 happens-before ``X``. But ``B`` is not compatible with 138 ``X``, and hence it is not happens-before ``X``. 139 140Synchronization 141 If an synchronizing operation has one or more tags, then whether it 142 synchronizes-with and participates in the ``seq_cst`` order with 143 other operations is target dependent. 144 145 Whether the following example synchronizes with another sequence depends 146 on the target-defined semantics of ``foo:bar`` and ``foo:bux``. 147 148 .. code-block:: 149 150 fence release # foo:bar 151 store atomic %ptr1 # foo:bux 152 153Examples 154-------- 155 156Example 1: 157 .. code-block:: 158 159 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate 160 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:0 vulkan:nonprivate 161 162 A and B are not ordered relative to each other 163 (no *happens-before*) because their sets of tags are not compatible. 164 165 Note that the ``sync-as`` value does not have to match the ``addrspace`` value. 166 e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to 167 only synchronize with operations happening in ``addrspace(0)``. 168 169Example 2: 170 .. code-block:: 171 172 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate 173 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:1 vulkan:nonprivate 174 175 The ordering of A and B is unaffected because their set of tags are 176 compatible. 177 178 Note that A and B may or may not be in *happens-before* due to other reasons. 179 180Example 3: 181 .. code-block:: 182 183 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate 184 B: store atomic release ptr addrspace(1) %ptr3 # vulkan:nonprivate 185 186 The ordering of A and B is unaffected because their set of tags are 187 compatible. 188 189Example 4: 190 .. code-block:: 191 192 A: store ptr addrspace(1) %ptr2 # sync-as:1 193 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:2 194 195 A and B do not have to be ordered relative to each other 196 (no *happens-before*) because their sets of tags are not compatible. 197 198Use-cases 199========= 200 201SPIRV ``NonPrivatePointer`` 202--------------------------- 203 204MMRAs can support the SPIRV capability 205``VulkanMemoryModel``, where synchronizing operations only affect 206memory operations that specify ``NonPrivatePointer`` semantics. 207 208The example below is generated from a SPIRV program using the 209following recipe: 210 211- Add ``vulkan:nonprivate`` to every synchronizing operation. 212- Add ``vulkan:nonprivate`` to every non-atomic memory operation 213 that is marked ``NonPrivatePointer``. 214- Add ``vulkan:private`` to tags of every non-atomic memory operation 215 that is not marked ``NonPrivatePointer``. 216 217.. code-block:: 218 219 Thread T1: 220 A: store %ptr1 # vulkan:nonprivate 221 B: store %ptr2 # vulkan:private 222 X: store atomic release %ptr3 # vulkan:nonprivate 223 224 Thread T2: 225 Y: load atomic acquire %ptr3 # vulkan:nonprivate 226 C: load %ptr2 # vulkan:private 227 D: load %ptr1 # vulkan:nonprivate 228 229Compatibility ensures that operation ``A`` is ordered 230relative to ``X`` while operation ``D`` is ordered relative to ``Y``. 231If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``. 232No such relation can be inferred about operations ``B`` and ``C``. 233 234.. note:: 235 The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_ 236 considers all atomic operation non-private. 237 238 Whether ``vulkan:nonprivate`` would be specified on atomic operations is 239 an implementation detail, as an atomic operation is always ``nonprivate``. 240 The implementation may choose to be explicit and emit IR with 241 ``vulkan:nonprivate`` on every atomic operation, or it could choose to 242 only emit ``vulkan::private`` and assume ``vulkan:nonprivate`` 243 by default. 244 245Operations marked with ``vulkan:private`` effectively opt out of the 246happens-before order in a SPIRV program since they are incompatible 247with every synchronizing operation. Note that SPIRV operations that 248are not marked ``NonPrivatePointer`` are not entirely private to the 249thread --- they are implicitly synchronized at the start or end of a 250thread by the Vulkan *system-synchronizes-with* relationship. This 251example assumes that the target-defined semantics of 252``vulkan:private`` correctly implements this property. 253 254This scheme is general enough to express the interoperability of SPIRV 255programs with other environments. 256 257.. code-block:: 258 259 Thread T1: 260 A: store %ptr1 # vulkan:nonprivate 261 X: store atomic release %ptr2 # vulkan:nonprivate 262 263 Thread T2: 264 Y: load atomic acquire %ptr2 # foo:bar 265 B: load %ptr1 266 267In the above example, thread ``T1`` originates from a SPIRV program 268while thread ``T2`` originates from a non-SPIRV program. Whether ``X`` 269can synchronize with ``Y`` is target defined. If ``X`` synchronizes 270with ``Y``, then ``A`` happens before ``B`` (because A/X and 271Y/B are compatible). 272 273Implementation Example 274~~~~~~~~~~~~~~~~~~~~~~ 275 276Consider the implementation of SPIRV ``NonPrivatePointer`` on a target 277where all memory operations are cached, and the entire cache is 278flushed or invalidated at a ``release`` or ``acquire`` respectively. A 279possible scheme is that when translating a SPIRV program, memory 280operations marked ``NonPrivatePointer`` should not be cached, and the 281cache contents should not be touched during an ``acquire`` and 282``release`` operation. 283 284This could be implemented using the tags that share the ``vulkan:`` prefix, 285as follows: 286 287- For memory operations: 288 289 - Operations with ``vulkan:nonprivate`` should bypass the cache. 290 - Operations with ``vulkan:private`` should be cached. 291 - Operations that specify neither or both should conservatively 292 bypass the cache to ensure correctness. 293 294- For synchronizing operations: 295 296 - Operations with ``vulkan:nonprivate`` should not flush or 297 invalidate the cache. 298 - Operations with ``vulkan:private`` should flush or invalidate the cache. 299 - Operations that specify neither or both should conservatively 300 flush or invalidate the cache to ensure correctness. 301 302.. note:: 303 In such an implementation, dropping the metadata on an operation, while 304 not affecting correctness, may have big performance implications. 305 e.g. an operation bypasses the cache when it shouldn't. 306 307Memory Types 308------------ 309 310MMRAs may express the selective synchronization of 311different memory types. 312 313As an example, a target may expose an ``sync-as:<N>`` tag to 314pass information about which address spaces are synchronized by the 315execution of a synchronizing operation. 316 317.. note:: 318 Address spaces are used here as a common example, but this concept 319 can apply for other "memory types". What "memory types" means here is 320 up to the target. 321 322.. code-block:: 323 324 # let 1 = global address space 325 # let 3 = local address space 326 327 Thread T1: 328 A: store %ptr1 # sync-as:1 329 B: store %ptr2 # sync-as:3 330 X: store atomic release ptr addrspace(0) %ptr3 # sync-as:3 331 332 Thread T2: 333 Y: load atomic acquire ptr addrspace(0) %ptr3 # sync-as:3 334 C: load %ptr2 # sync-as:3 335 D: load %ptr1 # sync-as:1 336 337In the above figure, ``X`` and ``Y`` are atomic operations on a 338location in the ``global`` address space. If ``X`` synchronizes with 339``Y``, then ``B`` happens-before ``C`` in the ``local`` address 340space. But no such statement can be made about operations ``A`` and 341``D``, although they are peformed on a location in the ``global`` 342address space. 343 344Implementation Example: Adding Address Space Information to Fences 345~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 346 347Languages such as OpenCL C provide fence operations such as 348``atomic_work_item_fence`` that can take an explicit address 349space to fence. 350 351By default, LLVM has no means to carry that information in the IR, so 352the information is lost during lowering to LLVM IR. This means that 353targets such as AMDGPU have to conservatively emit instructions to 354fence all address spaces in all cases, which can have a noticeable 355performance impact in high-performance applications. 356 357MMRAs may be used to preserve that information at the IR level, all the 358way through code generation. For example, a fence that only affects the 359global address space ``addrspace(1)`` may be lowered as 360 361.. code-block:: 362 363 fence release # sync-as:1 364 365and the target may use the presence of ``sync-as:1`` to infer that it 366must only emit instruction to fence the global address space. 367 368Note that as MMRAs are opt in, a fence that does not have MMRA metadata 369could still be lowered conservatively, so this optimization would only 370apply if the front-end emits the MMRA metadata on the fence instructions. 371 372Additional Topics 373================= 374 375.. note:: 376 377 The following sections are informational. 378 379Performance Impact 380------------------ 381 382MMRAs are a way to capture optimization opportunities in the program. 383But when an operation mentions no tags or conflicting tags, 384the target may need to produce conservative code to ensure correctness 385at the cost of performance. This can happen in the following situations: 386 3871. When a target first introduces MMRAs, the 388 frontend might not have been updated to emit them. 3892. An optimization may drop MMRA metadata. 3903. An optimization may add arbitrary tags to an operation. 391 392Note that targets can always choose to ignore (or even drop) MMRAs 393and revert to the default behavior/codegen heuristics without 394affecting correctness. 395 396Consequences of the Absence of *happens-before* 397----------------------------------------------- 398 399In the :ref:`happens-before<HappensBefore>` section, we defined how an 400*happens-before* relation between two instruction can be broken 401by leveraging compatibility between MMRAs. When the instructions 402are incompatible and there is no *happens-before* relation, we say 403that the instructions "do not have to be ordered relative to each 404other". 405 406"Ordering" in this context is a very broad term which covers both 407static and runtime aspects. 408 409When there is no ordering constraint, we *could* statically reorder 410the instructions in an optimizer transform if the reordering does 411not break other constraints as single location coherence. 412Static reordering is one consequence of breaking *happens-before*, 413but is not the most interesting one. 414 415Run-time consequences are more interesting. When there is an 416*happens-before* relation between instructions, the target has to emit 417synchronization code to ensure other threads will observe the effects of 418the instructions in the right order. 419 420For instance, the target may have to wait for previous loads & stores to 421finish before starting a fence-release, or there may be a need to flush a 422memory cache before executing the next instruction. 423In the absence of *happens-before*, there is no such requirement and 424no waiting or flushing is required. This may noticeably speed up 425execution in some cases. 426 427Combining Operations 428-------------------- 429 430If a pass can combine multiple memory or synchronizing operations 431into one, it needs to be able to combine MMRAs. One possible way to 432achieve this is by doing a prefix-wise union of the tag sets. 433 434Let A and B be two tags set, and U be the prefix-wise union of A and B. 435For every unique tag prefix P present in A or B: 436 437* If either A or B has no tags with prefix P, no tags with prefix 438 P are added to U. 439* If both A and B have at least one tag with prefix P, all tags with prefix 440 P from both sets are added to U. 441 442Passes should avoid aggressively combining MMRAs, as this can result 443in significant losses of information. While this cannot affect 444correctness, it may affect performance. 445 446As a general rule of thumb, common passes such as SimplifyCFG that 447aggressively combine/reorder operations should only combine 448instructions that have identical sets of tags. 449Passes that combine less frequently, or that are well aware of the cost 450of combining the MMRAs can use the prefix-wise union described above. 451 452Examples: 453 454.. code-block:: 455 456 A: store release %ptr1 # foo:x, foo:y, bar:x 457 B: store release %ptr2 # foo:x, bar:y 458 459 # Unique prefixes P = [foo, bar] 460 # "foo:x" is common to A and B so it's added to U. 461 # "bar:x" != "bar:y" so it's not added to U. 462 U: store release %ptr3 # foo:x 463 464.. code-block:: 465 466 A: store release %ptr1 # foo:x, foo:y 467 B: store release %ptr2 # foo:x, bux:y 468 469 # Unique prefixes P = [foo, bux] 470 # "foo:x" is common to A and B so it's added to U. 471 # No tags have the prefix "bux" in A. 472 U: store release %ptr3 # foo:x 473 474.. code-block:: 475 476 A: store release %ptr1 477 B: store release %ptr2 # foo:x, bar:y 478 479 # Unique prefixes P = [foo, bar] 480 # No tags with "foo" or "bar" in A, so no tags added. 481 U: store release %ptr3 482