1.. _convergence-and-uniformity: 2 3========================== 4Convergence And Uniformity 5========================== 6 7.. contents:: 8 :local: 9 10Introduction 11============ 12 13In some environments, groups of threads execute the same program in parallel, 14where efficient communication within a group is established using special 15primitives called :ref:`convergent operations<convergent_operations>`. The 16outcome of a convergent operation is sensitive to the set of threads that 17participate in it. 18 19The intuitive picture of *convergence* is built around threads executing in 20"lock step" --- a set of threads is thought of as *converged* if they are all 21executing "the same sequence of instructions together". Such threads may 22*diverge* at a *divergent branch*, and they may later *reconverge* at some 23common program point. 24 25In this intuitive picture, when converged threads execute an instruction, the 26resulting value is said to be *uniform* if it is the same in those threads, and 27*divergent* otherwise. Correspondingly, a branch is said to be a uniform branch 28if its condition is uniform, and it is a divergent branch otherwise. 29 30But the assumption of lock-step execution is not necessary for describing 31communication at convergent operations. It also constrains the implementation 32(compiler as well as hardware) by overspecifying how threads execute in such a 33parallel environment. To eliminate this assumption: 34 35- We define convergence as a relation between the execution of each instruction 36 by different threads and not as a relation between the threads themselves. 37 This definition is reasonable for known targets and is compatible with the 38 semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR. 39- We also define uniformity in terms of this convergence. The output of an 40 instruction can be examined for uniformity across multiple threads only if the 41 corresponding executions of that instruction are converged. 42 43This document decribes a static analysis for determining convergence at each 44instruction in a function. The analysis extends previous work on divergence 45analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described 46analysis is used in LLVM to implement a UniformityAnalysis that determines the 47uniformity of value(s) computed at each instruction in an LLVM IR or MIR 48function. 49 50.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian 51 Hack. 2021. An Abstract Interpretation for SPMD Divergence on 52 Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL, 53 Article 31 (January 2021), 35 pages. 54 https://doi.org/10.1145/3434312 55 56Motivation 57========== 58 59Divergent branches constrain 60program transforms such as changing the CFG or moving a convergent 61operation to a different point of the CFG. Performing these 62transformations across a divergent branch can change the sets of 63threads that execute convergent operations convergently. While these 64constraints are out of scope for this document, 65uniformity analysis allows these transformations to identify 66uniform branches where these constraints do not hold. 67 68Uniformity is also useful by itself on targets that execute threads in 69groups with shared execution resources (e.g. waves, warps, or 70subgroups): 71 72- Uniform outputs can potentially be computed or stored on shared 73 resources. 74- These targets must "linearize" a divergent branch to ensure that 75 each side of the branch is followed by the corresponding threads in 76 the same group. But linearization is unnecessary at uniform 77 branches, since the whole group of threads follows either one side 78 of the branch or the other. 79 80Terminology 81=========== 82 83Cycles 84 Described in :ref:`cycle-terminology`. 85 86Closed path 87 Described in :ref:`cycle-closed-path`. 88 89Disjoint paths 90 Two paths in a CFG are said to be disjoint if the only nodes common 91 to both are the start node or the end node, or both. 92 93Join node 94 A join node of a branch is a node reachable along disjoint paths 95 starting from that branch. 96 97Diverged path 98 A diverged path is a path that starts from a divergent branch and 99 either reaches a join node of the branch or reaches the end of the 100 function without passing through any join node of the branch. 101 102.. _convergence-dynamic-instances: 103 104Threads and Dynamic Instances 105============================= 106 107Each occurrence of an instruction in the program source is called a 108*static instance*. When a thread executes a program, each execution of 109a static instance produces a distinct *dynamic instance* of that 110instruction. 111 112Each thread produces a unique sequence of dynamic instances: 113 114- The sequence is generated along branch decisions and loop 115 traversals. 116- Starts with a dynamic instance of a "first" instruction. 117- Continues with dynamic instances of successive "next" 118 instructions. 119 120Threads are independent; some targets may choose to execute them in 121groups in order to share resources when possible. 122 123.. figure:: convergence-natural-loop.png 124 :name: convergence-natural-loop 125 126.. table:: 127 :name: convergence-thread-example 128 :align: left 129 130 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 131 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 132 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 133 | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 134 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 135 | Thread 2 | Entry1 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 136 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 137 138In the above table, each row is a different thread, listing the 139dynamic instances produced by that thread from left to right. Each 140thread executes the same program that starts with an ``Entry`` node 141and ends with an ``Exit`` node, but different threads may take 142different paths through the control flow of the program. The columns 143are numbered merely for convenience, and empty cells have no special 144meaning. Dynamic instances listed in the same column are converged. 145 146.. _convergence-definition: 147 148Convergence 149=========== 150 151*Convergence-before* is a strict partial order over dynamic instances 152that is defined as the transitive closure of: 153 1541. If dynamic instance ``P`` is executed strictly before ``Q`` in the 155 same thread, then ``P`` is *convergence-before* ``Q``. 1562. If dynamic instance ``P`` is executed strictly before ``Q1`` in the 157 same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is 158 *convergence-before* ``Q2``. 1593. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2`` 160 is executed strictly before ``Q`` in the same thread, then ``P1`` 161 is *convergence-before* ``Q``. 162 163.. table:: 164 :name: convergence-order-example 165 :align: left 166 167 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 168 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 169 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 170 | Thread 1 | Entry | ... | | | | S2 | T | ... | Exit | 171 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 172 | Thread 2 | Entry | ... | | Q2 | R | S1 | | ... | Exit | 173 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 174 | Thread 3 | Entry | ... | P | Q1 | | | | ... | | 175 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 176 177The above table shows partial sequences of dynamic instances from 178different threads. Dynamic instances in the same column are assumed 179to be converged (i.e., related to each other in the converged-with 180relation). The resulting convergence order includes the edges ``P -> 181Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc. 182 183*Converged-with* is a transitive symmetric relation over dynamic instances 184produced by *different threads* for the *same static instance*. 185 186It is impractical to provide any one definition for the *converged-with* 187relation, since different environments may wish to relate dynamic instances in 188different ways. The fact that *convergence-before* is a strict partial order is 189a constraint on the *converged-with* relation. It is trivially satisfied if 190different dynamic instances are never converged. Below, we provide a relation 191called :ref:`maximal converged-with<convergence-maximal>`, which satisifies 192*convergence-before* and is suitable for known targets. 193 194.. _convergence-note-convergence: 195 196.. note:: 197 198 1. The convergence-before relation is not 199 directly observable. Program transforms are in general free to 200 change the order of instructions, even though that obviously 201 changes the convergence-before relation. 202 203 2. Converged dynamic instances need not be executed at the same 204 time or even on the same resource. Converged dynamic instances 205 of a convergent operation may appear to do so but that is an 206 implementation detail. 207 208 3. The fact that ``P`` is convergence-before 209 ``Q`` does not automatically imply that ``P`` happens-before 210 ``Q`` in a memory model sense. 211 212.. _convergence-maximal: 213 214Maximal Convergence 215------------------- 216 217This section defines a constraint that may be used to 218produce a *maximal converged-with* relation without violating the 219strict *convergence-before* order. This maximal converged-with 220relation is reasonable for real targets and is compatible with 221convergent operations. 222 223The maximal converged-with relation is defined in terms of cycle 224headers, with the assumption that threads converge at the header on every 225"iteration" of the cycle. Informally, two threads execute the same iteration of 226a cycle if they both previously executed the cycle header the same number of 227times after they entered that cycle. In general, this needs to account for the 228iterations of parent cycles as well. 229 230 **Maximal converged-with:** 231 232 Dynamic instances ``X1`` and ``X2`` produced by different threads 233 for the same static instance ``X`` are converged in the maximal 234 converged-with relation if and only if: 235 236 - ``X`` is not contained in any cycle, or, 237 - For every cycle ``C`` with header ``H`` that contains ``X``: 238 239 - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in 240 the respective thread is convergence-before ``X2``, and, 241 - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in 242 the respective thread is convergence-before ``X1``, 243 - without assuming that ``X1`` is converged with ``X2``. 244 245.. note:: 246 247 Cycle headers may not be unique to a given CFG if it is irreducible. Each 248 cycle hierarchy for the same CFG results in a different maximal 249 converged-with relation. 250 251 For brevity, the rest of the document restricts the term 252 *converged* to mean "related under the maximal converged-with 253 relation for the given cycle hierarchy". 254 255Maximal convergence can now be demonstrated in the earlier example as follows: 256 257.. table:: 258 :align: left 259 260 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 261 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 262 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 263 | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 264 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 265 | Thread 2 | Entry2 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 266 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 267 268- ``Entry1`` and ``Entry2`` are converged. 269- ``H1`` and ``H2`` are converged. 270- ``B1`` and ``B2`` are not converged due to ``H4`` which is not 271 convergence-before ``B1``. 272- ``H3`` and ``H4`` are converged. 273- ``H3`` is not converged with ``H5`` due to ``H4`` which is not 274 convergence-before ``H3``. 275- ``L1`` and ``L2`` are converged. 276- ``L3`` and ``L4`` are converged. 277- ``L3`` is not converged with ``L5`` due to ``H5`` which is not 278 convergence-before ``L3``. 279 280.. _convergence-cycle-headers: 281 282Dependence on Cycles Headers 283---------------------------- 284 285Contradictions in *convergence-before* are possible only between two 286nodes that are inside some cycle. The dynamic instances of such nodes 287may be interleaved in the same thread, and this interleaving may be 288different for different threads. 289 290When a thread executes a node ``X`` once and then executes it again, 291it must have followed a closed path in the CFG that includes ``X``. 292Such a path must pass through the header of at least one cycle --- the 293smallest cycle that includes the entire closed path. In a given 294thread, two dynamic instances of ``X`` are either separated by the 295execution of at least one cycle header, or ``X`` itself is a cycle 296header. 297 298In reducible cycles (natural loops), each execution of the header is 299equivalent to the start of a new iteration of the cycle. But this 300analogy breaks down in the presence of explicit constraints on the 301converged-with relation, such as those described in :ref:`future 302work<convergence-note-convergence>`. Instead, cycle headers should be 303treated as implicit *points of convergence* in a maximal 304converged-with relation. 305 306Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such 307that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle, 308with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread 309enters the cycle ``Ck``, any of the following is possible: 310 3111. The thread directly entered cycle ``Ck`` without having executed 312 any of the headers ``H1`` to ``Hk``. 313 3142. The thread executed some or all of the nested headers one or more 315 times. 316 317The maximal converged-with relation captures the following intuition 318about cycles: 319 3201. When two threads enter a top-level cycle ``C1``, they execute 321 converged dynamic instances of every node that is a :ref:`child 322 <cycle-parent-block>` of ``C1``. 323 3242. When two threads enter a nested cycle ``Ck``, they execute 325 converged dynamic instances of every node that is a child of 326 ``Ck``, until either thread exits ``Ck``, if and only if they 327 executed converged dynamic instances of the last nested header that 328 either thread encountered. 329 330 Note that when a thread exits a nested cycle ``Ck``, it must follow 331 a closed path outside ``Ck`` to reenter it. This requires executing 332 the header of some outer cycle, as described earlier. 333 334Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1`` 335and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``. 336Maximal convergence relates ``X1`` and ``X2`` as follows: 337 3381. If neither thread executed any header from ``H1`` to ``Hk``, then 339 ``X1`` and ``X2`` are converged. 340 3412. Otherwise, if there are no converged dynamic instances ``Q1`` and 342 ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is 343 possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and 344 ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and 345 ``X2`` are not converged. 346 3473. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic 348 instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most 349 recently before ``X1`` and ``X2`` in the respective threads. Then 350 ``X1`` and ``X2`` are converged if and only if there is no dynamic 351 instance of any header from ``H1`` to ``Hk`` that occurs between 352 ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in 353 thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last 354 point of convergence, with no other header being executed before 355 executing ``X``. 356 357**Example:** 358 359.. figure:: convergence-both-diverged-nested.png 360 :name: convergence-both-diverged-nested 361 362The above figure shows two nested irreducible cycles with headers 363``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent 364branches. The table below shows the convergence between three threads 365taking different paths through the CFG. Dynamic instances listed in 366the same column are converged. 367 368 .. table:: 369 :align: left 370 371 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 372 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 373 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 374 | Thread1 | Entry | P1 | Q1 | S1 | P3 | Q3 | R1 | S2 | Exit | 375 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 376 | Thread2 | Entry | P2 | Q2 | | | | R2 | S3 | Exit | 377 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 378 | Thread3 | Entry | | | | | | R3 | S4 | Exit | 379 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 380 381- ``P2`` and ``P3`` are not converged due to ``S1`` 382- ``Q2`` and ``Q3`` are not converged due to ``S1`` 383- ``S1`` and ``S3`` are not converged due to ``R2`` 384- ``S1`` and ``S4`` are not converged due to ``R3`` 385 386Informally, ``T1`` and ``T2`` execute the inner cycle a different 387number of times, without executing the header of the outer cycle. All 388threads converge in the outer cycle when they first execute the header 389of the outer cycle. 390 391.. _convergence-uniformity: 392 393Uniformity 394========== 395 3961. The output of two converged dynamic instances is uniform if and 397 only if it compares equal for those two dynamic instances. 3982. The output of a static instance ``X`` is uniform *for a given set 399 of threads* if and only if it is uniform for every pair of 400 converged dynamic instances of ``X`` produced by those threads. 401 402A non-uniform value is said to be *divergent*. 403 404For a set ``S`` of threads, the uniformity of each output of a static 405instance is determined as follows: 406 4071. The semantics of the instruction may specify the output to be 408 uniform. 4092. Otherwise, the output is divergent if the static instance is not 410 :ref:`m-converged <convergence-m-converged>`. 4113. Otherwise, if the static instance is m-converged: 412 413 1. If it is a PHI node, its output is uniform if and only 414 if for every pair of converged dynamic instances produced by all 415 threads in ``S``: 416 417 a. Both instances choose the same output from converged 418 dynamic instances, and, 419 b. That output is uniform for all threads in ``S``. 420 2. Otherwise, the output is uniform if and only if the input 421 operands are uniform for all threads in ``S``. 422 423Divergent Cycle Exits 424--------------------- 425 426When a divergent branch occurs inside a cycle, it is possible that a 427diverged path continues to an exit of the cycle. This is called a 428divergent cycle exit. If the cycle is irreducible, the diverged path 429may re-enter and eventually reach a join within the cycle. Such a join 430should be examined for the :ref:`diverged entry 431<convergence-diverged-entry>` criterion. 432 433Nodes along the diverged path that lie outside the cycle experience 434*temporal divergence*, when two threads executing convergently inside 435the cycle produce uniform values, but exit the cycle along the same 436divergent path after executing the header a different number of times 437(informally, on different iterations of the cycle). For a node ``N`` 438inside the cycle the outputs may be uniform for the two threads, but 439any use ``U`` outside the cycle receives a value from non-converged 440dynamic instances of ``N``. An output of ``U`` may be divergent, 441depending on the semantics of the instruction. 442 443.. _uniformity-analysis: 444 445Static Uniformity Analysis 446========================== 447 448Irreducible control flow results in different cycle hierarchies 449depending on the choice of headers during depth-first traversal. As a 450result, a static analysis cannot always determine the convergence of 451nodes in irreducible cycles, and any uniformity analysis is limited to 452those static instances whose convergence is independent of the cycle 453hierarchy: 454 455.. _convergence-m-converged: 456 457 **m-converged static instances:** 458 459 A static instance ``X`` is *m-converged* for a given CFG if and only 460 if the maximal converged-with relation for its dynamic instances is 461 the same in every cycle hierarchy that can be constructed for that CFG. 462 463 .. note:: 464 465 In other words, two dynamic instances ``X1`` and ``X2`` of an 466 m-converged static instance ``X`` are converged in some cycle 467 hierarchy if and only if they are also converged in every other 468 cycle hierarchy for the same CFG. 469 470 As noted earlier, for brevity, we restrict the term *converged* to 471 mean "related under the maximal converged-with relation for a given 472 cycle hierarchy". 473 474 475Each node ``X`` in a given CFG is reported to be m-converged if and 476only if every cycle that contains ``X`` satisfies the following necessary 477conditions: 478 479 1. Every divergent branch inside the cycle satisfies the 480 :ref:`diverged entry criterion<convergence-diverged-entry>`, and, 481 2. There are no :ref:`diverged paths reaching the 482 cycle<convergence-diverged-outside>` from a divergent branch 483 outside it. 484 485.. note:: 486 487 A reducible cycle :ref:`trivially satisfies 488 <convergence-reducible-cycle>` the above conditions. In particular, 489 if the whole CFG is reducible, then all nodes in the CFG are 490 m-converged. 491 492The uniformity of each output of a static instance 493is determined using the criteria 494:ref:`described earlier <convergence-uniformity>`. The discovery of 495divergent outputs may cause their uses (including branches) to also 496become divergent. The analysis propagates this divergence until a 497fixed point is reached. 498 499The convergence inferred using these criteria is a safe subset of the 500maximal converged-with relation for any cycle hierarchy. In 501particular, it is sufficient to determine if a static instance is 502m-converged for a given cycle hierarchy ``T``, even if that fact is 503not detected when examining some other cycle hierarchy ``T'``. 504 505This property allows compiler transforms to use the uniformity 506analysis without being affected by DFS choices made in the underlying 507cycle analysis. When two transforms use different instances of the 508uniformity analysis for the same CFG, a "divergent value" result in 509one analysis instance cannot contradict a "uniform value" result in 510the other. 511 512Generic transforms such as SimplifyCFG, CSE, and loop transforms 513commonly change the program in ways that change the maximal 514converged-with relations. This also means that a value that was 515previously uniform can become divergent after such a transform. 516Uniformity has to be recomputed after such transforms. 517 518Divergent Branch inside a Cycle 519------------------------------- 520 521.. figure:: convergence-divergent-inside.png 522 :name: convergence-divergent-inside 523 524The above figure shows a divergent branch ``Q`` inside an irreducible 525cyclic region. When two threads diverge at ``Q``, the convergence of 526dynamic instances within the cyclic region depends on the cycle 527hierarchy chosen: 528 5291. In an implementation that detects a single cycle ``C`` with header 530 ``P``, convergence inside the cycle is determined by ``P``. 531 5322. In an implementation that detects two nested cycles with headers 533 ``R`` and ``S``, convergence inside those cycles is determined by 534 their respective headers. 535 536.. _convergence-diverged-entry: 537 538A conservative approach would be to simply report all nodes inside 539irreducible cycles as having divergent outputs. But it is desirable to 540recognize m-converged nodes in the CFG in order to maximize 541uniformity. This section describes one such pattern of nodes derived 542from *closed paths*, which are a property of the CFG and do not depend 543on the cycle hierarchy. 544 545 **Diverged Entry Criterion:** 546 547 The dynamic instances of all the nodes in a closed path ``P`` are 548 m-converged only if for every divergent branch ``B`` and its 549 join node ``J`` that lie on ``P``, there is no entry to ``P`` which 550 lies on a diverged path from ``B`` to ``J``. 551 552.. figure:: convergence-closed-path.png 553 :name: convergence-closed-path 554 555Consider the closed path ``P -> Q -> R -> S`` in the above figure. 556``P`` and ``R`` are :ref:`entries to the closed 557path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a 558join for that branch, with diverged paths ``Q -> R -> S`` and ``Q -> 559S``. 560 561- If a diverged entry ``R`` exists, then in some cycle hierarchy, 562 ``R`` is the header of the smallest cycle ``C`` containing the 563 closed path and a :ref:`child cycle<cycle-definition>` ``C'`` 564 exists in the set ``C - R``, containing both branch ``Q`` and join 565 ``S``. When threads diverge at ``Q``, one subset ``M`` continues 566 inside cycle ``C'``, while the complement ``N`` exits ``C'`` and 567 reaches ``R``. Dynamic instances of ``S`` executed by threads in set 568 ``M`` are not converged with those executed in set ``N`` due to the 569 presence of ``R``. Informally, threads that diverge at ``Q`` 570 reconverge in the same iteration of the outer cycle ``C``, but they 571 may have executed the inner cycle ``C'`` differently. 572 573 .. table:: 574 :align: left 575 576 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 577 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 578 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 579 | Thread1 | Entry | P1 | Q1 | | | | R1 | S1 | P3 | ... | Exit | 580 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 581 | Thread2 | Entry | P2 | Q2 | S2 | P4 | Q4 | R2 | S4 | | | Exit | 582 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 583 584 In the table above, ``S2`` is not converged with ``S1`` due to ``R1``. 585 586| 587 588- If ``R`` does not exist, or if any node other than ``R`` is the 589 header of ``C``, then no such child cycle ``C'`` is detected. 590 Threads that diverge at ``Q`` execute converged dynamic instances of 591 ``S`` since they do not encounter the cycle header on any path from 592 ``Q`` to ``S``. Informally, threads that diverge at ``Q`` 593 reconverge at ``S`` in the same iteration of ``C``. 594 595 .. table:: 596 :align: left 597 598 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 599 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 600 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 601 | Thread1 | Entry | P1 | Q1 | R1 | S1 | P3 | Q3 | R3 | S3 | Exit | 602 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 603 | Thread2 | Entry | P2 | Q2 | | S2 | P4 | Q4 | R2 | S4 | Exit | 604 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 605 606| 607 608 .. note:: 609 610 In general, the cycle ``C`` in the above statements is not 611 expected to be the same cycle for different headers. Cycles and 612 their headers are tightly coupled; for different headers in the 613 same outermost cycle, the child cycles detected may be different. 614 The property relevant to the above examples is that for every 615 closed path, there is a cycle ``C`` that contains the path and 616 whose header is on that path. 617 618The diverged entry criterion must be checked for every closed path 619passing through a divergent branch ``B`` and its join ``J``. Since 620:ref:`every closed path passes through the header of some 621cycle<cycle-closed-path-header>`, this amounts to checking every cycle 622``C`` that contains ``B`` and ``J``. When the header of ``C`` 623dominates the join ``J``, there can be no entry to any path from the 624header to ``J``, which includes any diverged path from ``B`` to ``J``. 625This is also true for any closed paths passing through the header of 626an outer cycle that contains ``C``. 627 628Thus, the diverged entry criterion can be conservatively simplified 629as follows: 630 631 For a divergent branch ``B`` and its join node ``J``, the nodes in a 632 cycle ``C`` that contains both ``B`` and ``J`` are m-converged only 633 if: 634 635 - ``B`` strictly dominates ``J``, or, 636 - The header ``H`` of ``C`` strictly dominates ``J``, or, 637 - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the 638 same condition. 639 640When ``J`` is the same as ``H`` or ``B``, the trivial dominance is 641insufficient to make any statement about entries to diverged paths. 642 643.. _convergence-diverged-outside: 644 645Diverged Paths reaching a Cycle 646------------------------------- 647 648.. figure:: convergence-divergent-outside.png 649 :name: convergence-divergent-outside 650 651The figure shows two cycle hierarchies with a divergent branch in 652``Entry`` instead of ``Q``. For two threads that enter the closed path 653``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence 654of dynamic instances generated along the path depends on whether ``P`` 655or ``R`` is the header. 656 657- Convergence when ``P`` is the header. 658 659 .. table:: 660 :align: left 661 662 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 663 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 664 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 665 | Thread1 | Entry | | | | P1 | Q1 | R1 | S1 | P3 | Q3 | | S3 | Exit | 666 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 667 | Thread2 | Entry | | R2 | S2 | P2 | Q2 | | S2 | P4 | Q4 | R3 | S4 | Exit | 668 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 669 670 | 671 672- Convergence when ``R`` is the header. 673 674 .. table:: 675 :align: left 676 677 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 678 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 679 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 680 | Thread1 | Entry | | P1 | Q1 | R1 | S1 | P3 | Q3 | S3 | | | Exit | 681 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 682 | Thread2 | Entry | | | | R2 | S2 | P2 | Q2 | S2 | P4 | ... | Exit | 683 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 684 685 | 686 687Thus, when diverged paths reach different entries of an irreducible 688cycle from outside the cycle, the static analysis conservatively 689reports every node in the cycle as not m-converged. 690 691.. _convergence-reducible-cycle: 692 693Reducible Cycle 694--------------- 695 696If ``C`` is a reducible cycle with header ``H``, then in any DFS, 697``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>` 698``C'`` that contains ``C``. Independent of the DFS, there is no entry 699to the subgraph ``C`` other than ``H`` itself. Thus, we have the 700following: 701 7021. The diverged entry criterion is trivially satisfied for a divergent 703 branch and its join, where both are inside subgraph ``C``. 7042. When diverged paths reach the subgraph ``C`` from outside, their 705 convergence is always determined by the same header ``H``. 706 707Clearly, this can be determined only in a cycle hierarchy ``T`` where 708``C`` is detected as a reducible cycle. No such conclusion can be made 709in a different cycle hierarchy ``T'`` where ``C`` is part of a larger 710cycle ``C'`` with the same header, but this does not contradict the 711conclusion in ``T``. 712 713Controlled Convergence 714====================== 715 716:ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>` 717provide an explicit semantics for determining which threads are converged at a 718given point in the program. The impact of this is incorporated in a 719:ref:`controlled maximal converged-with <controlled_maximal_converged_with>` 720relation over dynamic instances and a :ref:`controlled m-converged 721<controlled_m_converged>` property of static instances. The :ref:`uniformity 722analysis <uniformity-analysis>` implemented in LLVM includes this for targets 723that support convergence control tokens. 724