1da61c865SSameer Sahasrabuddhe.. _convergence-and-uniformity: 2da61c865SSameer Sahasrabuddhe 3475ce4c2SSameer Sahasrabuddhe========================== 4475ce4c2SSameer SahasrabuddheConvergence And Uniformity 5475ce4c2SSameer Sahasrabuddhe========================== 6475ce4c2SSameer Sahasrabuddhe 7475ce4c2SSameer Sahasrabuddhe.. contents:: 8475ce4c2SSameer Sahasrabuddhe :local: 9475ce4c2SSameer Sahasrabuddhe 10475ce4c2SSameer SahasrabuddheIntroduction 11475ce4c2SSameer Sahasrabuddhe============ 12475ce4c2SSameer Sahasrabuddhe 13*256d76f4SSameer SahasrabuddheIn some environments, groups of threads execute the same program in parallel, 14*256d76f4SSameer Sahasrabuddhewhere efficient communication within a group is established using special 15*256d76f4SSameer Sahasrabuddheprimitives called :ref:`convergent operations<convergent_operations>`. The 16*256d76f4SSameer Sahasrabuddheoutcome of a convergent operation is sensitive to the set of threads that 17*256d76f4SSameer Sahasrabuddheparticipate in it. 18475ce4c2SSameer Sahasrabuddhe 19*256d76f4SSameer SahasrabuddheThe intuitive picture of *convergence* is built around threads executing in 20*256d76f4SSameer Sahasrabuddhe"lock step" --- a set of threads is thought of as *converged* if they are all 21*256d76f4SSameer Sahasrabuddheexecuting "the same sequence of instructions together". Such threads may 22*256d76f4SSameer Sahasrabuddhe*diverge* at a *divergent branch*, and they may later *reconverge* at some 23*256d76f4SSameer Sahasrabuddhecommon program point. 24475ce4c2SSameer Sahasrabuddhe 25*256d76f4SSameer SahasrabuddheIn this intuitive picture, when converged threads execute an instruction, the 26*256d76f4SSameer Sahasrabuddheresulting value is said to be *uniform* if it is the same in those threads, and 27*256d76f4SSameer Sahasrabuddhe*divergent* otherwise. Correspondingly, a branch is said to be a uniform branch 28*256d76f4SSameer Sahasrabuddheif its condition is uniform, and it is a divergent branch otherwise. 29*256d76f4SSameer Sahasrabuddhe 30*256d76f4SSameer SahasrabuddheBut the assumption of lock-step execution is not necessary for describing 31*256d76f4SSameer Sahasrabuddhecommunication at convergent operations. It also constrains the implementation 32*256d76f4SSameer Sahasrabuddhe(compiler as well as hardware) by overspecifying how threads execute in such a 33*256d76f4SSameer Sahasrabuddheparallel environment. To eliminate this assumption: 34*256d76f4SSameer Sahasrabuddhe 35*256d76f4SSameer Sahasrabuddhe- We define convergence as a relation between the execution of each instruction 36*256d76f4SSameer Sahasrabuddhe by different threads and not as a relation between the threads themselves. 37*256d76f4SSameer Sahasrabuddhe This definition is reasonable for known targets and is compatible with the 38*256d76f4SSameer Sahasrabuddhe semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR. 39*256d76f4SSameer Sahasrabuddhe- We also define uniformity in terms of this convergence. The output of an 40*256d76f4SSameer Sahasrabuddhe instruction can be examined for uniformity across multiple threads only if the 41*256d76f4SSameer Sahasrabuddhe corresponding executions of that instruction are converged. 42*256d76f4SSameer Sahasrabuddhe 43*256d76f4SSameer SahasrabuddheThis document decribes a static analysis for determining convergence at each 44*256d76f4SSameer Sahasrabuddheinstruction in a function. The analysis extends previous work on divergence 45*256d76f4SSameer Sahasrabuddheanalysis [DivergenceSPMD]_ to cover irreducible control-flow. The described 46*256d76f4SSameer Sahasrabuddheanalysis is used in LLVM to implement a UniformityAnalysis that determines the 47*256d76f4SSameer Sahasrabuddheuniformity of value(s) computed at each instruction in an LLVM IR or MIR 48*256d76f4SSameer Sahasrabuddhefunction. 49*256d76f4SSameer Sahasrabuddhe 50*256d76f4SSameer Sahasrabuddhe.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian 51*256d76f4SSameer Sahasrabuddhe Hack. 2021. An Abstract Interpretation for SPMD Divergence on 52*256d76f4SSameer Sahasrabuddhe Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL, 53*256d76f4SSameer Sahasrabuddhe Article 31 (January 2021), 35 pages. 54*256d76f4SSameer Sahasrabuddhe https://doi.org/10.1145/3434312 55*256d76f4SSameer Sahasrabuddhe 56*256d76f4SSameer SahasrabuddheMotivation 57*256d76f4SSameer Sahasrabuddhe========== 58*256d76f4SSameer Sahasrabuddhe 59*256d76f4SSameer SahasrabuddheDivergent branches constrain 60475ce4c2SSameer Sahasrabuddheprogram transforms such as changing the CFG or moving a convergent 61475ce4c2SSameer Sahasrabuddheoperation to a different point of the CFG. Performing these 62475ce4c2SSameer Sahasrabuddhetransformations across a divergent branch can change the sets of 63475ce4c2SSameer Sahasrabuddhethreads that execute convergent operations convergently. While these 64*256d76f4SSameer Sahasrabuddheconstraints are out of scope for this document, 65*256d76f4SSameer Sahasrabuddheuniformity analysis allows these transformations to identify 66475ce4c2SSameer Sahasrabuddheuniform branches where these constraints do not hold. 67475ce4c2SSameer Sahasrabuddhe 68475ce4c2SSameer SahasrabuddheUniformity is also useful by itself on targets that execute threads in 69475ce4c2SSameer Sahasrabuddhegroups with shared execution resources (e.g. waves, warps, or 70475ce4c2SSameer Sahasrabuddhesubgroups): 71475ce4c2SSameer Sahasrabuddhe 72475ce4c2SSameer Sahasrabuddhe- Uniform outputs can potentially be computed or stored on shared 73475ce4c2SSameer Sahasrabuddhe resources. 74475ce4c2SSameer Sahasrabuddhe- These targets must "linearize" a divergent branch to ensure that 75475ce4c2SSameer Sahasrabuddhe each side of the branch is followed by the corresponding threads in 76475ce4c2SSameer Sahasrabuddhe the same group. But linearization is unnecessary at uniform 77475ce4c2SSameer Sahasrabuddhe branches, since the whole group of threads follows either one side 78475ce4c2SSameer Sahasrabuddhe of the branch or the other. 79475ce4c2SSameer Sahasrabuddhe 80475ce4c2SSameer SahasrabuddheTerminology 81475ce4c2SSameer Sahasrabuddhe=========== 82475ce4c2SSameer Sahasrabuddhe 83475ce4c2SSameer SahasrabuddheCycles 84475ce4c2SSameer Sahasrabuddhe Described in :ref:`cycle-terminology`. 85475ce4c2SSameer Sahasrabuddhe 86475ce4c2SSameer SahasrabuddheClosed path 87475ce4c2SSameer Sahasrabuddhe Described in :ref:`cycle-closed-path`. 88475ce4c2SSameer Sahasrabuddhe 89475ce4c2SSameer SahasrabuddheDisjoint paths 90475ce4c2SSameer Sahasrabuddhe Two paths in a CFG are said to be disjoint if the only nodes common 91475ce4c2SSameer Sahasrabuddhe to both are the start node or the end node, or both. 92475ce4c2SSameer Sahasrabuddhe 93475ce4c2SSameer SahasrabuddheJoin node 94475ce4c2SSameer Sahasrabuddhe A join node of a branch is a node reachable along disjoint paths 95475ce4c2SSameer Sahasrabuddhe starting from that branch. 96475ce4c2SSameer Sahasrabuddhe 97475ce4c2SSameer SahasrabuddheDiverged path 98475ce4c2SSameer Sahasrabuddhe A diverged path is a path that starts from a divergent branch and 99475ce4c2SSameer Sahasrabuddhe either reaches a join node of the branch or reaches the end of the 100475ce4c2SSameer Sahasrabuddhe function without passing through any join node of the branch. 101475ce4c2SSameer Sahasrabuddhe 102da61c865SSameer Sahasrabuddhe.. _convergence-dynamic-instances: 103da61c865SSameer Sahasrabuddhe 104475ce4c2SSameer SahasrabuddheThreads and Dynamic Instances 105475ce4c2SSameer Sahasrabuddhe============================= 106475ce4c2SSameer Sahasrabuddhe 107475ce4c2SSameer SahasrabuddheEach occurrence of an instruction in the program source is called a 108475ce4c2SSameer Sahasrabuddhe*static instance*. When a thread executes a program, each execution of 109475ce4c2SSameer Sahasrabuddhea static instance produces a distinct *dynamic instance* of that 110475ce4c2SSameer Sahasrabuddheinstruction. 111475ce4c2SSameer Sahasrabuddhe 112475ce4c2SSameer SahasrabuddheEach thread produces a unique sequence of dynamic instances: 113475ce4c2SSameer Sahasrabuddhe 114475ce4c2SSameer Sahasrabuddhe- The sequence is generated along branch decisions and loop 115475ce4c2SSameer Sahasrabuddhe traversals. 116475ce4c2SSameer Sahasrabuddhe- Starts with a dynamic instance of a "first" instruction. 117475ce4c2SSameer Sahasrabuddhe- Continues with dynamic instances of successive "next" 118475ce4c2SSameer Sahasrabuddhe instructions. 119475ce4c2SSameer Sahasrabuddhe 120475ce4c2SSameer SahasrabuddheThreads are independent; some targets may choose to execute them in 121475ce4c2SSameer Sahasrabuddhegroups in order to share resources when possible. 122475ce4c2SSameer Sahasrabuddhe 123475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-natural-loop.png 124475ce4c2SSameer Sahasrabuddhe :name: convergence-natural-loop 125475ce4c2SSameer Sahasrabuddhe 126475ce4c2SSameer Sahasrabuddhe.. table:: 127475ce4c2SSameer Sahasrabuddhe :name: convergence-thread-example 128475ce4c2SSameer Sahasrabuddhe :align: left 129475ce4c2SSameer Sahasrabuddhe 130475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 131475ce4c2SSameer Sahasrabuddhe | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 132475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 133475ce4c2SSameer Sahasrabuddhe | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 134475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 135475ce4c2SSameer Sahasrabuddhe | Thread 2 | Entry1 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 136475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 137475ce4c2SSameer Sahasrabuddhe 138475ce4c2SSameer SahasrabuddheIn the above table, each row is a different thread, listing the 139475ce4c2SSameer Sahasrabuddhedynamic instances produced by that thread from left to right. Each 140475ce4c2SSameer Sahasrabuddhethread executes the same program that starts with an ``Entry`` node 141475ce4c2SSameer Sahasrabuddheand ends with an ``Exit`` node, but different threads may take 142475ce4c2SSameer Sahasrabuddhedifferent paths through the control flow of the program. The columns 143475ce4c2SSameer Sahasrabuddheare numbered merely for convenience, and empty cells have no special 144475ce4c2SSameer Sahasrabuddhemeaning. Dynamic instances listed in the same column are converged. 145475ce4c2SSameer Sahasrabuddhe 146475ce4c2SSameer Sahasrabuddhe.. _convergence-definition: 147475ce4c2SSameer Sahasrabuddhe 148475ce4c2SSameer SahasrabuddheConvergence 149475ce4c2SSameer Sahasrabuddhe=========== 150475ce4c2SSameer Sahasrabuddhe 151da61c865SSameer Sahasrabuddhe*Convergence-before* is a strict partial order over dynamic instances 152475ce4c2SSameer Sahasrabuddhethat is defined as the transitive closure of: 153475ce4c2SSameer Sahasrabuddhe 154475ce4c2SSameer Sahasrabuddhe1. If dynamic instance ``P`` is executed strictly before ``Q`` in the 155475ce4c2SSameer Sahasrabuddhe same thread, then ``P`` is *convergence-before* ``Q``. 156475ce4c2SSameer Sahasrabuddhe2. If dynamic instance ``P`` is executed strictly before ``Q1`` in the 157475ce4c2SSameer Sahasrabuddhe same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is 158475ce4c2SSameer Sahasrabuddhe *convergence-before* ``Q2``. 159475ce4c2SSameer Sahasrabuddhe3. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2`` 160475ce4c2SSameer Sahasrabuddhe is executed strictly before ``Q`` in the same thread, then ``P1`` 161475ce4c2SSameer Sahasrabuddhe is *convergence-before* ``Q``. 162475ce4c2SSameer Sahasrabuddhe 163475ce4c2SSameer Sahasrabuddhe.. table:: 164475ce4c2SSameer Sahasrabuddhe :name: convergence-order-example 165475ce4c2SSameer Sahasrabuddhe :align: left 166475ce4c2SSameer Sahasrabuddhe 167475ce4c2SSameer Sahasrabuddhe +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 168475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 169475ce4c2SSameer Sahasrabuddhe +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 170475ce4c2SSameer Sahasrabuddhe | Thread 1 | Entry | ... | | | | S2 | T | ... | Exit | 171475ce4c2SSameer Sahasrabuddhe +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 172475ce4c2SSameer Sahasrabuddhe | Thread 2 | Entry | ... | | Q2 | R | S1 | | ... | Exit | 173475ce4c2SSameer Sahasrabuddhe +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 174475ce4c2SSameer Sahasrabuddhe | Thread 3 | Entry | ... | P | Q1 | | | | ... | | 175475ce4c2SSameer Sahasrabuddhe +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 176475ce4c2SSameer Sahasrabuddhe 177475ce4c2SSameer SahasrabuddheThe above table shows partial sequences of dynamic instances from 178475ce4c2SSameer Sahasrabuddhedifferent threads. Dynamic instances in the same column are assumed 179475ce4c2SSameer Sahasrabuddheto be converged (i.e., related to each other in the converged-with 180475ce4c2SSameer Sahasrabuddherelation). The resulting convergence order includes the edges ``P -> 181475ce4c2SSameer SahasrabuddheQ2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc. 182475ce4c2SSameer Sahasrabuddhe 183*256d76f4SSameer Sahasrabuddhe*Converged-with* is a transitive symmetric relation over dynamic instances 184*256d76f4SSameer Sahasrabuddheproduced by *different threads* for the *same static instance*. 185*256d76f4SSameer Sahasrabuddhe 186*256d76f4SSameer SahasrabuddheIt is impractical to provide any one definition for the *converged-with* 187*256d76f4SSameer Sahasrabuddherelation, since different environments may wish to relate dynamic instances in 188*256d76f4SSameer Sahasrabuddhedifferent ways. The fact that *convergence-before* is a strict partial order is 189*256d76f4SSameer Sahasrabuddhea constraint on the *converged-with* relation. It is trivially satisfied if 190*256d76f4SSameer Sahasrabuddhedifferent dynamic instances are never converged. Below, we provide a relation 191*256d76f4SSameer Sahasrabuddhecalled :ref:`maximal converged-with<convergence-maximal>`, which satisifies 192*256d76f4SSameer Sahasrabuddhe*convergence-before* and is suitable for known targets. 193475ce4c2SSameer Sahasrabuddhe 194475ce4c2SSameer Sahasrabuddhe.. _convergence-note-convergence: 195475ce4c2SSameer Sahasrabuddhe 196475ce4c2SSameer Sahasrabuddhe.. note:: 197475ce4c2SSameer Sahasrabuddhe 198da61c865SSameer Sahasrabuddhe 1. The convergence-before relation is not 199475ce4c2SSameer Sahasrabuddhe directly observable. Program transforms are in general free to 200475ce4c2SSameer Sahasrabuddhe change the order of instructions, even though that obviously 201475ce4c2SSameer Sahasrabuddhe changes the convergence-before relation. 202475ce4c2SSameer Sahasrabuddhe 203da61c865SSameer Sahasrabuddhe 2. Converged dynamic instances need not be executed at the same 204475ce4c2SSameer Sahasrabuddhe time or even on the same resource. Converged dynamic instances 205475ce4c2SSameer Sahasrabuddhe of a convergent operation may appear to do so but that is an 206da61c865SSameer Sahasrabuddhe implementation detail. 207da61c865SSameer Sahasrabuddhe 208da61c865SSameer Sahasrabuddhe 3. The fact that ``P`` is convergence-before 209475ce4c2SSameer Sahasrabuddhe ``Q`` does not automatically imply that ``P`` happens-before 210475ce4c2SSameer Sahasrabuddhe ``Q`` in a memory model sense. 211475ce4c2SSameer Sahasrabuddhe 212475ce4c2SSameer Sahasrabuddhe.. _convergence-maximal: 213475ce4c2SSameer Sahasrabuddhe 214475ce4c2SSameer SahasrabuddheMaximal Convergence 215475ce4c2SSameer Sahasrabuddhe------------------- 216475ce4c2SSameer Sahasrabuddhe 217475ce4c2SSameer SahasrabuddheThis section defines a constraint that may be used to 218475ce4c2SSameer Sahasrabuddheproduce a *maximal converged-with* relation without violating the 219475ce4c2SSameer Sahasrabuddhestrict *convergence-before* order. This maximal converged-with 220475ce4c2SSameer Sahasrabuddherelation is reasonable for real targets and is compatible with 221475ce4c2SSameer Sahasrabuddheconvergent operations. 222475ce4c2SSameer Sahasrabuddhe 223475ce4c2SSameer SahasrabuddheThe maximal converged-with relation is defined in terms of cycle 224da61c865SSameer Sahasrabuddheheaders, with the assumption that threads converge at the header on every 225da61c865SSameer Sahasrabuddhe"iteration" of the cycle. Informally, two threads execute the same iteration of 226da61c865SSameer Sahasrabuddhea cycle if they both previously executed the cycle header the same number of 227da61c865SSameer Sahasrabuddhetimes after they entered that cycle. In general, this needs to account for the 228da61c865SSameer Sahasrabuddheiterations of parent cycles as well. 229475ce4c2SSameer Sahasrabuddhe 230475ce4c2SSameer Sahasrabuddhe **Maximal converged-with:** 231475ce4c2SSameer Sahasrabuddhe 232475ce4c2SSameer Sahasrabuddhe Dynamic instances ``X1`` and ``X2`` produced by different threads 233475ce4c2SSameer Sahasrabuddhe for the same static instance ``X`` are converged in the maximal 234*256d76f4SSameer Sahasrabuddhe converged-with relation if and only if: 235*256d76f4SSameer Sahasrabuddhe 236*256d76f4SSameer Sahasrabuddhe - ``X`` is not contained in any cycle, or, 237*256d76f4SSameer Sahasrabuddhe - For every cycle ``C`` with header ``H`` that contains ``X``: 238475ce4c2SSameer Sahasrabuddhe 239475ce4c2SSameer Sahasrabuddhe - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in 240475ce4c2SSameer Sahasrabuddhe the respective thread is convergence-before ``X2``, and, 241475ce4c2SSameer Sahasrabuddhe - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in 242475ce4c2SSameer Sahasrabuddhe the respective thread is convergence-before ``X1``, 243475ce4c2SSameer Sahasrabuddhe - without assuming that ``X1`` is converged with ``X2``. 244475ce4c2SSameer Sahasrabuddhe 245475ce4c2SSameer Sahasrabuddhe.. note:: 246475ce4c2SSameer Sahasrabuddhe 247da61c865SSameer Sahasrabuddhe Cycle headers may not be unique to a given CFG if it is irreducible. Each 248da61c865SSameer Sahasrabuddhe cycle hierarchy for the same CFG results in a different maximal 249da61c865SSameer Sahasrabuddhe converged-with relation. 250da61c865SSameer Sahasrabuddhe 251475ce4c2SSameer Sahasrabuddhe For brevity, the rest of the document restricts the term 252475ce4c2SSameer Sahasrabuddhe *converged* to mean "related under the maximal converged-with 253475ce4c2SSameer Sahasrabuddhe relation for the given cycle hierarchy". 254475ce4c2SSameer Sahasrabuddhe 255475ce4c2SSameer SahasrabuddheMaximal convergence can now be demonstrated in the earlier example as follows: 256475ce4c2SSameer Sahasrabuddhe 257475ce4c2SSameer Sahasrabuddhe.. table:: 258475ce4c2SSameer Sahasrabuddhe :align: left 259475ce4c2SSameer Sahasrabuddhe 260475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 261475ce4c2SSameer Sahasrabuddhe | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 262475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 263475ce4c2SSameer Sahasrabuddhe | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 264475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 265475ce4c2SSameer Sahasrabuddhe | Thread 2 | Entry2 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 266475ce4c2SSameer Sahasrabuddhe +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 267475ce4c2SSameer Sahasrabuddhe 268475ce4c2SSameer Sahasrabuddhe- ``Entry1`` and ``Entry2`` are converged. 269475ce4c2SSameer Sahasrabuddhe- ``H1`` and ``H2`` are converged. 270475ce4c2SSameer Sahasrabuddhe- ``B1`` and ``B2`` are not converged due to ``H4`` which is not 271475ce4c2SSameer Sahasrabuddhe convergence-before ``B1``. 272475ce4c2SSameer Sahasrabuddhe- ``H3`` and ``H4`` are converged. 273475ce4c2SSameer Sahasrabuddhe- ``H3`` is not converged with ``H5`` due to ``H4`` which is not 274475ce4c2SSameer Sahasrabuddhe convergence-before ``H3``. 275475ce4c2SSameer Sahasrabuddhe- ``L1`` and ``L2`` are converged. 276475ce4c2SSameer Sahasrabuddhe- ``L3`` and ``L4`` are converged. 277475ce4c2SSameer Sahasrabuddhe- ``L3`` is not converged with ``L5`` due to ``H5`` which is not 278475ce4c2SSameer Sahasrabuddhe convergence-before ``L3``. 279475ce4c2SSameer Sahasrabuddhe 280475ce4c2SSameer Sahasrabuddhe.. _convergence-cycle-headers: 281475ce4c2SSameer Sahasrabuddhe 282475ce4c2SSameer SahasrabuddheDependence on Cycles Headers 283475ce4c2SSameer Sahasrabuddhe---------------------------- 284475ce4c2SSameer Sahasrabuddhe 285da61c865SSameer SahasrabuddheContradictions in *convergence-before* are possible only between two 286475ce4c2SSameer Sahasrabuddhenodes that are inside some cycle. The dynamic instances of such nodes 287475ce4c2SSameer Sahasrabuddhemay be interleaved in the same thread, and this interleaving may be 288475ce4c2SSameer Sahasrabuddhedifferent for different threads. 289475ce4c2SSameer Sahasrabuddhe 290475ce4c2SSameer SahasrabuddheWhen a thread executes a node ``X`` once and then executes it again, 291475ce4c2SSameer Sahasrabuddheit must have followed a closed path in the CFG that includes ``X``. 292475ce4c2SSameer SahasrabuddheSuch a path must pass through the header of at least one cycle --- the 293475ce4c2SSameer Sahasrabuddhesmallest cycle that includes the entire closed path. In a given 294475ce4c2SSameer Sahasrabuddhethread, two dynamic instances of ``X`` are either separated by the 295475ce4c2SSameer Sahasrabuddheexecution of at least one cycle header, or ``X`` itself is a cycle 296475ce4c2SSameer Sahasrabuddheheader. 297475ce4c2SSameer Sahasrabuddhe 298475ce4c2SSameer SahasrabuddheIn reducible cycles (natural loops), each execution of the header is 299475ce4c2SSameer Sahasrabuddheequivalent to the start of a new iteration of the cycle. But this 300475ce4c2SSameer Sahasrabuddheanalogy breaks down in the presence of explicit constraints on the 301475ce4c2SSameer Sahasrabuddheconverged-with relation, such as those described in :ref:`future 302475ce4c2SSameer Sahasrabuddhework<convergence-note-convergence>`. Instead, cycle headers should be 303475ce4c2SSameer Sahasrabuddhetreated as implicit *points of convergence* in a maximal 304475ce4c2SSameer Sahasrabuddheconverged-with relation. 305475ce4c2SSameer Sahasrabuddhe 306475ce4c2SSameer SahasrabuddheConsider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such 307475ce4c2SSameer Sahasrabuddhethat ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle, 308475ce4c2SSameer Sahasrabuddhewith headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread 309475ce4c2SSameer Sahasrabuddheenters the cycle ``Ck``, any of the following is possible: 310475ce4c2SSameer Sahasrabuddhe 311475ce4c2SSameer Sahasrabuddhe1. The thread directly entered cycle ``Ck`` without having executed 312475ce4c2SSameer Sahasrabuddhe any of the headers ``H1`` to ``Hk``. 313475ce4c2SSameer Sahasrabuddhe 314475ce4c2SSameer Sahasrabuddhe2. The thread executed some or all of the nested headers one or more 315475ce4c2SSameer Sahasrabuddhe times. 316475ce4c2SSameer Sahasrabuddhe 317475ce4c2SSameer SahasrabuddheThe maximal converged-with relation captures the following intuition 318475ce4c2SSameer Sahasrabuddheabout cycles: 319475ce4c2SSameer Sahasrabuddhe 320475ce4c2SSameer Sahasrabuddhe1. When two threads enter a top-level cycle ``C1``, they execute 321475ce4c2SSameer Sahasrabuddhe converged dynamic instances of every node that is a :ref:`child 322475ce4c2SSameer Sahasrabuddhe <cycle-parent-block>` of ``C1``. 323475ce4c2SSameer Sahasrabuddhe 324475ce4c2SSameer Sahasrabuddhe2. When two threads enter a nested cycle ``Ck``, they execute 325475ce4c2SSameer Sahasrabuddhe converged dynamic instances of every node that is a child of 326475ce4c2SSameer Sahasrabuddhe ``Ck``, until either thread exits ``Ck``, if and only if they 327475ce4c2SSameer Sahasrabuddhe executed converged dynamic instances of the last nested header that 328475ce4c2SSameer Sahasrabuddhe either thread encountered. 329475ce4c2SSameer Sahasrabuddhe 330475ce4c2SSameer Sahasrabuddhe Note that when a thread exits a nested cycle ``Ck``, it must follow 331475ce4c2SSameer Sahasrabuddhe a closed path outside ``Ck`` to reenter it. This requires executing 332475ce4c2SSameer Sahasrabuddhe the header of some outer cycle, as described earlier. 333475ce4c2SSameer Sahasrabuddhe 334475ce4c2SSameer SahasrabuddheConsider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1`` 335475ce4c2SSameer Sahasrabuddheand ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``. 336475ce4c2SSameer SahasrabuddheMaximal convergence relates ``X1`` and ``X2`` as follows: 337475ce4c2SSameer Sahasrabuddhe 338475ce4c2SSameer Sahasrabuddhe1. If neither thread executed any header from ``H1`` to ``Hk``, then 339475ce4c2SSameer Sahasrabuddhe ``X1`` and ``X2`` are converged. 340475ce4c2SSameer Sahasrabuddhe 341475ce4c2SSameer Sahasrabuddhe2. Otherwise, if there are no converged dynamic instances ``Q1`` and 342475ce4c2SSameer Sahasrabuddhe ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is 343475ce4c2SSameer Sahasrabuddhe possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and 344475ce4c2SSameer Sahasrabuddhe ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and 345475ce4c2SSameer Sahasrabuddhe ``X2`` are not converged. 346475ce4c2SSameer Sahasrabuddhe 347475ce4c2SSameer Sahasrabuddhe3. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic 348475ce4c2SSameer Sahasrabuddhe instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most 349475ce4c2SSameer Sahasrabuddhe recently before ``X1`` and ``X2`` in the respective threads. Then 350475ce4c2SSameer Sahasrabuddhe ``X1`` and ``X2`` are converged if and only if there is no dynamic 351475ce4c2SSameer Sahasrabuddhe instance of any header from ``H1`` to ``Hk`` that occurs between 352475ce4c2SSameer Sahasrabuddhe ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in 353475ce4c2SSameer Sahasrabuddhe thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last 354475ce4c2SSameer Sahasrabuddhe point of convergence, with no other header being executed before 355475ce4c2SSameer Sahasrabuddhe executing ``X``. 356475ce4c2SSameer Sahasrabuddhe 357475ce4c2SSameer Sahasrabuddhe**Example:** 358475ce4c2SSameer Sahasrabuddhe 359475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-both-diverged-nested.png 360475ce4c2SSameer Sahasrabuddhe :name: convergence-both-diverged-nested 361475ce4c2SSameer Sahasrabuddhe 362475ce4c2SSameer SahasrabuddheThe above figure shows two nested irreducible cycles with headers 363475ce4c2SSameer Sahasrabuddhe``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent 364475ce4c2SSameer Sahasrabuddhebranches. The table below shows the convergence between three threads 365475ce4c2SSameer Sahasrabuddhetaking different paths through the CFG. Dynamic instances listed in 366475ce4c2SSameer Sahasrabuddhethe same column are converged. 367475ce4c2SSameer Sahasrabuddhe 368475ce4c2SSameer Sahasrabuddhe .. table:: 369475ce4c2SSameer Sahasrabuddhe :align: left 370475ce4c2SSameer Sahasrabuddhe 371475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 372475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 373475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 374475ce4c2SSameer Sahasrabuddhe | Thread1 | Entry | P1 | Q1 | S1 | P3 | Q3 | R1 | S2 | Exit | 375475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 376475ce4c2SSameer Sahasrabuddhe | Thread2 | Entry | P2 | Q2 | | | | R2 | S3 | Exit | 377475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 378475ce4c2SSameer Sahasrabuddhe | Thread3 | Entry | | | | | | R3 | S4 | Exit | 379475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 380475ce4c2SSameer Sahasrabuddhe 381475ce4c2SSameer Sahasrabuddhe- ``P2`` and ``P3`` are not converged due to ``S1`` 382475ce4c2SSameer Sahasrabuddhe- ``Q2`` and ``Q3`` are not converged due to ``S1`` 383475ce4c2SSameer Sahasrabuddhe- ``S1`` and ``S3`` are not converged due to ``R2`` 384475ce4c2SSameer Sahasrabuddhe- ``S1`` and ``S4`` are not converged due to ``R3`` 385475ce4c2SSameer Sahasrabuddhe 386475ce4c2SSameer SahasrabuddheInformally, ``T1`` and ``T2`` execute the inner cycle a different 387475ce4c2SSameer Sahasrabuddhenumber of times, without executing the header of the outer cycle. All 388475ce4c2SSameer Sahasrabuddhethreads converge in the outer cycle when they first execute the header 389475ce4c2SSameer Sahasrabuddheof the outer cycle. 390475ce4c2SSameer Sahasrabuddhe 391475ce4c2SSameer Sahasrabuddhe.. _convergence-uniformity: 392475ce4c2SSameer Sahasrabuddhe 393475ce4c2SSameer SahasrabuddheUniformity 394475ce4c2SSameer Sahasrabuddhe========== 395475ce4c2SSameer Sahasrabuddhe 396475ce4c2SSameer Sahasrabuddhe1. The output of two converged dynamic instances is uniform if and 397475ce4c2SSameer Sahasrabuddhe only if it compares equal for those two dynamic instances. 398475ce4c2SSameer Sahasrabuddhe2. The output of a static instance ``X`` is uniform *for a given set 399475ce4c2SSameer Sahasrabuddhe of threads* if and only if it is uniform for every pair of 400475ce4c2SSameer Sahasrabuddhe converged dynamic instances of ``X`` produced by those threads. 401475ce4c2SSameer Sahasrabuddhe 402475ce4c2SSameer SahasrabuddheA non-uniform value is said to be *divergent*. 403475ce4c2SSameer Sahasrabuddhe 404475ce4c2SSameer SahasrabuddheFor a set ``S`` of threads, the uniformity of each output of a static 405475ce4c2SSameer Sahasrabuddheinstance is determined as follows: 406475ce4c2SSameer Sahasrabuddhe 407475ce4c2SSameer Sahasrabuddhe1. The semantics of the instruction may specify the output to be 408475ce4c2SSameer Sahasrabuddhe uniform. 409fd98416dSSameer Sahasrabuddhe2. Otherwise, the output is divergent if the static instance is not 410fd98416dSSameer Sahasrabuddhe :ref:`m-converged <convergence-m-converged>`. 411fd98416dSSameer Sahasrabuddhe3. Otherwise, if the static instance is m-converged: 412fd98416dSSameer Sahasrabuddhe 413fd98416dSSameer Sahasrabuddhe 1. If it is a PHI node, its output is uniform if and only 414475ce4c2SSameer Sahasrabuddhe if for every pair of converged dynamic instances produced by all 415475ce4c2SSameer Sahasrabuddhe threads in ``S``: 416475ce4c2SSameer Sahasrabuddhe 417475ce4c2SSameer Sahasrabuddhe a. Both instances choose the same output from converged 418475ce4c2SSameer Sahasrabuddhe dynamic instances, and, 419475ce4c2SSameer Sahasrabuddhe b. That output is uniform for all threads in ``S``. 420fd98416dSSameer Sahasrabuddhe 2. Otherwise, the output is uniform if and only if the input 421475ce4c2SSameer Sahasrabuddhe operands are uniform for all threads in ``S``. 422475ce4c2SSameer Sahasrabuddhe 423475ce4c2SSameer SahasrabuddheDivergent Cycle Exits 424475ce4c2SSameer Sahasrabuddhe--------------------- 425475ce4c2SSameer Sahasrabuddhe 426475ce4c2SSameer SahasrabuddheWhen a divergent branch occurs inside a cycle, it is possible that a 427475ce4c2SSameer Sahasrabuddhediverged path continues to an exit of the cycle. This is called a 428475ce4c2SSameer Sahasrabuddhedivergent cycle exit. If the cycle is irreducible, the diverged path 429475ce4c2SSameer Sahasrabuddhemay re-enter and eventually reach a join within the cycle. Such a join 430475ce4c2SSameer Sahasrabuddheshould be examined for the :ref:`diverged entry 431475ce4c2SSameer Sahasrabuddhe<convergence-diverged-entry>` criterion. 432475ce4c2SSameer Sahasrabuddhe 433475ce4c2SSameer SahasrabuddheNodes along the diverged path that lie outside the cycle experience 434475ce4c2SSameer Sahasrabuddhe*temporal divergence*, when two threads executing convergently inside 435475ce4c2SSameer Sahasrabuddhethe cycle produce uniform values, but exit the cycle along the same 436475ce4c2SSameer Sahasrabuddhedivergent path after executing the header a different number of times 437475ce4c2SSameer Sahasrabuddhe(informally, on different iterations of the cycle). For a node ``N`` 438475ce4c2SSameer Sahasrabuddheinside the cycle the outputs may be uniform for the two threads, but 439475ce4c2SSameer Sahasrabuddheany use ``U`` outside the cycle receives a value from non-converged 440475ce4c2SSameer Sahasrabuddhedynamic instances of ``N``. An output of ``U`` may be divergent, 441475ce4c2SSameer Sahasrabuddhedepending on the semantics of the instruction. 442475ce4c2SSameer Sahasrabuddhe 443da61c865SSameer Sahasrabuddhe.. _uniformity-analysis: 444da61c865SSameer Sahasrabuddhe 445475ce4c2SSameer SahasrabuddheStatic Uniformity Analysis 446475ce4c2SSameer Sahasrabuddhe========================== 447475ce4c2SSameer Sahasrabuddhe 448475ce4c2SSameer SahasrabuddheIrreducible control flow results in different cycle hierarchies 449475ce4c2SSameer Sahasrabuddhedepending on the choice of headers during depth-first traversal. As a 450475ce4c2SSameer Sahasrabuddheresult, a static analysis cannot always determine the convergence of 451475ce4c2SSameer Sahasrabuddhenodes in irreducible cycles, and any uniformity analysis is limited to 452475ce4c2SSameer Sahasrabuddhethose static instances whose convergence is independent of the cycle 453475ce4c2SSameer Sahasrabuddhehierarchy: 454475ce4c2SSameer Sahasrabuddhe 455fd98416dSSameer Sahasrabuddhe.. _convergence-m-converged: 456fd98416dSSameer Sahasrabuddhe 457475ce4c2SSameer Sahasrabuddhe **m-converged static instances:** 458475ce4c2SSameer Sahasrabuddhe 459475ce4c2SSameer Sahasrabuddhe A static instance ``X`` is *m-converged* for a given CFG if and only 460475ce4c2SSameer Sahasrabuddhe if the maximal converged-with relation for its dynamic instances is 461475ce4c2SSameer Sahasrabuddhe the same in every cycle hierarchy that can be constructed for that CFG. 462475ce4c2SSameer Sahasrabuddhe 463475ce4c2SSameer Sahasrabuddhe .. note:: 464475ce4c2SSameer Sahasrabuddhe 465475ce4c2SSameer Sahasrabuddhe In other words, two dynamic instances ``X1`` and ``X2`` of an 466475ce4c2SSameer Sahasrabuddhe m-converged static instance ``X`` are converged in some cycle 467475ce4c2SSameer Sahasrabuddhe hierarchy if and only if they are also converged in every other 468475ce4c2SSameer Sahasrabuddhe cycle hierarchy for the same CFG. 469475ce4c2SSameer Sahasrabuddhe 470475ce4c2SSameer Sahasrabuddhe As noted earlier, for brevity, we restrict the term *converged* to 471475ce4c2SSameer Sahasrabuddhe mean "related under the maximal converged-with relation for a given 472475ce4c2SSameer Sahasrabuddhe cycle hierarchy". 473475ce4c2SSameer Sahasrabuddhe 474475ce4c2SSameer Sahasrabuddhe 475475ce4c2SSameer SahasrabuddheEach node ``X`` in a given CFG is reported to be m-converged if and 476da61c865SSameer Sahasrabuddheonly if every cycle that contains ``X`` satisfies the following necessary 477da61c865SSameer Sahasrabuddheconditions: 478475ce4c2SSameer Sahasrabuddhe 479da61c865SSameer Sahasrabuddhe 1. Every divergent branch inside the cycle satisfies the 480475ce4c2SSameer Sahasrabuddhe :ref:`diverged entry criterion<convergence-diverged-entry>`, and, 481da61c865SSameer Sahasrabuddhe 2. There are no :ref:`diverged paths reaching the 482475ce4c2SSameer Sahasrabuddhe cycle<convergence-diverged-outside>` from a divergent branch 483475ce4c2SSameer Sahasrabuddhe outside it. 484475ce4c2SSameer Sahasrabuddhe 485475ce4c2SSameer Sahasrabuddhe.. note:: 486475ce4c2SSameer Sahasrabuddhe 487475ce4c2SSameer Sahasrabuddhe A reducible cycle :ref:`trivially satisfies 488475ce4c2SSameer Sahasrabuddhe <convergence-reducible-cycle>` the above conditions. In particular, 489475ce4c2SSameer Sahasrabuddhe if the whole CFG is reducible, then all nodes in the CFG are 490475ce4c2SSameer Sahasrabuddhe m-converged. 491475ce4c2SSameer Sahasrabuddhe 492fd98416dSSameer SahasrabuddheThe uniformity of each output of a static instance 493fd98416dSSameer Sahasrabuddheis determined using the criteria 494475ce4c2SSameer Sahasrabuddhe:ref:`described earlier <convergence-uniformity>`. The discovery of 495475ce4c2SSameer Sahasrabuddhedivergent outputs may cause their uses (including branches) to also 496475ce4c2SSameer Sahasrabuddhebecome divergent. The analysis propagates this divergence until a 497475ce4c2SSameer Sahasrabuddhefixed point is reached. 498475ce4c2SSameer Sahasrabuddhe 499475ce4c2SSameer SahasrabuddheThe convergence inferred using these criteria is a safe subset of the 500475ce4c2SSameer Sahasrabuddhemaximal converged-with relation for any cycle hierarchy. In 501475ce4c2SSameer Sahasrabuddheparticular, it is sufficient to determine if a static instance is 502475ce4c2SSameer Sahasrabuddhem-converged for a given cycle hierarchy ``T``, even if that fact is 503475ce4c2SSameer Sahasrabuddhenot detected when examining some other cycle hierarchy ``T'``. 504475ce4c2SSameer Sahasrabuddhe 505475ce4c2SSameer SahasrabuddheThis property allows compiler transforms to use the uniformity 506475ce4c2SSameer Sahasrabuddheanalysis without being affected by DFS choices made in the underlying 507475ce4c2SSameer Sahasrabuddhecycle analysis. When two transforms use different instances of the 508475ce4c2SSameer Sahasrabuddheuniformity analysis for the same CFG, a "divergent value" result in 509475ce4c2SSameer Sahasrabuddheone analysis instance cannot contradict a "uniform value" result in 510475ce4c2SSameer Sahasrabuddhethe other. 511475ce4c2SSameer Sahasrabuddhe 512475ce4c2SSameer SahasrabuddheGeneric transforms such as SimplifyCFG, CSE, and loop transforms 513475ce4c2SSameer Sahasrabuddhecommonly change the program in ways that change the maximal 514475ce4c2SSameer Sahasrabuddheconverged-with relations. This also means that a value that was 515475ce4c2SSameer Sahasrabuddhepreviously uniform can become divergent after such a transform. 516475ce4c2SSameer SahasrabuddheUniformity has to be recomputed after such transforms. 517475ce4c2SSameer Sahasrabuddhe 518475ce4c2SSameer SahasrabuddheDivergent Branch inside a Cycle 519475ce4c2SSameer Sahasrabuddhe------------------------------- 520475ce4c2SSameer Sahasrabuddhe 521475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-divergent-inside.png 522475ce4c2SSameer Sahasrabuddhe :name: convergence-divergent-inside 523475ce4c2SSameer Sahasrabuddhe 524475ce4c2SSameer SahasrabuddheThe above figure shows a divergent branch ``Q`` inside an irreducible 525475ce4c2SSameer Sahasrabuddhecyclic region. When two threads diverge at ``Q``, the convergence of 526475ce4c2SSameer Sahasrabuddhedynamic instances within the cyclic region depends on the cycle 527475ce4c2SSameer Sahasrabuddhehierarchy chosen: 528475ce4c2SSameer Sahasrabuddhe 529475ce4c2SSameer Sahasrabuddhe1. In an implementation that detects a single cycle ``C`` with header 530475ce4c2SSameer Sahasrabuddhe ``P``, convergence inside the cycle is determined by ``P``. 531475ce4c2SSameer Sahasrabuddhe 532475ce4c2SSameer Sahasrabuddhe2. In an implementation that detects two nested cycles with headers 533475ce4c2SSameer Sahasrabuddhe ``R`` and ``S``, convergence inside those cycles is determined by 534475ce4c2SSameer Sahasrabuddhe their respective headers. 535475ce4c2SSameer Sahasrabuddhe 536475ce4c2SSameer Sahasrabuddhe.. _convergence-diverged-entry: 537475ce4c2SSameer Sahasrabuddhe 538475ce4c2SSameer SahasrabuddheA conservative approach would be to simply report all nodes inside 539475ce4c2SSameer Sahasrabuddheirreducible cycles as having divergent outputs. But it is desirable to 540475ce4c2SSameer Sahasrabuddherecognize m-converged nodes in the CFG in order to maximize 541475ce4c2SSameer Sahasrabuddheuniformity. This section describes one such pattern of nodes derived 542475ce4c2SSameer Sahasrabuddhefrom *closed paths*, which are a property of the CFG and do not depend 543475ce4c2SSameer Sahasrabuddheon the cycle hierarchy. 544475ce4c2SSameer Sahasrabuddhe 545475ce4c2SSameer Sahasrabuddhe **Diverged Entry Criterion:** 546475ce4c2SSameer Sahasrabuddhe 547475ce4c2SSameer Sahasrabuddhe The dynamic instances of all the nodes in a closed path ``P`` are 548475ce4c2SSameer Sahasrabuddhe m-converged only if for every divergent branch ``B`` and its 549475ce4c2SSameer Sahasrabuddhe join node ``J`` that lie on ``P``, there is no entry to ``P`` which 550475ce4c2SSameer Sahasrabuddhe lies on a diverged path from ``B`` to ``J``. 551475ce4c2SSameer Sahasrabuddhe 552475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-closed-path.png 553475ce4c2SSameer Sahasrabuddhe :name: convergence-closed-path 554475ce4c2SSameer Sahasrabuddhe 555475ce4c2SSameer SahasrabuddheConsider the closed path ``P -> Q -> R -> S`` in the above figure. 556475ce4c2SSameer Sahasrabuddhe``P`` and ``R`` are :ref:`entries to the closed 557475ce4c2SSameer Sahasrabuddhepath<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a 558475ce4c2SSameer Sahasrabuddhejoin for that branch, with diverged paths ``Q -> R -> S`` and ``Q -> 559475ce4c2SSameer SahasrabuddheS``. 560475ce4c2SSameer Sahasrabuddhe 561475ce4c2SSameer Sahasrabuddhe- If a diverged entry ``R`` exists, then in some cycle hierarchy, 562475ce4c2SSameer Sahasrabuddhe ``R`` is the header of the smallest cycle ``C`` containing the 563475ce4c2SSameer Sahasrabuddhe closed path and a :ref:`child cycle<cycle-definition>` ``C'`` 564475ce4c2SSameer Sahasrabuddhe exists in the set ``C - R``, containing both branch ``Q`` and join 565475ce4c2SSameer Sahasrabuddhe ``S``. When threads diverge at ``Q``, one subset ``M`` continues 566475ce4c2SSameer Sahasrabuddhe inside cycle ``C'``, while the complement ``N`` exits ``C'`` and 567475ce4c2SSameer Sahasrabuddhe reaches ``R``. Dynamic instances of ``S`` executed by threads in set 568475ce4c2SSameer Sahasrabuddhe ``M`` are not converged with those executed in set ``N`` due to the 569475ce4c2SSameer Sahasrabuddhe presence of ``R``. Informally, threads that diverge at ``Q`` 570475ce4c2SSameer Sahasrabuddhe reconverge in the same iteration of the outer cycle ``C``, but they 571475ce4c2SSameer Sahasrabuddhe may have executed the inner cycle ``C'`` differently. 572475ce4c2SSameer Sahasrabuddhe 573475ce4c2SSameer Sahasrabuddhe .. table:: 574475ce4c2SSameer Sahasrabuddhe :align: left 575475ce4c2SSameer Sahasrabuddhe 576475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 577475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 578475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 579475ce4c2SSameer Sahasrabuddhe | Thread1 | Entry | P1 | Q1 | | | | R1 | S1 | P3 | ... | Exit | 580475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 581475ce4c2SSameer Sahasrabuddhe | Thread2 | Entry | P2 | Q2 | S2 | P4 | Q4 | R2 | S4 | | | Exit | 582475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 583475ce4c2SSameer Sahasrabuddhe 584475ce4c2SSameer Sahasrabuddhe In the table above, ``S2`` is not converged with ``S1`` due to ``R1``. 585475ce4c2SSameer Sahasrabuddhe 586475ce4c2SSameer Sahasrabuddhe| 587475ce4c2SSameer Sahasrabuddhe 588475ce4c2SSameer Sahasrabuddhe- If ``R`` does not exist, or if any node other than ``R`` is the 589475ce4c2SSameer Sahasrabuddhe header of ``C``, then no such child cycle ``C'`` is detected. 590475ce4c2SSameer Sahasrabuddhe Threads that diverge at ``Q`` execute converged dynamic instances of 591475ce4c2SSameer Sahasrabuddhe ``S`` since they do not encounter the cycle header on any path from 592475ce4c2SSameer Sahasrabuddhe ``Q`` to ``S``. Informally, threads that diverge at ``Q`` 593475ce4c2SSameer Sahasrabuddhe reconverge at ``S`` in the same iteration of ``C``. 594475ce4c2SSameer Sahasrabuddhe 595475ce4c2SSameer Sahasrabuddhe .. table:: 596475ce4c2SSameer Sahasrabuddhe :align: left 597475ce4c2SSameer Sahasrabuddhe 598475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 599475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 600475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 601475ce4c2SSameer Sahasrabuddhe | Thread1 | Entry | P1 | Q1 | R1 | S1 | P3 | Q3 | R3 | S3 | Exit | 602475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 603475ce4c2SSameer Sahasrabuddhe | Thread2 | Entry | P2 | Q2 | | S2 | P4 | Q4 | R2 | S4 | Exit | 604475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 605475ce4c2SSameer Sahasrabuddhe 606475ce4c2SSameer Sahasrabuddhe| 607475ce4c2SSameer Sahasrabuddhe 608475ce4c2SSameer Sahasrabuddhe .. note:: 609475ce4c2SSameer Sahasrabuddhe 610475ce4c2SSameer Sahasrabuddhe In general, the cycle ``C`` in the above statements is not 611475ce4c2SSameer Sahasrabuddhe expected to be the same cycle for different headers. Cycles and 612475ce4c2SSameer Sahasrabuddhe their headers are tightly coupled; for different headers in the 613475ce4c2SSameer Sahasrabuddhe same outermost cycle, the child cycles detected may be different. 614475ce4c2SSameer Sahasrabuddhe The property relevant to the above examples is that for every 615475ce4c2SSameer Sahasrabuddhe closed path, there is a cycle ``C`` that contains the path and 616475ce4c2SSameer Sahasrabuddhe whose header is on that path. 617475ce4c2SSameer Sahasrabuddhe 618475ce4c2SSameer SahasrabuddheThe diverged entry criterion must be checked for every closed path 619475ce4c2SSameer Sahasrabuddhepassing through a divergent branch ``B`` and its join ``J``. Since 620475ce4c2SSameer Sahasrabuddhe:ref:`every closed path passes through the header of some 621475ce4c2SSameer Sahasrabuddhecycle<cycle-closed-path-header>`, this amounts to checking every cycle 622475ce4c2SSameer Sahasrabuddhe``C`` that contains ``B`` and ``J``. When the header of ``C`` 623475ce4c2SSameer Sahasrabuddhedominates the join ``J``, there can be no entry to any path from the 624475ce4c2SSameer Sahasrabuddheheader to ``J``, which includes any diverged path from ``B`` to ``J``. 625475ce4c2SSameer SahasrabuddheThis is also true for any closed paths passing through the header of 626475ce4c2SSameer Sahasrabuddhean outer cycle that contains ``C``. 627475ce4c2SSameer Sahasrabuddhe 628475ce4c2SSameer SahasrabuddheThus, the diverged entry criterion can be conservatively simplified 629475ce4c2SSameer Sahasrabuddheas follows: 630475ce4c2SSameer Sahasrabuddhe 631475ce4c2SSameer Sahasrabuddhe For a divergent branch ``B`` and its join node ``J``, the nodes in a 632475ce4c2SSameer Sahasrabuddhe cycle ``C`` that contains both ``B`` and ``J`` are m-converged only 633475ce4c2SSameer Sahasrabuddhe if: 634475ce4c2SSameer Sahasrabuddhe 635475ce4c2SSameer Sahasrabuddhe - ``B`` strictly dominates ``J``, or, 636475ce4c2SSameer Sahasrabuddhe - The header ``H`` of ``C`` strictly dominates ``J``, or, 637475ce4c2SSameer Sahasrabuddhe - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the 638475ce4c2SSameer Sahasrabuddhe same condition. 639475ce4c2SSameer Sahasrabuddhe 640475ce4c2SSameer SahasrabuddheWhen ``J`` is the same as ``H`` or ``B``, the trivial dominance is 641475ce4c2SSameer Sahasrabuddheinsufficient to make any statement about entries to diverged paths. 642475ce4c2SSameer Sahasrabuddhe 643475ce4c2SSameer Sahasrabuddhe.. _convergence-diverged-outside: 644475ce4c2SSameer Sahasrabuddhe 645475ce4c2SSameer SahasrabuddheDiverged Paths reaching a Cycle 646475ce4c2SSameer Sahasrabuddhe------------------------------- 647475ce4c2SSameer Sahasrabuddhe 648475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-divergent-outside.png 649475ce4c2SSameer Sahasrabuddhe :name: convergence-divergent-outside 650475ce4c2SSameer Sahasrabuddhe 651475ce4c2SSameer SahasrabuddheThe figure shows two cycle hierarchies with a divergent branch in 652475ce4c2SSameer Sahasrabuddhe``Entry`` instead of ``Q``. For two threads that enter the closed path 653475ce4c2SSameer Sahasrabuddhe``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence 654475ce4c2SSameer Sahasrabuddheof dynamic instances generated along the path depends on whether ``P`` 655475ce4c2SSameer Sahasrabuddheor ``R`` is the header. 656475ce4c2SSameer Sahasrabuddhe 657475ce4c2SSameer Sahasrabuddhe- Convergence when ``P`` is the header. 658475ce4c2SSameer Sahasrabuddhe 659475ce4c2SSameer Sahasrabuddhe .. table:: 660475ce4c2SSameer Sahasrabuddhe :align: left 661475ce4c2SSameer Sahasrabuddhe 662475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 663475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 664475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 665475ce4c2SSameer Sahasrabuddhe | Thread1 | Entry | | | | P1 | Q1 | R1 | S1 | P3 | Q3 | | S3 | Exit | 666475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 667475ce4c2SSameer Sahasrabuddhe | Thread2 | Entry | | R2 | S2 | P2 | Q2 | | S2 | P4 | Q4 | R3 | S4 | Exit | 668475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 669475ce4c2SSameer Sahasrabuddhe 670475ce4c2SSameer Sahasrabuddhe | 671475ce4c2SSameer Sahasrabuddhe 672475ce4c2SSameer Sahasrabuddhe- Convergence when ``R`` is the header. 673475ce4c2SSameer Sahasrabuddhe 674475ce4c2SSameer Sahasrabuddhe .. table:: 675475ce4c2SSameer Sahasrabuddhe :align: left 676475ce4c2SSameer Sahasrabuddhe 677475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 678475ce4c2SSameer Sahasrabuddhe | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 679475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 680475ce4c2SSameer Sahasrabuddhe | Thread1 | Entry | | P1 | Q1 | R1 | S1 | P3 | Q3 | S3 | | | Exit | 681475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 682475ce4c2SSameer Sahasrabuddhe | Thread2 | Entry | | | | R2 | S2 | P2 | Q2 | S2 | P4 | ... | Exit | 683475ce4c2SSameer Sahasrabuddhe +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 684475ce4c2SSameer Sahasrabuddhe 685475ce4c2SSameer Sahasrabuddhe | 686475ce4c2SSameer Sahasrabuddhe 687475ce4c2SSameer SahasrabuddheThus, when diverged paths reach different entries of an irreducible 688475ce4c2SSameer Sahasrabuddhecycle from outside the cycle, the static analysis conservatively 689475ce4c2SSameer Sahasrabuddhereports every node in the cycle as not m-converged. 690475ce4c2SSameer Sahasrabuddhe 691475ce4c2SSameer Sahasrabuddhe.. _convergence-reducible-cycle: 692475ce4c2SSameer Sahasrabuddhe 693475ce4c2SSameer SahasrabuddheReducible Cycle 694475ce4c2SSameer Sahasrabuddhe--------------- 695475ce4c2SSameer Sahasrabuddhe 696475ce4c2SSameer SahasrabuddheIf ``C`` is a reducible cycle with header ``H``, then in any DFS, 697475ce4c2SSameer Sahasrabuddhe``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>` 698475ce4c2SSameer Sahasrabuddhe``C'`` that contains ``C``. Independent of the DFS, there is no entry 699475ce4c2SSameer Sahasrabuddheto the subgraph ``C`` other than ``H`` itself. Thus, we have the 700475ce4c2SSameer Sahasrabuddhefollowing: 701475ce4c2SSameer Sahasrabuddhe 702475ce4c2SSameer Sahasrabuddhe1. The diverged entry criterion is trivially satisfied for a divergent 703475ce4c2SSameer Sahasrabuddhe branch and its join, where both are inside subgraph ``C``. 704475ce4c2SSameer Sahasrabuddhe2. When diverged paths reach the subgraph ``C`` from outside, their 705475ce4c2SSameer Sahasrabuddhe convergence is always determined by the same header ``H``. 706475ce4c2SSameer Sahasrabuddhe 707475ce4c2SSameer SahasrabuddheClearly, this can be determined only in a cycle hierarchy ``T`` where 708475ce4c2SSameer Sahasrabuddhe``C`` is detected as a reducible cycle. No such conclusion can be made 709475ce4c2SSameer Sahasrabuddhein a different cycle hierarchy ``T'`` where ``C`` is part of a larger 710475ce4c2SSameer Sahasrabuddhecycle ``C'`` with the same header, but this does not contradict the 711475ce4c2SSameer Sahasrabuddheconclusion in ``T``. 712da61c865SSameer Sahasrabuddhe 713da61c865SSameer SahasrabuddheControlled Convergence 714da61c865SSameer Sahasrabuddhe====================== 715da61c865SSameer Sahasrabuddhe 716da61c865SSameer Sahasrabuddhe:ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>` 717da61c865SSameer Sahasrabuddheprovide an explicit semantics for determining which threads are converged at a 718da61c865SSameer Sahasrabuddhegiven point in the program. The impact of this is incorporated in a 719da61c865SSameer Sahasrabuddhe:ref:`controlled maximal converged-with <controlled_maximal_converged_with>` 720da61c865SSameer Sahasrabuddherelation over dynamic instances and a :ref:`controlled m-converged 721da61c865SSameer Sahasrabuddhe<controlled_m_converged>` property of static instances. The :ref:`uniformity 722da61c865SSameer Sahasrabuddheanalysis <uniformity-analysis>` implemented in LLVM includes this for targets 723da61c865SSameer Sahasrabuddhethat support convergence control tokens. 724