xref: /llvm-project/llvm/docs/ConvergenceAndUniformity.rst (revision 256d76f48060a353ba3bb885698e2ba8d1c87ec6)
1da61c865SSameer Sahasrabuddhe.. _convergence-and-uniformity:
2da61c865SSameer Sahasrabuddhe
3475ce4c2SSameer Sahasrabuddhe==========================
4475ce4c2SSameer SahasrabuddheConvergence And Uniformity
5475ce4c2SSameer Sahasrabuddhe==========================
6475ce4c2SSameer Sahasrabuddhe
7475ce4c2SSameer Sahasrabuddhe.. contents::
8475ce4c2SSameer Sahasrabuddhe   :local:
9475ce4c2SSameer Sahasrabuddhe
10475ce4c2SSameer SahasrabuddheIntroduction
11475ce4c2SSameer Sahasrabuddhe============
12475ce4c2SSameer Sahasrabuddhe
13*256d76f4SSameer SahasrabuddheIn some environments, groups of threads execute the same program in parallel,
14*256d76f4SSameer Sahasrabuddhewhere efficient communication within a group is established using special
15*256d76f4SSameer Sahasrabuddheprimitives called :ref:`convergent operations<convergent_operations>`. The
16*256d76f4SSameer Sahasrabuddheoutcome of a convergent operation is sensitive to the set of threads that
17*256d76f4SSameer Sahasrabuddheparticipate in it.
18475ce4c2SSameer Sahasrabuddhe
19*256d76f4SSameer SahasrabuddheThe intuitive picture of *convergence* is built around threads executing in
20*256d76f4SSameer Sahasrabuddhe"lock step" --- a set of threads is thought of as *converged* if they are all
21*256d76f4SSameer Sahasrabuddheexecuting "the same sequence of instructions together". Such threads may
22*256d76f4SSameer Sahasrabuddhe*diverge* at a *divergent branch*, and they may later *reconverge* at some
23*256d76f4SSameer Sahasrabuddhecommon program point.
24475ce4c2SSameer Sahasrabuddhe
25*256d76f4SSameer SahasrabuddheIn this intuitive picture, when converged threads execute an instruction, the
26*256d76f4SSameer Sahasrabuddheresulting value is said to be *uniform* if it is the same in those threads, and
27*256d76f4SSameer Sahasrabuddhe*divergent* otherwise. Correspondingly, a branch is said to be a uniform branch
28*256d76f4SSameer Sahasrabuddheif its condition is uniform, and it is a divergent branch otherwise.
29*256d76f4SSameer Sahasrabuddhe
30*256d76f4SSameer SahasrabuddheBut the assumption of lock-step execution is not necessary for describing
31*256d76f4SSameer Sahasrabuddhecommunication at convergent operations. It also constrains the implementation
32*256d76f4SSameer Sahasrabuddhe(compiler as well as hardware) by overspecifying how threads execute in such a
33*256d76f4SSameer Sahasrabuddheparallel environment. To eliminate this assumption:
34*256d76f4SSameer Sahasrabuddhe
35*256d76f4SSameer Sahasrabuddhe- We define convergence as a relation between the execution of each instruction
36*256d76f4SSameer Sahasrabuddhe  by different threads and not as a relation between the threads themselves.
37*256d76f4SSameer Sahasrabuddhe  This definition is reasonable for known targets and is compatible with the
38*256d76f4SSameer Sahasrabuddhe  semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR.
39*256d76f4SSameer Sahasrabuddhe- We also define uniformity in terms of this convergence. The output of an
40*256d76f4SSameer Sahasrabuddhe  instruction can be examined for uniformity across multiple threads only if the
41*256d76f4SSameer Sahasrabuddhe  corresponding executions of that instruction are converged.
42*256d76f4SSameer Sahasrabuddhe
43*256d76f4SSameer SahasrabuddheThis document decribes a static analysis for determining convergence at each
44*256d76f4SSameer Sahasrabuddheinstruction in a function. The analysis extends previous work on divergence
45*256d76f4SSameer Sahasrabuddheanalysis [DivergenceSPMD]_ to cover irreducible control-flow. The described
46*256d76f4SSameer Sahasrabuddheanalysis is used in LLVM to implement a UniformityAnalysis that determines the
47*256d76f4SSameer Sahasrabuddheuniformity of value(s) computed at each instruction in an LLVM IR or MIR
48*256d76f4SSameer Sahasrabuddhefunction.
49*256d76f4SSameer Sahasrabuddhe
50*256d76f4SSameer Sahasrabuddhe.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
51*256d76f4SSameer Sahasrabuddhe   Hack. 2021. An Abstract Interpretation for SPMD Divergence on
52*256d76f4SSameer Sahasrabuddhe   Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
53*256d76f4SSameer Sahasrabuddhe   Article 31 (January 2021), 35 pages.
54*256d76f4SSameer Sahasrabuddhe   https://doi.org/10.1145/3434312
55*256d76f4SSameer Sahasrabuddhe
56*256d76f4SSameer SahasrabuddheMotivation
57*256d76f4SSameer Sahasrabuddhe==========
58*256d76f4SSameer Sahasrabuddhe
59*256d76f4SSameer SahasrabuddheDivergent branches constrain
60475ce4c2SSameer Sahasrabuddheprogram transforms such as changing the CFG or moving a convergent
61475ce4c2SSameer Sahasrabuddheoperation to a different point of the CFG. Performing these
62475ce4c2SSameer Sahasrabuddhetransformations across a divergent branch can change the sets of
63475ce4c2SSameer Sahasrabuddhethreads that execute convergent operations convergently. While these
64*256d76f4SSameer Sahasrabuddheconstraints are out of scope for this document,
65*256d76f4SSameer Sahasrabuddheuniformity analysis allows these transformations to identify
66475ce4c2SSameer Sahasrabuddheuniform branches where these constraints do not hold.
67475ce4c2SSameer Sahasrabuddhe
68475ce4c2SSameer SahasrabuddheUniformity is also useful by itself on targets that execute threads in
69475ce4c2SSameer Sahasrabuddhegroups with shared execution resources (e.g. waves, warps, or
70475ce4c2SSameer Sahasrabuddhesubgroups):
71475ce4c2SSameer Sahasrabuddhe
72475ce4c2SSameer Sahasrabuddhe- Uniform outputs can potentially be computed or stored on shared
73475ce4c2SSameer Sahasrabuddhe  resources.
74475ce4c2SSameer Sahasrabuddhe- These targets must "linearize" a divergent branch to ensure that
75475ce4c2SSameer Sahasrabuddhe  each side of the branch is followed by the corresponding threads in
76475ce4c2SSameer Sahasrabuddhe  the same group. But linearization is unnecessary at uniform
77475ce4c2SSameer Sahasrabuddhe  branches, since the whole group of threads follows either one side
78475ce4c2SSameer Sahasrabuddhe  of the branch or the other.
79475ce4c2SSameer Sahasrabuddhe
80475ce4c2SSameer SahasrabuddheTerminology
81475ce4c2SSameer Sahasrabuddhe===========
82475ce4c2SSameer Sahasrabuddhe
83475ce4c2SSameer SahasrabuddheCycles
84475ce4c2SSameer Sahasrabuddhe   Described in :ref:`cycle-terminology`.
85475ce4c2SSameer Sahasrabuddhe
86475ce4c2SSameer SahasrabuddheClosed path
87475ce4c2SSameer Sahasrabuddhe   Described in :ref:`cycle-closed-path`.
88475ce4c2SSameer Sahasrabuddhe
89475ce4c2SSameer SahasrabuddheDisjoint paths
90475ce4c2SSameer Sahasrabuddhe   Two paths in a CFG are said to be disjoint if the only nodes common
91475ce4c2SSameer Sahasrabuddhe   to both are the start node or the end node, or both.
92475ce4c2SSameer Sahasrabuddhe
93475ce4c2SSameer SahasrabuddheJoin node
94475ce4c2SSameer Sahasrabuddhe   A join node of a branch is a node reachable along disjoint paths
95475ce4c2SSameer Sahasrabuddhe   starting from that branch.
96475ce4c2SSameer Sahasrabuddhe
97475ce4c2SSameer SahasrabuddheDiverged path
98475ce4c2SSameer Sahasrabuddhe   A diverged path is a path that starts from a divergent branch and
99475ce4c2SSameer Sahasrabuddhe   either reaches a join node of the branch or reaches the end of the
100475ce4c2SSameer Sahasrabuddhe   function without passing through any join node of the branch.
101475ce4c2SSameer Sahasrabuddhe
102da61c865SSameer Sahasrabuddhe.. _convergence-dynamic-instances:
103da61c865SSameer Sahasrabuddhe
104475ce4c2SSameer SahasrabuddheThreads and Dynamic Instances
105475ce4c2SSameer Sahasrabuddhe=============================
106475ce4c2SSameer Sahasrabuddhe
107475ce4c2SSameer SahasrabuddheEach occurrence of an instruction in the program source is called a
108475ce4c2SSameer Sahasrabuddhe*static instance*. When a thread executes a program, each execution of
109475ce4c2SSameer Sahasrabuddhea static instance produces a distinct *dynamic instance* of that
110475ce4c2SSameer Sahasrabuddheinstruction.
111475ce4c2SSameer Sahasrabuddhe
112475ce4c2SSameer SahasrabuddheEach thread produces a unique sequence of dynamic instances:
113475ce4c2SSameer Sahasrabuddhe
114475ce4c2SSameer Sahasrabuddhe- The sequence is generated along branch decisions and loop
115475ce4c2SSameer Sahasrabuddhe  traversals.
116475ce4c2SSameer Sahasrabuddhe- Starts with a dynamic instance of a "first" instruction.
117475ce4c2SSameer Sahasrabuddhe- Continues with dynamic instances of successive "next"
118475ce4c2SSameer Sahasrabuddhe  instructions.
119475ce4c2SSameer Sahasrabuddhe
120475ce4c2SSameer SahasrabuddheThreads are independent; some targets may choose to execute them in
121475ce4c2SSameer Sahasrabuddhegroups in order to share resources when possible.
122475ce4c2SSameer Sahasrabuddhe
123475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-natural-loop.png
124475ce4c2SSameer Sahasrabuddhe   :name: convergence-natural-loop
125475ce4c2SSameer Sahasrabuddhe
126475ce4c2SSameer Sahasrabuddhe.. table::
127475ce4c2SSameer Sahasrabuddhe   :name: convergence-thread-example
128475ce4c2SSameer Sahasrabuddhe   :align: left
129475ce4c2SSameer Sahasrabuddhe
130475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
131475ce4c2SSameer Sahasrabuddhe   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
132475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
133475ce4c2SSameer Sahasrabuddhe   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
134475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
135475ce4c2SSameer Sahasrabuddhe   | Thread 2 | Entry1 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
136475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
137475ce4c2SSameer Sahasrabuddhe
138475ce4c2SSameer SahasrabuddheIn the above table, each row is a different thread, listing the
139475ce4c2SSameer Sahasrabuddhedynamic instances produced by that thread from left to right. Each
140475ce4c2SSameer Sahasrabuddhethread executes the same program that starts with an ``Entry`` node
141475ce4c2SSameer Sahasrabuddheand ends with an ``Exit`` node, but different threads may take
142475ce4c2SSameer Sahasrabuddhedifferent paths through the control flow of the program. The columns
143475ce4c2SSameer Sahasrabuddheare numbered merely for convenience, and empty cells have no special
144475ce4c2SSameer Sahasrabuddhemeaning. Dynamic instances listed in the same column are converged.
145475ce4c2SSameer Sahasrabuddhe
146475ce4c2SSameer Sahasrabuddhe.. _convergence-definition:
147475ce4c2SSameer Sahasrabuddhe
148475ce4c2SSameer SahasrabuddheConvergence
149475ce4c2SSameer Sahasrabuddhe===========
150475ce4c2SSameer Sahasrabuddhe
151da61c865SSameer Sahasrabuddhe*Convergence-before* is a strict partial order over dynamic instances
152475ce4c2SSameer Sahasrabuddhethat is defined as the transitive closure of:
153475ce4c2SSameer Sahasrabuddhe
154475ce4c2SSameer Sahasrabuddhe1. If dynamic instance ``P`` is executed strictly before ``Q`` in the
155475ce4c2SSameer Sahasrabuddhe   same thread, then ``P`` is *convergence-before* ``Q``.
156475ce4c2SSameer Sahasrabuddhe2. If dynamic instance ``P`` is executed strictly before ``Q1`` in the
157475ce4c2SSameer Sahasrabuddhe   same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is
158475ce4c2SSameer Sahasrabuddhe   *convergence-before* ``Q2``.
159475ce4c2SSameer Sahasrabuddhe3. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2``
160475ce4c2SSameer Sahasrabuddhe   is executed strictly before ``Q`` in the same thread, then ``P1``
161475ce4c2SSameer Sahasrabuddhe   is *convergence-before* ``Q``.
162475ce4c2SSameer Sahasrabuddhe
163475ce4c2SSameer Sahasrabuddhe.. table::
164475ce4c2SSameer Sahasrabuddhe   :name: convergence-order-example
165475ce4c2SSameer Sahasrabuddhe   :align: left
166475ce4c2SSameer Sahasrabuddhe
167475ce4c2SSameer Sahasrabuddhe   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
168475ce4c2SSameer Sahasrabuddhe   |          | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9    |
169475ce4c2SSameer Sahasrabuddhe   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
170475ce4c2SSameer Sahasrabuddhe   | Thread 1 | Entry | ... |     |     |     | S2  | T   | ... | Exit |
171475ce4c2SSameer Sahasrabuddhe   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
172475ce4c2SSameer Sahasrabuddhe   | Thread 2 | Entry | ... |     | Q2  | R   | S1  |     | ... | Exit |
173475ce4c2SSameer Sahasrabuddhe   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
174475ce4c2SSameer Sahasrabuddhe   | Thread 3 | Entry | ... | P   | Q1  |     |     |     | ... |      |
175475ce4c2SSameer Sahasrabuddhe   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
176475ce4c2SSameer Sahasrabuddhe
177475ce4c2SSameer SahasrabuddheThe above table shows partial sequences of dynamic instances from
178475ce4c2SSameer Sahasrabuddhedifferent threads. Dynamic instances in the same column are assumed
179475ce4c2SSameer Sahasrabuddheto be converged (i.e., related to each other in the converged-with
180475ce4c2SSameer Sahasrabuddherelation). The resulting convergence order includes the edges ``P ->
181475ce4c2SSameer SahasrabuddheQ2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.
182475ce4c2SSameer Sahasrabuddhe
183*256d76f4SSameer Sahasrabuddhe*Converged-with* is a transitive symmetric relation over dynamic instances
184*256d76f4SSameer Sahasrabuddheproduced by *different threads* for the *same static instance*.
185*256d76f4SSameer Sahasrabuddhe
186*256d76f4SSameer SahasrabuddheIt is impractical to provide any one definition for the *converged-with*
187*256d76f4SSameer Sahasrabuddherelation, since different environments may wish to relate dynamic instances in
188*256d76f4SSameer Sahasrabuddhedifferent ways. The fact that *convergence-before* is a strict partial order is
189*256d76f4SSameer Sahasrabuddhea constraint on the *converged-with* relation. It is trivially satisfied if
190*256d76f4SSameer Sahasrabuddhedifferent dynamic instances are never converged. Below, we provide a relation
191*256d76f4SSameer Sahasrabuddhecalled :ref:`maximal converged-with<convergence-maximal>`, which satisifies
192*256d76f4SSameer Sahasrabuddhe*convergence-before* and is suitable for known targets.
193475ce4c2SSameer Sahasrabuddhe
194475ce4c2SSameer Sahasrabuddhe.. _convergence-note-convergence:
195475ce4c2SSameer Sahasrabuddhe
196475ce4c2SSameer Sahasrabuddhe.. note::
197475ce4c2SSameer Sahasrabuddhe
198da61c865SSameer Sahasrabuddhe   1. The convergence-before relation is not
199475ce4c2SSameer Sahasrabuddhe      directly observable. Program transforms are in general free to
200475ce4c2SSameer Sahasrabuddhe      change the order of instructions, even though that obviously
201475ce4c2SSameer Sahasrabuddhe      changes the convergence-before relation.
202475ce4c2SSameer Sahasrabuddhe
203da61c865SSameer Sahasrabuddhe   2. Converged dynamic instances need not be executed at the same
204475ce4c2SSameer Sahasrabuddhe      time or even on the same resource. Converged dynamic instances
205475ce4c2SSameer Sahasrabuddhe      of a convergent operation may appear to do so but that is an
206da61c865SSameer Sahasrabuddhe      implementation detail.
207da61c865SSameer Sahasrabuddhe
208da61c865SSameer Sahasrabuddhe   3. The fact that ``P`` is convergence-before
209475ce4c2SSameer Sahasrabuddhe      ``Q`` does not automatically imply that ``P`` happens-before
210475ce4c2SSameer Sahasrabuddhe      ``Q`` in a memory model sense.
211475ce4c2SSameer Sahasrabuddhe
212475ce4c2SSameer Sahasrabuddhe.. _convergence-maximal:
213475ce4c2SSameer Sahasrabuddhe
214475ce4c2SSameer SahasrabuddheMaximal Convergence
215475ce4c2SSameer Sahasrabuddhe-------------------
216475ce4c2SSameer Sahasrabuddhe
217475ce4c2SSameer SahasrabuddheThis section defines a constraint that may be used to
218475ce4c2SSameer Sahasrabuddheproduce a *maximal converged-with* relation without violating the
219475ce4c2SSameer Sahasrabuddhestrict *convergence-before* order. This maximal converged-with
220475ce4c2SSameer Sahasrabuddherelation is reasonable for real targets and is compatible with
221475ce4c2SSameer Sahasrabuddheconvergent operations.
222475ce4c2SSameer Sahasrabuddhe
223475ce4c2SSameer SahasrabuddheThe maximal converged-with relation is defined in terms of cycle
224da61c865SSameer Sahasrabuddheheaders, with the assumption that threads converge at the header on every
225da61c865SSameer Sahasrabuddhe"iteration" of the cycle. Informally, two threads execute the same iteration of
226da61c865SSameer Sahasrabuddhea cycle if they both previously executed the cycle header the same number of
227da61c865SSameer Sahasrabuddhetimes after they entered that cycle. In general, this needs to account for the
228da61c865SSameer Sahasrabuddheiterations of parent cycles as well.
229475ce4c2SSameer Sahasrabuddhe
230475ce4c2SSameer Sahasrabuddhe   **Maximal converged-with:**
231475ce4c2SSameer Sahasrabuddhe
232475ce4c2SSameer Sahasrabuddhe   Dynamic instances ``X1`` and ``X2`` produced by different threads
233475ce4c2SSameer Sahasrabuddhe   for the same static instance ``X`` are converged in the maximal
234*256d76f4SSameer Sahasrabuddhe   converged-with relation if and only if:
235*256d76f4SSameer Sahasrabuddhe
236*256d76f4SSameer Sahasrabuddhe   - ``X`` is not contained in any cycle, or,
237*256d76f4SSameer Sahasrabuddhe   - For every cycle ``C`` with header ``H`` that contains ``X``:
238475ce4c2SSameer Sahasrabuddhe
239475ce4c2SSameer Sahasrabuddhe     - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
240475ce4c2SSameer Sahasrabuddhe       the respective thread is convergence-before ``X2``, and,
241475ce4c2SSameer Sahasrabuddhe     - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
242475ce4c2SSameer Sahasrabuddhe       the respective thread is convergence-before ``X1``,
243475ce4c2SSameer Sahasrabuddhe     - without assuming that ``X1`` is converged with ``X2``.
244475ce4c2SSameer Sahasrabuddhe
245475ce4c2SSameer Sahasrabuddhe.. note::
246475ce4c2SSameer Sahasrabuddhe
247da61c865SSameer Sahasrabuddhe   Cycle headers may not be unique to a given CFG if it is irreducible. Each
248da61c865SSameer Sahasrabuddhe   cycle hierarchy for the same CFG results in a different maximal
249da61c865SSameer Sahasrabuddhe   converged-with relation.
250da61c865SSameer Sahasrabuddhe
251475ce4c2SSameer Sahasrabuddhe   For brevity, the rest of the document restricts the term
252475ce4c2SSameer Sahasrabuddhe   *converged* to mean "related under the maximal converged-with
253475ce4c2SSameer Sahasrabuddhe   relation for the given cycle hierarchy".
254475ce4c2SSameer Sahasrabuddhe
255475ce4c2SSameer SahasrabuddheMaximal convergence can now be demonstrated in the earlier example as follows:
256475ce4c2SSameer Sahasrabuddhe
257475ce4c2SSameer Sahasrabuddhe.. table::
258475ce4c2SSameer Sahasrabuddhe   :align: left
259475ce4c2SSameer Sahasrabuddhe
260475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
261475ce4c2SSameer Sahasrabuddhe   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
262475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
263475ce4c2SSameer Sahasrabuddhe   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
264475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
265475ce4c2SSameer Sahasrabuddhe   | Thread 2 | Entry2 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
266475ce4c2SSameer Sahasrabuddhe   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
267475ce4c2SSameer Sahasrabuddhe
268475ce4c2SSameer Sahasrabuddhe- ``Entry1`` and ``Entry2`` are converged.
269475ce4c2SSameer Sahasrabuddhe- ``H1`` and ``H2`` are converged.
270475ce4c2SSameer Sahasrabuddhe- ``B1`` and ``B2`` are not converged due to ``H4`` which is not
271475ce4c2SSameer Sahasrabuddhe  convergence-before ``B1``.
272475ce4c2SSameer Sahasrabuddhe- ``H3`` and ``H4`` are converged.
273475ce4c2SSameer Sahasrabuddhe- ``H3`` is not converged with ``H5`` due to ``H4`` which is not
274475ce4c2SSameer Sahasrabuddhe  convergence-before ``H3``.
275475ce4c2SSameer Sahasrabuddhe- ``L1`` and ``L2`` are converged.
276475ce4c2SSameer Sahasrabuddhe- ``L3`` and ``L4`` are converged.
277475ce4c2SSameer Sahasrabuddhe- ``L3`` is not converged with ``L5`` due to ``H5`` which is not
278475ce4c2SSameer Sahasrabuddhe  convergence-before ``L3``.
279475ce4c2SSameer Sahasrabuddhe
280475ce4c2SSameer Sahasrabuddhe.. _convergence-cycle-headers:
281475ce4c2SSameer Sahasrabuddhe
282475ce4c2SSameer SahasrabuddheDependence on Cycles Headers
283475ce4c2SSameer Sahasrabuddhe----------------------------
284475ce4c2SSameer Sahasrabuddhe
285da61c865SSameer SahasrabuddheContradictions in *convergence-before* are possible only between two
286475ce4c2SSameer Sahasrabuddhenodes that are inside some cycle. The dynamic instances of such nodes
287475ce4c2SSameer Sahasrabuddhemay be interleaved in the same thread, and this interleaving may be
288475ce4c2SSameer Sahasrabuddhedifferent for different threads.
289475ce4c2SSameer Sahasrabuddhe
290475ce4c2SSameer SahasrabuddheWhen a thread executes a node ``X`` once and then executes it again,
291475ce4c2SSameer Sahasrabuddheit must have followed a closed path in the CFG that includes ``X``.
292475ce4c2SSameer SahasrabuddheSuch a path must pass through the header of at least one cycle --- the
293475ce4c2SSameer Sahasrabuddhesmallest cycle that includes the entire closed path. In a given
294475ce4c2SSameer Sahasrabuddhethread, two dynamic instances of ``X`` are either separated by the
295475ce4c2SSameer Sahasrabuddheexecution of at least one cycle header, or ``X`` itself is a cycle
296475ce4c2SSameer Sahasrabuddheheader.
297475ce4c2SSameer Sahasrabuddhe
298475ce4c2SSameer SahasrabuddheIn reducible cycles (natural loops), each execution of the header is
299475ce4c2SSameer Sahasrabuddheequivalent to the start of a new iteration of the cycle. But this
300475ce4c2SSameer Sahasrabuddheanalogy breaks down in the presence of explicit constraints on the
301475ce4c2SSameer Sahasrabuddheconverged-with relation, such as those described in :ref:`future
302475ce4c2SSameer Sahasrabuddhework<convergence-note-convergence>`. Instead, cycle headers should be
303475ce4c2SSameer Sahasrabuddhetreated as implicit *points of convergence* in a maximal
304475ce4c2SSameer Sahasrabuddheconverged-with relation.
305475ce4c2SSameer Sahasrabuddhe
306475ce4c2SSameer SahasrabuddheConsider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such
307475ce4c2SSameer Sahasrabuddhethat ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle,
308475ce4c2SSameer Sahasrabuddhewith headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread
309475ce4c2SSameer Sahasrabuddheenters the cycle ``Ck``, any of the following is possible:
310475ce4c2SSameer Sahasrabuddhe
311475ce4c2SSameer Sahasrabuddhe1. The thread directly entered cycle ``Ck`` without having executed
312475ce4c2SSameer Sahasrabuddhe   any of the headers ``H1`` to ``Hk``.
313475ce4c2SSameer Sahasrabuddhe
314475ce4c2SSameer Sahasrabuddhe2. The thread executed some or all of the nested headers one or more
315475ce4c2SSameer Sahasrabuddhe   times.
316475ce4c2SSameer Sahasrabuddhe
317475ce4c2SSameer SahasrabuddheThe maximal converged-with relation captures the following intuition
318475ce4c2SSameer Sahasrabuddheabout cycles:
319475ce4c2SSameer Sahasrabuddhe
320475ce4c2SSameer Sahasrabuddhe1. When two threads enter a top-level cycle ``C1``, they execute
321475ce4c2SSameer Sahasrabuddhe   converged dynamic instances of every node that is a :ref:`child
322475ce4c2SSameer Sahasrabuddhe   <cycle-parent-block>` of ``C1``.
323475ce4c2SSameer Sahasrabuddhe
324475ce4c2SSameer Sahasrabuddhe2. When two threads enter a nested cycle ``Ck``, they execute
325475ce4c2SSameer Sahasrabuddhe   converged dynamic instances of every node that is a child of
326475ce4c2SSameer Sahasrabuddhe   ``Ck``, until either thread exits ``Ck``, if and only if they
327475ce4c2SSameer Sahasrabuddhe   executed converged dynamic instances of the last nested header that
328475ce4c2SSameer Sahasrabuddhe   either thread encountered.
329475ce4c2SSameer Sahasrabuddhe
330475ce4c2SSameer Sahasrabuddhe   Note that when a thread exits a nested cycle ``Ck``, it must follow
331475ce4c2SSameer Sahasrabuddhe   a closed path outside ``Ck`` to reenter it. This requires executing
332475ce4c2SSameer Sahasrabuddhe   the header of some outer cycle, as described earlier.
333475ce4c2SSameer Sahasrabuddhe
334475ce4c2SSameer SahasrabuddheConsider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1``
335475ce4c2SSameer Sahasrabuddheand ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``.
336475ce4c2SSameer SahasrabuddheMaximal convergence relates ``X1`` and ``X2`` as follows:
337475ce4c2SSameer Sahasrabuddhe
338475ce4c2SSameer Sahasrabuddhe1. If neither thread executed any header from ``H1`` to ``Hk``, then
339475ce4c2SSameer Sahasrabuddhe   ``X1`` and ``X2`` are converged.
340475ce4c2SSameer Sahasrabuddhe
341475ce4c2SSameer Sahasrabuddhe2. Otherwise, if there are no converged dynamic instances ``Q1`` and
342475ce4c2SSameer Sahasrabuddhe   ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is
343475ce4c2SSameer Sahasrabuddhe   possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and
344475ce4c2SSameer Sahasrabuddhe   ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and
345475ce4c2SSameer Sahasrabuddhe   ``X2`` are not converged.
346475ce4c2SSameer Sahasrabuddhe
347475ce4c2SSameer Sahasrabuddhe3. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic
348475ce4c2SSameer Sahasrabuddhe   instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most
349475ce4c2SSameer Sahasrabuddhe   recently before ``X1`` and ``X2`` in the respective threads. Then
350475ce4c2SSameer Sahasrabuddhe   ``X1`` and ``X2`` are converged if and only if there is no dynamic
351475ce4c2SSameer Sahasrabuddhe   instance of any header from ``H1`` to ``Hk`` that occurs between
352475ce4c2SSameer Sahasrabuddhe   ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in
353475ce4c2SSameer Sahasrabuddhe   thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last
354475ce4c2SSameer Sahasrabuddhe   point of convergence, with no other header being executed before
355475ce4c2SSameer Sahasrabuddhe   executing ``X``.
356475ce4c2SSameer Sahasrabuddhe
357475ce4c2SSameer Sahasrabuddhe**Example:**
358475ce4c2SSameer Sahasrabuddhe
359475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-both-diverged-nested.png
360475ce4c2SSameer Sahasrabuddhe   :name: convergence-both-diverged-nested
361475ce4c2SSameer Sahasrabuddhe
362475ce4c2SSameer SahasrabuddheThe above figure shows two nested irreducible cycles with headers
363475ce4c2SSameer Sahasrabuddhe``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent
364475ce4c2SSameer Sahasrabuddhebranches. The table below shows the convergence between three threads
365475ce4c2SSameer Sahasrabuddhetaking different paths through the CFG. Dynamic instances listed in
366475ce4c2SSameer Sahasrabuddhethe same column are converged.
367475ce4c2SSameer Sahasrabuddhe
368475ce4c2SSameer Sahasrabuddhe   .. table::
369475ce4c2SSameer Sahasrabuddhe      :align: left
370475ce4c2SSameer Sahasrabuddhe
371475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
372475ce4c2SSameer Sahasrabuddhe      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 10   |
373475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
374475ce4c2SSameer Sahasrabuddhe      | Thread1 | Entry | P1  | Q1  | S1  | P3  | Q3  | R1  | S2  | Exit |
375475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
376475ce4c2SSameer Sahasrabuddhe      | Thread2 | Entry | P2  | Q2  |     |     |     | R2  | S3  | Exit |
377475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
378475ce4c2SSameer Sahasrabuddhe      | Thread3 | Entry |     |     |     |     |     | R3  | S4  | Exit |
379475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
380475ce4c2SSameer Sahasrabuddhe
381475ce4c2SSameer Sahasrabuddhe- ``P2`` and ``P3`` are not converged due to ``S1``
382475ce4c2SSameer Sahasrabuddhe- ``Q2`` and ``Q3`` are not converged due to ``S1``
383475ce4c2SSameer Sahasrabuddhe- ``S1`` and ``S3`` are not converged due to ``R2``
384475ce4c2SSameer Sahasrabuddhe- ``S1`` and ``S4`` are not converged due to ``R3``
385475ce4c2SSameer Sahasrabuddhe
386475ce4c2SSameer SahasrabuddheInformally, ``T1`` and ``T2`` execute the inner cycle a different
387475ce4c2SSameer Sahasrabuddhenumber of times, without executing the header of the outer cycle. All
388475ce4c2SSameer Sahasrabuddhethreads converge in the outer cycle when they first execute the header
389475ce4c2SSameer Sahasrabuddheof the outer cycle.
390475ce4c2SSameer Sahasrabuddhe
391475ce4c2SSameer Sahasrabuddhe.. _convergence-uniformity:
392475ce4c2SSameer Sahasrabuddhe
393475ce4c2SSameer SahasrabuddheUniformity
394475ce4c2SSameer Sahasrabuddhe==========
395475ce4c2SSameer Sahasrabuddhe
396475ce4c2SSameer Sahasrabuddhe1. The output of two converged dynamic instances is uniform if and
397475ce4c2SSameer Sahasrabuddhe   only if it compares equal for those two dynamic instances.
398475ce4c2SSameer Sahasrabuddhe2. The output of a static instance ``X`` is uniform *for a given set
399475ce4c2SSameer Sahasrabuddhe   of threads* if and only if it is uniform for every pair of
400475ce4c2SSameer Sahasrabuddhe   converged dynamic instances of ``X`` produced by those threads.
401475ce4c2SSameer Sahasrabuddhe
402475ce4c2SSameer SahasrabuddheA non-uniform value is said to be *divergent*.
403475ce4c2SSameer Sahasrabuddhe
404475ce4c2SSameer SahasrabuddheFor a set ``S`` of threads, the uniformity of each output of a static
405475ce4c2SSameer Sahasrabuddheinstance is determined as follows:
406475ce4c2SSameer Sahasrabuddhe
407475ce4c2SSameer Sahasrabuddhe1. The semantics of the instruction may specify the output to be
408475ce4c2SSameer Sahasrabuddhe   uniform.
409fd98416dSSameer Sahasrabuddhe2. Otherwise, the output is divergent if the static instance is not
410fd98416dSSameer Sahasrabuddhe   :ref:`m-converged <convergence-m-converged>`.
411fd98416dSSameer Sahasrabuddhe3. Otherwise, if the static instance is m-converged:
412fd98416dSSameer Sahasrabuddhe
413fd98416dSSameer Sahasrabuddhe   1. If it is a PHI node, its output is uniform if and only
414475ce4c2SSameer Sahasrabuddhe      if for every pair of converged dynamic instances produced by all
415475ce4c2SSameer Sahasrabuddhe      threads in ``S``:
416475ce4c2SSameer Sahasrabuddhe
417475ce4c2SSameer Sahasrabuddhe      a. Both instances choose the same output from converged
418475ce4c2SSameer Sahasrabuddhe         dynamic instances, and,
419475ce4c2SSameer Sahasrabuddhe      b. That output is uniform for all threads in ``S``.
420fd98416dSSameer Sahasrabuddhe   2. Otherwise, the output is uniform if and only if the input
421475ce4c2SSameer Sahasrabuddhe      operands are uniform for all threads in ``S``.
422475ce4c2SSameer Sahasrabuddhe
423475ce4c2SSameer SahasrabuddheDivergent Cycle Exits
424475ce4c2SSameer Sahasrabuddhe---------------------
425475ce4c2SSameer Sahasrabuddhe
426475ce4c2SSameer SahasrabuddheWhen a divergent branch occurs inside a cycle, it is possible that a
427475ce4c2SSameer Sahasrabuddhediverged path continues to an exit of the cycle. This is called a
428475ce4c2SSameer Sahasrabuddhedivergent cycle exit. If the cycle is irreducible, the diverged path
429475ce4c2SSameer Sahasrabuddhemay re-enter and eventually reach a join within the cycle. Such a join
430475ce4c2SSameer Sahasrabuddheshould be examined for the :ref:`diverged entry
431475ce4c2SSameer Sahasrabuddhe<convergence-diverged-entry>` criterion.
432475ce4c2SSameer Sahasrabuddhe
433475ce4c2SSameer SahasrabuddheNodes along the diverged path that lie outside the cycle experience
434475ce4c2SSameer Sahasrabuddhe*temporal divergence*, when two threads executing convergently inside
435475ce4c2SSameer Sahasrabuddhethe cycle produce uniform values, but exit the cycle along the same
436475ce4c2SSameer Sahasrabuddhedivergent path after executing the header a different number of times
437475ce4c2SSameer Sahasrabuddhe(informally, on different iterations of the cycle). For a node ``N``
438475ce4c2SSameer Sahasrabuddheinside the cycle the outputs may be uniform for the two threads, but
439475ce4c2SSameer Sahasrabuddheany use ``U`` outside the cycle receives a value from non-converged
440475ce4c2SSameer Sahasrabuddhedynamic instances of ``N``. An output of ``U`` may be divergent,
441475ce4c2SSameer Sahasrabuddhedepending on the semantics of the instruction.
442475ce4c2SSameer Sahasrabuddhe
443da61c865SSameer Sahasrabuddhe.. _uniformity-analysis:
444da61c865SSameer Sahasrabuddhe
445475ce4c2SSameer SahasrabuddheStatic Uniformity Analysis
446475ce4c2SSameer Sahasrabuddhe==========================
447475ce4c2SSameer Sahasrabuddhe
448475ce4c2SSameer SahasrabuddheIrreducible control flow results in different cycle hierarchies
449475ce4c2SSameer Sahasrabuddhedepending on the choice of headers during depth-first traversal. As a
450475ce4c2SSameer Sahasrabuddheresult, a static analysis cannot always determine the convergence of
451475ce4c2SSameer Sahasrabuddhenodes in irreducible cycles, and any uniformity analysis is limited to
452475ce4c2SSameer Sahasrabuddhethose static instances whose convergence is independent of the cycle
453475ce4c2SSameer Sahasrabuddhehierarchy:
454475ce4c2SSameer Sahasrabuddhe
455fd98416dSSameer Sahasrabuddhe.. _convergence-m-converged:
456fd98416dSSameer Sahasrabuddhe
457475ce4c2SSameer Sahasrabuddhe  **m-converged static instances:**
458475ce4c2SSameer Sahasrabuddhe
459475ce4c2SSameer Sahasrabuddhe  A static instance ``X`` is *m-converged* for a given CFG if and only
460475ce4c2SSameer Sahasrabuddhe  if the maximal converged-with relation for its dynamic instances is
461475ce4c2SSameer Sahasrabuddhe  the same in every cycle hierarchy that can be constructed for that CFG.
462475ce4c2SSameer Sahasrabuddhe
463475ce4c2SSameer Sahasrabuddhe  .. note::
464475ce4c2SSameer Sahasrabuddhe
465475ce4c2SSameer Sahasrabuddhe   In other words, two dynamic instances ``X1`` and ``X2`` of an
466475ce4c2SSameer Sahasrabuddhe   m-converged static instance ``X`` are converged in some cycle
467475ce4c2SSameer Sahasrabuddhe   hierarchy if and only if they are also converged in every other
468475ce4c2SSameer Sahasrabuddhe   cycle hierarchy for the same CFG.
469475ce4c2SSameer Sahasrabuddhe
470475ce4c2SSameer Sahasrabuddhe   As noted earlier, for brevity, we restrict the term *converged* to
471475ce4c2SSameer Sahasrabuddhe   mean "related under the maximal converged-with relation for a given
472475ce4c2SSameer Sahasrabuddhe   cycle hierarchy".
473475ce4c2SSameer Sahasrabuddhe
474475ce4c2SSameer Sahasrabuddhe
475475ce4c2SSameer SahasrabuddheEach node ``X`` in a given CFG is reported to be m-converged if and
476da61c865SSameer Sahasrabuddheonly if every cycle that contains ``X`` satisfies the following necessary
477da61c865SSameer Sahasrabuddheconditions:
478475ce4c2SSameer Sahasrabuddhe
479da61c865SSameer Sahasrabuddhe  1. Every divergent branch inside the cycle satisfies the
480475ce4c2SSameer Sahasrabuddhe     :ref:`diverged entry criterion<convergence-diverged-entry>`, and,
481da61c865SSameer Sahasrabuddhe  2. There are no :ref:`diverged paths reaching the
482475ce4c2SSameer Sahasrabuddhe     cycle<convergence-diverged-outside>` from a divergent branch
483475ce4c2SSameer Sahasrabuddhe     outside it.
484475ce4c2SSameer Sahasrabuddhe
485475ce4c2SSameer Sahasrabuddhe.. note::
486475ce4c2SSameer Sahasrabuddhe
487475ce4c2SSameer Sahasrabuddhe   A reducible cycle :ref:`trivially satisfies
488475ce4c2SSameer Sahasrabuddhe   <convergence-reducible-cycle>` the above conditions. In particular,
489475ce4c2SSameer Sahasrabuddhe   if the whole CFG is reducible, then all nodes in the CFG are
490475ce4c2SSameer Sahasrabuddhe   m-converged.
491475ce4c2SSameer Sahasrabuddhe
492fd98416dSSameer SahasrabuddheThe uniformity of each output of a static instance
493fd98416dSSameer Sahasrabuddheis determined using the criteria
494475ce4c2SSameer Sahasrabuddhe:ref:`described earlier <convergence-uniformity>`. The discovery of
495475ce4c2SSameer Sahasrabuddhedivergent outputs may cause their uses (including branches) to also
496475ce4c2SSameer Sahasrabuddhebecome divergent. The analysis propagates this divergence until a
497475ce4c2SSameer Sahasrabuddhefixed point is reached.
498475ce4c2SSameer Sahasrabuddhe
499475ce4c2SSameer SahasrabuddheThe convergence inferred using these criteria is a safe subset of the
500475ce4c2SSameer Sahasrabuddhemaximal converged-with relation for any cycle hierarchy. In
501475ce4c2SSameer Sahasrabuddheparticular, it is sufficient to determine if a static instance is
502475ce4c2SSameer Sahasrabuddhem-converged for a given cycle hierarchy ``T``, even if that fact is
503475ce4c2SSameer Sahasrabuddhenot detected when examining some other cycle hierarchy ``T'``.
504475ce4c2SSameer Sahasrabuddhe
505475ce4c2SSameer SahasrabuddheThis property allows compiler transforms to use the uniformity
506475ce4c2SSameer Sahasrabuddheanalysis without being affected by DFS choices made in the underlying
507475ce4c2SSameer Sahasrabuddhecycle analysis. When two transforms use different instances of the
508475ce4c2SSameer Sahasrabuddheuniformity analysis for the same CFG, a "divergent value" result in
509475ce4c2SSameer Sahasrabuddheone analysis instance cannot contradict a "uniform value" result in
510475ce4c2SSameer Sahasrabuddhethe other.
511475ce4c2SSameer Sahasrabuddhe
512475ce4c2SSameer SahasrabuddheGeneric transforms such as SimplifyCFG, CSE, and loop transforms
513475ce4c2SSameer Sahasrabuddhecommonly change the program in ways that change the maximal
514475ce4c2SSameer Sahasrabuddheconverged-with relations. This also means that a value that was
515475ce4c2SSameer Sahasrabuddhepreviously uniform can become divergent after such a transform.
516475ce4c2SSameer SahasrabuddheUniformity has to be recomputed after such transforms.
517475ce4c2SSameer Sahasrabuddhe
518475ce4c2SSameer SahasrabuddheDivergent Branch inside a Cycle
519475ce4c2SSameer Sahasrabuddhe-------------------------------
520475ce4c2SSameer Sahasrabuddhe
521475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-divergent-inside.png
522475ce4c2SSameer Sahasrabuddhe   :name: convergence-divergent-inside
523475ce4c2SSameer Sahasrabuddhe
524475ce4c2SSameer SahasrabuddheThe above figure shows a divergent branch ``Q`` inside an irreducible
525475ce4c2SSameer Sahasrabuddhecyclic region. When two threads diverge at ``Q``, the convergence of
526475ce4c2SSameer Sahasrabuddhedynamic instances within the cyclic region depends on the cycle
527475ce4c2SSameer Sahasrabuddhehierarchy chosen:
528475ce4c2SSameer Sahasrabuddhe
529475ce4c2SSameer Sahasrabuddhe1. In an implementation that detects a single cycle ``C`` with header
530475ce4c2SSameer Sahasrabuddhe   ``P``, convergence inside the cycle is determined by ``P``.
531475ce4c2SSameer Sahasrabuddhe
532475ce4c2SSameer Sahasrabuddhe2. In an implementation that detects two nested cycles with headers
533475ce4c2SSameer Sahasrabuddhe   ``R`` and ``S``, convergence inside those cycles is determined by
534475ce4c2SSameer Sahasrabuddhe   their respective headers.
535475ce4c2SSameer Sahasrabuddhe
536475ce4c2SSameer Sahasrabuddhe.. _convergence-diverged-entry:
537475ce4c2SSameer Sahasrabuddhe
538475ce4c2SSameer SahasrabuddheA conservative approach would be to simply report all nodes inside
539475ce4c2SSameer Sahasrabuddheirreducible cycles as having divergent outputs. But it is desirable to
540475ce4c2SSameer Sahasrabuddherecognize m-converged nodes in the CFG in order to maximize
541475ce4c2SSameer Sahasrabuddheuniformity. This section describes one such pattern of nodes derived
542475ce4c2SSameer Sahasrabuddhefrom *closed paths*, which are a property of the CFG and do not depend
543475ce4c2SSameer Sahasrabuddheon the cycle hierarchy.
544475ce4c2SSameer Sahasrabuddhe
545475ce4c2SSameer Sahasrabuddhe  **Diverged Entry Criterion:**
546475ce4c2SSameer Sahasrabuddhe
547475ce4c2SSameer Sahasrabuddhe  The dynamic instances of all the nodes in a closed path ``P`` are
548475ce4c2SSameer Sahasrabuddhe  m-converged only if for every divergent branch ``B`` and its
549475ce4c2SSameer Sahasrabuddhe  join node ``J`` that lie on ``P``, there is no entry to ``P`` which
550475ce4c2SSameer Sahasrabuddhe  lies on a diverged path from ``B`` to ``J``.
551475ce4c2SSameer Sahasrabuddhe
552475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-closed-path.png
553475ce4c2SSameer Sahasrabuddhe   :name: convergence-closed-path
554475ce4c2SSameer Sahasrabuddhe
555475ce4c2SSameer SahasrabuddheConsider the closed path ``P -> Q -> R -> S`` in the above figure.
556475ce4c2SSameer Sahasrabuddhe``P`` and ``R`` are :ref:`entries to the closed
557475ce4c2SSameer Sahasrabuddhepath<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a
558475ce4c2SSameer Sahasrabuddhejoin for that branch, with diverged paths ``Q -> R -> S`` and ``Q ->
559475ce4c2SSameer SahasrabuddheS``.
560475ce4c2SSameer Sahasrabuddhe
561475ce4c2SSameer Sahasrabuddhe- If a diverged entry ``R`` exists, then in some cycle hierarchy,
562475ce4c2SSameer Sahasrabuddhe  ``R`` is the header of the smallest cycle ``C`` containing the
563475ce4c2SSameer Sahasrabuddhe  closed path and a :ref:`child cycle<cycle-definition>` ``C'``
564475ce4c2SSameer Sahasrabuddhe  exists in the set ``C - R``, containing both branch ``Q`` and join
565475ce4c2SSameer Sahasrabuddhe  ``S``. When threads diverge at ``Q``, one subset ``M`` continues
566475ce4c2SSameer Sahasrabuddhe  inside cycle ``C'``, while the complement ``N`` exits ``C'`` and
567475ce4c2SSameer Sahasrabuddhe  reaches ``R``. Dynamic instances of ``S`` executed by threads in set
568475ce4c2SSameer Sahasrabuddhe  ``M`` are not converged with those executed in set ``N`` due to the
569475ce4c2SSameer Sahasrabuddhe  presence of ``R``. Informally, threads that diverge at ``Q``
570475ce4c2SSameer Sahasrabuddhe  reconverge in the same iteration of the outer cycle ``C``, but they
571475ce4c2SSameer Sahasrabuddhe  may have executed the inner cycle ``C'`` differently.
572475ce4c2SSameer Sahasrabuddhe
573475ce4c2SSameer Sahasrabuddhe  .. table::
574475ce4c2SSameer Sahasrabuddhe     :align: left
575475ce4c2SSameer Sahasrabuddhe
576475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
577475ce4c2SSameer Sahasrabuddhe     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11   |
578475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
579475ce4c2SSameer Sahasrabuddhe     | Thread1 | Entry | P1  | Q1  |     |     |     | R1  | S1  | P3  | ... | Exit |
580475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
581475ce4c2SSameer Sahasrabuddhe     | Thread2 | Entry | P2  | Q2  | S2  | P4  | Q4  | R2  | S4  |     |     | Exit |
582475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
583475ce4c2SSameer Sahasrabuddhe
584475ce4c2SSameer Sahasrabuddhe  In the table above, ``S2`` is not converged with ``S1`` due to ``R1``.
585475ce4c2SSameer Sahasrabuddhe
586475ce4c2SSameer Sahasrabuddhe|
587475ce4c2SSameer Sahasrabuddhe
588475ce4c2SSameer Sahasrabuddhe- If ``R`` does not exist, or if any node other than ``R`` is the
589475ce4c2SSameer Sahasrabuddhe  header of ``C``, then no such child cycle ``C'`` is detected.
590475ce4c2SSameer Sahasrabuddhe  Threads that diverge at ``Q`` execute converged dynamic instances of
591475ce4c2SSameer Sahasrabuddhe  ``S`` since they do not encounter the cycle header on any path from
592475ce4c2SSameer Sahasrabuddhe  ``Q`` to ``S``. Informally, threads that diverge at ``Q``
593475ce4c2SSameer Sahasrabuddhe  reconverge at ``S`` in the same iteration of ``C``.
594475ce4c2SSameer Sahasrabuddhe
595475ce4c2SSameer Sahasrabuddhe  .. table::
596475ce4c2SSameer Sahasrabuddhe     :align: left
597475ce4c2SSameer Sahasrabuddhe
598475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
599475ce4c2SSameer Sahasrabuddhe     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10   |
600475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
601475ce4c2SSameer Sahasrabuddhe     | Thread1 | Entry | P1  | Q1  | R1  | S1  | P3  | Q3  | R3  | S3  | Exit |
602475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
603475ce4c2SSameer Sahasrabuddhe     | Thread2 | Entry | P2  | Q2  |     | S2  | P4  | Q4  | R2  | S4  | Exit |
604475ce4c2SSameer Sahasrabuddhe     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
605475ce4c2SSameer Sahasrabuddhe
606475ce4c2SSameer Sahasrabuddhe|
607475ce4c2SSameer Sahasrabuddhe
608475ce4c2SSameer Sahasrabuddhe  .. note::
609475ce4c2SSameer Sahasrabuddhe
610475ce4c2SSameer Sahasrabuddhe     In general, the cycle ``C`` in the above statements is not
611475ce4c2SSameer Sahasrabuddhe     expected to be the same cycle for different headers. Cycles and
612475ce4c2SSameer Sahasrabuddhe     their headers are tightly coupled; for different headers in the
613475ce4c2SSameer Sahasrabuddhe     same outermost cycle, the child cycles detected may be different.
614475ce4c2SSameer Sahasrabuddhe     The property relevant to the above examples is that for every
615475ce4c2SSameer Sahasrabuddhe     closed path, there is a cycle ``C`` that contains the path and
616475ce4c2SSameer Sahasrabuddhe     whose header is on that path.
617475ce4c2SSameer Sahasrabuddhe
618475ce4c2SSameer SahasrabuddheThe diverged entry criterion must be checked for every closed path
619475ce4c2SSameer Sahasrabuddhepassing through a divergent branch ``B`` and its join ``J``. Since
620475ce4c2SSameer Sahasrabuddhe:ref:`every closed path passes through the header of some
621475ce4c2SSameer Sahasrabuddhecycle<cycle-closed-path-header>`, this amounts to checking every cycle
622475ce4c2SSameer Sahasrabuddhe``C`` that contains ``B`` and ``J``. When the header of ``C``
623475ce4c2SSameer Sahasrabuddhedominates the join ``J``, there can be no entry to any path from the
624475ce4c2SSameer Sahasrabuddheheader to ``J``, which includes any diverged path from ``B`` to ``J``.
625475ce4c2SSameer SahasrabuddheThis is also true for any closed paths passing through the header of
626475ce4c2SSameer Sahasrabuddhean outer cycle that contains ``C``.
627475ce4c2SSameer Sahasrabuddhe
628475ce4c2SSameer SahasrabuddheThus, the diverged entry criterion can be conservatively simplified
629475ce4c2SSameer Sahasrabuddheas follows:
630475ce4c2SSameer Sahasrabuddhe
631475ce4c2SSameer Sahasrabuddhe  For a divergent branch ``B`` and its join node ``J``, the nodes in a
632475ce4c2SSameer Sahasrabuddhe  cycle ``C`` that contains both ``B`` and ``J`` are m-converged only
633475ce4c2SSameer Sahasrabuddhe  if:
634475ce4c2SSameer Sahasrabuddhe
635475ce4c2SSameer Sahasrabuddhe  - ``B`` strictly dominates ``J``, or,
636475ce4c2SSameer Sahasrabuddhe  - The header ``H`` of ``C`` strictly dominates ``J``, or,
637475ce4c2SSameer Sahasrabuddhe  - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the
638475ce4c2SSameer Sahasrabuddhe    same condition.
639475ce4c2SSameer Sahasrabuddhe
640475ce4c2SSameer SahasrabuddheWhen ``J`` is the same as ``H`` or ``B``, the trivial dominance is
641475ce4c2SSameer Sahasrabuddheinsufficient to make any statement about entries to diverged paths.
642475ce4c2SSameer Sahasrabuddhe
643475ce4c2SSameer Sahasrabuddhe.. _convergence-diverged-outside:
644475ce4c2SSameer Sahasrabuddhe
645475ce4c2SSameer SahasrabuddheDiverged Paths reaching a Cycle
646475ce4c2SSameer Sahasrabuddhe-------------------------------
647475ce4c2SSameer Sahasrabuddhe
648475ce4c2SSameer Sahasrabuddhe.. figure:: convergence-divergent-outside.png
649475ce4c2SSameer Sahasrabuddhe   :name: convergence-divergent-outside
650475ce4c2SSameer Sahasrabuddhe
651475ce4c2SSameer SahasrabuddheThe figure shows two cycle hierarchies with a divergent branch in
652475ce4c2SSameer Sahasrabuddhe``Entry`` instead of ``Q``. For two threads that enter the closed path
653475ce4c2SSameer Sahasrabuddhe``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence
654475ce4c2SSameer Sahasrabuddheof dynamic instances generated along the path depends on whether ``P``
655475ce4c2SSameer Sahasrabuddheor ``R`` is the header.
656475ce4c2SSameer Sahasrabuddhe
657475ce4c2SSameer Sahasrabuddhe-  Convergence when ``P`` is the header.
658475ce4c2SSameer Sahasrabuddhe
659475ce4c2SSameer Sahasrabuddhe   .. table::
660475ce4c2SSameer Sahasrabuddhe      :align: left
661475ce4c2SSameer Sahasrabuddhe
662475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
663475ce4c2SSameer Sahasrabuddhe      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13   |
664475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
665475ce4c2SSameer Sahasrabuddhe      | Thread1 | Entry |     |     |     | P1  | Q1  | R1  | S1  | P3  | Q3  |     | S3  | Exit |
666475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
667475ce4c2SSameer Sahasrabuddhe      | Thread2 | Entry |     | R2  | S2  | P2  | Q2  |     | S2  | P4  | Q4  | R3  | S4  | Exit |
668475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
669475ce4c2SSameer Sahasrabuddhe
670475ce4c2SSameer Sahasrabuddhe   |
671475ce4c2SSameer Sahasrabuddhe
672475ce4c2SSameer Sahasrabuddhe-  Convergence when ``R`` is the header.
673475ce4c2SSameer Sahasrabuddhe
674475ce4c2SSameer Sahasrabuddhe   .. table::
675475ce4c2SSameer Sahasrabuddhe      :align: left
676475ce4c2SSameer Sahasrabuddhe
677475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
678475ce4c2SSameer Sahasrabuddhe      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12   |
679475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
680475ce4c2SSameer Sahasrabuddhe      | Thread1 | Entry |     | P1  | Q1  | R1  | S1  | P3  | Q3  | S3  |     |     | Exit |
681475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
682475ce4c2SSameer Sahasrabuddhe      | Thread2 | Entry |     |     |     | R2  | S2  | P2  | Q2  | S2  | P4  | ... | Exit |
683475ce4c2SSameer Sahasrabuddhe      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
684475ce4c2SSameer Sahasrabuddhe
685475ce4c2SSameer Sahasrabuddhe   |
686475ce4c2SSameer Sahasrabuddhe
687475ce4c2SSameer SahasrabuddheThus, when diverged paths reach different entries of an irreducible
688475ce4c2SSameer Sahasrabuddhecycle from outside the cycle, the static analysis conservatively
689475ce4c2SSameer Sahasrabuddhereports every node in the cycle as not m-converged.
690475ce4c2SSameer Sahasrabuddhe
691475ce4c2SSameer Sahasrabuddhe.. _convergence-reducible-cycle:
692475ce4c2SSameer Sahasrabuddhe
693475ce4c2SSameer SahasrabuddheReducible Cycle
694475ce4c2SSameer Sahasrabuddhe---------------
695475ce4c2SSameer Sahasrabuddhe
696475ce4c2SSameer SahasrabuddheIf ``C`` is a reducible cycle with header ``H``, then in any DFS,
697475ce4c2SSameer Sahasrabuddhe``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>`
698475ce4c2SSameer Sahasrabuddhe``C'`` that contains ``C``. Independent of the DFS, there is no entry
699475ce4c2SSameer Sahasrabuddheto the subgraph ``C`` other than ``H`` itself. Thus, we have the
700475ce4c2SSameer Sahasrabuddhefollowing:
701475ce4c2SSameer Sahasrabuddhe
702475ce4c2SSameer Sahasrabuddhe1. The diverged entry criterion is trivially satisfied for a divergent
703475ce4c2SSameer Sahasrabuddhe   branch and its join, where both are inside subgraph ``C``.
704475ce4c2SSameer Sahasrabuddhe2. When diverged paths reach the subgraph ``C`` from outside, their
705475ce4c2SSameer Sahasrabuddhe   convergence is always determined by the same header ``H``.
706475ce4c2SSameer Sahasrabuddhe
707475ce4c2SSameer SahasrabuddheClearly, this can be determined only in a cycle hierarchy ``T`` where
708475ce4c2SSameer Sahasrabuddhe``C`` is detected as a reducible cycle. No such conclusion can be made
709475ce4c2SSameer Sahasrabuddhein a different cycle hierarchy ``T'`` where ``C`` is part of a larger
710475ce4c2SSameer Sahasrabuddhecycle ``C'`` with the same header, but this does not contradict the
711475ce4c2SSameer Sahasrabuddheconclusion in ``T``.
712da61c865SSameer Sahasrabuddhe
713da61c865SSameer SahasrabuddheControlled Convergence
714da61c865SSameer Sahasrabuddhe======================
715da61c865SSameer Sahasrabuddhe
716da61c865SSameer Sahasrabuddhe:ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>`
717da61c865SSameer Sahasrabuddheprovide an explicit semantics for determining which threads are converged at a
718da61c865SSameer Sahasrabuddhegiven point in the program. The impact of this is incorporated in a
719da61c865SSameer Sahasrabuddhe:ref:`controlled maximal converged-with <controlled_maximal_converged_with>`
720da61c865SSameer Sahasrabuddherelation over dynamic instances and a :ref:`controlled m-converged
721da61c865SSameer Sahasrabuddhe<controlled_m_converged>` property of static instances. The :ref:`uniformity
722da61c865SSameer Sahasrabuddheanalysis <uniformity-analysis>` implemented in LLVM includes this for targets
723da61c865SSameer Sahasrabuddhethat support convergence control tokens.
724