xref: /llvm-project/llvm/docs/ConvergenceAndUniformity.rst (revision 256d76f48060a353ba3bb885698e2ba8d1c87ec6)
1.. _convergence-and-uniformity:
2
3==========================
4Convergence And Uniformity
5==========================
6
7.. contents::
8   :local:
9
10Introduction
11============
12
13In some environments, groups of threads execute the same program in parallel,
14where efficient communication within a group is established using special
15primitives called :ref:`convergent operations<convergent_operations>`. The
16outcome of a convergent operation is sensitive to the set of threads that
17participate in it.
18
19The intuitive picture of *convergence* is built around threads executing in
20"lock step" --- a set of threads is thought of as *converged* if they are all
21executing "the same sequence of instructions together". Such threads may
22*diverge* at a *divergent branch*, and they may later *reconverge* at some
23common program point.
24
25In this intuitive picture, when converged threads execute an instruction, the
26resulting value is said to be *uniform* if it is the same in those threads, and
27*divergent* otherwise. Correspondingly, a branch is said to be a uniform branch
28if its condition is uniform, and it is a divergent branch otherwise.
29
30But the assumption of lock-step execution is not necessary for describing
31communication at convergent operations. It also constrains the implementation
32(compiler as well as hardware) by overspecifying how threads execute in such a
33parallel environment. To eliminate this assumption:
34
35- We define convergence as a relation between the execution of each instruction
36  by different threads and not as a relation between the threads themselves.
37  This definition is reasonable for known targets and is compatible with the
38  semantics of :ref:`convergent operations<convergent_operations>` in LLVM IR.
39- We also define uniformity in terms of this convergence. The output of an
40  instruction can be examined for uniformity across multiple threads only if the
41  corresponding executions of that instruction are converged.
42
43This document decribes a static analysis for determining convergence at each
44instruction in a function. The analysis extends previous work on divergence
45analysis [DivergenceSPMD]_ to cover irreducible control-flow. The described
46analysis is used in LLVM to implement a UniformityAnalysis that determines the
47uniformity of value(s) computed at each instruction in an LLVM IR or MIR
48function.
49
50.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
51   Hack. 2021. An Abstract Interpretation for SPMD Divergence on
52   Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
53   Article 31 (January 2021), 35 pages.
54   https://doi.org/10.1145/3434312
55
56Motivation
57==========
58
59Divergent branches constrain
60program transforms such as changing the CFG or moving a convergent
61operation to a different point of the CFG. Performing these
62transformations across a divergent branch can change the sets of
63threads that execute convergent operations convergently. While these
64constraints are out of scope for this document,
65uniformity analysis allows these transformations to identify
66uniform branches where these constraints do not hold.
67
68Uniformity is also useful by itself on targets that execute threads in
69groups with shared execution resources (e.g. waves, warps, or
70subgroups):
71
72- Uniform outputs can potentially be computed or stored on shared
73  resources.
74- These targets must "linearize" a divergent branch to ensure that
75  each side of the branch is followed by the corresponding threads in
76  the same group. But linearization is unnecessary at uniform
77  branches, since the whole group of threads follows either one side
78  of the branch or the other.
79
80Terminology
81===========
82
83Cycles
84   Described in :ref:`cycle-terminology`.
85
86Closed path
87   Described in :ref:`cycle-closed-path`.
88
89Disjoint paths
90   Two paths in a CFG are said to be disjoint if the only nodes common
91   to both are the start node or the end node, or both.
92
93Join node
94   A join node of a branch is a node reachable along disjoint paths
95   starting from that branch.
96
97Diverged path
98   A diverged path is a path that starts from a divergent branch and
99   either reaches a join node of the branch or reaches the end of the
100   function without passing through any join node of the branch.
101
102.. _convergence-dynamic-instances:
103
104Threads and Dynamic Instances
105=============================
106
107Each occurrence of an instruction in the program source is called a
108*static instance*. When a thread executes a program, each execution of
109a static instance produces a distinct *dynamic instance* of that
110instruction.
111
112Each thread produces a unique sequence of dynamic instances:
113
114- The sequence is generated along branch decisions and loop
115  traversals.
116- Starts with a dynamic instance of a "first" instruction.
117- Continues with dynamic instances of successive "next"
118  instructions.
119
120Threads are independent; some targets may choose to execute them in
121groups in order to share resources when possible.
122
123.. figure:: convergence-natural-loop.png
124   :name: convergence-natural-loop
125
126.. table::
127   :name: convergence-thread-example
128   :align: left
129
130   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
131   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
132   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
133   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
134   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
135   | Thread 2 | Entry1 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
136   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
137
138In the above table, each row is a different thread, listing the
139dynamic instances produced by that thread from left to right. Each
140thread executes the same program that starts with an ``Entry`` node
141and ends with an ``Exit`` node, but different threads may take
142different paths through the control flow of the program. The columns
143are numbered merely for convenience, and empty cells have no special
144meaning. Dynamic instances listed in the same column are converged.
145
146.. _convergence-definition:
147
148Convergence
149===========
150
151*Convergence-before* is a strict partial order over dynamic instances
152that is defined as the transitive closure of:
153
1541. If dynamic instance ``P`` is executed strictly before ``Q`` in the
155   same thread, then ``P`` is *convergence-before* ``Q``.
1562. If dynamic instance ``P`` is executed strictly before ``Q1`` in the
157   same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is
158   *convergence-before* ``Q2``.
1593. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2``
160   is executed strictly before ``Q`` in the same thread, then ``P1``
161   is *convergence-before* ``Q``.
162
163.. table::
164   :name: convergence-order-example
165   :align: left
166
167   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
168   |          | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9    |
169   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
170   | Thread 1 | Entry | ... |     |     |     | S2  | T   | ... | Exit |
171   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
172   | Thread 2 | Entry | ... |     | Q2  | R   | S1  |     | ... | Exit |
173   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
174   | Thread 3 | Entry | ... | P   | Q1  |     |     |     | ... |      |
175   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
176
177The above table shows partial sequences of dynamic instances from
178different threads. Dynamic instances in the same column are assumed
179to be converged (i.e., related to each other in the converged-with
180relation). The resulting convergence order includes the edges ``P ->
181Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.
182
183*Converged-with* is a transitive symmetric relation over dynamic instances
184produced by *different threads* for the *same static instance*.
185
186It is impractical to provide any one definition for the *converged-with*
187relation, since different environments may wish to relate dynamic instances in
188different ways. The fact that *convergence-before* is a strict partial order is
189a constraint on the *converged-with* relation. It is trivially satisfied if
190different dynamic instances are never converged. Below, we provide a relation
191called :ref:`maximal converged-with<convergence-maximal>`, which satisifies
192*convergence-before* and is suitable for known targets.
193
194.. _convergence-note-convergence:
195
196.. note::
197
198   1. The convergence-before relation is not
199      directly observable. Program transforms are in general free to
200      change the order of instructions, even though that obviously
201      changes the convergence-before relation.
202
203   2. Converged dynamic instances need not be executed at the same
204      time or even on the same resource. Converged dynamic instances
205      of a convergent operation may appear to do so but that is an
206      implementation detail.
207
208   3. The fact that ``P`` is convergence-before
209      ``Q`` does not automatically imply that ``P`` happens-before
210      ``Q`` in a memory model sense.
211
212.. _convergence-maximal:
213
214Maximal Convergence
215-------------------
216
217This section defines a constraint that may be used to
218produce a *maximal converged-with* relation without violating the
219strict *convergence-before* order. This maximal converged-with
220relation is reasonable for real targets and is compatible with
221convergent operations.
222
223The maximal converged-with relation is defined in terms of cycle
224headers, with the assumption that threads converge at the header on every
225"iteration" of the cycle. Informally, two threads execute the same iteration of
226a cycle if they both previously executed the cycle header the same number of
227times after they entered that cycle. In general, this needs to account for the
228iterations of parent cycles as well.
229
230   **Maximal converged-with:**
231
232   Dynamic instances ``X1`` and ``X2`` produced by different threads
233   for the same static instance ``X`` are converged in the maximal
234   converged-with relation if and only if:
235
236   - ``X`` is not contained in any cycle, or,
237   - For every cycle ``C`` with header ``H`` that contains ``X``:
238
239     - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
240       the respective thread is convergence-before ``X2``, and,
241     - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
242       the respective thread is convergence-before ``X1``,
243     - without assuming that ``X1`` is converged with ``X2``.
244
245.. note::
246
247   Cycle headers may not be unique to a given CFG if it is irreducible. Each
248   cycle hierarchy for the same CFG results in a different maximal
249   converged-with relation.
250
251   For brevity, the rest of the document restricts the term
252   *converged* to mean "related under the maximal converged-with
253   relation for the given cycle hierarchy".
254
255Maximal convergence can now be demonstrated in the earlier example as follows:
256
257.. table::
258   :align: left
259
260   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
261   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
262   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
263   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
264   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
265   | Thread 2 | Entry2 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
266   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
267
268- ``Entry1`` and ``Entry2`` are converged.
269- ``H1`` and ``H2`` are converged.
270- ``B1`` and ``B2`` are not converged due to ``H4`` which is not
271  convergence-before ``B1``.
272- ``H3`` and ``H4`` are converged.
273- ``H3`` is not converged with ``H5`` due to ``H4`` which is not
274  convergence-before ``H3``.
275- ``L1`` and ``L2`` are converged.
276- ``L3`` and ``L4`` are converged.
277- ``L3`` is not converged with ``L5`` due to ``H5`` which is not
278  convergence-before ``L3``.
279
280.. _convergence-cycle-headers:
281
282Dependence on Cycles Headers
283----------------------------
284
285Contradictions in *convergence-before* are possible only between two
286nodes that are inside some cycle. The dynamic instances of such nodes
287may be interleaved in the same thread, and this interleaving may be
288different for different threads.
289
290When a thread executes a node ``X`` once and then executes it again,
291it must have followed a closed path in the CFG that includes ``X``.
292Such a path must pass through the header of at least one cycle --- the
293smallest cycle that includes the entire closed path. In a given
294thread, two dynamic instances of ``X`` are either separated by the
295execution of at least one cycle header, or ``X`` itself is a cycle
296header.
297
298In reducible cycles (natural loops), each execution of the header is
299equivalent to the start of a new iteration of the cycle. But this
300analogy breaks down in the presence of explicit constraints on the
301converged-with relation, such as those described in :ref:`future
302work<convergence-note-convergence>`. Instead, cycle headers should be
303treated as implicit *points of convergence* in a maximal
304converged-with relation.
305
306Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such
307that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle,
308with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread
309enters the cycle ``Ck``, any of the following is possible:
310
3111. The thread directly entered cycle ``Ck`` without having executed
312   any of the headers ``H1`` to ``Hk``.
313
3142. The thread executed some or all of the nested headers one or more
315   times.
316
317The maximal converged-with relation captures the following intuition
318about cycles:
319
3201. When two threads enter a top-level cycle ``C1``, they execute
321   converged dynamic instances of every node that is a :ref:`child
322   <cycle-parent-block>` of ``C1``.
323
3242. When two threads enter a nested cycle ``Ck``, they execute
325   converged dynamic instances of every node that is a child of
326   ``Ck``, until either thread exits ``Ck``, if and only if they
327   executed converged dynamic instances of the last nested header that
328   either thread encountered.
329
330   Note that when a thread exits a nested cycle ``Ck``, it must follow
331   a closed path outside ``Ck`` to reenter it. This requires executing
332   the header of some outer cycle, as described earlier.
333
334Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1``
335and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``.
336Maximal convergence relates ``X1`` and ``X2`` as follows:
337
3381. If neither thread executed any header from ``H1`` to ``Hk``, then
339   ``X1`` and ``X2`` are converged.
340
3412. Otherwise, if there are no converged dynamic instances ``Q1`` and
342   ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is
343   possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and
344   ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and
345   ``X2`` are not converged.
346
3473. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic
348   instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most
349   recently before ``X1`` and ``X2`` in the respective threads. Then
350   ``X1`` and ``X2`` are converged if and only if there is no dynamic
351   instance of any header from ``H1`` to ``Hk`` that occurs between
352   ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in
353   thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last
354   point of convergence, with no other header being executed before
355   executing ``X``.
356
357**Example:**
358
359.. figure:: convergence-both-diverged-nested.png
360   :name: convergence-both-diverged-nested
361
362The above figure shows two nested irreducible cycles with headers
363``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent
364branches. The table below shows the convergence between three threads
365taking different paths through the CFG. Dynamic instances listed in
366the same column are converged.
367
368   .. table::
369      :align: left
370
371      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
372      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 10   |
373      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
374      | Thread1 | Entry | P1  | Q1  | S1  | P3  | Q3  | R1  | S2  | Exit |
375      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
376      | Thread2 | Entry | P2  | Q2  |     |     |     | R2  | S3  | Exit |
377      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
378      | Thread3 | Entry |     |     |     |     |     | R3  | S4  | Exit |
379      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
380
381- ``P2`` and ``P3`` are not converged due to ``S1``
382- ``Q2`` and ``Q3`` are not converged due to ``S1``
383- ``S1`` and ``S3`` are not converged due to ``R2``
384- ``S1`` and ``S4`` are not converged due to ``R3``
385
386Informally, ``T1`` and ``T2`` execute the inner cycle a different
387number of times, without executing the header of the outer cycle. All
388threads converge in the outer cycle when they first execute the header
389of the outer cycle.
390
391.. _convergence-uniformity:
392
393Uniformity
394==========
395
3961. The output of two converged dynamic instances is uniform if and
397   only if it compares equal for those two dynamic instances.
3982. The output of a static instance ``X`` is uniform *for a given set
399   of threads* if and only if it is uniform for every pair of
400   converged dynamic instances of ``X`` produced by those threads.
401
402A non-uniform value is said to be *divergent*.
403
404For a set ``S`` of threads, the uniformity of each output of a static
405instance is determined as follows:
406
4071. The semantics of the instruction may specify the output to be
408   uniform.
4092. Otherwise, the output is divergent if the static instance is not
410   :ref:`m-converged <convergence-m-converged>`.
4113. Otherwise, if the static instance is m-converged:
412
413   1. If it is a PHI node, its output is uniform if and only
414      if for every pair of converged dynamic instances produced by all
415      threads in ``S``:
416
417      a. Both instances choose the same output from converged
418         dynamic instances, and,
419      b. That output is uniform for all threads in ``S``.
420   2. Otherwise, the output is uniform if and only if the input
421      operands are uniform for all threads in ``S``.
422
423Divergent Cycle Exits
424---------------------
425
426When a divergent branch occurs inside a cycle, it is possible that a
427diverged path continues to an exit of the cycle. This is called a
428divergent cycle exit. If the cycle is irreducible, the diverged path
429may re-enter and eventually reach a join within the cycle. Such a join
430should be examined for the :ref:`diverged entry
431<convergence-diverged-entry>` criterion.
432
433Nodes along the diverged path that lie outside the cycle experience
434*temporal divergence*, when two threads executing convergently inside
435the cycle produce uniform values, but exit the cycle along the same
436divergent path after executing the header a different number of times
437(informally, on different iterations of the cycle). For a node ``N``
438inside the cycle the outputs may be uniform for the two threads, but
439any use ``U`` outside the cycle receives a value from non-converged
440dynamic instances of ``N``. An output of ``U`` may be divergent,
441depending on the semantics of the instruction.
442
443.. _uniformity-analysis:
444
445Static Uniformity Analysis
446==========================
447
448Irreducible control flow results in different cycle hierarchies
449depending on the choice of headers during depth-first traversal. As a
450result, a static analysis cannot always determine the convergence of
451nodes in irreducible cycles, and any uniformity analysis is limited to
452those static instances whose convergence is independent of the cycle
453hierarchy:
454
455.. _convergence-m-converged:
456
457  **m-converged static instances:**
458
459  A static instance ``X`` is *m-converged* for a given CFG if and only
460  if the maximal converged-with relation for its dynamic instances is
461  the same in every cycle hierarchy that can be constructed for that CFG.
462
463  .. note::
464
465   In other words, two dynamic instances ``X1`` and ``X2`` of an
466   m-converged static instance ``X`` are converged in some cycle
467   hierarchy if and only if they are also converged in every other
468   cycle hierarchy for the same CFG.
469
470   As noted earlier, for brevity, we restrict the term *converged* to
471   mean "related under the maximal converged-with relation for a given
472   cycle hierarchy".
473
474
475Each node ``X`` in a given CFG is reported to be m-converged if and
476only if every cycle that contains ``X`` satisfies the following necessary
477conditions:
478
479  1. Every divergent branch inside the cycle satisfies the
480     :ref:`diverged entry criterion<convergence-diverged-entry>`, and,
481  2. There are no :ref:`diverged paths reaching the
482     cycle<convergence-diverged-outside>` from a divergent branch
483     outside it.
484
485.. note::
486
487   A reducible cycle :ref:`trivially satisfies
488   <convergence-reducible-cycle>` the above conditions. In particular,
489   if the whole CFG is reducible, then all nodes in the CFG are
490   m-converged.
491
492The uniformity of each output of a static instance
493is determined using the criteria
494:ref:`described earlier <convergence-uniformity>`. The discovery of
495divergent outputs may cause their uses (including branches) to also
496become divergent. The analysis propagates this divergence until a
497fixed point is reached.
498
499The convergence inferred using these criteria is a safe subset of the
500maximal converged-with relation for any cycle hierarchy. In
501particular, it is sufficient to determine if a static instance is
502m-converged for a given cycle hierarchy ``T``, even if that fact is
503not detected when examining some other cycle hierarchy ``T'``.
504
505This property allows compiler transforms to use the uniformity
506analysis without being affected by DFS choices made in the underlying
507cycle analysis. When two transforms use different instances of the
508uniformity analysis for the same CFG, a "divergent value" result in
509one analysis instance cannot contradict a "uniform value" result in
510the other.
511
512Generic transforms such as SimplifyCFG, CSE, and loop transforms
513commonly change the program in ways that change the maximal
514converged-with relations. This also means that a value that was
515previously uniform can become divergent after such a transform.
516Uniformity has to be recomputed after such transforms.
517
518Divergent Branch inside a Cycle
519-------------------------------
520
521.. figure:: convergence-divergent-inside.png
522   :name: convergence-divergent-inside
523
524The above figure shows a divergent branch ``Q`` inside an irreducible
525cyclic region. When two threads diverge at ``Q``, the convergence of
526dynamic instances within the cyclic region depends on the cycle
527hierarchy chosen:
528
5291. In an implementation that detects a single cycle ``C`` with header
530   ``P``, convergence inside the cycle is determined by ``P``.
531
5322. In an implementation that detects two nested cycles with headers
533   ``R`` and ``S``, convergence inside those cycles is determined by
534   their respective headers.
535
536.. _convergence-diverged-entry:
537
538A conservative approach would be to simply report all nodes inside
539irreducible cycles as having divergent outputs. But it is desirable to
540recognize m-converged nodes in the CFG in order to maximize
541uniformity. This section describes one such pattern of nodes derived
542from *closed paths*, which are a property of the CFG and do not depend
543on the cycle hierarchy.
544
545  **Diverged Entry Criterion:**
546
547  The dynamic instances of all the nodes in a closed path ``P`` are
548  m-converged only if for every divergent branch ``B`` and its
549  join node ``J`` that lie on ``P``, there is no entry to ``P`` which
550  lies on a diverged path from ``B`` to ``J``.
551
552.. figure:: convergence-closed-path.png
553   :name: convergence-closed-path
554
555Consider the closed path ``P -> Q -> R -> S`` in the above figure.
556``P`` and ``R`` are :ref:`entries to the closed
557path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a
558join for that branch, with diverged paths ``Q -> R -> S`` and ``Q ->
559S``.
560
561- If a diverged entry ``R`` exists, then in some cycle hierarchy,
562  ``R`` is the header of the smallest cycle ``C`` containing the
563  closed path and a :ref:`child cycle<cycle-definition>` ``C'``
564  exists in the set ``C - R``, containing both branch ``Q`` and join
565  ``S``. When threads diverge at ``Q``, one subset ``M`` continues
566  inside cycle ``C'``, while the complement ``N`` exits ``C'`` and
567  reaches ``R``. Dynamic instances of ``S`` executed by threads in set
568  ``M`` are not converged with those executed in set ``N`` due to the
569  presence of ``R``. Informally, threads that diverge at ``Q``
570  reconverge in the same iteration of the outer cycle ``C``, but they
571  may have executed the inner cycle ``C'`` differently.
572
573  .. table::
574     :align: left
575
576     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
577     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11   |
578     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
579     | Thread1 | Entry | P1  | Q1  |     |     |     | R1  | S1  | P3  | ... | Exit |
580     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
581     | Thread2 | Entry | P2  | Q2  | S2  | P4  | Q4  | R2  | S4  |     |     | Exit |
582     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
583
584  In the table above, ``S2`` is not converged with ``S1`` due to ``R1``.
585
586|
587
588- If ``R`` does not exist, or if any node other than ``R`` is the
589  header of ``C``, then no such child cycle ``C'`` is detected.
590  Threads that diverge at ``Q`` execute converged dynamic instances of
591  ``S`` since they do not encounter the cycle header on any path from
592  ``Q`` to ``S``. Informally, threads that diverge at ``Q``
593  reconverge at ``S`` in the same iteration of ``C``.
594
595  .. table::
596     :align: left
597
598     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
599     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10   |
600     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
601     | Thread1 | Entry | P1  | Q1  | R1  | S1  | P3  | Q3  | R3  | S3  | Exit |
602     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
603     | Thread2 | Entry | P2  | Q2  |     | S2  | P4  | Q4  | R2  | S4  | Exit |
604     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
605
606|
607
608  .. note::
609
610     In general, the cycle ``C`` in the above statements is not
611     expected to be the same cycle for different headers. Cycles and
612     their headers are tightly coupled; for different headers in the
613     same outermost cycle, the child cycles detected may be different.
614     The property relevant to the above examples is that for every
615     closed path, there is a cycle ``C`` that contains the path and
616     whose header is on that path.
617
618The diverged entry criterion must be checked for every closed path
619passing through a divergent branch ``B`` and its join ``J``. Since
620:ref:`every closed path passes through the header of some
621cycle<cycle-closed-path-header>`, this amounts to checking every cycle
622``C`` that contains ``B`` and ``J``. When the header of ``C``
623dominates the join ``J``, there can be no entry to any path from the
624header to ``J``, which includes any diverged path from ``B`` to ``J``.
625This is also true for any closed paths passing through the header of
626an outer cycle that contains ``C``.
627
628Thus, the diverged entry criterion can be conservatively simplified
629as follows:
630
631  For a divergent branch ``B`` and its join node ``J``, the nodes in a
632  cycle ``C`` that contains both ``B`` and ``J`` are m-converged only
633  if:
634
635  - ``B`` strictly dominates ``J``, or,
636  - The header ``H`` of ``C`` strictly dominates ``J``, or,
637  - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the
638    same condition.
639
640When ``J`` is the same as ``H`` or ``B``, the trivial dominance is
641insufficient to make any statement about entries to diverged paths.
642
643.. _convergence-diverged-outside:
644
645Diverged Paths reaching a Cycle
646-------------------------------
647
648.. figure:: convergence-divergent-outside.png
649   :name: convergence-divergent-outside
650
651The figure shows two cycle hierarchies with a divergent branch in
652``Entry`` instead of ``Q``. For two threads that enter the closed path
653``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence
654of dynamic instances generated along the path depends on whether ``P``
655or ``R`` is the header.
656
657-  Convergence when ``P`` is the header.
658
659   .. table::
660      :align: left
661
662      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
663      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13   |
664      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
665      | Thread1 | Entry |     |     |     | P1  | Q1  | R1  | S1  | P3  | Q3  |     | S3  | Exit |
666      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
667      | Thread2 | Entry |     | R2  | S2  | P2  | Q2  |     | S2  | P4  | Q4  | R3  | S4  | Exit |
668      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
669
670   |
671
672-  Convergence when ``R`` is the header.
673
674   .. table::
675      :align: left
676
677      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
678      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12   |
679      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
680      | Thread1 | Entry |     | P1  | Q1  | R1  | S1  | P3  | Q3  | S3  |     |     | Exit |
681      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
682      | Thread2 | Entry |     |     |     | R2  | S2  | P2  | Q2  | S2  | P4  | ... | Exit |
683      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
684
685   |
686
687Thus, when diverged paths reach different entries of an irreducible
688cycle from outside the cycle, the static analysis conservatively
689reports every node in the cycle as not m-converged.
690
691.. _convergence-reducible-cycle:
692
693Reducible Cycle
694---------------
695
696If ``C`` is a reducible cycle with header ``H``, then in any DFS,
697``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>`
698``C'`` that contains ``C``. Independent of the DFS, there is no entry
699to the subgraph ``C`` other than ``H`` itself. Thus, we have the
700following:
701
7021. The diverged entry criterion is trivially satisfied for a divergent
703   branch and its join, where both are inside subgraph ``C``.
7042. When diverged paths reach the subgraph ``C`` from outside, their
705   convergence is always determined by the same header ``H``.
706
707Clearly, this can be determined only in a cycle hierarchy ``T`` where
708``C`` is detected as a reducible cycle. No such conclusion can be made
709in a different cycle hierarchy ``T'`` where ``C`` is part of a larger
710cycle ``C'`` with the same header, but this does not contradict the
711conclusion in ``T``.
712
713Controlled Convergence
714======================
715
716:ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>`
717provide an explicit semantics for determining which threads are converged at a
718given point in the program. The impact of this is incorporated in a
719:ref:`controlled maximal converged-with <controlled_maximal_converged_with>`
720relation over dynamic instances and a :ref:`controlled m-converged
721<controlled_m_converged>` property of static instances. The :ref:`uniformity
722analysis <uniformity-analysis>` implemented in LLVM includes this for targets
723that support convergence control tokens.
724