Lines Matching full:threads

12 Some parallel execution environments execute threads in groups that allow
15 the set of threads that executes it "together", i.e., convergently. When control
16 flow :ref:`diverges <convergence-and-uniformity>`, i.e. threads of the same
18 paths through the CFG, not all threads of the group may be available to
23 that occurs outside of the memory model, where the set of threads which
27 convergent operation is expected to occur precisely among those threads of an
41 unambiguous way of determining the threads that are expected to communicate.
45 of communicating threads for convergent operations.
48 threads are formed in the first place. It focuses on the questions that are
58 In LLVM IR, the only way to communicate between threads as described
80 do not directly depend on the set of threads that enter the function as a
82 implementation-defined subset of threads within the body of the function, as
98 threads in the same group:
113 neighboring pixels, then their corresponding threads will not execute together
121 1. It communicates with a set of threads that implicitly depends on control
123 2. Correctness depends on this set of threads.
146 among all threads of the same "quad" -- a group of 2x2 pixels that are
154 dependencies, it must communicate among the same set of threads. This indicates
158 additional knowledge, that ``%condition`` is always uniform across the threads
183 subset of threads with positive ``delta`` in a subgroup (wave), and so will sum
184 up all the ``delta`` values of those threads; and similarly for the
188 would sum up the ``delta`` across *all* threads instead.
217 terminology), we expect it to communicate among all threads within the
222 defined in this document, they only communicate among the subset of threads
226 among the full set of threads that the entry intrinsic communicated with.
228 among the relevant set of threads: in that case, the ``@subgroupAdd`` already
229 communicates among the full set of threads in the original program.
287 Is the control barrier guaranteed to synchronize among the same set of threads
291 * In an implementation that reconverges at post-dominators, threads reconverge
292 at ``mid`` in the first version, so that all threads (within a subgroup/wave)
294 threads that reach the control barrier via different paths synchronize
295 separately: the first (and only) post-dominator is ``end``, so threads do not
331 If S is the set of threads that the entry intrinsic communicated with, then
333 actually reaches the call site. This set of threads doesn't change after
342 operations where the code does not care about the exact set of threads with
343 which it is executed, but only that the set of threads is the same for all the
346 possible. However, the code may still require that the sets of threads are
353 implementation, where threads conditionally write fixed-sized records
362 because it uses only a single atomic operation for an entire group of threads.
365 atomic operation to all threads of the group, so that each thread can compute
401 The key here is that the function really doesn't care which set of threads it
402 is being called with. It takes whatever set of threads it can get. What the
404 ``@subgroupBallot`` -- which is used to retrieve the bitmask of threads that
405 executed the anchor together -- executes with the same set of threads as the
411 the behavior in practice, by changing the sets of threads that are grouped
457 be executed convergently on every iteration of the loop, by threads that
458 together take the branch to exit the loop. But when compiled, all threads that
468 only by those threads that convergently exited the loop in a given iteration.
503 convergently only by those threads that convergently take the exit edge from %B
517 formal objects by which we talk about communicating threads in convergent
526 by different threads may be :ref:`converged <convergence-definition>`. When
527 executing a convergent operation, the set of threads that execute converged
528 dynamic instances is the set of threads that communicate with each other.
536 a convergence token operand to define the set of communicating threads relative
542 threads execute converged dynamic instances of ``U`` if and only if the
543 token value in both threads was returned by converged dynamic
551 set of threads instead -- specifically, the set ``S`` of threads that
555 ``convergencectrl`` bundle on an instruction ``I``, then the set of threads that
557 Specifically, it is the subset of threads that ends up executing ``I`` while
561 multiple times by the same threads? Which execution of ``I`` in thread 1
597 a. In an OpenCL *kernel launch*, the maximal set of threads that
599 Hence, a suitable choice is to specify that all the threads from
602 b. In a C/C++ program, threads are launched independently and they can
606 threads execute converged dynamic instances of this intrinsic if and
607 only if both threads entered the function by executing converged
663 threads execute converged dynamic instances of ``U`` if and only if:
665 1. The token value in both threads was returned by converged dynamic
667 2. There is an integer *n* such that both threads execute ``U`` for the *n*'th time
705 any "outer scope". The set of threads executing converged dynamic instances of
713 The expectation is that all threads within a group that "happen to be active
715 can detect the maximal set of threads that can communicate efficiently within
752 For an environment-defined group of threads (such as an OpenCL workgroup or
754 all threads in the group do so convergently with that thread.
760 that the group of threads that converge on reaching ``X`` is the same group that
764 implementation-defined group of threads, which is insufficient to support the
859 When the target or the environment guarantees that threads do not
860 communicate using convergent operations or that threads never diverge,
885 of threads.
887 Informational note: Threads that execute converged dynamic instances do not
897 affected by the set of threads executing this function. This typically
912 convergent operation ``U``, the implementation must ensure that the threads that
913 converge at ``U`` are all the threads that reached ``U`` after converging at
915 threads are converged at every node they reach on any path from ``D`` to ``U``.
916 In other words, the converged-with relation at ``D`` produces groups of threads
932 2. Dynamic instances ``X1`` and ``X2`` produced by different threads for the
936 1. Both threads executed converged dynamic instances of every token
967 When a cycle has a divergent exit, maximal convergence assumes that all threads
970 convergence region of ``D`` now extends outside the cycle. If two threads
1006 deciding whether two threads execute converged dynamic instances of
1008 Assume two threads execute converged dynamic instances of the anchor
1016 operations: both threads execute the same static instruction while using
1025 forming associations between loop iterations in different threads, *except*
1033 establishes a relationship between loop iterations across threads.
1045 In the same scenario of two threads executing converged dynamic instances of the
1047 intrinsics implies that both threads execute the converged dynamic instances of
1083 Assume two threads execute converged dynamic instances of the anchor followed
1091 That is, both threads execute two iterations of the loop, but they execute
1093 relation between loop iterations across the threads, there is no reasonable way
1095 same across the threads, if any.
1179 instructions given in the program, the executions of the threads can be
1188 can be no communication between the threads, which means they execute
1234 semantics. This means that the set of communicating threads in the transformed
1289 are threads whose initial counter value is not a multiple of 2. In particular,
1290 all threads with an odd trip count are now likely to execute the convergent
1292 underlying implementation is likely to try to group as many threads together
1320 To understand why unrolling is forbidden, consider two threads that execute
1329 By the dynamic rule on loop heart intrinsics, these threads execute converged
1333 By the dynamic rule on general convergent operations, the threads execute
1374 The threads now execute the following sequences of blocks:
1390 non-converged dynamic instances, which means that the set of communicating threads
1422 changes the set of threads that reach the operation and therefore, the set of
1423 threads that execute converged dynamic instances of the operation. By
1424 definition, this changes the set of threads that participate in the
1432 conditional branches where within every possible relevant set of threads, all
1433 threads will always take the same direction -- is generally allowed. See the end
1442 communicating set of threads. So hoisting is allowed in the following
1461 threads is the union of the two sets of threads in the original program, so
1480 There is no guarantee about the value of ``%id`` in the threads where
1482 ``%id`` is outside of the set of communicating threads, then speculating and
1504 original program, the set of threads that communicates in the
1505 ``@convergent.operation`` is automatically subset to the threads for which
1532 ever communicates with threads that have the same ``condition`` value.
1559 threads participating in those dynamic instances of the anchor could be
1562 threads executing ``@convergent.operation`` could be different in each loop
1591 the sinking will restrict the set of communicating threads to those for which
1596 incorrect. That would allow threads for which ``condition`` is false to