xref: /netbsd-src/external/apache2/llvm/dist/llvm/docs/Coroutines.rst (revision 82d56013d7b633d116a93943de88e08335357a7c)
1=====================================
2Coroutines in LLVM
3=====================================
4
5.. contents::
6   :local:
7   :depth: 3
8
9.. warning::
10  This is a work in progress. Compatibility across LLVM releases is not
11  guaranteed.
12
13Introduction
14============
15
16.. _coroutine handle:
17
18LLVM coroutines are functions that have one or more `suspend points`_.
19When a suspend point is reached, the execution of a coroutine is suspended and
20control is returned back to its caller. A suspended coroutine can be resumed
21to continue execution from the last suspend point or it can be destroyed.
22
23In the following example, we call function `f` (which may or may not be a
24coroutine itself) that returns a handle to a suspended coroutine
25(**coroutine handle**) that is used by `main` to resume the coroutine twice and
26then destroy it:
27
28.. code-block:: llvm
29
30  define i32 @main() {
31  entry:
32    %hdl = call i8* @f(i32 4)
33    call void @llvm.coro.resume(i8* %hdl)
34    call void @llvm.coro.resume(i8* %hdl)
35    call void @llvm.coro.destroy(i8* %hdl)
36    ret i32 0
37  }
38
39.. _coroutine frame:
40
41In addition to the function stack frame which exists when a coroutine is
42executing, there is an additional region of storage that contains objects that
43keep the coroutine state when a coroutine is suspended. This region of storage
44is called the **coroutine frame**. It is created when a coroutine is called
45and destroyed when a coroutine either runs to completion or is destroyed
46while suspended.
47
48LLVM currently supports two styles of coroutine lowering. These styles
49support substantially different sets of features, have substantially
50different ABIs, and expect substantially different patterns of frontend
51code generation. However, the styles also have a great deal in common.
52
53In all cases, an LLVM coroutine is initially represented as an ordinary LLVM
54function that has calls to `coroutine intrinsics`_ defining the structure of
55the coroutine. The coroutine function is then, in the most general case,
56rewritten by the coroutine lowering passes to become the "ramp function",
57the initial entrypoint of the coroutine, which executes until a suspend point
58is first reached. The remainder of the original coroutine function is split
59out into some number of "resume functions". Any state which must persist
60across suspensions is stored in the coroutine frame. The resume functions
61must somehow be able to handle either a "normal" resumption, which continues
62the normal execution of the coroutine, or an "abnormal" resumption, which
63must unwind the coroutine without attempting to suspend it.
64
65Switched-Resume Lowering
66------------------------
67
68In LLVM's standard switched-resume lowering, signaled by the use of
69`llvm.coro.id`, the coroutine frame is stored as part of a "coroutine
70object" which represents a handle to a particular invocation of the
71coroutine.  All coroutine objects support a common ABI allowing certain
72features to be used without knowing anything about the coroutine's
73implementation:
74
75- A coroutine object can be queried to see if it has reached completion
76  with `llvm.coro.done`.
77
78- A coroutine object can be resumed normally if it has not already reached
79  completion with `llvm.coro.resume`.
80
81- A coroutine object can be destroyed, invalidating the coroutine object,
82  with `llvm.coro.destroy`.  This must be done separately even if the
83  coroutine has reached completion normally.
84
85- "Promise" storage, which is known to have a certain size and alignment,
86  can be projected out of the coroutine object with `llvm.coro.promise`.
87  The coroutine implementation must have been compiled to define a promise
88  of the same size and alignment.
89
90In general, interacting with a coroutine object in any of these ways while
91it is running has undefined behavior.
92
93The coroutine function is split into three functions, representing three
94different ways that control can enter the coroutine:
95
961. the ramp function that is initially invoked, which takes arbitrary
97   arguments and returns a pointer to the coroutine object;
98
992. a coroutine resume function that is invoked when the coroutine is resumed,
100   which takes a pointer to the coroutine object and returns `void`;
101
1023. a coroutine destroy function that is invoked when the coroutine is
103   destroyed, which takes a pointer to the coroutine object and returns
104   `void`.
105
106Because the resume and destroy functions are shared across all suspend
107points, suspend points must store the index of the active suspend in
108the coroutine object, and the resume/destroy functions must switch over
109that index to get back to the correct point.  Hence the name of this
110lowering.
111
112Pointers to the resume and destroy functions are stored in the coroutine
113object at known offsets which are fixed for all coroutines.  A completed
114coroutine is represented with a null resume function.
115
116There is a somewhat complex protocol of intrinsics for allocating and
117deallocating the coroutine object.  It is complex in order to allow the
118allocation to be elided due to inlining.  This protocol is discussed
119in further detail below.
120
121The frontend may generate code to call the coroutine function directly;
122this will become a call to the ramp function and will return a pointer
123to the coroutine object.  The frontend should always resume or destroy
124the coroutine using the corresponding intrinsics.
125
126Returned-Continuation Lowering
127------------------------------
128
129In returned-continuation lowering, signaled by the use of
130`llvm.coro.id.retcon` or `llvm.coro.id.retcon.once`, some aspects of
131the ABI must be handled more explicitly by the frontend.
132
133In this lowering, every suspend point takes a list of "yielded values"
134which are returned back to the caller along with a function pointer,
135called the continuation function.  The coroutine is resumed by simply
136calling this continuation function pointer.  The original coroutine
137is divided into the ramp function and then an arbitrary number of
138these continuation functions, one for each suspend point.
139
140LLVM actually supports two closely-related returned-continuation
141lowerings:
142
143- In normal returned-continuation lowering, the coroutine may suspend
144  itself multiple times. This means that a continuation function
145  itself returns another continuation pointer, as well as a list of
146  yielded values.
147
148  The coroutine indicates that it has run to completion by returning
149  a null continuation pointer. Any yielded values will be `undef`
150  should be ignored.
151
152- In yield-once returned-continuation lowering, the coroutine must
153  suspend itself exactly once (or throw an exception).  The ramp
154  function returns a continuation function pointer and yielded
155  values, but the continuation function simply returns `void`
156  when the coroutine has run to completion.
157
158The coroutine frame is maintained in a fixed-size buffer that is
159passed to the `coro.id` intrinsic, which guarantees a certain size
160and alignment statically. The same buffer must be passed to the
161continuation function(s). The coroutine will allocate memory if the
162buffer is insufficient, in which case it will need to store at
163least that pointer in the buffer; therefore the buffer must always
164be at least pointer-sized. How the coroutine uses the buffer may
165vary between suspend points.
166
167In addition to the buffer pointer, continuation functions take an
168argument indicating whether the coroutine is being resumed normally
169(zero) or abnormally (non-zero).
170
171LLVM is currently ineffective at statically eliminating allocations
172after fully inlining returned-continuation coroutines into a caller.
173This may be acceptable if LLVM's coroutine support is primarily being
174used for low-level lowering and inlining is expected to be applied
175earlier in the pipeline.
176
177Async Lowering
178--------------
179
180In async-continuation lowering, signaled by the use of `llvm.coro.id.async`,
181handling of control-flow must be handled explicitly by the frontend.
182
183In this lowering, a coroutine is assumed to take the current `async context` as
184one of its arguments (the argument position is determined by
185`llvm.coro.id.async`). It is used to marshal arguments and return values of the
186coroutine. Therefore an async coroutine returns `void`.
187
188.. code-block:: llvm
189
190  define swiftcc void @async_coroutine(i8* %async.ctxt, i8*, i8*) {
191  }
192
193Values live across a suspend point need to be stored in the coroutine frame to
194be available in the continuation function. This frame is stored as a tail to the
195`async context`.
196
197Every suspend point takes an `context projection function` argument which
198describes how-to obtain the continuations `async context` and every suspend
199point has an associated `resume function` denoted by the
200`llvm.coro.async.resume` intrinsic. The coroutine is resumed by calling this
201`resume function` passing the `async context` as the one of its arguments
202argument. The `resume function` can restore its (the caller's) `async context`
203by applying a `context projection function` that is provided by the frontend as
204a parameter to the `llvm.coro.suspend.async` intrinsic.
205
206.. code-block:: c
207
208  // For example:
209  struct async_context {
210    struct async_context *caller_context;
211    ...
212  }
213
214  char *context_projection_function(struct async_context *callee_ctxt) {
215     return callee_ctxt->caller_context;
216  }
217
218.. code-block:: llvm
219
220  %resume_func_ptr = call i8* @llvm.coro.async.resume()
221  call {i8*, i8*, i8*} (i8*, i8*, ...) @llvm.coro.suspend.async(
222                                              i8* %resume_func_ptr,
223                                              i8* %context_projection_function
224
225The frontend should provide a `async function pointer` struct associated with
226each async coroutine by `llvm.coro.id.async`'s argument. The initial size and
227alignment of the `async context` must be provided as arguments to the
228`llvm.coro.id.async` intrinsic. Lowering will update the size entry with the
229coroutine frame  requirements. The frontend is responsible for allocating the
230memory for the `async context` but can use the `async function pointer` struct
231to obtain the required size.
232
233.. code-block:: c
234
235  struct async_function_pointer {
236    uint32_t relative_function_pointer_to_async_impl;
237    uint32_t context_size;
238  }
239
240Lowering will split an async coroutine into a ramp function and one resume
241function per suspend point.
242
243How control-flow is passed between caller, suspension point, and back to
244resume function is left up to the frontend.
245
246The suspend point takes a function and its arguments. The function is intended
247to model the transfer to the callee function. It will be tail called by
248lowering and therefore must have the same signature and calling convention as
249the async coroutine.
250
251.. code-block:: llvm
252
253  call {i8*, i8*, i8*} (i8*, i8*, ...) @llvm.coro.suspend.async(
254                   i8* %resume_func_ptr,
255                   i8* %context_projection_function,
256                   i8* (bitcast void (i8*, i8*, i8*)* to i8*) %suspend_function,
257                   i8* %arg1, i8* %arg2, i8 %arg3)
258
259Coroutines by Example
260=====================
261
262The examples below are all of switched-resume coroutines.
263
264Coroutine Representation
265------------------------
266
267Let's look at an example of an LLVM coroutine with the behavior sketched
268by the following pseudo-code.
269
270.. code-block:: c++
271
272  void *f(int n) {
273     for(;;) {
274       print(n++);
275       <suspend> // returns a coroutine handle on first suspend
276     }
277  }
278
279This coroutine calls some function `print` with value `n` as an argument and
280suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
281a `main` shown in the previous section. It will call `print` with values 4, 5
282and 6 after which the coroutine will be destroyed.
283
284The LLVM IR for this coroutine looks like this:
285
286.. code-block:: llvm
287
288  define i8* @f(i32 %n) {
289  entry:
290    %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
291    %size = call i32 @llvm.coro.size.i32()
292    %alloc = call i8* @malloc(i32 %size)
293    %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
294    br label %loop
295  loop:
296    %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
297    %inc = add nsw i32 %n.val, 1
298    call void @print(i32 %n.val)
299    %0 = call i8 @llvm.coro.suspend(token none, i1 false)
300    switch i8 %0, label %suspend [i8 0, label %loop
301                                  i8 1, label %cleanup]
302  cleanup:
303    %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
304    call void @free(i8* %mem)
305    br label %suspend
306  suspend:
307    %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
308    ret i8* %hdl
309  }
310
311The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
312lowered to a constant representing the size required for the coroutine frame.
313The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
314coroutine handle. The second parameter of `coro.begin` is given a block of memory
315to be used if the coroutine frame needs to be allocated dynamically.
316The `coro.id`_ intrinsic serves as coroutine identity useful in cases when the
317`coro.begin`_ intrinsic get duplicated by optimization passes such as
318jump-threading.
319
320The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
321given the coroutine handle, returns a pointer of the memory block to be freed or
322`null` if the coroutine frame was not allocated dynamically. The `cleanup`
323block is entered when coroutine runs to completion by itself or destroyed via
324call to the `coro.destroy`_ intrinsic.
325
326The `suspend` block contains code to be executed when coroutine runs to
327completion or suspended. The `coro.end`_ intrinsic marks the point where
328a coroutine needs to return control back to the caller if it is not an initial
329invocation of the coroutine.
330
331The `loop` blocks represents the body of the coroutine. The `coro.suspend`_
332intrinsic in combination with the following switch indicates what happens to
333control flow when a coroutine is suspended (default case), resumed (case 0) or
334destroyed (case 1).
335
336Coroutine Transformation
337------------------------
338
339One of the steps of coroutine lowering is building the coroutine frame. The
340def-use chains are analyzed to determine which objects need be kept alive across
341suspend points. In the coroutine shown in the previous section, use of virtual register
342`%inc` is separated from the definition by a suspend point, therefore, it
343cannot reside on the stack frame since the latter goes away once the coroutine
344is suspended and control is returned back to the caller. An i32 slot is
345allocated in the coroutine frame and `%inc` is spilled and reloaded from that
346slot as needed.
347
348We also store addresses of the resume and destroy functions so that the
349`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
350when its identity cannot be determined statically at compile time. For our
351example, the coroutine frame will be:
352
353.. code-block:: llvm
354
355  %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
356
357After resume and destroy parts are outlined, function `f` will contain only the
358code responsible for creation and initialization of the coroutine frame and
359execution of the coroutine until a suspend point is reached:
360
361.. code-block:: llvm
362
363  define i8* @f(i32 %n) {
364  entry:
365    %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
366    %alloc = call noalias i8* @malloc(i32 24)
367    %0 = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
368    %frame = bitcast i8* %0 to %f.frame*
369    %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
370    store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
371    %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
372    store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2
373
374    %inc = add nsw i32 %n, 1
375    %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
376    store i32 %inc, i32* %inc.spill.addr
377    call void @print(i32 %n)
378
379    ret i8* %frame
380  }
381
382Outlined resume part of the coroutine will reside in function `f.resume`:
383
384.. code-block:: llvm
385
386  define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
387  entry:
388    %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
389    %inc.spill = load i32, i32* %inc.spill.addr, align 4
390    %inc = add i32 %n.val, 1
391    store i32 %inc, i32* %inc.spill.addr, align 4
392    tail call void @print(i32 %inc)
393    ret void
394  }
395
396Whereas function `f.destroy` will contain the cleanup code for the coroutine:
397
398.. code-block:: llvm
399
400  define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
401  entry:
402    %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
403    tail call void @free(i8* %0)
404    ret void
405  }
406
407Avoiding Heap Allocations
408-------------------------
409
410A particular coroutine usage pattern, which is illustrated by the `main`
411function in the overview section, where a coroutine is created, manipulated and
412destroyed by the same calling function, is common for coroutines implementing
413RAII idiom and is suitable for allocation elision optimization which avoid
414dynamic allocation by storing the coroutine frame as a static `alloca` in its
415caller.
416
417In the entry block, we will call `coro.alloc`_ intrinsic that will return `true`
418when dynamic allocation is required, and `false` if dynamic allocation is
419elided.
420
421.. code-block:: llvm
422
423  entry:
424    %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
425    %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
426    br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
427  dyn.alloc:
428    %size = call i32 @llvm.coro.size.i32()
429    %alloc = call i8* @CustomAlloc(i32 %size)
430    br label %coro.begin
431  coro.begin:
432    %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
433    %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
434
435In the cleanup block, we will make freeing the coroutine frame conditional on
436`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
437thus skipping the deallocation code:
438
439.. code-block:: llvm
440
441  cleanup:
442    %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
443    %need.dyn.free = icmp ne i8* %mem, null
444    br i1 %need.dyn.free, label %dyn.free, label %if.end
445  dyn.free:
446    call void @CustomFree(i8* %mem)
447    br label %if.end
448  if.end:
449    ...
450
451With allocations and deallocations represented as described as above, after
452coroutine heap allocation elision optimization, the resulting main will be:
453
454.. code-block:: llvm
455
456  define i32 @main() {
457  entry:
458    call void @print(i32 4)
459    call void @print(i32 5)
460    call void @print(i32 6)
461    ret i32 0
462  }
463
464Multiple Suspend Points
465-----------------------
466
467Let's consider the coroutine that has more than one suspend point:
468
469.. code-block:: c++
470
471  void *f(int n) {
472     for(;;) {
473       print(n++);
474       <suspend>
475       print(-n);
476       <suspend>
477     }
478  }
479
480Matching LLVM code would look like (with the rest of the code remaining the same
481as the code in the previous section):
482
483.. code-block:: llvm
484
485  loop:
486    %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
487    call void @print(i32 %n.addr) #4
488    %2 = call i8 @llvm.coro.suspend(token none, i1 false)
489    switch i8 %2, label %suspend [i8 0, label %loop.resume
490                                  i8 1, label %cleanup]
491  loop.resume:
492    %inc = add nsw i32 %n.addr, 1
493    %sub = xor i32 %n.addr, -1
494    call void @print(i32 %sub)
495    %3 = call i8 @llvm.coro.suspend(token none, i1 false)
496    switch i8 %3, label %suspend [i8 0, label %loop
497                                  i8 1, label %cleanup]
498
499In this case, the coroutine frame would include a suspend index that will
500indicate at which suspend point the coroutine needs to resume. The resume
501function will use an index to jump to an appropriate basic block and will look
502as follows:
503
504.. code-block:: llvm
505
506  define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
507  entry.Resume:
508    %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
509    %index = load i8, i8* %index.addr, align 1
510    %switch = icmp eq i8 %index, 0
511    %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
512    %n = load i32, i32* %n.addr, align 4
513    br i1 %switch, label %loop.resume, label %loop
514
515  loop.resume:
516    %sub = xor i32 %n, -1
517    call void @print(i32 %sub)
518    br label %suspend
519  loop:
520    %inc = add nsw i32 %n, 1
521    store i32 %inc, i32* %n.addr, align 4
522    tail call void @print(i32 %inc)
523    br label %suspend
524
525  suspend:
526    %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
527    store i8 %storemerge, i8* %index.addr, align 1
528    ret void
529  }
530
531If different cleanup code needs to get executed for different suspend points,
532a similar switch will be in the `f.destroy` function.
533
534.. note ::
535
536  Using suspend index in a coroutine state and having a switch in `f.resume` and
537  `f.destroy` is one of the possible implementation strategies. We explored
538  another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
539  every suspend point, and instead of storing an index, the resume and destroy
540  function pointers are updated at every suspend. Early testing showed that the
541  current approach is easier on the optimizer than the latter so it is a
542  lowering strategy implemented at the moment.
543
544Distinct Save and Suspend
545-------------------------
546
547In the previous example, setting a resume index (or some other state change that
548needs to happen to prepare a coroutine for resumption) happens at the same time as
549a suspension of a coroutine. However, in certain cases, it is necessary to control
550when coroutine is prepared for resumption and when it is suspended.
551
552In the following example, a coroutine represents some activity that is driven
553by completions of asynchronous operations `async_op1` and `async_op2` which get
554a coroutine handle as a parameter and resume the coroutine once async
555operation is finished.
556
557.. code-block:: text
558
559  void g() {
560     for (;;)
561       if (cond()) {
562          async_op1(<coroutine-handle>); // will resume once async_op1 completes
563          <suspend>
564          do_one();
565       }
566       else {
567          async_op2(<coroutine-handle>); // will resume once async_op2 completes
568          <suspend>
569          do_two();
570       }
571     }
572  }
573
574In this case, coroutine should be ready for resumption prior to a call to
575`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
576point when coroutine should be ready for resumption (namely, when a resume index
577should be stored in the coroutine frame, so that it can be resumed at the
578correct resume point):
579
580.. code-block:: llvm
581
582  if.true:
583    %save1 = call token @llvm.coro.save(i8* %hdl)
584    call void @async_op1(i8* %hdl)
585    %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
586    switch i8 %suspend1, label %suspend [i8 0, label %resume1
587                                         i8 1, label %cleanup]
588  if.false:
589    %save2 = call token @llvm.coro.save(i8* %hdl)
590    call void @async_op2(i8* %hdl)
591    %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false)
592    switch i8 %suspend1, label %suspend [i8 0, label %resume2
593                                         i8 1, label %cleanup]
594
595.. _coroutine promise:
596
597Coroutine Promise
598-----------------
599
600A coroutine author or a frontend may designate a distinguished `alloca` that can
601be used to communicate with the coroutine. This distinguished alloca is called
602**coroutine promise** and is provided as the second parameter to the
603`coro.id`_ intrinsic.
604
605The following coroutine designates a 32 bit integer `promise` and uses it to
606store the current value produced by a coroutine.
607
608.. code-block:: llvm
609
610  define i8* @f(i32 %n) {
611  entry:
612    %promise = alloca i32
613    %pv = bitcast i32* %promise to i8*
614    %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
615    %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
616    br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
617  dyn.alloc:
618    %size = call i32 @llvm.coro.size.i32()
619    %alloc = call i8* @malloc(i32 %size)
620    br label %coro.begin
621  coro.begin:
622    %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
623    %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
624    br label %loop
625  loop:
626    %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
627    %inc = add nsw i32 %n.val, 1
628    store i32 %n.val, i32* %promise
629    %0 = call i8 @llvm.coro.suspend(token none, i1 false)
630    switch i8 %0, label %suspend [i8 0, label %loop
631                                  i8 1, label %cleanup]
632  cleanup:
633    %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
634    call void @free(i8* %mem)
635    br label %suspend
636  suspend:
637    %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
638    ret i8* %hdl
639  }
640
641A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
642coroutine promise.
643
644.. code-block:: llvm
645
646  define i32 @main() {
647  entry:
648    %hdl = call i8* @f(i32 4)
649    %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
650    %promise.addr = bitcast i8* %promise.addr.raw to i32*
651    %val0 = load i32, i32* %promise.addr
652    call void @print(i32 %val0)
653    call void @llvm.coro.resume(i8* %hdl)
654    %val1 = load i32, i32* %promise.addr
655    call void @print(i32 %val1)
656    call void @llvm.coro.resume(i8* %hdl)
657    %val2 = load i32, i32* %promise.addr
658    call void @print(i32 %val2)
659    call void @llvm.coro.destroy(i8* %hdl)
660    ret i32 0
661  }
662
663After example in this section is compiled, result of the compilation will be:
664
665.. code-block:: llvm
666
667  define i32 @main() {
668  entry:
669    tail call void @print(i32 4)
670    tail call void @print(i32 5)
671    tail call void @print(i32 6)
672    ret i32 0
673  }
674
675.. _final:
676.. _final suspend:
677
678Final Suspend
679-------------
680
681A coroutine author or a frontend may designate a particular suspend to be final,
682by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
683Such a suspend point has two properties:
684
685* it is possible to check whether a suspended coroutine is at the final suspend
686  point via `coro.done`_ intrinsic;
687
688* a resumption of a coroutine stopped at the final suspend point leads to
689  undefined behavior. The only possible action for a coroutine at a final
690  suspend point is destroying it via `coro.destroy`_ intrinsic.
691
692From the user perspective, the final suspend point represents an idea of a
693coroutine reaching the end. From the compiler perspective, it is an optimization
694opportunity for reducing number of resume points (and therefore switch cases) in
695the resume function.
696
697The following is an example of a function that keeps resuming the coroutine
698until the final suspend point is reached after which point the coroutine is
699destroyed:
700
701.. code-block:: llvm
702
703  define i32 @main() {
704  entry:
705    %hdl = call i8* @f(i32 4)
706    br label %while
707  while:
708    call void @llvm.coro.resume(i8* %hdl)
709    %done = call i1 @llvm.coro.done(i8* %hdl)
710    br i1 %done, label %end, label %while
711  end:
712    call void @llvm.coro.destroy(i8* %hdl)
713    ret i32 0
714  }
715
716Usually, final suspend point is a frontend injected suspend point that does not
717correspond to any explicitly authored suspend point of the high level language.
718For example, for a Python generator that has only one suspend point:
719
720.. code-block:: python
721
722  def coroutine(n):
723    for i in range(n):
724      yield i
725
726Python frontend would inject two more suspend points, so that the actual code
727looks like this:
728
729.. code-block:: c
730
731  void* coroutine(int n) {
732    int current_value;
733    <designate current_value to be coroutine promise>
734    <SUSPEND> // injected suspend point, so that the coroutine starts suspended
735    for (int i = 0; i < n; ++i) {
736      current_value = i; <SUSPEND>; // corresponds to "yield i"
737    }
738    <SUSPEND final=true> // injected final suspend point
739  }
740
741and python iterator `__next__` would look like:
742
743.. code-block:: c++
744
745  int __next__(void* hdl) {
746    coro.resume(hdl);
747    if (coro.done(hdl)) throw StopIteration();
748    return *(int*)coro.promise(hdl, 4, false);
749  }
750
751
752Intrinsics
753==========
754
755Coroutine Manipulation Intrinsics
756---------------------------------
757
758Intrinsics described in this section are used to manipulate an existing
759coroutine. They can be used in any function which happen to have a pointer
760to a `coroutine frame`_ or a pointer to a `coroutine promise`_.
761
762.. _coro.destroy:
763
764'llvm.coro.destroy' Intrinsic
765^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
766
767Syntax:
768"""""""
769
770::
771
772      declare void @llvm.coro.destroy(i8* <handle>)
773
774Overview:
775"""""""""
776
777The '``llvm.coro.destroy``' intrinsic destroys a suspended
778switched-resume coroutine.
779
780Arguments:
781""""""""""
782
783The argument is a coroutine handle to a suspended coroutine.
784
785Semantics:
786""""""""""
787
788When possible, the `coro.destroy` intrinsic is replaced with a direct call to
789the coroutine destroy function. Otherwise it is replaced with an indirect call
790based on the function pointer for the destroy function stored in the coroutine
791frame. Destroying a coroutine that is not suspended leads to undefined behavior.
792
793.. _coro.resume:
794
795'llvm.coro.resume' Intrinsic
796^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
797
798::
799
800      declare void @llvm.coro.resume(i8* <handle>)
801
802Overview:
803"""""""""
804
805The '``llvm.coro.resume``' intrinsic resumes a suspended switched-resume coroutine.
806
807Arguments:
808""""""""""
809
810The argument is a handle to a suspended coroutine.
811
812Semantics:
813""""""""""
814
815When possible, the `coro.resume` intrinsic is replaced with a direct call to the
816coroutine resume function. Otherwise it is replaced with an indirect call based
817on the function pointer for the resume function stored in the coroutine frame.
818Resuming a coroutine that is not suspended leads to undefined behavior.
819
820.. _coro.done:
821
822'llvm.coro.done' Intrinsic
823^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
824
825::
826
827      declare i1 @llvm.coro.done(i8* <handle>)
828
829Overview:
830"""""""""
831
832The '``llvm.coro.done``' intrinsic checks whether a suspended
833switched-resume coroutine is at the final suspend point or not.
834
835Arguments:
836""""""""""
837
838The argument is a handle to a suspended coroutine.
839
840Semantics:
841""""""""""
842
843Using this intrinsic on a coroutine that does not have a `final suspend`_ point
844or on a coroutine that is not suspended leads to undefined behavior.
845
846.. _coro.promise:
847
848'llvm.coro.promise' Intrinsic
849^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
850
851::
852
853      declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
854
855Overview:
856"""""""""
857
858The '``llvm.coro.promise``' intrinsic obtains a pointer to a
859`coroutine promise`_ given a switched-resume coroutine handle and vice versa.
860
861Arguments:
862""""""""""
863
864The first argument is a handle to a coroutine if `from` is false. Otherwise,
865it is a pointer to a coroutine promise.
866
867The second argument is an alignment requirements of the promise.
868If a frontend designated `%promise = alloca i32` as a promise, the alignment
869argument to `coro.promise` should be the alignment of `i32` on the target
870platform. If a frontend designated `%promise = alloca i32, align 16` as a
871promise, the alignment argument should be 16.
872This argument only accepts constants.
873
874The third argument is a boolean indicating a direction of the transformation.
875If `from` is true, the intrinsic returns a coroutine handle given a pointer
876to a promise. If `from` is false, the intrinsics return a pointer to a promise
877from a coroutine handle. This argument only accepts constants.
878
879Semantics:
880""""""""""
881
882Using this intrinsic on a coroutine that does not have a coroutine promise
883leads to undefined behavior. It is possible to read and modify coroutine
884promise of the coroutine which is currently executing. The coroutine author and
885a coroutine user are responsible to makes sure there is no data races.
886
887Example:
888""""""""
889
890.. code-block:: llvm
891
892  define i8* @f(i32 %n) {
893  entry:
894    %promise = alloca i32
895    %pv = bitcast i32* %promise to i8*
896    ; the second argument to coro.id points to the coroutine promise.
897    %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
898    ...
899    %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
900    ...
901    store i32 42, i32* %promise ; store something into the promise
902    ...
903    ret i8* %hdl
904  }
905
906  define i32 @main() {
907  entry:
908    %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
909    %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
910    %promise.addr = bitcast i8* %promise.addr.raw to i32*
911    %val = load i32, i32* %promise.addr ; load a value from the promise
912    call void @print(i32 %val)
913    call void @llvm.coro.destroy(i8* %hdl)
914    ret i32 0
915  }
916
917.. _coroutine intrinsics:
918
919Coroutine Structure Intrinsics
920------------------------------
921Intrinsics described in this section are used within a coroutine to describe
922the coroutine structure. They should not be used outside of a coroutine.
923
924.. _coro.size:
925
926'llvm.coro.size' Intrinsic
927^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
928::
929
930    declare i32 @llvm.coro.size.i32()
931    declare i64 @llvm.coro.size.i64()
932
933Overview:
934"""""""""
935
936The '``llvm.coro.size``' intrinsic returns the number of bytes
937required to store a `coroutine frame`_.  This is only supported for
938switched-resume coroutines.
939
940Arguments:
941""""""""""
942
943None
944
945Semantics:
946""""""""""
947
948The `coro.size` intrinsic is lowered to a constant representing the size of
949the coroutine frame.
950
951.. _coro.begin:
952
953'llvm.coro.begin' Intrinsic
954^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
955::
956
957  declare i8* @llvm.coro.begin(token <id>, i8* <mem>)
958
959Overview:
960"""""""""
961
962The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
963
964Arguments:
965""""""""""
966
967The first argument is a token returned by a call to '``llvm.coro.id``'
968identifying the coroutine.
969
970The second argument is a pointer to a block of memory where coroutine frame
971will be stored if it is allocated dynamically.  This pointer is ignored
972for returned-continuation coroutines.
973
974Semantics:
975""""""""""
976
977Depending on the alignment requirements of the objects in the coroutine frame
978and/or on the codegen compactness reasons the pointer returned from `coro.begin`
979may be at offset to the `%mem` argument. (This could be beneficial if
980instructions that express relative access to data can be more compactly encoded
981with small positive and negative offsets).
982
983A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
984
985.. _coro.free:
986
987'llvm.coro.free' Intrinsic
988^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
989::
990
991  declare i8* @llvm.coro.free(token %id, i8* <frame>)
992
993Overview:
994"""""""""
995
996The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
997coroutine frame is stored or `null` if this instance of a coroutine did not use
998dynamically allocated memory for its coroutine frame.  This intrinsic is not
999supported for returned-continuation coroutines.
1000
1001Arguments:
1002""""""""""
1003
1004The first argument is a token returned by a call to '``llvm.coro.id``'
1005identifying the coroutine.
1006
1007The second argument is a pointer to the coroutine frame. This should be the same
1008pointer that was returned by prior `coro.begin` call.
1009
1010Example (custom deallocation function):
1011"""""""""""""""""""""""""""""""""""""""
1012
1013.. code-block:: llvm
1014
1015  cleanup:
1016    %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
1017    %mem_not_null = icmp ne i8* %mem, null
1018    br i1 %mem_not_null, label %if.then, label %if.end
1019  if.then:
1020    call void @CustomFree(i8* %mem)
1021    br label %if.end
1022  if.end:
1023    ret void
1024
1025Example (standard deallocation functions):
1026""""""""""""""""""""""""""""""""""""""""""
1027
1028.. code-block:: llvm
1029
1030  cleanup:
1031    %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
1032    call void @free(i8* %mem)
1033    ret void
1034
1035.. _coro.alloc:
1036
1037'llvm.coro.alloc' Intrinsic
1038^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1039::
1040
1041  declare i1 @llvm.coro.alloc(token <id>)
1042
1043Overview:
1044"""""""""
1045
1046The '``llvm.coro.alloc``' intrinsic returns `true` if dynamic allocation is
1047required to obtain a memory for the coroutine frame and `false` otherwise.
1048This is not supported for returned-continuation coroutines.
1049
1050Arguments:
1051""""""""""
1052
1053The first argument is a token returned by a call to '``llvm.coro.id``'
1054identifying the coroutine.
1055
1056Semantics:
1057""""""""""
1058
1059A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
1060The intrinsic is used to suppress dynamic allocation of the coroutine frame
1061when possible.
1062
1063Example:
1064""""""""
1065
1066.. code-block:: llvm
1067
1068  entry:
1069    %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
1070    %dyn.alloc.required = call i1 @llvm.coro.alloc(token %id)
1071    br i1 %dyn.alloc.required, label %coro.alloc, label %coro.begin
1072
1073  coro.alloc:
1074    %frame.size = call i32 @llvm.coro.size()
1075    %alloc = call i8* @MyAlloc(i32 %frame.size)
1076    br label %coro.begin
1077
1078  coro.begin:
1079    %phi = phi i8* [ null, %entry ], [ %alloc, %coro.alloc ]
1080    %frame = call i8* @llvm.coro.begin(token %id, i8* %phi)
1081
1082.. _coro.noop:
1083
1084'llvm.coro.noop' Intrinsic
1085^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1086::
1087
1088  declare i8* @llvm.coro.noop()
1089
1090Overview:
1091"""""""""
1092
1093The '``llvm.coro.noop``' intrinsic returns an address of the coroutine frame of
1094a coroutine that does nothing when resumed or destroyed.
1095
1096Arguments:
1097""""""""""
1098
1099None
1100
1101Semantics:
1102""""""""""
1103
1104This intrinsic is lowered to refer to a private constant coroutine frame. The
1105resume and destroy handlers for this frame are empty functions that do nothing.
1106Note that in different translation units llvm.coro.noop may return different pointers.
1107
1108.. _coro.frame:
1109
1110'llvm.coro.frame' Intrinsic
1111^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1112::
1113
1114  declare i8* @llvm.coro.frame()
1115
1116Overview:
1117"""""""""
1118
1119The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
1120the enclosing coroutine.
1121
1122Arguments:
1123""""""""""
1124
1125None
1126
1127Semantics:
1128""""""""""
1129
1130This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
1131a frontend convenience intrinsic that makes it easier to refer to the
1132coroutine frame.
1133
1134.. _coro.id:
1135
1136'llvm.coro.id' Intrinsic
1137^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1138::
1139
1140  declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>,
1141                                                          i8* <fnaddrs>)
1142
1143Overview:
1144"""""""""
1145
1146The '``llvm.coro.id``' intrinsic returns a token identifying a
1147switched-resume coroutine.
1148
1149Arguments:
1150""""""""""
1151
1152The first argument provides information on the alignment of the memory returned
1153by the allocation function and given to `coro.begin` by the first argument. If
1154this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
1155This argument only accepts constants.
1156
1157The second argument, if not `null`, designates a particular alloca instruction
1158to be a `coroutine promise`_.
1159
1160The third argument is `null` coming out of the frontend. The CoroEarly pass sets
1161this argument to point to the function this coro.id belongs to.
1162
1163The fourth argument is `null` before coroutine is split, and later is replaced
1164to point to a private global constant array containing function pointers to
1165outlined resume and destroy parts of the coroutine.
1166
1167
1168Semantics:
1169""""""""""
1170
1171The purpose of this intrinsic is to tie together `coro.id`, `coro.alloc` and
1172`coro.begin` belonging to the same coroutine to prevent optimization passes from
1173duplicating any of these instructions unless entire body of the coroutine is
1174duplicated.
1175
1176A frontend should emit exactly one `coro.id` intrinsic per coroutine.
1177
1178.. _coro.id.async:
1179
1180'llvm.coro.id.async' Intrinsic
1181^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1182::
1183
1184  declare token @llvm.coro.id.async(i32 <context size>, i32 <align>,
1185                                    i8* <context arg>,
1186                                    i8* <async function pointer>)
1187
1188Overview:
1189"""""""""
1190
1191The '``llvm.coro.id.async``' intrinsic returns a token identifying an async coroutine.
1192
1193Arguments:
1194""""""""""
1195
1196The first argument provides the initial size of the `async context` as required
1197from the frontend. Lowering will add to this size the size required by the frame
1198storage and store that value to the `async function pointer`.
1199
1200The second argument, is the alignment guarantee of the memory of the
1201`async context`. The frontend guarantees that the memory will be aligned by this
1202value.
1203
1204The third argument is the `async context` argument in the current coroutine.
1205
1206The fourth argument is the address of the `async function pointer` struct.
1207Lowering will update the context size requirement in this struct by adding the
1208coroutine frame size requirement to the initial size requirement as specified by
1209the first argument of this intrinsic.
1210
1211
1212Semantics:
1213""""""""""
1214
1215A frontend should emit exactly one `coro.id.async` intrinsic per coroutine.
1216
1217.. _coro.id.retcon:
1218
1219'llvm.coro.id.retcon' Intrinsic
1220^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1221::
1222
1223  declare token @llvm.coro.id.retcon(i32 <size>, i32 <align>, i8* <buffer>,
1224                                     i8* <continuation prototype>,
1225                                     i8* <alloc>, i8* <dealloc>)
1226
1227Overview:
1228"""""""""
1229
1230The '``llvm.coro.id.retcon``' intrinsic returns a token identifying a
1231multiple-suspend returned-continuation coroutine.
1232
1233The 'result-type sequence' of the coroutine is defined as follows:
1234
1235- if the return type of the coroutine function is ``void``, it is the
1236  empty sequence;
1237
1238- if the return type of the coroutine function is a ``struct``, it is the
1239  element types of that ``struct`` in order;
1240
1241- otherwise, it is just the return type of the coroutine function.
1242
1243The first element of the result-type sequence must be a pointer type;
1244continuation functions will be coerced to this type.  The rest of
1245the sequence are the 'yield types', and any suspends in the coroutine
1246must take arguments of these types.
1247
1248Arguments:
1249""""""""""
1250
1251The first and second arguments are the expected size and alignment of
1252the buffer provided as the third argument.  They must be constant.
1253
1254The fourth argument must be a reference to a global function, called
1255the 'continuation prototype function'.  The type, calling convention,
1256and attributes of any continuation functions will be taken from this
1257declaration.  The return type of the prototype function must match the
1258return type of the current function.  The first parameter type must be
1259a pointer type.  The second parameter type must be an integer type;
1260it will be used only as a boolean flag.
1261
1262The fifth argument must be a reference to a global function that will
1263be used to allocate memory.  It may not fail, either by returning null
1264or throwing an exception.  It must take an integer and return a pointer.
1265
1266The sixth argument must be a reference to a global function that will
1267be used to deallocate memory.  It must take a pointer and return ``void``.
1268
1269'llvm.coro.id.retcon.once' Intrinsic
1270^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1271::
1272
1273  declare token @llvm.coro.id.retcon.once(i32 <size>, i32 <align>, i8* <buffer>,
1274                                          i8* <prototype>,
1275                                          i8* <alloc>, i8* <dealloc>)
1276
1277Overview:
1278"""""""""
1279
1280The '``llvm.coro.id.retcon.once``' intrinsic returns a token identifying a
1281unique-suspend returned-continuation coroutine.
1282
1283Arguments:
1284""""""""""
1285
1286As for ``llvm.core.id.retcon``, except that the return type of the
1287continuation prototype must be `void` instead of matching the
1288coroutine's return type.
1289
1290.. _coro.end:
1291
1292'llvm.coro.end' Intrinsic
1293^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1294::
1295
1296  declare i1 @llvm.coro.end(i8* <handle>, i1 <unwind>)
1297
1298Overview:
1299"""""""""
1300
1301The '``llvm.coro.end``' marks the point where execution of the resume part of
1302the coroutine should end and control should return to the caller.
1303
1304
1305Arguments:
1306""""""""""
1307
1308The first argument should refer to the coroutine handle of the enclosing
1309coroutine. A frontend is allowed to supply null as the first parameter, in this
1310case `coro-early` pass will replace the null with an appropriate coroutine
1311handle value.
1312
1313The second argument should be `true` if this coro.end is in the block that is
1314part of the unwind sequence leaving the coroutine body due to an exception and
1315`false` otherwise.
1316
1317Semantics:
1318""""""""""
1319The purpose of this intrinsic is to allow frontends to mark the cleanup and
1320other code that is only relevant during the initial invocation of the coroutine
1321and should not be present in resume and destroy parts.
1322
1323In returned-continuation lowering, ``llvm.coro.end`` fully destroys the
1324coroutine frame.  If the second argument is `false`, it also returns from
1325the coroutine with a null continuation pointer, and the next instruction
1326will be unreachable.  If the second argument is `true`, it falls through
1327so that the following logic can resume unwinding.  In a yield-once
1328coroutine, reaching a non-unwind ``llvm.coro.end`` without having first
1329reached a ``llvm.coro.suspend.retcon`` has undefined behavior.
1330
1331The remainder of this section describes the behavior under switched-resume
1332lowering.
1333
1334This intrinsic is lowered when a coroutine is split into
1335the start, resume and destroy parts. In the start part, it is a no-op,
1336in resume and destroy parts, it is replaced with `ret void` instruction and
1337the rest of the block containing `coro.end` instruction is discarded.
1338In landing pads it is replaced with an appropriate instruction to unwind to
1339caller. The handling of coro.end differs depending on whether the target is
1340using landingpad or WinEH exception model.
1341
1342For landingpad based exception model, it is expected that frontend uses the
1343`coro.end`_ intrinsic as follows:
1344
1345.. code-block:: llvm
1346
1347    ehcleanup:
1348      %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
1349      br i1 %InResumePart, label %eh.resume, label %cleanup.cont
1350
1351    cleanup.cont:
1352      ; rest of the cleanup
1353
1354    eh.resume:
1355      %exn = load i8*, i8** %exn.slot, align 8
1356      %sel = load i32, i32* %ehselector.slot, align 4
1357      %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
1358      %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
1359      resume { i8*, i32 } %lpad.val29
1360
1361The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
1362thus leading to immediate unwind to the caller, whereas in start function it
1363is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
1364code that is only needed during initial invocation of the coroutine.
1365
1366For Windows Exception handling model, a frontend should attach a funclet bundle
1367referring to an enclosing cleanuppad as follows:
1368
1369.. code-block:: llvm
1370
1371    ehcleanup:
1372      %tok = cleanuppad within none []
1373      %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
1374      cleanupret from %tok unwind label %RestOfTheCleanup
1375
1376The `CoroSplit` pass, if the funclet bundle is present, will insert
1377``cleanupret from %tok unwind to caller`` before
1378the `coro.end`_ intrinsic and will remove the rest of the block.
1379
1380The following table summarizes the handling of `coro.end`_ intrinsic.
1381
1382+--------------------------+-------------------+-------------------------------+
1383|                          | In Start Function | In Resume/Destroy Functions   |
1384+--------------------------+-------------------+-------------------------------+
1385|unwind=false              | nothing           |``ret void``                   |
1386+------------+-------------+-------------------+-------------------------------+
1387|            | WinEH       | nothing           |``cleanupret unwind to caller``|
1388|unwind=true +-------------+-------------------+-------------------------------+
1389|            | Landingpad  | nothing           | nothing                       |
1390+------------+-------------+-------------------+-------------------------------+
1391
1392
1393'llvm.coro.end.async' Intrinsic
1394^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1395::
1396
1397  declare i1 @llvm.coro.end.async(i8* <handle>, i1 <unwind>, ...)
1398
1399Overview:
1400"""""""""
1401
1402The '``llvm.coro.end.async``' marks the point where execution of the resume part
1403of the coroutine should end and control should return to the caller. As part of
1404its variable tail arguments this instruction allows to specify a function and
1405the function's arguments that are to be tail called as the last action before
1406returning.
1407
1408
1409Arguments:
1410""""""""""
1411
1412The first argument should refer to the coroutine handle of the enclosing
1413coroutine. A frontend is allowed to supply null as the first parameter, in this
1414case `coro-early` pass will replace the null with an appropriate coroutine
1415handle value.
1416
1417The second argument should be `true` if this coro.end is in the block that is
1418part of the unwind sequence leaving the coroutine body due to an exception and
1419`false` otherwise.
1420
1421The third argument if present should specify a function to be called.
1422
1423If the third argument is present, the remaining arguments are the arguments to
1424the function call.
1425
1426.. code-block:: llvm
1427
1428  call i1 (i8*, i1, ...) @llvm.coro.end.async(
1429                           i8* %hdl, i1 0,
1430                           void (i8*, %async.task*, %async.actor*)* @must_tail_call_return,
1431                           i8* %ctxt, %async.task* %task, %async.actor* %actor)
1432  unreachable
1433
1434.. _coro.suspend:
1435.. _suspend points:
1436
1437'llvm.coro.suspend' Intrinsic
1438^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1439::
1440
1441  declare i8 @llvm.coro.suspend(token <save>, i1 <final>)
1442
1443Overview:
1444"""""""""
1445
1446The '``llvm.coro.suspend``' marks the point where execution of a
1447switched-resume coroutine is suspended and control is returned back
1448to the caller.  Conditional branches consuming the result of this
1449intrinsic lead to basic blocks where coroutine should proceed when
1450suspended (-1), resumed (0) or destroyed (1).
1451
1452Arguments:
1453""""""""""
1454
1455The first argument refers to a token of `coro.save` intrinsic that marks the
1456point when coroutine state is prepared for suspension. If `none` token is passed,
1457the intrinsic behaves as if there were a `coro.save` immediately preceding
1458the `coro.suspend` intrinsic.
1459
1460The second argument indicates whether this suspension point is `final`_.
1461The second argument only accepts constants. If more than one suspend point is
1462designated as final, the resume and destroy branches should lead to the same
1463basic blocks.
1464
1465Example (normal suspend point):
1466"""""""""""""""""""""""""""""""
1467
1468.. code-block:: llvm
1469
1470    %0 = call i8 @llvm.coro.suspend(token none, i1 false)
1471    switch i8 %0, label %suspend [i8 0, label %resume
1472                                  i8 1, label %cleanup]
1473
1474Example (final suspend point):
1475""""""""""""""""""""""""""""""
1476
1477.. code-block:: llvm
1478
1479  while.end:
1480    %s.final = call i8 @llvm.coro.suspend(token none, i1 true)
1481    switch i8 %s.final, label %suspend [i8 0, label %trap
1482                                        i8 1, label %cleanup]
1483  trap:
1484    call void @llvm.trap()
1485    unreachable
1486
1487Semantics:
1488""""""""""
1489
1490If a coroutine that was suspended at the suspend point marked by this intrinsic
1491is resumed via `coro.resume`_ the control will transfer to the basic block
1492of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
1493basic block indicated by the 1-case. To suspend, coroutine proceed to the
1494default label.
1495
1496If suspend intrinsic is marked as final, it can consider the `true` branch
1497unreachable and can perform optimizations that can take advantage of that fact.
1498
1499.. _coro.save:
1500
1501'llvm.coro.save' Intrinsic
1502^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1503::
1504
1505  declare token @llvm.coro.save(i8* <handle>)
1506
1507Overview:
1508"""""""""
1509
1510The '``llvm.coro.save``' marks the point where a coroutine need to update its
1511state to prepare for resumption to be considered suspended (and thus eligible
1512for resumption).
1513
1514Arguments:
1515""""""""""
1516
1517The first argument points to a coroutine handle of the enclosing coroutine.
1518
1519Semantics:
1520""""""""""
1521
1522Whatever coroutine state changes are required to enable resumption of
1523the coroutine from the corresponding suspend point should be done at the point
1524of `coro.save` intrinsic.
1525
1526Example:
1527""""""""
1528
1529Separate save and suspend points are necessary when a coroutine is used to
1530represent an asynchronous control flow driven by callbacks representing
1531completions of asynchronous operations.
1532
1533In such a case, a coroutine should be ready for resumption prior to a call to
1534`async_op` function that may trigger resumption of a coroutine from the same or
1535a different thread possibly prior to `async_op` call returning control back
1536to the coroutine:
1537
1538.. code-block:: llvm
1539
1540    %save1 = call token @llvm.coro.save(i8* %hdl)
1541    call void @async_op1(i8* %hdl)
1542    %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
1543    switch i8 %suspend1, label %suspend [i8 0, label %resume1
1544                                         i8 1, label %cleanup]
1545
1546.. _coro.suspend.async:
1547
1548'llvm.coro.suspend.async' Intrinsic
1549^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1550::
1551
1552  declare {i8*, i8*, i8*} @llvm.coro.suspend.async(
1553                             i8* <resume function>,
1554                             i8* <context projection function>,
1555                             ... <function to call>
1556                             ... <arguments to function>)
1557
1558Overview:
1559"""""""""
1560
1561The '``llvm.coro.suspend.async``' intrinsic marks the point where
1562execution of a async coroutine is suspended and control is passed to a callee.
1563
1564Arguments:
1565""""""""""
1566
1567The first argument should be the result of the `llvm.coro.async.resume` intrinsic.
1568Lowering will replace this intrinsic with the resume function for this suspend
1569point.
1570
1571The second argument is the `context projection function`. It should describe
1572how-to restore the `async context` in the continuation function from the first
1573argument of the continuation function. Its type is `i8* (i8*)`.
1574
1575The third argument is the function that models transfer to the callee at the
1576suspend point. It should take 3 arguments. Lowering will `musttail` call this
1577function.
1578
1579The fourth to six argument are the arguments for the third argument.
1580
1581Semantics:
1582""""""""""
1583
1584The result of the intrinsic are mapped to the arguments of the resume function.
1585Execution is suspended at this intrinsic and resumed when the resume function is
1586called.
1587
1588.. _coro.prepare.async:
1589
1590'llvm.coro.prepare.async' Intrinsic
1591^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1592::
1593
1594  declare i8* @llvm.coro.prepare.async(i8* <coroutine function>)
1595
1596Overview:
1597"""""""""
1598
1599The '``llvm.coro.prepare.async``' intrinsic is used to block inlining of the
1600async coroutine until after coroutine splitting.
1601
1602Arguments:
1603""""""""""
1604
1605The first argument should be an async coroutine of type `void (i8*, i8*, i8*)`.
1606Lowering will replace this intrinsic with its coroutine function argument.
1607
1608.. _coro.suspend.retcon:
1609
1610'llvm.coro.suspend.retcon' Intrinsic
1611^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1612::
1613
1614  declare i1 @llvm.coro.suspend.retcon(...)
1615
1616Overview:
1617"""""""""
1618
1619The '``llvm.coro.suspend.retcon``' intrinsic marks the point where
1620execution of a returned-continuation coroutine is suspended and control
1621is returned back to the caller.
1622
1623`llvm.coro.suspend.retcon`` does not support separate save points;
1624they are not useful when the continuation function is not locally
1625accessible.  That would be a more appropriate feature for a ``passcon``
1626lowering that is not yet implemented.
1627
1628Arguments:
1629""""""""""
1630
1631The types of the arguments must exactly match the yielded-types sequence
1632of the coroutine.  They will be turned into return values from the ramp
1633and continuation functions, along with the next continuation function.
1634
1635Semantics:
1636""""""""""
1637
1638The result of the intrinsic indicates whether the coroutine should resume
1639abnormally (non-zero).
1640
1641In a normal coroutine, it is undefined behavior if the coroutine executes
1642a call to ``llvm.coro.suspend.retcon`` after resuming abnormally.
1643
1644In a yield-once coroutine, it is undefined behavior if the coroutine
1645executes a call to ``llvm.coro.suspend.retcon`` after resuming in any way.
1646
1647.. _coro.param:
1648
1649'llvm.coro.param' Intrinsic
1650^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1651::
1652
1653  declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)
1654
1655Overview:
1656"""""""""
1657
1658The '``llvm.coro.param``' is used by a frontend to mark up the code used to
1659construct and destruct copies of the parameters. If the optimizer discovers that
1660a particular parameter copy is not used after any suspends, it can remove the
1661construction and destruction of the copy by replacing corresponding coro.param
1662with `i1 false` and replacing any use of the `copy` with the `original`.
1663
1664Arguments:
1665""""""""""
1666
1667The first argument points to an `alloca` storing the value of a parameter to a
1668coroutine.
1669
1670The second argument points to an `alloca` storing the value of the copy of that
1671parameter.
1672
1673Semantics:
1674""""""""""
1675
1676The optimizer is free to always replace this intrinsic with `i1 true`.
1677
1678The optimizer is also allowed to replace it with `i1 false` provided that the
1679parameter copy is only used prior to control flow reaching any of the suspend
1680points. The code that would be DCE'd if the `coro.param` is replaced with
1681`i1 false` is not considered to be a use of the parameter copy.
1682
1683The frontend can emit this intrinsic if its language rules allow for this
1684optimization.
1685
1686Example:
1687""""""""
1688Consider the following example. A coroutine takes two parameters `a` and `b`
1689that has a destructor and a move constructor.
1690
1691.. code-block:: c++
1692
1693  struct A { ~A(); A(A&&); bool foo(); void bar(); };
1694
1695  task<int> f(A a, A b) {
1696    if (a.foo())
1697      return 42;
1698
1699    a.bar();
1700    co_await read_async(); // introduces suspend point
1701    b.bar();
1702  }
1703
1704Note that, uses of `b` is used after a suspend point and thus must be copied
1705into a coroutine frame, whereas `a` does not have to, since it never used
1706after suspend.
1707
1708A frontend can create parameter copies for `a` and `b` as follows:
1709
1710.. code-block:: text
1711
1712  task<int> f(A a', A b') {
1713    a = alloca A;
1714    b = alloca A;
1715    // move parameters to its copies
1716    if (coro.param(a', a)) A::A(a, A&& a');
1717    if (coro.param(b', b)) A::A(b, A&& b');
1718    ...
1719    // destroy parameters copies
1720    if (coro.param(a', a)) A::~A(a);
1721    if (coro.param(b', b)) A::~A(b);
1722  }
1723
1724The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
1725of `a` with `a'`, since it is not used after suspend.
1726
1727The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
1728after suspend and therefore, it has to reside in the coroutine frame.
1729
1730Coroutine Transformation Passes
1731===============================
1732CoroEarly
1733---------
1734The pass CoroEarly lowers coroutine intrinsics that hide the details of the
1735structure of the coroutine frame, but, otherwise not needed to be preserved to
1736help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
1737and `coro.promise`_ intrinsics.
1738
1739.. _CoroSplit:
1740
1741CoroSplit
1742---------
1743The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
1744into separate functions.
1745
1746CoroElide
1747---------
1748The pass CoroElide examines if the inlined coroutine is eligible for heap
1749allocation elision optimization. If so, it replaces
1750`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
1751and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
1752respectively to remove the deallocation code.
1753This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
1754calls to resume and destroy functions for a particular coroutine where possible.
1755
1756CoroCleanup
1757-----------
1758This pass runs late to lower all coroutine related intrinsics not replaced by
1759earlier passes.
1760
1761Areas Requiring Attention
1762=========================
1763#. When coro.suspend returns -1, the coroutine is suspended, and it's possible
1764   that the coroutine has already been destroyed (hence the frame has been freed).
1765   We cannot access anything on the frame on the suspend path.
1766   However there is nothing that prevents the compiler from moving instructions
1767   along that path (e.g. LICM), which can lead to use-after-free. At the moment
1768   we disabled LICM for loops that have coro.suspend, but the general problem still
1769   exists and requires a general solution.
1770
1771#. Take advantage of the lifetime intrinsics for the data that goes into the
1772   coroutine frame. Leave lifetime intrinsics as is for the data that stays in
1773   allocas.
1774
1775#. The CoroElide optimization pass relies on coroutine ramp function to be
1776   inlined. It would be beneficial to split the ramp function further to
1777   increase the chance that it will get inlined into its caller.
1778
1779#. Design a convention that would make it possible to apply coroutine heap
1780   elision optimization across ABI boundaries.
1781
1782#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).
1783
1784#. Alignment is ignored by coro.begin and coro.free intrinsics.
1785
1786#. Make required changes to make sure that coroutine optimizations work with
1787   LTO.
1788
1789#. More tests, more tests, more tests
1790