===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. To LLVM, patch points
look much like calls. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project, see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. `GCFunctionMetadata` provides the location of
pointers into a collected heap captured by the `GCRoot` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the `GCFunctionMetadata` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. These first two operands must be immediates, i.e. they
cannot be passed as variables. The variable number of operands that
follow are the ``live values`` for which locations will be recorded in
the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.
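
Because LLVM does not check IDs for uniqueness, a runtime that looks
records up by ID may want to detect collisions itself. The following is
a minimal Python sketch, not part of LLVM: ``index_by_id`` and
``duplicate_ids`` are hypothetical helpers, and the ``(id, offset)``
pairs are assumed to come from records the runtime has already parsed.

.. code-block:: python

  def index_by_id(records):
      """Map stack map ID -> list of instruction offsets.

      `records` is an iterable of (patchpoint_id, instruction_offset)
      pairs, a hypothetical in-memory form of parsed stack map records.
      """
      index = {}
      for patchpoint_id, offset in records:
          index.setdefault(patchpoint_id, []).append(offset)
      return index


  def duplicate_ids(index):
      """Return the sorted IDs that appear in more than one record."""
      return sorted(i for i, offsets in index.items() if len(offsets) > 1)

Whether a duplicate is an error or simply means "several sites share
one handler" is a policy decision for the runtime.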

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
                                                        ptr %ptr)
  %val = load i64, ptr %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl  $0xffff, %rax <--- patched code at stack map address
  0x0a callq *%rax         <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           ptr <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          ptr <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

On PowerPC, note that ``<target>`` must be the ABI function pointer for the
intended target of the indirect call. Specifically, when compiling for the
ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
the C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like those of
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.
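
The "patch all reserved bytes" rule can be sketched in Python as
follows. This is an illustration only: ``patch_region`` is a
hypothetical helper, a ``bytearray`` stands in for writable program
text, and the single-byte ``0x90`` nop is an x86 assumption. A real
runtime would also have to deal with memory protection and
instruction-cache maintenance, which are ignored here.

.. code-block:: python

  X86_NOP = 0x90  # single-byte x86 nop; other targets need their own encoding


  def patch_region(code, offset, num_bytes, new_code):
      """Overwrite all `num_bytes` reserved bytes at `offset` in `code`.

      `code` is a bytearray standing in for writable program text.
      The new sequence is padded with nops so no stale bytes remain.
      """
      if len(new_code) > num_bytes:
          raise ValueError("patch does not fit in the reserved space")
      if offset < 0 or offset + num_bytes > len(code):
          raise ValueError("reserved region lies outside the code buffer")
      padding = bytes([X86_NOP]) * (num_bytes - len(new_code))
      code[offset:offset + num_bytes] = bytes(new_code) + padding

Rejecting an oversized patch, rather than truncating it, keeps the
helper from ever writing outside the reserved space.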

This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi``, and a return value in ``%rax`` per the native calling
convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to ptr
  %val = call i64 (i64, i32, ptr, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             ptr %target, i32 1, ptr %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movl    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: none

  %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
                                                     ptr %target, i32 1,
                                                     ptr %ptr)

The stack map now indicates the location of the ``%ptr`` argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patched code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9             <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                            <--- end of reserved 15-bytes
  0x0f addq    $0x3, %r9
  0x10 movl    %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 3)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size (or UINT64_MAX if not statically known)
    uint64 : Record Count
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (expected to be 0)
      uint16 : Location Size
      uint16 : Dwarf RegNum
      uint16 : Reserved (expected to be 0)
      int32  : Offset or SmallConstant
    }
    uint32 : Padding (only if required to align to 8 bytes)
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 bytes)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
======== ============= =================== ===========================
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================
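
The layout above can be read with a short parser. The following Python
sketch is illustrative rather than authoritative: it assumes a version
3 section, little-endian encoding (as on x86-64), and a buffer whose
first byte corresponds to an 8-byte-aligned address.
``parse_stackmap_section`` and the dictionary keys are hypothetical
names chosen for this example.

.. code-block:: python

  import struct


  def parse_stackmap_section(data):
      """Parse a version 3 stack map section from a bytes-like object."""
      pos = 0

      def read(fmt):
          # Read one little-endian, unpadded group of fields and advance.
          nonlocal pos
          values = struct.unpack_from("<" + fmt, data, pos)
          pos += struct.calcsize("<" + fmt)
          return values

      def align8():
          # Skip the optional padding that restores 8-byte alignment.
          nonlocal pos
          pos = (pos + 7) & ~7

      version, _reserved8, _reserved16 = read("BBH")
      if version != 3:
          raise ValueError(f"unsupported stack map version {version}")
      num_functions, num_constants, num_records = read("III")

      functions = [
          dict(zip(("address", "stack_size", "record_count"), read("QQQ")))
          for _ in range(num_functions)
      ]
      constants = [read("Q")[0] for _ in range(num_constants)]

      records = []
      for _ in range(num_records):
          patchpoint_id, offset, _flags, num_locations = read("QIHH")
          locations = []
          for _ in range(num_locations):
              kind, _r0, size, regnum, _r1, off = read("BBHHHi")
              locations.append(
                  {"kind": kind, "size": size, "regnum": regnum, "offset": off}
              )
          align8()
          _padding, num_live_outs = read("HH")
          live_outs = []
          for _ in range(num_live_outs):
              regnum, _r, size = read("HBB")
              live_outs.append({"regnum": regnum, "size": size})
          align8()
          records.append(
              {"id": patchpoint_id, "offset": offset,
               "locations": locations, "live_outs": live_outs}
          )
      return {"functions": functions, "constants": constants,
              "records": records}

Each location occupies 12 bytes, so an odd number of locations leaves
the cursor 4 bytes short of an 8-byte boundary; that is the case the
first optional padding field covers.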

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.
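
The five location types can be turned into values along the following
lines. This Python sketch is only an illustration: ``resolve_location``
is a hypothetical helper, the register file is modeled as a dictionary
keyed by DWARF register number, and ``read_memory`` is an assumed
callback that loads a value from an address. The ``Location Size``
field is ignored for brevity.

.. code-block:: python

  REGISTER, DIRECT, INDIRECT, CONSTANT, CONSTANT_INDEX = 1, 2, 3, 4, 5


  def resolve_location(kind, regnum, offset, regs, read_memory, constants):
      """Return the value described by one stack map location.

      regs:        dict mapping DWARF register number -> register value
      read_memory: callable(address) -> value loaded from that address
      constants:   the large-constant pool from the stack map section
      """
      if kind == REGISTER:
          return regs[regnum]
      if kind == DIRECT:
          # The computed address is itself the requested value.
          return regs[regnum] + offset
      if kind == INDIRECT:
          # Spilled value: load it from the stack slot.
          return read_memory(regs[regnum] + offset)
      if kind == CONSTANT:
          return offset
      if kind == CONSTANT_INDEX:
          return constants[offset]
      raise ValueError(f"unknown location kind {kind}")

Note that only ``Register`` and ``Indirect`` depend on machine state at
the stack map's address; the other three can be evaluated from the
section contents plus a frame base.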

At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.
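
One way to apply this conservative reading is to round the reported
size up to the narrowest width the runtime knows how to save for that
register. The Python sketch below is an assumption about how a runtime
might do this, not something LLVM prescribes; ``save_width`` and the
width tuples are hypothetical.

.. code-block:: python

  def save_width(reported_size, supported_widths):
      """Smallest save width (in bytes) that covers `reported_size` bytes.

      supported_widths: widths the runtime can save for this register,
      e.g. (8,) for a 64-bit general-purpose register or (16, 32) for a
      256-bit vector register. These widths are assumptions of this sketch.
      """
      for width in sorted(supported_widths):
          if width >= reported_size:
              return width
      raise ValueError("live-out is wider than any supported save width")

With these assumed widths, one byte reported at ``%rax`` leads to
saving all 8 bytes, while 16 bytes reported at ``%ymm0`` leads to
saving only the 16-byte ``%xmm0`` portion.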

The stack map format is a contract between a given LLVM revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

For MachO (e.g. on Darwin), the stack map section name is
"__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".

For ELF (e.g. on Linux), the stack map section name is
".llvm_stackmaps".  The segment name is "__LLVM_STACKMAPS".
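
The idea of recording the section as it is allocated can be modeled in
a few lines of Python. This is a stand-in for the behavior described
above, not the LLVM C API: ``SectionRecorder`` and its
``allocate_data_section`` method are hypothetical, and a ``bytearray``
plays the role of the allocated memory.

.. code-block:: python

  STACKMAP_SECTION_NAMES = {"__llvm_stackmaps", ".llvm_stackmaps"}


  class SectionRecorder:
      """Python stand-in for a JIT memory manager's data-section callback."""

      def __init__(self):
          self.stackmap_section = None

      def allocate_data_section(self, size, section_name):
          """Allocate `size` bytes and remember the stack map section."""
          buffer = bytearray(size)
          if section_name in STACKMAP_SECTION_NAMES:
              self.stackmap_section = buffer
          return buffer

After code emission finishes, ``stackmap_section`` would hold the bytes
to hand to a stack map parser, or ``None`` if the module contained no
stack map or patch point intrinsics.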

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, records, and functions, which LLVM preserves.
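
As an illustration of relying on that ordering, the Python sketch below
turns per-function offsets into absolute addresses. It assumes that
each function's records are contiguous and appear in function order, so
that ``Record Count`` is enough to attribute records to functions;
``absolute_addresses`` and the tuple shapes are hypothetical.

.. code-block:: python

  def absolute_addresses(functions, records):
      """Pair each record ID with its absolute instruction address.

      functions: list of (function_address, record_count) in section order
      records:   list of (patchpoint_id, instruction_offset) in section order
      Assumes each function's records are contiguous and in function order.
      """
      result = []
      next_record = iter(records)
      for function_address, record_count in functions:
          for _ in range(record_count):
              patchpoint_id, offset = next(next_record)
              result.append((patchpoint_id, function_address + offset))
      return result

The same walk is a natural place to attach whatever higher-level
meaning the runtime associates with each ID, since LLVM keeps no such
mapping itself.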

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to stack locations from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, ptr %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.
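
That check can be as small as the following Python sketch.
``alloca_frame_slot`` is a hypothetical helper; it takes the type,
register number, and offset of one parsed location and assumes the
runtime knows which location index it passed the alloca in.

.. code-block:: python

  DIRECT = 0x2  # "Direct" encoding from the location type table


  def alloca_frame_slot(kind, regnum, offset):
      """Return (regnum, offset) for an alloca passed to a stack map.

      Raises if the location is not Direct, which would mean the alloca
      was replaced by some intervening value.
      """
      if kind != DIRECT:
          raise ValueError(f"expected a Direct location, got kind {kind}")
      return regnum, offset

Because a Direct location needs no live register or memory contents
beyond the frame base, the returned pair can be cached right after
compilation.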


Supported Architectures
=======================

Support for StackMap generation and the related intrinsics requires
some code for each backend.  Today, only a subset of LLVM's backends
are supported.  The currently supported architectures are X86_64,
PowerPC, AArch64 and SystemZ.