===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. To LLVM, patch points
look much like calls. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patch point and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project (see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_), but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
pointers into a collected heap captured by the ``GCRoot`` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the ``GCFunctionMetadata`` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage-collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e., ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text.
This is unlike ``llvm.experimental.patchpoint``, which reserves space
for in-place patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. These first two operands must be immediate values, i.e.,
they cannot be passed as variables. The variable number of operands
that follow are the ``live values`` for which locations will be
recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record.
LLVM passes this ID through directly to the stack map
record without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
                                                        ptr %ptr)
  %val = load i64, ptr %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %eax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.
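
The destructive patch in this example can be sketched from the
runtime's side. The C helper below is hypothetical, not an LLVM API:
it takes the record's Instruction Offset and the ``<numShadowBytes>``
operand, rejects any patch longer than the shadow, and copies the
bytes into a writable copy of the function's code. It assumes the
patch is already encoded for the target; a real runtime must also make
the code pages writable, ensure that no thread is executing the bytes
being replaced, and flush the instruction cache on targets that
require it.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* x86-64 bytes for "move 0xffff into %eax; call *%rax" (5 + 2 bytes),
     standing in for a call to a runtime entry point that rebuilds the frame. */
  static const uint8_t invalidate_patch[] = {
      0xb8, 0xff, 0xff, 0x00, 0x00, /* movl  $0xffff, %eax */
      0xff, 0xd0                    /* callq *%rax         */
  };

  /* Hypothetical helper: destructively writes `patch` at the stack map's
     address inside `code`, a writable copy of the enclosing function that
     starts at the function's entry. `inst_offset` is the record's
     Instruction Offset; `shadow` is the <numShadowBytes> operand.
     Returns 0 on success, -1 if the patch is longer than the shadow or
     would run past the end of `code`. */
  static int patch_in_shadow(uint8_t *code, size_t code_len,
                             uint32_t inst_offset, uint32_t shadow,
                             const uint8_t *patch, size_t patch_len) {
      assert(code != NULL && patch != NULL);
      /* Only the shadow bytes are guaranteed safe to overwrite. */
      if (patch_len > shadow)
          return -1;
      if (inst_offset > code_len || code_len - inst_offset < patch_len)
          return -1;
      memcpy(code + inst_offset, patch, patch_len);
      return 0;
  }

Applied at offset 0x05 with an 8-byte shadow, the 7-byte sequence fits
and the eighth shadow byte is left as it was; a 9-byte patch would be
rejected because it would extend past the region LLVM keeps free of
block ends and other stack maps.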

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

  declare void
    @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                       ptr <target>, i32 <numArgs>, ...)
  declare i64
    @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                      ptr <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to the calling convention.

On PowerPC, note that ``<target>`` must be the ABI function pointer for the
intended target of the indirect call. Specifically, when compiling for the
ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
the C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``.
The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, anyregcc,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

This example shows a patch point reserving 15 bytes, with one argument
in %rdi, and a return value in %rax per native calling convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to ptr
  %val = call i64 (i64, i32, ptr, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             ptr %target, i32 1, ptr %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of 15 reserved bytes
  0x0f addq    $0x3, %rax
  0x10 movq    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: llvm

  %val = call anyregcc i64 (i64, i32, ptr, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             ptr %target, i32 1, ptr %ptr)

The stack map now indicates the location of the %ptr argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in %r8 and return a value allocated in %r9:

.. code-block:: none

  0x00 movslq 4(%r8), %r9 <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                <--- end of 15 reserved bytes
  0x0f addq $0x3, %r9
  0x10 movq %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section is as follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 3)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size (or UINT64_MAX if not statically known)
    uint64 : Record Count
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (expected to be 0)
      uint16 : Location Size
      uint16 : Dwarf RegNum
      uint16 : Reserved (expected to be 0)
      int32  : Offset or SmallConstant
    }
    uint32 : Padding (only if required to align to 8 bytes)
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 bytes)
  }

The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ========== =================== ===========================
Encoding Type       Value               Description
======== ========== =================== ===========================
0x1      Register   Reg                 Value in a register
0x2      Direct     Reg + Offset        Frame index value
0x3      Indirect   [Reg + Offset]      Spilled value
0x4      Constant   Offset              Small constant
0x5      ConstIndex Constants[Offset]   Large constant
======== ========== =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations.
The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the
``Constants`` array, referred to by ``ConstantIndex`` locations.

At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stack map and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stack map format deliberately omits
specific subregister information. Instead, the runtime must interpret
this information conservatively. For example, if the stack map reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stack map reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

The stack map format is a contract between a given LLVM revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format.
Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stack map support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback of type
``LLVMMemoryManagerAllocateDataSectionCallback``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

For Mach-O (e.g., on Darwin), the stack map section name is
``__llvm_stackmaps``. The segment name is ``__LLVM_STACKMAPS``.

For ELF (e.g., on Linux), the stack map section name is
``.llvm_stackmaps``. The segment name is ``__LLVM_STACKMAPS``.

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, records, and functions, which LLVM preserves.

Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map.
The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow metadata to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to a stack location from which the requested value must
be loaded.
Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    call void (i64, i32, ...) @llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, ptr %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.

Supported Architectures
=======================

Support for StackMap generation and the related intrinsics requires
some code for each backend. Today, only a subset of LLVM's backends
is supported. The currently supported architectures are X86_64,
PowerPC, AArch64 and SystemZ.
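
Example: Reading a Stack Map Section
====================================

To make the layout in :ref:`stackmap-format` more concrete, the
following C code sketches how a runtime might walk a stack map section
and look up the record for a given ID. It is not part of LLVM: the
names ``Location``, ``Cursor``, and ``stackmap_find`` are hypothetical.
The sketch assumes a version 3 section stored in little-endian byte
order (as on x86-64 and AArch64; a big-endian target such as SystemZ
would need byte swapping), and it does not resolve ``ConstIndex``
locations against the ``Constants`` array.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* One decoded Location entry from a StkMapRecord. */
  typedef struct {
      uint8_t kind;       /* 1 Register, 2 Direct, 3 Indirect, 4 Constant, 5 ConstIndex */
      uint16_t size;      /* Location Size, in bytes */
      uint16_t dwarf_reg; /* Dwarf RegNum */
      int32_t offset;     /* Offset, SmallConstant, or index into Constants[] */
  } Location;

  /* Bounds-checked cursor over the raw section bytes (assumed little-endian). */
  typedef struct {
      const uint8_t *p;
      size_t len, pos;
      int ok; /* cleared on the first out-of-bounds access */
  } Cursor;

  /* Reads an n-byte (n <= 8) little-endian unsigned integer. */
  static uint64_t get(Cursor *c, size_t n) {
      uint64_t v = 0;
      assert(n <= 8);
      if (!c->ok || c->len - c->pos < n) {
          c->ok = 0;
          return 0;
      }
      for (size_t i = 0; i < n; i++)
          v |= (uint64_t)c->p[c->pos + i] << (8 * i);
      c->pos += n;
      return v;
  }

  /* Skips n bytes (reserved fields, padding, or arrays not decoded here). */
  static void skip(Cursor *c, uint64_t n) {
      if (!c->ok || c->len - c->pos < n) {
          c->ok = 0;
          return;
      }
      c->pos += (size_t)n;
  }

  /* Skips padding up to the next 8-byte boundary from the section start. */
  static void align8(Cursor *c) { skip(c, (8 - c->pos % 8) % 8); }

  /* Finds the first StkMapRecord whose PatchPoint ID equals `id`. On success,
     stores its Instruction Offset in *inst_offset, copies up to `max_out`
     locations into `out`, and returns NumLocations. Returns -1 if the section
     is truncated, is not version 3, or has no record with that ID. */
  static int stackmap_find(const uint8_t *sec, size_t len, uint64_t id,
                           uint32_t *inst_offset, Location *out, int max_out) {
      Cursor c = {sec, len, 0, 1};
      if (get(&c, 1) != 3) /* Stack Map Version */
          return -1;
      skip(&c, 3); /* uint8 + uint16 reserved */
      uint64_t num_functions = get(&c, 4);
      uint64_t num_constants = get(&c, 4);
      uint64_t num_records = get(&c, 4);
      skip(&c, num_functions * 24); /* StkSizeRecord: 3 x uint64 */
      skip(&c, num_constants * 8);  /* LargeConstant: uint64 */
      for (uint64_t r = 0; r < num_records && c.ok; r++) {
          uint64_t rec_id = get(&c, 8);
          uint32_t off = (uint32_t)get(&c, 4);
          skip(&c, 2); /* record flags */
          int num_locs = (int)get(&c, 2);
          int match = (rec_id == id);
          for (int i = 0; i < num_locs && c.ok; i++) {
              Location loc;
              loc.kind = (uint8_t)get(&c, 1);
              skip(&c, 1);
              loc.size = (uint16_t)get(&c, 2);
              loc.dwarf_reg = (uint16_t)get(&c, 2);
              skip(&c, 2);
              /* Two's-complement reinterpretation of the 32-bit field. */
              loc.offset = (int32_t)(uint32_t)get(&c, 4);
              if (match && i < max_out)
                  out[i] = loc;
          }
          align8(&c);  /* uint32 padding, if needed */
          skip(&c, 2); /* uint16 padding */
          uint64_t num_live_outs = get(&c, 2);
          skip(&c, num_live_outs * 4); /* Dwarf RegNum, Reserved, Size */
          align8(&c);
          if (match && c.ok) {
              *inst_offset = off;
              return num_locs;
          }
      }
      return -1;
  }

Because every read is bounds-checked, a truncated or corrupt section
makes ``stackmap_find`` return -1 instead of reading past the end of
the buffer. Given that the runtime is expected to parse the section
once and re-encode it in its own format, a real runtime would more
likely decode all records in a single pass into its own tables than
search by ID on every lookup.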