Formatter Bytecode
==================

Background
----------

LLDB provides rich customization options to display data types (see :doc:`/use/variable/`). To use custom data formatters, developers need to edit the global `~/.lldbinit` file to make sure they are found and loaded. In addition to this rather manual workflow, developers or library authors can ship data formatters with their code in a format that allows LLDB to automatically find them and run them securely.

An end-to-end example of such a workflow is the Swift `DebugDescription` macro (see https://www.swift.org/blog/announcing-swift-6/#debugging ) that translates Swift string interpolation into LLDB summary strings, and puts them into a `.lldbsummaries` section, where LLDB can find them.

This document describes a minimal bytecode tailored to running LLDB formatters. It defines a human-readable assembler representation for the language, an efficient binary encoding, a virtual machine for evaluating it, and a format for embedding formatters into binary containers.

Goals
~~~~~

Provide an efficient and secure encoding for data formatters that can be used as a compilation target from user-friendly representations (such as DIL, Swift DebugDescription, or NatVis).

Non-goals
~~~~~~~~~

While humans could write the assembler syntax, making it user-friendly is not a goal. It is meant to be used as a compilation target for higher-level, language-specific affordances.

Design of the virtual machine
-----------------------------

The LLDB formatter virtual machine uses a stack-based bytecode, comparable with DWARF expressions, but with higher-level data types and functions.

The virtual machine has two stacks, a data and a control stack. The control stack is kept separate to make it easier to reason about the security aspects of the virtual machine.

Data types
~~~~~~~~~~

All objects on the data stack must have one of the following data types.
These data types are "host" data types, in LLDB parlance.

* *String* (UTF-8)
* *Int* (64 bit)
* *UInt* (64 bit)
* *Object* (Basically an `SBValue`)
* *Type* (Basically an `SBType`)
* *Selector* (One of the predefined functions)

*Object* and *Type* are opaque; they can only be used as parameters of `call`.

Instruction set
---------------

Stack operations
~~~~~~~~~~~~~~~~

These instructions manipulate the data stack directly.

========  ==========  ===========================
 Opcode    Mnemonic    Stack effect
--------  ----------  ---------------------------
 0x00      `dup`       `(x -> x x)`
 0x01      `drop`      `(x y -> x)`
 0x02      `pick`      `(x ... UInt -> x ... x)`
 0x03      `over`      `(x y -> x y x)`
 0x04      `swap`      `(x y -> y x)`
 0x05      `rot`       `(x y z -> z x y)`
========  ==========  ===========================

Control flow
~~~~~~~~~~~~

These manipulate the control stack and program counter. Both `if` and `ifelse` expect a `UInt` at the top of the data stack to represent the condition.

========  ==========  ============================================================
 Opcode    Mnemonic    Description
--------  ----------  ------------------------------------------------------------
 0x10      `{`         push a code block address onto the control stack
  --       `}`         (technically not an opcode) syntax for end of code block
 0x11      `if`        `(UInt -> )` pop a block from the control stack,
                       if the top of the data stack is nonzero, execute it
 0x12      `ifelse`    `(UInt -> )` pop two blocks from the control stack, if
                       the top of the data stack is nonzero, execute the first,
                       otherwise the second.
 0x13      `return`    pop the entire control stack and return
========  ==========  ============================================================

Literals for basic types
~~~~~~~~~~~~~~~~~~~~~~~~

========  ===========  ============================================================
 Opcode    Mnemonic     Description
--------  -----------  ------------------------------------------------------------
 0x20      `123u`       `( -> UInt)` push an unsigned 64-bit host integer
 0x21      `123`        `( -> Int)` push a signed 64-bit host integer
 0x22      `"abc"`      `( -> String)` push a UTF-8 host string
 0x23      `@strlen`    `( -> Selector)` push one of the predefined function
                        selectors. See `call`.
========  ===========  ============================================================

Conversion operations
~~~~~~~~~~~~~~~~~~~~~

========  ===========  ================================================================
 Opcode    Mnemonic     Description
--------  -----------  ----------------------------------------------------------------
 0x2a      `as_int`     `( UInt -> Int)` reinterpret a UInt as an Int
 0x2b      `as_uint`    `( Int -> UInt)` reinterpret an Int as a UInt
 0x2c      `is_null`    `( Object -> UInt )` check an object for null `(object ? 0 : 1)`
========  ===========  ================================================================


Arithmetic, logic, and comparison operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All of these operations are only defined for `Int` and `UInt`, and both operands need to be of the same type. The `>>` operator is an arithmetic shift if the parameters are of type `Int`; otherwise it is a logical shift to the right.

========  ==========  ===========================
 Opcode    Mnemonic    Stack effect
--------  ----------  ---------------------------
 0x30      `+`         `(x y -> [x+y])`
 0x31      `-`         etc ...
 0x32      `*`
 0x33      `/`
 0x34      `%`
 0x35      `<<`
 0x36      `>>`
 0x40      `~`
 0x41      `|`
 0x42      `^`
 0x50      `=`
 0x51      `!=`
 0x52      `<`
 0x53      `>`
 0x54      `=<`
 0x55      `>=`
========  ==========  ===========================

Function calls
~~~~~~~~~~~~~~

For security reasons the list of functions callable with `call` is predefined. The supported functions are either existing methods on `SBValue`, or string formatting operations.

========  ==========  ============================================
 Opcode    Mnemonic    Stack effect
--------  ----------  --------------------------------------------
 0x60      `call`      `(Object argN ... arg0 Selector -> retval)`
========  ==========  ============================================

Method is one of a predefined set of *Selectors*.

====  ============================  ===================================================  ==================================
Sel.  Mnemonic                      Stack Effect                                         Description
----  ----------------------------  ---------------------------------------------------  ----------------------------------
0x00  `summary`                     `(Object @summary -> String)`                        `SBValue::GetSummary`
0x01  `type_summary`                `(Object @type_summary -> String)`                   `SBValue::GetTypeSummary`
0x10  `get_num_children`            `(Object @get_num_children -> UInt)`                 `SBValue::GetNumChildren`
0x11  `get_child_at_index`          `(Object UInt @get_child_at_index -> Object)`        `SBValue::GetChildAtIndex`
0x12  `get_child_with_name`         `(Object String @get_child_with_name -> Object)`     `SBValue::GetChildMemberWithName`
0x13  `get_child_index`             `(Object String @get_child_index -> UInt)`           `SBValue::GetChildIndex`
0x15  `get_type`                    `(Object @get_type -> Type)`                         `SBValue::GetType`
0x16  `get_template_argument_type`  `(Object UInt @get_template_argument_type -> Type)`  `SBValue::GetTemplateArgumentType`
0x17  `cast`                        `(Object Type @cast -> Object)`                      `SBValue::Cast`
0x20  `get_value`                   `(Object @get_value -> Object)`                      `SBValue::GetValue`
0x21  `get_value_as_unsigned`       `(Object @get_value_as_unsigned -> UInt)`            `SBValue::GetValueAsUnsigned`
0x22  `get_value_as_signed`         `(Object @get_value_as_signed -> Int)`               `SBValue::GetValueAsSigned`
0x23  `get_value_as_address`        `(Object @get_value_as_address -> UInt)`             `SBValue::GetValueAsAddress`
0x40  `read_memory_byte`            `(UInt @read_memory_byte -> UInt)`                   `Target::ReadMemory`
0x41  `read_memory_uint32`          `(UInt @read_memory_uint32 -> UInt)`                 `Target::ReadMemory`
0x42  `read_memory_int32`           `(UInt @read_memory_int32 -> Int)`                   `Target::ReadMemory`
0x43  `read_memory_uint64`          `(UInt @read_memory_uint64 -> UInt)`                 `Target::ReadMemory`
0x44  `read_memory_int64`           `(UInt @read_memory_int64 -> Int)`                   `Target::ReadMemory`
0x45  `read_memory_address`         `(UInt @read_memory_address -> UInt)`                `Target::ReadMemory`
0x46  `read_memory`                 `(UInt Type @read_memory -> Object)`                 `Target::ReadMemory`
0x50  `fmt`                         `(String arg0 ... @fmt -> String)`                   `llvm::format`
0x51  `sprintf`                     `(String arg0 ... @sprintf -> String)`               `sprintf`
0x52  `strlen`                      `(String @strlen -> UInt)`                           `strlen` in bytes
====  ============================  ===================================================  ==================================

Byte Code
~~~~~~~~~

Most instructions are just a single byte opcode. The only exceptions are the literals:

* *String*: Length in bytes encoded as ULEB128, followed by length bytes
* *Int*: LEB128
* *UInt*: ULEB128
* *Selector*: ULEB128

Embedding
~~~~~~~~~

Expression programs are embedded into an `.lldbformatters` section (an evolution of the Swift `.lldbsummaries` section) that is a dictionary of type names/regexes and descriptions. It consists of a list of records. Each record starts with the following header:

* Version number (ULEB128)
* Remaining size of the record (minus the header) (ULEB128)

The version number is increased whenever an incompatible change is made.
Adding new opcodes or selectors is not an incompatible change since consumers can unambiguously detect this and report an error.

Space between two records may be padded with NULL bytes.

In version 1, a record starts with a dictionary key, which is a type name or regex.

* Length of the key in bytes (ULEB128)
* The key (UTF-8)

A regex has to start with `^`, which is part of the regular expression.

After this comes a flag bitfield, which is a ULEB-encoded `lldb::TypeOptions` bitfield.

* Flags (ULEB128)


This is followed by one or more dictionary values that immediately follow each other and entirely fill out the record size from the header. Each expression program has the following layout:

* Function signature (1 byte)
* Length of the program (ULEB128)
* The program bytecode

The possible function signatures are:

=========  ======================  ==========================
Signature   Mnemonic                Stack Effect
---------  ----------------------  --------------------------
  0x00      `@summary`              `(Object -> String)`
  0x01      `@init`                 `(Object -> Object+)`
  0x02      `@get_num_children`     `(Object+ -> UInt)`
  0x03      `@get_child_index`      `(Object+ String -> UInt)`
  0x04      `@get_child_at_index`   `(Object+ UInt -> Object)`
  0x05      `@get_value`            `(Object+ -> String)`
=========  ======================  ==========================

If not specified, the init function defaults to an empty function that just passes the Object along. Its results may be cached, which allows common prep work for an Object to be done once and reused by subsequent calls to the other methods. This way subsequent calls to `@get_child_at_index` can avoid recomputing shared information, for example.

While it is more efficient to store multiple programs per type key, this is not a requirement. LLDB will merge all entries. If there are conflicts the result is undefined.
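
The record layout described in this section can be sketched in Python as follows. This is an illustration only, not LLDB's implementation: the helper names `uleb128` and `encode_record` are hypothetical, the flags value `0` and the key `MyType` are placeholders, and "remaining size of the record" is read here as the number of bytes that follow the two header fields.

```python
def uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_record(key, flags, programs):
    """Build one version 1 record (hypothetical helper).

    `programs` maps a function signature byte to its program bytecode.
    """
    key_bytes = key.encode("utf-8")
    body = uleb128(len(key_bytes)) + key_bytes + uleb128(flags)
    for signature, bytecode in programs.items():
        body += bytes([signature]) + uleb128(len(bytecode)) + bytecode
    version = 1
    # Header: version, then the size of everything after the header.
    return uleb128(version) + uleb128(len(body)) + body


# A @summary program (signature 0x00) that drops the incoming Object
# and pushes a constant string:
#   0x01 = drop, 0x22 = String literal (ULEB128 length, then the bytes).
program = bytes([0x01, 0x22]) + uleb128(5) + b"hello"
record = encode_record("MyType", flags=0, programs={0x00: program})
```

A regex key would be passed the same way, for example `"^MyType<.+>$"`, since the leading `^` is part of the key itself.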

Execution model
~~~~~~~~~~~~~~~

Execution begins at the first byte in the program. The program counter of the virtual machine starts at offset 0 of the bytecode and may never move outside the range of the program as defined in the header. The data stack starts with one Object or the result of the `@init` function (`Object+` in the table above).

Error handling
~~~~~~~~~~~~~~

In version 1 errors are unrecoverable; the entire expression will fail if any kind of error is encountered.
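
To make the execution model and the error rule concrete, here is a minimal Python sketch of an interpreter loop for a small subset of the opcodes listed earlier (`dup`, `drop`, `swap`, `return`, the `UInt` and `String` literals, and `+`). It is not LLDB's implementation. The names `run`, `read_uleb128`, and `BytecodeError` are hypothetical, only `UInt` and `String` values are modeled, the placeholder string `"<object>"` stands in for an Object, and wrapping `+` to 64 bits is an assumption of this sketch.

```python
class BytecodeError(Exception):
    """Any failure aborts the whole program; version 1 has no recovery."""


def read_uleb128(code, pc):
    """Decode a ULEB128 value starting at `pc`; return (value, next_pc)."""
    value = shift = 0
    while True:
        if pc >= len(code):
            raise BytecodeError("literal runs past the end of the program")
        byte = code[pc]
        pc += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pc


def run(code, stack):
    """Interpret a subset of opcodes; `stack` is the initial data stack."""
    mask = 0xFFFFFFFFFFFFFFFF  # assumption: UInt arithmetic wraps at 64 bits
    pc = 0  # the program counter starts at offset 0
    while pc < len(code):  # and never leaves the program
        op = code[pc]
        pc += 1
        try:
            if op == 0x00:    # dup  (x -> x x)
                stack.append(stack[-1])
            elif op == 0x01:  # drop (x y -> x)
                stack.pop()
            elif op == 0x04:  # swap (x y -> y x)
                stack[-1], stack[-2] = stack[-2], stack[-1]
            elif op == 0x13:  # return
                break
            elif op == 0x20:  # UInt literal: ULEB128
                value, pc = read_uleb128(code, pc)
                stack.append(value & mask)
            elif op == 0x22:  # String literal: ULEB128 length, then bytes
                length, pc = read_uleb128(code, pc)
                if pc + length > len(code):
                    raise BytecodeError("string runs past the end of the program")
                stack.append(code[pc:pc + length].decode("utf-8"))
                pc += length
            elif op == 0x30:  # +  (x y -> [x+y]), UInt only in this sketch
                y, x = stack.pop(), stack.pop()
                if not (isinstance(x, int) and isinstance(y, int)):
                    raise BytecodeError("+ needs two integer operands")
                stack.append((x + y) & mask)
            else:
                raise BytecodeError(f"unsupported opcode 0x{op:02x}")
        except (IndexError, UnicodeDecodeError):
            raise BytecodeError("malformed program or stack underflow") from None
    return stack


# Drop the initial Object placeholder, then compute 2u 3u +.
result = run(bytes([0x01, 0x20, 2, 0x20, 3, 0x30]), ["<object>"])
```

Every failure path raises the same exception type and nothing is caught inside the loop, which mirrors the rule that a single error fails the entire expression.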