1f3acb54cSRiver Riddle# MLIR Bytecode Format 2f3acb54cSRiver Riddle 3*5795f9e2SMatthias SpringerThis document describes the MLIR bytecode format and its encoding. 4450462cbSMehdi AminiThis format is versioned and stable: we don't plan to ever break 5*5795f9e2SMatthias Springercompatibility, that is a dialect should be able to deserialize any 6*5795f9e2SMatthias Springerolder bytecode. Similarly, we support back-deployment so that an 7*5795f9e2SMatthias Springerolder version of the format can be targetted. 8450462cbSMehdi Amini 9450462cbSMehdi AminiThat said, it is important to realize that the promises of the 10450462cbSMehdi Aminibytecode format are made assuming immutable dialects: the format 11450462cbSMehdi Aminiallows backward and forward compatibility, but only when nothing 12450462cbSMehdi Aminiin a dialect changes (operations, types, attributes definitions). 13450462cbSMehdi Amini 14450462cbSMehdi AminiA dialect can opt-in to handle its own versioning through the 15450462cbSMehdi Amini`BytecodeDialectInterface`. Some hooks are exposed to the dialect 16450462cbSMehdi Aminito allow managing a version encoded into the bytecode file. The 17450462cbSMehdi Aminiversion is loaded lazily and allows to retrieve the version 18450462cbSMehdi Aminiinformation while decoding the input IR, and gives an opportunity 19450462cbSMehdi Aminito each dialect for which a version is present to perform IR 20450462cbSMehdi Aminiupgrades post-parsing through the `upgradeFromVersion` method. 21450462cbSMehdi AminiThere is no restriction on what kind of information a dialect 22*5795f9e2SMatthias Springeris allowed to encode to model its versioning. 23f3acb54cSRiver Riddle 24f3acb54cSRiver Riddle[TOC] 25f3acb54cSRiver Riddle 26f3acb54cSRiver Riddle## Magic Number 27f3acb54cSRiver Riddle 280e0b6070SMatteo FrancioliniMLIR uses the following four-byte magic number to 290e0b6070SMatteo Francioliniindicate bytecode files: 30f3acb54cSRiver Riddle 31f3acb54cSRiver Riddle'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]' 32f3acb54cSRiver Riddle 33f3acb54cSRiver RiddleIn hex: 34f3acb54cSRiver Riddle 35f3acb54cSRiver Riddle'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]' 36f3acb54cSRiver Riddle 37f3acb54cSRiver Riddle## Format Overview 38f3acb54cSRiver Riddle 39f3acb54cSRiver RiddleAn MLIR Bytecode file is comprised of a byte stream, with a few simple 40f3acb54cSRiver Riddlestructural concepts layered on top. 41f3acb54cSRiver Riddle 42f3acb54cSRiver Riddle### Primitives 43f3acb54cSRiver Riddle 44f3acb54cSRiver Riddle#### Fixed-Width Integers 45f3acb54cSRiver Riddle 46f3acb54cSRiver Riddle``` 47f3acb54cSRiver Riddle byte ::= `0x00`...`0xFF` 48f3acb54cSRiver Riddle``` 49f3acb54cSRiver Riddle 50f3acb54cSRiver RiddleFixed width integers are unsigned integers of a known byte size. The values are 51f3acb54cSRiver Riddlestored in little-endian byte order. 52f3acb54cSRiver Riddle 53f3acb54cSRiver RiddleTODO: Add larger fixed width integers as necessary. 54f3acb54cSRiver Riddle 55f3acb54cSRiver Riddle#### Variable-Width Integers 56f3acb54cSRiver Riddle 57f3acb54cSRiver RiddleVariable width integers, or `VarInt`s, provide a compact representation for 58f3acb54cSRiver Riddleintegers. Each encoded VarInt consists of one to nine bytes, which together 59f3acb54cSRiver Riddlerepresent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt" 60f3acb54cSRiver Riddleencoding for VarInts. This encoding is a variant of the 61f3acb54cSRiver Riddle[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128) 62f3acb54cSRiver Riddleencoding, where each byte of the encoding provides up to 7 bits for the value, 63f3acb54cSRiver Riddlewith the remaining bit used to store a tag indicating the number of bytes used 64f3acb54cSRiver Riddlefor the encoding. This means that small unsigned integers (less than 2^7) may be 65f3acb54cSRiver Riddlestored in one byte, unsigned integers up to 2^14 may be stored in two bytes, 66f3acb54cSRiver Riddleetc. 67f3acb54cSRiver Riddle 68f3acb54cSRiver RiddleThe first byte of the encoding includes a length prefix in the low bits. This 69f3acb54cSRiver Riddleprefix is a bit sequence of '0's followed by a terminal '1', or the end of the 70f3acb54cSRiver Riddlebyte. The number of '0' bits indicate the number of _additional_ bytes, not 71f3acb54cSRiver Riddleincluding the prefix byte, used to encode the value. All of the remaining bits 72f3acb54cSRiver Riddlein the first byte, along with all of the bits in the additional bytes, provide 73f3acb54cSRiver Riddlethe value of the integer. Below are the various possible encodings of the prefix 74f3acb54cSRiver Riddlebyte: 75f3acb54cSRiver Riddle 76f3acb54cSRiver Riddle``` 77f3acb54cSRiver Riddlexxxxxxx1: 7 value bits, the encoding uses 1 byte 78f3acb54cSRiver Riddlexxxxxx10: 14 value bits, the encoding uses 2 bytes 79f3acb54cSRiver Riddlexxxxx100: 21 value bits, the encoding uses 3 bytes 80f3acb54cSRiver Riddlexxxx1000: 28 value bits, the encoding uses 4 bytes 81f3acb54cSRiver Riddlexxx10000: 35 value bits, the encoding uses 5 bytes 82f3acb54cSRiver Riddlexx100000: 42 value bits, the encoding uses 6 bytes 83f3acb54cSRiver Riddlex1000000: 49 value bits, the encoding uses 7 bytes 84f3acb54cSRiver Riddle10000000: 56 value bits, the encoding uses 8 bytes 85f3acb54cSRiver Riddle00000000: 64 value bits, the encoding uses 9 bytes 86f3acb54cSRiver Riddle``` 87f3acb54cSRiver Riddle 882f90764cSRiver Riddle##### Signed Variable-Width Integers 892f90764cSRiver Riddle 902f90764cSRiver RiddleSigned variable width integer values are encoded in a similar fashion to 912f90764cSRiver Riddle[varints](#variable-width-integers), but employ 922f90764cSRiver Riddle[zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding). 932f90764cSRiver RiddleThis encoding uses the low bit of the value to indicate the sign, which allows 942f90764cSRiver Riddlefor more efficiently encoding negative numbers. If a negative value were encoded 952f90764cSRiver Riddleusing a normal [varint](#variable-width-integers), it would be treated as an 962f90764cSRiver Riddleextremely large unsigned value. Using zigzag encoding allows for a smaller 972f90764cSRiver Riddlenumber of active bits in the value, leading to a smaller encoding. Below is the 982f90764cSRiver Riddlebasic computation for generating a zigzag encoding: 992f90764cSRiver Riddle 1002f90764cSRiver Riddle``` 1012f90764cSRiver Riddle(value << 1) ^ (value >> 63) 1022f90764cSRiver Riddle``` 1032f90764cSRiver Riddle 104f3acb54cSRiver Riddle#### Strings 105f3acb54cSRiver Riddle 106f3acb54cSRiver RiddleStrings are blobs of characters with an associated length. 107f3acb54cSRiver Riddle 108f3acb54cSRiver Riddle### Sections 109f3acb54cSRiver Riddle 110f3acb54cSRiver Riddle``` 111f3acb54cSRiver Riddlesection { 1126ab2bcffSRiver Riddle idAndIsAligned: byte // id | (hasAlign << 7) 1136ab2bcffSRiver Riddle length: varint, 1146ab2bcffSRiver Riddle 1156ab2bcffSRiver Riddle alignment: varint?, 1166ab2bcffSRiver Riddle padding: byte[], // Padding bytes are always `0xCB`. 1176ab2bcffSRiver Riddle 1186ab2bcffSRiver Riddle data: byte[] 119f3acb54cSRiver Riddle} 120f3acb54cSRiver Riddle``` 121f3acb54cSRiver Riddle 1226ab2bcffSRiver RiddleSections are a mechanism for grouping data within the bytecode. They enable 123f3acb54cSRiver Riddledelayed processing, which is useful for out-of-order processing of data, 1246ab2bcffSRiver Riddlelazy-loading, and more. Each section contains a Section ID, whose high bit 1256ab2bcffSRiver Riddleindicates if the section has alignment requirements, a length (which allows for 1266ab2bcffSRiver Riddleskipping over the section), and an optional alignment. When an alignment is 1276ab2bcffSRiver Riddlepresent, a variable number of padding bytes (0xCB) may appear before the section 1286ab2bcffSRiver Riddledata. The alignment of a section must be a power of 2. 129f3acb54cSRiver Riddle 130f3acb54cSRiver Riddle## MLIR Encoding 131f3acb54cSRiver Riddle 132f3acb54cSRiver RiddleGiven the generic structure of MLIR, the bytecode encoding is actually fairly 133f3acb54cSRiver Riddlesimplistic. It effectively maps to the core components of MLIR. 134f3acb54cSRiver Riddle 135f3acb54cSRiver Riddle### Top Level Structure 136f3acb54cSRiver Riddle 137f3acb54cSRiver RiddleThe top-level structure of the bytecode contains the 4-byte "magic number", a 138f3acb54cSRiver Riddleversion number, a null-terminated producer string, and a list of sections. Each 139f3acb54cSRiver Riddlesection is currently only expected to appear once within a bytecode file. 140f3acb54cSRiver Riddle 141f3acb54cSRiver Riddle``` 142f3acb54cSRiver Riddlebytecode { 143f3acb54cSRiver Riddle magic: "MLïR", 144f3acb54cSRiver Riddle version: varint, 145f3acb54cSRiver Riddle producer: string, 146f3acb54cSRiver Riddle sections: section[] 147f3acb54cSRiver Riddle} 148f3acb54cSRiver Riddle``` 149f3acb54cSRiver Riddle 150f3acb54cSRiver Riddle### String Section 151f3acb54cSRiver Riddle 152f3acb54cSRiver Riddle``` 153f3acb54cSRiver Riddlestrings { 154f3acb54cSRiver Riddle numStrings: varint, 155f3acb54cSRiver Riddle reverseStringLengths: varint[], 156f3acb54cSRiver Riddle stringData: byte[] 157f3acb54cSRiver Riddle} 158f3acb54cSRiver Riddle``` 159f3acb54cSRiver Riddle 160f3acb54cSRiver RiddleThe string section contains a table of strings referenced within the bytecode, 161f3acb54cSRiver Riddlemore easily enabling string sharing. This section is encoded first with the 162f3acb54cSRiver Riddletotal number of strings, followed by the sizes of each of the individual strings 163f3acb54cSRiver Riddlein reverse order. The remaining encoding contains a single blob containing all 164f3acb54cSRiver Riddleof the strings concatenated together. 165f3acb54cSRiver Riddle 166f3acb54cSRiver Riddle### Dialect Section 167f3acb54cSRiver Riddle 168f3acb54cSRiver RiddleThe dialect section of the bytecode contains all of the dialects referenced 169f3acb54cSRiver Riddlewithin the encoded IR, and some information about the components of those 170f3acb54cSRiver Riddledialects that were also referenced. 171f3acb54cSRiver Riddle 172f3acb54cSRiver Riddle``` 173f3acb54cSRiver Riddledialect_section { 174f3acb54cSRiver Riddle numDialects: varint, 175*5795f9e2SMatthias Springer dialectNames: dialect_name_group[], 176*5795f9e2SMatthias Springer opNames: dialect_ops_group[] // ops grouped by dialect 177f3acb54cSRiver Riddle} 178f3acb54cSRiver Riddle 179*5795f9e2SMatthias Springerdialect_name_group { 180*5795f9e2SMatthias Springer nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion), 181*5795f9e2SMatthias Springer version: dialect_version_section // only if versioned 182f3acb54cSRiver Riddle} 1830e0b6070SMatteo Franciolini 1840e0b6070SMatteo Franciolinidialect_version_section { 1850e0b6070SMatteo Franciolini size: varint, 1860e0b6070SMatteo Franciolini version: byte[] 1870e0b6070SMatteo Franciolini} 1880e0b6070SMatteo Franciolini 189*5795f9e2SMatthias Springerdialect_ops_group { 190*5795f9e2SMatthias Springer dialect: varint, 191*5795f9e2SMatthias Springer numOpNames: varint, 192*5795f9e2SMatthias Springer opNames: op_name_group[] 193*5795f9e2SMatthias Springer} 194*5795f9e2SMatthias Springer 195*5795f9e2SMatthias Springerop_name_group { 196*5795f9e2SMatthias Springer nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp) 197*5795f9e2SMatthias Springer} 198f3acb54cSRiver Riddle``` 199f3acb54cSRiver Riddle 2000e0b6070SMatteo FrancioliniDialects are encoded as a `varint` containing the index to the name string 2010e0b6070SMatteo Francioliniwithin the string section, plus a flag indicating whether the dialect is 2020e0b6070SMatteo Francioliniversioned. Operation names are encoded in groups by dialect, with each group 2030e0b6070SMatteo Franciolinicontaining the dialect, the number of operation names, and the array of indexes 2040e0b6070SMatteo Franciolinito each name within the string section. The version is encoded as a nested 205*5795f9e2SMatthias Springersection for each dialect. 206f3acb54cSRiver Riddle 207f3acb54cSRiver Riddle### Attribute/Type Sections 208f3acb54cSRiver Riddle 209f3acb54cSRiver RiddleAttributes and types are encoded using two [sections](#sections), one section 210f3acb54cSRiver Riddle(`attr_type_section`) containing the actual encoded representation, and another 211f3acb54cSRiver Riddlesection (`attr_type_offset_section`) containing the offsets of each encoded 212f3acb54cSRiver Riddleattribute/type into the previous section. This structure allows for attributes 213f3acb54cSRiver Riddleand types to always be lazily loaded on demand. 214f3acb54cSRiver Riddle 215f3acb54cSRiver Riddle``` 216f3acb54cSRiver Riddleattr_type_section { 217f3acb54cSRiver Riddle attrs: attribute[], 218f3acb54cSRiver Riddle types: type[] 219f3acb54cSRiver Riddle} 220f3acb54cSRiver Riddleattr_type_offset_section { 221f3acb54cSRiver Riddle numAttrs: varint, 222f3acb54cSRiver Riddle numTypes: varint, 223f3acb54cSRiver Riddle offsets: attr_type_offset_group[] 224f3acb54cSRiver Riddle} 225f3acb54cSRiver Riddle 226f3acb54cSRiver Riddleattr_type_offset_group { 227f3acb54cSRiver Riddle dialect: varint, 228f3acb54cSRiver Riddle numElements: varint, 229f3acb54cSRiver Riddle offsets: varint[] // (offset << 1) | (hasCustomEncoding) 230f3acb54cSRiver Riddle} 231f3acb54cSRiver Riddle 232f3acb54cSRiver Riddleattribute { 233f3acb54cSRiver Riddle encoding: ... 234f3acb54cSRiver Riddle} 235f3acb54cSRiver Riddletype { 236f3acb54cSRiver Riddle encoding: ... 237f3acb54cSRiver Riddle} 238f3acb54cSRiver Riddle``` 239f3acb54cSRiver Riddle 240f3acb54cSRiver RiddleEach `offset` in the `attr_type_offset_section` above is the size of the 241f3acb54cSRiver Riddleencoding for the attribute or type and a flag indicating if the encoding uses 242f3acb54cSRiver Riddlethe textual assembly format, or a custom bytecode encoding. We avoid using the 243f3acb54cSRiver Riddledirect offset into the `attr_type_section`, as a smaller relative offsets 244f3acb54cSRiver Riddleprovides more effective compression. Attributes and types are grouped by 245f3acb54cSRiver Riddledialect, with each `attr_type_offset_group` in the offset section containing the 246f3acb54cSRiver Riddlecorresponding parent dialect, number of elements, and offsets for each element 247f3acb54cSRiver Riddlewithin the group. 248f3acb54cSRiver Riddle 249f3acb54cSRiver Riddle#### Attribute/Type Encodings 250f3acb54cSRiver Riddle 251f3acb54cSRiver RiddleIn the abstract, an attribute/type is encoded in one of two possible ways: via 252f3acb54cSRiver Riddleits assembly format, or via a custom dialect defined encoding. 253f3acb54cSRiver Riddle 254f3acb54cSRiver Riddle##### Assembly Format Fallback 255f3acb54cSRiver Riddle 256f3acb54cSRiver RiddleIn the case where a dialect does not define a method for encoding the attribute 257f3acb54cSRiver Riddleor type, the textual assembly format of that attribute or type is used as a 258*5795f9e2SMatthias Springerfallback. For example, a type `!bytecode.type<42>` would be encoded as the null 259*5795f9e2SMatthias Springerterminated string "!bytecode.type<42>". This ensures that every attribute and 260*5795f9e2SMatthias Springertype can be encoded, even if the owning dialect has not yet opted in to a more 261f3acb54cSRiver Riddleefficient serialization. 262f3acb54cSRiver Riddle 263f3acb54cSRiver RiddleTODO: We shouldn't redundantly encode the dialect name here, we should use a 264f3acb54cSRiver Riddlereference to the parent dialect instead. 265f3acb54cSRiver Riddle 266f3acb54cSRiver Riddle##### Dialect Defined Encoding 267f3acb54cSRiver Riddle 268*5795f9e2SMatthias SpringerAs an alternative to the assembly format fallback, dialects may also provide a 269*5795f9e2SMatthias Springercustom encoding for their attributes and types. Custom encodings are very 270*5795f9e2SMatthias Springerbeneficial in that they are significantly smaller and faster to read and write. 27102c2ecb9SRiver Riddle 27202c2ecb9SRiver RiddleDialects can opt-in to providing custom encodings by implementing the 27302c2ecb9SRiver Riddle`BytecodeDialectInterface`. This interface provides hooks, namely 27402c2ecb9SRiver Riddle`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used 27502c2ecb9SRiver Riddleby the bytecode reader and writer. These hooks are provided a reader and writer 27602c2ecb9SRiver Riddleimplementation that can be used to encode various constructs in the underlying 27702c2ecb9SRiver Riddlebytecode format. A unique feature of this interface is that dialects may choose 27802c2ecb9SRiver Riddleto only encode a subset of their attributes and types in a custom bytecode 27902c2ecb9SRiver Riddleformat, which can simplify adding new or experimental components that aren't 28002c2ecb9SRiver Riddlefully baked. 28102c2ecb9SRiver Riddle 28202c2ecb9SRiver RiddleWhen implementing the bytecode interface, dialects are responsible for all 28302c2ecb9SRiver Riddleaspects of the encoding. This includes the indicator for which kind of attribute 28402c2ecb9SRiver Riddleor type is being encoded; the bytecode reader will only know that it has 28502c2ecb9SRiver Riddleencountered an attribute or type of a given dialect, it doesn't encode any 28602c2ecb9SRiver Riddlefurther information. As such, a common encoding idiom is to use a leading 28702c2ecb9SRiver Riddle`varint` code to indicate how the attribute or type was encoded. 288f3acb54cSRiver Riddle 2896ab2bcffSRiver Riddle### Resource Section 2906ab2bcffSRiver Riddle 2916ab2bcffSRiver RiddleResources are encoded using two [sections](#sections), one section 2926ab2bcffSRiver Riddle(`resource_section`) containing the actual encoded representation, and another 2936ab2bcffSRiver Riddlesection (`resource_offset_section`) containing the offsets of each encoded 2946ab2bcffSRiver Riddleresource into the previous section. 2956ab2bcffSRiver Riddle 2966ab2bcffSRiver Riddle``` 2976ab2bcffSRiver Riddleresource_section { 2986ab2bcffSRiver Riddle resources: resource[] 2996ab2bcffSRiver Riddle} 3006ab2bcffSRiver Riddleresource { 3016ab2bcffSRiver Riddle value: resource_bool | resource_string | resource_blob 3026ab2bcffSRiver Riddle} 3036ab2bcffSRiver Riddleresource_bool { 3046ab2bcffSRiver Riddle value: byte 3056ab2bcffSRiver Riddle} 3066ab2bcffSRiver Riddleresource_string { 3076ab2bcffSRiver Riddle value: varint 3086ab2bcffSRiver Riddle} 3096ab2bcffSRiver Riddleresource_blob { 3106ab2bcffSRiver Riddle alignment: varint, 3116ab2bcffSRiver Riddle size: varint, 3126ab2bcffSRiver Riddle padding: byte[], 3136ab2bcffSRiver Riddle blob: byte[] 3146ab2bcffSRiver Riddle} 3156ab2bcffSRiver Riddle 3166ab2bcffSRiver Riddleresource_offset_section { 3176ab2bcffSRiver Riddle numExternalResourceGroups: varint, 3186ab2bcffSRiver Riddle resourceGroups: resource_group[] 3196ab2bcffSRiver Riddle} 3206ab2bcffSRiver Riddleresource_group { 3216ab2bcffSRiver Riddle key: varint, 3226ab2bcffSRiver Riddle numResources: varint, 3236ab2bcffSRiver Riddle resources: resource_info[] 3246ab2bcffSRiver Riddle} 3256ab2bcffSRiver Riddleresource_info { 3266ab2bcffSRiver Riddle key: varint, 3276ab2bcffSRiver Riddle size: varint 3286ab2bcffSRiver Riddle kind: byte, 3296ab2bcffSRiver Riddle} 3306ab2bcffSRiver Riddle``` 3316ab2bcffSRiver Riddle 3326ab2bcffSRiver RiddleResources are grouped by the provider, either an external entity or a dialect, 3336ab2bcffSRiver Riddlewith each `resource_group` in the offset section containing the corresponding 3346ab2bcffSRiver Riddleprovider, number of elements, and info for each element within the group. For 3356ab2bcffSRiver Riddleeach element, we record the key, the value kind, and the encoded size. We avoid 3366ab2bcffSRiver Riddleusing the direct offset into the `resource_section`, as a smaller relative 3376ab2bcffSRiver Riddleoffsets provides more effective compression. 3386ab2bcffSRiver Riddle 339f3acb54cSRiver Riddle### IR Section 340f3acb54cSRiver Riddle 341f3acb54cSRiver RiddleThe IR section contains the encoded form of operations within the bytecode. 342f3acb54cSRiver Riddle 3433128b310SMehdi Amini``` 3443128b310SMehdi Aminiir_section { 3453128b310SMehdi Amini block: block; // Single block without arguments. 3463128b310SMehdi Amini} 3473128b310SMehdi Amini``` 3483128b310SMehdi Amini 349f3acb54cSRiver Riddle#### Operation Encoding 350f3acb54cSRiver Riddle 351f3acb54cSRiver Riddle``` 352f3acb54cSRiver Riddleop { 353f3acb54cSRiver Riddle name: varint, 354f3acb54cSRiver Riddle encodingMask: byte, 355f3acb54cSRiver Riddle location: varint, 356f3acb54cSRiver Riddle 357f3acb54cSRiver Riddle attrDict: varint?, 358f3acb54cSRiver Riddle 359f3acb54cSRiver Riddle numResults: varint?, 360f3acb54cSRiver Riddle resultTypes: varint[], 361f3acb54cSRiver Riddle 362f3acb54cSRiver Riddle numOperands: varint?, 363f3acb54cSRiver Riddle operands: varint[], 364f3acb54cSRiver Riddle 365f3acb54cSRiver Riddle numSuccessors: varint?, 366f3acb54cSRiver Riddle successors: varint[], 367f3acb54cSRiver Riddle 36861278191SMatteo Franciolini numUseListOrders: varint?, 36961278191SMatteo Franciolini useListOrders: uselist[], 37061278191SMatteo Franciolini 371f3acb54cSRiver Riddle regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove) 3723128b310SMehdi Amini 3733128b310SMehdi Amini // regions are stored in a section if isIsolatedFromAbove 3743128b310SMehdi Amini regions: (region | region_section)[] 375f3acb54cSRiver Riddle} 37661278191SMatteo Franciolini 37761278191SMatteo Francioliniuselist { 37861278191SMatteo Franciolini indexInRange: varint?, 37961278191SMatteo Franciolini useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding) 38061278191SMatteo Franciolini indices: varint[] 38161278191SMatteo Franciolini} 382f3acb54cSRiver Riddle``` 383f3acb54cSRiver Riddle 384f3acb54cSRiver RiddleThe encoding of an operation is important because this is generally the most 385f3acb54cSRiver Riddlecommonly appearing structure in the bytecode. A single encoding is used for 386*5795f9e2SMatthias Springerevery type of operation. Given this prevalence, many of the fields of an 387f3acb54cSRiver Riddleoperation are optional. The `encodingMask` field is a bitmask which indicates 388f3acb54cSRiver Riddlewhich of the components of the operation are present. 389f3acb54cSRiver Riddle 390f3acb54cSRiver Riddle##### Location 391f3acb54cSRiver Riddle 392f3acb54cSRiver RiddleThe location is encoded as the index of the location within the attribute table. 393f3acb54cSRiver Riddle 394f3acb54cSRiver Riddle##### Attributes 395f3acb54cSRiver Riddle 396f3acb54cSRiver RiddleIf the operation has attribues, the index of the operation attribute dictionary 397f3acb54cSRiver Riddlewithin the attribute table is encoded. 398f3acb54cSRiver Riddle 399f3acb54cSRiver Riddle##### Results 400f3acb54cSRiver Riddle 401f3acb54cSRiver RiddleIf the operation has results, the number of results and the indexes of the 402f3acb54cSRiver Riddleresult types within the type table are encoded. 403f3acb54cSRiver Riddle 404f3acb54cSRiver Riddle##### Operands 405f3acb54cSRiver Riddle 406f3acb54cSRiver RiddleIf the operation has operands, the number of operands and the value index of 407f3acb54cSRiver Riddleeach operand is encoded. This value index is the relative ordering of the 408f3acb54cSRiver Riddledefinition of that value from the start of the first ancestor isolated region. 409f3acb54cSRiver Riddle 410f3acb54cSRiver Riddle##### Successors 411f3acb54cSRiver Riddle 412f3acb54cSRiver RiddleIf the operation has successors, the number of successors and the indexes of the 413f3acb54cSRiver Riddlesuccessor blocks within the parent region are encoded. 414f3acb54cSRiver Riddle 41561278191SMatteo Franciolini##### Use-list orders 41661278191SMatteo Franciolini 41761278191SMatteo FrancioliniThe reference use-list order is assumed to be the reverse of the global 41861278191SMatteo Franciolinienumeration of all the op operands that one would obtain with a pre-order walk 41961278191SMatteo Francioliniof the IR. This order is naturally obtained by building blocks of operations 42061278191SMatteo Francioliniop-by-op. However, some transformations may shuffle the use-lists with respect 42161278191SMatteo Franciolinito this reference ordering. If any of the results of the operation have a 42261278191SMatteo Francioliniuse-list order that is not sorted with respect to the reference use-list order, 42361278191SMatteo Franciolinian encoding is emitted such that it is possible to reconstruct such order after 42461278191SMatteo Francioliniparsing the bytecode. The encoding represents an index map from the reference 42561278191SMatteo Franciolinioperand order to the current use-list order. A bit flag is used to detect if 42661278191SMatteo Franciolinithis encoding is of type index-pair or not. When the bit flag is set to zero, 42761278191SMatteo Franciolinithe element at `i` represent the position of the use `i` of the reference list 42861278191SMatteo Francioliniinto the current use-list. When the bit flag is set to `1`, the encoding 42961278191SMatteo Franciolinirepresent index pairs `(i, j)`, which indicate that the use at position `i` of 43061278191SMatteo Franciolinithe reference list is mapped to position `j` in the current use-list. When only 43161278191SMatteo Francioliniless than half of the elements in the current use-list are shuffled with respect 43261278191SMatteo Franciolinito the reference use-list, the index-pair encoding is used to reduce the 43361278191SMatteo Franciolinibytecode memory requirements. 43461278191SMatteo Franciolini 435f3acb54cSRiver Riddle##### Regions 436f3acb54cSRiver Riddle 437f3acb54cSRiver RiddleIf the operation has regions, the number of regions and if the regions are 438f3acb54cSRiver Riddleisolated from above are encoded together in a single varint. Afterwards, each 439f3acb54cSRiver Riddleregion is encoded inline. 440f3acb54cSRiver Riddle 441f3acb54cSRiver Riddle#### Region Encoding 442f3acb54cSRiver Riddle 443f3acb54cSRiver Riddle``` 444f3acb54cSRiver Riddleregion { 445f3acb54cSRiver Riddle numBlocks: varint, 446f3acb54cSRiver Riddle 447f3acb54cSRiver Riddle numValues: varint?, 448f3acb54cSRiver Riddle blocks: block[] 449f3acb54cSRiver Riddle} 450f3acb54cSRiver Riddle``` 451f3acb54cSRiver Riddle 452f3acb54cSRiver RiddleA region is encoded first with the number of blocks within. If the region is 453f3acb54cSRiver Riddlenon-empty, the number of values defined directly within the region are encoded, 454f3acb54cSRiver Riddlefollowed by the blocks of the region. 455f3acb54cSRiver Riddle 456f3acb54cSRiver Riddle#### Block Encoding 457f3acb54cSRiver Riddle 458f3acb54cSRiver Riddle``` 459f3acb54cSRiver Riddleblock { 460f3acb54cSRiver Riddle encoding: varint, // (numOps << 1) | (hasBlockArgs) 461f3acb54cSRiver Riddle arguments: block_arguments?, // Optional based on encoding 462f3acb54cSRiver Riddle ops : op[] 463f3acb54cSRiver Riddle} 464f3acb54cSRiver Riddle 465f3acb54cSRiver Riddleblock_arguments { 466f3acb54cSRiver Riddle numArgs: varint?, 467f3acb54cSRiver Riddle args: block_argument[] 46861278191SMatteo Franciolini numUseListOrders: varint?, 46961278191SMatteo Franciolini useListOrders: uselist[], 470f3acb54cSRiver Riddle} 471f3acb54cSRiver Riddle 472f3acb54cSRiver Riddleblock_argument { 4731826fadbSJacques Pienaar typeAndLocation: varint, // (type << 1) | (hasLocation) 4741826fadbSJacques Pienaar location: varint? // Optional, else unknown location 475f3acb54cSRiver Riddle} 476f3acb54cSRiver Riddle``` 477f3acb54cSRiver Riddle 478f3acb54cSRiver RiddleA block is encoded with an array of operations and block arguments. The first 479f3acb54cSRiver Riddlefield is an encoding that combines the number of operations in the block, with a 480f3acb54cSRiver Riddleflag indicating if the block has arguments. 48161278191SMatteo Franciolini 48261278191SMatteo FrancioliniUse-list orders are attached to block arguments similarly to how they are 48361278191SMatteo Francioliniattached to operation results. 484