xref: /llvm-project/mlir/docs/BytecodeFormat.md (revision 5795f9e27390fe5c322853476b3c9ba8376e8541)
1f3acb54cSRiver Riddle# MLIR Bytecode Format
2f3acb54cSRiver Riddle
3*5795f9e2SMatthias SpringerThis document describes the MLIR bytecode format and its encoding.
4450462cbSMehdi AminiThis format is versioned and stable: we don't plan to ever break
5*5795f9e2SMatthias Springercompatibility, that is a dialect should be able to deserialize any
6*5795f9e2SMatthias Springerolder bytecode. Similarly, we support back-deployment so that an
7*5795f9e2SMatthias Springerolder version of the format can be targetted.
8450462cbSMehdi Amini
9450462cbSMehdi AminiThat said, it is important to realize that the promises of the
10450462cbSMehdi Aminibytecode format are made assuming immutable dialects: the format
11450462cbSMehdi Aminiallows backward and forward compatibility, but only when nothing
12450462cbSMehdi Aminiin a dialect changes (operations, types, attributes definitions).
13450462cbSMehdi Amini
14450462cbSMehdi AminiA dialect can opt-in to handle its own versioning through the
15450462cbSMehdi Amini`BytecodeDialectInterface`. Some hooks are exposed to the dialect
16450462cbSMehdi Aminito allow managing a version encoded into the bytecode file. The
17450462cbSMehdi Aminiversion is loaded lazily and allows to retrieve the version
18450462cbSMehdi Aminiinformation while decoding the input IR, and gives an opportunity
19450462cbSMehdi Aminito each dialect for which a version is present to perform IR
20450462cbSMehdi Aminiupgrades post-parsing through the `upgradeFromVersion` method.
21450462cbSMehdi AminiThere is no restriction on what kind of information a dialect
22*5795f9e2SMatthias Springeris allowed to encode to model its versioning.
23f3acb54cSRiver Riddle
24f3acb54cSRiver Riddle[TOC]
25f3acb54cSRiver Riddle
26f3acb54cSRiver Riddle## Magic Number
27f3acb54cSRiver Riddle
280e0b6070SMatteo FrancioliniMLIR uses the following four-byte magic number to
290e0b6070SMatteo Francioliniindicate bytecode files:
30f3acb54cSRiver Riddle
31f3acb54cSRiver Riddle'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'
32f3acb54cSRiver Riddle
33f3acb54cSRiver RiddleIn hex:
34f3acb54cSRiver Riddle
35f3acb54cSRiver Riddle'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]'
36f3acb54cSRiver Riddle
37f3acb54cSRiver Riddle## Format Overview
38f3acb54cSRiver Riddle
39f3acb54cSRiver RiddleAn MLIR Bytecode file is comprised of a byte stream, with a few simple
40f3acb54cSRiver Riddlestructural concepts layered on top.
41f3acb54cSRiver Riddle
42f3acb54cSRiver Riddle### Primitives
43f3acb54cSRiver Riddle
44f3acb54cSRiver Riddle#### Fixed-Width Integers
45f3acb54cSRiver Riddle
46f3acb54cSRiver Riddle```
47f3acb54cSRiver Riddle  byte ::= `0x00`...`0xFF`
48f3acb54cSRiver Riddle```
49f3acb54cSRiver Riddle
50f3acb54cSRiver RiddleFixed width integers are unsigned integers of a known byte size. The values are
51f3acb54cSRiver Riddlestored in little-endian byte order.
52f3acb54cSRiver Riddle
53f3acb54cSRiver RiddleTODO: Add larger fixed width integers as necessary.
54f3acb54cSRiver Riddle
55f3acb54cSRiver Riddle#### Variable-Width Integers
56f3acb54cSRiver Riddle
57f3acb54cSRiver RiddleVariable width integers, or `VarInt`s, provide a compact representation for
58f3acb54cSRiver Riddleintegers. Each encoded VarInt consists of one to nine bytes, which together
59f3acb54cSRiver Riddlerepresent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
60f3acb54cSRiver Riddleencoding for VarInts. This encoding is a variant of the
61f3acb54cSRiver Riddle[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
62f3acb54cSRiver Riddleencoding, where each byte of the encoding provides up to 7 bits for the value,
63f3acb54cSRiver Riddlewith the remaining bit used to store a tag indicating the number of bytes used
64f3acb54cSRiver Riddlefor the encoding. This means that small unsigned integers (less than 2^7) may be
65f3acb54cSRiver Riddlestored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
66f3acb54cSRiver Riddleetc.
67f3acb54cSRiver Riddle
68f3acb54cSRiver RiddleThe first byte of the encoding includes a length prefix in the low bits. This
69f3acb54cSRiver Riddleprefix is a bit sequence of '0's followed by a terminal '1', or the end of the
70f3acb54cSRiver Riddlebyte. The number of '0' bits indicate the number of _additional_ bytes, not
71f3acb54cSRiver Riddleincluding the prefix byte, used to encode the value. All of the remaining bits
72f3acb54cSRiver Riddlein the first byte, along with all of the bits in the additional bytes, provide
73f3acb54cSRiver Riddlethe value of the integer. Below are the various possible encodings of the prefix
74f3acb54cSRiver Riddlebyte:
75f3acb54cSRiver Riddle
76f3acb54cSRiver Riddle```
77f3acb54cSRiver Riddlexxxxxxx1:  7 value bits, the encoding uses 1 byte
78f3acb54cSRiver Riddlexxxxxx10: 14 value bits, the encoding uses 2 bytes
79f3acb54cSRiver Riddlexxxxx100: 21 value bits, the encoding uses 3 bytes
80f3acb54cSRiver Riddlexxxx1000: 28 value bits, the encoding uses 4 bytes
81f3acb54cSRiver Riddlexxx10000: 35 value bits, the encoding uses 5 bytes
82f3acb54cSRiver Riddlexx100000: 42 value bits, the encoding uses 6 bytes
83f3acb54cSRiver Riddlex1000000: 49 value bits, the encoding uses 7 bytes
84f3acb54cSRiver Riddle10000000: 56 value bits, the encoding uses 8 bytes
85f3acb54cSRiver Riddle00000000: 64 value bits, the encoding uses 9 bytes
86f3acb54cSRiver Riddle```
87f3acb54cSRiver Riddle
882f90764cSRiver Riddle##### Signed Variable-Width Integers
892f90764cSRiver Riddle
902f90764cSRiver RiddleSigned variable width integer values are encoded in a similar fashion to
912f90764cSRiver Riddle[varints](#variable-width-integers), but employ
922f90764cSRiver Riddle[zigzag encoding](https://en.wikipedia.org/wiki/Variable-length_quantity#Zigzag_encoding).
932f90764cSRiver RiddleThis encoding uses the low bit of the value to indicate the sign, which allows
942f90764cSRiver Riddlefor more efficiently encoding negative numbers. If a negative value were encoded
952f90764cSRiver Riddleusing a normal [varint](#variable-width-integers), it would be treated as an
962f90764cSRiver Riddleextremely large unsigned value. Using zigzag encoding allows for a smaller
972f90764cSRiver Riddlenumber of active bits in the value, leading to a smaller encoding. Below is the
982f90764cSRiver Riddlebasic computation for generating a zigzag encoding:
992f90764cSRiver Riddle
1002f90764cSRiver Riddle```
1012f90764cSRiver Riddle(value << 1) ^ (value >> 63)
1022f90764cSRiver Riddle```
1032f90764cSRiver Riddle
104f3acb54cSRiver Riddle#### Strings
105f3acb54cSRiver Riddle
106f3acb54cSRiver RiddleStrings are blobs of characters with an associated length.
107f3acb54cSRiver Riddle
108f3acb54cSRiver Riddle### Sections
109f3acb54cSRiver Riddle
110f3acb54cSRiver Riddle```
111f3acb54cSRiver Riddlesection {
1126ab2bcffSRiver Riddle  idAndIsAligned: byte // id | (hasAlign << 7)
1136ab2bcffSRiver Riddle  length: varint,
1146ab2bcffSRiver Riddle
1156ab2bcffSRiver Riddle  alignment: varint?,
1166ab2bcffSRiver Riddle  padding: byte[], // Padding bytes are always `0xCB`.
1176ab2bcffSRiver Riddle
1186ab2bcffSRiver Riddle  data: byte[]
119f3acb54cSRiver Riddle}
120f3acb54cSRiver Riddle```
121f3acb54cSRiver Riddle
1226ab2bcffSRiver RiddleSections are a mechanism for grouping data within the bytecode. They enable
123f3acb54cSRiver Riddledelayed processing, which is useful for out-of-order processing of data,
1246ab2bcffSRiver Riddlelazy-loading, and more. Each section contains a Section ID, whose high bit
1256ab2bcffSRiver Riddleindicates if the section has alignment requirements, a length (which allows for
1266ab2bcffSRiver Riddleskipping over the section), and an optional alignment. When an alignment is
1276ab2bcffSRiver Riddlepresent, a variable number of padding bytes (0xCB) may appear before the section
1286ab2bcffSRiver Riddledata. The alignment of a section must be a power of 2.
129f3acb54cSRiver Riddle
130f3acb54cSRiver Riddle## MLIR Encoding
131f3acb54cSRiver Riddle
132f3acb54cSRiver RiddleGiven the generic structure of MLIR, the bytecode encoding is actually fairly
133f3acb54cSRiver Riddlesimplistic. It effectively maps to the core components of MLIR.
134f3acb54cSRiver Riddle
135f3acb54cSRiver Riddle### Top Level Structure
136f3acb54cSRiver Riddle
137f3acb54cSRiver RiddleThe top-level structure of the bytecode contains the 4-byte "magic number", a
138f3acb54cSRiver Riddleversion number, a null-terminated producer string, and a list of sections. Each
139f3acb54cSRiver Riddlesection is currently only expected to appear once within a bytecode file.
140f3acb54cSRiver Riddle
141f3acb54cSRiver Riddle```
142f3acb54cSRiver Riddlebytecode {
143f3acb54cSRiver Riddle  magic: "MLïR",
144f3acb54cSRiver Riddle  version: varint,
145f3acb54cSRiver Riddle  producer: string,
146f3acb54cSRiver Riddle  sections: section[]
147f3acb54cSRiver Riddle}
148f3acb54cSRiver Riddle```
149f3acb54cSRiver Riddle
150f3acb54cSRiver Riddle### String Section
151f3acb54cSRiver Riddle
152f3acb54cSRiver Riddle```
153f3acb54cSRiver Riddlestrings {
154f3acb54cSRiver Riddle  numStrings: varint,
155f3acb54cSRiver Riddle  reverseStringLengths: varint[],
156f3acb54cSRiver Riddle  stringData: byte[]
157f3acb54cSRiver Riddle}
158f3acb54cSRiver Riddle```
159f3acb54cSRiver Riddle
160f3acb54cSRiver RiddleThe string section contains a table of strings referenced within the bytecode,
161f3acb54cSRiver Riddlemore easily enabling string sharing. This section is encoded first with the
162f3acb54cSRiver Riddletotal number of strings, followed by the sizes of each of the individual strings
163f3acb54cSRiver Riddlein reverse order. The remaining encoding contains a single blob containing all
164f3acb54cSRiver Riddleof the strings concatenated together.
165f3acb54cSRiver Riddle
166f3acb54cSRiver Riddle### Dialect Section
167f3acb54cSRiver Riddle
168f3acb54cSRiver RiddleThe dialect section of the bytecode contains all of the dialects referenced
169f3acb54cSRiver Riddlewithin the encoded IR, and some information about the components of those
170f3acb54cSRiver Riddledialects that were also referenced.
171f3acb54cSRiver Riddle
172f3acb54cSRiver Riddle```
173f3acb54cSRiver Riddledialect_section {
174f3acb54cSRiver Riddle  numDialects: varint,
175*5795f9e2SMatthias Springer  dialectNames: dialect_name_group[],
176*5795f9e2SMatthias Springer  opNames: dialect_ops_group[]  // ops grouped by dialect
177f3acb54cSRiver Riddle}
178f3acb54cSRiver Riddle
179*5795f9e2SMatthias Springerdialect_name_group {
180*5795f9e2SMatthias Springer  nameAndIsVersioned: varint  // (dialectID << 1) | (hasVersion),
181*5795f9e2SMatthias Springer  version: dialect_version_section  // only if versioned
182f3acb54cSRiver Riddle}
1830e0b6070SMatteo Franciolini
1840e0b6070SMatteo Franciolinidialect_version_section {
1850e0b6070SMatteo Franciolini  size: varint,
1860e0b6070SMatteo Franciolini  version: byte[]
1870e0b6070SMatteo Franciolini}
1880e0b6070SMatteo Franciolini
189*5795f9e2SMatthias Springerdialect_ops_group {
190*5795f9e2SMatthias Springer  dialect: varint,
191*5795f9e2SMatthias Springer  numOpNames: varint,
192*5795f9e2SMatthias Springer  opNames: op_name_group[]
193*5795f9e2SMatthias Springer}
194*5795f9e2SMatthias Springer
195*5795f9e2SMatthias Springerop_name_group {
196*5795f9e2SMatthias Springer  nameAndIsRegistered: varint  // (nameID << 1) | (isRegisteredOp)
197*5795f9e2SMatthias Springer}
198f3acb54cSRiver Riddle```
199f3acb54cSRiver Riddle
2000e0b6070SMatteo FrancioliniDialects are encoded as a `varint` containing the index to the name string
2010e0b6070SMatteo Francioliniwithin the string section, plus a flag indicating whether the dialect is
2020e0b6070SMatteo Francioliniversioned. Operation names are encoded in groups by dialect, with each group
2030e0b6070SMatteo Franciolinicontaining the dialect, the number of operation names, and the array of indexes
2040e0b6070SMatteo Franciolinito each name within the string section. The version is encoded as a nested
205*5795f9e2SMatthias Springersection for each dialect.
206f3acb54cSRiver Riddle
207f3acb54cSRiver Riddle### Attribute/Type Sections
208f3acb54cSRiver Riddle
209f3acb54cSRiver RiddleAttributes and types are encoded using two [sections](#sections), one section
210f3acb54cSRiver Riddle(`attr_type_section`) containing the actual encoded representation, and another
211f3acb54cSRiver Riddlesection (`attr_type_offset_section`) containing the offsets of each encoded
212f3acb54cSRiver Riddleattribute/type into the previous section. This structure allows for attributes
213f3acb54cSRiver Riddleand types to always be lazily loaded on demand.
214f3acb54cSRiver Riddle
215f3acb54cSRiver Riddle```
216f3acb54cSRiver Riddleattr_type_section {
217f3acb54cSRiver Riddle  attrs: attribute[],
218f3acb54cSRiver Riddle  types: type[]
219f3acb54cSRiver Riddle}
220f3acb54cSRiver Riddleattr_type_offset_section {
221f3acb54cSRiver Riddle  numAttrs: varint,
222f3acb54cSRiver Riddle  numTypes: varint,
223f3acb54cSRiver Riddle  offsets: attr_type_offset_group[]
224f3acb54cSRiver Riddle}
225f3acb54cSRiver Riddle
226f3acb54cSRiver Riddleattr_type_offset_group {
227f3acb54cSRiver Riddle  dialect: varint,
228f3acb54cSRiver Riddle  numElements: varint,
229f3acb54cSRiver Riddle  offsets: varint[] // (offset << 1) | (hasCustomEncoding)
230f3acb54cSRiver Riddle}
231f3acb54cSRiver Riddle
232f3acb54cSRiver Riddleattribute {
233f3acb54cSRiver Riddle  encoding: ...
234f3acb54cSRiver Riddle}
235f3acb54cSRiver Riddletype {
236f3acb54cSRiver Riddle  encoding: ...
237f3acb54cSRiver Riddle}
238f3acb54cSRiver Riddle```
239f3acb54cSRiver Riddle
240f3acb54cSRiver RiddleEach `offset` in the `attr_type_offset_section` above is the size of the
241f3acb54cSRiver Riddleencoding for the attribute or type and a flag indicating if the encoding uses
242f3acb54cSRiver Riddlethe textual assembly format, or a custom bytecode encoding. We avoid using the
243f3acb54cSRiver Riddledirect offset into the `attr_type_section`, as a smaller relative offsets
244f3acb54cSRiver Riddleprovides more effective compression. Attributes and types are grouped by
245f3acb54cSRiver Riddledialect, with each `attr_type_offset_group` in the offset section containing the
246f3acb54cSRiver Riddlecorresponding parent dialect, number of elements, and offsets for each element
247f3acb54cSRiver Riddlewithin the group.
248f3acb54cSRiver Riddle
249f3acb54cSRiver Riddle#### Attribute/Type Encodings
250f3acb54cSRiver Riddle
251f3acb54cSRiver RiddleIn the abstract, an attribute/type is encoded in one of two possible ways: via
252f3acb54cSRiver Riddleits assembly format, or via a custom dialect defined encoding.
253f3acb54cSRiver Riddle
254f3acb54cSRiver Riddle##### Assembly Format Fallback
255f3acb54cSRiver Riddle
256f3acb54cSRiver RiddleIn the case where a dialect does not define a method for encoding the attribute
257f3acb54cSRiver Riddleor type, the textual assembly format of that attribute or type is used as a
258*5795f9e2SMatthias Springerfallback. For example, a type `!bytecode.type<42>` would be encoded as the null
259*5795f9e2SMatthias Springerterminated string "!bytecode.type<42>". This ensures that every attribute and
260*5795f9e2SMatthias Springertype can be encoded, even if the owning dialect has not yet opted in to a more
261f3acb54cSRiver Riddleefficient serialization.
262f3acb54cSRiver Riddle
263f3acb54cSRiver RiddleTODO: We shouldn't redundantly encode the dialect name here, we should use a
264f3acb54cSRiver Riddlereference to the parent dialect instead.
265f3acb54cSRiver Riddle
266f3acb54cSRiver Riddle##### Dialect Defined Encoding
267f3acb54cSRiver Riddle
268*5795f9e2SMatthias SpringerAs an alternative to the assembly format fallback, dialects may also provide a
269*5795f9e2SMatthias Springercustom encoding for their attributes and types. Custom encodings are very
270*5795f9e2SMatthias Springerbeneficial in that they are significantly smaller and faster to read and write.
27102c2ecb9SRiver Riddle
27202c2ecb9SRiver RiddleDialects can opt-in to providing custom encodings by implementing the
27302c2ecb9SRiver Riddle`BytecodeDialectInterface`. This interface provides hooks, namely
27402c2ecb9SRiver Riddle`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
27502c2ecb9SRiver Riddleby the bytecode reader and writer. These hooks are provided a reader and writer
27602c2ecb9SRiver Riddleimplementation that can be used to encode various constructs in the underlying
27702c2ecb9SRiver Riddlebytecode format. A unique feature of this interface is that dialects may choose
27802c2ecb9SRiver Riddleto only encode a subset of their attributes and types in a custom bytecode
27902c2ecb9SRiver Riddleformat, which can simplify adding new or experimental components that aren't
28002c2ecb9SRiver Riddlefully baked.
28102c2ecb9SRiver Riddle
28202c2ecb9SRiver RiddleWhen implementing the bytecode interface, dialects are responsible for all
28302c2ecb9SRiver Riddleaspects of the encoding. This includes the indicator for which kind of attribute
28402c2ecb9SRiver Riddleor type is being encoded; the bytecode reader will only know that it has
28502c2ecb9SRiver Riddleencountered an attribute or type of a given dialect, it doesn't encode any
28602c2ecb9SRiver Riddlefurther information. As such, a common encoding idiom is to use a leading
28702c2ecb9SRiver Riddle`varint` code to indicate how the attribute or type was encoded.
288f3acb54cSRiver Riddle
2896ab2bcffSRiver Riddle### Resource Section
2906ab2bcffSRiver Riddle
2916ab2bcffSRiver RiddleResources are encoded using two [sections](#sections), one section
2926ab2bcffSRiver Riddle(`resource_section`) containing the actual encoded representation, and another
2936ab2bcffSRiver Riddlesection (`resource_offset_section`) containing the offsets of each encoded
2946ab2bcffSRiver Riddleresource into the previous section.
2956ab2bcffSRiver Riddle
2966ab2bcffSRiver Riddle```
2976ab2bcffSRiver Riddleresource_section {
2986ab2bcffSRiver Riddle  resources: resource[]
2996ab2bcffSRiver Riddle}
3006ab2bcffSRiver Riddleresource {
3016ab2bcffSRiver Riddle  value: resource_bool | resource_string | resource_blob
3026ab2bcffSRiver Riddle}
3036ab2bcffSRiver Riddleresource_bool {
3046ab2bcffSRiver Riddle  value: byte
3056ab2bcffSRiver Riddle}
3066ab2bcffSRiver Riddleresource_string {
3076ab2bcffSRiver Riddle  value: varint
3086ab2bcffSRiver Riddle}
3096ab2bcffSRiver Riddleresource_blob {
3106ab2bcffSRiver Riddle  alignment: varint,
3116ab2bcffSRiver Riddle  size: varint,
3126ab2bcffSRiver Riddle  padding: byte[],
3136ab2bcffSRiver Riddle  blob: byte[]
3146ab2bcffSRiver Riddle}
3156ab2bcffSRiver Riddle
3166ab2bcffSRiver Riddleresource_offset_section {
3176ab2bcffSRiver Riddle  numExternalResourceGroups: varint,
3186ab2bcffSRiver Riddle  resourceGroups: resource_group[]
3196ab2bcffSRiver Riddle}
3206ab2bcffSRiver Riddleresource_group {
3216ab2bcffSRiver Riddle  key: varint,
3226ab2bcffSRiver Riddle  numResources: varint,
3236ab2bcffSRiver Riddle  resources: resource_info[]
3246ab2bcffSRiver Riddle}
3256ab2bcffSRiver Riddleresource_info {
3266ab2bcffSRiver Riddle  key: varint,
3276ab2bcffSRiver Riddle  size: varint
3286ab2bcffSRiver Riddle  kind: byte,
3296ab2bcffSRiver Riddle}
3306ab2bcffSRiver Riddle```
3316ab2bcffSRiver Riddle
3326ab2bcffSRiver RiddleResources are grouped by the provider, either an external entity or a dialect,
3336ab2bcffSRiver Riddlewith each `resource_group` in the offset section containing the corresponding
3346ab2bcffSRiver Riddleprovider, number of elements, and info for each element within the group. For
3356ab2bcffSRiver Riddleeach element, we record the key, the value kind, and the encoded size. We avoid
3366ab2bcffSRiver Riddleusing the direct offset into the `resource_section`, as a smaller relative
3376ab2bcffSRiver Riddleoffsets provides more effective compression.
3386ab2bcffSRiver Riddle
339f3acb54cSRiver Riddle### IR Section
340f3acb54cSRiver Riddle
341f3acb54cSRiver RiddleThe IR section contains the encoded form of operations within the bytecode.
342f3acb54cSRiver Riddle
3433128b310SMehdi Amini```
3443128b310SMehdi Aminiir_section {
3453128b310SMehdi Amini  block: block; // Single block without arguments.
3463128b310SMehdi Amini}
3473128b310SMehdi Amini```
3483128b310SMehdi Amini
349f3acb54cSRiver Riddle#### Operation Encoding
350f3acb54cSRiver Riddle
351f3acb54cSRiver Riddle```
352f3acb54cSRiver Riddleop {
353f3acb54cSRiver Riddle  name: varint,
354f3acb54cSRiver Riddle  encodingMask: byte,
355f3acb54cSRiver Riddle  location: varint,
356f3acb54cSRiver Riddle
357f3acb54cSRiver Riddle  attrDict: varint?,
358f3acb54cSRiver Riddle
359f3acb54cSRiver Riddle  numResults: varint?,
360f3acb54cSRiver Riddle  resultTypes: varint[],
361f3acb54cSRiver Riddle
362f3acb54cSRiver Riddle  numOperands: varint?,
363f3acb54cSRiver Riddle  operands: varint[],
364f3acb54cSRiver Riddle
365f3acb54cSRiver Riddle  numSuccessors: varint?,
366f3acb54cSRiver Riddle  successors: varint[],
367f3acb54cSRiver Riddle
36861278191SMatteo Franciolini  numUseListOrders: varint?,
36961278191SMatteo Franciolini  useListOrders: uselist[],
37061278191SMatteo Franciolini
371f3acb54cSRiver Riddle  regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)
3723128b310SMehdi Amini
3733128b310SMehdi Amini  // regions are stored in a section if isIsolatedFromAbove
3743128b310SMehdi Amini  regions: (region | region_section)[]
375f3acb54cSRiver Riddle}
37661278191SMatteo Franciolini
37761278191SMatteo Francioliniuselist {
37861278191SMatteo Franciolini  indexInRange: varint?,
37961278191SMatteo Franciolini  useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding)
38061278191SMatteo Franciolini  indices: varint[]
38161278191SMatteo Franciolini}
382f3acb54cSRiver Riddle```
383f3acb54cSRiver Riddle
384f3acb54cSRiver RiddleThe encoding of an operation is important because this is generally the most
385f3acb54cSRiver Riddlecommonly appearing structure in the bytecode. A single encoding is used for
386*5795f9e2SMatthias Springerevery type of operation. Given this prevalence, many of the fields of an
387f3acb54cSRiver Riddleoperation are optional. The `encodingMask` field is a bitmask which indicates
388f3acb54cSRiver Riddlewhich of the components of the operation are present.
389f3acb54cSRiver Riddle
390f3acb54cSRiver Riddle##### Location
391f3acb54cSRiver Riddle
392f3acb54cSRiver RiddleThe location is encoded as the index of the location within the attribute table.
393f3acb54cSRiver Riddle
394f3acb54cSRiver Riddle##### Attributes
395f3acb54cSRiver Riddle
396f3acb54cSRiver RiddleIf the operation has attribues, the index of the operation attribute dictionary
397f3acb54cSRiver Riddlewithin the attribute table is encoded.
398f3acb54cSRiver Riddle
399f3acb54cSRiver Riddle##### Results
400f3acb54cSRiver Riddle
401f3acb54cSRiver RiddleIf the operation has results, the number of results and the indexes of the
402f3acb54cSRiver Riddleresult types within the type table are encoded.
403f3acb54cSRiver Riddle
404f3acb54cSRiver Riddle##### Operands
405f3acb54cSRiver Riddle
406f3acb54cSRiver RiddleIf the operation has operands, the number of operands and the value index of
407f3acb54cSRiver Riddleeach operand is encoded. This value index is the relative ordering of the
408f3acb54cSRiver Riddledefinition of that value from the start of the first ancestor isolated region.
409f3acb54cSRiver Riddle
410f3acb54cSRiver Riddle##### Successors
411f3acb54cSRiver Riddle
412f3acb54cSRiver RiddleIf the operation has successors, the number of successors and the indexes of the
413f3acb54cSRiver Riddlesuccessor blocks within the parent region are encoded.
414f3acb54cSRiver Riddle
41561278191SMatteo Franciolini##### Use-list orders
41661278191SMatteo Franciolini
41761278191SMatteo FrancioliniThe reference use-list order is assumed to be the reverse of the global
41861278191SMatteo Franciolinienumeration of all the op operands that one would obtain with a pre-order walk
41961278191SMatteo Francioliniof the IR. This order is naturally obtained by building blocks of operations
42061278191SMatteo Francioliniop-by-op. However, some transformations may shuffle the use-lists with respect
42161278191SMatteo Franciolinito this reference ordering. If any of the results of the operation have a
42261278191SMatteo Francioliniuse-list order that is not sorted with respect to the reference use-list order,
42361278191SMatteo Franciolinian encoding is emitted such that it is possible to reconstruct such order after
42461278191SMatteo Francioliniparsing the bytecode. The encoding represents an index map from the reference
42561278191SMatteo Franciolinioperand order to the current use-list order. A bit flag is used to detect if
42661278191SMatteo Franciolinithis encoding is of type index-pair or not. When the bit flag is set to zero,
42761278191SMatteo Franciolinithe element at `i` represent the position of the use `i` of the reference list
42861278191SMatteo Francioliniinto the current use-list. When the bit flag is set to `1`, the encoding
42961278191SMatteo Franciolinirepresent index pairs `(i, j)`, which indicate that the use at position `i` of
43061278191SMatteo Franciolinithe reference list is mapped to position `j` in the current use-list. When only
43161278191SMatteo Francioliniless than half of the elements in the current use-list are shuffled with respect
43261278191SMatteo Franciolinito the reference use-list, the index-pair encoding is used to reduce the
43361278191SMatteo Franciolinibytecode memory requirements.
43461278191SMatteo Franciolini
435f3acb54cSRiver Riddle##### Regions
436f3acb54cSRiver Riddle
437f3acb54cSRiver RiddleIf the operation has regions, the number of regions and if the regions are
438f3acb54cSRiver Riddleisolated from above are encoded together in a single varint. Afterwards, each
439f3acb54cSRiver Riddleregion is encoded inline.
440f3acb54cSRiver Riddle
441f3acb54cSRiver Riddle#### Region Encoding
442f3acb54cSRiver Riddle
443f3acb54cSRiver Riddle```
444f3acb54cSRiver Riddleregion {
445f3acb54cSRiver Riddle  numBlocks: varint,
446f3acb54cSRiver Riddle
447f3acb54cSRiver Riddle  numValues: varint?,
448f3acb54cSRiver Riddle  blocks: block[]
449f3acb54cSRiver Riddle}
450f3acb54cSRiver Riddle```
451f3acb54cSRiver Riddle
452f3acb54cSRiver RiddleA region is encoded first with the number of blocks within. If the region is
453f3acb54cSRiver Riddlenon-empty, the number of values defined directly within the region are encoded,
454f3acb54cSRiver Riddlefollowed by the blocks of the region.
455f3acb54cSRiver Riddle
456f3acb54cSRiver Riddle#### Block Encoding
457f3acb54cSRiver Riddle
458f3acb54cSRiver Riddle```
459f3acb54cSRiver Riddleblock {
460f3acb54cSRiver Riddle  encoding: varint, // (numOps << 1) | (hasBlockArgs)
461f3acb54cSRiver Riddle  arguments: block_arguments?, // Optional based on encoding
462f3acb54cSRiver Riddle  ops : op[]
463f3acb54cSRiver Riddle}
464f3acb54cSRiver Riddle
465f3acb54cSRiver Riddleblock_arguments {
466f3acb54cSRiver Riddle  numArgs: varint?,
467f3acb54cSRiver Riddle  args: block_argument[]
46861278191SMatteo Franciolini  numUseListOrders: varint?,
46961278191SMatteo Franciolini  useListOrders: uselist[],
470f3acb54cSRiver Riddle}
471f3acb54cSRiver Riddle
472f3acb54cSRiver Riddleblock_argument {
4731826fadbSJacques Pienaar  typeAndLocation: varint, // (type << 1) | (hasLocation)
4741826fadbSJacques Pienaar  location: varint? // Optional, else unknown location
475f3acb54cSRiver Riddle}
476f3acb54cSRiver Riddle```
477f3acb54cSRiver Riddle
478f3acb54cSRiver RiddleA block is encoded with an array of operations and block arguments. The first
479f3acb54cSRiver Riddlefield is an encoding that combines the number of operations in the block, with a
480f3acb54cSRiver Riddleflag indicating if the block has arguments.
48161278191SMatteo Franciolini
48261278191SMatteo FrancioliniUse-list orders are attached to block arguments similarly to how they are
48361278191SMatteo Francioliniattached to operation results.
484