========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
  This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human readable serialization format that is used to represent
LLVM's :ref:`machine specific intermediate representation
<machine code representation>`.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132

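To make the document structure concrete, the sketch below splits the text of
a MIR file into its YAML documents by looking only at the ``---`` and ``...``
markers. It is an illustration, not how llc reads MIR files: the function
name ``split_mir_documents`` and the sample text are made up for this example,
and no real YAML parsing is performed.

.. code-block:: python

    def split_mir_documents(text):
        """Split MIR text into YAML documents (simplified, no YAML parsing)."""
        documents, current = [], None
        for line in text.splitlines():
            if line.startswith("---"):
                # A '---' marker in column zero starts a new document.
                if current is not None:
                    documents.append("\n".join(current))
                current = [line]
            elif line.rstrip() == "..." and current is not None:
                # A '...' marker in column zero ends the current document.
                current.append(line)
                documents.append("\n".join(current))
                current = None
            elif current is not None:
                current.append(line)
            # Lines outside any document (such as RUN lines) are skipped.
        if current is not None:
            documents.append("\n".join(current))
        return documents


    # Hypothetical sample: an embedded IR module and one machine function.
    MIR_TEXT = """\
    --- |
      define void @f() { ret void }
    ...
    ---
    name: f
    body: |
      bb.0:
        RETQ
    ...
    """.replace("\n    ", "\n")

    docs = split_mir_documents(MIR_TEXT)

Here ``docs`` holds two entries: the embedded LLVM IR module and the machine
function named ``f``.
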
MIR Testing Guide
=================

You can use the MIR format for testing in two different ways:

- You can write MIR tests that invoke a single code generation pass using the
  ``-run-pass`` option in llc.

- You can use llc's ``-stop-after`` option with existing or new LLVM assembly
  tests and check the MIR output of a specific code generation pass.

Testing Individual Code Generation Passes
-----------------------------------------

The ``-run-pass`` option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the ``-stop-after`` or
``-stop-before`` option in llc. For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the ``-stop-after`` option, as it
runs just before the pass that we are trying to test:

   ``llc -stop-after=machine-cp bug-trigger.ll -o test.mir``

If the same pass is run multiple times, a run index can be appended to the
pass name, separated by a comma:

   ``llc -stop-after=dead-mi-elimination,1 bug-trigger.ll -o test.mir``

After generating the input MIR file, you'll have to add a run line that uses
the ``-run-pass`` option to it. In order to test the post register allocation
pseudo instruction expansion pass on X86-64, a run line like the one shown
below can be used:

    ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``

The MIR files are target dependent, so they have to be placed in the target
specific test directories (``test/CodeGen/TARGETNAME``). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.

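When input files are generated from a script, the ``-stop-after`` invocation
can be assembled programmatically. The helper below is a hypothetical
convenience function (it is not part of LLVM) that only builds the argument
list, using the options shown above; it does not run llc.

.. code-block:: python

    def stop_after_command(pass_name, source, output, run_index=None):
        """Build an llc argument list that stops after the given pass."""
        # The optional run index is appended to the pass name after a comma.
        stop = pass_name if run_index is None else f"{pass_name},{run_index}"
        return ["llc", f"-stop-after={stop}", source, "-o", output]


    cmd = stop_after_command("dead-mi-elimination", "bug-trigger.ll",
                             "test.mir", run_index=1)
    command_line = " ".join(cmd)

The resulting ``command_line`` matches the second invocation shown above.
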
Simplifying MIR files
^^^^^^^^^^^^^^^^^^^^^

The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose.
Tests are more accessible and future proof when simplified:

- Use the ``-simplify-mir`` option with llc.

- Machine function attributes often have default values or the test works just
  as well with default values. Typical candidates for this are: ``alignment:``,
  ``exposesReturnsTwice``, ``legalized``, ``regBankSelected``, ``selected``.
  The whole ``frameInfo`` section is often unnecessary if there is no special
  frame usage in the function. ``tracksRegLiveness`` on the other hand is often
  necessary for some passes that care about block livein lists.

- The (global) ``liveins:`` list is typically only interesting for early
  instruction selection passes and can be removed when testing later passes.
  The per-block ``liveins:`` on the other hand are necessary if
  ``tracksRegLiveness`` is true.

- Branch probability data in block ``successors:`` lists can be dropped if the
  test doesn't depend on it. Example:
  ``successors: %bb.1(0x40000000), %bb.2(0x40000000)`` can be replaced with
  ``successors: %bb.1, %bb.2``.

- MIR code contains a whole IR module. This is necessary because there are
  no equivalents in MIR for global variables, references to external functions,
  function attributes, metadata, or debug info. Instead some MIR data
  references the IR constructs. You can often remove them if the test doesn't
  depend on them.

- Alias Analysis is performed on IR values. These are referenced by memory
  operands in MIR. Example: ``:: (load 8 from %ir.foobar, !alias.scope !9)``.
  If the test doesn't depend on (good) alias analysis the references can be
  dropped: ``:: (load 8)``

- MIR blocks can reference IR blocks for debug printing, profile information
  or debug locations. Example: ``bb.42.myblock`` in MIR references the IR block
  ``myblock``. It is usually possible to drop the ``.myblock`` reference and
  simply use ``bb.42``.

- If there are no memory operands or blocks referencing the IR then the
  IR function can be replaced by a parameterless dummy function like
  ``define void @func() { ret void }``.

- It is possible to drop the whole IR section of the MIR file if it only
  contains dummy functions (see above). The .mir loader will create the
  IR functions automatically in this case.

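Some of these simplifications are mechanical. As one example, the sketch
below drops the parenthesized branch probability data from a ``successors:``
line. The function name is hypothetical and the regular expression is an
assumption that covers only the hexadecimal and decimal forms shown in this
document.

.. code-block:: python

    import re


    def drop_successor_probabilities(line):
        """Remove '(<probability>)' suffixes from a 'successors:' line."""
        if not line.lstrip().startswith("successors:"):
            return line  # Leave every other line untouched.
        return re.sub(r"\((?:0x[0-9a-fA-F]+|\d+)\)", "", line)


    simplified = drop_successor_probabilities(
        "  successors: %bb.1(0x40000000), %bb.2(0x40000000)")

After the call, ``simplified`` is ``"  successors: %bb.1, %bb.2"``.
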
.. _limitations:

Limitations
-----------

Currently the MIR format has several limitations in terms of which state it
can serialize:

- The target-specific state in the target-specific ``MachineFunctionInfo``
  subclasses isn't serialized at the moment.

- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
  SystemZ backends) aren't serialized at the moment.

- The ``MCSymbol`` machine operands don't support temporary or local symbols.

- A lot of the state in ``MachineModuleInfo`` isn't serialized. Only the CFI
  instructions and the variable debug information from MMI are serialized
  right now.

These limitations impose restrictions on what you can test with the MIR format.
For now, tests that depend on the state of temporary or local ``MCSymbol``
operands, or on the exception handling state in MMI, can't use the MIR format.
Likewise, tests that depend on the state of the target specific
``MachineFunctionInfo`` or ``MachineConstantPoolValue`` subclasses can't use
the MIR format at the moment.

High Level Structure
====================

.. _embedded-module:

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

       define i32 @inc(ptr %x) {
       entry:
         %0 = load i32, ptr %x
         %1 = add i32 %0, 1
         store i32 %1, ptr %x
         ret i32 %1
       }

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: text

     ---
     name:            inc
     tracksRegLiveness: true
     liveins:
       - { reg: '$rdi' }
     callSites:
       - { bb: 0, offset: 3, fwdArgRegs:
           - { arg: 0, reg: '$edi' } }
     body: |
       bb.0.entry:
         liveins: $rdi

         $eax = MOV32rm $rdi, 1, _, 0, _
         $eax = INC32r killed $eax, implicit-def dead $eflags
         MOV32mr killed $rdi, 1, _, 0, _, $eax
         CALL64pcrel32 @foo <regmask...>
         RETQ $eax
     ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of a function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

The attribute ``callSites`` is a representation of call site information which
keeps track of call instructions and registers used to transfer call arguments.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks that have an ID of zero and one:

.. code-block:: text

    bb.0:
      <instructions>
    bb.1:
      <instructions>

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: text

    bb.0.entry:       ; This block's name is "entry"
       <instructions>

The block's name should be identical to the name of the IR block that this
machine block is based on.

.. _block-references:

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax:

.. code-block:: text

    %bb.<id>

Example:

.. code-block:: llvm

    %bb.0

The following syntax is also supported, but the former syntax is preferred for
block references:

.. code-block:: text

    %bb.<id>[.<name>]

Example:

.. code-block:: llvm

    %bb.1.then

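Both reference forms can be recognized with one regular expression. The
sketch below is illustrative only and is not the MIR parser's implementation;
the helper name and the set of characters accepted in a block name are
assumptions made for this example.

.. code-block:: python

    import re

    # %bb.<id> with an optional .<name> suffix (name characters are assumed).
    BLOCK_REF = re.compile(r"%bb\.(?P<id>\d+)(?:\.(?P<name>[A-Za-z0-9_.$-]+))?")


    def parse_block_reference(text):
        """Return (id, name) for a block reference; name is None if absent."""
        match = BLOCK_REF.fullmatch(text)
        if match is None:
            raise ValueError(f"not a block reference: {text!r}")
        return int(match.group("id")), match.group("name")


    short_form = parse_block_reference("%bb.0")       # (0, None)
    named_form = parse_block_reference("%bb.1.then")  # (1, "then")

Since blocks are identified by their ID, a consumer of the result would
normally ignore the name component.
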
Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: text

    bb.0.entry:
      successors: %bb.1.then, %bb.2.else
      <instructions>
    bb.1.then:
      <instructions>
    bb.2.else:
      <instructions>

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: text

    bb.0.entry:
      successors: %bb.1.then(32), %bb.2.else(16)

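As a rough illustration of this syntax, the sketch below extracts the block
IDs and the optional decimal weights from a ``successors:`` line. The names
are hypothetical, and weights written in other forms (such as the hexadecimal
values printed by llc) are deliberately reported as ``None`` by this
simplified pattern.

.. code-block:: python

    import re

    # Block ID, an optional name that is skipped, and an optional weight.
    SUCCESSOR = re.compile(r"%bb\.(\d+)(?:\.[^\s,(]+)?(?:\((\d+)\))?")


    def parse_successors(line):
        """Return (block id, weight or None) pairs from a 'successors:' line."""
        _, _, operands = line.partition("successors:")
        return [(int(block), int(weight) if weight else None)
                for block, weight in SUCCESSOR.findall(operands)]


    weighted = parse_successors("successors: %bb.1.then(32), %bb.2.else(16)")

For the example above, ``weighted`` is ``[(1, 32), (2, 16)]``.
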
.. _bb-liveins:

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: text

    bb.0.entry:
      liveins: $edi, $esi

The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists - they are combined into
one list by the parser.

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad``,
``IsInlineAsmBrIndirectTarget`` and ``Alignment`` can be specified in brackets
after the block's definition:

.. code-block:: text

    bb.0.entry (address-taken):
      <instructions>
    bb.2.else (align 4):
      <instructions>
    bb.3(landing-pad, align 4):
      <instructions>
    bb.4 (inlineasm-br-indirect-target):
      <instructions>

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

``Alignment`` is specified in bytes, and must be a power of two.

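The power-of-two requirement can be checked with a single bit operation.
The function below is a hypothetical helper for scripts that generate or
rewrite MIR; it is not taken from LLVM.

.. code-block:: python

    def is_valid_alignment(value):
        """Return True if value is a positive power of two."""
        # A power of two has exactly one bit set, so clearing the lowest
        # set bit with value & (value - 1) must leave zero.
        return value > 0 and (value & (value - 1)) == 0


    align_4_ok = is_valid_alignment(4)
    align_6_ok = is_valid_alignment(6)

Here ``align_4_ok`` is true, while ``align_6_ok`` is false because 6 has two
bits set.
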
.. _mir-instructions:

Machine Instructions
--------------------

A machine instruction is composed of a name,
:ref:`machine operands <machine-operands>`,
:ref:`instruction flags <instruction-flags>`, and machine memory operands.

The instruction's name is usually specified before the operands. The example
below shows an instance of the X86 ``RETQ`` instruction with a single machine
operand:

.. code-block:: text

    RETQ $eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
defined register operands:

.. code-block:: text

    $sp, $fp, $lr = LDPXpost $sp, 2

The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
machine instructions.

.. _instruction-flags:

Instruction Flags
^^^^^^^^^^^^^^^^^

The flag ``frame-setup`` or ``frame-destroy`` can be specified before the
instruction's name:

.. code-block:: text

    $fp = frame-setup ADDXri $sp, 0, 0

.. code-block:: text

    $x21, $x20 = frame-destroy LDPXi $sp

.. _registers:

Bundled Instructions
^^^^^^^^^^^^^^^^^^^^

The syntax for bundled instructions is the following:

.. code-block:: text

    BUNDLE implicit-def $r0, implicit-def $r1, implicit $r2 {
      $r0 = SOME_OP $r2
      $r1 = ANOTHER_OP internal $r0
    }

The first instruction is often a bundle header. The instructions between ``{``
and ``}`` are bundled with the first instruction.

.. _mir-registers:

Registers
---------

Registers are one of the key primitives in the machine instructions
serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.

The physical registers are identified by their name and by the '$' prefix sigil.
They use the following syntax:

.. code-block:: text

    $<name>

The example below shows three X86 physical registers:

.. code-block:: text

    $eax
    $r15
    $eflags

The virtual registers are identified by their ID number and by the '%' sigil.
They use the following syntax:

.. code-block:: text

    %<id>

Example:

.. code-block:: text

    %0

The null registers are represented using an underscore ('``_``'). They can also be
represented using a '``$noreg``' named register, although the former syntax
is preferred.

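The three register forms can be told apart by their sigil. The sketch below
classifies a register token accordingly; it is a simplified illustration that
covers only the forms described in this section, and the function name and the
characters accepted in a physical register name are assumptions.

.. code-block:: python

    import re


    def classify_register(token):
        """Classify a register token as 'null', 'physical' or 'virtual'."""
        if token in ("_", "$noreg"):
            return "null"
        if re.fullmatch(r"\$[A-Za-z_][A-Za-z0-9_]*", token):
            return "physical"  # '$' sigil followed by a register name
        if re.fullmatch(r"%\d+", token):
            return "virtual"   # '%' sigil followed by an ID number
        raise ValueError(f"not a register: {token!r}")


    kinds = [classify_register(t) for t in ("$eax", "%0", "_", "$noreg")]

For these four tokens, ``kinds`` is
``["physical", "virtual", "null", "null"]``.
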
.. _machine-operands:

Machine Operands
----------------

There are eighteen different kinds of machine operands, and all of them can be
serialized.

Immediate Operands
^^^^^^^^^^^^^^^^^^

The immediate machine operands are untyped, 64-bit signed integers. The
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
immediate machine operand ``-42``:

.. code-block:: text

    $eax = MOV32ri -42

An immediate operand is also used to represent a subregister index when the
machine instruction has one of the following opcodes:

- ``EXTRACT_SUBREG``

- ``INSERT_SUBREG``

- ``REG_SEQUENCE``

- ``SUBREG_TO_REG``

In that case, the machine operand is printed using the target's name for the
subregister index.

For example, ``AArch64RegisterInfo.td`` contains the following definition:

.. code-block:: text

  def sub_32 : SubRegIndex<32>;

If the third operand is an immediate with the value ``15`` (a target-dependent
value), then, based on the instruction's opcode and the operand's index, the
operand will be printed as ``%subreg.sub_32``:

.. code-block:: text

    %1:gpr64 = SUBREG_TO_REG 0, %0, %subreg.sub_32

For integers wider than 64 bits, we use a special machine operand,
``MO_CImmediate``, which stores the immediate in a ``ConstantInt`` using an
``APInt`` (LLVM's arbitrary precision integers).

.. TODO: Describe the FPIMM immediate operands.

.. _register-operands:

Register Operands
^^^^^^^^^^^^^^^^^

The :ref:`register <registers>` primitive is used to represent the register
machine operands. The register operands can also have optional
:ref:`register flags <register-flags>`,
:ref:`a subregister index <subregister-indices>`,
and a reference to the tied register operand.
The full syntax of a register operand is shown below:

.. code-block:: text

    [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]

This example shows an instance of the X86 ``XOR32rr`` instruction that has
5 register operands with different register flags:

.. code-block:: text

  dead $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $al

.. _register-flags:

Register Flags
~~~~~~~~~~~~~~

The table below shows all of the possible register flags along with the
corresponding internal ``llvm::RegState`` representation:

..
   Keep this in sync with MachineInstrBuilder.h

.. list-table::
   :header-rows: 1

   * - Flag
     - Internal Value
     - Meaning

   * - ``implicit``
     - ``RegState::Implicit``
     - Not emitted register (e.g. carry, or temporary result).

   * - ``implicit-def``
     - ``RegState::ImplicitDefine``
     - ``implicit`` and ``def``

   * - ``def``
     - ``RegState::Define``
     - Register definition.

   * - ``dead``
     - ``RegState::Dead``
     - Unused definition.

   * - ``killed``
     - ``RegState::Kill``
     - The last use of a register.

   * - ``undef``
     - ``RegState::Undef``
     - Value of the register doesn't matter.

   * - ``internal``
     - ``RegState::InternalRead``
     - Register reads a value that is defined inside the same instruction or bundle.

   * - ``early-clobber``
     - ``RegState::EarlyClobber``
     - Register definition happens before uses.

   * - ``debug-use``
     - ``RegState::Debug``
     - Register 'use' is for debugging purposes.

   * - ``renamable``
     - ``RegState::Renamable``
     - Register that may be renamed.

.. _subregister-indices:

Subregister Indices
~~~~~~~~~~~~~~~~~~~

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the ``COPY``
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8
lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:

.. code-block:: text

    %1 = COPY %0:sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target's ``*RegisterInfo.td`` file.

Constant Pool Indices
^^^^^^^^^^^^^^^^^^^^^

A constant pool index (CPI) operand is printed using its index in the
function's ``MachineConstantPool`` and an offset.

For example, a CPI with the index 1 and offset 8:

.. code-block:: text

    %1:gr64 = MOV64ri %const.1 + 8

For a CPI with the index 0 and offset -12:

.. code-block:: text

    %1:gr64 = MOV64ri %const.0 - 12

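The printed form follows directly from the index and the sign of the offset.
The sketch below reproduces the two examples above; it is an illustration
rather than the MIR printer's code, the function name is hypothetical, and
omitting a zero offset is an assumption.

.. code-block:: python

    def format_constant_pool_index(index, offset=0):
        """Format a CPI operand as '%const.<index>' plus a signed offset."""
        operand = f"%const.{index}"
        if offset > 0:
            operand += f" + {offset}"
        elif offset < 0:
            operand += f" - {-offset}"
        # Assumption: a zero offset is not printed at all.
        return operand


    positive = format_constant_pool_index(1, 8)    # "%const.1 + 8"
    negative = format_constant_pool_index(0, -12)  # "%const.0 - 12"

A negative offset is printed as a subtraction of its magnitude, which is why
the second operand reads ``- 12`` instead of ``+ -12``.
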
A constant pool entry is bound to an LLVM IR ``Constant`` or a target-specific
``MachineConstantPoolValue``. When serializing all the function's constants the
following format is used:

.. code-block:: text

    constants:
      - id:               <index>
        value:            <value>
        alignment:        <alignment>
        isTargetSpecific: <target-specific>

where:
  - ``<index>`` is a 32-bit unsigned integer;
  - ``<value>`` is an `LLVM IR Constant
    <https://www.llvm.org/docs/LangRef.html#constants>`_;
  - ``<alignment>`` is a 32-bit unsigned integer specified in bytes, and must be
    a power of two;
  - ``<target-specific>`` is either true or false.

Example:

.. code-block:: text

    constants:
      - id:               0
        value:            'double 3.250000e+00'
        alignment:        8
      - id:               1
        value:            'g-(LPC0+8)'
        alignment:        4
        isTargetSpecific: true

Global Value Operands
^^^^^^^^^^^^^^^^^^^^^

The global value machine operands reference the global values from the
:ref:`embedded LLVM IR module <embedded-module>`.
The example below shows an instance of the X86 ``MOV64rm`` instruction that has
a global value operand named ``G``:

.. code-block:: text

    $rax = MOV64rm $rip, 1, _, @G, _

The named global values are represented using an identifier with the '@' prefix.
If the identifier doesn't match the regular expression
``[-a-zA-Z$._][-a-zA-Z$._0-9]*``, then this identifier must be quoted.

The unnamed global values are represented using an unsigned numeric value with
the '@' prefix, like in the following examples: ``@0``, ``@989``.

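The quoting rule for named global values can be checked directly with the
regular expression given above. The helper below is a hypothetical sketch: it
only reports whether a name needs quoting and does not attempt to produce the
quoted form.

.. code-block:: python

    import re

    # The identifier pattern quoted from the paragraph above.
    UNQUOTED_IDENTIFIER = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


    def needs_quoting(name):
        """Return True if a named global value's identifier must be quoted."""
        return UNQUOTED_IDENTIFIER.fullmatch(name) is None


    plain = needs_quoting("G")            # False
    spaced = needs_quoting("my global")   # True

The check applies to named global values only; unnamed values use the numeric
``@<number>`` form instead.
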
Target-dependent Index Operands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A target index operand is a target-specific index and an offset. The
target-specific index is printed using target-specific names and a positive or
negative offset.

For example, the ``amdgpu-constdata-start`` is associated with the index ``0``
in the AMDGPU backend. So if we have a target index operand with the index 0
and the offset 8:

.. code-block:: text

    $sgpr2 = S_ADD_U32 _, target-index(amdgpu-constdata-start) + 8, implicit-def _, implicit-def _

Jump-table Index Operands
^^^^^^^^^^^^^^^^^^^^^^^^^

A jump-table index operand with the index 0 is printed as follows:

.. code-block:: text

    tBR_JTr killed $r0, %jump-table.0

A machine jump-table entry contains a list of ``MachineBasicBlocks``. When
serializing all the function's jump-table entries, the following format is
used:

.. code-block:: text

    jumpTable:
      kind:             <kind>
      entries:
        - id:             <index>
          blocks:         [ <bbreference>, <bbreference>, ... ]

where ``<kind>`` describes how the jump table is represented and emitted
(plain address, relocations, PIC, etc.), each ``<index>`` is a 32-bit unsigned
integer, and ``blocks`` contains a list of
:ref:`machine basic block references <block-references>`.

Example:

.. code-block:: text

    jumpTable:
      kind:             inline
      entries:
        - id:             0
          blocks:         [ '%bb.3', '%bb.9', '%bb.4.d3' ]
        - id:             1
          blocks:         [ '%bb.7', '%bb.7', '%bb.4.d3', '%bb.5' ]

External Symbol Operands
^^^^^^^^^^^^^^^^^^^^^^^^

An external symbol operand is represented using an identifier with the ``&``
prefix. The identifier is surrounded by double quotes and escaped if it has
any special non-printable characters in it.

Example:

.. code-block:: text

    CALL64pcrel32 &__stack_chk_fail, csr_64, implicit $rsp, implicit-def $rsp

MCSymbol Operands
^^^^^^^^^^^^^^^^^

An ``MCSymbol`` operand holds a pointer to an ``MCSymbol``. For the
limitations of this operand in MIR, see :ref:`limitations <limitations>`.

The syntax is:

.. code-block:: text

    EH_LABEL <mcsymbol Ltmp1>

Debug Instruction Reference Operands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A debug instruction reference operand is a pair of indices, referring to an
instruction and an operand within that instruction respectively; see
:ref:`Instruction referencing locations <instruction-referencing-locations>`.

The example below uses a reference to Instruction 1, Operand 0:

.. code-block:: text

    DBG_INSTR_REF !123, !DIExpression(DW_OP_LLVM_arg, 0), dbg-instr-ref(1, 0), debug-location !456

CFIIndex Operands
^^^^^^^^^^^^^^^^^

A CFI Index operand holds an index into a per-function side-table,
``MachineFunction::getFrameInstructions()``, which references all the frame
instructions in a ``MachineFunction``. A ``CFI_INSTRUCTION`` may look like it
contains multiple operands, but the only operand it contains is the CFI Index.
The other operands are tracked by the ``MCCFIInstruction`` object.

The syntax is:

.. code-block:: text

    CFI_INSTRUCTION offset $w30, -16

which may be emitted later in the MC layer as:

.. code-block:: text

    .cfi_offset w30, -16

IntrinsicID Operands
^^^^^^^^^^^^^^^^^^^^

An Intrinsic ID operand contains a generic intrinsic ID or a target-specific ID.

The syntax for the ``returnaddress`` intrinsic is:

.. code-block:: text

   $x0 = COPY intrinsic(@llvm.returnaddress)

Predicate Operands
^^^^^^^^^^^^^^^^^^

A Predicate operand contains an IR predicate from ``CmpInst::Predicate``, like
``ICMP_EQ``, etc.

For an int eq predicate ``ICMP_EQ``, the syntax is:

.. code-block:: text

   %2:gpr(s32) = G_ICMP intpred(eq), %0, %1

.. TODO: Describe the parsers default behaviour when optional YAML attributes
   are missing.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the register mask machine operands.
.. TODO: Describe the frame information YAML mapping.
.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the block address machine operands.
.. TODO: Describe the syntax of the metadata machine operands, and the
   instructions debug location attribute.
.. TODO: Describe the syntax of the register live out machine operands.
.. TODO: Describe the syntax of the machine memory operands.

Comments
^^^^^^^^

Machine operands can have C/C++ style comments, which are annotations enclosed
between ``/*`` and ``*/`` to improve readability of e.g. immediate operands.
In the example below, the immediate operands ``14`` and ``0`` of the ARM
instructions ``tEOR`` and ``t2Bcc`` have been annotated with their condition
code (CC) definitions, i.e. the ``always`` and ``eq`` condition codes:

.. code-block:: text

  dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
  t2Bcc %bb.4, 0 /* CC::eq */, killed $cpsr

As these annotations are comments, they are ignored by the MI parser.
Comments can be added or customized by overriding InstrInfo's hook
``createMIROperandComment()``.

Debug-Info constructs
---------------------

Most of the debugging information in a MIR file is to be found in the metadata
of the embedded module. Within a machine function, that metadata is referred to
by various constructs to describe source locations and variable locations.

Source locations
^^^^^^^^^^^^^^^^

Every MIR instruction may optionally have a trailing reference to a
``DILocation`` metadata node, after all operands and symbols, but before
memory operands:

.. code-block:: text

   $rbp = MOV64rr $rdi, debug-location !12

The source location attachment is synonymous with the ``!dbg`` metadata
attachment in LLVM IR. The absence of a source location attachment will be
represented by an empty ``DebugLoc`` object in the machine instruction.

Fixed variable locations
^^^^^^^^^^^^^^^^^^^^^^^^

There are several ways of specifying variable locations. The simplest is
describing a variable that is permanently located on the stack. In the
``stack`` or ``fixedStack`` attribute of the machine function, the variable,
scope, and any qualifying location modifier are provided:

.. code-block:: text

    - { id: 0, name: offset.addr, offset: -24, size: 8, alignment: 8, stack-id: default,
        debug-info-variable: '!1', debug-info-expression: '!DIExpression()',
        debug-info-location: '!2' }

Where:

- ``debug-info-variable`` identifies a DILocalVariable metadata node,

- ``debug-info-expression`` adds qualifiers to the variable location,

- ``debug-info-location`` identifies a DILocation metadata node.

These metadata attributes correspond to the operands of a ``#dbg_declare``
IR debug record, see the :ref:`source level
debugging<debug_records>` documentation.

Varying variable locations
^^^^^^^^^^^^^^^^^^^^^^^^^^

Variables that are not always on the stack or change location are specified
with the ``DBG_VALUE`` meta machine instruction. It is synonymous with the
``#dbg_value`` IR record, and is written:

.. code-block:: text

    DBG_VALUE $rax, $noreg, !123, !DIExpression(), debug-location !456

The operands are, respectively:

1. A machine location such as a register, immediate, or frame index,

2. Either ``$noreg``, or the immediate value zero if an extra level of
   indirection is to be added to the first operand,

3. A reference to a ``DILocalVariable`` metadata node,

4. An expression qualifying the variable location, either inline or as a
   metadata node reference.

The source location identifies the ``DILocation`` for the scope of the
variable. The second operand (``IsIndirect``) is deprecated and to be deleted.
All additional qualifiers for the variable location should be made through the
expression metadata.

.. _instruction-referencing-locations:

Instruction referencing locations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This experimental feature aims to separate the specification of variable
*values* from the program point where a variable takes on that value. Changes
in variable value occur in the same manner as ``DBG_VALUE`` meta instructions
but using ``DBG_INSTR_REF``. Variable values are identified by a pair of
instruction number and operand number. Consider the example below:

.. code-block:: text

    $rbp = MOV64ri 0, debug-instr-number 1, debug-location !12
    DBG_INSTR_REF !123, !DIExpression(DW_OP_LLVM_arg, 0), dbg-instr-ref(1, 0), debug-location !456

Instruction numbers are directly attached to machine instructions with an
optional ``debug-instr-number`` attachment, before the optional
``debug-location`` attachment. The value defined in ``$rbp`` in the code
above would be identified by the pair ``<1, 0>``.

The third operand of the ``DBG_INSTR_REF`` above records the instruction
and operand number ``<1, 0>``, identifying the value defined by the ``MOV64ri``.
The first two operands to ``DBG_INSTR_REF`` are identical to those of
``DBG_VALUE_LIST``, and the position of the ``DBG_INSTR_REF`` records where
the variable takes on the designated value in the same way.

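The pairing between a ``debug-instr-number`` attachment and a
``dbg-instr-ref`` operand can be illustrated with a small text-based lookup.
The sketch below works on plain instruction strings and is not how LLVM
resolves these references; all names in it are hypothetical.

.. code-block:: python

    import re

    INSTR_NUMBER = re.compile(r"debug-instr-number (\d+)")
    INSTR_REF = re.compile(r"dbg-instr-ref\((\d+), (\d+)\)")


    def resolve_instr_refs(lines):
        """Return (instruction number, operand number, defining line) tuples."""
        # First pass: remember which line carries each instruction number.
        numbered = {}
        for line in lines:
            match = INSTR_NUMBER.search(line)
            if match:
                numbered[int(match.group(1))] = line
        # Second pass: look up every dbg-instr-ref operand.
        resolved = []
        for line in lines:
            for instr, operand in INSTR_REF.findall(line):
                resolved.append(
                    (int(instr), int(operand), numbered.get(int(instr))))
        return resolved


    BODY = [
        "$rbp = MOV64ri 0, debug-instr-number 1, debug-location !12",
        "DBG_INSTR_REF !123, !DIExpression(DW_OP_LLVM_arg, 0), "
        "dbg-instr-ref(1, 0), debug-location !456",
    ]
    refs = resolve_instr_refs(BODY)

For the example above, ``refs`` contains the single entry
``(1, 0, BODY[0])``, which links the reference to the ``MOV64ri`` line.
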
More information about how these constructs are used is available in
:doc:`InstrRefDebugInfo`. The related documents :doc:`SourceLevelDebugging` and
:doc:`HowToUpdateDebugInfo` may be useful as well.
946