109467b48Spatrick========================================== 209467b48SpatrickThe LLVM Target-Independent Code Generator 309467b48Spatrick========================================== 409467b48Spatrick 509467b48Spatrick.. role:: raw-html(raw) 609467b48Spatrick :format: html 709467b48Spatrick 809467b48Spatrick.. raw:: html 909467b48Spatrick 1009467b48Spatrick <style> 1109467b48Spatrick .unknown { background-color: #C0C0C0; text-align: center; } 1209467b48Spatrick .unknown:before { content: "?" } 1309467b48Spatrick .no { background-color: #C11B17 } 1409467b48Spatrick .no:before { content: "N" } 1509467b48Spatrick .partial { background-color: #F88017 } 1609467b48Spatrick .yes { background-color: #0F0; } 1709467b48Spatrick .yes:before { content: "Y" } 1809467b48Spatrick .na { background-color: #6666FF; } 1909467b48Spatrick .na:before { content: "N/A" } 2009467b48Spatrick </style> 2109467b48Spatrick 2209467b48Spatrick.. contents:: 2309467b48Spatrick :local: 2409467b48Spatrick 2509467b48Spatrick.. warning:: 2609467b48Spatrick This is a work in progress. 2709467b48Spatrick 2809467b48SpatrickIntroduction 2909467b48Spatrick============ 3009467b48Spatrick 3109467b48SpatrickThe LLVM target-independent code generator is a framework that provides a suite 3209467b48Spatrickof reusable components for translating the LLVM internal representation to the 3309467b48Spatrickmachine code for a specified target---either in assembly form (suitable for a 3409467b48Spatrickstatic compiler) or in binary machine code format (usable for a JIT 3509467b48Spatrickcompiler). The LLVM target-independent code generator consists of six main 3609467b48Spatrickcomponents: 3709467b48Spatrick 3809467b48Spatrick1. `Abstract target description`_ interfaces which capture important properties 3909467b48Spatrick about various aspects of the machine, independently of how they will be used. 4009467b48Spatrick These interfaces are defined in ``include/llvm/Target/``. 4109467b48Spatrick 4209467b48Spatrick2. Classes used to represent the `code being generated`_ for a target. These 4309467b48Spatrick classes are intended to be abstract enough to represent the machine code for 4409467b48Spatrick *any* target machine. These classes are defined in 4509467b48Spatrick ``include/llvm/CodeGen/``. At this level, concepts like "constant pool 4609467b48Spatrick entries" and "jump tables" are explicitly exposed. 4709467b48Spatrick 4809467b48Spatrick3. Classes and algorithms used to represent code at the object file level, the 4909467b48Spatrick `MC Layer`_. These classes represent assembly level constructs like labels, 5009467b48Spatrick sections, and instructions. At this level, concepts like "constant pool 5109467b48Spatrick entries" and "jump tables" don't exist. 5209467b48Spatrick 5309467b48Spatrick4. `Target-independent algorithms`_ used to implement various phases of native 5409467b48Spatrick code generation (register allocation, scheduling, stack frame representation, 5509467b48Spatrick etc). This code lives in ``lib/CodeGen/``. 5609467b48Spatrick 5709467b48Spatrick5. `Implementations of the abstract target description interfaces`_ for 5809467b48Spatrick particular targets. These machine descriptions make use of the components 5909467b48Spatrick provided by LLVM, and can optionally provide custom target-specific passes, 6009467b48Spatrick to build complete code generators for a specific target. Target descriptions 6109467b48Spatrick live in ``lib/Target/``. 6209467b48Spatrick 6309467b48Spatrick6. The target-independent JIT components. The LLVM JIT is completely target 6409467b48Spatrick independent (it uses the ``TargetJITInfo`` structure to interface for 6509467b48Spatrick target-specific issues. The code for the target-independent JIT lives in 6609467b48Spatrick ``lib/ExecutionEngine/JIT``. 6709467b48Spatrick 6809467b48SpatrickDepending on which part of the code generator you are interested in working on, 6909467b48Spatrickdifferent pieces of this will be useful to you. In any case, you should be 7009467b48Spatrickfamiliar with the `target description`_ and `machine code representation`_ 7109467b48Spatrickclasses. If you want to add a backend for a new target, you will need to 7209467b48Spatrick`implement the target description`_ classes for your new target and understand 7309467b48Spatrickthe :doc:`LLVM code representation <LangRef>`. If you are interested in 7409467b48Spatrickimplementing a new `code generation algorithm`_, it should only depend on the 7509467b48Spatricktarget-description and machine code representation classes, ensuring that it is 7609467b48Spatrickportable. 7709467b48Spatrick 7809467b48SpatrickRequired components in the code generator 7909467b48Spatrick----------------------------------------- 8009467b48Spatrick 8109467b48SpatrickThe two pieces of the LLVM code generator are the high-level interface to the 8209467b48Spatrickcode generator and the set of reusable components that can be used to build 8309467b48Spatricktarget-specific backends. The two most important interfaces (:raw-html:`<tt>` 8409467b48Spatrick`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `DataLayout`_ 8509467b48Spatrick:raw-html:`</tt>`) are the only ones that are required to be defined for a 8609467b48Spatrickbackend to fit into the LLVM system, but the others must be defined if the 8709467b48Spatrickreusable code generator components are going to be used. 8809467b48Spatrick 8909467b48SpatrickThis design has two important implications. The first is that LLVM can support 9009467b48Spatrickcompletely non-traditional code generation targets. For example, the C backend 9109467b48Spatrickdoes not require register allocation, instruction selection, or any of the other 9209467b48Spatrickstandard components provided by the system. As such, it only implements these 9309467b48Spatricktwo interfaces, and does its own thing. Note that C backend was removed from the 9409467b48Spatricktrunk since LLVM 3.1 release. Another example of a code generator like this is a 9509467b48Spatrick(purely hypothetical) backend that converts LLVM to the GCC RTL form and uses 9609467b48SpatrickGCC to emit machine code for a target. 9709467b48Spatrick 9809467b48SpatrickThis design also implies that it is possible to design and implement radically 9909467b48Spatrickdifferent code generators in the LLVM system that do not make use of any of the 10009467b48Spatrickbuilt-in components. Doing so is not recommended at all, but could be required 10109467b48Spatrickfor radically different targets that do not fit into the LLVM machine 10209467b48Spatrickdescription model: FPGAs for example. 10309467b48Spatrick 10409467b48Spatrick.. _high-level design of the code generator: 10509467b48Spatrick 10609467b48SpatrickThe high-level design of the code generator 10709467b48Spatrick------------------------------------------- 10809467b48Spatrick 10909467b48SpatrickThe LLVM target-independent code generator is designed to support efficient and 11009467b48Spatrickquality code generation for standard register-based microprocessors. Code 11109467b48Spatrickgeneration in this model is divided into the following stages: 11209467b48Spatrick 11309467b48Spatrick1. `Instruction Selection`_ --- This phase determines an efficient way to 11409467b48Spatrick express the input LLVM code in the target instruction set. This stage 11509467b48Spatrick produces the initial code for the program in the target instruction set, then 11609467b48Spatrick makes use of virtual registers in SSA form and physical registers that 11709467b48Spatrick represent any required register assignments due to target constraints or 11809467b48Spatrick calling conventions. This step turns the LLVM code into a DAG of target 11909467b48Spatrick instructions. 12009467b48Spatrick 12109467b48Spatrick2. `Scheduling and Formation`_ --- This phase takes the DAG of target 12209467b48Spatrick instructions produced by the instruction selection phase, determines an 12309467b48Spatrick ordering of the instructions, then emits the instructions as :raw-html:`<tt>` 12409467b48Spatrick `MachineInstr`_\s :raw-html:`</tt>` with that ordering. Note that we 12509467b48Spatrick describe this in the `instruction selection section`_ because it operates on 12609467b48Spatrick a `SelectionDAG`_. 12709467b48Spatrick 12809467b48Spatrick3. `SSA-based Machine Code Optimizations`_ --- This optional stage consists of a 12909467b48Spatrick series of machine-code optimizations that operate on the SSA-form produced by 13009467b48Spatrick the instruction selector. Optimizations like modulo-scheduling or peephole 13109467b48Spatrick optimization work here. 13209467b48Spatrick 13309467b48Spatrick4. `Register Allocation`_ --- The target code is transformed from an infinite 13409467b48Spatrick virtual register file in SSA form to the concrete register file used by the 13509467b48Spatrick target. This phase introduces spill code and eliminates all virtual register 13609467b48Spatrick references from the program. 13709467b48Spatrick 13809467b48Spatrick5. `Prolog/Epilog Code Insertion`_ --- Once the machine code has been generated 13909467b48Spatrick for the function and the amount of stack space required is known (used for 14009467b48Spatrick LLVM alloca's and spill slots), the prolog and epilog code for the function 14109467b48Spatrick can be inserted and "abstract stack location references" can be eliminated. 14209467b48Spatrick This stage is responsible for implementing optimizations like frame-pointer 14309467b48Spatrick elimination and stack packing. 14409467b48Spatrick 14509467b48Spatrick6. `Late Machine Code Optimizations`_ --- Optimizations that operate on "final" 14609467b48Spatrick machine code can go here, such as spill code scheduling and peephole 14709467b48Spatrick optimizations. 14809467b48Spatrick 14909467b48Spatrick7. `Code Emission`_ --- The final stage actually puts out the code for the 15009467b48Spatrick current function, either in the target assembler format or in machine 15109467b48Spatrick code. 15209467b48Spatrick 15309467b48SpatrickThe code generator is based on the assumption that the instruction selector will 15409467b48Spatrickuse an optimal pattern matching selector to create high-quality sequences of 15509467b48Spatricknative instructions. Alternative code generator designs based on pattern 15609467b48Spatrickexpansion and aggressive iterative peephole optimization are much slower. This 15709467b48Spatrickdesign permits efficient compilation (important for JIT environments) and 15809467b48Spatrickaggressive optimization (used when generating code offline) by allowing 15909467b48Spatrickcomponents of varying levels of sophistication to be used for any step of 16009467b48Spatrickcompilation. 16109467b48Spatrick 16209467b48SpatrickIn addition to these stages, target implementations can insert arbitrary 16309467b48Spatricktarget-specific passes into the flow. For example, the X86 target uses a 16409467b48Spatrickspecial pass to handle the 80x87 floating point stack architecture. Other 16509467b48Spatricktargets with unusual requirements can be supported with custom passes as needed. 16609467b48Spatrick 16709467b48SpatrickUsing TableGen for target description 16809467b48Spatrick------------------------------------- 16909467b48Spatrick 17009467b48SpatrickThe target description classes require a detailed description of the target 17109467b48Spatrickarchitecture. These target descriptions often have a large amount of common 17209467b48Spatrickinformation (e.g., an ``add`` instruction is almost identical to a ``sub`` 17309467b48Spatrickinstruction). In order to allow the maximum amount of commonality to be 17409467b48Spatrickfactored out, the LLVM code generator uses the 17509467b48Spatrick:doc:`TableGen/index` tool to describe big chunks of the 17609467b48Spatricktarget machine, which allows the use of domain-specific and target-specific 17709467b48Spatrickabstractions to reduce the amount of repetition. 17809467b48Spatrick 17909467b48SpatrickAs LLVM continues to be developed and refined, we plan to move more and more of 18009467b48Spatrickthe target description to the ``.td`` form. Doing so gives us a number of 18109467b48Spatrickadvantages. The most important is that it makes it easier to port LLVM because 18209467b48Spatrickit reduces the amount of C++ code that has to be written, and the surface area 18309467b48Spatrickof the code generator that needs to be understood before someone can get 18409467b48Spatricksomething working. Second, it makes it easier to change things. In particular, 18509467b48Spatrickif tables and other things are all emitted by ``tblgen``, we only need a change 18609467b48Spatrickin one place (``tblgen``) to update all of the targets to a new interface. 18709467b48Spatrick 18809467b48Spatrick.. _Abstract target description: 18909467b48Spatrick.. _target description: 19009467b48Spatrick 19109467b48SpatrickTarget description classes 19209467b48Spatrick========================== 19309467b48Spatrick 19409467b48SpatrickThe LLVM target description classes (located in the ``include/llvm/Target`` 19509467b48Spatrickdirectory) provide an abstract description of the target machine independent of 19609467b48Spatrickany particular client. These classes are designed to capture the *abstract* 19709467b48Spatrickproperties of the target (such as the instructions and registers it has), and do 19809467b48Spatricknot incorporate any particular pieces of code generation algorithms. 19909467b48Spatrick 20009467b48SpatrickAll of the target description classes (except the :raw-html:`<tt>` `DataLayout`_ 20109467b48Spatrick:raw-html:`</tt>` class) are designed to be subclassed by the concrete target 20209467b48Spatrickimplementation, and have virtual methods implemented. To get to these 20309467b48Spatrickimplementations, the :raw-html:`<tt>` `TargetMachine`_ :raw-html:`</tt>` class 20409467b48Spatrickprovides accessors that should be implemented by the target. 20509467b48Spatrick 20609467b48Spatrick.. _TargetMachine: 20709467b48Spatrick 20809467b48SpatrickThe ``TargetMachine`` class 20909467b48Spatrick--------------------------- 21009467b48Spatrick 21109467b48SpatrickThe ``TargetMachine`` class provides virtual methods that are used to access the 21209467b48Spatricktarget-specific implementations of the various target description classes via 21309467b48Spatrickthe ``get*Info`` methods (``getInstrInfo``, ``getRegisterInfo``, 21409467b48Spatrick``getFrameInfo``, etc.). This class is designed to be specialized by a concrete 21509467b48Spatricktarget implementation (e.g., ``X86TargetMachine``) which implements the various 21609467b48Spatrickvirtual methods. The only required target description class is the 21709467b48Spatrick:raw-html:`<tt>` `DataLayout`_ :raw-html:`</tt>` class, but if the code 21809467b48Spatrickgenerator components are to be used, the other interfaces should be implemented 21909467b48Spatrickas well. 22009467b48Spatrick 22109467b48Spatrick.. _DataLayout: 22209467b48Spatrick 22309467b48SpatrickThe ``DataLayout`` class 22409467b48Spatrick------------------------ 22509467b48Spatrick 22609467b48SpatrickThe ``DataLayout`` class is the only required target description class, and it 22709467b48Spatrickis the only class that is not extensible (you cannot derive a new class from 22809467b48Spatrickit). ``DataLayout`` specifies information about how the target lays out memory 22909467b48Spatrickfor structures, the alignment requirements for various data types, the size of 23009467b48Spatrickpointers in the target, and whether the target is little-endian or 23109467b48Spatrickbig-endian. 23209467b48Spatrick 23309467b48Spatrick.. _TargetLowering: 23409467b48Spatrick 23509467b48SpatrickThe ``TargetLowering`` class 23609467b48Spatrick---------------------------- 23709467b48Spatrick 23809467b48SpatrickThe ``TargetLowering`` class is used by SelectionDAG based instruction selectors 23909467b48Spatrickprimarily to describe how LLVM code should be lowered to SelectionDAG 24009467b48Spatrickoperations. Among other things, this class indicates: 24109467b48Spatrick 24209467b48Spatrick* an initial register class to use for various ``ValueType``\s, 24309467b48Spatrick 24409467b48Spatrick* which operations are natively supported by the target machine, 24509467b48Spatrick 24609467b48Spatrick* the return type of ``setcc`` operations, 24709467b48Spatrick 24809467b48Spatrick* the type to use for shift amounts, and 24909467b48Spatrick 25009467b48Spatrick* various high-level characteristics, like whether it is profitable to turn 25109467b48Spatrick division by a constant into a multiplication sequence. 25209467b48Spatrick 25309467b48Spatrick.. _TargetRegisterInfo: 25409467b48Spatrick 25509467b48SpatrickThe ``TargetRegisterInfo`` class 25609467b48Spatrick-------------------------------- 25709467b48Spatrick 25809467b48SpatrickThe ``TargetRegisterInfo`` class is used to describe the register file of the 25909467b48Spatricktarget and any interactions between the registers. 26009467b48Spatrick 26109467b48SpatrickRegisters are represented in the code generator by unsigned integers. Physical 26209467b48Spatrickregisters (those that actually exist in the target description) are unique 26309467b48Spatricksmall numbers, and virtual registers are generally large. Note that 26409467b48Spatrickregister ``#0`` is reserved as a flag value. 26509467b48Spatrick 26609467b48SpatrickEach register in the processor description has an associated 26709467b48Spatrick``TargetRegisterDesc`` entry, which provides a textual name for the register 26809467b48Spatrick(used for assembly output and debugging dumps) and a set of aliases (used to 26909467b48Spatrickindicate whether one register overlaps with another). 27009467b48Spatrick 27109467b48SpatrickIn addition to the per-register description, the ``TargetRegisterInfo`` class 27209467b48Spatrickexposes a set of processor specific register classes (instances of the 27309467b48Spatrick``TargetRegisterClass`` class). Each register class contains sets of registers 27409467b48Spatrickthat have the same properties (for example, they are all 32-bit integer 27509467b48Spatrickregisters). Each SSA virtual register created by the instruction selector has 27609467b48Spatrickan associated register class. When the register allocator runs, it replaces 27709467b48Spatrickvirtual registers with a physical register in the set. 27809467b48Spatrick 27909467b48SpatrickThe target-specific implementations of these classes is auto-generated from a 28009467b48Spatrick:doc:`TableGen/index` description of the register file. 28109467b48Spatrick 28209467b48Spatrick.. _TargetInstrInfo: 28309467b48Spatrick 28409467b48SpatrickThe ``TargetInstrInfo`` class 28509467b48Spatrick----------------------------- 28609467b48Spatrick 28709467b48SpatrickThe ``TargetInstrInfo`` class is used to describe the machine instructions 28809467b48Spatricksupported by the target. Descriptions define things like the mnemonic for 28909467b48Spatrickthe opcode, the number of operands, the list of implicit register uses and defs, 29009467b48Spatrickwhether the instruction has certain target-independent properties (accesses 29109467b48Spatrickmemory, is commutable, etc), and holds any target-specific flags. 29209467b48Spatrick 29309467b48SpatrickThe ``TargetFrameLowering`` class 29409467b48Spatrick--------------------------------- 29509467b48Spatrick 29609467b48SpatrickThe ``TargetFrameLowering`` class is used to provide information about the stack 29709467b48Spatrickframe layout of the target. It holds the direction of stack growth, the known 29809467b48Spatrickstack alignment on entry to each function, and the offset to the local area. 29909467b48SpatrickThe offset to the local area is the offset from the stack pointer on function 30009467b48Spatrickentry to the first location where function data (local variables, spill 30109467b48Spatricklocations) can be stored. 30209467b48Spatrick 30309467b48SpatrickThe ``TargetSubtarget`` class 30409467b48Spatrick----------------------------- 30509467b48Spatrick 30609467b48SpatrickThe ``TargetSubtarget`` class is used to provide information about the specific 30709467b48Spatrickchip set being targeted. A sub-target informs code generation of which 30809467b48Spatrickinstructions are supported, instruction latencies and instruction execution 30909467b48Spatrickitinerary; i.e., which processing units are used, in what order, and for how 31009467b48Spatricklong. 31109467b48Spatrick 31209467b48SpatrickThe ``TargetJITInfo`` class 31309467b48Spatrick--------------------------- 31409467b48Spatrick 31509467b48SpatrickThe ``TargetJITInfo`` class exposes an abstract interface used by the 31609467b48SpatrickJust-In-Time code generator to perform target-specific activities, such as 31709467b48Spatrickemitting stubs. If a ``TargetMachine`` supports JIT code generation, it should 31809467b48Spatrickprovide one of these objects through the ``getJITInfo`` method. 31909467b48Spatrick 32009467b48Spatrick.. _code being generated: 32109467b48Spatrick.. _machine code representation: 32209467b48Spatrick 32309467b48SpatrickMachine code description classes 32409467b48Spatrick================================ 32509467b48Spatrick 32609467b48SpatrickAt the high-level, LLVM code is translated to a machine specific representation 32709467b48Spatrickformed out of :raw-html:`<tt>` `MachineFunction`_ :raw-html:`</tt>`, 32809467b48Spatrick:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>`, and :raw-html:`<tt>` 32909467b48Spatrick`MachineInstr`_ :raw-html:`</tt>` instances (defined in 33009467b48Spatrick``include/llvm/CodeGen``). This representation is completely target agnostic, 33109467b48Spatrickrepresenting instructions in their most abstract form: an opcode and a series of 33209467b48Spatrickoperands. This representation is designed to support both an SSA representation 33309467b48Spatrickfor machine code, as well as a register allocated, non-SSA form. 33409467b48Spatrick 33509467b48Spatrick.. _MachineInstr: 33609467b48Spatrick 33709467b48SpatrickThe ``MachineInstr`` class 33809467b48Spatrick-------------------------- 33909467b48Spatrick 34009467b48SpatrickTarget machine instructions are represented as instances of the ``MachineInstr`` 34109467b48Spatrickclass. This class is an extremely abstract way of representing machine 34209467b48Spatrickinstructions. In particular, it only keeps track of an opcode number and a set 34309467b48Spatrickof operands. 34409467b48Spatrick 34509467b48SpatrickThe opcode number is a simple unsigned integer that only has meaning to a 34609467b48Spatrickspecific backend. All of the instructions for a target should be defined in the 34709467b48Spatrick``*InstrInfo.td`` file for the target. The opcode enum values are auto-generated 34809467b48Spatrickfrom this description. The ``MachineInstr`` class does not have any information 34909467b48Spatrickabout how to interpret the instruction (i.e., what the semantics of the 35009467b48Spatrickinstruction are); for that you must refer to the :raw-html:`<tt>` 35109467b48Spatrick`TargetInstrInfo`_ :raw-html:`</tt>` class. 35209467b48Spatrick 35309467b48SpatrickThe operands of a machine instruction can be of several different types: a 35409467b48Spatrickregister reference, a constant integer, a basic block reference, etc. In 35509467b48Spatrickaddition, a machine operand should be marked as a def or a use of the value 35609467b48Spatrick(though only registers are allowed to be defs). 35709467b48Spatrick 35809467b48SpatrickBy convention, the LLVM code generator orders instruction operands so that all 35909467b48Spatrickregister definitions come before the register uses, even on architectures that 36009467b48Spatrickare normally printed in other orders. For example, the SPARC add instruction: 36109467b48Spatrick"``add %i1, %i2, %i3``" adds the "%i1", and "%i2" registers and stores the 36209467b48Spatrickresult into the "%i3" register. In the LLVM code generator, the operands should 36309467b48Spatrickbe stored as "``%i3, %i1, %i2``": with the destination first. 36409467b48Spatrick 36509467b48SpatrickKeeping destination (definition) operands at the beginning of the operand list 36609467b48Spatrickhas several advantages. In particular, the debugging printer will print the 36709467b48Spatrickinstruction like this: 36809467b48Spatrick 36909467b48Spatrick.. code-block:: llvm 37009467b48Spatrick 37109467b48Spatrick %r3 = add %i1, %i2 37209467b48Spatrick 37309467b48SpatrickAlso if the first operand is a def, it is easier to `create instructions`_ whose 37409467b48Spatrickonly def is the first operand. 37509467b48Spatrick 37609467b48Spatrick.. _create instructions: 37709467b48Spatrick 37809467b48SpatrickUsing the ``MachineInstrBuilder.h`` functions 37909467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 38009467b48Spatrick 38109467b48SpatrickMachine instructions are created by using the ``BuildMI`` functions, located in 38209467b48Spatrickthe ``include/llvm/CodeGen/MachineInstrBuilder.h`` file. The ``BuildMI`` 38309467b48Spatrickfunctions make it easy to build arbitrary machine instructions. Usage of the 38409467b48Spatrick``BuildMI`` functions look like this: 38509467b48Spatrick 38609467b48Spatrick.. code-block:: c++ 38709467b48Spatrick 38809467b48Spatrick // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42') 38909467b48Spatrick // instruction and insert it at the end of the given MachineBasicBlock. 39009467b48Spatrick const TargetInstrInfo &TII = ... 39109467b48Spatrick MachineBasicBlock &MBB = ... 39209467b48Spatrick DebugLoc DL; 39309467b48Spatrick MachineInstr *MI = BuildMI(MBB, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); 39409467b48Spatrick 39509467b48Spatrick // Create the same instr, but insert it before a specified iterator point. 39609467b48Spatrick MachineBasicBlock::iterator MBBI = ... 39709467b48Spatrick BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); 39809467b48Spatrick 39909467b48Spatrick // Create a 'cmp Reg, 0' instruction, no destination reg. 40009467b48Spatrick MI = BuildMI(MBB, DL, TII.get(X86::CMP32ri8)).addReg(Reg).addImm(42); 40109467b48Spatrick 40209467b48Spatrick // Create an 'sahf' instruction which takes no operands and stores nothing. 40309467b48Spatrick MI = BuildMI(MBB, DL, TII.get(X86::SAHF)); 40409467b48Spatrick 40509467b48Spatrick // Create a self looping branch instruction. 40609467b48Spatrick BuildMI(MBB, DL, TII.get(X86::JNE)).addMBB(&MBB); 40709467b48Spatrick 40809467b48SpatrickIf you need to add a definition operand (other than the optional destination 40909467b48Spatrickregister), you must explicitly mark it as such: 41009467b48Spatrick 41109467b48Spatrick.. code-block:: c++ 41209467b48Spatrick 41309467b48Spatrick MI.addReg(Reg, RegState::Define); 41409467b48Spatrick 41509467b48SpatrickFixed (preassigned) registers 41609467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 41709467b48Spatrick 41809467b48SpatrickOne important issue that the code generator needs to be aware of is the presence 41909467b48Spatrickof fixed registers. In particular, there are often places in the instruction 42009467b48Spatrickstream where the register allocator *must* arrange for a particular value to be 42109467b48Spatrickin a particular register. This can occur due to limitations of the instruction 42209467b48Spatrickset (e.g., the X86 can only do a 32-bit divide with the ``EAX``/``EDX`` 42309467b48Spatrickregisters), or external factors like calling conventions. In any case, the 42409467b48Spatrickinstruction selector should emit code that copies a virtual register into or out 42509467b48Spatrickof a physical register when needed. 42609467b48Spatrick 42709467b48SpatrickFor example, consider this simple LLVM example: 42809467b48Spatrick 42909467b48Spatrick.. code-block:: llvm 43009467b48Spatrick 43109467b48Spatrick define i32 @test(i32 %X, i32 %Y) { 43209467b48Spatrick %Z = sdiv i32 %X, %Y 43309467b48Spatrick ret i32 %Z 43409467b48Spatrick } 43509467b48Spatrick 43609467b48SpatrickThe X86 instruction selector might produce this machine code for the ``div`` and 43709467b48Spatrick``ret``: 43809467b48Spatrick 43909467b48Spatrick.. code-block:: text 44009467b48Spatrick 44109467b48Spatrick ;; Start of div 44209467b48Spatrick %EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX 44309467b48Spatrick %reg1027 = sar %reg1024, 31 44409467b48Spatrick %EDX = mov %reg1027 ;; Sign extend X into EDX 44509467b48Spatrick idiv %reg1025 ;; Divide by Y (in reg1025) 44609467b48Spatrick %reg1026 = mov %EAX ;; Read the result (Z) out of EAX 44709467b48Spatrick 44809467b48Spatrick ;; Start of ret 44909467b48Spatrick %EAX = mov %reg1026 ;; 32-bit return value goes in EAX 45009467b48Spatrick ret 45109467b48Spatrick 45209467b48SpatrickBy the end of code generation, the register allocator would coalesce the 45309467b48Spatrickregisters and delete the resultant identity moves producing the following 45409467b48Spatrickcode: 45509467b48Spatrick 45609467b48Spatrick.. code-block:: text 45709467b48Spatrick 45809467b48Spatrick ;; X is in EAX, Y is in ECX 45909467b48Spatrick mov %EAX, %EDX 46009467b48Spatrick sar %EDX, 31 46109467b48Spatrick idiv %ECX 46209467b48Spatrick ret 46309467b48Spatrick 46409467b48SpatrickThis approach is extremely general (if it can handle the X86 architecture, it 46509467b48Spatrickcan handle anything!) and allows all of the target specific knowledge about the 46609467b48Spatrickinstruction stream to be isolated in the instruction selector. Note that 46709467b48Spatrickphysical registers should have a short lifetime for good code generation, and 46809467b48Spatrickall physical registers are assumed dead on entry to and exit from basic blocks 46909467b48Spatrick(before register allocation). Thus, if you need a value to be live across basic 47009467b48Spatrickblock boundaries, it *must* live in a virtual register. 47109467b48Spatrick 47209467b48SpatrickCall-clobbered registers 47309467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^ 47409467b48Spatrick 47509467b48SpatrickSome machine instructions, like calls, clobber a large number of physical 47609467b48Spatrickregisters. Rather than adding ``<def,dead>`` operands for all of them, it is 47709467b48Spatrickpossible to use an ``MO_RegisterMask`` operand instead. The register mask 47809467b48Spatrickoperand holds a bit mask of preserved registers, and everything else is 47909467b48Spatrickconsidered to be clobbered by the instruction. 48009467b48Spatrick 48109467b48SpatrickMachine code in SSA form 48209467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^ 48309467b48Spatrick 48409467b48Spatrick``MachineInstr``'s are initially selected in SSA-form, and are maintained in 48509467b48SpatrickSSA-form until register allocation happens. For the most part, this is 48609467b48Spatricktrivially simple since LLVM is already in SSA form; LLVM PHI nodes become 48709467b48Spatrickmachine code PHI nodes, and virtual registers are only allowed to have a single 48809467b48Spatrickdefinition. 48909467b48Spatrick 49009467b48SpatrickAfter register allocation, machine code is no longer in SSA-form because there 49109467b48Spatrickare no virtual registers left in the code. 49209467b48Spatrick 49309467b48Spatrick.. _MachineBasicBlock: 49409467b48Spatrick 49509467b48SpatrickThe ``MachineBasicBlock`` class 49609467b48Spatrick------------------------------- 49709467b48Spatrick 49809467b48SpatrickThe ``MachineBasicBlock`` class contains a list of machine instructions 49909467b48Spatrick(:raw-html:`<tt>` `MachineInstr`_ :raw-html:`</tt>` instances). It roughly 50009467b48Spatrickcorresponds to the LLVM code input to the instruction selector, but there can be 50109467b48Spatricka one-to-many mapping (i.e. one LLVM basic block can map to multiple machine 50209467b48Spatrickbasic blocks). The ``MachineBasicBlock`` class has a "``getBasicBlock``" method, 50309467b48Spatrickwhich returns the LLVM basic block that it comes from. 50409467b48Spatrick 50509467b48Spatrick.. _MachineFunction: 50609467b48Spatrick 50709467b48SpatrickThe ``MachineFunction`` class 50809467b48Spatrick----------------------------- 50909467b48Spatrick 51009467b48SpatrickThe ``MachineFunction`` class contains a list of machine basic blocks 51109467b48Spatrick(:raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>` instances). It 51209467b48Spatrickcorresponds one-to-one with the LLVM function input to the instruction selector. 513*d415bd75SrobertIn addition to a list of basic blocks, the ``MachineFunction`` contains a 51409467b48Spatrick``MachineConstantPool``, a ``MachineFrameInfo``, a ``MachineFunctionInfo``, and 51509467b48Spatricka ``MachineRegisterInfo``. See ``include/llvm/CodeGen/MachineFunction.h`` for 51609467b48Spatrickmore information. 51709467b48Spatrick 51809467b48Spatrick``MachineInstr Bundles`` 51909467b48Spatrick------------------------ 52009467b48Spatrick 52109467b48SpatrickLLVM code generator can model sequences of instructions as MachineInstr 52209467b48Spatrickbundles. A MI bundle can model a VLIW group / pack which contains an arbitrary 52309467b48Spatricknumber of parallel instructions. It can also be used to model a sequential list 52409467b48Spatrickof instructions (potentially with data dependencies) that cannot be legally 52509467b48Spatrickseparated (e.g. ARM Thumb2 IT blocks). 52609467b48Spatrick 52709467b48SpatrickConceptually a MI bundle is a MI with a number of other MIs nested within: 52809467b48Spatrick 52909467b48Spatrick:: 53009467b48Spatrick 53109467b48Spatrick -------------- 53209467b48Spatrick | Bundle | --------- 53309467b48Spatrick -------------- \ 53409467b48Spatrick | ---------------- 53509467b48Spatrick | | MI | 53609467b48Spatrick | ---------------- 53709467b48Spatrick | | 53809467b48Spatrick | ---------------- 53909467b48Spatrick | | MI | 54009467b48Spatrick | ---------------- 54109467b48Spatrick | | 54209467b48Spatrick | ---------------- 54309467b48Spatrick | | MI | 54409467b48Spatrick | ---------------- 54509467b48Spatrick | 54609467b48Spatrick -------------- 54709467b48Spatrick | Bundle | -------- 54809467b48Spatrick -------------- \ 54909467b48Spatrick | ---------------- 55009467b48Spatrick | | MI | 55109467b48Spatrick | ---------------- 55209467b48Spatrick | | 55309467b48Spatrick | ---------------- 55409467b48Spatrick | | MI | 55509467b48Spatrick | ---------------- 55609467b48Spatrick | | 55709467b48Spatrick | ... 55809467b48Spatrick | 55909467b48Spatrick -------------- 56009467b48Spatrick | Bundle | -------- 56109467b48Spatrick -------------- \ 56209467b48Spatrick | 56309467b48Spatrick ... 56409467b48Spatrick 56509467b48SpatrickMI bundle support does not change the physical representations of 56609467b48SpatrickMachineBasicBlock and MachineInstr. All the MIs (including top level and nested 56709467b48Spatrickones) are stored as sequential list of MIs. The "bundled" MIs are marked with 56809467b48Spatrickthe 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used 56909467b48Spatrickto represent the start of a bundle. It's legal to mix BUNDLE MIs with individual 57009467b48SpatrickMIs that are not inside bundles nor represent bundles. 57109467b48Spatrick 57209467b48SpatrickMachineInstr passes should operate on a MI bundle as a single unit. Member 57309467b48Spatrickmethods have been taught to correctly handle bundles and MIs inside bundles. 57409467b48SpatrickThe MachineBasicBlock iterator has been modified to skip over bundled MIs to 57509467b48Spatrickenforce the bundle-as-a-single-unit concept. An alternative iterator 57609467b48Spatrickinstr_iterator has been added to MachineBasicBlock to allow passes to iterate 57709467b48Spatrickover all of the MIs in a MachineBasicBlock, including those which are nested 57809467b48Spatrickinside bundles. The top level BUNDLE instruction must have the correct set of 57909467b48Spatrickregister MachineOperand's that represent the cumulative inputs and outputs of 58009467b48Spatrickthe bundled MIs. 58109467b48Spatrick 58209467b48SpatrickPacking / bundling of MachineInstrs for VLIW architectures should 58309467b48Spatrickgenerally be done as part of the register allocation super-pass. More 58409467b48Spatrickspecifically, the pass which determines what MIs should be bundled 58509467b48Spatricktogether should be done after code generator exits SSA form 58609467b48Spatrick(i.e. after two-address pass, PHI elimination, and copy coalescing). 58709467b48SpatrickSuch bundles should be finalized (i.e. adding BUNDLE MIs and input and 58809467b48Spatrickoutput register MachineOperands) after virtual registers have been 58909467b48Spatrickrewritten into physical registers. This eliminates the need to add 59009467b48Spatrickvirtual register operands to BUNDLE instructions which would 59109467b48Spatrickeffectively double the virtual register def and use lists. Bundles may 59209467b48Spatrickuse virtual registers and be formed in SSA form, but may not be 59309467b48Spatrickappropriate for all use cases. 59409467b48Spatrick 59509467b48Spatrick.. _MC Layer: 59609467b48Spatrick 59709467b48SpatrickThe "MC" Layer 59809467b48Spatrick============== 59909467b48Spatrick 60009467b48SpatrickThe MC Layer is used to represent and process code at the raw machine code 60109467b48Spatricklevel, devoid of "high level" information like "constant pools", "jump tables", 60209467b48Spatrick"global variables" or anything like that. At this level, LLVM handles things 60309467b48Spatricklike label names, machine instructions, and sections in the object file. The 60409467b48Spatrickcode in this layer is used for a number of important purposes: the tail end of 60509467b48Spatrickthe code generator uses it to write a .s or .o file, and it is also used by the 60609467b48Spatrickllvm-mc tool to implement standalone machine code assemblers and disassemblers. 60709467b48Spatrick 60809467b48SpatrickThis section describes some of the important classes. There are also a number 60909467b48Spatrickof important subsystems that interact at this layer, they are described later in 61009467b48Spatrickthis manual. 61109467b48Spatrick 61209467b48Spatrick.. _MCStreamer: 61309467b48Spatrick 61409467b48SpatrickThe ``MCStreamer`` API 61509467b48Spatrick---------------------- 61609467b48Spatrick 61709467b48SpatrickMCStreamer is best thought of as an assembler API. It is an abstract API which 61809467b48Spatrickis *implemented* in different ways (e.g. to output a .s file, output an ELF .o 61909467b48Spatrickfile, etc) but whose API correspond directly to what you see in a .s file. 62009467b48SpatrickMCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, 621*d415bd75SrobertswitchSection, emitValue (for .byte, .word), etc, which directly correspond to 62209467b48Spatrickassembly level directives. It also has an EmitInstruction method, which is used 62309467b48Spatrickto output an MCInst to the streamer. 62409467b48Spatrick 62509467b48SpatrickThis API is most important for two clients: the llvm-mc stand-alone assembler is 62609467b48Spatrickeffectively a parser that parses a line, then invokes a method on MCStreamer. In 62709467b48Spatrickthe code generator, the `Code Emission`_ phase of the code generator lowers 62809467b48Spatrickhigher level LLVM IR and Machine* constructs down to the MC layer, emitting 62909467b48Spatrickdirectives through MCStreamer. 63009467b48Spatrick 63109467b48SpatrickOn the implementation side of MCStreamer, there are two major implementations: 63209467b48Spatrickone for writing out a .s file (MCAsmStreamer), and one for writing out a .o 63309467b48Spatrickfile (MCObjectStreamer). MCAsmStreamer is a straightforward implementation 63409467b48Spatrickthat prints out a directive for each method (e.g. ``EmitValue -> .byte``), but 63509467b48SpatrickMCObjectStreamer implements a full assembler. 63609467b48Spatrick 63709467b48SpatrickFor target specific directives, the MCStreamer has a MCTargetStreamer instance. 63809467b48SpatrickEach target that needs it defines a class that inherits from it and is a lot 63909467b48Spatricklike MCStreamer itself: It has one method per directive and two classes that 64009467b48Spatrickinherit from it, a target object streamer and a target asm streamer. The target 64109467b48Spatrickasm streamer just prints it (``emitFnStart -> .fnstart``), and the object 64209467b48Spatrickstreamer implement the assembler logic for it. 64309467b48Spatrick 64409467b48SpatrickTo make llvm use these classes, the target initialization must call 64509467b48SpatrickTargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer 64609467b48Spatrickpassing callbacks that allocate the corresponding target streamer and pass it 64709467b48Spatrickto createAsmStreamer or to the appropriate object streamer constructor. 64809467b48Spatrick 64909467b48SpatrickThe ``MCContext`` class 65009467b48Spatrick----------------------- 65109467b48Spatrick 65209467b48SpatrickThe MCContext class is the owner of a variety of uniqued data structures at the 65309467b48SpatrickMC layer, including symbols, sections, etc. As such, this is the class that you 65409467b48Spatrickinteract with to create symbols and sections. This class can not be subclassed. 65509467b48Spatrick 65609467b48SpatrickThe ``MCSymbol`` class 65709467b48Spatrick---------------------- 65809467b48Spatrick 65909467b48SpatrickThe MCSymbol class represents a symbol (aka label) in the assembly file. There 66009467b48Spatrickare two interesting kinds of symbols: assembler temporary symbols, and normal 66109467b48Spatricksymbols. Assembler temporary symbols are used and processed by the assembler 66209467b48Spatrickbut are discarded when the object file is produced. The distinction is usually 66309467b48Spatrickrepresented by adding a prefix to the label, for example "L" labels are 66409467b48Spatrickassembler temporary labels in MachO. 66509467b48Spatrick 66609467b48SpatrickMCSymbols are created by MCContext and uniqued there. This means that MCSymbols 66709467b48Spatrickcan be compared for pointer equivalence to find out if they are the same symbol. 66809467b48SpatrickNote that pointer inequality does not guarantee the labels will end up at 66909467b48Spatrickdifferent addresses though. It's perfectly legal to output something like this 67009467b48Spatrickto the .s file: 67109467b48Spatrick 67209467b48Spatrick:: 67309467b48Spatrick 67409467b48Spatrick foo: 67509467b48Spatrick bar: 67609467b48Spatrick .byte 4 67709467b48Spatrick 67809467b48SpatrickIn this case, both the foo and bar symbols will have the same address. 67909467b48Spatrick 68009467b48SpatrickThe ``MCSection`` class 68109467b48Spatrick----------------------- 68209467b48Spatrick 68309467b48SpatrickThe ``MCSection`` class represents an object-file specific section. It is 68409467b48Spatricksubclassed by object file specific implementations (e.g. ``MCSectionMachO``, 68509467b48Spatrick``MCSectionCOFF``, ``MCSectionELF``) and these are created and uniqued by 68609467b48SpatrickMCContext. The MCStreamer has a notion of the current section, which can be 68709467b48Spatrickchanged with the SwitchToSection method (which corresponds to a ".section" 68809467b48Spatrickdirective in a .s file). 68909467b48Spatrick 69009467b48Spatrick.. _MCInst: 69109467b48Spatrick 69209467b48SpatrickThe ``MCInst`` class 69309467b48Spatrick-------------------- 69409467b48Spatrick 69509467b48SpatrickThe ``MCInst`` class is a target-independent representation of an instruction. 69609467b48SpatrickIt is a simple class (much more so than `MachineInstr`_) that holds a 69709467b48Spatricktarget-specific opcode and a vector of MCOperands. MCOperand, in turn, is a 69809467b48Spatricksimple discriminated union of three cases: 1) a simple immediate, 2) a target 69909467b48Spatrickregister ID, 3) a symbolic expression (e.g. "``Lfoo-Lbar+42``") as an MCExpr. 70009467b48Spatrick 70109467b48SpatrickMCInst is the common currency used to represent machine instructions at the MC 70209467b48Spatricklayer. It is the type used by the instruction encoder, the instruction printer, 70309467b48Spatrickand the type generated by the assembly parser and disassembler. 70409467b48Spatrick 705*d415bd75Srobert.. _ObjectFormats: 706*d415bd75Srobert 707*d415bd75SrobertObject File Format 708*d415bd75Srobert------------------ 709*d415bd75Srobert 710*d415bd75SrobertThe MC layer's object writers support a variety of object formats. Because of 711*d415bd75Sroberttarget-specific aspects of object formats each target only supports a subset of 712*d415bd75Srobertthe formats supported by the MC layer. Most targets support emitting ELF 713*d415bd75Srobertobjects. Other vendor-specific objects are generally supported only on targets 714*d415bd75Srobertthat are supported by that vendor (i.e. MachO is only supported on targets 715*d415bd75Srobertsupported by Darwin, and XCOFF is only supported on targets that support AIX). 716*d415bd75SrobertAdditionally some targets have their own object formats (i.e. DirectX, SPIR-V 717*d415bd75Srobertand WebAssembly). 718*d415bd75Srobert 719*d415bd75SrobertThe table below captures a snapshot of object file support in LLVM: 720*d415bd75Srobert 721*d415bd75Srobert .. table:: Object File Formats 722*d415bd75Srobert 723*d415bd75Srobert ================== ======================================================== 724*d415bd75Srobert Format Supported Targets 725*d415bd75Srobert ================== ======================================================== 726*d415bd75Srobert ``COFF`` AArch64, ARM, X86 727*d415bd75Srobert ``DXContainer`` DirectX 728*d415bd75Srobert ``ELF`` AArch64, AMDGPU, ARM, AVR, BPF, CSKY, Hexagon, Lanai, LoongArch, M86k, MSP430, MIPS, PowerPC, RISCV, SPARC, SystemZ, VE, X86 729*d415bd75Srobert ``GCOFF`` SystemZ 730*d415bd75Srobert ``MachO`` AArch64, ARM, X86 731*d415bd75Srobert ``SPIR-V`` SPIRV 732*d415bd75Srobert ``WASM`` WebAssembly 733*d415bd75Srobert ``XCOFF`` PowerPC 734*d415bd75Srobert ================== ======================================================== 735*d415bd75Srobert 73609467b48Spatrick.. _Target-independent algorithms: 73709467b48Spatrick.. _code generation algorithm: 73809467b48Spatrick 73909467b48SpatrickTarget-independent code generation algorithms 74009467b48Spatrick============================================= 74109467b48Spatrick 74209467b48SpatrickThis section documents the phases described in the `high-level design of the 74309467b48Spatrickcode generator`_. It explains how they work and some of the rationale behind 74409467b48Spatricktheir design. 74509467b48Spatrick 74609467b48Spatrick.. _Instruction Selection: 74709467b48Spatrick.. _instruction selection section: 74809467b48Spatrick 74909467b48SpatrickInstruction Selection 75009467b48Spatrick--------------------- 75109467b48Spatrick 75209467b48SpatrickInstruction Selection is the process of translating LLVM code presented to the 75309467b48Spatrickcode generator into target-specific machine instructions. There are several 75409467b48Spatrickwell-known ways to do this in the literature. LLVM uses a SelectionDAG based 75509467b48Spatrickinstruction selector. 75609467b48Spatrick 75709467b48SpatrickPortions of the DAG instruction selector are generated from the target 75809467b48Spatrickdescription (``*.td``) files. Our goal is for the entire instruction selector 75909467b48Spatrickto be generated from these ``.td`` files, though currently there are still 76009467b48Spatrickthings that require custom C++ code. 76109467b48Spatrick 76273471bf0Spatrick`GlobalISel <https://llvm.org/docs/GlobalISel/index.html>`_ is another 76373471bf0Spatrickinstruction selection framework. 76473471bf0Spatrick 76509467b48Spatrick.. _SelectionDAG: 76609467b48Spatrick 76709467b48SpatrickIntroduction to SelectionDAGs 76809467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 76909467b48Spatrick 77009467b48SpatrickThe SelectionDAG provides an abstraction for code representation in a way that 77109467b48Spatrickis amenable to instruction selection using automatic techniques 77209467b48Spatrick(e.g. dynamic-programming based optimal pattern matching selectors). It is also 77309467b48Spatrickwell-suited to other phases of code generation; in particular, instruction 77409467b48Spatrickscheduling (SelectionDAG's are very close to scheduling DAGs post-selection). 77509467b48SpatrickAdditionally, the SelectionDAG provides a host representation where a large 77609467b48Spatrickvariety of very-low-level (but target-independent) `optimizations`_ may be 77709467b48Spatrickperformed; ones which require extensive information about the instructions 77809467b48Spatrickefficiently supported by the target. 77909467b48Spatrick 78009467b48SpatrickThe SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the 78109467b48Spatrick``SDNode`` class. The primary payload of the ``SDNode`` is its operation code 78209467b48Spatrick(Opcode) that indicates what operation the node performs and the operands to the 78309467b48Spatrickoperation. The various operation node types are described at the top of the 78409467b48Spatrick``include/llvm/CodeGen/ISDOpcodes.h`` file. 78509467b48Spatrick 78609467b48SpatrickAlthough most operations define a single value, each node in the graph may 78709467b48Spatrickdefine multiple values. For example, a combined div/rem operation will define 78809467b48Spatrickboth the dividend and the remainder. Many other situations require multiple 78909467b48Spatrickvalues as well. Each node also has some number of operands, which are edges to 79009467b48Spatrickthe node defining the used value. Because nodes may define multiple values, 79109467b48Spatrickedges are represented by instances of the ``SDValue`` class, which is a 79209467b48Spatrick``<SDNode, unsigned>`` pair, indicating the node and result value being used, 79309467b48Spatrickrespectively. Each value produced by an ``SDNode`` has an associated ``MVT`` 79409467b48Spatrick(Machine Value Type) indicating what the type of the value is. 79509467b48Spatrick 79609467b48SpatrickSelectionDAGs contain two different kinds of values: those that represent data 79709467b48Spatrickflow and those that represent control flow dependencies. Data values are simple 79809467b48Spatrickedges with an integer or floating point value type. Control edges are 79909467b48Spatrickrepresented as "chain" edges which are of type ``MVT::Other``. These edges 80009467b48Spatrickprovide an ordering between nodes that have side effects (such as loads, stores, 80109467b48Spatrickcalls, returns, etc). All nodes that have side effects should take a token 80209467b48Spatrickchain as input and produce a new one as output. By convention, token chain 80309467b48Spatrickinputs are always operand #0, and chain results are always the last value 80409467b48Spatrickproduced by an operation. However, after instruction selection, the 80509467b48Spatrickmachine nodes have their chain after the instruction's operands, and 80609467b48Spatrickmay be followed by glue nodes. 80709467b48Spatrick 80809467b48SpatrickA SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is 80909467b48Spatrickalways a marker node with an Opcode of ``ISD::EntryToken``. The Root node is 81009467b48Spatrickthe final side-effecting node in the token chain. For example, in a single basic 81109467b48Spatrickblock function it would be the return node. 81209467b48Spatrick 81309467b48SpatrickOne important concept for SelectionDAGs is the notion of a "legal" vs. 81409467b48Spatrick"illegal" DAG. A legal DAG for a target is one that only uses supported 81509467b48Spatrickoperations and supported types. On a 32-bit PowerPC, for example, a DAG with a 81609467b48Spatrickvalue of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a 81709467b48SpatrickSREM or UREM operation. The `legalize types`_ and `legalize operations`_ phases 81809467b48Spatrickare responsible for turning an illegal DAG into a legal DAG. 81909467b48Spatrick 82009467b48Spatrick.. _SelectionDAG-Process: 82109467b48Spatrick 82209467b48SpatrickSelectionDAG Instruction Selection Process 82309467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 82409467b48Spatrick 82509467b48SpatrickSelectionDAG-based instruction selection consists of the following steps: 82609467b48Spatrick 82709467b48Spatrick#. `Build initial DAG`_ --- This stage performs a simple translation from the 82809467b48Spatrick input LLVM code to an illegal SelectionDAG. 82909467b48Spatrick 83009467b48Spatrick#. `Optimize SelectionDAG`_ --- This stage performs simple optimizations on the 83109467b48Spatrick SelectionDAG to simplify it, and recognize meta instructions (like rotates 83209467b48Spatrick and ``div``/``rem`` pairs) for targets that support these meta operations. 83309467b48Spatrick This makes the resultant code more efficient and the `select instructions 83409467b48Spatrick from DAG`_ phase (below) simpler. 83509467b48Spatrick 83609467b48Spatrick#. `Legalize SelectionDAG Types`_ --- This stage transforms SelectionDAG nodes 83709467b48Spatrick to eliminate any types that are unsupported on the target. 83809467b48Spatrick 83909467b48Spatrick#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to clean up 84009467b48Spatrick redundancies exposed by type legalization. 84109467b48Spatrick 84209467b48Spatrick#. `Legalize SelectionDAG Ops`_ --- This stage transforms SelectionDAG nodes to 84309467b48Spatrick eliminate any operations that are unsupported on the target. 84409467b48Spatrick 84509467b48Spatrick#. `Optimize SelectionDAG`_ --- The SelectionDAG optimizer is run to eliminate 84609467b48Spatrick inefficiencies introduced by operation legalization. 84709467b48Spatrick 84809467b48Spatrick#. `Select instructions from DAG`_ --- Finally, the target instruction selector 84909467b48Spatrick matches the DAG operations to target instructions. This process translates 85009467b48Spatrick the target-independent input DAG into another DAG of target instructions. 85109467b48Spatrick 85209467b48Spatrick#. `SelectionDAG Scheduling and Formation`_ --- The last phase assigns a linear 85309467b48Spatrick order to the instructions in the target-instruction DAG and emits them into 85409467b48Spatrick the MachineFunction being compiled. This step uses traditional prepass 85509467b48Spatrick scheduling techniques. 85609467b48Spatrick 85709467b48SpatrickAfter all of these steps are complete, the SelectionDAG is destroyed and the 85809467b48Spatrickrest of the code generation passes are run. 85909467b48Spatrick 86009467b48SpatrickOne great way to visualize what is going on here is to take advantage of a few 86109467b48SpatrickLLC command line options. The following options pop up a window displaying the 86209467b48SpatrickSelectionDAG at specific times (if you only get errors printed to the console 86309467b48Spatrickwhile using this, you probably `need to configure your 86409467b48Spatricksystem <ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ to add support for it). 86509467b48Spatrick 86609467b48Spatrick* ``-view-dag-combine1-dags`` displays the DAG after being built, before the 86709467b48Spatrick first optimization pass. 86809467b48Spatrick 86909467b48Spatrick* ``-view-legalize-dags`` displays the DAG before Legalization. 87009467b48Spatrick 87109467b48Spatrick* ``-view-dag-combine2-dags`` displays the DAG before the second optimization 87209467b48Spatrick pass. 87309467b48Spatrick 87409467b48Spatrick* ``-view-isel-dags`` displays the DAG before the Select phase. 87509467b48Spatrick 87609467b48Spatrick* ``-view-sched-dags`` displays the DAG before Scheduling. 87709467b48Spatrick 87809467b48SpatrickThe ``-view-sunit-dags`` displays the Scheduler's dependency graph. This graph 87909467b48Spatrickis based on the final SelectionDAG, with nodes that must be scheduled together 88009467b48Spatrickbundled into a single scheduling-unit node, and with immediate operands and 88109467b48Spatrickother nodes that aren't relevant for scheduling omitted. 88209467b48Spatrick 88309467b48SpatrickThe option ``-filter-view-dags`` allows to select the name of the basic block 88409467b48Spatrickthat you are interested to visualize and filters all the previous 88509467b48Spatrick``view-*-dags`` options. 88609467b48Spatrick 88709467b48Spatrick.. _Build initial DAG: 88809467b48Spatrick 88909467b48SpatrickInitial SelectionDAG Construction 89009467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 89109467b48Spatrick 89209467b48SpatrickThe initial SelectionDAG is na\ :raw-html:`ï`\ vely peephole expanded from 89309467b48Spatrickthe LLVM input by the ``SelectionDAGBuilder`` class. The intent of this pass 89409467b48Spatrickis to expose as much low-level, target-specific details to the SelectionDAG as 89509467b48Spatrickpossible. This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an 89609467b48Spatrick``SDNode add`` while a ``getelementptr`` is expanded into the obvious 89709467b48Spatrickarithmetic). This pass requires target-specific hooks to lower calls, returns, 89809467b48Spatrickvarargs, etc. For these features, the :raw-html:`<tt>` `TargetLowering`_ 89909467b48Spatrick:raw-html:`</tt>` interface is used. 90009467b48Spatrick 90109467b48Spatrick.. _legalize types: 90209467b48Spatrick.. _Legalize SelectionDAG Types: 90309467b48Spatrick.. _Legalize SelectionDAG Ops: 90409467b48Spatrick 90509467b48SpatrickSelectionDAG LegalizeTypes Phase 90609467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 90709467b48Spatrick 90809467b48SpatrickThe Legalize phase is in charge of converting a DAG to only use the types that 90909467b48Spatrickare natively supported by the target. 91009467b48Spatrick 91109467b48SpatrickThere are two main ways of converting values of unsupported scalar types to 91209467b48Spatrickvalues of supported types: converting small types to larger types ("promoting"), 91309467b48Spatrickand breaking up large integer types into smaller ones ("expanding"). For 91409467b48Spatrickexample, a target might require that all f32 values are promoted to f64 and that 91509467b48Spatrickall i1/i8/i16 values are promoted to i32. The same target might require that 91609467b48Spatrickall i64 values be expanded into pairs of i32 values. These changes can insert 91709467b48Spatricksign and zero extensions as needed to make sure that the final code has the same 91809467b48Spatrickbehavior as the input. 91909467b48Spatrick 92009467b48SpatrickThere are two main ways of converting values of unsupported vector types to 92109467b48Spatrickvalue of supported types: splitting vector types, multiple times if necessary, 92209467b48Spatrickuntil a legal type is found, and extending vector types by adding elements to 92309467b48Spatrickthe end to round them out to legal types ("widening"). If a vector gets split 92409467b48Spatrickall the way down to single-element parts with no supported vector type being 92509467b48Spatrickfound, the elements are converted to scalars ("scalarizing"). 92609467b48Spatrick 92709467b48SpatrickA target implementation tells the legalizer which types are supported (and which 92809467b48Spatrickregister class to use for them) by calling the ``addRegisterClass`` method in 92909467b48Spatrickits ``TargetLowering`` constructor. 93009467b48Spatrick 93109467b48Spatrick.. _legalize operations: 93209467b48Spatrick.. _Legalizer: 93309467b48Spatrick 93409467b48SpatrickSelectionDAG Legalize Phase 93509467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^ 93609467b48Spatrick 93709467b48SpatrickThe Legalize phase is in charge of converting a DAG to only use the operations 93809467b48Spatrickthat are natively supported by the target. 93909467b48Spatrick 94009467b48SpatrickTargets often have weird constraints, such as not supporting every operation on 94109467b48Spatrickevery supported datatype (e.g. X86 does not support byte conditional moves and 94209467b48SpatrickPowerPC does not support sign-extending loads from a 16-bit memory location). 94309467b48SpatrickLegalize takes care of this by open-coding another sequence of operations to 94409467b48Spatrickemulate the operation ("expansion"), by promoting one type to a larger type that 94509467b48Spatricksupports the operation ("promotion"), or by using a target-specific hook to 94609467b48Spatrickimplement the legalization ("custom"). 94709467b48Spatrick 94809467b48SpatrickA target implementation tells the legalizer which operations are not supported 94909467b48Spatrick(and which of the above three actions to take) by calling the 95009467b48Spatrick``setOperationAction`` method in its ``TargetLowering`` constructor. 95109467b48Spatrick 95209467b48SpatrickIf a target has legal vector types, it is expected to produce efficient machine 95309467b48Spatrickcode for common forms of the shufflevector IR instruction using those types. 95409467b48SpatrickThis may require custom legalization for SelectionDAG vector operations that 95509467b48Spatrickare created from the shufflevector IR. The shufflevector forms that should be 95609467b48Spatrickhandled include: 95709467b48Spatrick 95809467b48Spatrick* Vector select --- Each element of the vector is chosen from either of the 95909467b48Spatrick corresponding elements of the 2 input vectors. This operation may also be 96009467b48Spatrick known as a "blend" or "bitwise select" in target assembly. This type of shuffle 96109467b48Spatrick maps directly to the ``shuffle_vector`` SelectionDAG node. 96209467b48Spatrick 96309467b48Spatrick* Insert subvector --- A vector is placed into a longer vector type starting 96409467b48Spatrick at index 0. This type of shuffle maps directly to the ``insert_subvector`` 96509467b48Spatrick SelectionDAG node with the ``index`` operand set to 0. 96609467b48Spatrick 96709467b48Spatrick* Extract subvector --- A vector is pulled from a longer vector type starting 96809467b48Spatrick at index 0. This type of shuffle maps directly to the ``extract_subvector`` 96909467b48Spatrick SelectionDAG node with the ``index`` operand set to 0. 97009467b48Spatrick 97109467b48Spatrick* Splat --- All elements of the vector have identical scalar elements. This 97209467b48Spatrick operation may also be known as a "broadcast" or "duplicate" in target assembly. 97309467b48Spatrick The shufflevector IR instruction may change the vector length, so this operation 97409467b48Spatrick may map to multiple SelectionDAG nodes including ``shuffle_vector``, 97509467b48Spatrick ``concat_vectors``, ``insert_subvector``, and ``extract_subvector``. 97609467b48Spatrick 97709467b48SpatrickPrior to the existence of the Legalize passes, we required that every target 97809467b48Spatrick`selector`_ supported and handled every operator and type even if they are not 97909467b48Spatricknatively supported. The introduction of the Legalize phases allows all of the 98009467b48Spatrickcanonicalization patterns to be shared across targets, and makes it very easy to 98109467b48Spatrickoptimize the canonicalized code because it is still in the form of a DAG. 98209467b48Spatrick 98309467b48Spatrick.. _optimizations: 98409467b48Spatrick.. _Optimize SelectionDAG: 98509467b48Spatrick.. _selector: 98609467b48Spatrick 98709467b48SpatrickSelectionDAG Optimization Phase: the DAG Combiner 98809467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 98909467b48Spatrick 99009467b48SpatrickThe SelectionDAG optimization phase is run multiple times for code generation, 99109467b48Spatrickimmediately after the DAG is built and once after each legalization. The first 99209467b48Spatrickrun of the pass allows the initial code to be cleaned up (e.g. performing 99309467b48Spatrickoptimizations that depend on knowing that the operators have restricted type 99409467b48Spatrickinputs). Subsequent runs of the pass clean up the messy code generated by the 99509467b48SpatrickLegalize passes, which allows Legalize to be very simple (it can focus on making 99609467b48Spatrickcode legal instead of focusing on generating *good* and legal code). 99709467b48Spatrick 99809467b48SpatrickOne important class of optimizations performed is optimizing inserted sign and 99909467b48Spatrickzero extension instructions. We currently use ad-hoc techniques, but could move 100009467b48Spatrickto more rigorous techniques in the future. Here are some good papers on the 100109467b48Spatricksubject: 100209467b48Spatrick 100309467b48Spatrick"`Widening integer arithmetic <http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html>`_" :raw-html:`<br>` 100409467b48SpatrickKevin Redwine and Norman Ramsey :raw-html:`<br>` 100509467b48SpatrickInternational Conference on Compiler Construction (CC) 2004 100609467b48Spatrick 100709467b48Spatrick"`Effective sign extension elimination <http://portal.acm.org/citation.cfm?doid=512529.512552>`_" :raw-html:`<br>` 100809467b48SpatrickMotohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani :raw-html:`<br>` 100909467b48SpatrickProceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design 101009467b48Spatrickand Implementation. 101109467b48Spatrick 101209467b48Spatrick.. _Select instructions from DAG: 101309467b48Spatrick 101409467b48SpatrickSelectionDAG Select Phase 101509467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^ 101609467b48Spatrick 101709467b48SpatrickThe Select phase is the bulk of the target-specific code for instruction 101809467b48Spatrickselection. This phase takes a legal SelectionDAG as input, pattern matches the 101909467b48Spatrickinstructions supported by the target to this DAG, and produces a new DAG of 102009467b48Spatricktarget code. For example, consider the following LLVM fragment: 102109467b48Spatrick 102209467b48Spatrick.. code-block:: llvm 102309467b48Spatrick 102409467b48Spatrick %t1 = fadd float %W, %X 102509467b48Spatrick %t2 = fmul float %t1, %Y 102609467b48Spatrick %t3 = fadd float %t2, %Z 102709467b48Spatrick 102809467b48SpatrickThis LLVM code corresponds to a SelectionDAG that looks basically like this: 102909467b48Spatrick 103009467b48Spatrick.. code-block:: text 103109467b48Spatrick 103209467b48Spatrick (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z) 103309467b48Spatrick 103409467b48SpatrickIf a target supports floating point multiply-and-add (FMA) operations, one of 103509467b48Spatrickthe adds can be merged with the multiply. On the PowerPC, for example, the 103609467b48Spatrickoutput of the instruction selector might look like this DAG: 103709467b48Spatrick 103809467b48Spatrick:: 103909467b48Spatrick 104009467b48Spatrick (FMADDS (FADDS W, X), Y, Z) 104109467b48Spatrick 104209467b48SpatrickThe ``FMADDS`` instruction is a ternary instruction that multiplies its first 104309467b48Spatricktwo operands and adds the third (as single-precision floating-point numbers). 104409467b48SpatrickThe ``FADDS`` instruction is a simple binary single-precision add instruction. 104509467b48SpatrickTo perform this pattern match, the PowerPC backend includes the following 104609467b48Spatrickinstruction definitions: 104709467b48Spatrick 104809467b48Spatrick.. code-block:: text 104909467b48Spatrick :emphasize-lines: 4-5,9 105009467b48Spatrick 105109467b48Spatrick def FMADDS : AForm_1<59, 29, 105209467b48Spatrick (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB), 105309467b48Spatrick "fmadds $FRT, $FRA, $FRC, $FRB", 105409467b48Spatrick [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC), 105509467b48Spatrick F4RC:$FRB))]>; 105609467b48Spatrick def FADDS : AForm_2<59, 21, 105709467b48Spatrick (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB), 105809467b48Spatrick "fadds $FRT, $FRA, $FRB", 105909467b48Spatrick [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>; 106009467b48Spatrick 106109467b48SpatrickThe highlighted portion of the instruction definitions indicates the pattern 106209467b48Spatrickused to match the instructions. The DAG operators (like ``fmul``/``fadd``) 106309467b48Spatrickare defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file. 106409467b48Spatrick"``F4RC``" is the register class of the input and result values. 106509467b48Spatrick 106609467b48SpatrickThe TableGen DAG instruction selector generator reads the instruction patterns 106709467b48Spatrickin the ``.td`` file and automatically builds parts of the pattern matching code 106809467b48Spatrickfor your target. It has the following strengths: 106909467b48Spatrick 107009467b48Spatrick* At compiler-compile time, it analyzes your instruction patterns and tells you 107109467b48Spatrick if your patterns make sense or not. 107209467b48Spatrick 107309467b48Spatrick* It can handle arbitrary constraints on operands for the pattern match. In 107409467b48Spatrick particular, it is straight-forward to say things like "match any immediate 107509467b48Spatrick that is a 13-bit sign-extended value". For examples, see the ``immSExt16`` 107609467b48Spatrick and related ``tblgen`` classes in the PowerPC backend. 107709467b48Spatrick 107809467b48Spatrick* It knows several important identities for the patterns defined. For example, 107909467b48Spatrick it knows that addition is commutative, so it allows the ``FMADDS`` pattern 108009467b48Spatrick above to match "``(fadd X, (fmul Y, Z))``" as well as "``(fadd (fmul X, Y), 108109467b48Spatrick Z)``", without the target author having to specially handle this case. 108209467b48Spatrick 108309467b48Spatrick* It has a full-featured type-inferencing system. In particular, you should 108409467b48Spatrick rarely have to explicitly tell the system what type parts of your patterns 108509467b48Spatrick are. In the ``FMADDS`` case above, we didn't have to tell ``tblgen`` that all 108609467b48Spatrick of the nodes in the pattern are of type 'f32'. It was able to infer and 108709467b48Spatrick propagate this knowledge from the fact that ``F4RC`` has type 'f32'. 108809467b48Spatrick 108909467b48Spatrick* Targets can define their own (and rely on built-in) "pattern fragments". 109009467b48Spatrick Pattern fragments are chunks of reusable patterns that get inlined into your 109109467b48Spatrick patterns during compiler-compile time. For example, the integer "``(not 109209467b48Spatrick x)``" operation is actually defined as a pattern fragment that expands as 109309467b48Spatrick "``(xor x, -1)``", since the SelectionDAG does not have a native '``not``' 109409467b48Spatrick operation. Targets can define their own short-hand fragments as they see fit. 109509467b48Spatrick See the definition of '``not``' and '``ineg``' for examples. 109609467b48Spatrick 109709467b48Spatrick* In addition to instructions, targets can specify arbitrary patterns that map 109809467b48Spatrick to one or more instructions using the 'Pat' class. For example, the PowerPC 109909467b48Spatrick has no way to load an arbitrary integer immediate into a register in one 110009467b48Spatrick instruction. To tell tblgen how to do this, it defines: 110109467b48Spatrick 110209467b48Spatrick :: 110309467b48Spatrick 110409467b48Spatrick // Arbitrary immediate support. Implement in terms of LIS/ORI. 110509467b48Spatrick def : Pat<(i32 imm:$imm), 110609467b48Spatrick (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>; 110709467b48Spatrick 110809467b48Spatrick If none of the single-instruction patterns for loading an immediate into a 110909467b48Spatrick register match, this will be used. This rule says "match an arbitrary i32 111009467b48Spatrick immediate, turning it into an ``ORI`` ('or a 16-bit immediate') and an ``LIS`` 111109467b48Spatrick ('load 16-bit immediate, where the immediate is shifted to the left 16 bits') 111209467b48Spatrick instruction". To make this work, the ``LO16``/``HI16`` node transformations 111309467b48Spatrick are used to manipulate the input immediate (in this case, take the high or low 111409467b48Spatrick 16-bits of the immediate). 111509467b48Spatrick 111609467b48Spatrick* When using the 'Pat' class to map a pattern to an instruction that has one 111709467b48Spatrick or more complex operands (like e.g. `X86 addressing mode`_), the pattern may 111809467b48Spatrick either specify the operand as a whole using a ``ComplexPattern``, or else it 111909467b48Spatrick may specify the components of the complex operand separately. The latter is 112009467b48Spatrick done e.g. for pre-increment instructions by the PowerPC back end: 112109467b48Spatrick 112209467b48Spatrick :: 112309467b48Spatrick 112409467b48Spatrick def STWU : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst), 112509467b48Spatrick "stwu $rS, $dst", LdStStoreUpd, []>, 112609467b48Spatrick RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">; 112709467b48Spatrick 112809467b48Spatrick def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff), 112909467b48Spatrick (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>; 113009467b48Spatrick 113109467b48Spatrick Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the 113209467b48Spatrick complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction. 113309467b48Spatrick 113409467b48Spatrick* While the system does automate a lot, it still allows you to write custom C++ 113509467b48Spatrick code to match special cases if there is something that is hard to 113609467b48Spatrick express. 113709467b48Spatrick 113809467b48SpatrickWhile it has many strengths, the system currently has some limitations, 113909467b48Spatrickprimarily because it is a work in progress and is not yet finished: 114009467b48Spatrick 114109467b48Spatrick* Overall, there is no way to define or match SelectionDAG nodes that define 114209467b48Spatrick multiple values (e.g. ``SMUL_LOHI``, ``LOAD``, ``CALL``, etc). This is the 114309467b48Spatrick biggest reason that you currently still *have to* write custom C++ code 114409467b48Spatrick for your instruction selector. 114509467b48Spatrick 114609467b48Spatrick* There is no great way to support matching complex addressing modes yet. In 114709467b48Spatrick the future, we will extend pattern fragments to allow them to define multiple 114809467b48Spatrick values (e.g. the four operands of the `X86 addressing mode`_, which are 114909467b48Spatrick currently matched with custom C++ code). In addition, we'll extend fragments 115009467b48Spatrick so that a fragment can match multiple different patterns. 115109467b48Spatrick 115209467b48Spatrick* We don't automatically infer flags like ``isStore``/``isLoad`` yet. 115309467b48Spatrick 115409467b48Spatrick* We don't automatically generate the set of supported registers and operations 115509467b48Spatrick for the `Legalizer`_ yet. 115609467b48Spatrick 115709467b48Spatrick* We don't have a way of tying in custom legalized nodes yet. 115809467b48Spatrick 115909467b48SpatrickDespite these limitations, the instruction selector generator is still quite 116009467b48Spatrickuseful for most of the binary and logical operations in typical instruction 116109467b48Spatricksets. If you run into any problems or can't figure out how to do something, 116209467b48Spatrickplease let Chris know! 116309467b48Spatrick 116409467b48Spatrick.. _Scheduling and Formation: 116509467b48Spatrick.. _SelectionDAG Scheduling and Formation: 116609467b48Spatrick 116709467b48SpatrickSelectionDAG Scheduling and Formation Phase 116809467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 116909467b48Spatrick 117009467b48SpatrickThe scheduling phase takes the DAG of target instructions from the selection 117109467b48Spatrickphase and assigns an order. The scheduler can pick an order depending on 117209467b48Spatrickvarious constraints of the machines (i.e. order for minimal register pressure or 117309467b48Spatricktry to cover instruction latencies). Once an order is established, the DAG is 117409467b48Spatrickconverted to a list of :raw-html:`<tt>` `MachineInstr`_\s :raw-html:`</tt>` and 117509467b48Spatrickthe SelectionDAG is destroyed. 117609467b48Spatrick 117709467b48SpatrickNote that this phase is logically separate from the instruction selection phase, 117809467b48Spatrickbut is tied to it closely in the code because it operates on SelectionDAGs. 117909467b48Spatrick 118009467b48SpatrickFuture directions for the SelectionDAG 118109467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 118209467b48Spatrick 118309467b48Spatrick#. Optional function-at-a-time selection. 118409467b48Spatrick 118509467b48Spatrick#. Auto-generate entire selector from ``.td`` file. 118609467b48Spatrick 118709467b48Spatrick.. _SSA-based Machine Code Optimizations: 118809467b48Spatrick 118909467b48SpatrickSSA-based Machine Code Optimizations 119009467b48Spatrick------------------------------------ 119109467b48Spatrick 119209467b48SpatrickTo Be Written 119309467b48Spatrick 119409467b48SpatrickLive Intervals 119509467b48Spatrick-------------- 119609467b48Spatrick 119709467b48SpatrickLive Intervals are the ranges (intervals) where a variable is *live*. They are 119809467b48Spatrickused by some `register allocator`_ passes to determine if two or more virtual 119909467b48Spatrickregisters which require the same physical register are live at the same point in 120009467b48Spatrickthe program (i.e., they conflict). When this situation occurs, one virtual 120109467b48Spatrickregister must be *spilled*. 120209467b48Spatrick 120309467b48SpatrickLive Variable Analysis 120409467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^ 120509467b48Spatrick 120609467b48SpatrickThe first step in determining the live intervals of variables is to calculate 120709467b48Spatrickthe set of registers that are immediately dead after the instruction (i.e., the 120809467b48Spatrickinstruction calculates the value, but it is never used) and the set of registers 120909467b48Spatrickthat are used by the instruction, but are never used after the instruction 121009467b48Spatrick(i.e., they are killed). Live variable information is computed for 121109467b48Spatrickeach *virtual* register and *register allocatable* physical register 121209467b48Spatrickin the function. This is done in a very efficient manner because it uses SSA to 121309467b48Spatricksparsely compute lifetime information for virtual registers (which are in SSA 121409467b48Spatrickform) and only has to track physical registers within a block. Before register 121509467b48Spatrickallocation, LLVM can assume that physical registers are only live within a 121609467b48Spatricksingle basic block. This allows it to do a single, local analysis to resolve 121709467b48Spatrickphysical register lifetimes within each basic block. If a physical register is 121809467b48Spatricknot register allocatable (e.g., a stack pointer or condition codes), it is not 121909467b48Spatricktracked. 122009467b48Spatrick 122109467b48SpatrickPhysical registers may be live in to or out of a function. Live in values are 122209467b48Spatricktypically arguments in registers. Live out values are typically return values in 122309467b48Spatrickregisters. Live in values are marked as such, and are given a dummy "defining" 122409467b48Spatrickinstruction during live intervals analysis. If the last basic block of a 122509467b48Spatrickfunction is a ``return``, then it's marked as using all live out values in the 122609467b48Spatrickfunction. 122709467b48Spatrick 122809467b48Spatrick``PHI`` nodes need to be handled specially, because the calculation of the live 122909467b48Spatrickvariable information from a depth first traversal of the CFG of the function 123009467b48Spatrickwon't guarantee that a virtual register used by the ``PHI`` node is defined 123109467b48Spatrickbefore it's used. When a ``PHI`` node is encountered, only the definition is 123209467b48Spatrickhandled, because the uses will be handled in other basic blocks. 123309467b48Spatrick 123409467b48SpatrickFor each ``PHI`` node of the current basic block, we simulate an assignment at 123509467b48Spatrickthe end of the current basic block and traverse the successor basic blocks. If a 123609467b48Spatricksuccessor basic block has a ``PHI`` node and one of the ``PHI`` node's operands 123709467b48Spatrickis coming from the current basic block, then the variable is marked as *alive* 123809467b48Spatrickwithin the current basic block and all of its predecessor basic blocks, until 123909467b48Spatrickthe basic block with the defining instruction is encountered. 124009467b48Spatrick 124109467b48SpatrickLive Intervals Analysis 124209467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^ 124309467b48Spatrick 124409467b48SpatrickWe now have the information available to perform the live intervals analysis and 124509467b48Spatrickbuild the live intervals themselves. We start off by numbering the basic blocks 124609467b48Spatrickand machine instructions. We then handle the "live-in" values. These are in 124709467b48Spatrickphysical registers, so the physical register is assumed to be killed by the end 124809467b48Spatrickof the basic block. Live intervals for virtual registers are computed for some 124909467b48Spatrickordering of the machine instructions ``[1, N]``. A live interval is an interval 125009467b48Spatrick``[i, j)``, where ``1 >= i >= j > N``, for which a variable is live. 125109467b48Spatrick 125209467b48Spatrick.. note:: 125309467b48Spatrick More to come... 125409467b48Spatrick 125509467b48Spatrick.. _Register Allocation: 125609467b48Spatrick.. _register allocator: 125709467b48Spatrick 125809467b48SpatrickRegister Allocation 125909467b48Spatrick------------------- 126009467b48Spatrick 126109467b48SpatrickThe *Register Allocation problem* consists in mapping a program 126209467b48Spatrick:raw-html:`<b><tt>` P\ :sub:`v`\ :raw-html:`</tt></b>`, that can use an unbounded 126309467b48Spatricknumber of virtual registers, to a program :raw-html:`<b><tt>` P\ :sub:`p`\ 126409467b48Spatrick:raw-html:`</tt></b>` that contains a finite (possibly small) number of physical 126509467b48Spatrickregisters. Each target architecture has a different number of physical 126609467b48Spatrickregisters. If the number of physical registers is not enough to accommodate all 126709467b48Spatrickthe virtual registers, some of them will have to be mapped into memory. These 126809467b48Spatrickvirtuals are called *spilled virtuals*. 126909467b48Spatrick 127009467b48SpatrickHow registers are represented in LLVM 127109467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 127209467b48Spatrick 127309467b48SpatrickIn LLVM, physical registers are denoted by integer numbers that normally range 127409467b48Spatrickfrom 1 to 1023. To see how this numbering is defined for a particular 127509467b48Spatrickarchitecture, you can read the ``GenRegisterNames.inc`` file for that 127609467b48Spatrickarchitecture. For instance, by inspecting 127709467b48Spatrick``lib/Target/X86/X86GenRegisterInfo.inc`` we see that the 32-bit register 127809467b48Spatrick``EAX`` is denoted by 43, and the MMX register ``MM0`` is mapped to 65. 127909467b48Spatrick 128009467b48SpatrickSome architectures contain registers that share the same physical location. A 128109467b48Spatricknotable example is the X86 platform. For instance, in the X86 architecture, the 128209467b48Spatrickregisters ``EAX``, ``AX`` and ``AL`` share the first eight bits. These physical 128309467b48Spatrickregisters are marked as *aliased* in LLVM. Given a particular architecture, you 128409467b48Spatrickcan check which registers are aliased by inspecting its ``RegisterInfo.td`` 128509467b48Spatrickfile. Moreover, the class ``MCRegAliasIterator`` enumerates all the physical 128609467b48Spatrickregisters aliased to a register. 128709467b48Spatrick 128809467b48SpatrickPhysical registers, in LLVM, are grouped in *Register Classes*. Elements in the 128909467b48Spatricksame register class are functionally equivalent, and can be interchangeably 129009467b48Spatrickused. Each virtual register can only be mapped to physical registers of a 129109467b48Spatrickparticular class. For instance, in the X86 architecture, some virtuals can only 129209467b48Spatrickbe allocated to 8 bit registers. A register class is described by 129309467b48Spatrick``TargetRegisterClass`` objects. To discover if a virtual register is 129409467b48Spatrickcompatible with a given physical, this code can be used: 129509467b48Spatrick 129609467b48Spatrick.. code-block:: c++ 129709467b48Spatrick 129809467b48Spatrick bool RegMapping_Fer::compatible_class(MachineFunction &mf, 129909467b48Spatrick unsigned v_reg, 130009467b48Spatrick unsigned p_reg) { 130109467b48Spatrick assert(TargetRegisterInfo::isPhysicalRegister(p_reg) && 130209467b48Spatrick "Target register must be physical"); 130309467b48Spatrick const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg); 130409467b48Spatrick return trc->contains(p_reg); 130509467b48Spatrick } 130609467b48Spatrick 130709467b48SpatrickSometimes, mostly for debugging purposes, it is useful to change the number of 130809467b48Spatrickphysical registers available in the target architecture. This must be done 1309097a140dSpatrickstatically, inside the ``TargetRegisterInfo.td`` file. Just ``grep`` for 131009467b48Spatrick``RegisterClass``, the last parameter of which is a list of registers. Just 131109467b48Spatrickcommenting some out is one simple way to avoid them being used. A more polite 131209467b48Spatrickway is to explicitly exclude some registers from the *allocation order*. See the 131309467b48Spatrickdefinition of the ``GR8`` register class in 131409467b48Spatrick``lib/Target/X86/X86RegisterInfo.td`` for an example of this. 131509467b48Spatrick 131609467b48SpatrickVirtual registers are also denoted by integer numbers. Contrary to physical 131709467b48Spatrickregisters, different virtual registers never share the same number. Whereas 131809467b48Spatrickphysical registers are statically defined in a ``TargetRegisterInfo.td`` file 131909467b48Spatrickand cannot be created by the application developer, that is not the case with 132009467b48Spatrickvirtual registers. In order to create new virtual registers, use the method 132109467b48Spatrick``MachineRegisterInfo::createVirtualRegister()``. This method will return a new 132209467b48Spatrickvirtual register. Use an ``IndexedMap<Foo, VirtReg2IndexFunctor>`` to hold 132309467b48Spatrickinformation per virtual register. If you need to enumerate all virtual 132409467b48Spatrickregisters, use the function ``TargetRegisterInfo::index2VirtReg()`` to find the 132509467b48Spatrickvirtual register numbers: 132609467b48Spatrick 132709467b48Spatrick.. code-block:: c++ 132809467b48Spatrick 132909467b48Spatrick for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) { 133009467b48Spatrick unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i); 133109467b48Spatrick stuff(VirtReg); 133209467b48Spatrick } 133309467b48Spatrick 133409467b48SpatrickBefore register allocation, the operands of an instruction are mostly virtual 133509467b48Spatrickregisters, although physical registers may also be used. In order to check if a 133609467b48Spatrickgiven machine operand is a register, use the boolean function 133709467b48Spatrick``MachineOperand::isRegister()``. To obtain the integer code of a register, use 133809467b48Spatrick``MachineOperand::getReg()``. An instruction may define or use a register. For 133909467b48Spatrickinstance, ``ADD reg:1026 := reg:1025 reg:1024`` defines the registers 1024, and 134009467b48Spatrickuses registers 1025 and 1026. Given a register operand, the method 134109467b48Spatrick``MachineOperand::isUse()`` informs if that register is being used by the 134209467b48Spatrickinstruction. The method ``MachineOperand::isDef()`` informs if that registers is 134309467b48Spatrickbeing defined. 134409467b48Spatrick 134509467b48SpatrickWe will call physical registers present in the LLVM bitcode before register 134609467b48Spatrickallocation *pre-colored registers*. Pre-colored registers are used in many 134709467b48Spatrickdifferent situations, for instance, to pass parameters of functions calls, and 134809467b48Spatrickto store results of particular instructions. There are two types of pre-colored 134909467b48Spatrickregisters: the ones *implicitly* defined, and those *explicitly* 135009467b48Spatrickdefined. Explicitly defined registers are normal operands, and can be accessed 135109467b48Spatrickwith ``MachineInstr::getOperand(int)::getReg()``. In order to check which 135209467b48Spatrickregisters are implicitly defined by an instruction, use the 135309467b48Spatrick``TargetInstrInfo::get(opcode)::ImplicitDefs``, where ``opcode`` is the opcode 135409467b48Spatrickof the target instruction. One important difference between explicit and 135509467b48Spatrickimplicit physical registers is that the latter are defined statically for each 135609467b48Spatrickinstruction, whereas the former may vary depending on the program being 135709467b48Spatrickcompiled. For example, an instruction that represents a function call will 135809467b48Spatrickalways implicitly define or use the same set of physical registers. To read the 135909467b48Spatrickregisters implicitly used by an instruction, use 136009467b48Spatrick``TargetInstrInfo::get(opcode)::ImplicitUses``. Pre-colored registers impose 136109467b48Spatrickconstraints on any register allocation algorithm. The register allocator must 136209467b48Spatrickmake sure that none of them are overwritten by the values of virtual registers 136309467b48Spatrickwhile still alive. 136409467b48Spatrick 136509467b48SpatrickMapping virtual registers to physical registers 136609467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 136709467b48Spatrick 136809467b48SpatrickThere are two ways to map virtual registers to physical registers (or to memory 136909467b48Spatrickslots). The first way, that we will call *direct mapping*, is based on the use 137009467b48Spatrickof methods of the classes ``TargetRegisterInfo``, and ``MachineOperand``. The 137109467b48Spatricksecond way, that we will call *indirect mapping*, relies on the ``VirtRegMap`` 137209467b48Spatrickclass in order to insert loads and stores sending and getting values to and from 137309467b48Spatrickmemory. 137409467b48Spatrick 137509467b48SpatrickThe direct mapping provides more flexibility to the developer of the register 137609467b48Spatrickallocator; however, it is more error prone, and demands more implementation 137709467b48Spatrickwork. Basically, the programmer will have to specify where load and store 137809467b48Spatrickinstructions should be inserted in the target function being compiled in order 137909467b48Spatrickto get and store values in memory. To assign a physical register to a virtual 138009467b48Spatrickregister present in a given operand, use ``MachineOperand::setReg(p_reg)``. To 138109467b48Spatrickinsert a store instruction, use ``TargetInstrInfo::storeRegToStackSlot(...)``, 138209467b48Spatrickand to insert a load instruction, use ``TargetInstrInfo::loadRegFromStackSlot``. 138309467b48Spatrick 138409467b48SpatrickThe indirect mapping shields the application developer from the complexities of 138509467b48Spatrickinserting load and store instructions. In order to map a virtual register to a 138609467b48Spatrickphysical one, use ``VirtRegMap::assignVirt2Phys(vreg, preg)``. In order to map 138709467b48Spatricka certain virtual register to memory, use 138809467b48Spatrick``VirtRegMap::assignVirt2StackSlot(vreg)``. This method will return the stack 138909467b48Spatrickslot where ``vreg``'s value will be located. If it is necessary to map another 139009467b48Spatrickvirtual register to the same stack slot, use 139109467b48Spatrick``VirtRegMap::assignVirt2StackSlot(vreg, stack_location)``. One important point 139209467b48Spatrickto consider when using the indirect mapping, is that even if a virtual register 139309467b48Spatrickis mapped to memory, it still needs to be mapped to a physical register. This 139409467b48Spatrickphysical register is the location where the virtual register is supposed to be 139509467b48Spatrickfound before being stored or after being reloaded. 139609467b48Spatrick 139709467b48SpatrickIf the indirect strategy is used, after all the virtual registers have been 139809467b48Spatrickmapped to physical registers or stack slots, it is necessary to use a spiller 139909467b48Spatrickobject to place load and store instructions in the code. Every virtual that has 140009467b48Spatrickbeen mapped to a stack slot will be stored to memory after being defined and will 140109467b48Spatrickbe loaded before being used. The implementation of the spiller tries to recycle 140209467b48Spatrickload/store instructions, avoiding unnecessary instructions. For an example of 140309467b48Spatrickhow to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in 140409467b48Spatrick``lib/CodeGen/RegAllocLinearScan.cpp``. 140509467b48Spatrick 140609467b48SpatrickHandling two address instructions 140709467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 140809467b48Spatrick 140909467b48SpatrickWith very rare exceptions (e.g., function calls), the LLVM machine code 141009467b48Spatrickinstructions are three address instructions. That is, each instruction is 141109467b48Spatrickexpected to define at most one register, and to use at most two registers. 141209467b48SpatrickHowever, some architectures use two address instructions. In this case, the 141309467b48Spatrickdefined register is also one of the used registers. For instance, an instruction 141409467b48Spatricksuch as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX + 141509467b48Spatrick%EBX``. 141609467b48Spatrick 141709467b48SpatrickIn order to produce correct code, LLVM must convert three address instructions 141809467b48Spatrickthat represent two address instructions into true two address instructions. LLVM 141909467b48Spatrickprovides the pass ``TwoAddressInstructionPass`` for this specific purpose. It 142009467b48Spatrickmust be run before register allocation takes place. After its execution, the 142109467b48Spatrickresulting code may no longer be in SSA form. This happens, for instance, in 142209467b48Spatricksituations where an instruction such as ``%a = ADD %b %c`` is converted to two 142309467b48Spatrickinstructions such as: 142409467b48Spatrick 142509467b48Spatrick:: 142609467b48Spatrick 142709467b48Spatrick %a = MOVE %b 142809467b48Spatrick %a = ADD %a %c 142909467b48Spatrick 143009467b48SpatrickNotice that, internally, the second instruction is represented as ``ADD 143109467b48Spatrick%a[def/use] %c``. I.e., the register operand ``%a`` is both used and defined by 143209467b48Spatrickthe instruction. 143309467b48Spatrick 143409467b48SpatrickThe SSA deconstruction phase 143509467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 143609467b48Spatrick 143709467b48SpatrickAn important transformation that happens during register allocation is called 143809467b48Spatrickthe *SSA Deconstruction Phase*. The SSA form simplifies many analyses that are 143909467b48Spatrickperformed on the control flow graph of programs. However, traditional 144009467b48Spatrickinstruction sets do not implement PHI instructions. Thus, in order to generate 144109467b48Spatrickexecutable code, compilers must replace PHI instructions with other instructions 144209467b48Spatrickthat preserve their semantics. 144309467b48Spatrick 144409467b48SpatrickThere are many ways in which PHI instructions can safely be removed from the 144509467b48Spatricktarget code. The most traditional PHI deconstruction algorithm replaces PHI 144609467b48Spatrickinstructions with copy instructions. That is the strategy adopted by LLVM. The 144709467b48SpatrickSSA deconstruction algorithm is implemented in 144809467b48Spatrick``lib/CodeGen/PHIElimination.cpp``. In order to invoke this pass, the identifier 144909467b48Spatrick``PHIEliminationID`` must be marked as required in the code of the register 145009467b48Spatrickallocator. 145109467b48Spatrick 145209467b48SpatrickInstruction folding 145309467b48Spatrick^^^^^^^^^^^^^^^^^^^ 145409467b48Spatrick 145509467b48Spatrick*Instruction folding* is an optimization performed during register allocation 145609467b48Spatrickthat removes unnecessary copy instructions. For instance, a sequence of 145709467b48Spatrickinstructions such as: 145809467b48Spatrick 145909467b48Spatrick:: 146009467b48Spatrick 146109467b48Spatrick %EBX = LOAD %mem_address 146209467b48Spatrick %EAX = COPY %EBX 146309467b48Spatrick 146409467b48Spatrickcan be safely substituted by the single instruction: 146509467b48Spatrick 146609467b48Spatrick:: 146709467b48Spatrick 146809467b48Spatrick %EAX = LOAD %mem_address 146909467b48Spatrick 147009467b48SpatrickInstructions can be folded with the 147109467b48Spatrick``TargetRegisterInfo::foldMemoryOperand(...)`` method. Care must be taken when 147209467b48Spatrickfolding instructions; a folded instruction can be quite different from the 147309467b48Spatrickoriginal instruction. See ``LiveIntervals::addIntervalsForSpills`` in 147409467b48Spatrick``lib/CodeGen/LiveIntervalAnalysis.cpp`` for an example of its use. 147509467b48Spatrick 147609467b48SpatrickBuilt in register allocators 147709467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 147809467b48Spatrick 147909467b48SpatrickThe LLVM infrastructure provides the application developer with three different 148009467b48Spatrickregister allocators: 148109467b48Spatrick 148209467b48Spatrick* *Fast* --- This register allocator is the default for debug builds. It 148309467b48Spatrick allocates registers on a basic block level, attempting to keep values in 148409467b48Spatrick registers and reusing registers as appropriate. 148509467b48Spatrick 148609467b48Spatrick* *Basic* --- This is an incremental approach to register allocation. Live 148709467b48Spatrick ranges are assigned to registers one at a time in an order that is driven by 148809467b48Spatrick heuristics. Since code can be rewritten on-the-fly during allocation, this 148909467b48Spatrick framework allows interesting allocators to be developed as extensions. It is 149009467b48Spatrick not itself a production register allocator but is a potentially useful 149109467b48Spatrick stand-alone mode for triaging bugs and as a performance baseline. 149209467b48Spatrick 149309467b48Spatrick* *Greedy* --- *The default allocator*. This is a highly tuned implementation of 149409467b48Spatrick the *Basic* allocator that incorporates global live range splitting. This 149509467b48Spatrick allocator works hard to minimize the cost of spill code. 149609467b48Spatrick 149709467b48Spatrick* *PBQP* --- A Partitioned Boolean Quadratic Programming (PBQP) based register 149809467b48Spatrick allocator. This allocator works by constructing a PBQP problem representing 149909467b48Spatrick the register allocation problem under consideration, solving this using a PBQP 150009467b48Spatrick solver, and mapping the solution back to a register assignment. 150109467b48Spatrick 150209467b48SpatrickThe type of register allocator used in ``llc`` can be chosen with the command 150309467b48Spatrickline option ``-regalloc=...``: 150409467b48Spatrick 150509467b48Spatrick.. code-block:: bash 150609467b48Spatrick 150709467b48Spatrick $ llc -regalloc=linearscan file.bc -o ln.s 150809467b48Spatrick $ llc -regalloc=fast file.bc -o fa.s 150909467b48Spatrick $ llc -regalloc=pbqp file.bc -o pbqp.s 151009467b48Spatrick 151109467b48Spatrick.. _Prolog/Epilog Code Insertion: 151209467b48Spatrick 151309467b48SpatrickProlog/Epilog Code Insertion 151409467b48Spatrick---------------------------- 151509467b48Spatrick 1516*d415bd75Srobert.. note:: 1517*d415bd75Srobert 1518*d415bd75Srobert To Be Written 1519*d415bd75Srobert 152009467b48SpatrickCompact Unwind 1521*d415bd75Srobert-------------- 152209467b48Spatrick 152309467b48SpatrickThrowing an exception requires *unwinding* out of a function. The information on 152409467b48Spatrickhow to unwind a given function is traditionally expressed in DWARF unwind 152509467b48Spatrick(a.k.a. frame) info. But that format was originally developed for debuggers to 152609467b48Spatrickbacktrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per 152709467b48Spatrickfunction. There is also the cost of mapping from an address in a function to the 152809467b48Spatrickcorresponding FDE at runtime. An alternative unwind encoding is called *compact 152909467b48Spatrickunwind* and requires just 4-bytes per function. 153009467b48Spatrick 153109467b48SpatrickThe compact unwind encoding is a 32-bit value, which is encoded in an 153209467b48Spatrickarchitecture-specific way. It specifies which registers to restore and from 153309467b48Spatrickwhere, and how to unwind out of the function. When the linker creates a final 153409467b48Spatricklinked image, it will create a ``__TEXT,__unwind_info`` section. This section is 153509467b48Spatricka small and fast way for the runtime to access unwind info for any given 153609467b48Spatrickfunction. If we emit compact unwind info for the function, that compact unwind 153709467b48Spatrickinfo will be encoded in the ``__TEXT,__unwind_info`` section. If we emit DWARF 153809467b48Spatrickunwind info, the ``__TEXT,__unwind_info`` section will contain the offset of the 153909467b48SpatrickFDE in the ``__TEXT,__eh_frame`` section in the final linked image. 154009467b48Spatrick 154109467b48SpatrickFor X86, there are three modes for the compact unwind encoding: 154209467b48Spatrick 154309467b48Spatrick*Function with a Frame Pointer (``EBP`` or ``RBP``)* 154409467b48Spatrick ``EBP/RBP``-based frame, where ``EBP/RBP`` is pushed onto the stack 154509467b48Spatrick immediately after the return address, then ``ESP/RSP`` is moved to 154609467b48Spatrick ``EBP/RBP``. Thus to unwind, ``ESP/RSP`` is restored with the current 154709467b48Spatrick ``EBP/RBP`` value, then ``EBP/RBP`` is restored by popping the stack, and the 154809467b48Spatrick return is done by popping the stack once more into the PC. All non-volatile 154909467b48Spatrick registers that need to be restored must have been saved in a small range on 155009467b48Spatrick the stack that starts ``EBP-4`` to ``EBP-1020`` (``RBP-8`` to 155109467b48Spatrick ``RBP-1020``). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode) 155209467b48Spatrick is encoded in bits 16-23 (mask: ``0x00FF0000``). The registers saved are 155309467b48Spatrick encoded in bits 0-14 (mask: ``0x00007FFF``) as five 3-bit entries from the 155409467b48Spatrick following table: 155509467b48Spatrick 155609467b48Spatrick ============== ============= =============== 155709467b48Spatrick Compact Number i386 Register x86-64 Register 155809467b48Spatrick ============== ============= =============== 155909467b48Spatrick 1 ``EBX`` ``RBX`` 156009467b48Spatrick 2 ``ECX`` ``R12`` 156109467b48Spatrick 3 ``EDX`` ``R13`` 156209467b48Spatrick 4 ``EDI`` ``R14`` 156309467b48Spatrick 5 ``ESI`` ``R15`` 156409467b48Spatrick 6 ``EBP`` ``RBP`` 156509467b48Spatrick ============== ============= =============== 156609467b48Spatrick 156709467b48Spatrick*Frameless with a Small Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)* 156809467b48Spatrick To return, a constant (encoded in the compact unwind encoding) is added to the 156909467b48Spatrick ``ESP/RSP``. Then the return is done by popping the stack into the PC. All 157009467b48Spatrick non-volatile registers that need to be restored must have been saved on the 157109467b48Spatrick stack immediately after the return address. The stack size (divided by 4 in 157209467b48Spatrick 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 157309467b48Spatrick ``0x00FF0000``). There is a maximum stack size of 1024 bytes in 32-bit mode 157409467b48Spatrick and 2048 in 64-bit mode. The number of registers saved is encoded in bits 9-12 157509467b48Spatrick (mask: ``0x00001C00``). Bits 0-9 (mask: ``0x000003FF``) contain which 157609467b48Spatrick registers were saved and their order. (See the 157709467b48Spatrick ``encodeCompactUnwindRegistersWithoutFrame()`` function in 157809467b48Spatrick ``lib/Target/X86FrameLowering.cpp`` for the encoding algorithm.) 157909467b48Spatrick 158009467b48Spatrick*Frameless with a Large Constant Stack Size (``EBP`` or ``RBP`` is not used as a frame pointer)* 158109467b48Spatrick This case is like the "Frameless with a Small Constant Stack Size" case, but 158209467b48Spatrick the stack size is too large to encode in the compact unwind encoding. Instead 158309467b48Spatrick it requires that the function contains "``subl $nnnnnn, %esp``" in its 158409467b48Spatrick prolog. The compact encoding contains the offset to the ``$nnnnnn`` value in 158509467b48Spatrick the function in bits 9-12 (mask: ``0x00001C00``). 158609467b48Spatrick 158709467b48Spatrick.. _Late Machine Code Optimizations: 158809467b48Spatrick 158909467b48SpatrickLate Machine Code Optimizations 159009467b48Spatrick------------------------------- 159109467b48Spatrick 159209467b48Spatrick.. note:: 159309467b48Spatrick 159409467b48Spatrick To Be Written 159509467b48Spatrick 159609467b48Spatrick.. _Code Emission: 159709467b48Spatrick 159809467b48SpatrickCode Emission 159909467b48Spatrick------------- 160009467b48Spatrick 160109467b48SpatrickThe code emission step of code generation is responsible for lowering from the 160209467b48Spatrickcode generator abstractions (like `MachineFunction`_, `MachineInstr`_, etc) down 160309467b48Spatrickto the abstractions used by the MC layer (`MCInst`_, `MCStreamer`_, etc). This 160409467b48Spatrickis done with a combination of several different classes: the (misnamed) 160509467b48Spatricktarget-independent AsmPrinter class, target-specific subclasses of AsmPrinter 160609467b48Spatrick(such as SparcAsmPrinter), and the TargetLoweringObjectFile class. 160709467b48Spatrick 160809467b48SpatrickSince the MC layer works at the level of abstraction of object files, it doesn't 160909467b48Spatrickhave a notion of functions, global variables etc. Instead, it thinks about 161009467b48Spatricklabels, directives, and instructions. A key class used at this time is the 161109467b48SpatrickMCStreamer class. This is an abstract API that is implemented in different ways 161209467b48Spatrick(e.g. to output a .s file, output an ELF .o file, etc) that is effectively an 161309467b48Spatrick"assembler API". MCStreamer has one method per directive, such as EmitLabel, 1614*d415bd75SrobertEmitSymbolAttribute, switchSection, etc, which directly correspond to assembly 161509467b48Spatricklevel directives. 161609467b48Spatrick 161709467b48SpatrickIf you are interested in implementing a code generator for a target, there are 161809467b48Spatrickthree important things that you have to implement for your target: 161909467b48Spatrick 162009467b48Spatrick#. First, you need a subclass of AsmPrinter for your target. This class 162109467b48Spatrick implements the general lowering process converting MachineFunction's into MC 162209467b48Spatrick label constructs. The AsmPrinter base class provides a number of useful 162309467b48Spatrick methods and routines, and also allows you to override the lowering process in 162409467b48Spatrick some important ways. You should get much of the lowering for free if you are 162509467b48Spatrick implementing an ELF, COFF, or MachO target, because the 162609467b48Spatrick TargetLoweringObjectFile class implements much of the common logic. 162709467b48Spatrick 162809467b48Spatrick#. Second, you need to implement an instruction printer for your target. The 162909467b48Spatrick instruction printer takes an `MCInst`_ and renders it to a raw_ostream as 163009467b48Spatrick text. Most of this is automatically generated from the .td file (when you 163109467b48Spatrick specify something like "``add $dst, $src1, $src2``" in the instructions), but 163209467b48Spatrick you need to implement routines to print operands. 163309467b48Spatrick 163409467b48Spatrick#. Third, you need to implement code that lowers a `MachineInstr`_ to an MCInst, 163509467b48Spatrick usually implemented in "<target>MCInstLower.cpp". This lowering process is 163609467b48Spatrick often target specific, and is responsible for turning jump table entries, 163709467b48Spatrick constant pool indices, global variable addresses, etc into MCLabels as 163809467b48Spatrick appropriate. This translation layer is also responsible for expanding pseudo 163909467b48Spatrick ops used by the code generator into the actual machine instructions they 164009467b48Spatrick correspond to. The MCInsts that are generated by this are fed into the 164109467b48Spatrick instruction printer or the encoder. 164209467b48Spatrick 164309467b48SpatrickFinally, at your choosing, you can also implement a subclass of MCCodeEmitter 164409467b48Spatrickwhich lowers MCInst's into machine code bytes and relocations. This is 164509467b48Spatrickimportant if you want to support direct .o file emission, or would like to 164609467b48Spatrickimplement an assembler for your target. 164709467b48Spatrick 164809467b48SpatrickEmitting function stack size information 164909467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 165009467b48Spatrick 165109467b48SpatrickA section containing metadata on function stack sizes will be emitted when 165209467b48Spatrick``TargetLoweringObjectFile::StackSizesSection`` is not null, and 165309467b48Spatrick``TargetOptions::EmitStackSizeSection`` is set (-stack-size-section). The 165409467b48Spatricksection will contain an array of pairs of function symbol values (pointer size) 165509467b48Spatrickand stack sizes (unsigned LEB128). The stack size values only include the space 165609467b48Spatrickallocated in the function prologue. Functions with dynamic stack allocations are 165709467b48Spatricknot included. 165809467b48Spatrick 165909467b48SpatrickVLIW Packetizer 166009467b48Spatrick--------------- 166109467b48Spatrick 166209467b48SpatrickIn a Very Long Instruction Word (VLIW) architecture, the compiler is responsible 166309467b48Spatrickfor mapping instructions to functional-units available on the architecture. To 166409467b48Spatrickthat end, the compiler creates groups of instructions called *packets* or 166509467b48Spatrick*bundles*. The VLIW packetizer in LLVM is a target-independent mechanism to 166609467b48Spatrickenable the packetization of machine instructions. 166709467b48Spatrick 166809467b48SpatrickMapping from instructions to functional units 166909467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 167009467b48Spatrick 167109467b48SpatrickInstructions in a VLIW target can typically be mapped to multiple functional 167209467b48Spatrickunits. During the process of packetizing, the compiler must be able to reason 167309467b48Spatrickabout whether an instruction can be added to a packet. This decision can be 167409467b48Spatrickcomplex since the compiler has to examine all possible mappings of instructions 167509467b48Spatrickto functional units. Therefore to alleviate compilation-time complexity, the 167609467b48SpatrickVLIW packetizer parses the instruction classes of a target and generates tables 167709467b48Spatrickat compiler build time. These tables can then be queried by the provided 167809467b48Spatrickmachine-independent API to determine if an instruction can be accommodated in a 167909467b48Spatrickpacket. 168009467b48Spatrick 168109467b48SpatrickHow the packetization tables are generated and used 168209467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 168309467b48Spatrick 168409467b48SpatrickThe packetizer reads instruction classes from a target's itineraries and creates 168509467b48Spatricka deterministic finite automaton (DFA) to represent the state of a packet. A DFA 168609467b48Spatrickconsists of three major elements: inputs, states, and transitions. The set of 168709467b48Spatrickinputs for the generated DFA represents the instruction being added to a 168809467b48Spatrickpacket. The states represent the possible consumption of functional units by 168909467b48Spatrickinstructions in a packet. In the DFA, transitions from one state to another 169009467b48Spatrickoccur on the addition of an instruction to an existing packet. If there is a 169109467b48Spatricklegal mapping of functional units to instructions, then the DFA contains a 169209467b48Spatrickcorresponding transition. The absence of a transition indicates that a legal 169309467b48Spatrickmapping does not exist and that the instruction cannot be added to the packet. 169409467b48Spatrick 169509467b48SpatrickTo generate tables for a VLIW target, add *Target*\ GenDFAPacketizer.inc as a 169609467b48Spatricktarget to the Makefile in the target directory. The exported API provides three 169709467b48Spatrickfunctions: ``DFAPacketizer::clearResources()``, 169809467b48Spatrick``DFAPacketizer::reserveResources(MachineInstr *MI)``, and 169909467b48Spatrick``DFAPacketizer::canReserveResources(MachineInstr *MI)``. These functions allow 170009467b48Spatricka target packetizer to add an instruction to an existing packet and to check 170109467b48Spatrickwhether an instruction can be added to a packet. See 170209467b48Spatrick``llvm/CodeGen/DFAPacketizer.h`` for more information. 170309467b48Spatrick 170409467b48SpatrickImplementing a Native Assembler 170509467b48Spatrick=============================== 170609467b48Spatrick 170709467b48SpatrickThough you're probably reading this because you want to write or maintain a 170809467b48Spatrickcompiler backend, LLVM also fully supports building a native assembler. 170909467b48SpatrickWe've tried hard to automate the generation of the assembler from the .td files 171009467b48Spatrick(in particular the instruction syntax and encodings), which means that a large 171109467b48Spatrickpart of the manual and repetitive data entry can be factored and shared with the 171209467b48Spatrickcompiler. 171309467b48Spatrick 171409467b48SpatrickInstruction Parsing 171509467b48Spatrick------------------- 171609467b48Spatrick 171709467b48Spatrick.. note:: 171809467b48Spatrick 171909467b48Spatrick To Be Written 172009467b48Spatrick 172109467b48Spatrick 172209467b48SpatrickInstruction Alias Processing 172309467b48Spatrick---------------------------- 172409467b48Spatrick 172509467b48SpatrickOnce the instruction is parsed, it enters the MatchInstructionImpl function. 172609467b48SpatrickThe MatchInstructionImpl function performs alias processing and then does actual 172709467b48Spatrickmatching. 172809467b48Spatrick 172909467b48SpatrickAlias processing is the phase that canonicalizes different lexical forms of the 173009467b48Spatricksame instructions down to one representation. There are several different kinds 173109467b48Spatrickof alias that are possible to implement and they are listed below in the order 173209467b48Spatrickthat they are processed (which is in order from simplest/weakest to most 173309467b48Spatrickcomplex/powerful). Generally you want to use the first alias mechanism that 173409467b48Spatrickmeets the needs of your instruction, because it will allow a more concise 173509467b48Spatrickdescription. 173609467b48Spatrick 173709467b48SpatrickMnemonic Aliases 173809467b48Spatrick^^^^^^^^^^^^^^^^ 173909467b48Spatrick 174009467b48SpatrickThe first phase of alias processing is simple instruction mnemonic remapping for 174109467b48Spatrickclasses of instructions which are allowed with two different mnemonics. This 174209467b48Spatrickphase is a simple and unconditionally remapping from one input mnemonic to one 174309467b48Spatrickoutput mnemonic. It isn't possible for this form of alias to look at the 174409467b48Spatrickoperands at all, so the remapping must apply for all forms of a given mnemonic. 174509467b48SpatrickMnemonic aliases are defined simply, for example X86 has: 174609467b48Spatrick 174709467b48Spatrick:: 174809467b48Spatrick 174909467b48Spatrick def : MnemonicAlias<"cbw", "cbtw">; 175009467b48Spatrick def : MnemonicAlias<"smovq", "movsq">; 175109467b48Spatrick def : MnemonicAlias<"fldcww", "fldcw">; 175209467b48Spatrick def : MnemonicAlias<"fucompi", "fucomip">; 175309467b48Spatrick def : MnemonicAlias<"ud2a", "ud2">; 175409467b48Spatrick 175509467b48Spatrick... and many others. With a MnemonicAlias definition, the mnemonic is remapped 175609467b48Spatricksimply and directly. Though MnemonicAlias's can't look at any aspect of the 175709467b48Spatrickinstruction (such as the operands) they can depend on global modes (the same 175809467b48Spatrickones supported by the matcher), through a Requires clause: 175909467b48Spatrick 176009467b48Spatrick:: 176109467b48Spatrick 176209467b48Spatrick def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>; 176309467b48Spatrick def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>; 176409467b48Spatrick 176509467b48SpatrickIn this example, the mnemonic gets mapped into a different one depending on 176609467b48Spatrickthe current instruction set. 176709467b48Spatrick 176809467b48SpatrickInstruction Aliases 176909467b48Spatrick^^^^^^^^^^^^^^^^^^^ 177009467b48Spatrick 177109467b48SpatrickThe most general phase of alias processing occurs while matching is happening: 177209467b48Spatrickit provides new forms for the matcher to match along with a specific instruction 177309467b48Spatrickto generate. An instruction alias has two parts: the string to match and the 177409467b48Spatrickinstruction to generate. For example: 177509467b48Spatrick 177609467b48Spatrick:: 177709467b48Spatrick 177809467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>; 177909467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>; 178009467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>; 178109467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>; 178209467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>; 178309467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>; 178409467b48Spatrick def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>; 178509467b48Spatrick 178609467b48SpatrickThis shows a powerful example of the instruction aliases, matching the same 178709467b48Spatrickmnemonic in multiple different ways depending on what operands are present in 178809467b48Spatrickthe assembly. The result of instruction aliases can include operands in a 178909467b48Spatrickdifferent order than the destination instruction, and can use an input multiple 179009467b48Spatricktimes, for example: 179109467b48Spatrick 179209467b48Spatrick:: 179309467b48Spatrick 179409467b48Spatrick def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>; 179509467b48Spatrick def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>; 179609467b48Spatrick def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>; 179709467b48Spatrick def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>; 179809467b48Spatrick 179909467b48SpatrickThis example also shows that tied operands are only listed once. In the X86 180009467b48Spatrickbackend, XOR8rr has two input GR8's and one output GR8 (where an input is tied 180109467b48Spatrickto the output). InstAliases take a flattened operand list without duplicates 180209467b48Spatrickfor tied operands. The result of an instruction alias can also use immediates 180309467b48Spatrickand fixed physical registers which are added as simple immediate operands in the 180409467b48Spatrickresult, for example: 180509467b48Spatrick 180609467b48Spatrick:: 180709467b48Spatrick 180809467b48Spatrick // Fixed Immediate operand. 180909467b48Spatrick def : InstAlias<"aad", (AAD8i8 10)>; 181009467b48Spatrick 181109467b48Spatrick // Fixed register operand. 181209467b48Spatrick def : InstAlias<"fcomi", (COM_FIr ST1)>; 181309467b48Spatrick 181409467b48Spatrick // Simple alias. 181509467b48Spatrick def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>; 181609467b48Spatrick 181709467b48SpatrickInstruction aliases can also have a Requires clause to make them subtarget 181809467b48Spatrickspecific. 181909467b48Spatrick 182009467b48SpatrickIf the back-end supports it, the instruction printer can automatically emit the 182109467b48Spatrickalias rather than what's being aliased. It typically leads to better, more 182209467b48Spatrickreadable code. If it's better to print out what's being aliased, then pass a '0' 182309467b48Spatrickas the third parameter to the InstAlias definition. 182409467b48Spatrick 182509467b48SpatrickInstruction Matching 182609467b48Spatrick-------------------- 182709467b48Spatrick 182809467b48Spatrick.. note:: 182909467b48Spatrick 183009467b48Spatrick To Be Written 183109467b48Spatrick 183209467b48Spatrick.. _Implementations of the abstract target description interfaces: 183309467b48Spatrick.. _implement the target description: 183409467b48Spatrick 183509467b48SpatrickTarget-specific Implementation Notes 183609467b48Spatrick==================================== 183709467b48Spatrick 183809467b48SpatrickThis section of the document explains features or design decisions that are 1839*d415bd75Srobertspecific to the code generator for a particular target. 184009467b48Spatrick 184109467b48Spatrick.. _tail call section: 184209467b48Spatrick 184309467b48SpatrickTail call optimization 184409467b48Spatrick---------------------- 184509467b48Spatrick 184609467b48SpatrickTail call optimization, callee reusing the stack of the caller, is currently 184773471bf0Spatricksupported on x86/x86-64, PowerPC, AArch64, and WebAssembly. It is performed on 184873471bf0Spatrickx86/x86-64, PowerPC, and AArch64 if: 184909467b48Spatrick 185009467b48Spatrick* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC 185173471bf0Spatrick calling convention), ``cc 11`` (HiPE calling convention), ``tailcc``, or 185273471bf0Spatrick ``swifttailcc``. 185309467b48Spatrick 185409467b48Spatrick* The call is a tail call - in tail position (ret immediately follows call and 185509467b48Spatrick ret uses value of call or is void). 185609467b48Spatrick 185709467b48Spatrick* Option ``-tailcallopt`` is enabled or the calling convention is ``tailcc``. 185809467b48Spatrick 185909467b48Spatrick* Platform-specific constraints are met. 186009467b48Spatrick 186109467b48Spatrickx86/x86-64 constraints: 186209467b48Spatrick 186309467b48Spatrick* No variable argument lists are used. 186409467b48Spatrick 186509467b48Spatrick* On x86-64 when generating GOT/PIC code only module-local calls (visibility = 186609467b48Spatrick hidden or protected) are supported. 186709467b48Spatrick 186809467b48SpatrickPowerPC constraints: 186909467b48Spatrick 187009467b48Spatrick* No variable argument lists are used. 187109467b48Spatrick 187209467b48Spatrick* No byval parameters are used. 187309467b48Spatrick 187409467b48Spatrick* On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) 187509467b48Spatrick are supported. 187609467b48Spatrick 187709467b48SpatrickWebAssembly constraints: 187809467b48Spatrick 187909467b48Spatrick* No variable argument lists are used 188009467b48Spatrick 188109467b48Spatrick* The 'tail-call' target attribute is enabled. 188209467b48Spatrick 188309467b48Spatrick* The caller and callee's return types must match. The caller cannot 188409467b48Spatrick be void unless the callee is, too. 188509467b48Spatrick 188673471bf0SpatrickAArch64 constraints: 188773471bf0Spatrick 188873471bf0Spatrick* No variable argument lists are used. 188973471bf0Spatrick 189009467b48SpatrickExample: 189109467b48Spatrick 189209467b48SpatrickCall as ``llc -tailcallopt test.ll``. 189309467b48Spatrick 189409467b48Spatrick.. code-block:: llvm 189509467b48Spatrick 189609467b48Spatrick declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4) 189709467b48Spatrick 189809467b48Spatrick define fastcc i32 @tailcaller(i32 %in1, i32 %in2) { 189909467b48Spatrick %l1 = add i32 %in1, %in2 190073471bf0Spatrick %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1) 190109467b48Spatrick ret i32 %tmp 190209467b48Spatrick } 190309467b48Spatrick 190409467b48SpatrickImplications of ``-tailcallopt``: 190509467b48Spatrick 190609467b48SpatrickTo support tail call optimization in situations where the callee has more 190709467b48Spatrickarguments than the caller a 'callee pops arguments' convention is used. This 190809467b48Spatrickcurrently causes each ``fastcc`` call that is not tail call optimized (because 190909467b48Spatrickone or more of above constraints are not met) to be followed by a readjustment 191009467b48Spatrickof the stack. So performance might be worse in such cases. 191109467b48Spatrick 191209467b48SpatrickSibling call optimization 191309467b48Spatrick------------------------- 191409467b48Spatrick 191509467b48SpatrickSibling call optimization is a restricted form of tail call optimization. 191609467b48SpatrickUnlike tail call optimization described in the previous section, it can be 191709467b48Spatrickperformed automatically on any tail calls when ``-tailcallopt`` option is not 191809467b48Spatrickspecified. 191909467b48Spatrick 192009467b48SpatrickSibling call optimization is currently performed on x86/x86-64 when the 192109467b48Spatrickfollowing constraints are met: 192209467b48Spatrick 192309467b48Spatrick* Caller and callee have the same calling convention. It can be either ``c`` or 192409467b48Spatrick ``fastcc``. 192509467b48Spatrick 192609467b48Spatrick* The call is a tail call - in tail position (ret immediately follows call and 192709467b48Spatrick ret uses value of call or is void). 192809467b48Spatrick 192909467b48Spatrick* Caller and callee have matching return type or the callee result is not used. 193009467b48Spatrick 193109467b48Spatrick* If any of the callee arguments are being passed in stack, they must be 193209467b48Spatrick available in caller's own incoming argument stack and the frame offsets must 193309467b48Spatrick be the same. 193409467b48Spatrick 193509467b48SpatrickExample: 193609467b48Spatrick 193709467b48Spatrick.. code-block:: llvm 193809467b48Spatrick 193909467b48Spatrick declare i32 @bar(i32, i32) 194009467b48Spatrick 194109467b48Spatrick define i32 @foo(i32 %a, i32 %b, i32 %c) { 194209467b48Spatrick entry: 194309467b48Spatrick %0 = tail call i32 @bar(i32 %a, i32 %b) 194409467b48Spatrick ret i32 %0 194509467b48Spatrick } 194609467b48Spatrick 194709467b48SpatrickThe X86 backend 194809467b48Spatrick--------------- 194909467b48Spatrick 195009467b48SpatrickThe X86 code generator lives in the ``lib/Target/X86`` directory. This code 195109467b48Spatrickgenerator is capable of targeting a variety of x86-32 and x86-64 processors, and 195209467b48Spatrickincludes support for ISA extensions such as MMX and SSE. 195309467b48Spatrick 195409467b48SpatrickX86 Target Triples supported 195509467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 195609467b48Spatrick 195709467b48SpatrickThe following are the known target triples that are supported by the X86 195809467b48Spatrickbackend. This is not an exhaustive list, and it would be useful to add those 195909467b48Spatrickthat people test. 196009467b48Spatrick 196109467b48Spatrick* **i686-pc-linux-gnu** --- Linux 196209467b48Spatrick 196309467b48Spatrick* **i386-unknown-freebsd5.3** --- FreeBSD 5.3 196409467b48Spatrick 196509467b48Spatrick* **i686-pc-cygwin** --- Cygwin on Win32 196609467b48Spatrick 196709467b48Spatrick* **i686-pc-mingw32** --- MingW on Win32 196809467b48Spatrick 196909467b48Spatrick* **i386-pc-mingw32msvc** --- MingW crosscompiler on Linux 197009467b48Spatrick 197109467b48Spatrick* **i686-apple-darwin*** --- Apple Darwin on X86 197209467b48Spatrick 197309467b48Spatrick* **x86_64-unknown-linux-gnu** --- Linux 197409467b48Spatrick 197509467b48SpatrickX86 Calling Conventions supported 197609467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 197709467b48Spatrick 197809467b48SpatrickThe following target-specific calling conventions are known to backend: 197909467b48Spatrick 198009467b48Spatrick* **x86_StdCall** --- stdcall calling convention seen on Microsoft Windows 198109467b48Spatrick platform (CC ID = 64). 198209467b48Spatrick 198309467b48Spatrick* **x86_FastCall** --- fastcall calling convention seen on Microsoft Windows 198409467b48Spatrick platform (CC ID = 65). 198509467b48Spatrick 198609467b48Spatrick* **x86_ThisCall** --- Similar to X86_StdCall. Passes first argument in ECX, 198709467b48Spatrick others via stack. Callee is responsible for stack cleaning. This convention is 198809467b48Spatrick used by MSVC by default for methods in its ABI (CC ID = 70). 198909467b48Spatrick 199009467b48Spatrick.. _X86 addressing mode: 199109467b48Spatrick 199209467b48SpatrickRepresenting X86 addressing modes in MachineInstrs 199309467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 199409467b48Spatrick 199509467b48SpatrickThe x86 has a very flexible way of accessing memory. It is capable of forming 199609467b48Spatrickmemory addresses of the following expression directly in integer instructions 199709467b48Spatrick(which use ModR/M addressing): 199809467b48Spatrick 199909467b48Spatrick:: 200009467b48Spatrick 200109467b48Spatrick SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32 200209467b48Spatrick 200309467b48SpatrickIn order to represent this, LLVM tracks no less than 5 operands for each memory 200409467b48Spatrickoperand of this form. This means that the "load" form of '``mov``' has the 200509467b48Spatrickfollowing ``MachineOperand``\s in this order: 200609467b48Spatrick 200709467b48Spatrick:: 200809467b48Spatrick 200909467b48Spatrick Index: 0 | 1 2 3 4 5 201009467b48Spatrick Meaning: DestReg, | BaseReg, Scale, IndexReg, Displacement Segment 201109467b48Spatrick OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg 201209467b48Spatrick 201309467b48SpatrickStores, and all other instructions, treat the four memory operands in the same 201409467b48Spatrickway and in the same order. If the segment register is unspecified (regno = 0), 201509467b48Spatrickthen no segment override is generated. "Lea" operations do not have a segment 201609467b48Spatrickregister specified, so they only have 4 operands for their memory reference. 201709467b48Spatrick 201809467b48SpatrickX86 address spaces supported 201909467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 202009467b48Spatrick 202109467b48Spatrickx86 has a feature which provides the ability to perform loads and stores to 202209467b48Spatrickdifferent address spaces via the x86 segment registers. A segment override 202309467b48Spatrickprefix byte on an instruction causes the instruction's memory access to go to 202409467b48Spatrickthe specified segment. LLVM address space 0 is the default address space, which 202509467b48Spatrickincludes the stack, and any unqualified memory accesses in a program. Address 202609467b48Spatrickspaces 1-255 are currently reserved for user-defined code. The GS-segment is 202709467b48Spatrickrepresented by address space 256, the FS-segment is represented by address space 202809467b48Spatrick257, and the SS-segment is represented by address space 258. Other x86 segments 202909467b48Spatrickhave yet to be allocated address space numbers. 203009467b48Spatrick 203109467b48SpatrickWhile these address spaces may seem similar to TLS via the ``thread_local`` 203209467b48Spatrickkeyword, and often use the same underlying hardware, there are some fundamental 203309467b48Spatrickdifferences. 203409467b48Spatrick 203509467b48SpatrickThe ``thread_local`` keyword applies to global variables and specifies that they 203609467b48Spatrickare to be allocated in thread-local memory. There are no type qualifiers 203709467b48Spatrickinvolved, and these variables can be pointed to with normal pointers and 203809467b48Spatrickaccessed with normal loads and stores. The ``thread_local`` keyword is 203909467b48Spatricktarget-independent at the LLVM IR level (though LLVM doesn't yet have 204009467b48Spatrickimplementations of it for some configurations) 204109467b48Spatrick 204209467b48SpatrickSpecial address spaces, in contrast, apply to static types. Every load and store 204309467b48Spatrickhas a particular address space in its address operand type, and this is what 204409467b48Spatrickdetermines which address space is accessed. LLVM ignores these special address 204509467b48Spatrickspace qualifiers on global variables, and does not provide a way to directly 204609467b48Spatrickallocate storage in them. At the LLVM IR level, the behavior of these special 204709467b48Spatrickaddress spaces depends in part on the underlying OS or runtime environment, and 204809467b48Spatrickthey are specific to x86 (and LLVM doesn't yet handle them correctly in some 204909467b48Spatrickcases). 205009467b48Spatrick 205109467b48SpatrickSome operating systems and runtime environments use (or may in the future use) 205209467b48Spatrickthe FS/GS-segment registers for various low-level purposes, so care should be 205309467b48Spatricktaken when considering them. 205409467b48Spatrick 205509467b48SpatrickInstruction naming 205609467b48Spatrick^^^^^^^^^^^^^^^^^^ 205709467b48Spatrick 2058*d415bd75SrobertAn instruction name consists of the base name, a default operand size, and a 205909467b48Spatrickcharacter per operand with an optional special size. For example: 206009467b48Spatrick 206109467b48Spatrick:: 206209467b48Spatrick 206309467b48Spatrick ADD8rr -> add, 8-bit register, 8-bit register 206409467b48Spatrick IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate 206509467b48Spatrick IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate 206609467b48Spatrick MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory 206709467b48Spatrick 206809467b48SpatrickThe PowerPC backend 206909467b48Spatrick------------------- 207009467b48Spatrick 207109467b48SpatrickThe PowerPC code generator lives in the lib/Target/PowerPC directory. The code 207209467b48Spatrickgeneration is retargetable to several variations or *subtargets* of the PowerPC 207309467b48SpatrickISA; including ppc32, ppc64 and altivec. 207409467b48Spatrick 207509467b48SpatrickLLVM PowerPC ABI 207609467b48Spatrick^^^^^^^^^^^^^^^^ 207709467b48Spatrick 207809467b48SpatrickLLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC relative 207909467b48Spatrick(PIC) or static addressing for accessing global values, so no TOC (r2) is 208009467b48Spatrickused. Second, r31 is used as a frame pointer to allow dynamic growth of a stack 208109467b48Spatrickframe. LLVM takes advantage of having no TOC to provide space to save the frame 208209467b48Spatrickpointer in the PowerPC linkage area of the caller frame. Other details of 208309467b48SpatrickPowerPC ABI can be found at `PowerPC ABI 208409467b48Spatrick<http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html>`_\ 208509467b48Spatrick. Note: This link describes the 32 bit ABI. The 64 bit ABI is similar except 208609467b48Spatrickspace for GPRs are 8 bytes wide (not 4) and r13 is reserved for system use. 208709467b48Spatrick 208809467b48SpatrickFrame Layout 208909467b48Spatrick^^^^^^^^^^^^ 209009467b48Spatrick 209109467b48SpatrickThe size of a PowerPC frame is usually fixed for the duration of a function's 209209467b48Spatrickinvocation. Since the frame is fixed size, all references into the frame can be 209309467b48Spatrickaccessed via fixed offsets from the stack pointer. The exception to this is 209409467b48Spatrickwhen dynamic alloca or variable sized arrays are present, then a base pointer 209509467b48Spatrick(r31) is used as a proxy for the stack pointer and stack pointer is free to grow 209609467b48Spatrickor shrink. A base pointer is also used if llvm-gcc is not passed the 209709467b48Spatrick-fomit-frame-pointer flag. The stack pointer is always aligned to 16 bytes, so 209809467b48Spatrickthat space allocated for altivec vectors will be properly aligned. 209909467b48Spatrick 210009467b48SpatrickAn invocation frame is laid out as follows (low memory at top): 210109467b48Spatrick 210209467b48Spatrick:raw-html:`<table border="1" cellspacing="0">` 210309467b48Spatrick:raw-html:`<tr>` 210409467b48Spatrick:raw-html:`<td>Linkage<br><br></td>` 210509467b48Spatrick:raw-html:`</tr>` 210609467b48Spatrick:raw-html:`<tr>` 210709467b48Spatrick:raw-html:`<td>Parameter area<br><br></td>` 210809467b48Spatrick:raw-html:`</tr>` 210909467b48Spatrick:raw-html:`<tr>` 211009467b48Spatrick:raw-html:`<td>Dynamic area<br><br></td>` 211109467b48Spatrick:raw-html:`</tr>` 211209467b48Spatrick:raw-html:`<tr>` 211309467b48Spatrick:raw-html:`<td>Locals area<br><br></td>` 211409467b48Spatrick:raw-html:`</tr>` 211509467b48Spatrick:raw-html:`<tr>` 211609467b48Spatrick:raw-html:`<td>Saved registers area<br><br></td>` 211709467b48Spatrick:raw-html:`</tr>` 211809467b48Spatrick:raw-html:`<tr style="border-style: none hidden none hidden;">` 211909467b48Spatrick:raw-html:`<td><br></td>` 212009467b48Spatrick:raw-html:`</tr>` 212109467b48Spatrick:raw-html:`<tr>` 212209467b48Spatrick:raw-html:`<td>Previous Frame<br><br></td>` 212309467b48Spatrick:raw-html:`</tr>` 212409467b48Spatrick:raw-html:`</table>` 212509467b48Spatrick 212609467b48SpatrickThe *linkage* area is used by a callee to save special registers prior to 212709467b48Spatrickallocating its own frame. Only three entries are relevant to LLVM. The first 212809467b48Spatrickentry is the previous stack pointer (sp), aka link. This allows probing tools 212909467b48Spatricklike gdb or exception handlers to quickly scan the frames in the stack. A 213009467b48Spatrickfunction epilog can also use the link to pop the frame from the stack. The 213109467b48Spatrickthird entry in the linkage area is used to save the return address from the lr 213209467b48Spatrickregister. Finally, as mentioned above, the last entry is used to save the 213309467b48Spatrickprevious frame pointer (r31.) The entries in the linkage area are the size of a 213409467b48SpatrickGPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64 213509467b48Spatrickbit mode. 213609467b48Spatrick 213709467b48Spatrick32 bit linkage area: 213809467b48Spatrick 213909467b48Spatrick:raw-html:`<table border="1" cellspacing="0">` 214009467b48Spatrick:raw-html:`<tr>` 214109467b48Spatrick:raw-html:`<td>0</td>` 214209467b48Spatrick:raw-html:`<td>Saved SP (r1)</td>` 214309467b48Spatrick:raw-html:`</tr>` 214409467b48Spatrick:raw-html:`<tr>` 214509467b48Spatrick:raw-html:`<td>4</td>` 214609467b48Spatrick:raw-html:`<td>Saved CR</td>` 214709467b48Spatrick:raw-html:`</tr>` 214809467b48Spatrick:raw-html:`<tr>` 214909467b48Spatrick:raw-html:`<td>8</td>` 215009467b48Spatrick:raw-html:`<td>Saved LR</td>` 215109467b48Spatrick:raw-html:`</tr>` 215209467b48Spatrick:raw-html:`<tr>` 215309467b48Spatrick:raw-html:`<td>12</td>` 215409467b48Spatrick:raw-html:`<td>Reserved</td>` 215509467b48Spatrick:raw-html:`</tr>` 215609467b48Spatrick:raw-html:`<tr>` 215709467b48Spatrick:raw-html:`<td>16</td>` 215809467b48Spatrick:raw-html:`<td>Reserved</td>` 215909467b48Spatrick:raw-html:`</tr>` 216009467b48Spatrick:raw-html:`<tr>` 216109467b48Spatrick:raw-html:`<td>20</td>` 216209467b48Spatrick:raw-html:`<td>Saved FP (r31)</td>` 216309467b48Spatrick:raw-html:`</tr>` 216409467b48Spatrick:raw-html:`</table>` 216509467b48Spatrick 216609467b48Spatrick64 bit linkage area: 216709467b48Spatrick 216809467b48Spatrick:raw-html:`<table border="1" cellspacing="0">` 216909467b48Spatrick:raw-html:`<tr>` 217009467b48Spatrick:raw-html:`<td>0</td>` 217109467b48Spatrick:raw-html:`<td>Saved SP (r1)</td>` 217209467b48Spatrick:raw-html:`</tr>` 217309467b48Spatrick:raw-html:`<tr>` 217409467b48Spatrick:raw-html:`<td>8</td>` 217509467b48Spatrick:raw-html:`<td>Saved CR</td>` 217609467b48Spatrick:raw-html:`</tr>` 217709467b48Spatrick:raw-html:`<tr>` 217809467b48Spatrick:raw-html:`<td>16</td>` 217909467b48Spatrick:raw-html:`<td>Saved LR</td>` 218009467b48Spatrick:raw-html:`</tr>` 218109467b48Spatrick:raw-html:`<tr>` 218209467b48Spatrick:raw-html:`<td>24</td>` 218309467b48Spatrick:raw-html:`<td>Reserved</td>` 218409467b48Spatrick:raw-html:`</tr>` 218509467b48Spatrick:raw-html:`<tr>` 218609467b48Spatrick:raw-html:`<td>32</td>` 218709467b48Spatrick:raw-html:`<td>Reserved</td>` 218809467b48Spatrick:raw-html:`</tr>` 218909467b48Spatrick:raw-html:`<tr>` 219009467b48Spatrick:raw-html:`<td>40</td>` 219109467b48Spatrick:raw-html:`<td>Saved FP (r31)</td>` 219209467b48Spatrick:raw-html:`</tr>` 219309467b48Spatrick:raw-html:`</table>` 219409467b48Spatrick 219509467b48SpatrickThe *parameter area* is used to store arguments being passed to a callee 219609467b48Spatrickfunction. Following the PowerPC ABI, the first few arguments are actually 219709467b48Spatrickpassed in registers, with the space in the parameter area unused. However, if 219809467b48Spatrickthere are not enough registers or the callee is a thunk or vararg function, 219909467b48Spatrickthese register arguments can be spilled into the parameter area. Thus, the 220009467b48Spatrickparameter area must be large enough to store all the parameters for the largest 220109467b48Spatrickcall sequence made by the caller. The size must also be minimally large enough 220209467b48Spatrickto spill registers r3-r10. This allows callees blind to the call signature, 220309467b48Spatricksuch as thunks and vararg functions, enough space to cache the argument 220409467b48Spatrickregisters. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64 220509467b48Spatrickbit mode.) Also note that since the parameter area is a fixed offset from the 2206097a140dSpatricktop of the frame, that a callee can access its split arguments using fixed 220709467b48Spatrickoffsets from the stack pointer (or base pointer.) 220809467b48Spatrick 220909467b48SpatrickCombining the information about the linkage, parameter areas and alignment. A 221009467b48Spatrickstack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit mode. 221109467b48Spatrick 221209467b48SpatrickThe *dynamic area* starts out as size zero. If a function uses dynamic alloca 221309467b48Spatrickthen space is added to the stack, the linkage and parameter areas are shifted to 221409467b48Spatricktop of stack, and the new space is available immediately below the linkage and 221509467b48Spatrickparameter areas. The cost of shifting the linkage and parameter areas is minor 221609467b48Spatricksince only the link value needs to be copied. The link value can be easily 221709467b48Spatrickfetched by adding the original frame size to the base pointer. Note that 221809467b48Spatrickallocations in the dynamic space need to observe 16 byte alignment. 221909467b48Spatrick 222009467b48SpatrickThe *locals area* is where the llvm compiler reserves space for local variables. 222109467b48Spatrick 222209467b48SpatrickThe *saved registers area* is where the llvm compiler spills callee saved 222309467b48Spatrickregisters on entry to the callee. 222409467b48Spatrick 222509467b48SpatrickProlog/Epilog 222609467b48Spatrick^^^^^^^^^^^^^ 222709467b48Spatrick 222809467b48SpatrickThe llvm prolog and epilog are the same as described in the PowerPC ABI, with 222909467b48Spatrickthe following exceptions. Callee saved registers are spilled after the frame is 223009467b48Spatrickcreated. This allows the llvm epilog/prolog support to be common with other 223109467b48Spatricktargets. The base pointer callee saved register r31 is saved in the TOC slot of 223209467b48Spatricklinkage area. This simplifies allocation of space for the base pointer and 223309467b48Spatrickmakes it convenient to locate programmatically and during debugging. 223409467b48Spatrick 223509467b48SpatrickDynamic Allocation 223609467b48Spatrick^^^^^^^^^^^^^^^^^^ 223709467b48Spatrick 223809467b48Spatrick.. note:: 223909467b48Spatrick 224009467b48Spatrick TODO - More to come. 224109467b48Spatrick 224209467b48SpatrickThe NVPTX backend 224309467b48Spatrick----------------- 224409467b48Spatrick 224509467b48SpatrickThe NVPTX code generator under lib/Target/NVPTX is an open-source version of 224609467b48Spatrickthe NVIDIA NVPTX code generator for LLVM. It is contributed by NVIDIA and is 224709467b48Spatricka port of the code generator used in the CUDA compiler (nvcc). It targets the 224809467b48SpatrickPTX 3.0/3.1 ISA and can target any compute capability greater than or equal to 224909467b48Spatrick2.0 (Fermi). 225009467b48Spatrick 225109467b48SpatrickThis target is of production quality and should be completely compatible with 225209467b48Spatrickthe official NVIDIA toolchain. 225309467b48Spatrick 225409467b48SpatrickCode Generator Options: 225509467b48Spatrick 225609467b48Spatrick:raw-html:`<table border="1" cellspacing="0">` 225709467b48Spatrick:raw-html:`<tr>` 225809467b48Spatrick:raw-html:`<th>Option</th>` 225909467b48Spatrick:raw-html:`<th>Description</th>` 226009467b48Spatrick:raw-html:`</tr>` 226109467b48Spatrick:raw-html:`<tr>` 226209467b48Spatrick:raw-html:`<td>sm_20</td>` 226309467b48Spatrick:raw-html:`<td align="left">Set shader model/compute capability to 2.0</td>` 226409467b48Spatrick:raw-html:`</tr>` 226509467b48Spatrick:raw-html:`<tr>` 226609467b48Spatrick:raw-html:`<td>sm_21</td>` 226709467b48Spatrick:raw-html:`<td align="left">Set shader model/compute capability to 2.1</td>` 226809467b48Spatrick:raw-html:`</tr>` 226909467b48Spatrick:raw-html:`<tr>` 227009467b48Spatrick:raw-html:`<td>sm_30</td>` 227109467b48Spatrick:raw-html:`<td align="left">Set shader model/compute capability to 3.0</td>` 227209467b48Spatrick:raw-html:`</tr>` 227309467b48Spatrick:raw-html:`<tr>` 227409467b48Spatrick:raw-html:`<td>sm_35</td>` 227509467b48Spatrick:raw-html:`<td align="left">Set shader model/compute capability to 3.5</td>` 227609467b48Spatrick:raw-html:`</tr>` 227709467b48Spatrick:raw-html:`<tr>` 227809467b48Spatrick:raw-html:`<td>ptx30</td>` 227909467b48Spatrick:raw-html:`<td align="left">Target PTX 3.0</td>` 228009467b48Spatrick:raw-html:`</tr>` 228109467b48Spatrick:raw-html:`<tr>` 228209467b48Spatrick:raw-html:`<td>ptx31</td>` 228309467b48Spatrick:raw-html:`<td align="left">Target PTX 3.1</td>` 228409467b48Spatrick:raw-html:`</tr>` 228509467b48Spatrick:raw-html:`</table>` 228609467b48Spatrick 228709467b48SpatrickThe extended Berkeley Packet Filter (eBPF) backend 228809467b48Spatrick-------------------------------------------------- 228909467b48Spatrick 229009467b48SpatrickExtended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used 229109467b48Spatrickto filter network packets. The 229209467b48Spatrick`bpf() system call <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ 229309467b48Spatrickperforms a range of operations related to eBPF. For both cBPF and eBPF 229409467b48Spatrickprograms, the Linux kernel statically analyzes the programs before loading 229509467b48Spatrickthem, in order to ensure that they cannot harm the running system. eBPF is 229609467b48Spatricka 64-bit RISC instruction set designed for one to one mapping to 64-bit CPUs. 229709467b48SpatrickOpcodes are 8-bit encoded, and 87 instructions are defined. There are 10 229809467b48Spatrickregisters, grouped by function as outlined below. 229909467b48Spatrick 230009467b48Spatrick:: 230109467b48Spatrick 230209467b48Spatrick R0 return value from in-kernel functions; exit value for eBPF program 230309467b48Spatrick R1 - R5 function call arguments to in-kernel functions 230409467b48Spatrick R6 - R9 callee-saved registers preserved by in-kernel functions 230509467b48Spatrick R10 stack frame pointer (read only) 230609467b48Spatrick 230709467b48SpatrickInstruction encoding (arithmetic and jump) 230809467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 230909467b48SpatrickeBPF is reusing most of the opcode encoding from classic to simplify conversion 231009467b48Spatrickof classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code' 231109467b48Spatrickfield is divided into three parts: 231209467b48Spatrick 231309467b48Spatrick:: 231409467b48Spatrick 231509467b48Spatrick +----------------+--------+--------------------+ 231609467b48Spatrick | 4 bits | 1 bit | 3 bits | 231709467b48Spatrick | operation code | source | instruction class | 231809467b48Spatrick +----------------+--------+--------------------+ 231909467b48Spatrick (MSB) (LSB) 232009467b48Spatrick 232109467b48SpatrickThree LSB bits store instruction class which is one of: 232209467b48Spatrick 232309467b48Spatrick:: 232409467b48Spatrick 232509467b48Spatrick BPF_LD 0x0 232609467b48Spatrick BPF_LDX 0x1 232709467b48Spatrick BPF_ST 0x2 232809467b48Spatrick BPF_STX 0x3 232909467b48Spatrick BPF_ALU 0x4 233009467b48Spatrick BPF_JMP 0x5 233109467b48Spatrick (unused) 0x6 233209467b48Spatrick BPF_ALU64 0x7 233309467b48Spatrick 233409467b48SpatrickWhen BPF_CLASS(code) == BPF_ALU or BPF_ALU64 or BPF_JMP, 233509467b48Spatrick4th bit encodes source operand 233609467b48Spatrick 233709467b48Spatrick:: 233809467b48Spatrick 233909467b48Spatrick BPF_X 0x1 use src_reg register as source operand 234009467b48Spatrick BPF_K 0x0 use 32 bit immediate as source operand 234109467b48Spatrick 234209467b48Spatrickand four MSB bits store operation code 234309467b48Spatrick 234409467b48Spatrick:: 234509467b48Spatrick 234609467b48Spatrick BPF_ADD 0x0 add 234709467b48Spatrick BPF_SUB 0x1 subtract 234809467b48Spatrick BPF_MUL 0x2 multiply 234909467b48Spatrick BPF_DIV 0x3 divide 235009467b48Spatrick BPF_OR 0x4 bitwise logical OR 235109467b48Spatrick BPF_AND 0x5 bitwise logical AND 235209467b48Spatrick BPF_LSH 0x6 left shift 235309467b48Spatrick BPF_RSH 0x7 right shift (zero extended) 235409467b48Spatrick BPF_NEG 0x8 arithmetic negation 235509467b48Spatrick BPF_MOD 0x9 modulo 235609467b48Spatrick BPF_XOR 0xa bitwise logical XOR 235709467b48Spatrick BPF_MOV 0xb move register to register 235809467b48Spatrick BPF_ARSH 0xc right shift (sign extended) 235909467b48Spatrick BPF_END 0xd endianness conversion 236009467b48Spatrick 236109467b48SpatrickIf BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of 236209467b48Spatrick 236309467b48Spatrick:: 236409467b48Spatrick 236509467b48Spatrick BPF_JA 0x0 unconditional jump 236609467b48Spatrick BPF_JEQ 0x1 jump == 236709467b48Spatrick BPF_JGT 0x2 jump > 236809467b48Spatrick BPF_JGE 0x3 jump >= 236909467b48Spatrick BPF_JSET 0x4 jump if (DST & SRC) 237009467b48Spatrick BPF_JNE 0x5 jump != 237109467b48Spatrick BPF_JSGT 0x6 jump signed > 237209467b48Spatrick BPF_JSGE 0x7 jump signed >= 237309467b48Spatrick BPF_CALL 0x8 function call 237409467b48Spatrick BPF_EXIT 0x9 function return 237509467b48Spatrick 237609467b48SpatrickInstruction encoding (load, store) 237709467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 237809467b48SpatrickFor load and store instructions the 8-bit 'code' field is divided as: 237909467b48Spatrick 238009467b48Spatrick:: 238109467b48Spatrick 238209467b48Spatrick +--------+--------+-------------------+ 238309467b48Spatrick | 3 bits | 2 bits | 3 bits | 238409467b48Spatrick | mode | size | instruction class | 238509467b48Spatrick +--------+--------+-------------------+ 238609467b48Spatrick (MSB) (LSB) 238709467b48Spatrick 238809467b48SpatrickSize modifier is one of 238909467b48Spatrick 239009467b48Spatrick:: 239109467b48Spatrick 239209467b48Spatrick BPF_W 0x0 word 239309467b48Spatrick BPF_H 0x1 half word 239409467b48Spatrick BPF_B 0x2 byte 239509467b48Spatrick BPF_DW 0x3 double word 239609467b48Spatrick 239709467b48SpatrickMode modifier is one of 239809467b48Spatrick 239909467b48Spatrick:: 240009467b48Spatrick 240109467b48Spatrick BPF_IMM 0x0 immediate 240209467b48Spatrick BPF_ABS 0x1 used to access packet data 240309467b48Spatrick BPF_IND 0x2 used to access packet data 240409467b48Spatrick BPF_MEM 0x3 memory 240509467b48Spatrick (reserved) 0x4 240609467b48Spatrick (reserved) 0x5 240709467b48Spatrick BPF_XADD 0x6 exclusive add 240809467b48Spatrick 240909467b48Spatrick 241009467b48SpatrickPacket data access (BPF_ABS, BPF_IND) 241109467b48Spatrick^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 241209467b48Spatrick 241309467b48SpatrickTwo non-generic instructions: (BPF_ABS | <size> | BPF_LD) and 241409467b48Spatrick(BPF_IND | <size> | BPF_LD) which are used to access packet data. 241509467b48SpatrickRegister R6 is an implicit input that must contain pointer to sk_buff. 241609467b48SpatrickRegister R0 is an implicit output which contains the data fetched 241709467b48Spatrickfrom the packet. Registers R1-R5 are scratch registers and must not 241809467b48Spatrickbe used to store the data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD 241909467b48Spatrickinstructions. These instructions have implicit program exit condition 242009467b48Spatrickas well. When eBPF program is trying to access the data beyond 242109467b48Spatrickthe packet boundary, the interpreter will abort the execution of the program. 242209467b48Spatrick 242309467b48SpatrickBPF_IND | BPF_W | BPF_LD is equivalent to: 242409467b48Spatrick R0 = ntohl(\*(u32 \*) (((struct sk_buff \*) R6)->data + src_reg + imm32)) 242509467b48Spatrick 242609467b48SpatrickeBPF maps 242709467b48Spatrick^^^^^^^^^ 242809467b48Spatrick 242909467b48SpatrickeBPF maps are provided for sharing data between kernel and user-space. 243009467b48SpatrickCurrently implemented types are hash and array, with potential extension to 243109467b48Spatricksupport bloom filters, radix trees, etc. A map is defined by its type, 243209467b48Spatrickmaximum number of elements, key size and value size in bytes. eBPF syscall 243309467b48Spatricksupports create, update, find and delete functions on maps. 243409467b48Spatrick 243509467b48SpatrickFunction calls 243609467b48Spatrick^^^^^^^^^^^^^^ 243709467b48Spatrick 243809467b48SpatrickFunction call arguments are passed using up to five registers (R1 - R5). 243909467b48SpatrickThe return value is passed in a dedicated register (R0). Four additional 244009467b48Spatrickregisters (R6 - R9) are callee-saved, and the values in these registers 244109467b48Spatrickare preserved within kernel functions. R0 - R5 are scratch registers within 244209467b48Spatrickkernel functions, and eBPF programs must therefor store/restore values in 244309467b48Spatrickthese registers if needed across function calls. The stack can be accessed 244409467b48Spatrickusing the read-only frame pointer R10. eBPF registers map 1:1 to hardware 244509467b48Spatrickregisters on x86_64 and other 64-bit architectures. For example, x86_64 244609467b48Spatrickin-kernel JIT maps them as 244709467b48Spatrick 244809467b48Spatrick:: 244909467b48Spatrick 245009467b48Spatrick R0 - rax 245109467b48Spatrick R1 - rdi 245209467b48Spatrick R2 - rsi 245309467b48Spatrick R3 - rdx 245409467b48Spatrick R4 - rcx 245509467b48Spatrick R5 - r8 245609467b48Spatrick R6 - rbx 245709467b48Spatrick R7 - r13 245809467b48Spatrick R8 - r14 245909467b48Spatrick R9 - r15 246009467b48Spatrick R10 - rbp 246109467b48Spatrick 246209467b48Spatricksince x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing 246309467b48Spatrickand rbx, r12 - r15 are callee saved. 246409467b48Spatrick 246509467b48SpatrickProgram start 246609467b48Spatrick^^^^^^^^^^^^^ 246709467b48Spatrick 246809467b48SpatrickAn eBPF program receives a single argument and contains 246909467b48Spatricka single eBPF main routine; the program does not contain eBPF functions. 247009467b48SpatrickFunction calls are limited to a predefined set of kernel functions. The size 247109467b48Spatrickof a program is limited to 4K instructions: this ensures fast termination and 247209467b48Spatricka limited number of kernel function calls. Prior to running an eBPF program, 247309467b48Spatricka verifier performs static analysis to prevent loops in the code and 247409467b48Spatrickto ensure valid register usage and operand types. 247509467b48Spatrick 247609467b48SpatrickThe AMDGPU backend 247709467b48Spatrick------------------ 247809467b48Spatrick 247909467b48SpatrickThe AMDGPU code generator lives in the ``lib/Target/AMDGPU`` 248009467b48Spatrickdirectory. This code generator is capable of targeting a variety of 248109467b48SpatrickAMD GPU processors. Refer to :doc:`AMDGPUUsage` for more information. 2482