=================
TableGen Overview
=================

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   BackGuide
   ProgRef

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information. Because there may be a large number of these
records, it is specifically designed to allow flexible descriptions to be
written and common features of these records to be factored out. This reduces
the amount of duplication in the description, reduces the chance of error, and
makes it easier to structure domain-specific information.

The TableGen front end parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing. See
the :doc:`TableGen Programmer's Reference <./ProgRef>` for an in-depth
description of TableGen. See :doc:`tblgen - Description to C++ Code
<../CommandGuide/tblgen>` for details on the ``*-tblgen`` commands
that run the various flavors of TableGen.

The current major users of TableGen are :doc:`The LLVM Target-Independent
Code Generator <../CodeGenerator>` and the `Clang diagnostics and attributes
<https://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

Note that if you work with TableGen frequently and use emacs or vim,
you can find an emacs "TableGen mode" and a vim language file in the
``llvm/utils/emacs`` and ``llvm/utils/vim`` directories of your LLVM
distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in your build directory under ``bin``. It is not installed in the
system (or where your sysroot is set to), since it has no use beyond LLVM's
build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool. The first (optional) argument
specifies the file to read.
If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To be useful, one of the `backends`_ must be used. These backends are
selectable on the command line (type '``llvm-tblgen -help``' for a list). For
example, to get a list of all of the definitions that subclass a particular type
(which can be useful for building up an enum list of these records), use the
``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records. There is also a general
backend which outputs all the records as a JSON data structure, enabled using
the ``-dump-json`` option.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.
You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints out all
of the classes, then all of the definitions. This is a good way to see what the
various definitions expand to fully. Running this on the ``X86.td`` file prints
this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture. ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition. The body of the record contains all of the data that
TableGen assembled for the record: that the instruction is part of the "X86"
namespace, the pattern describing how the code generator selects it, that it is
a two-address instruction, its encoding, and so on. The contents and semantics
of the information in the record are specific to the needs of the X86 backend,
and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place. Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (derived from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share. A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification.
In addition, TableGen's syntax introduces some
automation concepts such as ``multiclass``, ``foreach``, and ``let``.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses. The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application. The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'. These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialized
with some values. Classes are defined with the keyword ``class``, either in the
same file or in an included one. Most target TableGen files include the generic
ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records. These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend). TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin``, which takes a parameter ``Name`` of type
``string`` and a list of target features, specializes the class ``Processor``
by passing the arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once. Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
  def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;

  def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
            (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
              Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Programmer's Reference <./ProgRef>` for an in-depth
description of TableGen.


.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend. The default operation
when running ``*-tblgen`` is to print the information in a textual format, but
that's only useful for debugging the TableGen files themselves. The real power
of TableGen, however, lies in parsing the source files into an internal
representation from which a backend can generate anything you want.
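
As a rough illustration of a script-based backend, the following Python sketch
reads ``-dump-json`` style output and emits a C-style enum for every record of
one class. The JSON layout it relies on (a top-level ``!instanceof`` map from
class names to record names, and one object per record carrying a ``!name``
key) is an assumption that should be checked against the JSON reference in
:doc:`TableGen BackEnds <./BackEnds>`. The names ``SAMPLE_DUMP``,
``records_of_class``, and ``emit_enum`` are hypothetical, and the sample data
is hand-written rather than real tool output.

.. code-block:: python

  import json

  # Hypothetical, hand-written stand-in for ``llvm-tblgen -dump-json`` output.
  # The key layout is assumed, not copied from a real run.
  SAMPLE_DUMP = """
  {
    "!tablegen_json_version": 1,
    "!instanceof": {
      "Register": ["AL", "AX"],
      "Instruction": ["ADD32rr"]
    },
    "AL": {"!name": "AL", "!superclasses": ["Register"], "Namespace": "X86"},
    "AX": {"!name": "AX", "!superclasses": ["Register"], "Namespace": "X86"},
    "ADD32rr": {"!name": "ADD32rr", "!superclasses": ["Instruction"],
                "Namespace": "X86", "isCommutable": 1}
  }
  """


  def records_of_class(dump, class_name):
      """Return the records deriving from class_name, in listed order."""
      names = dump.get("!instanceof", {}).get(class_name, [])
      return [dump[name] for name in names]


  def emit_enum(dump, class_name):
      """Format the names of all records of one class as a C-style enum."""
      names = [rec["!name"] for rec in records_of_class(dump, class_name)]
      body = "".join(f"  {name},\n" for name in names)
      return f"enum {class_name} {{\n{body}}};\n"


  if __name__ == "__main__":
      # Parse the sample dump and print the enum for the Register class.
      print(emit_enum(json.loads(SAMPLE_DUMP), "Register"), end="")

In practice the dump would come from running ``llvm-tblgen`` with the
``-dump-json`` option on a real ``.td`` file (plus any include paths the file
requires) and be loaded with ``json.load``; the script then plays the role of
the backend without touching TableGen's C++ sources.
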

TableGen is currently used to create huge include files with tables that you
can either include directly (if the output is in the language you're coding
in), or use in preprocessing via macros surrounding the include of the file.

Direct output can be used if the backend already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like Instruction names), so your backend should print
a meta-information list that can be shaped into different compile-time formats.

See :doc:`TableGen BackEnds <./BackEnds>` for a list of available
backends, and see the :doc:`TableGen Backend Developer's Guide <./BackGuide>`
for information on how to write and debug a new backend.

Tools and Resources
===================

In addition to this documentation, a list of tools and resources for TableGen
can be found in TableGen's
`README <https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/README.md>`_.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times. The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give the basic concepts virtually any
meaning via custom-made backends, which can distort the original design and
make the resulting TableGen files very hard for newcomers to understand.

Some are in favor of extending the semantics even more, while making sure
backends adhere to strict rules.
Others suggest we should move to fewer,
more powerful DSLs designed for specific purposes, or even reuse existing
DSLs.