=================
TableGen Overview
=================

.. contents::
   :local:

.. toctree::
   :hidden:

   BackEnds
   BackGuide
   ProgRef

Introduction
============

TableGen's purpose is to help a human develop and maintain records of
domain-specific information.  Because there may be a large number of these
records, it is specifically designed so that descriptions can be written
flexibly and the common features of these records can be factored out.  This
reduces the amount of duplication in the description, reduces the chance of
error, and makes it easier to structure domain-specific information.

The TableGen front end parses a file, instantiates the declarations, and
hands the result off to a domain-specific `backend`_ for processing.  See
the :doc:`TableGen Programmer's Reference <./ProgRef>` for an in-depth
description of TableGen.  See :doc:`tblgen - Description to C++ Code
<../CommandGuide/tblgen>` for details on the ``*-tblgen`` commands
that run the various flavors of TableGen.

The current major users of TableGen are :doc:`The LLVM Target-Independent
Code Generator <../CodeGenerator>` and the `Clang diagnostics and attributes
<https://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.

Note that if you work with TableGen frequently and use Emacs or Vim,
you can find an Emacs "TableGen mode" and a Vim language file in the
``llvm/utils/emacs`` and ``llvm/utils/vim`` directories of your LLVM
distribution, respectively.

.. _intro:


The TableGen program
====================

TableGen files are interpreted by the TableGen program, ``llvm-tblgen``, which
is available in the ``bin`` directory of your build tree.  It is not installed
on the system (or wherever your sysroot points), since it has no use beyond
LLVM's build process.

Running TableGen
----------------

TableGen runs just like any other LLVM tool.  The first (optional) argument
specifies the file to read.  If a filename is not specified, ``llvm-tblgen``
reads from standard input.

To produce useful output, you must select one of the `backends`_.  These
backends are selectable on the command line (type '``llvm-tblgen -help``' for a
list).  For example, to get a list of all of the definitions that subclass a
particular type (which can be useful for building up an enum list of these
records), use the ``-print-enums`` option:

.. code-block:: bash

  $ llvm-tblgen X86.td -print-enums -class=Register
  AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
  ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
  MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
  R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
  R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
  RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
  XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
  XMM6, XMM7, XMM8, XMM9,

  $ llvm-tblgen X86.td -print-enums -class=Instruction
  ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
  ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
  ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
  ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
  ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...

The default backend prints out all of the records.  There is also a general
backend that outputs all the records as a JSON data structure, enabled using
the ``-dump-json`` option.

If you plan to use TableGen, you will most likely have to write a `backend`_
that extracts the information specific to what you need and formats it in the
appropriate way.  You can do this by extending TableGen itself in C++, or by
writing a script in any language that can consume the JSON output.
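
As a minimal sketch of the scripting route, the following Python code lists the
definitions that derive from a given class.  It assumes the dump layout
described in :doc:`TableGen BackEnds <./BackEnds>`: a top-level ``!instanceof``
map from class names to definition names, plus one object per definition.  The
sample data and the ``defs_of_class`` helper are hypothetical stand-ins for real
``llvm-tblgen -dump-json`` output.

.. code-block:: python

  import json

  # Hypothetical, hand-written stand-in for the output of
  # ``llvm-tblgen -dump-json``; a real dump is much larger.
  SAMPLE_DUMP = """
  {
    "!tablegen_json_version": 1,
    "!instanceof": {
      "Register": ["AL", "AX"],
      "Instruction": ["ADD32rr"]
    },
    "AL": {"!name": "AL", "!superclasses": ["Register"]},
    "AX": {"!name": "AX", "!superclasses": ["Register"]},
    "ADD32rr": {"!name": "ADD32rr", "!superclasses": ["Instruction"],
                "isCommutable": 1}
  }
  """


  def defs_of_class(dump, class_name):
      """Return the record of every definition that derives from class_name."""
      names = dump.get("!instanceof", {}).get(class_name, [])
      return [dump[name] for name in names]


  dump = json.loads(SAMPLE_DUMP)
  register_names = [rec["!name"] for rec in defs_of_class(dump, "Register")]
  print(", ".join(register_names))

In practice, the sample string would be replaced by the contents of a file
produced by ``llvm-tblgen -dump-json``.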

Example
-------

With no other arguments, ``llvm-tblgen`` parses the specified file and prints
out all of the classes, then all of the definitions.  This is a good way to see
what the various definitions expand to fully.  Running this on the ``X86.td``
file prints this (at the time of this writing):

.. code-block:: text

  ...
  def ADD32rr {   // Instruction X86Inst I
    string Namespace = "X86";
    dag OutOperandList = (outs GR32:$dst);
    dag InOperandList = (ins GR32:$src1, GR32:$src2);
    string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
    list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
    list<Register> Uses = [];
    list<Register> Defs = [EFLAGS];
    list<Predicate> Predicates = [];
    int CodeSize = 3;
    int AddedComplexity = 0;
    bit isReturn = 0;
    bit isBranch = 0;
    bit isIndirectBranch = 0;
    bit isBarrier = 0;
    bit isCall = 0;
    bit canFoldAsLoad = 0;
    bit mayLoad = 0;
    bit mayStore = 0;
    bit isImplicitDef = 0;
    bit isConvertibleToThreeAddress = 1;
    bit isCommutable = 1;
    bit isTerminator = 0;
    bit isReMaterializable = 0;
    bit isPredicable = 0;
    bit hasDelaySlot = 0;
    bit usesCustomInserter = 0;
    bit hasCtrlDep = 0;
    bit isNotDuplicable = 0;
    bit hasSideEffects = 0;
    InstrItinClass Itinerary = NoItinerary;
    string Constraints = "";
    string DisableEncoding = "";
    bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
    Format Form = MRMDestReg;
    bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
    ImmType ImmT = NoImm;
    bits<3> ImmTypeBits = { 0, 0, 0 };
    bit hasOpSizePrefix = 0;
    bit hasAdSizePrefix = 0;
    bits<4> Prefix = { 0, 0, 0, 0 };
    bit hasREX_WPrefix = 0;
    FPFormat FPForm = ?;
    bits<3> FPFormBits = { 0, 0, 0 };
  }
  ...

This definition corresponds to the 32-bit register-register ``add`` instruction
of the x86 architecture.  ``def ADD32rr`` defines a record named
``ADD32rr``, and the comment at the end of the line indicates the superclasses
of the definition.  The body of the record contains all of the data that
TableGen assembled for the record, indicating that the instruction is part of
the "X86" namespace, the pattern indicating how the instruction is selected by
the code generator, that it is a two-address instruction, has a particular
encoding, etc.  The contents and semantics of the information in the record are
specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported
by the code generator, and specifying it all manually would be unmaintainable,
prone to bugs, and tiring to do in the first place.  Because we are using
TableGen, all of the information was derived from the following definition:

.. code-block:: text

  let Defs = [EFLAGS],
      isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
      isConvertibleToThreeAddress = 1 in // Can transform into LEA.
  def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                     (ins GR32:$src1, GR32:$src2),
                   "add{l}\t{$src2, $dst|$dst, $src2}",
                   [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;

This definition makes use of the custom class ``I`` (extended from the custom
class ``X86Inst``), which is defined in the X86-specific TableGen file, to
factor out the common features that instructions of its class share.  A key
feature of TableGen is that it allows the end-user to define the abstractions
they prefer to use when describing their information.

Syntax
======

TableGen has a syntax that is loosely based on C++ templates, with built-in
types and specification.  In addition, TableGen's syntax introduces some
automation concepts like ``multiclass``, ``foreach``, and ``let``.

Basic concepts
--------------

TableGen files consist of two key parts: 'classes' and 'definitions', both of
which are considered 'records'.

**TableGen records** have a unique name, a list of values, and a list of
superclasses.  The list of values is the main data that TableGen builds for each
record; it is this that holds the domain-specific information for the
application.  The interpretation of this data is left to a specific `backend`_,
but the structure and format rules are taken care of and are fixed by
TableGen.

**TableGen definitions** are the concrete form of 'records'.  These generally do
not have any undefined values, and are marked with the '``def``' keyword.

.. code-block:: text

  def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                        "Enable ARMv8 FP">;

In this example, ``FeatureFPARMv8`` is a ``SubtargetFeature`` record initialised
with some values.  Classes such as ``SubtargetFeature`` are defined with the
``class`` keyword, either in the same file or in an included one.  Most target
TableGen files include the generic ones in ``include/llvm/Target``.

**TableGen classes** are abstract records that are used to build and describe
other records.  These classes allow the end-user to build abstractions for
either the domain they are targeting (such as "Register", "RegisterClass", and
"Instruction" in the LLVM code generator) or for the implementor to help factor
out common properties of records (such as "FPInst", which is used to represent
floating point instructions in the X86 backend).  TableGen keeps track of all of
the classes that are used to build up a definition, so the backend can find all
definitions of a particular class, such as "Instruction".

.. code-block:: text

  class ProcNoItin<string Name, list<SubtargetFeature> Features>
        : Processor<Name, NoItineraries, Features>;

Here, the class ``ProcNoItin`` takes a parameter ``Name`` of type ``string`` and
a list of target features.  It specializes the class ``Processor`` by passing
these arguments down and hard-coding ``NoItineraries``.

**TableGen multiclasses** are groups of abstract records that are instantiated
all at once.  Each instantiation can result in multiple TableGen definitions.
If a multiclass inherits from another multiclass, the definitions in the
sub-multiclass become part of the current multiclass, as if they were declared
in the current multiclass.
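
As a rough mental model only (this is not how TableGen is implemented), the
expansion can be pictured in Python: a multiclass is a set of record templates,
and each ``defm`` stamps out one definition per template, prefixing the inner
names with the ``defm`` name.  The ``binary_op`` templates and their fields are
hypothetical.

.. code-block:: python

  def binary_op(mnemonic):
      """Hypothetical multiclass body: one record template per operand form."""
      return {
          "rr": {"AsmString": f"{mnemonic} $dst, $src", "SrcKind": "register"},
          "ri": {"AsmString": f"{mnemonic} $dst, $imm", "SrcKind": "immediate"},
      }


  def defm(name, templates):
      """Instantiate every template at once, prefixing inner names with name."""
      return {name + suffix: fields for suffix, fields in templates.items()}


  records = {}
  records.update(defm("ADD", binary_op("add")))
  records.update(defm("SUB", binary_op("sub")))
  print(sorted(records))  # ['ADDri', 'ADDrr', 'SUBri', 'SUBrr']

Two instantiations of a two-template multiclass therefore yield four concrete
definitions.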

.. code-block:: text

  multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                            dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
  }

  defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                        !foreach(decls.pattern, address,
                                 !subst(SHIFT, imm_eq0, decls.pattern)),
                        i8>;

See the :doc:`TableGen Programmer's Reference <./ProgRef>` for an in-depth
description of TableGen.


.. _backend:
.. _backends:

TableGen backends
=================

TableGen files have no real meaning without a backend.  The default operation
when running ``*-tblgen`` is to print the information in a textual format, but
that is only useful for debugging the TableGen files themselves.  The real power
of TableGen lies in parsing the source files into an internal representation
from which a backend can generate anything you want.

Currently, TableGen is mostly used to create huge include files containing
tables.  You can either include these files directly (if the output is in the
language you are coding in) or pre-process them with macros defined around the
include of the file.

Direct output can be used if the backend already prints a table in C format
or if the output is just a list of strings (for error and warning messages).
Pre-processed output should be used if the same information needs to be used
in different contexts (like instruction names), so your backend should print
a meta-information list that can be shaped into different compile-time formats.
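
The pre-processed style can be sketched as follows.  This is a stand-alone
Python script rather than a C++ backend, and the ``INSTRUCTION`` macro, the
``emit_inc`` helper, and the record names are all hypothetical.  The script
prints one macro invocation per record and gives the macro an empty default
definition, so the consumer decides what each entry expands to.

.. code-block:: python

  def emit_inc(macro, names):
      """Render one ``MACRO(name)`` line per record for a generated include file."""
      lines = [
          f"#ifndef {macro}",
          f"#define {macro}(NAME)",  # empty default if the consumer defines nothing
          "#endif",
      ]
      lines += [f"{macro}({name})" for name in names]
      lines.append(f"#undef {macro}")  # leave no macro behind after the include
      return "\n".join(lines) + "\n"


  inc_text = emit_inc("INSTRUCTION", ["ADD32rr", "SUB32rr"])
  print(inc_text, end="")

A C++ consumer could then define ``INSTRUCTION(NAME)`` as ``NAME,`` before
including the generated file inside an ``enum``, and as ``#NAME,`` before
including it again inside an array of strings, so that one list is shaped into
two compile-time formats.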

See :doc:`TableGen BackEnds <./BackEnds>` for a list of available
backends, and see the :doc:`TableGen Backend Developer's Guide <./BackGuide>`
for information on how to write and debug a new backend.

Tools and Resources
===================

In addition to this documentation, a list of tools and resources for TableGen
can be found in TableGen's
`README <https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/README.md>`_.

TableGen Deficiencies
=====================

Despite being very generic, TableGen has some deficiencies that have been
pointed out numerous times.  The common theme is that, while TableGen allows
you to build domain-specific languages, the final languages that you create
lack the power of other DSLs, which in turn considerably increases the size
and complexity of TableGen files.

At the same time, TableGen allows you to give the basic concepts virtually any
meaning via custom-made backends, which can pervert the original design and
make the resulting TableGen files very hard for newcomers to understand.

Some are in favor of extending the semantics even more, while making sure
backends adhere to strict rules.  Others suggest moving to fewer,
more powerful DSLs designed for specific purposes, or even reusing existing
DSLs.