============================
"Clang" CFE Internals Manual
============================

.. contents::
   :local:

Introduction
============

This document describes some of the more important APIs and internal design
decisions made in the Clang C front-end.  Its purpose is both to capture this
high-level information and to describe the design decisions behind it.  It is
meant for people interested in hacking on Clang, not for end-users.  The
description below is organized by library, and does not describe any of the
clients of the libraries.

LLVM Support Library
====================

The LLVM ``libSupport`` library provides many underlying libraries and
`data-structures <http://llvm.org/docs/ProgrammersManual.html>`_, including
command line option processing, various containers and a system abstraction
layer, which is used for file system access.

The Clang "Basic" Library
=========================

This library certainly needs a better name.  The "basic" library contains a
number of low-level utilities for tracking and manipulating source buffers,
locations within the source buffers, diagnostics, tokens, target abstraction,
and information about the subset of the language being compiled for.

Part of this infrastructure is specific to C (such as the ``TargetInfo``
class); other parts could be reused for other non-C-based languages
(``SourceLocation``, ``SourceManager``, ``Diagnostics``, ``FileManager``).
When and if there is future demand we can figure out if it makes sense to
introduce a new library, move the general classes somewhere else, or introduce
some other solution.

We describe the roles of these classes in order of their dependencies.

The Diagnostics Subsystem
-------------------------

The Clang Diagnostics subsystem is an important part of how the compiler
communicates with the human.  Diagnostics are the warnings and errors produced
when the code is incorrect or dubious.  In Clang, each diagnostic produced has
(at the minimum) a unique ID, an English translation associated with it, a
:ref:`SourceLocation <SourceLocation>` to "put the caret", and a severity
(e.g., ``WARNING`` or ``ERROR``).  They can also optionally include a number of
arguments to the diagnostic (which fill in "%0"'s in the string) as well as a
number of source ranges that relate to the diagnostic.

In this section, we'll be giving examples produced by the Clang command line
driver, but diagnostics can be :ref:`rendered in many different ways
<DiagnosticClient>` depending on how the ``DiagnosticClient`` interface is
implemented.  A representative example of a diagnostic is:

.. code-block:: text

  t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
  P = (P-42) + Gamma*4;
      ~~~~~~ ^ ~~~~~~~

In this example, you can see the English translation, the severity (error),
the source location (the caret ("``^``") and file/line/column info), the
source ranges "``~~~~``", and the arguments to the diagnostic ("``int *``" and
"``_Complex float``").  You'll have to believe me that there is a unique ID
backing the diagnostic :).

Getting all of this to happen has several steps and involves many moving
pieces.  This section describes them and talks about best practices when
adding a new diagnostic.

The ``Diagnostic*Kinds.td`` files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Diagnostics are created by adding an entry to one of the
``clang/Basic/Diagnostic*Kinds.td`` files, depending on what library will be
using it.  From this file, :program:`tblgen` generates the unique ID of the
diagnostic, the severity of the diagnostic and the English translation + format
string.

There is little sanity with the naming of the unique IDs right now.  Some
start with ``err_``, ``warn_``, ``ext_`` to encode the severity into the name.
Since the enum is referenced in the C++ code that produces the diagnostic, it
is somewhat useful for it to be reasonably short.

The severity of the diagnostic comes from the set {``NOTE``, ``REMARK``,
``WARNING``, ``EXTENSION``, ``EXTWARN``, ``ERROR``}.  The ``ERROR`` severity is
used for diagnostics indicating the program is never acceptable under any
circumstances.  When an error is emitted, the AST for the input code may not be
fully built.  The ``EXTENSION`` and ``EXTWARN`` severities are used for
extensions to the language that Clang accepts.  This means that Clang fully
understands and can represent them in the AST, but we produce diagnostics to
tell the user their code is non-portable.  The difference is that the former
are ignored by default, and the latter warn by default.  The ``WARNING``
severity is used for constructs that are valid in the currently selected source
language but that are dubious in some way.  The ``REMARK`` severity provides
generic information about the compilation that is not necessarily related to
any dubious code.  The ``NOTE`` level is used to staple more information onto
previous diagnostics.

These *severities* are mapped into a smaller set (the ``Diagnostic::Level``
enum, {``Ignored``, ``Note``, ``Remark``, ``Warning``, ``Error``, ``Fatal``}) of
output *levels* by the diagnostics subsystem based on various configuration
options.  Clang internally supports a fine-grained mapping mechanism that
allows you to map almost any diagnostic to the output level that you want.  The
only diagnostics that cannot be mapped are ``NOTE``\ s, which always follow the
severity of the previously emitted diagnostic, and ``ERROR``\ s, which can only
be mapped to ``Fatal`` (it is not possible to turn an error into a warning, for
example).

Diagnostic mappings are used in many ways.  For example, if the user specifies
``-pedantic``, ``EXTENSION`` maps to ``Warning``; if they specify
``-pedantic-errors``, it turns into ``Error``.  This is used to implement
options like ``-Wunused-macros``, ``-Wundef`` etc.

Mapping to ``Fatal`` should only be used for diagnostics that are considered so
severe that error recovery won't be able to recover sensibly from them (thus
spewing a ton of bogus errors).  One example of this class of error is failure
to ``#include`` a file.

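The severity-to-level mapping described above can be sketched as a small
standalone function.  This is only an illustration and not Clang's actual
interface: the enum names follow this section, while ``Options`` and
``mapSeverity`` are hypothetical, and the sketch assumes that
``-pedantic-errors`` upgrades ``EXTWARN`` in the same way as ``EXTENSION``.
``NOTE`` and ``REMARK`` are left out for brevity.

.. code-block:: c++

  #include <cassert>

  // Severities as written in the .td files (subset used by this sketch).
  enum class Severity { Warning, Extension, ExtWarn, Error };

  // Output levels seen by a diagnostic client (subset used by this sketch).
  enum class Level { Ignored, Warning, Error };

  // Hypothetical configuration; models -pedantic and -pedantic-errors.
  struct Options {
    bool Pedantic;
    bool PedanticErrors;
  };

  // Maps a severity to an output level under the given options.
  Level mapSeverity(Severity S, Options Opts) {
    switch (S) {
    case Severity::Warning:
      return Level::Warning;
    case Severity::Extension:
      // Ignored by default, visible only when the user asks for pedantry.
      if (Opts.PedanticErrors)
        return Level::Error;
      return Opts.Pedantic ? Level::Warning : Level::Ignored;
    case Severity::ExtWarn:
      // Warns by default; assumed to be upgraded like EXTENSION.
      return Opts.PedanticErrors ? Level::Error : Level::Warning;
    case Severity::Error:
      // An error can never be downgraded to a warning.
      return Level::Error;
    }
    assert(false && "unknown severity");
    return Level::Error;
  }

With this sketch, ``mapSeverity(Severity::Extension, Options{true, false})``
yields ``Level::Warning``, while the same call with both options off yields
``Level::Ignored``.
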
The Format String
^^^^^^^^^^^^^^^^^

The format string for the diagnostic is very simple, but it has some power.  It
takes the form of a string in English with markers that indicate where and how
arguments to the diagnostic are inserted and formatted.  For example, here are
some simple format strings:

.. code-block:: c++

  "binary integer literals are an extension"
  "format string contains '\\0' within the string body"
  "more '%%' conversions than data arguments"
  "invalid operands to binary expression (%0 and %1)"
  "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator"
       " (has %1 parameter%s1)"

These examples show some important points of format strings.  You can use any
plain ASCII character in the diagnostic string except "``%``" without a
problem, but these are C strings, so you have to use and be aware of all the C
escape sequences (as in the second example).  If you want to produce a "``%``"
in the output, use the "``%%``" escape sequence, like the third diagnostic.
Finally, Clang uses the "``%...[digit]``" sequences to specify where and how
arguments to the diagnostic are formatted.

Arguments to the diagnostic are numbered according to how they are specified by
the C++ code that :ref:`produces them <internals-producing-diag>`, and are
referenced by ``%0`` .. ``%9``.  If you have more than 10 arguments to your
diagnostic, you are doing something wrong :).  Unlike ``printf``, there is no
requirement that arguments to the diagnostic end up in the output in the same
order as they are specified; you could have a format string with "``%1 %0``"
that swaps them, for example.  The text in between the percent and digit is
the formatting instructions.  If there are no instructions, the argument is
just turned into a string and substituted in.

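To make the numbering rule concrete, here is a minimal standalone sketch of
positional substitution.  It is not Clang's formatter: ``expandPositional`` is
a hypothetical helper that handles only plain ``%0`` .. ``%9`` and ``%%``, and
copies any other modifier through unchanged.

.. code-block:: c++

  #include <cassert>
  #include <cctype>
  #include <string>
  #include <vector>

  // Expands "%0".."%9" and "%%" in Format using the strings in Args.
  std::string expandPositional(const std::string &Format,
                               const std::vector<std::string> &Args) {
    std::string Out;
    for (std::string::size_type I = 0; I < Format.size(); ++I) {
      char C = Format[I];
      if (C != '%' || I + 1 == Format.size()) {
        Out += C;
        continue;
      }
      char Next = Format[++I];
      if (Next == '%') {
        Out += '%';  // "%%" produces a literal percent sign
      } else if (std::isdigit(static_cast<unsigned char>(Next))) {
        std::string::size_type Index = Next - '0';
        assert(Index < Args.size() && "format refers to a missing argument");
        Out += Args[Index];  // the digit selects the argument, in any order
      } else {
        // Formatting instructions are outside the scope of this sketch.
        Out += '%';
        Out += Next;
      }
    }
    return Out;
  }

Calling it with the format string ``"%1 %0"`` and the arguments ``a`` and ``b``
produces ``b a``, which shows that output order is independent of argument
order.
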
Here are some "best practices" for writing the English format string:

* Keep the string short.  It should ideally fit in the 80 column limit of the
  ``DiagnosticKinds.td`` file.  This avoids the diagnostic wrapping when
  printed, and forces you to think about the important point you are conveying
  with the diagnostic.
* Take advantage of location information.  The user will be able to see the
  line and location of the caret, so you don't need to tell them that the
  problem is with the 4th argument to the function: just point to it.
* Do not capitalize the diagnostic string, and do not end it with a period.
* If you need to quote something in the diagnostic string, use single quotes.

Diagnostics should never take random English strings as arguments: you
shouldn't use "``you have a problem with %0``" and pass in things like "``your
argument``" or "``your return value``" as arguments.  Doing this prevents
:ref:`translating <internals-diag-translation>` the Clang diagnostics to other
languages (because they'll get random English words in their otherwise
localized diagnostic).  The exceptions to this are C/C++ language keywords
(e.g., ``auto``, ``const``, ``mutable``, etc.) and C/C++ operators (``/=``).
Note that things like "pointer" and "reference" are not keywords.  On the other
hand, you *can* include anything that comes from the user's source code,
including variable names, types, labels, etc.  The "``select``" format can be
used to achieve this sort of thing in a localizable way; see below.

Formatting a Diagnostic Argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Arguments to diagnostics are fully typed internally, and come from a few
different classes: integers, types, names, and random strings.  Depending on
the class of the argument, it can be optionally formatted in different ways.
This gives the ``DiagnosticClient`` information about what the argument means
without requiring it to use a specific presentation (consider this MVC for
Clang :).

Here are the different diagnostic argument formats currently supported by
Clang:

**"s" format**

Example:
  ``"requires %1 parameter%s1"``
Class:
  Integers
Description:
  This is a simple formatter for integers that is useful when producing English
  diagnostics.  When the integer is 1, it prints as nothing.  When the integer
  is not 1, it prints as "``s``".  This allows some simple grammatical forms to
  be handled correctly, and eliminates the need to use gross things like
  ``"requires %1 parameter(s)"``.

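The rule is small enough to sketch directly.  The two functions below are
hypothetical helpers written for this illustration; they hard-code the example
format string rather than parsing it.

.. code-block:: c++

  #include <string>

  // Text produced by the "s" modifier: nothing for 1, "s" for anything else.
  std::string pluralSuffix(unsigned N) { return N == 1 ? "" : "s"; }

  // Renders "requires %1 parameter%s1" for a given value of argument 1.
  std::string renderRequires(unsigned NumParams) {
    return "requires " + std::to_string(NumParams) + " parameter" +
           pluralSuffix(NumParams);
  }

Note that zero takes the plural form, so ``renderRequires(0)`` produces
``requires 0 parameters``.
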
**"select" format**

Example:
  ``"must be a %select{unary|binary|unary or binary}2 operator"``
Class:
  Integers
Description:
  This format specifier is used to merge multiple related diagnostics together
  into one common one, without requiring the difference to be specified as an
  English string argument.  Instead of specifying the string, the diagnostic
  gets an integer argument and the format string selects the numbered option.
  In this case, the "``%2``" value must be an integer in the range [0..2].  If
  it is 0, it prints "unary"; if it is 1, it prints "binary"; if it is 2, it
  prints "unary or binary".  This allows other language translations to
  substitute reasonable words (or entire phrases) based on the semantics of the
  diagnostic instead of having to do things textually.  The selected string
  does undergo formatting.

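As a rough standalone illustration of the selection step, the hypothetical
helper below splits the text between the braces on "``|``" and returns the
option with the requested index.  It does not format the selected string
further, which the real formatter does.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Returns option number Index from a list such as "unary|binary|both".
  std::string selectOption(const std::string &Options, unsigned Index) {
    std::vector<std::string> Parts(1);
    for (char C : Options) {
      if (C == '|')
        Parts.emplace_back();  // start the next option
      else
        Parts.back() += C;
    }
    assert(Index < Parts.size() && "select index out of range");
    return Parts[Index];
  }

For the example above, ``selectOption("unary|binary|unary or binary", 2)``
returns ``unary or binary``.
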
**"plural" format**

Example:
  ``"you have %1 %plural{1:mouse|:mice}1 connected to your computer"``
Class:
  Integers
Description:
  This is a formatter for complex plural forms.  It is designed to handle even
  the requirements of languages with very complex plural forms, as many Baltic
  languages have.  The argument consists of a series of expression/form pairs
  separated by "|", with each expression separated from its form by ":".  The
  first form whose expression evaluates to true is the result of the modifier.

  An expression can be empty, in which case it is always true.  See the example
  at the top.  Otherwise, it is a series of one or more numeric conditions,
  separated by ",".  If any condition matches, the expression matches.  Each
  numeric condition can take one of three forms.

  * number: A simple decimal number matches if the argument is the same as the
    number.  Example: ``"%plural{1:mouse|:mice}4"``
  * range: A range in square brackets matches if the argument is within the
    range.  The range is inclusive on both ends.  Example:
    ``"%plural{0:none|1:one|[2,5]:some|:many}2"``
  * modulo: A modulo operator is followed by a number, an equals sign and
    either a number or a range.  The tests are the same as for plain numbers
    and ranges, but the argument is taken modulo the number first.  Example:
    ``"%plural{%100=0:even hundred|%100=[1,50]:lower half|:everything else}1"``

  The parser is very unforgiving.  A syntax error, even whitespace, will abort,
  as will a failure to match the argument against any expression.

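The matching rules can be sketched as a small standalone evaluator.  This is
an illustration written for this section, not Clang's parser: all function
names are hypothetical, and ``selectPlural`` takes only the text between the
braces.  Like the real parser it is unforgiving, using ``assert`` to abort on
malformed input or on an argument that matches no expression.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>

  namespace {

  // Parses a decimal number at Pos and advances Pos past it.
  unsigned parseNumber(const std::string &S, std::size_t &Pos) {
    assert(Pos < S.size() && S[Pos] >= '0' && S[Pos] <= '9' && "expected digit");
    unsigned Value = 0;
    while (Pos < S.size() && S[Pos] >= '0' && S[Pos] <= '9')
      Value = Value * 10 + static_cast<unsigned>(S[Pos++] - '0');
    return Value;
  }

  // Tests Value against "N" or an inclusive range "[Lo,Hi]".
  bool matchNumberOrRange(const std::string &S, std::size_t &Pos,
                          unsigned Value) {
    if (Pos >= S.size() || S[Pos] != '[')
      return parseNumber(S, Pos) == Value;
    ++Pos;  // skip '['
    unsigned Lo = parseNumber(S, Pos);
    assert(Pos < S.size() && S[Pos] == ',' && "expected ',' in range");
    ++Pos;
    unsigned Hi = parseNumber(S, Pos);
    assert(Pos < S.size() && S[Pos] == ']' && "expected ']' after range");
    ++Pos;
    return Lo <= Value && Value <= Hi;
  }

  // Tests one condition: a number, a range, or "%N=" followed by either.
  bool matchCondition(const std::string &S, std::size_t &Pos, unsigned Value) {
    if (Pos < S.size() && S[Pos] == '%') {
      ++Pos;
      unsigned Modulus = parseNumber(S, Pos);
      assert(Modulus != 0 && Pos < S.size() && S[Pos] == '=');
      ++Pos;
      Value %= Modulus;  // the argument is reduced before the comparison
    }
    return matchNumberOrRange(S, Pos, Value);
  }

  // An empty expression is always true; otherwise any condition may match.
  bool matchExpression(const std::string &Expr, unsigned Value) {
    if (Expr.empty())
      return true;
    std::size_t Pos = 0;
    bool Matched = false;
    for (;;) {
      // Every condition is parsed so that syntax errors are always caught.
      if (matchCondition(Expr, Pos, Value))
        Matched = true;
      if (Pos == Expr.size())
        return Matched;
      assert(Expr[Pos] == ',' && "expected ',' between conditions");
      ++Pos;
    }
  }

  } // namespace

  // Returns the first form in "expr:form|expr:form|..." whose expression
  // matches Value.
  std::string selectPlural(const std::string &Spec, unsigned Value) {
    std::size_t Start = 0;
    while (Start <= Spec.size()) {
      std::size_t End = Spec.find('|', Start);
      if (End == std::string::npos)
        End = Spec.size();
      std::string Pair = Spec.substr(Start, End - Start);
      std::size_t Colon = Pair.find(':');
      assert(Colon != std::string::npos && "expected ':' in pair");
      if (matchExpression(Pair.substr(0, Colon), Value))
        return Pair.substr(Colon + 1);
      Start = End + 1;
    }
    assert(false && "no expression matched the argument");
    return std::string();
  }

For instance, ``selectPlural("0:none|1:one|[2,5]:some|:many", 3)`` returns
``some``, because 3 falls in the inclusive range before the catch-all empty
expression is reached.
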
**"ordinal" format**

Example:
  ``"ambiguity in %ordinal0 argument"``
Class:
  Integers
Description:
  This is a formatter which represents the argument number as an ordinal: the
  value ``1`` becomes ``1st``, ``3`` becomes ``3rd``, and so on.  Values less
  than ``1`` are not supported.  This formatter is currently hard-coded to use
  English ordinals.

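A standalone sketch of English ordinal formatting follows.  ``formatOrdinal``
is a hypothetical helper, not the function Clang uses; it applies the usual
English rule that 11, 12 and 13 take "th" regardless of their last digit.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Formats Value as an English ordinal such as "1st" or "12th".
  std::string formatOrdinal(unsigned Value) {
    assert(Value >= 1 && "values less than 1 are not supported");
    const char *Suffix = "th";
    // Numbers ending in 11, 12 or 13 keep "th"; others follow the last digit.
    if (Value % 100 < 11 || Value % 100 > 13) {
      switch (Value % 10) {
      case 1: Suffix = "st"; break;
      case 2: Suffix = "nd"; break;
      case 3: Suffix = "rd"; break;
      default: break;
      }
    }
    return std::to_string(Value) + Suffix;
  }
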
**"objcclass" format**

Example:
  ``"method %objcclass0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C class method selector.  As such, it prints the selector
  with a leading "``+``".

**"objcinstance" format**

Example:
  ``"method %objcinstance0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C instance method selector.  As such, it prints the selector
  with a leading "``-``".

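The effect of these two formatters can be sketched in a few lines.
``MethodKind`` and ``formatSelector`` are hypothetical names for this
illustration, and the selector is passed as a plain string rather than a
``DeclarationName``.

.. code-block:: c++

  #include <string>

  // Which of the two formatters is being modelled.
  enum class MethodKind { Class, Instance };

  // Prefixes the selector with "+" for class methods, "-" for instance ones.
  std::string formatSelector(MethodKind Kind, const std::string &Selector) {
    return (Kind == MethodKind::Class ? "+" : "-") + Selector;
  }
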
**"q" format**

Example:
  ``"candidate found by name lookup is %q0"``
Class:
  ``NamedDecl *``
Description:
  This formatter indicates that the fully-qualified name of the declaration
  should be printed, e.g., "``std::vector``" rather than "``vector``".

**"diff" format**

Example:
  ``"no known conversion %diff{from $ to $|from argument type to parameter type}1,2"``
Class:
  ``QualType``
Description:
  This formatter takes two ``QualType``\ s and attempts to print a template
  difference between the two.  If tree printing is off, the text inside the
  braces before the pipe is printed, with the formatted text replacing the $.
  If tree printing is on, the text after the pipe is printed and a type tree is
  printed after the diagnostic message.

It is really easy to add format specifiers to the Clang diagnostics system, but
they should be discussed before they are added.  If you are creating a lot of
repetitive diagnostics and/or have an idea for a useful formatter, please bring
it up on the cfe-dev mailing list.

.. _internals-producing-diag:

Producing the Diagnostic
^^^^^^^^^^^^^^^^^^^^^^^^

Now that you've created the diagnostic in the ``Diagnostic*Kinds.td`` file, you
need to write the code that detects the condition in question and emits the new
diagnostic.  Various components of Clang (e.g., the preprocessor, ``Sema``,
etc.) provide a helper function named "``Diag``".  It creates a diagnostic and
accepts the arguments, ranges, and other information that goes along with it.

For example, the binary expression error comes from code like this:

.. code-block:: c++

  if (various things that are bad)
    Diag(Loc, diag::err_typecheck_invalid_operands)
      << lex->getType() << rex->getType()                // %0 and %1
      << lex->getSourceRange() << rex->getSourceRange(); // highlighted ranges

This shows the use of the ``Diag`` method: it takes a location (a
:ref:`SourceLocation <SourceLocation>` object) and a diagnostic enum value
(which matches the name from ``Diagnostic*Kinds.td``).  If the diagnostic takes
arguments, they are specified with the ``<<`` operator: the first argument
becomes ``%0``, the second becomes ``%1``, etc.  The diagnostic interface
allows you to specify arguments of many different types, including ``int`` and
``unsigned`` for integer arguments, ``const char*`` and ``std::string`` for
string arguments, ``DeclarationName`` and ``const IdentifierInfo *`` for names,
``QualType`` for types, etc.  ``SourceRange``\ s are also specified with the
``<<`` operator, but do not have a specific ordering requirement.

As you can see, adding and producing a diagnostic is pretty straightforward.
The hard part is deciding exactly what you need to say to help the user,
picking a suitable wording, and providing the information needed to format it
correctly.  The good news is that the call site that issues a diagnostic should
be completely independent of how the diagnostic is formatted and in what
language it is rendered.

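The streaming style used by ``Diag`` can be imitated in a standalone sketch.
Everything below is hypothetical and much simpler than Clang's real builder:
``DiagBuilder`` and ``Range`` exist only for this illustration.  It shows how
each streamed argument takes the next ``%N`` slot while ranges are collected
separately and so can appear anywhere in the chain.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for a source range: a pair of character offsets.
  struct Range {
    unsigned Begin, End;
  };

  // Hypothetical builder that collects whatever is streamed into it.
  class DiagBuilder {
  public:
    explicit DiagBuilder(std::string Format) : Format(std::move(Format)) {}

    // Each streamed argument takes the next free slot: %0, then %1, ...
    DiagBuilder &operator<<(const std::string &Arg) {
      Args.push_back(Arg);
      return *this;
    }
    DiagBuilder &operator<<(int Arg) { return *this << std::to_string(Arg); }

    // Ranges are stored separately, so they never consume an argument slot.
    DiagBuilder &operator<<(Range R) {
      Ranges.push_back(R);
      return *this;
    }

    std::size_t numRanges() const { return Ranges.size(); }

    // Substitutes "%0".."%9" with the collected arguments.
    std::string str() const {
      std::string Out;
      for (std::size_t I = 0; I < Format.size(); ++I) {
        if (Format[I] == '%' && I + 1 < Format.size() &&
            Format[I + 1] >= '0' && Format[I + 1] <= '9') {
          std::size_t Index = Format[++I] - '0';
          assert(Index < Args.size() && "missing diagnostic argument");
          Out += Args[Index];
        } else {
          Out += Format[I];
        }
      }
      return Out;
    }

  private:
    std::string Format;
    std::vector<std::string> Args;
    std::vector<Range> Ranges;
  };

A caller would write something like
``D << "'int *'" << Range{4, 10} << "'_Complex float'"``; the range sits
between the two arguments without disturbing their numbering.
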
Fix-It Hints
^^^^^^^^^^^^

In some cases, the front end emits diagnostics when it is clear that some small
change to the source code would fix the problem.  Examples include a missing
semicolon at the end of a statement or a use of deprecated syntax that is
easily rewritten into a more modern form.  Clang tries very hard to emit the
diagnostic and recover gracefully in these and other cases.

However, for these cases where the fix is obvious, the diagnostic can be
annotated with a hint (referred to as a "fix-it hint") that describes how to
change the code referenced by the diagnostic to fix the problem.  For example,
it might add the missing semicolon at the end of the statement or rewrite the
use of a deprecated construct into something more palatable.  Here is one such
example from the C++ front end, where we warn about the right-shift operator
changing meaning from C++98 to C++11:

.. code-block:: text

  test.cpp:3:7: warning: use of right-shift operator ('>>') in template argument
                         will require parentheses in C++11
  A<100 >> 2> *a;
        ^
    (       )

Here, the fix-it hint is suggesting that parentheses be added, and showing
exactly where those parentheses would be inserted into the source code.  The
fix-it hints themselves describe what changes to make to the source code in an
abstract manner, which the text diagnostic printer renders as a line of
"insertions" below the caret line.  :ref:`Other diagnostic clients
<DiagnosticClient>` might choose to render the code differently (e.g., as
markup inline) or even give the user the ability to automatically fix the
problem.

Fix-it hints on errors and warnings need to obey these rules:

* Since they are automatically applied if ``-Xclang -fixit`` is passed to the
  driver, they should only be used when it's very likely they match the user's
  intent.
* Clang must recover from errors as if the fix-it had been applied.

If a fix-it can't obey these rules, put the fix-it on a note.  Fix-its on notes
are not applied automatically.

All fix-it hints are described by the ``FixItHint`` class, instances of which
should be attached to the diagnostic using the ``<<`` operator in the same way
that highlighted source ranges and arguments are passed to the diagnostic.
Fix-it hints can be created with one of three static factory functions:

* ``FixItHint::CreateInsertion(Loc, Code)``

    Specifies that the given ``Code`` (a string) should be inserted before the
    source location ``Loc``.

* ``FixItHint::CreateRemoval(Range)``

    Specifies that the code in the given source ``Range`` should be removed.

* ``FixItHint::CreateReplacement(Range, Code)``

    Specifies that the code in the given source ``Range`` should be removed,
    and replaced with the given ``Code`` string.

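What the three kinds of hint mean for the source text can be sketched on a
single line of code.  The ``Hint`` type and ``applyHint`` function below are
hypothetical and use plain 0-based character offsets; the real ``FixItHint``
works with ``SourceLocation`` and source ranges instead.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <utility>

  // Hypothetical hint over one line: the characters in [Begin, End) are
  // removed and Code is inserted at Begin.
  struct Hint {
    std::size_t Begin, End;
    std::string Code;

    // Insertion: an empty range, so nothing is removed.
    static Hint insertion(std::size_t Loc, std::string Code) {
      return Hint{Loc, Loc, std::move(Code)};
    }
    // Removal: a range with no replacement text.
    static Hint removal(std::size_t Begin, std::size_t End) {
      return Hint{Begin, End, std::string()};
    }
    // Replacement: a range together with the text that takes its place.
    static Hint replacement(std::size_t Begin, std::size_t End,
                            std::string Code) {
      return Hint{Begin, End, std::move(Code)};
    }
  };

  // Applies one hint to Line and returns the edited text.
  std::string applyHint(const std::string &Line, const Hint &H) {
    assert(H.Begin <= H.End && H.End <= Line.size() && "hint out of range");
    return Line.substr(0, H.Begin) + H.Code + Line.substr(H.End);
  }

Applying ``Hint::insertion(10, ")")`` and then ``Hint::insertion(2, "(")`` to
``A<100 >> 2> *a;`` gives ``A<(100 >> 2)> *a;``.  The later offset is applied
first so that the earlier one stays valid.
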
.. _DiagnosticClient:

The ``DiagnosticClient`` Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once code generates a diagnostic with all of the arguments and the rest of the
relevant information, Clang needs to know what to do with it.  As previously
mentioned, the diagnostic machinery goes through some filtering to map a
severity onto a diagnostic level, then (assuming the diagnostic is not mapped
to "``Ignored``") it invokes an object that implements the ``DiagnosticClient``
interface with the information.

It is possible to implement this interface in many different ways.  For
example, the normal Clang ``DiagnosticClient`` (named
``TextDiagnosticPrinter``) turns the arguments into strings (according to the
various formatting rules), prints out the file/line/column information and the
string, then prints out the line of code, the source ranges, and the caret.
However, this behavior isn't required.

442f4a2713aSLionel SambucAnother implementation of the ``DiagnosticClient`` interface is the
443f4a2713aSLionel Sambuc``TextDiagnosticBuffer`` class, which is used when Clang is in ``-verify``
444f4a2713aSLionel Sambucmode.  Instead of formatting and printing out the diagnostics, this
445f4a2713aSLionel Sambucimplementation just captures and remembers the diagnostics as they fly by.
446f4a2713aSLionel SambucThen ``-verify`` compares the list of produced diagnostics to the list of
447f4a2713aSLionel Sambucexpected ones.  If they disagree, it prints out its own output.  Full
448f4a2713aSLionel Sambucdocumentation for the ``-verify`` mode can be found in the Clang API
449f4a2713aSLionel Sambucdocumentation for `VerifyDiagnosticConsumer
450f4a2713aSLionel Sambuc</doxygen/classclang_1_1VerifyDiagnosticConsumer.html#details>`_.
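
The buffering idea can be illustrated with a small standalone model.  This is
only a sketch and not Clang's API: ``ToyDiagnostic``, ``ToyDiagnosticClient``
and ``ToyBufferClient`` are hypothetical names, and the sketch assumes a
diagnostic is nothing more than a level plus an already-formatted message.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical, simplified stand-ins; not the real Clang classes.
  enum class ToyLevel { Note, Warning, Error };

  struct ToyDiagnostic {
    ToyLevel Level;
    std::string Message;
    bool operator==(const ToyDiagnostic &RHS) const {
      return Level == RHS.Level && Message == RHS.Message;
    }
  };

  // The client interface: called once per diagnostic that survives filtering.
  class ToyDiagnosticClient {
  public:
    virtual ~ToyDiagnosticClient() = default;
    virtual void handleDiagnostic(const ToyDiagnostic &D) = 0;
  };

  // A client that records diagnostics instead of printing them.
  class ToyBufferClient : public ToyDiagnosticClient {
  public:
    void handleDiagnostic(const ToyDiagnostic &D) override {
      Seen.push_back(D);
    }
    // Compare what was produced against what a test expected.
    bool matchesExpected(const std::vector<ToyDiagnostic> &Expected) const {
      return Seen == Expected;
    }

  private:
    std::vector<ToyDiagnostic> Seen;
  };

A printing client would implement the same ``handleDiagnostic`` hook and write
to a stream instead, which is why the producer of a diagnostic never needs to
know how it will be presented.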

There are many other possible implementations of this interface, and this is
why we prefer diagnostics to pass down rich structured information in
arguments.  For example, an HTML output might want declaration names to be
linkified to where they come from in the source.  Another example is that a GUI
might let you click on typedefs to expand them.  This application would want to
pass significantly more information about types through to the GUI than a
simple flat string.  The interface allows this to happen.

.. _internals-diag-translation:

Adding Translations to Clang
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not possible yet!  Diagnostic strings should be written in UTF-8; the client
can translate to the relevant code page if needed.  Each translation completely
replaces the format string for the diagnostic.

.. _SourceLocation:
.. _SourceManager:

The ``SourceLocation`` and ``SourceManager`` classes
----------------------------------------------------

Strangely enough, the ``SourceLocation`` class represents a location within the
source code of the program.  Important design points include:

#. ``sizeof(SourceLocation)`` must be extremely small, as these are embedded
   into many AST nodes and are passed around often.  Currently it is 32 bits.
#. ``SourceLocation`` must be a simple value object that can be efficiently
   copied.
#. We should be able to represent a source location for any byte of any input
   file.  This includes in the middle of tokens, in whitespace, in trigraphs,
   etc.
#. A ``SourceLocation`` must encode the current ``#include`` stack that was
   active when the location was processed.  For example, if the location
   corresponds to a token, it should contain the set of ``#include``\ s active
   when the token was lexed.  This allows us to print the ``#include`` stack
   for a diagnostic.
#. ``SourceLocation`` must be able to describe macro expansions, capturing both
   the ultimate instantiation point and the source of the original character
   data.

In practice, the ``SourceLocation`` works together with the ``SourceManager``
class to encode two pieces of information about a location: its spelling
location and its instantiation location.  For most tokens, these will be the
same.  However, for a macro expansion (or tokens that came from a ``_Pragma``
directive) these will describe the location of the characters corresponding to
the token and the location where the token was used (i.e., the macro
instantiation point or the location of the ``_Pragma`` itself).
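
One way to picture how a 32-bit value and a side table can cooperate is the
sketch below.  It is a hypothetical model (``ToyLoc`` and ``ToyManager`` are
invented names), and it assumes that one bit of the value marks
macro-produced locations, that the remaining bits index a table owned by the
manager, and that macro expansions are not nested.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Hypothetical model; not Clang's real classes.
  class ToyLoc {
  public:
    static ToyLoc fileLoc(std::uint32_t Offset) { return ToyLoc(Offset); }
    static ToyLoc macroLoc(std::uint32_t Index) {
      return ToyLoc(Index | MacroBit);
    }
    bool isMacro() const { return (ID & MacroBit) != 0; }
    std::uint32_t payload() const { return ID & ~MacroBit; }
    std::uint32_t raw() const { return ID; }

  private:
    explicit ToyLoc(std::uint32_t ID) : ID(ID) {}
    static constexpr std::uint32_t MacroBit = 1u << 31;
    std::uint32_t ID;
  };

  static_assert(sizeof(ToyLoc) == 4, "a location should stay 32 bits");

  class ToyManager {
  public:
    // Record one token produced by a macro expansion and hand back a
    // location that refers to the new table entry.
    ToyLoc addExpansion(ToyLoc Spelling, ToyLoc Instantiation) {
      Table.push_back({Spelling, Instantiation});
      return ToyLoc::macroLoc(static_cast<std::uint32_t>(Table.size() - 1));
    }
    // Where the characters of the token live.
    ToyLoc spellingLoc(ToyLoc L) const {
      return L.isMacro() ? Table[L.payload()].Spelling : L;
    }
    // Where the token was used.
    ToyLoc instantiationLoc(ToyLoc L) const {
      return L.isMacro() ? Table[L.payload()].Instantiation : L;
    }

  private:
    struct Entry {
      ToyLoc Spelling;
      ToyLoc Instantiation;
    };
    std::vector<Entry> Table;
  };

For an ordinary file location both queries return the location itself, which
matches the statement above that the two locations coincide for most tokens.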

The Clang front-end inherently depends on the location of a token being tracked
correctly.  If it is ever incorrect, the front-end may get confused and die.
The reason for this is that the notion of the "spelling" of a ``Token`` in
Clang depends on being able to find the original input characters for the
token.  This concept maps directly to the "spelling location" for the token.

``SourceRange`` and ``CharSourceRange``
---------------------------------------

.. mostly taken from http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010595.html

Clang represents most source ranges by [first, last], where "first" and "last"
each point to the beginning of their respective tokens.  For example, consider
the ``SourceRange`` of the following statement:

.. code-block:: c++

  x = foo + bar;
  ^first    ^last

To map from this representation to a character-based representation, the "last"
location needs to be adjusted to point to (or past) the end of that token with
either ``Lexer::MeasureTokenLength()`` or ``Lexer::getLocForEndOfToken()``.  For
the rare cases where character-level source range information is needed we use
the ``CharSourceRange`` class.
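
The adjustment described above can be sketched on plain string offsets.  The
helpers ``measureToyToken`` and ``toCharRange`` are hypothetical stand-ins (they
are not the ``Lexer`` methods), and the sketch assumes a toy language whose
tokens are either identifiers or single punctuation characters, with every
offset lying inside the buffer.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <string>
  #include <utility>

  // Length of the token that starts at Offset (identifier or one character).
  std::size_t measureToyToken(const std::string &Src, std::size_t Offset) {
    std::size_t End = Offset;
    while (End < Src.size() &&
           (std::isalnum(static_cast<unsigned char>(Src[End])) ||
            Src[End] == '_'))
      ++End;
    return End == Offset ? 1 : End - Offset;
  }

  // Convert a token range [First, Last], where both offsets point at the
  // start of a token, into a half-open character range [Begin, End).
  std::pair<std::size_t, std::size_t>
  toCharRange(const std::string &Src, std::size_t First, std::size_t Last) {
    return {First, Last + measureToyToken(Src, Last)};
  }

With the statement from the example, ``first`` is offset 0 and ``last`` is
offset 10 (the start of ``bar``), so the character range ends one past the
final ``r``.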

The Driver Library
==================

The clang Driver and library are documented :doc:`here <DriverInternals>`.

Precompiled Headers
===================

Clang supports two implementations of precompiled headers.  The default
implementation, precompiled headers (:doc:`PCH <PCHInternals>`), uses a
serialized representation of Clang's internal data structures, encoded with the
`LLVM bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
Pretokenized headers (:doc:`PTH <PTHInternals>`), on the other hand, contain a
serialized representation of the tokens encountered when preprocessing a header
(and anything that header includes).

The Frontend Library
====================

The Frontend library contains functionality useful for building tools on top of
the Clang libraries, for example several methods for outputting diagnostics.

The Lexer and Preprocessor Library
==================================

The Lexer library contains several tightly-connected classes that are involved
with the nasty process of lexing and preprocessing C source code.  The main
interface to this library for outside clients is the large ``Preprocessor``
class.  It contains the various pieces of state that are required to coherently
read tokens out of a translation unit.

The core interface to the ``Preprocessor`` object (once it is set up) is the
``Preprocessor::Lex`` method, which returns the next :ref:`Token <Token>` from
the preprocessor stream.  There are two types of token providers that the
preprocessor is capable of reading from: a buffer lexer (provided by the
:ref:`Lexer <Lexer>` class) and a buffered token stream (provided by the
:ref:`TokenLexer <TokenLexer>` class).

.. _Token:

The Token class
---------------

The ``Token`` class is used to represent a single lexed token.  Tokens are
intended to be used by the lexer/preprocessor and parser libraries, but are not
intended to live beyond them (for example, they should not live in the ASTs).

Tokens most often live on the stack (or some other location that is efficient
to access) as the parser is running, but occasionally do get buffered up.  For
example, macro definitions are stored as a series of tokens, and the C++
front-end periodically needs to buffer tokens up for tentative parsing and
various pieces of look-ahead.  As such, the size of a ``Token`` matters.  On a
32-bit system, ``sizeof(Token)`` is currently 16 bytes.

Tokens occur in two forms: :ref:`annotation tokens <AnnotationToken>` and
normal tokens.  Normal tokens are those returned by the lexer; annotation
tokens represent semantic information and are produced by the parser, replacing
normal tokens in the token stream.  Normal tokens contain the following
information:

* **A SourceLocation** --- This indicates the location of the start of the
  token.

* **A length** --- This stores the length of the token as stored in the
  ``SourceBuffer``.  For tokens that include them, this length includes
  trigraphs and escaped newlines which are ignored by later phases of the
  compiler.  By pointing into the original source buffer, it is always possible
  to get the original spelling of a token completely accurately.

* **IdentifierInfo** --- If a token takes the form of an identifier, and if
  identifier lookup was enabled when the token was lexed (e.g., the lexer was
  not reading in "raw" mode) this contains a pointer to the unique hash value
  for the identifier.  Because the lookup happens before keyword
  identification, this field is set even for language keywords like "``for``".

* **TokenKind** --- This indicates the kind of token as classified by the
  lexer.  This includes things like ``tok::starequal`` (for the "``*=``"
  operator), ``tok::ampamp`` for the "``&&``" token, and keyword values (e.g.,
  ``tok::kw_for``) for identifiers that correspond to keywords.  Note that
  some tokens can be spelled multiple ways.  For example, C++ supports
  "operator keywords", where things like "``and``" are treated exactly like the
  "``&&``" operator.  In these cases, the kind value is set to ``tok::ampamp``,
  which is good for the parser, which doesn't have to consider both forms.  For
  something that cares about which form is used (e.g., the preprocessor
  "stringize" operator) the spelling indicates the original form.

* **Flags** --- There are currently four flags tracked by the
  lexer/preprocessor system on a per-token basis:

  #. **StartOfLine** --- This was the first token that occurred on its input
     source line.
  #. **LeadingSpace** --- There was a space character either immediately before
     the token or transitively before the token as it was expanded through a
     macro.  This flag is defined very closely by the stringizing requirements
     of the preprocessor.
  #. **DisableExpand** --- This flag is used internally to the preprocessor to
     represent identifier tokens which have macro expansion disabled.  This
     prevents them from being considered as candidates for macro expansion ever
     in the future.
  #. **NeedsCleaning** --- This flag is set if the original spelling for the
     token includes a trigraph or escaped newline.  Since this is uncommon,
     many pieces of code can fast-path on tokens that did not need cleaning.
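
As a rough illustration of how the fields listed above can be packed into a
small value type, consider the sketch below.  ``ToyToken`` and its field widths
are assumptions made for the example and do not reproduce the layout of the
real ``Token`` class; only the four flag names come from the list above.

.. code-block:: c++

  #include <cstdint>

  // Per-token flags stored as bits of one byte.
  enum ToyTokenFlags : std::uint8_t {
    StartOfLine = 0x01,
    LeadingSpace = 0x02,
    DisableExpand = 0x04,
    NeedsCleaning = 0x08
  };

  // Hypothetical layout; field widths are chosen for illustration only.
  struct ToyToken {
    std::uint32_t Loc = 0;       // location of the start of the token
    std::uint32_t Length = 0;    // length as stored in the source buffer
    const void *Ident = nullptr; // identifier information, if any
    std::uint16_t Kind = 0;      // token kind as classified by the lexer
    std::uint8_t Flags = 0;      // bitwise OR of ToyTokenFlags

    void setFlag(ToyTokenFlags F) { Flags |= F; }
    void clearFlag(ToyTokenFlags F) {
      Flags &= static_cast<std::uint8_t>(~F);
    }
    bool hasFlag(ToyTokenFlags F) const { return (Flags & F) != 0; }

    // Fast path: most tokens contain no trigraphs or escaped newlines, so
    // their spelling can be read straight out of the buffer.
    bool spellingIsVerbatim() const { return !hasFlag(NeedsCleaning); }
  };

Keeping the flags in a single byte is one way to stay within a tight size
budget; code that needs the spelling can test ``spellingIsVerbatim()`` first and
only fall back to a slower cleaning routine when it returns false.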

One interesting (and somewhat unusual) aspect of normal tokens is that they
don't contain any semantic information about the lexed value.  For example, if
the token was a pp-number token, we do not represent the value of the number
that was lexed (this is left for later pieces of code to decide).
Additionally, the lexer library has no notion of typedef names vs variable
names: both are returned as identifiers, and the parser is left to decide
whether a specific identifier is a typedef or a variable (tracking this
requires scope information among other things).  The parser can do this
translation by replacing tokens returned by the preprocessor with "Annotation
Tokens".

.. _AnnotationToken:

Annotation Tokens
-----------------

Annotation tokens are tokens that are synthesized by the parser and injected
into the preprocessor's token stream (replacing existing tokens) to record
semantic information found by the parser.  For example, if "``foo``" is found
to be a typedef, the "``foo``" ``tok::identifier`` token is replaced with a
``tok::annot_typename``.  This is useful for a couple of reasons: 1) this makes
it easy to handle qualified type names (e.g., "``foo::bar::baz<42>::t``") in
C++ as a single "token" in the parser.  2) if the parser backtracks, the
reparse does not need to redo semantic analysis to determine whether a token
sequence is a variable, type, template, etc.

Annotation tokens are created by the parser and reinjected into the parser's
token stream (when backtracking is enabled).  Because annotation tokens can
only replace tokens that the preprocessor-proper is done with, they don't need
to keep around flags like "start of line" that the preprocessor uses to do its
job.  Additionally, an annotation token may "cover" a sequence of preprocessor
tokens (e.g., "``a::b::c``" is five preprocessor tokens).  As such, the valid
fields of an annotation token are different than the fields for a normal token
(but they are multiplexed into the normal ``Token`` fields):

* **SourceLocation "Location"** --- The ``SourceLocation`` for the annotation
  token indicates the first token replaced by the annotation token.  In the
  example above, it would be the location of the "``a``" identifier.
* **SourceLocation "AnnotationEndLoc"** --- This holds the location of the last
  token replaced with the annotation token.  In the example above, it would be
  the location of the "``c``" identifier.
* **void* "AnnotationValue"** --- This contains an opaque object that the
  parser gets from ``Sema``.  The parser merely preserves the information for
  ``Sema`` to later interpret based on the annotation token kind.
* **TokenKind "Kind"** --- This indicates the kind of Annotation token this is.
  See below for the different valid kinds.

Annotation tokens currently come in three kinds:

#. **tok::annot_typename**: This annotation token represents a resolved
   typename token that is potentially qualified.  The ``AnnotationValue`` field
   contains the ``QualType`` returned by ``Sema::getTypeName()``, possibly with
   source location information attached.
#. **tok::annot_cxxscope**: This annotation token represents a C++ scope
   specifier, such as "``A::B::``".  This corresponds to the grammar
   productions "*::*" and "*:: [opt] nested-name-specifier*".  The
   ``AnnotationValue`` pointer is a ``NestedNameSpecifier *`` returned by the
   ``Sema::ActOnCXXGlobalScopeSpecifier`` and
   ``Sema::ActOnCXXNestedNameSpecifier`` callbacks.
#. **tok::annot_template_id**: This annotation token represents a C++
   template-id such as "``foo<int, 4>``", where "``foo``" is the name of a
   template.  The ``AnnotationValue`` pointer is a pointer to a ``malloc``'d
   ``TemplateIdAnnotation`` object.  Depending on the context, a parsed
   template-id that names a type might become a typename annotation token (if
   all we care about is the named type, e.g., because it occurs in a type
   specifier) or might remain a template-id token (if we want to retain more
   source location information or produce a new type, e.g., in a declaration of
   a class template specialization).  template-id annotation tokens that refer
   to a type can be "upgraded" to typename annotation tokens by the parser.

As mentioned above, annotation tokens are not returned by the preprocessor;
they are formed on demand by the parser.  This means that the parser has to be
aware of cases where an annotation could occur and form it where appropriate.
This is somewhat similar to how the parser handles Translation Phase 6 of C99:
String Concatenation (see C99 5.1.1.2).  In the case of string concatenation,
the preprocessor just returns distinct ``tok::string_literal`` and
``tok::wide_string_literal`` tokens and the parser eats a sequence of them
wherever the grammar indicates that a string literal can occur.

In order to do this, whenever the parser expects a ``tok::identifier`` or
``tok::coloncolon``, it should call the ``TryAnnotateTypeOrScopeToken`` or
``TryAnnotateCXXScopeToken`` methods to form the annotation token.  These
methods will maximally form the specified annotation tokens and replace the
current token with them, if applicable.  If the current token is not valid for
an annotation token, it will remain an identifier or "``::``" token.
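
The "maximally form and replace" behavior can be sketched on a plain vector of
tokens.  Everything in the block is hypothetical (``ToyKind``, ``ToyTok`` and
``tryAnnotateToyScope`` are invented for the example), and the sketch decides
purely on token shape, whereas the real parser consults ``Sema`` to find out
what each name refers to.

.. code-block:: c++

  #include <cstddef>
  #include <vector>

  enum class ToyKind { Identifier, ColonColon, AnnotScope, Other };

  struct ToyTok {
    ToyKind Kind;
    unsigned Loc;    // location of the (first covered) token
    unsigned EndLoc; // for annotations: location of the last covered token
  };

  // If a scope specifier such as "a::b::" starts at Pos, replace its tokens
  // with one AnnotScope token and return true.  Otherwise leave Toks alone.
  bool tryAnnotateToyScope(std::vector<ToyTok> &Toks, std::size_t Pos) {
    std::size_t I = Pos;
    // Optional leading "::".
    if (I < Toks.size() && Toks[I].Kind == ToyKind::ColonColon)
      ++I;
    // Greedily take as many "identifier ::" pairs as possible.
    while (I + 1 < Toks.size() && Toks[I].Kind == ToyKind::Identifier &&
           Toks[I + 1].Kind == ToyKind::ColonColon)
      I += 2;
    if (I == Pos)
      return false; // nothing to annotate; the token stays as it was
    ToyTok Annot{ToyKind::AnnotScope, Toks[Pos].Loc, Toks[I - 1].Loc};
    Toks.erase(Toks.begin() + Pos, Toks.begin() + I);
    Toks.insert(Toks.begin() + Pos, Annot);
    return true;
  }

Applied to the five tokens of "``a::b::c``", the helper collapses "``a::b::``"
into one annotation whose start and end locations are those of "``a``" and the
second "``::``", and leaves the trailing identifier in place.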

.. _Lexer:

The ``Lexer`` class
-------------------

The ``Lexer`` class provides the mechanics of lexing tokens out of a source
buffer and deciding what they mean.  The ``Lexer`` is complicated by the fact
that it operates on raw buffers that have not had spelling eliminated (this is
a necessity to get decent performance), but this is countered with careful
coding as well as standard performance techniques (for example, the comment
handling code is vectorized on X86 and PowerPC hosts).

The lexer has a couple of interesting modal features:

* The lexer can operate in "raw" mode.  This mode has several features that
  make it possible to quickly lex the file (e.g., it stops identifier lookup,
  doesn't specially handle preprocessor tokens, handles EOF differently, etc).
  This mode is used for lexing within an "``#if 0``" block, for example.
* The lexer can capture and return comments as tokens.  This is required to
  support the ``-C`` preprocessor mode, which passes comments through, and is
  used by the diagnostic checker to identify expect-error annotations.
* The lexer can be in ``ParsingFilename`` mode, which happens when
  preprocessing after reading a ``#include`` directive.  This mode changes the
  parsing of "``<``" to return an "angled string" instead of a bunch of tokens
  for each thing within the filename.
* When parsing a preprocessor directive (after "``#``") the
  ``ParsingPreprocessorDirective`` mode is entered.  This changes the lexer to
  return EOD at a newline.
* The ``Lexer`` uses a ``LangOptions`` object to know whether trigraphs are
  enabled, whether C++ or ObjC keywords are recognized, etc.
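
To make the notion of a lexer mode concrete, here is a deliberately tiny
sketch of the directive mode.  ``lexToyBuffer`` is a hypothetical function, not
part of Clang: it splits on whitespace, treats any token beginning with
"``#``" as the start of a directive, and emits a literal ``<eod>`` marker at
the newline that ends the directive.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <string>
  #include <vector>

  std::vector<std::string> lexToyBuffer(const std::string &Src) {
    std::vector<std::string> Out;
    bool InDirective = false; // the one piece of modal state in this sketch
    std::size_t I = 0;
    while (I < Src.size()) {
      char C = Src[I];
      if (C == '\n') {
        // A newline is significant only while a directive is being lexed.
        if (InDirective) {
          Out.push_back("<eod>");
          InDirective = false;
        }
        ++I;
      } else if (std::isspace(static_cast<unsigned char>(C))) {
        ++I;
      } else {
        std::size_t Begin = I;
        while (I < Src.size() &&
               !std::isspace(static_cast<unsigned char>(Src[I])))
          ++I;
        Out.push_back(Src.substr(Begin, I - Begin));
        if (Out.back()[0] == '#')
          InDirective = true;
      }
    }
    if (InDirective)
      Out.push_back("<eod>"); // directive ended by end of buffer
    return Out;
  }

Outside a directive the same newline character is ordinary whitespace, which is
the essence of a modal lexer: identical input is tokenized differently
depending on state set by earlier tokens.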

In addition to these modes, the lexer keeps track of a couple of other features
that are local to a lexed buffer, which change as the buffer is lexed:

* The ``Lexer`` uses ``BufferPtr`` to keep track of the current character being
  lexed.
* The ``Lexer`` uses ``IsAtStartOfLine`` to keep track of whether the next
  lexed token will start with its "start of line" bit set.
* The ``Lexer`` keeps track of the current "``#if``" directives that are active
  (which can be nested).
* The ``Lexer`` keeps track of a :ref:`MultipleIncludeOpt
  <MultipleIncludeOpt>` object, which is used to detect whether the buffer uses
  the standard "``#ifndef XX`` / ``#define XX``" idiom to prevent multiple
  inclusion.  If a buffer does, subsequent includes can be ignored if the
  "``XX``" macro is defined.

.. _TokenLexer:

The ``TokenLexer`` class
------------------------

The ``TokenLexer`` class is a token provider that returns tokens from a list of
tokens that came from somewhere else.  It is typically used for two things: 1)
returning tokens from a macro definition as it is being expanded 2) returning
tokens from an arbitrary buffer of tokens.  The latter use is employed by
``_Pragma`` and will most likely be used to handle unbounded look-ahead for the
C++ parser.
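
The relationship between the preprocessor and its token providers can be
sketched as a stack of sources.  The names ``ToyProvider``, ``ToyTokenList`` and
``ToyPreprocessor`` are hypothetical, and the sketch assumes that the most
recently entered provider is drained before the one beneath it resumes, as
happens when a macro body is replayed in the middle of a file.

.. code-block:: c++

  #include <cstddef>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  struct ToyStreamToken {
    std::string Text;
    bool IsEOF;
  };

  // Anything that can hand out tokens one at a time.
  class ToyProvider {
  public:
    virtual ~ToyProvider() = default;
    virtual ToyStreamToken next() = 0;
  };

  // Replays a pre-built list of tokens (for example, a macro body).
  class ToyTokenList : public ToyProvider {
  public:
    explicit ToyTokenList(std::vector<std::string> Toks)
        : Toks(std::move(Toks)) {}
    ToyStreamToken next() override {
      if (Pos == Toks.size())
        return {"", true};
      return {Toks[Pos++], false};
    }

  private:
    std::vector<std::string> Toks;
    std::size_t Pos = 0;
  };

  // Reads from the innermost provider and pops it once it is exhausted.
  class ToyPreprocessor {
  public:
    void enter(std::unique_ptr<ToyProvider> P) {
      Stack.push_back(std::move(P));
    }
    ToyStreamToken lex() {
      while (!Stack.empty()) {
        ToyStreamToken T = Stack.back()->next();
        if (!T.IsEOF)
          return T;
        Stack.pop_back();
      }
      return {"", true};
    }

  private:
    std::vector<std::unique_ptr<ToyProvider>> Stack;
  };

A provider that lexes characters out of a buffer would be a second subclass of
``ToyProvider``; callers of ``lex()`` cannot tell which kind produced a token.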

.. _MultipleIncludeOpt:

The ``MultipleIncludeOpt`` class
--------------------------------

The ``MultipleIncludeOpt`` class implements a really simple little state
machine that is used to detect the standard "``#ifndef XX`` / ``#define XX``"
idiom that people typically use to prevent multiple inclusion of headers.  If a
buffer uses this idiom and is subsequently ``#include``'d, the preprocessor can
simply check to see whether the guarding condition is defined or not.  If so,
the preprocessor can completely ignore the include of the header.
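
A state machine of this kind might look like the sketch below.  It is far
simpler than the real class and all names in it are hypothetical: the caller is
assumed to report only events that occur at conditional nesting depth zero, and
the sketch does not check that a matching ``#define`` follows the ``#ifndef``.

.. code-block:: c++

  #include <set>
  #include <string>

  class ToyIncludeGuardDetector {
  public:
    // "#ifndef Macro" seen at nesting depth 0.
    void enterTopLevelIfndef(const std::string &Macro) {
      if (State == Start) {
        Guard = Macro;
        State = InGuard;
      } else {
        State = Failed; // a second top-level conditional breaks the idiom
      }
    }
    // The "#endif" that closes a depth-0 conditional was seen.
    void exitTopLevelConditional() {
      State = (State == InGuard) ? AfterGuard : Failed;
    }
    // Any other token or directive seen at nesting depth 0.
    void sawTopLevelToken() { State = Failed; }
    // Name of the guard macro if the whole file was wrapped, else "".
    std::string guardMacro() const {
      return State == AfterGuard ? Guard : std::string();
    }

  private:
    enum { Start, InGuard, AfterGuard, Failed } State = Start;
    std::string Guard;
  };

  // A later #include of the same file can be skipped when the file was
  // fully guarded and the guard macro is currently defined.
  bool canSkipReinclude(const ToyIncludeGuardDetector &D,
                        const std::set<std::string> &DefinedMacros) {
    std::string G = D.guardMacro();
    return !G.empty() && DefinedMacros.count(G) != 0;
  }

Any stray token before the ``#ifndef`` or after the closing ``#endif`` moves the
machine to its failed state, so the optimization only applies when the guard
covers the entire file.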

.. _Parser:

The Parser Library
==================

This library contains a recursive-descent parser that polls tokens from the
preprocessor and notifies a client of the parsing progress.

Historically, the parser used to talk to an abstract ``Action`` interface that
had virtual methods for parse events, for example ``ActOnBinOp()``.  When Clang
grew C++ support, the parser stopped supporting general ``Action`` clients --
it now always talks to the :ref:`Sema library <Sema>`.  However, the Parser
still accesses AST objects only through opaque types like ``ExprResult`` and
``StmtResult``.  Only :ref:`Sema <Sema>` looks at the AST node contents of these
wrappers.

.. _AST:

The AST Library
===============

.. _Type:

The ``Type`` class and its subclasses
-------------------------------------

The ``Type`` class (and its subclasses) are an important part of the AST.
Types are accessed through the ``ASTContext`` class, which implicitly creates
and uniques them as they are needed.  Types have a couple of non-obvious
features: 1) they do not capture type qualifiers like ``const`` or ``volatile``
(see :ref:`QualType <QualType>`), and 2) they implicitly capture typedef
information.  Once created, types are immutable (unlike decls).

Typedefs in C make semantic analysis a bit more complex than it would be without
them.  The issue is that we want to capture typedef information and represent it
in the AST perfectly, but the semantics of operations need to "see through"
typedefs.  For example, consider this code:

.. code-block:: c++

  void func() {
    typedef int foo;
    foo X, *Y;
    typedef foo *bar;
    bar Z;
    *X; // error
    **Y; // error
    **Z; // error
  }

The code above is illegal, and thus we expect there to be diagnostics emitted
on the annotated lines.  In this example, we expect to get:

.. code-block:: text

  test.c:6:1: error: indirection requires pointer operand ('foo' invalid)
    *X; // error
    ^~
  test.c:7:1: error: indirection requires pointer operand ('foo' invalid)
    **Y; // error
    ^~~
  test.c:8:1: error: indirection requires pointer operand ('foo' invalid)
    **Z; // error
    ^~~

While this example is somewhat silly, it illustrates the point: we want to
retain typedef information where possible, so that we can emit errors about
"``std::string``" instead of "``std::basic_string<char, std:...``".  Doing this
requires properly keeping typedef information (for example, the type of ``X``
is "``foo``", not "``int``"), and requires properly propagating it through the
various operators (for example, the type of ``*Y`` is "``foo``", not
"``int``").  In order to retain this information, the type of these expressions
is an instance of the ``TypedefType`` class, which indicates that the type of
these expressions is a typedef for "``foo``".

Representing types like this is great for diagnostics, because the
user-specified type is always immediately available.  There are two problems
with this: first, various semantic checks need to make judgements about the
*actual structure* of a type, ignoring typedefs.  Second, we need an efficient
way to query whether two types are structurally identical to each other,
ignoring typedefs.  The solution to both of these problems is the idea of
canonical types.

Canonical Types
^^^^^^^^^^^^^^^

Every instance of the ``Type`` class contains a canonical type pointer.  For
simple types with no typedefs involved (e.g., "``int``", "``int*``",
"``int**``"), the type just points to itself.  For types that have a typedef
somewhere in their structure (e.g., "``foo``", "``foo*``", "``foo**``",
"``bar``"), the canonical type pointer points to their structurally equivalent
type without any typedefs (e.g., "``int``", "``int*``", "``int**``", and
"``int*``" respectively).

This design provides a constant time operation (dereferencing the canonical type
pointer) that gives us access to the structure of types.  For example, we can
trivially tell that "``bar``" and "``foo*``" are the same type by dereferencing
their canonical type pointers and doing a pointer comparison (they both point
to the single "``int*``" type).
886f4a2713aSLionel Sambuc
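As a rough illustration of the pointer comparison described above, the
following self-contained sketch models a few of these types with a
hypothetical ``ToyType`` struct.  It is not the real ``Type`` class, and the
names ``ToyTypes`` and ``isSameType`` are invented for the example; it assumes
the declarations ``typedef int foo;`` and ``typedef foo *bar;``.

.. code-block:: c++

  #include <cassert>

  // Hypothetical stand-in for clang::Type; only what the example needs.
  struct ToyType {
    enum Kind { Builtin, Pointer, Typedef };
    Kind kind;
    const ToyType *inner;      // pointee (Pointer) or aliased type (Typedef)
    const ToyType *canonical;  // typedef-free equivalent; self for simple types
  };

  // A few of the types from the text, wired up by hand.
  struct ToyTypes {
    ToyType Int{ToyType::Builtin, nullptr, nullptr};     // int
    ToyType IntPtr{ToyType::Pointer, &Int, nullptr};     // int*
    ToyType Foo{ToyType::Typedef, &Int, nullptr};        // typedef int foo;
    ToyType FooPtr{ToyType::Pointer, &Foo, nullptr};     // foo*
    ToyType Bar{ToyType::Typedef, &FooPtr, nullptr};     // typedef foo *bar;

    ToyTypes() {
      Int.canonical = &Int;        // simple types are their own canonical type
      IntPtr.canonical = &IntPtr;
      Foo.canonical = &Int;
      FooPtr.canonical = &IntPtr;
      Bar.canonical = &IntPtr;
    }
    ToyTypes(const ToyTypes &) = delete;  // the members point at each other
  };

  // Constant-time structural equality: one pointer comparison.
  inline bool isSameType(const ToyType &a, const ToyType &b) {
    return a.canonical == b.canonical;
  }

With a ``ToyTypes`` instance ``T``, ``isSameType(T.Bar, T.FooPtr)`` holds
because both canonical pointers refer to the single ``IntPtr`` node.
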
Canonical types and typedef types bring up some complexities that must be
carefully managed.  Specifically, the ``isa``/``cast``/``dyn_cast`` operators
generally shouldn't be used in code that is inspecting the AST.  For example,
when type checking the indirection operator (unary "``*``" on a pointer), the
type checker must verify that the operand has a pointer type.  It would not be
correct to check that with "``isa<PointerType>(SubExpr->getType())``", because
this predicate would fail if the subexpression had a typedef type.

The solution to this problem is a set of helper methods on ``Type``, used to
check its properties.  In this case, it would be correct to use
"``SubExpr->getType()->isPointerType()``" to do the check.  This predicate will
return true if the *canonical type is a pointer*, which is true any time the
type is structurally a pointer type.  The only hard part here is remembering
not to use the ``isa``/``cast``/``dyn_cast`` operations.

The second problem we face is how to get access to the pointer type once we
know it exists.  To continue the example, the result type of the indirection
operator is the pointee type of the subexpression.  In order to determine the
type, we need to get the instance of ``PointerType`` that best captures the
typedef information in the program.  If the type of the expression is literally
a ``PointerType``, we can return that; otherwise we have to dig through the
typedefs to find the pointer type.  For example, if the subexpression had type
"``foo*``", we could return that type as the result.  If the subexpression had
type "``bar``", we want to return "``foo*``" (note that we do *not* want
"``int*``").  In order to provide all of this, ``Type`` has a
``getAsPointerType()`` method that checks whether the type is structurally a
``PointerType`` and, if so, returns the best one.  If not, it returns a null
pointer.

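The difference between the three queries can be sketched on a toy model.  The
``ToyType`` struct below is hypothetical and much simpler than the real
``Type`` hierarchy; ``isLiterallyPointer`` stands in for the ``isa`` check, and
the other two methods only imitate the behavior described above, assuming
``typedef int foo;`` and ``typedef foo *bar;``.

.. code-block:: c++

  #include <cassert>

  // Hypothetical miniature of the type hierarchy; not clang::Type.
  struct ToyType {
    enum Kind { Builtin, Pointer, Typedef };
    Kind kind;
    const ToyType *inner;      // pointee (Pointer) or aliased type (Typedef)
    const ToyType *canonical;  // typedef-free equivalent; self for simple types

    // Like isa<PointerType>(T): looks only at the outermost node.
    bool isLiterallyPointer() const { return kind == Pointer; }

    // Like isPointerType(): asks the canonical type instead.
    bool isPointerType() const { return canonical->kind == Pointer; }

    // Like getAsPointerType(): peel typedefs one level at a time and return
    // the first pointer node found, so typedefs in the pointee survive.
    const ToyType *getAsPointerType() const {
      if (!isPointerType())
        return nullptr;
      const ToyType *t = this;
      while (t->kind == Typedef)
        t = t->inner;
      return t;
    }
  };

  struct ToyTypes {
    ToyType Int{ToyType::Builtin, nullptr, nullptr};     // int
    ToyType IntPtr{ToyType::Pointer, &Int, nullptr};     // int*
    ToyType Foo{ToyType::Typedef, &Int, nullptr};        // typedef int foo;
    ToyType FooPtr{ToyType::Pointer, &Foo, nullptr};     // foo*
    ToyType Bar{ToyType::Typedef, &FooPtr, nullptr};     // typedef foo *bar;

    ToyTypes() {
      Int.canonical = &Int;
      IntPtr.canonical = &IntPtr;
      Foo.canonical = &Int;
      FooPtr.canonical = &IntPtr;
      Bar.canonical = &IntPtr;
    }
    ToyTypes(const ToyTypes &) = delete;  // the members point at each other
  };

For ``bar`` the literal check fails, the canonical check succeeds, and the
desugaring walk returns the ``foo*`` node rather than ``int*``.
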
This structure is somewhat mystical, but after meditating on it, it will make
sense to you :).

.. _QualType:

The ``QualType`` class
----------------------

The ``QualType`` class is designed as a trivial value class that is small,
passed by value, and efficient to query.  The idea of ``QualType`` is that it
stores the type qualifiers (``const``, ``volatile``, ``restrict``, plus some
extended qualifiers required by language extensions) separately from the types
themselves.  ``QualType`` is conceptually a pair of "``Type*``" and the bits
for these type qualifiers.

By storing the type qualifiers as bits in the conceptual pair, it is extremely
efficient to get the set of qualifiers on a ``QualType`` (just return the field
of the pair), add a type qualifier (which is a trivial constant-time operation
that sets a bit), and remove one or more type qualifiers (just return a
``QualType`` with the bitfield set to empty).

Further, because the bits are stored outside of the type itself, we do not need
to create duplicates of types with different sets of qualifiers (i.e., there is
only a single heap-allocated "``int``" type: "``const int``" and "``volatile
const int``" both point to the same heap-allocated "``int``" type).  This
reduces the heap size used to represent types and also means we do not have to
consider qualifiers when uniquing types (:ref:`Type <Type>` does not even
contain qualifiers).

In practice, the two most common type qualifiers (``const`` and ``restrict``)
are stored in the low bits of the pointer to the ``Type`` object, together with
a flag indicating whether extended qualifiers are present (which must be
heap-allocated).  This means that ``QualType`` is exactly the same size as a
pointer.

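The low-bit packing can be sketched as follows.  This is a simplified,
hypothetical ``ToyQualType``, not Clang's implementation: the bit assignments,
the 8-byte alignment, and the method set are assumptions chosen to keep the
example small, and the extended-qualifier flag is reserved but unused.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Over-aligning the type object guarantees its address has three zero bits.
  struct alignas(8) ToyType { int id; };

  class ToyQualType {
    std::uintptr_t value_;  // Type* in the high bits, flags in the low bits

  public:
    enum : std::uintptr_t {
      Const = 0x1,
      Restrict = 0x2,
      HasExtQuals = 0x4,  // reserved here; extended qualifiers are not modeled
      Mask = 0x7
    };

    explicit ToyQualType(const ToyType *t, std::uintptr_t quals = 0)
        : value_(reinterpret_cast<std::uintptr_t>(t) | quals) {}

    const ToyType *getTypePtr() const {
      return reinterpret_cast<const ToyType *>(value_ & ~std::uintptr_t(Mask));
    }
    std::uintptr_t getQualifiers() const { return value_ & Mask; }
    bool isConstQualified() const { return (value_ & Const) != 0; }

    // Adding a qualifier sets one bit; the ToyType object is shared.
    ToyQualType withConst() const {
      return ToyQualType(getTypePtr(), getQualifiers() | Const);
    }
    // Removing all qualifiers clears the low bits.
    ToyQualType getUnqualifiedType() const { return ToyQualType(getTypePtr()); }
  };

  static_assert(sizeof(ToyQualType) == sizeof(void *), "same size as a pointer");

Both the qualified and the unqualified value refer to the same ``ToyType``
object, so no second type node is ever allocated for ``const int``.
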
.. _DeclarationName:

Declaration names
-----------------

The ``DeclarationName`` class represents the name of a declaration in Clang.
Declarations in the C family of languages can take several different forms.
Most declarations are named by simple identifiers, e.g., "``f``" and "``x``" in
the function declaration ``f(int x)``.  In C++, declaration names can also name
class constructors ("``Class``" in ``struct Class { Class(); }``), class
destructors ("``~Class``"), overloaded operator names ("``operator+``"), and
conversion functions ("``operator void const *``").  In Objective-C,
declaration names can refer to the names of Objective-C methods, which involve
the method name and the parameters, collectively called a *selector*, e.g.,
"``setWidth:height:``".  Since all of these kinds of entities --- variables,
functions, Objective-C methods, C++ constructors, destructors, and operators
--- are represented as subclasses of Clang's common ``NamedDecl`` class,
``DeclarationName`` is designed to efficiently represent any kind of name.

Given a ``DeclarationName`` ``N``, ``N.getNameKind()`` will produce a value
that describes what kind of name ``N`` stores.  There are 10 options (all of
the names are inside the ``DeclarationName`` class).

``Identifier``

  The name is a simple identifier.  Use ``N.getAsIdentifierInfo()`` to retrieve
  the corresponding ``IdentifierInfo*`` pointing to the actual identifier.

``ObjCZeroArgSelector``, ``ObjCOneArgSelector``, ``ObjCMultiArgSelector``

  The name is an Objective-C selector, which can be retrieved as a ``Selector``
  instance via ``N.getObjCSelector()``.  The three possible name kinds for
  Objective-C reflect an optimization within the ``DeclarationName`` class:
  both zero- and one-argument selectors are stored as a masked
  ``IdentifierInfo`` pointer, and therefore require very little space, since
  zero- and one-argument selectors are far more common than multi-argument
  selectors (which use a different structure).

``CXXConstructorName``

  The name is a C++ constructor name.  Use ``N.getCXXNameType()`` to retrieve
  the :ref:`type <QualType>` that this constructor is meant to construct.  The
  type is always the canonical type, since all constructors for a given type
  have the same name.

``CXXDestructorName``

  The name is a C++ destructor name.  Use ``N.getCXXNameType()`` to retrieve
  the :ref:`type <QualType>` whose destructor is being named.  This type is
  always a canonical type.

``CXXConversionFunctionName``

  The name is a C++ conversion function.  Conversion functions are named
  according to the type they convert to, e.g., "``operator void const *``".
  Use ``N.getCXXNameType()`` to retrieve the type that this conversion function
  converts to.  This type is always a canonical type.

``CXXOperatorName``

  The name is a C++ overloaded operator name.  Overloaded operators are named
  according to their spelling, e.g., "``operator+``" or "``operator new []``".
  Use ``N.getCXXOverloadedOperator()`` to retrieve the overloaded operator (a
  value of type ``OverloadedOperatorKind``).

``CXXLiteralOperatorName``

  The name is a C++11 user-defined literal operator.  User-defined literal
  operators are named according to the suffix they define, e.g., "``_foo``"
  for "``operator "" _foo``".  Use ``N.getCXXLiteralIdentifier()`` to retrieve
  the corresponding ``IdentifierInfo*`` pointing to the identifier.

``CXXUsingDirective``

  The name is a C++ using directive.  Using directives are not really
  ``NamedDecl``\ s, in that they all have the same name, but they are
  implemented as such in order to store them in ``DeclContext``
  effectively.

``DeclarationName``\ s are cheap to create, copy, and compare.  They require
only a single pointer's worth of storage in the common cases (identifiers,
zero- and one-argument Objective-C selectors) and use dense, uniqued storage
for the other kinds of names.  Two ``DeclarationName``\ s can be compared for
equality (``==``, ``!=``) using a simple bitwise comparison, can be ordered
with ``<``, ``>``, ``<=``, and ``>=`` (which provide a lexicographical ordering
for normal identifiers but an unspecified ordering for other kinds of names),
and can be placed into LLVM ``DenseMap``\ s and ``DenseSet``\ s.

``DeclarationName`` instances can be created in different ways depending on
what kind of name the instance will store.  Normal identifiers
(``IdentifierInfo`` pointers) and Objective-C selectors (``Selector``) can be
implicitly converted to ``DeclarationName``\ s.  Names for C++ constructors,
destructors, conversion functions, and overloaded operators can be retrieved
from the ``DeclarationNameTable``, an instance of which is available as
``ASTContext::DeclarationNames``.  The member functions
``getCXXConstructorName``, ``getCXXDestructorName``,
``getCXXConversionFunctionName``, and ``getCXXOperatorName``, respectively,
return ``DeclarationName`` instances for the four kinds of C++ special function
names.

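To make the "single pointer's worth of storage" and "bitwise comparison"
points concrete, here is a small sketch of a tagged-pointer name.  Everything
in it (``ToyIdentifierTable``, ``ToyDeclarationName``, ``getSpelling``, the tag
values) is hypothetical and covers only identifiers and short selectors; it is
meant to show the idea, not how ``DeclarationName`` is actually laid out.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <set>
  #include <string>

  // Hypothetical identifier table.  std::set never relocates its elements, so
  // each distinct spelling has exactly one stable address.
  class ToyIdentifierTable {
    std::set<std::string> names_;

  public:
    const std::string *get(const std::string &spelling) {
      return &*names_.insert(spelling).first;
    }
  };

  static_assert(alignof(std::string) >= 4, "need two free low bits for the tag");

  class ToyDeclarationName {
    std::uintptr_t bits_;  // identifier address with the kind in the low bits

  public:
    enum NameKind : std::uintptr_t {
      Identifier = 0,
      ObjCZeroArgSelector = 1,
      ObjCOneArgSelector = 2
    };

    ToyDeclarationName(const std::string *id, NameKind kind = Identifier)
        : bits_(reinterpret_cast<std::uintptr_t>(id) | kind) {}

    NameKind getNameKind() const { return static_cast<NameKind>(bits_ & 3); }
    const std::string *getSpelling() const {
      return reinterpret_cast<const std::string *>(bits_ & ~std::uintptr_t(3));
    }

    // Equality is a plain comparison of the stored word.
    friend bool operator==(ToyDeclarationName a, ToyDeclarationName b) {
      return a.bits_ == b.bits_;
    }
    friend bool operator!=(ToyDeclarationName a, ToyDeclarationName b) {
      return !(a == b);
    }
  };

  static_assert(sizeof(ToyDeclarationName) == sizeof(void *), "one pointer wide");

Because identifiers are uniqued by the table, two names with the same spelling
and kind hold the same word, while a one-argument selector built from the same
identifier differs only in its tag bits.
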
.. _DeclContext:

Declaration contexts
--------------------

Every declaration in a program exists within some *declaration context*, such
as a translation unit, namespace, class, or function.  Declaration contexts in
Clang are represented by the ``DeclContext`` class, from which the various
declaration-context AST nodes (``TranslationUnitDecl``, ``NamespaceDecl``,
``RecordDecl``, ``FunctionDecl``, etc.) derive.  The ``DeclContext`` class
provides several facilities common to each declaration context:

Source-centric vs. Semantics-centric View of Declarations

  ``DeclContext`` provides two views of the declarations stored within a
  declaration context.  The source-centric view accurately represents the
  program source code as written, including multiple declarations of entities
  where present (see the section :ref:`Redeclarations and Overloads
  <Redeclarations>`), while the semantics-centric view represents the program
  semantics.  The two views are kept synchronized by semantic analysis while
  the ASTs are being constructed.

Storage of declarations within that context

  Every declaration context can contain some number of declarations.  For
  example, a C++ class (represented by ``RecordDecl``) contains various member
  functions, fields, nested types, and so on.  All of these declarations will
  be stored within the ``DeclContext``, and one can iterate over the
  declarations via [``DeclContext::decls_begin()``,
  ``DeclContext::decls_end()``).  This mechanism provides the source-centric
  view of declarations in the context.

Lookup of declarations within that context

  The ``DeclContext`` structure provides efficient name lookup for names within
  that declaration context.  For example, if ``N`` is a namespace we can look
  for the name ``N::f`` using ``DeclContext::lookup``.  The lookup itself is
  based on a lazily-constructed array (for declaration contexts with a small
  number of declarations) or hash table (for declaration contexts with more
  declarations).  The lookup operation provides the semantics-centric view of
  the declarations in the context.

Ownership of declarations

  The ``DeclContext`` owns all of the declarations that were declared within
  its declaration context, and is responsible for the management of their
  memory as well as their (de-)serialization.

All declarations are stored within a declaration context, and one can query
information about the context in which each declaration lives.  One can
retrieve the ``DeclContext`` that contains a particular ``Decl`` using
``Decl::getDeclContext``.  However, see the section
:ref:`LexicalAndSemanticContexts` for more information about how to interpret
this context information.

.. _Redeclarations:

Redeclarations and Overloads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Within a translation unit, it is common for an entity to be declared several
times.  For example, we might declare a function "``f``" and then later
re-declare it as part of an inlined definition:

.. code-block:: c++

  void f(int x, int y, int z = 1);

  inline void f(int x, int y, int z) { /* ...  */ }

The representation of "``f``" differs in the source-centric and
semantics-centric views of a declaration context.  In the source-centric view,
all redeclarations will be present, in the order they occurred in the source
code, making this view suitable for clients that wish to see the structure of
the source code.  In the semantics-centric view, only the most recent "``f``"
will be found by the lookup, since it effectively replaces the first
declaration of "``f``".

In the semantics-centric view, overloading of functions is represented
explicitly.  For example, given two declarations of a function "``g``" that are
overloaded, e.g.,

.. code-block:: c++

  void g();
  void g(int);

the ``DeclContext::lookup`` operation will return a
``DeclContext::lookup_result`` that contains a range of iterators over
declarations of "``g``".  Clients that perform semantic analysis on a program
and are not concerned with the actual source code will primarily use this
semantics-centric view.

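The two views can be modeled in a few lines.  The sketch below is a toy, with
hypothetical ``ToyDecl`` and ``ToyDeclContext`` types and a string standing in
for the function type; it only mirrors the behavior described above, where a
redeclaration replaces the earlier entry in lookup and an overload is added
alongside it.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  struct ToyDecl {
    std::string name;
    std::string signature;  // stands in for the function type
  };

  class ToyDeclContext {
    std::vector<const ToyDecl *> decls_;                          // source order
    std::map<std::string, std::vector<const ToyDecl *>> lookup_;  // by name

  public:
    void addDecl(const ToyDecl *d) {
      decls_.push_back(d);  // the source-centric view keeps every declaration
      std::vector<const ToyDecl *> &found = lookup_[d->name];
      for (const ToyDecl *&prev : found) {
        if (prev->signature == d->signature) {
          prev = d;  // redeclaration: the newest one replaces the older one
          return;
        }
      }
      found.push_back(d);  // different signature: a new overload
    }

    // Source-centric view: everything, in the order it was written.
    const std::vector<const ToyDecl *> &decls() const { return decls_; }

    // Semantics-centric view: the current declarations for one name.
    std::vector<const ToyDecl *> lookup(const std::string &name) const {
      auto it = lookup_.find(name);
      return it == lookup_.end() ? std::vector<const ToyDecl *>() : it->second;
    }
  };

Adding the two declarations of ``f`` and the two overloads of ``g`` from the
examples yields four entries in ``decls()``, one result for ``lookup("f")``,
and two results for ``lookup("g")``.
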
.. _LexicalAndSemanticContexts:

Lexical and Semantic Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each declaration has two potentially different declaration contexts: a
*lexical* context, which corresponds to the source-centric view of the
declaration context, and a *semantic* context, which corresponds to the
semantics-centric view.  The lexical context is accessible via
``Decl::getLexicalDeclContext`` while the semantic context is accessible via
``Decl::getDeclContext``, both of which return ``DeclContext`` pointers.  For
most declarations, the two contexts are identical.  For example:

.. code-block:: c++

  class X {
  public:
    void f(int x);
  };

Here, the semantic and lexical contexts of ``X::f`` are the ``DeclContext``
associated with the class ``X`` (itself stored as a ``RecordDecl`` AST node).
However, we can now define ``X::f`` out-of-line:

.. code-block:: c++

  void X::f(int x = 17) { /* ...  */ }

This definition of "``f``" has different lexical and semantic contexts.  The
lexical context corresponds to the declaration context in which the actual
declaration occurred in the source code, e.g., the translation unit containing
``X``.  Thus, this declaration of ``X::f`` can be found by traversing the
declarations provided by [``decls_begin()``, ``decls_end()``) in the
translation unit.

The semantic context of ``X::f`` corresponds to the class ``X``, since this
member function is (semantically) a member of ``X``.  Lookup of the name ``f``
into the ``DeclContext`` associated with ``X`` will then return the definition
of ``X::f`` (including information about the default argument).

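A toy model may help keep the two contexts apart.  In the sketch below the
types and the ``addDecl`` helper are hypothetical; the fields ``semanticDC``
and ``lexicalDC`` play the roles of ``Decl::getDeclContext`` and
``Decl::getLexicalDeclContext``, and each declaration is listed in its lexical
context but entered into the lookup table of its semantic context.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  struct ToyDeclContext;

  struct ToyDecl {
    std::string name;
    ToyDeclContext *semanticDC;  // where name lookup finds the declaration
    ToyDeclContext *lexicalDC;   // where the declaration was written
  };

  struct ToyDeclContext {
    std::string name;
    std::vector<ToyDecl *> lexicalDecls;  // what a decls_begin/decls_end walk sees
    std::vector<ToyDecl *> lookupTable;   // what lookup searches

    ToyDecl *lookup(const std::string &n) const {
      // Search newest-first so that a later redeclaration wins.
      for (auto it = lookupTable.rbegin(); it != lookupTable.rend(); ++it)
        if ((*it)->name == n)
          return *it;
      return nullptr;
    }
  };

  // Register a declaration in both views.
  inline void addDecl(ToyDecl &d) {
    d.lexicalDC->lexicalDecls.push_back(&d);
    d.semanticDC->lookupTable.push_back(&d);
  }

For the out-of-line definition of ``X::f``, ``lexicalDC`` would be the
translation unit and ``semanticDC`` would be ``X``, so walking the translation
unit finds the definition while looking up ``f`` in the translation unit does
not.
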
Transparent Declaration Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In C and C++, there are several contexts in which names that are logically
declared inside another declaration will actually "leak" out into the enclosing
scope from the perspective of name lookup.  The most obvious instance of this
behavior is in enumeration types, e.g.,

.. code-block:: c++

  enum Color {
    Red,
    Green,
    Blue
  };

Here, ``Color`` is an enumeration, which is a declaration context that contains
the enumerators ``Red``, ``Green``, and ``Blue``.  Thus, traversing the list of
declarations contained in the enumeration ``Color`` will yield ``Red``,
``Green``, and ``Blue``.  However, outside of the scope of ``Color`` one can
name the enumerator ``Red`` without qualifying the name, e.g.,

.. code-block:: c++

  Color c = Red;

There are other entities in C++ that provide similar behavior.  For example,
linkage specifications that use curly braces:

.. code-block:: c++

  extern "C" {
    void f(int);
    void g(int);
  }
  // f and g are visible here

For source-level accuracy, we treat each linkage specification and enumeration
type as a declaration context in which its enclosed declarations ("``f``" and
"``g``"; "``Red``", "``Green``", and "``Blue``") are declared.  However, these
declarations are visible outside of the scope of the declaration context.

These language features (and several others, described below) have roughly the
same set of requirements: declarations are declared within a particular lexical
context, but the declarations are also found via name lookup in scopes
enclosing the declaration itself.  This feature is implemented via
*transparent* declaration contexts (see
``DeclContext::isTransparentContext()``), whose declarations are visible in the
nearest enclosing non-transparent declaration context.  This means that the
lexical context of the declaration (e.g., an enumerator) will be the
transparent ``DeclContext`` itself, as will the semantic context, but the
declaration will be visible in every outer context up to and including the
first non-transparent declaration context (since transparent declaration
contexts can be nested).

The transparent ``DeclContext``\ s are:

* Enumerations (but not C++11 "scoped enumerations"):

  .. code-block:: c++

    enum Color {
      Red,
      Green,
      Blue
    };
    // Red, Green, and Blue are in scope

* C++ linkage specifications:

  .. code-block:: c++

    extern "C" {
      void f(int);
      void g(int);
    }
    // f and g are in scope

* Anonymous unions and structs:

  .. code-block:: c++

    struct LookupTable {
      bool IsVector;
      union {
        std::vector<Item> *Vector;
        std::set<Item> *Set;
      };
    };

    LookupTable LT;
    LT.Vector = 0; // Okay: finds Vector inside the unnamed union

* C++11 inline namespaces:

  .. code-block:: c++

    namespace mylib {
      inline namespace debug {
        class X;
      }
    }
    mylib::X *xp; // okay: mylib::X refers to mylib::debug::X

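The visibility rule for transparent contexts can be sketched as a walk up the
parent chain.  The ``ToyDeclContext`` below is a hypothetical model (the
``makeVisible`` method and the integer declaration IDs are invented for the
example); it records a name in the declaring context and in every enclosing
context up to and including the first non-transparent one.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>

  struct ToyDeclContext {
    ToyDeclContext *parent;
    bool transparent;
    std::map<std::string, int> visible;  // name -> declaration ID

    bool isTransparentContext() const { return transparent; }

    // Record the name here, then keep climbing while the context just
    // updated is transparent and has a parent.
    void makeVisible(const std::string &name, int declID) {
      ToyDeclContext *dc = this;
      while (true) {
        dc->visible[name] = declID;
        if (!dc->isTransparentContext() || !dc->parent)
          break;
        dc = dc->parent;
      }
    }

    bool lookup(const std::string &name) const {
      return visible.count(name) != 0;
    }
  };

If an unscoped enumeration is nested in a linkage specification inside a
namespace, an enumerator becomes visible in all three contexts but not in the
translation unit, because the namespace is the first non-transparent context
on the way up.
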
.. _MultiDeclContext:

Multiply-Defined Declaration Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

C++ namespaces have the interesting --- and, so far, unique --- property that
the namespace can be defined multiple times, and the declarations provided by
each namespace definition are effectively merged (from the semantic point of
view).  For example, the following two code snippets are semantically
indistinguishable:

.. code-block:: c++

  // Snippet #1:
  namespace N {
    void f();
  }
  namespace N {
    void f(int);
  }

  // Snippet #2:
  namespace N {
    void f();
    void f(int);
  }

In Clang's representation, the source-centric view of declaration contexts will
actually have two separate ``NamespaceDecl`` nodes in Snippet #1, each of which
is a declaration context that contains a single declaration of "``f``".
However, the semantics-centric view provided by name lookup into the namespace
``N`` for "``f``" will return a ``DeclContext::lookup_result`` that contains a
range of iterators over declarations of "``f``".

``DeclContext`` manages multiply-defined declaration contexts internally.  The
function ``DeclContext::getPrimaryContext`` retrieves the "primary" context for
a given ``DeclContext`` instance, which is the ``DeclContext`` responsible for
maintaining the lookup table used for the semantics-centric view.  Given the
primary context, one can follow the chain of ``DeclContext`` nodes that define
additional declarations via ``DeclContext::getNextContext``.  Note that these
functions are used internally within the lookup and insertion methods of the
``DeclContext``, so the vast majority of clients can ignore them.

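The primary-context arrangement can be approximated with a small linked
structure.  The ``ToyNamespace`` class below is hypothetical: it borrows the
method names ``getPrimaryContext`` and ``getNextContext`` from the text, keeps
each definition's own declarations separately, and routes every lookup through
the table held by the first definition.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  class ToyNamespace {
    ToyNamespace *primary_;            // first definition; itself if primary
    ToyNamespace *next_ = nullptr;     // next re-opened definition, if any
    std::vector<std::string> decls_;   // declarations written in this block
    std::map<std::string, std::vector<std::string>> table_;  // primary only

  public:
    ToyNamespace() : primary_(this) {}  // the first definition

    // A later definition of the same namespace joins the existing chain.
    explicit ToyNamespace(ToyNamespace *previous)
        : primary_(previous->primary_) {
      ToyNamespace *last = primary_;
      while (last->next_)
        last = last->next_;
      last->next_ = this;
    }
    ToyNamespace(const ToyNamespace &) = delete;

    ToyNamespace *getPrimaryContext() { return primary_; }
    ToyNamespace *getNextContext() { return next_; }

    void addDecl(const std::string &name, const std::string &decl) {
      decls_.push_back(decl);                   // source-centric, per block
      primary_->table_[name].push_back(decl);   // merged semantic view
    }

    const std::vector<std::string> &decls() const { return decls_; }

    std::vector<std::string> lookup(const std::string &name) const {
      auto it = primary_->table_.find(name);
      return it == primary_->table_.end() ? std::vector<std::string>()
                                          : it->second;
    }
  };

Modeling Snippet #1 with two ``ToyNamespace`` objects gives each block a
single entry in ``decls()``, while ``lookup("f")`` on either block returns
both overloads.
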
1332f4a2713aSLionel Sambuc.. _CFG:
1333f4a2713aSLionel Sambuc
1334f4a2713aSLionel SambucThe ``CFG`` class
1335f4a2713aSLionel Sambuc-----------------
1336f4a2713aSLionel Sambuc
1337f4a2713aSLionel SambucThe ``CFG`` class is designed to represent a source-level control-flow graph
1338f4a2713aSLionel Sambucfor a single statement (``Stmt*``).  Typically instances of ``CFG`` are
1339f4a2713aSLionel Sambucconstructed for function bodies (usually an instance of ``CompoundStmt``), but
1340f4a2713aSLionel Sambuccan also be instantiated to represent the control-flow of any class that
1341f4a2713aSLionel Sambucsubclasses ``Stmt``, which includes simple expressions.  Control-flow graphs
1342f4a2713aSLionel Sambucare especially useful for performing `flow- or path-sensitive
1343f4a2713aSLionel Sambuc<http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities>`_ program
1344f4a2713aSLionel Sambucanalyses on a given function.
1345f4a2713aSLionel Sambuc
1346f4a2713aSLionel SambucBasic Blocks
1347f4a2713aSLionel Sambuc^^^^^^^^^^^^
1348f4a2713aSLionel Sambuc
1349f4a2713aSLionel SambucConcretely, an instance of ``CFG`` is a collection of basic blocks.  Each basic
1350f4a2713aSLionel Sambucblock is an instance of ``CFGBlock``, which simply contains an ordered sequence
1351f4a2713aSLionel Sambucof ``Stmt*`` (each referring to statements in the AST).  The ordering of
1352f4a2713aSLionel Sambucstatements within a block indicates unconditional flow of control from one
1353f4a2713aSLionel Sambucstatement to the next.  :ref:`Conditional control-flow
1354f4a2713aSLionel Sambuc<ConditionalControlFlow>` is represented using edges between basic blocks.  The
1355f4a2713aSLionel Sambucstatements within a given ``CFGBlock`` can be traversed using the
1356f4a2713aSLionel Sambuc``CFGBlock::*iterator`` interface.

A ``CFG`` object owns the instances of ``CFGBlock`` within the control-flow
graph it represents.  Each ``CFGBlock`` within a CFG is also uniquely numbered
(accessible via ``CFGBlock::getBlockID()``).  Currently the number is based on
the order in which the blocks were created, but no assumptions should be made
about how ``CFGBlocks`` are numbered other than that their numbers are unique
and that they are numbered from 0..N-1 (where N is the number of basic blocks
in the CFG).

Entry and Exit Blocks
^^^^^^^^^^^^^^^^^^^^^

Each instance of ``CFG`` contains two special blocks: an *entry* block
(accessible via ``CFG::getEntry()``), which has no incoming edges, and an
*exit* block (accessible via ``CFG::getExit()``), which has no outgoing edges.
Neither block contains any statements, and they serve the role of providing a
clear entrance and exit for a body of code such as a function body.  The
presence of these empty blocks greatly simplifies the implementation of many
analyses built on top of CFGs.

.. _ConditionalControlFlow:

Conditional Control-Flow
^^^^^^^^^^^^^^^^^^^^^^^^

Conditional control-flow (such as that induced by if-statements and loops) is
represented as edges between ``CFGBlocks``.  Because different C language
constructs can induce control-flow, each ``CFGBlock`` also records an extra
``Stmt*`` that represents the *terminator* of the block.  A terminator is
simply the statement that caused the control-flow, and is used to identify the
nature of the conditional control-flow between blocks.  For example, in the
case of an if-statement, the terminator refers to the ``IfStmt`` object in the
AST that represented the given branch.

To illustrate, consider the following code example:

.. code-block:: c++

  int foo(int x) {
    x = x + 1;
    if (x > 2)
      x++;
    else {
      x += 2;
      x *= 2;
    }

    return x;
  }

After invoking the parser+semantic analyzer on this code fragment, the AST of
the body of ``foo`` is referenced by a single ``Stmt*``.  We can then construct
an instance of ``CFG`` representing the control-flow graph of this function
body by a single call to a static class method:

.. code-block:: c++

  Stmt *FooBody = ...
  std::unique_ptr<CFG> FooCFG = CFG::buildCFG(FooBody);

Along with providing an interface to iterate over its ``CFGBlocks``, the
``CFG`` class also provides methods that are useful for debugging and
visualizing CFGs.  For example, the method ``CFG::dump()`` dumps a
pretty-printed version of the CFG to standard error.  This is especially useful
when one is using a debugger such as gdb.  For example, here is the output of
``FooCFG->dump()``:

.. code-block:: text

 [ B5 (ENTRY) ]
    Predecessors (0):
    Successors (1): B4

 [ B4 ]
    1: x = x + 1
    2: (x > 2)
    T: if [B4.2]
    Predecessors (1): B5
    Successors (2): B3 B2

 [ B3 ]
    1: x++
    Predecessors (1): B4
    Successors (1): B1

 [ B2 ]
    1: x += 2
    2: x *= 2
    Predecessors (1): B4
    Successors (1): B1

 [ B1 ]
    1: return x;
    Predecessors (2): B2 B3
    Successors (1): B0

 [ B0 (EXIT) ]
    Predecessors (1): B1
    Successors (0):

For each block, the pretty-printed output displays the number of *predecessor*
blocks (blocks that have outgoing control-flow to the given block) and
*successor* blocks (blocks that have incoming control-flow from the given
block).  We can also clearly see the special entry and exit blocks at the
beginning and end of the pretty-printed output.  For the entry block (block
B5), the number of predecessor blocks is 0, while for the exit block (block B0)
the number of successor blocks is 0.
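
The block structure of this dump can be mimicked with a small toy graph.  The
sketch below is only a model: ``ToyBlock``, ``ToyCFG`` and ``buildFooCFG`` are
hypothetical names, statements are stored as strings rather than ``Stmt*``,
and the block numbers are simply chosen to match the dump rather than derived
the way ``CFG`` derives them.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy stand-in for a basic block (hypothetical, not clang::CFGBlock).
  struct ToyBlock {
    unsigned ID;
    std::vector<std::string> Stmts;  // ordered statements, as text
    std::vector<unsigned> Succs;     // IDs of successor blocks
    std::vector<unsigned> Preds;     // IDs of predecessor blocks
  };

  // Toy stand-in for the graph, which owns its blocks.
  struct ToyCFG {
    std::vector<ToyBlock> Blocks;    // invariant: Blocks[i].ID == i
    unsigned Entry = 0;
    unsigned Exit = 0;

    unsigned addBlock(std::vector<std::string> Stmts = {}) {
      unsigned ID = static_cast<unsigned>(Blocks.size());
      Blocks.push_back(ToyBlock{ID, std::move(Stmts), {}, {}});
      return ID;
    }

    // Records the edge on both ends, so predecessors never go stale.
    void addEdge(unsigned From, unsigned To) {
      assert(From < Blocks.size() && To < Blocks.size());
      Blocks[From].Succs.push_back(To);
      Blocks[To].Preds.push_back(From);
    }
  };

  // Rebuilds the shape of the dump shown above for foo.
  inline ToyCFG buildFooCFG() {
    ToyCFG G;
    unsigned B0 = G.addBlock();  // exit: no statements
    unsigned B1 = G.addBlock({"return x;"});
    unsigned B2 = G.addBlock({"x += 2", "x *= 2"});
    unsigned B3 = G.addBlock({"x++"});
    unsigned B4 = G.addBlock({"x = x + 1", "(x > 2)"});
    unsigned B5 = G.addBlock();  // entry: no statements
    G.Entry = B5;
    G.Exit = B0;
    G.addEdge(B5, B4);
    G.addEdge(B4, B3);  // one branch of the if-statement
    G.addEdge(B4, B2);  // the other branch
    G.addEdge(B2, B1);
    G.addEdge(B3, B1);
    G.addEdge(B1, B0);
    return G;
  }

Checking that the entry block has no predecessors and the exit block has no
successors is then a matter of inspecting ``Preds`` and ``Succs`` on the two
special blocks.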

The most interesting block here is B4, whose outgoing control-flow represents
the branching caused by the sole if-statement in ``foo``.  Of particular
interest is the second statement in the block, ``(x > 2)``, and the terminator,
printed as ``if [B4.2]``.  The second statement represents the evaluation of
the condition of the if-statement, which occurs before the actual branching of
control-flow.  Within the ``CFGBlock`` for B4, the ``Stmt*`` for the second
statement refers to the actual expression in the AST for ``(x > 2)``.  Thus
pointers to subclasses of ``Expr`` can appear in the list of statements in a
block, and not just subclasses of ``Stmt`` that refer to proper C statements.

The terminator of block B4 is a pointer to the ``IfStmt`` object in the AST.
The pretty-printer outputs ``if [B4.2]`` because the condition expression of
the if-statement has an actual place in the basic block, and thus the
terminator is essentially *referring* to the expression that is the second
statement of block B4 (i.e., B4.2).  In this manner, conditions for
control-flow (which also includes conditions for loops and switch statements)
are hoisted into the actual basic block.
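
The relationship between a terminator and its hoisted condition can be
sketched as follows.  This is a toy model, not the pretty-printer's actual
implementation: ``ToyTerminator``, ``ToyBlock``, ``terminatorLabel`` and
``conditionOf`` are hypothetical names, and statements are plain strings.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <vector>

  // Hypothetical model of a terminator that refers to a condition which
  // lives inside the block's own statement list.
  struct ToyTerminator {
    std::string Keyword;  // e.g. "if"
    unsigned CondIndex;   // 1-based position of the condition in the block
  };

  struct ToyBlock {
    unsigned ID;
    std::vector<std::string> Stmts;
    bool HasTerminator;
    ToyTerminator Term;
  };

  // Returns the hoisted condition the terminator refers to.
  inline const std::string &conditionOf(const ToyBlock &B) {
    assert(B.HasTerminator && B.Term.CondIndex >= 1 &&
           B.Term.CondIndex <= B.Stmts.size());
    return B.Stmts[B.Term.CondIndex - 1];
  }

  // Formats the terminator the way the dump above prints it.
  inline std::string terminatorLabel(const ToyBlock &B) {
    if (!B.HasTerminator)
      return "";
    return B.Term.Keyword + " [B" + std::to_string(B.ID) + "." +
           std::to_string(B.Term.CondIndex) + "]";
  }

For a block modelled on B4, ``terminatorLabel`` produces ``if [B4.2]`` and
``conditionOf`` returns the text of the second statement.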

.. Implicit Control-Flow
.. ^^^^^^^^^^^^^^^^^^^^^

.. A key design principle of the ``CFG`` class was to not require any
.. transformations to the AST in order to represent control-flow.  Thus the
.. ``CFG`` does not perform any "lowering" of the statements in an AST: loops
.. are not transformed into guarded gotos, short-circuit operations are not
.. converted to a set of if-statements, and so on.

Constant Folding in the Clang AST
---------------------------------

There are several places where constants and constant folding matter a lot to
the Clang front-end.  First, in general, we prefer the AST to retain the source
code as close to how the user wrote it as possible.  This means that if they
wrote "``5+4``", we want to keep the addition and two constants in the AST; we
don't want to fold to "``9``".  This means that constant folding in various
ways turns into a tree walk that needs to handle the various cases.
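
A minimal sketch of folding as a read-only tree walk, under the assumption of
a toy expression type: ``ToyExpr``, ``lit``, ``add`` and ``fold`` are
hypothetical names for this example and bear no relation to Clang's ``Expr``
hierarchy.  The point is only that the value is computed on demand while the
tree keeps the shape the user wrote.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <utility>

  // Toy expression node: an integer literal or an addition.
  struct ToyExpr {
    enum Kind { Literal, Add } K;
    long Value;                         // used by Literal
    std::unique_ptr<ToyExpr> LHS, RHS;  // used by Add
  };

  inline std::unique_ptr<ToyExpr> lit(long V) {
    std::unique_ptr<ToyExpr> E(new ToyExpr());
    E->K = ToyExpr::Literal;
    E->Value = V;
    return E;
  }

  inline std::unique_ptr<ToyExpr> add(std::unique_ptr<ToyExpr> L,
                                      std::unique_ptr<ToyExpr> R) {
    std::unique_ptr<ToyExpr> E(new ToyExpr());
    E->K = ToyExpr::Add;
    E->LHS = std::move(L);
    E->RHS = std::move(R);
    return E;
  }

  // Walks the tree and computes its value without modifying any node.
  inline long fold(const ToyExpr &E) {
    switch (E.K) {
    case ToyExpr::Literal:
      return E.Value;
    case ToyExpr::Add:
      assert(E.LHS && E.RHS && "addition needs two operands");
      return fold(*E.LHS) + fold(*E.RHS);
    }
    return 0;  // not reached for well-formed trees
  }

Folding the tree for ``5+4`` yields 9, and afterwards the root is still an
addition node with the two literals as its operands.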

However, there are places in both C and C++ that require constants to be
folded.  For example, the C standard defines what an "integer constant
expression" (i-c-e) is with very precise and specific requirements.  The
language then requires i-c-e's in a lot of places (for example, the size of a
bitfield, the value for a case statement, etc).  For these, we have to be able
to constant fold the constants, to do semantic checks (e.g., verify bitfield
size is non-negative and that case statements aren't duplicated).  We aim for
Clang to be very pedantic about this, diagnosing cases when the code does not
use an i-c-e where one is required, but accepting the code unless running with
``-pedantic-errors``.

Things get a little bit more tricky when it comes to compatibility with
real-world source code.  Specifically, GCC has historically accepted a huge
superset of expressions as i-c-e's, and a lot of real world code depends on
this unfortunate accident of history (including, e.g., the glibc system
headers).  GCC accepts anything its "fold" optimizer is capable of reducing to
an integer constant, which means that the definition of what it accepts changes
as its optimizer does.  One example is that GCC accepts things like "``case
X-X:``" even when ``X`` is a variable, because it can fold this to 0.

Another issue is how constants interact with the extensions we support, such
as ``__builtin_constant_p``, ``__builtin_inf``, ``__extension__`` and many
others.  C99 obviously does not specify the semantics of any of these
extensions, and the definition of i-c-e does not include them.  However, these
extensions are often used in real code, and we have to have a way to reason
about them.

Finally, this is not just a problem for semantic analysis.  The code generator
and other clients have to be able to fold constants (e.g., to initialize global
variables) and have to handle a superset of what C99 allows.  Further, these
clients can benefit from extended information.  For example, we know that
"``foo() || 1``" always evaluates to ``true``, but we can't replace the
expression with ``true`` because it has side effects.

Implementation Approach
^^^^^^^^^^^^^^^^^^^^^^^

After trying several different approaches, we've finally converged on a design
(note that, at the time of this writing, not all of this has been implemented;
consider this a design goal!).  Our basic approach is to define a single
recursive evaluation method (``Expr::Evaluate``), which is implemented
in ``AST/ExprConstant.cpp``.  Given an expression with "scalar" type (integer,
fp, complex, or pointer) this method returns the following information:

* Whether the expression is an integer constant expression, a general constant
  that was folded but has no side effects, a general constant that was folded
  but that does have side effects, or an uncomputable/unfoldable value.
* If the expression was computable in any way, this method returns the
  ``APValue`` for the result of the expression.
* If the expression is not evaluatable at all, this method returns information
  on one of the problems with the expression.  This includes a
  ``SourceLocation`` for where the problem is, and a diagnostic ID that explains
  the problem.  The diagnostic should have ``ERROR`` type.
* If the expression is not an integer constant expression, this method returns
  information on one of the problems with the expression.  This includes a
  ``SourceLocation`` for where the problem is, and a diagnostic ID that
  explains the problem.  The diagnostic should have ``EXTENSION`` type.
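
The four-way classification in the first bullet can be sketched with a toy
evaluator.  Everything here is an assumption made for illustration:
``ToyExpr``, ``Info``, ``FoldKind``, ``evaluate`` and ``classify`` are
hypothetical names, only ``||`` is modelled, and the source locations and
diagnostic IDs described above are left out entirely.

.. code-block:: c++

  #include <cassert>
  #include <memory>

  struct ToyExpr;
  using ExprPtr = std::shared_ptr<const ToyExpr>;

  // Toy expression: literal, non-constant variable, call with side
  // effects, or logical-or of two operands.
  struct ToyExpr {
    enum Kind { Literal, Variable, Call, LogicalOr } K;
    long Value;  // used by Literal
    ExprPtr LHS, RHS;
  };

  inline ExprPtr lit(long V) {
    return std::make_shared<ToyExpr>(
        ToyExpr{ToyExpr::Literal, V, nullptr, nullptr});
  }
  inline ExprPtr var() {
    return std::make_shared<ToyExpr>(
        ToyExpr{ToyExpr::Variable, 0, nullptr, nullptr});
  }
  inline ExprPtr call() {
    return std::make_shared<ToyExpr>(
        ToyExpr{ToyExpr::Call, 0, nullptr, nullptr});
  }
  inline ExprPtr logicalOr(ExprPtr L, ExprPtr R) {
    return std::make_shared<ToyExpr>(ToyExpr{ToyExpr::LogicalOr, 0, L, R});
  }

  struct Info {
    bool Folded;       // a value could be computed
    bool IsICE;        // built only from literal operands in this toy
    bool SideEffects;  // evaluating the expression may do observable work
    long Value;
  };

  inline Info evaluate(const ToyExpr &E) {
    switch (E.K) {
    case ToyExpr::Literal:
      return {true, true, false, E.Value};
    case ToyExpr::Variable:
      return {false, false, false, 0};
    case ToyExpr::Call:
      return {false, false, true, 0};
    case ToyExpr::LogicalOr: {
      assert(E.LHS && E.RHS && "|| needs two operands");
      Info L = evaluate(*E.LHS), R = evaluate(*E.RHS);
      bool Effects = L.SideEffects || R.SideEffects;
      bool BothICE = L.IsICE && R.IsICE;
      if (L.Folded && L.Value != 0)  // short-circuit: RHS never runs
        return {true, BothICE, L.SideEffects, 1};
      if (L.Folded && R.Folded)
        return {true, BothICE, Effects, R.Value != 0 ? 1 : 0};
      if (R.Folded && R.Value != 0)  // value is 1 whatever LHS yields
        return {true, false, Effects, 1};
      return {false, false, Effects, 0};
    }
    }
    return {false, false, false, 0};
  }

  enum class FoldKind {
    IntegerConstantExpr,
    FoldedNoSideEffects,
    FoldedWithSideEffects,
    Unfoldable
  };

  inline FoldKind classify(const Info &I) {
    if (!I.Folded)
      return FoldKind::Unfoldable;
    if (I.SideEffects)
      return FoldKind::FoldedWithSideEffects;
    return I.IsICE ? FoldKind::IntegerConstantExpr
                   : FoldKind::FoldedNoSideEffects;
  }

In this model the analogue of "``foo() || 1``" folds to 1 but is classified as
folded with side effects, so a client knows the value without being allowed to
discard the call.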

This information gives various clients the flexibility that they want, and we
will eventually have some helper methods for various extensions.  For example,
``Sema`` should have a ``Sema::VerifyIntegerConstantExpression`` method, which
calls ``Evaluate`` on the expression.  If the expression is not foldable, the
error is emitted, and it would return ``true``.  If the expression is not an
i-c-e, the ``EXTENSION`` diagnostic is emitted.  Finally it would return
``false`` to indicate that the AST is OK.

Other clients can use the information in other ways; for example, codegen can
just use expressions that are foldable in any way.

Extensions
^^^^^^^^^^

This section describes how some of the various extensions Clang supports
interact with constant evaluation:

* ``__extension__``: The expression form of this extension causes any
  evaluatable subexpression to be accepted as an integer constant expression.
* ``__builtin_constant_p``: This returns true (as an integer constant
  expression) if the operand evaluates to either a numeric value (that is, not
  a pointer cast to integral type) of integral, enumeration, floating or
  complex type, or if it evaluates to the address of the first character of a
  string literal (possibly cast to some other type).  As a special case, if
  ``__builtin_constant_p`` is the (potentially parenthesized) condition of a
  conditional operator expression ("``?:``"), only the true side of the
  conditional operator is considered, and it is evaluated with full constant
  folding.
* ``__builtin_choose_expr``: The condition is required to be an integer
  constant expression, but we accept any constant as an "extension of an
  extension".  This only evaluates one operand depending on which way the
  condition evaluates.
* ``__builtin_classify_type``: This always returns an integer constant
  expression.
* ``__builtin_inf, nan, ...``: These are treated just like a floating-point
  literal.
* ``__builtin_abs, copysign, ...``: These are constant folded as general
  constant expressions.
* ``__builtin_strlen`` and ``strlen``: These are constant folded as integer
  constant expressions if the argument is a string literal.
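
A few of these builtins can be observed from user code.  The sketch below
assumes a GCC- or Clang-compatible compiler in C++11 mode or later, and it
uses ``static_assert``, which requires a C++ constant expression rather than a
C i-c-e, so it illustrates the folding behaviour only loosely.  ``Inf`` and
``literalLength`` are names invented for the example.

.. code-block:: c++

  #include <cassert>

  // A sum of literals is reported as a constant.
  static_assert(__builtin_constant_p(5 + 4) == 1,
                "a literal sum should be a constant");

  // The length of a string literal folds at compile time.
  static_assert(__builtin_strlen("clang") == 5,
                "strlen of a literal should fold");

  // __builtin_inf can be used where a floating-point literal could be.
  constexpr double Inf = __builtin_inf();
  static_assert(Inf > 1.0e300, "infinity exceeds any finite literal");

  // The same fold is also visible at run time.
  inline unsigned long literalLength() {
    unsigned long N = __builtin_strlen("clang");
    assert(N == 5);
    return N;
  }

If the translation unit compiles, each of the three ``static_assert``
conditions was evaluated by the compiler's constant evaluator.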

.. _Sema:

The Sema Library
================

This library is called by the :ref:`Parser library <Parser>` during parsing to
do semantic analysis of the input.  For valid programs, Sema builds an AST for
parsed constructs.

.. _CodeGen:

The CodeGen Library
===================

CodeGen takes an :ref:`AST <AST>` as input and produces `LLVM IR code
<http://llvm.org/docs/LangRef.html>`_ from it.

How to change Clang
===================

How to add an attribute
-----------------------

Attribute Basics
^^^^^^^^^^^^^^^^

Attributes in clang come in two forms: parsed form, and semantic form. Both
forms are represented via a tablegen definition of the attribute, specified in
Attr.td.


``include/clang/Basic/Attr.td``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, add your attribute to the `include/clang/Basic/Attr.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?view=markup>`_
file.

Each attribute gets a ``def`` inheriting from ``Attr`` or one of its
subclasses.  ``InheritableAttr`` means that the attribute also applies to
subsequent declarations of the same name.  ``InheritableParamAttr`` is similar
to ``InheritableAttr``, except that the attribute is written on a parameter
instead of a declaration, type or statement.  Attributes inheriting from
``TypeAttr`` are pure type attributes which generally are not given a
representation in the AST.  Attributes inheriting from ``TargetSpecificAttr``
are attributes specific to one or more target architectures.  An attribute that
inherits from ``IgnoredAttr`` is parsed, but will generate an ignored attribute
diagnostic when used.  This attribute type may be useful when an attribute is
supported by another vendor, but not supported by clang.

``Spellings`` lists the strings that can appear in ``__attribute__((here))`` or
``[[here]]``.  All such strings will be synonymous.  Possible ``Spellings``
are: ``GNU`` (for use with GNU-style __attribute__ spellings), ``Declspec``
(for use with Microsoft Visual Studio-style __declspec spellings), ``CXX11``
(for use with C++11-style [[foo]] and [[foo::bar]] spellings), and ``Keyword``
(for use with attributes that are implemented as keywords, like C++11's
``override`` or ``final``). If you want to allow the ``[[]]`` C++11 syntax, you
have to define a list of ``Namespaces``, which will let users write
``[[namespace::spelling]]``.  Using the empty string for a namespace will allow
users to write just the spelling with no "``::``".  Attributes which g++-4.8
or later accepts should also have a ``CXX11<"gnu", "spelling">`` spelling.
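
From the user's side, a ``GNU`` spelling and a ``CXX11<"gnu", ...>`` spelling
of the same attribute look like this.  The example uses the existing
``unused`` attribute and assumes a GCC- or Clang-compatible compiler; the
function names are invented for the illustration.

.. code-block:: c++

  #include <cassert>

  // GNU-style spelling: __attribute__((spelling)).
  __attribute__((unused)) static int viaGnuSpelling() { return 1; }

  // C++11-style spelling in the "gnu" namespace: [[gnu::spelling]].
  [[gnu::unused]] static int viaCxx11Spelling() { return 2; }

  // Both declarations carry the same attribute and behave identically.
  inline int callBoth() {
    int Sum = viaGnuSpelling() + viaCxx11Spelling();
    assert(Sum == 3);
    return Sum;
  }

Because the spellings are synonymous, the choice between them affects only the
source syntax, not the resulting semantic attribute.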

``Subjects`` restricts the kinds of AST node to which this attribute can
appertain (roughly, attach).  The subjects are specified via a ``SubjectList``,
which specifies the list of subjects. Additionally, subject-related diagnostics
can be specified to be warnings or errors, with the default being a warning.
The diagnostics displayed to the user are automatically determined based on
the subjects in the list, but a custom diagnostic parameter can also be
specified in the ``SubjectList``.  The diagnostics generated for subject list
violations are either ``diag::warn_attribute_wrong_decl_type`` or
``diag::err_attribute_wrong_decl_type``, and the parameter enumeration is
found in `include/clang/Sema/AttributeList.h
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Sema/AttributeList.h?view=markup>`_.
If you add new Decl nodes to the ``SubjectList``, you may need to update the
logic used to automatically determine the diagnostic parameter in `utils/TableGen/ClangAttrEmitter.cpp
<http://llvm.org/viewvc/llvm-project/cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp?view=markup>`_.

Diagnostic checking for attribute subject lists is automated except when
``HasCustomParsing`` is set to ``1``.

By default, all subjects in the SubjectList must either be a Decl node defined
in ``DeclNodes.td``, or a statement node defined in ``StmtNodes.td``.  However,
more complex subjects can be created by creating a ``SubsetSubject`` object.
Each such object has a base subject which it appertains to (which must be a
Decl or Stmt node, and not a SubsetSubject node), and some custom code which is
called when determining whether an attribute appertains to the subject.  For
instance, a ``NonBitField`` SubsetSubject appertains to a ``FieldDecl``, and
tests whether the given FieldDecl is a bit field.  When a SubsetSubject is
specified in a SubjectList, a custom diagnostic parameter must also be provided.

``Args`` names the arguments the attribute takes, in order.  If ``Args`` is
``[StringArgument<"Arg1">, IntArgument<"Arg2">]`` then
``__attribute__((myattribute("Hello", 3)))`` will be a valid use.  Attribute
arguments specify both the parsed form and the semantic form of the attribute.
The previous example shows an attribute which requires two arguments while
parsing, and the Attr subclass' constructor for the attribute will require a
string and integer argument.

Diagnostic checking for argument counts is automated except when
``HasCustomParsing`` is set to ``1``, or when the attribute uses an optional or
variadic argument.  Diagnostic checking for argument semantics is not automated.

If the parsed form of the attribute is more complex, or differs from the
semantic form, the ``HasCustomParsing`` bit can be set to ``1`` for the class,
and the parsing code in `Parser::ParseGNUAttributeArgs
<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Parse/ParseDecl.cpp?view=markup>`_
can be updated for the special case.  Note that this only applies to attributes
with a GNU spelling -- attributes with a __declspec spelling currently ignore
this flag and are handled by ``Parser::ParseMicrosoftDeclSpec``.

Custom accessors can be generated for an attribute based on the spelling list
for that attribute.  For instance, if an attribute has two different spellings,
'Foo' and 'Bar', accessors can be created:
``[Accessor<"isFoo", [GNU<"Foo">]>, Accessor<"isBar", [GNU<"Bar">]>]``.
These accessors will be generated on the semantic form of the attribute,
accepting no arguments and returning a Boolean.
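
To give a feel for what such accessors amount to, here is a hand-written toy
class.  It is not the code TableGen emits: ``ToyFooBarAttr`` and its
``Spelling`` enumeration are hypothetical, and the sketch only assumes that
the semantic attribute remembers which spelling was used.

.. code-block:: c++

  #include <cassert>

  // Hypothetical stand-in for a semantic attribute class whose spelling
  // list is [GNU<"Foo">, GNU<"Bar">].
  class ToyFooBarAttr {
  public:
    enum Spelling { GNU_Foo, GNU_Bar };

    explicit ToyFooBarAttr(Spelling S) : Used(S) {
      assert((S == GNU_Foo || S == GNU_Bar) && "unknown spelling");
    }

    // One accessor per Accessor<...> entry: no arguments, Boolean result.
    bool isFoo() const { return Used == GNU_Foo; }
    bool isBar() const { return Used == GNU_Bar; }

  private:
    Spelling Used;
  };

A client holding such an attribute can then branch on ``isFoo()`` or
``isBar()`` instead of comparing spelling strings.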

Attributes which do not require an AST node should set the ``ASTNode`` field to
``0`` to avoid polluting the AST.  Note that anything inheriting from
``TypeAttr`` or ``IgnoredAttr`` automatically does not generate an AST node.
All other attributes generate an AST node by default.  The AST node is the
semantic representation of the attribute.

Attributes which do not require custom semantic handling should set the
``SemaHandler`` field to ``0``.  Note that anything inheriting from
``IgnoredAttr`` automatically does not get a semantic handler.  All other
attributes are assumed to use a semantic handler by default.  Attributes
without a semantic handler are not given a parsed attribute Kind enumeration.

The ``LangOpts`` field can be used to specify a list of language options
required by the attribute.  For instance, all of the CUDA-specific attributes
specify ``[CUDA]`` for the ``LangOpts`` field, and when the CUDA language
option is not enabled, an "attribute ignored" warning diagnostic is emitted.
Since language options are not table generated nodes, new language options must
be created manually and should specify the spelling used by the ``LangOptions``
class.

Target-specific attributes sometimes share a spelling with other attributes in
different targets.  For instance, the ARM and MSP430 targets both have an
attribute spelled ``GNU<"interrupt">``, but with different parsing and semantic
requirements.  To support this feature, an attribute inheriting from
``TargetSpecificAttr`` may specify a ``ParseKind`` field.  This field
should be the same value between all attributes sharing a spelling, and
corresponds to the parsed attribute's Kind enumeration.  This allows attributes
to share a parsed attribute kind, but have distinct semantic attribute classes.
For instance, ``AttributeList::AT_Interrupt`` is the shared parsed attribute
kind, but ARMInterruptAttr and MSP430InterruptAttr are the semantic attributes
generated.

By default, when declarations are merging attributes, an attribute will not be
duplicated. However, if an attribute can be duplicated during this merging
stage, set ``DuplicatesAllowedWhileMerging`` to ``1``, and the attribute will
be merged.

By default, attribute arguments are parsed in an evaluated context. If the
arguments for an attribute should be parsed in an unevaluated context (akin to
the way the argument to a ``sizeof`` expression is parsed), you can set
``ParseArgumentsAsUnevaluated`` to ``1``.
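
The ``sizeof`` analogy can be seen directly in ordinary C++: an operand in an
unevaluated context is type-checked but never executed.  The function name
below is invented for the example, and a compiler may warn about the unused
side effect, which is the point of the demonstration.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>

  inline int unevaluatedOperandDemo() {
    int Counter = 0;
    // The operand is only inspected for its type; the increment never runs.
    std::size_t Size = sizeof(Counter++);
    assert(Size == sizeof(int));
    return Counter;
  }

The function returns the counter unchanged, which is the behaviour an
attribute opts into for its own arguments when it sets
``ParseArgumentsAsUnevaluated``.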

If additional functionality is desired for the semantic form of the attribute,
the ``AdditionalMembers`` field specifies code to be copied verbatim into the
semantic attribute class object.

All attributes must have one or more forms of documentation, which are provided
in the ``Documentation`` list. Generally, the documentation for an attribute
is a stand-alone definition in `include/clang/Basic/AttrDocs.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/AttrDocs.td?view=markup>`_
that is named after the attribute being documented. Each documentation element
is given a ``Category`` (variable, function, or type) and ``Content``. A single
attribute may contain multiple documentation elements for distinct categories.
For instance, an attribute which can appertain to both functions and types (such
as a calling convention attribute) should contain two documentation elements.
The ``Content`` for an attribute uses reStructuredText (RST) syntax.

If an attribute is used internally by the compiler, but is not written by users
(such as attributes with an empty spelling list), it can use the
``Undocumented`` documentation element.

Boilerplate
^^^^^^^^^^^

All semantic processing of declaration attributes happens in `lib/Sema/SemaDeclAttr.cpp
<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?view=markup>`_,
and generally starts in the ``ProcessDeclAttribute`` function.  If your
attribute is a "simple" attribute (meaning that it requires no custom semantic
processing aside from what is automatically provided for you), you can
add a call to ``handleSimpleAttribute<YourAttr>(S, D, Attr);`` to the switch
statement. Otherwise, write a new ``handleYourAttr()`` function, and add that
to the switch statement.
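
The shape of that switch can be sketched with a self-contained toy model.
Everything below (``ParsedKind``, ``ParsedAttr``, the simplified ``Decl`` and
``Attr`` types, and the power-of-two check) is a hypothetical stand-in chosen
for illustration; the real functions in ``SemaDeclAttr.cpp`` take different
parameters and emit diagnostics rather than returning ``bool``.

.. code-block:: c++

  #include <memory>
  #include <utility>
  #include <vector>

  // Simplified stand-ins for illustration; not Clang's real classes.
  struct Attr { virtual ~Attr() = default; };
  struct NoThrowAttr : Attr {};
  struct AlignedAttr : Attr { unsigned Alignment = 0; };

  enum class ParsedKind { NoThrow, Aligned };
  struct ParsedAttr { ParsedKind Kind; unsigned Arg; };
  struct Decl { std::vector<std::unique_ptr<Attr>> Attrs; };

  // "Simple" attributes need nothing beyond being attached to the declaration.
  template <typename AttrT> void handleSimpleAttribute(Decl &D) {
    D.Attrs.push_back(std::make_unique<AttrT>());
  }

  // A custom handler validates its argument before attaching anything.
  bool handleAlignedAttr(Decl &D, const ParsedAttr &PA) {
    if (PA.Arg == 0 || (PA.Arg & (PA.Arg - 1)) != 0)
      return false; // not a power of two; a real handler would diagnose here
    auto A = std::make_unique<AlignedAttr>();
    A->Alignment = PA.Arg;
    D.Attrs.push_back(std::move(A));
    return true;
  }

  // One case per parsed attribute kind, as in the switch described above.
  bool processDeclAttribute(Decl &D, const ParsedAttr &PA) {
    switch (PA.Kind) {
    case ParsedKind::NoThrow:
      handleSimpleAttribute<NoThrowAttr>(D);
      return true;
    case ParsedKind::Aligned:
      return handleAlignedAttr(D, PA);
    }
    return false;
  }

The point of the split is that a simple attribute costs one line in the
switch, while anything needing validation gets its own handler function.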

If your attribute causes extra warnings to fire, define a ``DiagGroup`` in
`include/clang/Basic/DiagnosticGroups.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticGroups.td?view=markup>`_
named after the attribute's ``Spelling`` with "_"s replaced by "-"s.  If you're
only defining one diagnostic, you can skip ``DiagnosticGroups.td`` and use
``InGroup<DiagGroup<"your-attribute">>`` directly in `DiagnosticSemaKinds.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?view=markup>`_.

All semantic diagnostics generated for your attribute, including
automatically-generated ones (such as subjects and argument counts), should
have a corresponding test case.

The meat of your attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^

Find an appropriate place in Clang to do whatever your attribute needs to do.
Check for the attribute's presence using ``Decl::getAttr<YourAttr>()``.
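
The query pattern looks roughly like the toy model below.  The ``Decl`` and
``Attr`` types here are simplified stand-ins, ``DeprecatedAttr``,
``UnusedAttr`` and ``describe`` are names picked for the example, and the model
uses ``dynamic_cast`` purely for brevity, whereas Clang relies on its own
casting machinery.

.. code-block:: c++

  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Simplified stand-ins for illustration; not Clang's real classes.
  struct Attr { virtual ~Attr() = default; };
  struct DeprecatedAttr : Attr { std::string Message; };
  struct UnusedAttr : Attr {};

  class Decl {
    std::vector<std::unique_ptr<Attr>> Attrs;

  public:
    void addAttr(std::unique_ptr<Attr> A) { Attrs.push_back(std::move(A)); }

    // Returns the first attribute of type T, or nullptr when there is none.
    template <typename T> const T *getAttr() const {
      for (const auto &A : Attrs)
        if (const T *Found = dynamic_cast<const T *>(A.get()))
          return Found;
      return nullptr;
    }
  };

  // Client code tests for presence and reads the attribute's data in one step.
  std::string describe(const Decl &D) {
    if (const DeprecatedAttr *DA = D.getAttr<DeprecatedAttr>())
      return "deprecated: " + DA->Message;
    return "ok";
  }

Returning a pointer lets the caller combine the presence check and the access
to the attribute's arguments in a single ``if`` statement.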

Update the :doc:`LanguageExtensions` document to describe your new attribute.

How to add an expression or statement
-------------------------------------

Expressions and statements are among the most fundamental constructs within a
compiler, because they interact with many different parts of the AST, semantic
analysis, and IR generation.  Therefore, adding a new expression or statement
kind into Clang requires some care.  The following list details the various
places in Clang where an expression or statement needs to be introduced, along
with patterns to follow to ensure that the new expression or statement works
well across all of the C languages.  We focus on expressions, but statements
are similar.

#. Introduce parsing actions into the parser.  Recursive-descent parsing is
   mostly self-explanatory, but there are a few things that are worth keeping
   in mind:

   * Keep as much source location information as possible! You'll want it later
     to produce great diagnostics and support Clang's various features that map
     between source code and the AST.
   * Write tests for all of the "bad" parsing cases, to make sure your recovery
     is good.  If you have matched delimiters (e.g., parentheses, square
     brackets, etc.), use ``Parser::BalancedDelimiterTracker`` to give nice
     diagnostics when things go wrong.

#. Introduce semantic analysis actions into ``Sema``.  Semantic analysis should
   always involve two functions: an ``ActOnXXX`` function that will be called
   directly from the parser, and a ``BuildXXX`` function that performs the
   actual semantic analysis and will (eventually!) build the AST node.  It's
   fairly common for the ``ActOnXXX`` function to do very little (often just
   some minor translation from the parser's representation to ``Sema``'s
   representation of the same thing), but the separation is still important:
   C++ template instantiation, for example, should always call the ``BuildXXX``
   variant.  Several notes on semantic analysis before we get into construction
   of the AST:

   * Your expression probably involves some types and some subexpressions.
     Make sure to fully check that those types, and the types of those
     subexpressions, meet your expectations.  Add implicit conversions where
     necessary to make sure that all of the types line up exactly the way you
     want them.  Write extensive tests to check that you're getting good
     diagnostics for mistakes and that you can use various forms of
     subexpressions with your expression.
   * When type-checking a type or subexpression, make sure to first check
     whether the type is "dependent" (``Type::isDependentType()``) or whether a
     subexpression is type-dependent (``Expr::isTypeDependent()``).  If either
     of these returns ``true``, then you're inside a template and you can't do
     much type-checking now.  That's normal, and your AST node (when you get
     there) will have to deal with this case.  At this point, you can write
     tests that use your expression within templates, but don't try to
     instantiate the templates.
   * For each subexpression, be sure to call ``Sema::CheckPlaceholderExpr()``
     to deal with "weird" expressions that don't behave well as subexpressions.
     Then, determine whether you need to perform lvalue-to-rvalue conversions
     (``Sema::DefaultLvalueConversion``) or the usual unary conversions
     (``Sema::UsualUnaryConversions``), for places where the subexpression is
     producing a value you intend to use.
   * Your ``BuildXXX`` function will probably just return ``ExprError()`` at
     this point, since you don't have an AST.  That's perfectly fine, and
     shouldn't impact your testing.

#. Introduce an AST node for your new expression.  This starts with declaring
   the node in ``include/Basic/StmtNodes.td`` and creating a new class for your
   expression in the appropriate ``include/AST/Expr*.h`` header.  It's best to
   look at the class for a similar expression to get ideas, and there are some
   specific things to watch for:

   * If you need to allocate memory, use the ``ASTContext`` allocator.  Never
     use raw ``malloc`` or ``new``, and never hold any resources in an AST
     node, because the destructor of an AST node is never called.
   * Make sure that ``getSourceRange()`` covers the exact source range of your
     expression.  This is needed for diagnostics and for IDE support.
   * Make sure that ``children()`` visits all of the subexpressions.  This is
     important for a number of features (e.g., IDE support, C++ variadic
     templates).  If you have sub-types, you'll also need to visit those
     sub-types in ``RecursiveASTVisitor`` and ``DataRecursiveASTVisitor``.
   * Add printing support (``StmtPrinter.cpp``) for your expression.
   * Add profiling support (``StmtProfile.cpp``) for your AST node, noting the
     distinguishing (non-source location) characteristics of an instance of
     your expression.  Omitting this step will lead to hard-to-diagnose
     failures regarding matching of template declarations.
   * Add serialization support (``ASTReaderStmt.cpp``, ``ASTWriterStmt.cpp``)
     for your AST node.

#. Teach semantic analysis to build your AST node.  At this point, you can wire
   up your ``Sema::BuildXXX`` function to actually create your AST.  A few
   things to check at this point:

   * If your expression can construct a new C++ class or return a new
     Objective-C object, be sure to update and then call
     ``Sema::MaybeBindToTemporary`` for your just-created AST node to be sure
     that the object gets properly destructed.  An easy way to test this is to
     return a C++ class with a private destructor: semantic analysis should
     flag an error here with the attempt to call the destructor.
   * Inspect the generated AST by printing it using ``clang -cc1 -ast-print``,
     to make sure you're capturing all of the important information about how
     the AST was written.
   * Inspect the generated AST under ``clang -cc1 -ast-dump`` to verify that
     all of the types in the generated AST line up the way you want them.
     Remember that clients of the AST should never have to "think" to
     understand what's going on.  For example, all implicit conversions should
     show up explicitly in the AST.
   * Write tests that use your expression as a subexpression of other,
     well-known expressions.  Can you call a function using your expression as
     an argument?  Can you use the ternary operator?

#. Teach code generation to create IR for your AST node.  This step is the
   first (and only) one that requires knowledge of LLVM IR.  There are several
   things to keep in mind:

   * Code generation is separated into scalar/aggregate/complex and
     lvalue/rvalue paths, depending on what kind of result your expression
     produces.  On occasion, this requires some careful factoring of code to
     avoid duplication.
   * ``CodeGenFunction`` contains functions ``ConvertType`` and
     ``ConvertTypeForMem`` that convert Clang's types (``clang::Type*`` or
     ``clang::QualType``) to LLVM types.  Use the former for values, and the
     latter for memory locations: test with the C++ "``bool``" type to check
     this.  If you find that you are having to use LLVM bitcasts to make the
     subexpressions of your expression have the type that your expression
     expects, STOP!  Go fix semantic analysis and the AST so that you don't
     need these bitcasts.
   * The ``CodeGenFunction`` class has a number of helper functions to make
     certain operations easy, such as generating code to produce an lvalue or
     an rvalue, or to initialize a memory location with a given value.  Prefer
     to use these functions rather than directly writing loads and stores,
     because these functions take care of some of the tricky details for you
     (e.g., for exceptions).
   * If your expression requires some special behavior in the event of an
     exception, look at the ``push*Cleanup`` functions in ``CodeGenFunction``
     to introduce a cleanup.  You shouldn't have to deal with
     exception-handling directly.
   * Testing is extremely important in IR generation.  Use ``clang -cc1
     -emit-llvm`` and `FileCheck
     <http://llvm.org/docs/CommandGuide/FileCheck.html>`_ to verify that you're
     generating the right IR.

#. Teach template instantiation how to cope with your AST node, which requires
   some fairly simple code:

   * Make sure that your expression's constructor properly computes the flags
     for type dependence (i.e., the type your expression produces can change
     from one instantiation to the next), value dependence (i.e., the constant
     value your expression produces can change from one instantiation to the
     next), instantiation dependence (i.e., a template parameter occurs
     anywhere in your expression), and whether your expression contains a
     parameter pack (for variadic templates).  Often, computing these flags
     just means combining the results from the various types and
     subexpressions.
   * Add ``TransformXXX`` and ``RebuildXXX`` functions to the ``TreeTransform``
     class template in ``Sema``.  ``TransformXXX`` should (recursively)
     transform all of the subexpressions and types within your expression,
     using ``getDerived().TransformYYY``.  If all of the subexpressions and
     types transform without error, it will then call the ``RebuildXXX``
     function, which will in turn call ``getSema().BuildXXX`` to perform
     semantic analysis and build your expression.
   * To test template instantiation, take those tests you wrote to make sure
     that you were type checking with type-dependent expressions and dependent
     types (from step #2) and instantiate those templates with various types,
     some of which type-check and some that don't, and test the error messages
     in each case.

#. There are some "extras" that make other features work better.  It's worth
   handling these extras to give your expression complete integration into
   Clang:

   * Add code completion support for your expression in
     ``SemaCodeComplete.cpp``.
   * If your expression has types in it, or has any "interesting" features
     other than subexpressions, extend libclang's ``CursorVisitor`` to provide
     proper visitation for your expression, enabling various IDE features such
     as syntax highlighting, cross-referencing, and so on.  The
     ``c-index-test`` helper program can be used to test these features.
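
The division of labor between ``ActOnXXX``, ``BuildXXX`` and ``TransformXXX``
described in steps 2 and 6 can be pictured with a toy model.  This is a sketch
under heavy simplification: ``Ty``, the three-field ``Expr``, the ``Negate``
expression and the ``std::unique_ptr`` used as ``ExprResult`` are all
hypothetical, and none of the signatures match Clang's.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>

  // Toy model only: these imitate the shape of the pattern, not Clang's API.
  enum class Ty { Int, Pointer, Dependent };

  struct Expr {
    Ty Type;
    std::string Text;
    bool isTypeDependent() const { return Type == Ty::Dependent; }
  };

  // A null result plays the role of ExprError() in this model.
  using ExprResult = std::unique_ptr<Expr>;

  // Build: performs the semantic checks and creates the node.
  ExprResult BuildNegate(const Expr &Operand) {
    if (Operand.isTypeDependent()) // inside a template: defer type checking
      return std::make_unique<Expr>(Expr{Ty::Dependent, "-" + Operand.Text});
    if (Operand.Type != Ty::Int)   // a real Sema would emit a diagnostic here
      return nullptr;
    return std::make_unique<Expr>(Expr{Ty::Int, "-" + Operand.Text});
  }

  // ActOn: called by the parser; translates parser data and delegates.
  ExprResult ActOnNegate(const std::string &Spelling, Ty ParsedType) {
    return BuildNegate(Expr{ParsedType, Spelling});
  }

  // Transform: substitutes the operand's type, then rebuilds through Build,
  // so the checks that were deferred for the dependent operand run now.
  ExprResult TransformNegate(const Expr &Dependent, Ty Substituted) {
    assert(Dependent.isTypeDependent() && "only dependent nodes are rebuilt");
    std::string OperandText = Dependent.Text.substr(1); // drop the leading '-'
    return BuildNegate(Expr{Substituted, OperandText});
  }

Because instantiation goes through ``BuildNegate`` rather than ``ActOnNegate``,
an operand that only becomes invalid after substitution is rejected by the
same checks that reject it in non-template code.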