============================
"Clang" CFE Internals Manual
============================

.. contents::
   :local:

Introduction
============

This document describes some of the more important APIs and internal design
decisions made in the Clang C front-end. The purpose of this document is to
both capture some of this high level information and also describe some of the
design decisions behind it. This is meant for people interested in hacking on
Clang, not for end-users. The description below is categorized by libraries,
and does not describe any of the clients of the libraries.

LLVM Support Library
====================

The LLVM ``libSupport`` library provides many underlying libraries and
`data-structures <http://llvm.org/docs/ProgrammersManual.html>`_, including
command line option processing, various containers and a system abstraction
layer, which is used for file system access.

The Clang "Basic" Library
=========================

This library certainly needs a better name.
The "basic" library contains a
number of low-level utilities for tracking and manipulating source buffers,
locations within the source buffers, diagnostics, tokens, target abstraction,
and information about the subset of the language being compiled for.

Part of this infrastructure is specific to C (such as the ``TargetInfo``
class), while other parts could be reused for other non-C-based languages
(``SourceLocation``, ``SourceManager``, ``Diagnostics``, ``FileManager``).
When and if there is future demand we can figure out if it makes sense to
introduce a new library, move the general classes somewhere else, or introduce
some other solution.

We describe the roles of these classes in order of their dependencies.

The Diagnostics Subsystem
-------------------------

The Clang Diagnostics subsystem is an important part of how the compiler
communicates with the human. Diagnostics are the warnings and errors produced
when the code is incorrect or dubious. In Clang, each diagnostic produced has
(at the minimum) a unique ID, an English translation associated with it, a
:ref:`SourceLocation <SourceLocation>` to "put the caret", and a severity
(e.g., ``WARNING`` or ``ERROR``). They can also optionally include a number of
arguments to the diagnostic (which fill in "%0"'s in the string) as well as a
number of source ranges that relate to the diagnostic.

In this section, we'll be giving examples produced by the Clang command line
driver, but diagnostics can be :ref:`rendered in many different ways
<DiagnosticClient>` depending on how the ``DiagnosticClient`` interface is
implemented. A representative example of a diagnostic is:

.. code-block:: text

  t.c:38:15: error: invalid operands to binary expression ('int *' and '_Complex float')
  P = (P-42) + Gamma*4;
      ~~~~~~ ^ ~~~~~~~

In this example, you can see the English translation, the severity (error), the
source location (the caret ("``^``") and file/line/column info), the source
ranges "``~~~~``", and the arguments to the diagnostic ("``int *``" and
"``_Complex float``"). You'll have to believe me that there is a unique ID
backing the diagnostic :).

Getting all of this to happen has several steps and involves many moving
pieces. This section describes them and talks about best practices when adding
a new diagnostic.

The ``Diagnostic*Kinds.td`` files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Diagnostics are created by adding an entry to one of the
``clang/Basic/Diagnostic*Kinds.td`` files, depending on what library will be
using it.
From this file, :program:`tblgen` generates the unique ID of the
diagnostic, the severity of the diagnostic and the English translation + format
string.

There is little sanity with the naming of the unique IDs right now. Some
start with ``err_``, ``warn_``, ``ext_`` to encode the severity into the name.
Since the enum is referenced in the C++ code that produces the diagnostic, it
is somewhat useful for it to be reasonably short.

The severity of the diagnostic comes from the set {``NOTE``, ``REMARK``,
``WARNING``, ``EXTENSION``, ``EXTWARN``, ``ERROR``}. The ``ERROR`` severity is
used for diagnostics indicating the program is never acceptable under any
circumstances. When an error is emitted, the AST for the input code may not be
fully built. The ``EXTENSION`` and ``EXTWARN`` severities are used for
extensions to the language that Clang accepts. This means that Clang fully
understands and can represent them in the AST, but we produce diagnostics to
tell the user their code is non-portable. The difference is that the former are
ignored by default, and the latter warn by default. The ``WARNING`` severity is
used for constructs that are valid in the currently selected source language
but that are dubious in some way. The ``REMARK`` severity provides generic
information about the compilation that is not necessarily related to any
dubious code. The ``NOTE`` level is used to staple more information onto
previous diagnostics.

These *severities* are mapped into a smaller set (the ``Diagnostic::Level``
enum, {``Ignored``, ``Note``, ``Remark``, ``Warning``, ``Error``, ``Fatal``}) of
output *levels* by the diagnostics subsystem based on various configuration
options. Clang internally supports a fully fine-grained mapping mechanism that
allows you to map almost any diagnostic to the output level that you want. The
only diagnostics that cannot be mapped are ``NOTE``\ s, which always follow the
severity of the previously emitted diagnostic, and ``ERROR``\ s, which can only
be mapped to ``Fatal`` (it is not possible to turn an error into a warning, for
example).

Diagnostic mappings are used in many ways. For example, if the user specifies
``-pedantic``, ``EXTENSION`` maps to ``Warning``; if they specify
``-pedantic-errors``, it turns into ``Error``. This is used to implement
options like ``-Wunused-macros``, ``-Wundef`` etc.

Mapping to ``Fatal`` should only be used for diagnostics that are considered so
severe that error recovery won't be able to recover sensibly from them (thus
spewing a ton of bogus errors). One example of this class of error is failure
to ``#include`` a file.

The Format String
^^^^^^^^^^^^^^^^^

The format string for the diagnostic is very simple, but it has some power.
It
takes the form of a string in English with markers that indicate where and how
arguments to the diagnostic are inserted and formatted. For example, here are
some simple format strings:

.. code-block:: c++

  "binary integer literals are an extension"
  "format string contains '\\0' within the string body"
  "more '%%' conversions than data arguments"
  "invalid operands to binary expression (%0 and %1)"
  "overloaded '%0' must be a %select{unary|binary|unary or binary}2 operator"
       " (has %1 parameter%s1)"

These examples show some important points of format strings. You can use any
plain ASCII character in the diagnostic string except "``%``" without a
problem, but these are C strings, so you have to use and be aware of all the C
escape sequences (as in the second example). If you want to produce a "``%``"
in the output, use the "``%%``" escape sequence, like the third diagnostic.
Finally, Clang uses the "``%...[digit]``" sequences to specify where and how
arguments to the diagnostic are formatted.

Arguments to the diagnostic are numbered according to how they are specified by
the C++ code that :ref:`produces them <internals-producing-diag>`, and are
referenced by ``%0`` .. ``%9``. If you have more than 10 arguments to your
diagnostic, you are doing something wrong :).
Unlike ``printf``, there is no
requirement that arguments to the diagnostic end up in the output in the same
order as they are specified; you could have a format string with "``%1 %0``"
that swaps them, for example. The text in between the percent and digit are
formatting instructions. If there are no instructions, the argument is just
turned into a string and substituted in.

Here are some "best practices" for writing the English format string:

* Keep the string short. It should ideally fit in the 80 column limit of the
  ``DiagnosticKinds.td`` file. This avoids the diagnostic wrapping when
  printed, and forces you to think about the important point you are conveying
  with the diagnostic.
* Take advantage of location information. The user will be able to see the
  line and location of the caret, so you don't need to tell them that the
  problem is with the 4th argument to the function: just point to it.
* Do not capitalize the diagnostic string, and do not end it with a period.
* If you need to quote something in the diagnostic string, use single quotes.

Diagnostics should never take random English strings as arguments: you
shouldn't use "``you have a problem with %0``" and pass in things like
"``your argument``" or "``your return value``" as arguments.
Doing this prevents
:ref:`translating <internals-diag-translation>` the Clang diagnostics to other
languages (because they'll get random English words in their otherwise
localized diagnostic). The exceptions to this are C/C++ language keywords
(e.g., ``auto``, ``const``, ``mutable``, etc.) and C/C++ operators (``/=``).
Note that things like "pointer" and "reference" are not keywords. On the other
hand, you *can* include anything that comes from the user's source code,
including variable names, types, labels, etc. The "``select``" format can be
used to achieve this sort of thing in a localizable way, see below.

Formatting a Diagnostic Argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Arguments to diagnostics are fully typed internally, and come from a couple of
different classes: integers, types, names, and random strings. Depending on
the class of the argument, it can be optionally formatted in different ways.
This gives the ``DiagnosticClient`` information about what the argument means
without requiring it to use a specific presentation (consider this MVC for
Clang :).

Here are the different diagnostic argument formats currently supported by
Clang:

**"s" format**

Example:
  ``"requires %1 parameter%s1"``
Class:
  Integers
Description:
  This is a simple formatter for integers that is useful when producing English
  diagnostics. When the integer is 1, it prints as nothing. When the integer
  is not 1, it prints as "``s``". This allows some simple grammatical forms to
  be handled correctly, and eliminates the need to use gross things like
  ``"requires %1 parameter(s)"``.

**"select" format**

Example:
  ``"must be a %select{unary|binary|unary or binary}2 operator"``
Class:
  Integers
Description:
  This format specifier is used to merge multiple related diagnostics together
  into one common one, without requiring the difference to be specified as an
  English string argument. Instead of specifying the string, the diagnostic
  gets an integer argument and the format string selects the numbered option.
  In this case, the "``%2``" value must be an integer in the range [0..2]. If
  it is 0, it prints "unary"; if it is 1, it prints "binary"; if it is 2, it
  prints "unary or binary".
  This allows other language translations to
  substitute reasonable words (or entire phrases) based on the semantics of the
  diagnostic instead of having to do things textually. The selected string
  does undergo formatting.

**"plural" format**

Example:
  ``"you have %1 %plural{1:mouse|:mice}1 connected to your computer"``
Class:
  Integers
Description:
  This is a formatter for complex plural forms. It is designed to handle even
  the requirements of languages with very complex plural forms, as many Baltic
  languages have. The argument consists of a series of expression/form pairs
  separated by "|", with each expression separated from its form by ":". The
  first form whose expression evaluates to true is the result of the modifier.

  An expression can be empty, in which case it is always true. See the example
  at the top. Otherwise, it is a series of one or more numeric conditions,
  separated by ",". If any condition matches, the expression matches. Each
  numeric condition can take one of three forms.

  * number: A simple decimal number matches if the argument is the same as the
    number. Example: ``"%plural{1:mouse|:mice}4"``
  * range: A range in square brackets matches if the argument is within the
    range. The range is inclusive on both ends.
    Example:
    ``"%plural{0:none|1:one|[2,5]:some|:many}2"``
  * modulo: A modulo operator is followed by a number, an equals sign and
    either a number or a range. The tests are the same as for plain numbers
    and ranges, but the argument is taken modulo the number first. Example:
    ``"%plural{%100=0:even hundred|%100=[1,50]:lower half|:everything else}1"``

  The parser is very unforgiving. A syntax error, even whitespace, will abort,
  as will a failure to match the argument against any expression.

**"ordinal" format**

Example:
  ``"ambiguity in %ordinal0 argument"``
Class:
  Integers
Description:
  This is a formatter which represents the argument number as an ordinal: the
  value ``1`` becomes ``1st``, ``3`` becomes ``3rd``, and so on. Values less
  than ``1`` are not supported. This formatter is currently hard-coded to use
  English ordinals.

**"objcclass" format**

Example:
  ``"method %objcclass0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C class method selector. As such, it prints the selector
  with a leading "``+``".

**"objcinstance" format**

Example:
  ``"method %objcinstance0 not found"``
Class:
  ``DeclarationName``
Description:
  This is a simple formatter that indicates the ``DeclarationName`` corresponds
  to an Objective-C instance method selector. As such, it prints the selector
  with a leading "``-``".

**"q" format**

Example:
  ``"candidate found by name lookup is %q0"``
Class:
  ``NamedDecl *``
Description:
  This formatter indicates that the fully-qualified name of the declaration
  should be printed, e.g., "``std::vector``" rather than "``vector``".

**"diff" format**

Example:
  ``"no known conversion %diff{from $ to $|from argument type to parameter type}1,2"``
Class:
  ``QualType``
Description:
  This formatter takes two ``QualType``\ s and attempts to print a template
  difference between the two. If tree printing is off, the text inside the
  braces before the pipe is printed, with the formatted text replacing the $.
  If tree printing is on, the text after the pipe is printed and a type tree is
  printed after the diagnostic message.
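
To make the substitution rules concrete, here is a minimal, self-contained
sketch of how "``%%``", "``%N``", "``%sN``" and "``%select{...}N``" could be
expanded. It is not Clang's implementation: ``DiagArg`` and ``formatDiag`` are
hypothetical names invented for this illustration, arguments are modelled as
either an integer or a piece of text, and the other formats described above
(``plural``, ``ordinal``, ``q``, ``diff`` and so on) as well as nested braces
are deliberately left out.

.. code-block:: c++

  #include <cctype>
  #include <cstddef>
  #include <stdexcept>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical argument model: an integer or a piece of text.
  struct DiagArg {
    bool IsInt;
    long Int;
    std::string Text;
    static DiagArg integer(long V) { return {true, V, std::to_string(V)}; }
    static DiagArg text(std::string S) { return {false, 0, std::move(S)}; }
  };

  // Expands "%%", "%N", "%sN" and "%select{a|b|...}N" in Fmt.
  std::string formatDiag(const std::string &Fmt,
                         const std::vector<DiagArg> &Args) {
    std::string Out;
    for (std::size_t I = 0; I < Fmt.size(); ++I) {
      if (Fmt[I] != '%') {
        Out += Fmt[I];
        continue;
      }
      ++I;
      if (I < Fmt.size() && Fmt[I] == '%') { // "%%" prints a single '%'
        Out += '%';
        continue;
      }
      std::string Mod; // formatting instructions between '%' and the digit
      while (I < Fmt.size() && std::isalpha(static_cast<unsigned char>(Fmt[I])))
        Mod += Fmt[I++];
      std::vector<std::string> Choices; // the a|b|c list, if present
      if (I < Fmt.size() && Fmt[I] == '{') {
        std::size_t Close = Fmt.find('}', I);
        if (Close == std::string::npos)
          throw std::invalid_argument("unterminated '{'");
        Choices.emplace_back();
        for (++I; I < Close; ++I) {
          if (Fmt[I] == '|')
            Choices.emplace_back();
          else
            Choices.back() += Fmt[I];
        }
        I = Close + 1;
      }
      if (I >= Fmt.size() || !std::isdigit(static_cast<unsigned char>(Fmt[I])))
        throw std::invalid_argument("missing argument index");
      const DiagArg &A = Args.at(Fmt[I] - '0'); // throws if out of range
      if (Mod.empty()) {
        Out += A.Text; // no instructions: substitute the argument as a string
      } else if (Mod == "s" && A.IsInt) {
        if (A.Int != 1)
          Out += 's';
      } else if (Mod == "select" && A.IsInt) {
        // The selected string is itself formatted.
        Out += formatDiag(Choices.at(A.Int), Args);
      } else {
        throw std::invalid_argument("unsupported format: " + Mod);
      }
    }
    return Out;
  }

With arguments ``DiagArg::text("operator+")``, ``DiagArg::integer(3)`` and
``DiagArg::integer(1)``, the two-part "overloaded" format string shown earlier
comes out as ``overloaded 'operator+' must be a binary operator (has 3
parameters)``.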

It is really easy to add format specifiers to the Clang diagnostics system, but
they should be discussed before they are added. If you are creating a lot of
repetitive diagnostics and/or have an idea for a useful formatter, please bring
it up on the cfe-dev mailing list.

.. _internals-producing-diag:

Producing the Diagnostic
^^^^^^^^^^^^^^^^^^^^^^^^

Now that you've created the diagnostic in the ``Diagnostic*Kinds.td`` file, you
need to write the code that detects the condition in question and emits the new
diagnostic. Various components of Clang (e.g., the preprocessor, ``Sema``,
etc.) provide a helper function named "``Diag``". It creates a diagnostic and
accepts the arguments, ranges, and other information that goes along with it.

For example, the binary expression error comes from code like this:

.. code-block:: c++

  if (various things that are bad)
    Diag(Loc, diag::err_typecheck_invalid_operands)
      << lex->getType() << rex->getType()
      << lex->getSourceRange() << rex->getSourceRange();

This shows the use of the ``Diag`` method: it takes a location (a
:ref:`SourceLocation <SourceLocation>` object) and a diagnostic enum value
(which matches the name from ``Diagnostic*Kinds.td``).
If the diagnostic takes
arguments, they are specified with the ``<<`` operator: the first argument
becomes ``%0``, the second becomes ``%1``, etc. The diagnostic interface
allows you to specify arguments of many different types, including ``int`` and
``unsigned`` for integer arguments, ``const char*`` and ``std::string`` for
string arguments, ``DeclarationName`` and ``const IdentifierInfo *`` for names,
``QualType`` for types, etc. ``SourceRange``\ s are also specified with the
``<<`` operator, but do not have a specific ordering requirement.

As you can see, adding and producing a diagnostic is pretty straightforward.
The hard part is deciding exactly what you need to say to help the user,
picking a suitable wording, and providing the information needed to format it
correctly. The good news is that the call site that issues a diagnostic should
be completely independent of how the diagnostic is formatted and in what
language it is rendered.

Fix-It Hints
^^^^^^^^^^^^

In some cases, the front end emits diagnostics when it is clear that some small
change to the source code would fix the problem. Examples include a missing
semicolon at the end of a statement or a use of deprecated syntax that is
easily rewritten into a more modern form. Clang tries very hard to emit the
diagnostic and recover gracefully in these and other cases.

However, for these cases where the fix is obvious, the diagnostic can be
annotated with a hint (referred to as a "fix-it hint") that describes how to
change the code referenced by the diagnostic to fix the problem. For example,
it might add the missing semicolon at the end of the statement or rewrite the
use of a deprecated construct into something more palatable. Here is one such
example from the C++ front end, where we warn about the right-shift operator
changing meaning from C++98 to C++11:

.. code-block:: text

  test.cpp:3:7: warning: use of right-shift operator ('>>') in template argument
                         will require parentheses in C++11
    A<100 >> 2> *a;
          ^
      (       )

Here, the fix-it hint is suggesting that parentheses be added, and showing
exactly where those parentheses would be inserted into the source code. The
fix-it hints themselves describe what changes to make to the source code in an
abstract manner, which the text diagnostic printer renders as a line of
"insertions" below the caret line. :ref:`Other diagnostic clients
<DiagnosticClient>` might choose to render the code differently (e.g., as
markup inline) or even give the user the ability to automatically fix the
problem.
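
As a rough illustration of what "describing changes in an abstract manner"
buys a client, the sketch below applies a list of insertions to a line of
source text. It is a toy model, not Clang code: ``InsertionHint`` and
``applyInsertions`` are hypothetical names, and offsets are plain byte offsets
into one string rather than source locations.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for an insertion fix-it: "insert Code before Offset".
  struct InsertionHint {
    std::size_t Offset; // byte offset into the source line
    std::string Code;   // text to insert before that offset
  };

  // Applies all hints to Line and returns the edited copy.
  std::string applyInsertions(std::string Line,
                              std::vector<InsertionHint> Hints) {
    // Work from the largest offset down so that earlier offsets stay valid.
    std::stable_sort(Hints.begin(), Hints.end(),
                     [](const InsertionHint &A, const InsertionHint &B) {
                       return A.Offset > B.Offset;
                     });
    for (const InsertionHint &H : Hints)
      Line.insert(H.Offset, H.Code); // throws std::out_of_range on a bad offset
    return Line;
  }

Applied to ``A<100 >> 2> *a;`` with a "``(``" at offset 2 and a "``)``" at
offset 10, this yields ``A<(100 >> 2)> *a;``, which is the edit the text
printer draws beneath the caret line in the example above.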

Fix-it hints on errors and warnings need to obey these rules:

* Since they are automatically applied if ``-Xclang -fixit`` is passed to the
  driver, they should only be used when it's very likely they match the user's
  intent.
* Clang must recover from errors as if the fix-it had been applied.

If a fix-it can't obey these rules, put the fix-it on a note. Fix-its on notes
are not applied automatically.

All fix-it hints are described by the ``FixItHint`` class, instances of which
should be attached to the diagnostic using the ``<<`` operator in the same way
that highlighted source ranges and arguments are passed to the diagnostic.
Fix-it hints can be created with one of three constructors:

* ``FixItHint::CreateInsertion(Loc, Code)``

    Specifies that the given ``Code`` (a string) should be inserted before the
    source location ``Loc``.

* ``FixItHint::CreateRemoval(Range)``

    Specifies that the code in the given source ``Range`` should be removed.

* ``FixItHint::CreateReplacement(Range, Code)``

    Specifies that the code in the given source ``Range`` should be removed,
    and replaced with the given ``Code`` string.

.. _DiagnosticClient:

The ``DiagnosticClient`` Interface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once code generates a diagnostic with all of the arguments and the rest of the
relevant information, Clang needs to know what to do with it. As previously
mentioned, the diagnostic machinery goes through some filtering to map a
severity onto a diagnostic level, then (assuming the diagnostic is not mapped
to "``Ignored``") it invokes an object that implements the ``DiagnosticClient``
interface with the information.

It is possible to implement this interface in many different ways. For
example, the normal Clang ``DiagnosticClient`` (named
``TextDiagnosticPrinter``) turns the arguments into strings (according to the
various formatting rules), prints out the file/line/column information and the
string, then prints out the line of code, the source ranges, and the caret.
However, this behavior isn't required.

Another implementation of the ``DiagnosticClient`` interface is the
``TextDiagnosticBuffer`` class, which is used when Clang is in ``-verify``
mode. Instead of formatting and printing out the diagnostics, this
implementation just captures and remembers the diagnostics as they fly by.
Then ``-verify`` compares the list of produced diagnostics to the list of
expected ones. If they disagree, it prints out its own output.
Full
documentation for the ``-verify`` mode can be found in the Clang API
documentation for `VerifyDiagnosticConsumer
</doxygen/classclang_1_1VerifyDiagnosticConsumer.html#details>`_.

There are many other possible implementations of this interface, and this is
why we prefer diagnostics to pass down rich structured information in
arguments. For example, an HTML output might want declaration names to be
linkified to where they come from in the source. Another example is that a GUI
might let you click on typedefs to expand them. This application would want to
pass significantly more information about types through to the GUI than a
simple flat string. The interface allows this to happen.

.. _internals-diag-translation:

Adding Translations to Clang
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not possible yet! Diagnostic strings should be written in UTF-8; the client can
translate to the relevant code page if needed. Each translation completely
replaces the format string for the diagnostic.

.. _SourceLocation:
.. _SourceManager:

The ``SourceLocation`` and ``SourceManager`` classes
----------------------------------------------------

Strangely enough, the ``SourceLocation`` class represents a location within the
source code of the program.
Important design points include:

#. ``sizeof(SourceLocation)`` must be extremely small, as these are embedded
   into many AST nodes and are passed around often. Currently it is 32 bits.
#. ``SourceLocation`` must be a simple value object that can be efficiently
   copied.
#. We should be able to represent a source location for any byte of any input
   file. This includes in the middle of tokens, in whitespace, in trigraphs,
   etc.
#. A ``SourceLocation`` must encode the current ``#include`` stack that was
   active when the location was processed. For example, if the location
   corresponds to a token, it should contain the set of ``#include``\ s active
   when the token was lexed. This allows us to print the ``#include`` stack
   for a diagnostic.
#. ``SourceLocation`` must be able to describe macro expansions, capturing both
   the ultimate instantiation point and the source of the original character
   data.

In practice, the ``SourceLocation`` works together with the ``SourceManager``
class to encode two pieces of information about a location: its spelling
location and its instantiation location. For most tokens, these will be the
same.
However, for a macro expansion (or tokens that came from a ``_Pragma``
directive) these will describe the location of the characters corresponding to
the token and the location where the token was used (i.e., the macro
instantiation point or the location of the ``_Pragma`` itself).

The Clang front-end inherently depends on the location of a token being tracked
correctly. If it is ever incorrect, the front-end may get confused and die.
The reason for this is that the notion of the "spelling" of a ``Token`` in
Clang depends on being able to find the original input characters for the
token. This concept maps directly to the "spelling location" for the token.

``SourceRange`` and ``CharSourceRange``
---------------------------------------

.. mostly taken from http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010595.html

Clang represents most source ranges by [first, last], where "first" and "last"
each point to the beginning of their respective tokens. For example, consider
the ``SourceRange`` of the following statement:

.. code-block:: c++

  x = foo + bar;
  ^first    ^last

To map from this representation to a character-based representation, the "last"
location needs to be adjusted to point to (or past) the end of that token with
either ``Lexer::MeasureTokenLength()`` or ``Lexer::getLocForEndOfToken()``. For
the rare cases where character-level source range information is needed we use
the ``CharSourceRange`` class.

The Driver Library
==================

The clang Driver and library are documented :doc:`here <DriverInternals>`.

Precompiled Headers
===================

Clang supports two implementations of precompiled headers. The default
implementation, precompiled headers (:doc:`PCH <PCHInternals>`) uses a
serialized representation of Clang's internal data structures, encoded with the
`LLVM bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
Pretokenized headers (:doc:`PTH <PTHInternals>`), on the other hand, contain a
serialized representation of the tokens encountered when preprocessing a header
(and anything that header includes).

The Frontend Library
====================

The Frontend library contains functionality useful for building tools on top of
the Clang libraries, for example several methods for outputting diagnostics.

The Lexer and Preprocessor Library
==================================

The Lexer library contains several tightly-connected classes that are involved
with the nasty process of lexing and preprocessing C source code. The main
interface to this library for outside clients is the large ``Preprocessor``
class. It contains the various pieces of state that are required to coherently
read tokens out of a translation unit.

The core interface to the ``Preprocessor`` object (once it is set up) is the
``Preprocessor::Lex`` method, which returns the next :ref:`Token <Token>` from
the preprocessor stream. There are two types of token providers that the
preprocessor is capable of reading from: a buffer lexer (provided by the
:ref:`Lexer <Lexer>` class) and a buffered token stream (provided by the
:ref:`TokenLexer <TokenLexer>` class).

.. _Token:

The Token class
---------------

The ``Token`` class is used to represent a single lexed token.
Tokens are intended to be used by the lexer/preprocessor and parser libraries,
but are not intended to live beyond them (for example, they should not live in
the ASTs).

Tokens most often live on the stack (or some other location that is efficient
to access) as the parser is running, but occasionally do get buffered up. For
example, macro definitions are stored as a series of tokens, and the C++
front-end periodically needs to buffer tokens up for tentative parsing and
various pieces of look-ahead. As such, the size of a ``Token`` matters. On a
32-bit system, ``sizeof(Token)`` is currently 16 bytes.

Tokens occur in two forms: :ref:`annotation tokens <AnnotationToken>` and
normal tokens. Normal tokens are those returned by the lexer; annotation
tokens represent semantic information and are produced by the parser, replacing
normal tokens in the token stream. Normal tokens contain the following
information:

* **A SourceLocation** --- This indicates the location of the start of the
  token.

* **A length** --- This stores the length of the token as stored in the
  ``SourceBuffer``. For tokens that include them, this length includes
  trigraphs and escaped newlines which are ignored by later phases of the
  compiler. By pointing into the original source buffer, it is always possible
  to get the original spelling of a token completely accurately.

* **IdentifierInfo** --- If a token takes the form of an identifier, and if
  identifier lookup was enabled when the token was lexed (e.g., the lexer was
  not reading in "raw" mode) this contains a pointer to the unique hash value
  for the identifier. Because the lookup happens before keyword
  identification, this field is set even for language keywords like "``for``".

* **TokenKind** --- This indicates the kind of token as classified by the
  lexer. This includes things like ``tok::starequal`` (for the "``*=``"
  operator), ``tok::ampamp`` for the "``&&``" token, and keyword values (e.g.,
  ``tok::kw_for``) for identifiers that correspond to keywords. Note that
  some tokens can be spelled multiple ways. For example, C++ supports
  "operator keywords", where things like "``and``" are treated exactly like the
  "``&&``" operator. In these cases, the kind value is set to ``tok::ampamp``,
  which is good for the parser, which doesn't have to consider both forms. For
  something that cares about which form is used (e.g., the preprocessor
  "stringize" operator) the spelling indicates the original form.

* **Flags** --- There are currently four flags tracked by the
  lexer/preprocessor system on a per-token basis:

  #. **StartOfLine** --- This was the first token that occurred on its input
     source line.
  #. **LeadingSpace** --- There was a space character either immediately before
     the token or transitively before the token as it was expanded through a
     macro. The definition of this flag is closely tied to the stringizing
     requirements of the preprocessor.
  #. **DisableExpand** --- This flag is used internally to the preprocessor to
     represent identifier tokens which have macro expansion disabled. This
     prevents them from being considered as candidates for macro expansion ever
     in the future.
  #. **NeedsCleaning** --- This flag is set if the original spelling for the
     token includes a trigraph or escaped newline. Since this is uncommon,
     many pieces of code can fast-path on tokens that did not need cleaning.

One interesting (and somewhat unusual) aspect of normal tokens is that they
don't contain any semantic information about the lexed value. For example, if
the token was a pp-number token, we do not represent the value of the number
that was lexed (this is left for later pieces of code to decide).
Additionally, the lexer library has no notion of typedef names vs variable
names: both are returned as identifiers, and the parser is left to decide
whether a specific identifier is a typedef or a variable (tracking this
requires scope information among other things). The parser can do this
translation by replacing tokens returned by the preprocessor with "Annotation
Tokens".

.. _AnnotationToken:

Annotation Tokens
-----------------

Annotation tokens are tokens that are synthesized by the parser and injected
into the preprocessor's token stream (replacing existing tokens) to record
semantic information found by the parser. For example, if "``foo``" is found
to be a typedef, the "``foo``" ``tok::identifier`` token is replaced with an
``tok::annot_typename``. This is useful for a couple of reasons: 1) this makes
it easy to handle qualified type names (e.g., "``foo::bar::baz<42>::t``") in
C++ as a single "token" in the parser. 2) if the parser backtracks, the
reparse does not need to redo semantic analysis to determine whether a token
sequence is a variable, type, template, etc.

Annotation tokens are created by the parser and reinjected into the parser's
token stream (when backtracking is enabled). Because they can only exist in
tokens that the preprocessor-proper is done with, an annotation token doesn't
need to keep around flags like "start of line" that the preprocessor uses to do
its job. Additionally, an annotation token may "cover" a sequence of
preprocessor tokens (e.g., "``a::b::c``" is five preprocessor tokens).
As such, the valid fields of an annotation token are different than the fields
for a normal token (but they are multiplexed into the normal ``Token``
fields):

* **SourceLocation "Location"** --- The ``SourceLocation`` for the annotation
  token indicates the first token replaced by the annotation token. In the
  example above, it would be the location of the "``a``" identifier.
* **SourceLocation "AnnotationEndLoc"** --- This holds the location of the last
  token replaced with the annotation token. In the example above, it would be
  the location of the "``c``" identifier.
* **void* "AnnotationValue"** --- This contains an opaque object that the
  parser gets from ``Sema``. The parser merely preserves the information for
  ``Sema`` to later interpret based on the annotation token kind.
* **TokenKind "Kind"** --- This indicates the kind of Annotation token this is.
  See below for the different valid kinds.

Annotation tokens currently come in three kinds:

#. **tok::annot_typename**: This annotation token represents a resolved
   typename token that is potentially qualified. The ``AnnotationValue`` field
   contains the ``QualType`` returned by ``Sema::getTypeName()``, possibly with
   source location information attached.
#. **tok::annot_cxxscope**: This annotation token represents a C++ scope
   specifier, such as "``A::B::``".
   This corresponds to the grammar productions "*::*" and
   "*:: [opt] nested-name-specifier*". The ``AnnotationValue`` pointer is a
   ``NestedNameSpecifier *`` returned by the
   ``Sema::ActOnCXXGlobalScopeSpecifier`` and
   ``Sema::ActOnCXXNestedNameSpecifier`` callbacks.
#. **tok::annot_template_id**: This annotation token represents a C++
   template-id such as "``foo<int, 4>``", where "``foo``" is the name of a
   template. The ``AnnotationValue`` pointer is a pointer to a ``malloc``'d
   ``TemplateIdAnnotation`` object. Depending on the context, a parsed
   template-id that names a type might become a typename annotation token (if
   all we care about is the named type, e.g., because it occurs in a type
   specifier) or might remain a template-id token (if we want to retain more
   source location information or produce a new type, e.g., in a declaration of
   a class template specialization). template-id annotation tokens that refer
   to a type can be "upgraded" to typename annotation tokens by the parser.

As mentioned above, annotation tokens are not returned by the preprocessor,
they are formed on demand by the parser. This means that the parser has to be
aware of cases where an annotation could occur and form it where appropriate.
This is somewhat similar to how the parser handles Translation Phase 6 of C99:
String Concatenation (see C99 5.1.1.2).
In the case of string concatenation, the preprocessor just returns distinct
``tok::string_literal`` and ``tok::wide_string_literal`` tokens and the parser
eats a sequence of them wherever the grammar indicates that a string literal
can occur.

In order to do this, whenever the parser expects a ``tok::identifier`` or
``tok::coloncolon``, it should call the ``TryAnnotateTypeOrScopeToken`` or
``TryAnnotateCXXScopeToken`` methods to form the annotation token. These
methods will maximally form the specified annotation tokens and replace the
current token with them, if applicable. If the current token is not valid for
an annotation token, it will remain an identifier or "``::``" token.

.. _Lexer:

The ``Lexer`` class
-------------------

The ``Lexer`` class provides the mechanics of lexing tokens out of a source
buffer and deciding what they mean. The ``Lexer`` is complicated by the fact
that it operates on raw buffers that have not had spelling eliminated (this is
a necessity to get decent performance), but this is countered with careful
coding as well as standard performance techniques (for example, the comment
handling code is vectorized on X86 and PowerPC hosts).

The lexer has a couple of interesting modal features:

* The lexer can operate in "raw" mode.
  This mode has several features that make it possible to quickly lex the file
  (e.g., it stops identifier lookup, doesn't specially handle preprocessor
  tokens, handles EOF differently, etc).
  This mode is used for lexing within an "``#if 0``" block, for example.
* The lexer can capture and return comments as tokens. This is required to
  support the ``-C`` preprocessor mode, which passes comments through, and is
  used by the diagnostic checker to identify expect-error annotations.
* The lexer can be in ``ParsingFilename`` mode, which happens when
  preprocessing after reading a ``#include`` directive. This mode changes the
  parsing of "``<``" to return an "angled string" instead of a bunch of tokens
  for each thing within the filename.
* When parsing a preprocessor directive (after "``#``") the
  ``ParsingPreprocessorDirective`` mode is entered. This changes the parser to
  return EOD at a newline.
* The ``Lexer`` uses a ``LangOptions`` object to know whether trigraphs are
  enabled, whether C++ or ObjC keywords are recognized, etc.

In addition to these modes, the lexer keeps track of a couple of other features
that are local to a lexed buffer, which change as the buffer is lexed:

* The ``Lexer`` uses ``BufferPtr`` to keep track of the current character being
  lexed.
* The ``Lexer`` uses ``IsAtStartOfLine`` to keep track of whether the next
  lexed token will start with its "start of line" bit set.
* The ``Lexer`` keeps track of the current "``#if``" directives that are active
  (which can be nested).
* The ``Lexer`` keeps track of a :ref:`MultipleIncludeOpt
  <MultipleIncludeOpt>` object, which is used to detect whether the buffer uses
  the standard "``#ifndef XX`` / ``#define XX``" idiom to prevent multiple
  inclusion. If a buffer does, subsequent includes can be ignored if the
  "``XX``" macro is defined.

.. _TokenLexer:

The ``TokenLexer`` class
------------------------

The ``TokenLexer`` class is a token provider that returns tokens from a list of
tokens that came from somewhere else. It is typically used for two things: 1)
returning tokens from a macro definition as it is being expanded, and 2)
returning tokens from an arbitrary buffer of tokens. The latter is used by
``_Pragma`` and will most likely be used to handle unbounded look-ahead for the
C++ parser.

.. _MultipleIncludeOpt:

The ``MultipleIncludeOpt`` class
--------------------------------

The ``MultipleIncludeOpt`` class implements a really simple little state
machine that is used to detect the standard "``#ifndef XX`` / ``#define XX``"
idiom that people typically use to prevent multiple inclusion of headers.
If a buffer uses this idiom and is subsequently ``#include``'d, the
preprocessor can simply check to see whether the guarding condition is defined
or not. If so, the preprocessor can completely ignore the include of the
header.

.. _Parser:

The Parser Library
==================

This library contains a recursive-descent parser that polls tokens from the
preprocessor and notifies a client of the parsing progress.

Historically, the parser used to talk to an abstract ``Action`` interface that
had virtual methods for parse events, for example ``ActOnBinOp()``. When Clang
grew C++ support, the parser stopped supporting general ``Action`` clients --
it now always talks to the :ref:`Sema library <Sema>`. However, the Parser
still accesses AST objects only through opaque types like ``ExprResult`` and
``StmtResult``. Only :ref:`Sema <Sema>` looks at the AST node contents of these
wrappers.

.. _AST:

The AST Library
===============

.. _Type:

The ``Type`` class and its subclasses
-------------------------------------

The ``Type`` class (and its subclasses) are an important part of the AST.
Types are accessed through the ``ASTContext`` class, which implicitly creates
and uniques them as they are needed. Types have a couple of non-obvious
features: 1) they do not capture type qualifiers like ``const`` or ``volatile``
(see :ref:`QualType <QualType>`), and 2) they implicitly capture typedef
information. Once created, types are immutable (unlike decls).

Typedefs in C make semantic analysis a bit more complex than it would be without
them. The issue is that we want to capture typedef information and represent it
in the AST perfectly, but the semantics of operations need to "see through"
typedefs. For example, consider this code:

.. code-block:: c++

  void func() {
    typedef int foo;
    foo X, *Y;
    typedef foo *bar;
    bar Z;
    *X;   // error
    **Y;  // error
    **Z;  // error
  }

The code above is illegal, and thus we expect there to be diagnostics emitted
on the annotated lines. In this example, we expect to get:

.. code-block:: c++

  test.c:6:1: error: indirection requires pointer operand ('foo' invalid)
    *X;   // error
    ^~
  test.c:7:1: error: indirection requires pointer operand ('foo' invalid)
    **Y;  // error
    ^~~
  test.c:8:1: error: indirection requires pointer operand ('foo' invalid)
    **Z;  // error
    ^~~

While this example is somewhat silly, it illustrates the point: we want to
retain typedef information where possible, so that we can emit errors about
"``std::string``" instead of "``std::basic_string<char, std:...``". Doing this
requires properly keeping typedef information (for example, the type of ``X``
is "``foo``", not "``int``"), and requires properly propagating it through the
various operators (for example, the type of ``*Y`` is "``foo``", not
"``int``"). In order to retain this information, the type of these expressions
is an instance of the ``TypedefType`` class, which indicates that the type of
these expressions is a typedef for "``foo``".

Representing types like this is great for diagnostics, because the
user-specified type is always immediately available. There are two problems
with this: first, various semantic checks need to make judgements about the
*actual structure* of a type, ignoring typedefs.
Second, we need an efficient way to query whether two types are structurally
identical to each other, ignoring typedefs. The solution to both of these
problems is the idea of canonical types.

Canonical Types
^^^^^^^^^^^^^^^

Every instance of the ``Type`` class contains a canonical type pointer. For
simple types with no typedefs involved (e.g., "``int``", "``int*``",
"``int**``"), the type just points to itself. For types that have a typedef
somewhere in their structure (e.g., "``foo``", "``foo*``", "``foo**``",
"``bar``"), the canonical type pointer points to their structurally equivalent
type without any typedefs (e.g., "``int``", "``int*``", "``int**``", and
"``int*``" respectively).

This design provides a constant time operation (dereferencing the canonical type
pointer) that gives us access to the structure of types. For example, we can
trivially tell that "``bar``" and "``foo*``" are the same type by dereferencing
their canonical type pointers and doing a pointer comparison (they both point
to the single "``int*``" type).

Canonical types and typedef types bring up some complexities that must be
carefully managed. Specifically, the ``isa``/``cast``/``dyn_cast`` operators
generally shouldn't be used in code that is inspecting the AST.
For example, when type checking the indirection operator (unary "``*``" on a
pointer), the type checker must verify that the operand has a pointer type. It
would not be correct to check that with
"``isa<PointerType>(SubExpr->getType())``", because this predicate would fail
if the subexpression had a typedef type.

The solution to this problem is a set of helper methods on ``Type``, used to
check their properties. In this case, it would be correct to use
"``SubExpr->getType()->isPointerType()``" to do the check. This predicate will
return true if the *canonical type is a pointer*, which is true any time the
type is structurally a pointer type. The only hard part here is remembering
not to use the ``isa``/``cast``/``dyn_cast`` operations.

The second problem we face is how to get access to the pointer type once we
know it exists. To continue the example, the result type of the indirection
operator is the pointee type of the subexpression. In order to determine the
type, we need to get the instance of ``PointerType`` that best captures the
typedef information in the program. If the type of the expression is literally
a ``PointerType``, we can return that, otherwise we have to dig through the
typedefs to find the pointer type. For example, if the subexpression had type
"``foo*``", we could return that type as the result.
If the subexpression had
type "``bar``", we want to return "``foo*``" (note that we do *not* want
"``int*``"). In order to provide all of this, ``Type`` has a
``getAsPointerType()`` method that checks whether the type is structurally a
``PointerType`` and, if so, returns the best one. If not, it returns a null
pointer.

This structure is somewhat mystical, but after meditating on it, it will make
sense to you :).

.. _QualType:

The ``QualType`` class
----------------------

The ``QualType`` class is designed as a trivial value class that is small,
passed by-value and is efficient to query. The idea of ``QualType`` is that it
stores the type qualifiers (``const``, ``volatile``, ``restrict``, plus some
extended qualifiers required by language extensions) separately from the types
themselves. ``QualType`` is conceptually a pair of "``Type*``" and the bits
for these type qualifiers.

By storing the type qualifiers as bits in the conceptual pair, it is extremely
efficient to get the set of qualifiers on a ``QualType`` (just return the field
of the pair), add a type qualifier (which is a trivial constant-time operation
that sets a bit), and remove one or more type qualifiers (just return a
``QualType`` with the bitfield set to empty).
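
The pair-of-pointer-and-bits idea can be illustrated with a small standalone
sketch. This is not Clang's implementation: ``ToyType``, ``ToyQualType``, the
``TQ_*`` constants and the choice of three low bits are all assumptions made
for illustration. The sketch relies on the type node being 8-byte aligned, so
that the three low bits of its address are free to hold qualifier flags.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical stand-in for a uniqued, heap-allocated type node.
  struct alignas(8) ToyType {
    const char *Name;
  };

  // Illustrative qualifier bits; the values are assumptions, not Clang's.
  enum ToyQualifier : std::uintptr_t {
    TQ_Const = 0x1,
    TQ_Restrict = 0x2,
    TQ_Volatile = 0x4,
    TQ_Mask = 0x7
  };

  // A pointer-sized value class: the ToyType address occupies the high bits
  // and the qualifier set occupies the three low bits.
  class ToyQualType {
    std::uintptr_t Value = 0;

  public:
    ToyQualType() = default;
    ToyQualType(const ToyType *T, std::uintptr_t Quals)
        : Value(reinterpret_cast<std::uintptr_t>(T) | (Quals & TQ_Mask)) {
      assert((reinterpret_cast<std::uintptr_t>(T) & TQ_Mask) == 0 &&
             "type node must be 8-byte aligned");
    }

    // Mask off the qualifier bits to recover the shared type node.
    const ToyType *getTypePtr() const {
      return reinterpret_cast<const ToyType *>(Value &
                                               ~std::uintptr_t(TQ_Mask));
    }
    std::uintptr_t getQualifiers() const { return Value & TQ_Mask; }
    bool isConstQualified() const { return (Value & TQ_Const) != 0; }

    // Adding a qualifier sets one bit; the type node itself is untouched.
    ToyQualType withQualifier(ToyQualifier Q) const {
      return ToyQualType(getTypePtr(), getQualifiers() | Q);
    }
    // Removing all qualifiers clears the low bits.
    ToyQualType getUnqualifiedType() const {
      return ToyQualType(getTypePtr(), 0);
    }
  };

  static_assert(sizeof(ToyQualType) == sizeof(void *),
                "the wrapper is exactly pointer-sized");

In this sketch a "``const int``" and a plain "``int``" share one ``ToyType``
node and differ only in the low bits of the wrapper, so no second node is ever
allocated for the qualified form.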

Further, because the bits are stored outside of the type itself, we do not need
to create duplicates of types with different sets of qualifiers (i.e. there is
only a single heap allocated "``int``" type: "``const int``" and
"``volatile const int``" both point to the same heap allocated "``int``" type).
This reduces the heap size used to represent types and also means we do not
have to consider qualifiers when uniquing types (:ref:`Type <Type>` does not
even contain qualifiers).

In practice, the two most common type qualifiers (``const`` and ``restrict``)
are stored in the low bits of the pointer to the ``Type`` object, together with
a flag indicating whether extended qualifiers are present (which must be
heap-allocated). This means that ``QualType`` is exactly the same size as a
pointer.

.. _DeclarationName:

Declaration names
-----------------

The ``DeclarationName`` class represents the name of a declaration in Clang.
Declarations in the C family of languages can take several different forms.
Most declarations are named by simple identifiers, e.g., "``f``" and "``x``" in
the function declaration ``f(int x)``.
In C++, declaration names can also name
class constructors ("``Class``" in ``struct Class { Class(); }``), class
destructors ("``~Class``"), overloaded operator names ("``operator+``"), and
conversion functions ("``operator void const *``"). In Objective-C,
declaration names can refer to the names of Objective-C methods, which involve
the method name and the parameters, collectively called a *selector*, e.g.,
"``setWidth:height:``". Since all of these kinds of entities --- variables,
functions, Objective-C methods, C++ constructors, destructors, and operators
--- are represented as subclasses of Clang's common ``NamedDecl`` class,
``DeclarationName`` is designed to efficiently represent any kind of name.

Given a ``DeclarationName`` ``N``, ``N.getNameKind()`` will produce a value
that describes what kind of name ``N`` stores. There are 10 options (all of
the names are inside the ``DeclarationName`` class).

``Identifier``

  The name is a simple identifier. Use ``N.getAsIdentifierInfo()`` to retrieve
  the corresponding ``IdentifierInfo*`` pointing to the actual identifier.

``ObjCZeroArgSelector``, ``ObjCOneArgSelector``, ``ObjCMultiArgSelector``

  The name is an Objective-C selector, which can be retrieved as a ``Selector``
  instance via ``N.getObjCSelector()``.
  The three possible name kinds for
  Objective-C reflect an optimization within the ``DeclarationName`` class:
  both zero- and one-argument selectors are stored as a masked
  ``IdentifierInfo`` pointer, and therefore require very little space, since
  zero- and one-argument selectors are far more common than multi-argument
  selectors (which use a different structure).

``CXXConstructorName``

  The name is a C++ constructor name. Use ``N.getCXXNameType()`` to retrieve
  the :ref:`type <QualType>` that this constructor is meant to construct. The
  type is always the canonical type, since all constructors for a given type
  have the same name.

``CXXDestructorName``

  The name is a C++ destructor name. Use ``N.getCXXNameType()`` to retrieve
  the :ref:`type <QualType>` whose destructor is being named. This type is
  always a canonical type.

``CXXConversionFunctionName``

  The name is a C++ conversion function. Conversion functions are named
  according to the type they convert to, e.g., "``operator void const *``".
  Use ``N.getCXXNameType()`` to retrieve the type that this conversion function
  converts to. This type is always a canonical type.

``CXXOperatorName``

  The name is a C++ overloaded operator name.
  Overloaded operators are named
  according to their spelling, e.g., "``operator+``" or "``operator new []``".
  Use ``N.getCXXOverloadedOperator()`` to retrieve the overloaded operator (a
  value of type ``OverloadedOperatorKind``).

``CXXLiteralOperatorName``

  The name is a C++11 user-defined literal operator. User-defined
  literal operators are named according to the suffix they define,
  e.g., "``_foo``" for "``operator "" _foo``". Use
  ``N.getCXXLiteralIdentifier()`` to retrieve the corresponding
  ``IdentifierInfo*`` pointing to the identifier.

``CXXUsingDirective``

  The name is a C++ using directive. Using directives are not really
  ``NamedDecl``\ s, in that they all have the same name, but they are
  implemented as such in order to store them in a ``DeclContext``
  effectively.

``DeclarationName``\ s are cheap to create, copy, and compare. They require
only a single pointer's worth of storage in the common cases (identifiers,
zero- and one-argument Objective-C selectors) and use dense, uniqued storage
for the other kinds of names.
Two ``DeclarationName``\ s can be compared for
equality (``==``, ``!=``) using a simple bitwise comparison, can be ordered
with ``<``, ``>``, ``<=``, and ``>=`` (which provide a lexicographical ordering
for normal identifiers but an unspecified ordering for other kinds of names),
and can be placed into LLVM ``DenseMap``\ s and ``DenseSet``\ s.

``DeclarationName`` instances can be created in different ways depending on
what kind of name the instance will store. Normal identifiers
(``IdentifierInfo`` pointers) and Objective-C selectors (``Selector``) can be
implicitly converted to ``DeclarationName``\ s. Names for C++ constructors,
destructors, conversion functions, and overloaded operators can be retrieved
from the ``DeclarationNameTable``, an instance of which is available as
``ASTContext::DeclarationNames``. The member functions
``getCXXConstructorName``, ``getCXXDestructorName``,
``getCXXConversionFunctionName``, and ``getCXXOperatorName``, respectively,
return ``DeclarationName`` instances for the four kinds of C++ special function
names.

.. _DeclContext:

Declaration contexts
--------------------

Every declaration in a program exists within some *declaration context*, such
as a translation unit, namespace, class, or function.
Declaration contexts in
Clang are represented by the ``DeclContext`` class, from which the various
declaration-context AST nodes (``TranslationUnitDecl``, ``NamespaceDecl``,
``RecordDecl``, ``FunctionDecl``, etc.) derive. The ``DeclContext`` class
provides several facilities common to each declaration context:

Source-centric vs. Semantics-centric View of Declarations

  ``DeclContext`` provides two views of the declarations stored within a
  declaration context. The source-centric view accurately represents the
  program source code as written, including multiple declarations of entities
  where present (see the section :ref:`Redeclarations and Overloads
  <Redeclarations>`), while the semantics-centric view represents the program
  semantics. The two views are kept synchronized by semantic analysis while
  the ASTs are being constructed.

Storage of declarations within that context

  Every declaration context can contain some number of declarations. For
  example, a C++ class (represented by ``RecordDecl``) contains various member
  functions, fields, nested types, and so on. All of these declarations will
  be stored within the ``DeclContext``, and one can iterate over the
  declarations via [``DeclContext::decls_begin()``,
  ``DeclContext::decls_end()``). This mechanism provides the source-centric
  view of declarations in the context.

Lookup of declarations within that context

  The ``DeclContext`` structure provides efficient name lookup for names within
  that declaration context. For example, if ``N`` is a namespace we can look
  for the name ``N::f`` using ``DeclContext::lookup``. The lookup itself is
  based on a lazily-constructed array (for declaration contexts with a small
  number of declarations) or hash table (for declaration contexts with more
  declarations). The lookup operation provides the semantics-centric view of
  the declarations in the context.

Ownership of declarations

  The ``DeclContext`` owns all of the declarations that were declared within
  its declaration context, and is responsible for the management of their
  memory as well as their (de-)serialization.

All declarations are stored within a declaration context, and one can query
information about the context in which each declaration lives. One can
retrieve the ``DeclContext`` that contains a particular ``Decl`` using
``Decl::getDeclContext``. However, see the section
:ref:`LexicalAndSemanticContexts` for more information about how to interpret
this context information.

.. _Redeclarations:

Redeclarations and Overloads
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Within a translation unit, it is common for an entity to be declared several
times. For example, we might declare a function "``f``" and then later
re-declare it as part of an inlined definition:

.. code-block:: c++

  void f(int x, int y, int z = 1);

  inline void f(int x, int y, int z) { /* ... */ }

The representation of "``f``" differs in the source-centric and
semantics-centric views of a declaration context. In the source-centric view,
all redeclarations will be present, in the order they occurred in the source
code, making this view suitable for clients that wish to see the structure of
the source code. In the semantics-centric view, only the most recent "``f``"
will be found by the lookup, since it effectively replaces the first
declaration of "``f``".

In the semantics-centric view, overloading of functions is represented
explicitly. For example, given two declarations of a function "``g``" that are
overloaded, e.g.,

.. code-block:: c++

  void g();
  void g(int);

the ``DeclContext::lookup`` operation will return a
``DeclContext::lookup_result`` that contains a range of iterators over
declarations of "``g``". Clients that perform semantic analysis on a program
and are not concerned with the actual source code will primarily use this
semantics-centric view.

.. _LexicalAndSemanticContexts:

Lexical and Semantic Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each declaration has two potentially different declaration contexts: a
*lexical* context, which corresponds to the source-centric view of the
declaration context, and a *semantic* context, which corresponds to the
semantics-centric view. The lexical context is accessible via
``Decl::getLexicalDeclContext`` while the semantic context is accessible via
``Decl::getDeclContext``, both of which return ``DeclContext`` pointers. For
most declarations, the two contexts are identical. For example:

.. code-block:: c++

  class X {
  public:
    void f(int x);
  };

Here, the semantic and lexical contexts of ``X::f`` are the ``DeclContext``
associated with the class ``X`` (itself stored as a ``RecordDecl`` AST node).
However, we can now define ``X::f`` out-of-line:

.. code-block:: c++

  void X::f(int x = 17) { /* ... */ }

This definition of "``f``" has different lexical and semantic contexts. The
lexical context corresponds to the declaration context in which the actual
declaration occurred in the source code, e.g., the translation unit containing
``X``. Thus, this declaration of ``X::f`` can be found by traversing the
declarations provided by [``decls_begin()``, ``decls_end()``) in the
translation unit.

The semantic context of ``X::f`` corresponds to the class ``X``, since this
member function is (semantically) a member of ``X``. Lookup of the name ``f``
into the ``DeclContext`` associated with ``X`` will then return the definition
of ``X::f`` (including information about the default argument).

Transparent Declaration Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In C and C++, there are several contexts in which names that are logically
declared inside another declaration will actually "leak" out into the enclosing
scope from the perspective of name lookup. The most obvious instance of this
behavior is in enumeration types, e.g.,

.. code-block:: c++

  enum Color {
    Red,
    Green,
    Blue
  };

Here, ``Color`` is an enumeration, which is a declaration context that contains
the enumerators ``Red``, ``Green``, and ``Blue``. Thus, traversing the list of
declarations contained in the enumeration ``Color`` will yield ``Red``,
``Green``, and ``Blue``. However, outside of the scope of ``Color`` one can
name the enumerator ``Red`` without qualifying the name, e.g.,

.. code-block:: c++

  Color c = Red;

There are other entities in C++ that provide similar behavior. For example,
linkage specifications that use curly braces:

.. code-block:: c++

  extern "C" {
    void f(int);
    void g(int);
  }
  // f and g are visible here

For source-level accuracy, we treat the linkage specification and the
enumeration type each as a declaration context in which the enclosed
declarations ("``Red``", "``Green``", and "``Blue``"; "``f``" and "``g``") are
declared. However, these declarations are visible outside of the scope of the
declaration context.
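
The visibility rule described above can be modeled with a toy context tree.
The sketch below is only an illustration and makes no claim about how Clang
stores its lookup tables: ``ToyContext``, ``addDecl``, ``lookup`` and the
``PassThrough`` flag are hypothetical names. A declaration is always recorded
in the context where it was written, and if that context is flagged as
pass-through, the name is also made visible in each enclosing context up to
and including the first one that is not pass-through.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of a declaration context; not Clang's DeclContext.
  struct ToyContext {
    std::string Name;
    ToyContext *Parent;
    bool PassThrough; // e.g. an unscoped enum or an extern "C" block

    // Declarations written inside this context, in source order.
    std::vector<std::string> Decls;
    // Names findable by lookup here, mapped to the declaring context.
    std::map<std::string, const ToyContext *> Visible;

    ToyContext(std::string N, ToyContext *P, bool PT)
        : Name(std::move(N)), Parent(P), PassThrough(PT) {}

    void addDecl(const std::string &DeclName) {
      Decls.push_back(DeclName);
      // Publish the name here, then keep walking outward while the
      // current context lets names leak into its parent.
      ToyContext *Ctx = this;
      while (true) {
        Ctx->Visible[DeclName] = this;
        if (!Ctx->PassThrough || !Ctx->Parent)
          break;
        Ctx = Ctx->Parent;
      }
    }

    // Returns the context that declares DeclName, or nullptr if the name
    // is not visible from this context.
    const ToyContext *lookup(const std::string &DeclName) const {
      auto It = Visible.find(DeclName);
      return It == Visible.end() ? nullptr : It->second;
    }
  };

With a translation-unit context ``TU`` and a pass-through child ``Color``,
calling ``Color.addDecl("Red")`` leaves ``TU.Decls`` empty, yet
``TU.lookup("Red")`` finds the enumerator and reports ``Color`` as the context
that declares it.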

These language features (and several others, described below) have roughly the
same set of requirements: declarations are declared within a particular lexical
context, but the declarations are also found via name lookup in scopes
enclosing the declaration itself. This feature is implemented via
*transparent* declaration contexts (see
``DeclContext::isTransparentContext()``), whose declarations are visible in the
nearest enclosing non-transparent declaration context. This means that the
lexical context of the declaration (e.g., an enumerator) will be the
transparent ``DeclContext`` itself, as will the semantic context, but the
declaration will be visible in every outer context up to and including the
first non-transparent declaration context (since transparent declaration
contexts can be nested).

The transparent ``DeclContext``\ s are:

* Enumerations (but not C++11 "scoped enumerations"):

  .. code-block:: c++

    enum Color {
      Red,
      Green,
      Blue
    };
    // Red, Green, and Blue are in scope

* C++ linkage specifications:

  .. code-block:: c++

    extern "C" {
      void f(int);
      void g(int);
    }
    // f and g are in scope

* Anonymous unions and structs:

  .. code-block:: c++

    struct LookupTable {
      bool IsVector;
      union {
        std::vector<Item> *Vector;
        std::set<Item> *Set;
      };
    };

    LookupTable LT;
    LT.Vector = 0; // Okay: finds Vector inside the unnamed union

* C++11 inline namespaces:

  .. code-block:: c++

    namespace mylib {
      inline namespace debug {
        class X;
      }
    }
    mylib::X *xp; // okay: mylib::X refers to mylib::debug::X

.. _MultiDeclContext:

Multiply-Defined Declaration Contexts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

C++ namespaces have the interesting --- and, so far, unique --- property that
the namespace can be defined multiple times, and the declarations provided by
each namespace definition are effectively merged (from the semantic point of
view).
For example, the following two code snippets are semantically
indistinguishable:

.. code-block:: c++

  // Snippet #1:
  namespace N {
    void f();
  }
  namespace N {
    void f(int);
  }

  // Snippet #2:
  namespace N {
    void f();
    void f(int);
  }

In Clang's representation, the source-centric view of declaration contexts will
actually have two separate ``NamespaceDecl`` nodes in Snippet #1, each of which
is a declaration context that contains a single declaration of "``f``".
However, the semantics-centric view provided by name lookup into the namespace
``N`` for "``f``" will return a ``DeclContext::lookup_result`` that contains a
range of iterators over declarations of "``f``".

``DeclContext`` manages multiply-defined declaration contexts internally. The
function ``DeclContext::getPrimaryContext`` retrieves the "primary" context for
a given ``DeclContext`` instance, which is the ``DeclContext`` responsible for
maintaining the lookup table used for the semantics-centric view. Given the
primary context, one can follow the chain of ``DeclContext`` nodes that define
additional declarations via ``DeclContext::getNextContext``.
Note that these
functions are used internally within the lookup and insertion methods of the
``DeclContext``, so the vast majority of clients can ignore them.

.. _CFG:

The ``CFG`` class
-----------------

The ``CFG`` class is designed to represent a source-level control-flow graph
for a single statement (``Stmt*``). Typically instances of ``CFG`` are
constructed for function bodies (usually an instance of ``CompoundStmt``), but
can also be instantiated to represent the control-flow of any class that
subclasses ``Stmt``, which includes simple expressions. Control-flow graphs
are especially useful for performing `flow- or path-sensitive
<http://en.wikipedia.org/wiki/Data_flow_analysis#Sensitivities>`_ program
analyses on a given function.

Basic Blocks
^^^^^^^^^^^^

Concretely, an instance of ``CFG`` is a collection of basic blocks. Each basic
block is an instance of ``CFGBlock``, which simply contains an ordered sequence
of ``Stmt*`` (each referring to statements in the AST). The ordering of
statements within a block indicates unconditional flow of control from one
statement to the next. :ref:`Conditional control-flow
<ConditionalControlFlow>` is represented using edges between basic blocks.
The
statements within a given ``CFGBlock`` can be traversed using the
``CFGBlock::*iterator`` interface.

A ``CFG`` object owns the instances of ``CFGBlock`` within the control-flow
graph it represents. Each ``CFGBlock`` within a CFG is also uniquely numbered
(accessible via ``CFGBlock::getBlockID()``). Currently the number is based on
the order in which the blocks were created, but no assumptions should be made
about how ``CFGBlock``\ s are numbered other than that their numbers are unique
and that they are numbered from 0..N-1 (where N is the number of basic blocks
in the CFG).

Entry and Exit Blocks
^^^^^^^^^^^^^^^^^^^^^

Each instance of ``CFG`` contains two special blocks: an *entry* block
(accessible via ``CFG::getEntry()``), which has no incoming edges, and an
*exit* block (accessible via ``CFG::getExit()``), which has no outgoing edges.
Neither block contains any statements, and they serve the role of providing a
clear entrance and exit for a body of code such as a function body. The
presence of these empty blocks greatly simplifies the implementation of many
analyses built on top of CFGs.

.. _ConditionalControlFlow:

Conditional Control-Flow
^^^^^^^^^^^^^^^^^^^^^^^^

Conditional control-flow (such as that induced by if-statements and loops) is
represented as edges between ``CFGBlock``\ s. Because different C language
constructs can induce control-flow, each ``CFGBlock`` also records an extra
``Stmt*`` that represents the *terminator* of the block. A terminator is
simply the statement that caused the control-flow, and is used to identify the
nature of the conditional control-flow between blocks. For example, in the
case of an if-statement, the terminator refers to the ``IfStmt`` object in the
AST that represented the given branch.

To illustrate, consider the following code example:

.. code-block:: c++

  int foo(int x) {
    x = x + 1;
    if (x > 2)
      x++;
    else {
      x += 2;
      x *= 2;
    }

    return x;
  }

After invoking the parser+semantic analyzer on this code fragment, the AST of
the body of ``foo`` is referenced by a single ``Stmt*``.
We can then construct 1408f4a2713aSLionel Sambucan instance of ``CFG`` representing the control-flow graph of this function 1409f4a2713aSLionel Sambucbody by single call to a static class method: 1410f4a2713aSLionel Sambuc 1411f4a2713aSLionel Sambuc.. code-block:: c++ 1412f4a2713aSLionel Sambuc 1413f4a2713aSLionel Sambuc Stmt *FooBody = ... 1414*0a6a1f1dSLionel Sambuc std::unique_ptr<CFG> FooCFG = CFG::buildCFG(FooBody); 1415f4a2713aSLionel Sambuc 1416f4a2713aSLionel SambucAlong with providing an interface to iterate over its ``CFGBlocks``, the 1417f4a2713aSLionel Sambuc``CFG`` class also provides methods that are useful for debugging and 1418f4a2713aSLionel Sambucvisualizing CFGs. For example, the method ``CFG::dump()`` dumps a 1419f4a2713aSLionel Sambucpretty-printed version of the CFG to standard error. This is especially useful 1420f4a2713aSLionel Sambucwhen one is using a debugger such as gdb. For example, here is the output of 1421f4a2713aSLionel Sambuc``FooCFG->dump()``: 1422f4a2713aSLionel Sambuc 1423f4a2713aSLionel Sambuc.. 
code-block:: c++ 1424f4a2713aSLionel Sambuc 1425f4a2713aSLionel Sambuc [ B5 (ENTRY) ] 1426f4a2713aSLionel Sambuc Predecessors (0): 1427f4a2713aSLionel Sambuc Successors (1): B4 1428f4a2713aSLionel Sambuc 1429f4a2713aSLionel Sambuc [ B4 ] 1430f4a2713aSLionel Sambuc 1: x = x + 1 1431f4a2713aSLionel Sambuc 2: (x > 2) 1432f4a2713aSLionel Sambuc T: if [B4.2] 1433f4a2713aSLionel Sambuc Predecessors (1): B5 1434f4a2713aSLionel Sambuc Successors (2): B3 B2 1435f4a2713aSLionel Sambuc 1436f4a2713aSLionel Sambuc [ B3 ] 1437f4a2713aSLionel Sambuc 1: x++ 1438f4a2713aSLionel Sambuc Predecessors (1): B4 1439f4a2713aSLionel Sambuc Successors (1): B1 1440f4a2713aSLionel Sambuc 1441f4a2713aSLionel Sambuc [ B2 ] 1442f4a2713aSLionel Sambuc 1: x += 2 1443f4a2713aSLionel Sambuc 2: x *= 2 1444f4a2713aSLionel Sambuc Predecessors (1): B4 1445f4a2713aSLionel Sambuc Successors (1): B1 1446f4a2713aSLionel Sambuc 1447f4a2713aSLionel Sambuc [ B1 ] 1448f4a2713aSLionel Sambuc 1: return x; 1449f4a2713aSLionel Sambuc Predecessors (2): B2 B3 1450f4a2713aSLionel Sambuc Successors (1): B0 1451f4a2713aSLionel Sambuc 1452f4a2713aSLionel Sambuc [ B0 (EXIT) ] 1453f4a2713aSLionel Sambuc Predecessors (1): B1 1454f4a2713aSLionel Sambuc Successors (0): 1455f4a2713aSLionel Sambuc 1456f4a2713aSLionel SambucFor each block, the pretty-printed output displays for each block the number of 1457f4a2713aSLionel Sambuc*predecessor* blocks (blocks that have outgoing control-flow to the given 1458f4a2713aSLionel Sambucblock) and *successor* blocks (blocks that have control-flow that have incoming 1459f4a2713aSLionel Sambuccontrol-flow from the given block). We can also clearly see the special entry 1460f4a2713aSLionel Sambucand exit blocks at the beginning and end of the pretty-printed output. For the 1461f4a2713aSLionel Sambucentry block (block B5), the number of predecessor blocks is 0, while for the 1462f4a2713aSLionel Sambucexit block (block B0) the number of successor blocks is 0. 
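
The predecessor and successor lists in the dump above are two views of the same
edges.  As a minimal sketch, assuming nothing about Clang's real data
structures, the following standalone C++ models the six blocks of ``foo`` and
derives each block's predecessors from the successor edges.  ``ToyBlock`` and
``buildToyCFG`` are hypothetical names used only for illustration; they are not
part of the ``CFG`` API.

.. code-block:: c++

  #include <vector>

  // Toy stand-in for a basic block; this is NOT clang::CFGBlock.
  struct ToyBlock {
    unsigned ID = 0;
    std::vector<unsigned> Succs; // IDs of successor blocks
    std::vector<unsigned> Preds; // IDs of predecessor blocks (derived below)
  };

  // Model the six blocks from the dump of foo(); the vector index is the
  // block ID.
  std::vector<ToyBlock> buildToyCFG() {
    std::vector<ToyBlock> Blocks(6);
    for (unsigned I = 0; I < Blocks.size(); ++I)
      Blocks[I].ID = I;

    Blocks[5].Succs = {4};    // ENTRY -> B4
    Blocks[4].Succs = {3, 2}; // terminator "if [B4.2]": then -> B3, else -> B2
    Blocks[3].Succs = {1};    // x++ -> return
    Blocks[2].Succs = {1};    // x += 2; x *= 2 -> return
    Blocks[1].Succs = {0};    // return x -> EXIT

    // Every successor edge A -> B makes A a predecessor of B.
    for (const ToyBlock &B : Blocks)
      for (unsigned S : B.Succs)
        Blocks[S].Preds.push_back(B.ID);
    return Blocks;
  }

Calling ``buildToyCFG()`` gives block 5 (the entry) an empty predecessor list
and block 0 (the exit) an empty successor list, in line with the dump.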

The most interesting block here is B4, whose outgoing control-flow represents
the branching caused by the sole if-statement in ``foo``.  Of particular
interest are the second statement in the block, ``(x > 2)``, and the
terminator, printed as ``if [B4.2]``.  The second statement represents the
evaluation of the condition of the if-statement, which occurs before the actual
branching of control-flow.  Within the ``CFGBlock`` for B4, the ``Stmt*`` for
the second statement refers to the actual expression in the AST for
``(x > 2)``.  Thus pointers to subclasses of ``Expr`` can appear in the list of
statements in a block, and not just subclasses of ``Stmt`` that refer to proper
C statements.

The terminator of block B4 is a pointer to the ``IfStmt`` object in the AST.
The pretty-printer outputs ``if [B4.2]`` because the condition expression of
the if-statement has an actual place in the basic block, and thus the
terminator is essentially *referring* to the expression that is the second
statement of block B4 (i.e., B4.2).  In this manner, conditions for
control-flow (which also includes conditions for loops and switch statements)
are hoisted into the actual basic block.

.. Implicit Control-Flow
.. ^^^^^^^^^^^^^^^^^^^^^

.. A key design principle of the ``CFG`` class was to not require any
.. transformations to the AST in order to represent control-flow.  Thus the
.. ``CFG`` does not perform any "lowering" of the statements in an AST: loops
.. are not transformed into guarded gotos, short-circuit operations are not
.. converted to a set of if-statements, and so on.

Constant Folding in the Clang AST
---------------------------------

There are several places where constants and constant folding matter a lot to
the Clang front-end.  First, in general, we prefer the AST to retain the source
code as close to how the user wrote it as possible.  This means that if they
wrote "``5+4``", we want to keep the addition and two constants in the AST; we
don't want to fold to "``9``".  This means that constant folding in various
ways turns into a tree walk that needs to handle the various cases.

However, there are places in both C and C++ that require constants to be
folded.  For example, the C standard defines what an "integer constant
expression" (i-c-e) is with very precise and specific requirements.  The
language then requires i-c-e's in a lot of places (for example, the size of a
bitfield, the value for a case statement, etc.).  For these, we have to be able
to constant fold the constants, to do semantic checks (e.g., verify bitfield
size is non-negative and that case statements aren't duplicated).  We aim for
Clang to be very pedantic about this, diagnosing cases when the code does not
use an i-c-e where one is required, but accepting the code unless running with
``-pedantic-errors``.

Things get a little bit more tricky when it comes to compatibility with
real-world source code.  Specifically, GCC has historically accepted a huge
superset of expressions as i-c-e's, and a lot of real world code depends on
this unfortunate accident of history (including, e.g., the glibc system
headers).  GCC accepts anything its "fold" optimizer is capable of reducing to
an integer constant, which means that the definition of what it accepts changes
as its optimizer does.  One example is that GCC accepts things like
"``case X-X:``" even when ``X`` is a variable, because it can fold this to 0.

Another issue is how constants interact with the extensions we support, such
as ``__builtin_constant_p``, ``__builtin_inf``, ``__extension__`` and many
others.  C99 obviously does not specify the semantics of any of these
extensions, and the definition of i-c-e does not include them.  However, these
extensions are often used in real code, and we have to have a way to reason
about them.

Finally, this is not just a problem for semantic analysis.  The code generator
and other clients have to be able to fold constants (e.g., to initialize global
variables) and have to handle a superset of what C99 allows.  Further, these
clients can benefit from extended information.  For example, we know that
"``foo() || 1``" always evaluates to ``true``, but we can't replace the
expression with ``true`` because it has side effects.

Implementation Approach
^^^^^^^^^^^^^^^^^^^^^^^

After trying several different approaches, we've finally converged on a design
(note that, at the time of this writing, not all of this has been implemented;
consider this a design goal!).  Our basic approach is to define a single
recursive evaluation method (``Expr::Evaluate``), which is implemented
in ``AST/ExprConstant.cpp``.  Given an expression with "scalar" type (integer,
fp, complex, or pointer) this method returns the following information:

* Whether the expression is an integer constant expression, a general constant
  that was folded but has no side effects, a general constant that was folded
  but that does have side effects, or an uncomputable/unfoldable value.
* If the expression was computable in any way, this method returns the
  ``APValue`` for the result of the expression.
* If the expression is not evaluatable at all, this method returns information
  on one of the problems with the expression.  This includes a
  ``SourceLocation`` for where the problem is, and a diagnostic ID that explains
  the problem.  The diagnostic should have ``ERROR`` type.
* If the expression is not an integer constant expression, this method returns
  information on one of the problems with the expression.  This includes a
  ``SourceLocation`` for where the problem is, and a diagnostic ID that
  explains the problem.  The diagnostic should have ``EXTENSION`` type.

This information gives various clients the flexibility that they want, and we
will eventually have some helper methods for various extensions.  For example,
``Sema`` should have a ``Sema::VerifyIntegerConstantExpression`` method, which
calls ``Evaluate`` on the expression.  If the expression is not foldable, the
error is emitted, and it would return ``true``.  If the expression is not an
i-c-e, the ``EXTENSION`` diagnostic is emitted.  Finally it would return
``false`` to indicate that the AST is OK.

Other clients can use the information in other ways, for example, codegen can
just use expressions that are foldable in any way.

Extensions
^^^^^^^^^^

This section describes how some of the various extensions Clang supports
interact with constant evaluation:

* ``__extension__``: The expression form of this extension causes any
  evaluatable subexpression to be accepted as an integer constant expression.
* ``__builtin_constant_p``: This returns true (as an integer constant
  expression) if the operand evaluates to either a numeric value (that is, not
  a pointer cast to integral type) of integral, enumeration, floating or
  complex type, or if it evaluates to the address of the first character of a
  string literal (possibly cast to some other type).  As a special case, if
  ``__builtin_constant_p`` is the (potentially parenthesized) condition of a
  conditional operator expression ("``?:``"), only the true side of the
  conditional operator is considered, and it is evaluated with full constant
  folding.
* ``__builtin_choose_expr``: The condition is required to be an integer
  constant expression, but we accept any constant as an "extension of an
  extension".  This only evaluates one operand depending on which way the
  condition evaluates.
* ``__builtin_classify_type``: This always returns an integer constant
  expression.
* ``__builtin_inf, nan, ...``: These are treated just like a floating-point
  literal.
* ``__builtin_abs, copysign, ...``: These are constant folded as general
  constant expressions.
* ``__builtin_strlen`` and ``strlen``: These are constant folded as integer
  constant expressions if the argument is a string literal.

.. _Sema:

The Sema Library
================

This library is called by the :ref:`Parser library <Parser>` during parsing to
do semantic analysis of the input.  For valid programs, Sema builds an AST for
parsed constructs.

.. _CodeGen:

The CodeGen Library
===================

CodeGen takes an :ref:`AST <AST>` as input and produces `LLVM IR code
<//llvm.org/docs/LangRef.html>`_ from it.

How to change Clang
===================

How to add an attribute
-----------------------

Attribute Basics
^^^^^^^^^^^^^^^^

Attributes in clang come in two forms: parsed form, and semantic form.  Both
forms are represented via a tablegen definition of the attribute, specified in
Attr.td.

``include/clang/Basic/Attr.td``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, add your attribute to the `include/clang/Basic/Attr.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Attr.td?view=markup>`_
file.

Each attribute gets a ``def`` inheriting from ``Attr`` or one of its
subclasses.  ``InheritableAttr`` means that the attribute also applies to
subsequent declarations of the same name.  ``InheritableParamAttr`` is similar
to ``InheritableAttr``, except that the attribute is written on a parameter
instead of a declaration, type or statement.  Attributes inheriting from
``TypeAttr`` are pure type attributes which generally are not given a
representation in the AST.  Attributes inheriting from ``TargetSpecificAttr``
are attributes specific to one or more target architectures.  An attribute that
inherits from ``IgnoredAttr`` is parsed, but will generate an ignored attribute
diagnostic when used.  This attribute type may be useful when an attribute is
supported by another vendor, but not supported by clang.

``Spellings`` lists the strings that can appear in ``__attribute__((here))`` or
``[[here]]``.  All such strings will be synonymous.  Possible ``Spellings``
are: ``GNU`` (for use with GNU-style ``__attribute__`` spellings), ``Declspec``
(for use with Microsoft Visual Studio-style ``__declspec`` spellings),
``CXX11`` (for use with C++11-style ``[[foo]]`` and ``[[foo::bar]]``
spellings), and ``Keyword`` (for use with attributes that are implemented as
keywords, like C++11's ``override`` or ``final``).  If you want to allow the
``[[]]`` C++11 syntax, you have to define a list of ``Namespaces``, which will
let users write ``[[namespace::spelling]]``.  Using the empty string for a
namespace will allow users to write just the spelling with no "``::``".
Attributes which g++-4.8 or later accepts should also have a
``CXX11<"gnu", "spelling">`` spelling.

``Subjects`` restricts the kinds of AST node to which this attribute can
appertain (roughly, attach).  The subjects are specified via a ``SubjectList``,
which specifies the list of subjects.  Additionally, subject-related
diagnostics can be specified to be warnings or errors, with the default being a
warning.  The diagnostics displayed to the user are automatically determined
based on the subjects in the list, but a custom diagnostic parameter can also
be specified in the ``SubjectList``.  The diagnostics generated for subject
list violations are either ``diag::warn_attribute_wrong_decl_type`` or
``diag::err_attribute_wrong_decl_type``, and the parameter enumeration is
found in `include/clang/Sema/AttributeList.h
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Sema/AttributeList.h?view=markup>`_.
If you add new Decl nodes to the ``SubjectList``, you may need to update the
logic used to automatically determine the diagnostic parameter in `utils/TableGen/ClangAttrEmitter.cpp
<http://llvm.org/viewvc/llvm-project/cfe/trunk/utils/TableGen/ClangAttrEmitter.cpp?view=markup>`_.

Diagnostic checking for attribute subject lists is automated except when
``HasCustomParsing`` is set to ``1``.

By default, all subjects in the SubjectList must either be a Decl node defined
in ``DeclNodes.td``, or a statement node defined in ``StmtNodes.td``.  However,
more complex subjects can be created by creating a ``SubsetSubject`` object.
Each such object has a base subject which it appertains to (which must be a
Decl or Stmt node, and not a SubsetSubject node), and some custom code which is
called when determining whether an attribute appertains to the subject.  For
instance, a ``NonBitField`` SubsetSubject appertains to a ``FieldDecl``, and
tests whether the given FieldDecl is a bit field.  When a SubsetSubject is
specified in a SubjectList, a custom diagnostic parameter must also be provided.

``Args`` names the arguments the attribute takes, in order.  If ``Args`` is
``[StringArgument<"Arg1">, IntArgument<"Arg2">]`` then
``__attribute__((myattribute("Hello", 3)))`` will be a valid use.  Attribute
arguments specify both the parsed form and the semantic form of the attribute.
The previous example shows an attribute which requires two arguments while
parsing, and the Attr subclass' constructor for the attribute will require a
string and integer argument.

Diagnostic checking for argument counts is automated except when
``HasCustomParsing`` is set to ``1``, or when the attribute uses an optional or
variadic argument.  Diagnostic checking for argument semantics is not automated.

If the parsed form of the attribute is more complex, or differs from the
semantic form, the ``HasCustomParsing`` bit can be set to ``1`` for the class,
and the parsing code in `Parser::ParseGNUAttributeArgs
<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Parse/ParseDecl.cpp?view=markup>`_
can be updated for the special case.  Note that this only applies to attributes
with a GNU spelling; attributes with a ``__declspec`` spelling currently ignore
this flag and are handled by ``Parser::ParseMicrosoftDeclSpec``.

Custom accessors can be generated for an attribute based on the spelling list
for that attribute.  For instance, if an attribute has two different spellings,
'Foo' and 'Bar', accessors can be created:
``[Accessor<"isFoo", [GNU<"Foo">]>, Accessor<"isBar", [GNU<"Bar">]>]``.
These accessors will be generated on the semantic form of the attribute,
accepting no arguments and returning a Boolean.

Attributes which do not require an AST node should set the ``ASTNode`` field to
``0`` to avoid polluting the AST.  Note that anything inheriting from
``TypeAttr`` or ``IgnoredAttr`` automatically does not generate an AST node.
All other attributes generate an AST node by default.  The AST node is the
semantic representation of the attribute.

Attributes which do not require custom semantic handling should set the
``SemaHandler`` field to ``0``.  Note that anything inheriting from
``IgnoredAttr`` automatically does not get a semantic handler.  All other
attributes are assumed to use a semantic handler by default.  Attributes
without a semantic handler are not given a parsed attribute Kind enumeration.

The ``LangOpts`` field can be used to specify a list of language options
required by the attribute.  For instance, all of the CUDA-specific attributes
specify ``[CUDA]`` for the ``LangOpts`` field, and when the CUDA language
option is not enabled, an "attribute ignored" warning diagnostic is emitted.
Since language options are not table generated nodes, new language options must
be created manually and should specify the spelling used by the
``LangOptions`` class.

Target-specific attributes sometimes share a spelling with other attributes in
different targets.  For instance, the ARM and MSP430 targets both have an
attribute spelled ``GNU<"interrupt">``, but with different parsing and semantic
requirements.  To support this feature, an attribute inheriting from
``TargetSpecificAttr`` may specify a ``ParseKind`` field.  This field
should be the same value between all attributes sharing a spelling, and
corresponds to the parsed attribute's Kind enumeration.  This allows attributes
to share a parsed attribute kind, but have distinct semantic attribute classes.
For instance, ``AttributeList::AT_Interrupt`` is the shared parsed attribute
kind, but ARMInterruptAttr and MSP430InterruptAttr are the semantic attributes
generated.

By default, when declarations are merging attributes, an attribute will not be
duplicated.  However, if an attribute can be duplicated during this merging
stage, set ``DuplicatesAllowedWhileMerging`` to ``1``, and the attribute will
be merged.

By default, attribute arguments are parsed in an evaluated context.  If the
arguments for an attribute should be parsed in an unevaluated context (akin to
the way the argument to a ``sizeof`` expression is parsed), you can set
``ParseArgumentsAsUnevaluated`` to ``1``.

If additional functionality is desired for the semantic form of the attribute,
the ``AdditionalMembers`` field specifies code to be copied verbatim into the
semantic attribute class object.

All attributes must have one or more forms of documentation, which are provided
in the ``Documentation`` list.  Generally, the documentation for an attribute
is a stand-alone definition in `include/clang/Basic/AttrDocs.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/AttrDocs.td?view=markup>`_
that is named after the attribute being documented.  Each documentation element
is given a ``Category`` (variable, function, or type) and ``Content``.  A single
attribute may contain multiple documentation elements for distinct categories.
For instance, an attribute which can appertain to both functions and types
(such as a calling convention attribute) should contain two documentation
elements.  The ``Content`` for an attribute uses reStructuredText (RST) syntax.

If an attribute is used internally by the compiler, but is not written by users
(such as attributes with an empty spelling list), it can use the
``Undocumented`` documentation element.

Boilerplate
^^^^^^^^^^^

All semantic processing of declaration attributes happens in `lib/Sema/SemaDeclAttr.cpp
<http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDeclAttr.cpp?view=markup>`_,
and generally starts in the ``ProcessDeclAttribute`` function.  If your
attribute is a "simple" attribute, meaning that it requires no custom
semantic processing aside from what is automatically provided for you, you can
add a call to ``handleSimpleAttribute<YourAttr>(S, D, Attr);`` to the switch
statement.  Otherwise, write a new ``handleYourAttr()`` function, and add that
to the switch statement.

If your attribute causes extra warnings to fire, define a ``DiagGroup`` in
`include/clang/Basic/DiagnosticGroups.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticGroups.td?view=markup>`_
named after the attribute's ``Spelling`` with "_"s replaced by "-"s.  If you're
only defining one diagnostic, you can skip ``DiagnosticGroups.td`` and use
``InGroup<DiagGroup<"your-attribute">>`` directly in `DiagnosticSemaKinds.td
<http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/DiagnosticSemaKinds.td?view=markup>`_.

All semantic diagnostics generated for your attribute, including
automatically-generated ones (such as subjects and argument counts), should
have a corresponding test case.

The meat of your attribute
^^^^^^^^^^^^^^^^^^^^^^^^^^

Find an appropriate place in Clang to do whatever your attribute needs to do.
Check for the attribute's presence using ``Decl::getAttr<YourAttr>()``.

Update the :doc:`LanguageExtensions` document to describe your new attribute.
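
The "look up the attribute by type, then act on it" pattern behind
``Decl::getAttr<YourAttr>()`` can be sketched in standalone C++.  This is a toy
model under stated assumptions, not Clang's implementation: ``ToyAttr``,
``ToyAlignedAttr``, ``ToyNoReturnAttr``, ``ToyDecl`` and ``alignmentFor`` are
hypothetical names, and the sketch uses ``dynamic_cast`` purely to stay
self-contained.

.. code-block:: c++

  #include <memory>
  #include <utility>
  #include <vector>

  // Toy stand-ins for attribute classes; these are NOT Clang's Attr classes.
  struct ToyAttr {
    virtual ~ToyAttr() = default;
  };
  struct ToyNoReturnAttr : ToyAttr {};
  struct ToyAlignedAttr : ToyAttr {
    explicit ToyAlignedAttr(unsigned A) : Alignment(A) {}
    unsigned Alignment;
  };

  // Toy stand-in for a declaration that owns a list of attributes.
  class ToyDecl {
    std::vector<std::unique_ptr<ToyAttr>> Attrs;

  public:
    void addAttr(std::unique_ptr<ToyAttr> A) { Attrs.push_back(std::move(A)); }

    // Return the first attached attribute of type T, or nullptr if none.
    template <typename T> const T *getAttr() const {
      for (const auto &A : Attrs)
        if (const T *Typed = dynamic_cast<const T *>(A.get()))
          return Typed;
      return nullptr;
    }

    template <typename T> bool hasAttr() const {
      return getAttr<T>() != nullptr;
    }
  };

  // Hypothetical client code: honor the attribute if it is present,
  // otherwise fall back to a default.
  unsigned alignmentFor(const ToyDecl &D, unsigned Default) {
    if (const ToyAlignedAttr *A = D.getAttr<ToyAlignedAttr>())
      return A->Alignment;
    return Default;
  }

The point of the pattern is that client code asks the declaration for one
attribute type and handles a null result, so code paths that never see the
attribute are unaffected.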
1808f4a2713aSLionel Sambuc 1809f4a2713aSLionel SambucHow to add an expression or statement 1810f4a2713aSLionel Sambuc------------------------------------- 1811f4a2713aSLionel Sambuc 1812f4a2713aSLionel SambucExpressions and statements are one of the most fundamental constructs within a 1813f4a2713aSLionel Sambuccompiler, because they interact with many different parts of the AST, semantic 1814f4a2713aSLionel Sambucanalysis, and IR generation. Therefore, adding a new expression or statement 1815f4a2713aSLionel Sambuckind into Clang requires some care. The following list details the various 1816f4a2713aSLionel Sambucplaces in Clang where an expression or statement needs to be introduced, along 1817f4a2713aSLionel Sambucwith patterns to follow to ensure that the new expression or statement works 1818f4a2713aSLionel Sambucwell across all of the C languages. We focus on expressions, but statements 1819f4a2713aSLionel Sambucare similar. 1820f4a2713aSLionel Sambuc 1821f4a2713aSLionel Sambuc#. Introduce parsing actions into the parser. Recursive-descent parsing is 1822f4a2713aSLionel Sambuc mostly self-explanatory, but there are a few things that are worth keeping 1823f4a2713aSLionel Sambuc in mind: 1824f4a2713aSLionel Sambuc 1825f4a2713aSLionel Sambuc * Keep as much source location information as possible! You'll want it later 1826f4a2713aSLionel Sambuc to produce great diagnostics and support Clang's various features that map 1827f4a2713aSLionel Sambuc between source code and the AST. 1828f4a2713aSLionel Sambuc * Write tests for all of the "bad" parsing cases, to make sure your recovery 1829f4a2713aSLionel Sambuc is good. If you have matched delimiters (e.g., parentheses, square 1830f4a2713aSLionel Sambuc brackets, etc.), use ``Parser::BalancedDelimiterTracker`` to give nice 1831f4a2713aSLionel Sambuc diagnostics when things go wrong. 1832f4a2713aSLionel Sambuc 1833f4a2713aSLionel Sambuc#. Introduce semantic analysis actions into ``Sema``. 
   Semantic analysis should
   always involve two functions: an ``ActOnXXX`` function that will be called
   directly from the parser, and a ``BuildXXX`` function that performs the
   actual semantic analysis and will (eventually!) build the AST node.  It's
   fairly common for the ``ActOnXXX`` function to do very little (often just
   some minor translation from the parser's representation to ``Sema``'s
   representation of the same thing), but the separation is still important:
   C++ template instantiation, for example, should always call the ``BuildXXX``
   variant.  Several notes on semantic analysis before we get into construction
   of the AST:

   * Your expression probably involves some types and some subexpressions.
     Make sure to fully check that those types, and the types of those
     subexpressions, meet your expectations.  Add implicit conversions where
     necessary to make sure that all of the types line up exactly the way you
     want them.  Write extensive tests to check that you're getting good
     diagnostics for mistakes and that you can use various forms of
     subexpressions with your expression.
   * When type-checking a type or subexpression, make sure to first check
     whether the type is "dependent" (``Type::isDependentType()``) or whether a
     subexpression is type-dependent (``Expr::isTypeDependent()``).  If any of
     these return ``true``, then you're inside a template and you can't do much
     type-checking now.
     That's normal, and your AST node (when you get there)
     will have to deal with this case.  At this point, you can write tests that
     use your expression within templates, but don't try to instantiate the
     templates.
   * For each subexpression, be sure to call ``Sema::CheckPlaceholderExpr()``
     to deal with "weird" expressions that don't behave well as subexpressions.
     Then, determine whether you need to perform lvalue-to-rvalue conversions
     (``Sema::DefaultLvalueConversion``) or the usual unary conversions
     (``Sema::UsualUnaryConversions``), for places where the subexpression is
     producing a value you intend to use.
   * Your ``BuildXXX`` function will probably just return ``ExprError()`` at
     this point, since you don't have an AST.  That's perfectly fine, and
     shouldn't impact your testing.

#. Introduce an AST node for your new expression.  This starts with declaring
   the node in ``include/clang/Basic/StmtNodes.td`` and creating a new class
   for your expression in the appropriate ``include/clang/AST/Expr*.h`` header.
   It's best to look at the class for a similar expression to get ideas, and
   there are some specific things to watch for:

   * If you need to allocate memory, use the ``ASTContext`` allocator to
     allocate memory.  Never use raw ``malloc`` or ``new``, and never hold any
     resources in an AST node, because the destructor of an AST node is never
     called.
   * Make sure that ``getSourceRange()`` covers the exact source range of your
     expression.  This is needed for diagnostics and for IDE support.
   * Make sure that ``children()`` visits all of the subexpressions.  This is
     important for a number of features (e.g., IDE support, C++ variadic
     templates).  If you have sub-types, you'll also need to visit those
     sub-types in ``RecursiveASTVisitor`` and ``DataRecursiveASTVisitor``.
   * Add printing support (``StmtPrinter.cpp``) for your expression.
   * Add profiling support (``StmtProfile.cpp``) for your AST node, noting the
     distinguishing (non-source location) characteristics of an instance of
     your expression.  Omitting this step will lead to hard-to-diagnose
     failures regarding matching of template declarations.
   * Add serialization support (``ASTReaderStmt.cpp``, ``ASTWriterStmt.cpp``)
     for your AST node.

#. Teach semantic analysis to build your AST node.  At this point, you can wire
   up your ``Sema::BuildXXX`` function to actually create your AST.  A few
   things to check at this point:

   * If your expression can construct a new C++ class or return a new
     Objective-C object, be sure to update and then call
     ``Sema::MaybeBindToTemporary`` for your just-created AST node to be sure
     that the object gets properly destructed.
     An easy way to test this is to
     return a C++ class with a private destructor: semantic analysis should
     flag an error here with the attempt to call the destructor.
   * Inspect the generated AST by printing it using ``clang -cc1 -ast-print``,
     to make sure you're capturing all of the important information about how
     the AST was written.
   * Inspect the generated AST under ``clang -cc1 -ast-dump`` to verify that
     all of the types in the generated AST line up the way you want them.
     Remember that clients of the AST should never have to "think" to
     understand what's going on.  For example, all implicit conversions should
     show up explicitly in the AST.
   * Write tests that use your expression as a subexpression of other,
     well-known expressions.  Can you call a function using your expression as
     an argument?  Can you use the ternary operator?

#. Teach code generation to create IR for your AST node.  This step is the
   first (and only) one that requires knowledge of LLVM IR.  There are several
   things to keep in mind:

   * Code generation is separated into scalar/aggregate/complex and
     lvalue/rvalue paths, depending on what kind of result your expression
     produces.  On occasion, this requires some careful factoring of code to
     avoid duplication.
   * ``CodeGenFunction`` contains functions ``ConvertType`` and
     ``ConvertTypeForMem`` that convert Clang's types (``clang::Type*`` or
     ``clang::QualType``) to LLVM types.  Use the former for values, and the
     latter for memory locations: test with the C++ "``bool``" type to check
     this.  If you find that you are having to use LLVM bitcasts to make the
     subexpressions of your expression have the type that your expression
     expects, STOP!  Go fix semantic analysis and the AST so that you don't
     need these bitcasts.
   * The ``CodeGenFunction`` class has a number of helper functions to make
     certain operations easy, such as generating code to produce an lvalue or
     an rvalue, or to initialize a memory location with a given value.  Prefer
     these functions to writing loads and stores directly, because they take
     care of some of the tricky details for you (e.g., for exceptions).
   * If your expression requires some special behavior in the event of an
     exception, look at the ``push*Cleanup`` functions in ``CodeGenFunction``
     to introduce a cleanup.  You shouldn't have to deal with
     exception-handling directly.
   * Testing is extremely important in IR generation.  Use ``clang -cc1
     -emit-llvm`` and `FileCheck
     <http://llvm.org/docs/CommandGuide/FileCheck.html>`_ to verify that you're
     generating the right IR.

#. Teach template instantiation how to cope with your AST node, which requires
   some fairly simple code:

   * Make sure that your expression's constructor properly computes the flags
     for type dependence (i.e., the type your expression produces can change
     from one instantiation to the next), value dependence (i.e., the constant
     value your expression produces can change from one instantiation to the
     next), instantiation dependence (i.e., a template parameter occurs
     anywhere in your expression), and whether your expression contains a
     parameter pack (for variadic templates).  Often, computing these flags
     just means combining the results from the various types and
     subexpressions.
   * Add ``TransformXXX`` and ``RebuildXXX`` functions to the ``TreeTransform``
     class template in ``Sema``.  ``TransformXXX`` should (recursively)
     transform all of the subexpressions and types within your expression,
     using ``getDerived().TransformYYY``.  If all of the subexpressions and
     types transform without error, it will then call the ``RebuildXXX``
     function, which will in turn call ``getSema().BuildXXX`` to perform
     semantic analysis and build your expression.
   * To test template instantiation, take those tests you wrote to make sure
     that you were type checking with type-dependent expressions and dependent
     types (from step #2) and instantiate those templates with various types,
     some of which type-check and some that don't, and test the error messages
     in each case.

#. There are some "extras" that make other features work better.  It's worth
   handling these extras to give your expression complete integration into
   Clang:

   * Add code completion support for your expression in
     ``SemaCodeComplete.cpp``.
   * If your expression has types in it, or has any "interesting" features
     other than subexpressions, extend libclang's ``CursorVisitor`` to provide
     proper visitation for your expression, enabling various IDE features such
     as syntax highlighting, cross-referencing, and so on.  The
     ``c-index-test`` helper program can be used to test these features.
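
The split between ``ActOnXXX`` and ``BuildXXX`` (step #2) and its reuse during
template instantiation (step #6) can be illustrated with a deliberately tiny
model.  Everything below is a hypothetical stand-in written for this sketch:
``ToyType``, ``ToyExpr``, ``ToyContext``, ``ToySema``, and the "negate"
expression are invented names, not Clang classes, and the real ``Sema`` and
``TreeTransform`` are considerably more involved.  The sketch assumes a unary
expression whose operand must be an integer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins only; none of these are Clang's real classes.
enum class ToyType { Int, Pointer, Dependent };

struct ToyExpr {
  enum Kind { Literal, ParamRef, Negate };
  Kind kind;
  ToyType type;
  const ToyExpr *operand; // non-null only for Negate
  bool isTypeDependent() const { return type == ToyType::Dependent; }
};

// Owns every node, loosely mirroring how AST nodes live in a context
// object instead of being freed one by one.
struct ToyContext {
  std::vector<std::unique_ptr<ToyExpr>> nodes;
  const ToyExpr *create(ToyExpr::Kind K, ToyType T, const ToyExpr *Op) {
    nodes.push_back(std::make_unique<ToyExpr>(ToyExpr{K, T, Op}));
    return nodes.back().get();
  }
};

struct ToySema {
  ToyContext &ctx;
  std::vector<std::string> diags;

  // Performs the actual checking; shared by the parser path and by
  // instantiation.  A null result plays the role of ExprError().
  const ToyExpr *BuildNegate(const ToyExpr *Operand) {
    assert(Operand && "operand must be non-null");
    if (Operand->isTypeDependent()) // inside a template: defer checking
      return ctx.create(ToyExpr::Negate, ToyType::Dependent, Operand);
    if (Operand->type != ToyType::Int) {
      diags.push_back("operand of negate must be an integer");
      return nullptr;
    }
    return ctx.create(ToyExpr::Negate, ToyType::Int, Operand);
  }

  // Called from the parser; does no checking of its own.
  const ToyExpr *ActOnNegate(const ToyExpr *Operand) {
    return BuildNegate(Operand);
  }

  // Instantiation: transform subexpressions first, then rebuild the node
  // through BuildNegate so that the deferred checks finally run.
  const ToyExpr *Transform(const ToyExpr *E, ToyType Replacement) {
    switch (E->kind) {
    case ToyExpr::Literal:
      return E;
    case ToyExpr::ParamRef: // substitute a value of the concrete type
      return ctx.create(ToyExpr::Literal, Replacement, nullptr);
    case ToyExpr::Negate: {
      const ToyExpr *Sub = Transform(E->operand, Replacement);
      return Sub ? BuildNegate(Sub) : nullptr;
    }
    }
    return nullptr;
  }
};

// Builds the expression inside a "template", then instantiates it twice.
bool demoDependentThenInstantiate() {
  ToyContext ctx;
  ToySema sema{ctx, {}};
  const ToyExpr *Param =
      ctx.create(ToyExpr::ParamRef, ToyType::Dependent, nullptr);
  const ToyExpr *InTemplate = sema.ActOnNegate(Param); // no diagnostic yet
  const ToyExpr *Good = sema.Transform(InTemplate, ToyType::Int);
  const ToyExpr *Bad = sema.Transform(InTemplate, ToyType::Pointer);
  return InTemplate && InTemplate->isTypeDependent() && Good &&
         Good->type == ToyType::Int && Bad == nullptr &&
         sema.diags.size() == 1;
}
```

In this model the dependent operand produces a dependent node with no
diagnostic, the ``Int`` instantiation rebuilds successfully, and the
``Pointer`` instantiation is rejected by the same ``BuildNegate`` check that
the parser path would have used.  That is the reason for keeping the checking
logic out of the ``ActOn`` entry point.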