===========
Clang-Repl
===========

**Clang-Repl** is an interactive C++ interpreter that allows for incremental
compilation. It supports interactive programming for C++ in a
read-evaluate-print-loop (REPL) style. It uses Clang as a library to compile the
high level programming language into LLVM IR. Then the LLVM IR is executed by
the LLVM just-in-time (JIT) infrastructure.

Clang-Repl is suitable for exploratory programming and in places where time
to insight is important. Clang-Repl is a project inspired by the work in
`Cling <https://github.com/root-project/cling>`_, an LLVM-based C/C++ interpreter
developed by the high energy physics community and used by the scientific data
analysis framework `ROOT <https://root.cern/>`_. Clang-Repl makes it possible to
move parts of Cling upstream, making them useful and available to a broader
audience.


Clang-Repl Basic Data Flow
==========================

.. image:: ClangRepl_design.png
   :align: center
   :alt: ClangRepl design

Clang-Repl data flow can be divided into roughly 8 phases:

1. Clang-Repl controls the input infrastructure by an interactive prompt or by
   an interface allowing the incremental processing of input.

2. Then it sends the input to the underlying incremental facilities in Clang
   infrastructure.

3. Clang compiles the input into an AST representation.

4. When required, the AST can be further transformed in order to attach specific
   behavior.

5. The AST representation is then lowered to LLVM IR.

6. The LLVM IR is the input format for LLVM's JIT compilation infrastructure.
   The tool will instruct the JIT to run specified functions, translating them
   into machine code targeting the underlying device architecture (e.g., Intel
   x86 or NVPTX).

7. The LLVM JIT lowers the LLVM IR to machine code.

8. The machine code is then executed.

Build Instructions:
===================


.. code-block:: console

   $ cd llvm-project
   $ mkdir build
   $ cd build
   $ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" ../llvm

**Note:** ``RelWithDebInfo`` above can be replaced with ``Debug`` or ``Release``.

Build the ``clang`` and ``clang-repl`` targets, optionally passing ``-j n`` to
run ``n`` build jobs in parallel:

.. code-block:: console

   $ cmake --build . --target clang clang-repl -j n
      OR
   $ cmake --build . --target clang clang-repl

**Clang-Repl** is built under ``llvm-project/build/bin``. Change into that
directory and start the interpreter:

.. code-block:: console

   $ ./clang-repl
   clang-repl>


Clang-Repl Usage
================

Basic:
======

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int f() { std::cout << "Hello Interpreted World!\n"; return 0; }
   clang-repl> auto r = f();
   // Prints Hello Interpreted World!

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> using namespace std;
   clang-repl> std::cout << "Welcome to CLANG-REPL" << std::endl;
   Welcome to CLANG-REPL
   // Prints Welcome to CLANG-REPL


Function Definitions and Calls:
===============================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int sum(int a, int b){ return a+b; };
   clang-repl> int c = sum(9,10);
   clang-repl> std::cout << c << std::endl;
   19
   clang-repl>

Iterative Structures:
=====================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int i = 0;
   clang-repl> for (i = 0; i < 3; i++) { std::cout << i << std::endl; }
   0
   1
   2
   clang-repl> while (i < 7) { i++; std::cout << i << std::endl; }
   4
   5
   6
   7

Classes and Structures:
=======================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> class Rectangle {int width, height; public: void set_values (int,int);\
   clang-repl... int area() {return width*height;}};
   clang-repl> void Rectangle::set_values (int x, int y) { width = x;height = y;}
   clang-repl> int main () { Rectangle rect;rect.set_values (3,4);\
   clang-repl... std::cout << "area: " << rect.area() << std::endl;\
   clang-repl... return 0;}
   clang-repl> main();
   area: 12
   clang-repl>
   // Note: A trailing '\' continues the statement on the next line

Lambdas:
========

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> using namespace std;
   clang-repl> auto welcome = []()  { std::cout << "Welcome to REPL" << std::endl;};
   clang-repl> welcome();
   Welcome to REPL

Using Dynamic Library:
======================

.. code-block:: text

   clang-repl> %lib print.so
   clang-repl> #include "print.hpp"
   clang-repl> print(9);
   9

**Generation of dynamic library**

.. code-block:: text

   // print.cpp
   #include <iostream>
   #include "print.hpp"

   void print(int a)
   {
      std::cout << a << std::endl;
   }

   // print.hpp
   void print (int a);

   // Commands (the -17 suffix depends on the installed toolchain;
   // -fPIC produces the position-independent code that shared
   // libraries generally require)
   clang++-17 -fPIC -c -o print.o print.cpp
   clang-17 -shared print.o -o print.so

Comments:
=========

.. code-block:: text

   clang-repl> // Comments in Clang-Repl
   clang-repl> /* Comments in Clang-Repl */


Closure or Termination:
=======================

.. code-block:: text

   clang-repl> %quit


Just like Clang, Clang-Repl can be integrated in existing applications as a library
(using the clangInterpreter library). This turns your C++ compiler into a service that
can incrementally consume and execute code. The **Compiler as a Service** (**CaaS**)
concept helps support advanced use cases such as template instantiations on demand and
automatic language interoperability. It also helps make static languages such as C/C++
better suited to data science.

Execution Results Handling in Clang-Repl
========================================

Execution Results Handling features discussed below help extend the Clang-Repl
functionality by creating an interface between the execution results of a
program and the compiled program.

1. **Capture Execution Results**: This feature helps capture the execution results
   of a program and bring them back to the compiled program.

2. **Dump Captured Execution Results**: This feature helps create a temporary dump
   for Value Printing/Automatic Printf, that is, to display the value and type of
   the captured data.


1. Capture Execution Results
============================

In many cases, it is useful to bring back the program execution result to the
compiled program. This result can be stored in an object of type **Value**.

How Execution Results are captured (Value Synthesis):
-----------------------------------------------------

The synthesizer chooses which expression to synthesize, and then it replaces
the original expression with the synthesized expression. Depending on the
expression type, it may choose to save an object (``LastValue``) of type
``Value`` either while allocating memory for it (``SetValueWithAlloc()``) or
without allocating memory (``SetValueNoAlloc()``).

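The choice between the two setters can be sketched as a small standalone C++
function. This is only an illustration of the three cases listed in the Value
Synthesis diagram: ``ExprCategory``, ``chooseSetter()`` and
``demoChooseSetter()`` are hypothetical names invented for this sketch and are
not part of Clang.

.. code-block:: c++

   #include <cassert>
   #include <string>

   // Hypothetical categories, one per case in the Value Synthesis diagram.
   enum class ExprCategory { RValueStructure, LValueStructure, BuiltinType };

   // Hypothetical helper: returns the name of the setter that the diagram
   // associates with each category. Only a temporary (rvalue) structure is
   // paired with a new memory allocation.
   inline std::string chooseSetter(ExprCategory Category) {
     switch (Category) {
     case ExprCategory::RValueStructure:
       return "SetValueWithAlloc";
     case ExprCategory::LValueStructure:
     case ExprCategory::BuiltinType:
       return "SetValueNoAlloc";
     }
     return "SetValueNoAlloc"; // not reached for valid categories
   }

   // Small self-check that exercises every category.
   inline void demoChooseSetter() {
     assert(chooseSetter(ExprCategory::RValueStructure) == "SetValueWithAlloc");
     assert(chooseSetter(ExprCategory::LValueStructure) == "SetValueNoAlloc");
     assert(chooseSetter(ExprCategory::BuiltinType) == "SetValueNoAlloc");
   }

Calling ``demoChooseSetter()`` from a test program checks all three cases. The
real synthesizer makes this decision on Clang AST types rather than on an
enumeration.
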
.. graphviz::
    :name: valuesynthesis
    :caption: Value Synthesis
    :alt: Shows how an object of type 'Value' is synthesized
    :align: center

     digraph "valuesynthesis" {
         rankdir="LR";
         graph [fontname="Verdana", fontsize="12"];
         node [fontname="Verdana", fontsize="12"];
         edge [fontname="Sans", fontsize="9"];

         start [label=" Create an Object \n 'LastValue' \n of type 'Value' ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
         assign [label=" Assign the result \n to the 'LastValue' \n (based on respective \n Memory Allocation \n scenario) ", shape="box"]
         print [label=" Pretty Print \n the Value Object ", shape="Msquare", fillcolor="yellow", style=filled];
         start -> assign;
         assign -> print;

         subgraph SynthesizeExpression {
             synth [label=" SynthesizeExpr() ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
             mem [label=" New Memory \n Allocation? ", shape="diamond"];
             withaloc [label=" SetValueWithAlloc() ", shape="box"];
             noaloc [label=" SetValueNoAlloc() ", shape="box"];
             right [label=" 1. RValue Structure \n (a temporary value)", shape="box"];
             left2 [label=" 2. LValue Structure \n (a variable with \n an address)", shape="box"];
             left3 [label=" 3. Built-In Type \n (int, float, etc.)", shape="box"];
             output [label=" move to 'Assign' step ", shape="box"];

             synth -> mem;
             mem -> withaloc [label="Yes"];
             mem -> noaloc [label="No"];
             withaloc -> right;
             noaloc -> left2;
             noaloc -> left3;
             right -> output;
             left2 -> output;
             left3 -> output;
         }
         output -> assign
     }

Where is the captured result stored?
------------------------------------

``LastValue`` holds the last result of the value printing. It is a class member
so that it can still be accessed after subsequent inputs.

**Note:** If no value printing happens, then ``LastValue`` is in an invalid
state.

Improving Efficiency and User Experience
----------------------------------------

The ``Value`` object is essentially used to create a mapping between an
expression 'type' and the allocated 'memory'. Built-in types (bool, char, int,
float, double, etc.) are copyable. Their memory allocation size is known
and the ``Value`` object can introduce a small-buffer optimization.
In case of objects, the ``Value`` class provides reference-counted memory
management.

The implementation maps each type as written in the language to the
corresponding Clang type, so that the preprocessor can be used to synthesize the
relevant cast operations. For example, ``X(char, Char_S)``, where ``char`` is
the type from the language's type system and ``Char_S`` is the Clang builtin
type which represents it. This mapping helps to import execution results from
the interpreter in a compiled program and vice versa. The ``Value.h`` header
file can be included at runtime, which is why it has a very low token count and
was developed with strict constraints in mind.

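The ``X(char, Char_S)`` pattern is an instance of the X-macro technique: one
list of pairs is expanded several times by the preprocessor. The sketch below
shows the idea on a toy class. ``ToyValue``, ``TOY_BUILTIN_TYPES`` and the
chosen subset of types are assumptions made for illustration and do not
reproduce the actual ``Value.h``.

.. code-block:: c++

   #include <cassert>

   // Hypothetical subset of (language type, Clang builtin kind) pairs.
   #define TOY_BUILTIN_TYPES \
     X(bool, Bool)           \
     X(char, Char_S)         \
     X(int, Int)             \
     X(double, Double)

   class ToyValue {
   public:
     // First expansion: one enumerator per pair (K_Bool, K_Char_S, ...).
     enum Kind {
   #define X(type, name) K_##name,
       TOY_BUILTIN_TYPES
   #undef X
       K_Unspecified
     };

     // Second expansion: a typed setter and getter per pair
     // (setBool/getBool, setChar_S/getChar_S, ...).
   #define X(type, name)                                           \
     void set##name(type Val) {                                    \
       Data.m_##name = Val;                                        \
       ValueKind = K_##name;                                       \
     }                                                             \
     type get##name() const {                                      \
       assert(ValueKind == K_##name && "stored kind mismatch");    \
       return Data.m_##name;                                       \
     }
     TOY_BUILTIN_TYPES
   #undef X

     Kind getKind() const { return ValueKind; }
     bool isValid() const { return ValueKind != K_Unspecified; }

   private:
     // Third expansion: built-in types have a known size, so they can
     // share a small inline buffer instead of heap memory.
     union Storage {
   #define X(type, name) type m_##name;
       TOY_BUILTIN_TYPES
   #undef X
     };
     Storage Data = {};
     Kind ValueKind = K_Unspecified;
   };

   // Small self-check: store a char, then an int, and read each back.
   inline void demoToyValue() {
     ToyValue V;
     assert(!V.isValid());
     V.setChar_S('a');
     assert(V.getKind() == ToyValue::K_Char_S && V.getChar_S() == 'a');
     V.setInt(42);
     assert(V.getKind() == ToyValue::K_Int && V.getInt() == 42);
   }

Adding a type to the list extends the enumeration, the accessors and the
storage in one place, which is one way a header can stay small. Reference
counting for non-built-in objects is left out of this sketch.
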
This also enables the user to receive the computed 'type' back in their code
and then transform the type into something else (e.g., re-cast a double into
a float). Normally, the compiler can handle these conversions transparently,
but in interpreter mode, the compiler cannot see all the 'from' and 'to' types,
so it cannot do the conversions implicitly. This logic therefore provides
these conversions on request.

On-request conversions can help improve the user experience by allowing
conversion to a desired 'to' type when the 'from' type is unknown or unclear.

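A conversion on request can be sketched as a member template that inspects the
stored kind at run time and casts to whatever type the caller asks for. The
class below is a hypothetical stand-in written for this example: ``ToyValue``
and its ``convertTo()`` member are assumptions and are not claimed to match the
interface of the real ``Value`` class.

.. code-block:: c++

   #include <cassert>

   // Hypothetical stand-in for a value whose stored type is only known at
   // run time.
   class ToyValue {
   public:
     enum Kind { K_Int, K_Float, K_Double };

     explicit ToyValue(int V) : ValueKind(K_Int) { Data.I = V; }
     explicit ToyValue(float V) : ValueKind(K_Float) { Data.F = V; }
     explicit ToyValue(double V) : ValueKind(K_Double) { Data.D = V; }

     Kind getKind() const { return ValueKind; }

     // The 'from' type is looked up at run time; the 'to' type T is chosen
     // by the caller.
     template <typename T> T convertTo() const {
       switch (ValueKind) {
       case K_Int:
         return static_cast<T>(Data.I);
       case K_Float:
         return static_cast<T>(Data.F);
       case K_Double:
         return static_cast<T>(Data.D);
       }
       return T(); // not reached for valid kinds
     }

   private:
     union {
       int I;
       float F;
       double D;
     } Data;
     Kind ValueKind;
   };

   // Small self-check: a stored double is re-cast to float and to int.
   inline void demoConvertTo() {
     ToyValue Stored(2.5);
     assert(Stored.getKind() == ToyValue::K_Double);
     assert(Stored.convertTo<float>() == 2.5f);
     assert(Stored.convertTo<int>() == 2);
   }

The caller does not need to know that the value was computed as a ``double``;
it only names the type it wants to receive.
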
Significance of this Feature
----------------------------

The ``Value`` object enables wrapping a memory region that comes from the
JIT, and bringing it back to the compiled code (and vice versa).
This is a very useful functionality when:

- connecting an interpreter to the compiled code, or
- connecting an interpreter in another language.

For example, this feature helps transport values across boundaries. A notable
example is the cppyy project, which makes use of this feature to enable running
C++ within Python. It enables transporting values/information between C++
and Python.

Note: `cppyy <https://github.com/wlav/cppyy/>`_ is an automatic, run-time,
Python-to-C++ bindings generator, for calling C++ from Python and Python from C++.
It uses LLVM along with a C++ interpreter (e.g., Cling) to enable features like
run-time instantiation of C++ templates, cross-inheritance, callbacks,
auto-casting, transparent use of smart pointers, etc.

In a nutshell, this feature enables a new way of developing code, paving the
way for language interoperability and easier interactive programming.

Implementation Details
======================

Interpreter as a REPL vs. as a Library
--------------------------------------

1 - If we're using the interpreter in interactive (REPL) mode, it will dump
the value (i.e., value printing).

.. code-block:: c++

  if (LastValue.isValid()) {
    if (!V) {
      LastValue.dump();
      LastValue.clear();
    } else
      *V = std::move(LastValue);
  }


2 - If we're using the interpreter as a library, then it will pass the value
to the user.

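The two modes can be modelled without LLVM by a toy class that keeps a "last
value" member and either prints it or hands it to the caller. Everything in
this sketch is hypothetical: ``ToyInterpreter``, ``execute()`` and the
``(int) 42`` output format are stand-ins chosen for illustration, and
``std::optional<int>`` plays the role of ``Value``.

.. code-block:: c++

   #include <cassert>
   #include <optional>
   #include <ostream>
   #include <sstream>
   #include <string>

   // Hypothetical stand-in for the interpreter. "Executing" an input simply
   // records an int result as the last value.
   class ToyInterpreter {
   public:
     explicit ToyInterpreter(std::ostream &OS) : Out(OS) {}

     // V == nullptr models REPL mode (the value is printed).
     // V != nullptr models library mode (the value is passed to the caller).
     void execute(int Result, std::optional<int> *V = nullptr) {
       LastValue = Result;
       if (LastValue.has_value()) {
         if (!V)
           Out << "(int) " << *LastValue << "\n"; // stands in for dump()
         else
           *V = LastValue;                        // hand the result over
         LastValue.reset();                       // stands in for clear()
       }
     }

     bool hasLastValue() const { return LastValue.has_value(); }

   private:
     std::ostream &Out;
     std::optional<int> LastValue;
   };

   // Small self-check covering both modes.
   inline void demoToyInterpreter() {
     std::ostringstream Printed;
     ToyInterpreter Interp(Printed);

     Interp.execute(42); // REPL mode: printed, nothing returned
     assert(Printed.str() == "(int) 42\n");

     std::optional<int> V;
     Interp.execute(7, &V); // library mode: returned, nothing printed
     assert(V.has_value() && *V == 7);
     assert(Printed.str() == "(int) 42\n");
     assert(!Interp.hasLastValue());
   }

In both modes the toy interpreter ends with no pending last value, which is the
invalid state described earlier for the case where no value printing happens.
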
Incremental AST Consumer
------------------------

The ``IncrementalASTConsumer`` class wraps the original code generator
``ASTConsumer`` and installs a hook that traverses all the top-level decls to
look for expressions to synthesize, based on the ``isSemiMissing()`` condition.

If this condition is found to be true, then ``Interp.SynthesizeExpr()`` will be
invoked.

**Note:** Following is a sample code snippet. Actual code may vary over time.

.. code-block:: c++

    for (Decl *D : DGR)
      if (auto *TSD = llvm::dyn_cast<TopLevelStmtDecl>(D);
          TSD && TSD->isSemiMissing())
        TSD->setStmt(Interp.SynthesizeExpr(cast<Expr>(TSD->getStmt())));

    return Consumer->HandleTopLevelDecl(DGR);

The synthesizer will then choose the relevant expression, based on its type.

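The hook can be imitated on plain data to show which declarations are rewritten
and which are left alone. The sketch below mirrors the shape of the loop in the
snippet above, but ``ToyDecl``, ``synthesizeExpr()``, ``toy_capture`` and
``rewriteForValuePrinting()`` are hypothetical names; the real code operates on
Clang AST nodes, not strings.

.. code-block:: c++

   #include <cassert>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for a top-level declaration.
   struct ToyDecl {
     bool IsTopLevelStmt = false; // models dyn_cast<TopLevelStmtDecl>
     bool SemiMissing = false;    // models isSemiMissing()
     std::string Stmt;            // models the statement held by the decl
   };

   // Hypothetical stand-in for Interp.SynthesizeExpr(): wraps the expression
   // in a call to an invented capture function.
   inline std::string synthesizeExpr(const std::string &E) {
     return "toy_capture(" + E + ")";
   }

   // Rewrites only top-level statements whose semicolon is missing and
   // returns how many declarations were rewritten.
   inline int rewriteForValuePrinting(std::vector<ToyDecl> &Decls) {
     int Rewritten = 0;
     for (ToyDecl &D : Decls)
       if (D.IsTopLevelStmt && D.SemiMissing) {
         D.Stmt = synthesizeExpr(D.Stmt);
         ++Rewritten;
       }
     return Rewritten;
   }

   // Small self-check: of three inputs, only the bare expression is rewritten.
   inline void demoRewrite() {
     std::vector<ToyDecl> Decls = {
         {false, false, "int x = 42;"}, // ordinary declaration
         {true, false, "x = 43;"},      // statement with a semicolon
         {true, true, "x"},             // expression without a semicolon
     };
     assert(rewriteForValuePrinting(Decls) == 1);
     assert(Decls[0].Stmt == "int x = 42;");
     assert(Decls[1].Stmt == "x = 43;");
     assert(Decls[2].Stmt == "toy_capture(x)");
   }

After the rewrite, the declarations would be forwarded to the wrapped consumer
unchanged in number and order, as ``HandleTopLevelDecl(DGR)`` does in the
snippet above.
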
Communication between Compiled Code and Interpreted Code
--------------------------------------------------------

In Clang-Repl there is **interpreted code**, and this feature adds a 'value'
runtime that can talk to the **compiled code**.

Following is an example where the compiled code interacts with the interpreted
code. The execution results of an expression are stored in the object ``V`` of
type ``Value``. This value is then printed, effectively helping the interpreter
use a value from the compiled code.

.. code-block:: c++

    int Global = 42;
    void setGlobal(int val) { Global = val; }
    int getGlobal() { return Global; }
    Interp.ParseAndExecute("void setGlobal(int val);");
    Interp.ParseAndExecute("int getGlobal();");
    Value V;
    Interp.ParseAndExecute("getGlobal()", &V);
    std::cout << V.getAs<int>() << "\n"; // Prints 42


**Note:** Above is an example of interoperability between the compiled code and
the interpreted code. Interoperability between languages (e.g., C++ and Python)
works similarly.


2. Dump Captured Execution Results
==================================

This feature helps create a temporary dump to display the value and type
(pretty print) of the desired data. This is a good way to interact with the
interpreter during interactive programming.

How value printing is simplified (Automatic Printf)
---------------------------------------------------

The ``Automatic Printf`` feature makes it easy to display variable values during
program execution. Using the ``printf`` function repeatedly is not required.
This is achieved using an extension in the ``libclangInterpreter`` library.

To automatically print the value of an expression, simply write the expression
in the global scope **without a semicolon**.

.. graphviz::
    :name: automaticprintf
    :caption: Automatic PrintF
    :alt: Shows how Automatic PrintF can be used
    :align: center

     digraph "AutomaticPrintF" {
         size="6,4";
         rankdir="LR";
         graph [fontname="Verdana", fontsize="12"];
         node [fontname="Verdana", fontsize="12"];
         edge [fontname="Sans", fontsize="9"];

         manual [label=" Manual PrintF ", shape="box"];
         int1 [label=" (int &) 42 ", shape="box"]
         auto [label=" Automatic PrintF ", shape="box"];
         int2 [label=" (int &) 42 ", shape="box"]

         auto -> int2 [label="int x = 42; \n x"];
         manual -> int1 [label="int x = 42; \n printf(&quot;(int &) %d \\n&quot;, x);"];
     }


Significance of this feature
----------------------------

Inspired by a similar implementation in `Cling <https://github.com/root-project/cling>`_,
this feature, now part of the upstream Clang repository, essentially extends the
syntax of C++ so that it can be more helpful for people who are writing code for
data science applications.

This is useful, for example, when you want to experiment with a set of values
against a set of functions, and you'd like to know the results right away.
This is similar to how Python works (hence its popularity in data science
research), but the superior performance of C++, along with this flexibility,
makes it a more attractive option.

Implementation Details
======================

Parsing mechanism:
------------------

The Interpreter in Clang-Repl (``Interpreter.cpp``) includes the function
``ParseAndExecute()`` that can accept a ``Value`` parameter to capture the
result. This parameter is optional: if it is omitted (i.e., the user does not
want to use the result elsewhere), then the last value is validated and pushed
into the ``dump()`` function.

.. graphviz::
    :name: parsing
    :caption: Parsing Mechanism
    :alt: Shows the Parsing Mechanism for Pretty Printing
    :align: center


     digraph "prettyprint" {
         rankdir="LR";
         graph [fontname="Verdana", fontsize="12"];
         node [fontname="Verdana", fontsize="12"];
         edge [fontname="Verdana", fontsize="9"];

         parse [label=" ParseAndExecute() \n in Clang ", shape="box"];
         capture [label=" Capture 'Value' parameter \n for processing? ", shape="diamond"];
         use [label="  Use for processing  ", shape="box"];
         dump [label="  Validate and push  \n to dump()", shape="box"];
         callp [label="  call print() function ", shape="box"];
         type [label="  Print the Type \n ReplPrintTypeImpl()", shape="box"];
         data [label="  Print the Data \n ReplPrintDataImpl() ", shape="box"];
         output [label="  Output Pretty Print \n to the user  ", shape="box", fontcolor=white, fillcolor="#3333ff", style=filled];

         parse -> capture [label="Optional 'Value' Parameter"];
         capture -> use [label="Yes"];
         use -> End;
         capture -> dump [label="No"];
         dump -> callp;
         callp -> type;
         callp -> data;
         type -> output;
         data -> output;
     }

**Note:** Following is a sample code snippet. Actual code may vary over time.

.. code-block:: c++

    llvm::Error Interpreter::ParseAndExecute(llvm::StringRef Code, Value *V) {

      auto PTU = Parse(Code);
      if (!PTU)
        return PTU.takeError();
      if (PTU->TheModule)
        if (llvm::Error Err = Execute(*PTU))
          return Err;

      if (LastValue.isValid()) {
        if (!V) {
          LastValue.dump();
          LastValue.clear();
        } else
          *V = std::move(LastValue);
      }
      return llvm::Error::success();
    }

The ``dump()`` function (in ``Value.cpp``) calls the ``print()`` function.

Printing the Data and Type are handled in their respective functions:
``ReplPrintDataImpl()`` and ``ReplPrintTypeImpl()``.

Annotation Token (annot_repl_input_end)
---------------------------------------

This feature uses a new token (``annot_repl_input_end``) to consider printing the
value of an expression if it doesn't end with a semicolon. When parsing an
Expression Statement, if the last semicolon is missing, then the code will
pretend that there is one, set a marker there for later use, and
continue parsing.

A semicolon is normally required in C++, but this feature expands the C++
syntax to handle cases where a missing semicolon is expected (i.e., when
handling an expression statement). It also makes sure that an error is not
generated for the missing semicolon in this specific case.

This is accomplished by identifying the end position of the user input
(expression statement). This helps store and return the expression statement
effectively, so that it can be printed (displayed to the user automatically).

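The effect of the missing-semicolon rule can be approximated on raw text, which
is useful for reasoning about which inputs trigger value printing. This is a
deliberately simplified sketch: the real implementation works on tokens while
parsing an expression statement, whereas the hypothetical helper
``endsWithoutSemicolon()`` below only looks at the last non-whitespace
character and cannot tell an expression from a declaration.

.. code-block:: c++

   #include <cassert>
   #include <string_view>

   // Hypothetical text-level approximation of the check: returns true when
   // the input is non-empty and its last non-whitespace character is not a
   // semicolon.
   inline bool endsWithoutSemicolon(std::string_view Input) {
     const auto Last = Input.find_last_not_of(" \t\r\n");
     if (Last == std::string_view::npos)
       return false; // empty or whitespace-only input: nothing to print
     return Input[Last] != ';';
   }

   // Small self-check on a few representative inputs.
   inline void demoEndsWithoutSemicolon() {
     assert(endsWithoutSemicolon("x"));              // would be printed
     assert(!endsWithoutSemicolon("x;"));            // ordinary statement
     assert(!endsWithoutSemicolon("int x = 42;  ")); // trailing spaces ignored
     assert(!endsWithoutSemicolon("   "));           // nothing to evaluate
   }

Inputs such as a function definition also end without a semicolon, so a
text-only check like this one over-approximates; the parser avoids that by
applying the rule only to expression statements.
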
**Note:** This logic is only available for C++ for now, since part of the
implementation itself requires C++ features. Future versions may support more
languages.

.. code-block:: c++

  Token *CurTok = nullptr;
  // If the semicolon is missing at the end of REPL input, consider if
  // we want to do value printing. Note this is only enabled in C++ mode
  // since part of the implementation requires C++ language features.
  // Note we shouldn't eat the token since the callback needs it.
  if (Tok.is(tok::annot_repl_input_end) && Actions.getLangOpts().CPlusPlus)
    CurTok = &Tok;
  else
    // Otherwise, eat the semicolon.
    ExpectAndConsumeSemi(diag::err_expected_semi_after_expr);

  StmtResult R = handleExprStmt(Expr, StmtCtx);
  if (CurTok && !R.isInvalid())
    CurTok->setAnnotationValue(R.get());

  return R;

AST Transformation
------------------

When Sema encounters the ``annot_repl_input_end`` token, it knows to transform
the AST before the real CodeGen process. It will consume the token and set a
'semi missing' bit in the respective decl.

.. code-block:: c++

    if (Tok.is(tok::annot_repl_input_end) &&
        Tok.getAnnotationValue() != nullptr) {
      ConsumeAnnotationToken();
      cast<TopLevelStmtDecl>(DeclsInGroup.back())->setSemiMissing();
    }

In the AST Consumer, traverse all the Top Level Decls, to look for expressions
to synthesize. If the current Decl is the Top Level Statement
Decl (``TopLevelStmtDecl``) and has a semicolon missing, then ask the interpreter
to synthesize another expression (an internal function call) to replace this
original expression.


Detailed RFC and Discussion:
----------------------------

For more technical details, community discussion and links to patches related
to these features, please visit the
`RFC on LLVM Discourse <https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493>`_.

Some logic presented in the RFC (e.g., ``ValueGetter()``) may be outdated
compared to the final developed solution.

Related Reading
===============

`Cling Transitions to LLVM's Clang-Repl <https://root.cern/blog/cling-in-llvm/>`_

`Moving (parts of) the Cling REPL in Clang <https://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html>`_

`GPU Accelerated Automatic Differentiation With Clad <https://arxiv.org/pdf/2203.06139.pdf>`_
