===========
Clang-Repl
===========

**Clang-Repl** is an interactive C++ interpreter that allows for incremental
compilation. It supports interactive programming for C++ in a
read-evaluate-print-loop (REPL) style. It uses Clang as a library to compile the
high-level programming language into LLVM IR. Then the LLVM IR is executed by
the LLVM just-in-time (JIT) infrastructure.

Clang-Repl is suitable for exploratory programming and in places where time
to insight is important. Clang-Repl is a project inspired by the work in
`Cling <https://github.com/root-project/cling>`_, an LLVM-based C/C++ interpreter
developed in the field of high energy physics and used by the scientific data
analysis framework `ROOT <https://root.cern/>`_. Clang-Repl allows moving parts
of Cling upstream, making them useful and available to a broader audience.


Clang-Repl Basic Data Flow
==========================

.. image:: ClangRepl_design.png
   :align: center
   :alt: ClangRepl design

Clang-Repl data flow can be divided into roughly 8 phases:

1. Clang-Repl controls the input infrastructure through an interactive prompt
   or through an interface that allows incremental processing of input.

2. It then sends the input to the underlying incremental facilities in the
   Clang infrastructure.

3. Clang compiles the input into an AST representation.

4. When required, the AST can be further transformed in order to attach
   specific behavior.

5. The AST representation is then lowered to LLVM IR.

6. The LLVM IR is the input format for LLVM’s JIT compilation infrastructure.
   The tool instructs the JIT to run specified functions, translating them
   into machine code targeting the underlying device architecture (e.g., Intel
   x86 or NVPTX).

7. The LLVM JIT lowers the LLVM IR to machine code.

8. The machine code is then executed.

Build Instructions
==================

.. code-block:: console

   $ cd llvm-project
   $ mkdir build
   $ cd build
   $ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" ../llvm

**Note:** ``CMAKE_BUILD_TYPE`` can also be set to ``Debug`` or ``Release``
instead of ``RelWithDebInfo``.

.. code-block:: console

   $ cmake --build . --target clang clang-repl -j n

The ``-j n`` option is optional; ``n`` is the number of parallel build jobs.

The ``clang-repl`` binary is built in ``llvm-project/build/bin``. Change into
that directory and start it:

.. code-block:: console

   $ cd bin
   $ ./clang-repl
   clang-repl>


Clang-Repl Usage
================

The following examples show typical interactive Clang-Repl sessions.


Basic
=====

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int f() { std::cout << "Hello Interpreted World!\n"; return 0; }
   clang-repl> auto r = f();
   Hello Interpreted World!

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> using namespace std;
   clang-repl> std::cout << "Welcome to CLANG-REPL" << std::endl;
   Welcome to CLANG-REPL


Function Definitions and Calls
==============================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int sum(int a, int b) { return a + b; }
   clang-repl> int c = sum(9, 10);
   clang-repl> std::cout << c << std::endl;
   19
   clang-repl>

Iterative Structures
====================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> int i = 0;
   clang-repl> for (i = 0; i < 3; i++) { std::cout << i << std::endl; }
   0
   1
   2
   clang-repl> while (i < 7) { i++; std::cout << i << std::endl; }
   4
   5
   6
   7

Classes and Structures
======================

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> class Rectangle { int width, height; public: void set_values(int, int);\
   clang-repl... int area() { return width * height; } };
   clang-repl> void Rectangle::set_values(int x, int y) { width = x; height = y; }
   clang-repl> int run() { Rectangle rect; rect.set_values(3, 4);\
   clang-repl... std::cout << "area: " << rect.area() << std::endl;\
   clang-repl... return 0; }
   clang-repl> run();
   area: 12
   clang-repl>

**Note:** A trailing backslash (``\``) continues the input on the next line,
where the prompt changes to ``clang-repl...``.

Lambdas
=======

.. code-block:: text

   clang-repl> #include <iostream>
   clang-repl> using namespace std;
   clang-repl> auto welcome = []() { std::cout << "Welcome to REPL" << std::endl; };
   clang-repl> welcome();
   Welcome to REPL

Using a Dynamic Library
=======================

.. code-block:: text

   clang-repl> %lib print.so
   clang-repl> #include "print.hpp"
   clang-repl> print(9);
   9

**Generating the dynamic library**

.. code-block:: c++

   // print.hpp
   #pragma once
   void print(int a);

.. code-block:: c++

   // print.cpp
   #include <iostream>
   #include "print.hpp"

   void print(int a)
   {
      std::cout << a << std::endl;
   }

Compile ``print.cpp`` as position-independent code (``-fPIC``) and link it with
``clang++`` so that the C++ standard library is linked in:

.. code-block:: console

   $ clang++-17 -fPIC -c -o print.o print.cpp
   $ clang++-17 -shared print.o -o print.so

Comments
========

.. code-block:: text

   clang-repl> // Comments in Clang-Repl
   clang-repl> /* Comments in Clang-Repl */


Closure or Termination
======================

.. code-block:: text

   clang-repl> %quit


Just like Clang, Clang-Repl can be integrated into existing applications as a
library (using the clangInterpreter library). This turns your C++ compiler into
a service that can incrementally consume and execute code. The
**Compiler as a Service** (**CaaS**) concept helps support advanced use cases
such as on-demand template instantiation and automatic language
interoperability. It also helps make statically compiled languages such as C
and C++ suitable for data science.

Execution Results Handling in Clang-Repl
========================================

The Execution Results Handling features discussed below extend the Clang-Repl
functionality by creating an interface between the execution results of a
program and the compiled program.

1. **Capture Execution Results**: This feature captures the execution results
   of a program and brings them back to the compiled program.

2. **Dump Captured Execution Results**: This feature creates a temporary dump
   for Value Printing/Automatic Printf, that is, it displays the value and type
   of the captured data.


1. Capture Execution Results
============================

In many cases, it is useful to bring the program execution result back to the
compiled program. This result can be stored in an object of type **Value**.

How Execution Results are captured (Value Synthesis)
----------------------------------------------------

The synthesizer chooses which expression to synthesize, and then it replaces
the original expression with the synthesized expression. Depending on the
expression type, it may choose to save an object (``LastValue``) of type
``Value`` while allocating memory to it (``SetValueWithAlloc()``), or without
allocating memory (``SetValueNoAlloc()``).

.. graphviz::
   :name: valuesynthesis
   :caption: Value Synthesis
   :alt: Shows how an object of type 'Value' is synthesized
   :align: center

   digraph "valuesynthesis" {
     rankdir="LR";
     graph [fontname="Verdana", fontsize="12"];
     node [fontname="Verdana", fontsize="12"];
     edge [fontname="Sans", fontsize="9"];

     start [label=" Create an Object \n 'LastValue' \n of type 'Value' ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
     assign [label=" Assign the result \n to the 'LastValue' \n (based on respective \n Memory Allocation \n scenario) ", shape="box"];
     print [label=" Pretty Print \n the Value Object ", shape="Msquare", fillcolor="yellow", style=filled];
     start -> assign;
     assign -> print;

     subgraph SynthesizeExpression {
       synth [label=" SynthesizeExpr() ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
       mem [label=" New Memory \n Allocation? ", shape="diamond"];
       withaloc [label=" SetValueWithAlloc() ", shape="box"];
       noaloc [label=" SetValueNoAlloc() ", shape="box"];
       right [label=" 1. RValue Structure \n (a temporary value)", shape="box"];
       left2 [label=" 2. LValue Structure \n (a variable with \n an address)", shape="box"];
       left3 [label=" 3. Built-In Type \n (int, float, etc.)", shape="box"];
       output [label=" move to 'Assign' step ", shape="box"];

       synth -> mem;
       mem -> withaloc [label="Yes"];
       mem -> noaloc [label="No"];
       withaloc -> right;
       noaloc -> left2;
       noaloc -> left3;
       right -> output;
       left2 -> output;
       left3 -> output;
     }
     output -> assign;
   }

Where is the captured result stored?
------------------------------------

``LastValue`` holds the last result of value printing. It is a class member so
that it remains accessible even after subsequent inputs.

**Note:** If no value printing happens, ``LastValue`` is in an invalid state.
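
To make the role of ``LastValue`` concrete, the following is a minimal,
self-contained C++ sketch of the same pattern. It is not Clang's
implementation: ``ToyValue`` and ``ToyInterpreter`` are hypothetical stand-ins
for ``Value`` and ``Interpreter``, the captured result is assumed to be a plain
``int``, and the ``HasResult`` flag models an input written without a trailing
semicolon. The member starts out invalid, becomes valid only when an input
produces a result, and is then either printed (REPL mode) or handed to the
caller (library mode).

.. code-block:: c++

   #include <iostream>
   #include <optional>
   #include <utility>

   // Hypothetical stand-in for clang::Value: either holds an int or is invalid.
   class ToyValue {
   public:
     bool isValid() const { return Data.has_value(); }
     void set(int X) { Data = X; }
     void clear() { Data.reset(); }
     int getInt() const { return *Data; }
     void dump() const {
       if (isValid())
         std::cout << "(int) " << *Data << "\n";
     }

   private:
     std::optional<int> Data; // empty means "invalid state"
   };

   // Hypothetical stand-in for the interpreter; LastValue is a class member.
   class ToyInterpreter {
   public:
     // Models one input. HasResult == true stands for an expression written
     // without a trailing semicolon, whose result is therefore captured.
     void parseAndExecute(int Result, bool HasResult, ToyValue *V = nullptr) {
       if (HasResult)
         LastValue.set(Result);
       if (!LastValue.isValid())
         return; // This input produced no value.
       if (!V) {
         // REPL mode: print the captured value, then reset it.
         LastValue.dump();
         LastValue.clear();
       } else {
         // Library mode: hand the value to the caller. A moved-from
         // std::optional<int> stays engaged, so reset it explicitly.
         *V = std::move(LastValue);
         LastValue.clear();
       }
     }

   private:
     ToyValue LastValue; // invalid until an input produces a value
   };

With this sketch, ``parseAndExecute(42, true, &V)`` leaves ``42`` in ``V``,
while ``parseAndExecute(7, true)`` prints ``(int) 7`` instead, mirroring the
REPL branch. The explicit ``clear()`` after the move is a choice of this
sketch, needed because a moved-from ``std::optional`` still holds a value.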

Improving Efficiency and User Experience
----------------------------------------

The ``Value`` object is essentially used to create a mapping between an
expression 'type' and the allocated 'memory'. Built-in types (bool, char, int,
float, double, etc.) are copyable and their size is known, so the ``Value``
object can store them using a small-buffer optimization. For objects, the
``Value`` class provides reference-counted memory management.

The implementation maps each type as written in the source language to the
Clang type that represents it, so that the preprocessor can be used to
synthesize the relevant cast operations. For example, in the entry
``X(char, Char_S)``, ``char`` is the type from the language's type system and
``Char_S`` is the Clang builtin type which represents it. This mapping helps to
import execution results from the interpreter into a compiled program and vice
versa. Because the ``Value.h`` header file can be included at runtime, it was
developed under strict constraints to keep its token count very low.

This also enables users to receive the computed 'type' back in their code
and then transform it into something else (e.g., re-cast a double into
a float). Normally, the compiler handles these conversions transparently,
but in interpreter mode the compiler cannot see all the 'from' and 'to' types,
so it cannot perform the conversions implicitly. This logic provides these
conversions on request instead.

On-request conversions can improve the user experience by allowing conversion
to a desired 'to' type when the 'from' type is unknown or unclear.

Significance of this Feature
----------------------------

The ``Value`` object enables wrapping a memory region that comes from the
JIT and bringing it back to the compiled code (and vice versa).
This is especially useful when:

- connecting an interpreter to compiled code, or
- connecting an interpreter for another language.

For example, this feature helps transport values across language boundaries.
A notable example is the cppyy project, which uses this feature to run C++
from within Python and to transport values and information between C++ and
Python.

Note: `cppyy <https://github.com/wlav/cppyy/>`_ is an automatic, run-time,
Python-to-C++ bindings generator for calling C++ from Python and Python from
C++. It uses LLVM along with a C++ interpreter (e.g., Cling) to enable features
such as run-time instantiation of C++ templates, cross-inheritance, callbacks,
auto-casting, and transparent use of smart pointers.

In a nutshell, this feature enables a new way of developing code, paving the
way for language interoperability and easier interactive programming.

Implementation Details
======================

Interpreter as a REPL vs. as a Library
--------------------------------------

1. If the interpreter is used in interactive (REPL) mode, it dumps the value
   (i.e., value printing).

2. If the interpreter is used as a library, it passes the value to the user.

.. code-block:: c++

   if (LastValue.isValid()) {
     if (!V) {
       // REPL mode: no output Value was passed in, so print and reset it.
       LastValue.dump();
       LastValue.clear();
     } else {
       // Library mode: hand the captured value over to the caller.
       *V = std::move(LastValue);
     }
   }

Incremental AST Consumer
------------------------

The ``IncrementalASTConsumer`` class wraps the original code generator
``ASTConsumer`` and hooks into it to traverse all top-level declarations,
looking for expressions to synthesize based on the ``isSemiMissing()``
condition.

If this condition is true, ``Interp.SynthesizeExpr()`` is invoked.

**Note:** The following is a sample code snippet. Actual code may vary over
time.

.. code-block:: c++

   for (Decl *D : DGR)
     // Only top-level statements whose trailing semicolon was omitted are
     // candidates for value printing.
     if (auto *TSD = llvm::dyn_cast<TopLevelStmtDecl>(D);
         TSD && TSD->isSemiMissing())
       // Replace the statement with the synthesized expression.
       TSD->setStmt(Interp.SynthesizeExpr(cast<Expr>(TSD->getStmt())));

   // Forward the (possibly rewritten) declarations to the wrapped consumer.
   return Consumer->HandleTopLevelDecl(DGR);

The synthesizer then chooses the relevant expression based on its type.

Communication between Compiled Code and Interpreted Code
--------------------------------------------------------

Clang-Repl runs **interpreted code**, and this feature adds a 'value' runtime
that can talk to **compiled code**.

The following example shows compiled code interacting with interpreted code.
The compiled program defines ``setGlobal()`` and ``getGlobal()`` and declares
them to the interpreter. The result of evaluating ``getGlobal()`` in the
interpreter is stored in ``V``, an object of type ``Value``, which the compiled
code then prints. The interpreted call thus reads state that lives in the
compiled program and hands it back through ``V``.

.. code-block:: c++

   // Compiled code: state and functions that the interpreter will call into.
   int Global = 42;
   void setGlobal(int val) { Global = val; }
   int getGlobal() { return Global; }

   // Elsewhere in the compiled program, given an initialized interpreter
   // `Interp`: declare the compiled functions to the interpreted code.
   llvm::cantFail(Interp.ParseAndExecute("void setGlobal(int val);"));
   llvm::cantFail(Interp.ParseAndExecute("int getGlobal();"));

   // No trailing semicolon: the result is captured into V instead of printed.
   Value V;
   llvm::cantFail(Interp.ParseAndExecute("getGlobal()", &V));
   std::cout << V.getAs<int>() << "\n"; // Prints 42


**Note:** The example above shows interoperability between compiled code and
interpreted code. Interoperability between languages (e.g., C++ and Python)
works similarly.


2. Dump Captured Execution Results
==================================

This feature creates a temporary dump to display the value and type (pretty
print) of the desired data. This is a good way to interact with the
interpreter during interactive programming.

How value printing is simplified (Automatic Printf)
---------------------------------------------------

The ``Automatic Printf`` feature makes it easy to display variable values during
program execution.
There is no need to call the ``printf`` function repeatedly.
This is achieved using an extension in the ``libclangInterpreter`` library.

To automatically print the value of an expression, simply write the expression
in the global scope **without a semicolon**.

.. graphviz::
   :name: automaticprintf
   :caption: Automatic PrintF
   :alt: Shows how Automatic PrintF can be used
   :align: center

   digraph "AutomaticPrintF" {
     size="6,4";
     rankdir="LR";
     graph [fontname="Verdana", fontsize="12"];
     node [fontname="Verdana", fontsize="12"];
     edge [fontname="Sans", fontsize="9"];

     manual [label=" Manual PrintF ", shape="box"];
     int1 [label=" (int &) 42 ", shape="box"];
     auto [label=" Automatic PrintF ", shape="box"];
     int2 [label=" (int &) 42 ", shape="box"];

     auto -> int2 [label="int x = 42; \n x"];
     manual -> int1 [label="int x = 42; \n printf(\"(int &) %d \\n\", x);"];
   }


Significance of this feature
----------------------------

Inspired by a similar implementation in `Cling <https://github.com/root-project/cling>`_,
this feature, added to the upstream Clang repository, essentially extends the
syntax of C++ so that it is more helpful for people writing code for data
science applications.

This is useful, for example, when you want to experiment with a set of values
against a set of functions and see the results right away. This is similar to
how Python works (hence its popularity in data science research), but the
superior performance of C++, combined with this flexibility, makes it a more
attractive option.

Implementation Details
======================

Parsing mechanism
-----------------

The Interpreter in Clang-Repl (``Interpreter.cpp``) includes the function
``ParseAndExecute()``, which can accept a ``Value`` parameter to capture the
result.
The ``Value`` parameter is optional. If it is omitted (i.e., the user does not
want to use the result elsewhere), the last value is validated and passed to
the ``dump()`` function.

.. graphviz::
   :name: parsing
   :caption: Parsing Mechanism
   :alt: Shows the Parsing Mechanism for Pretty Printing
   :align: center

   digraph "prettyprint" {
     rankdir="LR";
     graph [fontname="Verdana", fontsize="12"];
     node [fontname="Verdana", fontsize="12"];
     edge [fontname="Verdana", fontsize="9"];

     parse [label=" ParseAndExecute() \n in Clang ", shape="box"];
     capture [label=" Capture 'Value' parameter \n for processing? ", shape="diamond"];
     use [label=" Use for processing ", shape="box"];
     dump [label=" Validate and push \n to dump()", shape="box"];
     callp [label=" call print() function ", shape="box"];
     type [label=" Print the Type \n ReplPrintTypeImpl()", shape="box"];
     data [label=" Print the Data \n ReplPrintDataImpl() ", shape="box"];
     output [label=" Output Pretty Print \n to the user ", shape="box", fontcolor=white, fillcolor="#3333ff", style=filled];

     parse -> capture [label="Optional 'Value' Parameter"];
     capture -> use [label="Yes"];
     use -> End;
     capture -> dump [label="No"];
     dump -> callp;
     callp -> type;
     callp -> data;
     type -> output;
     data -> output;
   }

**Note:** The following is a sample code snippet. Actual code may vary over
time.

.. code-block:: c++

   llvm::Error Interpreter::ParseAndExecute(llvm::StringRef Code, Value *V) {
     // Parse the input into a partial translation unit (PTU).
     auto PTU = Parse(Code);
     if (!PTU)
       return PTU.takeError();
     // Run the generated code, if any was produced.
     if (PTU->TheModule)
       if (llvm::Error Err = Execute(*PTU))
         return Err;

     if (LastValue.isValid()) {
       if (!V) {
         // No Value* requested: print the result (REPL mode) and reset it.
         LastValue.dump();
         LastValue.clear();
       } else
         // The caller wants the result: move it into *V.
         *V = std::move(LastValue);
     }
     return llvm::Error::success();
   }

The ``dump()`` function (in ``Value.cpp``) calls the ``print()`` function.

Printing the data and the type is handled by ``ReplPrintDataImpl()`` and
``ReplPrintTypeImpl()``, respectively.

Annotation Token (annot_repl_input_end)
---------------------------------------

This feature uses a new token (``annot_repl_input_end``) to consider printing
the value of an expression if it doesn't end with a semicolon. When parsing an
expression statement, if the final semicolon is missing, the parser pretends
that there is one, sets a marker there for later use, and continues parsing.

A semicolon is normally required in C++, but this feature extends the C++
syntax to accept a missing semicolon where one would be expected (i.e., at the
end of an expression statement). It also ensures that no error is generated
for the missing semicolon in this specific case.

This is accomplished by identifying the end position of the user input
(expression statement). This allows the expression statement to be stored and
returned, so that its value can be printed (displayed to the user
automatically).

**Note:** This logic is only available for C++ for now, since part of the
implementation itself requires C++ features. Future versions may support more
languages.

.. code-block:: c++

   Token *CurTok = nullptr;
   // If the semicolon is missing at the end of REPL input, consider if
   // we want to do value printing. Note this is only enabled in C++ mode
   // since part of the implementation requires C++ language features.
   // Note we shouldn't eat the token since the callback needs it.
   if (Tok.is(tok::annot_repl_input_end) && Actions.getLangOpts().CPlusPlus)
     CurTok = &Tok;
   else
     // Otherwise, eat the semicolon.
     ExpectAndConsumeSemi(diag::err_expected_semi_after_expr);

   StmtResult R = handleExprStmt(Expr, StmtCtx);
   // Attach the parsed statement to the annotation token; a non-null
   // annotation value later signals that value printing is requested.
   if (CurTok && !R.isInvalid())
     CurTok->setAnnotationValue(R.get());

   return R;

AST Transformation
------------------

When the parser encounters the ``annot_repl_input_end`` token with an attached
annotation value, it knows that the AST must be transformed before the actual
code generation. It consumes the token and sets a 'semi missing' bit on the
corresponding ``TopLevelStmtDecl``.

.. code-block:: c++

   // A non-null annotation value means the input ended without a semicolon.
   if (Tok.is(tok::annot_repl_input_end) &&
       Tok.getAnnotationValue() != nullptr) {
     ConsumeAnnotationToken();
     cast<TopLevelStmtDecl>(DeclsInGroup.back())->setSemiMissing();
   }

The AST consumer then traverses all top-level declarations, looking for
expressions to synthesize. If the current declaration is a top-level statement
declaration (``TopLevelStmtDecl``) with a missing semicolon, it asks the
interpreter to synthesize another expression (an internal function call) to
replace the original expression.


Detailed RFC and Discussion
---------------------------

For more technical details, community discussion, and links to patches related
to these features, see the `RFC on LLVM Discourse <https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493>`_.

Some logic presented in the RFC (e.g., ``ValueGetter()``) may be outdated
compared to the final solution.

Related Reading
===============

`Cling Transitions to LLVM's Clang-Repl <https://root.cern/blog/cling-in-llvm/>`_

`Moving (parts of) the Cling REPL in Clang <https://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html>`_

`GPU Accelerated Automatic Differentiation With Clad <https://arxiv.org/pdf/2203.06139.pdf>`_