================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Can I modify LLVM source code and redistribute the modified source?
-------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the conditions listed in the `Apache License v2.0 with LLVM Exceptions
<https://github.com/llvm/llvm-project/blob/main/llvm/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
--------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than GPL,
as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. LLVM also has excellent support on Windows systems.
Most of the code is written in standard C++ with operating system
services abstracted to a support library.
The tools required to build and
test LLVM have been ported to a plethora of platforms.


What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
---------------------------------------------------------------------------------------------------

In short: you can't. It's actually kind of a silly question once you grok
what's going on. Basically, in code like:

.. code-block:: llvm

    %result = add i32 %foo, %bar

, ``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction since
instructions can simply keep pointers to any other ``Value``'s that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).


Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <https://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <https://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
----------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are three
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM library code using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help
a lot, since most languages have strong support for interfacing with C. The
most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.

What support is there for higher-level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but will not support the high level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.
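
As a rough illustration of that division of labour, the sketch below keeps a
tiny front-end-owned AST and lowers it to textual LLVM IR, in the spirit of
option 2 in the previous question. The ``Expr`` type and the ``lower_expr``
and ``lower_function`` helpers are hypothetical names invented for this
example and are not part of LLVM; only ``i32`` addition is handled.

.. code-block:: c

    #include <stdio.h>

    /* Hypothetical front-end AST; LLVM itself provides nothing like this. */
    typedef enum { EXPR_ARG, EXPR_CONST, EXPR_ADD } ExprKind;

    typedef struct Expr {
      ExprKind kind;
      int value;                    /* argument index or constant value */
      const struct Expr *lhs, *rhs; /* operands, used only by EXPR_ADD */
    } Expr;

    /* Writes the IR operand that denotes `e` into `name`, first emitting
       any instructions that `e` needs. `next_tmp` numbers temporaries. */
    static void lower_expr(const Expr *e, FILE *out, int *next_tmp,
                           char *name, size_t cap) {
      char l[32], r[32];
      switch (e->kind) {
      case EXPR_ARG:
        snprintf(name, cap, "%%arg%d", e->value);
        break;
      case EXPR_CONST:
        snprintf(name, cap, "%d", e->value);
        break;
      case EXPR_ADD:
        lower_expr(e->lhs, out, next_tmp, l, sizeof l);
        lower_expr(e->rhs, out, next_tmp, r, sizeof r);
        snprintf(name, cap, "%%t%d", (*next_tmp)++);
        fprintf(out, "  %s = add i32 %s, %s\n", name, l, r);
        break;
      }
    }

    /* Emits a two-argument i32 function whose body evaluates `body`. */
    void lower_function(const char *fn, const Expr *body, FILE *out) {
      char result[32];
      int next_tmp = 0;
      fprintf(out, "define i32 @%s(i32 %%arg0, i32 %%arg1) {\n", fn);
      lower_expr(body, out, &next_tmp, result, sizeof result);
      fprintf(out, "  ret i32 %s\n}\n", result);
    }

Calling ``lower_function("f", tree, stdout)`` on a tree for
``arg0 + (arg1 + 1)`` writes a ``define`` whose body is two ``add``
instructions followed by a ``ret``. Everything above the IR text (the tree,
its construction, any type checking) remains the front-end's responsibility.
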


I don't understand the ``GetElementPtr`` instruction. Help!
-----------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.


Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
-------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``. This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file. The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.
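
As a rough C illustration (the function and variable names are hypothetical),
an optimizer is free to reduce the first function below to an empty body,
whereas the other two keep their addition because its result is observable:

.. code-block:: c

    /* The sum is never observed, so the whole computation is dead and an
       optimizer may delete it. */
    void discarded(int a, int b) {
      int sum = a + b;
      (void)sum;
    }

    /* Returning the value makes the addition visible to callers. */
    int returned(int a, int b) {
      return a + b;
    }

    /* Accesses to a volatile object must be preserved by the compiler. */
    volatile int sink;

    void stored(int a, int b) {
      sink = a + b;
    }
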


To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.


What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

    int X() { int i; return i; }

Is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
----------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function. For
example, this code:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @bar() {
      call void @foo()
      ret void
    }

Is optimized to:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @bar() {
      unreachable
    }

... with "``opt -instcombine -simplifycfg``". This often bites people because
"all their code disappears". Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define internal void @bar(void()* %FP, i1 %cond) {
      br i1 %cond, label %T, label %F
    T:
      call void %FP()
      ret void
    F:
      call fastcc void %FP()
      ret void
    }
    define void @test() {
      %X = or i1 false, false
      call void @bar(void()* @foo, i1 %X)
      ret void
    }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling conv (thus, the
code is perfectly well defined). If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @test() {
      %X = or i1 false, false
      br i1 %X, label %T.i, label %F.i
    T.i:
      call void @foo()
      br label %bar.exit
    F.i:
      call fastcc void @foo()
      br label %bar.exit
    bar.exit:
      ret void
    }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code.
In this case, dead code elimination can trivially remove the undefined code.
However, if ``%X`` was an input argument to ``@test``, the inliner would
produce this:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }

    define void @test(i1 %X) {
      br i1 %X, label %T.i, label %F.i
    T.i:
      call void @foo()
      br label %bar.exit
    F.i:
      call fastcc void @foo()
      br label %bar.exit
    bar.exit:
      ret void
    }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

    define fastcc void @foo() {
      ret void
    }
    define void @test(i1 %X) {
    F.i:
      call fastcc void @foo()
      ret void
    }