======================================================
Kaleidoscope: Conclusion and other useful LLVM tidbits
======================================================

.. contents::
   :local:

Tutorial Conclusion
===================

Welcome to the final chapter of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
grown our little Kaleidoscope language from being a useless toy, to
being a semi-interesting (but probably still useless) toy. :)

It is interesting to see how far we've come, and how little code it has
taken. We built the entire lexer, parser, AST, code generator, an
interactive run-loop (with a JIT!), and emitted debug information in
standalone executables - all in under 1000 lines of (non-comment/non-blank)
code.

Our little language supports a couple of interesting features: it
supports user-defined binary and unary operators, it uses JIT
compilation for immediate evaluation, and it supports a few control flow
constructs with SSA construction.

Part of the idea of this tutorial was to show you how easy and fun it
can be to define, build, and play with languages. Building a compiler
need not be a scary or mystical process!
Now that you've seen some of
the basics, I strongly encourage you to take the code and hack on it.
For example, try adding:

- **global variables** - While global variables have questionable value
  in modern software engineering, they are often useful when putting
  together quick little hacks like the Kaleidoscope compiler itself.
  Fortunately, our current setup makes it very easy to add global
  variables: just have value lookup check to see if an unresolved
  variable is in the global variable symbol table before rejecting it.
  To create a new global variable, make an instance of the LLVM
  ``GlobalVariable`` class.
- **typed variables** - Kaleidoscope currently only supports variables
  of type double. This gives the language a very nice elegance, because
  only supporting one type means that you never have to specify types.
  Different languages have different ways of handling this. The easiest
  way is to require the user to specify types for every variable
  definition, and record the type of the variable in the symbol table
  along with its Value\*.
- **arrays, structs, vectors, etc** - Once you add types, you can start
  extending the type system in all sorts of interesting ways. Simple
  arrays are very easy and are quite useful for many different
  applications.
  Adding them is mostly an exercise in learning how the
  LLVM `getelementptr <../../LangRef.html#getelementptr-instruction>`_ instruction
  works: it is so nifty/unconventional, it `has its own
  FAQ <../../GetElementPtr.html>`_!
- **standard runtime** - Our current language allows the user to access
  arbitrary external functions, and we use it for things like "printd"
  and "putchard". As you extend the language to add higher-level
  constructs, often these constructs make the most sense if they are
  lowered to calls into a language-supplied runtime. For example, if
  you add hash tables to the language, it would probably make sense to
  add the routines to a runtime, instead of inlining them all the way.
- **memory management** - Currently we can only access the stack in
  Kaleidoscope. It would also be useful to be able to allocate heap
  memory, either with calls to the standard libc malloc/free interface
  or with a garbage collector. If you would like to use garbage
  collection, note that LLVM fully supports `Accurate Garbage
  Collection <../../GarbageCollection.html>`_ including algorithms that
  move objects and need to scan/update the stack.
- **exception handling support** - LLVM supports generation of `zero
  cost exceptions <../../ExceptionHandling.html>`_ which interoperate with
  code compiled in other languages. You could also generate code by
  implicitly making every function return an error value and checking
  it.
  You could also make explicit use of setjmp/longjmp. There are
  many different ways to go here.
- **object orientation, generics, database access, complex numbers,
  geometric programming, ...** - Really, there is no end of crazy
  features that you can add to the language.
- **unusual domains** - We've been talking about applying LLVM to a
  domain that many people are interested in: building a compiler for a
  specific language. However, there are many other domains that can use
  compiler technology that are not typically considered. For example,
  LLVM has been used to implement OpenGL graphics acceleration,
  translate C++ code to ActionScript, and many other cute and clever
  things. Maybe you will be the first to JIT compile a regular
  expression interpreter into native code with LLVM?

Have fun - try doing something crazy and unusual. Building a language
the way everyone else always has is much less fun than trying something a
little crazy or off the wall and seeing how it turns out. If you get
stuck or want to talk about it, please post on the `LLVM forums
<https://discourse.llvm.org>`_: many people there are interested
in languages and are often willing to help out.

Before we end this tutorial, I want to talk about some "tips and tricks"
for generating LLVM IR.
These are some of the more subtle things that
may not be obvious, but are very useful if you want to take advantage of
LLVM's capabilities.

Properties of the LLVM IR
=========================

We have a couple of common questions about code in the LLVM IR form -
let's just get these out of the way right now, shall we?

Target Independence
-------------------

Kaleidoscope is an example of a "portable language": any program written
in Kaleidoscope will work the same way on any target that it runs on.
Many other languages have this property, e.g. Lisp, Java, Haskell,
JavaScript, Python, etc. (note that while these languages are portable,
not all their libraries are).

One nice aspect of LLVM is that it is often capable of preserving target
independence in the IR: you can take the LLVM IR for a
Kaleidoscope-compiled program and run it on any target that LLVM
supports, even emitting C code and compiling that on targets that LLVM
doesn't support natively. You can trivially tell that the Kaleidoscope
compiler generates target-independent code because it never queries for
any target-specific information when generating code.

The fact that LLVM provides a compact, target-independent
representation for code gets a lot of people excited.
Unfortunately,
these people are usually thinking about C or a language from the C
family when they are asking questions about language portability. I say
"unfortunately", because there is really no way to make (fully general)
C code portable, other than shipping the source code around (and of
course, C source code is not actually portable in general either - ever
port a really old application from 32 to 64 bits?).

The problem with C (again, in its full generality) is that it is heavily
laden with target-specific assumptions. As one simple example, the
preprocessor often destructively removes target-independence from the
code when it processes the input text:

.. code-block:: c

    #ifdef __i386__
      int X = 1;
    #else
      int X = 42;
    #endif

While it is possible to engineer more and more complex solutions to
problems like this, it cannot be solved in full generality in a way that
is better than shipping the actual source code.

That said, there are interesting subsets of C that can be made portable.
If you are willing to fix primitive types to a fixed size (say int =
32 bits, and long = 64 bits), don't care about ABI compatibility with
existing binaries, and are willing to give up some other minor features,
you can have portable code.
This can make sense for specialized domains
such as an in-kernel language.

Safety Guarantees
-----------------

Many of the languages above are also "safe" languages: it is impossible
for a program written in Java to corrupt its address space and crash the
process (assuming the JVM has no bugs). Safety is an interesting
property that requires a combination of language design, runtime
support, and often operating system support.

It is certainly possible to implement a safe language in LLVM, but LLVM
IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
casts, use-after-free bugs, buffer overruns, and a variety of other
problems. Safety needs to be implemented as a layer on top of LLVM and,
conveniently, several groups have investigated this. Ask on the `LLVM
forums <https://discourse.llvm.org>`_ if you are interested in more details.

Language-Specific Optimizations
-------------------------------

One thing about LLVM that turns off many people is that it does not
solve all the world's problems in one system. One specific
complaint is that people perceive LLVM as being incapable of performing
high-level language-specific optimization: LLVM "loses too much
information". Here are a few observations about this:

First, you're right that LLVM does lose information.
For example, as of
this writing, there is no way to distinguish in the LLVM IR whether an
SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
than debug info). Both get compiled down to an 'i32' value and the
information about what it came from is lost. The more general issue
here is that the LLVM type system uses "structural equivalence" instead
of "name equivalence". Another place this surprises people is if you
have two types in a high-level language that have the same structure
(e.g. two different structs that have a single int field): these types
will compile down into a single LLVM type and it will be impossible to
tell which one a value came from.

Second, while LLVM does lose information, LLVM is not a fixed target: we
continue to enhance and improve it in many different ways. In addition
to adding new features (LLVM did not always support exceptions or debug
info), we also extend the IR to capture important information for
optimization (e.g. whether an argument is sign or zero extended,
information about pointers aliasing, etc.). Many of the enhancements are
user-driven: people want LLVM to include some specific feature, so they
go ahead and extend it.

Third, it is *possible and easy* to add language-specific optimizations,
and you have a number of choices in how to do it.
As one trivial
example, it is easy to add language-specific optimization passes that
"know" things about code compiled for a language. In the case of the C
family, there is an optimization pass that "knows" about the standard C
library functions. If you call "exit(0)" in main(), it knows that it is
safe to optimize that into "return 0;" because C specifies what the
'exit' function does.

In addition to simple library knowledge, it is possible to embed a
variety of other language-specific information into the LLVM IR. If you
have a specific need and run into a wall, please bring the topic up on
the `LLVM forums <https://discourse.llvm.org>`_. At the very worst, you
can always treat LLVM as if it
were a "dumb code generator" and implement the high-level optimizations
you desire in your front-end, on the language-specific AST.

Tips and Tricks
===============

There is a variety of useful tips and tricks that you come to know after
working on/with LLVM that aren't obvious at first glance. Instead of
letting everyone rediscover them, this section talks about some of these
issues.


Implementing portable offsetof/sizeof
-------------------------------------

One interesting thing that comes up, if you are trying to keep the code
generated by your compiler "target independent", is that you often need
to know the size of some LLVM type or the offset of some field in an
LLVM structure. For example, you might need to pass the size of a type
into a function that allocates memory.

Unfortunately, this can vary widely across targets: for example the
width of a pointer is trivially target-specific. However, there is a
`clever way to use the getelementptr
instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
that allows you to compute this in a portable way.

Garbage Collected Stack Frames
------------------------------

Some languages want to explicitly manage their stack frames, often so
that they are garbage collected or to allow easy implementation of
closures. There are often better ways to implement these features than
explicit stack frames, but `LLVM does support
them <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_,
if you want.
It requires your front-end to convert the code into
`Continuation Passing
Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
to use tail calls (which LLVM also supports).

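
To make the idea a little more concrete, here is a minimal sketch in plain C
(not LLVM IR, and not the scheme from the note linked above) of a function
written in continuation-passing style. The names ``Cont``, ``apply_cont`` and
``fact_cps`` are made up for this illustration. Each pending multiplication
lives in a heap-allocated frame instead of a native stack frame, and every
call is in tail position. C compilers are not required to turn tail calls
into jumps, so this only shows the shape of the transformation; a front-end
targeting LLVM would presumably emit the corresponding calls as tail calls.

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical explicit frame: the work left to do once a result is known. */
    typedef struct Cont {
        double n;           /* value captured by this frame */
        struct Cont *next;  /* the caller's continuation; NULL means "done" */
    } Cont;

    /* Resume a chain of continuations with a result, freeing frames as we go. */
    static double apply_cont(Cont *k, double result) {
        if (k == NULL)
            return result;
        Cont *next = k->next;
        double n = k->n;
        free(k);
        return apply_cont(next, result * n);  /* call in tail position */
    }

    /* Factorial in CPS: rather than returning n * fact(n - 1), record the
       pending multiplication in a heap frame and make a tail call. */
    static double fact_cps(double n, Cont *k) {
        if (n <= 1.0)
            return apply_cont(k, 1.0);        /* call in tail position */
        Cont *frame = malloc(sizeof *frame);
        if (frame == NULL)
            abort();
        frame->n = n;
        frame->next = k;
        return fact_cps(n - 1.0, frame);      /* call in tail position */
    }

    int main(void) {
        double r = fact_cps(5.0, NULL);
        assert(r == 120.0);
        printf("%g\n", r);
        return 0;
    }

Because the pending work is ordinary heap data rather than hidden native
stack state, a runtime could in principle scan, move, or capture it, which is
what makes this style attractive for garbage-collected frames and closures.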