=======================================================
Building a JIT: Starting out with KaleidoscopeJIT
=======================================================

.. contents::
   :local:

Chapter 1 Introduction
======================

**Warning: This tutorial is currently being updated to account for ORC API
changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not been
updated.**

Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
tutorial runs through the implementation of a JIT compiler using LLVM's
On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
KaleidoscopeJIT class used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
introduces new features like concurrent compilation, optimization, lazy
compilation and remote execution.

The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
these APIs interact with other parts of LLVM, and teach you how to recombine
them to build a custom JIT that is suited to your use-case.

The structure of the tutorial is:

- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
  introduce some of the basic concepts of the ORC JIT APIs, including the
  idea of an ORC *Layer*.

- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
  a new layer that will optimize IR and generated code.

- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
  Compile-On-Demand layer to lazily compile IR.

- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
  replacing the Compile-On-Demand layer with a custom layer that uses the ORC
  Compile Callbacks API directly to defer IR-generation until functions are
  called.

- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
  a remote process with reduced privileges using the JIT Remote APIs.

To provide input for our JIT we will use a lightly modified version of the
Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
language with LLVM" tutorial.

Finally, a word on API generations: ORC is the third generation of LLVM JIT
API. It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
These tutorials don't assume any experience with these earlier APIs, but
readers acquainted with them will see many familiar elements. Where appropriate
we will make the connection to these earlier APIs explicit to help people who
are transitioning from them to ORC.

JIT API Basics
==============

The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
rather than compiling whole programs to disk ahead of time as a traditional
compiler does. To support that aim our initial, bare-bones JIT API will have
just two functions:

1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
   available for execution.
2. ``Expected<ExecutorSymbolDef> lookup(StringRef Name)``: Search for pointers
   to symbols (functions or variables) that have been added to the JIT.

A basic use-case for this API, executing the 'main' function from a module,
will look like this (with error handling elided for clarity):

.. code-block:: c++

  JIT J;
  J.addModule(buildModule());
  auto *Main = J.lookup("main").getAddress().toPtr<int (*)(int, char *[])>();
  int Result = Main(argc, argv);
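Real client code has to check the ``Error`` and ``Expected`` values that these
operations return before using their results. The following is a minimal
sketch of what that checking might look like, using the ``ExitOnError``
utility from ``llvm/Support/Error.h`` to report failures and exit; the ``JIT``
class, ``buildModule`` function, and ``argc``/``argv`` values are still the
hypothetical ones from the fragment above:

.. code-block:: c++

  ExitOnError ExitOnErr;

  JIT J;
  // addModule returns Error; ExitOnErr reports it and exits on failure.
  ExitOnErr(J.addModule(buildModule()));
  // lookup returns Expected<ExecutorSymbolDef>; ExitOnErr unwraps the value,
  // or reports the error and exits if the lookup failed.
  auto MainSym = ExitOnErr(J.lookup("main"));
  auto *Main = MainSym.getAddress().toPtr<int (*)(int, char *[])>();
  int Result = Main(argc, argv);

``ExitOnError`` is a convenient choice for tools and REPLs; library code would
normally propagate these errors to its caller instead.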
The APIs that we build in these tutorials will all be variations on this simple
theme. Behind this API we will refine the implementation of the JIT to add
support for concurrent compilation, optimization and lazy compilation.
Eventually we will extend the API itself to allow higher-level program
representations (e.g. ASTs) to be added to the JIT.

KaleidoscopeJIT
===============

In the previous section we described our API; now we examine a simple
implementation of it: the KaleidoscopeJIT class [1]_ that was used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
input for our JIT: Each time the user enters an expression the REPL will add a
new IR module containing the code for that expression to the JIT. If the
expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
use the lookup method of our JIT class to find and execute the code for the
expression. In later chapters of this tutorial we will modify the REPL to enable
new interactions with our JIT class, but for now we will take this setup for
granted and focus our attention on the implementation of our JIT itself.

Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
usual include guards and #includes [2]_, we get to the definition of our class:

.. code-block:: c++

  #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
  #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H

  #include "llvm/ADT/StringRef.h"
  #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
  #include "llvm/ExecutionEngine/Orc/Core.h"
  #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
  #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
  #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
  #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
  #include "llvm/ExecutionEngine/SectionMemoryManager.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/LLVMContext.h"
  #include <memory>

  namespace llvm {
  namespace orc {

  class KaleidoscopeJIT {
  private:
    ExecutionSession ES;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer CompileLayer;

    DataLayout DL;
    MangleAndInterner Mangle;
    ThreadSafeContext Ctx;

  public:
    KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
        : ObjectLayer(ES,
                      []() { return std::make_unique<SectionMemoryManager>(); }),
          CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
          DL(std::move(DL)), Mangle(ES, this->DL),
          Ctx(std::make_unique<LLVMContext>()) {
      ES.getMainJITDylib().addGenerator(
          cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(
              DL.getGlobalPrefix())));
    }

Our class begins with six member variables: an ExecutionSession member, ``ES``,
which provides context for our running JIT'd code (including the string pool,
global mutex, and error reporting facilities); an RTDyldObjectLinkingLayer,
``ObjectLayer``, that can be used to add object files to our JIT (though we will
not use it directly); an IRCompileLayer, ``CompileLayer``, that can be used to
add LLVM Modules to our JIT (and which builds on the ObjectLayer); a DataLayout
and a MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol
mangling (more on that later); and finally an LLVMContext that clients will use
when building IR files for the JIT.

Next up we have our class constructor, which takes a ``JITTargetMachineBuilder``
that will be used by our IRCompiler, and a ``DataLayout`` that we will use to
initialize our DL member. The constructor begins by initializing our
ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and
a function object that will build a JIT memory manager for each module that is
added (a JIT memory manager manages memory allocations, memory permissions, and
registration of exception handlers for JIT'd code). For this we use a lambda
that returns a SectionMemoryManager, an off-the-shelf utility that provides all
the basic memory management functionality required for this chapter. Next we
initialize our CompileLayer. The CompileLayer needs three things: (1) a
reference to the ExecutionSession, (2) a reference to our object layer, and (3)
a compiler instance to use to perform the actual compilation from IR to object
files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
which we construct using this constructor's JITTargetMachineBuilder argument.
The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
LLVM TargetMachines (which are not thread safe) as needed for compiles. After
this, we initialize our supporting members: ``DL``, ``Mangle`` and ``Ctx``, with
the input DataLayout, the ExecutionSession and DL member, and a new
default-constructed LLVMContext respectively. Now that our members have been
initialized, the one thing that remains to do is to tweak the configuration of
the *JITDylib* that we will store our code in. We want this dylib to contain
not only the symbols that we add to it, but also the symbols from our REPL
process. We do this by attaching a ``DynamicLibrarySearchGenerator`` instance
using the ``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.
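To make the effect of the ``DynamicLibrarySearchGenerator`` concrete, here is a
minimal sketch (not part of KaleidoscopeJIT itself) of a helper that the REPL
executable might define for JIT'd code to call. The ``putchard`` function is
modelled on the helpers from the Kaleidoscope tutorials:

.. code-block:: c++

  #include <cstdio>

  // Defined in the REPL executable, not in any JIT'd module. IR added to the
  // JIT can simply `declare double @putchard(double)` and call it: when a
  // lookup for "putchard" finds no definition in the JITDylib, the attached
  // DynamicLibrarySearchGenerator falls back to searching the running process
  // and finds this symbol.
  extern "C" double putchard(double X) {
    std::fputc(static_cast<char>(X), stderr);
    return 0.0;
  }

Note that on some platforms the executable's own symbols are only visible to
this kind of search if they are exported (for example by linking with
``-rdynamic`` on Linux), which is why the Kaleidoscope tutorials mark such
helpers with a ``DLLEXPORT`` macro.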
.. code-block:: c++

  static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
    auto JTMB = JITTargetMachineBuilder::detectHost();

    if (!JTMB)
      return JTMB.takeError();

    auto DL = JTMB->getDefaultDataLayoutForTarget();
    if (!DL)
      return DL.takeError();

    return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
  }

  const DataLayout &getDataLayout() const { return DL; }

  LLVMContext &getContext() { return *Ctx.getContext(); }

Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
instance that is configured to generate code for our host process. It does this
by first generating a JITTargetMachineBuilder instance using that class's
``detectHost`` method and then using that instance to generate a DataLayout for
the target process. Each of these operations can fail, so each returns its
result wrapped in an Expected value [3]_ that we must check for error before
continuing. If both operations succeed we can unwrap their results (using the
dereference operator) and pass them into KaleidoscopeJIT's constructor on the
last line of the function.

Following the named constructor we have the ``getDataLayout()`` and
``getContext()`` methods. These are used to make data structures created and
managed by the JIT (especially the LLVMContext) available to the REPL code that
will build our IR modules.
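For illustration, a REPL might wire these methods up roughly as follows. This
is a sketch only: the ``TheJIT`` and ``TheModule`` variables and the module
name are placeholders rather than part of KaleidoscopeJIT, it assumes the usual
``using namespace llvm`` / ``llvm::orc`` declarations and an include of
``llvm/IR/Module.h``, and it reuses ``ExitOnError`` for brevity:

.. code-block:: c++

  ExitOnError ExitOnErr;

  // Create a JIT configured for the host process, bailing out on failure.
  auto TheJIT = ExitOnErr(KaleidoscopeJIT::Create());

  // Build IR in the JIT's LLVMContext and with its DataLayout so that the
  // module matches the code the JIT will generate for the host target.
  auto TheModule =
      std::make_unique<Module>("repl_module", TheJIT->getContext());
  TheModule->setDataLayout(TheJIT->getDataLayout());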
.. code-block:: c++

  void addModule(std::unique_ptr<Module> M) {
    cantFail(CompileLayer.add(ES.getMainJITDylib(),
                              ThreadSafeModule(std::move(M), Ctx)));
  }

  Expected<ExecutorSymbolDef> lookup(StringRef Name) {
    return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
  }

Now we come to the first of our JIT API methods: addModule. This method is
responsible for adding IR to the JIT and making it available for execution. In
this initial implementation of our JIT we will make our modules "available for
execution" by adding them to the CompileLayer, which will in turn store the
Module in the main JITDylib. This process will create new symbol table entries
in the JITDylib for each definition in the module, and will defer compilation of
the module until any of its definitions is looked up. Note that this is not lazy
compilation: just referencing a definition, even if it is never used, will be
enough to trigger compilation. In later chapters we will teach our JIT to defer
compilation of functions until they're actually called. To add our Module we
must first wrap it in a ThreadSafeModule instance, which manages the lifetime of
the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our
example, all modules will share the Ctx member, which will exist for the
duration of the JIT. Once we switch to concurrent compilation in later chapters
we will use a new context per module.

Our last method is ``lookup``, which allows us to look up addresses for
function and variable definitions added to the JIT based on their symbol names.
As noted above, lookup will implicitly trigger compilation for any symbol
that has not already been compiled. Our lookup method calls through to
``ExecutionSession::lookup``, passing in a list of dylibs to search (in our case
just the main dylib), and the symbol name to search for, with a twist: we have
to *mangle* the name of the symbol we're searching for first. The ORC JIT
components use mangled symbols internally the same way a static compiler and
linker would, rather than using plain IR symbol names. This allows JIT'd code
to interoperate easily with precompiled code in the application or shared
libraries. The kind of mangling will depend on the DataLayout, which in turn
depends on the target platform. To allow us to remain portable and search based
on the un-mangled name, we just reproduce this mangling ourselves using our
``Mangle`` member function object.
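Putting addModule and lookup together, a single REPL interaction for an
anonymous top-level expression might look like the sketch below. The
``TheJIT`` and ``TheModule`` variables, the ``ExitOnErr`` helper, and the
``__anon_expr`` symbol name follow the conventions of the earlier sketches and
of the Kaleidoscope tutorials, and are illustrative only:

.. code-block:: c++

  // Hand the module containing the freshly generated expression to the JIT.
  // This version of addModule handles its own errors internally via cantFail.
  TheJIT->addModule(std::move(TheModule));

  // Look up the expression's wrapper function by its un-mangled IR name;
  // KaleidoscopeJIT::lookup applies the mangling for us. On a target whose
  // DataLayout specifies a '_' global prefix, the symbol actually searched
  // for is "___anon_expr".
  auto ExprSym = ExitOnErr(TheJIT->lookup("__anon_expr"));

  // Convert the symbol's address into a callable function pointer. The
  // Kaleidoscope wrapper functions take no arguments and return a double.
  auto *FP = ExprSym.getAddress().toPtr<double (*)()>();
  fprintf(stderr, "Evaluated to %f\n", FP());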
This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
but fully functioning JIT stack that you can use to take LLVM IR and make it
executable within the context of your JIT process. In the next chapter we'll
look at how to extend this JIT to produce better quality code, and in the
process take a deeper look at the ORC layer concept.

`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example. To build this
example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
   :language: c++

.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
       simplifying assumption: symbols cannot be re-defined. This will make it
       impossible to re-define symbols in the REPL, but will make our symbol
       lookup logic simpler. Re-introducing support for symbol redefinition is
       left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
       original tutorials will be a helpful reference).

.. [2] +-----------------------------+-----------------------------------------------+
       | File                        | Reason for inclusion                          |
       +=============================+===============================================+
       | StringRef.h                 | Provides the StringRef class.                 |
       +-----------------------------+-----------------------------------------------+
       | CompileUtils.h              | Provides the ConcurrentIRCompiler class.      |
       +-----------------------------+-----------------------------------------------+
       | Core.h                      | Core utilities such as ExecutionSession and   |
       |                             | JITDylib.                                     |
       +-----------------------------+-----------------------------------------------+
       | ExecutionUtils.h            | Provides the DynamicLibrarySearchGenerator    |
       |                             | class.                                        |
       +-----------------------------+-----------------------------------------------+
       | IRCompileLayer.h            | Provides the IRCompileLayer class.            |
       +-----------------------------+-----------------------------------------------+
       | JITTargetMachineBuilder.h   | Provides the JITTargetMachineBuilder class.   |
       +-----------------------------+-----------------------------------------------+
       | RTDyldObjectLinkingLayer.h  | Provides the RTDyldObjectLinkingLayer class.  |
       +-----------------------------+-----------------------------------------------+
       | SectionMemoryManager.h      | Provides the SectionMemoryManager class.      |
       +-----------------------------+-----------------------------------------------+
       | DataLayout.h                | Provides the DataLayout class.                |
       +-----------------------------+-----------------------------------------------+
       | LLVMContext.h               | Provides the LLVMContext class.               |
       +-----------------------------+-----------------------------------------------+

.. [3] See the ErrorHandling section in the LLVM Programmer's Manual
       (https://llvm.org/docs/ProgrammersManual.html#error-handling)