===============================
ORC Design and Implementation
===============================

.. contents::
   :local:

Introduction
============

This document aims to provide a high-level overview of the design and
implementation of the ORC JIT APIs. Except where otherwise stated all discussion
refers to the modern ORCv2 APIs (available since LLVM 7). Clients wishing to
transition from OrcV1 should see Section :ref:`transitioning_orcv1_to_orcv2`.

Use-cases
=========

ORC provides a modular API for building JIT compilers. There are a number
of use cases for such an API. For example:

1. The LLVM tutorials use a simple ORC-based JIT class to execute expressions
   compiled from a toy language: Kaleidoscope.

2. The LLVM debugger, LLDB, uses a cross-compiling JIT for expression
   evaluation. In this use case, cross compilation allows expressions compiled
   in the debugger process to be executed on the debug target process, which may
   be on a different device/architecture.

3. High-performance JITs (e.g. JVMs, Julia) that want to make use of LLVM's
   optimizations within an existing JIT infrastructure.

4. Interpreters and REPLs, e.g. Cling (C++) and the Swift interpreter.

By adopting a modular, library-based design we aim to make ORC useful in as many
of these contexts as possible.

Features
========

ORC provides the following features:

**JIT-linking**
  ORC provides APIs to link relocatable object files (COFF, ELF, MachO) [1]_
  into a target process at runtime. The target process may be the same process
  that contains the JIT session object and jit-linker, or may be another process
  (even one running on a different machine or architecture) that communicates
  with the JIT via RPC.

**LLVM IR compilation**
  ORC provides off-the-shelf components (IRCompileLayer, SimpleCompiler,
  ConcurrentIRCompiler) that make it easy to add LLVM IR to a JIT'd process.
**Eager and lazy compilation**
  By default, ORC will compile symbols as soon as they are looked up in the JIT
  session object (``ExecutionSession``). Compiling eagerly by default makes it
  easy to use ORC as an in-memory compiler for an existing JIT (similar to how
  MCJIT is commonly used). However ORC also provides built-in support for lazy
  compilation via lazy-reexports (see :ref:`Laziness`).

**Support for Custom Compilers and Program Representations**
  Clients can supply custom compilers for each symbol that they define in their
  JIT session. ORC will run the user-supplied compiler when a definition of
  a symbol is needed. ORC is actually fully language agnostic: LLVM IR is not
  treated specially, and is supported via the same wrapper mechanism (the
  ``MaterializationUnit`` class) that is used for custom compilers.

**Concurrent JIT'd code** and **Concurrent Compilation**
  JIT'd code may be executed in multiple threads, may spawn new threads, and may
  re-enter the ORC (e.g. to request lazy compilation) concurrently from multiple
  threads. Compilers launched by ORC can run concurrently (provided the client
  sets up an appropriate dispatcher). Built-in dependency tracking ensures that
  ORC does not release pointers to JIT'd code or data until all dependencies
  have also been JIT'd and they are safe to call or use.

**Removable Code**
  Resources for JIT'd program representations and compiled code can be removed
  while the JIT is still running, allowing memory to be reclaimed (see
  `How to remove code`_).

**Orthogonality** and **Composability**
  Each of the features above can be used independently. It is possible to put
  ORC components together to make a non-lazy, in-process, single threaded JIT
  or a lazy, out-of-process, concurrent JIT, or anything in between.

LLJIT and LLLazyJIT
===================

ORC provides two basic JIT classes off-the-shelf. These are useful both as
examples of how to assemble ORC components to make a JIT, and as replacements
for earlier LLVM JIT APIs (e.g. MCJIT).
The LLJIT class uses an IRCompileLayer and RTDyldObjectLinkingLayer to support
compilation of LLVM IR and linking of relocatable object files. All operations
are performed eagerly on symbol lookup (i.e. a symbol's definition is compiled
as soon as you attempt to look up its address). LLJIT is a suitable replacement
for MCJIT in most cases (note: some more advanced features, e.g.
JITEventListeners, are not supported yet).

The LLLazyJIT class extends LLJIT and adds a CompileOnDemandLayer to enable lazy
compilation of LLVM IR. When an LLVM IR module is added via the addLazyIRModule
method, function bodies in that module will not be compiled until they are first
called. LLLazyJIT aims to provide a replacement for LLVM's original (pre-MCJIT)
JIT API.

LLJIT and LLLazyJIT instances can be created using their respective builder
classes: LLJITBuilder and LLLazyJITBuilder. For example, assuming you have a
module ``M`` loaded on a ThreadSafeContext ``Ctx``:

.. code-block:: c++

  // Try to detect the host arch and construct an LLJIT instance.
  auto JIT = LLJITBuilder().create();

  // If we could not construct an instance, return an error.
  if (!JIT)
    return JIT.takeError();

  // Add the module.
  if (auto Err = JIT->addIRModule(ThreadSafeModule(std::move(M), Ctx)))
    return Err;

  // Look up the JIT'd code entry point.
  auto EntrySym = JIT->lookup("entry");
  if (!EntrySym)
    return EntrySym.takeError();

  // Cast the entry point address to a function pointer.
  auto *Entry = EntrySym->toPtr<void(*)()>();

  // Call into JIT'd code.
  Entry();

The builder classes provide a number of configuration options that can be
specified before the JIT instance is constructed. For example:

.. code-block:: c++

  // Build an LLLazyJIT instance that uses four worker threads for compilation,
  // and jumps to a specific error handler (rather than null) on lazy compile
  // failures.

  void handleLazyCompileFailure() {
    // JIT'd code will jump here if lazy compilation fails, giving us an
    // opportunity to exit or throw an exception into JIT'd code.
    throw JITFailed();
  }

  auto JIT = LLLazyJITBuilder()
               .setNumCompileThreads(4)
               .setLazyCompileFailureAddr(
                   ExecutorAddr::fromPtr(&handleLazyCompileFailure))
               .create();

  // ...

For users wanting to get started with LLJIT a minimal example program can be
found at ``llvm/examples/HowToUseLLJIT``.

Design Overview
===============

ORC's JIT program model aims to emulate the linking and symbol resolution
rules used by the static and dynamic linkers. This allows ORC to JIT
arbitrary LLVM IR, including IR produced by an ordinary static compiler (e.g.
clang) that uses constructs like symbol linkage and visibility, and weak [3]_
and common symbol definitions.

To see how this works, imagine a program ``foo`` which links against a pair
of dynamic libraries: ``libA`` and ``libB``. On the command line, building this
program might look like:

.. code-block:: bash

  $ clang++ -shared -o libA.dylib a1.cpp a2.cpp
  $ clang++ -shared -o libB.dylib b1.cpp b2.cpp
  $ clang++ -o myapp myapp.cpp -L. -lA -lB
  $ ./myapp

In ORC, this would translate into API calls on a hypothetical CXXCompilingLayer
(with error checking omitted for brevity) as:

.. code-block:: c++

  ExecutionSession ES;
  RTDyldObjectLinkingLayer ObjLinkingLayer(
      ES, []() { return std::make_unique<SectionMemoryManager>(); });
  CXXCompileLayer CXXLayer(ES, ObjLinkingLayer);

  // Create JITDylib "A" and add code to it using the CXX layer.
  auto &LibA = ES.createJITDylib("A");
  CXXLayer.add(LibA, MemoryBuffer::getFile("a1.cpp"));
  CXXLayer.add(LibA, MemoryBuffer::getFile("a2.cpp"));

  // Create JITDylib "B" and add code to it using the CXX layer.
  auto &LibB = ES.createJITDylib("B");
  CXXLayer.add(LibB, MemoryBuffer::getFile("b1.cpp"));
  CXXLayer.add(LibB, MemoryBuffer::getFile("b2.cpp"));

  // Create and specify the search order for the main JITDylib. This is
  // equivalent to a "links against" relationship in a command-line link.
  auto &MainJD = ES.createJITDylib("main");
  MainJD.addToLinkOrder(&LibA);
  MainJD.addToLinkOrder(&LibB);
  CXXLayer.add(MainJD, MemoryBuffer::getFile("main.cpp"));

  // Look up the JIT'd main, cast it to a function pointer, then call it.
  auto MainSym = ExitOnErr(ES.lookup({&MainJD}, "main"));
  auto *Main = MainSym.getAddress().toPtr<int(*)(int, char *[])>();

  int Result = Main(...);

This example tells us nothing about *how* or *when* compilation will happen.
That will depend on the implementation of the hypothetical CXXCompilingLayer.
The same linker-based symbol resolution rules will apply regardless of that
implementation, however. For example, if a1.cpp and a2.cpp both define a
function "foo" then ORCv2 will generate a duplicate definition error. On the
other hand, if a1.cpp and b1.cpp both define "foo" there is no error (different
dynamic libraries may define the same symbol). If main.cpp refers to "foo", it
should bind to the definition in LibA rather than the one in LibB, since
main.cpp is part of the "main" dylib, and the main dylib links against LibA
before LibB.

Many JIT clients will have no need for this strict adherence to the usual
ahead-of-time linking rules, and should be able to get by just fine by putting
all of their code in a single JITDylib.
However, clients who want to JIT code
for languages/projects that traditionally rely on ahead-of-time linking (e.g.
C++) will find that this feature makes life much easier.

Symbol lookup in ORC serves two other important functions, beyond providing
addresses for symbols: (1) It triggers compilation of the symbol(s) searched for
(if they have not been compiled already), and (2) it provides the
synchronization mechanism for concurrent compilation. The pseudo-code for the
lookup process is:

.. code-block:: none

  construct a query object from a query set and query handler
  lock the session
  lodge query against requested symbols, collect required materializers (if any)
  unlock the session
  dispatch materializers (if any)

In this context a materializer is something that provides a working definition
of a symbol upon request. Usually materializers are just wrappers for compilers,
but they may also wrap a jit-linker directly (if the program representation
backing the definitions is an object file), or may even be a class that writes
bits directly into memory (for example, if the definitions are
stubs). Materialization is the blanket term for any actions (compiling, linking,
splatting bits, registering with runtimes, etc.) that are required to generate a
symbol definition that is safe to call or access.

As each materializer completes its work it notifies the JITDylib, which in turn
notifies any query objects that are waiting on the newly materialized
definitions. Each query object maintains a count of the number of symbols that
it is still waiting on, and once this count reaches zero the query object calls
the query handler with a *SymbolMap* (a map of symbol names to addresses)
describing the result. If any symbol fails to materialize the query immediately
calls the query handler with an error.
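The dependency-counting behavior described above can be sketched as a tiny
self-contained C++ model. This is a toy illustration only: the names
(``ToyQuery``, ``notifyResolved``, ``ToySymbolMap``) are invented for this
example and are not ORC APIs; the real machinery lives in ORC's query and
materialization classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of an ORC-style query object: it waits on a set of symbols and
// fires its handler exactly once, when the last one has been materialized.
using ToySymbolMap = std::map<std::string, std::uint64_t>;

class ToyQuery {
public:
  ToyQuery(const std::vector<std::string> &Symbols,
           std::function<void(const ToySymbolMap &)> Handler)
      : Handler(std::move(Handler)) {
    for (const auto &S : Symbols)
      Result[S] = 0; // Address not known yet.
    NumOutstanding = Result.size();
  }

  // Called by a materializer as each definition becomes ready.
  void notifyResolved(const std::string &Name, std::uint64_t Addr) {
    Result[Name] = Addr;
    if (--NumOutstanding == 0)
      Handler(Result); // All symbols ready: hand back the full map.
  }

private:
  ToySymbolMap Result;
  std::size_t NumOutstanding;
  std::function<void(const ToySymbolMap &)> Handler;
};
```

A query on ``{"foo", "bar"}`` stays pending after ``foo`` resolves and only
invokes its handler once ``bar`` resolves too, mirroring the count-to-zero
scheme above.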
The collected materialization units are sent to the ExecutionSession to be
dispatched, and the dispatch behavior can be set by the client. By default each
materializer is run on the calling thread. Clients are free to create new
threads to run materializers, or to send the work to a work queue for a thread
pool (this is what LLJIT/LLLazyJIT do).

Top Level APIs
==============

Many of ORC's top-level APIs are visible in the example above:

- *ExecutionSession* represents the JIT'd program and provides context for the
  JIT: It contains the JITDylibs, error reporting mechanisms, and dispatches the
  materializers.

- *JITDylibs* provide the symbol tables.

- *Layers* (ObjLinkingLayer and CXXLayer) are wrappers around compilers and
  allow clients to add uncompiled program representations supported by those
  compilers to JITDylibs.

- *ResourceTrackers* allow you to remove code.

Several other important APIs are used explicitly. JIT clients need not be aware
of them, but Layer authors will use them:

- *MaterializationUnit* - When XXXLayer::add is invoked it wraps the given
  program representation (in this example, C++ source) in a MaterializationUnit,
  which is then stored in the JITDylib. MaterializationUnits are responsible for
  describing the definitions they provide, and for unwrapping the program
  representation and passing it back to the layer when compilation is required
  (this ownership shuffle makes writing thread-safe layers easier, since the
  ownership of the program representation will be passed back on the stack,
  rather than having to be fished out of a Layer member, which would require
  synchronization).

- *MaterializationResponsibility* - When a MaterializationUnit hands a program
  representation back to the layer it comes with an associated
  MaterializationResponsibility object.
  This object tracks the definitions
  that must be materialized and provides a way to notify the JITDylib once they
  are either successfully materialized or a failure occurs.

Absolute Symbols, Aliases, and Reexports
========================================

ORC makes it easy to define symbols with absolute addresses, or symbols that
are simply aliases of other symbols:

Absolute Symbols
----------------

Absolute symbols are symbols that map directly to addresses without requiring
further materialization, for example: "foo" = 0x1234. One use case for
absolute symbols is allowing resolution of process symbols. E.g.

.. code-block:: c++

  JD.define(absoluteSymbols(SymbolMap({
      { Mangle("printf"),
        { ExecutorAddr::fromPtr(&printf),
          JITSymbolFlags::Callable } }
    })));

With this mapping established code added to the JIT can refer to printf
symbolically rather than requiring the address of printf to be "baked in".
This in turn allows cached versions of the JIT'd code (e.g. compiled objects)
to be re-used across JIT sessions as the JIT'd code no longer changes, only the
absolute symbol definition does.

For process and library symbols the DynamicLibrarySearchGenerator utility (See
:ref:`How to Add Process and Library Symbols to JITDylibs
<ProcessAndLibrarySymbols>`) can be used to automatically build absolute
symbol mappings for you. However the absoluteSymbols function is still useful
for making non-global objects in your JIT visible to JIT'd code. For example,
imagine that your JIT standard library needs access to your JIT object to make
some calls. We could bake the address of your object into the library, but then
it would need to be recompiled for each session:

.. code-block:: c++

  // From standard library for JIT'd code:

  class MyJIT {
  public:
    void log(const char *Msg);
  };

  void log(const char *Msg) { ((MyJIT*)0x1234)->log(Msg); }

We can turn this into a symbolic reference in the JIT standard library:

.. code-block:: c++

  extern MyJIT *__MyJITInstance;

  void log(const char *Msg) { __MyJITInstance->log(Msg); }

And then make our JIT object visible to the JIT standard library with an
absolute symbol definition when the JIT is started:

.. code-block:: c++

  MyJIT J = ...;

  auto &JITStdLibJD = ... ;

  JITStdLibJD.define(absoluteSymbols(SymbolMap({
      { Mangle("__MyJITInstance"),
        { ExecutorAddr::fromPtr(&J), JITSymbolFlags() } }
    })));

Aliases and Reexports
---------------------

Aliases and reexports allow you to define new symbols that map to existing
symbols. This can be useful for changing linkage relationships between symbols
across sessions without having to recompile code. For example, imagine that
JIT'd code has access to a log function, ``void log(const char*)``, for which
there are two implementations in the JIT standard library: ``log_fast`` and
``log_detailed``. Your JIT can choose which one of these definitions will be
used when the ``log`` symbol is referenced by setting up an alias at JIT startup
time:

.. code-block:: c++

  auto &JITStdLibJD = ... ;

  auto LogImplementationSymbol =
      Verbose ? Mangle("log_detailed") : Mangle("log_fast");

  JITStdLibJD.define(
      symbolAliases(SymbolAliasMap({
          { Mangle("log"),
            { LogImplementationSymbol,
              JITSymbolFlags::Exported | JITSymbolFlags::Callable } }
        })));

The ``symbolAliases`` function allows you to define aliases within a single
JITDylib. The ``reexports`` function provides the same functionality, but
operates across JITDylib boundaries. E.g.

.. code-block:: c++

  auto &JD1 = ... ;
  auto &JD2 = ... ;

  // Make 'bar' in JD2 an alias for 'foo' from JD1.
  JD2.define(
      reexports(JD1, SymbolAliasMap({
          { Mangle("bar"), { Mangle("foo"), JITSymbolFlags::Exported } }
        })));

The reexports utility can be handy for composing a single JITDylib interface by
re-exporting symbols from several other JITDylibs.

.. _Laziness:

Laziness
========

Laziness in ORC is provided by a utility called "lazy reexports". A lazy
reexport is similar to a regular reexport or alias: It provides a new name for
an existing symbol. Unlike regular reexports however, lookups of lazy reexports
do not trigger immediate materialization of the reexported symbol. Instead, they
only trigger materialization of a function stub. This function stub is
initialized to point at a *lazy call-through*, which provides reentry into the
JIT. If the stub is called at runtime then the lazy call-through will look up
the reexported symbol (triggering materialization for it if necessary), update
the stub (to call directly to the reexported symbol on subsequent calls), and
then return via the reexported symbol. By re-using the existing symbol lookup
mechanism, lazy reexports inherit the same concurrency guarantees: calls to lazy
reexports can be made from multiple threads concurrently, the reexported
symbol can be in any state of compilation (uncompiled, already in the process of
being compiled, or already compiled), and the call will succeed. This allows
laziness to be safely mixed with features like remote compilation, concurrent
compilation, concurrent JIT'd code, and speculative compilation.
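The stub-patching scheme described above can be sketched in plain,
self-contained C++. This is a toy model with invented names (``Body``,
``Stub``, ``ResolveAndCall``), not the ORC implementation (which lives in the
LazyCallThroughManager and IndirectStubsManager utilities): the stub starts
out pointing at a resolver, and the first call patches it to point directly
at the real definition.

```cpp
#include <cassert>

// Toy model of a lazy reexport: the "stub" initially points at a resolver,
// which stands in for the lazy call-through. The first call triggers
// (simulated) materialization, patches the stub to the real body, and
// returns via it; subsequent calls go straight to the body.
int Body() { return 42; } // The definition behind the lazy reexport.

int ResolveAndCall();     // Forward declaration of the call-through.

int (*Stub)() = &ResolveAndCall; // The lazy reexport's function stub.

int ResolveAndCall() {
  // In real ORC this is where the call-through performs a symbol lookup,
  // triggering compilation of the body if needed.
  Stub = &Body;  // Patch the stub: later calls skip the call-through.
  return Stub(); // Return via the (now materialized) body.
}
```

Note that ``&Stub`` (the reexport) and ``&Body`` (the reexported symbol) are
different addresses, which is the pointer-equality caveat discussed next.
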
There is one other key difference between regular reexports and lazy reexports
that some clients must be aware of: The address of a lazy reexport will be
*different* from the address of the reexported symbol (whereas a regular
reexport is guaranteed to have the same address as the reexported symbol).
Clients who care about pointer equality will generally want to use the address
of the reexport as the canonical address of the reexported symbol. This will
allow the address to be taken without forcing materialization of the reexport.

Usage example:

If JITDylib ``JD`` contains definitions for symbols ``foo_body`` and
``bar_body``, we can create lazy entry points ``Foo`` and ``Bar`` in JITDylib
``JD2`` by calling:

.. code-block:: c++

  auto ReexportFlags = JITSymbolFlags::Exported | JITSymbolFlags::Callable;
  JD2.define(
      lazyReexports(CallThroughMgr, StubsMgr, JD,
                    SymbolAliasMap({
                        { Mangle("foo"), { Mangle("foo_body"), ReexportFlags } },
                        { Mangle("bar"), { Mangle("bar_body"), ReexportFlags } }
                    })));

A full example of how to use lazyReexports with the LLJIT class can be found at
``llvm/examples/OrcV2Examples/LLJITWithLazyReexports``.

Supporting Custom Compilers
===========================

TBD.

.. _transitioning_orcv1_to_orcv2:

Transitioning from ORCv1 to ORCv2
=================================

Since LLVM 7.0, new ORC development work has focused on adding support for
concurrent JIT compilation. The new APIs (including new layer interfaces and
implementations, and new utilities) that support concurrency are collectively
referred to as ORCv2, and the original, non-concurrent layers and utilities
are now referred to as ORCv1.

The majority of the ORCv1 layers and utilities were renamed with a 'Legacy'
prefix in LLVM 8.0, and have deprecation warnings attached in LLVM 9.0. In LLVM
12.0 ORCv1 will be removed entirely.
Transitioning from ORCv1 to ORCv2 should be easy for most clients. Most of the
ORCv1 layers and utilities have ORCv2 counterparts [2]_ that can be directly
substituted. However there are some design differences between ORCv1 and ORCv2
to be aware of:

  1. ORCv2 fully adopts the JIT-as-linker model that began with MCJIT. Modules
     (and other program representations, e.g. Object Files) are no longer added
     directly to JIT classes or layers. Instead, they are added to ``JITDylib``
     instances *by* layers. The ``JITDylib`` determines *where* the definitions
     reside, the layers determine *how* the definitions will be compiled.
     Linkage relationships between ``JITDylibs`` determine how inter-module
     references are resolved, and symbol resolvers are no longer used. See the
     section `Design Overview`_ for more details.

     Unless multiple JITDylibs are needed to model linkage relationships, ORCv1
     clients should place all code in a single JITDylib.
     MCJIT clients should use LLJIT (see `LLJIT and LLLazyJIT`_), and can place
     code in LLJIT's default created main JITDylib (See
     ``LLJIT::getMainJITDylib()``).

  2. All JIT stacks now need an ``ExecutionSession`` instance. ExecutionSession
     manages the string pool, error reporting, synchronization, and symbol
     lookup.

  3. ORCv2 uses uniqued strings (``SymbolStringPtr`` instances) rather than
     string values in order to reduce memory overhead and improve lookup
     performance. See the subsection `How to manage symbol strings`_.

  4. IR layers require ThreadSafeModule instances, rather than
     std::unique_ptr<Module>s. ThreadSafeModule is a wrapper that ensures that
     Modules that use the same LLVMContext are not accessed concurrently.
     See `How to use ThreadSafeModule and ThreadSafeContext`_.

  5. Symbol lookup is no longer handled by layers. Instead, there is a
     ``lookup`` method on ExecutionSession that takes a list of JITDylibs to
     scan.
     .. code-block:: c++

       ExecutionSession ES;
       JITDylib &JD1 = ...;
       JITDylib &JD2 = ...;

       auto Sym = ES.lookup({&JD1, &JD2}, ES.intern("_main"));

  6. The removeModule/removeObject methods are replaced by
     ``ResourceTracker::remove``.
     See the subsection `How to remove code`_.

For code examples and suggestions of how to use the ORCv2 APIs, please see
the section `How-tos`_.

How-tos
=======

How to manage symbol strings
----------------------------

Symbol strings in ORC are uniqued to improve lookup performance, reduce memory
overhead, and allow symbol names to function as efficient keys. To get the
unique ``SymbolStringPtr`` for a string value, call the
``ExecutionSession::intern`` method:

  .. code-block:: c++

    ExecutionSession ES;
    /// ...
    auto MainSymbolName = ES.intern("main");

If you wish to perform lookup using the C/IR name of a symbol you will also
need to apply the platform linker-mangling before interning the string. On
Linux this mangling is a no-op, but on other platforms it usually involves
adding a prefix to the string (e.g. '_' on Darwin). The mangling scheme is
based on the DataLayout for the target. Given a DataLayout and an
ExecutionSession, you can create a MangleAndInterner function object that
will perform both jobs for you:

  .. code-block:: c++

    ExecutionSession ES;
    const DataLayout &DL = ...;
    MangleAndInterner Mangle(ES, DL);

    // ...

    // Portable IR-symbol-name lookup:
    auto Sym = ES.lookup({&MainJD}, Mangle("main"));

How to create JITDylibs and set up linkage relationships
--------------------------------------------------------

In ORC, all symbol definitions reside in JITDylibs. JITDylibs are created by
calling the ``ExecutionSession::createJITDylib`` method with a unique name:

  .. code-block:: c++

    ExecutionSession ES;
    auto &JD = ES.createJITDylib("libFoo.dylib");

The JITDylib is owned by the ``ExecutionSession`` instance and will be freed
when it is destroyed.

How to remove code
------------------

To remove an individual module from a JITDylib it must first be added using an
explicit ``ResourceTracker``. The module can then be removed by calling
``ResourceTracker::remove``:

  .. code-block:: c++

    auto &JD = ... ;
    auto M = ... ;

    auto RT = JD.createResourceTracker();
    Layer.add(RT, std::move(M)); // Add M to JD, tracking resources with RT

    RT->remove(); // Remove M from JD.

Modules added directly to a JITDylib will be tracked by that JITDylib's default
resource tracker.

All code can be removed from a JITDylib by calling ``JITDylib::clear``. This
leaves the cleared JITDylib in an empty but usable state.

JITDylibs can be removed by calling ``ExecutionSession::removeJITDylib``. This
clears the JITDylib and then puts it into a defunct state. No further operations
can be performed on the JITDylib, and it will be destroyed as soon as the last
handle to it is released.

An example of how to use the resource management APIs can be found at
``llvm/examples/OrcV2Examples/LLJITRemovableCode``.


How to add support for a custom program representation
-------------------------------------------------------

To add support for a custom program representation you will need a custom
``MaterializationUnit`` for the program representation, and a custom ``Layer``.
The Layer will have two operations: ``add`` and ``emit``. The ``add`` operation
takes an instance of your program representation, builds one of your custom
``MaterializationUnits`` to hold it, then adds it to a ``JITDylib``.
The ``emit`` operation takes a ``MaterializationResponsibility`` object and an
instance of your program representation and materializes it, usually by
compiling it and handing the resulting object off to an ``ObjectLinkingLayer``.

Your custom ``MaterializationUnit`` will have two operations: ``materialize``
and ``discard``. The ``materialize`` function will be called for you when any
symbol provided by the unit is looked up, and it should just call the ``emit``
function on your layer, passing in the given ``MaterializationResponsibility``
and the wrapped program representation. The ``discard`` function will be called
if some weak symbol provided by your unit is not needed (because the JIT found
an overriding definition). You can use this to drop your definition early, or
just ignore it and let the linker drop the definition later.

Here is an example of an ASTLayer:

  .. code-block:: c++

    // ... In your JIT class
    AstLayer astLayer;
    // ...

    class AstMaterializationUnit : public orc::MaterializationUnit {
    public:
      AstMaterializationUnit(AstLayer &l, Ast &ast)
          : llvm::orc::MaterializationUnit(l.getInterface(ast)), astLayer(l),
            ast(ast) {}

      llvm::StringRef getName() const override {
        return "AstMaterializationUnit";
      }

      void materialize(std::unique_ptr<orc::MaterializationResponsibility> r) override {
        astLayer.emit(std::move(r), ast);
      }

    private:
      void discard(const llvm::orc::JITDylib &jd, const llvm::orc::SymbolStringPtr &sym) override {
        llvm_unreachable("functions are not overridable");
      }

      AstLayer &astLayer;
      Ast &ast;
    };

    class AstLayer {
      llvm::orc::IRLayer &baseLayer;
      llvm::orc::MangleAndInterner &mangler;

    public:
      AstLayer(llvm::orc::IRLayer &baseLayer, llvm::orc::MangleAndInterner &mangler)
          : baseLayer(baseLayer), mangler(mangler) {}

      llvm::Error add(llvm::orc::ResourceTrackerSP &rt, Ast &ast) {
        return rt->getJITDylib().define(std::make_unique<AstMaterializationUnit>(*this, ast), rt);
      }

      void emit(std::unique_ptr<orc::MaterializationResponsibility> mr, Ast &ast) {
        // compileAst is just a function that compiles the given AST and
        // returns an `llvm::orc::ThreadSafeModule`.
        baseLayer.emit(std::move(mr), compileAst(ast));
      }

      llvm::orc::MaterializationUnit::Interface getInterface(Ast &ast) {
        SymbolFlagsMap Symbols;
        // Find all the symbols in the AST and add each one to the Symbols
        // map.
        Symbols[mangler(someNameFromAST)] =
            JITSymbolFlags(JITSymbolFlags::Exported | JITSymbolFlags::Callable);
        return MaterializationUnit::Interface(std::move(Symbols), nullptr);
      }
    };

Take a look at the source code of `Building A JIT's Chapter 4 <tutorial/BuildingAJIT4.html>`_ for a complete example.
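The materialize/discard contract itself can also be seen in a minimal
self-contained model with no LLVM dependency. The names here (``ToyUnit``,
``remaining``) are invented for illustration: a unit tracks the symbols it
still owns, a lookup drives ``materialize`` (build everything the unit still
provides), and an overriding definition elsewhere drives ``discard`` (drop a
weak symbol without ever building it).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Toy model of the MaterializationUnit contract: the JIT calls
// materialize() when one of the unit's symbols is looked up, and discard()
// when a weak symbol it provides has been overridden and is not needed.
class ToyUnit {
public:
  explicit ToyUnit(std::set<std::string> Provides)
      : Remaining(std::move(Provides)) {}

  // Lookup hit: "compile" everything this unit still owns, and report how
  // many definitions were built.
  std::size_t materialize() {
    std::size_t NumCompiled = Remaining.size();
    Remaining.clear();
    return NumCompiled;
  }

  // Weak symbol overridden elsewhere: drop it without compiling it.
  void discard(const std::string &Sym) { Remaining.erase(Sym); }

  std::size_t remaining() const { return Remaining.size(); }

private:
  std::set<std::string> Remaining;
};
```

Discarding a symbol before materialization means it is never compiled at
all, which is exactly the early-drop optimization the ``discard`` callback
enables.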
How to use ThreadSafeModule and ThreadSafeContext
-------------------------------------------------

ThreadSafeModule and ThreadSafeContext are wrappers around Modules and
LLVMContexts respectively. A ThreadSafeModule is a pair of a
std::unique_ptr<Module> and a (possibly shared) ThreadSafeContext value. A
ThreadSafeContext is a pair of a std::unique_ptr<LLVMContext> and a lock.
This design serves two purposes: providing both a locking scheme and lifetime
management for LLVMContexts. The ThreadSafeContext may be locked to prevent
accidental concurrent access by two Modules that use the same LLVMContext.
The underlying LLVMContext is freed once all ThreadSafeContext values pointing
to it are destroyed, allowing the context memory to be reclaimed as soon as
the Modules referring to it are destroyed.

ThreadSafeContexts can be explicitly constructed from a
std::unique_ptr<LLVMContext>:

  .. code-block:: c++

    ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());

ThreadSafeModules can be constructed from a pair of a std::unique_ptr<Module>
and a ThreadSafeContext value. ThreadSafeContext values may be shared between
multiple ThreadSafeModules:

  .. code-block:: c++

    ThreadSafeModule TSM1(
        std::make_unique<Module>("M1", *TSCtx.getContext()), TSCtx);

    ThreadSafeModule TSM2(
        std::make_unique<Module>("M2", *TSCtx.getContext()), TSCtx);

Before using a ThreadSafeContext, clients should ensure that either the context
is only accessible on the current thread, or that the context is locked. In the
example above (where the context is never locked) we rely on the fact that
``TSM1``, ``TSM2``, and ``TSCtx`` are all created on one thread. If a context is
going to be shared between threads then it must be locked before accessing or
creating any Modules attached to it. E.g.

  .. code-block:: c++

    ThreadSafeContext TSCtx(std::make_unique<LLVMContext>());

    DefaultThreadPool TP(NumThreads);
    JITStack J;

    for (auto &ModulePath : ModulePaths) {
      TP.async(
        [&]() {
          auto Lock = TSCtx.getLock();
          auto M = loadModuleOnContext(ModulePath, TSCtx.getContext());
          J.addModule(ThreadSafeModule(std::move(M), TSCtx));
        });
    }

    TP.wait();

To make exclusive access to Modules easier to manage the ThreadSafeModule class
provides a convenience function, ``withModuleDo``, that implicitly (1) locks the
associated context, (2) runs a given function object, (3) unlocks the context,
and (4) returns the result generated by the function object. E.g.

  .. code-block:: c++

    ThreadSafeModule TSM = getModule(...);

    // Get the module's function count:
    size_t NumFunctionsInModule =
        TSM.withModuleDo(
            [](Module &M) { // <- Context locked before entering lambda.
              return M.size();
            } // <- Context unlocked after leaving.
        );

Clients wishing to maximize possibilities for concurrent compilation will want
to create every new ThreadSafeModule on a new ThreadSafeContext. For this
reason a convenience constructor for ThreadSafeModule is provided that
implicitly constructs a new ThreadSafeContext value from a
std::unique_ptr<LLVMContext>:

  .. code-block:: c++

    // Maximize concurrency opportunities by loading every module on a
    // separate context.
    for (const auto &IRPath : IRPaths) {
      auto Ctx = std::make_unique<LLVMContext>();
      auto M = std::make_unique<Module>("M", *Ctx);
      CompileLayer.add(MainJD, ThreadSafeModule(std::move(M), std::move(Ctx)));
    }

Clients who plan to run single-threaded may choose to save memory by loading
all modules on the same context:

  .. code-block:: c++

code-block:: c++ 791 792 // Save memory by using one context for all Modules: 793 ThreadSafeContext TSCtx(std::make_unique<LLVMContext>()); 794 for (const auto &IRPath : IRPaths) { 795 ThreadSafeModule TSM(parsePath(IRPath, *TSCtx.getContext()), TSCtx); 796 CompileLayer.add(MainJD, ThreadSafeModule(std::move(TSM)); 797 } 798 799.. _ProcessAndLibrarySymbols: 800 801How to Add Process and Library Symbols to JITDylibs 802=================================================== 803 804JIT'd code may need to access symbols in the host program or in supporting 805libraries. The best way to enable this is to reflect these symbols into your 806JITDylibs so that they appear the same as any other symbol defined within the 807execution session (i.e. they are findable via `ExecutionSession::lookup`, and 808so visible to the JIT linker during linking). 809 810One way to reflect external symbols is to add them manually using the 811absoluteSymbols function: 812 813 .. code-block:: c++ 814 815 const DataLayout &DL = getDataLayout(); 816 MangleAndInterner Mangle(ES, DL); 817 818 auto &JD = ES.createJITDylib("main"); 819 820 JD.define( 821 absoluteSymbols({ 822 { Mangle("puts"), ExecutorAddr::fromPtr(&puts)}, 823 { Mangle("gets"), ExecutorAddr::fromPtr(&getS)} 824 })); 825 826Using absoluteSymbols is reasonable if the set of symbols to be reflected is 827small and fixed. On the other hand, if the set of symbols is large or variable 828it may make more sense to have the definitions added for you on demand by a 829*definition generator*.A definition generator is an object that can be attached 830to a JITDylib, receiving a callback whenever a lookup within that JITDylib fails 831to find one or more symbols. The definition generator is given a chance to 832produce a definition of the missing symbol(s) before the lookup proceeds. 833 834ORC provides the ``DynamicLibrarySearchGenerator`` utility for reflecting symbols 835from the process (or a specific dynamic library) for you. 
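The lookup-miss callback contract described above can be modeled in a few lines
of standard C++. This is only an illustrative sketch: ``ToyDylib``, its members,
and the addresses below are invented for this example and are not the actual ORC
``DefinitionGenerator`` interface.

  .. code-block:: c++

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <optional>
    #include <string>
    #include <vector>

    // Toy model of a JITDylib symbol table with attached generators. On a
    // lookup miss, each generator gets a chance to supply the definition
    // before the lookup fails -- mirroring a definition generator's role.
    class ToyDylib {
    public:
      using Generator =
          std::function<std::optional<uint64_t>(const std::string &Name)>;

      void define(const std::string &Name, uint64_t Addr) {
        Symbols[Name] = Addr;
      }
      void addGenerator(Generator G) { Generators.push_back(std::move(G)); }

      std::optional<uint64_t> lookup(const std::string &Name) {
        if (auto I = Symbols.find(Name); I != Symbols.end())
          return I->second;
        // Miss: consult generators, caching any definition they produce.
        for (auto &G : Generators)
          if (auto Addr = G(Name)) {
            Symbols[Name] = *Addr;
            return Addr;
          }
        return std::nullopt;
      }

    private:
      std::map<std::string, uint64_t> Symbols;
      std::vector<Generator> Generators;
    };

A generator that resolves names against a loaded dynamic library plays the role
that ``DynamicLibrarySearchGenerator`` plays in ORC proper.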
For example, to reflect
the whole interface of a runtime library:

  .. code-block:: c++

    const DataLayout &DL = getDataLayout();
    auto &JD = ES.createJITDylib("main");

    if (auto DLSGOrErr =
          DynamicLibrarySearchGenerator::Load("/path/to/lib",
                                              DL.getGlobalPrefix()))
      JD.addGenerator(std::move(*DLSGOrErr));
    else
      return DLSGOrErr.takeError();

    // IR added to JD can now link against all symbols exported by the library
    // at '/path/to/lib'.
    CompileLayer.add(JD, loadModule(...));

The ``DynamicLibrarySearchGenerator`` utility can also be constructed with a
filter function to restrict the set of symbols that may be reflected. For
example, to expose an allowed set of symbols from the main process:

  .. code-block:: c++

    const DataLayout &DL = getDataLayout();
    MangleAndInterner Mangle(ES, DL);

    auto &JD = ES.createJITDylib("main");

    DenseSet<SymbolStringPtr> AllowList({
      Mangle("puts"),
      Mangle("gets")
    });

    // Use GetForCurrentProcess with a predicate function that checks the
    // allowed list.
    JD.addGenerator(cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(
      DL.getGlobalPrefix(),
      [&](const SymbolStringPtr &S) { return AllowList.count(S); })));

    // IR added to JD can now link against any symbols exported by the process
    // and contained in the list.
    CompileLayer.add(JD, loadModule(...));

References to process or library symbols could also be hardcoded into your IR
or object files using the symbols' raw addresses, however symbolic resolution
using the JIT symbol tables should be preferred: it keeps the IR and objects
readable and reusable in subsequent JIT sessions. Hardcoded addresses are
difficult to read, and usually only good for one session.

Roadmap
=======

ORC is still undergoing active development. Some current and future work is
listed below.
Current Work
------------

1. **TargetProcessControl: Improvements to in-tree support for out-of-process
   execution**

   The ``TargetProcessControl`` API provides various operations on the JIT
   target process (the one which will execute the JIT'd code), including
   memory allocation, memory writes, function execution, and process queries
   (e.g. for the target triple). By targeting this API, new components can be
   developed which will work equally well for in-process and out-of-process
   JITing.

2. **ORC RPC based TargetProcessControl implementation**

   An ORC RPC based implementation of the ``TargetProcessControl`` API is
   currently under development to enable easy out-of-process JITing via
   file descriptors / sockets.

3. **Core State Machine Cleanup**

   The core ORC state machine is currently implemented between JITDylib and
   ExecutionSession. Methods are slowly being moved to ``ExecutionSession``.
   This will tidy up the code base, and also allow us to support asynchronous
   removal of JITDylibs (in practice deleting an associated state object in
   ExecutionSession and leaving the JITDylib instance in a defunct state until
   all references to it have been released).

Near Future Work
----------------

1. **ORC JIT Runtime Libraries**

   We need a runtime library for JIT'd code. This would include things like
   TLS registration, reentry functions, registration code for language runtimes
   (e.g. Objective C and Swift) and other JIT specific runtime code. This should
   be built in a similar manner to compiler-rt (possibly even as part of it).

2. **Remote jit_dlopen / jit_dlclose**

   To more fully mimic the environment that static programs operate in we would
   like JIT'd code to be able to "dlopen" and "dlclose" JITDylibs, running all
   of their initializers/deinitializers on the current thread.
   This would require
   support from the runtime library described above.

3. **Debugging support**

   ORC currently supports the GDBRegistrationListener API when using RuntimeDyld
   as the underlying JIT linker. We will need a new solution for JITLink based
   platforms.

Further Future Work
-------------------

1. **Speculative Compilation**

   ORC's support for concurrent compilation allows us to easily enable
   *speculative* JIT compilation: compilation of code that is not needed yet,
   but which we have reason to believe will be needed in the future. This can be
   used to hide compile latency and improve JIT throughput. A proof-of-concept
   example of speculative compilation with ORC has already been developed (see
   ``llvm/examples/SpeculativeJIT``). Future work on this is likely to focus on
   re-using and improving existing profiling support (currently used by PGO) to
   feed speculation decisions, as well as built-in tools to simplify use of
   speculative compilation.

.. [1] Formats/architectures vary in terms of supported features. MachO and
       ELF tend to have better support than COFF. Patches very welcome!

.. [2] The ``LazyEmittingLayer``, ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` do not have counterparts in the new
       system. In the case of ``LazyEmittingLayer`` it was simply no longer
       needed: in ORCv2, deferring compilation until symbols are looked up is
       the default. The removal of ``RemoteObjectClientLayer`` and
       ``RemoteObjectServerLayer`` means that JIT stacks can no longer be split
       across processes, however this functionality appears not to have been
       used.

.. [3] Weak definitions are currently handled correctly within dylibs, but if
       multiple dylibs provide a weak definition of a symbol then each will end
       up with its own definition (similar to how weak definitions are handled
       in Windows DLLs). This will be fixed in the future.