128ab5944SJoseph Huber============================= 228ab5944SJoseph HuberOffloading Design & Internals 328ab5944SJoseph Huber============================= 428ab5944SJoseph Huber 528ab5944SJoseph Huber.. contents:: 628ab5944SJoseph Huber :local: 728ab5944SJoseph Huber 828ab5944SJoseph HuberIntroduction 928ab5944SJoseph Huber============ 1028ab5944SJoseph Huber 1128ab5944SJoseph HuberThis document describes the Clang driver and code generation steps for creating 1228ab5944SJoseph Huberoffloading applications. Clang supports offloading to various architectures 1328ab5944SJoseph Huberusing programming models like CUDA, HIP, and OpenMP. The purpose of this 1428ab5944SJoseph Huberdocument is to illustrate the steps necessary to create an offloading 1528ab5944SJoseph Huberapplication using Clang. 1628ab5944SJoseph Huber 1728ab5944SJoseph HuberOpenMP Offloading 1828ab5944SJoseph Huber================= 1928ab5944SJoseph Huber 2028ab5944SJoseph HuberClang supports OpenMP target offloading to several different architectures such 2128ab5944SJoseph Huberas NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by 2228ab5944SJoseph HuberClang and then executed using the ``libomptarget`` runtime and the associated 2328ab5944SJoseph Huberplugin for the target architecture, e.g. ``libomptarget.rtl.cuda``. This section 2428ab5944SJoseph Huberdescribes the steps necessary to create a functioning device image that can be 2528ab5944SJoseph Huberloaded by the OpenMP runtime. More information on the OpenMP runtimes can be 2628ab5944SJoseph Huberfound at the `OpenMP documentation page <https://openmp.llvm.org>`__. 2728ab5944SJoseph Huber 2828ab5944SJoseph Huber.. _Offloading Overview: 2928ab5944SJoseph Huber 3028ab5944SJoseph HuberOffloading Overview 3128ab5944SJoseph Huber------------------- 3228ab5944SJoseph Huber 3328ab5944SJoseph HuberThe goal of offloading compilation is to create an executable device image that 3428ab5944SJoseph Hubercan be run on the target device. OpenMP offloading creates executable images by 3528ab5944SJoseph Hubercompiling the input file for both the host and the target device. The output 3628ab5944SJoseph Huberfrom the device phase then needs to be embedded into the host to create a fat 3728ab5944SJoseph Huberobject. A special tool then needs to extract the device code from the fat 3828ab5944SJoseph Huberobjects, run the device linking step, and embed the final image in a symbol the 395c9ee351SJoseph Huberhost runtime library can use to register the library and access the symbols on 405c9ee351SJoseph Huberthe device. 4128ab5944SJoseph Huber 4228ab5944SJoseph HuberCompilation Process 4328ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^ 4428ab5944SJoseph Huber 455c9ee351SJoseph HuberThe compiler performs the following high-level actions to generate OpenMP 465c9ee351SJoseph Huberoffloading code: 4728ab5944SJoseph Huber 4828ab5944SJoseph Huber* Compile the input file for the host to produce a bitcode file. Lower ``#pragma 4928ab5944SJoseph Huber omp target`` declarations to :ref:`offloading entries <Generating Offloading 5028ab5944SJoseph Huber Entries>` and create metadata to indicate which entries are on the device. 5128ab5944SJoseph Huber* Compile the input file for the target :ref:`device <Device Compilation>` using 5228ab5944SJoseph Huber the :ref:`offloading entry <Generating Offloading Entries>` metadata created 5328ab5944SJoseph Huber by the host. 5428ab5944SJoseph Huber* Link the OpenMP device runtime library and run the backend to create a device 5528ab5944SJoseph Huber object file. 5628ab5944SJoseph Huber* Run the backend on the host bitcode file and create a :ref:`fat object file 5728ab5944SJoseph Huber <Creating Fat Objects>` using the device object file. 5828ab5944SJoseph Huber* Pass the fat object file to the :ref:`linker wrapper tool <Device Linking>` 5928ab5944SJoseph Huber and extract the device objects. Run the device linking action on the extracted 6028ab5944SJoseph Huber objects. 6128ab5944SJoseph Huber* :ref:`Wrap <Device Binary Wrapping>` the :ref:`device images <Device linking>` 6228ab5944SJoseph Huber and :ref:`offload entries <Generating Offloading Entries>` in a symbol that 6328ab5944SJoseph Huber can be accessed by the host. 6428ab5944SJoseph Huber* Add the :ref:`wrapped binary <Device Binary Wrapping>` to the linker input and 6528ab5944SJoseph Huber run the host linking action. Link with ``libomptarget`` to register and 6628ab5944SJoseph Huber execute the images. 6728ab5944SJoseph Huber 6828ab5944SJoseph Huber .. _Generating Offloading Entries: 6928ab5944SJoseph Huber 7028ab5944SJoseph HuberGenerating Offloading Entries 7128ab5944SJoseph Huber----------------------------- 7228ab5944SJoseph Huber 7328ab5944SJoseph HuberThe first step in compilation is to generate offloading entries for the host. 7428ab5944SJoseph HuberThis information is used to identify function kernels or global values that will 7528ab5944SJoseph Huberbe provided by the device. Blocks contained in a ``#pragma omp target`` or 7628ab5944SJoseph Hubersymbols inside a ``#pragma omp declare target`` directive will have offloading 7728ab5944SJoseph Huberentries generated. The following table shows the :ref:`offload entry structure 7828ab5944SJoseph Huber<table-tgt_offload_entry_structure>`. 7928ab5944SJoseph Huber 8028ab5944SJoseph Huber .. table:: __tgt_offload_entry Structure 8128ab5944SJoseph Huber :name: table-tgt_offload_entry_structure 8228ab5944SJoseph Huber 8328ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 8428ab5944SJoseph Huber | Type | Identifier | Description | 8528ab5944SJoseph Huber +=========+============+========================================================================+ 8628ab5944SJoseph Huber | void* | addr | Address of global symbol within device image (function or global) | 8728ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 8828ab5944SJoseph Huber | char* | name | Name of the symbol | 8928ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 9028ab5944SJoseph Huber | size_t | size | Size of the entry info (0 if it is a function) | 9128ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 9228ab5944SJoseph Huber | int32_t | flags | Flags associated with the entry (see :ref:`table-offload_entry_flags`) | 9328ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 9428ab5944SJoseph Huber | int32_t | reserved | Reserved, to be used by the runtime library. | 9528ab5944SJoseph Huber +---------+------------+------------------------------------------------------------------------+ 9628ab5944SJoseph Huber 975c9ee351SJoseph HuberThe address of the global symbol will be set to the device pointer value by the 9828ab5944SJoseph Huberruntime once the device image is loaded. The flags are set to indicate the 9928ab5944SJoseph Huberhandling required for the offloading entry. If the offloading entry is an entry 1005c9ee351SJoseph Huberto a target region it can have one of the following :ref:`entry flags 1015c9ee351SJoseph Huber<table-offload_entry_flags>`. 10228ab5944SJoseph Huber 10328ab5944SJoseph Huber .. table:: Target Region Entry Flags 10428ab5944SJoseph Huber :name: table-offload_entry_flags 10528ab5944SJoseph Huber 10628ab5944SJoseph Huber +----------------------------------+-------+-----------------------------------------+ 10728ab5944SJoseph Huber | Name | Value | Description | 10828ab5944SJoseph Huber +==================================+=======+=========================================+ 10928ab5944SJoseph Huber | OMPTargetRegionEntryTargetRegion | 0x00 | Mark the entry as generic target region | 11028ab5944SJoseph Huber +----------------------------------+-------+-----------------------------------------+ 11128ab5944SJoseph Huber | OMPTargetRegionEntryCtor | 0x02 | Mark the entry as a global constructor | 11228ab5944SJoseph Huber +----------------------------------+-------+-----------------------------------------+ 11328ab5944SJoseph Huber | OMPTargetRegionEntryDtor | 0x04 | Mark the entry as a global destructor | 11428ab5944SJoseph Huber +----------------------------------+-------+-----------------------------------------+ 11528ab5944SJoseph Huber 11628ab5944SJoseph HuberIf the offloading entry is a global variable, indicated by a non-zero size, it 11728ab5944SJoseph Huberwill instead have one of the following :ref:`global 11828ab5944SJoseph Huber<table-offload_global_flags>` flags. 11928ab5944SJoseph Huber 12028ab5944SJoseph Huber .. table:: Target Region Global 12128ab5944SJoseph Huber :name: table-offload_global_flags 12228ab5944SJoseph Huber 12328ab5944SJoseph Huber +-----------------------------+-------+---------------------------------------------------------------+ 12428ab5944SJoseph Huber | Name | Value | Description | 12528ab5944SJoseph Huber +=============================+=======+===============================================================+ 12628ab5944SJoseph Huber | OMPTargetGlobalVarEntryTo | 0x00 | Mark the entry as a 'to' attribute (w.r.t. the to clause) | 12728ab5944SJoseph Huber +-----------------------------+-------+---------------------------------------------------------------+ 12828ab5944SJoseph Huber | OMPTargetGlobalVarEntryLink | 0x01 | Mark the entry as a 'link' attribute (w.r.t. the link clause) | 12928ab5944SJoseph Huber +-----------------------------+-------+---------------------------------------------------------------+ 13028ab5944SJoseph Huber 13128ab5944SJoseph HuberThe target offload entries are used by the runtime to access the device kernels 13228ab5944SJoseph Huberand globals that will be provided by the final device image. Each offloading 13328ab5944SJoseph Huberentry is set to use the ``omp_offloading_entries`` section. When the final 13428ab5944SJoseph Huberapplication is created the linker will provide the 13528ab5944SJoseph Huber``__start_omp_offloading_entries`` and ``__stop_omp_offloading_entries`` symbols 13628ab5944SJoseph Huberwhich are used to create the :ref:`final image <Device Binary Wrapping>`. 13728ab5944SJoseph Huber 1385c9ee351SJoseph HuberThis information is used by the device compilation stage to determine which 1395c9ee351SJoseph Hubersymbols need to be exported from the device. We use the ``omp_offload.info`` 1405c9ee351SJoseph Hubermetadata node to pass this information device compilation stage. 14128ab5944SJoseph Huber 14228ab5944SJoseph HuberAccessing Entries on the Device 14328ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 14428ab5944SJoseph Huber 14528ab5944SJoseph HuberAccessing the entries in the device is done using the address field in the 14628ab5944SJoseph Huber:ref:`offload entry<table-tgt_offload_entry_structure>`. The runtime will set 14728ab5944SJoseph Huberthe address to the pointer associated with the device image during runtime 14828ab5944SJoseph Huberinitialization. This is used to call the corresponding kernel function when 14928ab5944SJoseph Huberentering a ``#pragma omp target`` region. For variables, the runtime maintains a 15028ab5944SJoseph Hubertable mapping host pointers to device pointers. Global variables inside a 1515c9ee351SJoseph Huber``#pragma omp target declare`` directive are first initialized to the host's 15228ab5944SJoseph Huberaddress. Once the device address is initialized we insert it into the table to 15328ab5944SJoseph Hubermap the host address to the device address. 15428ab5944SJoseph Huber 15528ab5944SJoseph HuberDebugging Information 15628ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^ 15728ab5944SJoseph Huber 15828ab5944SJoseph HuberWe generate structures to hold debugging information that is passed to 15928ab5944SJoseph Huber``libomptarget``. This allows the front-end to generate information the runtime 16028ab5944SJoseph Huberlibrary uses for more informative error messages. This is done using the 16128ab5944SJoseph Huberstandard :ref:`identifier structure <table-ident_t_structure>` used in 16228ab5944SJoseph Huber``libomp`` and ``libomptarget``. This is used to pass information and source 16328ab5944SJoseph Huberlocations to the runtime. 16428ab5944SJoseph Huber 16528ab5944SJoseph Huber .. table:: ident_t Structure 16628ab5944SJoseph Huber :name: table-ident_t_structure 16728ab5944SJoseph Huber 16828ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 16928ab5944SJoseph Huber | Type | Identifier | Description | 17028ab5944SJoseph Huber +=========+============+=============================================================================+ 17128ab5944SJoseph Huber | int32_t | reserved | Reserved, to be used by the runtime library. | 17228ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 17328ab5944SJoseph Huber | int32_t | flags | Flags used to indicate some features, mostly unused. | 17428ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 17528ab5944SJoseph Huber | int32_t | reserved | Reserved, to be used by the runtime library. | 17628ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 17728ab5944SJoseph Huber | int32_t | reserved | Reserved, to be used by the runtime library. | 17828ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 17928ab5944SJoseph Huber | char* | psource | Program source information, stored as ";filename;function;line;column;;\\0" | 18028ab5944SJoseph Huber +---------+------------+-----------------------------------------------------------------------------+ 18128ab5944SJoseph Huber 18228ab5944SJoseph HuberIf debugging information is enabled, we will also create strings to indicate the 18328ab5944SJoseph Hubernames and declarations of variables mapped in target regions. These have the 18428ab5944SJoseph Hubersame format as the source location in the :ref:`identifier structure 1855c9ee351SJoseph Huber<table-ident_t_structure>`, but the function name is replaced with the variable 1865c9ee351SJoseph Hubername. 18728ab5944SJoseph Huber 18828ab5944SJoseph Huber.. _Device Compilation: 18928ab5944SJoseph Huber 19028ab5944SJoseph HuberOffload Device Compilation 19128ab5944SJoseph Huber-------------------------- 19228ab5944SJoseph Huber 19328ab5944SJoseph HuberThe input file is compiled for each active device toolchain. The device 19428ab5944SJoseph Hubercompilation stage is performed differently from the host stage. Namely, we do 19528ab5944SJoseph Hubernot generate any offloading entries. This is set by passing the 19663ca93c7SSergio Afonso``-fopenmp-is-target-device`` flag to the front-end. We use the host bitcode to 19728ab5944SJoseph Huberdetermine which symbols to export from the device. The bitcode file is passed in 19828ab5944SJoseph Huberfrom the previous stage using the ``-fopenmp-host-ir-file-path`` flag. 19928ab5944SJoseph HuberCompilation is otherwise performed as it would be for any other target triple. 20028ab5944SJoseph Huber 20128ab5944SJoseph HuberWhen compiling for the OpenMP device, we set the visibility of all device 20228ab5944SJoseph Hubersymbols to be ``protected`` by default. This improves performance and prevents a 20328ab5944SJoseph Huberclass of errors where a symbol in the target device could preempt a host 20428ab5944SJoseph Huberlibrary. 20528ab5944SJoseph Huber 20628ab5944SJoseph HuberThe OpenMP runtime library is linked in during compilation to provide the 20728ab5944SJoseph Huberimplementations for standard OpenMP functionality. For GPU targets this is done 20828ab5944SJoseph Huberby linking in a special bitcode library during compilation, (e.g. 20928ab5944SJoseph Huber``libomptarget-nvptx64-sm_70.bc``) using the ``-mlink-builtin-bitcode`` flag. 21028ab5944SJoseph HuberOther device libraries, such as CUDA's libdevice, are also linked this way. If 21128ab5944SJoseph Huberthe target is a standard architecture with an existing ``libomp`` 21228ab5944SJoseph Huberimplementation, that will be linked instead. Finally, device tools are used to 21328ab5944SJoseph Hubercreate a relocatable device object file that can be embedded in the host. 21428ab5944SJoseph Huber 21528ab5944SJoseph Huber.. _Creating Fat Objects: 21628ab5944SJoseph Huber 21728ab5944SJoseph HuberCreating Fat Objects 21828ab5944SJoseph Huber-------------------- 21928ab5944SJoseph Huber 22028ab5944SJoseph HuberA fat binary is a binary file that contains information intended for another 22128ab5944SJoseph Huberdevice. We create a fat object by embedding the output of the device compilation 22228ab5944SJoseph Huberstage into the host as a named section. The output from the device compilation 22328ab5944SJoseph Huberis passed to the host backend using the ``-fembed-offload-object`` flag. This 22415e62062SJoseph Huberembeds the device image into the ``.llvm.offloading`` section using a special 22515e62062SJoseph Huberbinary format that behaves like a string map. This binary format is used to 22615e62062SJoseph Huberbundle metadata about the image so the linker can associate the proper device 22715e62062SJoseph Huberlinking action with the image. Each device image will start with the magic bytes 22815e62062SJoseph Huber``0x10FF10AD``. 22928ab5944SJoseph Huber 23028ab5944SJoseph Huber.. code-block:: llvm 23128ab5944SJoseph Huber 23215e62062SJoseph Huber @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading" 23328ab5944SJoseph Huber 23428ab5944SJoseph HuberThe device code will then be placed in the corresponding section one the backend 23528ab5944SJoseph Huberis run on the host, creating a fat object. Using fat objects allows us to treat 23628ab5944SJoseph Huberoffloading objects as standard host objects. The final object file should 23728ab5944SJoseph Hubercontain the following :ref:`offloading sections <table-offloading_sections>`. We 23828ab5944SJoseph Huberwill use this information when :ref:`Device Linking`. 23928ab5944SJoseph Huber 24028ab5944SJoseph Huber .. table:: Offloading Sections 24128ab5944SJoseph Huber :name: table-offloading_sections 24228ab5944SJoseph Huber 24316cea040SAaron Ballman +----------------------------------+------------------------------------------------------------------------------+ 24428ab5944SJoseph Huber | Section | Description | 24516cea040SAaron Ballman +==================================+==============================================================================+ 24616cea040SAaron Ballman | omp_offloading_entries | Offloading entry information (see :ref:`table-tgt_offload_entry_structure`) | 24716cea040SAaron Ballman +----------------------------------+------------------------------------------------------------------------------+ 24815e62062SJoseph Huber | .llvm.offloading | Embedded device object file for the target device and architecture | 24916cea040SAaron Ballman +----------------------------------+------------------------------------------------------------------------------+ 25028ab5944SJoseph Huber 25128ab5944SJoseph Huber.. _Device Linking: 25228ab5944SJoseph Huber 25328ab5944SJoseph HuberLinking Target Device Code 25428ab5944SJoseph Huber-------------------------- 25528ab5944SJoseph Huber 25628ab5944SJoseph HuberObjects containing :ref:`table-offloading_sections` require special handling to 25728ab5944SJoseph Hubercreate an executable device image. This is done using a Clang tool, see 25828ab5944SJoseph Huber:doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper 25928ab5944SJoseph Huberover the host linking job. It scans the input object files for the offloading 26015e62062SJoseph Hubersection ``.llvm.offloading``. The device files stored in this section are then 26115e62062SJoseph Huberextracted and passed to the appropriate linking job. The linked device image is 26215e62062SJoseph Huberthen :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load 26315e62062SJoseph Huberthe device image and link it with the host. 26428ab5944SJoseph Huber 26528ab5944SJoseph HuberThe linker wrapper tool supports linking bitcode files through link time 26628ab5944SJoseph Huberoptimization (LTO). This is used whenever the object files embedded in the host 26728ab5944SJoseph Hubercontain LLVM bitcode. Bitcode will be embedded for architectures that do not 26828ab5944SJoseph Hubersupport a relocatable object format, such as AMDGPU or SPIR-V, or if the user 2695c9ee351SJoseph Huberrequested it using the ``-foffload-lto`` flag. 27028ab5944SJoseph Huber 27128ab5944SJoseph Huber.. _Device Binary Wrapping: 27228ab5944SJoseph Huber 27328ab5944SJoseph HuberDevice Binary Wrapping 27428ab5944SJoseph Huber---------------------- 27528ab5944SJoseph Huber 27628ab5944SJoseph HuberVarious structures and functions are used to create the information necessary to 27728ab5944SJoseph Huberoffload code on the device. We use the :ref:`linked device executable <Device 27828ab5944SJoseph HuberLinking>` with the corresponding offloading entries to create the symbols 27928ab5944SJoseph Hubernecessary to load and execute the device image. 28028ab5944SJoseph Huber 28128ab5944SJoseph HuberStructure Types 28228ab5944SJoseph Huber^^^^^^^^^^^^^^^ 28328ab5944SJoseph Huber 28428ab5944SJoseph HuberSeveral different structures are used to store offloading information. The 28528ab5944SJoseph Huber:ref:`device image structure <table-device_image_structure>` stores a single 28628ab5944SJoseph Huberlinked device image and its associated offloading entries. The offloading 28728ab5944SJoseph Huberentries are stored using the ``__start_omp_offloading_entries`` and 28828ab5944SJoseph Huber``__stop_omp_offloading_entries`` symbols generated by the linker using the 28916cea040SAaron Ballman:ref:`table-tgt_offload_entry_structure`. 29028ab5944SJoseph Huber 29128ab5944SJoseph Huber .. table:: __tgt_device_image Structure 29228ab5944SJoseph Huber :name: table-device_image_structure 29328ab5944SJoseph Huber 29428ab5944SJoseph Huber +----------------------+--------------+----------------------------------------+ 29528ab5944SJoseph Huber | Type | Identifier | Description | 29628ab5944SJoseph Huber +======================+==============+========================================+ 29728ab5944SJoseph Huber | void* | ImageStart | Pointer to the target code start | 29828ab5944SJoseph Huber +----------------------+--------------+----------------------------------------+ 29928ab5944SJoseph Huber | void* | ImageEnd | Pointer to the target code end | 30028ab5944SJoseph Huber +----------------------+--------------+----------------------------------------+ 30128ab5944SJoseph Huber | __tgt_offload_entry* | EntriesBegin | Begin of table with all target entries | 30228ab5944SJoseph Huber +----------------------+--------------+----------------------------------------+ 30328ab5944SJoseph Huber | __tgt_offload_entry* | EntriesEnd | End of table (non inclusive) | 30428ab5944SJoseph Huber +----------------------+--------------+----------------------------------------+ 30528ab5944SJoseph Huber 30628ab5944SJoseph HuberThe target :ref:`target binary descriptor <table-target_binary_descriptor>` is 30728ab5944SJoseph Huberused to store all binary images and offloading entries in an array. 30828ab5944SJoseph Huber 30928ab5944SJoseph Huber .. table:: __tgt_bin_desc Structure 31028ab5944SJoseph Huber :name: table-target_binary_descriptor 31128ab5944SJoseph Huber 31228ab5944SJoseph Huber +----------------------+------------------+------------------------------------------+ 31328ab5944SJoseph Huber | Type | Identifier | Description | 31428ab5944SJoseph Huber +======================+==================+==========================================+ 31528ab5944SJoseph Huber | int32_t | NumDeviceImages | Number of device types supported | 31628ab5944SJoseph Huber +----------------------+------------------+------------------------------------------+ 31728ab5944SJoseph Huber | __tgt_device_image* | DeviceImages | Array of device images (1 per dev. type) | 31828ab5944SJoseph Huber +----------------------+------------------+------------------------------------------+ 31928ab5944SJoseph Huber | __tgt_offload_entry* | HostEntriesBegin | Begin of table with all host entries | 32028ab5944SJoseph Huber +----------------------+------------------+------------------------------------------+ 32128ab5944SJoseph Huber | __tgt_offload_entry* | HostEntriesEnd | End of table (non inclusive) | 32228ab5944SJoseph Huber +----------------------+------------------+------------------------------------------+ 32328ab5944SJoseph Huber 32428ab5944SJoseph HuberGlobal Variables 32528ab5944SJoseph Huber---------------- 32628ab5944SJoseph Huber 32728ab5944SJoseph Huber:ref:`table-global_variables` lists various global variables, along with their 32828ab5944SJoseph Hubertype and their explicit ELF sections, which are used to store device images and 32928ab5944SJoseph Huberrelated symbols. 33028ab5944SJoseph Huber 33128ab5944SJoseph Huber .. table:: Global Variables 33228ab5944SJoseph Huber :name: table-global_variables 33328ab5944SJoseph Huber 33428ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 33528ab5944SJoseph Huber | Variable | Type | ELF Section | Description | 33628ab5944SJoseph Huber +================================+=====================+=========================+=========================================================+ 33728ab5944SJoseph Huber | __start_omp_offloading_entries | __tgt_offload_entry | .omp_offloading_entries | Begin symbol for the offload entries table. | 33828ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 33928ab5944SJoseph Huber | __stop_omp_offloading_entries | __tgt_offload_entry | .omp_offloading_entries | End symbol for the offload entries table. | 34028ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 34128ab5944SJoseph Huber | __dummy.omp_offloading.entry | __tgt_offload_entry | .omp_offloading_entries | Dummy zero-sized object in the offload entries | 34228ab5944SJoseph Huber | | | | section to force linker to define begin/end | 34328ab5944SJoseph Huber | | | | symbols defined above. | 34428ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 34528ab5944SJoseph Huber | .omp_offloading.device_image | __tgt_device_image | .omp_offloading_entries | ELF device code object of the first image. | 34628ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 34728ab5944SJoseph Huber | .omp_offloading.device_image.N | __tgt_device_image | .omp_offloading_entries | ELF device code object of the (N+1)th image. | 34828ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 34928ab5944SJoseph Huber | .omp_offloading.device_images | __tgt_device_image | .omp_offloading_entries | Array of images. | 35028ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 35128ab5944SJoseph Huber | .omp_offloading.descriptor | __tgt_bin_desc | .omp_offloading_entries | Binary descriptor object (see :ref:`binary_descriptor`) | 35228ab5944SJoseph Huber +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+ 35328ab5944SJoseph Huber 35428ab5944SJoseph Huber.. _binary_descriptor: 35528ab5944SJoseph Huber 35628ab5944SJoseph HuberBinary Descriptor for Device Images 35728ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 35828ab5944SJoseph Huber 35928ab5944SJoseph HuberThis object is passed to the offloading runtime at program startup and it 36028ab5944SJoseph Huberdescribes all device images available in the executable or shared library. It 36128ab5944SJoseph Huberis defined as follows: 36228ab5944SJoseph Huber 36328ab5944SJoseph Huber.. code-block:: c 36428ab5944SJoseph Huber 36528ab5944SJoseph Huber __attribute__((visibility("hidden"))) 36628ab5944SJoseph Huber extern __tgt_offload_entry *__start_omp_offloading_entries; 36728ab5944SJoseph Huber __attribute__((visibility("hidden"))) 36828ab5944SJoseph Huber extern __tgt_offload_entry *__stop_omp_offloading_entries; 36928ab5944SJoseph Huber static const char Image0[] = { <Bufs.front() contents> }; 37028ab5944SJoseph Huber ... 37128ab5944SJoseph Huber static const char ImageN[] = { <Bufs.back() contents> }; 37228ab5944SJoseph Huber static const __tgt_device_image Images[] = { 37328ab5944SJoseph Huber { 37428ab5944SJoseph Huber Image0, /*ImageStart*/ 37528ab5944SJoseph Huber Image0 + sizeof(Image0), /*ImageEnd*/ 37628ab5944SJoseph Huber __start_omp_offloading_entries, /*EntriesBegin*/ 37728ab5944SJoseph Huber __stop_omp_offloading_entries /*EntriesEnd*/ 37828ab5944SJoseph Huber }, 37928ab5944SJoseph Huber ... 38028ab5944SJoseph Huber { 38128ab5944SJoseph Huber ImageN, /*ImageStart*/ 38228ab5944SJoseph Huber ImageN + sizeof(ImageN), /*ImageEnd*/ 38328ab5944SJoseph Huber __start_omp_offloading_entries, /*EntriesBegin*/ 38428ab5944SJoseph Huber __stop_omp_offloading_entries /*EntriesEnd*/ 38528ab5944SJoseph Huber } 38628ab5944SJoseph Huber }; 38728ab5944SJoseph Huber static const __tgt_bin_desc BinDesc = { 38828ab5944SJoseph Huber sizeof(Images) / sizeof(Images[0]), /*NumDeviceImages*/ 38928ab5944SJoseph Huber Images, /*DeviceImages*/ 39028ab5944SJoseph Huber __start_omp_offloading_entries, /*HostEntriesBegin*/ 39128ab5944SJoseph Huber __stop_omp_offloading_entries /*HostEntriesEnd*/ 39228ab5944SJoseph Huber }; 39328ab5944SJoseph Huber 3945c9ee351SJoseph Huber 39528ab5944SJoseph HuberGlobal Constructor and Destructor 39628ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 39728ab5944SJoseph Huber 39828ab5944SJoseph HuberThe global constructor (``.omp_offloading.descriptor_reg()``) registers the 39928ab5944SJoseph Huberdevice images with the runtime by calling the ``__tgt_register_lib()`` runtime 40028ab5944SJoseph Huberfunction. The constructor is explicitly defined in ``.text.startup`` section and 40128ab5944SJoseph Huberis run once when the program starts. Similarly, the global destructor 40228ab5944SJoseph Huber(``.omp_offloading.descriptor_unreg()``) calls ``__tgt_unregister_lib()`` for 40328ab5944SJoseph Huberthe destructor and is also defined in ``.text.startup`` section and run when the 40428ab5944SJoseph Huberprogram exits. 40528ab5944SJoseph Huber 40628ab5944SJoseph HuberOffloading Example 40728ab5944SJoseph Huber------------------ 40828ab5944SJoseph Huber 40928ab5944SJoseph HuberThis section contains a simple example of generating offloading code using 41028ab5944SJoseph HuberOpenMP offloading. We will use a simple ``ZAXPY`` BLAS routine. 41128ab5944SJoseph Huber 41228ab5944SJoseph Huber.. code-block:: c++ 41328ab5944SJoseph Huber 41428ab5944SJoseph Huber #include <complex> 41528ab5944SJoseph Huber 41628ab5944SJoseph Huber using complex = std::complex<double>; 41728ab5944SJoseph Huber 41828ab5944SJoseph Huber void zaxpy(complex *X, complex *Y, complex D, std::size_t N) { 41928ab5944SJoseph Huber #pragma omp target teams distribute parallel for 42028ab5944SJoseph Huber for (std::size_t i = 0; i < N; ++i) 42128ab5944SJoseph Huber Y[i] = D * X[i] + Y[i]; 42228ab5944SJoseph Huber } 42328ab5944SJoseph Huber 42428ab5944SJoseph Huber int main() { 42528ab5944SJoseph Huber const std::size_t N = 1024; 42628ab5944SJoseph Huber complex X[N], Y[N], D; 42728ab5944SJoseph Huber #pragma omp target data map(to:X[0 : N]) map(tofrom:Y[0 : N]) 42828ab5944SJoseph Huber zaxpy(X, Y, D, N); 42928ab5944SJoseph Huber } 43028ab5944SJoseph Huber 43128ab5944SJoseph HuberThis code is compiled using the following Clang flags. 43228ab5944SJoseph Huber 43328ab5944SJoseph Huber.. code-block:: console 43428ab5944SJoseph Huber 43528ab5944SJoseph Huber $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c 43628ab5944SJoseph Huber 43715e62062SJoseph HuberThe output section in the object file can be seen using the ``readelf`` utility. 43815e62062SJoseph HuberThe ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be 43915e62062SJoseph Huberremoved from the final executable or shared library by the linker. 44028ab5944SJoseph Huber 44128ab5944SJoseph Huber.. code-block:: text 44228ab5944SJoseph Huber 44328ab5944SJoseph Huber $ llvm-readelf -WS zaxpy.o 44415e62062SJoseph Huber Section Headers: 44515e62062SJoseph Huber [Nr] Name Type Address Off Size ES Flg Lk Inf Al 44615e62062SJoseph Huber [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00 A 0 0 1 44715e62062SJoseph Huber [12] .llvm.offloading PROGBITS 0000000000000000 000260 030950 00 E 0 0 8 44815e62062SJoseph Huber 44928ab5944SJoseph Huber 45028ab5944SJoseph HuberCompiling this file again will invoke the ``clang-linker-wrapper`` utility to 45128ab5944SJoseph Huberextract and link the device code stored at the section named 45215e62062SJoseph Huber``.llvm.offloading`` and then use entries stored in 45328ab5944SJoseph Huberthe section named ``omp_offloading_entries`` to create the symbols necessary for 45428ab5944SJoseph Huber``libomptarget`` to register the device image and call the entry function. 45528ab5944SJoseph Huber 45628ab5944SJoseph Huber.. code-block:: console 45728ab5944SJoseph Huber 45828ab5944SJoseph Huber $ clang++ -fopenmp -fopenmp-targets=nvptx64 zaxpy.o -o zaxpy 45928ab5944SJoseph Huber $ ./zaxpy 46028ab5944SJoseph Huber 46128ab5944SJoseph HuberWe can see the steps created by clang to generate the offloading code using the 46228ab5944SJoseph Huber``-ccc-print-phases`` option in Clang. This matches the description in 46328ab5944SJoseph Huber:ref:`Offloading Overview`. 46428ab5944SJoseph Huber 46528ab5944SJoseph Huber.. code-block:: console 46628ab5944SJoseph Huber 46728ab5944SJoseph Huber $ clang++ -fopenmp -fopenmp-targets=nvptx64 -ccc-print-phases zaxpy.cpp 46828ab5944SJoseph Huber # "x86_64-unknown-linux-gnu" - "clang", inputs: ["zaxpy.cpp"], output: "/tmp/zaxpy-host.bc" 46928ab5944SJoseph Huber # "nvptx64-nvidia-cuda" - "clang", inputs: ["zaxpy.cpp", "/tmp/zaxpy-e6a41b.bc"], output: "/tmp/zaxpy-07f434.s" 47028ab5944SJoseph Huber # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["/tmp/zaxpy-07f434.s"], output: "/tmp/zaxpy-0af7b7.o" 47128ab5944SJoseph Huber # "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/zaxpy-e6a41b.bc", "/tmp/zaxpy-0af7b7.o"], output: "/tmp/zaxpy-416cad.o" 47228ab5944SJoseph Huber # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/zaxpy-416cad.o"], output: "a.out" 473*5c840542SJoseph Huber 474*5c840542SJoseph HuberRelocatable Linking 475*5c840542SJoseph Huber------------------- 476*5c840542SJoseph Huber 477*5c840542SJoseph HuberThe offloading compilation pipeline normally will defer the final device linking 478*5c840542SJoseph Huberand runtime registration until the ``clang-linker-wrapper`` is run to create the 479*5c840542SJoseph Huberexecutable. This is the standard behaviour when compiling for OpenMP offloading 480*5c840542SJoseph Huberor CUDA and HIP in ``-fgpu-rdc`` mode. However, there are some cases where the 481*5c840542SJoseph Huberuser may wish to perform this device handling prematurely. This is described in 482*5c840542SJoseph Huberthe :doc:`linker wrapper documentation<ClangLinkerWrapper>`. 483*5c840542SJoseph Huber 484*5c840542SJoseph HuberEffectively, this allows the user to handle offloading specific linking ahead of 485*5c840542SJoseph Hubertime when shipping objects or static libraries. This can be thought of as 486*5c840542SJoseph Huberperforming a standard ``-fno-gpu-rdc`` compilation on a subset of object files. 487*5c840542SJoseph HuberThis can be useful to reduce link time, prevent users from interacting with the 488*5c840542SJoseph Huberlibrary's device code, or for shipping libraries to incompatible compilers. 489*5c840542SJoseph Huber 490*5c840542SJoseph HuberNormally, if a relocatable link is done using ``clang -r`` it will simply merge 491*5c840542SJoseph Huberthe ``.llvm.offloading`` sections which will then be linked later when the 492*5c840542SJoseph Huberexecutable is created. However, if the ``-r`` flag is used with the offloading 493*5c840542SJoseph Hubertoolchain, it will perform the device linking and registration phases and then 494*5c840542SJoseph Hubermerge the registration code into the final relocatable object file. 495*5c840542SJoseph Huber 496*5c840542SJoseph HuberThe following example shows how using the relocatable link with the offloading 497*5c840542SJoseph Huberpipeline can create a static library with offloading code that can be 498*5c840542SJoseph Huberredistributed without requiring any additional handling. 499*5c840542SJoseph Huber 500*5c840542SJoseph Huber.. code-block:: console 501*5c840542SJoseph Huber 502*5c840542SJoseph Huber $ clang++ -fopenmp -fopenmp-targets=nvptx64 foo.cpp -c 503*5c840542SJoseph Huber $ clang++ -lomptarget.devicertl --offload-link -r foo.o -o merged.o 504*5c840542SJoseph Huber $ llvm-ar rcs libfoo.a merged.o 505*5c840542SJoseph Huber # g++ app.cpp -L. -lfoo 506*5c840542SJoseph Huber 507