xref: /llvm-project/clang/docs/OffloadingDesign.rst (revision 347ab99a5c6d096beb7378794c6255dca2a866e6)
128ab5944SJoseph Huber=============================
228ab5944SJoseph HuberOffloading Design & Internals
328ab5944SJoseph Huber=============================
428ab5944SJoseph Huber
528ab5944SJoseph Huber.. contents::
628ab5944SJoseph Huber   :local:
728ab5944SJoseph Huber
828ab5944SJoseph HuberIntroduction
928ab5944SJoseph Huber============
1028ab5944SJoseph Huber
1128ab5944SJoseph HuberThis document describes the Clang driver and code generation steps for creating
1228ab5944SJoseph Huberoffloading applications. Clang supports offloading to various architectures
1328ab5944SJoseph Huberusing programming models like CUDA, HIP, and OpenMP. The purpose of this
1428ab5944SJoseph Huberdocument is to illustrate the steps necessary to create an offloading
1528ab5944SJoseph Huberapplication using Clang.
1628ab5944SJoseph Huber
1728ab5944SJoseph HuberOpenMP Offloading
1828ab5944SJoseph Huber=================
1928ab5944SJoseph Huber
2028ab5944SJoseph HuberClang supports OpenMP target offloading to several different architectures such
2128ab5944SJoseph Huberas NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
2228ab5944SJoseph HuberClang and then executed using the ``libomptarget`` runtime and the associated
2328ab5944SJoseph Huberplugin for the target architecture, e.g. ``libomptarget.rtl.cuda``. This section
2428ab5944SJoseph Huberdescribes the steps necessary to create a functioning device image that can be
2528ab5944SJoseph Huberloaded by the OpenMP runtime.  More information on the OpenMP runtimes can be
2628ab5944SJoseph Huberfound at the `OpenMP documentation page <https://openmp.llvm.org>`__.
2728ab5944SJoseph Huber
2828ab5944SJoseph Huber.. _Offloading Overview:
2928ab5944SJoseph Huber
3028ab5944SJoseph HuberOffloading Overview
3128ab5944SJoseph Huber-------------------
3228ab5944SJoseph Huber
3328ab5944SJoseph HuberThe goal of offloading compilation is to create an executable device image that
3428ab5944SJoseph Hubercan be run on the target device. OpenMP offloading creates executable images by
3528ab5944SJoseph Hubercompiling the input file for both the host and the target device. The output
3628ab5944SJoseph Huberfrom the device phase then needs to be embedded into the host to create a fat
3728ab5944SJoseph Huberobject. A special tool then needs to extract the device code from the fat
3828ab5944SJoseph Huberobjects, run the device linking step, and embed the final image in a symbol the
395c9ee351SJoseph Huberhost runtime library can use to register the library and access the symbols on
405c9ee351SJoseph Huberthe device.
4128ab5944SJoseph Huber
4228ab5944SJoseph HuberCompilation Process
4328ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^
4428ab5944SJoseph Huber
455c9ee351SJoseph HuberThe compiler performs the following high-level actions to generate OpenMP
465c9ee351SJoseph Huberoffloading code:
4728ab5944SJoseph Huber
4828ab5944SJoseph Huber* Compile the input file for the host to produce a bitcode file. Lower ``#pragma
4928ab5944SJoseph Huber  omp target`` declarations to :ref:`offloading entries <Generating Offloading
5028ab5944SJoseph Huber  Entries>` and create metadata to indicate which entries are on the device.
5128ab5944SJoseph Huber* Compile the input file for the target :ref:`device <Device Compilation>` using
5228ab5944SJoseph Huber  the :ref:`offloading entry <Generating Offloading Entries>` metadata created
5328ab5944SJoseph Huber  by the host.
5428ab5944SJoseph Huber* Link the OpenMP device runtime library and run the backend to create a device
5528ab5944SJoseph Huber  object file.
5628ab5944SJoseph Huber* Run the backend on the host bitcode file and create a :ref:`fat object file
5728ab5944SJoseph Huber  <Creating Fat Objects>` using the device object file.
5828ab5944SJoseph Huber* Pass the fat object file to the :ref:`linker wrapper tool <Device Linking>`
5928ab5944SJoseph Huber  and extract the device objects. Run the device linking action on the extracted
6028ab5944SJoseph Huber  objects.
6128ab5944SJoseph Huber* :ref:`Wrap <Device Binary Wrapping>` the :ref:`device images <Device linking>`
6228ab5944SJoseph Huber  and :ref:`offload entries <Generating Offloading Entries>` in a symbol that
6328ab5944SJoseph Huber  can be accessed by the host.
6428ab5944SJoseph Huber* Add the :ref:`wrapped binary <Device Binary Wrapping>` to the linker input and
6528ab5944SJoseph Huber  run the host linking action. Link with ``libomptarget`` to register and
6628ab5944SJoseph Huber  execute the images.
6728ab5944SJoseph Huber
6828ab5944SJoseph Huber   .. _Generating Offloading Entries:
6928ab5944SJoseph Huber
7028ab5944SJoseph HuberGenerating Offloading Entries
7128ab5944SJoseph Huber-----------------------------
7228ab5944SJoseph Huber
7328ab5944SJoseph HuberThe first step in compilation is to generate offloading entries for the host.
7428ab5944SJoseph HuberThis information is used to identify function kernels or global values that will
7528ab5944SJoseph Huberbe provided by the device. Blocks contained in a ``#pragma omp target`` or
7628ab5944SJoseph Hubersymbols inside a ``#pragma omp declare target`` directive will have offloading
7728ab5944SJoseph Huberentries generated. The following table shows the :ref:`offload entry structure
7828ab5944SJoseph Huber<table-tgt_offload_entry_structure>`.
7928ab5944SJoseph Huber
8028ab5944SJoseph Huber  .. table:: __tgt_offload_entry Structure
8128ab5944SJoseph Huber    :name: table-tgt_offload_entry_structure
8228ab5944SJoseph Huber
8328ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
8428ab5944SJoseph Huber    |   Type  | Identifier | Description                                                            |
8528ab5944SJoseph Huber    +=========+============+========================================================================+
8628ab5944SJoseph Huber    |  void*  |    addr    | Address of global symbol within device image (function or global)      |
8728ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
8828ab5944SJoseph Huber    |  char*  |    name    | Name of the symbol                                                     |
8928ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
9028ab5944SJoseph Huber    |  size_t |    size    | Size of the entry info (0 if it is a function)                         |
9128ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
9228ab5944SJoseph Huber    | int32_t |    flags   | Flags associated with the entry (see :ref:`table-offload_entry_flags`) |
9328ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
9428ab5944SJoseph Huber    | int32_t |  reserved  | Reserved, to be used by the runtime library.                           |
9528ab5944SJoseph Huber    +---------+------------+------------------------------------------------------------------------+
9628ab5944SJoseph Huber
975c9ee351SJoseph HuberThe address of the global symbol will be set to the device pointer value by the
9828ab5944SJoseph Huberruntime once the device image is loaded. The flags are set to indicate the
9928ab5944SJoseph Huberhandling required for the offloading entry. If the offloading entry is an entry
1005c9ee351SJoseph Huberto a target region it can have one of the following :ref:`entry flags
1015c9ee351SJoseph Huber<table-offload_entry_flags>`.
10228ab5944SJoseph Huber
10328ab5944SJoseph Huber  .. table:: Target Region Entry Flags
10428ab5944SJoseph Huber    :name: table-offload_entry_flags
10528ab5944SJoseph Huber
10628ab5944SJoseph Huber    +----------------------------------+-------+-----------------------------------------+
10728ab5944SJoseph Huber    |                Name              | Value | Description                             |
10828ab5944SJoseph Huber    +==================================+=======+=========================================+
10928ab5944SJoseph Huber    | OMPTargetRegionEntryTargetRegion | 0x00  | Mark the entry as generic target region |
11028ab5944SJoseph Huber    +----------------------------------+-------+-----------------------------------------+
11128ab5944SJoseph Huber    | OMPTargetRegionEntryCtor         | 0x02  | Mark the entry as a global constructor  |
11228ab5944SJoseph Huber    +----------------------------------+-------+-----------------------------------------+
11328ab5944SJoseph Huber    | OMPTargetRegionEntryDtor         | 0x04  | Mark the entry as a global destructor   |
11428ab5944SJoseph Huber    +----------------------------------+-------+-----------------------------------------+
11528ab5944SJoseph Huber
11628ab5944SJoseph HuberIf the offloading entry is a global variable, indicated by a non-zero size, it
11728ab5944SJoseph Huberwill instead have one of the following :ref:`global
11828ab5944SJoseph Huber<table-offload_global_flags>` flags.
11928ab5944SJoseph Huber
12028ab5944SJoseph Huber  .. table:: Target Region Global
12128ab5944SJoseph Huber    :name: table-offload_global_flags
12228ab5944SJoseph Huber
12328ab5944SJoseph Huber    +-----------------------------+-------+---------------------------------------------------------------+
12428ab5944SJoseph Huber    |          Name               | Value | Description                                                   |
12528ab5944SJoseph Huber    +=============================+=======+===============================================================+
12628ab5944SJoseph Huber    | OMPTargetGlobalVarEntryTo   | 0x00  | Mark the entry as a 'to' attribute (w.r.t. the to clause)     |
12728ab5944SJoseph Huber    +-----------------------------+-------+---------------------------------------------------------------+
12828ab5944SJoseph Huber    | OMPTargetGlobalVarEntryLink | 0x01  | Mark the entry as a 'link' attribute (w.r.t. the link clause) |
12928ab5944SJoseph Huber    +-----------------------------+-------+---------------------------------------------------------------+
13028ab5944SJoseph Huber
13128ab5944SJoseph HuberThe target offload entries are used by the runtime to access the device kernels
13228ab5944SJoseph Huberand globals that will be provided by the final device image. Each offloading
13328ab5944SJoseph Huberentry is set to use the ``omp_offloading_entries`` section. When the final
13428ab5944SJoseph Huberapplication is created the linker will provide the
13528ab5944SJoseph Huber``__start_omp_offloading_entries`` and ``__stop_omp_offloading_entries`` symbols
13628ab5944SJoseph Huberwhich are used to create the :ref:`final image <Device Binary Wrapping>`.
13728ab5944SJoseph Huber
1385c9ee351SJoseph HuberThis information is used by the device compilation stage to determine which
1395c9ee351SJoseph Hubersymbols need to be exported from the device. We use the ``omp_offload.info``
1405c9ee351SJoseph Hubermetadata node to pass this information device compilation stage.
14128ab5944SJoseph Huber
14228ab5944SJoseph HuberAccessing Entries on the Device
14328ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14428ab5944SJoseph Huber
14528ab5944SJoseph HuberAccessing the entries in the device is done using the address field in the
14628ab5944SJoseph Huber:ref:`offload entry<table-tgt_offload_entry_structure>`. The runtime will set
14728ab5944SJoseph Huberthe address to the pointer associated with the device image during runtime
14828ab5944SJoseph Huberinitialization. This is used to call the corresponding kernel function when
14928ab5944SJoseph Huberentering a ``#pragma omp target`` region. For variables, the runtime maintains a
15028ab5944SJoseph Hubertable mapping host pointers to device pointers. Global variables inside a
1515c9ee351SJoseph Huber``#pragma omp target declare`` directive are first initialized to the host's
15228ab5944SJoseph Huberaddress. Once the device address is initialized we insert it into the table to
15328ab5944SJoseph Hubermap the host address to the device address.
15428ab5944SJoseph Huber
15528ab5944SJoseph HuberDebugging Information
15628ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^
15728ab5944SJoseph Huber
15828ab5944SJoseph HuberWe generate structures to hold debugging information that is passed to
15928ab5944SJoseph Huber``libomptarget``. This allows the front-end to generate information the runtime
16028ab5944SJoseph Huberlibrary uses for more informative error messages. This is done using the
16128ab5944SJoseph Huberstandard :ref:`identifier structure <table-ident_t_structure>` used in
16228ab5944SJoseph Huber``libomp`` and ``libomptarget``. This is used to pass information and source
16328ab5944SJoseph Huberlocations to the runtime.
16428ab5944SJoseph Huber
16528ab5944SJoseph Huber  .. table:: ident_t Structure
16628ab5944SJoseph Huber    :name: table-ident_t_structure
16728ab5944SJoseph Huber
16828ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
16928ab5944SJoseph Huber    |   Type  | Identifier | Description                                                                 |
17028ab5944SJoseph Huber    +=========+============+=============================================================================+
17128ab5944SJoseph Huber    | int32_t |  reserved  | Reserved, to be used by the runtime library.                                |
17228ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
17328ab5944SJoseph Huber    | int32_t |   flags    | Flags used to indicate some features, mostly unused.                        |
17428ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
17528ab5944SJoseph Huber    | int32_t |  reserved  | Reserved, to be used by the runtime library.                                |
17628ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
17728ab5944SJoseph Huber    | int32_t |  reserved  | Reserved, to be used by the runtime library.                                |
17828ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
17928ab5944SJoseph Huber    |  char*  |  psource   | Program source information, stored as ";filename;function;line;column;;\\0" |
18028ab5944SJoseph Huber    +---------+------------+-----------------------------------------------------------------------------+
18128ab5944SJoseph Huber
18228ab5944SJoseph HuberIf debugging information is enabled, we will also create strings to indicate the
18328ab5944SJoseph Hubernames and declarations of variables mapped in target regions. These have the
18428ab5944SJoseph Hubersame format as the source location in the :ref:`identifier structure
1855c9ee351SJoseph Huber<table-ident_t_structure>`, but the function name is replaced with the variable
1865c9ee351SJoseph Hubername.
18728ab5944SJoseph Huber
18828ab5944SJoseph Huber.. _Device Compilation:
18928ab5944SJoseph Huber
19028ab5944SJoseph HuberOffload Device Compilation
19128ab5944SJoseph Huber--------------------------
19228ab5944SJoseph Huber
19328ab5944SJoseph HuberThe input file is compiled for each active device toolchain. The device
19428ab5944SJoseph Hubercompilation stage is performed differently from the host stage. Namely, we do
19528ab5944SJoseph Hubernot generate any offloading entries. This is set by passing the
19663ca93c7SSergio Afonso``-fopenmp-is-target-device`` flag to the front-end. We use the host bitcode to
19728ab5944SJoseph Huberdetermine which symbols to export from the device. The bitcode file is passed in
19828ab5944SJoseph Huberfrom the previous stage using the ``-fopenmp-host-ir-file-path`` flag.
19928ab5944SJoseph HuberCompilation is otherwise performed as it would be for any other target triple.
20028ab5944SJoseph Huber
20128ab5944SJoseph HuberWhen compiling for the OpenMP device, we set the visibility of all device
20228ab5944SJoseph Hubersymbols to be ``protected`` by default. This improves performance and prevents a
20328ab5944SJoseph Huberclass of errors where a symbol in the target device could preempt a host
20428ab5944SJoseph Huberlibrary.
20528ab5944SJoseph Huber
20628ab5944SJoseph HuberThe OpenMP runtime library is linked in during compilation to provide the
20728ab5944SJoseph Huberimplementations for standard OpenMP functionality. For GPU targets this is done
20828ab5944SJoseph Huberby linking in a special bitcode library during compilation, (e.g.
20928ab5944SJoseph Huber``libomptarget-nvptx64-sm_70.bc``) using the ``-mlink-builtin-bitcode`` flag.
21028ab5944SJoseph HuberOther device libraries, such as CUDA's libdevice, are also linked this way. If
21128ab5944SJoseph Huberthe target is a standard architecture with an existing ``libomp``
21228ab5944SJoseph Huberimplementation, that will be linked instead. Finally, device tools are used to
21328ab5944SJoseph Hubercreate a relocatable device object file that can be embedded in the host.
21428ab5944SJoseph Huber
21528ab5944SJoseph Huber.. _Creating Fat Objects:
21628ab5944SJoseph Huber
21728ab5944SJoseph HuberCreating Fat Objects
21828ab5944SJoseph Huber--------------------
21928ab5944SJoseph Huber
22028ab5944SJoseph HuberA fat binary is a binary file that contains information intended for another
22128ab5944SJoseph Huberdevice. We create a fat object by embedding the output of the device compilation
22228ab5944SJoseph Huberstage into the host as a named section. The output from the device compilation
22328ab5944SJoseph Huberis passed to the host backend using the ``-fembed-offload-object`` flag. This
22415e62062SJoseph Huberembeds the device image into the ``.llvm.offloading`` section using a special
22515e62062SJoseph Huberbinary format that behaves like a string map. This binary format is used to
22615e62062SJoseph Huberbundle metadata about the image so the linker can associate the proper device
22715e62062SJoseph Huberlinking action with the image. Each device image will start with the magic bytes
22815e62062SJoseph Huber``0x10FF10AD``.
22928ab5944SJoseph Huber
23028ab5944SJoseph Huber.. code-block:: llvm
23128ab5944SJoseph Huber
23215e62062SJoseph Huber  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
23328ab5944SJoseph Huber
23428ab5944SJoseph HuberThe device code will then be placed in the corresponding section one the backend
23528ab5944SJoseph Huberis run on the host, creating a fat object. Using fat objects allows us to treat
23628ab5944SJoseph Huberoffloading objects as standard host objects. The final object file should
23728ab5944SJoseph Hubercontain the following :ref:`offloading sections <table-offloading_sections>`. We
23828ab5944SJoseph Huberwill use this information when :ref:`Device Linking`.
23928ab5944SJoseph Huber
24028ab5944SJoseph Huber  .. table:: Offloading Sections
24128ab5944SJoseph Huber    :name: table-offloading_sections
24228ab5944SJoseph Huber
24316cea040SAaron Ballman    +----------------------------------+------------------------------------------------------------------------------+
24428ab5944SJoseph Huber    |             Section              | Description                                                                  |
24516cea040SAaron Ballman    +==================================+==============================================================================+
24616cea040SAaron Ballman    | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry_structure`)  |
24716cea040SAaron Ballman    +----------------------------------+------------------------------------------------------------------------------+
24815e62062SJoseph Huber    | .llvm.offloading                 | Embedded device object file for the target device and architecture           |
24916cea040SAaron Ballman    +----------------------------------+------------------------------------------------------------------------------+
25028ab5944SJoseph Huber
25128ab5944SJoseph Huber.. _Device Linking:
25228ab5944SJoseph Huber
25328ab5944SJoseph HuberLinking Target Device Code
25428ab5944SJoseph Huber--------------------------
25528ab5944SJoseph Huber
25628ab5944SJoseph HuberObjects containing :ref:`table-offloading_sections` require special handling to
25728ab5944SJoseph Hubercreate an executable device image. This is done using a Clang tool, see
25828ab5944SJoseph Huber:doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
25928ab5944SJoseph Huberover the host linking job. It scans the input object files for the offloading
26015e62062SJoseph Hubersection ``.llvm.offloading``. The device files stored in this section are then
26115e62062SJoseph Huberextracted and passed to the appropriate linking job. The linked device image is
26215e62062SJoseph Huberthen :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
26315e62062SJoseph Huberthe device image and link it with the host.
26428ab5944SJoseph Huber
26528ab5944SJoseph HuberThe linker wrapper tool supports linking bitcode files through link time
26628ab5944SJoseph Huberoptimization (LTO). This is used whenever the object files embedded in the host
26728ab5944SJoseph Hubercontain LLVM bitcode. Bitcode will be embedded for architectures that do not
26828ab5944SJoseph Hubersupport a relocatable object format, such as AMDGPU or SPIR-V, or if the user
2695c9ee351SJoseph Huberrequested it using the ``-foffload-lto`` flag.
27028ab5944SJoseph Huber
27128ab5944SJoseph Huber.. _Device Binary Wrapping:
27228ab5944SJoseph Huber
27328ab5944SJoseph HuberDevice Binary Wrapping
27428ab5944SJoseph Huber----------------------
27528ab5944SJoseph Huber
27628ab5944SJoseph HuberVarious structures and functions are used to create the information necessary to
27728ab5944SJoseph Huberoffload code on the device. We use the :ref:`linked device executable <Device
27828ab5944SJoseph HuberLinking>` with the corresponding offloading entries to create the symbols
27928ab5944SJoseph Hubernecessary to load and execute the device image.
28028ab5944SJoseph Huber
28128ab5944SJoseph HuberStructure Types
28228ab5944SJoseph Huber^^^^^^^^^^^^^^^
28328ab5944SJoseph Huber
28428ab5944SJoseph HuberSeveral different structures are used to store offloading information. The
28528ab5944SJoseph Huber:ref:`device image structure <table-device_image_structure>` stores a single
28628ab5944SJoseph Huberlinked device image and its associated offloading entries. The offloading
28728ab5944SJoseph Huberentries are stored using the ``__start_omp_offloading_entries`` and
28828ab5944SJoseph Huber``__stop_omp_offloading_entries`` symbols generated by the linker using the
28916cea040SAaron Ballman:ref:`table-tgt_offload_entry_structure`.
29028ab5944SJoseph Huber
29128ab5944SJoseph Huber  .. table:: __tgt_device_image Structure
29228ab5944SJoseph Huber    :name: table-device_image_structure
29328ab5944SJoseph Huber
29428ab5944SJoseph Huber    +----------------------+--------------+----------------------------------------+
29528ab5944SJoseph Huber    |         Type         |  Identifier  | Description                            |
29628ab5944SJoseph Huber    +======================+==============+========================================+
29728ab5944SJoseph Huber    |         void*        |  ImageStart  | Pointer to the target code start       |
29828ab5944SJoseph Huber    +----------------------+--------------+----------------------------------------+
29928ab5944SJoseph Huber    |         void*        |   ImageEnd   | Pointer to the target code end         |
30028ab5944SJoseph Huber    +----------------------+--------------+----------------------------------------+
30128ab5944SJoseph Huber    | __tgt_offload_entry* | EntriesBegin | Begin of table with all target entries |
30228ab5944SJoseph Huber    +----------------------+--------------+----------------------------------------+
30328ab5944SJoseph Huber    | __tgt_offload_entry* |  EntriesEnd  | End of table (non inclusive)           |
30428ab5944SJoseph Huber    +----------------------+--------------+----------------------------------------+
30528ab5944SJoseph Huber
30628ab5944SJoseph HuberThe target :ref:`target binary descriptor <table-target_binary_descriptor>` is
30728ab5944SJoseph Huberused to store all binary images and offloading entries in an array.
30828ab5944SJoseph Huber
30928ab5944SJoseph Huber  .. table:: __tgt_bin_desc Structure
31028ab5944SJoseph Huber    :name: table-target_binary_descriptor
31128ab5944SJoseph Huber
31228ab5944SJoseph Huber    +----------------------+------------------+------------------------------------------+
31328ab5944SJoseph Huber    |         Type         |    Identifier    | Description                              |
31428ab5944SJoseph Huber    +======================+==================+==========================================+
31528ab5944SJoseph Huber    |        int32_t       |  NumDeviceImages | Number of device types supported         |
31628ab5944SJoseph Huber    +----------------------+------------------+------------------------------------------+
31728ab5944SJoseph Huber    |  __tgt_device_image* |   DeviceImages   | Array of device images (1 per dev. type) |
31828ab5944SJoseph Huber    +----------------------+------------------+------------------------------------------+
31928ab5944SJoseph Huber    | __tgt_offload_entry* | HostEntriesBegin | Begin of table with all host entries     |
32028ab5944SJoseph Huber    +----------------------+------------------+------------------------------------------+
32128ab5944SJoseph Huber    | __tgt_offload_entry* |  HostEntriesEnd  | End of table (non inclusive)             |
32228ab5944SJoseph Huber    +----------------------+------------------+------------------------------------------+
32328ab5944SJoseph Huber
32428ab5944SJoseph HuberGlobal Variables
32528ab5944SJoseph Huber----------------
32628ab5944SJoseph Huber
32728ab5944SJoseph Huber:ref:`table-global_variables` lists various global variables, along with their
32828ab5944SJoseph Hubertype and their explicit ELF sections, which are used to store device images and
32928ab5944SJoseph Huberrelated symbols.
33028ab5944SJoseph Huber
33128ab5944SJoseph Huber  .. table:: Global Variables
33228ab5944SJoseph Huber    :name: table-global_variables
33328ab5944SJoseph Huber
33428ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
33528ab5944SJoseph Huber    |            Variable            |         Type        |       ELF Section       |                    Description                          |
33628ab5944SJoseph Huber    +================================+=====================+=========================+=========================================================+
33728ab5944SJoseph Huber    | __start_omp_offloading_entries | __tgt_offload_entry | .omp_offloading_entries | Begin symbol for the offload entries table.             |
33828ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
33928ab5944SJoseph Huber    | __stop_omp_offloading_entries  | __tgt_offload_entry | .omp_offloading_entries | End symbol for the offload entries table.               |
34028ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
34128ab5944SJoseph Huber    | __dummy.omp_offloading.entry   | __tgt_offload_entry | .omp_offloading_entries | Dummy zero-sized object in the offload entries          |
34228ab5944SJoseph Huber    |                                |                     |                         | section to force linker to define begin/end             |
34328ab5944SJoseph Huber    |                                |                     |                         | symbols defined above.                                  |
34428ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
34528ab5944SJoseph Huber    | .omp_offloading.device_image   |  __tgt_device_image | .omp_offloading_entries | ELF device code object of the first image.              |
34628ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
34728ab5944SJoseph Huber    | .omp_offloading.device_image.N |  __tgt_device_image | .omp_offloading_entries | ELF device code object of the (N+1)th image.            |
34828ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
34928ab5944SJoseph Huber    | .omp_offloading.device_images  |  __tgt_device_image | .omp_offloading_entries | Array of images.                                        |
35028ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
35128ab5944SJoseph Huber    | .omp_offloading.descriptor     | __tgt_bin_desc      | .omp_offloading_entries | Binary descriptor object (see :ref:`binary_descriptor`) |
35228ab5944SJoseph Huber    +--------------------------------+---------------------+-------------------------+---------------------------------------------------------+
35328ab5944SJoseph Huber
35428ab5944SJoseph Huber.. _binary_descriptor:
35528ab5944SJoseph Huber
35628ab5944SJoseph HuberBinary Descriptor for Device Images
35728ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
35828ab5944SJoseph Huber
35928ab5944SJoseph HuberThis object is passed to the offloading runtime at program startup and it
36028ab5944SJoseph Huberdescribes all device images available in the executable or shared library. It
36128ab5944SJoseph Huberis defined as follows:
36228ab5944SJoseph Huber
36328ab5944SJoseph Huber.. code-block:: c
36428ab5944SJoseph Huber
36528ab5944SJoseph Huber  __attribute__((visibility("hidden")))
36628ab5944SJoseph Huber  extern __tgt_offload_entry *__start_omp_offloading_entries;
36728ab5944SJoseph Huber  __attribute__((visibility("hidden")))
36828ab5944SJoseph Huber  extern __tgt_offload_entry *__stop_omp_offloading_entries;
36928ab5944SJoseph Huber  static const char Image0[] = { <Bufs.front() contents> };
37028ab5944SJoseph Huber  ...
37128ab5944SJoseph Huber  static const char ImageN[] = { <Bufs.back() contents> };
37228ab5944SJoseph Huber  static const __tgt_device_image Images[] = {
37328ab5944SJoseph Huber    {
37428ab5944SJoseph Huber      Image0,                            /*ImageStart*/
37528ab5944SJoseph Huber      Image0 + sizeof(Image0),           /*ImageEnd*/
37628ab5944SJoseph Huber      __start_omp_offloading_entries,    /*EntriesBegin*/
37728ab5944SJoseph Huber      __stop_omp_offloading_entries      /*EntriesEnd*/
37828ab5944SJoseph Huber    },
37928ab5944SJoseph Huber    ...
38028ab5944SJoseph Huber    {
38128ab5944SJoseph Huber      ImageN,                            /*ImageStart*/
38228ab5944SJoseph Huber      ImageN + sizeof(ImageN),           /*ImageEnd*/
38328ab5944SJoseph Huber      __start_omp_offloading_entries,    /*EntriesBegin*/
38428ab5944SJoseph Huber      __stop_omp_offloading_entries      /*EntriesEnd*/
38528ab5944SJoseph Huber    }
38628ab5944SJoseph Huber  };
38728ab5944SJoseph Huber  static const __tgt_bin_desc BinDesc = {
38828ab5944SJoseph Huber    sizeof(Images) / sizeof(Images[0]),  /*NumDeviceImages*/
38928ab5944SJoseph Huber    Images,                              /*DeviceImages*/
39028ab5944SJoseph Huber    __start_omp_offloading_entries,      /*HostEntriesBegin*/
39128ab5944SJoseph Huber    __stop_omp_offloading_entries        /*HostEntriesEnd*/
39228ab5944SJoseph Huber  };
39328ab5944SJoseph Huber
3945c9ee351SJoseph Huber
39528ab5944SJoseph HuberGlobal Constructor and Destructor
39628ab5944SJoseph Huber^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39728ab5944SJoseph Huber
39828ab5944SJoseph HuberThe global constructor (``.omp_offloading.descriptor_reg()``) registers the
39928ab5944SJoseph Huberdevice images with the runtime by calling the ``__tgt_register_lib()`` runtime
40028ab5944SJoseph Huberfunction. The constructor is explicitly defined in ``.text.startup`` section and
40128ab5944SJoseph Huberis run once when the program starts. Similarly, the global destructor
40228ab5944SJoseph Huber(``.omp_offloading.descriptor_unreg()``) calls ``__tgt_unregister_lib()`` for
40328ab5944SJoseph Huberthe destructor and is also defined in ``.text.startup`` section and run when the
40428ab5944SJoseph Huberprogram exits.
40528ab5944SJoseph Huber
40628ab5944SJoseph HuberOffloading Example
40728ab5944SJoseph Huber------------------
40828ab5944SJoseph Huber
40928ab5944SJoseph HuberThis section contains a simple example of generating offloading code using
41028ab5944SJoseph HuberOpenMP offloading. We will use a simple ``ZAXPY`` BLAS routine.
41128ab5944SJoseph Huber
41228ab5944SJoseph Huber.. code-block:: c++
41328ab5944SJoseph Huber
41428ab5944SJoseph Huber    #include <complex>
41528ab5944SJoseph Huber
41628ab5944SJoseph Huber    using complex = std::complex<double>;
41728ab5944SJoseph Huber
41828ab5944SJoseph Huber    void zaxpy(complex *X, complex *Y, complex D, std::size_t N) {
41928ab5944SJoseph Huber    #pragma omp target teams distribute parallel for
42028ab5944SJoseph Huber      for (std::size_t i = 0; i < N; ++i)
42128ab5944SJoseph Huber        Y[i] = D * X[i] + Y[i];
42228ab5944SJoseph Huber    }
42328ab5944SJoseph Huber
42428ab5944SJoseph Huber    int main() {
42528ab5944SJoseph Huber      const std::size_t N = 1024;
42628ab5944SJoseph Huber      complex X[N], Y[N], D;
42728ab5944SJoseph Huber    #pragma omp target data map(to:X[0 : N]) map(tofrom:Y[0 : N])
42828ab5944SJoseph Huber      zaxpy(X, Y, D, N);
42928ab5944SJoseph Huber    }
43028ab5944SJoseph Huber
43128ab5944SJoseph HuberThis code is compiled using the following Clang flags.
43228ab5944SJoseph Huber
43328ab5944SJoseph Huber.. code-block:: console
43428ab5944SJoseph Huber
43528ab5944SJoseph Huber    $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
43628ab5944SJoseph Huber
43715e62062SJoseph HuberThe output section in the object file can be seen using the ``readelf`` utility.
43815e62062SJoseph HuberThe ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
43915e62062SJoseph Huberremoved from the final executable or shared library by the linker.
44028ab5944SJoseph Huber
44128ab5944SJoseph Huber.. code-block:: text
44228ab5944SJoseph Huber
44328ab5944SJoseph Huber  $ llvm-readelf -WS zaxpy.o
44415e62062SJoseph Huber  Section Headers:
44515e62062SJoseph Huber  [Nr] Name                   Type     Address          Off    Size   ES Flg Lk Inf Al
44615e62062SJoseph Huber  [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00   A  0   0  1
44715e62062SJoseph Huber  [12] .llvm.offloading       PROGBITS 0000000000000000 000260 030950 00   E  0   0  8
44815e62062SJoseph Huber
44928ab5944SJoseph Huber
45028ab5944SJoseph HuberCompiling this file again will invoke the ``clang-linker-wrapper`` utility to
45128ab5944SJoseph Huberextract and link the device code stored at the section named
45215e62062SJoseph Huber``.llvm.offloading`` and then use entries stored in
45328ab5944SJoseph Huberthe section named ``omp_offloading_entries`` to create the symbols necessary for
45428ab5944SJoseph Huber``libomptarget`` to register the device image and call the entry function.
45528ab5944SJoseph Huber
45628ab5944SJoseph Huber.. code-block:: console
45728ab5944SJoseph Huber
45828ab5944SJoseph Huber    $ clang++ -fopenmp -fopenmp-targets=nvptx64 zaxpy.o -o zaxpy
45928ab5944SJoseph Huber    $ ./zaxpy
46028ab5944SJoseph Huber
46128ab5944SJoseph HuberWe can see the steps created by clang to generate the offloading code using the
46228ab5944SJoseph Huber``-ccc-print-phases`` option in Clang. This matches the description in
46328ab5944SJoseph Huber:ref:`Offloading Overview`.
46428ab5944SJoseph Huber
46528ab5944SJoseph Huber.. code-block:: console
46628ab5944SJoseph Huber
46728ab5944SJoseph Huber    $ clang++ -fopenmp -fopenmp-targets=nvptx64 -ccc-print-phases zaxpy.cpp
46828ab5944SJoseph Huber    # "x86_64-unknown-linux-gnu" - "clang", inputs: ["zaxpy.cpp"], output: "/tmp/zaxpy-host.bc"
46928ab5944SJoseph Huber    # "nvptx64-nvidia-cuda" - "clang", inputs: ["zaxpy.cpp", "/tmp/zaxpy-e6a41b.bc"], output: "/tmp/zaxpy-07f434.s"
47028ab5944SJoseph Huber    # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["/tmp/zaxpy-07f434.s"], output: "/tmp/zaxpy-0af7b7.o"
47128ab5944SJoseph Huber    # "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/zaxpy-e6a41b.bc", "/tmp/zaxpy-0af7b7.o"], output: "/tmp/zaxpy-416cad.o"
47228ab5944SJoseph Huber    # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/zaxpy-416cad.o"], output: "a.out"
473*5c840542SJoseph Huber
474*5c840542SJoseph HuberRelocatable Linking
475*5c840542SJoseph Huber-------------------
476*5c840542SJoseph Huber
477*5c840542SJoseph HuberThe offloading compilation pipeline normally will defer the final device linking
478*5c840542SJoseph Huberand runtime registration until the ``clang-linker-wrapper`` is run to create the
479*5c840542SJoseph Huberexecutable. This is the standard behaviour when compiling for OpenMP offloading
480*5c840542SJoseph Huberor CUDA and HIP in ``-fgpu-rdc`` mode. However, there are some cases where the
481*5c840542SJoseph Huberuser may wish to perform this device handling prematurely. This is described in
482*5c840542SJoseph Huberthe :doc:`linker wrapper documentation<ClangLinkerWrapper>`.
483*5c840542SJoseph Huber
484*5c840542SJoseph HuberEffectively, this allows the user to handle offloading specific linking ahead of
485*5c840542SJoseph Hubertime when shipping objects or static libraries. This can be thought of as
486*5c840542SJoseph Huberperforming a standard ``-fno-gpu-rdc`` compilation on a subset of object files.
487*5c840542SJoseph HuberThis can be useful to reduce link time, prevent users from interacting with the
488*5c840542SJoseph Huberlibrary's device code, or for shipping libraries to incompatible compilers.
489*5c840542SJoseph Huber
490*5c840542SJoseph HuberNormally, if a relocatable link is done using ``clang -r`` it will simply merge
491*5c840542SJoseph Huberthe ``.llvm.offloading`` sections which will then be linked later when the
492*5c840542SJoseph Huberexecutable is created. However, if the ``-r`` flag is used with the offloading
493*5c840542SJoseph Hubertoolchain, it will perform the device linking and registration phases and then
494*5c840542SJoseph Hubermerge the registration code into the final relocatable object file.
495*5c840542SJoseph Huber
496*5c840542SJoseph HuberThe following example shows how using the relocatable link with the offloading
497*5c840542SJoseph Huberpipeline can create a static library with offloading code that can be
498*5c840542SJoseph Huberredistributed without requiring any additional handling.
499*5c840542SJoseph Huber
500*5c840542SJoseph Huber.. code-block:: console
501*5c840542SJoseph Huber
502*5c840542SJoseph Huber    $ clang++ -fopenmp -fopenmp-targets=nvptx64 foo.cpp -c
503*5c840542SJoseph Huber    $ clang++ -lomptarget.devicertl --offload-link -r foo.o -o merged.o
504*5c840542SJoseph Huber    $ llvm-ar rcs libfoo.a merged.o
505*5c840542SJoseph Huber    # g++ app.cpp -L. -lfoo
506*5c840542SJoseph Huber
507