=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single-source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also be used to bundle just the offloaded code
objects, and then be embedded as data into the host code object. The host
compilation includes an ``init`` function that will use the runtime
corresponding to the offload kind (see :ref:`clang-offload-kind-table`) to load
the offload code objects appropriate to the devices present when the host
program is executed.

Supported File Formats
======================
Several text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported formats.

  .. table:: Supported File Formats
     :name: supported-file-formats-table

     +--------------------+----------------+-------------+
     | File Format        | File Extension | Text/Binary |
     +====================+================+=============+
     | CPP output         |        i       |     Text    |
     +--------------------+----------------+-------------+
     | C++ CPP output     |       ii       |     Text    |
     +--------------------+----------------+-------------+
     | CUDA/HIP output    |       cui      |     Text    |
     +--------------------+----------------+-------------+
     | Dependency         |        d       |     Text    |
     +--------------------+----------------+-------------+
     | LLVM               |       ll       |     Text    |
     +--------------------+----------------+-------------+
     | LLVM Bitcode       |       bc       |    Binary   |
     +--------------------+----------------+-------------+
     | Assembler          |        s       |     Text    |
     +--------------------+----------------+-------------+
     | Object             |        o       |    Binary   |
     +--------------------+----------------+-------------+
     | Archive of objects |        a       |    Binary   |
     +--------------------+----------------+-------------+
     | Precompiled header |       gch      |    Binary   |
     +--------------------+----------------+-------------+
     | Clang AST file     |       ast      |    Binary   |
     +--------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

The format of the bundled files is currently very simple: text formats are
concatenated, with each bundle enclosed between a start comment and an end
comment that carry a magic string and the bundle entry ID.

::

  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ 1st Bundle Entry ID"
  Bundle 1
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ 1st Bundle Entry ID"
  ...
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ Nth Bundle Entry ID"
  Bundle N
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ Nth Bundle Entry ID"

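
The text layout can be sketched in Python as follows. This is a minimal
illustration, not the tool's implementation: ``MAGIC`` is a placeholder for the
actual ``OFFLOAD_BUNDLER_MAGIC_STR`` value, and the function name
``bundle_text`` and the default comment leader ``"; "`` are assumptions made
for the example (the real comment syntax depends on the file type being
bundled).

.. code-block:: python

  # Placeholder for the real magic string, which is not spelled out here.
  MAGIC = "OFFLOAD_BUNDLER_MAGIC_STR"


  def bundle_text(entries, comment="; "):
      """Concatenate (bundle_entry_id, text) pairs with START/END comments.

      `comment` is the comment leader of the bundled file type; the default
      is an assumption for illustration only.
      """
      parts = []
      for entry_id, text in entries:
          parts.append(f"{comment}{MAGIC}__START__ {entry_id}\n")
          # Ensure the bundle body ends with a newline so that the END
          # marker starts on its own line.
          parts.append(text if text.endswith("\n") else text + "\n")
          parts.append(f"{comment}{MAGIC}__END__ {entry_id}\n")
      return "".join(parts)

Each entry contributes a START comment, the bundle text, and an END comment
that repeats the same bundle entry ID.
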

.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

  .. table:: Bundled Code Object Layout
    :name: bundled-code-object-layout-table

    =================================== ======= ================ ===============================
    Field                               Type    Size in Bytes    Description
    =================================== ======= ================ ===============================
    Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
    Number Of Bundle Entries            integer 8                Number of bundle entries.
    1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                 bundled code object to 1st code
                                                                 object.
    1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
    1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                 entry ID of 1st code object.
    1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                                ID Length        object. This is not NUL
                                                                 terminated. See
                                                                 :ref:`clang-bundle-entry-id`.
    \...
    Nth Bundle Entry Code Object Offset integer 8
    Nth Bundle Entry Code Object Size   integer 8
    Nth Bundle Entry ID Length          integer 8
    Nth Bundle Entry ID                 string  Nth Bundle Entry
                                                ID Length
    1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                                Code Object Size
    \...
    Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                                Code Object Size
    =================================== ======= ================ ===============================

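
As a rough sketch of how the fields in the table fit together, the following
Python code builds and parses a bundled code object. Two details are
assumptions that the table does not state: the 8-byte integers are treated as
little-endian, and the writer places code objects back to back with no
alignment padding. The function names ``pack_bundle`` and ``unpack_bundle``
are hypothetical.

.. code-block:: python

  import struct

  MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24-byte magic string from the table


  def pack_bundle(entries):
      """Build a bundled code object from (bundle_entry_id, bytes) pairs."""
      ids = [entry_id.encode("ascii") for entry_id, _ in entries]
      # Header: magic, entry count, then (offset, size, ID length, ID) per entry.
      header_size = len(MAGIC) + 8 + sum(24 + len(raw_id) for raw_id in ids)
      header = bytearray(MAGIC) + struct.pack("<Q", len(entries))
      offset = header_size  # assumption: code objects follow without padding
      for raw_id, (_, code) in zip(ids, entries):
          header += struct.pack("<QQQ", offset, len(code), len(raw_id)) + raw_id
          offset += len(code)
      return bytes(header) + b"".join(code for _, code in entries)


  def unpack_bundle(blob):
      """Return the (bundle_entry_id, bytes) pairs stored in `blob`."""
      if blob[:len(MAGIC)] != MAGIC:
          raise ValueError("missing bundle magic string")
      pos = len(MAGIC)
      (count,) = struct.unpack_from("<Q", blob, pos)
      pos += 8
      entries = []
      for _ in range(count):
          offset, size, id_len = struct.unpack_from("<QQQ", blob, pos)
          pos += 24
          # The ID is not NUL terminated; its length field delimits it.
          entry_id = blob[pos:pos + id_len].decode("ascii")
          pos += id_len
          entries.append((entry_id, blob[offset:offset + size]))
      return entries

Because the reader locates each code object through its recorded offset and
size, it does not depend on how the writer chose to place the code objects
after the header.
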

.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see
:ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

Bundle entry ID syntax is defined by the following BNF syntax:

.. code::

  <bundle-entry-id> ::= <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
      :name: clang-offload-kind-table

      ============= ==============================================================
      Offload Kind  Description
      ============= ==============================================================
      host          Host code object. ``clang-offload-bundler`` always includes
                    this entry as the first bundled code object entry. For an
                    embedded bundled code object this entry is not used by the
                    runtime and so is generally an empty code object.

      hip           Offload code object for the HIP language. Used for all
                    HIP language offload code objects when the
                    ``clang-offload-bundler`` is used to bundle code objects as
                    intermediate steps of the tool chain. Also used for AMD GPU
                    code objects before ABI version V4 when the
                    ``clang-offload-bundler`` is used to create a *fat binary*
                    to be loaded by the HIP runtime. The fat binary can be
                    loaded directly from a file, or be embedded in the host code
                    object as a data section with the name ``.hip_fatbin``.

      hipv4         Offload code object for the HIP language. Used for AMD GPU
                    code objects with at least ABI version V4 when the
                    ``clang-offload-bundler`` is used to create a *fat binary*
                    to be loaded by the HIP runtime. The fat binary can be
                    loaded directly from a file, or be embedded in the host code
                    object as a data section with the name ``.hip_fatbin``.

      openmp        Offload code object for the OpenMP language extension.
      ============= ==============================================================

**target-triple**
  The target triple of the code object.

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.

Each entry of a bundled code object must have a different bundle entry ID. There
can be multiple entries for the same processor provided they differ in target
feature settings. If there is an entry with a target feature specified as *Any*,
then all entries must specify that target feature as *Any* for the same
processor. There may be additional target specific restrictions.

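
The bundle entry ID grammar and the uniqueness rule can be illustrated with a
short Python sketch. The helper names ``make_bundle_entry_id`` and
``check_unique`` are hypothetical, the target triple is used exactly as passed
in, and the set of offload kinds is copied from the offload kind table. The
sketch does not model the *Any* restriction or any target specific
restrictions.

.. code-block:: python

  OFFLOAD_KINDS = {"host", "hip", "hipv4", "openmp"}  # from the offload kind table


  def make_bundle_entry_id(offload_kind, target_triple, target_id=None):
      """Compose <offload-kind> "-" <target-triple> [ "-" <target-id> ]."""
      if offload_kind not in OFFLOAD_KINDS:
          raise ValueError(f"unknown offload kind: {offload_kind!r}")
      entry_id = f"{offload_kind}-{target_triple}"
      if target_id is not None:
          # The target ID is optional and only present for targets that
          # support one.
          entry_id += f"-{target_id}"
      return entry_id


  def check_unique(entry_ids):
      """Raise if two entries of one bundle share a bundle entry ID."""
      seen = set()
      for entry_id in entry_ids:
          if entry_id in seen:
              raise ValueError(f"duplicate bundle entry ID: {entry_id}")
          seen.add(entry_id)

Composing an ID is unambiguous, whereas splitting one apart again requires
knowing where the target triple ends, since the triple itself contains ``-``
characters.
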

.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and optionally its configuration,
expressed by a set of target features, that affect ISA generation. Whether a
target ID is supported, or whether the target triple alone is sufficient to
specify the ISA generation, is target specific.

It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

Target ID syntax is defined by the following BNF syntax:

.. code::

  <target-id> ::= <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  Is the target-specific processor name or any alternative processor name.

**target-feature**
  Is a target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID that leaves a target feature at
    this default value can be loaded and executed on a processor
    configured with the target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.

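
The loading rules for the three target feature values can be expressed as a
small check. The following Python sketch is illustrative only: the function
name ``can_load`` and the dictionary representation are assumptions, and
treating a feature that is missing from the processor configuration as a
mismatch is a choice made for this example.

.. code-block:: python

  def can_load(code_object_features, processor_features):
      """Check whether a code object can be loaded on a configured processor.

      code_object_features: dict of feature name -> True (On) or False (Off);
          a feature that is left out has the value Any.
      processor_features: dict of feature name -> True or False describing
          how the processor is actually configured.
      """
      for name, required in code_object_features.items():
          # On and Off must match the processor configuration exactly.
          if processor_features.get(name) != required:
              return False
      # Features with the value Any impose no constraint.
      return True

For example, a code object that omits a feature loads regardless of how the
processor has that feature configured, while one compiled with the feature off
is rejected on a processor that has it on.
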

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience. It allows both the primary and alternative processor name
  to be used and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary processor
  name is used, never an alternative one, and the target features are specified
  in alphabetical order. Command line tools convert non-canonical form to
  canonical form.

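
The conversion from non-canonical to canonical form can be sketched as follows.
This Python example is a simplified illustration rather than the conversion the
tools perform: ``canonicalize_target_id`` is a hypothetical name, and the
``aliases`` mapping from alternative to primary processor names must be
supplied by the caller because that mapping is target specific.

.. code-block:: python

  def canonicalize_target_id(target_id, aliases=None):
      """Convert a target ID to canonical form.

      aliases: optional dict mapping alternative processor names to primary
          names (hypothetical; the real mapping is target specific).
      """
      processor, *settings = target_id.split(":")
      if not processor:
          raise ValueError("target ID must start with a processor name")
      processor = (aliases or {}).get(processor, processor)
      features = {}
      for setting in settings:
          name, sign = setting[:-1], setting[-1:]
          if not name or sign not in ("+", "-"):
              raise ValueError(f"malformed target feature: {setting!r}")
          if name in features:
              # Each target feature may appear at most once.
              raise ValueError(f"target feature given twice: {name!r}")
          features[name] = sign
      # Canonical form lists the target features in alphabetical order.
      ordered = [name + features[name] for name in sorted(features)]
      return ":".join([processor] + ordered)

A target ID without target features is returned unchanged apart from the
processor name substitution.
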

Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target ID and target features. See `User Guide for AMDGPU Backend
  <https://llvm.org/docs/AMDGPUUsage.html>`_ which defines the `processors
  <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and `target
  features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_
  supported.

Most other targets do not support target IDs.

Archive Unbundling
==================

Unbundling a heterogeneous device archive creates device-specific archives. A
heterogeneous device archive is in a format compatible with the GNU ar utility
and contains a collection of bundled device binaries, where each bundle file
contains device binaries for a host and one or more targets. The output
device-specific archive is also in a format compatible with the GNU ar utility
and contains a collection of device binaries for a specific target.

.. code::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...,
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

``clang-offload-bundler`` extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive and creates
a target-specific device archive without bundling.

``clang-offload-bundler`` determines whether a device binary is compatible with
a target by comparing bundle IDs. Two bundle IDs are considered compatible if:

  * Their offload kinds are the same
  * Their target triples are the same
  * Their GPUArch values are the same

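
The compatibility test and the selection of device binaries for one target can
be sketched in Python. This is only a model of the description above: the
``BundleID`` class, the in-memory dictionary standing in for an archive, and
the function names are all assumptions for illustration, and bundle IDs are
taken as already split into their three components.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass(frozen=True)
  class BundleID:
      offload_kind: str
      target_triple: str
      gpu_arch: str  # the GPUArch component; empty if the ID has none


  def is_compatible(a, b):
      """Apply the three compatibility rules to two bundle IDs."""
      return (a.offload_kind == b.offload_kind
              and a.target_triple == b.target_triple
              and a.gpu_arch == b.gpu_arch)


  def select_for_target(hda, target):
      """Collect the device binaries of `target` from a heterogeneous archive.

      hda: dict mapping bundle file name -> list of (BundleID, device_binary).
      Returns a list of (bundle file name, device_binary) pairs, which is the
      content a device specific archive for `target` would hold.
      """
      selected = []
      for file_name, bundle in hda.items():
          for bundle_id, binary in bundle:
              if is_compatible(bundle_id, target):
                  selected.append((file_name, binary))
      return selected

Host entries and entries for other targets are skipped, so the result contains
unbundled device binaries for the requested target only.
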