=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also be used to bundle just the offloaded code
objects and then be embedded as data into the host code object. The host
compilation includes an ``init`` function that will use the runtime
corresponding to the offload kind (see :ref:`clang-offload-kind-table`) to load
the offload code objects appropriate to the devices present when the host
program is executed.

:program:`clang-offload-bundler` is located in
``clang/tools/clang-offload-bundler``.

.. code-block:: console

  $ clang-offload-bundler -help
  OVERVIEW: A tool to bundle several input files of the specified type <type>
  referring to the same source file but different targets into a single
  one. The resulting file can also be unbundled into different files by
  this tool if -unbundle is provided.

  USAGE: clang-offload-bundler [options]

  OPTIONS:

  Generic Options:

    --help                  - Display available options (--help-hidden for more)
    --help-list             - Display list of available options (--help-list-hidden for more)
    --version               - Display the version of this program

  clang-offload-bundler options:

    --###                   - Print any external commands that are to be executed instead of actually executing them - for testing purposes.
    --allow-missing-bundles - Create empty files if bundles are missing when unbundling.
    --bundle-align=<uint>   - Alignment of bundle for binary files
    --check-input-archive   - Check if input heterogeneous archive is valid in terms of TargetID rules.
    --inputs=<string>       - [<input file>,...]
    --list                  - List bundle IDs in the bundled file.
    --outputs=<string>      - [<output file>,...]
    --targets=<string>      - [<offload kind>-<target triple>,...]
    --type=<string>         - Type of the files to be bundled/unbundled.
                              Current supported types are:
                                i   - cpp-output
                                ii  - c++-cpp-output
                                cui - cuda/hip-output
                                d   - dependency
                                ll  - llvm
                                bc  - llvm-bc
                                s   - assembler
                                o   - object
                                a   - archive of bundled files
                                gch - precompiled-header
                                ast - clang AST file
    --unbundle              - Unbundle bundled file into several output files.

Usage
=====

This tool can be used as follows for bundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -inputs=a.triple1.ii,a.triple2.ii -outputs=a.ii

or, it can be used as follows for unbundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -outputs=a.triple1.ii,a.triple2.ii -inputs=a.ii -unbundle


Supported File Formats
======================

Multiple text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported input
formats. Use the ``File Type`` column to determine the value to pass to the
``--type`` option based on the type of input files while bundling/unbundling.

  .. table:: Supported File Formats
     :name: supported-file-formats-table

     +--------------------------+----------------+-------------+
     | File Format              | File Type      | Text/Binary |
     +==========================+================+=============+
     | CPP output               |        i       |     Text    |
     +--------------------------+----------------+-------------+
     | C++ CPP output           |       ii       |     Text    |
     +--------------------------+----------------+-------------+
     | CUDA/HIP output          |       cui      |     Text    |
     +--------------------------+----------------+-------------+
     | Dependency               |        d       |     Text    |
     +--------------------------+----------------+-------------+
     | LLVM                     |       ll       |     Text    |
     +--------------------------+----------------+-------------+
     | LLVM Bitcode             |       bc       |    Binary   |
     +--------------------------+----------------+-------------+
     | Assembler                |        s       |     Text    |
     +--------------------------+----------------+-------------+
     | Object                   |        o       |    Binary   |
     +--------------------------+----------------+-------------+
     | Archive of bundled files |        a       |    Binary   |
     +--------------------------+----------------+-------------+
     | Precompiled header       |       gch      |    Binary   |
     +--------------------------+----------------+-------------+
     | Clang AST file           |       ast      |    Binary   |
     +--------------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

Text files are bundled by concatenating them, with each one enclosed between
comment lines that contain a magic string and the bundle entry ID. The BNF
syntax to represent a code object bundle file is:

::

  <file>    ::== <bundle> | <bundle> <file>
  <bundle>  ::== <comment> <start> <bundle_id> <eol> <bundle> <eol>
                 <comment> <end> <bundle_id> <eol>
  <start>   ::== OFFLOAD_BUNDLER_MAGIC_STR__START__
  <end>     ::== OFFLOAD_BUNDLER_MAGIC_STR__END__

**comment**
  The symbol that starts a single-line comment in the file type of the
  constituent bundles. For example, it is ``;`` for the ``ll`` ``File Type``
  and ``#`` for the ``s`` ``File Type``.

**bundle_id**
  The :ref:`clang-bundle-entry-id` for the enclosing bundle.

**eol**
  The end of line character.

**bundle**
  The code object stored in one of the supported text file formats.

**OFFLOAD_BUNDLER_MAGIC_STR__**
  The magic string that marks the presence of offloading data, i.e.
  ``__CLANG_OFFLOAD_BUNDLE__``.

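The following Python sketch illustrates the text layout by writing and splitting a text bundle. It assumes that each marker occupies its own line in the form ``<comment> __CLANG_OFFLOAD_BUNDLE____START__ <bundle_id>`` (or ``__END__``) with single spaces as separators, and that each code object is separated from its end marker by exactly one newline. The helper names ``join_text_bundle`` and ``split_text_bundle`` are hypothetical and not part of the bundler.

.. code-block:: python

  import re

  # Assumption: marker lines look like "<comment> <magic>__START__ <bundle_id>"
  # and "<comment> <magic>__END__ <bundle_id>", separated by single spaces.
  MAGIC = "__CLANG_OFFLOAD_BUNDLE__"


  def join_text_bundle(entries: dict[str, str], comment: str) -> str:
      """Concatenate text code objects, enclosing each in start/end comment lines."""
      parts = []
      for bundle_id, body in entries.items():
          parts.append(f"\n{comment} {MAGIC}__START__ {bundle_id}\n")
          parts.append(body)
          parts.append(f"\n{comment} {MAGIC}__END__ {bundle_id}\n")
      return "".join(parts)


  def split_text_bundle(text: str, comment: str) -> dict[str, str]:
      """Map each bundle entry ID to the text between its start and end markers."""
      start_re = re.compile(
          rf"^{re.escape(comment)} {MAGIC}__START__ (\S+)$", re.MULTILINE)
      bundles: dict[str, str] = {}
      pos = 0
      while (match := start_re.search(text, pos)) is not None:
          bundle_id = match.group(1)
          end_marker = f"\n{comment} {MAGIC}__END__ {bundle_id}"
          # The body starts after the newline that terminates the start marker line.
          body_start = match.end() + 1
          end = text.find(end_marker, match.end())
          if end == -1:
              raise ValueError(f"missing end marker for {bundle_id}")
          # The newline in front of the end marker is a separator, not body text.
          bundles[bundle_id] = text[body_start:end]
          pos = end + len(end_marker)
      return bundles

Because the sketch excludes the separator newline from each body, splitting the output of ``join_text_bundle`` returns the original entries unchanged.
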
.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

  .. table:: Bundled Code Object Layout
    :name: bundled-code-object-layout-table

    =================================== ======= ================ ===============================
    Field                               Type    Size in Bytes    Description
    =================================== ======= ================ ===============================
    Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
    Number Of Bundle Entries            integer 8                Number of bundle entries.
    1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                 bundled code object to 1st code
                                                                 object.
    1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
    1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                 entry ID of 1st code object.
    1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                                ID Length        object. This is not NUL
                                                                 terminated. See
                                                                 :ref:`clang-bundle-entry-id`.
    \...
    Nth Bundle Entry Code Object Offset integer 8
    Nth Bundle Entry Code Object Size   integer 8
    Nth Bundle Entry ID Length          integer 8
    Nth Bundle Entry ID                 string  Nth Bundle Entry
                                                ID Length
    1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                                Code Object Size
    \...
    Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                                Code Object Size
    =================================== ======= ================ ===============================

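A minimal Python sketch of the binary layout, for illustration only. It assumes that the 8-byte integers are stored little-endian and writes the code objects back to back with no alignment padding; the tool's ``--bundle-align`` option may insert padding, which a reader that follows the recorded offsets tolerates. ``write_binary_bundle`` and ``read_binary_bundle`` are hypothetical names.

.. code-block:: python

  import struct

  MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24 bytes, no NUL terminator


  def write_binary_bundle(entries: dict[str, bytes]) -> bytes:
      """Serialize {bundle entry ID: code object} following the layout table."""
      ids = [bundle_id.encode() for bundle_id in entries]
      # Header: magic, entry count, then offset/size/ID length (3 x 8 bytes) and ID per entry.
      offset = len(MAGIC) + 8 + sum(24 + len(bid) for bid in ids)
      header = bytearray(MAGIC) + struct.pack("<Q", len(entries))
      for bid, code in zip(ids, entries.values()):
          header += struct.pack("<QQQ", offset, len(code), len(bid)) + bid
          offset += len(code)  # assumption: no alignment padding between objects
      return bytes(header) + b"".join(entries.values())


  def read_binary_bundle(data: bytes) -> dict[str, bytes]:
      """Parse the header and slice out each code object by its recorded offset."""
      if not data.startswith(MAGIC):
          raise ValueError("not a bundled code object")
      pos = len(MAGIC)
      (count,) = struct.unpack_from("<Q", data, pos)
      pos += 8
      entries = {}
      for _ in range(count):
          offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
          pos += 24
          bundle_id = data[pos:pos + id_len].decode()  # not NUL terminated
          pos += id_len
          entries[bundle_id] = data[offset:offset + size]
      return entries

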
.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see :ref:`clang-bundled-code-object-layout-text`
and :ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

Bundle entry ID syntax is defined by the following BNF syntax:

.. code::

  <bundle-entry-id> ::== <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
      :name: clang-offload-kind-table

      ============= ==============================================================
      Offload Kind  Description
      ============= ==============================================================
      host          Host code object. ``clang-offload-bundler`` always includes
                    this entry as the first bundled code object entry. For an
                    embedded bundled code object this entry is not used by the
                    runtime and so is generally an empty code object.

      hip           Offload code object for the HIP language. Used for all
                    HIP language offload code objects when the
                    ``clang-offload-bundler`` is used to bundle code objects as
                    intermediate steps of the tool chain. Also used for AMD GPU
                    code objects before ABI version V4 when the
                    ``clang-offload-bundler`` is used to create a *fat binary*
                    to be loaded by the HIP runtime. The fat binary can be
                    loaded directly from a file, or be embedded in the host code
                    object as a data section with the name ``.hip_fatbin``.

      hipv4         Offload code object for the HIP language. Used for AMD GPU
                    code objects with ABI version V4 or later when the
                    ``clang-offload-bundler`` is used to create a *fat binary*
                    to be loaded by the HIP runtime. The fat binary can be
                    loaded directly from a file, or be embedded in the host code
                    object as a data section with the name ``.hip_fatbin``.

      openmp        Offload code object for the OpenMP language extension.
      ============= ==============================================================

Note: The distinction between the ``hip`` and ``hipv4`` offload kinds is
historical. The two kinds originally distinguished code object ABI versions.
Because the ABI version is now recorded in the code object itself, the
distinction is irrelevant during unbundling, and current implementations treat
``hip`` and ``hipv4`` as compatible, handling their code objects
interchangeably.

**target-triple**
    The target triple of the code object. See `Target Triple
    <https://clang.llvm.org/docs/CrossCompilation.html#target-triple>`_.

    The bundler accepts target triples with or without the optional environment
    field:

    ``<arch><sub>-<vendor>-<sys>``, or
    ``<arch><sub>-<vendor>-<sys>-<env>``

    However, to standardize output for tools that consume bitcode bundles,
    bundles written by the bundler internally use only the 4-field target
    triple:

    ``<arch><sub>-<vendor>-<sys>-<env>``

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.

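The sketch below splits a bundle entry ID into its three parts and pads a 3-field triple with an empty environment field, mirroring the 4-field normalization described above. Because both the environment field and the target ID follow a ``-``, the last field is ambiguous; this sketch resolves it with a caller-supplied set of processor names, which is an assumption of the example rather than a documented interface. ``parse_bundle_entry_id`` and ``format_bundle_entry_id`` are hypothetical helpers.

.. code-block:: python

  def parse_bundle_entry_id(entry_id: str, processors: set[str]) -> tuple[str, str, str]:
      """Split an entry ID into (offload kind, 4-field triple, target ID)."""
      head, colon, features = entry_id.partition(":")
      kind_triple, _, last = head.rpartition("-")
      if last in processors:
          # The last '-' field names a processor, so it starts the target ID.
          target_id = last + colon + features
      elif colon:
          raise ValueError(f"target features without a known processor: {entry_id!r}")
      else:
          kind_triple, target_id = head, ""
      kind, _, triple = kind_triple.partition("-")
      fields = triple.split("-")
      if len(fields) == 3:
          fields.append("")  # normalize to <arch><sub>-<vendor>-<sys>-<env>
      if len(fields) != 4:
          raise ValueError(f"unexpected target triple: {triple!r}")
      return kind, "-".join(fields), target_id


  def format_bundle_entry_id(kind: str, triple: str, target_id: str = "") -> str:
      """Inverse of the parser: <offload-kind>-<target-triple>[-<target-id>]."""
      return f"{kind}-{triple}" + (f"-{target_id}" if target_id else "")

Note that an empty environment field yields two consecutive dashes, as in ``hip-amdgcn-amd-amdhsa--gfx906``.
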
.. _code-object-composition:

Bundled Code Object Composition
-------------------------------

  * Each entry of a bundled code object must have a different bundle entry ID.
  * There can be multiple entries for the same processor provided they differ
    in target feature settings.
  * If there is an entry with a target feature specified as *Any*, then all
    entries must specify that target feature as *Any* for the same processor.

There may be additional target specific restrictions.

.. _compatibility-bundle-entry-id:

Compatibility Rules for Bundle Entry ID
---------------------------------------

  A code object, specified using its Bundle Entry ID, can be loaded and
  executed on a target processor if:

  * Their offload kinds are the same or compatible.
  * Their target triples are compatible.
  * Their Target IDs are compatible as defined in :ref:`compatibility-target-id`.

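The compatibility rules above can be sketched in Python as follows. Entries are represented as hypothetical ``(offload kind, target triple, target ID)`` tuples, ``hip`` and ``hipv4`` are treated as compatible offload kinds, and, as a simplification, target triples must match exactly. The target ID check follows :ref:`compatibility-target-id`: a feature omitted from the code object (*Any*) matches any setting, while a feature set *On* or *Off* must be set the same way in the target.

.. code-block:: python

  HIP_KINDS = {"hip", "hipv4"}


  def parse_target_id(target_id: str) -> tuple[str, dict[str, bool]]:
      """Split 'gfx906:sramecc+:xnack-' into ('gfx906', {'sramecc': True, 'xnack': False})."""
      processor, *settings = target_id.split(":")
      features = {}
      for setting in settings:
          name, sign = setting[:-1], setting[-1:]
          if sign not in ("+", "-") or not name or name in features:
              raise ValueError(f"malformed target ID: {target_id!r}")
          features[name] = sign == "+"
      return processor, features


  def target_ids_compatible(code_object_id: str, target_id: str) -> bool:
      obj_proc, obj_features = parse_target_id(code_object_id)
      tgt_proc, tgt_features = parse_target_id(target_id)
      if obj_proc != tgt_proc:
          return False
      # Features omitted by the code object (Any) are unconstrained; On/Off must match.
      return all(tgt_features.get(name) == on for name, on in obj_features.items())


  def entries_compatible(code_object, target) -> bool:
      """Each argument is an (offload kind, target triple, target ID) tuple."""
      obj_kind, obj_triple, obj_tid = code_object
      tgt_kind, tgt_triple, tgt_tid = target
      kinds_ok = obj_kind == tgt_kind or {obj_kind, tgt_kind} <= HIP_KINDS
      # Simplification: identical triples instead of a finer-grained triple check.
      if not kinds_ok or obj_triple != tgt_triple:
          return False
      if not obj_tid or not tgt_tid:
          return obj_tid == tgt_tid
      return target_ids_compatible(obj_tid, tgt_tid)

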
.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and, optionally, its
configuration, expressed by a set of target features that affect ISA
generation. Whether a target supports target IDs, or whether the target triple
alone is sufficient to specify ISA generation, is target specific.

It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

Target ID syntax is defined by the following BNF syntax:

.. code::

  <target-id> ::== <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  The target-specific processor name or any of its alternative processor
  names.

**target-feature**
  A target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID specifying the default
    value of a target feature can be loaded and executed on a processor
    configured with the target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.

.. _compatibility-target-id:

Compatibility Rules for Target ID
---------------------------------

  A code object compiled for a target ID is compatible with a target if:

  * Their processors are the same.
  * Their feature sets are compatible, as defined above.

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience. It allows both the primary and alternative processor
  names to be used, and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary processor
  name, not an alternative one, is used, and the target features are specified
  in alphabetical order. Command line tools convert the non-canonical form to
  the canonical form.

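A minimal sketch of converting a non-canonical target ID to canonical form, under two assumptions: alternative processor names are resolved through a hypothetical alias table (``PROCESSOR_ALIASES`` below holds placeholder names, not real processors), and features are sorted alphabetically by name. It also rejects features that lack a ``+`` or ``-`` suffix or that appear more than once.

.. code-block:: python

  # Hypothetical alias table mapping alternative processor names to primary
  # names; real tables are target specific.
  PROCESSOR_ALIASES = {"example-alt": "example-primary"}


  def canonicalize_target_id(target_id: str,
                             aliases: dict[str, str] = PROCESSOR_ALIASES) -> str:
      processor, *settings = target_id.split(":")
      processor = aliases.get(processor, processor)  # use the primary name
      seen = set()
      for setting in settings:
          name = setting[:-1]
          if setting[-1:] not in ("+", "-") or not name or name in seen:
              raise ValueError(f"malformed target ID: {target_id!r}")
          seen.add(name)
      # Canonical form: features in alphabetical order of their names.
      ordered = sorted(settings, key=lambda s: s[:-1])
      return ":".join([processor, *ordered])

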
Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target IDs and target features. See `User Guide for AMDGPU Backend
  <https://llvm.org/docs/AMDGPUUsage.html>`_, which defines the supported `processors
  <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and `target
  features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_.

Most other targets do not support target IDs.

Archive Unbundling
==================

Unbundling a heterogeneous device archive (HDA) creates device-specific
archives. An HDA is in a format compatible with the GNU ``ar`` utility and
contains a collection of bundled device binaries, where each bundle file
contains device binaries for a host and one or more targets. The output
device-specific archive is in a format compatible with the GNU ``ar`` utility
and contains a collection of device binaries for a specific target.

::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

``clang-offload-bundler`` extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive and creates
a target-specific device archive without bundling.

``clang-offload-bundler`` determines whether a device binary is compatible
with a target by comparing bundle IDs. Two bundle IDs are considered
compatible if:

  * Their offload kinds are the same
  * Their target triples are the same
  * Their Target IDs are the same

Creating a Heterogeneous Device Archive
---------------------------------------

1. Compile source file(s) to generate object file(s)

   ::

     clang -O2 -fopenmp \
       -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
       -c func_1.c -o func_1.o

     clang -O2 -fopenmp \
       -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
       -c func_2.c -o func_2.o

2. Create a heterogeneous device archive by combining all the object file(s)

   ::

     llvm-ar cr libFatArchive.a func_1.o func_2.o

Extracting a Device Specific Archive
------------------------------------

``UnbundleArchive`` takes as input a heterogeneous device archive file
(``.a``) containing bundled device binary files, together with a list of
offload targets (excluding the host), and extracts the device binaries into a
new archive file for each offload target. Each resulting archive file contains
all device binaries compatible with that particular offload target.
Compatibility between a device binary in the HDA and a target is based on the
compatibility between their bundle entry IDs as defined in
:ref:`compatibility-bundle-entry-id`.

The following cases may arise during compatibility testing:

* A binary is compatible with one or more targets: Insert the binary into the
  device-specific archive of each compatible target.
* A binary is not compatible with any target: Skip the binary.
* One or more binaries are compatible with a target: Insert all binaries into
  the device-specific archive of the target. The insertion need not be ordered.
* No binary is compatible with a target: If the ``allow-missing-bundles``
  option is present, create an empty archive for the target. Otherwise, produce
  an error without creating an archive.

The created archive file does not contain a symbol index. Device binary files
in it are named ``<Parent Bundle Name>-<DeviceBinary's TargetID>``, with ``:``
replaced by ``_``.

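The case analysis above can be sketched as the following Python function. It is not the bundler's implementation: archives are modeled as dictionaries, bundle entries as hypothetical ``(offload kind, target triple, target ID)`` tuples, and the compatibility test defaults to exact equality (the rule stated under Archive Unbundling); a looser predicate can be passed instead. The parent bundle name is assumed here to be the archive member name without its extension.

.. code-block:: python

  def extract_device_archives(hda, targets, compatible=lambda obj, tgt: obj == tgt,
                              allow_missing_bundles=False):
      """Return {target entry: {member name: device binary}}.

      hda: list of (archive member name, [(entry, device binary), ...]) pairs,
           where each entry is an (offload kind, target triple, target ID) tuple.
      targets: entries for the requested offload targets (not the host).
      """
      archives = {tgt: {} for tgt in targets}
      for member_name, bundle in hda:
          parent = member_name.rsplit(".", 1)[0]  # assumed: name without extension
          for entry, binary in bundle:
              if entry[0] == "host":
                  continue  # host entries never go into device archives
              # A binary compatible with several targets goes into each of their
              # archives; a binary compatible with none is skipped.
              for tgt in targets:
                  if compatible(entry, tgt):
                      name = f"{parent}-{entry[2]}".replace(":", "_")
                      archives[tgt][name] = binary
      missing = [tgt for tgt, members in archives.items() if not members]
      if missing and not allow_missing_bundles:
          # Fail without producing any archive, as described above.
          raise LookupError(f"no compatible device binary for {missing}")
      return archives

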
Usage
-----

::

  clang-offload-bundler --unbundle --inputs=libFatArchive.a --type=a \
    --targets=openmp-amdgcn-amd-amdhsa-gfx906:sramecc+:xnack+,openmp-amdgcn-amd-amdhsa-gfx908:sramecc-:xnack+ \
    --outputs=devicelib-gfx906.a,devicelib-gfx908.a

.. _additional-options-archive-unbundling:

Additional Options while Archive Unbundling
-------------------------------------------

**-allow-missing-bundles**
  Create an empty archive file if no compatible device binary is found.

**-check-input-archive**
  Check whether the input heterogeneous device archive follows the composition
  rules defined in :ref:`code-object-composition` before creating
  device-specific archive(s).

**-debug-only=CodeObjectCompatibility**
  Print verbose output for matched and unmatched comparisons between the bundle
  entry ID of a device binary from the HDA and the bundle entry ID of a given
  target processor (see :ref:`compatibility-bundle-entry-id`).

Compression and Decompression
=============================

``clang-offload-bundler`` provides features to compress and decompress the full
bundle, leveraging inherent redundancies within the bundle entries. Use the
``-compress`` command-line option to enable this compression capability.

The compressed offload bundle begins with a header followed by the compressed binary data:

- **Magic Number (4 bytes)**:
    This is a unique identifier to distinguish compressed offload bundles. The value is the string ``CCOB`` (Compressed Clang Offload Bundle).

- **Version Number (16-bit unsigned int)**:
    This denotes the version of the compressed offload bundle format. The current version is ``2``.

- **Compression Method (16-bit unsigned int)**:
    This field indicates the compression method used. The value corresponds to either ``zlib`` or ``zstd``, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.

- **Total File Size (32-bit unsigned int)**:
    This is the total size (in bytes) of the file, including the header. Available in version 2 and above.

- **Uncompressed Binary Size (32-bit unsigned int)**:
    This is the size (in bytes) of the binary data before it was compressed.

- **Hash (64-bit unsigned int)**:
    This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.

- **Compressed Data**:
    The actual compressed binary data follows the header. Its size is the total size of the file minus the header size.

Note: Version 3 of the format is under development. It uses 64-bit fields for
Total File Size and Uncompressed Binary Size to support files larger than 4 GB.
To experiment with version 3, set the environment variable
``COMPRESSED_BUNDLE_FORMAT_VERSION=3``. This support is experimental and not
recommended for production use.
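
The following Python sketch builds and checks a version 2 header as described above. It assumes little-endian fields, that the zlib method is encoded as ``0`` (a guess at the value in LLVM's compression enumeration), and that the truncated MD5 value is the first 8 digest bytes read as a little-endian integer; it handles zlib only. ``compress_bundle_v2`` and ``decompress_bundle_v2`` are hypothetical names, not the bundler's API.

.. code-block:: python

  import hashlib
  import struct
  import zlib

  CCOB_MAGIC = b"CCOB"
  # magic, version, method, total file size, uncompressed size, hash (24 bytes)
  V2_HEADER = struct.Struct("<4sHHIIQ")
  ZLIB_METHOD = 0  # assumption: value of the zlib entry in the compression enum


  def truncated_md5(data: bytes) -> int:
      # Assumption: low 64 bits, read little-endian from the first 8 digest bytes.
      return int.from_bytes(hashlib.md5(data).digest()[:8], "little")


  def compress_bundle_v2(bundle: bytes) -> bytes:
      payload = zlib.compress(bundle)
      total = V2_HEADER.size + len(payload)  # total file size includes the header
      header = V2_HEADER.pack(CCOB_MAGIC, 2, ZLIB_METHOD, total, len(bundle),
                              truncated_md5(bundle))
      return header + payload


  def decompress_bundle_v2(blob: bytes) -> bytes:
      magic, version, method, total, raw_size, digest = V2_HEADER.unpack_from(blob)
      if magic != CCOB_MAGIC or version != 2 or method != ZLIB_METHOD:
          raise ValueError("unsupported compressed bundle")
      # Compressed data size = total file size minus the header size.
      bundle = zlib.decompress(blob[V2_HEADER.size:total])
      if len(bundle) != raw_size or truncated_md5(bundle) != digest:
          raise ValueError("corrupted compressed bundle")
      return bundle
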