=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also be used to bundle just the offloaded code
objects, and embedded as data into the host code object. The host compilation
includes an ``init`` function that will use the runtime corresponding to the
offload kind (see :ref:`clang-offload-kind-table`) to load the offload code
objects appropriate to the devices present when the host program is executed.

:program:`clang-offload-bundler` is located in
``clang/tools/clang-offload-bundler``.

.. code-block:: console

  $ clang-offload-bundler -help
  OVERVIEW: A tool to bundle several input files of the specified type <type>
  referring to the same source file but different targets into a single
  one. The resulting file can also be unbundled into different files by
  this tool if -unbundle is provided.

  USAGE: clang-offload-bundler [options]

  OPTIONS:

  Generic Options:

    --help                  - Display available options (--help-hidden for more)
    --help-list             - Display list of available options (--help-list-hidden for more)
    --version               - Display the version of this program

  clang-offload-bundler options:

    --###                   - Print any external commands that are to be executed instead of actually executing them - for testing purposes.
    --allow-missing-bundles - Create empty files if bundles are missing when unbundling.
    --bundle-align=<uint>   - Alignment of bundle for binary files
    --check-input-archive   - Check if input heterogeneous archive is valid in terms of TargetID rules.
    --inputs=<string>       - [<input file>,...]
    --list                  - List bundle IDs in the bundled file.
    --outputs=<string>      - [<output file>,...]
    --targets=<string>      - [<offload kind>-<target triple>,...]
    --type=<string>         - Type of the files to be bundled/unbundled.
                              Current supported types are:
                                i    - cpp-output
                                ii   - c++-cpp-output
                                cui  - cuda/hip-output
                                d    - dependency
                                ll   - llvm
                                bc   - llvm-bc
                                s    - assembler
                                o    - object
                                a    - archive of bundled files
                                gch  - precompiled-header
                                ast  - clang AST file
    --unbundle              - Unbundle bundled file into several output files.

Usage
=====

This tool can be used as follows for bundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -inputs=a.triple1.ii,a.triple2.ii -outputs=a.ii

or, it can be used as follows for unbundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -outputs=a.triple1.ii,a.triple2.ii -inputs=a.ii -unbundle


Supported File Formats
======================

Multiple text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported input
formats.
Use the ``File Type`` column to determine the value to pass to the
``--type`` option based on the type of input files while bundling/unbundling.

  .. table:: Supported File Formats
     :name: supported-file-formats-table

     +--------------------------+----------------+-------------+
     | File Format              | File Type      | Text/Binary |
     +==========================+================+=============+
     | CPP output               | i              | Text        |
     +--------------------------+----------------+-------------+
     | C++ CPP output           | ii             | Text        |
     +--------------------------+----------------+-------------+
     | CUDA/HIP output          | cui            | Text        |
     +--------------------------+----------------+-------------+
     | Dependency               | d              | Text        |
     +--------------------------+----------------+-------------+
     | LLVM                     | ll             | Text        |
     +--------------------------+----------------+-------------+
     | LLVM Bitcode             | bc             | Binary      |
     +--------------------------+----------------+-------------+
     | Assembler                | s              | Text        |
     +--------------------------+----------------+-------------+
     | Object                   | o              | Binary      |
     +--------------------------+----------------+-------------+
     | Archive of bundled files | a              | Binary      |
     +--------------------------+----------------+-------------+
     | Precompiled header       | gch            | Binary      |
     +--------------------------+----------------+-------------+
     | Clang AST file           | ast            | Binary      |
     +--------------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

The text file formats are concatenated with comments that have a magic string
and bundle entry ID in between.
The BNF syntax to represent a code object
bundle file is:

::

  <file> ::== <bundle> | <bundle> <file>
  <bundle> ::== <comment> <start> <bundle_id> <eol> <bundle> <eol>
                <comment> <end> <bundle_id> <eol>
  <start> ::== OFFLOAD_BUNDLER_MAGIC_STR__START__
  <end> ::== OFFLOAD_BUNDLER_MAGIC_STR__END__

**comment**
  The symbol that starts a single-line comment in the file type of the
  constituent bundles, e.g. ``;`` for the ``ll`` ``File Type`` and ``#`` for
  the ``s`` ``File Type``.

**bundle_id**
  The :ref:`clang-bundle-entry-id` for the enclosing bundle.

**eol**
  The end of line character.

**bundle**
  The code object stored in one of the supported text file formats.

**OFFLOAD_BUNDLER_MAGIC_STR__**
  Magic string that marks the existence of offloading data, i.e.
  ``__CLANG_OFFLOAD_BUNDLE__``.

.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

  .. table:: Bundled Code Object Layout
     :name: bundled-code-object-layout-table

     =================================== ======= ================ ===============================
     Field                               Type    Size in Bytes    Description
     =================================== ======= ================ ===============================
     Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
     Number Of Bundle Entries            integer 8                Number of bundle entries.
     1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                  bundled code object to 1st code
                                                                  object.
     1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
     1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                  entry ID of 1st code object.
     1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                                 ID Length        object. This is not NUL
                                                                  terminated. See
                                                                  :ref:`clang-bundle-entry-id`.
     \...
     Nth Bundle Entry Code Object Offset integer 8
     Nth Bundle Entry Code Object Size   integer 8
     Nth Bundle Entry ID Length          integer 8
     Nth Bundle Entry ID                 string  Nth Bundle Entry
                                                 ID Length
     1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                                 Code Object Size
     \...
     Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                                 Code Object Size
     =================================== ======= ================ ===============================

.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see :ref:`clang-bundled-code-object-layout-text`
and :ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

Bundle entry ID syntax is defined by the following BNF syntax:

.. code::

  <bundle-entry-id> ::== <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
     :name: clang-offload-kind-table

     ============= ==============================================================
     Offload Kind  Description
     ============= ==============================================================
     host          Host code object. ``clang-offload-bundler`` always includes
                   this entry as the first bundled code object entry. For an
                   embedded bundled code object this entry is not used by the
                   runtime and so is generally an empty code object.

     hip           Offload code object for the HIP language. Used for all
                   HIP language offload code objects when the
                   ``clang-offload-bundler`` is used to bundle code objects as
                   intermediate steps of the tool chain.
                   Also used for AMD GPU
                   code objects before ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     hipv4         Offload code object for the HIP language. Used for AMD GPU
                   code objects with ABI version V4 or later when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     openmp        Offload code object for the OpenMP language extension.
     ============= ==============================================================

Note: The distinction between the ``hip`` and ``hipv4`` offload kinds is historical.
Originally, these designations might have indicated different versions of the
code object ABI. The ABI version is now embedded directly within the code object
itself, which makes the distinction irrelevant during unbundling. Consequently,
current implementations treat ``hip`` and ``hipv4`` as compatible and handle
their code objects interchangeably, without differentiating by offload kind.

**target-triple**
  The target triple of the code object. See `Target Triple
  <https://clang.llvm.org/docs/CrossCompilation.html#target-triple>`_.

  The bundler accepts target triples with or without the optional environment
  field:

  ``<arch><sub>-<vendor>-<sys>``, or
  ``<arch><sub>-<vendor>-<sys>-<env>``

  However, in order to standardize outputs for tools that consume bitcode
  bundles, bundles written by the bundler internally use only the 4-field
  target triple:

  ``<arch><sub>-<vendor>-<sys>-<env>``

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.

.. _code-object-composition:

Bundled Code Object Composition
-------------------------------

  * Each entry of a bundled code object must have a different bundle entry ID.
  * There can be multiple entries for the same processor provided they differ
    in target feature settings.
  * If there is an entry with a target feature specified as *Any*, then all
    entries must specify that target feature as *Any* for the same processor.

There may be additional target specific restrictions.

.. _compatibility-bundle-entry-id:

Compatibility Rules for Bundle Entry ID
---------------------------------------

  A code object, specified using its Bundle Entry ID, can be loaded and
  executed on a target processor, if:

  * Their offload kinds are the same or compatible.
  * Their target triples are compatible.
  * Their Target IDs are compatible as defined in :ref:`compatibility-target-id`.

.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and optionally its configuration,
expressed by a set of target features, that affect ISA generation. Whether a
target ID is supported, or whether the target triple alone is sufficient to
specify the ISA generation, is target specific.
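
As an informal illustration, the sketch below splits a target ID such as
``gfx906:xnack+:sramecc-`` into its processor and target feature settings and
rebuilds it with the features in alphabetic order, following the syntax and the
canonical form described later in this section. The helper names
``parse_target_id`` and ``canonical_target_id`` are hypothetical, and the sketch
does not map alternative processor names to the primary name, which would
require target-specific information.

.. code-block:: python

  from typing import Dict, Tuple


  def parse_target_id(target_id: str) -> Tuple[str, Dict[str, bool]]:
      """Split a target ID into (processor, {feature: enabled}).

      Features that are absent from the result are in the *Any* state.
      """
      processor, *settings = target_id.split(":")
      if not processor:
          raise ValueError("missing processor name")
      features: Dict[str, bool] = {}
      for setting in settings:
          # Each setting is a feature name followed by "+" (on) or "-" (off).
          name, sign = setting[:-1], setting[-1:]
          if not name or sign not in ("+", "-"):
              raise ValueError(f"malformed target feature: {setting!r}")
          if name in features:
              raise ValueError(f"target feature given twice: {name!r}")
          features[name] = sign == "+"
      return processor, features


  def canonical_target_id(target_id: str) -> str:
      """Rebuild a target ID with its features in alphabetic order.

      Assumption of this sketch: the processor name is already the primary one.
      """
      processor, features = parse_target_id(target_id)
      ordered = sorted(features.items())
      return processor + "".join(
          f":{name}{'+' if enabled else '-'}" for name, enabled in ordered
      )

Under these assumptions, ``canonical_target_id("gfx906:xnack+:sramecc-")``
returns ``gfx906:sramecc-:xnack+``.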

The target ID is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

Target ID syntax is defined by the following BNF syntax:

.. code::

  <target-id> ::== <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  Is the target-specific processor name or any alternative processor name.

**target-feature**
  Is a target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID specifying the default
    value of a target feature can be loaded and executed on a processor
    configured with the target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.

.. _compatibility-target-id:

Compatibility Rules for Target ID
---------------------------------

  A code object compiled for a Target ID is considered compatible for a
  target, if:

  * Their processor is the same.
  * Their feature set is compatible as defined above.

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience.
  It allows both the primary and alternative processor name
  to be used and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary and not
  alternative processor name is used and the target features are specified in
  alphabetic order. Command line tools convert non-canonical form to canonical
  form.

Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target ID and target features. See `User Guide for AMDGPU Backend
  <https://llvm.org/docs/AMDGPUUsage.html>`_ which defines the `processors
  <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and `target
  features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_
  supported.

Most other targets do not support target IDs.

Archive Unbundling
==================

Unbundling of a heterogeneous device archive (HDA) is done to create
device-specific archives. An HDA is in a format compatible with the GNU ``ar``
utility and contains a collection of bundled device binaries, where each bundle
file contains device binaries for a host and one or more targets. The output
device-specific archive is in a format compatible with the GNU ``ar`` utility
and contains a collection of device binaries for a specific target.

::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

The clang-offload-bundler extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive and creates
a target-specific device archive without bundling.

The clang-offload-bundler determines whether a device binary is compatible
with a target by comparing bundle IDs. Two bundle IDs are considered
compatible if:

  * Their offload kinds are the same.
  * Their target triples are the same.
  * Their Target IDs are the same.

Creating a Heterogeneous Device Archive
---------------------------------------

1. Compile source file(s) to generate object file(s)

   ::

     clang -O2 -fopenmp \
       -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
       -c func_1.c -o func_1.o

     clang -O2 -fopenmp \
       -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
       -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
       -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
       -c func_2.c -o func_2.o

2. Create a heterogeneous device archive by combining all the object file(s)

   ::

     llvm-ar cr libFatArchive.a func_1.o func_2.o

Extracting a Device Specific Archive
------------------------------------

UnbundleArchive takes a heterogeneous device archive file (".a") as input
containing bundled device binary files, and a list of offload targets (not
host), and extracts the device binaries into a new archive file for each
offload target. Each resulting archive file contains all device binaries
compatible with that particular offload target. Compatibility between a
device binary in HDA and a target is based on the compatibility between their
bundle entry IDs as defined in :ref:`compatibility-bundle-entry-id`.

The following cases may arise during compatibility testing:

* A binary is compatible with one or more targets: Insert the binary into the
  device-specific archive of each compatible target.
* A binary is not compatible with any target: Skip the binary.
* One or more binaries are compatible with a target: Insert all binaries into
  the device-specific archive of the target. The insertion need not be ordered.
* No binary is compatible with a target: If the ``--allow-missing-bundles``
  option is present then create an empty archive for the target. Otherwise,
  produce an error without creating an archive.

The created archive file does not contain an index of the symbols, and device
binary files are named ``<Parent Bundle Name>-<DeviceBinary's TargetID>``,
with ``:`` replaced with ``_``.

Usage
-----

::

  clang-offload-bundler --unbundle --inputs=libFatArchive.a -type=a \
    -targets=openmp-amdgcn-amd-amdhsa-gfx906:sramecc+:xnack+,openmp-amdgcn-amd-amdhsa-gfx908:sramecc-:xnack+ \
    -outputs=devicelib-gfx906.a,devicelib-gfx908.a

.. _additional-options-archive-unbundling:

Additional Options while Archive Unbundling
-------------------------------------------

**-allow-missing-bundles**
  Create an empty archive file if no compatible device binary is found.

**-check-input-archive**
  Check if input heterogeneous device archive follows rules for composition
  as defined in :ref:`code-object-composition` before creating device-specific
  archive(s).

**-debug-only=CodeObjectCompatibility**
  Verbose printing of matched/unmatched comparisons between bundle entry ID of
  a device binary from HDA and bundle entry ID of a given target processor
  (see :ref:`compatibility-bundle-entry-id`).

Compression and Decompression
=============================

``clang-offload-bundler`` provides features to compress and decompress the full
bundle, leveraging inherent redundancies within the bundle entries. Use the
``-compress`` command-line option to enable this compression capability.

The compressed offload bundle begins with a header followed by the compressed binary data:

- **Magic Number (4 bytes)**:
    This is a unique identifier to distinguish compressed offload bundles. The value is the string 'CCOB' (Compressed Clang Offload Bundle).

- **Version Number (16-bit unsigned int)**:
    This denotes the version of the compressed offload bundle format. The current version is ``2``.

- **Compression Method (16-bit unsigned int)**:
    This field indicates the compression method used. The value corresponds to either ``zlib`` or ``zstd``, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.

- **Total File Size (32-bit unsigned int)**:
    This is the total size (in bytes) of the file, including the header. Available in version 2 and above.

- **Uncompressed Binary Size (32-bit unsigned int)**:
    This is the size (in bytes) of the binary data before it was compressed.

- **Hash (64-bit unsigned int)**:
    This is a 64-bit truncated MD5 hash of the uncompressed binary data. It is used for verification and caching.

- **Compressed Data**:
    The actual compressed binary data follows the header. Its size is the total size of the file minus the header size.

.. note::

  Version 3 of the format is under development. It uses 64-bit fields for
  Total File Size and Uncompressed Binary Size to support files larger than
  4 GB. To experiment with version 3, set the environment variable
  ``COMPRESSED_BUNDLE_FORMAT_VERSION=3``. This support is experimental and not
  recommended for production use.
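
The version 2 header described above could be decoded along the following
lines. This is only a sketch: the little-endian byte order, the absence of
padding between fields, and the names ``HEADER_V2``, ``CompressedBundleHeader``,
``parse_header_v2`` and ``compressed_data_size`` are assumptions made for
illustration and are not taken from the format description.

.. code-block:: python

  import struct
  from typing import NamedTuple

  # Field order follows the header description above. The "<" prefix
  # (little-endian, no padding) is an assumption of this sketch.
  HEADER_V2 = struct.Struct("<4sHHIIQ")


  class CompressedBundleHeader(NamedTuple):
      magic: bytes            # expected to be b"CCOB"
      version: int            # 16-bit format version
      method: int             # 16-bit compression method value
      total_file_size: int    # 32-bit size of the whole file, header included
      uncompressed_size: int  # 32-bit size of the data before compression
      truncated_md5: int      # 64-bit truncated MD5 of the uncompressed data


  def parse_header_v2(data: bytes) -> CompressedBundleHeader:
      """Decode the fixed-size version 2 header at the start of the buffer."""
      if len(data) < HEADER_V2.size:
          raise ValueError("buffer is shorter than the header")
      header = CompressedBundleHeader(*HEADER_V2.unpack_from(data))
      if header.magic != b"CCOB":
          raise ValueError("missing 'CCOB' magic number")
      if header.version != 2:
          raise ValueError(f"unsupported format version: {header.version}")
      return header


  def compressed_data_size(header: CompressedBundleHeader) -> int:
      """Size of the compressed data: total file size minus the header size."""
      return header.total_file_size - HEADER_V2.size

With the assumed packing the header occupies 24 bytes, so the compressed data
of a version 2 bundle would start at byte offset 24.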