=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single-source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also contain just the offloaded code objects and be
embedded as data into the host code object. The host compilation includes an
``init`` function that will use the runtime corresponding to the offload kind
(see :ref:`clang-offload-kind-table`) to load the offload code objects
appropriate to the devices present when the host program is executed.

Supported File Formats
======================

Several text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported formats.

  .. table:: Supported File Formats
     :name: supported-file-formats-table

     +--------------------+----------------+-------------+
     | File Format        | File Extension | Text/Binary |
     +====================+================+=============+
     | CPP output         | i              | Text        |
     +--------------------+----------------+-------------+
     | C++ CPP output     | ii             | Text        |
     +--------------------+----------------+-------------+
     | CUDA/HIP output    | cui            | Text        |
     +--------------------+----------------+-------------+
     | Dependency         | d              | Text        |
     +--------------------+----------------+-------------+
     | LLVM               | ll             | Text        |
     +--------------------+----------------+-------------+
     | LLVM Bitcode       | bc             | Binary      |
     +--------------------+----------------+-------------+
     | Assembler          | s              | Text        |
     +--------------------+----------------+-------------+
     | Object             | o              | Binary      |
     +--------------------+----------------+-------------+
     | Archive of objects | a              | Binary      |
     +--------------------+----------------+-------------+
     | Precompiled header | gch            | Binary      |
     +--------------------+----------------+-------------+
     | Clang AST file     | ast            | Binary      |
     +--------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

The layout of a bundled text file is currently very simple: the bundle entries
are concatenated, and each entry is enclosed by a start comment and an end
comment that contain a magic string and the bundle entry ID. ``Comment`` below
stands for the comment prefix of the text format being bundled.

::

  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ 1st Bundle Entry ID"
  Bundle 1
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ 1st Bundle Entry ID"
  ...
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__START__ Nth Bundle Entry ID"
  Bundle N
  "Comment OFFLOAD_BUNDLER_MAGIC_STR__END__ Nth Bundle Entry ID"

.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

  .. table:: Bundled Code Object Layout
     :name: bundled-code-object-layout-table

     =================================== ======= ================ ===============================
     Field                               Type    Size in Bytes    Description
     =================================== ======= ================ ===============================
     Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
     Number Of Bundle Entries            integer 8                Number of bundle entries.
     1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                  bundled code object to 1st code
                                                                  object.
     1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
     1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                  entry ID of 1st code object.
     1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                                 ID Length        object. This is not NUL
                                                                  terminated. See
                                                                  :ref:`clang-bundle-entry-id`.
     \...
     Nth Bundle Entry Code Object Offset integer 8
     Nth Bundle Entry Code Object Size   integer 8
     Nth Bundle Entry ID Length          integer 8
     Nth Bundle Entry ID                 string  Nth Bundle Entry
                                                 ID Length
     1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                                 Code Object Size
     \...
     Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                                 Code Object Size
     =================================== ======= ================ ===============================

.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see
:ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

The bundle entry ID syntax is defined by the following BNF grammar:

.. code::

  <bundle-entry-id> ::= <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
     :name: clang-offload-kind-table

     ============= ==============================================================
     Offload Kind  Description
     ============= ==============================================================
     host          Host code object. ``clang-offload-bundler`` always includes
                   this entry as the first bundled code object entry. For an
                   embedded bundled code object this entry is not used by the
                   runtime and so is generally an empty code object.

     hip           Offload code object for the HIP language. Used for all
                   HIP language offload code objects when the
                   ``clang-offload-bundler`` is used to bundle code objects as
                   intermediate steps of the tool chain. Also used for AMD GPU
                   code objects before ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     hipv4         Offload code object for the HIP language. Used for AMD GPU
                   code objects with at least ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     openmp        Offload code object for the OpenMP language extension.
     ============= ==============================================================

**target-triple**
  The target triple of the code object.

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.

Each entry of a bundled code object must have a different bundle entry ID. There
can be multiple entries for the same processor provided they differ in target
feature settings. If there is an entry with a target feature specified as *Any*,
then all entries must specify that target feature as *Any* for the same
processor. There may be additional target specific restrictions.

.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and optionally its configuration,
expressed by a set of target features, that affect ISA generation. Whether a
target ID is supported, or whether the target triple alone is sufficient to
specify the ISA generation, is target specific.

It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

The target ID syntax is defined by the following BNF grammar:

.. code::

  <target-id> ::= <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  Is the target specific processor name or any alternative processor name.

**target-feature**
  Is a target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID that leaves a target feature at
    this default value can be loaded and executed on a processor configured
    with the target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience. It allows both the primary and alternative processor name
  to be used and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary processor
  name is used, not an alternative one, and the target features are specified
  in alphabetic order. Command line tools convert non-canonical form to
  canonical form.

Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target ID and target features. See `User Guide for AMDGPU Backend
  <https://llvm.org/docs/AMDGPUUsage.html>`_ which defines the `processors
  <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and `target
  features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_
  supported.

Most other targets do not support target IDs.

Archive Unbundling
==================

A heterogeneous device archive is unbundled to create device specific archives.
A heterogeneous device archive is in a format compatible with the GNU ar
utility and contains a collection of bundled device binaries, where each bundle
file contains device binaries for a host and one or more targets. The output
device specific archive is also in a format compatible with the GNU ar utility
and contains a collection of device binaries for a specific target.

.. code::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

``clang-offload-bundler`` extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive and creates
a target specific device archive without bundling.

``clang-offload-bundler`` determines whether a device binary is compatible with
a target by comparing bundle entry IDs. Two bundle entry IDs are considered
compatible if:

  * Their offload kinds are the same
  * Their target triples are the same
  * Their GPU architectures (``GPUArch``) are the same
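
The three compatibility conditions can be sketched in Python. This is a minimal
illustration under stated assumptions, not the bundler's implementation: the
``BundleEntryID`` tuple and both function names are hypothetical, the bundle
entry ID is assumed to be already split into its parts, and ``GPUArch`` is
assumed to mean the target ID compared in canonical form (target features
sorted alphabetically, see :ref:`clang-target-id`). Alternative processor names
are not resolved.

.. code:: python

  from typing import NamedTuple, Optional


  class BundleEntryID(NamedTuple):
      """Hypothetical pre-parsed <offload-kind>-<target-triple>[-<target-id>]."""
      offload_kind: str
      target_triple: str
      target_id: Optional[str] = None


  def canonicalize_target_id(target_id: str) -> str:
      """Return the target ID with its features sorted alphabetically by name."""
      processor, *features = target_id.split(":")
      if not processor:
          raise ValueError("missing processor name")
      names = set()
      for feature in features:
          # Each feature is a name followed by '+' (on) or '-' (off).
          if len(feature) < 2 or feature[-1] not in "+-":
              raise ValueError(f"malformed target feature: {feature!r}")
          if feature[:-1] in names:
              raise ValueError(f"duplicate target feature: {feature[:-1]!r}")
          names.add(feature[:-1])
      return ":".join([processor] + sorted(features, key=lambda f: f[:-1]))


  def is_compatible(a: BundleEntryID, b: BundleEntryID) -> bool:
      """Check for equal offload kind, target triple, and GPUArch."""
      def gpu_arch(entry: BundleEntryID) -> Optional[str]:
          # Assumption: GPUArch is the canonical target ID, or None if absent.
          if entry.target_id is None:
              return None
          return canonicalize_target_id(entry.target_id)

      return (a.offload_kind == b.offload_kind
              and a.target_triple == b.target_triple
              and gpu_arch(a) == gpu_arch(b))

Under this sketch, two ``hip`` entries for ``amdgcn-amd-amdhsa`` with target IDs
``gfx908:xnack-:sramecc+`` and ``gfx908:sramecc+:xnack-`` compare as compatible,
because both target IDs canonicalize to ``gfx908:sramecc+:xnack-``.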