1.. raw:: html 2 3 <style type="text/css"> 4 .none { background-color: #FFCCCC } 5 .part { background-color: #FFFF99 } 6 .good { background-color: #CCFF99 } 7 </style> 8 9.. role:: none 10.. role:: part 11.. role:: good 12 13.. contents:: 14 :local: 15 16================== 17OpenCL Support 18================== 19 20Clang has complete support of OpenCL C versions from 1.0 to 3.0. 21Support for OpenCL 3.0 is in experimental phase (:ref:`OpenCL 3.0 <opencl_300>`). 22 23Clang also supports :ref:`the C++ for OpenCL kernel language <cxx_for_opencl_impl>`. 24 25There are also other :ref:`new and experimental features <opencl_experimenal>` 26available. 27 28Details about usage of clang for OpenCL can be found in :doc:`UsersManual`. 29 30Missing features or with limited support 31======================================== 32 33- For general issues and bugs with OpenCL in clang refer to `the GitHub issue 34 list 35 <https://github.com/llvm/llvm-project/issues?q=is%3Aopen+is%3Aissue+label%3Aopencl>`__. 36 37- Command-line flag :option:`-cl-ext` (used to override extensions/ 38 features supported by a target) is missing support of some functionality i.e. that is 39 implemented fully through libraries (see :ref:`library-based features and 40 extensions <opencl_ext_libs>`). 41 42Internals Manual 43================ 44 45This section acts as internal documentation for OpenCL features design 46as well as some important implementation aspects. It is primarily targeted 47at the advanced users and the toolchain developers integrating frontend 48functionality as a component. 49 50OpenCL Metadata 51--------------- 52 53Clang uses metadata to provide additional OpenCL semantics in IR needed for 54backends and OpenCL runtime. 55 56Each kernel will have function metadata attached to it, specifying the arguments. 57Kernel argument metadata is used to provide source level information for querying 58at runtime, for example using the `clGetKernelArgInfo 59<https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#167>`_ 60call. 61 62Note that ``-cl-kernel-arg-info`` enables more information about the original 63kernel code to be added e.g. kernel parameter names will appear in the OpenCL 64metadata along with other information. 65 66The IDs used to encode the OpenCL's logical address spaces in the argument info 67metadata follows the SPIR address space mapping as defined in the SPIR 68specification `section 2.2 69<https://www.khronos.org/registry/spir/specs/spir_spec-2.0.pdf#18>`_ 70 71OpenCL Specific Options 72----------------------- 73 74In addition to the options described in :doc:`UsersManual` there are the 75following options specific to the OpenCL frontend. 76 77All the options in this section are frontend-only and therefore if used 78with regular clang driver they require frontend forwarding, e.g. ``-cc1`` 79or ``-Xclang``. 80 81.. _opencl_finclude_default_header: 82 83.. option:: -finclude-default-header 84 85Adds most of builtin types and function declarations during compilations. By 86default the OpenCL headers are not loaded by the frontend and therefore certain 87builtin types and most of builtin functions are not declared. To load them 88automatically this flag can be passed to the frontend (see also :ref:`the 89section on the OpenCL Header <opencl_header>`): 90 91 .. code-block:: console 92 93 $ clang -Xclang -finclude-default-header test.cl 94 95Alternatively the internal header `opencl-c.h` containing the declarations 96can be included manually using ``-include`` or ``-I`` followed by the path 97to the header location. The header can be found in the clang source tree or 98installation directory. 99 100 .. code-block:: console 101 102 $ clang -I<path to clang sources>/lib/Headers/opencl-c.h test.cl 103 $ clang -I<path to clang installation>/lib/clang/<llvm version>/include/opencl-c.h/opencl-c.h test.cl 104 105In this example it is assumed that the kernel code contains 106``#include <opencl-c.h>`` just as a regular C include. 107 108Because the header is very large and long to parse, PCH (:doc:`PCHInternals`) 109and modules (:doc:`Modules`) can be used internally to improve the compilation 110speed. 111 112To enable modules for OpenCL: 113 114 .. code-block:: console 115 116 $ clang --target=spir-unknown-unknown -c -emit-llvm -Xclang -finclude-default-header -fmodules -fimplicit-module-maps -fmodules-cache-path=<path to the generated module> test.cl 117 118Another way to circumvent long parsing latency for the OpenCL builtin 119declarations is to use mechanism enabled by :ref:`-fdeclare-opencl-builtins 120<opencl_fdeclare_opencl_builtins>` flag that is available as an alternative 121feature. 122 123.. _opencl_fdeclare_opencl_builtins: 124 125.. option:: -fdeclare-opencl-builtins 126 127In addition to regular header includes with builtin types and functions using 128:ref:`-finclude-default-header <opencl_finclude_default_header>`, clang 129supports a fast mechanism to declare builtin functions with 130``-fdeclare-opencl-builtins``. This does not declare the builtin types and 131therefore it has to be used in combination with ``-finclude-default-header`` 132if full functionality is required. 133 134**Example of Use**: 135 136 .. code-block:: console 137 138 $ clang -Xclang -fdeclare-opencl-builtins test.cl 139 140.. _opencl_fake_address_space_map: 141 142.. option:: -ffake-address-space-map 143 144Overrides the target address space map with a fake map. 145This allows adding explicit address space IDs to the bitcode for non-segmented 146memory architectures that do not have separate IDs for each of the OpenCL 147logical address spaces by default. Passing ``-ffake-address-space-map`` will 148add/override address spaces of the target compiled for with the following values: 149``1-global``, ``2-constant``, ``3-local``, ``4-generic``. The private address 150space is represented by the absence of an address space attribute in the IR (see 151also :ref:`the section on the address space attribute <opencl_addrsp>`). 152 153 .. code-block:: console 154 155 $ clang -cc1 -ffake-address-space-map test.cl 156 157.. _opencl_builtins: 158 159OpenCL builtins 160--------------- 161 162**Clang builtins** 163 164There are some standard OpenCL functions that are implemented as Clang builtins: 165 166- All pipe functions from `section 6.13.16.2/6.13.16.3 167 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#160>`_ of 168 the OpenCL v2.0 kernel language specification. 169 170- Address space qualifier conversion functions ``to_global``/``to_local``/``to_private`` 171 from `section 6.13.9 172 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#101>`_. 173 174- All the ``enqueue_kernel`` functions from `section 6.13.17.1 175 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#164>`_ and 176 enqueue query functions from `section 6.13.17.5 177 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#171>`_. 178 179**Fast builtin function declarations** 180 181The implementation of the fast builtin function declarations (available via the 182:ref:`-fdeclare-opencl-builtins option <opencl_fdeclare_opencl_builtins>`) consists 183of the following main components: 184 185- A TableGen definitions file ``OpenCLBuiltins.td``. This contains a compact 186 representation of the supported builtin functions. When adding new builtin 187 function declarations, this is normally the only file that needs modifying. 188 189- A Clang TableGen emitter defined in ``ClangOpenCLBuiltinEmitter.cpp``. During 190 Clang build time, the emitter reads the TableGen definition file and 191 generates ``OpenCLBuiltins.inc``. This generated file contains various tables 192 and functions that capture the builtin function data from the TableGen 193 definitions in a compact manner. 194 195- OpenCL specific code in ``SemaLookup.cpp``. When ``Sema::LookupBuiltin`` 196 encounters a potential builtin function, it will check if the name corresponds 197 to a valid OpenCL builtin function. If so, all overloads of the function are 198 inserted using ``InsertOCLBuiltinDeclarationsFromTable`` and overload 199 resolution takes place. 200 201OpenCL Extensions and Features 202------------------------------ 203 204Clang implements various extensions to OpenCL kernel languages. 205 206New functionality is accepted as soon as the documentation is detailed to the 207level sufficient to be implemented. There should be an evidence that the 208extension is designed with implementation feasibility in consideration and 209assessment of complexity for C/C++ based compilers. Alternatively, the 210documentation can be accepted in a format of a draft that can be further 211refined during the implementation. 212 213Implementation guidelines 214^^^^^^^^^^^^^^^^^^^^^^^^^ 215 216This section explains how to extend clang with the new functionality. 217 218**Parsing functionality** 219 220If an extension modifies the standard parsing it needs to be added to 221the clang frontend source code. This also means that the associated macro 222indicating the presence of the extension should be added to clang. 223 224The default flow for adding a new extension into the frontend is to 225modify `OpenCLExtensions.def 226<https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/OpenCLExtensions.def>`__, 227containing the list of all extensions and optional features supported by 228the frontend. 229 230This will add the macro automatically and also add a field in the target 231options ``clang::TargetOptions::OpenCLFeaturesMap`` to control the exposure 232of the new extension during the compilation. 233 234Note that by default targets like `SPIR-V`, `SPIR` or `X86` expose all the OpenCL 235extensions. For all other targets the configuration has to be made explicitly. 236 237Note that the target extension support performed by clang can be overridden 238with :option:`-cl-ext` command-line flags. 239 240.. _opencl_ext_libs: 241 242**Library functionality** 243 244If an extension adds functionality that does not modify standard language 245parsing it should not require modifying anything other than header files and 246``OpenCLBuiltins.td`` detailed in :ref:`OpenCL builtins <opencl_builtins>`. 247Most commonly such extensions add functionality via libraries (by adding 248non-native types or functions) parsed regularly. Similar to other languages this 249is the most common way to add new functionality. 250 251Clang has standard headers where new types and functions are being added, 252for more details refer to 253:ref:`the section on the OpenCL Header <opencl_header>`. The macros indicating 254the presence of such extensions can be added in the standard header files 255conditioned on target specific predefined macros or/and language version 256predefined macros (see `feature/extension preprocessor macros defined in 257opencl-c-base.h 258<https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/opencl-c-base.h>`__). 259 260**Pragmas** 261 262Some extensions alter standard parsing dynamically via pragmas. 263 264Clang provides a mechanism to add the standard extension pragma 265``OPENCL EXTENSION`` by setting a dedicated flag in the extension list entry of 266``OpenCLExtensions.def``. Note that there is no default behavior for the 267standard extension pragmas as it is not specified (for the standards up to and 268including version 3.0) in a sufficient level of detail and, therefore, 269there is no default functionality provided by clang. 270 271Pragmas without detailed information of their behavior (e.g. an explanation of 272changes it triggers in the parsing) should not be added to clang. Moreover, the 273pragmas should provide useful functionality to the user. For example, such 274functionality should address a practical use case and not be redundant i.e. 275cannot be achieved using existing features. 276 277Note that some legacy extensions (published prior to OpenCL 3.0) still 278provide some non-conformant functionality for pragmas e.g. add diagnostics on 279the use of types or functions. This functionality is not guaranteed to remain in 280future releases. However, any future changes should not affect backward 281compatibility. 282 283.. _opencl_addrsp: 284 285Address spaces attribute 286------------------------ 287 288Clang has arbitrary address space support using the ``address_space(N)`` 289attribute, where ``N`` is an integer number in the range specified in the 290Clang source code. This addresses spaces can be used along with the OpenCL 291address spaces however when such addresses spaces converted to/from OpenCL 292address spaces the behavior is not governed by OpenCL specification. 293 294An OpenCL implementation provides a list of standard address spaces using 295keywords: ``private``, ``local``, ``global``, and ``generic``. In the AST and 296in the IR each of the address spaces will be represented by unique number 297provided in the Clang source code. The specific IDs for an address space do not 298have to match between the AST and the IR. Typically in the AST address space 299numbers represent logical segments while in the IR they represent physical 300segments. 301Therefore, machines with flat memory segments can map all AST address space 302numbers to the same physical segment ID or skip address space attribute 303completely while generating the IR. However, if the address space information 304is needed by the IR passes e.g. to improve alias analysis, it is recommended 305to keep it and only lower to reflect physical memory segments in the late 306machine passes. The mapping between logical and target address spaces is 307specified in the Clang's source code. 308 309.. _cxx_for_opencl_impl: 310 311C++ for OpenCL Implementation Status 312==================================== 313 314Clang implements language versions 1.0 and 2021 published in `the official 315release of C++ for OpenCL Documentation 316<https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-docrev2021.12>`_. 317 318Limited support of experimental C++ libraries is described in the :ref:`experimental features <opencl_experimenal>`. 319 320GitHub issues for this functionality are typically prefixed 321with '[C++4OpenCL]' - click `here 322<https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+%5BC%2B%2B4OpenCL%5D>`__ 323to view the full bug list. 324 325 326Missing features or with limited support 327---------------------------------------- 328 329- Support of C++ for OpenCL 2021 is currently in experimental phase. Refer to 330 :ref:`OpenCL 3.0 status <opencl_300>` for details of common missing 331 functionality from OpenCL 3.0. 332 333- IR generation for non-trivial global destructors is incomplete (See: 334 `PR48047 <https://llvm.org/PR48047>`_). 335 336- Support of `destructors with non-default address spaces 337 <https://www.khronos.org/opencl/assets/CXX_for_OpenCL.html#_construction_initialization_and_destruction>`_ 338 is incomplete (See: `D109609 <https://reviews.llvm.org/D109609>`_). 339 340.. _opencl_300: 341 342OpenCL C 3.0 Usage 343================== 344 345OpenCL C 3.0 language standard makes most OpenCL C 2.0 features optional. Optional 346functionality in OpenCL C 3.0 is indicated with the presence of feature-test macros 347(list of feature-test macros is `here <https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#features>`__). 348Command-line flag :option:`-cl-ext` can be used to override features supported by a target. 349 350For cases when there is an associated extension for a specific feature (fp64 and 3d image writes) 351user should specify both (extension and feature) in command-line flag: 352 353 .. code-block:: console 354 355 $ clang -cl-std=CL3.0 -cl-ext=+cl_khr_fp64,+__opencl_c_fp64 ... 356 $ clang -cl-std=CL3.0 -cl-ext=-cl_khr_fp64,-__opencl_c_fp64 ... 357 358 359 360OpenCL C 3.0 Implementation Status 361---------------------------------- 362 363The following table provides an overview of features in OpenCL C 3.0 and their 364implementation status. 365 366+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 367| Category | Feature | Status | Reviews | 368+==============================+=========================+=========================================+======================+================================================================================================================================+ 369| Command line interface | New value for ``-cl-std`` flag | :good:`done` | https://reviews.llvm.org/D88300 | 370+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 371| Predefined macros | New version macro | :good:`done` | https://reviews.llvm.org/D88300 | 372+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 373| Predefined macros | Feature macros | :good:`done` | https://reviews.llvm.org/D95776 | 374+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 375| Feature optionality | Generic address space | :good:`done` | https://reviews.llvm.org/D95778 and https://reviews.llvm.org/D103401 | 376+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 377| Feature optionality | Builtin function overloads with generic address space | :good:`done` | https://reviews.llvm.org/D105526, https://reviews.llvm.org/D107769 | 378+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 379| Feature optionality | Program scope variables in global memory | :good:`done` | https://reviews.llvm.org/D103191 | 380+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 381| Feature optionality | 3D image writes including builtin functions | :good:`done` | https://reviews.llvm.org/D106260 (frontend) | 382+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 383| Feature optionality | read_write images including builtin functions | :good:`done` | https://reviews.llvm.org/D104915 (frontend) and https://reviews.llvm.org/D107539, https://reviews.llvm.org/D117899 (functions) | 384+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 385| Feature optionality | C11 atomics memory scopes, ordering and builtin function | :good:`done` | https://reviews.llvm.org/D106111, https://reviews.llvm.org/D119420 | 386+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 387| Feature optionality | Blocks and Device-side kernel enqueue including builtin functions | :good:`done` | https://reviews.llvm.org/D115640, https://reviews.llvm.org/D118605 | 388+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 389| Feature optionality | Pipes including builtin functions | :good:`done` | https://reviews.llvm.org/D107154 (frontend) and https://reviews.llvm.org/D105858 (functions) | 390+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 391| Feature optionality | Work group collective builtin functions | :good:`done` | https://reviews.llvm.org/D105858 | 392+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 393| Feature optionality | Image types and builtin functions | :good:`done` | https://reviews.llvm.org/D103911 (frontend) and https://reviews.llvm.org/D107539 (functions) | 394+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 395| Feature optionality | Double precision floating point type | :good:`done` | https://reviews.llvm.org/D96524 | 396+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 397| New functionality | RGBA vector components | :good:`done` | https://reviews.llvm.org/D99969 | 398+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 399| New functionality | Subgroup functions | :good:`done` | https://reviews.llvm.org/D105858, https://reviews.llvm.org/D118999 | 400+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 401| New functionality | Atomic mem scopes: subgroup, all devices including functions | :good:`done` | https://reviews.llvm.org/D103241 | 402+------------------------------+-------------------------+-----------------------------------------+----------------------+--------------------------------------------------------------------------------------------------------------------------------+ 403 404.. _opencl_experimenal: 405 406Experimental features 407===================== 408 409Clang provides the following new WIP features for the developers to experiment 410and provide early feedback or contribute with further improvements. 411Feel free to contact us on `the Discourse forums (Clang Frontend category) 412<https://discourse.llvm.org/c/clang/6>`_ or file `a GitHub issue 413<https://github.com/llvm/llvm-project/issues/new>`_. 414 415.. _opencl_experimental_cxxlibs: 416 417C++ libraries for OpenCL 418------------------------ 419 420There is ongoing work to support C++ standard libraries from `LLVM's libcxx 421<https://libcxx.llvm.org/>`_ in OpenCL kernel code using C++ for OpenCL mode. 422 423It is currently possible to include `type_traits` from C++17 in the kernel 424sources when the following clang extensions are enabled 425``__cl_clang_function_pointers`` and ``__cl_clang_variadic_functions``, 426see :doc:`LanguageExtensions` for more details. The use of non-conformant 427features enabled by the extensions does not expose non-conformant behavior 428beyond the compilation i.e. does not get generated in IR or binary. 429The extension only appear in metaprogramming 430mechanism to identify or verify the properties of types. This allows to provide 431the full C++ functionality without a loss of portability. To avoid unsafe use 432of the extensions it is recommended that the extensions are disabled directly 433after the header include. 434 435**Example of Use**: 436 437The example of kernel code with `type_traits` is illustrated here. 438 439.. code-block:: c++ 440 441 #pragma OPENCL EXTENSION __cl_clang_function_pointers : enable 442 #pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable 443 #include <type_traits> 444 #pragma OPENCL EXTENSION __cl_clang_function_pointers : disable 445 #pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable 446 447 using sint_type = std::make_signed<unsigned int>::type; 448 449 __kernel void foo() { 450 static_assert(!std::is_same<sint_type, unsigned int>::value); 451 } 452 453The possible clang invocation to compile the example is as follows: 454 455 .. code-block:: console 456 457 $ clang -I<path to libcxx checkout or installation>/include test.clcpp 458 459Note that `type_traits` is a header only library and therefore no extra 460linking step against the standard libraries is required. See full example 461in `Compiler Explorer <https://godbolt.org/z/5WbnTfb65>`_. 462 463More OpenCL specific C++ library implementations built on top of libcxx 464are available in `libclcxx <https://github.com/KhronosGroup/libclcxx>`_ 465project. 466