1.. raw:: html 2 3 <style type="text/css"> 4 .none { background-color: #FFCCCC } 5 .partial { background-color: #FFFF99 } 6 .good { background-color: #CCFF99 } 7 </style> 8 9.. role:: none 10.. role:: partial 11.. role:: good 12 13.. contents:: 14 :local: 15 16================== 17OpenCL Support 18================== 19 20Clang has complete support of OpenCL C versions from 1.0 to 2.0. 21 22Clang also supports :ref:`the C++ for OpenCL kernel language <cxx_for_opencl_impl>`. 23 24There is an ongoing work to support :ref:`OpenCL 3.0 <opencl_300>`. 25 26There are also other :ref:`new and experimental features <opencl_experimenal>` available. 27 28For general issues and bugs with OpenCL in clang refer to `Bugzilla 29<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=172679&product=clang&resolution=--->`__. 30 31Internals Manual 32================ 33 34This section acts as internal documentation for OpenCL features design 35as well as some important implementation aspects. It is primarily targeted 36at the advanced users and the toolchain developers integrating frontend 37functionality as a component. 38 39OpenCL Metadata 40--------------- 41 42Clang uses metadata to provide additional OpenCL semantics in IR needed for 43backends and OpenCL runtime. 44 45Each kernel will have function metadata attached to it, specifying the arguments. 46Kernel argument metadata is used to provide source level information for querying 47at runtime, for example using the `clGetKernelArgInfo 48<https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#167>`_ 49call. 50 51Note that ``-cl-kernel-arg-info`` enables more information about the original 52kernel code to be added e.g. kernel parameter names will appear in the OpenCL 53metadata along with other information. 54 55The IDs used to encode the OpenCL's logical address spaces in the argument info 56metadata follows the SPIR address space mapping as defined in the SPIR 57specification `section 2.2 58<https://www.khronos.org/registry/spir/specs/spir_spec-2.0.pdf#18>`_ 59 60OpenCL Specific Options 61----------------------- 62 63In addition to the options described in :doc:`UsersManual` there are the 64following options specific to the OpenCL frontend. 65 66All the options in this section are frontend-only and therefore if used 67with regular clang driver they require frontend forwarding, e.g. ``-cc1`` 68or ``-Xclang``. 69 70.. _opencl_cl_ext: 71 72.. option:: -cl-ext 73 74Disables support of OpenCL extensions. All OpenCL targets provide a list 75of extensions that they support. Clang allows to amend this using the ``-cl-ext`` 76flag with a comma-separated list of extensions prefixed with ``'+'`` or ``'-'``. 77The syntax: ``-cl-ext=<(['-'|'+']<extension>[,])+>``, where extensions 78can be either one of `the OpenCL published extensions 79<https://www.khronos.org/registry/OpenCL>`_ 80or any vendor extension. Alternatively, ``'all'`` can be used to enable 81or disable all known extensions. 82 83Example disabling double support for the 64-bit SPIR target: 84 85 .. code-block:: console 86 87 $ clang -cc1 -triple spir64-unknown-unknown -cl-ext=-cl_khr_fp64 test.cl 88 89Enabling all extensions except double support in R600 AMD GPU can be done using: 90 91 .. code-block:: console 92 93 $ clang -cc1 -triple r600-unknown-unknown -cl-ext=-all,+cl_khr_fp16 test.cl 94 95.. _opencl_finclude_default_header: 96 97.. option:: -finclude-default-header 98 99Adds most of builtin types and function declarations during compilations. By 100default the OpenCL headers are not loaded by the frontend and therefore certain 101builtin types and most of builtin functions are not declared. To load them 102automatically this flag can be passed to the frontend (see also :ref:`the 103section on the OpenCL Header <opencl_header>`): 104 105 .. code-block:: console 106 107 $ clang -Xclang -finclude-default-header test.cl 108 109Alternatively the internal header `opencl-c.h` containing the declarations 110can be included manually using ``-include`` or ``-I`` followed by the path 111to the header location. The header can be found in the clang source tree or 112installation directory. 113 114 .. code-block:: console 115 116 $ clang -I<path to clang sources>/lib/Headers/opencl-c.h test.cl 117 $ clang -I<path to clang installation>/lib/clang/<llvm version>/include/opencl-c.h/opencl-c.h test.cl 118 119In this example it is assumed that the kernel code contains 120``#include <opencl-c.h>`` just as a regular C include. 121 122Because the header is very large and long to parse, PCH (:doc:`PCHInternals`) 123and modules (:doc:`Modules`) can be used internally to improve the compilation 124speed. 125 126To enable modules for OpenCL: 127 128 .. code-block:: console 129 130 $ clang -target spir-unknown-unknown -c -emit-llvm -Xclang -finclude-default-header -fmodules -fimplicit-module-maps -fm odules-cache-path=<path to the generated module> test.cl 131 132Another way to circumvent long parsing latency for the OpenCL builtin 133declarations is to use mechanism enabled by :ref:`-fdeclare-opencl-builtins 134<opencl_fdeclare_opencl_builtins>` flag that is available as an alternative 135feature. 136 137.. _opencl_fdeclare_opencl_builtins: 138 139.. option:: -fdeclare-opencl-builtins 140 141In addition to regular header includes with builtin types and functions using 142:ref:`-finclude-default-header <opencl_finclude_default_header>`, clang 143supports a fast mechanism to declare builtin functions with 144``-fdeclare-opencl-builtins``. This does not declare the builtin types and 145therefore it has to be used in combination with ``-finclude-default-header`` 146if full functionality is required. 147 148**Example of Use**: 149 150 .. code-block:: console 151 152 $ clang -Xclang -fdeclare-opencl-builtins test.cl 153 154.. _opencl_fake_address_space_map: 155 156.. option:: -ffake-address-space-map 157 158Overrides the target address space map with a fake map. 159This allows adding explicit address space IDs to the bitcode for non-segmented 160memory architectures that do not have separate IDs for each of the OpenCL 161logical address spaces by default. Passing ``-ffake-address-space-map`` will 162add/override address spaces of the target compiled for with the following values: 163``1-global``, ``2-constant``, ``3-local``, ``4-generic``. The private address 164space is represented by the absence of an address space attribute in the IR (see 165also :ref:`the section on the address space attribute <opencl_addrsp>`). 166 167 .. code-block:: console 168 169 $ clang -cc1 -ffake-address-space-map test.cl 170 171.. _opencl_builtins: 172 173OpenCL builtins 174--------------- 175 176**Clang builtins** 177 178There are some standard OpenCL functions that are implemented as Clang builtins: 179 180- All pipe functions from `section 6.13.16.2/6.13.16.3 181 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#160>`_ of 182 the OpenCL v2.0 kernel language specification. 183 184- Address space qualifier conversion functions ``to_global``/``to_local``/``to_private`` 185 from `section 6.13.9 186 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#101>`_. 187 188- All the ``enqueue_kernel`` functions from `section 6.13.17.1 189 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#164>`_ and 190 enqueue query functions from `section 6.13.17.5 191 <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#171>`_. 192 193**Fast builtin function declarations** 194 195The implementation of the fast builtin function declarations (available via the 196:ref:`-fdeclare-opencl-builtins option <opencl_fdeclare_opencl_builtins>`) consists 197of the following main components: 198 199- A TableGen definitions file ``OpenCLBuiltins.td``. This contains a compact 200 representation of the supported builtin functions. When adding new builtin 201 function declarations, this is normally the only file that needs modifying. 202 203- A Clang TableGen emitter defined in ``ClangOpenCLBuiltinEmitter.cpp``. During 204 Clang build time, the emitter reads the TableGen definition file and 205 generates ``OpenCLBuiltins.inc``. This generated file contains various tables 206 and functions that capture the builtin function data from the TableGen 207 definitions in a compact manner. 208 209- OpenCL specific code in ``SemaLookup.cpp``. When ``Sema::LookupBuiltin`` 210 encounters a potential builtin function, it will check if the name corresponds 211 to a valid OpenCL builtin function. If so, all overloads of the function are 212 inserted using ``InsertOCLBuiltinDeclarationsFromTable`` and overload 213 resolution takes place. 214 215OpenCL Extensions and Features 216------------------------------ 217 218Clang implements various extensions to OpenCL kernel languages. 219 220New functionality is accepted as soon as the documentation is detailed to the 221level sufficient to be implemented. There should be an evidence that the 222extension is designed with implementation feasibility in consideration and 223assessment of complexity for C/C++ based compilers. Alternatively, the 224documentation can be accepted in a format of a draft that can be further 225refined during the implementation. 226 227Implementation guidelines 228^^^^^^^^^^^^^^^^^^^^^^^^^ 229 230This section explains how to extend clang with the new functionality. 231 232**Parsing functionality** 233 234If an extension modifies the standard parsing it needs to be added to 235the clang frontend source code. This also means that the associated macro 236indicating the presence of the extension should be added to clang. 237 238The default flow for adding a new extension into the frontend is to 239modify `OpenCLExtensions.def 240<https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/OpenCLExtensions.def>`_ 241 242This will add the macro automatically and also add a field in the target 243options ``clang::TargetOptions::OpenCLFeaturesMap`` to control the exposure 244of the new extension during the compilation. 245 246Note that by default targets like `SPIR` or `X86` expose all the OpenCL 247extensions. For all other targets the configuration has to be made explicitly. 248 249Note that the target extension support performed by clang can be overridden 250with :ref:`-cl-ext <opencl_cl_ext>` command-line flags. 251 252**Library functionality** 253 254If an extension adds functionality that does not modify standard language 255parsing it should not require modifying anything other than header files or 256``OpenCLBuiltins.td`` detailed in :ref:`OpenCL builtins <opencl_builtins>`. 257Most commonly such extensions add functionality via libraries (by adding 258non-native types or functions) parsed regularly. Similar to other languages this 259is the most common way to add new functionality. 260 261Clang has standard headers where new types and functions are being added, 262for more details refer to 263:ref:`the section on the OpenCL Header <opencl_header>`. The macros indicating 264the presence of such extensions can be added in the standard header files 265conditioned on target specific predefined macros or/and language version 266predefined macros. 267 268**Pragmas** 269 270Some extensions alter standard parsing dynamically via pragmas. 271 272Clang provides a mechanism to add the standard extension pragma 273``OPENCL EXTENSION`` by setting a dedicated flag in the extension list entry of 274``OpenCLExtensions.def``. Note that there is no default behavior for the 275standard extension pragmas as it is not specified (for the standards up to and 276including version 3.0) in a sufficient level of detail and, therefore, 277there is no default functionality provided by clang. 278 279Pragmas without detailed information of their behavior (e.g. an explanation of 280changes it triggers in the parsing) should not be added to clang. Moreover, the 281pragmas should provide useful functionality to the user. For example, such 282functionality should address a practical use case and not be redundant i.e. 283cannot be achieved using existing features. 284 285Note that some legacy extensions (published prior to OpenCL 3.0) still 286provide some non-conformant functionality for pragmas e.g. add diagnostics on 287the use of types or functions. This functionality is not guaranteed to remain in 288future releases. However, any future changes should not affect backward 289compatibility. 290 291.. _opencl_addrsp: 292 293Address spaces attribute 294------------------------ 295 296Clang has arbitrary address space support using the ``address_space(N)`` 297attribute, where ``N`` is an integer number in the range specified in the 298Clang source code. This addresses spaces can be used along with the OpenCL 299address spaces however when such addresses spaces converted to/from OpenCL 300address spaces the behavior is not governed by OpenCL specification. 301 302An OpenCL implementation provides a list of standard address spaces using 303keywords: ``private``, ``local``, ``global``, and ``generic``. In the AST and 304in the IR each of the address spaces will be represented by unique number 305provided in the Clang source code. The specific IDs for an address space do not 306have to match between the AST and the IR. Typically in the AST address space 307numbers represent logical segments while in the IR they represent physical 308segments. 309Therefore, machines with flat memory segments can map all AST address space 310numbers to the same physical segment ID or skip address space attribute 311completely while generating the IR. However, if the address space information 312is needed by the IR passes e.g. to improve alias analysis, it is recommended 313to keep it and only lower to reflect physical memory segments in the late 314machine passes. The mapping between logical and target address spaces is 315specified in the Clang's source code. 316 317.. _cxx_for_opencl_impl: 318 319C++ for OpenCL Implementation Status 320==================================== 321 322Clang implements language version 1.0 published in `the official 323release of C++ for OpenCL Documentation 324<https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r2>`_. 325 326Limited support of experimental C++ libraries is described in the :ref:`experimental features <opencl_experimenal>`. 327 328Bugzilla bugs for this functionality are typically prefixed 329with '[C++4OpenCL]' - click `here 330<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=204139&product=clang&query_format=advanced&resolution=---&sh ort_desc=%5BC%2B%2B4OpenCL%5D&short_desc_type=allwordssubstr>`_ 331to view the full bug list. 332 333 334Missing features or with limited support 335---------------------------------------- 336 337- IR generation for global destructors is incomplete (See: 338 `PR48047 <https://llvm.org/PR48047>`_). 339 340.. _opencl_300: 341 342OpenCL C 3.0 Usage 343================================ 344 345OpenCL C 3.0 language standard makes most OpenCL C 2.0 features optional. Optional 346functionality in OpenCL C 3.0 is indicated with the presence of feature-test macros 347(list of feature-test macros is `here <https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#features>`_). 348Command-line flag :ref:`-cl-ext <opencl_cl_ext>` can be used to override features supported by a target. 349 350For cases when there is an associated extension for a specific feature (fp64 and 3d image writes) 351user should specify both (extension and feature) in command-line flag: 352 353 .. code-block:: console 354 355 $ clang -cc1 -cl-std=CL3.0 -cl-ext=+cl_khr_fp64,+__opencl_c_fp64 ... 356 $ clang -cc1 -cl-std=CL3.0 -cl-ext=-cl_khr_fp64,-__opencl_c_fp64 ... 357 358 359OpenCL C 3.0 Implementation Status 360================================ 361 362The following table provides an overview of features in OpenCL C 3.0 and their 363implementation status. 364 365+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 366| Category | Feature | Status | Reviews | 367+==============================+==============================================================+======================+===========================================================================+ 368| Command line interface | New value for ``-cl-std`` flag | :good:`done` | https://reviews.llvm.org/D88300 | 369+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 370| Predefined macros | New version macro | :good:`done` | https://reviews.llvm.org/D88300 | 371+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 372| Predefined macros | Feature macros | :good:`done` | https://reviews.llvm.org/D95776 | 373+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 374| Feature optionality | Generic address space | :none:`unclaimed` | | 375+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 376| Feature optionality | Builtin function overloads with generic address space | :part:`worked on` | https://reviews.llvm.org/D92004 | 377+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 378| Feature optionality | Program scope variables in global memory | :none:`unclaimed` | | 379+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 380| Feature optionality | 3D image writes including builtin functions | :none:`unclaimed` | | 381+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 382| Feature optionality | read_write images including builtin functions | :none:`unclaimed` | | 383+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 384| Feature optionality | C11 atomics memory scopes, ordering and builtin function | :part:`worked on` | https://reviews.llvm.org/D92004 (functions only) | 385+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 386| Feature optionality | Device-side kernel enqueue including builtin functions | :none:`unclaimed` | | 387+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 388| Feature optionality | Pipes including builtin functions | :part:`worked on` | https://reviews.llvm.org/D92004 (functions only) | 389+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 390| Feature optionality | Work group collective functions | :part:`worked on` | https://reviews.llvm.org/D92004 | 391+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 392| New functionality | RGBA vector components | :good:`done` | https://reviews.llvm.org/D99969 | 393+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 394| New functionality | Subgroup functions | :part:`worked on` | https://reviews.llvm.org/D92004 | 395+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 396| New functionality | Atomic mem scopes: subgroup, all devices including functions | :part:`worked on` | https://reviews.llvm.org/D92004 (functions only) | 397+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+ 398 399.. _opencl_experimenal: 400 401Experimental features 402===================== 403 404Clang provides the following new WIP features for the developers to experiment 405and provide early feedback or contribute with further improvements. 406Feel free to contact us on `cfe-dev 407<https://lists.llvm.org/mailman/listinfo/cfe-dev>`_ or via `Bugzilla 408<https://bugs.llvm.org/>`__. 409 410.. _opencl_experimental_cxxlibs: 411 412C++ libraries for OpenCL 413------------------------ 414 415There is ongoing work to support C++ standard libraries from `LLVM's libcxx 416<https://libcxx.llvm.org/>`_ in OpenCL kernel code using C++ for OpenCL mode. 417 418It is currently possible to include `type_traits` from C++17 in the kernel 419sources when the following clang extensions are enabled 420``__cl_clang_function_pointers`` and ``__cl_clang_variadic_functions``, 421see :doc:`LanguageExtensions` for more details. The use of non-conformant 422features enabled by the extensions does not expose non-conformant behavior 423beyond the compilation i.e. does not get generated in IR or binary. 424The extension only appear in metaprogramming 425mechanism to identify or verify the properties of types. This allows to provide 426the full C++ functionality without a loss of portability. To avoid unsafe use 427of the extensions it is recommended that the extensions are disabled directly 428after the header include. 429 430**Example of Use**: 431 432The example of kernel code with `type_traits` is illustrated here. 433 434.. code-block:: c++ 435 436 #pragma OPENCL EXTENSION __cl_clang_function_pointers : enable 437 #pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable 438 #include <type_traits> 439 #pragma OPENCL EXTENSION __cl_clang_function_pointers : disable 440 #pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable 441 442 using sint_type = std::make_signed<unsigned int>::type; 443 444 __kernel void foo() { 445 static_assert(!std::is_same<sint_type, unsigned int>::value); 446 } 447 448The possible clang invocation to compile the example is as follows: 449 450 .. code-block:: console 451 452 $ clang -I<path to libcxx checkout or installation>/include test.clcpp 453 454Note that `type_traits` is a header only library and therefore no extra 455linking step against the standard libraries is required. See full example 456in `Compiler Explorer <https://godbolt.org/z/5WbnTfb65>`_. 457