Support, Getting Involved, and FAQ
==================================

Please do not hesitate to reach out to us on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`_ or join
one of our :ref:`regular calls <calls>`. Some common questions are answered in
the :ref:`faq`.

.. _calls:

Calls
-----

OpenMP in LLVM Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP (and OpenACC) in the LLVM Project, including Clang, optimization, and runtime work.
- Join `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
- Time: Weekly call every Wednesday at 7:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
- Status tracking `page <https://openmp.llvm.org/docs>`__.


OpenMP in Flang Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP and OpenACC in the Flang Project.
- Join `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
- Time: Weekly call every Thursday at 8:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
- Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.


.. _faq:

FAQ
---

.. note::
   The FAQ is a work in progress and most of the expected content is not
   yet available. While you can expect changes, we always welcome feedback and
   additions. Please post on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`__.


Q: How to contribute a patch to the webpage or any other part?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All patches go through the regular `LLVM review process
<https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.


.. _build_offload_capable_compiler:

Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest way to create an offload capable compiler is to use the provided
CMake cache file. This will enable the projects and runtimes necessary for
offloading as well as some extra options.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build
  $> cd build
  $> cmake ../llvm -G Ninja \
       -C ../offload/cmake/caches/Offload.cmake \
       -DCMAKE_BUILD_TYPE=<Debug|Release> \
       -DCMAKE_INSTALL_PREFIX=<PATH>
  $> ninja install

Here, ``-C`` loads the preset cache file, ``CMAKE_BUILD_TYPE`` selects the
build type, and ``CMAKE_INSTALL_PREFIX`` sets where the libraries will be
installed.

To manually build an *effective* OpenMP offload capable compiler, only one
extra CMake option, ``LLVM_ENABLE_RUNTIMES="openmp;offload"``, is needed when
building LLVM (generic information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
are targeted by OpenMP are enabled. That can be done by adjusting the CMake
option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to
AMD and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
Clang will be built with all backends enabled. When OpenMP is built through
``LLVM_ENABLE_RUNTIMES``, it should not also be listed in
``LLVM_ENABLE_PROJECTS``, because the runtimes build already enables it.

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

.. note::
   The compiler that generates the offload code should be the same (version) as
   the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
   can be built by a different compiler.

.. _advanced_builds: https://llvm.org//docs/AdvancedBuilds.html

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP Nvidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set the following CMake option:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_<xy>;...'`` where ``<xy>`` is the numeric
  compute capability of your GPU. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80'`` to target the Nvidia Volta
  and Ampere architectures.


.. _build_amdgpu_offload_capable_compiler:

Q: How to build an OpenMP AMDGPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
required to build the LLVM toolchain and to execute the OpenMP application.
Either install ROCm somewhere that CMake's ``find_package`` can locate it, or
build the required subcomponents, ROCt and ROCr, from source.

The two components used are ROCT-Thunk-Interface (roct) and ROCR-Runtime
(rocr). Roct is the userspace part of the Linux driver; it calls into the
driver that ships with the Linux kernel. From OpenMP's perspective, roct is an
implementation detail of rocr. Rocr is an implementation of `HSA
<http://www.hsafoundation.com>`_.

.. code-block:: sh

  SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
  BUILD_DIR=somewhere
  INSTALL_PREFIX=same-as-llvm-install

  cd $SOURCE_DIR
  git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
    --single-branch
  git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
    --single-branch

  cd $BUILD_DIR && mkdir roct && cd roct
  cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
    -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
  make && make install

  cd $BUILD_DIR && mkdir rocr && cd rocr
  cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=ON
  make && make install

``IMAGE_SUPPORT`` requires building rocr with Clang and is not used by OpenMP.

Provided CMake's ``find_package`` can find the ROCR-Runtime package, LLVM will
build a tool, ``bin/amdgpu-arch``, which prints a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
library, ``libomptarget.rtl.amdgpu.so``, which is linked against rocr.

With those libraries installed, and LLVM then built and installed, try:

.. code-block:: shell

  clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set the following CMake option:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx<xyz>;...'`` where ``<xyz>`` is the
  shader core instruction set architecture. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx906;gfx90a'`` to target AMD GCN5
  and CDNA2 devices.

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``LD_LIBRARY_PATH`` or rpath/runpath is required to find ``libomp.so`` and
``libomptarget.so``.

There is no libc. That is, ``malloc`` and ``printf`` do not exist.
Libm is implemented in terms of the ROCm device library, which will be searched
for if linking with ``-lm``.

Some versions of the driver for the Radeon VII (gfx906) will error unless the
environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is set.

AMDGPU offload is a recent addition to LLVM, and its implementation differs
from the one that has been shipping in ROCm and AOMP for some time. Early
adopters will encounter bugs.

Q: What are the LLVM components used in offloading and how are they found?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The libraries used by an executable compiled for target offloading are:

- ``libomp.so`` (or similar), the host OpenMP runtime
- ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
- plugins loaded by ``libomptarget.so``:

  - ``libomptarget.rtl.amdgpu.so``
  - ``libomptarget.rtl.cuda.so``
  - ``libomptarget.rtl.x86_64.so``
  - ``libomptarget.rtl.ve.so``
  - and others

- dependencies of those plugins, e.g. CUDA/ROCr for NVPTX/AMDGPU

The compiled executable is dynamically linked against a host runtime, e.g.
``libomp.so``, and against the target offloading runtime, ``libomptarget.so``.
These are found like any other dynamic library: by setting rpath or runpath on
the executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system
search path.

``libomptarget.so`` is only supported with the ``clang`` compiler it was built
with. On systems with a globally installed ``libomptarget.so`` this can be
problematic. For this reason, it is recommended to use a `Clang configuration
file <https://clang.llvm.org/docs/UsersManual.html#configuration-files>`__ to
automatically configure the environment. For example, store the following file
as ``openmp.cfg`` next to your ``clang`` executable.

.. code-block:: text

  # Library paths for OpenMP offloading.
  -L '<CFGDIR>/../lib'
  -Wl,-rpath='<CFGDIR>/../lib'

The plugins will try to find their dependencies in a plugin-dependent fashion.

The CUDA plugin is dynamically linked against libcuda if CMake found it at
compiler build time. Otherwise, it will attempt to dlopen ``libcuda.so``. It
does not have rpath set.

The amdgpu plugin is linked against ROCr if CMake found it at compiler build
time. Otherwise, it will attempt to dlopen ``libhsa-runtime64.so``. It has
rpath set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same
directory is a way to locate it without environment variables.
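
As a minimal sketch of the run-time lookup described above, the following
shell snippet prepends library directories to ``LD_LIBRARY_PATH`` so that
``libomp.so``, ``libomptarget.so`` and, for the amdgpu plugin,
``libhsa-runtime64.so`` can be found without an rpath. The prefixes
``/opt/llvm`` and ``/opt/rocm`` and the helper ``prepend_ld_path`` are
assumptions for illustration only; adjust them to your installation.

.. code-block:: shell

  #!/bin/sh
  # Hypothetical install prefixes; override them from the environment to match
  # where LLVM and ROCm are actually installed.
  LLVM_PREFIX=${LLVM_PREFIX:-/opt/llvm}
  ROCM_PREFIX=${ROCM_PREFIX:-/opt/rocm}

  # Prepend a directory to LD_LIBRARY_PATH without adding an empty entry
  # when the variable was previously unset or empty.
  prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  }

  # ROCr (libhsa-runtime64.so), which the amdgpu plugin may dlopen.
  prepend_ld_path "$ROCM_PREFIX/lib"
  # Host runtimes; prepended last, so this directory is searched first.
  prepend_ld_path "$LLVM_PREFIX/lib"
  export LD_LIBRARY_PATH

  echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"

This only affects the current shell and the processes it starts. Embedding an
rpath, for example through the configuration file shown earlier, avoids
depending on the environment at all.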

In addition to those, there is a compiler runtime library called deviceRTL.
This is compiled from mostly common code into an architecture-specific
bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.

Clang and the deviceRTL need to match closely, as the interface between them
changes frequently. Using both from the same monorepo checkout is strongly
recommended.

Unlike the host side, which lets environment variables select components, the
deviceRTL located in the Clang lib directory is preferred. Only if it is
absent is the ``LIBRARY_PATH`` environment variable searched for a bitcode
file with the right name. This can be overridden by passing a Clang flag,
``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``, which
can specify either a directory or an exact bitcode file to use.


Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

Q: Does OpenMP offloading support work in packages distributed as part of my OS?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.


.. _math_and_complex_in_target_regions:

Q: Does Clang support ``<math.h>`` and ``<complex.h>`` operations in OpenMP target regions on GPUs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, LLVM/Clang allows math functions and complex arithmetic inside of OpenMP
target regions that are compiled for GPUs.

Clang provides a set of wrapper headers that are found first when ``math.h``
and ``complex.h`` (for C), ``cmath`` and ``complex`` (for C++), or similar
headers are included by the application. These wrappers will eventually include
the system version of the corresponding header file after setting up a
target-device-specific environment. Including the system header is important
because system headers differ based on the architecture and operating system
and may contain preprocessor, variable, and function definitions that need to
be available in the target region regardless of the targeted device
architecture. However, some functions require specialized device versions,
e.g., ``sin``, and others are only available on certain devices, e.g.,
``__umul64hi``. To provide "native" support for math and complex on the
respective architecture, Clang will wrap the "native" math functions, e.g., as
provided by the device vendor, in an OpenMP begin/end declare variant.
These functions will then be picked up instead of the host versions, while
host-only variables and function definitions are still available. Complex
arithmetic and functions are supported through a similar mechanism. It is
worth noting that this support requires `extensions to the OpenMP begin/end
declare variant context selector
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
that are exposed through LLVM/Clang to the user as well.

Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`.
By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
possible to explicitly perform memory transfers between processes on the host
CPU and run sanitizers while doing so in order to catch these errors.

Q: Can I use dynamically linked libraries with OpenMP offloading?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamically linked libraries can only be used if there is no device code split
between the library and the application. Anything declared on the device inside
the shared library will not be visible to the application when it is linked.

Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enabling the OpenMP runtime will perform a two-stage build for you.
If your host compiler is different from your system-wide compiler, you may need
to add a flag such as ``--gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12`` to
``CMAKE_{C,CXX}_FLAGS`` so that Clang will be able to find the correct GCC
toolchain in the second stage of the build.

For example, if your system-wide GCC installation is too old to build LLVM and
you would like to use a newer GCC, set ``--gcc-install-dir=`` to tell Clang
which GCC installation to use in the second stage.

Q: How can I include OpenMP offloading support in my CMake project?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, there is an experimental CMake find module for OpenMP target
offloading provided by LLVM. It will attempt to find OpenMP target offloading
support for your compiler. If successful, the flags necessary for OpenMP target
offloading will be loaded into the ``OpenMPTarget::OpenMPTarget_<device>``
target or the ``OpenMPTarget_<device>_FLAGS`` variable. Currently supported
devices are ``AMDGPU`` and ``NVPTX``.

To use this module, append its directory to ``CMAKE_MODULE_PATH`` and call
``find_package``. The module will be installed with your OpenMP installation by
default. Including OpenMP offloading support in an application should then only
require a few additions.

.. code-block:: cmake

  cmake_minimum_required(VERSION 3.20.0)
  project(offloadTest VERSION 1.0 LANGUAGES CXX)

  list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")

  find_package(OpenMPTarget REQUIRED NVPTX)

  add_executable(offload)
  target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
  target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)

Using this module requires at least CMake version 3.20.0. Supported languages
are C and C++, with Fortran support planned in the future. Compiler support is
best for Clang, but this module should work for other compiler vendors such as
IBM and GNU.

Q: What does 'Stack size for entry function cannot be statically determined' mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that the Nvidia tools will sometimes emit if the offloading
region is too complex. Normally, the CUDA tools attempt to statically determine
how much stack memory each thread needs.
This way, when the kernel is launched, each thread will have as much memory as
it needs. If the control flow of the kernel is too complex, for example because
it contains recursive calls or nested parallelism, this analysis can fail. If
this warning is triggered, the kernel may run out of stack memory during
execution and crash. The environment variable ``LIBOMPTARGET_STACK_SIZE`` can
be used to increase the stack size if this occurs.

Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows executables to run on different targets,
such as offloading to AMD and NVIDIA GPUs simultaneously, as well as multiple
sub-architectures for the same target. Additionally, members of static
libraries are only extracted for the architectures that are actually used,
allowing users to create generic libraries.

The architecture is specified using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is given, the
targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch``, the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.
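
To illustrate the inference rule described above, the following shell
function maps a single ``--offload-arch`` value to the triple used for the
two GPU families that appear in this FAQ. ``offload_triple_for_arch`` is a
hypothetical helper, not part of LLVM; the actual inference is done by the
Clang driver and covers more cases.

.. code-block:: shell

  #!/bin/sh
  # Hypothetical helper: print the offload triple for one architecture name.
  offload_triple_for_arch() {
    case "$1" in
      gfx*) echo "amdgcn-amd-amdhsa" ;;    # AMD GPUs, e.g. gfx90a
      sm_*) echo "nvptx64-nvidia-cuda" ;;  # NVIDIA GPUs, e.g. sm_80
      *) echo "unknown architecture: $1" >&2; return 1 ;;
    esac
  }

  # Split a comma-separated list, as accepted by --offload-arch=, and map
  # each entry to its triple.
  for arch in $(printf '%s\n' "gfx90a,sm_80" | tr ',' ' '); do
    printf '%s -> %s\n' "$arch" "$(offload_triple_for_arch "$arch")"
  done

Running the snippet prints ``gfx90a -> amdgcn-amd-amdhsa`` followed by
``sm_80 -> nvptx64-nvidia-cuda``.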

For example, an executable that runs on AMDGPU and NVIDIA hardware can be
built, provided that the necessary build tools are installed for both.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80

When only the architectures are given, the triples are inferred; otherwise,
they can be specified manually.

.. code-block:: shell

  clang example.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
     -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
     -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80

When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a,sm_70,sm_80 -c
  llvm-ar rcs libexample.a example.o
  clang app.c libexample.a -fopenmp --offload-arch=gfx90a -o app

The supported device images can be viewed using the ``--offloading`` option with
``llvm-objdump``.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80 -o example
  llvm-objdump --offloading example

  example:  file format elf64-x86-64

  OFFLOADING IMAGE [0]:
  kind            elf
  arch            gfx90a
  triple          amdgcn-amd-amdhsa
  producer        openmp

  OFFLOADING IMAGE [1]:
  kind            elf
  arch            sm_80
  triple          nvptx64-nvidia-cuda
  producer        openmp

Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenMP offloading files can currently be experimentally linked with CUDA and HIP
files. This will allow OpenMP to call a CUDA device function or vice versa.
However, the global state will be distinct between the two images at runtime.
This means any global variables will potentially have different values when
queried from OpenMP or CUDA.

Linking CUDA and HIP currently requires compiling the CUDA or HIP files in a
different compilation mode, enabled with ``--offload-new-driver``, and linking
with ``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.

.. code-block:: shell

  clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
  clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
  clang++ openmp.o cuda.o -fopenmp --offload-link -o app

Q: Are libomptarget and plugins backward compatible?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No. Starting from LLVM 15, libomptarget and the plugins are built as LLVM
libraries. Because LLVM libraries are not backward compatible, libomptarget and
the plugins are not either. Given that fact, the interfaces between 1) the Clang
compiler and libomptarget, 2) the Clang compiler and the device runtime library,
and 3) libomptarget and the plugins are not guaranteed to be compatible with an
earlier version. Users are responsible for ensuring compatibility when not
using the Clang compiler and runtime libraries from the same build.
Nevertheless, in order to better support third-party libraries and toolchains
that depend on existing libomptarget entry points, contributors are discouraged
from modifying them.

Q: Can I use libc functions on the GPU?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM provides basic ``libc`` functionality through the LLVM C Library. For
building instructions, refer to the associated `LLVM libc documentation
<https://libc.llvm.org/gpu/using.html#building-the-gpu-library>`_.
Once built, this provides a static library called ``libcgpu.a``. See the
documentation for a list of `supported functions
<https://libc.llvm.org/gpu/support.html>`_ as well. To utilize these functions,
simply link this library as any other when building with OpenMP.

.. code-block:: shell

  clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu

For more information on how this is implemented in LLVM/OpenMP's offloading
runtime, refer to the :ref:`runtime documentation <libomptarget_libc>`.

Q: What command line options can I use for OpenMP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We recommend taking a look at the OpenMP
:doc:`command line argument reference <CommandLineArgumentReference>` page.

Q: Can I build the offloading runtimes without CUDA or HSA?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, the offloading runtime will load the associated vendor runtime
during initialization rather than linking against it directly. This allows the
program to be built and run on many machines. If you wish to link directly
against these libraries, use the ``LIBOMPTARGET_DLOPEN_PLUGINS=""`` option to
disable this behavior for every plugin. The default value is every plugin
enabled with ``LIBOMPTARGET_PLUGINS_TO_BUILD``.

Q: Why is my build taking a long time?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When installing OpenMP and other LLVM components, the build time on multicore
systems can be significantly reduced with parallel build jobs. As suggested in
*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja``
as the generator. This can be done with the CMake option ``-G Ninja``.
Afterward, use ``ninja install`` and specify the number of parallel jobs with
``-j``. The build time can also be reduced by setting the build type to
``Release`` with the ``CMAKE_BUILD_TYPE`` option. Recompilation can also be
sped up by caching previous compilations. Consider enabling ``ccache`` with
``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.

Q: Did this FAQ not answer your question?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Feel free to post questions or browse old threads at
`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.