Support, Getting Involved, and FAQ
==================================

Please do not hesitate to reach out to us on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`_ or join
one of our :ref:`regular calls <calls>`. Some common questions are answered in
the :ref:`faq`.

.. _calls:

Calls
-----

OpenMP in LLVM Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-   Development updates on OpenMP (and OpenACC) in the LLVM Project, including Clang, optimization, and runtime work.
-   Join `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
-   Time: Weekly, every Wednesday at 7:00 AM Pacific time.
-   Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
-   Status tracking `page <https://openmp.llvm.org/docs>`__.


OpenMP in Flang Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-   Development updates on OpenMP and OpenACC in the Flang Project.
-   Join `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
-   Time: Weekly, every Thursday at 8:00 AM Pacific time.
-   Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
-   Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.


.. _faq:

FAQ
---

.. note::
   The FAQ is a work in progress and most of the expected content is not
   yet available. While you can expect changes, we always welcome feedback and
   additions. Please post on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`__.


Q: How to contribute a patch to the webpage or any other part?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All patches go through the regular `LLVM review process
<https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.


.. _build_offload_capable_compiler:

Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The easiest way to create an offload capable compiler is to use the provided
CMake cache file. This will enable the projects and runtimes necessary for
offloading as well as some extra options.

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build
  $> cd build
  $> # -C loads the preset cache file; then select the build type and
  $> # where the libraries will live.
  $> cmake ../llvm -G Ninja \
     -C ../offload/cmake/caches/Offload.cmake \
     -DCMAKE_BUILD_TYPE=<Debug|Release> \
     -DCMAKE_INSTALL_PREFIX=<PATH>
  $> ninja install

To manually build an *effective* OpenMP offload capable compiler, only one extra CMake
option, ``LLVM_ENABLE_RUNTIMES="openmp;offload"``, is needed when building LLVM (generic
information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
are targeted by OpenMP are enabled. That can be done by adjusting the CMake
option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
Clang will be built with all backends enabled. When building with
``LLVM_ENABLE_RUNTIMES="openmp"``, OpenMP should not be enabled in
``LLVM_ENABLE_PROJECTS`` because it is enabled by default.

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

.. note::
  The compiler that generates the offload code should be the same (version) as
  the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
  can be built by a different compiler.

.. _advanced_builds: https://llvm.org//docs/AdvancedBuilds.html

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP Nvidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_<xy>;...'`` where ``<xy>`` is the numeric
  compute capability of your GPU. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80'`` to target the Nvidia Volta
  and Ampere architectures.


.. _build_amdgpu_offload_capable_compiler:

Q: How to build an OpenMP AMDGPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
required to build the LLVM toolchain and to execute the OpenMP application.
Either install ROCm somewhere that CMake's ``find_package`` can locate it, or
build the required subcomponents ROCt and ROCr from source.

The two components used are ROCT-Thunk-Interface (ROCt) and ROCR-Runtime (ROCr).
ROCt is the userspace part of the Linux driver. It calls into the driver which
ships with the Linux kernel. It is an implementation detail of ROCr from
OpenMP's perspective. ROCr is an implementation of `HSA
<http://www.hsafoundation.com>`_.

.. code-block:: text

  SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
  BUILD_DIR=somewhere
  INSTALL_PREFIX=same-as-llvm-install

  cd $SOURCE_DIR
  git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
    --single-branch
  git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
    --single-branch

  cd $BUILD_DIR && mkdir roct && cd roct
  cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
    -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
  make && make install

  cd $BUILD_DIR && mkdir rocr && cd rocr
  cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=ON
  make && make install

``IMAGE_SUPPORT`` requires building ROCr with Clang and is not used by OpenMP.

Provided CMake's ``find_package`` can find the ROCR-Runtime package, LLVM will
build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
library, ``libomptarget.rtl.amdgpu.so``, which is linked against ROCr.

With those libraries installed, and LLVM then built and installed, try:

.. code-block:: shell

    clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example

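The command above assumes an ``example.c`` that contains an OpenMP target
region. As a minimal sketch, the offloaded part of such a file could look like
the following. The function name ``saxpy`` and its signature are illustrative
assumptions and do not come from the LLVM sources.

.. code-block:: c

  #include <stddef.h>

  /* Compute y[i] = a * x[i] + y[i] inside an OpenMP target region.
   * x is copied to the device; y is copied to the device and back.
   * If no device is available, or OpenMP is disabled, the loop runs on the host. */
  void saxpy(size_t n, float a, const float *x, float *y) {
  #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }

Calling ``saxpy`` from a ``main`` function and printing a few elements of ``y``
is enough to check that the toolchain produces a working executable.
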
If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx<xyz>;...'`` where ``<xyz>`` is the
  shader core instruction set architecture. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx906;gfx90a'`` to target AMD GCN5
  and CDNA2 devices.

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``LD_LIBRARY_PATH`` or rpath/runpath is required to find ``libomp.so`` and ``libomptarget.so``.

There is no libc. That is, ``malloc`` and ``printf`` do not exist. Libm is implemented in terms
of the ROCm device library, which will be searched for if linking with ``-lm``.

Some versions of the driver for the Radeon VII (gfx906) will error unless the
environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is set.

AMDGPU offload is a recent addition to LLVM and the implementation differs from that which
has been shipping in ROCm and AOMP for some time. Early adopters will encounter
bugs.

Q: What are the LLVM components used in offloading and how are they found?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The libraries used by an executable compiled for target offloading are:

- ``libomp.so`` (or similar), the host OpenMP runtime
- ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
- plugins loaded by ``libomptarget.so``:

  - ``libomptarget.rtl.amdgpu.so``
  - ``libomptarget.rtl.cuda.so``
  - ``libomptarget.rtl.x86_64.so``
  - ``libomptarget.rtl.ve.so``
  - and others

- dependencies of those plugins, e.g. cuda/rocr for nvptx/amdgpu

The compiled executable is dynamically linked against a host runtime, e.g.
``libomp.so``, and against the target offloading runtime, ``libomptarget.so``. These
are found like any other dynamic library, by setting rpath or runpath on the
executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system search
path.

``libomptarget.so`` is only supported with the associated ``clang``
compiler. On systems with a globally installed ``libomptarget.so`` this can be
problematic. For this reason it is recommended to use a `Clang configuration
file <https://clang.llvm.org/docs/UsersManual.html#configuration-files>`__ to
automatically configure the environment. For example, store the following file
as ``openmp.cfg`` next to your ``clang`` executable.

.. code-block:: text

  # Library paths for OpenMP offloading.
  -L '<CFGDIR>/../lib'
  -Wl,-rpath='<CFGDIR>/../lib'

The plugins will try to find their dependencies in a plugin-dependent fashion.

The cuda plugin is dynamically linked against libcuda if CMake found it at
compiler build time. Otherwise it will attempt to dlopen ``libcuda.so``. It does
not have rpath set.

The amdgpu plugin is linked against ROCr if CMake found it at compiler build
time. Otherwise it will attempt to dlopen ``libhsa-runtime64.so``. It has rpath
set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same directory is a
way to locate it without environment variables.

In addition to those, there is a compiler runtime library called deviceRTL.
This is compiled from mostly common code into an architecture specific
bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.

Clang and the deviceRTL need to match closely as the interface between them
changes frequently. Using both from the same monorepo checkout is strongly
recommended.

Unlike the host side, which lets environment variables select components, the
deviceRTL that is located in the clang lib directory is preferred. Only if
it is absent is the ``LIBRARY_PATH`` environment variable searched to find a
bitcode file with the right name. This can be overridden by passing a clang
flag, ``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``. That
can specify a directory or an exact bitcode file to use.


Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

Q: Does OpenMP offloading support work in packages distributed as part of my OS?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.


.. _math_and_complex_in_target_regions:

Q: Does Clang support ``<math.h>`` and ``<complex.h>`` operations in OpenMP target on GPUs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, LLVM/Clang allows math functions and complex arithmetic inside of OpenMP
target regions that are compiled for GPUs.

Clang provides a set of wrapper headers that are found first when ``math.h`` and
``complex.h``, for C, ``cmath`` and ``complex``, for C++, or similar headers are
included by the application. These wrappers will eventually include the system
version of the corresponding header file after setting up a target device
specific environment. It is important that the system headers are included
because they differ based on the architecture and operating system and may
contain preprocessor, variable, and function definitions that need to be
available in the target region regardless of the targeted device architecture.
However, various functions may require specialized device versions, e.g.,
``sin``, and others are only available on certain devices, e.g., ``__umul64hi``. To
provide "native" support for math and complex on the respective architecture,
Clang will wrap the "native" math functions, e.g., as provided by the device
vendor, in an OpenMP begin/end declare variant. These functions will then be
picked up instead of the host versions while host only variables and function
definitions are still available. Complex arithmetic and functions are supported
through a similar mechanism. It is worth noting that this support requires
`extensions to the OpenMP begin/end declare variant context selector
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
that are exposed through LLVM/Clang to the user as well.

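As an illustration of complex arithmetic inside a target region, the following
sketch counts Mandelbrot escape iterations for one point. The function name
``escape_iterations`` is a hypothetical example; only standard ``<complex.h>``
facilities are used, and the wrapper headers described above are what make the
same source usable for the device.

.. code-block:: c

  #include <complex.h>

  /* Iterate z = z * z + c, starting from z = 0, inside a target region.
   * Returns the number of iterations performed before |z|^2 exceeds 4,
   * capped at max_iter. */
  int escape_iterations(double re, double im, int max_iter) {
    int iters = 0;
  #pragma omp target map(tofrom: iters)
    {
      double complex c = re + im * I;
      double complex z = 0.0;
      while (iters < max_iter &&
             creal(z) * creal(z) + cimag(z) * cimag(z) <= 4.0) {
        z = z * z + c; /* complex multiply and add */
        ++iters;
      }
    }
    return iters;
  }

The scalar arguments are implicitly copied into the target region, so only the
result needs an explicit ``map`` clause.
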
Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`.
By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
possible to explicitly perform memory transfers between processes on the host
CPU and run sanitizers while doing so in order to catch these errors.

Q: Can I use dynamically linked libraries with OpenMP offloading?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamically linked libraries can only be used if there is no device code split
between the library and application. Anything declared on the device inside the
shared library will not be visible to the application when it's linked.

Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enabling the OpenMP runtime will perform a two-stage build for you.
If your host compiler is different from your system-wide compiler, you may need
to add a flag such as ``--gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12`` to
``CMAKE_{C,CXX}_FLAGS`` so that clang will be able to find the correct GCC
toolchain in the second stage of the build.

For example, if your system-wide GCC installation is too old to build LLVM and
you would like to use a newer GCC, set ``--gcc-install-dir=``
to inform clang of the GCC installation you would like to use in the second stage.

Q: How can I include OpenMP offloading support in my CMake project?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, there is an experimental CMake find module for OpenMP target
offloading provided by LLVM. It will attempt to find OpenMP target offloading
support for your compiler. The flags necessary for OpenMP target offloading will
be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
devices are ``AMDGPU`` and ``NVPTX``.

To use this module, simply add the path to CMake's current module path and call
``find_package``. The module will be installed with your OpenMP installation by
default. Including OpenMP offloading support in an application should now only
require a few additions.

.. code-block:: cmake

  cmake_minimum_required(VERSION 3.20.0)
  project(offloadTest VERSION 1.0 LANGUAGES CXX)

  list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")

  find_package(OpenMPTarget REQUIRED NVPTX)

  add_executable(offload)
  target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
  target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)

Using this module requires at least CMake version 3.20.0. Supported languages
are C and C++, with Fortran support planned in the future. Compiler support is
best for Clang, but this module should work for other compiler vendors such as
IBM and GNU.

Q: What does 'Stack size for entry function cannot be statically determined' mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that the Nvidia tools will sometimes emit if the offloading
region is too complex. Normally, the CUDA tools attempt to statically determine
how much stack memory each thread needs. This way, when the kernel is launched,
each thread will have as much memory as it needs. If the control flow of the
kernel is too complex, containing recursive calls or nested parallelism, this
analysis can fail. If this warning is triggered, it means that the kernel may
run out of stack memory during execution and crash. The environment variable
``LIBOMPTARGET_STACK_SIZE`` can be used to increase the stack size if this
occurs.

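A recursive call in device code is one kind of control flow that can defeat the
static stack analysis. The sketch below is a hypothetical example (the names
``fib`` and ``fib_on_device`` are assumptions for illustration); whether the
warning actually appears depends on the toolchain and optimization level.

.. code-block:: c

  #pragma omp declare target
  /* Recursion means the maximum stack depth is not known at compile time. */
  static int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  #pragma omp end declare target

  /* Run the recursive function inside a target region and copy the result back. */
  int fib_on_device(int n) {
    int result = 0;
  #pragma omp target map(from: result)
    result = fib(n);
    return result;
  }

If a kernel like this crashes for larger inputs, running the program with
``LIBOMPTARGET_STACK_SIZE`` set to a larger value is the first thing to try.
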
Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows for executables to be run on different
targets, such as offloading to AMD and NVIDIA GPUs simultaneously, as well as
multiple sub-architectures for the same target. Additionally, static libraries
will only extract archive members if an architecture is used, allowing users to
create generic libraries.

The architecture can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present, then
the targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch``, then the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.

For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
given that the necessary build tools are installed for both.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80

Given just the architectures, the triples can usually be inferred; otherwise
they can be specified manually.

.. code-block:: shell

   clang example.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
      -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
      -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80

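To check at run time which devices such an executable can actually use, a
program can launch a small target region on each device the runtime reports.
This is a sketch using only standard OpenMP API calls; the function name
``count_offloaded_regions`` is an assumption for illustration, and the
``_OPENMP`` guards merely keep the file compilable without ``-fopenmp``.

.. code-block:: c

  #ifdef _OPENMP
  #include <omp.h>
  #endif

  /* Launch one tiny target region per device and count how many of them
   * executed on a device rather than falling back to the host. */
  int count_offloaded_regions(void) {
    int offloaded = 0;
  #ifdef _OPENMP
    int num_devices = omp_get_num_devices();
  #else
    int num_devices = 0; /* Built without OpenMP: nothing to offload to. */
  #endif
    for (int d = 0; d < num_devices; ++d) {
      int on_host = 1;
  #pragma omp target device(d) map(tofrom: on_host)
      {
  #ifdef _OPENMP
        on_host = omp_is_initial_device() ? 1 : 0;
  #endif
      }
      if (!on_host)
        ++offloaded;
    }
    return offloaded;
  }

The result depends entirely on the hardware and plugins present on the machine
running the program, so it is best treated as a diagnostic.
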
When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx90a,sm_70,sm_80 -c
   llvm-ar rcs libexample.a example.o
   clang app.c -L. -lexample -fopenmp --offload-arch=gfx90a -o app

The supported device images can be viewed using the ``--offloading`` option with
``llvm-objdump``.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80 -o example
   llvm-objdump --offloading example

   example:  file format elf64-x86-64

   OFFLOADING IMAGE [0]:
   kind            elf
   arch            gfx90a
   triple          amdgcn-amd-amdhsa
   producer        openmp

   OFFLOADING IMAGE [1]:
   kind            elf
   arch            sm_80
   triple          nvptx64-nvidia-cuda
   producer        openmp

Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenMP offloading files can currently be experimentally linked with CUDA and HIP
files. This will allow OpenMP to call a CUDA device function or vice-versa.
However, the global state will be distinct between the two images at runtime.
This means any global variables will potentially have different values when
queried from OpenMP or CUDA.

Linking with CUDA and HIP currently requires enabling a different compilation
mode for CUDA / HIP with ``--offload-new-driver`` and linking with
``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.

.. code-block:: shell

   clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
   clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
   clang++ openmp.o cuda.o --offload-link -o app

Q: Are libomptarget and plugins backward compatible?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No. libomptarget and plugins are now built as LLVM libraries starting from LLVM
15. Because LLVM libraries are not backward compatible, libomptarget and plugins
are not either. Given that fact, the interfaces between 1) the Clang compiler
and libomptarget, 2) the Clang compiler and device runtime library, and
3) libomptarget and plugins are not guaranteed to be compatible with an earlier
version. Users are responsible for ensuring compatibility when not using the
Clang compiler and runtime libraries from the same build. Nevertheless, in order
to better support third-party libraries and toolchains that depend on existing
libomptarget entry points, contributors are discouraged from making
modifications to them.

Q: Can I use libc functions on the GPU?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM provides basic ``libc`` functionality through the LLVM C Library. For
building instructions, refer to the associated `LLVM libc documentation
<https://libc.llvm.org/gpu/using.html#building-the-gpu-library>`_. Once built,
this provides a static library called ``libcgpu.a``. See the documentation for a
list of `supported functions <https://libc.llvm.org/gpu/support.html>`_ as well.
To utilize these functions, simply link this library as any other when building
with OpenMP.

.. code-block:: shell

   clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu

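As a sketch of what this enables, the function below calls ``printf`` from
inside a target region. It assumes that ``printf`` is among the supported
functions for your build of the GPU library (check the list linked above); the
function name ``print_from_target`` is a hypothetical example.

.. code-block:: c

  #include <stdio.h>

  /* Call a libc function from inside a target region.
   * Returns the value reported by printf (the number of characters written). */
  int print_from_target(void) {
    int printed = 0;
  #pragma omp target map(tofrom: printed)
    printed = printf("hello from the target region\n");
    return printed;
  }

When no device is available the region falls back to the host and uses the
host ``printf`` instead.
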
For more information on how this is implemented in LLVM/OpenMP's offloading
runtime, refer to the `runtime documentation <libomptarget_libc>`_.

Q: What command line options can I use for OpenMP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We recommend taking a look at the OpenMP
:doc:`command line argument reference <CommandLineArgumentReference>` page.

Q: Can I build the offloading runtimes without CUDA or HSA?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, the offloading runtime will load the associated vendor runtime
during initialization rather than directly linking against it. This allows the
program to be built and run on many machines. If you wish to directly link
against these libraries, use the ``LIBOMPTARGET_DLOPEN_PLUGINS=""`` option to
suppress dynamic loading for each plugin. The default value is every plugin
enabled with ``LIBOMPTARGET_PLUGINS_TO_BUILD``.

Q: Why is my build taking a long time?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When installing OpenMP and other LLVM components, the build time on multicore
systems can be significantly reduced with parallel build jobs. As suggested in
*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
generator. This can be done by passing ``-G Ninja`` to ``cmake``. Afterward,
use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
time can also be reduced by setting the build type to ``Release`` with the
``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
compilations. Consider enabling ``Ccache`` with
``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.

Q: Did this FAQ not answer your question?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Feel free to post questions or browse old threads at
`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.