OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command line argument reference. The content is
not a complete list of arguments but includes the essential command-line
arguments you may need when compiling and linking OpenMP.
Section :ref:`general_command_line_arguments` lists OpenMP command line options
for multicore programming while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.

``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
This option enables OpenMP only for single instruction, multiple data
(SIMD) constructs.
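
As a minimal sketch of what ``-fopenmp-simd`` acts on, consider the loop below.
The function name ``dot`` and the file name ``dot.c`` are hypothetical. Built
with something like ``clang -fopenmp-simd dot.c``, the ``simd`` directive should
be honored while non-SIMD OpenMP constructs are not; built without any OpenMP
option, the pragma is ignored and the loop runs as plain C.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical helper: dot product of two arrays of length n. */
   static float dot(const float *a, const float *b, int n) {
       float sum = 0.0f;
       /* Honored under -fopenmp and -fopenmp-simd; ignored otherwise. */
       #pragma omp simd reduction(+:sum)
       for (int i = 0; i < n; ++i)
           sum += a[i] * b[i];
       return sum;
   }

   int main(void) {
       float a[8], b[8];
       for (int i = 0; i < 8; ++i) {
           a[i] = (float)i;  /* 0, 1, ..., 7 */
           b[i] = 2.0f;
       }
       printf("%.1f\n", dot(a, b, 8));  /* prints 56.0 */
       return 0;
   }
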

``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard.
For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
the OpenMP standard. The default value is ``-fopenmp-version=51`` for ``Clang``.

.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
  often optional when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance
  ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
  ``--offload-arch=sm_80,gfx90a`` to target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or
  ``nvptx-arch`` will be executed as part of the compiler driver to
  detect the device architecture automatically.
| Finally, the device architecture will also be automatically inferred with
  ``--offload-arch=native``.

``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly for
debugging purposes. It is primarily used for inspecting the intermediate
representation (IR) output when compiling for the device. It may also be used
if device-only runtimes are created.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. That is the
default option.

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.
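
The offloading options described so far can be tried on a small target region.
The sketch below assumes a hypothetical file ``saxpy.c`` and an Nvidia ``sm_80``
device; the function name ``saxpy`` and the invocations in the comment are
illustrative assumptions, so substitute the architecture of your own device.
Built without any OpenMP option, the pragma is ignored and the loop runs on the
host.

.. code-block:: c

   #include <stdio.h>

   #define N 1024

   /*
    * Possible invocations (assumptions, adjust to your system):
    *   clang -fopenmp --offload-arch=sm_80 saxpy.c -o saxpy
    *   clang -fopenmp -fopenmp-targets=nvptx64 \
    *         -Xopenmp-target -march=sm_80 saxpy.c -o saxpy
    */

   /* Hypothetical kernel: y = a * x + y, computed in a target region. */
   static void saxpy(float a, const float *x, float *y, int n) {
       #pragma omp target teams distribute parallel for \
           map(to: x[0:n]) map(tofrom: y[0:n])
       for (int i = 0; i < n; ++i)
           y[i] = a * x[i] + y[i];
   }

   int main(void) {
       static float x[N], y[N];
       for (int i = 0; i < N; ++i) {
           x[i] = 1.0f;
           y[i] = 2.0f;
       }
       saxpy(3.0f, x, y, N);
       printf("%.1f\n", y[N - 1]);  /* prints 5.0 */
       return 0;
   }
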

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. That is especially useful when an argument must differ for each
triple. For instance ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
architecture. Alternatively, :ref:`Xarch_host` and :ref:`Xarch_device` can
pass an argument to the host and device compilation toolchain.

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains.
If no argument is set, this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| This option is set to avoid generating the host fallback code
  executed when offloading to the device fails. That is
  helpful when the target contains code that cannot be compiled for the host, for
  instance, if it contains unguarded device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used when one wants to verify that the code is being
  offloaded to the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to
  the device.

``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Note that it is
necessary both to configure the debugging in the device runtime at compile-time
with ``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with the
environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, it is
currently only supported for Nvidia targets as of July 2023. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_.
The debugging instructions list the supported debugging arguments.

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed
  LLVM-IR for the device code in the object files rather than binary code for the
  respective target. At runtime, the LLVM-IR is optimized again and compiled for
  the target device. The optimization level can be set at runtime with
  ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after the
  JIT and more.
| We want to emphasize that JIT for OpenMP offloading is good for debugging as
  the target IR can be extracted, modified, and injected at runtime.

``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, enabling this
option for experimental linking with CUDA or HIP files is necessary.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP.
This option enables the new offloading linker in toolchains that do not use it
automatically. It is necessary to enable this option when linking with CUDA or
HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.
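
As noted under ``-fopenmp-offload-mandatory``, the environment variable
``OMP_TARGET_OFFLOAD`` is the recommended way to confirm that code is being
offloaded. A program can also report where a target region ran. The sketch
below is one possible check, not part of the LLVM tooling: the helper name
``ran_on_host`` is hypothetical, and the code falls back to plain host
execution when built without OpenMP.

.. code-block:: c

   #include <stdio.h>
   #ifdef _OPENMP
   #include <omp.h>
   #endif

   /* Hypothetical helper: returns 1 if the target region ran on the host,
      0 if it ran on an offload device. */
   static int ran_on_host(void) {
       int on_host = 1;
       #pragma omp target map(from: on_host)
       {
   #ifdef _OPENMP
           /* Standard OpenMP API routine, evaluated inside the region. */
           on_host = omp_is_initial_device();
   #else
           on_host = 1;  /* no OpenMP: the block is ordinary host code */
   #endif
       }
       return on_host;
   }

   int main(void) {
       printf("target region ran on the %s\n",
              ran_on_host() ? "host" : "device");
       return 0;
   }

Running the resulting binary with ``OMP_TARGET_OFFLOAD=MANDATORY`` set should
make the runtime report an error instead of silently falling back to the host.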