OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command-line argument reference. This page is
not a complete list of arguments, but it covers the essential command-line
arguments you may need when compiling and linking OpenMP programs.
Section :ref:`general_command_line_arguments` lists OpenMP command-line options
for multicore programming, while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.
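
As a minimal sketch of what this flag enables, the C program below sums a range
of integers with a ``parallel for`` directive. The file name ``sum.c`` and the
helper ``parallel_sum`` are illustrative assumptions, not part of LLVM. When
built with ``clang -fopenmp sum.c``, the loop is expected to be shared among
threads; without the flag, the directive is ignored and the loop runs serially
with the same result.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical helper: sum the integers 0 .. n-1. */
   static long parallel_sum(int n) {
     long sum = 0;
     /* Honored only when compiling with -fopenmp; otherwise ignored. */
   #pragma omp parallel for reduction(+ : sum)
     for (int i = 0; i < n; ++i)
       sum += i;
     return sum;
   }

   int main(void) {
     printf("%ld\n", parallel_sum(1000)); /* prints 499500 */
     return 0;
   }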

``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
Enable OpenMP only for single instruction, multiple data (SIMD) constructs.

``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Select version ``<arg>`` of the OpenMP standard. For example, you may use
``-fopenmp-version=45`` to select version 4.5 of the OpenMP standard. The
default value is ``-fopenmp-version=51`` for ``Clang``.
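
One way to check which version is in effect is to print the standard
``_OPENMP`` macro, which encodes the release date of the selected
specification as ``yyyymm``. The sketch below assumes that the compiler derives
this macro from ``-fopenmp-version``; the helper ``openmp_version_name`` is
hypothetical and only covers a few known dates.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical helper: map an _OPENMP date (yyyymm) to a version label. */
   static const char *openmp_version_name(long date) {
     switch (date) {
     case 201307: return "4.0";
     case 201511: return "4.5";
     case 201811: return "5.0";
     case 202011: return "5.1";
     case 202111: return "5.2";
     default:     return "unknown";
     }
   }

   int main(void) {
   #ifdef _OPENMP
     /* _OPENMP is only defined when OpenMP is enabled, e.g. with -fopenmp. */
     printf("OpenMP %s (_OPENMP=%ld)\n", openmp_version_name(_OPENMP),
            (long)_OPENMP);
   #else
     printf("OpenMP is disabled (%s)\n", openmp_version_name(0));
   #endif
     return 0;
   }

For instance, a build with ``-fopenmp -fopenmp-version=45`` would be expected
to report version 4.5.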

.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option can
  often be omitted when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance, use
  ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
  ``--offload-arch=sm_80,gfx90a`` to target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the compiler driver runs the executables
  ``amdgpu-arch`` or ``nvptx-arch`` to detect the device architecture
  automatically.
| Finally, the device architecture will also be automatically inferred with
  ``--offload-arch=native``.
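
To illustrate the kind of code these options apply to, here is a minimal sketch
of a ``target`` region. The file name ``saxpy.c`` and the helper ``saxpy`` are
illustrative assumptions. A build such as
``clang -fopenmp --offload-arch=sm_80 saxpy.c`` would be expected to compile
the loop for the device; if no device is available at runtime, the region
normally falls back to the host.

.. code-block:: c

   #include <stdio.h>

   #define N 1024

   /* Hypothetical helper: y = a * x + y, offloaded when a device is available. */
   static void saxpy(float a, const float *x, float *y, int n) {
   #pragma omp target teams distribute parallel for \
       map(to : x[0 : n]) map(tofrom : y[0 : n])
     for (int i = 0; i < n; ++i)
       y[i] = a * x[i] + y[i];
   }

   int main(void) {
     static float x[N], y[N];
     for (int i = 0; i < N; ++i) {
       x[i] = 1.0f;
       y[i] = 2.0f;
     }
     saxpy(3.0f, x, y, N);
     printf("%.1f\n", y[0]); /* prints 5.0 */
     return 0;
   }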

``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly for
debugging purposes, primarily for inspecting the intermediate representation
(IR) output when compiling for the device. It may also be used if device-only
runtimes are created.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. This is the
default.

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. This is especially useful when an argument must differ for each
triple. For instance, use ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
architecture. Alternatively, :ref:`Xarch_host` and :ref:`Xarch_device` can
pass an argument to the host and device compilation toolchain.

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains. If no argument is set,
this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Do not generate the host fallback code that is executed when offloading to
  the device fails. This is helpful when the target region contains code that
  cannot be compiled for the host, for instance, if it contains unguarded
  device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used when one wants to verify that the code is being
  offloaded to the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to
  the device.
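
As a complementary sketch of such a check, the program below asks the OpenMP
runtime where a ``target`` region actually ran, using the standard routine
``omp_is_initial_device``. The helper ``target_ran_on_host`` is hypothetical,
and the ``_OPENMP`` guards are an assumption made so that the file also
compiles without ``-fopenmp``, in which case it always reports the host.

.. code-block:: c

   #include <stdio.h>
   #ifdef _OPENMP
   #include <omp.h>
   #endif

   /* Hypothetical helper: 1 if the target region ran on the host, 0 if it ran
      on a device. */
   static int target_ran_on_host(void) {
     int on_host = 1;
   #ifdef _OPENMP
   #pragma omp target map(tofrom : on_host)
     { on_host = omp_is_initial_device(); }
   #endif
     return on_host;
   }

   int main(void) {
     printf("target region ran on the %s\n",
            target_ran_on_host() ? "host" : "device");
     return 0;
   }

Running the resulting binary with ``OMP_TARGET_OFFLOAD='MANDATORY'`` set is
expected to produce a runtime error instead of a silent host fallback when no
device can be used.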

``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Note that it is necessary
both to configure debugging in the device runtime at compile time with
``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with the
environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, it is
currently only supported for Nvidia targets as of July 2023. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_,
which list the supported debugging arguments.

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. This
  embeds LLVM-IR for the device code in the object files rather than binary
  code for the respective target. At runtime, the LLVM-IR is optimized again
  and compiled for the target device. The optimization level can be set at
  runtime with ``LIBOMPTARGET_JIT_OPT_LEVEL``; for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponds to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after the
  JIT and more.
| JIT for OpenMP offloading is particularly useful for debugging, as the
  target IR can be extracted, modified, and injected at runtime.

``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, this option must
be enabled for experimental linking with CUDA or HIP files.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP. This
option enables the new offloading linker in toolchains that do not use it
automatically. It must be enabled when linking with CUDA or HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.