library for offloading languages such as OpenMP, CUDA, or HIP. These aim to

Offloading usage
----------------
Offloading languages like CUDA, HIP, or OpenMP work by compiling a single source

by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
through the ``--offload-new-driver`` and ``-fgpu-rdc`` flags.

device linker job. This can be done using the ``-Xoffload-linker`` option, which

.. code-block:: sh

   $> clang openmp.c -fopenmp --offload-arch=gfx90a -Xoffload-linker -lc
   $> clang cuda.cu --offload-arch=sm_80 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc
   $> clang hip.hip --offload-arch=gfx940 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc
required by the user's application. Normally, using the ``-fgpu-rdc`` option
results in sub-par performance because the device code is linked separately
without cross-module optimization. However, the offloading toolchain supports
the ``-foffload-lto`` option to enable LTO on the target device.
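For example, a sketch of the earlier OpenMP invocation with device LTO enabled
(the same example architecture and ``-lc`` link flag are simply reused here;
adjust for your system):

.. code-block:: sh

   $> clang openmp.c -fopenmp --offload-arch=gfx90a -foffload-lto -Xoffload-linker -lc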
``declare target`` pragma in OpenMP. This requires that the LLVM C library
exposes its implemented functions to the compiler when it is used in a build. We
do this through a set of wrapper headers. These are located in
``<clang-resource-dir>/include/llvm-libc-wrappers`` in your installation.
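As a rough illustration (a sketch, not the actual contents of the wrapper
headers), the ``declare target`` mechanism exposes a C library function to
OpenMP device code along these lines:

.. code-block:: c++

   #include <stdio.h>

   // Hypothetical sketch: mark a libc declaration as available on the device.
   #pragma omp begin declare target
   int fputs(const char *str, FILE *stream);
   #pragma omp end declare target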
server example<libc_gpu_cuda_server>`. The OpenMP Offloading toolchain will
automatically handle including the necessary libraries, define device-side
interfaces, and run the RPC server.
OpenMP Offloading example
-------------------------

This section provides a simple example of compiling an OpenMP program with the
GPU C library.

.. code-block:: c++

   { fputs("Hello from OpenMP!\n", file); }
This can simply be compiled like any other OpenMP application to print from two

.. code-block:: sh

   $> clang openmp.c -fopenmp --offload-arch=gfx90a
   Hello from OpenMP!
   Hello from OpenMP!
   Hello from OpenMP!
   Hello from OpenMP!
Direct compilation
------------------

method that the GPU C library uses both to build the library and to run tests.

on the compiler's intrinsic and built-in functions. For example, the following

.. code-block:: c++
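   /* Hedged sketch: the original listing is not shown in these matched lines.
      A small function like this, which returns the thread ID through compiler
      built-ins, fits the description above. */
   #include <stdint.h>

   uint64_t id() {
   #if defined(__AMDGPU__)
     return __builtin_amdgcn_workitem_id_x();
   #elif defined(__NVPTX__)
     return __nvvm_read_ptx_sreg_tid_x();
   #else
     return 0;
   #endif
   }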
We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
following commands. This will yield valid LLVM-IR for the given target just as
if we were using CUDA, OpenCL, or OpenMP.

.. code-block:: sh

   $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
   $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c
loader utility to launch the executable on the GPU similar to a cross-compiling
emulator.

as its linker. The installation will include the ``include/amdgcn-amd-amdhsa``
and ``lib/amdgcn-amd-amdhsa`` directories that contain the necessary code to use
the library.

.. code-block:: c++
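   /* Hedged sketch: the original listing is not shown in these matched lines.
      A plain hello world like this matches the hello.c built below. */
   #include <stdio.h>

   int main() { printf("Hello from AMDGPU!\n"); }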
``-flto`` and ``-mcpu=`` should be defined. This is because the GPU
sub-architectures do not have strict backwards compatibility. Use ``-mcpu=help``
for accepted arguments or ``-mcpu=native`` to target the system's installed GPUs
if present. Additionally, the AMDGPU target always uses ``-flto`` because we
rely on LTO to link the device code. Once built, we use the
``amdhsa-loader`` utility to launch execution on the GPU. This will be built if
the ``hsa_runtime64`` library was found during build time.

.. code-block:: sh

   $> clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
   $> amdhsa-loader --threads 2 --blocks 2 a.out
``include/amdgcn-amd-amdhsa`` directory. We define our ``main`` function like a
standard application. The startup utility in ``lib/amdgcn-amd-amdhsa/crt1.o``

``libc.a`` library stored in ``lib/amdgcn-amd-amdhsa`` to define the standard C
library functions.

also provides ``libc.bc``, which is a single LLVM-IR bitcode blob that can be
``clang-nvlink-wrapper`` instead wraps around the standard link job to give the

.. code-block:: c++
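   /* Hedged sketch: the original listing is not shown in these matched lines.
      A plain hello world like this matches the hello.c built below. */
   #include <stdio.h>

   int main() { printf("Hello from NVPTX!\n"); }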
contain the ``nvptx-loader`` utility if the CUDA driver was found during build
time.

.. code-block:: sh

   $> clang hello.c --target=nvptx64-nvidia-cuda -march=native -flto -lc <install>/lib/nvptx64-nvidia-cuda/crt1.o
   $> nvptx-loader --threads 2 --blocks 2 a.out