
----------------
by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
through the ``--offload-new-driver`` and ``-fgpu-rdc`` flags.
device linker job. This can be done using the ``-Xoffload-linker`` option, which

.. code-block:: sh

  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Xoffload-linker -lc
  $> clang cuda.cu --offload-arch=sm_80 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc
  $> clang hip.hip --offload-arch=gfx940 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc

required by the user's application. Normally, using the ``-fgpu-rdc`` option
results in sub-par performance because the device code is compiled and linked
separately, preventing cross-module optimization. However, the offloading
toolchain supports the ``-foffload-lto`` option to enable LTO on the target
These are located in ``<clang-resource-dir>/include/llvm-libc-wrappers`` in your

handle including the necessary libraries, define device-side interfaces, and run

.. code-block:: c++
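The body of that example is not shown in this excerpt. As a rough sketch, a
program that calls a C library function from an OpenMP target region (assuming
``printf`` is provided on the device by the GPU libc) could look like:

.. code-block:: c++

  #include <stdio.h>

  int main() {
    // Offloaded region; printf resolves to the GPU implementation when the
    // device-side libc is linked in.
  #pragma omp target
    { printf("Hello from the GPU!\n"); }
  }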
.. code-block:: sh

  $> clang openmp.c -fopenmp --offload-arch=gfx90a

------------------

on the compiler's intrinsic and built-in functions. For example, the following

.. code-block:: c++
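The example source itself is elided here. A plausible sketch of the ``id.c``
file referenced by the compile commands below, assuming the standard AMDGPU and
NVPTX built-ins for reading the work-item / thread index, is:

.. code-block:: c++

  // id.c -- return the current lane's id using the compiler's built-ins.
  #if defined(__AMDGPU__)
  unsigned id(void) { return __builtin_amdgcn_workitem_id_x(); }
  #elif defined(__NVPTX__)
  unsigned id(void) { return __nvvm_read_ptx_sreg_tid_x(); }
  #endif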
We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
following commands. This will yield valid LLVM-IR for the given target just like

.. code-block:: sh

  $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
  $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c

loader utility to launch the executable on the GPU similar to a cross-compiling
as its linker. The installation will include the ``include/amdgcn-amd-amdhsa``
and ``lib/amdgcn-amd-amdhsa`` directories that contain the necessary code to use

.. code-block:: c++
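The example body is elided in this excerpt. A minimal hello-world sketch that
matches the ``hello.c`` compile line below (the greeting string is an
assumption) would be:

.. code-block:: c++

  // hello.c -- a standard C program; main is invoked by the startup code in crt1.o.
  #include <stdio.h>

  int main() { printf("Hello from AMDGPU!\n"); }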
``-flto`` and ``-mcpu=`` should be defined. This is because the GPU
sub-architectures do not have strict backwards compatibility. Use ``-mcpu=help``
for accepted arguments or ``-mcpu=native`` to target the system's installed GPUs
if present. Additionally, the AMDGPU target always uses ``-flto`` because we

``amdhsa-loader`` utility to launch execution on the GPU. This will be built if

.. code-block:: sh

  $> clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
  $> amdhsa-loader --threads 2 --blocks 2 a.out

``include/amdgcn-amd-amdhsa`` directory. We define our ``main`` function like a
standard application. The startup utility in ``lib/amdgcn-amd-amdhsa/crt1.o``

``libc.a`` library stored in ``lib/amdgcn-amd-amdhsa`` to define the standard C

also provides ``libc.bc``, which is a single LLVM-IR bitcode blob that can be
Building for NVPTX targets

The infrastructure is the same as the AMDGPU example. However, the NVPTX binary

``clang-nvlink-wrapper`` instead wraps around the standard link job to give the
.. code-block:: c++

  #include <stdio.h>

  int main(int argc, char **argv, char **envp) {
    printf("Hello from NVPTX!\n");
  }

Additionally, the NVPTX ABI requires that every function signature matches. This

contain the ``nvptx-loader`` utility if the CUDA driver was found during

.. code-block:: sh

  $> clang hello.c --target=nvptx64-nvidia-cuda -march=native -flto -lc <install>/lib/nvptx64-nvidia-cuda/crt1.o
  $> nvptx-loader --threads 2 --blocks 2 a.out
  Hello from NVPTX!
  Hello from NVPTX!
  Hello from NVPTX!
  Hello from NVPTX!