using.rst - OpenGrok cross reference for /llvm-project/libc/docs/gpu/using.rst

27 ----------------
33 <https://clang.llvm.org/docs/OffloadingDesign.html>`_. This linking mode is used
34 by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
35 through the ``--offload-new-driver`` and ``-fgpu-rdc`` flags.
38 device linker job. This can be done using the ``-Xoffload-linker`` option, which
43 .. code-block:: sh
45   $> clang openmp.c -fopenmp --offload-arch=gfx90a -Xoffload-linker -lc
46   $> clang cuda.cu --offload-arch=sm_80 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc
47   $> clang hip.hip --offload-arch=gfx940 --offload-new-driver -fgpu-rdc -Xoffload-linker -lc
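The three commands above differ only in the flags that select the offloading mode. As a rough sketch, the helper below prints the matching command line instead of running ``clang``. The function name ``offload_link_cmd`` and the idea of picking the mode from the file extension are assumptions made for illustration; they are not part of the LLVM toolchain.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM): print the clang command line that
# links the GPU C library into an offloading build. Nothing is executed.
# Assumption: the source file extension decides the offloading language.
offload_link_cmd() {
  src=$1   # e.g. openmp.c, cuda.cu, hip.hip
  arch=$2  # e.g. gfx90a, sm_80
  case "$src" in
    *.c)
      # OpenMP already uses the new offloading driver.
      mode="-fopenmp --offload-arch=$arch" ;;
    *.cu|*.hip)
      # CUDA and HIP must opt in to the new driver and RDC mode.
      mode="--offload-arch=$arch --offload-new-driver -fgpu-rdc" ;;
    *)
      echo "unsupported source file: $src" >&2
      return 1 ;;
  esac
  printf '%s\n' "clang $src $mode -Xoffload-linker -lc"
}

offload_link_cmd openmp.c gfx90a
offload_link_cmd cuda.cu sm_80
```

Printing the command rather than running it keeps the sketch usable on machines without a GPU toolchain installed.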
50 required by the user's application. Normally using the ``-fgpu-rdc`` option
51 results in sub-par performance due to ABI linking. However, the offloading
52 toolchain provides the ``-foffload-lto`` option to enable LTO on the target
60 These are located in ``<clang-resource-dir>/include/llvm-libc-wrappers`` in your
67 handle including the necessary libraries, define device-side interfaces, and run
76 .. code-block:: c++
90 .. code-block:: sh
92   $> clang openmp.c -fopenmp --offload-arch=gfx90a
103 ------------------
108 support <https://clang.llvm.org/docs/CrossCompilation.html>`_. This is the
113 on the compiler's intrinsic and built-in functions. For example, the following
117 .. code-block:: c++
129 We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
130 following commands. This will yield valid LLVM-IR for the given target just like
133 .. code-block:: sh
135   $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
136   $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c
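Note that the two targets spell the architecture option differently: ``-mcpu=`` for AMDGPU and ``-march=`` for NVPTX. The sketch below captures that difference in one hypothetical helper, ``gpu_compile_cmd``, which only prints the command. The helper is an illustration and does not exist in LLVM.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM): print the direct-compilation
# command for a GPU target triple. Nothing is executed.
gpu_compile_cmd() {
  src=$1
  triple=$2
  case "$triple" in
    amdgcn-amd-amdhsa)   arch_flag="-mcpu=native" ;;   # AMDGPU uses -mcpu=
    nvptx64-nvidia-cuda) arch_flag="-march=native" ;;  # NVPTX uses -march=
    *)
      echo "unsupported target: $triple" >&2
      return 1 ;;
  esac
  printf '%s\n' "clang $src --target=$triple $arch_flag -nogpulib -flto -c"
}

gpu_compile_cmd id.c amdgcn-amd-amdhsa
gpu_compile_cmd id.c nvptx64-nvidia-cuda
```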
141 loader utility to launch the executable on the GPU similar to a cross-compiling
150 as its linker. The installation will include the ``include/amdgcn-amd-amdhsa``
151 and ``lib/amdgcn-amd-amdhsa`` directories that contain the necessary code to use
155 .. code-block:: c++
162 ``-flto`` and ``-mcpu=`` should be defined. This is because the GPU
163 sub-architectures do not have strict backwards compatibility. Use ``-mcpu=help``
164 for accepted arguments or ``-mcpu=native`` to target the system's installed GPUs
165 if present. Additionally, the AMDGPU target always uses ``-flto`` because we
166 currently do not fully support ELF linking in ``lld``. Once built, we use the
167 ``amdhsa-loader`` utility to launch execution on the GPU. This will be built if
170 .. code-block:: sh
172   $> clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
173   $> amdhsa-loader --threads 2 --blocks 2 a.out
180 ``include/amdgcn-amd-amdhsa`` directory. We define our ``main`` function like a
181 standard application. The startup utility in ``lib/amdgcn-amd-amdhsa/crt1.o``
184 ``libc.a`` library stored in ``lib/amdgcn-amd-amdhsa`` to define the standard C
190 also provides ``libc.bc`` which is a single LLVM-IR bitcode blob that can be
198 ``clang-nvlink-wrapper`` instead wraps around the standard link job to give the
201 .. code-block:: c++
211 contain the ``nvptx-loader`` utility if the CUDA driver was found during
214 .. code-block:: sh
216   $> clang hello.c --target=nvptx64-nvidia-cuda -march=native -flto -lc <install>/lib/nvptx64-nvidia-cuda/crt1.o
217   $> nvptx-loader --threads 2 --blocks 2 a.out
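The AMDGPU and NVPTX workflows follow the same two steps: build against the target's ``crt1.o``, then launch with the matching loader. As a minimal sketch, the hypothetical helper ``gpu_build_and_run_cmds`` below prints both steps for a given target. The helper name and the ``/opt/llvm`` prefix are assumptions standing in for ``<install>``; the loader is only available if it was built with the installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM): print the build and launch
# commands for a standalone GPU program. Nothing is executed.
gpu_build_and_run_cmds() {
  install=$1  # installation prefix, standing in for <install>
  triple=$2
  case "$triple" in
    amdgcn-amd-amdhsa)   arch_flag="-mcpu=native";  loader="amdhsa-loader" ;;
    nvptx64-nvidia-cuda) arch_flag="-march=native"; loader="nvptx-loader" ;;
    *)
      echo "unsupported target: $triple" >&2
      return 1 ;;
  esac
  # Step 1: compile and link against the target's startup object.
  printf '%s\n' "clang hello.c --target=$triple $arch_flag -flto -lc $install/lib/$triple/crt1.o"
  # Step 2: launch the resulting image with the target's loader.
  printf '%s\n' "$loader --threads 2 --blocks 2 a.out"
}

# /opt/llvm is an assumed example prefix.
gpu_build_and_run_cmds /opt/llvm amdgcn-amd-amdhsa
```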