Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
80525dfc |
| 13-Aug-2024 |
Johannes Doerfert <johannes@jdoerfert.de> |
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.
As we are about to redefine the Offload API, this wil help us in the
design process as a second offload language.
We do not support any CUDA APIs yet, however, we could:
https://www.osti.gov/servlets/purl/1892137
For proper host execution we need to resurrect/rebase
https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}
__global__ void square(int *A) { *A = 42; }
int main(int argc, char **argv) {
int DevNo = 0;
int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
*Ptr = 7;
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
square<<<1, 1>>>(Ptr);
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
llvm_omp_target_free_shared(Ptr, DevNo);
}
❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123
test123: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind elf
arch gfx90a
triple amdgcn-amd-amdhsa
producer openmp
❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
show more ...
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
dee1f5b3 |
| 18-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
OpenMP: Don't include stdbool.h in builtin headers
Pre-C99 didn't include bool, and C99 allows you to redefine true/false apparently.
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
f3eb5f90 |
| 04-Aug-2021 |
Pushpinder Singh <Pushpinder.Singh@amd.com> |
[AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef
This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.
[AMDGPU][OpenMP] Wrap amdgcn declare variant inside ifdef
This fixes the issue https://bugs.llvm.org/show_bug.cgi?id=51337
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D107468
show more ...
|
Revision tags: llvmorg-13.0.0-rc1 |
|
#
713a5d12 |
| 02-Aug-2021 |
Pushpinder Singh <Pushpinder.Singh@amd.com> |
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next pa
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
show more ...
|
#
7f97ddaf |
| 30-Jul-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
Revert "[OpenMP][AMDGCN] Initial math headers support"
Broke nvptx compilation on files including <complex>
This reverts commit 12da97ea10a941f0123340831300d09a2121e173.
|
#
12da97ea |
| 30-Jul-2021 |
Pushpinder Singh <Pushpinder.Singh@amd.com> |
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next pa
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
show more ...
|
Revision tags: llvmorg-14-init |
|
#
d71062fb |
| 21-Jul-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
Revert "[OpenMP][AMDGCN] Initial math headers support"
This reverts commit 968899ad9cf17579f9867dafb35c4d97bad0863f.
|
#
968899ad |
| 21-Jul-2021 |
Pushpinder Singh <Pushpinder.Singh@amd.com> |
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next pa
[OpenMP][AMDGCN] Initial math headers support
With this patch, OpenMP on AMDGCN will use the math functions provided by ROCm ocml library. Linking device code to the ocml will be done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
show more ...
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
#
5d8d994d |
| 16-Apr-2021 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Make sure classes work on the device as they do on the host
We do provide `operator delete(void*)` in `<new>` but it should be available by default. This is mostly boilerplate to test it an
[OpenMP] Make sure classes work on the device as they do on the host
We do provide `operator delete(void*)` in `<new>` but it should be available by default. This is mostly boilerplate to test it and the unconditional include of `<new>` in the header we always in include on the device.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100620
show more ...
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init |
|
#
7f1e6fcf |
| 10-Jul-2020 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers
Due to recent changes we cannot use OpenMP in CUDA files anymore (PR45533) as the math handling of CUDA is different when _OPENMP
[OpenMP] Use __OPENMP_NVPTX__ instead of _OPENMP in wrapper headers
Due to recent changes we cannot use OpenMP in CUDA files anymore (PR45533) as the math handling of CUDA is different when _OPENMP is defined. We actually want this different behavior only if we are offloading with OpenMP to NVIDIA, thus generating NVPTX. With this patch we do not interfere with the CUDA math handling except if we are in NVPTX offloading mode, as indicated by the presence of __OPENMP_NVPTX__.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D78155
show more ...
|
Revision tags: llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
f85ae058 |
| 28-Mar-2020 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Provide math functions in OpenMP device code via OpenMP variants
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions, we include the appropriate
[OpenMP] Provide math functions in OpenMP device code via OpenMP variants
For OpenMP target regions to piggy back on the CUDA/AMDGPU/... implementation of math functions, we include the appropriate definitions inside of an `omp begin/end declare variant match(device={arch(nvptx)})` scope. This way, the vendor specific math functions will become specialized versions of the system math functions. When a system math function is called and specialized version is available the selection logic introduced in D75779 instead call the specialized version. In contrast to the code path we used so far, the system header is actually included. This means functions without specialized versions are available and so are macro definitions.
This should address PR42061, PR42798, and PR42799.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D75788
show more ...
|