Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
73b0e8a1 |
| 13-Jan-2025 |
Akshat Oke <Akshat.Oke@amd.com> |
[AMDGPU][NewPM] Port AMDGPUOpenCLEnqueuedBlockLowering to NPM (#122434)
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
bbc7b30f |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove invalid testcase for enqueue kernel
The call didn't have the right calling convention, but calls to kernels are supposed to be illegal anyway.
|
#
99d4c722 |
| 08-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Really invert handling of enqueued block detection
Remove the broken call graph analysis in the block enqueue lowering pass. The previous iteration was reverted due to a runtime bug when the
AMDGPU: Really invert handling of enqueued block detection
Remove the broken call graph analysis in the block enqueue lowering pass. The previous iteration was reverted due to a runtime bug when the completion action was unconditionally enabled.
show more ...
|
#
4fc07e18 |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use constant and externally_initialized for block handle
The runtime initializes this.
|
#
0cd3a39e |
| 09-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix opaque pointer handling for enqueued blocks, again
|
#
270e96f4 |
| 08-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.
The runtime is having trouble with this at -O0 when the inputs are always
Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.
The runtime is having trouble with this at -O0 when the inputs are always enabled.
show more ...
|
#
47554a0c |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use more accurate IR type for block handle
The device library uses this as a struct with a pointer sized integer and 2 ints.
|
#
47288cc9 |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set
AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost.
There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).
show more ...
|
#
0416883d |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix enqueue block lowering for opaque pointers
This was looking for a specific constant cast of the function, when the type doesn't matter. Doesn't bother trying to handle typed pointers, it
AMDGPU: Fix enqueue block lowering for opaque pointers
This was looking for a specific constant cast of the function, when the type doesn't matter. Doesn't bother trying to handle typed pointers, it will just assert.
Things probably don't work completely correctly if the block kernel address is captured somewhere else, but that wouldn't work before either. The uses should really be loads out of the handle, and the handle initializer should contain the kernel address.
show more ...
|
#
4ce5400a |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Convert enqueue-kernel.ll to opaque pointers
This demonstrates the pass is broken with them, the follow up change will fix it.
|
#
1f93517b |
| 23-Dec-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Switch enqueue kernel test to generated checks
|
#
a74ea40c |
| 29-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove unnecessary metadata from test
The pass isn't doing anything with it, and the line wrapping is confusing update_test_checks.
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
06c192d4 |
| 20-Nov-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making typed byval mandatory. Partially done through regex and partially manual.
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
#
fb17bf60 |
| 13-Jun-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2].
Differenti
[AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2].
Differential Revision: https://reviews.llvm.org/D48094
llvm-svn: 334625
show more ...
|
Revision tags: llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1 |
|
#
9381ae97 |
| 11-Apr-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Fix lowering enqueue_kernel
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make t
[AMDGPU] Fix lowering enqueue_kernel
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make the runtime handle of an enqueued kernel an ordinary global variable. Runtime only needs to store the address of the loaded kernel to the handle and has verified that this approach works.
handle the situation where __enqueue_kernel* gets inlined therefore the enqueued kernel may be used through a constant expr instead of an instruction.
Differential Revision: https://reviews.llvm.org/D45187
llvm-svn: 329815
show more ...
|
Revision tags: llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1 |
|
#
a99e7d8e |
| 12-Mar-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kern
[AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped. In this case, give them unique names __amdgpu_enqueued_kernel or __amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.
Differential Revision: https://reviews.llvm.org/D44322
llvm-svn: 327291
show more ...
|
#
46439e8d |
| 06-Mar-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for block invoke function due to adoption of the new addr space mapping.
Differential Revision: https:/
[AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for block invoke function due to adoption of the new addr space mapping.
Differential Revision: https://reviews.llvm.org/D43785
llvm-svn: 326806
show more ...
|
Revision tags: llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1 |
|
#
e4b0231c |
| 11-Jan-2018 |
Rafael Espindola <rafael.espindola@gmail.com> |
Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed that:
- There are *a lot* of tests to update. - Many of the updates are redundant
Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed that:
- There are *a lot* of tests to update. - Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read.
llvm-svn: 322317
show more ...
|
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1 |
|
#
c928f2a6 |
| 30-Oct-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit metadata for the associated hidden kernel arguments. Such kernel
[AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit metadata for the associated hidden kernel arguments. Such kernels are marked with calls-enqueue-kernel function attribute by AMDGPUOpenCLEnqueueKernelLowering pass and later on hidden kernel arguments metadata HiddenDefaultQueue and HiddenCompletionAction are emitted for them.
Differential Revision: https://reviews.llvm.org/D39255
llvm-svn: 316907
show more ...
|
#
de4b88d9 |
| 10-Oct-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Lower enqueued blocks and generate runtime metadata
This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle)
[AMDGPU] Lower enqueued blocks and generate runtime metadata
This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel.
In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandel metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching.
This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value.
Differential Revision: https://reviews.llvm.org/D38610
llvm-svn: 315352
show more ...
|