| #
5feb32ba |
| 25-Jun-2024 |
Vikram Hegde <115221833+vikramRH@users.noreply.github.com> |
[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is the first of a series whose end goal is to adapt the
atomic optimizer pass to support i64 and f64 operations (and to remove
all unnecessary bitcasts). It legalizes 64-bit readlane, writelane and
readfirstlane ops pre-ISel.
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
| #
f09360d2 |
| 30-Aug-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer.
Reduction and Scan are implemented using the `Iterative` and `DPP` strategies for the `float` type.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D156301
|
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
699addef |
| 20-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Use verify<domtree> instead of intra-pass asserts.
Verifying the dominator tree with intra-pass asserts is expensive. The asserts added in D147408 significantly increase the build time of libc. This change runs the verification after the atomic optimizer pass instead and should fix the regression reported in D153232.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D153261
|
|
Revision tags: llvmorg-16.0.6 |
|
| #
f6c8a8e9 |
| 09-Jun-2023 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.
The alternative implementation iterates over all active lanes of the wavefront using llvm.cttz and performs the following steps:
1. Read the value that needs to be atomically incremented using the llvm.amdgcn.readlane intrinsic.
2. Accumulate the result.
3. Update the scan result using the llvm.amdgcn.writelane intrinsic if intermediate scan results are needed later in the kernel.
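The loop described above can be sketched on the host as a plain C++ simulation. This is not the pass's code: the names `iterativeScan`, `ScanResult`, `kWaveSize` and `countTrailingZeros` are hypothetical, the wavefront is assumed to be 64 lanes wide, the operation is assumed to be integer add, and the scan is assumed to be exclusive (each lane receives the sum of the active lanes before it). Array reads and writes stand in for `llvm.amdgcn.readlane` and `llvm.amdgcn.writelane`, and the helper stands in for `llvm.cttz`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed wavefront width for this sketch.
constexpr int kWaveSize = 64;

// Hypothetical result type: per-lane scan values plus the wave-wide total.
struct ScanResult {
  std::array<uint32_t, kWaveSize> exclusive{};  // zero for inactive lanes
  uint32_t total = 0;
};

// Portable stand-in for llvm.cttz; the caller guarantees mask != 0.
inline int countTrailingZeros(uint64_t mask) {
  int n = 0;
  while ((mask & 1u) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

// Visits active lanes from lowest to highest index.
ScanResult iterativeScan(const std::array<uint32_t, kWaveSize>& values,
                         uint64_t activeMask) {
  ScanResult result;
  uint32_t accumulator = 0;
  uint64_t remaining = activeMask;
  while (remaining != 0) {
    int lane = countTrailingZeros(remaining);  // next active lane
    uint32_t laneValue = values[lane];         // step 1: "readlane"
    result.exclusive[lane] = accumulator;      // step 3: "writelane" (exclusive scan assumed)
    accumulator += laneValue;                  // step 2: accumulate
    remaining &= remaining - 1;                // clear the lane just handled
  }
  result.total = accumulator;
  return result;
}
```

In this sketch the loop runs once per active lane, so its cost grows with the number of active lanes, and values held by inactive lanes never reach the accumulator.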
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D147408
|