target_teams_generic_loop_codegen.cpp - OpenGrok history log for /llvm-project/clang/test/OpenMP/target_teams_generic_loop

Revision	Date	Author	Comments
# 6b1c51bc	26-Jun-2024	Akash Banerjee <akash.banerjee@amd.com>	[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343) This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBuilder. In future patches, MLIR OpenMP translation will make use of these functions. Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# a1283664	10-Apr-2024	David Pagan <dave.pagan@amd.com>	[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278) IR for 'target teams loop' is now dependent on suitability of associated loop-nest. If a loop-nest:
- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives
then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'. Added debug output indicating how 'target teams loop' was emitted. Flag is -mllvm -debug-only=target-teams-loop-codegen. Added LIT tests explicitly verifying 'target teams loop' emitted as a parallel loop and a distribute loop. Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and additions/deletions.
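The suitability rule in the a1283664 commit message can be read as a single predicate. The following is a minimal sketch of one such reading, not Clang's implementation: `LoopNestInfo`, its fields, `canEmitAsParallelFor`, and `emittedDirective` are hypothetical names introduced here for illustration, and the grouping of the "or"/"AND" conditions is an assumption, since the message itself is ambiguous on that point.

```cpp
#include <string_view>

// Hypothetical summary of a loop nest. These fields are illustrative only
// and do not correspond to any data structure in Clang.
struct LoopNestInfo {
  bool HasFunctionCall = false;           // the nest contains at least one call
  bool AllCallsAreOpenMPAPI = false;      // every such call targets an OpenMP API
  bool HasNestedLoopBindParallel = false; // nested 'loop bind(parallel)' present
};

// Assumed reading of the commit message:
//   (no call OR assume-no-nested-parallelism OR calls are OpenMP API)
//   AND no nested 'loop bind(parallel)' directive.
constexpr bool canEmitAsParallelFor(const LoopNestInfo &Nest,
                                    bool AssumeNoNestedParallelism) {
  const bool CallsAreAcceptable = !Nest.HasFunctionCall ||
                                  AssumeNoNestedParallelism ||
                                  Nest.AllCallsAreOpenMPAPI;
  return CallsAreAcceptable && !Nest.HasNestedLoopBindParallel;
}

// Maps the predicate onto the two directive forms named in the message.
constexpr std::string_view emittedDirective(const LoopNestInfo &Nest,
                                            bool AssumeNoNestedParallelism) {
  return canEmitAsParallelFor(Nest, AssumeNoNestedParallelism)
             ? "target teams distribute parallel for"
             : "target teams distribute";
}
```

Under this reading, a nest that calls an arbitrary function falls back to 'target teams distribute' unless -fopenmp-assume-no-nested-parallelism is given, and a nested 'loop bind(parallel)' always forces the fallback.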
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2
# 4e3310a8	12-Mar-2024	mikaoP <raul.penacoba@bsc.es>	[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987) Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5
# 7318fe63	10-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP][FIX] Ensure device reduction geps work for multi-var reductions If we have more than one reduction variable we need to be consistent wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61 we broke this as the buffer type was reduced to a singleton but the index computation was not adjusted to account for that offset. This fixes it by interleaving the reduction variables properly in an array-of-struct style. We can revert it back to struct-of-array in a follow up if it turns out to be a problem. I doubt it since half the accesses should benefit from the locality this layout offers and only the other half were consecutive before.
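The two layouts named in the 7318fe63 message differ only in how a (team slot, variable) pair maps to a buffer position. The sketch below shows that difference under a strong simplification: every reduction variable is treated as one equally sized element, whereas the real buffer holds differently typed fields addressed through GEPs. The function names and parameters are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>

// Array-of-struct: all variables of one team slot are adjacent, so slot
// `Slot` starts at offset Slot * NumVars.
constexpr std::size_t aosIndex(std::size_t Slot, std::size_t Var,
                               std::size_t NumVars) {
  return Slot * NumVars + Var;
}

// Struct-of-array: all slots of one variable are adjacent, so variable
// `Var` starts at offset Var * BufferLength.
constexpr std::size_t soaIndex(std::size_t Slot, std::size_t Var,
                               std::size_t BufferLength) {
  return Var * BufferLength + Slot;
}

// With two variables and a buffer of four slots, both layouts cover the
// same eight positions but order them differently.
static_assert(aosIndex(1, 0, 2) == 2, "slot 1 follows both variables of slot 0");
static_assert(soaIndex(1, 0, 4) == 1, "slot 1 of variable 0 follows slot 0");
static_assert(soaIndex(0, 1, 4) == 4, "variable 1 starts after all slots of variable 0");
```

In the array-of-struct form, a single thread walking all variables of one slot touches neighbouring elements, which is the locality the message refers to.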
# 3de645ef	03-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime as we now know how large the reduction data is.
# 921bd299	03-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP] Remove alignment for global <-> local reduction functions The alignment likely did not help much but increases the memory requirement. Note that half of the affected accesses are all performed by a single thread in each block. The reads are by consecutive threads in a single block.
# d3e7a48c	03-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP][NFC] Remove a no-op function
# f9a89e6b	01-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249
# b8cbc5c0	01-Nov-2023	Johannes Doerfert <johannes@jdoerfert.de>	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 10068cd6	26-Jul-2023	Shilei Tian <i@tianshilei.me>	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not on a per kernel basis, preventing us from applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569
Revision tags: llvmorg-18-init
# 6bd74fd6	24-Jul-2023	Shilei Tian <i@tianshilei.me>	Revert commits for kernel environment This reverts commits for kernel environments as they cause issues in AMD BB.
# c5c80403	23-Jul-2023	Shilei Tian <i@tianshilei.me>	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not on a per kernel basis, preventing us from applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569
# 63ca93c7	06-Jul-2023	Sergio Afonso <safonsof@amd.com>	[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes `IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to `-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed to `omp.is_target_device`. Getters and setters of all these renamed properties are also updated accordingly. Many unit tests have been updated to use the new names, but an alias for the `-fopenmp-is-device` option is created so that external programs do not stop working after the name change. `IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the `-fopenmp-is-target-device` compiler frontend option, which is only added to the OpenMP device invocation for offloading-enabled programs. Differential Revision: https://reviews.llvm.org/D154591
# eb61bde8	02-Jul-2023	Dave Pagan <dave.pagan@amd.com>	[OpenMP][CodeGen] Add codegen for combined 'loop' directives. The loop directive is a descriptive construct which allows the compiler flexibility in how it generates code for the directive's associated loop(s). See OpenMP specification 5.2 [257:8-9]. Codegen added in this patch for the combined 'loop' directives are:
'target teams loop' -> 'target teams distribute parallel for'
'teams loop' -> 'teams distribute parallel for'
'target parallel loop' -> 'target parallel for'
'parallel loop' -> 'parallel for'
NOTE: The implementation of the 'loop' directive itself is unchanged. Differential Revision: https://reviews.llvm.org/D145823
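The four mappings listed in the eb61bde8 message can be written as a small lookup. This is only an illustrative sketch: `loweredCombinedLoop` is a hypothetical helper that works on directive spellings as strings, whereas Clang itself operates on directive kinds, not text. It assumes a C++17 compiler.

```cpp
#include <string_view>

// Returns the directive form that the eb61bde8 message says a combined
// 'loop' directive is emitted as, or an empty view for any other input.
constexpr std::string_view loweredCombinedLoop(std::string_view Directive) {
  if (Directive == "target teams loop")
    return "target teams distribute parallel for";
  if (Directive == "teams loop")
    return "teams distribute parallel for";
  if (Directive == "target parallel loop")
    return "target parallel for";
  if (Directive == "parallel loop")
    return "parallel for";
  return {}; // not one of the combined 'loop' directives covered by the patch
}

static_assert(loweredCombinedLoop("parallel loop") == "parallel for",
              "mapping taken from the commit message");
```

The table reflects this commit only; the later a1283664 entry in this log makes the 'target teams loop' case depend on the associated loop nest.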