| #
6b1c51bc |
| 26-Jun-2024 |
Akash Banerjee <akash.banerjee@amd.com> |
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.
Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
show more ...
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4 |
|
| #
a1283664 |
| 10-Apr-2024 |
David Pagan <dave.pagan@amd.com> |
[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.
If a loop-nest:
- does not contain
[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.
If a loop-nest:
- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives
then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.
Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen
Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.
Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
show more ...
|
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2 |
|
| #
4e3310a8 |
| 12-Mar-2024 |
mikaoP <raul.penacoba@bsc.es> |
[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987)
Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>
|
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
| #
7318fe63 |
| 10-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP][FIX] Ensure device reduction geps work for multi-var reductions
If we have more than one reduction variable we need to be consistent wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a
[OpenMP][FIX] Ensure device reduction geps work for multi-var reductions
If we have more than one reduction variable we need to be consistent wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61 we broke this as the buffer type was reduced to a singleton but the index computation was not adjusted to account for that offset. This fixes it by interleaving the reduction variables properly in a array-of-struct style. We can revert it back to struct-of-array in a follow up if turns out to be a problem. I doubt it since half the accesses should benefit from the locallity this layout offers and only the other half were consecutive before.
show more ...
|
| #
3de645ef |
| 03-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the
[OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have less teams than the maximal length. It also allows us to move code from clangs codegen into the runtime as we now know how large the reduction data is.
show more ...
|
| #
921bd299 |
| 03-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Remove alignment for global <-> local reduction functions
The alignment did likely not help much but increases the memory requirement. Note that half of the affected accesses are all perfor
[OpenMP] Remove alignment for global <-> local reduction functions
The alignment did likely not help much but increases the memory requirement. Note that half of the affected accesses are all performed by a single thread in each block. The reads are by consecutive threads in a single block.
show more ...
|
| #
d3e7a48c |
| 03-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP][NFC] Remove a no-op function
|
| #
f9a89e6b |
| 01-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752)
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instanc
[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752)
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.
Fixes: https://github.com/llvm/llvm-project/issues/70249
show more ...
|
| #
b8cbc5c0 |
| 01-Nov-2023 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the
[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the rutime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More uses cases will
follow, e.g., per launch memory pools.
Fixes: https://github.com/llvm/llvm-project/issues/70249
show more ...
|
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
| #
10068cd6 |
| 26-Jul-2023 |
Shilei Tian <i@tianshilei.me> |
[OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`
[OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
show more ...
|
|
Revision tags: llvmorg-18-init |
|
| #
6bd74fd6 |
| 24-Jul-2023 |
Shilei Tian <i@tianshilei.me> |
Revert commits for kernel environment
This reverts commits for kernel environments as they causes issues in AMD BB.
|
| #
c5c80403 |
| 23-Jul-2023 |
Shilei Tian <i@tianshilei.me> |
[OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`
[OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
show more ...
|
| #
63ca93c7 |
| 06-Jul-2023 |
Sergio Afonso <safonsof@amd.com> |
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whe
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes `IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to `-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed to `omp.is_target_device`. Getters and setters of all these renamed properties are also updated accordingly. Many unit tests have been updated to use the new names, but an alias for the `-fopenmp-is-device` option is created so that external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the `-fopenmp-is-target-device` compiler frontend option, which is only added to the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
show more ...
|
| #
eb61bde8 |
| 02-Jul-2023 |
Dave Pagan <dave.pagan@amd.com> |
[OpenMP][CodeGen] Add codegen for combined 'loop' directives.
The loop directive is a descriptive construct which allows the compiler flexibility in how it generates code for the directive's associa
[OpenMP][CodeGen] Add codegen for combined 'loop' directives.
The loop directive is a descriptive construct which allows the compiler flexibility in how it generates code for the directive's associated loop(s). See OpenMP specification 5.2 [257:8-9].
Codegen added in this patch for the combined 'loop' directives are:
'target teams loop' -> 'target teams distribute parallel for' 'teams loop' -> 'teams distribute parallel for' 'target parallel loop' -> 'target parallel for' 'parallel loop' -> 'parallel for'
NOTE: The implementation of the 'loop' directive itself is unchanged.
Differential Revision: https://reviews.llvm.org/D145823
show more ...
|