History log of /llvm-project/clang/test/OpenMP/target_teams_generic_loop_codegen.cpp (Results 1 – 14 of 14)
Revision Date Author Comments
# 6b1c51bc 26-Jun-2024 Akash Banerjee <akash.banerjee@amd.com>

[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)

This patch migrates CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBuilder. In future patches, the MLIR OpenMP translation will make use of these functions.

Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>



Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# a1283664 10-Apr-2024 David Pagan <dave.pagan@amd.com>

[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)

IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.

If a loop-nest:

- does not contain a function call, or
  -fopenmp-assume-no-nested-parallelism has been specified,
  or the call is to an OpenMP API, AND
- does not contain nested loop bind(parallel) directives

then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.
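
As an illustration of the criteria listed above, consider two loop nests under 'target teams loop'. This is a minimal sketch: the names `scale`, `scale_with_call`, and `opaque` are hypothetical and do not come from the patch or its tests. Under those criteria, the first nest would be expected to be emitted as 'target teams distribute parallel for', and the second, which calls an ordinary function, as 'target teams distribute' unless -fopenmp-assume-no-nested-parallelism is given.

```cpp
// Hypothetical helper: an ordinary (non-OpenMP-API) function that is
// callable from an offloaded region.
#pragma omp declare target
int opaque(int x) { return x + 1; }
#pragma omp end declare target

// Loop nest with no function call and no nested 'loop bind(parallel)'.
void scale(int *a, int n) {
#pragma omp target teams loop map(tofrom : a[0 : n])
  for (int i = 0; i < n; ++i)
    a[i] *= 2;
}

// Loop nest that contains a call to a non-OpenMP function.
void scale_with_call(int *a, int n) {
#pragma omp target teams loop map(tofrom : a[0 : n])
  for (int i = 0; i < n; ++i)
    a[i] = opaque(a[i]);
}
```

Both functions compute the same results whether or not OpenMP is enabled; only the emitted IR would differ.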

Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen

Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.

Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.



Revision tags: llvmorg-18.1.3, llvmorg-18.1.2
# 4e3310a8 12-Mar-2024 mikaoP <raul.penacoba@bsc.es>

[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987)

Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5
# 7318fe63 10-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP][FIX] Ensure device reduction geps work for multi-var reductions

If we have more than one reduction variable we need to be consistent
wrt. indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61 we broke this
as the buffer type was reduced to a singleton but the index computation
was not adjusted to account for that offset. This fixes it by
interleaving the reduction variables properly in an array-of-struct
style. We can revert it back to struct-of-array in a follow-up if it
turns out to be a problem. I doubt it since half the accesses should
benefit from the locality this layout offers and only the other half
were consecutive before.
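
The two layouts mentioned above can be contrasted with a small sketch. It is illustrative only: the fixed count of two reduction variables, the uniform slot size, and the helper names `aosIndex` and `soaIndex` are assumptions made for this example and are not taken from the Clang or runtime sources.

```cpp
#include <cstddef>

// Hypothetical: two reduction variables, each occupying one slot.
constexpr std::size_t NumVars = 2;

// Array-of-struct: all variables of one team sit next to each other.
constexpr std::size_t aosIndex(std::size_t team, std::size_t var) {
  return team * NumVars + var;
}

// Struct-of-array: each variable owns a contiguous run of numTeams slots.
constexpr std::size_t soaIndex(std::size_t team, std::size_t var,
                               std::size_t numTeams) {
  return var * numTeams + team;
}
```

In the array-of-struct form, one team touching all of its own variables accesses adjacent slots; in the struct-of-array form, consecutive teams touching the same variable access adjacent slots.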



# 3de645ef 03-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP][NFC] Split the reduction buffer size into two components

Previously, we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from clang's codegen into
the runtime as we now know how large the reduction data is.
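
A minimal sketch of why the split helps, assuming a hypothetical `ReductionInfo` record and `bytesForLaunch` helper (neither name is from the patch): with the data size and the maximal length known separately, a launch with few teams can request a proportionally smaller buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical description of a kernel's teams reduction requirements.
struct ReductionInfo {
  std::size_t dataSize;     // bytes for all reduction variables of one team
  std::size_t maxBufferLen; // maximal number of team slots in the buffer
};

// Bytes to allocate for one launch: use fewer slots when fewer teams run.
std::size_t bytesForLaunch(const ReductionInfo &info, std::size_t numTeams) {
  return info.dataSize * std::min(numTeams, info.maxBufferLen);
}
```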



# 921bd299 03-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP] Remove alignment for global <-> local reduction functions

The alignment likely did not help much but increased the memory
requirement. Note that half of the affected accesses are all performed
by a single thread in each block. The reads are by consecutive threads
in a single block.



# d3e7a48c 03-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP][NFC] Remove a no-op function


# f9a89e6b 01-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752)

We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249



# b8cbc5c0 01-Nov-2023 Johannes Doerfert <johannes@jdoerfert.de>

[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)

The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per launch memory pools.
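
The distinction between the two environments can be sketched as follows. The struct names `KernelEnvSketch` and `LaunchEnvSketch`, their fields, and `makeLaunchEnv` are hypothetical stand-ins chosen for this example; they are not the actual definitions used by the compiler or the runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compile-time information: one instance per kernel (hypothetical fields).
struct KernelEnvSketch {
  std::uint32_t reductionDataSize;     // bytes of reduction data per team
  std::uint32_t reductionBufferLength; // maximal number of team slots
};

// Dynamic information: one instance per kernel launch, never shared.
struct LaunchEnvSketch {
  std::uint32_t teamsDone = 0;               // synchronization counter
  std::vector<std::uint8_t> reductionBuffer; // per-launch scratch memory
};

// Build fresh per-launch state from the kernel's compile-time description.
LaunchEnvSketch makeLaunchEnv(const KernelEnvSketch &env) {
  LaunchEnvSketch launch;
  launch.reductionBuffer.resize(
      static_cast<std::size_t>(env.reductionDataSize) *
      env.reductionBufferLength);
  return launch;
}
```

Because each launch gets its own counter and buffer in this sketch, two concurrent launches of the same kernel would not write to the same locations.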

Fixes: https://github.com/llvm/llvm-project/issues/70249



Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 10068cd6 26-Jul-2023 Shilei Tian <i@tianshilei.me>

[OpenMP] Introduce kernel environment

This patch introduces a per-kernel environment. Previously, flags such as the execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. In addition, some assumptions, such as no nested parallelism, are not made on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569



Revision tags: llvmorg-18-init
# 6bd74fd6 24-Jul-2023 Shilei Tian <i@tianshilei.me>

Revert commits for kernel environment

This reverts the commits for kernel environments as they cause issues in AMD BB.


# c5c80403 23-Jul-2023 Shilei Tian <i@tianshilei.me>

[OpenMP] Introduce kernel environment

This patch introduces a per-kernel environment. Previously, flags such as the execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. In addition, some assumptions, such as no nested parallelism, are not made on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569



# 63ca93c7 06-Jul-2023 Sergio Afonso <safonsof@amd.com>

[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags

This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
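
The relationship between the two renamed flags can be sketched as below. This is not the `OpenMPIRBuilderConfig` class: `ConfigSketch`, `makeConfig`, and the architecture strings compared against are assumptions for illustration, and gating `isGPU` on `isTargetDevice` is one possible way to encode the "only valid if" condition described above.

```cpp
#include <string>

// Hypothetical stand-in for the two renamed configuration flags.
struct ConfigSketch {
  bool isTargetDevice = false; // formerly IsEmbedded
  bool isGPU = false;          // formerly IsTargetCodegen
};

// tripleArch: architecture component of the target triple (assumed spelling).
ConfigSketch makeConfig(const std::string &tripleArch, bool isTargetDevice) {
  ConfigSketch config;
  config.isTargetDevice = isTargetDevice;
  const bool gpuArch = tripleArch == "amdgcn" || tripleArch == "nvptx" ||
                       tripleArch == "nvptx64";
  // Assumption: isGPU is only meaningful for a target-device compilation.
  config.isGPU = isTargetDevice && gpuArch;
  return config;
}
```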

Differential Revision: https://reviews.llvm.org/D154591



# eb61bde8 02-Jul-2023 Dave Pagan <dave.pagan@amd.com>

[OpenMP][CodeGen] Add codegen for combined 'loop' directives.

The loop directive is a descriptive construct which allows the compiler
flexibility in how it generates code for the directive's associated
loop(s). See OpenMP specification 5.2 [257:8-9].

The codegen mappings added in this patch for the combined 'loop' directives are:

'target teams loop' -> 'target teams distribute parallel for'
'teams loop' -> 'teams distribute parallel for'
'target parallel loop' -> 'target parallel for'
'parallel loop' -> 'parallel for'
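
As a small illustration of the last mapping, the two functions below are written with 'parallel loop' and 'parallel for' respectively. The function names `sumLoop` and `sumFor` are hypothetical; the sketch only shows the source-level forms that the mapping relates, not the generated IR.

```cpp
// Sum of 0..n-1 written with the combined 'parallel loop' directive.
long sumLoop(int n) {
  long sum = 0;
#pragma omp parallel loop reduction(+ : sum)
  for (int i = 0; i < n; ++i)
    sum += i;
  return sum;
}

// The same computation written with 'parallel for'.
long sumFor(int n) {
  long sum = 0;
#pragma omp parallel for reduction(+ : sum)
  for (int i = 0; i < n; ++i)
    sum += i;
  return sum;
}
```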

NOTE: The implementation of the 'loop' directive itself is unchanged.

Differential Revision: https://reviews.llvm.org/D145823
