History log of /llvm-project/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (Results 1 – 24 of 24)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 7954a051 04-Dec-2024 Florian Hahn <flo@fhahn.com>

[Clang] Enable -fpointer-tbaa by default. (#117244)

Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treati

[Clang] Enable -fpointer-tbaa by default. (#117244)

Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.

This helps to remove more redundant loads and stores in a number of
workloads.

Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
* +2% more NoAlias results for memory accesses
* +3% more stores removed by DSE,
* +4% more loops vectorized.

This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)

Pointer-TBAA support for pointers to builtin types has been added in
https://github.com/llvm/llvm-project/pull/76612.

Support for user-defined types has been added in
https://github.com/llvm/llvm-project/pull/110569.

There are 2 recent PRs with bug fixes for special cases uncovered during
testing:
* https://github.com/llvm/llvm-project/pull/116991
* https://github.com/llvm/llvm-project/pull/116596

PR: https://github.com/llvm/llvm-project/pull/117244

show more ...


# 68bcba6d 04-Dec-2024 Shilei Tian <i@tianshilei.me>

Revert "[AMDGPU] Use COV6 by default (#118515)"

This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.


# 410cbe3c 04-Dec-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Use COV6 by default (#118515)


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 4b50ec43 15-Nov-2024 Shilei Tian <i@tianshilei.me>

[Clang] Avoid Using `byval` for `ndrange_t` when emitting `__enqueue_kernel_basic` (#116435)

AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting

[Clang] Avoid Using `byval` for `ndrange_t` when emitting `__enqueue_kernel_basic` (#116435)

AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the
`byval` attribute by default. Emitting the `byval` attribute by default
in this
context doesn’t seem like a good idea, as argument-passing conventions
are
highly target-dependent, and assumptions here could lead to issues. This
PR
removes the addition of the `byval` attribute, aligning the behavior
with other
`__enqueue_kernel_*` functions.

show more ...


Revision tags: llvmorg-19.1.3
# 6e0b0038 22-Oct-2024 Alex Voicu <alexandru.voicu@amd.com>

[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)

Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private`

[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)

Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.

show more ...


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# 94473f4d 09-Aug-2024 Hari Limaye <hari.limaye@arm.com>

[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)

Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update

[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)

Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.

show more ...


Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 4490003a 06-Mar-2024 Emma Pilkington <emma.pilkington95@gmail.com>

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling

[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)

The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.

show more ...


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 082f87c9 23-Jan-2024 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Correspondi

[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 466a8149 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)

This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.


# 0a8d17e7 12-Sep-2023 Saiyedul Islam <Saiyedul.Islam@amd.com>

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed B

[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818

show more ...


Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 64792564 12-Jan-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

clang/OpenCL: Apply default attributes to enqueued blocks

This was missing important environment context, like denormal-fp-math
and target-features. Curiously this seems to be losing nounwind. Note

clang/OpenCL: Apply default attributes to enqueued blocks

This was missing important environment context, like denormal-fp-math
and target-features. Curiously this seems to be losing nounwind. Note
this only fixes the actual invoke kernel. The invoke function is
already setting the default attribute set for internal
functions. However that is still buggy since it's not applying any use
function attributes (it's also missing uniform-work-group-size).

There seem to be too many different functions for setting attributes
with inconsistent behavior. The Function overload of
addDefaultFunctionAttributes seems to miss the target-cpu and
target-features. The AttrBuilder one seems to miss optnone (but that
seems to be disallowed on blocks anyway). Neither one calls
setTargetAttributes, when it probably should. uniform-work-group-size
is also set through AMDGPU code when it should be emitting generically
as a language property.

I also noticed update_cc_test_checks for attributes seem to not
connect the captured attribute variables to the attributes at the end
(although I think the numbers happen to work out correctly).

show more ...


# d12ee4bf 12-Jan-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

clang/OpenCL: Extend tests for enqueued block attributes

Baseline tests showing that enqueued blocks are not getting the
correct attributes applied.


Revision tags: llvmorg-15.0.7
# 00f6a7f0 11-Jan-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

clang/OpenCL: Fix not setting convergent on block invoke kernels

Yet another example how convergent not being the default is dangerous
and backwards.


# 14952109 19-Jan-2023 Sven van Haastregt <sven.vanhaastregt@arm.com>

[OpenCL] Always add nounwind attribute for OpenCL

Neither OpenCL nor C++ for OpenCL support exceptions, so add the
`nounwind` attribute unconditionally for those languages.

Differential Revision: h

[OpenCL] Always add nounwind attribute for OpenCL

Neither OpenCL nor C++ for OpenCL support exceptions, so add the
`nounwind` attribute unconditionally for those languages.

Differential Revision: https://reviews.llvm.org/D142033

show more ...


# f9559b1e 09-Jan-2023 Matt Arsenault <Matthew.Arsenault@amd.com>

clang: Convert test to generated checks and opaque pointers


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 532dc62b 07-Apr-2022 Nikita Popov <npopov@redhat.com>

[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)

This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be p

[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)

This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.

The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.

Differential Revision: https://reviews.llvm.org/D123115

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# fd739804 31-Dec-2020 Fangrui Song <i@maskray.me>

[test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences

For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and

[test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences

For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2
# a009a60a 03-Aug-2019 Tim Northover <tnorthover@apple.com>

IR: print value numbers for unnamed function arguments

For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definit

IR: print value numbers for unnamed function arguments

For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755

show more ...


Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# da3b6320 02-Oct-2018 Sven van Haastregt <sven.vanhaastregt@arm.com>

Revert r326937 "[OpenCL] Remove block invoke function from emitted block literal struct"

This reverts r326937 as it broke block argument handling in OpenCL.
See the discussion on https://reviews.llv

Revert r326937 "[OpenCL] Remove block invoke function from emitted block literal struct"

This reverts r326937 as it broke block argument handling in OpenCL.
See the discussion on https://reviews.llvm.org/D43783 .

The next commit will add a test case that revealed the issue.

llvm-svn: 343582

show more ...


Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1
# cb35e9fa 07-Mar-2018 Yaxun Liu <Yaxun.Liu@amd.com>

[OpenCL] Remove block invoke function from emitted block literal struct

OpenCL runtime tracks the invoke function emitted for
any block expression. Due to restrictions on blocks in
OpenCL (v2.0 s6.1

[OpenCL] Remove block invoke function from emitted block literal struct

OpenCL runtime tracks the invoke function emitted for
any block expression. Due to restrictions on blocks in
OpenCL (v2.0 s6.12.5), it is always possible to know the
block invoke function when emitting call of block expression
or __enqueue_kernel builtin functions. Since __enqueu_kernel
already has an argument for the invoke function, it is redundant
to have invoke function member in the llvm block literal structure.

This patch removes invoke function from the llvm block literal
structure. It also removes the bitcast of block invoke function
to the generic block literal type which is useless for OpenCL.

This will save some space for the kernel argument, and also
eliminate some store instructions.

Differential Revision: https://reviews.llvm.org/D43783

llvm-svn: 326937

show more ...


Revision tags: llvmorg-6.0.0, llvmorg-6.0.0-rc3
# fa13d015 15-Feb-2018 Yaxun Liu <Yaxun.Liu@amd.com>

[OpenCL] Fix __enqueue_block for block with captures

The following test case causes issue with codegen of __enqueue_block

void (^block)(void) = ^{ callee(id, out); };

enqueue_kernel(queue, 0, ndra

[OpenCL] Fix __enqueue_block for block with captures

The following test case causes issue with codegen of __enqueue_block

void (^block)(void) = ^{ callee(id, out); };

enqueue_kernel(queue, 0, ndrange, block);
Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.

The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.

The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.

Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.

Differential Revision: https://reviews.llvm.org/D43240

llvm-svn: 325264

show more ...


Revision tags: llvmorg-6.0.0-rc2
# f5f45e5e 02-Feb-2018 Yaxun Liu <Yaxun.Liu@amd.com>

[AMDGPU] Switch to the new addr space mapping by default

This requires corresponding llvm change.

Differential Revision: https://reviews.llvm.org/D40956

llvm-svn: 324102


Revision tags: llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1
# f2127d17 19-Oct-2017 Yaxun Liu <Yaxun.Liu@amd.com>

[AMDGPU] Fix bug in enqueued block codegen due to an extra line

llvm-svn: 316165


# c2a87a05 14-Oct-2017 Yaxun Liu <Yaxun.Liu@amd.com>

[OpenCL] Emit enqueued block as kernel

In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels ha

[OpenCL] Emit enqueued block as kernel

In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.

The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.

This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Differential Revision: https://reviews.llvm.org/D38134

llvm-svn: 315804

show more ...