Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
7954a051 |
| 04-Dec-2024 |
Florian Hahn <flo@fhahn.com> |
[Clang] Enable -fpointer-tbaa by default. (#117244)
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treati
[Clang] Enable -fpointer-tbaa by default. (#117244)
Support for more precise TBAA metadata has been added a while ago
(behind the -fpointer-tbaa flag). The more precise TBAA metadata allows
treating accesses of different pointer types as no-alias.
This helps to remove more redundant loads and stores in a number of
workloads.
Some highlights on the impact across llvm-test-suite's MultiSource,
SPEC2006 & SPEC2017 include:
* +2% more NoAlias results for memory accesses
* +3% more stores removed by DSE,
* +4% more loops vectorized.
This closes a relatively big gap to GCC, which has been supporting
disambiguating based on pointer types for a long time.
(https://clang.godbolt.org/z/K7Wbhrz4q)
Pointer-TBAA support for pointers to builtin types has been added in
https://github.com/llvm/llvm-project/pull/76612.
Support for user-defined types has been added in
https://github.com/llvm/llvm-project/pull/110569.
There are 2 recent PRs with bug fixes for special cases uncovered during
testing:
* https://github.com/llvm/llvm-project/pull/116991
* https://github.com/llvm/llvm-project/pull/116596
PR: https://github.com/llvm/llvm-project/pull/117244
show more ...
|
#
68bcba6d |
| 04-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.
|
#
410cbe3c |
| 04-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Use COV6 by default (#118515)
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
4b50ec43 |
| 15-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[Clang] Avoid Using `byval` for `ndrange_t` when emitting `__enqueue_kernel_basic` (#116435)
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting
[Clang] Avoid Using `byval` for `ndrange_t` when emitting `__enqueue_kernel_basic` (#116435)
AMDGPU disabled the use of `byval` for struct argument passing in commit
d77c620. However, when emitting `__enqueue_kernel_basic`, Clang still
adds the
`byval` attribute by default. Emitting the `byval` attribute by default
in this
context doesn’t seem like a good idea, as argument-passing conventions
are
highly target-dependent, and assumptions here could lead to issues. This
PR
removes the addition of the `byval` attribute, aligning the behavior
with other
`__enqueue_kernel_*` functions.
show more ...
|
Revision tags: llvmorg-19.1.3 |
|
#
6e0b0038 |
| 22-Oct-2024 |
Alex Voicu <alexandru.voicu@amd.com> |
[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private`
[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
show more ...
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
94473f4d |
| 09-Aug-2024 |
Hari Limaye <hari.limaye@arm.com> |
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
show more ...
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
4490003a |
| 06-Mar-2024 |
Emma Pilkington <emma.pilkington95@gmail.com> |
[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling
[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905)
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
show more ...
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
082f87c9 |
| 23-Jan-2024 |
Saiyedul Islam <Saiyedul.Islam@amd.com> |
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
Correspondi
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
466a8149 |
| 12-Sep-2023 |
Saiyedul Islam <Saiyedul.Islam@amd.com> |
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
|
#
0a8d17e7 |
| 12-Sep-2023 |
Saiyedul Islam <Saiyedul.Islam@amd.com> |
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
Reviewed B
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata
Reviewed By: arsenm, jhuber6
Github PR: #65410
Differential Revision: https://reviews.llvm.org/D129818
show more ...
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
64792564 |
| 12-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
clang/OpenCL: Apply default attributes to enqueued blocks
This was missing important environment context, like denormal-fp-math and target-features. Curiously this seems to be losing nounwind. Note
clang/OpenCL: Apply default attributes to enqueued blocks
This was missing important environment context, like denormal-fp-math and target-features. Curiously this seems to be losing nounwind. Note this only fixes the actual invoke kernel. The invoke function is already setting the default attribute set for internal functions. However that is still buggy since it's not applying any use function attributes (it's also missing uniform-work-group-size).
There seem to be too many different functions for setting attributes with inconsistent behavior. The Function overload of addDefaultFunctionAttributes seems to miss the target-cpu and target-features. The AttrBuilder one seems to miss optnone (but that seems to be disallowed on blocks anyway). Neither one calls setTargetAttributes, when it probably should. uniform-work-group-size is also set through AMDGPU code when it should be emitting generically as a language property.
I also noticed update_cc_test_checks for attributes seem to not connect the captured attribute variables to the attributes at the end (although I think the numbers happen to work out correctly).
show more ...
|
#
d12ee4bf |
| 12-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
clang/OpenCL: Extend tests for enqueued block attributes
Baseline tests showing that enqueued blocks are not getting the correct attributes applied.
|
Revision tags: llvmorg-15.0.7 |
|
#
00f6a7f0 |
| 11-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
clang/OpenCL: Fix not setting convergent on block invoke kernels
Yet another example how convergent not being the default is dangerous and backwards.
|
#
14952109 |
| 19-Jan-2023 |
Sven van Haastregt <sven.vanhaastregt@arm.com> |
[OpenCL] Always add nounwind attribute for OpenCL
Neither OpenCL nor C++ for OpenCL support exceptions, so add the `nounwind` attribute unconditionally for those languages.
Differential Revision: h
[OpenCL] Always add nounwind attribute for OpenCL
Neither OpenCL nor C++ for OpenCL support exceptions, so add the `nounwind` attribute unconditionally for those languages.
Differential Revision: https://reviews.llvm.org/D142033
show more ...
|
#
f9559b1e |
| 09-Jan-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
clang: Convert test to generated checks and opaque pointers
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
532dc62b |
| 07-Apr-2022 |
Nikita Popov <npopov@redhat.com> |
[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)
This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be p
[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)
This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change.
Differential Revision: https://reviews.llvm.org/D123115
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
fd739804 |
| 31-Dec-2020 |
Fangrui Song <i@maskray.me> |
[test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF -fno-pic/-fpie and COFF and
[test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF -fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.
To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition, which sets dso_local for default visibility external linkage definitions.
To make this flip smooth and enable future (dso_local as definition default), this patch replaces (function) `define ` with `define{{.*}} `, (variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
show more ...
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2 |
|
#
a009a60a |
| 03-Aug-2019 |
Tim Northover <tnorthover@apple.com> |
IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definit
IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions.
Also modifies the parser to accept IR in that form for obvious reasons.
llvm-svn: 367755
show more ...
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
da3b6320 |
| 02-Oct-2018 |
Sven van Haastregt <sven.vanhaastregt@arm.com> |
Revert r326937 "[OpenCL] Remove block invoke function from emitted block literal struct"
This reverts r326937 as it broke block argument handling in OpenCL. See the discussion on https://reviews.llv
Revert r326937 "[OpenCL] Remove block invoke function from emitted block literal struct"
This reverts r326937 as it broke block argument handling in OpenCL. See the discussion on https://reviews.llvm.org/D43783 .
The next commit will add a test case that revealed the issue.
llvm-svn: 343582
show more ...
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1 |
|
#
cb35e9fa |
| 07-Mar-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[OpenCL] Remove block invoke function from emitted block literal struct
OpenCL runtime tracks the invoke function emitted for any block expression. Due to restrictions on blocks in OpenCL (v2.0 s6.1
[OpenCL] Remove block invoke function from emitted block literal struct
OpenCL runtime tracks the invoke function emitted for any block expression. Due to restrictions on blocks in OpenCL (v2.0 s6.12.5), it is always possible to know the block invoke function when emitting call of block expression or __enqueue_kernel builtin functions. Since __enqueu_kernel already has an argument for the invoke function, it is redundant to have invoke function member in the llvm block literal structure.
This patch removes invoke function from the llvm block literal structure. It also removes the bitcast of block invoke function to the generic block literal type which is useless for OpenCL.
This will save some space for the kernel argument, and also eliminate some store instructions.
Differential Revision: https://reviews.llvm.org/D43783
llvm-svn: 326937
show more ...
|
Revision tags: llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
#
fa13d015 |
| 15-Feb-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[OpenCL] Fix __enqueue_block for block with captures
The following test case causes issue with codegen of __enqueue_block
void (^block)(void) = ^{ callee(id, out); };
enqueue_kernel(queue, 0, ndra
[OpenCL] Fix __enqueue_block for block with captures
The following test case causes issue with codegen of __enqueue_block
void (^block)(void) = ^{ callee(id, out); };
enqueue_kernel(queue, 0, ndrange, block); Clang first does codegen for block expression in the first line and deletes its block info. Clang then tries to do codegen for the same block expression again for the second line, and fails because the block info is gone.
The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to record llvm block invoke function and llvm block literal emitted for each AST block expression, and use the recorded information for generating the wrapper kernel.
The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen of blocks.
Another minor issue is that some clean up AST expression is generated for block with captures, which can be stripped by IgnoreImplicit.
Differential Revision: https://reviews.llvm.org/D43240
llvm-svn: 325264
show more ...
|
Revision tags: llvmorg-6.0.0-rc2 |
|
#
f5f45e5e |
| 02-Feb-2018 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Switch to the new addr space mapping by default
This requires corresponding llvm change.
Differential Revision: https://reviews.llvm.org/D40956
llvm-svn: 324102
|
Revision tags: llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1 |
|
#
f2127d17 |
| 19-Oct-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[AMDGPU] Fix bug in enqueued block codegen due to an extra line
llvm-svn: 316165
|
#
c2a87a05 |
| 14-Oct-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
[OpenCL] Emit enqueued block as kernel
In OpenCL the kernel function and non-kernel function has different calling conventions. For certain targets they have different argument ABIs. Also kernels ha
[OpenCL] Emit enqueued block as kernel
In OpenCL the kernel function and non-kernel function has different calling conventions. For certain targets they have different argument ABIs. Also kernels have special function attributes and metadata for runtime to launch them.
The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such, the block invoke function should be emitted as kernel with proper calling convention and argument ABI.
This patch emits enqueued block as kernel. If a block is both called directly and passed to enqueue_kernel, separate functions will be generated.
Differential Revision: https://reviews.llvm.org/D38134
llvm-svn: 315804
show more ...
|