#
86caf517 |
| 11-Dec-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
Revert "[amdgpu][nfc] Delete dead code in LowerModuleLDS"
This reverts commit 7b9ab06d10a6a989f76e6c5ecf89d906f838fe7d. Said code is better removed as part of a larger change.
|
#
7b9ab06d |
| 10-Dec-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Delete dead code in LowerModuleLDS
|
#
04b2f6ea |
| 09-Dec-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Drop dead PtrSet, fix a comment
|
#
f0e3b39a |
| 08-Dec-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu][nfc] Move non-shared code out of LDSUtils
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
#
cbdf624b |
| 19-Sep-2021 |
Brendon Cahoon <brendon.cahoon@amd.com> |
[AMDGPU] Correctly merge alias.scope and noalias metadata for memops
When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are mer
[AMDGPU] Correctly merge alias.scope and noalias metadata for memops
When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection.
The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, when f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)). Then, concatenation creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after mempcy.
The function MDNode::getmostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias.
Differential Revision: https://reviews.llvm.org/D110049
show more ...
|
#
dc6e8dfd |
| 20-Sep-2021 |
Jacob Lambert <jacob.lambert@amd.com> |
[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.
|
Revision tags: llvmorg-13.0.0-rc3 |
|
#
ce51c5d4 |
| 26-Aug-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix crashing on kernel declarations when lowering LDS
This was trying to insert the used marker into a declaration.
|
Revision tags: llvmorg-13.0.0-rc2 |
|
#
8d7d89b0 |
| 17-Aug-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure fields without it unlike distinct variables. As a result we cannot combine
[AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure fields without it unlike distinct variables. As a result we cannot combine ds_read and ds_write operations in a case of any store in between which always considered clobbering.
Differential Revision: https://reviews.llvm.org/D108315
show more ...
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init |
|
#
9dc26366 |
| 16-Jul-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Disable LDS lowering for GFX shaders
Apparently these need external LDS symbols to remain.
Fixes: SC1-3279
Differential Revision: https://reviews.llvm.org/D106288
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
#
d274d64e |
| 23-Jun-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Check for pointer operand while refining LDS align
Also skips the propagation if alignment is 1.
Differential Revision: https://reviews.llvm.org/D104796
|
Revision tags: llvmorg-12.0.1-rc2 |
|
#
2b43209e |
| 15-Jun-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Propagate LDS align into to instructions
Differential Revision: https://reviews.llvm.org/D104316
|
#
d797a7f8 |
| 15-Jun-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use performOptimizedStructLayout for LDS sort
This gives better packing.
Differential Revision: https://reviews.llvm.org/D104331
|
#
80fd5fa5 |
| 21-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel functions is - to *avoid* subsequent LDS lowering pa
[AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel functions is - to *avoid* subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type which would otherwise cause allocating huge memory for struct instance within every kernel.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103225
show more ...
|
#
f6632f11 |
| 10-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Fix missing lowering of LDS used in global scope.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103431
|
#
05289dfb |
| 07-Jun-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Handle constant LDS uses from different kernels
This allows to lower an LDS variable into a kernel structure even if there is a constant expression used from different kernels.
Differentia
[AMDGPU] Handle constant LDS uses from different kernels
This allows to lower an LDS variable into a kernel structure even if there is a constant expression used from different kernels.
Differential Revision: https://reviews.llvm.org/D103655
show more ...
|
#
713ca2f3 |
| 07-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Introduce command line switch to control super aligning of LDS.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103817
|
#
52ffbfdf |
| 07-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.
Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their siz
[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.
Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
show more ...
|
#
753437fc |
| 04-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
Revert "[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering."
This reverts commit d71ff907ef23eaef86ad66ba2d711e4986cd6cb2.
|
#
d71ff907 |
| 04-Jun-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.
Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their siz
[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering.
Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
show more ...
|
Revision tags: llvmorg-12.0.1-rc1 |
|
#
8de4db69 |
| 19-May-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Lower kernel LDS into a sorted structure
Differential Revision: https://reviews.llvm.org/D102954
|
#
49028858 |
| 20-May-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Request module used variables from LDS lowering as internal
I do not see any practical difference but technically used.* variables are internal and a call to getGlobalVariable misses true a
[AMDGPU] Request module used variables from LDS lowering as internal
I do not see any practical difference but technically used.* variables are internal and a call to getGlobalVariable misses true as a second argument. NFC as far as I can tell.
Differential Revision: https://reviews.llvm.org/D102884
show more ...
|
#
748db5bf |
| 20-May-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fix module LDS selection
Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address.
Differ
[AMDGPU] Fix module LDS selection
Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address.
Differential Revision: https://reviews.llvm.org/D102882
show more ...
|
#
82787eb2 |
| 15-Apr-2021 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU] Move LDS lowering related utility functions to a separate utils file.
Move some utility functions which are used within LDS lowering pass to a separate utils file so that other LDS related
[AMDGPU] Move LDS lowering related utility functions to a separate utils file.
Move some utility functions which are used within LDS lowering pass to a separate utils file so that other LDS related passes can make use of them when required.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100526
show more ...
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
#
13e49dce |
| 15-Mar-2021 |
Jon Chesterfield <jonathanchesterfield@gmail.com> |
[amdgpu] Implement lower function LDS pass
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kern
[amdgpu] Implement lower function LDS pass
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset.
Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer.
This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite.
This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94648
show more ...
|