|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
| #
f104eb6e |
| 15-May-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
This is basically a partial revert of https://reviews.llvm.org/D145586 ( fd1d60873fdc )
D145586 was originally i
[AMDGPU] Reintroduce CC exception for non-inlined functions in Promote Alloca limits
This is basically a partial revert of https://reviews.llvm.org/D145586 ( fd1d60873fdc )
D145586 was originally introduced to help with SWDEV-363662, and it did, but it also caused a 25% drop in performance in some MIOpen benchmarks where, it seems, functions are inlined more conservatively.
This patch restores the pre-D145586 behavior for PromoteAlloca: functions with a non-entry CC have a 32 VGPRs threshold, but only if the function is not marked with "alwaysinline".
A good number of AMDGPU code makes uses of the AMDGPUAlwaysInline pass anyway, so in our backend "alwaysinline" seems very common.
This change does not affect SWDEV-363662 (the motivating issue for introducing D145586).
Fixes SWDEV-399519
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D150551
show more ...
|
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2 |
|
| #
fd1d6087 |
| 12-Apr-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[AMDGPU] Remove CC exception for Promote Alloca Limits
Apparently it was used to work around some issue that has been fixed. Removing it helps with high scratch usage observed in some cases due to f
[AMDGPU] Remove CC exception for Promote Alloca Limits
Apparently it was used to work around some issue that has been fixed. Removing it helps with high scratch usage observed in some cases due to failed alloca promotion.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D145586
show more ...
|
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
7850ab21 |
| 01-Dec-2022 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFC] Port an assortment of tests that invoke SROA to new pass manager
|
|
Revision tags: llvmorg-15.0.6 |
|
| #
50caf693 |
| 28-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Convert promote alloca tests to opaque pointers
|
| #
1310aa16 |
| 17-Nov-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use -passes for amdgpu-promote-alloca tests
|
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
| #
cf74ef13 |
| 23-Sep-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Limit promote alloca max size in functions
Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no r
[AMDGPU] Limit promote alloca max size in functions
Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no reason to eliminate scratch access to get another scratch access instead.
Differential Revision: https://reviews.llvm.org/D110372
show more ...
|
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
| #
54e2dc75 |
| 01-Jul-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Limit promote alloca to vector with VGPR budget
Allow only up to 1/4 of available VGPRs for the vectorization of any given alloca.
Differential Revision: https://reviews.llvm.org/D82990
|