occupancy-levels.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 076aac59	23-Oct-2024	Carl Ritson <carl.ritson@amd.com>	[AMDGPU] Add a new target for gfx1153 (#113138)
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 689c5c48	24-Jun-2024	Mariusz Sikora <mariusz.sikora@amd.com>	[AMDGPU] Set total VGPRs to 1536 for gfx12 (#96272) - Use Feature1_5xVGPRs
Revision tags: llvmorg-18.1.8
# 1ca0055f	06-Jun-2024	Shilei Tian <i@tianshilei.me>	[AMDGPU] Add a new target gfx1152 (#94534)
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1	17-Jan-2024	Fangrui Song <i@maskray.me>	[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outright.
This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 92542f2a	17-Jul-2023	Jay Foad <jay.foad@amd.com>	[AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same as GFX 11.0.x.
Differential Revision: https://reviews.llvm.org/D155429
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 864a2b25	06-Mar-2023	Austin Kerbow <Austin.Kerbow@amd.com>	[AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting
ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID setting which could cause user SGPRs to be clobbered if the number of SGPRs reserved was near a granulated block boundary. When XNACK was enabled this worked correctly in the ASMParser which meant some kernels were only failing without "-save-temps".
Fixes: SWDEV-382764
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D145401
Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 10cef708	01-Dec-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	AMDGPU: Clean up LDS-related occupancy calculations
Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all.
At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode).
This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect.
Differential Revision: https://reviews.llvm.org/D139468
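The "8 instead of 10" effect described in this message can be reproduced with a small model. The sketch below is not the LLVM implementation of getOccupancyWithLocalMemSize; `CUModel`, `occupancyWithLDS`, and the values in `OlderHW` (wave size 64, 4 SIMDs per CU, 10 waves per SIMD, 64 kB of LDS) are assumptions chosen for illustration. It also ignores any hardware limit on the number of workgroups per CU.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of a compute unit. The values used below are
// illustrative assumptions, not taken from the LLVM sources.
struct CUModel {
  unsigned WaveSize;        // work-items per wave
  unsigned SIMDsPerCU;      // SIMDs over which one workgroup may be distributed
  unsigned MaxWavesPerSIMD; // occupancy ceiling per SIMD
  unsigned LDSBytes;        // local memory available to the CU
};

inline unsigned divideCeil(unsigned A, unsigned B) { return (A + B - 1) / B; }

// Estimates occupancy in waves per SIMD for workgroups of WorkGroupSize
// work-items that each use LDSPerGroup bytes of local memory.
inline unsigned occupancyWithLDS(const CUModel &CU, unsigned WorkGroupSize,
                                 unsigned LDSPerGroup) {
  assert(WorkGroupSize > 0 && "workgroup must contain at least one work-item");
  const unsigned WavesPerGroup = divideCeil(WorkGroupSize, CU.WaveSize);
  // Whole workgroups that fit into the wave slots of all SIMDs together.
  unsigned Groups = CU.MaxWavesPerSIMD * CU.SIMDsPerCU / WavesPerGroup;
  // Whole workgroups that fit into local memory.
  if (LDSPerGroup > 0)
    Groups = std::min(Groups, CU.LDSBytes / LDSPerGroup);
  if (Groups == 0)
    return 1; // Request exceeds the modelled resources: report the minimum.
  // Spread the resident waves over the SIMDs, rounding up.
  const unsigned WavesPerSIMD =
      divideCeil(Groups * WavesPerGroup, CU.SIMDsPerCU);
  return std::min(WavesPerSIMD, CU.MaxWavesPerSIMD);
}

// Assumed parameters resembling the "older hardware" case in the message.
const CUModel OlderHW{64, 4, 10, 65536};
```

Under these assumptions, `occupancyWithLDS(OlderHW, 1024, 0)` evaluates to 8 because only two 16-wave workgroups fit into the 40 wave slots, while a workgroup size of 64 reaches the full 10 waves per SIMD. This is the rounding effect of "work group size chunkiness" the message refers to.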
Revision tags: llvmorg-15.0.6
# d09d834b	21-Nov-2022	Valery Pykhtin <valery.pykhtin@gmail.com>	[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```
While working on RP tracking issues I noticed that getMinNumVGPRs returns incorrect values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the occupancies aren't reachable because they require the same amount of VGPR granules as others. For example, 19 waves occupancy on gfx1010 requires the same amount of granules as 20 waves, so the resulting occupancy would be 20.
SGPRs have the same issue and even have an inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs. It will be addressed in the next patch.
Legend:
# MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
# (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
# R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.
Unit test output without the fix: ``` ./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits gfx90a gfx940: Occ MinVGPR MaxVGPR 8 0 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 257 (O1) 512 (O1) gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c: Occ MinVGPR MaxVGPR 10 0 (O10) 24 (O10) 9 25 (O9) 28 (O9) 8 29 (O8) 32 (O8) 7 33 (O7) 36 (O7) 6 37 (O6) 40 (O6) 5 41 (O5) 48 (O5) 4 49 (O4) 64 (O4) 3 65 (O3) 84 (O3) 2 85 (O2) 128 (O2) 1 129 (O1) 256 (O1) gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64: Occ MinVGPR MaxVGPR 16 0 (O16) 32 (O16) 15 33 (O12) R 32 (O16) 14 33 (O12) R 32 (O16) 13 33 (O12) R 32 (O16) 12 33 (O12) 40 (O12) 11 41 (O10) R 40 (O12) 10 41 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1100w64 gfx1101w64: Occ MinVGPR MaxVGPR 16 0 (O16) 48 (O16) 15 49 (O12) R 48 (O16) 14 49 (O12) R 48 (O16) 13 49 (O12) R 48 (O16) 12 49 (O12) 60 (O12) 11 61 (O10) R 60 (O12) 10 61 (O10) 72 (O10) 9 73 (O9) 84 (O9) 8 85 (O8) 96 (O8) 7 97 (O7) 108 (O7) 6 109 (O6) 120 (O6) 5 121 (O5) 144 (O5) 4 145 (O4) 192 (O4) 3 193 (O3) 252 (O3) 2 253 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32: Occ MinVGPR MaxVGPR 16 0 (O16) 64 (O16) 15 65 (O12) R 64 (O16) 14 65 (O12) R 64 (O16) 13 65 (O12) R 64 (O16) 12 65 (O12) 80 (O12) 11 81 (O10) R 80 (O12) 10 81 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 160 (O6) 5 161 
(O5) 192 (O5) 4 193 (O4) 256 (O4) 3 256 (O4) R 256 (O4) 2 256 (O4) R 256 (O4) 1 256 (O4) R 256 (O4) gfx1100w32 gfx1101w32: Occ MinVGPR MaxVGPR 16 0 (O16) 96 (O16) 15 97 (O12) R 96 (O16) 14 97 (O12) R 96 (O16) 13 97 (O12) R 96 (O16) 12 97 (O12) 120 (O12) 11 121 (O10) R 120 (O12) 10 121 (O10) 144 (O10) 9 145 (O9) 168 (O9) 8 169 (O8) 192 (O8) 7 193 (O7) 216 (O7) 6 217 (O6) 240 (O6) 5 241 (O5) 256 (O5) 4 256 (O5) R 256 (O5) 3 256 (O5) R 256 (O5) 2 256 (O5) R 256 (O5) 1 256 (O5) R 256 (O5) gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64: Occ MinVGPR MaxVGPR 20 0 (O20) 24 (O20) 19 25 (O18) R 24 (O20) 18 25 (O18) 28 (O18) 17 29 (O16) R 28 (O18) 16 29 (O16) 32 (O16) 15 33 (O14) R 32 (O16) 14 33 (O14) 36 (O14) 13 37 (O12) R 36 (O14) 12 37 (O12) 40 (O12) 11 41 (O11) 44 (O11) 10 45 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 84 (O6) 5 85 (O5) 100 (O5) 4 101 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32: Occ MinVGPR MaxVGPR 20 0 (O20) 48 (O20) 19 49 (O18) R 48 (O20) 18 49 (O18) 56 (O18) 17 57 (O16) R 56 (O18) 16 57 (O16) 64 (O16) 15 65 (O14) R 64 (O16) 14 65 (O14) 72 (O14) 13 73 (O12) R 72 (O14) 12 73 (O12) 80 (O12) 11 81 (O11) 88 (O11) 10 89 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 168 (O6) 5 169 (O5) 200 (O5) 4 201 (O4) 256 (O4) 3 256 (O4) R 256 (O4) 2 256 (O4) R 256 (O4) 1 256 (O4) R 256 (O4) ``` After the fix: ``` gfx90a gfx940: Occ MinVGPR MaxVGPR 8 0 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 257 (O1) 512 (O1) gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c: Occ MinVGPR MaxVGPR 10 0 (O10) 24 (O10) 9 25 (O9) 28 (O9) 8 29 
(O8) 32 (O8) 7 33 (O7) 36 (O7) 6 37 (O6) 40 (O6) 5 41 (O5) 48 (O5) 4 49 (O4) 64 (O4) 3 65 (O3) 84 (O3) 2 85 (O2) 128 (O2) 1 129 (O1) 256 (O1) gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64: Occ MinVGPR MaxVGPR 16 0 (O16) 32 (O16) 15 0 (O16) 32 (O16) 14 0 (O16) 32 (O16) 13 0 (O16) 32 (O16) 12 33 (O12) 40 (O12) 11 33 (O12) 40 (O12) 10 41 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 169 (O2) 256 (O2) gfx1100w64 gfx1101w64: Occ MinVGPR MaxVGPR 16 0 (O16) 48 (O16) 15 0 (O16) 48 (O16) 14 0 (O16) 48 (O16) 13 0 (O16) 48 (O16) 12 49 (O12) 60 (O12) 11 49 (O12) 60 (O12) 10 61 (O10) 72 (O10) 9 73 (O9) 84 (O9) 8 85 (O8) 96 (O8) 7 97 (O7) 108 (O7) 6 109 (O6) 120 (O6) 5 121 (O5) 144 (O5) 4 145 (O4) 192 (O4) 3 193 (O3) 252 (O3) 2 253 (O2) 256 (O2) 1 253 (O2) 256 (O2) gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32: Occ MinVGPR MaxVGPR 16 0 (O16) 64 (O16) 15 0 (O16) 64 (O16) 14 0 (O16) 64 (O16) 13 0 (O16) 64 (O16) 12 65 (O12) 80 (O12) 11 65 (O12) 80 (O12) 10 81 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 160 (O6) 5 161 (O5) 192 (O5) 4 193 (O4) 256 (O4) 3 193 (O4) 256 (O4) 2 193 (O4) 256 (O4) 1 193 (O4) 256 (O4) gfx1100w32 gfx1101w32: Occ MinVGPR MaxVGPR 16 0 (O16) 96 (O16) 15 0 (O16) 96 (O16) 14 0 (O16) 96 (O16) 13 0 (O16) 96 (O16) 12 97 (O12) 120 (O12) 11 97 (O12) 120 (O12) 10 121 (O10) 144 (O10) 9 145 (O9) 168 (O9) 8 169 (O8) 192 (O8) 7 193 (O7) 216 (O7) 6 217 (O6) 240 (O6) 5 241 (O5) 256 (O5) 4 241 (O5) 256 (O5) 3 241 (O5) 256 (O5) 2 241 (O5) 256 (O5) 1 241 (O5) 256 (O5) gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64: Occ MinVGPR MaxVGPR 20 0 (O20) 24 (O20) 19 0 (O20) 24 (O20) 18 25 (O18) 28 (O18) 17 25 (O18) 28 (O18) 16 29 (O16) 32 (O16) 15 29 (O16) 32 (O16) 14 33 (O14) 36 (O14) 13 33 (O14) 36 (O14) 12 37 (O12) 40 (O12) 
11 41 (O11) 44 (O11) 10 45 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 84 (O6) 5 85 (O5) 100 (O5) 4 101 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 169 (O2) 256 (O2) gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32: Occ MinVGPR MaxVGPR 20 0 (O20) 48 (O20) 19 0 (O20) 48 (O20) 18 49 (O18) 56 (O18) 17 49 (O18) 56 (O18) 16 57 (O16) 64 (O16) 15 57 (O16) 64 (O16) 14 65 (O14) 72 (O14) 13 65 (O14) 72 (O14) 12 73 (O12) 80 (O12) 11 81 (O11) 88 (O11) 10 89 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 168 (O6) 5 169 (O5) 200 (O5) 4 201 (O4) 256 (O4) 3 201 (O4) 256 (O4) 2 201 (O4) 256 (O4) 1 201 (O4) 256 (O4) ``` Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D138443
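The "after the fix" tables above follow from a simple granule model, sketched here. This is not the GCNSubtarget code: `VGPRModel` and the free functions are hypothetical, and the parameters (512 VGPRs, granule 8, 16 waves for gfx1030 wave64; 768 VGPRs, granule 12 for gfx1100 wave64; 256 addressable VGPRs per wave) are inferred from the tabulated values, not read from the LLVM sources.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical register-file model; all values are inferred from the tables.
struct VGPRModel {
  unsigned Total;       // VGPRs shared by all waves on one SIMD
  unsigned Granule;     // allocation granularity
  unsigned MaxWaves;    // maximum waves per SIMD
  unsigned Addressable; // most VGPRs a single wave can use
};

inline unsigned alignDown(unsigned V, unsigned A) { return V / A * A; }
inline unsigned alignUp(unsigned V, unsigned A) { return (V + A - 1) / A * A; }

// Waves per SIMD achievable when each wave uses NumVGPRs registers.
inline unsigned occupancyWithNumVGPRs(const VGPRModel &M, unsigned NumVGPRs) {
  const unsigned Alloc = alignUp(std::max(NumVGPRs, 1u), M.Granule);
  return std::min(std::max(M.Total / Alloc, 1u), M.MaxWaves);
}

// Largest VGPR count that still allows Waves waves per SIMD.
inline unsigned maxNumVGPRs(const VGPRModel &M, unsigned Waves) {
  assert(Waves >= 1 && Waves <= M.MaxWaves);
  return std::min(alignDown(M.Total / Waves, M.Granule), M.Addressable);
}

// Smallest VGPR count in the range of the occupancy that is actually reached
// when asking for Waves. Unreachable occupancies share the range of the next
// reachable one, which avoids the "R" rows of the pre-fix output.
inline unsigned minNumVGPRs(const VGPRModel &M, unsigned Waves) {
  const unsigned Occ = occupancyWithNumVGPRs(M, maxNumVGPRs(M, Waves));
  if (Occ >= M.MaxWaves)
    return 0;
  return maxNumVGPRs(M, Occ + 1) + 1;
}

// Assumed parameters for two of the tabulated wave64 groups.
const VGPRModel Gfx1030Wave64{512, 8, 16, 256};
const VGPRModel Gfx1100Wave64{768, 12, 16, 256};
```

With these assumed parameters, a request for 15 waves on `Gfx1030Wave64` yields the range 0 to 32, the same as for 16 waves, and a request for 11 waves yields 33 to 40, matching the corresponding rows. `Gfx1100Wave64` shows the effect of a register file that is 50% larger: the 16-wave range extends to 48 VGPRs.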
# d85e849f	02-Dec-2022	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Convert some assorted tests to opaque pointers
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2
# ddfa0f62	23-Sep-2022	Jay Foad <jay.foad@amd.com>	[AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations.
Differential Revision: https://reviews.llvm.org/D134522
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 8d4b74ac	11-Sep-2021	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set
It should be semantically identical if it was set to the same value as the default. Also improve the documentation.
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 03663e41	08-Dec-2020	Jay Foad <jay.foad@amd.com>	[AMDGPU] Add occupancy level tests for GFX10.3. NFC.
getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and they both affect the occupancy calculation.
Differential Revision: https://reviews.llvm.org/D92839
Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 88aced1e	02-Mar-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Fix computation for getOccupancyWithLocalMemSize
The computation here didn't really make sense to me, and reported wildly different results depending on the flat work group size attribute. I think this should really report a range derived from the possible work group size bounds, and only allow an occupancy that is a multiple of the group size.
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# 4b472139	27-Aug-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Switch backend default max workgroup size to 1024
Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum.
I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
Revision tags: llvmorg-9.0.0-rc2
# 2594fa85	31-Jul-2019	Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	[AMDGPU] Fix high occupancy calculation and print it
We had a couple of places which still returned 10 as a maximum occupancy. Fixed. Also print a comment about occupancy as the compiler sees it.
Differential Revision: https://reviews.llvm.org/D65423
llvm-svn: 367381