History log of /llvm-project/llvm/test/CodeGen/AMDGPU/occupancy-levels.ll (Results 1 – 15 of 15)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 076aac59 23-Oct-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Add a new target for gfx1153 (#113138)


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 689c5c48 24-Jun-2024 Mariusz Sikora <mariusz.sikora@amd.com>

[AMDGPU] Set total VGPRs to 1536 for gfx12 (#96272)

- Use Feature1_5xVGPRs


Revision tags: llvmorg-18.1.8
# 1ca0055f 06-Jun-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Add a new target gfx1152 (#94534)


Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1 17-Jan-2024 Fangrui Song <i@maskray.me>

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.
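
The difference between the two flags can be pictured with a toy model. This is only a sketch of the behaviour described above, not llc's implementation; the helper names and the host triple are hypothetical.

```python
def apply_march(default_triple: str, march: str) -> str:
    """Toy model of -march=: replace only the architecture component."""
    _arch, _, rest = default_triple.partition("-")
    return f"{march}-{rest}" if rest else march


def apply_mtriple(default_triple: str, mtriple: str) -> str:
    """Toy model of -mtriple=: the default triple is ignored entirely."""
    return mtriple


host = "x86_64-apple-darwin"  # hypothetical default triple of the build host
print(apply_march(host, "amdgcn"))    # amdgcn-apple-darwin
print(apply_mtriple(host, "amdgcn"))  # amdgcn
```

In this model the OS and environment parts of the host triple leak into the first result, which is the kind of nonsensical triple the commit message warns about.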

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 92542f2a 17-Jul-2023 Jay Foad <jay.foad@amd.com>

[AMDGPU] Add targets gfx1150 and gfx1151

This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 864a2b25 06-Mar-2023 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Reserve extra SGPR blocks with XNACK "any" TID Setting

ASMPrinter was relying on feature bits to set up extra SGPRs in the kernel
descriptor for the xnack_mask. This was broken for the dynamic XNACK "any" TID
setting which could cause user SGPRs to be clobbered if the number of SGPRs
reserved was near a granulated block boundary.

When XNACK was enabled this worked correctly in the ASMParser which meant some
kernels were only failing without "-save-temps".

Fixes: SWDEV-382764

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D145401


Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 10cef708 01-Dec-2022 Nicolai Hähnle <nicolai.haehnle@amd.com>

AMDGPU: Clean up LDS-related occupancy calculations

Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.

getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.

At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).

This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.
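
The rounding effect mentioned in the last paragraph can be illustrated with a small sketch. It is not the backend's code: it ignores LDS usage and any hardware cap on work groups per CU, and the default parameter values (wave size 64, 4 SIMDs per CU, at most 10 waves per SIMD) are assumptions chosen to model older hardware. The function name is hypothetical.

```python
def waves_per_simd(group_size, wave_size=64, simds_per_cu=4, max_waves_per_simd=10):
    """Occupancy (waves per SIMD) when only whole work groups can be resident."""
    waves_per_group = -(-group_size // wave_size)  # ceiling division
    max_waves_per_cu = simds_per_cu * max_waves_per_simd
    # Only whole work groups fit, so leftover wave slots stay unused.
    groups_per_cu = max_waves_per_cu // waves_per_group
    resident_waves = groups_per_cu * waves_per_group
    return min(max_waves_per_simd, resident_waves // simds_per_cu)


print(waves_per_simd(1024))  # 8
print(waves_per_simd(64))    # 10
print(waves_per_simd(32))    # 10
```

Under these assumptions a 1024-item work group needs 16 waves, only two such groups fit into the 40 wave slots of a CU, and the resulting 32 waves give 8 per SIMD. Groups of 32 or 64 items occupy a single wave each and leave no slots unused.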

Differential Revision: https://reviews.llvm.org/D139468


Revision tags: llvmorg-15.0.6
# d09d834b 21-Nov-2022 Valery Pykhtin <valery.pykhtin@gmail.com>

[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.

```
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```

While working on register pressure (RP) tracking issues I noticed that getMinNumVGPRs returns incorrect
values: the problem is the large VGPR granule sizes on GFX10+ architectures. Some of the
occupancies aren't reachable because they require the same number of VGPR granules as others.
For example, an occupancy of 19 waves on gfx1010 requires the same number of granules as 20 waves,
so the resulting occupancy would be 20.
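
A minimal sketch of why some occupancies are unreachable, assuming occupancy is the register file size divided by the granule-rounded VGPR count. The function name and the gfx1010 wave64 parameters (512 VGPRs per SIMD, granule of 4, at most 20 waves) are assumptions inferred from the tables in this message, not values taken from the LLVM sources.

```python
def occupancy_with_num_vgprs(vgprs, total_vgprs, granule, max_waves):
    """Waves per SIMD for a kernel using `vgprs` registers (illustrative model)."""
    granules = max(1, -(-vgprs // granule))  # round up to whole granules
    return min(max_waves, total_vgprs // (granules * granule))


GFX1010_W64 = dict(total_vgprs=512, granule=4, max_waves=20)  # assumed values

reachable = sorted(
    {occupancy_with_num_vgprs(n, **GFX1010_W64) for n in range(1, 257)},
    reverse=True,
)
print(reachable)
# [20, 18, 16, 14, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]
```

In this model 24 VGPRs still allow 20 waves, while 25 VGPRs round up to 28 and drop straight to 18, so no register count yields an occupancy of 19.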

SGPRs have the same issue and even have an inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs.
It will be addressed in the next patch.

Legend:
# MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
# (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
# R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.

Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 33 (O12) R 32 (O16)
14 33 (O12) R 32 (O16)
13 33 (O12) R 32 (O16)
12 33 (O12) 40 (O12)
11 41 (O10) R 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 49 (O12) R 48 (O16)
14 49 (O12) R 48 (O16)
13 49 (O12) R 48 (O16)
12 49 (O12) 60 (O12)
11 61 (O10) R 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 65 (O12) R 64 (O16)
14 65 (O12) R 64 (O16)
13 65 (O12) R 64 (O16)
12 65 (O12) 80 (O12)
11 81 (O10) R 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 97 (O12) R 96 (O16)
14 97 (O12) R 96 (O16)
13 97 (O12) R 96 (O16)
12 97 (O12) 120 (O12)
11 121 (O10) R 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 256 (O5) R 256 (O5)
3 256 (O5) R 256 (O5)
2 256 (O5) R 256 (O5)
1 256 (O5) R 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 25 (O18) R 24 (O20)
18 25 (O18) 28 (O18)
17 29 (O16) R 28 (O18)
16 29 (O16) 32 (O16)
15 33 (O14) R 32 (O16)
14 33 (O14) 36 (O14)
13 37 (O12) R 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 49 (O18) R 48 (O20)
18 49 (O18) 56 (O18)
17 57 (O16) R 56 (O18)
16 57 (O16) 64 (O16)
15 65 (O14) R 64 (O16)
14 65 (O14) 72 (O14)
13 73 (O12) R 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)
```

After the fix:
```
gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 0 (O16) 32 (O16)
14 0 (O16) 32 (O16)
13 0 (O16) 32 (O16)
12 33 (O12) 40 (O12)
11 33 (O12) 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 0 (O16) 48 (O16)
14 0 (O16) 48 (O16)
13 0 (O16) 48 (O16)
12 49 (O12) 60 (O12)
11 49 (O12) 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 253 (O2) 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 0 (O16) 64 (O16)
14 0 (O16) 64 (O16)
13 0 (O16) 64 (O16)
12 65 (O12) 80 (O12)
11 65 (O12) 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 193 (O4) 256 (O4)
2 193 (O4) 256 (O4)
1 193 (O4) 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 0 (O16) 96 (O16)
14 0 (O16) 96 (O16)
13 0 (O16) 96 (O16)
12 97 (O12) 120 (O12)
11 97 (O12) 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 241 (O5) 256 (O5)
3 241 (O5) 256 (O5)
2 241 (O5) 256 (O5)
1 241 (O5) 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 0 (O20) 24 (O20)
18 25 (O18) 28 (O18)
17 25 (O18) 28 (O18)
16 29 (O16) 32 (O16)
15 29 (O16) 32 (O16)
14 33 (O14) 36 (O14)
13 33 (O14) 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 0 (O20) 48 (O20)
18 49 (O18) 56 (O18)
17 49 (O18) 56 (O18)
16 57 (O16) 64 (O16)
15 57 (O16) 64 (O16)
14 65 (O14) 72 (O14)
13 65 (O14) 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 201 (O4) 256 (O4)
2 201 (O4) 256 (O4)
1 201 (O4) 256 (O4)
```

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D138443


# d85e849f 02-Dec-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Convert some assorted tests to opaque pointers


Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2
# ddfa0f62 23-Sep-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Add GFX11 feature for subtargets with more VGPRs

The full complement of physical VGPRs for GFX11 is 50% more than GFX10.
Some subtargets have this, others stay the same as GFX10. This affects
occupancy calculations.
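
As a rough illustration of the effect on occupancy, the sketch below compares the per-wave VGPR budget for a 512-register and a 768-register file in wave64 mode. The function name, the granule sizes (8 and 12) and the 256-register addressing limit are assumptions inferred from the wave64 tables quoted elsewhere in this log, not values read from the backend.

```python
def max_vgprs_for_occupancy(waves, total_vgprs, granule, addressable=256):
    """Largest VGPR count that still allows `waves` waves per SIMD (illustrative)."""
    per_wave = total_vgprs // waves
    # Round down to a whole number of allocation granules.
    return min(addressable, per_wave // granule * granule)


for waves in (16, 12, 8, 4):
    base = max_vgprs_for_occupancy(waves, total_vgprs=512, granule=8)
    more = max_vgprs_for_occupancy(waves, total_vgprs=768, granule=12)
    print(waves, base, more)
# 16 32 48
# 12 40 60
# 8 64 96
# 4 128 192
```

With these assumed parameters, every occupancy level tolerates 50% more VGPRs per wave on the subtargets with the larger register file.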

Differential Revision: https://reviews.llvm.org/D134522


Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 8d4b74ac 11-Sep-2021 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set

It should be semantically identical if it was set to the same value as
the default. Also improve the documentation.


Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 03663e41 08-Dec-2020 Jay Foad <jay.foad@amd.com>

[AMDGPU] Add occupancy level tests for GFX10.3. NFC.

getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and
they both affect the occupancy calculation.

Differential Revision: https://reviews.llvm.org/D92839


Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# 88aced1e 02-Mar-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix computation for getOccupancyWithLocalMemSize

The computation here didn't really make sense to me, and reported
wildly different results depending on the flat work group size
attribute.

I think this should really report a range derived from the possible
work group size bounds, and only allow an occupancy that is a multiple
of the group size.


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# 4b472139 27-Aug-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Switch backend default max workgroup size to 1024

Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt in to the more optimal 256
default maximum.

I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.


Revision tags: llvmorg-9.0.0-rc2
# 2594fa85 31-Jul-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fix high occupancy calculation and print it

We had a couple of places which still returned 10 as the maximum
occupancy. Fixed.

Also print a comment about the occupancy as the compiler sees it.

Differential Revision: https://reviews.llvm.org/D65423

llvm-svn: 367381