History log of /llvm-project/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (Results 126 – 150 of 167)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3
# 6cf306de 23-Feb-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Track physreg uses in SILoadStoreOptimizer

Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).

Change

AMDGPU: Track physreg uses in SILoadStoreOptimizer

Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).

Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42647

llvm-svn: 325882

show more ...


# 770397f4 21-Feb-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Do not combine loads/store across physreg defs

Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/v

AMDGPU: Do not combine loads/store across physreg defs

Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*

Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8
Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40343

llvm-svn: 325677

show more ...


# b02cebf5 08-Feb-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix incorrect reordering when inline asm defines LDS address

Defs of operands outside of the instruction's explicit defs need
to be checked.

llvm-svn: 324554


Revision tags: llvmorg-6.0.0-rc2
# b2cc7798 07-Feb-2018 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Remove the s_buffer workaround for GFX9 chips

Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

SMEM

AMDGPU: Remove the s_buffer workaround for GFX9 chips

Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D42756

llvm-svn: 324486

show more ...


# 7687d420 22-Jan-2018 Mark Searles <m.c.searles@gmail.com>

[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the

[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153

show more ...


Revision tags: llvmorg-6.0.0-rc1
# f1caa283 15-Dec-2017 Matthias Braun <matze@braunis.de>

MachineFunction: Return reference from getFunction(); NFC

The Function can never be nullptr so we can return a reference.

llvm-svn: 320884


Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3
# 84445dd1 30-Nov-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use gfx9 carry-less add/sub instructions

llvm-svn: 319491


# 3f71c0e3 29-Nov-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Select DS insts without m0 initialization

GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to

AMDGPU: Select DS insts without m0 initialization

GFX9 stopped using m0 for most DS instructions. Select
a different instruction without the use. I think this will
be less error prone than trying to manually maintain m0
uses as needed.

llvm-svn: 319270

show more ...


Revision tags: llvmorg-5.0.1-rc2
# b4f28ded 28-Nov-2017 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer

Summary:
The entire algorithm operates per basic-block, so for cache locality
it should be better to re-optimize a basic-block immediately

AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer

Summary:
The entire algorithm operates per basic-block, so for cache locality
it should be better to re-optimize a basic-block immediately rather than
in a separate loop.

I don't have performance measurements.

Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40344

llvm-svn: 319156

show more ...


# dd059c16 22-Nov-2017 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer

Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the

AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer

Summary:
This bug seems to have gone unnoticed because critical cases with LDS
instructions are eliminated by the peephole optimizer.

However, equivalent situations arise with buffer loads and stores
as well, so this fixes regressions since r317751 ("AMDGPU: Merge
S_BUFFER_LOAD_DWORD_IMM into x2, x4").

Fixes at least:
KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
KHR-GL45.cull_distance.functional
piglit tes-input-gl_ClipDistance.shader_test
... and probably more

Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40303

llvm-svn: 318829

show more ...


# bee1964d 09-Nov-2017 Vitaly Buka <vitalybuka@google.com>

Fix "default label in switch which covers all enumeration values" warning

llvm-svn: 317771


# 58410f37 09-Nov-2017 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4

Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spill

AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4

Summary:
Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39012

llvm-svn: 317755

show more ...


# 4c421a2d 09-Nov-2017 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4

Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm

AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4

Summary: Only 3 (out of 48486) shaders are affected.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38951

llvm-svn: 317753

show more ...


# 6a0548ac 09-Nov-2017 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4

Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0

AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4

Summary:
-9.9% code size decrease in affected shaders.

Totals (changed stats only):
SGPRS: 2151462 -> 2170646 (0.89 %)
VGPRS: 1634612 -> 1640288 (0.35 %)
Spilled SGPRs: 8942 -> 8940 (-0.02 %)
Code Size: 52940672 -> 51727288 (-2.29 %) bytes
Max Waves: 373066 -> 371718 (-0.36 %)

Totals from affected shaders:
SGPRS: 283520 -> 302704 (6.77 %)
VGPRS: 227632 -> 233308 (2.49 %)
Spilled SGPRs: 3966 -> 3964 (-0.05 %)
Code Size: 12203080 -> 10989696 (-9.94 %) bytes
Max Waves: 44070 -> 42722 (-3.06 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38950

llvm-svn: 317752

show more ...


# b953cc36 09-Nov-2017 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4

Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

Th

AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4

Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

SGPRS: 2072198 -> 2151462 (3.83 %)
VGPRS: 1628024 -> 1634612 (0.40 %)
Spilled SGPRs: 7883 -> 8942 (13.43 %)
Spilled VGPRs: 97 -> 101 (4.12 %)
Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
Code Size: 60222620 -> 52940672 (-12.09 %) bytes
Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh
alien_isolation 2938 100 0.0
batman_arkham_origins 589 6 0.0
bioshock-infinite 1769 4 0.0
borderlands2 3968 22 0.0
counter_strike_glob.. 1142 60 0.1
deus_ex_mankind_div.. 1410 79 0.1
dirt-showdown 533 4 0.0
dirt_rally 364 1163 3.2
divinity 1052 2 0.0
dota2 1747 7 0.0
f1-2015 776 1515 2.0
grid_autosport 1767 1505 0.9
hitman 1413 273 0.2
left_4_dead_2 1762 4 0.0
life_is_strange 1296 26 0.0
mad_max 358 96 0.3
metro_2033_redux 2670 60 0.0
payday2 1362 22 0.0
portal 474 3 0.0
saints_row_iv 1704 8 0.0
serious_sam_3_bfe 392 1348 3.4
shadow_of_mordor 1418 12 0.0
shadow_warrior 3956 239 0.1
talos_principle 324 1735 5.4
thea 172 17 0.1
tomb_raider 1449 215 0.1
total_war_warhammer 242 56 0.2
ue4_effects_cave 295 55 0.2
ue4_elemental 572 12 0.0
unigine_tropics 210 56 0.3
unigine_valley 278 152 0.5
victor_vran 1262 84 0.1
yofrankie 82 2 0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

llvm-svn: 317751

show more ...


Revision tags: llvmorg-5.0.1-rc1
# aba2b3d1 10-Oct-2017 NAKAMURA Takumi <geek4civic@gmail.com>

SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256.

llvm-svn: 315283


Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5
# 67e72dee 31-Aug-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use set for tracked registers

The majority of the time spent in the pass checking
for the register reads. Rather than searching all of
the defined registers for uses in each instruction,
use

AMDGPU: Use set for tracked registers

The majority of the time spent in the pass checking
for the register reads. Rather than searching all of
the defined registers for uses in each instruction,
use a set of defined registers and check the operands
of the instruction.

This process still is algorithmically not great,
but with the additional trick of skipping the analysis
for addresses with one use, this brings one slow
testcase into a reasonable range.

llvm-svn: 312206

show more ...


# 3cb61634 30-Aug-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't look for DS merge candidates with one use address

The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no

AMDGPU: Don't look for DS merge candidates with one use address

The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no point in doing an expensive forward scan checking
for memory interference looking for a merge candidate.

This gives a signficant improvement in one extreme testcase.
The code to do the scan is still algorithmically terrible,
so this is still the slowest pass in that example.

llvm-svn: 312096

show more ...


Revision tags: llvmorg-5.0.0-rc4
# 2d69c924 29-Aug-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix typo

llvm-svn: 312040


Revision tags: llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2
# 59e12826 08-Aug-2017 Eugene Zelenko <eugene.zelenko@gmail.com>

[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 310328


Revision tags: llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2
# 8b61764c 18-May-2017 Francis Visoiu Mistrih <fvisoiumistrih@apple.com>

[LegacyPassManager] Remove TargetMachine constructors

This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handli

[LegacyPassManager] Remove TargetMachine constructors

This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360

show more ...


Revision tags: llvmorg-4.0.1-rc1
# 86b0a546 14-Apr-2017 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] added SIInstrInfo::getAddNoCarry() helper

Addressed rest of post submit comments from D31993.

Differential Revision: https://reviews.llvm.org/D32057

llvm-svn: 300288


# dbc9ba30 13-Apr-2017 Reid Kleckner <rnk@google.com>

Fix -Wunused-value warning

llvm-svn: 300254


# d026f79b 13-Apr-2017 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Combine DS operations with offsets bigger than byte

In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.

Dif

[AMDGPU] Combine DS operations with offsets bigger than byte

In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.

Differential Revision: https://reviews.llvm.org/D31993

llvm-svn: 300227

show more ...


Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2
# 6620376d 21-Jan-2017 Eugene Zelenko <eugene.zelenko@gmail.com>

[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 292688


1234567