Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
#
6cf306de |
| 23-Feb-2018 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Track physreg uses in SILoadStoreOptimizer
Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs).
Change
AMDGPU: Track physreg uses in SILoadStoreOptimizer
Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs).
Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b
Reviewers: arsenm, mareko, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D42647
llvm-svn: 325882
show more ...
|
#
770397f4 |
| 21-Feb-2018 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Do not combine loads/store across physreg defs
Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice.
Fixes various piglit variable-indexing/v
AMDGPU: Do not combine loads/store across physreg defs
Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice.
Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*
Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8 Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")
Reviewers: arsenm, mareko, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D40343
llvm-svn: 325677
show more ...
|
#
b02cebf5 |
| 08-Feb-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix incorrect reordering when inline asm defines LDS address
Defs of operands outside of the instruction's explicit defs need to be checked.
llvm-svn: 324554
|
Revision tags: llvmorg-6.0.0-rc2 |
|
#
b2cc7798 |
| 07-Feb-2018 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Remove the s_buffer workaround for GFX9 chips
Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM
AMDGPU: Remove the s_buffer workaround for GFX9 chips
Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM.
SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D42756
llvm-svn: 324486
show more ...
|
#
7687d420 |
| 22-Jan-2018 |
Mark Searles <m.c.searles@gmail.com> |
[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64 - Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the
[AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64 - Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register. - Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.
Differential Revision: https://reviews.llvm.org/D42124
llvm-svn: 323153
show more ...
|
Revision tags: llvmorg-6.0.0-rc1 |
|
#
f1caa283 |
| 15-Dec-2017 |
Matthias Braun <matze@braunis.de> |
MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.
llvm-svn: 320884
|
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3 |
|
#
84445dd1 |
| 30-Nov-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
|
#
3f71c0e3 |
| 29-Nov-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to
AMDGPU: Select DS insts without m0 initialization
GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed.
llvm-svn: 319270
show more ...
|
Revision tags: llvmorg-5.0.1-rc2 |
|
#
b4f28ded |
| 28-Nov-2017 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
Summary: The entire algorithm operates per basic-block, so for cache locality it should be better to re-optimize a basic-block immediately
AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
Summary: The entire algorithm operates per basic-block, so for cache locality it should be better to re-optimize a basic-block immediately rather than in a separate loop.
I don't have performance measurements.
Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40344
llvm-svn: 319156
show more ...
|
#
dd059c16 |
| 22-Nov-2017 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary: This bug seems to have gone unnoticed because critical cases with LDS instructions are eliminated by the
AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer
Summary: This bug seems to have gone unnoticed because critical cases with LDS instructions are eliminated by the peephole optimizer.
However, equivalent situations arise with buffer loads and stores as well, so this fixes regressions since r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4").
Fixes at least: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs KHR-GL45.cull_distance.functional piglit tes-input-gl_ClipDistance.shader_test ... and probably more
Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d
Reviewers: arsenm, mareko, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D40303
llvm-svn: 318829
show more ...
|
#
bee1964d |
| 09-Nov-2017 |
Vitaly Buka <vitalybuka@google.com> |
Fix "default label in switch which covers all enumeration values" warning
llvm-svn: 317771
|
#
58410f37 |
| 09-Nov-2017 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary: Only 56 shaders (out of 48486) are affected.
Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spill
AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
Summary: Only 56 shaders (out of 48486) are affected.
Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spilled VGPRs: 94 -> 112 (19.15 %) Scratch size: 524 -> 528 (0.76 %) dwords per thread Code Size: 187400 -> 184992 (-1.28 %) bytes
One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs.
The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise)
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D39012
llvm-svn: 317755
show more ...
|
#
4c421a2d |
| 09-Nov-2017 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4
Summary: Only 3 (out of 48486) shaders are affected.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D38951
llvm-svn: 317753
show more ...
|
#
6a0548ac |
| 09-Nov-2017 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary: -9.9% code size decrease in affected shaders.
Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0
AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4
Summary: -9.9% code size decrease in affected shaders.
Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0.35 %) Spilled SGPRs: 8942 -> 8940 (-0.02 %) Code Size: 52940672 -> 51727288 (-2.29 %) bytes Max Waves: 373066 -> 371718 (-0.36 %)
Totals from affected shaders: SGPRS: 283520 -> 302704 (6.77 %) VGPRS: 227632 -> 233308 (2.49 %) Spilled SGPRs: 3966 -> 3964 (-0.05 %) Code Size: 12203080 -> 10989696 (-9.94 %) bytes Max Waves: 44070 -> 42722 (-3.06 %)
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D38950
llvm-svn: 317752
show more ...
|
#
b953cc36 |
| 09-Nov-2017 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads.
Th
AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads.
The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only:
SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %)
There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size.
These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs:
SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D38949
llvm-svn: 317751
show more ...
|
Revision tags: llvmorg-5.0.1-rc1 |
|
#
aba2b3d1 |
| 10-Oct-2017 |
NAKAMURA Takumi <geek4civic@gmail.com> |
SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256.
llvm-svn: 315283
|
Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5 |
|
#
67e72dee |
| 31-Aug-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use set for tracked registers
The majority of the time spent in the pass checking for the register reads. Rather than searching all of the defined registers for uses in each instruction, use
AMDGPU: Use set for tracked registers
The majority of the time spent in the pass checking for the register reads. Rather than searching all of the defined registers for uses in each instruction, use a set of defined registers and check the operands of the instruction.
This process still is algorithmically not great, but with the additional trick of skipping the analysis for addresses with one use, this brings one slow testcase into a reasonable range.
llvm-svn: 312206
show more ...
|
#
3cb61634 |
| 30-Aug-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't look for DS merge candidates with one use address
The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no
AMDGPU: Don't look for DS merge candidates with one use address
The merge is only possible if the base address register is the same for the two instructions. If there is only the one use, there's no point in doing an expensive forward scan checking for memory interference looking for a merge candidate.
This gives a signficant improvement in one extreme testcase. The code to do the scan is still algorithmically terrible, so this is still the slowest pass in that example.
llvm-svn: 312096
show more ...
|
Revision tags: llvmorg-5.0.0-rc4 |
|
#
2d69c924 |
| 29-Aug-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix typo
llvm-svn: 312040
|
Revision tags: llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2 |
|
#
59e12826 |
| 08-Aug-2017 |
Eugene Zelenko <eugene.zelenko@gmail.com> |
[AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 310328
|
Revision tags: llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2 |
|
#
8b61764c |
| 18-May-2017 |
Francis Visoiu Mistrih <fvisoiumistrih@apple.com> |
[LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handli
[LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
show more ...
|
Revision tags: llvmorg-4.0.1-rc1 |
|
#
86b0a546 |
| 14-Apr-2017 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] added SIInstrInfo::getAddNoCarry() helper
Addressed rest of post submit comments from D31993.
Differential Revision: https://reviews.llvm.org/D32057
llvm-svn: 300288
|
#
dbc9ba30 |
| 13-Apr-2017 |
Reid Kleckner <rnk@google.com> |
Fix -Wunused-value warning
llvm-svn: 300254
|
#
d026f79b |
| 13-Apr-2017 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Combine DS operations with offsets bigger than byte
In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address.
Dif
[AMDGPU] Combine DS operations with offsets bigger than byte
In many cases ds operations can be combined even if offsets do not fit into 8 bit encoding. What it takes is to adjust base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
show more ...
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2 |
|
#
6620376d |
| 21-Jan-2017 |
Eugene Zelenko <eugene.zelenko@gmail.com> |
[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292688
|