Revision tags: llvmorg-4.0.0-rc1 |
|
#
116bbab4 |
| 13-Jan-2017 |
Diana Picus <diana.picus@linaro.org> |
[CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
#
f867a40b |
| 03-Nov-2016 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
This change exploits the fact that LDS reads may be reordered even if they access the same location.
Prior to this change, the algorithm stopped as soon as any memory access was encountered between the loads that were expected to be merged, even though a read-after-read conflict cannot affect execution correctness.
Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%. An improvement is also expected on any long sequence of reads from LDS.
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
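The idea described in this commit can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the pass's actual code: `MemOp` and `findPairableRead` are hypothetical names, and a real implementation would also have to check register dependencies and offset encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an LDS memory instruction.
struct MemOp {
  bool IsStore;    // true for an LDS write, false for an LDS read
  unsigned Base;   // id of the base address register
  unsigned Offset; // offset from the base
};

// Returns the index of a later read that shares a base with Ops[First],
// or -1 if a store is reached first or no such read exists.
inline int findPairableRead(const std::vector<MemOp> &Ops, std::size_t First) {
  assert(First < Ops.size() && !Ops[First].IsStore && "expected a read");
  for (std::size_t I = First + 1; I < Ops.size(); ++I) {
    if (Ops[I].IsStore)
      return -1; // a write may conflict with the read; stop conservatively
    if (Ops[I].Base == Ops[First].Base)
      return static_cast<int>(I); // same base: candidate for merging
    // An unrelated read: read-after-read is harmless, so keep scanning.
  }
  return -1;
}
```

In this model an intervening read no longer ends the search, while an intervening write still does.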
|
#
7b0e25b7 |
| 27-Oct-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due to register dependencies
Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
llvm-svn: 285273
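The check described in the summary can be sketched with a simplified model. The names `Inst`, `usesAny`, and `canMerge` are hypothetical, and the model assumes that instructions depending on the first access are the ones collected for moving; the real pass works on machine instructions and is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction: the registers it defines and the ones it reads.
struct Inst {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

// True if I reads any register in Regs.
inline bool usesAny(const Inst &I, const std::vector<unsigned> &Regs) {
  return std::any_of(I.Uses.begin(), I.Uses.end(), [&](unsigned R) {
    return std::find(Regs.begin(), Regs.end(), R) != Regs.end();
  });
}

// Between holds the instructions from First up to Candidate, in program order.
// Instructions that depend on First must be moved, so their defs move too.
// The merge is rejected if Candidate itself reads one of the moved defs.
inline bool canMerge(const Inst &First, const std::vector<Inst> &Between,
                     const Inst &Candidate) {
  std::vector<unsigned> MovedDefs = First.Defs;
  for (const Inst &B : Between)
    if (usesAny(B, MovedDefs))
      MovedDefs.insert(MovedDefs.end(), B.Defs.begin(), B.Defs.end());
  return !usesAny(Candidate, MovedDefs);
}
```

For example, if an intermediate instruction computes a value from the first access and the candidate reads that value, `canMerge` returns false in this model.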
|
#
117296c0 |
| 01-Oct-2016 |
Mehdi Amini <mehdi.amini@apple.com> |
Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
|
#
9720f57a |
| 30-Aug-2016 |
NAKAMURA Takumi <geek4civic@gmail.com> |
SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable]
llvm-svn: 280075
|
#
c2ff0eb6 |
| 29-Aug-2016 |
Tom Stellard <thomas.stellard@amd.com> |
AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary: The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge.
Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23814
llvm-svn: 279991
|
#
e175d8ab |
| 26-Aug-2016 |
Tom Stellard <thomas.stellard@amd.com> |
AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to schedule loads together without explicitly clustering them, which would give us non-sorted offsets.
Also, we will want to do this if we move the load/store optimizer before the scheduler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23776
llvm-svn: 279870
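Canonicalizing the offset order can be pictured with a minimal standalone sketch. `MergedAccess` and `canonicalize` are hypothetical names chosen for illustration; the assumption here is simply that each offset travels together with its data register when the pair is reordered.

```cpp
#include <cassert>
#include <utility>

// Hypothetical description of a merged two-element DS access.
struct MergedAccess {
  unsigned Offset0, Offset1; // element offsets
  unsigned Reg0, Reg1;       // data register paired with each offset
};

// Orders the pair so that Offset0 <= Offset1, keeping each data register
// attached to its own offset.
inline MergedAccess canonicalize(MergedAccess A) {
  if (A.Offset0 > A.Offset1) {
    std::swap(A.Offset0, A.Offset1);
    std::swap(A.Reg0, A.Reg1);
  }
  assert(A.Offset0 <= A.Offset1 && "offsets must be sorted on exit");
  return A;
}
```

With this ordering, two accesses that arrive in either order produce the same merged form.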
|
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3 |
|
#
90799ce8 |
| 23-Aug-2016 |
Matthias Braun <matze@braunis.de> |
MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in upcoming patches. The problem with this is that some inputs will be reported as static single assignment while some passes claim not to support SSA form. In reality, though, those passes do not support PHI instructions, so track the presence of PHI instructions separately from the SSA property.
Differential Revision: https://reviews.llvm.org/D22719
llvm-svn: 279573
|
Revision tags: llvmorg-3.9.0-rc2 |
|
#
0dd9ed1d |
| 13-Aug-2016 |
Hans Wennborg <hans@hanshq.net> |
Fix more dereferenced end() iterators after r278532
llvm-svn: 278587
|
Revision tags: llvmorg-3.9.0-rc1 |
|
#
03d85845 |
| 27-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
|
#
43e92fe3 |
| 24-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
llvm-svn: 273652
|
#
48975881 |
| 21-Jun-2016 |
Rafael Espindola <rafael.espindola@gmail.com> |
Delete some dead code.
Found by gcc 6.
llvm-svn: 273303
|
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
#
7de74af9 |
| 25-Apr-2016 |
Andrew Kaylor <andrew.kaylor@intel.com> |
Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450
llvm-svn: 267485
|
#
ecc7cbf6 |
| 29-Mar-2016 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
Test commit access
llvm-svn: 264736
|
Revision tags: llvmorg-3.8.0 |
|
#
3ac9cc61 |
| 27-Feb-2016 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
Take MachineInstr by reference instead of by pointer in SlotIndexes and the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are never null, so this cleans up the API a bit. It also incidentally removes a few implicit conversions from MachineInstrBundleIterator to MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based for loop over MachineBasicBlock::instr_begin/instr_end, so I added MachineBasicBlock::instrs.
llvm-svn: 262115
|
Revision tags: llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1 |
|
#
84db5d97 |
| 14-Jul-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a super register, it would use the wrong subregister indices. Fix this by inserting copies, so we only ever write to a vreg_64. Run the register coalescer again to clean this up, although this isn't ideal and often does result in an extra move.
Also remove the assert that offset1 > offset0.
There isn't a real reason to not allow this other than a minor convenience in the compiler, and it doesn't seem worth the effort of avoiding it.
llvm-svn: 242174
|
Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1 |
|
#
45bb48ea |
| 13-Jun-2015 |
Tom Stellard <thomas.stellard@amd.com> |
R600 -> AMDGPU rename
llvm-svn: 239657
|