Revision tags: llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1 |
|
#
69e3001b |
| 11-Jan-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix folding immediates into mac src2
Whether the fold is legal must be checked against the instruction it will be replaced with.
llvm-svn: 291711
|
#
51818c14 |
| 10-Jan-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Constant fold when immediate is materialized
In future commits these patterns will appear after moveToVALU changes.
llvm-svn: 291615
|
#
4bd72361 |
| 10-Dec-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type.
Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER.
The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already.
llvm-svn: 289306
|
#
8485fa09 |
| 07-Dec-2016 |
Tom Stellard <thomas.stellard@amd.com> |
AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.
Patch By: Wei Ding
Summary: This patch fixes the fdiv precision issues.
Reviewers: b-sumner, cfang, wdng, arsenm
Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D26424
llvm-svn: 288879
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2 |
|
#
ff8bb49b |
| 29-Nov-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor immediate folding logic
Change the logic for when to fold immediates to consider the destination operand rather than the source of the materializing mov instruction.
No change yet, but this will allow for correctly handling i16/f16 operands. Since 32-bit moves are used to materialize constants for these, the same bitvalue will not be in the register.
llvm-svn: 288184
|
Revision tags: llvmorg-3.9.1-rc1 |
|
#
a24d84be |
| 23-Nov-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup immediate folding code
Move code down to use, reorder to avoid hard to follow immediate folding logic.
llvm-svn: 287818
|
#
391c3ea9 |
| 23-Nov-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix debug printing
The uint8_t was printed as a char which didn't really work.
llvm-svn: 287817
|
#
f86e4b72 |
| 13-Nov-2016 |
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
[AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975
llvm-svn: 286753
|
#
5e63a04e |
| 06-Oct-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't fold undef uses or copies with implicit uses
llvm-svn: 283476
|
#
c2ee42cd |
| 06-Oct-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove leftover implicit operands when folding immediates
When constant folding an operation to a copy or an immediate mov, the implicit uses/defs of the old instruction were left behind, e.g. replacing v_or_b32 left the implicit exec use on the new copy.
llvm-svn: 283471
|
#
117296c0 |
| 01-Oct-2016 |
Mehdi Amini <mehdi.amini@apple.com> |
Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
|
#
2bc198a3 |
| 14-Sep-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.
llvm-svn: 281491
|
#
fa5f767a |
| 14-Sep-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later.
llvm-svn: 281488
|
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2 |
|
#
3661e90e |
| 15-Aug-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
|
Revision tags: llvmorg-3.9.0-rc1 |
|
#
9cfc75c2 |
| 30-Jun-2016 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement.
Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader.
As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
|
#
43e92fe3 |
| 24-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
llvm-svn: 273652
|
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
#
7de74af9 |
| 25-Apr-2016 |
Andrew Kaylor <andrew.kaylor@intel.com> |
Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450
llvm-svn: 267485
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3 |
|
#
427c5489 |
| 11-Feb-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix passes depending on dominator tree for no reason
llvm-svn: 260494
|
Revision tags: llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1 |
|
#
926c56f5 |
| 13-Jan-2016 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU/SI: Fix a bug in SIFoldOperands
Summary: ret.ll will contain a test for this
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16029
llvm-svn: 257590
|
#
82fc962c |
| 07-Jan-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU/SI: Fold operands with sub-registers
Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away.
Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer.
Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided.
With the IR generated by current Mesa, the changes are obviously relatively minor:
7063 shaders in 3531 tests
Totals:
  SGPRS: 351872 -> 352560 (0.20 %)
  VGPRS: 199984 -> 200732 (0.37 %)
  Code Size: 9876968 -> 9881112 (0.04 %) bytes
  LDS: 91 -> 91 (0.00 %) blocks
  Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
  Wait states: 295164 -> 295337 (0.06 %)
Totals from affected shaders:
  SGPRS: 65784 -> 66472 (1.05 %)
  VGPRS: 38064 -> 38812 (1.97 %)
  Code Size: 1993828 -> 1997972 (0.21 %) bytes
  LDS: 42 -> 42 (0.00 %) blocks
  Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
  Wait states: 54026 -> 54199 (0.32 %)
Reviewers: tstellarAMD, arsenm, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15875
llvm-svn: 257074
|
Revision tags: llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1 |
|
#
e8c0891e |
| 21-Oct-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix verifier error in SIFoldOperands
There may be other use operands that also need their kill flags cleared.
This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer.
PeepholeOptimizer rewrites cases that look like:
  %vreg0 = ...
  %vreg1 = COPY %vreg0
  use %vreg1<kill>
  %vreg2 = COPY %vreg0
  use %vreg2<kill>
to use the earlier source to
  %vreg0 = ...
  use %vreg0
  use %vreg0
Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy.
llvm-svn: 250960
|
#
16c4da03 |
| 28-Sep-2015 |
Andrew Kaylor <andrew.kaylor@intel.com> |
Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)
Differential Revision: http://reviews.llvm.org/D11370
llvm-svn: 248735
|
#
0cb8517d |
| 25-Sep-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands.
llvm-svn: 248587
|
#
ad46e0c1 |
| 10-Sep-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/SI: Fix creating v_mov_b32s without exec uses
This will be caught by existing tests with a verifier check to be added in a future commit.
llvm-svn: 247229
|
#
9a197676 |
| 09-Sep-2015 |
Tom Stellard <thomas.stellard@amd.com> |
AMDGPU/SI: Fold operands through REG_SEQUENCE instructions
Summary: This helps mostly when we use add instructions for address calculations that contain immediates.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12256
llvm-svn: 247157
|