#
701c21ea |
| 29-Apr-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix crash with unreachable terminators.
If a block has no successors because it ends in unreachable, this was accessing an invalid iterator.
Also stop counting instructions that don't emit any real instructions.
llvm-svn: 268119
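
The two fixes described above can be illustrated with a minimal sketch. `Instr`, `Block`, `countEmitted` and `firstSuccessorOrNull` are hypothetical stand-ins invented for this illustration, not LLVM's `MachineInstr` / `MachineBasicBlock` API, and the real patch may be structured differently.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; not LLVM's real classes.
struct Instr {
  bool emitsCode; // false for instructions that produce no real machine code
};

struct Block {
  std::vector<Instr> instrs;
  std::vector<const Block *> succs; // empty when the block ends in unreachable
};

// Count only the instructions that emit real machine code.
unsigned countEmitted(const Block &b) {
  unsigned n = 0;
  for (const Instr &mi : b.instrs)
    if (mi.emitsCode)
      ++n;
  return n;
}

// Return the first successor, or nullptr when there is none.
// Dereferencing succs.begin() on an empty list is the kind of
// invalid-iterator access the commit message describes.
const Block *firstSuccessorOrNull(const Block &b) {
  return b.succs.empty() ? nullptr : *b.succs.begin();
}
```

A caller would check the returned pointer before walking further, so a block that ends in unreachable simply terminates the walk.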
|
#
df3a20cd |
| 06-Apr-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589
|
#
213e87f2 |
| 21-Mar-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Add SIWholeQuadMode pass
Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics).
This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required.
This pass is run before register coalescing so that we can use machine SSA for analysis.
The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC.
This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18162
llvm-svn: 263982
|
#
92339e88 |
| 21-Mar-2016 |
Tom Stellard <thomas.stellard@amd.com> |
AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions.
The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch.
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18282
llvm-svn: 263969
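
As a rough sketch of the counting change described above: rather than inspecting only successor blocks, the count covers every block laid out between the start and the end of the branch. The `Block` type, the `shouldSkip` name, the index-based range and the strict "more than" comparison are all assumptions made for this illustration, not the actual pass code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine basic block; only its size matters here.
struct Block {
  unsigned numInstrs;
};

// Decide whether a skip branch is worthwhile: walk every block in layout
// order in the half-open range [start, end) and report true once the running
// instruction count exceeds the threshold.
bool shouldSkip(const std::vector<Block> &layout, std::size_t start,
                std::size_t end, unsigned threshold) {
  assert(start <= end && end <= layout.size());
  unsigned count = 0;
  for (std::size_t i = start; i != end; ++i) {
    count += layout[i].numInstrs;
    if (count > threshold)
      return true; // enough work is masked off to justify a branch
  }
  return false;
}
```

With blocks of 2, 5 and 1 instructions and a threshold of 7, the whole range counts 8 instructions and the skip branch is inserted. Counting only the first block would not trigger it.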
|
#
fa771811 |
| 18-Mar-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: add missing braces around multi-line if block
This fixes an issue with rL263658 pointed out by Tom Stellard.
llvm-svn: 263823
|
#
ef160de3 |
| 16-Mar-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Prevent uniform loops from becoming infinite
Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
llvm-svn: 263658
|
#
ed2213e6 |
| 14-Mar-2016 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D18058
llvm-svn: 263441
|
Revision tags: llvmorg-3.8.0, llvmorg-3.8.0-rc3 |
|
#
296b8491 |
| 12-Feb-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Set flat_scratch from flat_scratch_init reg
This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access.
llvm-svn: 260658
|
#
55d49cfe |
| 12-Feb-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Initialize SILowerControlFlow
llvm-svn: 260645
|
#
806dd0a5 |
| 12-Feb-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove trailing whitespace
llvm-svn: 260644
|
Revision tags: llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1 |
|
#
391be09e |
| 21-Oct-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix adding redundant m0 uses
BuildMI already adds these since they are defined correctly now.
llvm-svn: 250961
|
#
3add6439 |
| 20-Oct-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add MachineInstr overloads for instruction format tests
llvm-svn: 250797
|
#
28419273 |
| 07-Oct-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.
Currently build_vector selection has a broken-looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use.
When the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will normally take care of later, which will allow removing the weird check of build_vector users. Without this change, if that check were removed, v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs.
llvm-svn: 249494
|
#
0cb8517d |
| 25-Sep-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands.
llvm-svn: 248587
|
Revision tags: llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4 |
|
#
46359155 |
| 08-Aug-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/SI: Remove VCCReg
llvm-svn: 244380
|
#
95f0606e |
| 05-Aug-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/SI: Remove EXECReg
For the same reasons as the other physical registers.
llvm-svn: 244062
|
Revision tags: llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1 |
|
#
45bb48ea |
| 13-Jun-2015 |
Tom Stellard <thomas.stellard@amd.com> |
R600 -> AMDGPU rename
llvm-svn: 239657
|