Revision tags: llvmorg-3.9.0-rc1 |
|
#
3b572002 |
| 28-Jul-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: add execfix flag to SI_ELSE
Summary: SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic bloc
AMDGPU: add execfix flag to SI_ELSE
Summary: SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be:
s_or_savexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode.
Finally: It also occured to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass.
In any case, this current patch could help in the short term with the whole ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
llvm-svn: 276972
show more ...
|
#
46cb48c7 |
| 27-Jul-2016 |
Reid Kleckner <rnk@google.com> |
Remove MCAsmInfo.h include from TargetOptions.h
TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you ca
Remove MCAsmInfo.h include from TargetOptions.h
TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis.
llvm-svn: 276883
show more ...
|
#
52ef4019 |
| 26-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the KernArgSize requires that the reserved area before the real kernel arguments be correctly al
AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover.
llvm-svn: 276766
show more ...
|
#
2fa171c4 |
| 25-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make skip threshold an option
llvm-svn: 276680
|
#
63e59680 |
| 19-Jul-2016 |
Davide Italiano <davide@freebsd.org> |
[AMDGPU] Remove spurious line (should've been removed in r276029).
llvm-svn: 276030
|
#
1576e385 |
| 19-Jul-2016 |
Davide Italiano <davide@freebsd.org> |
[AMDGPU] Remove dead code.
LGTM'd by Matt Arsenault.
llvm-svn: 276029
|
#
cb540bc0 |
| 19-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more
AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register.
llvm-svn: 275934
show more ...
|
#
b91805ea |
| 15-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip sho
AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later.
Fixes bug 28550
llvm-svn: 275510
show more ...
|
#
fa5a86a4 |
| 15-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550
llvm-svn: 275509
|
#
786724a2 |
| 12-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Follow up to r275203
I meant to squash this into it.
llvm-svn: 275220
|
#
657f871a |
| 12-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic.
llvm-svn: 275203
|
#
48d70cb4 |
| 09-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Revert "AMDGPU: Remove unused control flow intrinsic"
llvm-svn: 274978
|
#
1322b6f8 |
| 09-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
|
#
8f0a92f0 |
| 08-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove unused control flow intrinsic
llvm-svn: 274939
|
#
b63f18c9 |
| 08-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Minor adjustment to r274817
The commit message is inaccurate, modifiesRegister will check for partial defs of exec.
We currently don't ever emit partial defs of exec, so it doesn't really m
AMDGPU: Minor adjustment to r274817
The commit message is inaccurate, modifiesRegister will check for partial defs of exec.
We currently don't ever emit partial defs of exec, so it doesn't really matter.
llvm-svn: 274886
show more ...
|
#
a74374a8 |
| 08-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move si_mask_branch register operand to be a use
llvm-svn: 274818
|
#
d4a84b1e |
| 08-Jul-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup. Use definesRegister instead of manual loop
Also this will be more precise since it will check exec_lo/exec_hi writes.
llvm-svn: 274817
|
#
e40530ea |
| 06-Jul-2016 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Fix return of non-void-returning shaders
Summary: Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that ensures that a non-void-returning shader falls off the end of the
AMDGPU: Fix return of non-void-returning shaders
Summary: Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that ensures that a non-void-returning shader falls off the end of the last basic block was effectively disabled, since SI_RETURN is now used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21975
llvm-svn: 274612
show more ...
|
#
c1142725 |
| 30-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add m0 vgpr load loop block as successor
This shows up as a verifier error when I move this earlier, not sure why it didn't before.
llvm-svn: 274275
|
#
b4d95031 |
| 28-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super register's limits, resulting in verifier failures.
llvm-svn: 273977
|
#
21a4625a |
| 27-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.
llvm-svn: 273916
|
#
43e92fe3 |
| 24-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict
AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
llvm-svn: 273652
show more ...
|
#
3cb4ddeb |
| 22-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
|
#
9babdf42 |
| 22-Jun-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predeces
AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return.
llvm-svn: 273467
show more ...
|
Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
#
4318ea35 |
| 19-May-2016 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Also look for s_cbranch_vccz
llvm-svn: 270091
|