History log of /llvm-project/llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp (Results 101 – 125 of 142)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-3.9.0-rc1
# 3b572002 28-Jul-2016 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: add execfix flag to SI_ELSE

Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic bloc

AMDGPU: add execfix flag to SI_ELSE

Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972

show more ...


# 46cb48c7 27-Jul-2016 Reid Kleckner <rnk@google.com>

Remove MCAsmInfo.h include from TargetOptions.h

TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you ca

Remove MCAsmInfo.h include from TargetOptions.h

TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.

llvm-svn: 276883

show more ...


# 52ef4019 26-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Make AMDGPUMachineFunction fields private

ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly al

AMDGPU: Make AMDGPUMachineFunction fields private

ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766

show more ...


# 2fa171c4 25-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Make skip threshold an option

llvm-svn: 276680


# 63e59680 19-Jul-2016 Davide Italiano <davide@freebsd.org>

[AMDGPU] Remove spurious line (should've been removed in r276029).

llvm-svn: 276030


# 1576e385 19-Jul-2016 Davide Italiano <davide@freebsd.org>

[AMDGPU] Remove dead code.

LGTM'd by Matt Arsenault.

llvm-svn: 276029


# cb540bc0 19-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Expand register indexing pseudos in custom inserter

This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more

AMDGPU: Expand register indexing pseudos in custom inserter

This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934

show more ...


# b91805ea 15-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix not expanding control flow after some kill blocks

Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip sho

AMDGPU: Fix not expanding control flow after some kill blocks

Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510

show more ...


# fa5a86a4 15-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix trying to skip from a block with no successors

Found while reducing bug 28550

llvm-svn: 275509


# 786724a2 12-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Follow up to r275203

I meant to squash this into it.

llvm-svn: 275220


# 657f871a 12-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix verifier error with kill intrinsic

Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203


# 48d70cb4 09-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

Revert "AMDGPU: Remove unused control flow intrinsic"

llvm-svn: 274978


# 1322b6f8 09-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Improve offset folding for register indexing

llvm-svn: 274954


# 8f0a92f0 08-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove unused control flow intrinsic

llvm-svn: 274939


# b63f18c9 08-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Minor adjustment to r274817

The commit message is inaccurate, modifiesRegister
will check for partial defs of exec.

We currently don't ever emit partial defs of exec,
so it doesn't really m

AMDGPU: Minor adjustment to r274817

The commit message is inaccurate, modifiesRegister
will check for partial defs of exec.

We currently don't ever emit partial defs of exec,
so it doesn't really matter.

llvm-svn: 274886

show more ...


# a74374a8 08-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Move si_mask_branch register operand to be a use

llvm-svn: 274818


# d4a84b1e 08-Jul-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Cleanup. Use definesRegister instead of manual loop

Also this will be more precise since it will check
exec_lo/exec_hi writes.

llvm-svn: 274817


# e40530ea 06-Jul-2016 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Fix return of non-void-returning shaders

Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the

AMDGPU: Fix return of non-void-returning shaders

Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612

show more ...


# c1142725 30-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add m0 vgpr load loop block as successor

This shows up as a verifier error when I move this
earlier, not sure why it didn't before.

llvm-svn: 274275


# b4d95031 28-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix out of bounds indirect indexing errors

This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977


# 21a4625a 27-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix verifier errors with undef vector indices

Also fix pointlessly adding exec to liveins.

llvm-svn: 273916


# 43e92fe3 24-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Cleanup subtarget handling.

Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict

AMDGPU: Cleanup subtarget handling.

Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652

show more ...


# 3cb4ddeb 22-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix liveness when expanding m0 loop

llvm-svn: 273514


# 9babdf42 22-Jun-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix verifier errors in SILowerControlFlow

The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predeces

AMDGPU: Fix verifier errors in SILowerControlFlow

The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467

show more ...


Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1
# 4318ea35 19-May-2016 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Also look for s_cbranch_vccz

llvm-svn: 270091


123456