History log of /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (Results 126 – 150 of 229)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 4c0251da 25-Oct-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Enable SGPR copy folding

That used to fail in the last testcase function because after
%0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IM

[AMDGPU] Enable SGPR copy folding

That used to fail in the last testcase function because after
%0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IMM. Its legal effective
subreg class is SReg_32 while instruction expects more restricted
SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand()
passed the legality check and it was caught in the verifier.

Borrowed code from the verifier to check for RC legality.

Differential Revision: https://reviews.llvm.org/D69445

show more ...


# c7dcacf1 25-Oct-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fixed asan failure in SIFoldOperands

Both tryFoldOMod() and tryFoldClamp() remove original instruction,
so the check MI.modifiesRegister() may use a deleted MI.

Differential Revision: http

[AMDGPU] Fixed asan failure in SIFoldOperands

Both tryFoldOMod() and tryFoldClamp() remove original instruction,
so the check MI.modifiesRegister() may use a deleted MI.

Differential Revision: https://reviews.llvm.org/D69448

show more ...


# d4303b38 24-Oct-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fold AGPR reg_sequence initializers

Differential Revision: https://reviews.llvm.org/D69413


# b2a65f0d 23-Oct-2019 Michael Liao <michael.hliao@gmail.com>

[AMDGPU] Skip additional folding on the same operand.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

[AMDGPU] Skip additional folding on the same operand.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69355

show more ...


# 61e7a61b 21-Oct-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Allow folding of sgpr to vgpr copy

Potentially sgpr to sgpr copy should also be possible.
That is however trickier because we may end up with a
wrong register class at use because of xm0/xe

[AMDGPU] Allow folding of sgpr to vgpr copy

Potentially sgpr to sgpr copy should also be possible.
That is however trickier because we may end up with a
wrong register class at use because of xm0/xexec permutations.

Differential Revision: https://reviews.llvm.org/D69280

show more ...


# 48f57138 22-Oct-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Allow tied operand subreg folding

Turns out it makes sense, contrarily to what comment said.

Differential Revision: https://reviews.llvm.org/D69287


# 8ebbf25c 21-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Erase redundant redefs of m0 in SIFoldOperands

Only handle simple inter-block redefs of m0 to the same value. This
avoids interference from redefs of m0 in SILoadStoreOptimzer. I was
initial

AMDGPU: Erase redundant redefs of m0 in SIFoldOperands

Only handle simple inter-block redefs of m0 to the same value. This
avoids interference from redefs of m0 in SILoadStoreOptimzer. I was
initially teaching that pass to ignore redefs of m0, but having them
not exist beforehand is much simpler.

This is in preparation for deleting the current special m0 handling in
SIFixSGPRCopies to allow the register coalescer to handle the
difficult cases.

llvm-svn: 375449

show more ...


# e5be543a 20-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Increase vcc liveness scan threshold

Avoids a test regression in a future patch. Also add debug printing on
this case, so I waste less time debugging folds in the future.

llvm-svn: 375367


# f8bf7d7f 09-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Don't fold copies to physregs

In a future patch, this will help cleanup m0 handling.

The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle

AMDGPU: Don't fold copies to physregs

In a future patch, this will help cleanup m0 handling.

The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle move immediates
itself. The virtual register uses will often be allocated to the same
register, so there end up being no real copy.

llvm-svn: 374257

show more ...


# 565b1d3d 30-Sep-2019 Alexander Timofeev <Alexander.Timofeev@amd.com>

[AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D67662

llvm-svn: 373221


# d3b2b971 25-Sep-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx10 v_fmac_f16 operand folding

Fold immediates into v_fmac_f16.

Differential Revision: https://reviews.llvm.org/D68037

llvm-svn: 372906


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3
# 8fe1245a 23-Aug-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug

gfx908 ignores an mfma if SrcC is a literal.

Differential Revision: https://reviews.llvm.org/D66670

llvm-svn: 369816


# 78347c97 21-Aug-2019 Alexander Timofeev <Alexander.Timofeev@amd.com>

[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Differential Revision: https://reviews.llvm.org/D63731
Reviewers: qcolombet, rampitec

llvm-svn: 369532


# 0c476111 15-Aug-2019 Daniel Sanders <daniel_l_sanders@apple.com>

Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM

Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Re

Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM

Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041

show more ...


# 10db641a 13-Aug-2019 Tim Renouf <tpr.llvm@botech.co.uk>

[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'

That change (r363670) could leave a copy from vgpr to sgpr. Fixed.

Differential Revision: https://reviews.llvm.org/D66133

Change-Id: I00c3f

[AMDGPU] Fix to 'Fold readlane from copy of SGPR or imm'

That change (r363670) could leave a copy from vgpr to sgpr. Fixed.

Differential Revision: https://reviews.llvm.org/D66133

Change-Id: I00c3fe6fda2e8e1e36f53195b881b1449c777ea4
llvm-svn: 368736

show more ...


Revision tags: llvmorg-9.0.0-rc2
# 2bea69bf 01-Aug-2019 Daniel Sanders <daniel_l_sanders@apple.com>

Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC

llvm-svn: 367633


Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init
# 27ec195f 12-Jul-2019 Jay Foad <jay.foad@gmail.com>

[AMDGPU] Fix DPP combiner check for exec modification

Summary:
r363675 changed the exec modification helper function, now called
execMayBeModifiedBeforeUse, so that if no UseMI is specified it check

[AMDGPU] Fix DPP combiner check for exec modification

Summary:
r363675 changed the exec modification helper function, now called
execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks
all instructions in the basic block, even beyond the last use. That
meant that the DPP combiner no longer worked in any basic block that
ended with a control flow instruction, and in particular it didn't work
on code sequences generated by the atomic optimizer.

Fix it by reinstating the old behaviour but in a new helper function
execMayBeModifiedBeforeAnyUse, and limiting the number of instructions
scanned.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64393

llvm-svn: 365910

show more ...


# e67cc380 11-Jul-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx908 mfma support

Differential Revision: https://reviews.llvm.org/D64584

llvm-svn: 365824


Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3
# 2710171a 25-Jun-2019 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Write LDS objects out as global symbols in code generation

Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then

AMDGPU: Write LDS objects out as global symbols in code generation

Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
to a constant at compile times, which means some tests can no longer
be applied.

The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297

show more ...


# 60957cb7 24-Jun-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fold frame index into MUBUF

This matters for byval uses outside of the entry block, which appear
as copies.

Previously, the only folding done was during selection, which could
not see the u

AMDGPU: Fold frame index into MUBUF

This matters for byval uses outside of the entry block, which appear
as copies.

Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.

This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpfu
for extreme frames.

llvm-svn: 364185

show more ...


# 4d000d24 19-Jun-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix folding immediate into readfirstlane through reg_sequence

The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conserva

AMDGPU: Fix folding immediate into readfirstlane through reg_sequence

The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.

For some reason copies aren't making it through the reg_sequence,
although they should.

llvm-svn: 363876

show more ...


# f39f3bd0 18-Jun-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Change API for checking for exec modification

Invert the name and return value to better reflect the imprecise
nature.

Force passing in the DefMI, since it's known in the 2 users and could

AMDGPU: Change API for checking for exec modification

Invert the name and return value to better reflect the imprecise
nature.

Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.

Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.

Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.

llvm-svn: 363675

show more ...


# bcb5ea00 18-Jun-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fold readlane from copy of SGPR or imm

These may be inserted to assert uniformity somewhere.

llvm-svn: 363670


# e75e197a 18-Jun-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove unnecessary check for virtual register

The copy was found by searching the uses of a virtual register, so
it's already known to be virtual.

llvm-svn: 363669


Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# cfd0ca38 03-May-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Support shrinking add with FI in SIFoldOperands

Avoids test regression in a future patch

llvm-svn: 359898


12345678910