History log of /llvm-project/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (Results 76 – 100 of 167)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-10.0.1-rc1
# 35e6a9c8 24-Apr-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Break read2/write2 search range on a memory fence

This is to fix performance regressions introduced by
86c944d790728891801778b8d98c2c65a83f36a5.

The old search would collect all potentially

AMDGPU: Break read2/write2 search range on a memory fence

This is to fix performance regressions introduced by
86c944d790728891801778b8d98c2c65a83f36a5.

The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.

Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.

This may also improve compile time by reducing the merge list size.

show more ...


# 6bffd0df 24-Apr-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix redundant members


# 50128f8a 24-Apr-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use Register


# 0337017a 22-Apr-2020 Jay Foad <jay.foad@amd.com>

[AMDGPU] Use SGPR instead of SReg classes

12994a70cf7 did this for 128-bit classes:

SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP

[AMDGPU] Use SGPR instead of SReg classes

12994a70cf7 did this for 128-bit classes:

SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.

This patch extends it to all classes > 64 bits, for consistency.

Differential Revision: https://reviews.llvm.org/D78622

show more ...


# f2334a7e 01-Apr-2020 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fix crash in SILoadStoreOptimizer

SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An ori

[AMDGPU] Fix crash in SILoadStoreOptimizer

SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.

Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.

Differential Revision: https://reviews.llvm.org/D77245

show more ...


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 87568691 29-Jan-2020 Sebastian Neubauer <sebastian.neubauer@amd.com>

[AMDGPU] Add a16 feature to gfx10

Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gf

[AMDGPU] Add a16 feature to gfx10

Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.

Differential Revision: https://reviews.llvm.org/D73956

show more ...


# 0426c2d0 30-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "AMDGPU: Cleanup and fix SMRD offset handling"

This reverts commit 6a4acb9d809aaadb9304a7a2f3382d958a6c2adf.


# 6a4acb9d 30-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

Revert "AMDGPU: Cleanup and fix SMRD offset handling"

This reverts commit 17dbc6611df9044d779d85b3d545bd37e5dd5200.

A test is failing on some bots


# 17dbc661 30-Jan-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Cleanup and fix SMRD offset handling

I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the off

AMDGPU: Cleanup and fix SMRD offset handling

I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.

show more ...


# cb297050 25-Jan-2020 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error

This was introduced by 86c944d790728891801778b8d98c2c65a83f36a5 and
caught by the sanitizer-x86_64-linux-fast bot.


# 86c944d7 24-Jan-2020 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets

Summary:
This improves merging of sequences like:

store a, ptr + 4
store b, ptr + 8
store c, ptr + 12
store d, ptr + 16
store e,

AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets

Summary:
This improves merging of sequences like:

store a, ptr + 4
store b, ptr + 8
store c, ptr + 12
store d, ptr + 16
store e, ptr + 20
store f, ptr

Prior to this patch the basic block was scanned in order to find instructions
to merge and the above sequence would be transformed to:

store4 <a, b, c, d>, ptr + 4
store e, ptr + 20
store r, ptr

With this change, we now sort all the candidate merge instructions by their offset,
so instructions are visited in offset order rather than in the order they appear
in the basic block. We now transform this sequnce into:

store4 <f, a, b, c>, ptr
store2 <d, e>, ptr + 16

Another benefit of this change is that since we have sorted the mergeable lists
by offset, we can easily check if an instruction is mergeable by checking the
offset of the instruction that becomes before or after it in the sorted list.
Once we determine an instruction is not mergeable we can remove it from the list
and avoid having to do the more expensive mergeablilty checks.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Reviewed By: arsenm, nhaehnle

Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65966

show more ...


Revision tags: llvmorg-11-init
# c3bc805f 17-Dec-2019 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct

Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of

AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct

Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of duplicate
initialization code.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm, nhaehnle

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71045

show more ...


# bf13a710 12-Dec-2019 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimizer: Simplify function

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llv

AMDGPU/SILoadStoreOptimizer: Simplify function

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71044

show more ...


Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 4a801170 20-Nov-2019 Piotr Sobczak <Piotr.Sobczak@amd.com>

[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores

Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kz

[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores

Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69794

show more ...


# 4a308d30 19-Nov-2019 Michael Liao <michael.hliao@gmail.com>

[AMDGPU] Keep consistent check of legal addressing mode.

Summary:
- Add test cases for GFX10, which has narrower offset range compared to
GFX9.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl,

[AMDGPU] Keep consistent check of legal addressing mode.

Summary:
- Add test cases for GFX10, which has narrower offset range compared to
GFX9.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70473

show more ...


# d8f7c68e 09-Oct-2019 Nicolai Hähnle <nicolai.haehnle@amd.com>

AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently

Summary:
We should check for same instruction class before checking whether they
have the same base address, else we might iterate o

AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently

Summary:
We should check for same instruction class before checking whether they
have the same base address, else we might iterate out of bounds of a
MachineInstr operands list. The InstClass check is also cheaper.

This was introduced in SVN r373630.

Reviewers: tstellar

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68690

show more ...


# 05da2fe5 13-Nov-2019 Reid Kleckner <rnk@google.com>

Sink all InitializePasses.h includes

This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of reco

Sink all InitializePasses.h includes

This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211

show more ...


# c3d6f0dd 03-Nov-2019 Dávid Bolvanský <david.bolvansky@gmail.com>

[SILoadStoreOptimizer] Fixed typo. NFCI.


# 02baaca7 16-Oct-2019 Piotr Sobczak <piotr.sobczak@amd.com>

[AMDGPU] Extend the SI Load/Store optimizer

Summary:
Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle
different flavours of image_load and image_sample instructions.

When

[AMDGPU] Extend the SI Load/Store optimizer

Summary:
Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle
different flavours of image_load and image_sample instructions.

When the instructions of the same subclass differ only in dmask, merge
them and update dmask accordingly.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64911

llvm-svn: 374984

show more ...


# 12994a70 10-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use SGPR_128 instead of SReg_128 for vregs

SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating

AMDGPU: Use SGPR_128 instead of SReg_128 for vregs

SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.

llvm-svn: 374284

show more ...


# 00182683 09-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Relax register classes used

llvm-svn: 374254


# 165e4691 04-Oct-2019 Piotr Sobczak <piotr.sobczak@amd.com>

[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code

Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.

Introducing InstSubcla

[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code

Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.

Introducing InstSubclass will keep them separate and help extending
InstClassEnum with other instruction types (e.g. MIMG) in the future.

This patch also makes getSubRegIdxs() more concise.

Reviewers: nhaehnle, arsenm, tstellar

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68384

llvm-svn: 373699

show more ...


# e6f51713 03-Oct-2019 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions

Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instr

AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions

Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.

In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.

The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.

In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.

This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65961

llvm-svn: 373630

show more ...


# 265e94e6 02-Oct-2019 Piotr Sobczak <piotr.sobczak@amd.com>

[AMDGPU] Extend buffer intrinsics with swizzling

Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this

[AMDGPU] Extend buffer intrinsics with swizzling

Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491

show more ...


# 004c7915 01-Oct-2019 Tom Stellard <tstellar@redhat.com>

AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo

Summary:
This is a refactoring that will make future improvements to this pass easier.
This change should not change th

AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo

Summary:
This is a refactoring that will make future improvements to this pass easier.
This change should not change the behavior of the pass.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Reviewed By: nhaehnle, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65496

llvm-svn: 373366

show more ...


1234567