History log of /llvm-project/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (Results 151 – 175 of 229)
Revision Date Author Comments
# 0446fbe4 03-May-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Replace shrunk instruction with dummy implicit_def

This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like PeepholeOptimizer,
which would remove the need for this hack to keep instruction
iterators valid.

llvm-svn: 359891


# 2c8936fd 03-May-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix incorrect commute with sub when folding immediates

When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
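
As a rough illustration of the fix described above, the following sketch picks the VOP2 opcode from the commuted opcode rather than from the original one. The enum values and helper names (`commuteOpcode`, `shrinkOpcode`, `shrinkCommuted`) are hypothetical stand-ins, not the real AMDGPU opcode tables or pass code.

```cpp
#include <cassert>

// Hypothetical opcode names; the real pass uses the AMDGPU opcode tables.
enum Opcode { SUB_E64, SUBREV_E64, SUB_E32, SUBREV_E32 };

// Commuting the sources of a subtract inverts the opcode: sub <-> subrev.
Opcode commuteOpcode(Opcode Op) {
  switch (Op) {
  case SUB_E64:    return SUBREV_E64;
  case SUBREV_E64: return SUB_E64;
  case SUB_E32:    return SUBREV_E32;
  case SUBREV_E32: return SUB_E32;
  }
  return Op;
}

// Map a 64-bit (VOP3) encoding to its 32-bit (VOP2) counterpart.
Opcode shrinkOpcode(Opcode Op) {
  switch (Op) {
  case SUB_E64:    return SUB_E32;
  case SUBREV_E64: return SUBREV_E32;
  default:         return Op; // already a 32-bit encoding
  }
}

// Shrinking after a commute must start from the commuted opcode,
// not from the opcode of the original instruction.
Opcode shrinkCommuted(Opcode Original) {
  return shrinkOpcode(commuteOpcode(Original));
}
```

Under these assumptions, shrinking a commuted `SUB_E64` yields `SUBREV_E32`; shrinking the original opcode directly would give `SUB_E32`, which computes the operands in the wrong order.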

llvm-svn: 359883


# 5cf81677 02-May-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx1010 allows VOP3 to have a literal

Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756


# 389d5a34 22-Apr-2019 Michael Liao <michael.hliao@gmail.com>

[AMDGPU] Fix an issue in `op_sel_hi` skipping.

Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
packed literals. Even if an instruction is `packed`, it may have an
operand requiring a non-packed literal, such as `v_dot2_f32_f16`.

Reviewers: rampitec, arsenm, kzhuravl

Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60978

llvm-svn: 358922


# 055e4dce 29-Mar-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove dx10-clamp from subtarget features

Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
defining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302


# 2e94f6e5 18-Mar-2019 Tim Renouf <tpr.llvm@botech.co.uk>

[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers

This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.

This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.

To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.

Differential Revision: https://reviews.llvm.org/D59191

Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398


Revision tags: llvmorg-8.0.0
# da644c02 13-Mar-2019 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Silence gcc 7 warnings

Differential Revision: https://reviews.llvm.org/D59330

llvm-svn: 356100


Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# 2946cd70 19-Jan-2019 Chandler Carruth <chandlerc@gmail.com>

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636


# 993e2798 03-Jan-2019 Alexander Timofeev <Alexander.Timofeev@amd.com>

[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.

Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses while calling a function that changes the def-use iterators
along the way. As a result, the loop exits as soon as the def-use
iterator is changed, so the operand is folded into the very first use
instruction only. This keeps the VGPR live across the whole basic block
and increases register pressure significantly. The performance drop
observed in the SHOC DeviceMemory test is caused by this bug.

Proposed fix: collect the uses into a separate container and process
them in another loop.
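
The proposed fix follows a common pattern: snapshot the uses, then mutate. The sketch below models it with standard containers only. `Use`, `foldAllUses`, and the `std::list` standing in for a def-use chain are hypothetical illustrations, not the actual SIFoldOperands code.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical stand-in for one use of a register operand.
struct Use {
  int UserId;
  bool Folded;
};

// Fold every use of a value. Folding unlinks the use from the live use
// list, so the uses are copied into a separate container first and the
// copy is iterated instead of the list being modified.
int foldAllUses(std::list<Use *> &UseList) {
  std::vector<Use *> Worklist(UseList.begin(), UseList.end());
  int NumFolded = 0;
  for (Use *U : Worklist) {
    UseList.remove(U); // mutates the live list, not the snapshot
    U->Folded = true;
    ++NumFolded;
  }
  return NumFolded;
}
```

Because the loop walks the snapshot, every use is visited even though each fold changes the underlying list.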

Testing: make check-llvm
SHOC performance test.

Reviewers: rampitec, ronlieb

Differential Revision: https://reviews.llvm.org/D56161

llvm-svn: 350350


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# b080adfc 27-Sep-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fold copy (copy vgpr)

This reduces the number of VGPRs used in some cases.

Differential Revision: https://reviews.llvm.org/D52577

llvm-svn: 343249


Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3
# 201f892b 30-Aug-2018 Alexander Timofeev <Alexander.Timofeev@amd.com>

[AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1.

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D51316

llvm-svn: 341068


# 9cca227d 28-Aug-2018 Fangrui Song <maskray@google.com>

[AMDGPU] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off

llvm-svn: 340868


# 44a8a756 28-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Force shrinking of add/sub even if the carry is used

The original motivating example uses a 64-bit add, so the carry
is used. Insert a copy from VCC. This may allow shrinking of
the used carry instruction. At worst, we are replacing a
mov to materialize the constant with a copy of vcc.

llvm-svn: 340862


# de6c421c 28-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Shrink insts to fold immediates

This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.

llvm-svn: 340859


Revision tags: llvmorg-7.0.0-rc2
# 13b0db92 12-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Check NSZ MI flag when folding omod

I'm not sure of the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

llvm-svn: 339513


# 0d1b3934 06-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fold v_lshl_or_b32 with 0 src0

Appears from expansion of some packed cases.
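
A small sketch of why this fold is valid, assuming `v_lshl_or_b32` computes `(src0 << src1[4:0]) | src2`. The function name `lshlOr` is hypothetical and only models the arithmetic on the host.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of v_lshl_or_b32 (assumed semantics):
// result = (Src0 << Src1[4:0]) | Src2.
uint32_t lshlOr(uint32_t Src0, uint32_t Src1, uint32_t Src2) {
  return (Src0 << (Src1 & 31u)) | Src2;
}
```

With `Src0 == 0` the shifted term is always zero, so the result is simply `Src2` regardless of the shift amount.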

llvm-svn: 339025


Revision tags: llvmorg-7.0.0-rc1
# 5bfbae5c 11-Jul-2018 Tom Stellard <tstellar@redhat.com>

AMDGPU: Refactor Subtarget classes

Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851


# c5a154db 28-Jun-2018 Tom Stellard <tstellar@redhat.com>

AMDGPU: Separate R600 and GCN TableGen files

Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942


Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2
# 44b30b45 22-May-2018 Tom Stellard <tstellar@redhat.com>

AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers

Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction
and register definitions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930


# d34e60ca 14-May-2018 Nicola Zaghen <nicola.zaghen@imgtec.com>

Rename DEBUG macro to LLVM_DEBUG.

The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change DOCS as the regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240


# 0084adc5 30-Apr-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add Vega12 and Vega20

Changes by
Matt Arsenault
Konstantin Zhuravlyov

llvm-svn: 331215


# a4bfb3c4 24-Apr-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Truncate packed inline constant

If a packed inline constant is sign extended it must be truncated
after the shift. E.g. a constant (0xH0000, 0xHBC00) will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bits. After the value is shifted right by 16 to use it in a low
part with op_sel_hi, it becomes 0xFFFFFFFFBC00 and no longer qualifies
as an inline constant.
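
The arithmetic above can be checked with a short sketch. The helper names `shiftOnly` and `shiftAndTruncate` are hypothetical and model the 64-bit immediate with plain integers, not the pass's own types.

```cpp
#include <cassert>
#include <cstdint>

// Shift the high half down without truncating: the sign-extension bits
// survive in bits 16..47 of the result.
uint64_t shiftOnly(uint64_t Imm) { return Imm >> 16; }

// Shift the high half down and truncate to 16 bits, discarding the
// leftover sign-extension bits.
uint16_t shiftAndTruncate(uint64_t Imm) {
  return static_cast<uint16_t>((Imm >> 16) & 0xFFFFu);
}
```

For the immediate 0xFFFFFFFFBC000000, shifting alone leaves 0xFFFFFFFFBC00, while shifting and truncating leaves 0xBC00, the 16-bit value from the original constant.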

Fixed the error and added verification code. Without the fix but with
the verification, the bug causes pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

llvm-svn: 330752


# 160f8579 19-Apr-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Use packed literals with zero either lower or hi part

Differential Revision: https://reviews.llvm.org/D45790

llvm-svn: 330365


# 8b20b7dc 17-Apr-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Enabled v2.16 literals for VOP3P

Literal encoding needs op_sel_hi to select the low 16 bits in this case.

Differential Revision: https://reviews.llvm.org/D45745

llvm-svn: 330230


Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1
# cbda7ff4 10-Mar-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix crash when constant folding with physreg operand

llvm-svn: 327209

