#
0446fbe4 |
| 03-May-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill f
AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators.
llvm-svn: 359891
show more ...
|
#
2c8936fd |
| 03-May-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VO
AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode.
llvm-svn: 359883
show more ...
|
#
5cf81677 |
| 02-May-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413
llvm-svn: 359756
|
#
389d5a34 |
| 22-Apr-2019 |
Michael Liao <michael.hliao@gmail.com> |
[AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand
[AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`.
Reviewers: rampitec, arsenm, kzhuravl
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60978
llvm-svn: 358922
show more ...
|
#
055e4dce |
| 29-Mar-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdg
AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode switch somewhere for these.
llvm-svn: 357302
show more ...
|
#
2e94f6e5 |
| 18-Mar-2019 |
Tim Renouf <tpr.llvm@botech.co.uk> |
[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled.
This does app
[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
show more ...
|
Revision tags: llvmorg-8.0.0 |
|
#
da644c02 |
| 13-Mar-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Silence gcc 7 warnings
Differential Revision: https://reviews.llvm.org/D59330
llvm-svn: 356100
|
Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
#
993e2798 |
| 03-Jan-2019 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses calling the function that change
[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses calling the function that changes def-use iteratorson the way. As a result loop exits immediately when def-use iterator is changed. Hence, the operand is folded to the very first use instruction only. This makes VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in SHOC DeviceMemory test is caused by this bug.
Proposed fix: collect uses to separate container for further processing in another loop.
Testing: make check-llvm SHOC performance test.
Reviewers: rampitec, ronlieb
Differential Revision: https://reviews.llvm.org/D56161
llvm-svn: 350350
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
b080adfc |
| 27-Sep-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Fold copy (copy vgpr)
This allows to reduce a number of used VGPRs in some cases.
Differential Revision: https://reviews.llvm.org/D52577
llvm-svn: 343249
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3 |
|
#
201f892b |
| 30-Aug-2018 |
Alexander Timofeev <Alexander.Timofeev@amd.com> |
[AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1.
Reviewers: rampitec
Differential revision: https://reviews/llvm/org/D51316
llvm-svn: 341068
|
#
9cca227d |
| 28-Aug-2018 |
Fangrui Song <maskray@google.com> |
[AMDGPU] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
llvm-svn: 340868
|
#
44a8a756 |
| 28-Aug-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Force shrinking of add/sub even if the carry is used
The original motivating example uses a 64-bit add, so the carry is used. Insert a copy from VCC. This may allow shrinking of the used car
AMDGPU: Force shrinking of add/sub even if the carry is used
The original motivating example uses a 64-bit add, so the carry is used. Insert a copy from VCC. This may allow shrinking of the used carry instruction. At worst, we are replacing a mov to materialize the constant with a copy of vcc.
llvm-svn: 340862
show more ...
|
#
de6c421c |
| 28-Aug-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Shrink insts to fold immediates
This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practica
AMDGPU: Shrink insts to fold immediates
This needs to be done in the SSA fold operands pass to be effective, so there is a bit of overlap with SIShrinkInstructions but I don't think this is practically avoidable.
llvm-svn: 340859
show more ...
|
Revision tags: llvmorg-7.0.0-rc2 |
|
#
13b0db92 |
| 12-Aug-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply.
llvm-s
AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply.
llvm-svn: 339513
show more ...
|
#
0d1b3934 |
| 06-Aug-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.
llvm-svn: 339025
|
Revision tags: llvmorg-7.0.0-rc1 |
|
#
5bfbae5c |
| 11-Jul-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Me
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
show more ...
|
#
c5a154db |
| 28-Jun-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Separate R600 and GCN TableGen files
Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, regis
AMDGPU: Separate R600 and GCN TableGen files
Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc.
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46365
llvm-svn: 335942
show more ...
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2 |
|
#
44b30b45 |
| 22-May-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are hu
AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed.
This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files.
I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46272
llvm-svn: 332930
show more ...
|
#
d34e60ca |
| 14-May-2018 |
Nicola Zaghen <nicola.zaghen@imgtec.com> |
Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/
Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
show more ...
|
#
0084adc5 |
| 30-Apr-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add Vega12 and Vega20
Changes by Matt Arsenault Konstantin Zhuravlyov
llvm-svn: 331215
|
#
a4bfb3c4 |
| 24-Apr-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented as 0xFFFFFFFFBC000
[AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended to 64 bit. After the value shifted right by 16 to use it in a low part with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline constant any longer.
Fixed the error and added verification code. Without the fix and with the verification bug is causing pk_max_f16_literal.ll to fail.
Differential Revision: https://reviews.llvm.org/D45987
llvm-svn: 330752
show more ...
|
#
160f8579 |
| 19-Apr-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use packed literals with zero either lower or hi part
Differential Revision: https://reviews.llvm.org/D45790
llvm-svn: 330365
|
#
8b20b7dc |
| 17-Apr-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Enabled v2.16 literals for VOP3P
Literal encoding needs op_sel_hi to select low 16 bit in this case.
Differential Revision: https://reviews.llvm.org/D45745
llvm-svn: 330230
|
Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1 |
|
#
cbda7ff4 |
| 10-Mar-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix crash when constant folding with physreg operand
llvm-svn: 327209
|