Revision tags: llvmorg-11.1.0-rc1 |
|
#
6a87e9b0 |
| 25-Dec-2020 |
dfukalov <daniil.fukalov@amd.com> |
[NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D93813
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
2dc4a14e |
| 05-Dec-2020 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Use llvm::is_contained (NFC)
|
Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
98de0d22 |
| 21-Aug-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy
|
#
34978602 |
| 20-Aug-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
|
Revision tags: llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
d9e8b2cb |
| 23-Dec-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Skip DAG hack passes on selected functions
The way fallback to SelectionDAG works is somewhat surprising to me. When the fallback path is enabled, the entire set of SelectionDAG s
AMDGPU/GlobalISel: Skip DAG hack passes on selected functions
The way fallback to SelectionDAG works is somewhat surprising to me. When the fallback path is enabled, the entire set of SelectionDAG selector passes is added to the pass pipeline, and each one needs to check if the function was selected. This results in the surprising behavior of running SIFixSGPRCopies for example, but only if -global-isel-abort=2 is used.
SIAddIMGInitPass is also added in addInstSelector, but I'm not sure why we have this pass or if it should be added somewhere else for GlobalISel.
show more ...
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
05da2fe5 |
| 13-Nov-2019 |
Reid Kleckner <rnk@google.com> |
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of reco
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
show more ...
|
#
e921ede5 |
| 26-Oct-2019 |
cdevadas <cdevadas@amd.com> |
[AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.
There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in
[AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.
There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet.
To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering.
Differential Revision: https://reviews.llvm.org/D69182
show more ...
|
#
269bd15c |
| 25-Sep-2019 |
Jakub Kuderski <kubakuderski@gmail.com> |
[Dominators][AMDGPU] Don't use virtual exit node in findNearestCommonDominator. Cleanup MachinePostDominators.
Summary: This patch fixes a bug that originated from passing a virtual exit block (null
[Dominators][AMDGPU] Don't use virtual exit node in findNearestCommonDominator. Cleanup MachinePostDominators.
Summary: This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class.
The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly.
Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots.
Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao
Reviewed By: hliao
Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #amdgpu, #llvm
Differential Revision: https://reviews.llvm.org/D67974
llvm-svn: 372874
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
#
8bc05d7d |
| 09-Sep-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make VReg_1 size be 1
This was getting chosen as the preferred 32-bit register class based on how TableGen selects subregister classes.
llvm-svn: 371438
|
Revision tags: llvmorg-9.0.0-rc3 |
|
#
0c476111 |
| 15-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Re
Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
2bea69bf |
| 01-Aug-2019 |
Daniel Sanders <daniel_l_sanders@apple.com> |
Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
32ef9292 |
| 27-Jun-2019 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Make fixing i1 copies robust against re-ordering
Summary: The new test case led to incorrect code.
Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
Reviewers: arsenm, david-salinas
Su
AMDGPU: Make fixing i1 copies robust against re-ordering
Summary: The new test case led to incorrect code.
Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
Reviewers: arsenm, david-salinas
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63871
llvm-svn: 364566
show more ...
|
Revision tags: llvmorg-8.0.1-rc3 |
|
#
52500216 |
| 16-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden getRegClass().
Differential Revision: https://reviews.llvm.org/D63351
llvm-svn: 363513
|
Revision tags: llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
7edae4c4 |
| 23-Apr-2019 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic
AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging.
This then used to trigger an assertion when processing a dependent phi instruction.
Change-Id: Id4949719f8298062fe476a25718acccc109113b6
Reviewers: llvm-commits
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60999
llvm-svn: 358983
show more ...
|
#
2e94f6e5 |
| 18-Mar-2019 |
Tim Renouf <tpr.llvm@botech.co.uk> |
[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled.
This does app
[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398
show more ...
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
814abb59 |
| 31-Oct-2018 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Rewrite SILowerI1Copies to always stay on SALU
Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we u
AMDGPU: Rewrite SILowerI1Copies to always stay on SALU
Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow.
Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs.
This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization".
There are still some relevant cases where code quality could be improved, in particular:
- We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow.
- The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform.
Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53496
llvm-svn: 345719
show more ...
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
#
5bfbae5c |
| 11-Jul-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Me
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
show more ...
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2 |
|
#
44b30b45 |
| 22-May-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are hu
AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed.
This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files.
I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46272
llvm-svn: 332930
show more ...
|
Revision tags: llvmorg-6.0.1-rc1 |
|
#
3ffd383a |
| 04-Apr-2018 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's las
AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration.
There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators".
Fixes a bug encountered in Nier: Automata.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40547
llvm-svn: 329164
show more ...
|
Revision tags: llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1 |
|
#
f842297d |
| 13-Dec-2017 |
Matthias Braun <matze@braunis.de> |
Rename LiveIntervalAnalysis.h to LiveIntervals.h
Headers/Implementation files should be named after the class they declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"
Rename LiveIntervalAnalysis.h to LiveIntervals.h
Headers/Implementation files should be named after the class they declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in favor of `class LiveIntarvals;`
llvm-svn: 320546
show more ...
|
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1 |
|
#
ce4ddd06 |
| 29-Sep-2017 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_
AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_u32_e64 v0, vcc, v0, v1, -1
which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero.
The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine
s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1]
into
v_mov_b32 v0, v3
but it's not particularly high priority.
Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
llvm-svn: 314522
show more ...
|
Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3 |
|
#
6bda14b3 |
| 06-Jun-2017 |
Chandler Carruth <chandlerc@gmail.com> |
Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line
Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
show more ...
|
Revision tags: llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1 |
|
#
116bbab4 |
| 13-Jan-2017 |
Diana Picus <diana.picus@linaro.org> |
[CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand.
S
[CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
show more ...
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
#
0ee250ee |
| 28-Nov-2016 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have
[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it.
With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume less VGPRs at a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases.
Differential Revision: https://reviews.llvm.org/D26114
llvm-svn: 288053
show more ...
|