Revision tags: llvmorg-5.0.1-rc2

# b3bde2ea | 17-Nov-2017 | David Blaikie <dblaikie@gmail.com>
Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).
llvm-svn: 318490

# 6348188e | 16-Nov-2017 | Eric Christopher <echristo@gmail.com>
Fix thinko in last commit.
llvm-svn: 318374

# 3148a1be | 16-Nov-2017 | Eric Christopher <echristo@gmail.com>
Add NDEBUG checks around LLVM_DUMP_METHOD functions for Wunused-function warnings.
llvm-svn: 318373

Revision tags: llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2

# 66b9bd6e | 04-Aug-2017 | Connor Abbott <cwabbott0@gmail.com>
[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic.
Reviewers: tstellar, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34719
llvm-svn: 310088
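
For illustration only (not part of the commit): a minimal IR sketch, assuming an integer-add reduction, of how a frontend might use the intrinsic; the function name and the elided reduction step are hypothetical.

; hypothetical use: force inactive lanes to the additive identity (0) before
; feeding %v into a cross-lane reduction
declare i32 @llvm.amdgcn.set.inactive.i32(i32, i32)

define amdgpu_ps i32 @reduce_add(i32 %v) {
  %masked = call i32 @llvm.amdgcn.set.inactive.i32(i32 %v, i32 0)
  ; ...a cross-lane add tree over %masked would follow here...
  ret i32 %masked
}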

# 92638ab6 | 04-Aug-2017 | Connor Abbott <cwabbott0@gmail.com>
[AMDGPU] Add support for Whole Wavefront Mode
Summary: Whole Wavefront Mode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyway.
Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change.
Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp.
Reviewers: arsenm, nhaehnle, tpr
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35524
llvm-svn: 310087
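
Illustrative sketch (not from the commit): requesting WWM for a computed value via @llvm.amdgcn.wwm; the surrounding function is hypothetical.

; hypothetical: the backend computes the value feeding @llvm.amdgcn.wwm with
; all lanes enabled, and uses of %res see the WWM result
declare i32 @llvm.amdgcn.wwm.i32(i32)

define amdgpu_ps i32 @wwm_example(i32 %a, i32 %b) {
  %sum = add i32 %a, %b
  %res = call i32 @llvm.amdgcn.wwm.i32(i32 %sum)
  ret i32 %res
}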

# de068fe8 | 04-Aug-2017 | Connor Abbott <cwabbott0@gmail.com>
[AMDGPU] refactor WQM pass in preparation for WWM (NFCI)
Summary: Right now, the WQM pass conflates two different things when tracking the Needs of an instruction:
1. Needs can be StateWQM, which is propagated to other instructions, and means that this instruction (and everything it depends on) must be calculated in WQM.
2. Needs can be StateExact, which is not propagated to other instructions, and means that this instruction must not be calculated in WQM and WQM-ness must not be propagated past this instruction.
This works now because there are only two different states, but in the future we want to be able to express things like "calculate this in WQM, but please disable WWM and don't propagate it" (to implement @llvm.amdgcn.set.inactive). In order to do this, we need to split the per-instruction Needs field in two: a new Needs field, which can only contain StateWQM (and in the future, StateWWM) and is propagated to sources, and a Disables field, which can also contain just StateWQM or nothing for now.
We keep the per-block tracking the same for now, by translating Needs/Disables to the old representation with only StateWQM or StateExact. The other place that needs special handling is when we emit the state transitions. We could just translate back to the old representation there as well, which we almost do, but instead of 0 as a placeholder value for "any state," we explicitly or together all the states an instruction is allowed to be in. This lets us refactor the code in preparation for WWM, where we'll need to be able to handle things like "this instruction must be in Exact or WQM, but not WWM."
Reviewers: arsenm, nhaehnle, tpr
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35523
llvm-svn: 310086

# 8c217d0a | 04-Aug-2017 | Connor Abbott <cwabbott0@gmail.com>
[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly.
Reviewers: arsenm, tpr, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35167
llvm-svn: 310085
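
Illustrative sketch (not from the commit): a frontend marking a value it knows feeds a derivative as needing WQM; the function and the elided DPP sequence are hypothetical.

; hypothetical: %coord feeds an implicit derivative, so the frontend routes it
; through @llvm.amdgcn.wqm instead of relying on backend guesses
declare float @llvm.amdgcn.wqm.f32(float)

define amdgpu_ps float @wqm_example(float %coord) {
  %q = call float @llvm.amdgcn.wqm.f32(float %coord)
  ; ...DPP-based derivative computation using %q would follow here...
  ret float %q
}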

Revision tags: llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1

# 2bc2f33b | 09-Dec-2016 | Eugene Zelenko <eugene.zelenko@gmail.com>
[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 289282

Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1

# 79c05871 | 25-Nov-2016 | Marek Olsak <marek.olsak@amd.com>
AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt
llvm-svn: 287942

# a45dae45 | 25-Nov-2016 | Marek Olsak <marek.olsak@amd.com>
Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.
llvm-svn: 287932

# 9e5c7b10 | 24-Nov-2016 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it.
This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0.
llvm-svn: 287841

# 117296c0 | 01-Oct-2016 | Mehdi Amini <mehdi.amini@apple.com>
Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004

# e58e0e3f | 12-Sep-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22198
llvm-svn: 281230

# 3bba6a84 | 03-Sep-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Reduce the duration of whole-quad-mode
Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads:
1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly.
2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock.
This should always be a win or at best neutral.
There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22092
llvm-svn: 280590

# a246dccc | 03-Sep-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Fix an interaction between WQM and polygon stippling
Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.
The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling.
Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23131
llvm-svn: 280589

Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2

# 8a482b33 | 02-Aug-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Stay in WQM for non-intrinsic stores
Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays.
For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway.
This is a candidate for the 3.9 release branch.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D22675
llvm-svn: 277504

# bef0e90c | 02-Aug-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Track physical registers in SIWholeQuadMode
Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM.
The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0.
This is a candidate for the 3.9 branch, as it fixes a possible hang.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22673
llvm-svn: 277500

Revision tags: llvmorg-3.9.0-rc1

# 3b572002 | 28-Jul-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: add execfix flag to SI_ELSE
Summary: SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be:
s_or_saveexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode.
Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass.
In any case, this current patch could help in the short term with the whole ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
llvm-svn: 276972

# 8dff86d8 | 13-Jul-2016 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: WQM cleanups
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant
llvm-svn: 275252

# 786724a2 | 12-Jul-2016 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Follow up to r275203
I meant to squash this into it.
llvm-svn: 275220

# 4d295118 | 08-Jul-2016 | Duncan P. N. Exon Smith <dexonsmith@apple.com>
AMDGPU: Remove implicit iterator conversions, NFC
Remove remaining implicit conversions from MachineInstrBundleIterator to MachineInstr* from the AMDGPU backend. In most cases, I made them less attractive by preferring MachineInstr& or using a ranged-based for loop.
Once all the backends are fixed I'll make the operator explicit so that this doesn't bitrot back.
llvm-svn: 274906

# 43e92fe3 | 24-Jun-2016 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
llvm-svn: 273652

# c00e03b8 | 07-Jun-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug.
The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
llvm-svn: 272063
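
For illustration (hypothetical shader, not from the commit): the attribute is a plain string function attribute that Mesa would place on the prolog part whose VGPR outputs must be computed in whole quad mode.

; hypothetical prolog part carrying the attribute
define amdgpu_ps float @ps_prolog(float %in) #0 {
  %out = fmul float %in, 2.0
  ret float %out
}

attributes #0 = { "amdgpu-ps-wqm-outputs" }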

Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1

# b0c97487 | 22-Apr-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
llvm-svn: 267102
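
Illustrative sketch (not from the commit): implementing something like gl_HelperInvocation, which is simply the inverse of ps.live; the select is hypothetical.

; hypothetical: a helper invocation is a thread that is not a live pixel
declare i1 @llvm.amdgcn.ps.live()

define amdgpu_ps float @helper_invocation() {
  %live = call i1 @llvm.amdgcn.ps.live()
  %helper = xor i1 %live, true
  %r = select i1 %helper, float 1.0, float 0.0
  ret float %r
}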

# df3a20cd | 06-Apr-2016 | Nicolai Haehnle <nhaehnle@gmail.com>
AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589