Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2 |
|
#
14359ef1 |
| 01-Feb-2019 |
James Y Knight <jyknight@google.com> |
[opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type.
[opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
show more ...
|
Revision tags: llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
57f5d0a8 |
| 08-Oct-2018 |
Neil Henning <neil.henning@amd.com> |
[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with.
[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've:
- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test Create*Intrinsic calls that weren't being tested (including the FMF flag that wasn't checked).
This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969).
Differential Revision: https://reviews.llvm.org/D52087
llvm-svn: 343962
show more ...
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3 |
|
#
0da6350d |
| 31-Aug-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove remnants of old address space mapping
llvm-svn: 341165
|
#
35617ed4 |
| 30-Aug-2018 |
Nicolai Haehnle <nhaehnle@gmail.com> |
[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis
Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the
[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis
Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes.
Patch by: Simon Moll
Reviewed By: nhaehnle
Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50434
Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071
show more ...
|
Revision tags: llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1 |
|
#
7e7268ac |
| 25-Jul-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion
Differential Revision: https://reviews.llvm.org/D49761
llvm-svn: 337938
|
#
5bfbae5c |
| 11-Jul-2018 |
Tom Stellard <tstellar@redhat.com> |
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Me
AMDGPU: Refactor Subtarget classes
Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
show more ...
|
#
67aa18f1 |
| 28-Jun-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass li
[AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator is loop invariant. Current change is expansion only, adding licm to the target pass list going to be a separate patch. Given this patch changes to codegen are minor as the expansion is similar to that on DAG. DAG expansion still must remain for R600.
Differential Revision: https://reviews.llvm.org/D48586
llvm-svn: 335868
show more ...
|
#
12269dda |
| 28-Jun-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct
Not sure how this wasn't noticed before.
llvm-svn: 335828
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
#
90083d30 |
| 07-Jun-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUC
AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case.
The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types.
When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out.
There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG.
Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged.
Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this.
The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types.
llvm-svn: 334180
show more ...
|
#
df61be70 |
| 06-Jun-2018 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version:
bool c = fabs(x) > 0x1.0p+96f;
[AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version:
bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x *= s; return s * v_rcp_f32(x)
in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator.
The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit.
OpenCL conformance passed with both enabled and disabled denorms.
Differential Revision: https://reviews.llvm.org/D47805
llvm-svn: 334142
show more ...
|
#
57e541e8 |
| 05-Jun-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Preserve metadata when widening loads
Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage.
llvm-svn: 33
AMDGPU: Preserve metadata when widening loads
Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage.
llvm-svn: 334045
show more ...
|
Revision tags: llvmorg-6.0.1-rc2 |
|
#
5f8f34e4 |
| 01-May-2018 |
Adrian Prantl <aprantl@apple.com> |
Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they ar
Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
show more ...
|
Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3 |
|
#
923712b6 |
| 09-Feb-2018 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Add 32-bit constant address space"
This reverts r324494 and reapplies r324487.
llvm-svn: 324747
|
Revision tags: llvmorg-6.0.0-rc2 |
|
#
f4e3f3e3 |
| 07-Feb-2018 |
Rafael Espindola <rafael.espindola@gmail.com> |
Revert "AMDGPU: Add 32-bit constant address space"
This reverts commit r324487.
It broke clang tests.
llvm-svn: 324494
|
#
871c30e5 |
| 07-Feb-2018 |
Marek Olsak <marek.olsak@amd.com> |
AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period.
Merge conflict in r
AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period.
Merge conflict in release_60 - resolution: Add "-p6:32:32" into the second (non-amdgiz) string.
Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden.
Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all uses cases we need for Mesa.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D41651
llvm-svn: 324487
show more ...
|
Revision tags: llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2 |
|
#
629c4115 |
| 06-Nov-2017 |
Sanjay Patel <spatel@rotateright.com> |
[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more r
[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag
As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely follow commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
show more ...
|
Revision tags: llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1 |
|
#
a126a13b |
| 26-Jul-2017 |
Wei Ding <wei.ding2@amd.com> |
AMDGPU : Widen extending scalar loads to 32-bits.
Differential Revision: http://reviews.llvm.org/D35146
llvm-svn: 309178
|
#
9d7b1c9d |
| 06-Jul-2017 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic
[AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
show more ...
|
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3 |
|
#
6bda14b3 |
| 06-Jun-2017 |
Chandler Carruth <chandlerc@gmail.com> |
Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line
Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
show more ...
|
Revision tags: llvmorg-4.0.1-rc2 |
|
#
8b61764c |
| 18-May-2017 |
Francis Visoiu Mistrih <fvisoiumistrih@apple.com> |
[LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handli
[LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
show more ...
|
Revision tags: llvmorg-4.0.1-rc1 |
|
#
c5b641ac |
| 17-Mar-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup control flow intrinsics
Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the
AMDGPU: Cleanup control flow intrinsics
Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.
It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those.
llvm-svn: 298119
show more ...
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3 |
|
#
eb522e68 |
| 27-Feb-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Support v2i16/v2f16 packed operations
llvm-svn: 296396
|
Revision tags: llvmorg-4.0.0-rc2 |
|
#
d59e6404 |
| 01-Feb-2017 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops
These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul.
nsw/nu
AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops
These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul.
nsw/nuw may be needed for some combines to cleanup messes when intermediate sext_inregs are introduced later.
Tested valid combinations with alive.
llvm-svn: 293776
show more ...
|
#
734bb7bb |
| 20-Jan-2017 |
Eugene Zelenko <eugene.zelenko@gmail.com> |
[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292623
|