History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (Results 101 – 125 of 136)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2
# 14359ef1 01-Feb-2019 James Y Knight <jyknight@google.com>

[opaque pointer types] Pass value type to LoadInst creation.

This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

[opaque pointer types] Pass value type to LoadInst creation.

This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911

show more ...


Revision tags: llvmorg-8.0.0-rc1
# 2946cd70 19-Jan-2019 Chandler Carruth <chandlerc@gmail.com>

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the ne

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636

show more ...


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# 57f5d0a8 08-Oct-2018 Neil Henning <neil.henning@amd.com>

[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with.

[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.

The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962

show more ...


Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3
# 0da6350d 31-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove remnants of old address space mapping

llvm-svn: 341165


# 35617ed4 30-Aug-2018 Nicolai Haehnle <nhaehnle@gmail.com>

[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis

Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).

The purpose of this patch is to free up the

[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis

Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).

The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.

Patch by: Simon Moll

Reviewed By: nhaehnle

Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50434

Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071

show more ...


Revision tags: llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1
# 7e7268ac 25-Jul-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion

Differential Revision: https://reviews.llvm.org/D49761

llvm-svn: 337938


# 5bfbae5c 11-Jul-2018 Tom Stellard <tstellar@redhat.com>

AMDGPU: Refactor Subtarget classes

Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Me

AMDGPU: Refactor Subtarget classes

Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851

show more ...


# 67aa18f1 28-Jun-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Early expansion of 32 bit udiv/urem

This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass li

[AMDGPU] Early expansion of 32 bit udiv/urem

This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass list going to be a separate patch. Given this patch
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868

show more ...


# 12269dda 28-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct

Not sure how this wasn't noticed before.

llvm-svn: 335828


Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3
# 90083d30 07-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Try a lot harder to emit scalar loads

This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUC

AMDGPU: Try a lot harder to emit scalar loads

This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.

The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.

When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.

There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.

Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.

Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.

The use of the scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.

llvm-svn: 334180

show more ...


# df61be70 06-Jun-2018 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Improve reciprocal handling

When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

bool c = fabs(x) > 0x1.0p+96f;

[AMDGPU] Improve reciprocal handling

When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

bool c = fabs(x) > 0x1.0p+96f;
float s = c ? 0x1.0p-32f : 1.0f;
x *= s;
return s * v_rcp_f32(x)

in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.

The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.

OpenCL conformance passed with both enabled and disabled denorms.

Differential Revision: https://reviews.llvm.org/D47805

llvm-svn: 334142

show more ...


# 57e541e8 05-Jun-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Preserve metadata when widening loads

Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.

llvm-svn: 33

AMDGPU: Preserve metadata when widening loads

Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.

llvm-svn: 334045

show more ...


Revision tags: llvmorg-6.0.1-rc2
# 5f8f34e4 01-May-2018 Adrian Prantl <aprantl@apple.com>

Remove \brief commands from doxygen comments.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they ar

Remove \brief commands from doxygen comments.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272

show more ...


Revision tags: llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3
# 923712b6 09-Feb-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

Reapply "AMDGPU: Add 32-bit constant address space"

This reverts r324494 and reapplies r324487.

llvm-svn: 324747


Revision tags: llvmorg-6.0.0-rc2
# f4e3f3e3 07-Feb-2018 Rafael Espindola <rafael.espindola@gmail.com>

Revert "AMDGPU: Add 32-bit constant address space"

This reverts commit r324487.

It broke clang tests.

llvm-svn: 324494


# 871c30e5 07-Feb-2018 Marek Olsak <marek.olsak@amd.com>

AMDGPU: Add 32-bit constant address space

Note: This is a candidate for LLVM 6.0, because it was planned to be
in that release but was delayed due to a long review period.

Merge conflict in r

AMDGPU: Add 32-bit constant address space

Note: This is a candidate for LLVM 6.0, because it was planned to be
in that release but was delayed due to a long review period.

Merge conflict in release_60 - resolution:
Add "-p6:32:32" into the second (non-amdgiz) string.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all uses cases we need for Mesa.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D41651

llvm-svn: 324487

show more ...


Revision tags: llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2
# 629c4115 06-Nov-2017 Sanjay Patel <spatel@rotateright.com>

[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more r

[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.

As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.

We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.

We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.

Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.

Note: an inter-dependent clang commit to use the new API name should closely follow
commit.

Differential Revision: https://reviews.llvm.org/D39304

llvm-svn: 317488

show more ...


Revision tags: llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1
# a126a13b 26-Jul-2017 Wei Ding <wei.ding2@amd.com>

AMDGPU : Widen extending scalar loads to 32-bits.

Differential Revision: http://reviews.llvm.org/D35146

llvm-svn: 309178


# 9d7b1c9d 06-Jul-2017 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Always use rcp + mul with fast math

Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic

[AMDGPU] Always use rcp + mul with fast math

Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

llvm-svn: 307308

show more ...


Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 6bda14b3 06-Jun-2017 Chandler Carruth <chandlerc@gmail.com>

Sort the remaining #include lines in include/... and lib/....

I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line

Sort the remaining #include lines in include/... and lib/....

I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787

show more ...


Revision tags: llvmorg-4.0.1-rc2
# 8b61764c 18-May-2017 Francis Visoiu Mistrih <fvisoiumistrih@apple.com>

[LegacyPassManager] Remove TargetMachine constructors

This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handli

[LegacyPassManager] Remove TargetMachine constructors

This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360

show more ...


Revision tags: llvmorg-4.0.1-rc1
# c5b641ac 17-Mar-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Cleanup control flow intrinsics

Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the

AMDGPU: Cleanup control flow intrinsics

Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.

It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.

llvm-svn: 298119

show more ...


Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3
# eb522e68 27-Feb-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Support v2i16/v2f16 packed operations

llvm-svn: 296396


Revision tags: llvmorg-4.0.0-rc2
# d59e6404 01-Feb-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops

These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.

nsw/nu

AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops

These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.

nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.

Tested valid combinations with alive.

llvm-svn: 293776

show more ...


# 734bb7bb 20-Jan-2017 Eugene Zelenko <eugene.zelenko@gmail.com>

[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).

llvm-svn: 292623


123456