History log of /llvm-project/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp (Results 126 – 150 of 429)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-3.6.0
# c93a9a2c 25-Feb-2015 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Add support for the QPX vector instruction set

This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are

[PowerPC] Add support for the QPX vector instruction set

This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

llvm-svn: 230413

show more ...


Revision tags: llvmorg-3.6.0-rc4
# 5bedaf93 14-Feb-2015 Duncan P. N. Exon Smith <dexonsmith@apple.com>

PowerPC: Canonicalize access to function attributes, NFC

Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getF

PowerPC: Canonicalize access to function attributes, NFC

Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)

llvm-svn: 229224

show more ...


Revision tags: llvmorg-3.6.0-rc3
# e6698d53 01-Feb-2015 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions

The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for l

[PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions

The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).

The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.

llvm-svn: 227745

show more ...


Revision tags: llvmorg-3.6.0-rc2
# 065d16bc 30-Jan-2015 Eric Christopher <echristo@gmail.com>

Migrage PPCRegisterInfo to use the cached subtarget.

llvm-svn: 227546


Revision tags: llvmorg-3.6.0-rc1
# 934361a4 14-Jan-2015 Hal Finkel <hfinkel@anl.gov>

Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""

This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDN

Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""

This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).

2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225909

show more ...


# 63fb9281 13-Jan-2015 Hal Finkel <hfinkel@anl.gov>

Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"

Reverting this while I investiage buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

llvm-svn: 225811


# 821befd5 13-Jan-2015 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Add StackMap/PatchPoint support

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this

[PowerPC] Add StackMap/PatchPoint support

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).

2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225808

show more ...


Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2
# fc6de428 05-Aug-2014 Eric Christopher <echristo@gmail.com>

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lo

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838

show more ...


# d913448b 04-Aug-2014 Eric Christopher <echristo@gmail.com>

Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781


Revision tags: llvmorg-3.5.0-rc1
# 3ee2af7d 18-Jul-2014 Hal Finkel <hfinkel@anl.gov>

[PowerPC] 32-bit ELF PIC support

This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch

[PowerPC] 32-bit ELF PIC support

This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

llvm-svn: 213427

show more ...


# ea147a9d 11-Jul-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Fix invalid displacement created by LocalStackAlloc

This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction

[PowerPC] Fix invalid displacement created by LocalStackAlloc

This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
%vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
%G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement. However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.

llvm-svn: 212832

show more ...


# 14bd521f 27-Jun-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex

I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

ld 0, 20936(1)

[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex

I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

ld 0, 20936(1) # 8-byte Folded Reload
stw 12, 10348(0)
stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass. That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.

Reviewed by Hal Finkel.

llvm-svn: 211897

show more ...


Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2
# 84e68b29 22-Apr-2014 Chandler Carruth <chandlerc@gmail.com>

[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842


# d174b72a 22-Apr-2014 Chandler Carruth <chandlerc@gmail.com>

[cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done p

[cleanup] Lift using directives, DEBUG_TYPE definitions, and even some
system headers above the includes of generated '.inc' files that
actually contain code. In a few targets this was already done pretty
consistently, but it wasn't done *really* consistently anywhere. It is
strictly cleaner IMO and necessary in a bunch of places where the
DEBUG_TYPE is referenced from the generated code. Consistency with the
necessary places trumps. Hopefully the build bots are OK with the
movement of intrin.h...

llvm-svn: 206838

show more ...


Revision tags: llvmorg-3.4.1-rc1
# 840beec2 04-Apr-2014 Craig Topper <craig.topper@gmail.com>

Make consistent use of MCPhysReg instead of uint16_t throughout the tree.

llvm-svn: 205610


# 36c49533 02-Apr-2014 Jim Grosbach <grosbach@apple.com>

Simplify resolveFrameIndex() signature.

Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.

llvm-svn: 205453


# 19be506a 29-Mar-2014 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Add subregister classes for f64 VSX values

We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte

[PowerPC] Add subregister classes for f64 VSX values

We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

llvm-svn: 205075

show more ...


# 27774d92 13-Mar-2014 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Initial support for the VSX instruction set

VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things,

[PowerPC] Initial support for the VSX instruction set

VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that. The assembler and disassembler
are fully implemented and tested. However:

- CodeGen support causes miscompiles; test-suite runtime failures:
MultiSource/Benchmarks/FreeBench/distray/distray
MultiSource/Benchmarks/McCat/08-main/main
MultiSource/Benchmarks/Olden/voronoi/voronoi
MultiSource/Benchmarks/mafft/pairlocalalign
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
SingleSource/Benchmarks/CoyoteBench/almabench
SingleSource/Benchmarks/Misc/matmul_f64_4x4

- The lowering currently falls back to using Altivec instructions far more
than it should. Worse, there are some things that are scalarized through the
stack that shouldn't be.

- A lot of unnecessary copies make it past the optimizers, and this needs to
be fixed.

- Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768

show more ...


# 1da35121 12-Mar-2014 Patrik Hagglund <patrik.h.hagglund@ericsson.com>

Replace '#include ValueTypes.h' with forward declarations.

In some cases the include is pushed "downstream" (or removed if
unused).

llvm-svn: 203644


# 940ab934 28-Feb-2014 Hal Finkel <hfinkel@anl.gov>

Add CR-bit tracking to the PowerPC backend for i1 values

This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as sepa

Add CR-bit tracking to the PowerPC backend for i1 values

This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

- Reduction in register pressure (because we no longer need GPRs to store
boolean values).

- Logical operations on booleans can be handled more efficiently; we used to
have to move all results from comparisons into GPRs, perform promoted
logical operations in GPRs, and then move the result back into condition
register bits to be used by conditional branches. This can be very
inefficient, because the throughput of these CR <-> GPR moves have high
latency and low throughput (especially when other associated instructions
are accounted for).

- On the POWER7 and similar cores, we can increase total throughput by using
the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451

show more ...


Revision tags: llvmorg-3.4.0, llvmorg-3.4.0-rc3, llvmorg-3.4.0-rc2, llvmorg-3.4.0-rc1
# e90fd9c5 07-Oct-2013 Rafael Espindola <rafael.espindola@gmail.com>

Remove getEHExceptionRegister and getEHHandlerRegister.

They haven't been used for a long time. Patch by MathOnNapkins.

llvm-svn: 192099


# 8d86fe7d 30-Aug-2013 Bill Schmidt <wschmidt@linux.vnet.ibm.com>

[PowerPC] Add handling for conversions to fast-isel.

Yet another chunk of fast-isel code. This one handles various
conversions involving floating-point. (It also includes some
miscellaneous handli

[PowerPC] Add handling for conversions to fast-isel.

Yet another chunk of fast-isel code. This one handles various
conversions involving floating-point. (It also includes some
miscellaneous handling throughout the back end for LWA_32 and LWAX_32
that should have been part of the load-store patch.)

llvm-svn: 189677

show more ...


# a5c536e1 01-Aug-2013 Bill Wendling <isanbard@gmail.com>

Use function attributes to indicate that we don't want to realign the stack.

Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead

Use function attributes to indicate that we don't want to realign the stack.

Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.

llvm-svn: 187618

show more ...


# 1860763c 18-Jul-2013 Hal Finkel <hfinkel@anl.gov>

PPC: Support dynamic allocas with large alignment

Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only j

PPC: Support dynamic allocas with large alignment

Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).

In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.

llvm-svn: 186562

show more ...


# f05d6c78 17-Jul-2013 Hal Finkel <hfinkel@anl.gov>

PPC: Add base-pointer support to builtin setjmp/longjmp

First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj

PPC: Add base-pointer support to builtin setjmp/longjmp

First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.

Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.

Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp. If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.

llvm-svn: 186545

show more ...


12345678910>>...18