History log of /llvm-project/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp (Results 151 – 175 of 429)
Revision Date Author Comments
# a7c54e8c 17-Jul-2013 Hal Finkel <hfinkel@anl.gov>

PPC: Implement base pointer and stack realignment

This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:

This does not currently work, because the delta between old and new stack
pointers is added to offsets that reference incoming parameters after the
prolog is generated, and the code that does that doesn't handle a variable
delta. You don't want to do that anyway; a better approach is to reserve
another register that retains the incoming stack pointer, and reference
parameters relative to that.

And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indices. The basic scheme follows that for base pointers in the
X86 backend.

We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects; support for dynamic
allocas with large alignments, and base-pointer support in SjLj lowering, will
come in future commits.
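
For illustration (example mine, not part of the commit), the kind of code that
triggers this path: an over-aligned local exceeds the 16-byte ABI stack
alignment, so the prologue must realign r1, and the saved incoming stack
pointer is needed to reach parameters:

void consume(char *p);

void needs_realignment(void) {
  // 64-byte alignment exceeds the ABI stack alignment, so the prologue
  // rounds r1 down; incoming parameters and other negative frame indices
  // are then addressed through the base pointer (r31, or r30 when a
  // frame pointer is also needed).
  __attribute__((aligned(64))) char buf[128];
  consume(buf);
}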

llvm-svn: 186478



Revision tags: llvmorg-3.3.1-rc1
# 49f487e6 03-Jul-2013 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Use mtocrf when available

Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.

Current code, however, always emits mtcrf. This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible. It does create
inefficient code with the integrated assembler, however.

To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everywhere. Just as done in the
MFOCRF patch committed as r185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.

As a side effect, this allows us to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.
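
To make the difference concrete (inline-asm sketch, mine; CR field 2
corresponds to FXM mask 0x20, with CR0 at 0x80 down to CR7 at 0x01):

void set_cr2(unsigned long v) {
  asm volatile("mtcrf 0x20, %0" : : "r"(v) : "cr2");  // legacy form: defined as
                                                      // a partial update of CR
  asm volatile("mtocrf 0x20, %0" : : "r"(v) : "cr2"); // single-field form this
                                                      // patch prefers
}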

llvm-svn: 185561



# d5ebc626 03-Jul-2013 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Always use mfocrf if available

When accessing just a single CR register, it is always preferable to
use mfocrf instead of mfcr, if the former is available on the CPU.

Current code makes that distinction in many, but not all, places
where a single CR register value is retrieved. One missing
location is PPCRegisterInfo::lowerCRSpilling.

To fix this and make this simpler in the future, this patch changes
the bulk of the back-end to always assume mfocrf is available and
simply generate it when needed.

On machines that actually do not support mfocrf, the instruction
is replaced by mfcr at the very end, in EmitInstruction.

This has the additional benefit that we no longer need the
MFCRpseud hack, since before EmitInstruction we always have
an MFOCRF instruction pattern, which already models data flow
as required.

The patch also adds the MFOCRF8 version of the instruction,
which was missing so far.

Except for the PPCRegisterInfo::lowerCRSpilling case, no change
in generated code intended.
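
A rough sketch of that late replacement, under assumed names (not the in-tree
code; the real rewrite happens in the PPC EmitInstruction hook):

enum Opcode { MFOCRF, MFCR };

struct Inst {
  Opcode Op;
  unsigned FXM; // single CR-field mask carried by MFOCRF
};

void finalize(Inst &I, bool TargetHasMFOCRF) {
  // Data flow was already modeled by the MFOCRF pattern, so degrading to
  // a whole-CR mfcr on older CPUs is safe; only the mask is dropped.
  if (I.Op == MFOCRF && !TargetHasMFOCRF) {
    I.Op = MFCR;
    I.FXM = 0;
  }
}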

llvm-svn: 185556



# 52727c6b 02-Jul-2013 Hal Finkel <hfinkel@anl.gov>

Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling

There are a couple of (small) related changes here:

1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.

2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).

3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.

4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.

With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register, however; because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it, as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.
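
The layering in item 3 can be pictured roughly as follows (illustrative C++
only; the real lists are TableGen definitions, and the register numbers here
are placeholders):

#include <vector>

std::vector<unsigned> makeCSRList(bool HasAltivec) {
  // Base ABI list: r14-r31, f14-f31, and friends.
  std::vector<unsigned> CSRs = {14, 15, 16 /* ... */};
  if (HasAltivec) {
    // v20-v31 are appended only for Altivec-enabled ABIs, so e.g. SjLj on
    // non-Altivec targets never sees vector registers in its CSR list.
    CSRs.insert(CSRs.end(), {20, 21, 22 /* ... */});
  }
  return CSRs;
}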

llvm-svn: 185409



# b735b4d6 16-Jun-2013 David Blaikie <dblaikie@gmail.com>

DebugInfo: remove target-specific Frame Index handling for DBG_VALUE MachineInstrs

Frame index handling is now target-agnostic, so delete the target hooks
for creation & asm printing of target-specific addressing in DBG_VALUEs
and any related functions.

llvm-svn: 184067



Revision tags: llvmorg-3.3.0
# 5e7656bf 07-Jun-2013 Bill Wendling <isanbard@gmail.com>

Don't cache the instruction and register info from the TargetMachine, because
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183494


Revision tags: llvmorg-3.3.0-rc3, llvmorg-3.3.0-rc2
# 9d980cbd 16-May-2013 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Use true offset value in "memrix" machine operands

This is the second part of the change to always return "true"
offset values from getPreIndexedAddressParts, tackling the
case of "memrix" type operands.

This is about instructions like LD/STD that only have a 14-bit
field to encode immediate offsets, which are implicitly extended
by two zero bits by the machine, so that in effect we can access
16-bit offsets as long as they are a multiple of 4.

The PowerPC back end currently handles such instructions by
carrying the 14-bit value (as it will get encoded into the
actual machine instructions) in the machine operand fields
for such instructions. This means that those values are
in fact not the true offset, but rather the offset divided
by 4 (and then truncated to an unsigned 14-bit value).

As in the case fixed in r182012, this makes common code
operations on such offset values not work as expected.
Furthermore, there doesn't really appear to be any strong
reason why we should encode machine operands this way.

This patch therefore changes the encoding of "memrix" type
machine operands to simply contain the "true" offset value
as a signed immediate value, while enforcing the rules that
it must fit in a 16-bit signed value and must also be a
multiple of 4.

This change must be made simultaneously in all places that
access machine operands of this type. However, just about
all those changes make the code simpler; in many cases we
can now just share the same code for memri and memrix
operands.
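
A worked sketch of the two representations (helper names mine): the hardware
DS field holds offset>>2 in 14 bits, while the machine operand now carries the
true byte offset:

#include <cassert>
#include <cstdint>

// The new invariant on "memrix" operands: a true offset that is a
// multiple of 4 and fits in a signed 16-bit value.
uint16_t encodeDSField(int32_t trueOffset) {
  assert(trueOffset % 4 == 0 && trueOffset >= INT16_MIN && trueOffset <= INT16_MAX);
  return static_cast<uint16_t>((trueOffset >> 2) & 0x3FFF); // 14-bit field
}

int32_t decodeDSField(uint16_t field) {
  // Restore the two implicit zero bits; the cast sign-extends the field.
  return static_cast<int16_t>(field << 2);
}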

llvm-svn: 182032



# 25c1992b 15-May-2013 Hal Finkel <hfinkel@anl.gov>

Implement PPC counter loops as a late IR-level pass

The old PPCCTRLoops pass, like the Hexagon pass version from which it was
derived, could only handle some simple loops in canonical form. We cannot
directly adapt the new Hexagon hardware loops pass, however, because the
Hexagon pass contains a fundamental assumption that non-constant-trip-count
loops will contain a guard, and this is not always true (the result being that
incorrect negative counts can be generated). With this commit, we replace the
pass with a late IR-level pass which makes use of ScalarEvolution (SE) to calculate the
backedge-taken counts and safely generate the loop-count expressions (including
any necessary max() parts). This IR level pass inserts custom intrinsics that
are lowered into the desired decrement-and-branch instructions.
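
For illustration (example mine), the shape of loop this targets: the
backedge-taken count is computable by ScalarEvolution up front, and the body
contains no calls or indirect branches that would interfere with the counter
register:

void scale(float *a, int n, float s) {
  // The trip count can be precomputed into CTR, closing the loop with a
  // single decrement-and-branch (bdnz) instead of add/compare/branch.
  for (int i = 0; i < n; ++i)
    a[i] *= s;
}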

The most fragile part of this new implementation is that interfering uses of
the counter register must be detected on the IR level (and, on PPC, this also
includes any indirect branches in addition to function calls). Also, to make
all of this work, we need a variant of the mtctr instruction that is marked
as having side effects. Without this, machine-code level CSE, DCE, etc.
illegally transform the resulting code. Hopefully, this can be improved
in the future.

This new pass is smaller than the original (and much smaller than the new
Hexagon hardware loops pass), and can handle many additional cases correctly.
In addition, the preheader-creation code has been copied from LoopSimplify, and
after we decide on where it belongs, this code will be refactored so that it
can be explicitly shared (making this implementation even smaller).

The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
the new Hexagon pass. There are a few classes of loops that this pass does not
transform (noted by FIXMEs in the files), but these deficiencies can be
addressed within the SE infrastructure (thus helping many other passes as well).

llvm-svn: 181927



Revision tags: llvmorg-3.3.0-rc1
# b5899d57 09-Apr-2013 Hal Finkel <hfinkel@anl.gov>

Use virtual base registers on PPC

On PowerPC, non-vector loads and stores have r+i forms; however, in functions
with large stack frames these were not being used to access slots far from the
stack pointer because such slots were out of range for the signed 16-bit
immediate offset field. This increases register pressure because we need a
separate register for each offset (when the r+r form is used). By enabling
virtual base registers, we can deal with large stack frames without unduly
increasing register pressure.
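
The constraint in question, as a back-of-the-envelope check (helper mine, not
LLVM code):

#include <cstdint>

// True if an r+i (D-form) load/store such as lwz/stw can reach the slot
// from the stack pointer; beyond this range the backend previously fell
// back to r+r forms, burning one GPR per distinct offset.
bool fitsInDForm(int64_t offsetFromSP) {
  return offsetFromSP >= INT16_MIN && offsetFromSP <= INT16_MAX;
}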

llvm-svn: 179105



# d61d4f80 06-Apr-2013 Hal Finkel <hfinkel@anl.gov>

Implement PPCInstrInfo::FoldImmediate

There are certain PPC instructions into which we can fold a zero immediate
operand. We can detect such cases by looking at the register class required
by the using operand (so long as it is not otherwise constrained).

llvm-svn: 178961



# 39caf9f5 01-Apr-2013 Hal Finkel <hfinkel@anl.gov>

Use ImmToIdxMap.count in PPCRegisterInfo

Code improvement suggested by Jakob (in review of r178450). No functionality
change intended.

llvm-svn: 178473


# 8540f777 31-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Cleanup ImmToIdxMap and noImmForm in PPCRegisterInfo

ImmToIdxMap should be a DenseMap (not a std::map) because there
is no ordering requirement. Also, we don't need a separate list
of instructions for noImmForm in eliminateFrameIndex, because this
list is essentially the complement of the keys in ImmToIdxMap.

No functionality change intended.
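
The resulting shape, sketched with std::unordered_map standing in for
llvm::DenseMap (opcode values invented):

#include <unordered_map>

// Immediate-form opcode -> equivalent indexed (r+r) opcode.
std::unordered_map<unsigned, unsigned> ImmToIdxMap = {
    {/*LWZ*/ 1, /*LWZX*/ 2}, {/*STD*/ 3, /*STDX*/ 4},
};

// eliminateFrameIndex no longer needs a separate noImmForm list; it is
// just the complement of the keys above.
bool noImmForm(unsigned Opc) { return ImmToIdxMap.count(Opc) == 0; }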

llvm-svn: 178450



# beb296be 31-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Add the PPC lfiwax instruction

This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).
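
The source-level conversion this affects (example mine; the instruction
sequence described is the commit's):

double itofp(int i) {
  // SINT_TO_FP of an i32: with lfiwax the value is reloaded from its
  // stack slot directly into an FPR with sign extension, so no separate
  // extsw is emitted and the slot can be smaller.
  return static_cast<double>(i);
}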

llvm-svn: 178446



# e53429a1 31-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Cleanup PPC(64) i32 -> float/double conversion

The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode. Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

llvm-svn: 178438



# 035b4825 28-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions

There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.

llvm-svn: 178220



# 0f77861d 27-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Allocate r0 on PPC

The R0 register can now be allocated because instructions
that cannot use R0 as a GPR have been appropriately marked.

llvm-svn: 178123


# a7b0630b 27-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Don't spill PPC VRSAVE on non-Darwin (even in SjLj)

As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).

As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.

llvm-svn: 178096



# feea6539 26-Mar-2013 Hal Finkel <hfinkel@anl.gov>

PPC: Use HWEncoding and TRI->getEncodingValue

As pointed out by Jakob, we don't need to maintain a separate
register-numbering table. Instead we should let TableGen generate the table for
us from the information (already present) in PPCRegisterInfo.td.
TRI->getEncodingValue is now used to access register-encoding values.

No functionality change intended.
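
A minimal use of the generated encodings (the accessor is real and lives on
MCRegisterInfo, from which TargetRegisterInfo derives; the include path shown
is the modern one):

#include "llvm/CodeGen/TargetRegisterInfo.h"

// e.g. maps the R3 enum value to the hardware number 3, with no
// hand-maintained numbering table in the PPC backend.
unsigned hwEncoding(const llvm::TargetRegisterInfo &TRI, unsigned Reg) {
  return TRI.getEncodingValue(Reg);
}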

llvm-svn: 178067



# 0dfbb05a 26-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Use multiple virtual registers in PPC CR spilling

Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.

This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.

llvm-svn: 178060



# d8a423cd 26-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Update PPCRegisterInfo's use of virtual registers to be SSA

PPC's use of PEI's virtual-register-based scavenging functionality had
redefined the virtual registers (it was non-SSA). Now that PEI supports
dealing with instructions with multiple virtual registers, this can be
cleaned up to use multiple virtual registers and keep SSA form.

No functionality change intended.

llvm-svn: 178059



# c6eaa4ce 23-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Cleanup some unused reg. scavenger parameters in PPCRegisterInfo

These spilling functions will eventually make use of the register scavenger,
however, they'll do so by taking advantage of PEI's virtual-register-based
delayed scavenging mechanism. As a result, these function parameters will not
be used, and can be removed.

No functionality change intended.

llvm-svn: 177827



# f70c41ea 21-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Remove the G8RC_NOX0_and_GPRC_NOR0 PPC register class

As Jakob pointed out in his review of r177423, having a shared ZERO
register between the 32- and 64-bit register classes causes this
odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended,
this adds a ZERO8 register which differentiates the 32- and 64-bit
zeros.

No functionality change intended.

llvm-svn: 177683



# 756810fe 21-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Implement builtin_{setjmp/longjmp} on PPC

This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

#include <setjmp.h>

static jmp_buf env_sigill;

void foo() {
  __builtin_longjmp(env_sigill, 1);
}

int main() {
  ...

  for (int i = 0; i < c; ++i) {
    if (__builtin_setjmp(env_sigill)) {
      goto done;
    } else {
      foo();
    }

  done:;
  }

  ...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers, which we don't yet
support.

llvm-svn: 177666



# a1431df5 21-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Add support for spilling VRSAVE on PPC

Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code, this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.

llvm-svn: 177654



# aa03c03a 21-Mar-2013 Hal Finkel <hfinkel@anl.gov>

Correct PPC FRAMEADDR lowering using a pseudo-register

The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as though the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.
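
The user-visible surface (standard Clang/GCC builtin; example mine):

void *current_frame(void) {
  // Lowered through FRAMEADDR; after prologue insertion the FP/FP8
  // pseudo-register becomes whichever of r1/r31 anchors this frame,
  // so the returned address is always valid.
  return __builtin_frame_address(0);
}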

llvm-svn: 177653


