#
a7c54e8c |
| 17-Jul-2013 |
Hal Finkel <hfinkel@anl.gov> |
PPC: Implement base pointer and stack realignment
This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added:
This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that.
And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indices. The basic scheme follows that for base pointers in the X86 backend.
We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects; dynamic allocas with large alignments, and base-pointer support in SjLj lowering, will come in future commits.
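As a minimal C++ sketch (the helper name and shape are illustrative, not the actual PPCRegisterInfo hook), the register choice described above looks like:

    // Illustrative only: mirrors the r30/r31 base-pointer choice described
    // above; not the real LLVM interface.
    unsigned getBasePointerReg(bool NeedsFramePointer) {
      // r31 serves as the frame pointer when one is required, so the base
      // pointer falls back to r30 in that case; otherwise r31 is free.
      return NeedsFramePointer ? 30 /* r30 */ : 31 /* r31 */;
    }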
llvm-svn: 186478
|
Revision tags: llvmorg-3.3.1-rc1 |
|
#
49f487e6 |
| 03-Jul-2013 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Use mtocrf when available
Just as with mfocrf, it is also preferable to use mtocrf instead of mtcrf when only a single CR register is to be written.
Current code however always emits mtcrf. This probably does not matter when using an external assembler, since the GNU assembler will in fact automatically replace mtcrf with mtocrf when possible. It does create inefficient code with the integrated assembler, however.
To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and uses those instead of MTCRF/MTCRF8 everywhere. Just as done in the MFOCRF patch committed as r185556, these patterns will be converted back to MTCRF if MTOCRF is not available on the machine.
As a side effect, this allows us to modify the MTCRF pattern to accept the full range of mask operands for the benefit of the asm parser.
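For illustration: mtocrf selects its single CR field with a one-hot FXM mask (CR0 maps to the most-significant mask bit), whereas mtcrf accepts an arbitrary mask. A helper of this assumed shape computes the one-hot mask:

    #include <cstdint>

    // Illustrative helper, not actual LLVM code: one-hot FXM mask for a
    // single CR field (CRField in [0, 7]; CR0 occupies the high bit).
    std::uint8_t singleCRFieldMask(unsigned CRField) {
      return static_cast<std::uint8_t>(0x80u >> CRField);
    }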
llvm-svn: 185561
|
#
d5ebc626 |
| 03-Jul-2013 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Always use mfocrf if available
When accessing just a single CR register, it is always preferable to use mfocrf instead of mfcr, if the former is available on the CPU.
Current code makes that distinction in many, but not all, places where a single CR register value is retrieved. One missing location is PPCRegisterInfo::lowerCRSpilling.
To fix this and make this simpler in the future, this patch changes the bulk of the back-end to always assume mfocrf is available and simply generate it when needed.
On machines that actually do not support mfocrf, the instruction is replaced by mfcr at the very end, in EmitInstruction.
This has the additional benefit that we no longer need the MFCRpseud hack, since before EmitInstruction we always have a MFOCRF instruction pattern, which already models data flow as required.
The patch also adds the MFOCRF8 version of the instruction, which was missing so far.
Except for the PPCRegisterInfo::lowerCRSpilling case, no change in generated code intended.
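A rough sketch of the late fallback described above, with assumed names (the real logic lives in the PPC instruction-emission path):

    // Illustrative shape only: on subtargets without mfocrf, fall back to
    // plain mfcr, which reads all eight CR fields and is a correct (if
    // less efficient) substitute.
    enum MoveFromCROpcode { MFOCRF_Op, MFCR_Op };

    MoveFromCROpcode selectMoveFromCR(bool SubtargetHasMFOCRF) {
      return SubtargetHasMFOCRF ? MFOCRF_Op : MFCR_Op;
    }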
llvm-svn: 185556
|
#
52727c6b |
| 02-Jul-2013 |
Hal Finkel <hfinkel@anl.gov> |
Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling
There are a couple of (small) related changes here:
1. The printed name of the VRSAVE register has been changed from VRsave to vrsave in order to match the name accepted by GNU binutils.
2. Support for parsing vrsave has been added to the asm parser (it seems that there was no test case specifically covering this code, so I've added one).
3. The list of Altivec registers, which was common to all calling conventions, has been separated out. This allows us to define the base CSR lists, and then lists for each ABI with Altivec included. This allows SjLj, for example, to work correctly on non-Altivec targets without using unnatural definitions of the NoRegs CSR list.
4. VRSAVE is now always reserved on non-Darwin targets and all Altivec registers are reserved when Altivec is disabled.
With these changes, it is now possible to compile a function containing __builtin_unwind_init() on Linux/PPC64 with debugging information. This did not work previously because GNU binutils assumes that all .cfi_offset offsets will be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned offset). This is not true for the vrsave register, however; because this register is used only on Darwin, GCC does not bother printing a .cfi_offset entry for it (even though there is a slot in the stack frame for it, as specified by the ABI). This change allows us to do the same: we will also not print .cfi_offset directives for vrsave.
llvm-svn: 185409
|
#
b735b4d6 |
| 16-Jun-2013 |
David Blaikie <dblaikie@gmail.com> |
DebugInfo: remove target-specific Frame Index handling for DBG_VALUE MachineInstrs
Frame index handling is now target-agnostic, so delete the target hooks for creation & asm printing of target-specific addressing in DBG_VALUEs and any related functions.
llvm-svn: 184067
|
Revision tags: llvmorg-3.3.0 |
|
#
5e7656bf |
| 07-Jun-2013 |
Bill Wendling <isanbard@gmail.com> |
Don't cache the instruction and register info from the TargetMachine, because the internals of TargetMachine could change.
No functionality change intended.
llvm-svn: 183494
|
Revision tags: llvmorg-3.3.0-rc3, llvmorg-3.3.0-rc2 |
|
#
9d980cbd |
| 16-May-2013 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Use true offset value in "memrix" machine operands
This is the second part of the change to always return "true" offset values from getPreIndexedAddressParts, tackling the case of "memrix" type operands.
This is about instructions like LD/STD that only have a 14-bit field to encode immediate offsets, which are implicitly extended by two zero bits by the machine, so that in effect we can access 16-bit offsets as long as they are a multiple of 4.
The PowerPC back end currently handles such instructions by carrying the 14-bit value (as it will get encoded into the actual machine instructions) in the machine operand fields for such instructions. This means that those values are in fact not the true offset, but rather the offset divided by 4 (and then truncated to an unsigned 14-bit value).
Like in the case fixed in r182012, this makes common code operations on such offset values not work as expected. Furthermore, there doesn't really appear to be any strong reason why we should encode machine operands this way.
This patch therefore changes the encoding of "memrix" type machine operands to simply contain the "true" offset value as a signed immediate value, while enforcing the rules that it must fit in a 16-bit signed value and must also be a multiple of 4.
This change must be made simultaneously in all places that access machine operands of this type. However, just about all those changes make the code simpler; in many cases we can now just share the same code for memri and memrix operands.
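A sketch of the new invariant (assumed helper, not the actual encoder): the machine operand now stores the true byte offset, and only the encoder shifts it into the 14-bit DS field:

    #include <cassert>
    #include <cstdint>

    std::uint32_t encodeDSFormOffset(std::int32_t Offset) {
      assert(Offset % 4 == 0 && "DS-form offset must be a multiple of 4");
      assert(Offset >= -32768 && Offset <= 32767 && "must fit in signed 16 bits");
      // The hardware appends two implicit zero bits, so the instruction
      // encodes Offset / 4 in its 14-bit immediate field.
      return (static_cast<std::uint32_t>(Offset) >> 2) & 0x3FFFu;
    }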
llvm-svn: 182032
|
#
25c1992b |
| 15-May-2013 |
Hal Finkel <hfinkel@anl.gov> |
Implement PPC counter loops as a late IR-level pass
The old PPCCTRLoops pass, like the Hexagon pass version from which it was derived, could only handle some simple loops in canonical form. We cannot directly adapt the new Hexagon hardware-loops pass, however, because the Hexagon pass contains a fundamental assumption that non-constant-trip-count loops will contain a guard, and this is not always true (the result being that incorrect negative counts can be generated). With this commit, we replace the pass with a late IR-level pass which makes use of ScalarEvolution (SE) to calculate the backedge-taken counts and safely generate the loop-count expressions (including any necessary max() parts). This IR-level pass inserts custom intrinsics that are lowered into the desired decrement-and-branch instructions.
The most fragile part of this new implementation is that interfering uses of the counter register must be detected on the IR level (and, on PPC, this also includes any indirect branches in addition to function calls). Also, to make all of this work, we need a variant of the mtctr instruction that is marked as having side effects. Without this, machine-code level CSE, DCE, etc. illegally transform the resulting code. Hopefully, this can be improved in the future.
This new pass is smaller than the original (and much smaller than the new Hexagon hardware loops pass), and can handle many additional cases correctly. In addition, the preheader-creation code has been copied from LoopSimplify, and after we decide on where it belongs, this code will be refactored so that it can be explicitly shared (making this implementation even smaller).
The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for the new Hexagon pass. There are a few classes of loops that this pass does not transform (noted by FIXMEs in the files), but these deficiencies can be addressed within the SE infrastructure (thus helping many other passes as well).
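Conceptually, a transformed loop behaves like this model (illustrative C++, not the pass's actual output; assumes TripCount >= 1):

    // The trip count is computed once (possibly via a max() expression),
    // moved into CTR (mtctr), and each iteration ends in a single
    // decrement-and-branch (bdnz) instead of a separate compare and branch.
    void counterLoopModel(long TripCount) {
      long CTR = TripCount;      // mtctr
      do {
        // ... loop body: must contain no calls or indirect branches,
        // since those would clobber the counter register ...
      } while (--CTR != 0);      // bdnz
    }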
llvm-svn: 181927
|
Revision tags: llvmorg-3.3.0-rc1 |
|
#
b5899d57 |
| 09-Apr-2013 |
Hal Finkel <hfinkel@anl.gov> |
Use virtual base registers on PPC
On PowerPC, non-vector loads and stores have r+i forms; however, in functions with large stack frames these were not being used to access slots far from the stack pointer because such slots were out of range for the signed 16-bit immediate offset field. This increases register pressure because we need a separate register for each offset (when the r+r form is used). By enabling virtual base registers, we can deal with large stack frames without unduly increasing register pressure.
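For illustration (assumed helper name), a D-form access such as lwz r4, imm(r1) reaches only slots whose offset fits the signed 16-bit immediate:

    bool fitsInDFormOffset(long Offset) {
      // Slots beyond this range need an r+r form or a (virtual) base register.
      return Offset >= -32768 && Offset <= 32767;
    }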
llvm-svn: 179105
|
#
d61d4f80 |
| 06-Apr-2013 |
Hal Finkel <hfinkel@anl.gov> |
Implement PPCInstrInfo::FoldImmediate
There are certain PPC instructions into which we can fold a zero immediate operand. We can detect such cases by looking at the register class required by the using operand (so long as it is not otherwise constrained).
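In simplified form (names assumed, not the actual PPCInstrInfo code), the fold amounts to:

    // A def that materializes zero ("li rN, 0") can be deleted when every
    // use sits in an operand whose register class still admits the zero
    // register (r0/ZERO reads as literal 0 in those positions).
    bool canFoldZeroImmediate(bool DefIsLiZero, bool UseClassAllowsZeroReg) {
      return DefIsLiZero && UseClassAllowsZeroReg;
    }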
llvm-svn: 178961
|
#
39caf9f5 |
| 01-Apr-2013 |
Hal Finkel <hfinkel@anl.gov> |
Use ImmToIdxMap.count in PPCRegisterInfo
Code improvement suggested by Jakob (in review of r178450). No functionality change intended.
llvm-svn: 178473
|
#
8540f777 |
| 31-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Cleanup ImmToIdxMap and noImmForm in PPCRegisterInfo
ImmToIdxMap should be a DenseMap (not a std::map) because there is no ordering requirement. Also, we don't need a separate list of instructions for noImmForm in eliminateFrameIndex, because this list is essentially the complement of the keys in ImmToIdxMap.
No functionality change intended.
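A stand-in sketch of the complement-of-keys idea (std::unordered_map in place of llvm::DenseMap, since no ordering is needed):

    #include <unordered_map>

    // Maps an immediate-form opcode to its indexed (r+r) counterpart.
    std::unordered_map<unsigned, unsigned> ImmToIdxMap;

    bool hasImmediateForm(unsigned Opcode) {
      // "No immediate form" is simply absence from the map; no separate
      // noImmForm list is required.
      return ImmToIdxMap.count(Opcode) != 0;
    }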
llvm-svn: 178450
|
#
beb296be |
| 31-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Add the PPC lfiwax instruction
This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space).
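A conceptual model of the improved i32 -> floating-point path (illustrative; the real work happens in SelectionDAG lowering):

    #include <cstdint>

    double sintToFPModel(std::int32_t X) {
      // lfiwax loads the 32-bit integer and sign-extends it directly into
      // an FP register, so no separate extsw is needed and less stack
      // space is used for the transfer.
      std::int64_t Wide = X;             // lfiwax: load + sign-extend
      return static_cast<double>(Wide);  // fcfid: i64 -> f64 conversion
    }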
llvm-svn: 178446
|
#
e53429a1 |
| 31-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Cleanup PPC(64) i32 -> float/double conversion
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled because it relied on broken EXTSW_32/STD_32 instruction definitions. The original intent had been to enable these 64-bit instructions to be used on CPUs that support them even in 32-bit mode. Unfortunately, this form of lying to the infrastructure was buggy (as explained in the FIXME comment) and had therefore been disabled.
This re-enables this functionality, using regular DAG nodes, but only when compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead) are removed.
llvm-svn: 178438
|
#
035b4825 |
| 28-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions
There were a few places where kill flags were not being set correctly, and where 32-bit instruction variants were being used with 64-bit registers. After r178180, this code was being triggered causing llc to assert.
llvm-svn: 178220
|
#
0f77861d |
| 27-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Allocate r0 on PPC
The R0 register can now be allocated because instructions that cannot use R0 as a GPR have been appropriately marked.
llvm-svn: 178123
|
#
a7b0630b |
| 27-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Don't spill PPC VRSAVE on non-Darwin (even in SjLj)
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've added some asserts to make sure that we're not).
As it turns out, we're not currently handling the Darwin case correctly (I've added a FIXME in the test case). I've tried adding various implied register definitions/uses to force the spill without success, so I'll need to address this later.
llvm-svn: 178096
|
#
feea6539 |
| 26-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
PPC: Use HWEncoding and TRI->getEncodingValue
As pointed out by Jakob, we don't need to maintain a separate register-numbering table. Instead we should let TableGen generate the table for us from the information (already present) in PPCRegisterInfo.td. TRI->getEncodingValue is now used to access register-encoding values.
No functionality change intended.
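The usage shape is roughly (getEncodingValue is the real MCRegisterInfo accessor; the wrapper is illustrative):

    #include "llvm/MC/MCRegisterInfo.h"

    unsigned encodePPCReg(const llvm::MCRegisterInfo &MRI, unsigned Reg) {
      // Returns the HWEncoding value TableGen records for Reg from
      // PPCRegisterInfo.td, instead of consulting a hand-written table.
      return MRI.getEncodingValue(Reg);
    }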
llvm-svn: 178067
|
#
0dfbb05a |
| 26-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Use multiple virtual registers in PPC CR spilling
Now that the register scavenger can support multiple spill slots, and PEI can use virtual-register-based scavenging for multiple simultaneous registers, we can use a virtual register for the transfer register in the CR spilling code.
This should eliminate the last place (outside of the prologue/epilogue) where we depend on the unconditional availability of the r0 register. We will soon be able to allocate it (in a somewhat restricted sense) as a GPR.
llvm-svn: 178060
|
#
d8a423cd |
| 26-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Update PPCRegisterInfo's use of virtual registers to be SSA
PPC's use of PEI's virtual-register-based scavenging functionality had redefined the virtual registers (it was non-SSA). Now that PEI supports dealing with instructions with multiple virtual registers, this can be cleaned up to use multiple virtual registers and keep SSA form.
No functionality change intended.
llvm-svn: 178059
|
#
c6eaa4ce |
| 23-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Cleanup some unused reg. scavenger parameters in PPCRegisterInfo
These spilling functions will eventually make use of the register scavenger, however, they'll do so by taking advantage of PEI's virtual-register-based delayed scavenging mechanism. As a result, these function parameters will not be used, and can be removed.
No functionality change intended.
llvm-svn: 177827
|
#
f70c41ea |
| 21-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Remove the G8RC_NOX0_and_GPRC_NOR0 PPC register class
As Jakob pointed out in his review of r177423, having a shared ZERO register between the 32- and 64-bit register classes causes this odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended, this adds a ZERO8 register which differentiates the 32- and 64-bit zeros.
No functionality change intended.
llvm-svn: 177683
|
#
756810fe |
| 21-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Implement builtin_{setjmp/longjmp} on PPC
This implements SJLJ lowering on PPC, making the Clang functions __builtin_{setjmp/longjmp} functional on PPC platforms. The implementation strategy is similar to that on X86, with the exception that a branch-and-link variant is used to get the right jump address. Credit goes to Bill Schmidt for suggesting the use of the unconditional bcl form (instead of the regular bl instruction) to limit return-address-cache pollution.
Benchmarking the speed at -O3 of:
static jmp_buf env_sigill;

void foo() {
  __builtin_longjmp(env_sigill, 1);
}

main() {
  ...
  for (int i = 0; i < c; ++i) {
    if (__builtin_setjmp(env_sigill)) {
      goto done;
    } else {
      foo();
    }
done:;
  }
  ...
}
vs. the same code using the libc setjmp/longjmp functions on a P7 shows that this builtin implementation is ~4x faster with Altivec enabled and ~7.25x faster with Altivec disabled. This comparison is somewhat unfair because the libc version must also save/restore the VSX registers which we don't yet support.
llvm-svn: 177666
|
#
a1431df5 |
| 21-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Add support for spilling VRSAVE on PPC
Although there is only one Altivec VRSAVE register, it is a member of a register class, and we need the ability to spill it. Because this register is normally callee-preserved and handled by special code, this has never before been necessary. However, this capability will be required by a forthcoming commit adding SjLj support.
llvm-svn: 177654
|
#
aa03c03a |
| 21-Mar-2013 |
Hal Finkel <hfinkel@anl.gov> |
Correct PPC FRAMEADDR lowering using a pseudo-register
The old code used to lower FRAMEADDR tried to replicate the logic in the real frame-lowering code that determines whether or not the frame pointer (r31) will be used. When it seemed as though the frame pointer would not be used, the stack pointer (r1) was used instead. Unfortunately, because the stack size is not yet known, this does not work. Instead, this change introduces new always-reserved pseudo-registers (FP and FP8) that are replaced during prologue insertion with the real frame-pointer register (either r1 or r31).
It is important that this intrinsic always return a valid frame address because it is used by Clang to store the frame address as part of code generation for __builtin_setjmp.
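In outline (illustrative, not the actual pass code), the pseudo-register defers the choice until prologue insertion, when the frame size and the frame-pointer decision are final:

    unsigned resolveFramePointerPseudo(bool UsesFramePointer) {
      // Uses of the FP/FP8 pseudo-registers are rewritten to the real
      // register at this point.
      return UsesFramePointer ? 31 /* r31 */ : 1 /* r1 */;
    }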
llvm-svn: 177653
|