#
736d39e1 |
| 13-Feb-2015 |
Eric Christopher <echristo@gmail.com> |
The TOC save offset can be computed at compile time, do so and propagate changes.
llvm-svn: 228997
|
#
f71609b5 |
| 13-Feb-2015 |
Eric Christopher <echristo@gmail.com> |
The return save offset can be computed at initialization time - do so and save the value.
llvm-svn: 228996
|
#
82f1c775 |
| 10-Feb-2015 |
Bill Schmidt <wschmidt@linux.vnet.ibm.com> |
[PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.
We now hide the add-immediate and call instructions together in a sepa
[PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.
We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass prior to RA now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions, and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and fixes these up after the replacement sequences are introduced.
Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed.
llvm-svn: 228725
show more ...
|
#
0d2a1515 |
| 06-Feb-2015 |
Hal Finkel <hfinkel@anl.gov> |
Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has
Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).
Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully), so I'm reverting this now.
llvm-svn: 228460
show more ...
|
#
685aa8b0 |
| 03-Feb-2015 |
Bill Schmidt <wschmidt@linux.vnet.ibm.com> |
[PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and global-dynamic TLS models.
In my original implementation, calls to __tls_get_
[PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and global-dynamic TLS models.
In my original implementation, calls to __tls_get_addr were hidden from view until the asm-printer phase, at which point the underlying branch-and-link instruction was created with proper relocations. This mostly worked well, but I used some repellent techniques to ensure that the TLS_GET_ADDR nodes at the SD and MI levels correctly received input from GPR3 and produced output into GPR3. This proved to work badly in the presence of multiple TLS variable accesses, with the copies to and from GPR3 being scheduled incorrectly and generally creating havoc.
In r221703, I addressed that problem by representing the calls to __tls_get_addr as true calls during instruction lowering. This had the advantage of removing all of the bad hacks and relying on the existing call machinery to properly glue the copies in place. It looked like this was going to be the right way to go.
However, as a side effect of the recent discovery of problems with linker optimizations for TLS, we discovered cases of suboptimal code generation with this strategy. The problem comes when tls_get_addr is called for the same address, and there is a resulting CSE opportunity. It turns out that in such cases MachineCSE will common the addis/addi instructions that set up the input value to tls_get_addr, but will not common the calls themselves. MachineCSE does not have any machinery to common idempotent calls. This is perfectly sensible, since presumably this would be done at the IR level, and introducing calls in the back end isn't commonplace. In any case, we end up with two calls to __tls_get_addr when one would suffice, and that isn't good.
I presumed that the original design would have allowed commoning of the machine-specific nodes that hid the __tls_get_addr calls, so as suggested by Ulrich Weigand, I went back to that design and cleaned it up so that the copies were properly held together by glue nodes. However, it turned out that this didn't work either...the presence of copies to physical registers kept the machine-specific nodes from being commoned also.
All of which leads to the design presented here. This is a return to the original design, except that no attempt is made to introduce copies to and from GPR3 during instruction lowering. Virtual registers are used until prior to register allocation. At that point, a special pass is run that identifies the machine-specific nodes that hide the tls_get_addr calls and introduces the copies to and from GPR3 around them. The register allocator then coalesces these copies away. With this design, MachineCSE succeeds in commoning tls_get_addr calls where possible, and we get nice optimal code generation (better than GCC at the moment, which does not common these calls).
One additional problem must be dealt with: After introducing the mentions of the physical register GPR3, the aggressive anti-dependence breaker sees opportunities to improve scheduling by selecting a different register instead. Flags must be used on the instruction descriptions to tell the anti-dependence breaker to keep its hands in its pockets.
One thing missing from the original design was recording a definition of the link register on the GET_TLS_ADDR nodes. Doing this was found to be insufficient to force a stack frame to be created, which led to looping behavior because two different LR values were stored at the same address. This appears to have been an oversight in PPCFrameLowering::determineFrameLayout(), which is repaired here.
Because MustSaveLR() returns true for calls to builtin_return_address, this changed the expected behavior of test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but formerly did not. I've fixed the test case to reflect this.
There are existing TLS tests to catch regressions; the checks in test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the face of instruction scheduling with these changes, so I fixed that up.
I've added a new test case based on the PrettyStackTrace module that demonstrated the original problem. This checks that we get correct code generation and that CSE of the calls to __get_tls_addr has taken place.
llvm-svn: 227976
show more ...
|
Revision tags: llvmorg-3.6.0-rc2 |
|
#
cccae795 |
| 30-Jan-2015 |
Eric Christopher <echristo@gmail.com> |
Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl without a Function argument.
llvm-svn: 227622
|
#
38522b88 |
| 30-Jan-2015 |
Eric Christopher <echristo@gmail.com> |
Use the cached subtarget in PPCFrameLowering.
llvm-svn: 227548
|
Revision tags: llvmorg-3.6.0-rc1 |
|
#
934361a4 |
| 14-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDN
Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs. These problems caused the original regression tests to assert/segfault on many (but not all) systems.
Original commit message:
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment!
llvm-svn: 225909
show more ...
|
#
63fb9281 |
| 13-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investiage buildbot failures (segfaulting in GetCostForDef at ScheduleDAGRRList.cpp:314).
llvm-svn: 225811
|
#
821befd5 |
| 13-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Add StackMap/PatchPoint support
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this
[PowerPC] Add StackMap/PatchPoint support
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment!
llvm-svn: 225808
show more ...
|
#
f4a22c0d |
| 13-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Split the blr definition into BLR and BLR8
We really need a separate 64-bit version of this instruction so that it can be marked as clobbering LR8 (instead of just LR). No change in functi
[PowerPC] Split the blr definition into BLR and BLR8
We really need a separate 64-bit version of this instruction so that it can be marked as clobbering LR8 (instead of just LR). No change in functionality (although the verifier might be slightly happier), however, it is required for stackmap/patchpoint support. Thus, this will be covered by stackmap test cases once those are added.
llvm-svn: 225804
show more ...
|
#
654346e6 |
| 10-Jan-2015 |
Justin Hibbits <jrh29@alumni.cwru.edu> |
Fully fix Bug #22115.
Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty resu
Fully fix Bug #22115.
Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty results.
Test Plan: Tests updated
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6906
llvm-svn: 225573
show more ...
|
#
98a532dd |
| 08-Jan-2015 |
Justin Hibbits <jrh29@alumni.cwru.edu> |
Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This
Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that.
Test Plan: Tests updated.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6876
llvm-svn: 225450
show more ...
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1 |
|
#
1f0a44e6 |
| 01-Dec-2014 |
Jay Foad <jay.foad@gmail.com> |
[PowerPC] Fix unwind info with dynamic stack realignment
Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a function where the stack had been dynamically realigned. This clearly
[PowerPC] Fix unwind info with dynamic stack realignment
Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a function where the stack had been dynamically realigned. This clearly doesn't work because the offset from SP to CFA is not a constant. Fix it by defining CFA as BP instead.
This was causing the AddressSanitizer null_deref test to fail 50% of the time, depending on whether SP happened to be 32-byte aligned on entry to a particular function or not.
Reviewers: willschm, uweigand, hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6410
llvm-svn: 222996
show more ...
|
Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2 |
|
#
fc6de428 |
| 05-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lo
Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
llvm-svn: 214838
show more ...
|
#
d913448b |
| 04-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change.
llvm-svn: 214781
|
Revision tags: llvmorg-3.5.0-rc1 |
|
#
be928cc2 |
| 21-Jul-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a sing
[PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record refering to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now chose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction.
Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved.
Reviewed by Hal Finkel.
llvm-svn: 213492
show more ...
|
#
8658f17e |
| 20-Jul-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32
[PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack)
The following patch implements this required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save are is optional
This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when *calling* a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch.
On the *callee* side, in LowerFormalArguments, we *must* track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently *always* write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an *address* to implement the byval semantics.
To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present.
Reviewed by Hal Finkel.
llvm-svn: 213490
show more ...
|
#
3ee2af7d |
| 18-Jul-2014 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend.
Patch
[PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend.
Patch by Justin Hibbits!
llvm-svn: 213427
show more ...
|
#
8ca988f3 |
| 23-Jun-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in core ABI code (LowerFormalParameter / LowerCall). This is actu
[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in core ABI code (LowerFormalParameter / LowerCall). This is actually a good thing, since the details of the parameter save area are ABI specific.
With the new ELFv2 ABI in particular, the rules defining the size of the save area will become significantly more complex, so it wouldn't make sense to implement those outside ABI code that has all required information.
In preparation, this patch eliminates the getMinCallFrameSize (and associated getMinCallArgumentsSize) routines, and inlines them into all callers. Note that since nearly all call arguments are constant, this allows simplifying the inlined copies to a single line everywhere.
No change in generate code expected.
llvm-svn: 211497
show more ...
|
#
f316e1db |
| 23-Jun-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures that every function that allocates a stack frame provides space for the
[PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures that every function that allocates a stack frame provides space for the parameter save area (via PPCFrameLowering::getMinCallFrameSize).
This is actually not necessary. There may be functions that never call another routine but still allocate a frame; those do not require the parameter save area. In the future, with the ELFv2 ABI, even some routines that do call other functions do not need to allocate the parameter save area.
While it is not a bug to allocate the parameter area when it is not needed, it is better to avoid it to save stack space.
Note that when any particular function call requires the parameter save area, this space will already have been included by ABI code in the size the CALLSEQ_START insn is annotated with, and therefore included in the size returned by MFI->getMaxCallFrameSize().
This means that determineFrameLayout simply does not need to care about the parameter save area. (It still needs to ensure that every frame provides the linkage area.) This is implemented by this patch.
Note that this exposed a bug in the new fast-isel code where the parameter area was *not* included in the CALLSEQ_START size; this is also fixed.
A couple of test cases needed to be adapted for the new (smaller) stack frame size those tests now see.
llvm-svn: 211495
show more ...
|
#
d104c31f |
| 12-Jun-2014 |
Eric Christopher <echristo@gmail.com> |
Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the
Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the .cpp file to avoid circular includes.
llvm-svn: 210822
show more ...
|
Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1 |
|
#
612bb69b |
| 29-Apr-2014 |
Eric Christopher <echristo@gmail.com> |
None of these targets actually define their own CFI_INSTRUCTION opcode so there's no reason to use the target namespace for it rather than TargetOpcode.
llvm-svn: 207475
|
#
d1737491 |
| 29-Apr-2014 |
Eric Christopher <echristo@gmail.com> |
80-column, tab characters, comment fixups.
llvm-svn: 207473
|
Revision tags: llvmorg-3.4.1, llvmorg-3.4.1-rc2, llvmorg-3.4.1-rc1 |
|
#
b1f25f1b |
| 07-Mar-2014 |
Rafael Espindola <rafael.espindola@gmail.com> |
Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were cr
Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non trivial constructors and destructors and are somewhat big, so the this setup is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
show more ...
|