History log of /llvm-project/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp (Results 151 – 175 of 230)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 736d39e1 13-Feb-2015 Eric Christopher <echristo@gmail.com>

The TOC save offset can be computed at compile time, do so and
propagate changes.

llvm-svn: 228997


# f71609b5 13-Feb-2015 Eric Christopher <echristo@gmail.com>

The return save offset can be computed at initialization time - do
so and save the value.

llvm-svn: 228996


# 82f1c775 10-Feb-2015 Bill Schmidt <wschmidt@linux.vnet.ibm.com>

[PowerPC] Fix reverted patch r227976 to avoid register assignment issues

See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
sepa

[PowerPC] Fix reverted patch r227976 to avoid register assignment issues

See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers. The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction. The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.

Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc. It has also been verified with the
--no-tls-optimize flag workaround removed.

llvm-svn: 228725

show more ...


# 0d2a1515 06-Feb-2015 Hal Finkel <hfinkel@anl.gov>

Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups

Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has

Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups

Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

llvm-svn: 228460

show more ...


# 685aa8b0 03-Feb-2015 Bill Schmidt <wschmidt@linux.vnet.ibm.com>

[PowerPC] Yet another approach to __tls_get_addr

This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_

[PowerPC] Yet another approach to __tls_get_addr

This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations. This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3. This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering. This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy. The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity. It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves. MachineCSE
does not have any machinery to common idempotent calls. This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace. In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes. However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here. This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering. Virtual registers
are used until prior to register allocation. At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them. The register allocator then coalesces these copies away. With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with: After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead. Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes. Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address. This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not. I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.

llvm-svn: 227976

show more ...


Revision tags: llvmorg-3.6.0-rc2
# cccae795 30-Jan-2015 Eric Christopher <echristo@gmail.com>

Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl
without a Function argument.

llvm-svn: 227622


# 38522b88 30-Jan-2015 Eric Christopher <echristo@gmail.com>

Use the cached subtarget in PPCFrameLowering.

llvm-svn: 227548


Revision tags: llvmorg-3.6.0-rc1
# 934361a4 14-Jan-2015 Hal Finkel <hfinkel@anl.gov>

Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""

This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDN

Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""

This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).

2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225909

show more ...


# 63fb9281 13-Jan-2015 Hal Finkel <hfinkel@anl.gov>

Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"

Reverting this while I investiage buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

llvm-svn: 225811


# 821befd5 13-Jan-2015 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Add StackMap/PatchPoint support

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this

[PowerPC] Add StackMap/PatchPoint support

This commit does two things:

1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).

2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

llvm-svn: 225808

show more ...


# f4a22c0d 13-Jan-2015 Hal Finkel <hfinkel@anl.gov>

[PowerPC] Split the blr definition into BLR and BLR8

We really need a separate 64-bit version of this instruction so that it can be
marked as clobbering LR8 (instead of just LR). No change in functi

[PowerPC] Split the blr definition into BLR and BLR8

We really need a separate 64-bit version of this instruction so that it can be
marked as clobbering LR8 (instead of just LR). No change in functionality
(although the verifier might be slightly happier), however, it is required for
stackmap/patchpoint support. Thus, this will be covered by stackmap test cases
once those are added.

llvm-svn: 225804

show more ...


# 654346e6 10-Jan-2015 Justin Hibbits <jrh29@alumni.cwru.edu>

Fully fix Bug #22115.

Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty resu

Fully fix Bug #22115.

Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.

Test Plan: Tests updated

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6906

llvm-svn: 225573

show more ...


# 98a532dd 08-Jan-2015 Justin Hibbits <jrh29@alumni.cwru.edu>

Add saving and restoring of r30 to the prologue and epilogue, respectively

Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This

Add saving and restoring of r30 to the prologue and epilogue, respectively

Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that.

Test Plan: Tests updated.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6876

llvm-svn: 225450

show more ...


Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1
# 1f0a44e6 01-Dec-2014 Jay Foad <jay.foad@gmail.com>

[PowerPC] Fix unwind info with dynamic stack realignment

Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly

[PowerPC] Fix unwind info with dynamic stack realignment

Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.

This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.

Reviewers: willschm, uweigand, hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6410

llvm-svn: 222996

show more ...


Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2
# fc6de428 05-Aug-2014 Eric Christopher <echristo@gmail.com>

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lo

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838

show more ...


# d913448b 04-Aug-2014 Eric Christopher <echristo@gmail.com>

Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781


Revision tags: llvmorg-3.5.0-rc1
# be928cc2 21-Jul-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] ELFv2 explicit CFI for CR fields

This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a sing

[PowerPC] ELFv2 explicit CFI for CR fields

This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2. In ELFv2 instead,
each of the CR fields is represented by its own CFI record. The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving. That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.

Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.

Reviewed by Hal Finkel.

llvm-svn: 213492

show more ...


# 8658f17e 20-Jul-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] ELFv2 stack space reduction

The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32

[PowerPC] ELFv2 stack space reduction

The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
present in certain cases (it remains mandatory in functions with
variable arguments, and functions that have any parameter that is
passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation: Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily. This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.

llvm-svn: 213490

show more ...


# 3ee2af7d 18-Jul-2014 Hal Finkel <hfinkel@anl.gov>

[PowerPC] 32-bit ELF PIC support

This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch

[PowerPC] 32-bit ELF PIC support

This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

llvm-svn: 213427

show more ...


# 8ca988f3 23-Jun-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize

As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall). This is actu

[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize

As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall). This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers. Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.

llvm-svn: 211497

show more ...


# f316e1db 23-Jun-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

[PowerPC] Allow stack frames without parameter save area

The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the

[PowerPC] Allow stack frames without parameter save area

The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary. There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area. In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area. (It still needs to ensure that every frame
provides the linkage area.) This is implemented by this patch.

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.

llvm-svn: 211495

show more ...


# d104c31f 12-Jun-2014 Eric Christopher <echristo@gmail.com>

Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the

Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use
the initializeSubtargetDependencies code to obtain an initialized
subtarget and migrate a couple of subtarget using functions to the
.cpp file to avoid circular includes.

llvm-svn: 210822

show more ...


Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1
# 612bb69b 29-Apr-2014 Eric Christopher <echristo@gmail.com>

None of these targets actually define their own CFI_INSTRUCTION
opcode so there's no reason to use the target namespace for it
rather than TargetOpcode.

llvm-svn: 207475


# d1737491 29-Apr-2014 Eric Christopher <echristo@gmail.com>

80-column, tab characters, comment fixups.

llvm-svn: 207473


Revision tags: llvmorg-3.4.1, llvmorg-3.4.1-rc2, llvmorg-3.4.1-rc1
# b1f25f1b 07-Mar-2014 Rafael Espindola <rafael.espindola@gmail.com>

Replace PROLOG_LABEL with a new CFI_INSTRUCTION.

The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were cr

Replace PROLOG_LABEL with a new CFI_INSTRUCTION.

The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204

show more ...


12345678910