History log of /llvm-project/llvm/lib/Target/ARM/ARMFrameLowering.cpp (Results 201 – 225 of 311)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-3.7.0-rc1
# 02564865 14-Jul-2015 Matthias Braun <matze@braunis.de>

PrologEpilogInserter: Rewrite API to determine callee save regsiters.

This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pas

PrologEpilogInserter: Rewrite API to determine callee save regsiters.

This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():

- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
physcial registers which are only read but never modified.

Related to rdar://21539507

Differential Revision: http://reviews.llvm.org/D10909

llvm-svn: 242165

show more ...


Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1
# f00654e3 23-Jun-2015 Alexander Kornienko <alexfh@google.com>

Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)

Apparently, the style needs to be agreed upon first.

llvm-svn: 240390


# 70bc5f13 19-Jun-2015 Alexander Kornienko <alexfh@google.com>

Fixed/added namespace ending comments using clang-tidy. NFC

The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-c

Fixed/added namespace ending comments using clang-tidy. NFC

The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137

show more ...


# ddf76aa3 23-May-2015 Akira Hatanaka <ahatanaka@apple.com>

Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.

This is part of the work to remove TargetMachine::resetTargetOptions.

In this patch, instead of updating global variable NoFr

Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.

This is part of the work to remove TargetMachine::resetTargetOptions.

In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.

There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".

llvm-svn: 238080

show more ...


Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1
# 61b305ed 05-May-2015 Quentin Colombet <qcolombet@apple.com>

[ShrinkWrap] Add (a simplified version) of shrink-wrapping.

This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find

[ShrinkWrap] Add (a simplified version) of shrink-wrapping.

This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false

true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false

false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret

With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507

show more ...


# 78f1ecc5 23-Apr-2015 Peter Collingbourne <peter@pcc.me.uk>

ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.

This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around

ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.

This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around 4 bytes per function.

Differential Revision: http://reviews.llvm.org/D9165

llvm-svn: 235637

show more ...


# 3cc62b37 08-Apr-2015 Sergey Dmitrouk <sdmitrouk@accesssoftek.com>

[ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame

Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for ea

[ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame

Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for early `return` in `ARMFrameLowering::emitPrologue`, which leads to loosing `.cfi_def_cfa_offset` directive for functions without stack frame.

Reviewers: echristo, rengolin, asl, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, rengolin, aemerson

Differential Revision: http://reviews.llvm.org/D8606

llvm-svn: 234399

show more ...


Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1
# 8cda34f5 11-Mar-2015 Tim Northover <tnorthover@apple.com>

ARM: simplify and extend byval handling

The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
le

ARM: simplify and extend byval handling

The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.

But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.

I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.

rdar://20095672

llvm-svn: 231959

show more ...


Revision tags: llvmorg-3.6.0
# 22b2ad26 20-Feb-2015 Eric Christopher <echristo@gmail.com>

Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

llvm-svn: 229999


Revision tags: llvmorg-3.6.0-rc4
# 2cff9e19 14-Feb-2015 Duncan P. N. Exon Smith <dexonsmith@apple.com>

ARM: Canonicalize access to function attributes, NFC

Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAtt

ARM: Canonicalize access to function attributes, NFC

Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)

llvm-svn: 229220

show more ...


Revision tags: llvmorg-3.6.0-rc3
# fb8a66fb 31-Jan-2015 Saleem Abdulrasool <compnerd@compnerd.org>

ARM: support stack probe size on Windows on ARM

Now that -mstack-probe-size is piped through to the backend via the function
attribute as on Windows x86, honour the value to permit handling of non-d

ARM: support stack probe size on Windows on ARM

Now that -mstack-probe-size is piped through to the backend via the function
attribute as on Windows x86, honour the value to permit handling of non-default
values for stack probes. This is needed /Gs with the clang-cl driver or
-mstack-probe-size with the clang driver when targeting Windows on ARM.

llvm-svn: 227667

show more ...


Revision tags: llvmorg-3.6.0-rc2
# 1b21f009 29-Jan-2015 Eric Christopher <echristo@gmail.com>

Migrate ARM except for TTI, AsmPrinter, and frame lowering
away from getSubtargetImpl.

llvm-svn: 227399


Revision tags: llvmorg-3.6.0-rc1
# 933de7aa 08-Jan-2015 Kristof Beyls <kristof.beyls@arm.com>

Fix large stack alignment codegen for ARM and Thumb2 targets

This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems

Fix large stack alignment codegen for ARM and Thumb2 targets

This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.

Producing an aligned stack pointer is done by zero-ing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows to
encode an immediate that can zero out up to a maximum of the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack aligments by
using the BFC instruction instead, when the BFC instruction is
available. When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.

llvm-svn: 225446

show more ...


Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2
# b9fa945d 16-Dec-2014 Adrian Prantl <aprantl@apple.com>

ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in th

ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294

show more ...


Revision tags: llvmorg-3.5.1-rc1
# 3024b553 01-Dec-2014 Tim Northover <tnorthover@apple.com>

ARM: lower tail calls correctly when using GHC calling convention.

Patch by Ben Gamari.

llvm-svn: 223055


# 603d3165 14-Nov-2014 Tim Northover <tnorthover@apple.com>

ARM: refactor .cfi_def_cfa_offset emission.

We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the

ARM: refactor .cfi_def_cfa_offset emission.

We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.

It's easier to just add any stack-adjusting instruction to a list and emit them
together.

llvm-svn: 222057

show more ...


# 9d2d218f 14-Nov-2014 Tim Northover <tnorthover@apple.com>

ARM: correctly calculate the offset of FP in its push.

When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ende

ARM: correctly calculate the offset of FP in its push.

When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.

The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.

llvm-svn: 222056

show more ...


# dc0d9e46 05-Nov-2014 Tim Northover <tnorthover@apple.com>

ARM: try to add extra CS-register whenever stack alignment >= 8.

We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the sta

ARM: try to add extra CS-register whenever stack alignment >= 8.

We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).

llvm-svn: 221321

show more ...


# 228c943f 05-Nov-2014 Tim Northover <tnorthover@apple.com>

ARM/Dwarf: correctly align stack before callee-saved VPRs

We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that faile

ARM/Dwarf: correctly align stack before callee-saved VPRs

We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.

This had a couple of potential issues:
+ The .cfi directives we emit misplaced dN because they were based on
PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases,
but possible).

This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.

llvm-svn: 221320

show more ...


Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2
# fc6de428 05-Aug-2014 Eric Christopher <echristo@gmail.com>

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lo

Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838

show more ...


# d913448b 04-Aug-2014 Eric Christopher <echristo@gmail.com>

Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781


Revision tags: llvmorg-3.5.0-rc1
# 45fb7b63 26-Jun-2014 Eric Christopher <echristo@gmail.com>

Move the frame lowering constructors out of line to avoid circular
includes.

llvm-svn: 211798


# 86f60b72 30-May-2014 Tim Northover <tnorthover@apple.com>

ARM: use AAPCS-style prologues for embedded MachO.

Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. A

ARM: use AAPCS-style prologues for embedded MachO.

Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. AAPCS uses
a single push.w instruction.

It turns out that, on average, enough registers get pushed that code is smaller
in the AAPCS prologue, which is a nice property for M-class programmers. They
also have other options available for back-traces, so can hopefully deal with
the fact that FP & LR aren't adjacent in memory.

rdar://problem/15909583

llvm-svn: 209895

show more ...


# f9e798ba 22-May-2014 Tim Northover <tnorthover@apple.com>

Segmented stacks: omit __morestack call when there's no frame.

Patch by Florian Zeitz

llvm-svn: 209436


Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1
# 985dcf18 07-May-2014 Saleem Abdulrasool <compnerd@compnerd.org>

ARM: mark additional instructions as MachineFrameSetup

Mark up additional instructions which are part of the function prologue as
MachineFrameSetup. These instructions are part of the function prol

ARM: mark additional instructions as MachineFrameSetup

Mark up additional instructions which are part of the function prologue as
MachineFrameSetup. These instructions are part of the function prologue,
emitted by the PEI pass to setup the stack for use in the activating frame.

llvm-svn: 208153

show more ...


12345678910>>...13