ARMFrameLowering.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/ARM/ARMFrameLowering.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-3.7.0-rc1
# 02564865	14-Jul-2015	Matthias Braun <matze@braunis.de>	PrologEpilogInserter: Rewrite API to determine callee save registers. This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan(): - Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physical registers which are only read but never modified. Related to rdar://21539507 Differential Revision: http://reviews.llvm.org/D10909 llvm-svn: 242165
Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1
# f00654e3	23-Jun-2015	Alexander Kornienko <alexfh@google.com>	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390
# 70bc5f13	19-Jun-2015	Alexander Kornienko <alexfh@google.com>	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
# ddf76aa3	23-May-2015	Akira Hatanaka <ahatanaka@apple.com>	Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions. This is part of the work to remove TargetMachine::resetTargetOptions. In this patch, instead of updating global variable NoFramePointerElim in resetTargetOptions, its use in DisableFramePointerElim is replaced with a call to TargetFrameLowering::noFramePointerElim. This function determines on a per-function basis if frame pointer elimination should be disabled. There is no change in functionality except that cl::opt option "disable-fp-elim" can now override function attribute "no-frame-pointer-elim". llvm-svn: 238080
Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1
# 61b305ed	05-May-2015	Quentin Colombet <qcolombet@apple.com>	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exit blocks. As an example, and to avoid introducing regressions, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exit blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that performs a call in only one branch of an if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives for readability): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! 
mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that performs the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore points into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need a hack to properly avoid frequently executed points. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overridden on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX methods to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save points and restore points; the impacted components would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore points instead of one pointer. - PEI: Should loop over the save points and restore points. 
Anyhow, at least for this first iteration, I do not believe it is worthwhile to support the complex cases. We should revisit that when we have motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507
# 78f1ecc5	23-Apr-2015	Peter Collingbourne <peter@pcc.me.uk>	ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets. This makes it more likely that we can use the 16-bit push and pop instructions on Thumb-2, saving around 4 bytes per function. Differential Revision: http://reviews.llvm.org/D9165 llvm-svn: 235637
# 3cc62b37	08-Apr-2015	Sergey Dmitrouk <sdmitrouk@accesssoftek.com>	[ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for early `return` in `ARMFrameLowering::emitPrologue`, which leads to losing the `.cfi_def_cfa_offset` directive for functions without stack frame. Reviewers: echristo, rengolin, asl, t.p.northover Reviewed By: t.p.northover Subscribers: llvm-commits, rengolin, aemerson Differential Revision: http://reviews.llvm.org/D8606 llvm-svn: 234399
Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1
# 8cda34f5	11-Mar-2015	Tim Northover <tnorthover@apple.com>	ARM: simplify and extend byval handling The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 llvm-svn: 231959
Revision tags: llvmorg-3.6.0
# 22b2ad26	20-Feb-2015	Eric Christopher <echristo@gmail.com>	Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine. llvm-svn: 229999
Revision tags: llvmorg-3.6.0-rc4
# 2cff9e19	14-Feb-2015	Duncan P. N. Exon Smith <dexonsmith@apple.com>	ARM: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229220
Revision tags: llvmorg-3.6.0-rc3
# fb8a66fb	31-Jan-2015	Saleem Abdulrasool <compnerd@compnerd.org>	ARM: support stack probe size on Windows on ARM Now that -mstack-probe-size is piped through to the backend via the function attribute as on Windows x86, honour the value to permit handling of non-default values for stack probes. This is needed for /Gs with the clang-cl driver or -mstack-probe-size with the clang driver when targeting Windows on ARM. llvm-svn: 227667
Revision tags: llvmorg-3.6.0-rc2
# 1b21f009	29-Jan-2015	Eric Christopher <echristo@gmail.com>	Migrate ARM except for TTI, AsmPrinter, and frame lowering away from getSubtargetImpl. llvm-svn: 227399
Revision tags: llvmorg-3.6.0-rc1
# 933de7aa	08-Jan-2015	Kristof Beyls <kristof.beyls@arm.com>	Fix large stack alignment codegen for ARM and Thumb2 targets This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all. Producing an aligned stack pointer is done by zeroing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction can only encode an immediate that zeroes out up to a maximum of the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case. This commit fixes code generation for large stack alignments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits. The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached. llvm-svn: 225446
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2
# b9fa945d	16-Dec-2014	Adrian Prantl <aprantl@apple.com>	ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224294
Revision tags: llvmorg-3.5.1-rc1
# 3024b553	01-Dec-2014	Tim Northover <tnorthover@apple.com>	ARM: lower tail calls correctly when using GHC calling convention. Patch by Ben Gamari. llvm-svn: 223055
# 603d3165	14-Nov-2014	Tim Northover <tnorthover@apple.com>	ARM: refactor .cfi_def_cfa_offset emission. We used to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057
# 9d2d218f	14-Nov-2014	Tim Northover <tnorthover@apple.com>	ARM: correctly calculate the offset of FP in its push. When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056
# dc0d9e46	05-Nov-2014	Tim Northover <tnorthover@apple.com>	ARM: try to add extra CS-register whenever stack alignment >= 8. We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the stack alignment is precisely 8. Many of the reasons for doing it apply also when that alignment > 8 (the extra store is often free, and can save another stack adjustment, though less frequently for 16-byte stack alignment). llvm-svn: 221321
# 228c943f	05-Nov-2014	Tim Northover <tnorthover@apple.com>	ARM/Dwarf: correctly align stack before callee-saved VPRs We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that failed we went ahead and pushed anyway. This had a couple of potential issues: + The .cfi directives we emit misplaced dN because they were based on PrologEpilogInserter's calculation. + Unaligned stores can be less efficient. + Unaligned stores can actually fault (likely only an issue in niche cases, but possible). This adds a final explicit stack adjustment if all other options fail, so that the actual locations of the registers match up with where they should be. llvm-svn: 221320
Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2
# fc6de428	05-Aug-2014	Eric Christopher <echristo@gmail.com>	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838
# d913448b	04-Aug-2014	Eric Christopher <echristo@gmail.com>	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781
Revision tags: llvmorg-3.5.0-rc1
# 45fb7b63	26-Jun-2014	Eric Christopher <echristo@gmail.com>	Move the frame lowering constructors out of line to avoid circular includes. llvm-svn: 211798
# 86f60b72	30-May-2014	Tim Northover <tnorthover@apple.com>	ARM: use AAPCS-style prologues for embedded MachO. Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr, followed by a wide push of the remaining registers if there are any. AAPCS uses a single push.w instruction. It turns out that, on average, enough registers get pushed that code is smaller in the AAPCS prologue, which is a nice property for M-class programmers. They also have other options available for back-traces, so can hopefully deal with the fact that FP & LR aren't adjacent in memory. rdar://problem/15909583 llvm-svn: 209895
# f9e798ba	22-May-2014	Tim Northover <tnorthover@apple.com>	Segmented stacks: omit __morestack call when there's no frame. Patch by Florian Zeitz llvm-svn: 209436
Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1
# 985dcf18	07-May-2014	Saleem Abdulrasool <compnerd@compnerd.org>	ARM: mark additional instructions as MachineFrameSetup Mark up additional instructions which are part of the function prologue as MachineFrameSetup. These instructions are part of the function prologue, emitted by the PEI pass to set up the stack for use in the activating frame. llvm-svn: 208153