Revision tags: llvmorg-3.7.0-rc1 |
|
#
02564865 |
| 14-Jul-2015 |
Matthias Braun <matze@braunis.de> |
PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves() - Pas
PrologEpilogInserter: Rewrite API to determine callee save regsiters.
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physcial registers which are only read but never modified.
Related to rdar://21539507
Differential Revision: http://reviews.llvm.org/D10909
llvm-svn: 242165
show more ...
|
Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1 |
|
#
f00654e3 |
| 23-Jun-2015 |
Alexander Kornienko <alexfh@google.com> |
Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.
llvm-svn: 240390
|
#
70bc5f13 |
| 19-Jun-2015 |
Alexander Kornienko <alexfh@google.com> |
Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-c
Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
show more ...
|
#
ddf76aa3 |
| 23-May-2015 |
Akira Hatanaka <ahatanaka@apple.com> |
Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFr
Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in resetTargetOptions, its use in DisableFramePointerElim is replaced with a call to TargetFrameLowering::noFramePointerElim. This function determines on a per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim" can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
show more ...
|
Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1 |
|
#
61b305ed |
| 05-May-2015 |
Quentin Colombet <qcolombet@apple.com> |
[ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find
[ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks.
As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false
true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false
false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 }
On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret
With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret
Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties.
The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
show more ...
|
#
78f1ecc5 |
| 23-Apr-2015 |
Peter Collingbourne <peter@pcc.me.uk> |
ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.
This makes it more likely that we can use the 16-bit push and pop instructions on Thumb-2, saving around
ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets.
This makes it more likely that we can use the 16-bit push and pop instructions on Thumb-2, saving around 4 bytes per function.
Differential Revision: http://reviews.llvm.org/D9165
llvm-svn: 235637
show more ...
|
#
3cc62b37 |
| 08-Apr-2015 |
Sergey Dmitrouk <sdmitrouk@accesssoftek.com> |
[ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame
Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for ea
[ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame
Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for early `return` in `ARMFrameLowering::emitPrologue`, which leads to loosing `.cfi_def_cfa_offset` directive for functions without stack frame.
Reviewers: echristo, rengolin, asl, t.p.northover
Reviewed By: t.p.northover
Subscribers: llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8606
llvm-svn: 234399
show more ...
|
Revision tags: llvmorg-3.5.2, llvmorg-3.5.2-rc1 |
|
#
8cda34f5 |
| 11-Mar-2015 |
Tim Northover <tnorthover@apple.com> |
ARM: simplify and extend byval handling
The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, le
ARM: simplify and extend byval handling
The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack.
But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
llvm-svn: 231959
show more ...
|
Revision tags: llvmorg-3.6.0 |
|
#
22b2ad26 |
| 20-Feb-2015 |
Eric Christopher <echristo@gmail.com> |
Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine.
llvm-svn: 229999
|
Revision tags: llvmorg-3.6.0-rc4 |
|
#
2cff9e19 |
| 14-Feb-2015 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
ARM: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAtt
ARM: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind)
llvm-svn: 229220
show more ...
|
Revision tags: llvmorg-3.6.0-rc3 |
|
#
fb8a66fb |
| 31-Jan-2015 |
Saleem Abdulrasool <compnerd@compnerd.org> |
ARM: support stack probe size on Windows on ARM
Now that -mstack-probe-size is piped through to the backend via the function attribute as on Windows x86, honour the value to permit handling of non-d
ARM: support stack probe size on Windows on ARM
Now that -mstack-probe-size is piped through to the backend via the function attribute as on Windows x86, honour the value to permit handling of non-default values for stack probes. This is needed /Gs with the clang-cl driver or -mstack-probe-size with the clang driver when targeting Windows on ARM.
llvm-svn: 227667
show more ...
|
Revision tags: llvmorg-3.6.0-rc2 |
|
#
1b21f009 |
| 29-Jan-2015 |
Eric Christopher <echristo@gmail.com> |
Migrate ARM except for TTI, AsmPrinter, and frame lowering away from getSubtargetImpl.
llvm-svn: 227399
|
Revision tags: llvmorg-3.6.0-rc1 |
|
#
933de7aa |
| 08-Jan-2015 |
Kristof Beyls <kristof.beyls@arm.com> |
Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems
Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all.
Producing an aligned stack pointer is done by zero-ing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction only allows to encode an immediate that can zero out up to a maximum of the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case.
This commit fixes code generation for large stack aligments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits.
The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached.
llvm-svn: 225446
show more ...
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2 |
|
#
b9fa945d |
| 16-Dec-2014 |
Adrian Prantl <aprantl@apple.com> |
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in th
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0.
llvm-svn: 224294
show more ...
|
Revision tags: llvmorg-3.5.1-rc1 |
|
#
3024b553 |
| 01-Dec-2014 |
Tim Northover <tnorthover@apple.com> |
ARM: lower tail calls correctly when using GHC calling convention.
Patch by Ben Gamari.
llvm-svn: 223055
|
#
603d3165 |
| 14-Nov-2014 |
Tim Northover <tnorthover@apple.com> |
ARM: refactor .cfi_def_cfa_offset emission.
We use to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the
ARM: refactor .cfi_def_cfa_offset emission.
We use to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable.
It's easier to just add any stack-adjusting instruction to a list and emit them together.
llvm-svn: 222057
show more ...
|
#
9d2d218f |
| 14-Nov-2014 |
Tim Northover <tnorthover@apple.com> |
ARM: correctly calculate the offset of FP in its push.
When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ende
ARM: correctly calculate the offset of FP in its push.
When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset.
The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring.
llvm-svn: 222056
show more ...
|
#
dc0d9e46 |
| 05-Nov-2014 |
Tim Northover <tnorthover@apple.com> |
ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the sta
ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the stack alignment is prcisely 8. Many of the reasons for doing it apply also when that alignment > 8 (the extra store is often free, and can save another stack adjustment, though less frequently for 16-byte stack alignment).
llvm-svn: 221321
show more ...
|
#
228c943f |
| 05-Nov-2014 |
Tim Northover <tnorthover@apple.com> |
ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that faile
ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that failed we went ahead and pushed anyway.
This had a couple of potential issues: + The .cfi directives we emit misplaced dN because they were based on PrologEpilogInserter's calculation. + Unaligned stores can be less efficient. + Unaligned stores can actually fault (likely only an issue in niche cases, but possible).
This adds a final explicit stack adjustment if all other options fail, so that the actual locations of the registers match up with where they should be.
llvm-svn: 221320
show more ...
|
Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2 |
|
#
fc6de428 |
| 05-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lo
Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
llvm-svn: 214838
show more ...
|
#
d913448b |
| 04-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change.
llvm-svn: 214781
|
Revision tags: llvmorg-3.5.0-rc1 |
|
#
45fb7b63 |
| 26-Jun-2014 |
Eric Christopher <echristo@gmail.com> |
Move the frame lowering constructors out of line to avoid circular includes.
llvm-svn: 211798
|
#
86f60b72 |
| 30-May-2014 |
Tim Northover <tnorthover@apple.com> |
ARM: use AAPCS-style prologues for embedded MachO.
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr, followed by a wide push of the remaining registers if there are any. A
ARM: use AAPCS-style prologues for embedded MachO.
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr, followed by a wide push of the remaining registers if there are any. AAPCS uses a single push.w instruction.
It turns out that, on average, enough registers get pushed that code is smaller in the AAPCS prologue, which is a nice property for M-class programmers. They also have other options available for back-traces, so can hopefully deal with the fact that FP & LR aren't adjacent in memory.
rdar://problem/15909583
llvm-svn: 209895
show more ...
|
#
f9e798ba |
| 22-May-2014 |
Tim Northover <tnorthover@apple.com> |
Segmented stacks: omit __morestack call when there's no frame.
Patch by Florian Zeitz
llvm-svn: 209436
|
Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1 |
|
#
985dcf18 |
| 07-May-2014 |
Saleem Abdulrasool <compnerd@compnerd.org> |
ARM: mark additional instructions as MachineFrameSetup
Mark up additional instructions which are part of the function prologue as MachineFrameSetup. These instructions are part of the function prol
ARM: mark additional instructions as MachineFrameSetup
Mark up additional instructions which are part of the function prologue as MachineFrameSetup. These instructions are part of the function prologue, emitted by the PEI pass to setup the stack for use in the activating frame.
llvm-svn: 208153
show more ...
|