#
604526fe |
| 10-May-2017 |
Ahmed Bougacha <ahmed.bougacha@gmail.com> |
[CodeGen] Don't require AA in SDAGISel at -O0.
Before r247167, the pass manager builder controlled which AA implementations were used, exporting them all in the AliasAnalysis analysis group.
Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA implementations if made available in the pass pipeline.
But regardless, SDAGISel is required at O0, and really doesn't need to be doing fancy optimizations based on useful AA results.
Don't require AA at CodeGenOpt::None, and only use it otherwise.
This does have a functional impact (and one testcase is pessimized because we can't reuse a load). But I think that's desirable no matter what.
Note that this alone doesn't result in fewer DT computations: TwoAddress was previously able to reuse the DT we computed for SDAG. That will be fixed separately.
Differential Revision: https://reviews.llvm.org/D32766
llvm-svn: 302611
|
#
3a363fff |
| 09-May-2017 |
Reid Kleckner <rnk@google.com> |
Re-land "Use the frame index side table for byval and inalloca arguments"
This re-lands r302483. It was not the cause of PR32977.
llvm-svn: 302544
|
#
84075fdd |
| 09-May-2017 |
Reid Kleckner <rnk@google.com> |
Re-land "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This re-lands commit r302461. It was not the cause of PR32977.
llvm-svn: 302543
|
#
d526b13e |
| 09-May-2017 |
Serge Pavlov <sepavloff@gmail.com> |
Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get a second mandatory argument. This affects all targets that use frame pseudo instructions and touches many files, although the changes are uniform.
- Access to frame properties is implemented using special instructions rather than calls to getOperand(N).getImm(). For X86 and ARM such replacement was made previously.
- Changes that reflect the additional argument of the frame setup instruction. These involve proper instruction initialization and the methods that access instruction arguments.
- MachineVerifier retrieves the frame size using a method that reports the sum of the frame parts initialized inside the frame instruction pair and outside it.
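The verifier check described above can be sketched roughly as follows. This is a toy model in Python, not LLVM code: the class names `FrameSetup` and `FrameDestroy` and the helper `frame_sizes_match` are hypothetical, and only the idea of summing the two frame parts comes from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSetup:
    """Toy stand-in for CALLSEQ_START (hypothetical model, not LLVM's class)."""
    in_sequence_size: int    # bytes allocated inside the call frame sequence
    pre_setup_size: int = 0  # bytes set up before it, e.g. an inalloca argument

    def total_size(self) -> int:
        # Sum of the parts initialized inside and outside the instruction pair.
        return self.in_sequence_size + self.pre_setup_size


@dataclass(frozen=True)
class FrameDestroy:
    """Toy stand-in for CALLSEQ_END, which keeps the total frame size."""
    total_size: int


def frame_sizes_match(setup: FrameSetup, destroy: FrameDestroy) -> bool:
    """Return True when setup and destroy agree on the total frame size."""
    return setup.total_size() == destroy.total_size
```

With the extra operand, a setup of 8 bytes plus 12 bytes prepared earlier matches a destroy of 20 bytes; without it, the same pair would look like a stack error.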
The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
|
#
cf9daa33 |
| 09-May-2017 |
Amara Emerson <amara.emerson@arm.com> |
Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
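As a rough illustration of the two strategies mentioned above, the sketch below models a log2-step shuffle reduction next to a single-call reduction. It is plain Python, not LLVM IR; the function names and the `target_prefers_intrinsic` flag are hypothetical stand-ins for the target query, and the shuffle model shrinks the list each step instead of keeping a full-width vector with undefined lanes.

```python
import operator
from functools import reduce


def shuffle_reduce(vec, op):
    """Reduce in log2(len(vec)) steps by combining the two halves each step."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles power-of-two lengths")
    lanes = list(vec)
    while len(lanes) > 1:
        half = len(lanes) // 2
        # Models "shuffle the upper half down, then apply op lane-wise".
        lanes = [op(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]


def intrinsic_reduce(vec, op):
    """Models a single reduction operation that the target lowers itself."""
    return reduce(op, vec)


def horizontal_reduce(vec, op, target_prefers_intrinsic):
    """Unified entry point: pick the strategy based on the target's preference."""
    if target_prefers_intrinsic:
        return intrinsic_reduce(vec, op)
    return shuffle_reduce(vec, op)
```

For an associative operation such as integer addition, both paths give the same value; only the shape of the generated code differs.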
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
|
#
41bb9423 |
| 09-May-2017 |
Reid Kleckner <rnk@google.com> |
Revert "Don't add DBG_VALUE instructions for static allocas in dbg.declare"
This reverts commit r302461.
It appears to be causing failures compiling gtest with debug info on the Linux sanitizer bot. I was unable to reproduce the failure locally, however.
llvm-svn: 302504
|
#
9f29914d |
| 09-May-2017 |
Reid Kleckner <rnk@google.com> |
Revert "Use the frame index side table for byval and inalloca arguments"
This reverts r302483 and its follow-up fix.
llvm-svn: 302493
|
#
45efcf0c |
| 08-May-2017 |
Reid Kleckner <rnk@google.com> |
Use the frame index side table for byval and inalloca arguments
Summary: For inalloca functions, this is a very common code pattern:
  %argpack = type <{ i32, i32, i32 }>
  define void @f(%argpack* inalloca %args) {
  entry:
    %a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
    %b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
    %c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
    tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
    tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
    tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")
Even though these GEPs can be simplified to a constant offset from EBP or RSP, we don't do that at -O0, and each GEP is computed into a register. Registers used to compute argument addresses are typically spilled and clobbered very quickly after the initial computation, so live debug variable tracking loses information very quickly if we use DBG_VALUE instructions.
This change moves processing of dbg.declare between argument lowering and basic block isel, so that we can ask if an argument has a frame index or not. If the argument lives in a register as is the case for byval arguments on some targets, then we don't put it in the side table and during ISel we emit DBG_VALUE instructions.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32980
llvm-svn: 302483
|
#
bf828eed |
| 08-May-2017 |
Reid Kleckner <rnk@google.com> |
Don't add DBG_VALUE instructions for static allocas in dbg.declare
Summary: An llvm.dbg.declare of a static alloca is always added to the MachineFunction dbg variable map, so these values are entirely redundant. They survive all the way through codegen to be ignored by DWARF emission.
Effectively revert r113967
Two bugpoint-reduced test cases from 2012 broke as a result of this change. Despite my best efforts, I haven't been able to rewrite the test case using dbg.value. I'm not too concerned about the lost coverage because these were reduced from the test-suite, which we still run.
Reviewers: aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32920
llvm-svn: 302461
|
#
9bcaed86 |
| 08-May-2017 |
Dean Michael Berris <dberris@google.com> |
[XRay] Custom event logging intrinsic
This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when XRay is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries.
Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic.
Reviewers: timshen, dberris
Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D27503
llvm-svn: 302405
|
#
ac1a97b3 |
| 05-May-2017 |
Reid Kleckner <rnk@google.com> |
Simplify dbg.value handling in SDISel with early returns
No functional change other than improving dbgs logging accuracy on constant dbg values. Previously we would add things like "i32 42" as debug values, and then log that we were dropping the debug info, which is silly.
Delete some dead code that was checking for static allocas. This remained after r207165, but served no purpose. Currently, static alloca dbg.values are always sent through the DanglingDebugInfoMap, and are usually made valid the first time the alloca is used.
llvm-svn: 302267
|
#
89ad89cc |
| 02-May-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088)
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.
This patch adds support for extension/truncation for both integer and float types.
Differential Revision: https://reviews.llvm.org/D32391
llvm-svn: 301910
|
#
d28f0cd4 |
| 01-May-2017 |
Amara Emerson <amara.emerson@arm.com> |
Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.
Differential Revision: https://reviews.llvm.org/D32527
llvm-svn: 301803
|
#
6652a52e |
| 28-Apr-2017 |
Reid Kleckner <rnk@google.com> |
Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over arguments in CodeGen/ and Target/. It also reduces the number of places where we assume that ReturnIndex is 0 and that we should add one to argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
|
#
919f9e8d |
| 28-Apr-2017 |
Jun Bum Lim <junbuml@codeaurora.org> |
[InlineCost] Improve the cost heuristic for Switch
Summary: The motivation example is like below which has 13 cases but only 2 distinct targets
```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp     w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp     w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp     w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add     w8, w19, w8
        cmp     w8, #9                  // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and     w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add     w8, w19, w8
        cmp     w8, #2                  // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp     w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. The function containing this switch is then inlined about 900 times.
This change uses the general way of switch lowering for the inline heuristic. It estimates the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now: -inline-generic-switch-cost=false
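To make the difference between the two models concrete, here is a simplified Python sketch using the case values from the example above. It is not the code from the patch: it only merges adjacent case values with the same destination into range clusters (the jump table and bit test checks are left out), and the closed form in `estimated_compares` is an assumed approximation for a balanced tree, chosen for illustration.

```python
CASES = [
    (-7012, "if.end35"), (-10008, "if.end35"), (-10016, "if.end35"),
    (15000, "if.end35"), (14013, "if.end35"), (10114, "if.end35"),
    (10107, "if.end35"), (10105, "if.end35"), (10013, "if.end35"),
    (10011, "if.end35"), (7008, "if.end35"), (7007, "if.end35"),
    (5002, "if.end35"),
]
DEFAULT = "if.then27"


def distinct_targets(cases, default):
    """Old model: cost grows with the number of distinct destinations."""
    return len({target for _, target in cases} | {default})


def range_clusters(cases):
    """Merge adjacent case values that share a destination into (low, high, target)."""
    clusters = []
    for value, target in sorted(cases):
        if clusters and clusters[-1][1] + 1 == value and clusters[-1][2] == target:
            low, _, _ = clusters[-1]
            clusters[-1] = (low, value, target)
        else:
            clusters.append((value, value, target))
    return clusters


def estimated_compares(num_clusters):
    """Assumed closed form: leaf compares plus roughly half as many inner compares."""
    return max(num_clusters + num_clusters // 2 - 1, 0)
```

On this input the old model sees 2 targets, while range clustering alone still leaves 12 clusters (only 7007 and 7008 merge), so a cost tied to the tree size is much larger than 2 InstrCosts.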
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
|
#
d0af7e8a |
| 28-Apr-2017 |
Craig Topper <craig.topper@gmail.com> |
[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
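The shape of the change can be pictured with a small Python model. This is only a sketch of the idea of bundling the two masks together; the `KnownBits` class below and the helpers `known_and` and `known_or` are illustrative and are not LLVM's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    """Pair of bit masks: bits known to be 0 and bits known to be 1."""
    zero: int = 0
    one: int = 0

    def __post_init__(self):
        # A bit cannot be known-zero and known-one at the same time.
        if self.zero & self.one:
            raise ValueError("conflicting known bits")


def known_and(a: KnownBits, b: KnownBits) -> KnownBits:
    """AND: a result bit is 0 if either input is 0, and 1 only if both are 1."""
    return KnownBits(zero=a.zero | b.zero, one=a.one & b.one)


def known_or(a: KnownBits, b: KnownBits) -> KnownBits:
    """OR: a result bit is 1 if either input is 1, and 0 only if both are 0."""
    return KnownBits(zero=a.zero & b.zero, one=a.one | b.one)
```

Passing one value instead of two parallel masks keeps the pair from drifting apart at call sites, which is the kind of cleanup a mechanical KnownZero to Known.Zero rewrite makes possible.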
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
|
#
37ef04ad |
| 25-Apr-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[SelectionDAG] Use getBuildVector helper where possible. NFCI
llvm-svn: 301314
|
#
986d73cc |
| 25-Apr-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[SelectionDAG] Pull out repeated getValueType calls. NFCI.
Noticed in D32391.
llvm-svn: 301308
|
#
c8e8e2a0 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937
llvm-svn: 301234
|
#
98ab4c64 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.
llvm-svn: 301232
|
#
c0197066 |
| 24-Apr-2017 |
Krzysztof Parzyszek <kparzysz@codeaurora.org> |
Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937
llvm-svn: 301231
|
#
fd23a0c0 |
| 24-Apr-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
CodeGen: Add a hook for getFenceOperandTy
Currently the operand type for ATOMIC_FENCE assumes the value type of a pointer in address space 0. This is fine for most targets. However, for the amdgcn target, the size of a pointer in address space 0 depends on the triple environment. For the amdgiz environment it is 64 bit, but for other environments it is 32 bit. On the other hand, the amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type.
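The hook pattern described here can be sketched as an overridable method with a default. The Python classes below are a hypothetical model, not LLVM's TargetLowering: the method names and the string type names such as "i32" are chosen only for illustration.

```python
class TargetLoweringModel:
    """Toy model of a lowering class with an overridable fence operand hook."""

    def __init__(self, pointer_bits_by_addrspace):
        self.pointer_bits_by_addrspace = dict(pointer_bits_by_addrspace)

    def get_pointer_ty(self, addrspace=0):
        """Integer type name matching the pointer width of an address space."""
        return f"i{self.pointer_bits_by_addrspace[addrspace]}"

    def get_fence_operand_ty(self):
        """Default: keep the old behaviour, the pointer type of address space 0."""
        return self.get_pointer_ty(0)


class AmdgcnLikeLowering(TargetLoweringModel):
    """Target override: fence operands are 32 bit regardless of the environment."""

    def get_fence_operand_ty(self):
        return "i32"
```

Targets that do not override the hook see no change, which matches the note that the patch has no effect outside amdgcn.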
This patch has no effect on targets other than amdgcn.
Differential Revision: https://reviews.llvm.org/D32186
llvm-svn: 301215
|
#
5d977f8e |
| 20-Apr-2017 |
Yaxun Liu <Yaxun.Liu@amd.com> |
CodeGen: Let frame index value type match alloca addr space
Recently the alloca address space has been added to the data layout. Due to this change, the pointer returned by alloca may have a different size from a pointer in address space 0.
However, currently the value type of frame index is assumed to be of the same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it assumes alloca returning pointer to addr space 5 and its size is 32, which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
|
#
6825fb64 |
| 18-Apr-2017 |
Adrian Prantl <aprantl@apple.com> |
PR32382: Fix emitting complex DWARF expressions.
The DWARF specification knows 3 kinds of non-empty simple location descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends
The existing DwarfExpression code is pretty much ignorant of these restrictions. This used to not matter because we only emitted very short expressions that we happened to get right by accident. This patch makes DwarfExpression aware of the rules defined by the DWARF standard and now chooses the right kind of location description for each expression being emitted.
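A rough way to picture the three kinds is a classifier over opcode names. The Python sketch below is an illustration only and is not how DwarfExpression is implemented: it ignores operands, treats an expression as a list of opcode name strings, and the function name `classify_location` is hypothetical.

```python
IMPLICIT_TERMINATORS = ("DW_OP_stack_value", "DW_OP_implicit_value")


def classify_location(ops):
    """Classify a simple location description given as a list of opcode names."""
    if not ops:
        return "empty"
    # A register location consists of only a DW_OP_reg* operation.
    if len(ops) == 1 and ops[0].startswith("DW_OP_reg"):
        return "register"
    # An implicit location describes the value and ends with a value terminator.
    if ops[-1] in IMPLICIT_TERMINATORS:
        return "implicit"
    # Everything else computes the address of the variable.
    return "memory"
```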
This would have been an NFC commit (for the existing testsuite) if not for the way that clang describes captured block variables. Based on how the previous code in LLVM emitted locations, DW_OP_deref operations that should have come at the end of the expression are put at its beginning. Fixing this means changing the semantics of DIExpression, so this patch bumps the version number of DIExpression and implements a bitcode upgrade.
There are two major changes in this patch:
1. I had to fix the semantics of dbg.declare for describing function arguments. After this patch a dbg.declare always takes the *address* of a variable as the first argument, even if the argument is not an alloca.
2. When lowering a DBG_VALUE, the decision of whether to emit a register location description or a memory location description depends on the MachineLocation; register machine locations may get promoted to memory locations based on their DIExpression. (Future) optimization passes that want to salvage implicit debug location for variables may do so by appending a DW_OP_stack_value. For example:
   DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
   DBG_VALUE, RAX                            --> DW_OP_reg0 +0
   DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0
All testcases that were modified were regenerated from clang. I also added source-based testcases for each of these to the debuginfo-tests repository over the last week to make sure that no synchronized bugs slip in. The debuginfo-tests compile from source and run the debugger.
https://bugs.llvm.org/show_bug.cgi?id=32382 <rdar://problem/31205000>
Differential Revision: https://reviews.llvm.org/D31439
llvm-svn: 300522
|
#
fb502d2f |
| 14-Apr-2017 |
Reid Kleckner <rnk@google.com> |
[IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case.
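The indexing convention behind this change can be sketched as follows. The Python class is a toy model under the assumption, taken from the commit messages, that attribute index 0 belongs to the return value and argument N sits at attribute index N + 1; the class and method names are illustrative and are not LLVM's API.

```python
RETURN_INDEX = 0  # attribute-list slot of the return value in this model


class CallSiteModel:
    """Toy call site holding attribute sets keyed by attribute-list index."""

    def __init__(self, attrs_by_index):
        self.attrs_by_index = {i: set(a) for i, a in attrs_by_index.items()}

    def has_attr_at_index(self, attr_index, attr):
        """Raw lookup by attribute-list index (the old, error-prone interface)."""
        return attr in self.attrs_by_index.get(attr_index, set())

    def param_has_attr(self, arg_no, attr):
        """Lookup by zero-based argument number; hides the +1 offset."""
        return self.has_attr_at_index(arg_no + 1, attr)

    def has_return_attr(self, attr):
        """Lookup on the return value; hides the special index."""
        return self.has_attr_at_index(RETURN_INDEX, attr)
```

Callers then write `cs.param_has_attr(arg_no, "byval")` without repeating the offset, and the return value gets its own query instead of a magic index.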
llvm-svn: 300367
|