#
720ab84b |
| 03-Mar-2015 |
Eric Christopher <echristo@gmail.com> |
Add a comment above findRepresentativeClass explaining why it's where it is so that future generations can understand.
llvm-svn: 231111
|
#
23a3a7c8 |
| 26-Feb-2015 |
Eric Christopher <echristo@gmail.com> |
Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass, which uses it for register class iteration, and passing a subtarget into a few target-specific initializations of TargetLowering.
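A rough sketch of the resulting call shape on the target side (illustrative, not the literal patch; STI stands in for whatever subtarget object the target constructor has in hand):

    // In a target's TargetLowering constructor: the register info is now
    // handed in explicitly instead of being fetched via getSubtargetImpl().
    computeRegisterProperties(STI.getRegisterInfo());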
llvm-svn: 230583
|
#
75dbd7ca |
| 25-Feb-2015 |
Eric Christopher <echristo@gmail.com> |
Move TargetLoweringBase::getTypeConversion to the .cpp file from the .h file. It's used in only one place (other than recursively) and there's no need to include it everywhere.
Saves almost 900k from total llvm object file size.
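The mechanics in miniature, with hypothetical names rather than the actual LLVM code:

    // Lowering.h: declaration only, so every translation unit that includes
    // the header stops instantiating (and carrying) the large body.
    struct LegalizeResult { int Action; int NewVT; };
    class LoweringBase {
    public:
      LegalizeResult getTypeConversion(int VT) const;  // moved out of line
    };

    // Lowering.cpp: the single out-of-line definition, emitted once.
    LegalizeResult LoweringBase::getTypeConversion(int VT) const {
      // Placeholder body standing in for the large, partly recursive original.
      return VT > 0 ? getTypeConversion(VT - 1) : LegalizeResult{0, VT};
    }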
llvm-svn: 230561
|
Revision tags: llvmorg-3.6.0 |
|
#
0dc54c4d |
| 20-Feb-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Add generic fmad DAG node.
This allows sharing of FMA forming combines to work with instructions that have the same semantics as a separate multiply and add.
This is Expand by default and only formed post-legalization, so it shouldn't have much impact on targets that do not want it.
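A minimal sketch of the target-side opt-in (assuming the usual setOperationAction interface, placed in the target's TargetLowering constructor):

    // A target whose mad instruction has the same rounding behaviour as a
    // separate multiply and add marks the new node Legal; everyone else keeps
    // the default Expand action.
    setOperationAction(ISD::FMAD, MVT::f32, Legal);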
llvm-svn: 230070
|
Revision tags: llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3, llvmorg-3.6.0-rc2 |
|
#
8b770651 |
| 26-Jan-2015 |
Eric Christopher <echristo@gmail.com> |
Move DataLayout back to the TargetMachine from TargetSubtargetInfo derived classes.
Since global data alignment, layout, and mangling are often based on the DataLayout, move it to the TargetMachine. This ensures that global data will be laid out and mangled consistently even if the subtarget changes on a per-function basis. Prior to this, all targets(*) have had subtarget-dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The R600 port has the size of pointers as a subtarget feature, and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget-dependent features.
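On the caller side the move looks roughly like this (sketch; the exact accessors of this era may differ slightly):

    // Before: DataLayout reached through the subtarget, which may change per
    // function.
    //   const DataLayout *DL = TM.getSubtargetImpl()->getDataLayout();
    // After: a single DataLayout owned by the TargetMachine itself.
    const DataLayout *DL = TM.getDataLayout();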
llvm-svn: 227113
|
Revision tags: llvmorg-3.6.0-rc1 |
|
#
bf0db918 |
| 13-Jan-2015 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles.
llvm-svn: 225828
|
#
67dd2d25 |
| 07-Jan-2015 |
Ahmed Bougacha <ahmed.bougacha@gmail.com> |
[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387.
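A representative before/after for one such loop (sketch, using the MVT range helpers introduced around r225387):

    // Old style: raw enum iteration over the vector value types.
    //   for (unsigned i = MVT::FIRST_VECTOR_VALUETYPE;
    //        i <= MVT::LAST_VECTOR_VALUETYPE; ++i)
    //     setOperationAction(ISD::FADD, (MVT::SimpleValueType)i, Expand);

    // New style: range-based for over the same subset.
    for (MVT VT : MVT::vector_valuetypes())
      setOperationAction(ISD::FADD, VT, Expand);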
llvm-svn: 225392
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2 |
|
#
fc2201e9 |
| 17-Dec-2014 |
Quentin Colombet <qcolombet@apple.com> |
[CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: the type promotion helper does not support vector types, so we make sure it does not kick in in such cases.
Original commit message: [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible.
Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet.
** Context **
Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such a pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern.
** Motivating Example **
Let us consider an example:
    define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
      %ld = load i8* %addr1
      %zextld = zext i8 %ld to i32
      %ld2 = load i32* %addr2
      %add = add nsw i32 %ld2, %zextld
      %sextadd = sext i32 %add to i64
      %zexta = zext i8 %a to i32
      %addza = add nsw i32 %zexta, %zextld
      %sextaddza = sext i32 %addza to i64
      %addb = add nsw i32 %b, %zextld
      %sextaddb = sext i32 %addb to i64
      call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
      ret void
    }
As it is, this IR generates the following assembly on x86_64:
    [...]
    movzbl (%rdi), %eax  # zero-extended load
    movl   (%rsi), %esi  # plain load
    addl   %eax, %esi    # 32-bit add
    movslq %esi, %rdi    # sign extend the result of add
    movzbl %dl, %edx     # zero extend the first argument
    addl   %eax, %edx    # 32-bit add
    movslq %edx, %rsi    # sign extend the result of add
    addl   %eax, %ecx    # 32-bit add
    movslq %ecx, %rdx    # sign extend the result of add
    [...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
Now, by promoting the additions to form more extended loads we would generate:
    [...]
    movzbl (%rdi), %eax  # zero-extended load
    movslq (%rsi), %rdi  # sign-extended load
    addq   %rax, %rdi    # 64-bit add
    movzbl %dl, %esi     # zero extend the first argument
    addq   %rax, %rsi    # 64-bit add
    movslq %ecx, %rdx    # sign extend the second argument
    addq   %rax, %rdx    # 64-bit add
    [...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
This kind of sequence happens a lot in code using 32-bit indices on 64-bit architectures.
Note: The throughput numbers are similar on Sandy Bridge and Haswell.
** Proposed Solution **
To avoid the penalty of all these sign/zero extensions, we merge them into the loads at the beginning of the chain of computation by promoting the whole chain of computation to the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation:
    ext(promotableInst1(...(promotableInstN(load))))
    => promotedInst1(...(promotedInstN(ext(load))))
The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch.
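The opt-in itself is a one-liner in the interested target's TargetLowering constructor; the flag name below is an assumption from memory, not a quote of the patch:

    // Assumed member name: enables the promotion-based "move ext to load"
    // path; targets that leave it unset keep the pre-patch behaviour.
    EnableExtLdPromotion = true;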
** Performance **
Configuration: x86_64, Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested optimization levels: O3/Os.
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements: CINT2006/473.astar: ~2%, Benchmarks/PAQ8p: ~2%, Misc/perlin: ~3%
The results are consistent for both O3 and Os.
<rdar://problem/18310086>
llvm-svn: 224402
|
#
04b69f89 |
| 17-Dec-2014 |
Reid Kleckner <reid@kleckner.net> |
Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building ICU.
llvm-svn: 224397
|
#
d5e57b73 |
| 16-Dec-2014 |
Quentin Colombet <qcolombet@apple.com> |
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible.
Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet.
** Context **
Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such a pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern.
** Motivating Example **
Let us consider an example:
    define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
      %ld = load i8* %addr1
      %zextld = zext i8 %ld to i32
      %ld2 = load i32* %addr2
      %add = add nsw i32 %ld2, %zextld
      %sextadd = sext i32 %add to i64
      %zexta = zext i8 %a to i32
      %addza = add nsw i32 %zexta, %zextld
      %sextaddza = sext i32 %addza to i64
      %addb = add nsw i32 %b, %zextld
      %sextaddb = sext i32 %addb to i64
      call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
      ret void
    }
As it is, this IR generates the following assembly on x86_64:
    [...]
    movzbl (%rdi), %eax  # zero-extended load
    movl   (%rsi), %esi  # plain load
    addl   %eax, %esi    # 32-bit add
    movslq %esi, %rdi    # sign extend the result of add
    movzbl %dl, %edx     # zero extend the first argument
    addl   %eax, %edx    # 32-bit add
    movslq %edx, %rsi    # sign extend the result of add
    addl   %eax, %ecx    # 32-bit add
    movslq %ecx, %rdx    # sign extend the result of add
    [...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.
Now, by promoting the additions to form more extended loads we would generate:
    [...]
    movzbl (%rdi), %eax  # zero-extended load
    movslq (%rsi), %rdi  # sign-extended load
    addq   %rax, %rdi    # 64-bit add
    movzbl %dl, %esi     # zero extend the first argument
    addq   %rax, %rsi    # 64-bit add
    movslq %ecx, %rdx    # sign extend the second argument
    addq   %rax, %rdx    # 64-bit add
    [...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.
This kind of sequence happens a lot in code using 32-bit indices on 64-bit architectures.
Note: The throughput numbers are similar on Sandy Bridge and Haswell.
** Proposed Solution **
To avoid the penalty of all these sign/zero extensions, we merge them into the loads at the beginning of the chain of computation by promoting the whole chain of computation to the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation:
    ext(promotableInst1(...(promotableInstN(load))))
    => promotedInst1(...(promotedInstN(ext(load))))
The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch.
** Performance **
Configuration: x86_64, Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested optimization levels: O3/Os.
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements: CINT2006/473.astar: ~2%, Benchmarks/PAQ8p: ~2%, Misc/perlin: ~3%
The results are consistent for both O3 and Os.
<rdar://problem/18310086>
llvm-svn: 224351
|
Revision tags: llvmorg-3.5.1-rc1 |
|
#
0365f1a3 |
| 01-Dec-2014 |
Philip Reames <listmail@philipreames.com> |
[Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
The STATEPOINT pseudo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifying all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.)
The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point.
Reviewed by: atrick, ributzka
llvm-svn: 223085
|
#
2bfd9129 |
| 29-Nov-2014 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.
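Typical shape of the tidy-up (illustrative fragment; the variable on the last line is made up):

    // Before: chained enum comparisons such as
    //   TT.getOS() == Triple::Darwin || TT.getOS() == Triple::MacOSX ||
    //   TT.getOS() == Triple::IOS
    // After: one helper covers the whole OS family.
    if (TT.isOSDarwin())
      UseMachOMangling = true;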
llvm-svn: 222960
|
#
6438fc3d |
| 17-Nov-2014 |
Craig Topper <craig.topper@gmail.com> |
Replace a couple asserts with static_asserts.
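The general idea, shown on a made-up condition rather than the ones actually touched:

    // assert() only fires at run time (and only in +Asserts builds); when the
    // condition is known at compile time, static_assert diagnoses it at build
    // time with no run-time cost.
    static_assert(sizeof(void *) >= 4,
                  "this code assumes pointers of at least 32 bits");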
llvm-svn: 222114
|
#
30531556 |
| 13-Nov-2014 |
Aditya Nandakumar <aditya_nandakumar@apple.com> |
We can get the TLOF from the TargetMachine, so the constructor no longer requires a TargetLoweringObjectFile to be passed.
llvm-svn: 221926
|
#
a2719329 |
| 13-Nov-2014 |
Aditya Nandakumar <aditya_nandakumar@apple.com> |
This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets can share the TLOF effectively.
llvm-svn: 221878
|
#
e02b1a06 |
| 31-Oct-2014 |
Hao Liu <Hao.Liu@arm.com> |
PR20557: Fix a bug where a bogus cpu parameter crashes llc on the AArch64 backend. Initial patch by Oleg Ranevskyy.
llvm-svn: 220945
|
#
7c93690b |
| 21-Oct-2014 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Add minnum / maxnum codegen
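Sketch of the target-side opt-in for the new nodes (the ISD node names are the ones this change introduces; the rest is illustrative):

    // Targets with native IEEE-754 minNum/maxNum instructions mark the nodes
    // Legal; others keep the default lowering.
    setOperationAction(ISD::FMINNUM, MVT::f32, Legal);
    setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);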
llvm-svn: 220342
|
#
79cc1e3a |
| 02-Sep-2014 |
Eric Christopher <echristo@gmail.com> |
Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 216982
|
Revision tags: llvmorg-3.5.0, llvmorg-3.5.0-rc4 |
|
#
2cdea4c4 |
| 21-Aug-2014 |
Sanjay Patel <spatel@rotateright.com> |
name change: isPow2DivCheap -> isPow2SDivCheap
isPow2DivCheap
That name doesn't specify signed or unsigned.
Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
srl/add/sra
is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
sra/srl/add/sra
That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
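For concreteness, a small standalone C++ model of that four-operation sequence (an illustration, not the DAGCombiner code; it assumes arithmetic right shift on signed values, as all mainstream compilers provide):

    #include <cstdint>
    #include <cstdio>

    // x sdiv 2^k via sra/srl/add/sra: bias negative inputs by (2^k - 1) so the
    // final arithmetic shift rounds toward zero, matching sdiv semantics.
    static int32_t sdivPow2(int32_t x, unsigned k) {
      int32_t sign = x >> 31;                      // sra: all ones if x < 0
      uint32_t bias = uint32_t(sign) >> (32 - k);  // srl: 2^k - 1 if x < 0, else 0
      return (x + int32_t(bias)) >> k;             // add, then the final sra
    }

    int main() {
      std::printf("%d %d\n", sdivPow2(-7, 3), -7 / 8);  // prints "0 0"
      std::printf("%d %d\n", sdivPow2(-9, 3), -9 / 8);  // prints "-1 -1"
    }

For k = 1 the bias is just the sign bit, which is why the first 'sra' can be dropped in that special case.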
This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D5010
llvm-svn: 216237
|
Revision tags: llvmorg-3.5.0-rc3 |
|
#
caa56588 |
| 08-Aug-2014 |
Pedro Artigas <partigas@apple.com> |
Added a TLI hook to signal that the target does not have, or does not care about, floating-point exceptions, and added use of the flag to fold potentially exception-raising floating-point math in the selection DAG. No functionality change, as targets have to explicitly ask for this behavior and none does today.
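A sketch of what opting in presumably looks like; the setter name is an assumption, not verified against the patch:

    // A target that neither has nor models floating-point exceptions tells
    // TLI, and the selection DAG may then constant-fold FP math that could
    // otherwise raise an exception.
    setHasFloatingPointExceptions(false);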
llvm-svn: 215222
|
#
b9fd9ed3 |
| 07-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Ha
Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
|
#
f8b27c41 |
| 07-Aug-2014 |
Rafael Espindola <rafael.espindola@gmail.com> |
Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
|
Revision tags: llvmorg-3.5.0-rc2 |
|
#
d913448b |
| 04-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change.
llvm-svn: 214781
|
Revision tags: llvmorg-3.5.0-rc1 |
|
#
f7a02c17 |
| 21-Jul-2014 |
Tim Northover <tnorthover@apple.com> |
CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc, and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation path is now uniform, regardless of the input IR:
fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
fpext -> FP16_TO_FP (if f16 illegal) -> libcall
Each target should be able to select the version that best matches its operations and not be required to duplicate patterns for both fptrunc and FP_TO_FP16 (for example).
As a result we can remove some redundant AArch64 patterns.
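Roughly what the target-side choice looks like (sketch; the action values and libcall names are common defaults, assumed here rather than quoted from the patch):

    // A target with hardware f32<->f16 conversions keeps the nodes Legal (or
    // selects them directly); otherwise the default expansion ends in the
    // __gnu_f2h_ieee / __gnu_h2f_ieee libcalls.
    setOperationAction(ISD::FP_TO_FP16, MVT::f32, Legal);
    setOperationAction(ISD::FP16_TO_FP, MVT::f32, Legal);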
llvm-svn: 213507
|
#
20bd0ced |
| 18-Jul-2014 |
Tim Northover <tnorthover@apple.com> |
CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics for most targets; it'll be more efficient to promote the 4 basic operations to f32 than to libcall them.
llvm-svn: 213372
|