Revision tags: llvmorg-3.6.0 |
|
#
c93a9a2c |
| 25-Feb-2015 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bits wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1>, are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirror those supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations).
I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form).
The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work.
llvm-svn: 230413
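A hedged per-lane illustration of the boolean encoding described above (helper names hypothetical): each <4 x i1> lane occupies a double-precision lane, with -1.0 standing in for false and 1.0 for true.

    // Per-lane sketch of the QPX boolean representation.
    double encodeQPXBool(bool B) { return B ? 1.0 : -1.0; }
    bool decodeQPXBool(double D) { return D > 0.0; }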
|
Revision tags: llvmorg-3.6.0-rc4 |
|
#
5bedaf93 |
| 14-Feb-2015 |
Duncan P. N. Exon Smith <dexonsmith@apple.com> |
PowerPC: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind)
llvm-svn: 229224
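A minimal C++ sketch of the rewrite (helper names illustrative; both call forms are the APIs named in the message):

    #include "llvm/IR/Function.h"
    using namespace llvm;

    // Before: spell out the function-index attribute lookup.
    bool hasOld(const Function &F, Attribute::AttrKind Kind) {
      return F.getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind);
    }

    // After: the equivalent simplified accessor.
    bool hasNew(const Function &F, Attribute::AttrKind Kind) {
      return F.hasFnAttribute(Kind);
    }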
|
Revision tags: llvmorg-3.6.0-rc3 |
|
#
e6698d53 |
| 01-Feb-2015 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions
The TOC base pointer is passed in r2, and we normally reserve this register so that we can depend on it being there. However, for leaf functions, and specifically those leaf functions that don't do any TOC access of their own (which is generally due to accessing the constant pool, using TLS, etc.), we can treat r2 as an ordinary callee-saved register (it must be callee-saved because, for local direct calls, the linker will not insert any save/restore code).
The allocation order has been changed slightly for PPC64/ELF systems to put r2 at the end of the list (while leaving it near the beginning for Darwin systems to prevent unnecessary output changes). While r2 is allocatable, using it still requires spill/restore traffic, and thus comes at the end of the list.
llvm-svn: 227745
|
Revision tags: llvmorg-3.6.0-rc2 |
|
#
065d16bc |
| 30-Jan-2015 |
Eric Christopher <echristo@gmail.com> |
Migrate PPCRegisterInfo to use the cached subtarget.
llvm-svn: 227546
|
Revision tags: llvmorg-3.6.0-rc1 |
|
#
934361a4 |
| 14-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDN
Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs. These problems caused the original regression tests to assert/segfault on many (but not all) systems.
Original commit message:
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment!
llvm-svn: 225909
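A hedged IRBuilder sketch of requesting a patchpoint that satisfies the 24-byte PPC minimum noted above (the id and null call target are illustrative):

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    // Emit llvm.experimental.patchpoint.void with a shadow large enough for
    // the 24-byte PPC call sequence (16 would suffice on AArch64).
    void emitPatchpoint(Module *M, IRBuilder<> &B) {
      Function *PP = Intrinsic::getDeclaration(
          M, Intrinsic::experimental_patchpoint_void);
      Value *Args[] = {B.getInt64(42),   // illustrative patchpoint id
                       B.getInt32(24),   // num-bytes: at least 24 on PPC
                       Constant::getNullValue(B.getInt8PtrTy()), // no target
                       B.getInt32(0)};   // number of call arguments
      B.CreateCall(PP, Args);
    }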
|
#
63fb9281 |
| 13-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investigate buildbot failures (segfaulting in GetCostForDef at ScheduleDAGRRList.cpp:314).
llvm-svn: 225811
|
#
821befd5 |
| 13-Jan-2015 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Add StackMap/PatchPoint support
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment!
llvm-svn: 225808
|
Revision tags: llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2 |
|
#
fc6de428 |
| 05-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine-based caching lookups from the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
llvm-svn: 214838
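A hedged sketch of the access pattern this enables (helper name illustrative): target info can now be read straight off the MachineFunction rather than detouring through the TargetMachine.

    #include "llvm/CodeGen/MachineFunction.h"
    #include "llvm/Target/TargetSubtargetInfo.h"
    using namespace llvm;

    // Query the cached subtarget directly from the MachineFunction.
    const TargetRegisterInfo *getTRI(const MachineFunction &MF) {
      return MF.getSubtarget().getRegisterInfo();
    }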
|
#
d913448b |
| 04-Aug-2014 |
Eric Christopher <echristo@gmail.com> |
Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change.
llvm-svn: 214781
|
Revision tags: llvmorg-3.5.0-rc1 |
|
#
3ee2af7d |
| 18-Jul-2014 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend.
Patch by Justin Hibbits!
llvm-svn: 213427
|
#
ea147a9d |
| 11-Jul-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction using an out-of-range displacement:
%vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
%G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.)
Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account.
This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal.
Reviewed by Hal Finkel.
llvm-svn: 212832
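A minimal standalone sketch of the corrected check (hypothetical helper; D-form displacement case only): the offset already encoded in the instruction is folded in before the range test.

    #include <cstdint>

    // Fold the instruction's existing offset into the candidate offset,
    // then test the signed 16-bit D-form displacement range.
    bool isFrameOffsetLegalFixed(int64_t InstrOffset, int64_t Offset) {
      int64_t Total = InstrOffset + Offset;
      return Total >= INT16_MIN && Total <= INT16_MAX;
    }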
|
#
14bd521f |
| 27-Jun-2014 |
Ulrich Weigand <ulrich.weigand@de.ibm.com> |
[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel) generated invalid code like:
ld 0, 20936(1)   # 8-byte Folded Reload
stw 12, 10348(0)
stw 12, 10344(0)
The underlying vreg had been introduced as base register by the Local Stack Slot Allocation pass. That register was constrained to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match the ADDI instruction used to set it, but it was *not* constrained to G8RC_NOX0 to fit the *use* of the register in an address.
That should have happened in PPCRegisterInfo::resolveFrameIndex. This patch adds an appropriate constrainRegClass call.
Reviewed by Hal Finkel.
llvm-svn: 211897
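A hedged sketch of the added constraint (function and variable names illustrative; constrainRegClass and G8RC_NOX0 are the entities the message names):

    #include "llvm/CodeGen/MachineFunction.h"
    #include "llvm/CodeGen/MachineRegisterInfo.h"
    #include "PPCRegisterInfo.h" // in-tree header providing the PPC register classes

    // Narrow the frame base vreg's class so its use as an address base can
    // never be assigned X0.
    void constrainFrameBase(llvm::MachineFunction &MF, unsigned BaseReg) {
      MF.getRegInfo().constrainRegClass(BaseReg, &llvm::PPC::G8RC_NOX0RegClass);
    }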
|
Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2 |
|
#
84e68b29 |
| 22-Apr-2014 |
Chandler Carruth <chandlerc@gmail.com> |
[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition.
llvm-svn: 206842
|
#
d174b72a |
| 22-Apr-2014 |
Chandler Carruth <chandlerc@gmail.com> |
[cleanup] Lift using directives, DEBUG_TYPE definitions, and even some system headers above the includes of generated '.inc' files that actually contain code. In a few targets this was already done pretty consistently, but it wasn't done *really* consistently anywhere. It is strictly cleaner IMO and necessary in a bunch of places where the DEBUG_TYPE is referenced from the generated code. Consistency with the necessary places trumps. Hopefully the build bots are OK with the movement of intrin.h...
llvm-svn: 206838
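A hedged illustration of the resulting ordering (file and DEBUG_TYPE names hypothetical): ordinary and system headers, then using directives and DEBUG_TYPE, and only then the generated '.inc' files whose code may reference them.

    #include "PPC.h"        // ordinary headers first
    #include <cmath>        // system headers too

    using namespace llvm;   // using directives above generated code

    #define DEBUG_TYPE "ppc-example"  // hypothetical debug type

    #define GET_INSTRINFO_CTOR_DTOR
    #include "PPCGenInstrInfo.inc"    // generated code may reference DEBUG_TYPE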
|
Revision tags: llvmorg-3.4.1-rc1 |
|
#
840beec2 |
| 04-Apr-2014 |
Craig Topper <craig.topper@gmail.com> |
Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
llvm-svn: 205610
|
#
36c49533 |
| 02-Apr-2014 |
Jim Grosbach <grosbach@apple.com> |
Simplify resolveFrameIndex() signature.
Just pass a MachineInstr reference rather than an MBB iterator. Creating a MachineInstr& is the first thing every implementation did anyway.
llvm-svn: 205453
|
#
19be506a |
| 29-Mar-2014 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This worked, but was suboptimal because we would always spill 16-byte values even though we almost always had scalar 8-byte values. This resulted in an increase in stack-size use, extra memory bandwidth, etc. To fix this, I've added 64-bit subregisters of the Altivec registers, and combined those with the existing scalar floating-point registers to form a class of VSX scalar floating-point registers. The ABI code has also been enhanced to use this register class and some other necessary improvements have been made.
llvm-svn: 205075
|
#
27774d92 |
| 13-Mar-2014 |
Hal Finkel <hfinkel@anl.gov> |
[PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure.
The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the first 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below).
Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However:
- CodeGen support causes miscompiles; test-suite runtime failures:
MultiSource/Benchmarks/FreeBench/distray/distray
MultiSource/Benchmarks/McCat/08-main/main
MultiSource/Benchmarks/Olden/voronoi/voronoi
MultiSource/Benchmarks/mafft/pairlocalalign
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
SingleSource/Benchmarks/CoyoteBench/almabench
SingleSource/Benchmarks/Misc/matmul_f64_4x4
- The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be.
- A lot of unnecessary copies make it past the optimizers, and this needs to be fixed.
- Many more regression tests are needed.
Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work on this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures.
llvm-svn: 203768
|
#
1da35121 |
| 12-Mar-2014 |
Patrik Hagglund <patrik.h.hagglund@ericsson.com> |
Replace '#include ValueTypes.h' with forward declarations.
In some cases the include is pushed "downstream" (or removed if unused).
llvm-svn: 203644
|
#
940ab934 |
| 28-Feb-2014 |
Hal Finkel <hfinkel@anl.gov> |
Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages:
- Reduction in register pressure (because we no longer need GPRs to store boolean values).
- Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for).
- On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit.
Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations).
This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger.
It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in:
trunc(binary-ops(binary-ops(zext(x), zext(y)), ...))
and:
zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...))
In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive).
POWER7 test-suite performance results (from 10 runs in each configuration):
SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup
SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown
llvm-svn: 202451
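A hedged C++ illustration of that distinction (function names hypothetical): the first function's i1 values come from comparisons and feed a select, the CR-bit-friendly case; the second is a pure bitwise i1 computation better left in GPRs.

    // Comparisons feeding a select: the case where CR bits pay off.
    int selectSide(int x, int y) { return (x < y) && (y != 0) ? x : y; }

    // i1 values produced and consumed by integer logic alone: keep in GPRs.
    unsigned gprSide(unsigned x, unsigned y) { return (x & 1) ^ (y & 1); }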
|
Revision tags: llvmorg-3.4.0, llvmorg-3.4.0-rc3, llvmorg-3.4.0-rc2, llvmorg-3.4.0-rc1 |
|
#
e90fd9c5 |
| 07-Oct-2013 |
Rafael Espindola <rafael.espindola@gmail.com> |
Remove getEHExceptionRegister and getEHHandlerRegister.
They haven't been used for a long time. Patch by MathOnNapkins.
llvm-svn: 192099
|
#
8d86fe7d |
| 30-Aug-2013 |
Bill Schmidt <wschmidt@linux.vnet.ibm.com> |
[PowerPC] Add handling for conversions to fast-isel.
Yet another chunk of fast-isel code. This one handles various conversions involving floating-point. (It also includes some miscellaneous handling throughout the back end for LWA_32 and LWAX_32 that should have been part of the load-store patch.)
llvm-svn: 189677
|
#
a5c536e1 |
| 01-Aug-2013 |
Bill Wendling <isanbard@gmail.com> |
Use function attributes to indicate that we don't want to realign the stack.
Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure.
llvm-svn: 187618
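A hedged sketch of the new query style (the attribute string is the one used for this purpose; the helper is illustrative):

    #include "llvm/IR/Function.h"
    using namespace llvm;

    // Ask the function itself, not a target-options structure, whether
    // stack realignment was disabled.
    bool realignDisabled(const Function &F) {
      return F.getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                            "no-realign-stack");
    }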
|
#
1860763c |
| 18-Jul-2013 |
Hal Finkel <hfinkel@anl.gov> |
PPC: Support dynamic allocas with large alignment
Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes).
In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger.
llvm-svn: 186562
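The rounding described above, as a hedged standalone computation (helper name hypothetical; alignment assumed to be a power of two):

    #include <cstdint>

    // Round the requested allocation size up to the requested alignment;
    // holding this intermediate is what costs the extra scavenged register.
    uint64_t roundUpSize(uint64_t Size, uint64_t Align) {
      return (Size + Align - 1) & ~(Align - 1);
    }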
|
#
f05d6c78 |
| 17-Jul-2013 |
Hal Finkel <hfinkel@anl.gov> |
PPC: Add base-pointer support to builtin setjmp/longjmp
First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed.
Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be.
Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way.
llvm-svn: 186545
|