Revision tags: llvmorg-4.0.1-rc2

# d526b13e | 09-May-2017 | Serge Pavlov <sepavloff@gmail.com>
Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with the inalloca attribute creates problems for verification of the machine representation. This attribute instructs the backend that the argument is prepared on the stack prior to the CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). The frame size stored in CALLSEQ_START in this case does not count the size of this argument. However, CALLSEQ_END still keeps the total frame size, as the caller can be responsible for cleanup of the entire frame. So CALLSEQ_START and CALLSEQ_END record different frame sizes, and the difference is treated by MachineVerifier as a stack error. Currently there is no way to distinguish this case from actual errors.
This patch adds an additional argument to CALLSEQ_START and its target-specific counterparts to record the size of the stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate the actual frame size associated with the frame setup instruction and to handle the case of inalloca arguments correctly.
The changes made by the patch are:
- Frame setup instructions get a second mandatory argument. This affects all targets that use frame pseudo instructions and touches many files, although the changes are uniform.
- Access to frame properties is implemented through dedicated accessor methods rather than calls to getOperand(N).getImm(). For X86 and ARM such a replacement was made previously.
- Changes that reflect the appearance of the additional argument of the frame setup instruction. These involve proper instruction initialization and the methods that access instruction arguments.
- MachineVerifier retrieves the frame size through a method that reports the sum of the frame parts initialized inside the frame instruction pair and outside it.
The patch implements the approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests that fail with the machine verifier enabled, listed in PR27481.
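A minimal sketch of the accessor shape described above, assuming that frame setup pseudos keep the adjustment in operand 0 and the pre-set-up part in operand 1; the helper names here are illustrative, and the in-tree versions live on TargetInstrInfo with additional asserts:

    #include "llvm/CodeGen/MachineInstr.h"

    using namespace llvm;

    // Sketch: operand 0 holds the size set up inside the CALLSEQ_START..CALLSEQ_END
    // pair; operand 1 (new in this patch) holds the part set up before it, e.g. an
    // inalloca argument.
    static int64_t getFrameSize(const MachineInstr &I) {
      return I.getOperand(0).getImm();
    }

    static int64_t getFrameTotalSize(const MachineInstr &I, bool IsFrameSetup) {
      if (IsFrameSetup)
        return getFrameSize(I) + I.getOperand(1).getImm();
      return getFrameSize(I); // the frame-destroy side already records the total
    }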
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527

Revision tags: llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1

# e5a22f44 | 27-Jul-2016 | Duncan P. N. Exon Smith <dexonsmith@apple.com>
PowerPC: Avoid implicit iterator conversions, NFC
Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr& over MachineInstr* when a pointer isn't nullable and using range-based for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch, where a condition checked a pointer converted from an iterator for nullptr. Since this case is impossible (moreover, the code above guarantees that the iterator is valid), I removed the check when I changed the pointer to a reference.
Despite that case, there should be no functionality change here.
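A hedged illustration of the preferred pattern; the loop body is hypothetical, and only the iterator/reference idiom is the point:

    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstr.h"

    using namespace llvm;

    static unsigned countRealInstrs(MachineBasicBlock &MBB) {
      // Old style (sketch): the iterator converted implicitly to MachineInstr*,
      // hiding whether the pointer could ever be null.
      //   for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); ++I) {
      //     MachineInstr *MI = I;   // implicit iterator-to-pointer conversion
      //     ...
      //   }

      // Preferred: a range-based loop over non-null references.
      unsigned Count = 0;
      for (MachineInstr &MI : MBB)
        if (!MI.isDebugInstr())
          ++Count;
      return Count;
    }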
llvm-svn: 276864

# 3bc1edf9 | 02-Jul-2016 | Benjamin Kramer <benny.kra@googlemail.com>
Use arrays or initializer lists to feed ArrayRefs instead of SmallVector where possible.
No functionality change intended.
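To illustrate the cleanup pattern, here is a generic, hypothetical example rather than one of the actual PowerPC call sites:

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/SmallVector.h"

    using namespace llvm;

    static int sumAll(ArrayRef<int> Values) {
      int Sum = 0;
      for (int V : Values)
        Sum += V;
      return Sum;
    }

    static int caller() {
      // Before (sketch): a SmallVector built up only to pass it as an ArrayRef.
      SmallVector<int, 3> Tmp;
      Tmp.push_back(1);
      Tmp.push_back(2);
      Tmp.push_back(3);
      int A = sumAll(Tmp);

      // After: an initializer list (or a local array) binds directly to ArrayRef.
      int B = sumAll({1, 2, 3});
      return A + B;
    }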
llvm-svn: 274431

Revision tags: llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1

# bfcff385 | 08-Jan-2016 | Kyle Butt <kyle+llvm@iteratee.net>
Add call sequence start and end for __tls_get_addr
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.
For a PIC TLS variable access in a function, the prologue (mflr followed by std and stdu) gets scheduled after a __tls_get_addr call. __tls_get_addr clobbers LR, but nothing saves or restores it.
Also added a test for saving and restoring clobbered registers across a call to __tls_get_addr.
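A heavily hedged sketch of the general shape of the fix: bracketing the lowered call with CALLSEQ_START/CALLSEQ_END markers so scheduling and frame lowering treat it as a real call site. The helper below is hypothetical, SelectionDAG signatures vary between LLVM versions, and the actual PPC lowering differs in detail:

    #include "llvm/CodeGen/SelectionDAG.h"

    using namespace llvm;

    // Sketch: wrap a target-specific "call" node in CALLSEQ_START/CALLSEQ_END so
    // nothing, prologue code included, is scheduled into the call sequence.
    static SDValue wrapInCallSeq(SelectionDAG &DAG, SDValue Chain, const SDLoc &dl,
                                 unsigned Opcode, SDVTList VTs, SDValue Target) {
      Chain = DAG.getCALLSEQ_START(Chain, 0, 0, dl);
      SDValue Call = DAG.getNode(Opcode, dl, VTs, Chain, Target);
      Chain = Call.getValue(1); // assumes the node produces a chain result
      Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(0, dl, true),
                                 DAG.getIntPtrConstant(0, dl, true), SDValue(), dl);
      return Chain;
    }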
Patch by Tim Shen
llvm-svn: 257137

Revision tags: llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1

# f00654e3 | 23-Jun-2015 | Alexander Kornienko <alexfh@google.com>
Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.
llvm-svn: 240390

# 70bc5f13 | 19-Jun-2015 | Alexander Kornienko <alexfh@google.com>
Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:
    tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
      -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
      llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137

# 82e1fc5f | 21-May-2015 | Hal Finkel <hfinkel@anl.gov>
[PPC] Correct iterator bug in PPCTLSDynamicCall
Unfortunately, I can't reduce a small test case for this (although compiling mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the problem is fairly clear once you know what you're looking for. If the TLS instruction being replaced was at the end of the block, we'd increment the iterator past it (so it would then point to MBB.end()), and then increment it again as part of the for statement, thus overrunning the end of the list. Don't do that.
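A hypothetical sketch of the bug pattern and its fix, assuming a made-up expandTLSCall helper; the real pass differs in detail:

    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstr.h"

    using namespace llvm;

    // Hypothetical stand-in for the real expansion; returns true if MI was replaced.
    static bool expandTLSCall(MachineBasicBlock &MBB, MachineInstr &MI) {
      (void)MBB;
      (void)MI;
      return false;
    }

    static void processBlock(MachineBasicBlock &MBB) {
      // Buggy shape (sketch): if MI is the last instruction, the extra ++I already
      // reaches MBB.end(), and the loop's own increment then walks past the end.
      //
      //   for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); ++I) {
      //     MachineInstr &MI = *I;
      //     if (expandTLSCall(MBB, MI))
      //       ++I;               // skip the replacement... or fall off the list
      //   }

      // Fixed shape: advance the iterator exactly once per step, before any
      // replacement, so an end iterator is never incremented.
      for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end(); I != E;) {
        MachineInstr &MI = *I;
        ++I; // move past MI first; replacing or erasing MI cannot invalidate I now
        expandTLSCall(MBB, MI);
      }
    }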
llvm-svn: 237974

Revision tags: llvmorg-3.6.1, llvmorg-3.6.1-rc1, llvmorg-3.5.2, llvmorg-3.5.2-rc1, llvmorg-3.6.0

# 1df0c519 | 20-Feb-2015 | Eric Christopher <echristo@gmail.com>
Get the cached subtarget off the MachineFunction rather than inquiring for a new one from the TargetMachine.
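A small sketch of the pattern this change applies, assuming access to the PPC backend-internal PPCSubtarget header; the surrounding function is hypothetical:

    #include "PPCSubtarget.h" // PPC backend-internal header (assumption)
    #include "llvm/CodeGen/MachineFunction.h"

    using namespace llvm;

    static bool hasAltivec(const MachineFunction &MF) {
      // Preferred: use the subtarget the MachineFunction was created with, which
      // reflects per-function target attributes.
      const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
      return Subtarget.hasAltivec();

      // Avoided: asking the TargetMachine for a subtarget again, e.g. via
      // TM.getSubtargetImpl(), which bypasses the cached per-function object.
    }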
llvm-svn: 230037

Revision tags: llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3

# 82f1c775 | 10-Feb-2015 | Bill Schmidt <wschmidt@linux.vnet.ibm.com>
[PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.
We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass, which runs prior to RA, now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and it fixes these up after the replacement sequences are introduced.
Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed.
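A brief sketch of how a machine pass declares that it requires and preserves these analyses, as described above. The pass itself is hypothetical, and the analysis class names match the era of this commit; newer LLVM trees wrap these analyses differently for the legacy pass manager:

    #include "llvm/CodeGen/LiveIntervals.h"
    #include "llvm/CodeGen/MachineFunctionPass.h"
    #include "llvm/CodeGen/SlotIndexes.h"

    using namespace llvm;

    namespace {
    struct ExampleTLSExpandPass : public MachineFunctionPass {
      static char ID;
      ExampleTLSExpandPass() : MachineFunctionPass(ID) {}

      void getAnalysisUsage(AnalysisUsage &AU) const override {
        AU.addRequired<LiveIntervals>();
        AU.addPreserved<LiveIntervals>();
        AU.addRequired<SlotIndexes>();
        AU.addPreserved<SlotIndexes>();
        MachineFunctionPass::getAnalysisUsage(AU);
      }

      bool runOnMachineFunction(MachineFunction &MF) override {
        // The real pass would expand the combined addi+call pseudo here, adding
        // explicit GPR3 defs and clobbers, and then repair LiveIntervals and
        // SlotIndexes for the instructions it introduced.
        return false;
      }
    };
    } // end anonymous namespace

    char ExampleTLSExpandPass::ID = 0;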
llvm-svn: 228725

# 81638eab | 04-Feb-2015 | Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Replace tabs with spaces from r228116. Oops.
llvm-svn: 228117

# 1354f7c5 | 04-Feb-2015 | Bill Schmidt <wschmidt@linux.vnet.ibm.com>
[PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cpp
llvm-svn: 228116

# 685aa8b0 | 03-Feb-2015 | Bill Schmidt <wschmidt@linux.vnet.ibm.com>
[PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and global-dynamic TLS models.
In my original implementation, calls to __tls_get_addr were hidden from view until the asm-printer phase, at which point the underlying branch-and-link instruction was created with proper relocations. This mostly worked well, but I used some repellent techniques to ensure that the TLS_GET_ADDR nodes at the SD and MI levels correctly received input from GPR3 and produced output into GPR3. This proved to work badly in the presence of multiple TLS variable accesses, with the copies to and from GPR3 being scheduled incorrectly and generally creating havoc.
In r221703, I addressed that problem by representing the calls to __tls_get_addr as true calls during instruction lowering. This had the advantage of removing all of the bad hacks and relying on the existing call machinery to properly glue the copies in place. It looked like this was going to be the right way to go.
However, as a side effect of the recent discovery of problems with linker optimizations for TLS, we discovered cases of suboptimal code generation with this strategy. The problem comes when __tls_get_addr is called for the same address, and there is a resulting CSE opportunity. It turns out that in such cases MachineCSE will common the addis/addi instructions that set up the input value to __tls_get_addr, but will not common the calls themselves. MachineCSE does not have any machinery to common idempotent calls. This is perfectly sensible, since presumably this would be done at the IR level, and introducing calls in the back end isn't commonplace. In any case, we end up with two calls to __tls_get_addr when one would suffice, and that isn't good.
I presumed that the original design would have allowed commoning of the machine-specific nodes that hid the __tls_get_addr calls, so as suggested by Ulrich Weigand, I went back to that design and cleaned it up so that the copies were properly held together by glue nodes. However, it turned out that this didn't work either: the presence of copies to physical registers kept the machine-specific nodes from being commoned as well.
All of which leads to the design presented here. This is a return to the original design, except that no attempt is made to introduce copies to and from GPR3 during instruction lowering. Virtual registers are used until just prior to register allocation. At that point, a special pass is run that identifies the machine-specific nodes that hide the __tls_get_addr calls and introduces the copies to and from GPR3 around them. The register allocator then coalesces these copies away. With this design, MachineCSE succeeds in commoning __tls_get_addr calls where possible, and we get nice optimal code generation (better than GCC at the moment, which does not common these calls).
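A hedged sketch of the copy insertion that such a pre-RA pass performs around the hidden-call pseudo, using made-up operand positions; the real PPCTLSDynamicCall code is more involved and also updates the live-range analyses:

    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstrBuilder.h"
    #include "llvm/CodeGen/TargetInstrInfo.h"
    #include "llvm/CodeGen/TargetOpcodes.h"
    #include <iterator>

    using namespace llvm;

    // Sketch: surround a pseudo that hides a __tls_get_addr call with copies to and
    // from the ABI argument/return register (GPR3), so virtual registers are used
    // everywhere else and the register allocator can coalesce the copies away.
    static void wrapWithGPR3Copies(MachineBasicBlock &MBB, MachineInstr &MI,
                                   const TargetInstrInfo *TII, Register GPR3) {
      DebugLoc DL = MI.getDebugLoc();
      Register OutReg = MI.getOperand(0).getReg(); // assumed: result vreg
      Register InReg = MI.getOperand(1).getReg();  // assumed: address vreg

      // %gpr3 = COPY %in  (feed the argument register before the hidden call)
      BuildMI(MBB, MI, DL, TII->get(TargetOpcode::COPY), GPR3).addReg(InReg);

      // %out = COPY %gpr3 (collect the result after the hidden call)
      BuildMI(MBB, std::next(MI.getIterator()), DL,
              TII->get(TargetOpcode::COPY), OutReg).addReg(GPR3);

      // The real pass also rewrites MI's operands to use GPR3 explicitly; omitted.
    }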
One additional problem must be dealt with: After introducing the mentions of the physical register GPR3, the aggressive anti-dependence breaker sees opportunities to improve scheduling by selecting a different register instead. Flags must be used on the instruction descriptions to tell the anti-dependence breaker to keep its hands in its pockets.
One thing missing from the original design was recording a definition of the link register on the GET_TLS_ADDR nodes. Doing this was found to be insufficient to force a stack frame to be created, which led to looping behavior because two different LR values were stored at the same address. This appears to have been an oversight in PPCFrameLowering::determineFrameLayout(), which is repaired here.
Because MustSaveLR() returns true for calls to __builtin_return_address, this changed the expected behavior of test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but formerly did not. I've fixed the test case to reflect this.
There are existing TLS tests to catch regressions; the checks in test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the face of instruction scheduling with these changes, so I fixed that up.
I've added a new test case based on the PrettyStackTrace module that demonstrated the original problem. This checks that we get correct code generation and that CSE of the calls to __tls_get_addr has taken place.
llvm-svn: 227976