#
a0cd09d4 |
| 16-Mar-2018 |
Andrew V. Tischenko <andrew.v.tischenko@gmail.com> |
This patch fixes the invalid usage of OptSize in Machine Combiner. Differential Revision: https://reviews.llvm.org/D43813
llvm-svn: 327721
|
Revision tags: llvmorg-6.0.0 |
|
#
08389192 |
| 26-Feb-2018 |
Andrew V. Tischenko <andrew.v.tischenko@gmail.com> |
The final step to close D41278 [MachineCombiner] Improve debug output (NFC). Differential Revision: https://reviews.llvm.org/D41278
llvm-svn: 326074
|
Revision tags: llvmorg-6.0.0-rc3 |
|
#
b65b078d |
| 15-Feb-2018 |
Andrew V. Tischenko <andrew.v.tischenko@gmail.com> |
(NFC)[MachineCombiner] Improve debug output.
llvm-svn: 325217
|
Revision tags: llvmorg-6.0.0-rc2 |
|
#
6805004c |
| 06-Feb-2018 |
Alexander Ivchenko <alexander.ivchenko@intel.com> |
Fix unused variable warning in release mode. NFC.
llvm-svn: 324330
|
#
c68428b5 |
| 31-Jan-2018 |
Florian Hahn <florian.hahn@arm.com> |
[MachineCombiner] Add check for optimal pattern order.
In D41587, @mssimpso discovered that the order of some patterns for AArch64 was sub-optimal. I thought a bit about how we could avoid that case in the future. I do not think there is a need for evaluating all patterns for now. But this patch adds an extra (expensive) check, that evaluates the latencies of all patterns, and ensures that the latency saved decreases for subsequent patterns.
This catches the sub-optimal order fixed in D41587, but I am not entirely happy with the check, as it only applies to sub-optimal patterns seen while building with EXPENSIVE_CHECKS on. It did not discover any other sub-optimal pattern ordering.
Reviewers: Gerolf, spatel, mssimpso
Reviewed By: Gerolf, mssimpso
Differential Revision: https://reviews.llvm.org/D41766
llvm-svn: 323873
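
The ordering check described in this commit can be illustrated with a small standalone sketch. It is not the code from D41766: `PatternCost`, `latencySaved`, `isPatternOrderOptimal` and `verifyPatternOrder` are hypothetical names, and the per-pattern latencies are assumed to be already known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one candidate pattern: the latency of the
// instructions it would remove and of the instructions it would insert.
struct PatternCost {
  unsigned RootLatency;    // latency of the sequence being replaced
  unsigned NewRootLatency; // latency of the replacement sequence
};

// Latency saved by applying the pattern (negative if it makes things worse).
inline long latencySaved(const PatternCost &P) {
  return static_cast<long>(P.RootLatency) -
         static_cast<long>(P.NewRootLatency);
}

// Returns true if the savings are non-increasing in list order, i.e. no
// later pattern saves more latency than an earlier one.
inline bool isPatternOrderOptimal(const std::vector<PatternCost> &Patterns) {
  for (std::size_t I = 1; I < Patterns.size(); ++I)
    if (latencySaved(Patterns[I]) > latencySaved(Patterns[I - 1]))
      return false;
  return true;
}

// Assertion-style wrapper, in the spirit of a check that only runs in
// builds where expensive verification is enabled (an assumption here).
inline void verifyPatternOrder(const std::vector<PatternCost> &Patterns) {
  assert(isPatternOrderOptimal(Patterns) &&
         "a later pattern saves more latency than an earlier one");
}
```

Because the check evaluates every pattern rather than stopping at the first profitable one, it costs more than the normal path, which is consistent with the commit restricting it to EXPENSIVE_CHECKS builds.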
|
Revision tags: llvmorg-6.0.0-rc1 |
|
#
f1caa283 |
| 15-Dec-2017 |
Matthias Braun <matze@braunis.de> |
MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.
llvm-svn: 320884
|
#
c468b648 |
| 13-Dec-2017 |
Michael Zolotukhin <mzolotukhin@apple.com> |
Remove redundant includes from lib/CodeGen.
llvm-svn: 320619
|
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3 |
|
#
001c3dd2 |
| 06-Dec-2017 |
Florian Hahn <florian.hahn@arm.com> |
[MachineCombiner] Add up latencies of all instructions in new pattern.
Summary: When calculating the RootLatency, we add up all the latencies of the deleted instructions. But for NewRootLatency we only add the latency of the new root instructions, ignoring the latencies of the other instructions inserted. This leads the combiner to underestimate the cost of patterns which add multiple instructions. This patch fixes that by summing up the latencies of all new instructions. For NewRootNode, the more complex getLatency function is used.
Note that we may be slightly more precise than just summing up all latencies. For example, consider a pattern like
  r1 = INS1 ..
  r2 = INS2 ..
  r3 = INS3 r1, r2
I think in some other places, the total latency of the pattern would be estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that worth changing, I think it would be best to do in a follow-up patch.
Reviewers: Gerolf, sebpop, spop, fhahn
Reviewed By: fhahn
Subscribers: evandro, llvm-commits
Differential Revision: https://reviews.llvm.org/D40307
llvm-svn: 319951
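
To make the difference between the two estimates in this commit message concrete, here is a minimal sketch for the three-instruction pattern above. The function names are hypothetical and the latencies are plain integers rather than values from a scheduling model.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Plain sum of the latencies of every inserted instruction, which is the
// estimate this commit describes for the new pattern.
inline unsigned sumLatency(const std::vector<unsigned> &Latencies) {
  return std::accumulate(Latencies.begin(), Latencies.end(), 0u);
}

// Alternative estimate for the r1/r2/r3 shape: INS1 and INS2 do not depend
// on each other, so only the slower one delays INS3.
inline unsigned criticalPathLatency(unsigned LatIns1, unsigned LatIns2,
                                    unsigned LatIns3) {
  return LatIns3 + std::max(LatIns1, LatIns2);
}

// How far the plain sum exceeds the critical-path estimate; for this shape
// the gap is the latency of the faster of INS1 and INS2.
inline unsigned overestimate(unsigned LatIns1, unsigned LatIns2,
                             unsigned LatIns3) {
  unsigned Sum = sumLatency({LatIns1, LatIns2, LatIns3});
  unsigned Critical = criticalPathLatency(LatIns1, LatIns2, LatIns3);
  assert(Sum >= Critical && "the sum cannot be below the critical path");
  return Sum - Critical;
}
```

With latencies 3, 5 and 2 for INS1, INS2 and INS3, the plain sum is 10 while the critical-path estimate is 7, so the sum is the more conservative of the two.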
|
Revision tags: llvmorg-5.0.1-rc2 |
|
#
b3bde2ea |
| 17-Nov-2017 |
David Blaikie <dblaikie@gmail.com> |
Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).
llvm-svn: 318490
|
#
3f833edc |
| 08-Nov-2017 |
David Blaikie <dblaikie@gmail.com> |
Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation.
llvm-svn: 317647
|
Revision tags: llvmorg-5.0.1-rc1 |
|
#
194693e9 |
| 30-Oct-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[MC] Split out register def/use idx calls to make debugging simpler. NFCI.
llvm-svn: 316927
|
#
e52abba2 |
| 11-Oct-2017 |
Florian Hahn <florian.hahn@arm.com> |
[MachineCombiner] Fix initialisation of LastUpdate for incremental update.
Summary: Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled.
Patch by Paul Walker.
Reviewers: fhahn, Gerolf, efriedma, MatzeB
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38734
llvm-svn: 315502
|
#
ceb44947 |
| 20-Sep-2017 |
Florian Hahn <florian.hahn@arm.com> |
Recommit [MachineCombiner] Update instruction depths incrementally for large BBs.
This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element.
Original commit message:
> For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 313751
|
#
06e2a384 |
| 13-Sep-2017 |
Hans Wennborg <hans@hanshq.net> |
Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs."
This caused PR34596.
> [MachineCombiner] Update instruction depths incrementally for large BBs.
>
> Summary:
> For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
>
> In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 313213
|
#
d39b8a35 |
| 07-Sep-2017 |
Florian Hahn <florian.hahn@arm.com> |
[MachineCombiner] Update instruction depths incrementally for large BBs.
Summary: For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors.
In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance.
Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
Reviewed By: fhahn
Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 312719
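
The idea of computing instruction depths incrementally while walking a block can be sketched on a toy model. This is an illustration under stated assumptions, not the MachineCombiner implementation: `ToyInstr`, `depthOf` and `extendDepths` are hypothetical, operands are given as indices of earlier instructions in the same block, and depth is taken to be the earliest cycle at which all operands are available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: its latency and the indices of the earlier
// instructions in the same block whose results it reads.
struct ToyInstr {
  unsigned Latency;
  std::vector<std::size_t> Operands;
};

// Depth of instruction I, given the depths of all earlier instructions:
// the latest point at which any of its operands becomes available.
inline unsigned depthOf(const std::vector<ToyInstr> &Block,
                        const std::vector<unsigned> &Depths, std::size_t I) {
  unsigned Depth = 0;
  for (std::size_t Def : Block[I].Operands) {
    assert(Def < I && "operands must be defined earlier in the block");
    Depth = std::max(Depth, Depths[Def] + Block[Def].Latency);
  }
  return Depth;
}

// Extends Depths to cover Block[Depths.size() .. End) in one forward walk.
// Depths already computed are reused, so repeated calls never revisit an
// instruction.
inline void extendDepths(const std::vector<ToyInstr> &Block,
                         std::vector<unsigned> &Depths, std::size_t End) {
  assert(End <= Block.size() && "cannot extend past the end of the block");
  for (std::size_t I = Depths.size(); I < End; ++I)
    Depths.push_back(depthOf(Block, Depths, I));
}
```

Each call only touches the instructions added since the previous call, which is what keeps the per-block cost linear in this toy model. As the commit message notes, the critical path length has no comparable incremental formulation, which is why the real pass limits this mode to blocks above the threshold.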
|
Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1 |
|
#
1d2dc681 |
| 13-Jul-2017 |
Jakub Kuderski <kubakuderski@gmail.com> |
[NFC] Move DEBUG_TYPE macro below includes...
in MachineCombiner.cpp.
llvm-svn: 307940
|
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2 |
|
#
1527baab |
| 25-May-2017 |
Matthias Braun <matze@braunis.de> |
CodeGen: Rename DEBUG_TYPE to match passnames
Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible.
llvm-svn: 303921
|
Revision tags: llvmorg-4.0.1-rc1 |
|
#
17ce8a2f |
| 15-Mar-2017 |
Eric Christopher <echristo@gmail.com> |
Fix up grammar in a comment.
llvm-svn: 297898
|
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3 |
|
#
8da96914 |
| 13-Feb-2017 |
Andrew V. Tischenko <andrew.v.tischenko@gmail.com> |
Compile time decreasing in the case we're dealing with Machine Combiner. Before this patch compile time was about 21s (see below). After this patch we have less than 2s (see below).
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
DAGCombiner - trunk time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.685s
DAGCombiner + Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.655s
MachineCombiner w/o Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m21.614s
MachineCombiner + Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.593s
The test spill_fdiv.ll is attached to D29627. D29627 should be closed.
llvm-svn: 294936
|
Revision tags: llvmorg-4.0.0-rc2 |
|
#
a4976c61 |
| 29-Jan-2017 |
Matthias Braun <matze@braunis.de> |
MachineInstr: Remove parameter from dump()
The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the parameter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features).
Differential Revision: https://reviews.llvm.org/D29241
llvm-svn: 293440
|
Revision tags: llvmorg-4.0.0-rc1 |
|
#
77794843 |
| 21-Dec-2016 |
Sebastian Pop <sebpop@gmail.com> |
machine combiner: fix pretty printer
We used to print UNKNOWN instructions when the instruction to be printed was not yet inserted in any BB: in that case the pretty printer would not be able to compute a TII as the instruction does not belong to any BB or function yet. This patch explicitly passes the TII to the pretty-printer.
Differential Revision: https://reviews.llvm.org/D27645
llvm-svn: 290228
|
#
e08d9c7c |
| 11-Dec-2016 |
Sebastian Pop <sebpop@gmail.com> |
instr-combiner: sum up all latencies of the transformed instructions
We have found that -- when the selected subarchitecture has a scheduling model and we are not optimizing for size -- the machine-instruction combiner uses a too-simple algorithm to compute the cost of one of the two alternatives [before and after running a combining pass on a section of code], and therefore it throws away the combination results too often.
This fix has the potential to help any ISA with the potential to combine instructions and for which at least one subarchitecture has a scheduling model. As of now, this is only known to definitely affect AArch64 subarchitectures with a scheduling model.
Regression tested on AMD64/GNU-Linux, new test case tested to fail on an unpatched compiler and pass on a patched compiler.
Patch by Abe Skolnik and Sebastian Pop.
llvm-svn: 289399
|
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1 |
|
#
117296c0 |
| 01-Oct-2016 |
Mehdi Amini <mehdi.amini@apple.com> |
Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
|
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1 |
|
#
01b3a618 |
| 24-Apr-2016 |
Gerolf Hoflehner <ghoflehner@apple.com> |
[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098)
The original patch caused crashes because it could dereference a null pointer for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is to always combine in loops. The target decides whether the machine combiner should optimize for throughput or latency.
- Support for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
|
#
591c3795 |
| 22-Apr-2016 |
Daniel Sanders <daniel.sanders@imgtec.com> |
Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64
It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others.
llvm-svn: 267127
|