MachineCombiner.cpp - OpenGrok history log for /llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp

Revision	Date	Author	Comments
# a0cd09d4	16-Mar-2018	Andrew V. Tischenko <andrew.v.tischenko@gmail.com>	This patch fixes the invalid usage of OptSize in Machine Combiner. Differential Revision: https://reviews.llvm.org/D43813 llvm-svn: 327721
Revision tags: llvmorg-6.0.0
# 08389192	26-Feb-2018	Andrew V. Tischenko <andrew.v.tischenko@gmail.com>	The final step to close D41278 [MachineCombiner] Improve debug output (NFC). Differential Revision: https://reviews.llvm.org/D41278 llvm-svn: 326074
Revision tags: llvmorg-6.0.0-rc3
# b65b078d	15-Feb-2018	Andrew V. Tischenko <andrew.v.tischenko@gmail.com>	(NFC)[MachineCombiner] Improve debug output. llvm-svn: 325217
Revision tags: llvmorg-6.0.0-rc2
# 6805004c	06-Feb-2018	Alexander Ivchenko <alexander.ivchenko@intel.com>	Fix unused variable warning in release mode. NFC. llvm-svn: 324330
# c68428b5	31-Jan-2018	Florian Hahn <florian.hahn@arm.com>	[MachineCombiner] Add check for optimal pattern order. In D41587, @mssimpso discovered that the order of some patterns for AArch64 was sub-optimal. I thought a bit about how we could avoid that case in the future. I do not think there is a need for evaluating all patterns for now. But this patch adds an extra (expensive) check, that evaluates the latencies of all patterns, and ensures that the latency saved decreases for subsequent patterns. This catches the sub-optimal order fixed in D41587, but I am not entirely happy with the check, as it only applies to sub-optimal patterns seen while building with EXPENSIVE_CHECKS on. It did not discover any other sub-optimal pattern ordering. Reviewers: Gerolf, spatel, mssimpso Reviewed By: Gerolf, mssimpso Differential Revision: https://reviews.llvm.org/D41766 llvm-svn: 323873
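The ordering check described in c68428b5 can be pictured with a small standalone sketch. It is not the code from the patch: the function name `isPatternOrderOptimal` and the `LatencySaved` vector are hypothetical, and the sketch assumes the latency saved by each pattern has already been computed in the order the target returns the patterns.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper, not an LLVM API. LatencySaved[i] is assumed to hold
// the number of cycles saved by the i-th pattern, in the target's order.
// The order is accepted when no later pattern saves more than an earlier
// one, i.e. the sequence never increases.
bool isPatternOrderOptimal(const std::vector<long> &LatencySaved) {
  return std::is_sorted(LatencySaved.begin(), LatencySaved.end(),
                        std::greater<long>());
}
```

Because such a check has to evaluate every pattern, it fits the commit's choice to run it only in EXPENSIVE_CHECKS builds.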
Revision tags: llvmorg-6.0.0-rc1
# f1caa283	15-Dec-2017	Matthias Braun <matze@braunis.de>	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884
# c468b648	13-Dec-2017	Michael Zolotukhin <mzolotukhin@apple.com>	Remove redundant includes from lib/CodeGen. llvm-svn: 320619
Revision tags: llvmorg-5.0.1, llvmorg-5.0.1-rc3
# 001c3dd2	06-Dec-2017	Florian Hahn <florian.hahn@arm.com>	[MachineCombiner] Add up latencies of all instructions in new pattern. Summary: When calculating the RootLatency, we add up all the latencies of the deleted instructions. But for NewRootLatency we only add the latency of the new root instructions, ignoring the latencies of the other instructions inserted. This leads the combiner to underestimate the cost of patterns which add multiple instructions. This patch fixes that by summing up the latencies of all new instructions. For NewRootNode, the more complex getLatency function is used. Note that we may be slightly more precise than just summing up all latencies. For example, consider a pattern like r1 = INS1 .. r2 = INS2 .. r3 = INS3 r1, r2 I think in some other places, the total latency of the pattern would be estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that worth changing, I think it would be best to do in a follow-up patch. Reviewers: Gerolf, sebpop, spop, fhahn Reviewed By: fhahn Subscribers: evandro, llvm-commits Differential Revision: https://reviews.llvm.org/D40307 llvm-svn: 319951
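The two cost estimates contrasted in 001c3dd2 can be compared in a minimal standalone sketch. Both helper names below are hypothetical and the latencies are plain integers; the real pass derives latencies from the target's scheduling model, which this sketch does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: cost of a replacement sequence taken as the plain sum
// of the latencies of every inserted instruction.
unsigned sumOfLatencies(const std::vector<unsigned> &Latencies) {
  assert(!Latencies.empty() && "a pattern inserts at least one instruction");
  return std::accumulate(Latencies.begin(), Latencies.end(), 0u);
}

// Hypothetical helper: dependence-aware estimate for the three-instruction
// pattern from the commit message, where INS1 and INS2 are independent and
// both feed INS3: lat(INS3) + max(lat(INS1), lat(INS2)).
unsigned criticalPathLatency(unsigned LatIns1, unsigned LatIns2,
                             unsigned LatIns3) {
  return LatIns3 + std::max(LatIns1, LatIns2);
}
```

For latencies 3, 4 and 2, the sum is 9 cycles while the dependence-aware estimate is 6, which shows the gap between the two estimates that the message mentions.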
Revision tags: llvmorg-5.0.1-rc2
# b3bde2ea	17-Nov-2017	David Blaikie <dblaikie@gmail.com>	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
# 3f833edc	08-Nov-2017	David Blaikie <dblaikie@gmail.com>	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647
Revision tags: llvmorg-5.0.1-rc1
# 194693e9	30-Oct-2017	Simon Pilgrim <llvm-dev@redking.me.uk>	[MC] Split out register def/use idx calls to make debugging simpler. NFCI. llvm-svn: 316927
# e52abba2	11-Oct-2017	Florian Hahn <florian.hahn@arm.com>	[MachineCombiner] Fix initialisation of LastUpdate for incremental update. Summary: Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled. Patch by Paul Walker. Reviewers: fhahn, Gerolf, efriedma, MatzeB Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38734 llvm-svn: 315502
# ceb44947	20-Sep-2017	Florian Hahn <florian.hahn@arm.com>	Recommit [MachineCombiner] Update instruction depths incrementally for large BBs. This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element. Original commit message: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and its relevant successors/predecessors. > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313751
# 06e2a384	13-Sep-2017	Hans Wennborg <hans@hanshq.net>	Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs." This caused PR34596. > [MachineCombiner] Update instruction depths incrementally for large BBs. > > Summary: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and its relevant successors/predecessors. > > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313213
# d39b8a35	07-Sep-2017	Florian Hahn <florian.hahn@arm.com>	[MachineCombiner] Update instruction depths incrementally for large BBs. Summary: For large basic blocks with lots of combinable instructions, the MachineTraceMetrics computations in MachineCombiner can dominate the compile time, as computing the trace information is quadratic in the number of instructions in a BB and its relevant successors/predecessors. In most cases, knowing the instruction depth should be enough to make combination decisions. As we already iterate over all instructions in a basic block, the instruction depth can be computed incrementally. This reduces the cost of machine-combine drastically in cases where lots of instructions are combined. The major drawback is that AFAIK, computing the critical path length cannot be done incrementally. Therefore we only compute instruction depths incrementally, for basic blocks with more instructions than inc_threshold. The -machine-combiner-inc-threshold option can be used to set the threshold and allows for easier experimenting and checking if using incremental updates for all basic blocks has any impact on the performance. Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn Reviewed By: fhahn Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 312719
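The idea of computing instruction depths incrementally while walking a block, as described in d39b8a35, can be illustrated with a simplified standalone model. The `Instr` struct, the `DepthTracker` class and `useIncrementalUpdate` are hypothetical names for this sketch only; they are not the MachineTraceMetrics interface, and the model assumes every operand is defined earlier in the same block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction: a latency plus the indices of the
// earlier instructions in the same block that define its operands.
struct Instr {
  unsigned Latency;
  std::vector<std::size_t> Operands;
};

// Keeps the depths computed so far, so each new instruction costs only a
// scan over its own operands instead of a recomputation of the whole block.
class DepthTracker {
  std::vector<unsigned> Depths;

public:
  // Computes and caches the depth of the next instruction in Block.
  unsigned append(const std::vector<Instr> &Block) {
    std::size_t Idx = Depths.size();
    assert(Idx < Block.size() && "no instruction left to process");
    unsigned Depth = 0;
    for (std::size_t Op : Block[Idx].Operands) {
      assert(Op < Idx && "operands must be defined earlier in the block");
      // An instruction can issue once its slowest operand is available.
      Depth = std::max(Depth, Depths[Op] + Block[Op].Latency);
    }
    Depths.push_back(Depth);
    return Depth;
  }
};

// Mirrors the rule in the commit message: incremental updates are used only
// for blocks with more instructions than the threshold.
bool useIncrementalUpdate(std::size_t BlockSize, std::size_t IncThreshold) {
  return BlockSize > IncThreshold;
}
```

The sketch covers depths only. As the message notes, the critical path length is not known to be computable this way, which is why the incremental mode is limited to large blocks.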
Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1
# 1d2dc681	13-Jul-2017	Jakub Kuderski <kubakuderski@gmail.com>	[NFC] Move DEBUG_TYPE macro below includes... in MachineCombiner.cpp. llvm-svn: 307940
Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2
# 1527baab	25-May-2017	Matthias Braun <matze@braunis.de>	CodeGen: Rename DEBUG_TYPE to match passnames Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921
Revision tags: llvmorg-4.0.1-rc1
# 17ce8a2f	15-Mar-2017	Eric Christopher <echristo@gmail.com>	Fix up grammar in a comment. llvm-svn: 297898
Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3
# 8da96914	13-Feb-2017	Andrew V. Tischenko <andrew.v.tischenko@gmail.com>	Compile time decreasing in the case we're dealing with Machine Combiner. Before this patch compile time was about 21s (see below). After this patch we have less than 2s (see below). Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz DAGCombiner - trunk time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.685s DAGCombiner + Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.655s MachineCombiner w/o Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m21.614s MachineCombiner + Speed patch time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math real 0m1.593s The test spill_fdiv.ll is attached to D29627 D29627 should be closed. llvm-svn: 294936
Revision tags: llvmorg-4.0.0-rc2
# a4976c61	29-Jan-2017	Matthias Braun <matze@braunis.de>	MachineInstr: Remove parameter from dump() The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the parameter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features). Differential Revision: https://reviews.llvm.org/D29241 llvm-svn: 293440
Revision tags: llvmorg-4.0.0-rc1
# 77794843	21-Dec-2016	Sebastian Pop <sebpop@gmail.com>	machine combiner: fix pretty printer we used to print UNKNOWN instructions when the instruction to be printed was not yet inserted in any BB: in that case the pretty printer would not be able to compute a TII as the instruction does not belong to any BB or function yet. This patch explicitly passes the TII to the pretty-printer. Differential Revision: https://reviews.llvm.org/D27645 llvm-svn: 290228
# e08d9c7c	11-Dec-2016	Sebastian Pop <sebpop@gmail.com>	instr-combiner: sum up all latencies of the transformed instructions We have found that -- when the selected subarchitecture has a scheduling model and we are not optimizing for size -- the machine-instruction combiner uses a too-simple algorithm to compute the cost of one of the two alternatives [before and after running a combining pass on a section of code], and therefore it throws away the combination results too often. This fix has the potential to help any ISA with the potential to combine instructions and for which at least one subarchitecture has a scheduling model. As of now, this is only known to definitely affect AArch64 subarchitectures with a scheduling model. Regression tested on AMD64/GNU-Linux, new test case tested to fail on an unpatched compiler and pass on a patched compiler. Patch by Abe Skolnik and Sebastian Pop. llvm-svn: 289399
Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1
# 117296c0	01-Oct-2016	Mehdi Amini <mehdi.amini@apple.com>	Use StringRef in Pass/PassManager APIs (NFC) llvm-svn: 283004
Revision tags: llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1
# 01b3a618	24-Apr-2016	Gerolf Hoflehner <ghoflehner@apple.com>	[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) The original patch caused crashes because it could dereference a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency. - Support for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328
# 591c3795	22-Apr-2016	Daniel Sanders <daniel.sanders@imgtec.com>	Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127