#
fd4a1073 |
| 14-Dec-2020 |
Bardia Mahjour <bmahjour@ca.ibm.com> |
[DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in the DOT description language, providing a more visually appealing representation of the D
[DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in the DOT description language, providing a more visually appealing representation of the DDG. Similar to the CFG DOT printer, this functionality is provided under an option called -dot-ddg and can be generated in a less verbose mode under -dot-ddg-only option.
Differential Revision: https://reviews.llvm.org/D90159
show more ...
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
a461e76b |
| 17-Nov-2020 |
Jon Roelofs <jonathan_roelofs@apple.com> |
[MachineScheduler] Inform pass infra of post-ra scheduler's dependencies
Differential Revision: https://reviews.llvm.org/D91561
|
#
1d178d60 |
| 02-Nov-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[Scheduling] Fall back to the fast cluster algorithm if the DAG is too complex
We have added a new load/store cluster algorithm in D85517. However, AArch64 see some compiling deg with the new algori
[Scheduling] Fall back to the fast cluster algorithm if the DAG is too complex
We have added a new load/store cluster algorithm in D85517. However, AArch64 see some compiling deg with the new algorithm as the IsReachable() is not cheap if the DAG is complex. O(M+N) See https://bugs.llvm.org/show_bug.cgi?id=47966 So, this patch added a heuristic to switch to old cluster algorithm if the DAG is too complex.
Reviewed By: Owen Anderson
Differential Revision: https://reviews.llvm.org/D90144
show more ...
|
#
f0a98ad8 |
| 27-Oct-2020 |
Mircea Trofin <mtrofin@google.com> |
[NFC] Use Register in RegisterPressure APIs
Some related changes as well.
Differential Revision: https://reviews.llvm.org/D90268
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6 |
|
#
db80cc39 |
| 05-Oct-2020 |
Jon Roelofs <jonathan_roelofs@apple.com> |
[CodeGen][MachineSched] Fixup function name typo. NFC
|
Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
17dc729b |
| 21-Sep-2020 |
Alexander Belyaev <pifon@google.com> |
Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de654259ae90494bf9b015416e2cccacb.
Google internal backend uses EntrySU, we are looking into removing dependency o
Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de654259ae90494bf9b015416e2cccacb.
Google internal backend uses EntrySU, we are looking into removing dependency on it.
Differential Revision: https://reviews.llvm.org/D88018
show more ...
|
#
0345d88d |
| 17-Sep-2020 |
Francis Visoiu Mistrih <francisvm@yahoo.com> |
[NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.
Differential Revision: https://reviews.llvm.org/D87867
|
#
ebf3b188 |
| 26-Aug-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[Scheduling] Implement a new way to cluster loads/stores
Before calling target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake cluster due to d
[Scheduling] Implement a new way to cluster loads/stores
Before calling target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake cluster due to dependency. For now, we are putting the loads/stores into the same group if they have the same predecessor. We assume that, if two loads/stores have the same predecessor, it is likely that, they didn't have dependency for each other.
However, one SUnit might have several predecessors and for now, we just pick up the first predecessor that has non-data/non-artificial dependency, which is too arbitrary. And we are struggling to fix it.
So, I am proposing some better implementation. 1. Collect all the loads/stores that has memory info first to reduce the complexity. 2. Sort these loads/stores so that we can stop the seeking as early as possible. 3. For each load/store, seeking for the first non-dependency instruction with the sorted order, and check if they can cluster or not.
Reviewed By: Jay Foad
Differential Revision: https://reviews.llvm.org/D85517
show more ...
|
Revision tags: llvmorg-11.0.0-rc2 |
|
#
2b2bfdb4 |
| 07-Aug-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[NFC] Add the stats for load/store cluster
We have the stats for MacroFusion but miss it for load/store cluster.
|
#
3359ea62 |
| 07-Aug-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[Scheduling] Create the missing dependency edges for store cluster
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa as they both depend on the base regis
[Scheduling] Create the missing dependency edges for store cluster
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa as they both depend on the base register "reg"
+-------+ +----> reg | | +---+---+ | ^ | | | | | | | +---+---+ | | SUa | Load 0(reg) | +---+---+ | ^ | | | | | +---+---+ +----+ SUb | Load 4(reg) +-------+
But if it is store cluster, we need to create it as follow shows to avoid the instruction store depend on scheduled in-between SUb and SUa.
+-------+ +----> reg | | +---+---+ | ^ | | Missing +-------+ | | +-------------------->+ y | | | | +---+---+ | +---+-+-+ ^ | | SUa | Store x 0(reg) | | +---+---+ | | ^ | | | +------------------------+ | | | | +---+--++ +----+ SUb | Store y 4(reg) +-------+
Reviewed By: evandro, arsenm, rampitec, foad, fhahn
Differential Revision: https://reviews.llvm.org/D72031
show more ...
|
#
7f1556f2 |
| 03-Aug-2020 |
Jon Roelofs <jonathan_roelofs@apple.com> |
Fix typo: s/epomymous/eponymous/ NFC
|
Revision tags: llvmorg-11.0.0-rc1 |
|
#
a6e9f526 |
| 27-Jul-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[Scheduling] Improve group algorithm for store cluster
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads. Current implementation will put these t
[Scheduling] Improve group algorithm for store cluster
Store Addr and Store Addr+8 are clusterable pair. They have memory(ctrl) dependency on different loads. Current implementation will put these two stores into different group and miss to cluster them.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D84139
show more ...
|
Revision tags: llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
62fd7f76 |
| 07-Jan-2020 |
Jay Foad <jay.foad@amd.com> |
[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is grea
[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth *is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom zone.
Differential Revision: https://reviews.llvm.org/D72392
show more ...
|
#
5832950a |
| 23-Jun-2020 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class.
Summary: `width` computation is missing for newly added `MIMG` instruction class. Add it.
Reviewers: foad, rampitec, arsenm
Rev
[AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class.
Summary: `width` computation is missing for newly added `MIMG` instruction class. Add it.
Reviewers: foad, rampitec, arsenm
Reviewed By: foad
Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81649
show more ...
|
#
2fea3fe4 |
| 09-Jun-2020 |
David Green <david.green@arm.com> |
[MachineScheduler] Update available queue on the first mop of a new cycle
If a resource can be held for multiple cycles in the schedule model then an instruction can be placed into the available que
[MachineScheduler] Update available queue on the first mop of a new cycle
If a resource can be held for multiple cycles in the schedule model then an instruction can be placed into the available queue, another instruction can be scheduled, but the first will not be taken back out if the two instructions hazard. To fix this make sure that we update the available queue even on the first MOp of a cycle, pushing available instructions back into the pending queue if they now conflict.
This happens with some downstream schedules we have around MVE instruction scheduling where we use ResourceCycles=[2] to show the instruction executing over two beats. Apparently the test changes here are OK too.
Differential Revision: https://reviews.llvm.org/D76909
show more ...
|
#
0ed2c046 |
| 01-Jun-2020 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on
[AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary: While clustering mem ops, AMDGPU target needs to consider number of clustered bytes to decide on max number of mem ops that can be clustered. This patch adds support to pass number of clustered bytes to target mem ops clustering logic.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80545
show more ...
|
#
09f7dcb6 |
| 26-May-2020 |
hsmahesha <mahesha.comp@gmail.com> |
[AMDGPU/MemOpsCluster] Code clean-up around mem ops clustering logic
Summary: Clean-up code around mem ops clustering logic. This patch cleans up code within the function clusterNeighboringMemOps().
[AMDGPU/MemOpsCluster] Code clean-up around mem ops clustering logic
Summary: Clean-up code around mem ops clustering logic. This patch cleans up code within the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.
Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar
Reviewed By: foad
Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80119
show more ...
|
#
328bb446 |
| 30-Mar-2020 |
Mark Lacey <mark.lacey@apple.com> |
Add a policy to enable computing SchedDFSResult.
Summary: Make GenericScheduler compute SchedDFSResult on initialization if the policy is set. This makes it possible to create classes that extend Ge
Add a policy to enable computing SchedDFSResult.
Summary: Make GenericScheduler compute SchedDFSResult on initialization if the policy is set. This makes it possible to create classes that extend GenericScheduler and rely on the results of SchedDFSResult, e.g. to perform subtree scheduling.
NFC unless the policy is set.
Subscribers: MatzeB, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78432
show more ...
|
#
8fbc9258 |
| 18-Feb-2020 |
Sander de Smalen <sander.desmalen@arm.com> |
Add OffsetIsScalable to getMemOperandWithOffset
Summary: Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo, has the effect that all places where this information is used (notably, Target
Add OffsetIsScalable to getMemOperandWithOffset
Summary: Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo, has the effect that all places where this information is used (notably, TargetInstrInfo::getMemOperandWithOffset) will need to consider Scale - and derived, Offset - possibly being scalable.
This patch adds a new operand `bool &OffsetIsScalable` to TargetInstrInfo::getMemOperandWithOffset and fixes up all the places where this function is used, to consider the offset possibly being scalable.
In most cases, this means bailing out because the algorithm does not (or cannot) support scalable offsets in places where it does some form of alias checking for example.
Reviewers: rovka, efriedma, kristof.beyls
Reviewed By: efriedma
Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72758
show more ...
|
#
0d7bd343 |
| 19-Dec-2019 |
Jay Foad <jay.foad@amd.com> |
[MachineScheduler] Ignore artificial edges when forming store chains
Summary: BaseMemOpClusterMutation::apply forms store chains by looking for control (i.e. non-data) dependencies from one mem op t
[MachineScheduler] Ignore artificial edges when forming store chains
Summary: BaseMemOpClusterMutation::apply forms store chains by looking for control (i.e. non-data) dependencies from one mem op to another.
In the test case, clusterNeighboringMemOps successfully clusters the loads, and then adds artificial edges to the loads' successors as described in the comment: // Copy successor edges from SUa to SUb. Interleaving computation // dependent on SUa can prevent load combining due to register reuse. The effect of this is that *data* dependencies from one load to a store are copied as *artificial* dependencies from a different load to the same store.
Then when BaseMemOpClusterMutation::apply looks at the stores, it finds that some of them have a control dependency on a previous load, which breaks the chains and means that the stores are not all considered part of the same chain and won't all be clustered.
The fix is to only consider non-artificial control dependencies when forming chains.
Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71717
show more ...
|
#
adcd0268 |
| 28-Jan-2020 |
Benjamin Kramer <benny.kra@googlemail.com> |
Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here.
This is mostly m
Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
show more ...
|
#
be8e38cb |
| 24-Jan-2020 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
Correct NumLoads in clustering
Scheduler sends NumLoads argument into shouldClusterMemOps() one less the actual cluster length. So for 2 instructions it will pass just 1. Correct this number.
This
Correct NumLoads in clustering
Scheduler sends NumLoads argument into shouldClusterMemOps() one less the actual cluster length. So for 2 instructions it will pass just 1. Correct this number.
This is NFC for in tree targets.
Differential Revision: https://reviews.llvm.org/D73292
show more ...
|
#
e0f0d0e5 |
| 06-Jan-2020 |
Jay Foad <jay.foad@amd.com> |
[MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to
[MachineScheduler] Allow clustering mem ops with complex addresses
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands.
The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset.
The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean.
One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case.
Differential Revision: https://reviews.llvm.org/D71655
show more ...
|
#
c65ac2ba |
| 15-Jan-2020 |
Jinsong Ji <jji@us.ibm.com> |
[MachineScheduler][NFC] Don't swap when we can't cluster
https://reviews.llvm.org/D72706 tried to reduce reordering due to mem op clustering. This patch avoid doing the swap when we can't cluster.
[MachineScheduler][NFC] Don't swap when we can't cluster
https://reviews.llvm.org/D72706 tried to reduce reordering due to mem op clustering. This patch avoid doing the swap when we can't cluster.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D72800
show more ...
|
#
b777e551 |
| 14-Jan-2020 |
Jay Foad <jay.foad@amd.com> |
[MachineScheduler] Reduce reordering due to mem op clustering
Summary: Mem op clustering adds a weak edge in the DAG between two loads or stores that should be clustered, but the direction of this e
[MachineScheduler] Reduce reordering due to mem op clustering
Summary: Mem op clustering adds a weak edge in the DAG between two loads or stores that should be clustered, but the direction of this edge is pretty arbitrary (it depends on the sort order of MemOpInfo, which represents the operands of a load or store). This often means that two loads or stores will get reordered even if they would naturally have been scheduled together anyway, which leads to test case churn and goes against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706
show more ...
|