Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9122c523 | 15-Nov-2024 | Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)

This is based on other targets like PPC/AArch64 and some experiments.

This PR only enables bidirectional scheduling and register pressure tracking.

Disclaimer: I haven't tested it on many cores; maybe we should make some of these options features. I believe downstreams must have tried this before, so feedback is welcome.
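For context, the sketch below shows the generic knobs this kind of change usually flips: the MachineSchedPolicy fields (OnlyTopDown, OnlyBottomUp, ShouldTrackPressure) exposed through TargetSubtargetInfo::overrideSchedPolicy. It is a minimal sketch assuming those generic hooks, not necessarily the exact shape of the committed patch.

    // Hedged sketch (not the literal patch): inside the LLVM tree, a target
    // typically opts into bidirectional scheduling and pressure tracking by
    // overriding the generic scheduling-policy hook on its subtarget.
    void RISCVSubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
                                             unsigned /*NumRegionInstrs*/) const {
      // Leaving both directions enabled (neither OnlyTopDown nor OnlyBottomUp)
      // selects the bidirectional GenericScheduler strategy.
      Policy.OnlyTopDown = false;
      Policy.OnlyBottomUp = false;
      // Ask the machine scheduler to track register pressure while scheduling.
      Policy.ShouldTrackPressure = true;
    }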
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 14c4f28e | 01-Oct-2024 | Alex Bradbury <asb@igalia.com>

[RISCV] Enable load clustering by default (#73789)

We believe this is neutral or slightly better in the majority of cases.
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 3e55ac94 | 20-Jun-2024 | Philip Reames <preames@rivosinc.com>

[RISCV] Strength reduce mul by 2^N - 2^M (#88983)

This is a three instruction expansion, and does not depend on zba, so most of the test changes are in base RV32/64I configurations.

With zba, this gets immediates such as 14, 28, 30, 56, 60, 62... which aren't covered by our other expansions.
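To make the transform concrete, here is a small, self-contained illustration (plain C++, not LLVM code; the function name is made up for the example) of the shift/shift/subtract expansion for an immediate of the form 2^N - 2^M:

    // 14 = 16 - 2 = 2^4 - 2^1, so x*14 becomes two shifts and a subtract
    // (slli, slli, sub on RISC-V: the "three instruction expansion").
    #include <cassert>
    #include <cstdint>

    uint64_t mul14_strength_reduced(uint64_t x) {
      return (x << 4) - (x << 1); // x*16 - x*2 == x*14
    }

    int main() {
      for (uint64_t x = 0; x < 1000; ++x)
        assert(mul14_strength_reduced(x) == x * 14);
      return 0;
    }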
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7
# c8dc6b59 | 22-May-2024 | Yingwei Zheng <dtcxzyw2333@gmail.com>

[SDAG] Improve `SimplifyDemandedBits` for mul (#90034)

If the RHS is a constant with X trailing zeros, then the X MSBs of the LHS are not demanded.

Alive2: https://alive2.llvm.org/ce/z/F5CyJW

Fixes https://github.com/llvm/llvm-project/issues/56645.
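A quick standalone check of the demanded-bits claim (plain C++, not LLVM code): multiplying by a constant with X trailing zeros shifts the top X bits of the other operand out of the result width, so they cannot affect the product.

    // X = 3 here (constant 8) at width 8: the top 3 bits of x are irrelevant.
    #include <cassert>
    #include <cstdint>

    int main() {
      for (unsigned x = 0; x < 256; ++x) {
        uint8_t full = static_cast<uint8_t>(x * 8);
        uint8_t masked = static_cast<uint8_t>((x & 0x1F) * 8); // drop 3 MSBs
        assert(full == masked);
      }
      return 0;
    }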
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# bbd64c4d | 16-Apr-2024 | Philip Reames <preames@rivosinc.com>

[RISCV] Add coverage for strength reduction of mul as 2^N - 2^M
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# eabaee0c | 07-Jan-2024 | Fangrui Song <i@maskray.me>

[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)

The R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530, `call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not useful and can be removed now (matching AArch64 and PowerPC).

GNU assembler has assembled `call foo` to R_RISCV_CALL_PLT since 2022-09 (70f35d72ef04cd23771875c1661c9975044a749c).

Without this patch, unconditionally changing MO_CALL to MO_PLT could create `jump .L1@plt, a0`, which is invalid in the LLVM integrated assembler and GNU assembler.
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83 | 09-Oct-2023 | Jay Foad <jay.foad@amd.com>

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
# 2501ae58 | 09-Oct-2023 | Jay Foad <jay.foad@amd.com>

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# e0919b18 | 13-Sep-2023 | Jay Foad <jay.foad@amd.com>

[CodeGen] Renumber slot indexes before register allocation (#66334)

RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.
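The effect on the distance approximation can be shown with a toy model (hypothetical; this is not the real SlotIndexes data structure, and the 16-unit stride is just an assumed spacing): erasing instructions without reassigning indexes inflates the estimated distance, and renumbering with a uniform stride restores it.

    #include <cstdio>
    #include <vector>

    constexpr unsigned Stride = 16; // assumed uniform spacing between indexes

    unsigned approxDistance(unsigned A, unsigned B) { return (B - A) / Stride; }

    int main() {
      // Four instructions remain after erasing two from the middle, but their
      // stale indexes still span the old range.
      std::vector<unsigned> Stale = {0, 16, 64, 80};
      std::printf("stale estimate: %u (true distance is 3)\n",
                  approxDistance(Stale.front(), Stale.back())); // prints 5
      // Renumber with the uniform stride.
      for (std::size_t I = 0; I < Stale.size(); ++I)
        Stale[I] = static_cast<unsigned>(I) * Stride;
      std::printf("renumbered estimate: %u\n",
                  approxDistance(Stale.front(), Stale.back())); // prints 3
      return 0;
    }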
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6
# a70d5e25 | 07-Jun-2023 | Amaury Séchet <deadalnix@gmail.com>

[DAGCombine] Make sure combined nodes are added back to the worklist in topological order.

Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
# c9998ec1 | 05-Jun-2023 | JP Lehr <JanPatrick.Lehr@amd.com>

Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order."

This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45.

This patch led to build timeouts on the AMDGPU OpenMP runtime buildbot.
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# e69fa03d | 30-Apr-2022 | Amaury Séchet <deadalnix@gmail.com>

[DAGCombine] Make sure combined nodes are added back to the worklist in topological order.

Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
# 2e43eea2 | 15-Apr-2023 | Ben Shi <powerman1st@163.com>

[RISCV] Optimize multiplication with immediates

The optimization of (mul x, c) to (ADD (SLLI x, i0), (SLLI x, i1)) is only enabled for i32 multiplication on rv64, because of the regression in i64 multiplication on rv32.

However, we can change the condition so that the immediate 'c' is only used once; then the above regression can also be avoided, and other optimization opportunities are enabled.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147410
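As a concrete instance of the (ADD (SLLI x, i0), (SLLI x, i1)) form (plain C++, not LLVM code; the function name is made up for the example): an immediate with exactly two set bits becomes two shifts and an add.

    // 40 = 2^5 + 2^3, so x*40 becomes (x << 5) + (x << 3).
    #include <cassert>
    #include <cstdint>

    uint64_t mul40_as_shifts(uint64_t x) {
      return (x << 5) + (x << 3); // x*32 + x*8 == x*40
    }

    int main() {
      for (uint64_t x = 0; x < 1000; ++x)
        assert(mul40_as_shifts(x) == x * 40);
      return 0;
    }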
# 86eff6be | 20-Jan-2023 | Philip Reames <preames@rivosinc.com>

[MachineCombiner] Use default latency model when no detailed model available

This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.

Previously, we unconditionally applied the first legal pattern, without any consideration of profitability. As a result, this change both prevents some patterns from being applied and changes which patterns are exercised (i.e. previously the first pattern was applied; afterwards, maybe the second one is, because the first wasn't profitable).

The motivation here is twofold.

First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.

Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain by simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.

Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.

For the test diffs, I don't see any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.

Differential Revision: https://reviews.llvm.org/D141017
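The "serial chain vs. partially flattened tree" point can be illustrated without any LLVM machinery (plain C++ sketch, not MachineCombiner output; the function names are made up):

    // The same sum written as a serial chain (critical path of 3 dependent
    // adds) and as a flattened tree (critical path of 2), which is the ILP
    // that reassociation buys.
    #include <cassert>
    #include <cstdint>

    uint64_t serial_chain(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
      return ((a + b) + c) + d; // each add depends on the previous one
    }

    uint64_t flattened_tree(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
      return (a + b) + (c + d); // the two inner adds can execute in parallel
    }

    int main() {
      assert(serial_chain(1, 2, 3, 4) == flattened_tree(1, 2, 3, 4));
      return 0;
    }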
# 20ecc079 | 13-Jan-2023 | Florian Hahn <flo@fhahn.com>

[MachineCombiner] Lift same-bb restriction for reassociable ops.

This patch relaxes the restriction that both reassociable operands must be in the same block as the root instruction.

The comment indicates that the reason for this restriction was that operands not in the same block won't have a depth in the trace.

I believe this is outdated; if the operand is in a different block, it must dominate the current block (otherwise it would need to be a phi), which in turn means the operand's block must be included in the current range, and depths must be available.

There's a test case (no_reassociate_different_block) added in 70520e2f1c5fc4 which shows that we have accurate depths for operands defined in other blocks.

This allows reassociation of code that computes the final reduction value after vectorization, among other things.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D141302
# 002005e6 | 22-Dec-2022 | Hsiangkai Wang <hsiangkai@google.com>

[RISCV] Add integer scalar instructions to isAssociativeAndCommutative

Inspired by D138107.

We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-level parallelism by the existing MachineCombiner pass.

Differential Revision: https://reviews.llvm.org/D140530
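For orientation, a hedged sketch of the mechanism (not the committed patch; the hook's exact signature and the opcode list have varied across LLVM versions): MachineCombiner asks the target which machine opcodes it may reassociate via a TargetInstrInfo hook, so opting in is essentially a switch over opcodes.

    // Hedged sketch: report the scalar integer ops as associative and
    // commutative so MachineCombiner may reassociate them for ILP.
    bool RISCVInstrInfo::isAssociativeAndCommutative(const MachineInstr &Inst,
                                                     bool /*Invert*/) const {
      switch (Inst.getOpcode()) {
      case RISCV::ADD:
      case RISCV::AND:
      case RISCV::OR:
      case RISCV::XOR:
      case RISCV::MUL:
      case RISCV::MIN:
      case RISCV::MINU:
      case RISCV::MAX:
      case RISCV::MAXU:
        return true; // abridged; the real patch covers more cases
      default:
        return false;
      }
    }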
# d64d3c5a | 22-Dec-2022 | Nitin John Raj <nitin.raj@sifive.com>

[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility

SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW have a 3-bit register encoding. They both require the dest to also be one of the sources.

We aggressively form ADDW/SLLIW as it helps hasAllWBitUsers in RISCVISelDAGToDAG to not require recursion. So we need a pass to remove excessive -w suffixes.

Differential Revision: https://reviews.llvm.org/D139948
# 1806ce90 | 06-Dec-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Teach RISCVMatInt to prefer li+slli over lui+addi(w) for compressibility.

With the C extension, li with a 6 bit immediate followed by slli is 4 bytes. The lui+addi(w) sequence is at least 6 bytes.

The two sequences probably have similar execution latency, the exception being if the target supports lui+addi(w) macrofusion.

Since the execution latency is probably the same, I didn't restrict this to the C extension.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139135
# e00e20a0 | 01-Dec-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.

These instructions require both register operands to be compressible, so I've only applied the hint if we already have a GPRC physical register assigned for the other register operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139079
# 64612f5d | 25-Nov-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Add ADD to getRegAllocationHints to improve use of c.add.

add can always be compressed to c.add if one of the sources is the same as the destination.

The same is not true for c.addw, where the registers need to be x8-x15.
# 223f466f | 25-Oct-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Add ORI to hasAllNBitUsers.

If the immediate is negative with sufficient leading ones, then the upper bits of the other operand aren't demanded.
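A standalone check of that demanded-bits claim (plain C++, not LLVM code): or-ing with an immediate whose upper bits are all ones forces those result bits to 1, so the upper bits of the other operand cannot matter.

    // Immediate -256 (0xFFFFFF00): only the low 8 bits of x are demanded.
    #include <cassert>
    #include <cstdint>

    int main() {
      for (uint32_t x = 0; x < (1u << 16); ++x) {
        uint32_t full = x | 0xFFFFFF00u;
        uint32_t low_only = (x & 0xFFu) | 0xFFFFFF00u; // drop the upper bits
        assert(full == low_only);
      }
      return 0;
    }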
# db25f51e | 23-Oct-2022 | Craig Topper <craig.topper@sifive.com>

Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"

This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d.

The AMDGPU/mad_64_32.ll test seems to fail on some of the build bots but passes locally. I'm really confused.
# e8b3ffa5 | 23-Oct-2022 | Craig Topper <craig.topper@sifive.com>

[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))

(sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y.

This pattern shows up when type legalizing wide multiplies involving a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
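A small standalone demonstration of the equivalence (plain C++, not LLVM code; the function names are made up): since the arithmetic shift yields 0 or -1, multiplying by it matches negating the masked value.

    #include <cassert>
    #include <cstdint>

    int32_t via_mul(int32_t x, int32_t y) {
      int32_t m = x >> 31; // arithmetic shift of the sign bit: 0 or -1
      return m * y;
    }

    int32_t via_neg_and(int32_t x, int32_t y) {
      int32_t m = x >> 31;
      return -(m & y);     // m == 0 gives 0; m == -1 gives -y
    }

    int main() {
      int32_t xs[] = {-7, -1, 0, 1, 42};
      int32_t ys[] = {-5, 0, 3, 123456};
      for (int32_t x : xs)
        for (int32_t y : ys)
          assert(via_mul(x, y) == via_neg_and(x, y));
      return 0;
    }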
# ac920975 | 11-Oct-2022 | Craig Topper <craig.topper@sifive.com>

Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"

This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.

Getting lit test failures on AMDGPU but I can't reproduce them so far. Reverting to investigate.
# 0148df81 | 11-Oct-2022 | Craig Topper <craig.topper@sifive.com>

[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))

(sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y.

This pattern shows up when type legalizing wide multiplies involving a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399