Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9122c523 | 15-Nov-2024 | Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)

This is based on other targets like PPC/AArch64 and some experiments.

This PR only enables bidirectional scheduling and register pressure tracking.

Disclaimer: I haven't tested it on many cores; maybe we should turn some of these options into features. I believe downstreams must have tried this before, so feedback is welcome.
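As a rough, hypothetical sketch of the mechanism (not the patch itself, and assuming the upstream MachineSchedPolicy fields): a target opts in by adjusting the scheduling policy it hands to the generic MachineScheduler.

```cpp
// Hypothetical helper showing the policy flags involved; upstream this logic
// would live in the subtarget's overrideSchedPolicy hook.
#include "llvm/CodeGen/MachineScheduler.h"

static void enableBidirectionalWithPressure(llvm::MachineSchedPolicy &Policy) {
  Policy.OnlyTopDown = false;        // don't restrict scheduling to top-down...
  Policy.OnlyBottomUp = false;       // ...or to bottom-up: consider both directions
  Policy.ShouldTrackPressure = true; // track register pressure while scheduling
}
```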
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 2967e5f8 | 11-Oct-2024 | Alex Bradbury <asb@igalia.com>

[RISCV] Enable store clustering by default (#73796)

Builds on #73789, enabling store clustering by default using the same heuristic.
# 14c4f28e | 01-Oct-2024 | Alex Bradbury <asb@igalia.com>

[RISCV] Enable load clustering by default (#73789)

We believe this is neutral or slightly better in the majority of cases.
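For context, a hypothetical sketch (not the exact upstream change) of how load/store clustering is typically attached to the machine scheduler: the generic cluster DAG mutations are registered on the scheduling DAG so that memory operations touching adjacent addresses end up scheduled back to back.

```cpp
// Hypothetical wiring; upstream, mutations like these are added when the
// target constructs its machine scheduler.
#include "llvm/CodeGen/MachineScheduler.h"

using namespace llvm;

static void addClusterMutations(ScheduleDAGMILive &DAG) {
  DAG.addMutation(createLoadClusterDAGMutation(DAG.TII, DAG.TRI));  // cluster loads
  DAG.addMutation(createStoreClusterDAGMutation(DAG.TII, DAG.TRI)); // cluster stores
}
```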
Revision tags: llvmorg-19.1.1
# 3e0a76b1 | 23-Sep-2024 | futog <54807384+futog@users.noreply.github.com>

[Codegen][LegalizeIntegerTypes] Improve shift through stack (#96151)

Minor improvement on cc39c3b17fb2598e20ca0854f9fe6d69169d85c7.

Use an aligned stack slot to store the shifted value. Use the native register width as the shifting unit, so the load of the shift result is aligned.

If the shift amount is a multiple of the native register width, there is no need to do a follow-up shift after the load. I added new tests for these cases.

Co-authored-by: Gergely Futo <gergely.futo@hightec-rt.com>
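A self-contained sketch of the refinement (plain C++, not the legalizer code; the 128-bit struct and helper name are made up): indexing the slot in whole native registers keeps the reload aligned, and the follow-up shift disappears whenever the amount is a multiple of the register width.

```cpp
#include <cstdint>
#include <cstring>

struct U128 { uint64_t lo, hi; };              // little-endian 128-bit value

// Logical right shift by amt (0 <= amt < 128) through an aligned stack slot.
U128 lshr_aligned_slot(U128 v, unsigned amt) {
  alignas(16) uint64_t slot[4] = {0, 0, 0, 0}; // 2x the width, zero padding
  std::memcpy(slot, &v, sizeof(v));            // value in the low half
  unsigned wordOff = amt / 64;                 // index in whole registers: aligned load
  unsigned rem = amt % 64;                     // follow-up shift, zero if amt % 64 == 0
  U128 r{slot[wordOff], slot[wordOff + 1]};
  if (rem) {
    r.lo = (r.lo >> rem) | (r.hi << (64 - rem));
    r.hi = r.hi >> rem;
  }
  return r;
}
```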
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 7b3bbd83 | 09-Oct-2023 | Jay Foad <jay.foad@amd.com>

Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"

This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
# 2501ae58 | 09-Oct-2023 | Jay Foad <jay.foad@amd.com>

[CodeGen] Really renumber slot indexes before register allocation (#67038)

PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# e0919b18 | 13-Sep-2023 | Jay Foad <jay.foad@amd.com>

[CodeGen] Renumber slot indexes before register allocation (#66334)

RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.
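A toy illustration of the heuristic this feeds (standalone C++, not LLVM code; the 16-unit per-instruction spacing is an assumption about the default numbering, not something stated in the commit):

```cpp
#include <cassert>

int main() {
  const int InstrDist = 16;            // assumed default per-instruction spacing
  // B immediately follows A, but an erased instruction's index still sits in
  // between, so the numeric gap spans two instruction slots.
  int A = 0, B = 2 * InstrDist;
  assert((B - A) / InstrDist == 2);    // approximate distance is over-estimated
  // After renumbering with the default distance, the estimate is exact again.
  B = InstrDist;
  assert((B - A) / InstrDist == 1);
  return 0;
}
```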
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 37b474a2 | 24-Jul-2023 | Jim Lin <jim@andestech.com>

[RISCV] Remove unused check prefixes for tests. NFC

Also remove the warning line saying that these prefixes are unused.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156048
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2
# b6ea46fe | 07-Apr-2023 | LiaoChunyu <chunyu@iscas.ac.cn>

[RISCV] Add DAG combine to fold (sub 0, (setcc x, 0, setlt)) -> (sra x, xlen - 1)

The result of sub + setcc is 0 or 1 in every bit (i.e. all-zeros or all-ones). The sra instruction gives the same result.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D147538
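A standalone check of the identity behind the fold (plain C++ assuming a 64-bit XLEN; `>>` on a negative signed value is an arithmetic shift on two's-complement targets, guaranteed since C++20):

```cpp
#include <cassert>
#include <cstdint>

// -(x < 0) materializes 0 or -1, exactly what sra x, xlen-1 produces.
static int64_t mask_via_setcc_sub(int64_t x) { return -static_cast<int64_t>(x < 0); }
static int64_t mask_via_sra(int64_t x) { return x >> 63; }

int main() {
  const int64_t vals[] = {INT64_MIN, -7, 0, 42, INT64_MAX};
  for (int64_t x : vals)
    assert(mask_via_setcc_sub(x) == mask_via_sra(x));
  return 0;
}
```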
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 86eff6be | 20-Jan-2023 | Philip Reames <preames@rivosinc.com>

[MachineCombiner] Use default latency model when no detailed model available

This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.

Previously, we unconditionally applied the first legal pattern, without any consideration of profitability. As a result, this change both prevents some patterns from being applied and changes which patterns are exercised (i.e. previously the first pattern was applied; afterwards, maybe the second one is, because the first wasn't profitable).

The motivation here is twofold.

First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.

Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain, simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.

Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.

For the test diffs, I don't see any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.

Differential Revision: https://reviews.llvm.org/D141017
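The reassociation shapes described above, as a self-contained illustration (plain C++, not compiler output): rotating the chain keeps a serial dependency, while the partially flattened tree exposes two independent adds.

```cpp
#include <cassert>
#include <cstdint>

uint64_t serial(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return ((a + b) + c) + d;   // depth 3: each add waits on the previous one
}
uint64_t rotated(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return ((b + c) + d) + a;   // still depth 3, just a different serial chain
}
uint64_t flattened(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (a + b) + (c + d);   // depth 2: the inner adds are independent
}

int main() {
  assert(serial(1, 2, 3, 4) == rotated(1, 2, 3, 4));
  assert(serial(1, 2, 3, 4) == flattened(1, 2, 3, 4));
  return 0;
}
```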
# cc39c3b1 | 14-Jan-2023 | Roman Lebedev <lebedev.ri@gmail.com>

[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack

https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas that have variably-indexed loads. That does bring up questions of cost model, since that requires creating wide shifts.

Indeed, our legalization for them is not optimal. We either split it into parts, or lower it into a libcall. But if the shift amount is a multiple of CHAR_BIT, we can also legalize it through the stack.

The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type.
2. Store the value we are shifting into one half of the slot.
3. Pad the other half of the slot: with zero for logical shifts, with the sign bit for arithmetic shifts.
4. Index into the slot (starting from the base half into which we spilled, either upwards or downwards).
5. Load.
6. Split the loaded integer.

This works for both little-endian and big-endian machines: https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT, we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K

I think, if we are going to perform shift->shift-by-parts expansion more than once, we should instead go through the stack, which is what this patch does.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140638
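A self-contained sketch of the recipe for a 128-bit logical right shift on a little-endian machine (plain C++, not the legalizer code; the struct and helper name are made up):

```cpp
#include <cstdint>
#include <cstring>

struct U128 { uint64_t lo, hi; };             // little-endian 128-bit value

// Logical right shift by amt (0 <= amt < 128).
U128 lshr_through_stack(U128 v, unsigned amt) {
  unsigned char slot[32] = {0};               // steps 1 + 3: 2x-wide slot, zero padding
  std::memcpy(slot, &v, sizeof(v));           // step 2: spill the value into the low half
  unsigned byteOff = amt / 8;                 // step 4: byte index into the slot
  unsigned rem = amt % 8;
  U128 r;
  std::memcpy(&r, slot + byteOff, sizeof(r)); // steps 5 + 6: reload both halves
  if (rem) {                                  // follow-up shift by the sub-byte remainder
    r.lo = (r.lo >> rem) | (r.hi << (64 - rem));
    r.hi = r.hi >> rem;
  }
  return r;
}
```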
# 20ecc079 | 13-Jan-2023 | Florian Hahn <flo@fhahn.com>

[MachineCombiner] Lift same-bb restriction for reassociable ops.

This patch relaxes the restriction that both reassociable operands must be in the same block as the root instruction.

The comment indicates that the reason for this restriction was that operands not in the same block won't have a depth in the trace.

I believe this is outdated; if the operand is in a different block, it must dominate the current block (otherwise it would need to be a phi), which in turn means the operand's block must be included in the current trace, and depths must be available.

There's a test case (no_reassociate_different_block) added in 70520e2f1c5fc4 which shows that we have accurate depths for operands defined in other blocks.

This allows reassociation of code that computes the final reduction value after vectorization, among other things.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D141302
Revision tags: llvmorg-15.0.7
# a63b7247 | 30-Dec-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Use SUB instead of XOR in lowerShiftLeftParts/lowerShiftRightParts.

isel is now capable of turning the SUB into XOR for shift amounts, though it uses NOT instead of XOR with ShiftSize-1.

By using SUB during lowering we enable more DAG combines with other arithmetic on the shift amount.
# e50976e5 | 29-Dec-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Teach RISCVDAGToDAGISel::selectShiftMask to bypass adds with constant.

If the shift amount is (add X, C) where C is 0 modulo the size of the shift, we can bypass the add.

Similar to other targets like AArch64 and X86.
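The identity being exploited, as a standalone check (plain C++ modelling RV32's 5-bit shift amount with an explicit mask, since plain C++ shifts by amounts >= 32 would be undefined):

```cpp
#include <cassert>
#include <cstdint>

// RV32 shifts read only the low 5 bits of the amount register.
static uint32_t sll_rv32(uint32_t x, uint32_t amt) { return x << (amt & 31); }

int main() {
  const uint32_t C = 32;                 // C is 0 modulo the shift size
  for (uint32_t amt = 0; amt < 64; ++amt)
    assert(sll_rv32(0xDEADBEEFu, amt + C) == sll_rv32(0xDEADBEEFu, amt));
  return 0;
}
```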
# 002005e6 | 22-Dec-2022 | Hsiangkai Wang <hsiangkai@google.com>

[RISCV] Add integer scalar instructions to isAssociativeAndCommutative

Inspired by D138107.

We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-level parallelism via the existing MachineCombiner pass.

Differential Revision: https://reviews.llvm.org/D140530
# 6357b637 | 28-Dec-2022 | Craig Topper <craig.topper@sifive.com>

[RISCV] Add RISCV::XORI to RISCVDAGToDAGISel::hasAllNBitUsers.
# a9fbf25a | 24-Dec-2022 | Roman Lebedev <lebedev.ri@gmail.com>

[NFC][Codegen] Rename tests for oversized shifts by byte multiple