Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
9122c523 |
| 15-Nov-2024 |
Pengcheng Wang <wangpengcheng.pp@bytedance.com> |
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register pressure.
Disclaimer: I haven't tested it on many cores; maybe we should turn some of these options into features. I believe downstreams must have tried this before, so feedback is welcome.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
2967e5f8 |
| 11-Oct-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Enable store clustering by default (#73796)
Builds on #73789, enabling store clustering by default using the same
heuristic.
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
5ce067d5 |
| 11-Jan-2024 |
Philip Reames <preames@rivosinc.com> |
Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus re-enables terminator folding for RISCV. The reported miscompile has been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
|
#
fdb87640 |
| 27-Dec-2023 |
Craig Topper <craig.topper@sifive.com> |
[LSR][TTI][RISCV] Disable terminator folding for RISC-V.
This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe.
It caused a miscompile in downstream testing.
Spoke with Philip offline. We believe the issue is that LSR needs to make sure the Step of the other AddRec is non-zero. Reverting until Philip is back from vacation.
|
#
ffb2af3e |
| 07-Dec-2023 |
Philip Reames <preames@rivosinc.com> |
[SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431)
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison-generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to reinfer
the flags on the CSE candidate if legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
|
#
eecb99c5 |
| 05-Dec-2023 |
Nikita Popov <npopov@redhat.com> |
[Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.
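The arithmetic fact behind treating a disjoint "or" as an "add" can be checked with a minimal sketch. The helper name `has_no_common_bits` is hypothetical and is not part of SCEV or ValueTracking; it only illustrates the property the flag asserts.

```python
def has_no_common_bits(a: int, b: int) -> bool:
    """True when a and b share no set bits, the property the disjoint flag asserts."""
    return (a & b) == 0


# With no common bits there are no carries, so "or" and "add" agree.
a, b = 0b1000, 0b0011
disjoint_case = has_no_common_bits(a, b) and (a | b) == (a + b)

# With a shared bit, the add produces a carry and the results differ.
c, d = 0b0110, 0b0011
overlapping_case = (not has_no_common_bits(c, d)) and (c | d) != (c + d)
```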
|
#
e947f953 |
| 29-Nov-2023 |
Philip Reames <preames@rivosinc.com> |
[LSR][TTI][RISCV] Enable terminator folding for RISC-V
If looking for a miscompile revert candidate, look here!
The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one.
On RISC-V, which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count-down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%.
Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop.
Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky.
This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.
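As a rough illustration of the trade-off described in this message, the sketch below models the two loop shapes in Python. It is not LSR's implementation; the function names and the `iv_updates` counter are hypothetical and only count induction-variable updates inside the loop body.

```python
def sum_with_countdown_iv(data):
    """Before folding: a count-down primary IV exists only to drive the exit test."""
    n = len(data)
    if n == 0:
        return 0, 0
    i, total, iv_updates = 0, 0, 0
    while True:
        total += data[i]
        i += 1           # secondary IV update
        iv_updates += 1
        n -= 1           # primary IV update, otherwise dead
        iv_updates += 1
        if n == 0:       # beqz-style exit test
            break
    return total, iv_updates


def sum_with_folded_terminator(data):
    """After folding: compare the secondary IV against a loop-invariant exit value."""
    n = len(data)
    if n == 0:
        return 0, 0
    end = 0 + n          # exit value computed once, in the preheader
    i, total, iv_updates = 0, 0, 0
    while True:
        total += data[i]
        i += 1           # only remaining IV update
        iv_updates += 1
        if i == end:     # beq against a value kept live through the loop
            break
    return total, iv_updates
```

Both functions compute the same sum; the folded form performs one IV update per iteration instead of two, at the cost of keeping `end` live across the loop.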
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5 |
|
#
d64d5ea1 |
| 13-Nov-2023 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053)
After #71534 and #72052, the transform `zext -> zext nneg` in
`RISCVCodeGenPrepare` is redundant.
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3 |
|
#
86240751 |
| 06-Oct-2023 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Strip W suffix from ADDIW (#68425)
The motivation of this change is simply to reduce test duplication. As
can be seen in the (massive) test delta, we have many tests whose output
differs only due to the use of addi on rv32 vs addiw on rv64 when the
high bits are don't care.
As an aside, we don't need to worry about the non-zero immediate
restriction on the compressed variants because we're not directly
forming the compressed variants. If we happen to get a zero immediate
for the ADDI, then either a later optimization will strip the useless
instruction or the encoder is responsible for not compressing the
instruction.
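The "high bits are don't care" condition can be illustrated with a small model of the two instructions on a 64-bit register. This is a sketch of the arithmetic only, with hypothetical helper names; it does not reflect how the backend decides that the high bits are unused.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def addi(x: int, imm: int) -> int:
    """Full 64-bit add, result kept as an unsigned 64-bit pattern."""
    return (x + imm) & MASK64


def addiw(x: int, imm: int) -> int:
    """32-bit add whose result is sign-extended to 64 bits."""
    low = (x + imm) & MASK32
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64


def low32(v: int) -> int:
    return v & MASK32


# The low 32 bits always agree, so the instructions are interchangeable
# whenever no user reads the upper 32 bits of the result.
samples = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF, 0x123456789ABCDEF0]
imms = [-2048, -1, 1, 2047]
low_bits_agree = all(
    low32(addi(x, i)) == low32(addiw(x, i)) for x in samples for i in imms
)
```

The full 64-bit results can still differ, for example for `x = 0x7FFFFFFF` and `imm = 1`, which is why the substitution is only valid when the high bits are unused.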
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6 |
|
#
a2b5b584 |
| 25-Nov-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Use register allocation hints to improve use of compressed instructions.
Compressed instructions usually require one of the source registers to also be the destination register. The register allocator doesn't have that bias on its own.
This patch adds register allocation hints to introduce this bias. I've started with ADDI, ADDIW, and SLLI. These all have a 5-bit field for the register. If the source and dest register are the same they are guaranteed to compress as long as the immediate also fits in 6 bits.
This code was inspired by similar code from the SystemZ target.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D138242
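A simplified model of the compressibility condition the hints aim for might look like the sketch below. The function `can_compress` is hypothetical, is not the backend's code, and assumes RV64C; the extra restrictions on x0 and zero immediates reflect one reading of the compressed-instruction encodings and should be checked against the ISA manual.

```python
def fits_simm6(imm: int) -> bool:
    """True if imm fits in a 6-bit signed immediate."""
    return -32 <= imm <= 31


def can_compress(opcode: str, rd: int, rs1: int, imm: int) -> bool:
    """Simplified RV64C check; registers are given as numbers 0..31."""
    # The compressed forms have a single register field, so rd must equal rs1.
    # Assumption: the x0 encodings are reserved or hints, so they are excluded.
    if rd != rs1 or rd == 0:
        return False
    if opcode == "ADDI":
        return imm != 0 and fits_simm6(imm)
    if opcode == "ADDIW":
        return fits_simm6(imm)
    if opcode == "SLLI":
        return 0 < imm <= 63  # 6-bit unsigned shift amount on RV64
    return False
```

For example, `can_compress("ADDI", 10, 10, 4)` holds, while the same instruction with `rs1 = 11` does not, which is the case a register allocation hint tries to avoid.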
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
8cc48309 |
| 17-Jul-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1)))
If X is known positive by a dominating condition, we can fill in ones into the upper bits of C1 if that would allow it to become an simm12 allowing the use of ANDI.
This pattern often occurs in unrolled loops where the induction variable has been widened.
To get the best benefit from this, I had to move the pass above ConstantHoisting which is in addIRPasses. Otherwise the AND constant is often hoisted away from the AND.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129888
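The constant rewrite can be sketched as follows. When the extended value has zero upper 32 bits (for sext this relies on X being non-negative), the upper 32 bits of C1 do not affect the result, so they can be set to ones. The helper `widen_and_mask` is hypothetical and is not the pass's actual code.

```python
MASK64 = (1 << 64) - 1
UPPER32 = 0xFFFFFFFF00000000


def to_signed64(v: int) -> int:
    """Interpret a 64-bit pattern as a signed value."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def is_simm12(v: int) -> bool:
    return -2048 <= v <= 2047


def widen_and_mask(c1: int):
    """Return an ANDI-compatible immediate equivalent to c1, or None.

    Assumes the other AND operand has zero upper 32 bits.
    """
    if is_simm12(to_signed64(c1)):
        return to_signed64(c1)
    filled = to_signed64(c1 | UPPER32)  # fill ones into the upper bits
    return filled if is_simm12(filled) else None
```

With `c1 = 0xFFFFFFF0` the sketch yields `-16`, which fits ANDI, whereas `c1 = 0xFFFF0000` still does not fit after filling and is left alone.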
|
#
1a8468ba |
| 14-Jul-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Add a RISCV specific CodeGenPrepare pass.
Initial optimization is to convert (i64 (zext (i32 X))) to (i64 (sext (i32 X))) if the dominating condition for the basic block guaranteed the sign bit of X is zero.
This frequently occurs in loop preheaders where a signed induction variable that can never be negative has been widened. There will be a dominating check that the 32-bit trip count isn't negative or zero. The check here is not restricted to that specific case though.
An i32->i64 sext is cheaper than zext on RV64 without the Zba extension. Later optimizations can often remove the sext from the preheader basic block because the dominating block also needs a sext to evaluate the greater than 0 check.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129732
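Why the zext can be replaced by a sext when the sign bit of X is known to be zero can be seen from a small bit-level model. The helper names are hypothetical; results are kept as unsigned 64-bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def zext32_to_64(x: int) -> int:
    """Zero-extend the low 32 bits of x to 64 bits."""
    return x & MASK32


def sext32_to_64(x: int) -> int:
    """Sign-extend the low 32 bits of x to 64 bits."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32
    return x & MASK64


# Sign bit clear: both extensions produce the same 64-bit value.
same_when_non_negative = all(
    zext32_to_64(x) == sext32_to_64(x) for x in (0, 1, 42, 0x7FFFFFFF)
)

# Sign bit set: the results differ, so the rewrite needs the dominating check.
differ_when_negative = zext32_to_64(0x80000000) != sext32_to_64(0x80000000)
```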
|