Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
9122c523 |
| 15-Nov-2024 |
Pengcheng Wang <wangpengcheng.pp@bytedance.com> |
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional schedu
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register pressure.
Disclaimer: I haven't tested it on many cores, maybe we should make some options being features. I believe downstreams must have tried this before, so feedbacks are welcome.
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
2967e5f8 |
| 11-Oct-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Enable store clustering by default (#73796)
Builds on #73789, enabling store clustering by default using the same
heuristic.
|
#
e45b44c6 |
| 01-Oct-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Add pattern for PACK/PACKH in common misaligned load case (#110644)
PACKH is currently only selected for assembling the first two bytes of a
misligned load. A fairly complex RV32-only patte
[RISCV] Add pattern for PACK/PACKH in common misaligned load case (#110644)
PACKH is currently only selected for assembling the first two bytes of a
misligned load. A fairly complex RV32-only pattern is added for
producing PACKH+PACKH+PACK to assemble the result of a misaligned 32-bit
load.
Another pattern was added that just covers PACKH for shifted offsets 16
and 24, producing a packh and shift to replace two shifts and an 'or'.
This slightly improves RV64IZKBK for a 64-bit load, but fails to match
for the misaligned 32-bit load because the load of the upper byte is
anyext in the SelectionDAG.
I wrote the patch this way because it was quick and easy and has at
least some benefit, but the "right" approach probably merits further
discussion. Introducing target-specific SDNodes for PACK* and having
custom lowering for unaligned load/stores that introduces those nodes
them seems like it might be attractive. However, adding these patterns does provide benefit - so that's what this patch does for now.
show more ...
|
#
14c4f28e |
| 01-Oct-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Enable load clustering by default (#73789)
We believe this is neutral or slightly better in the majority of cases.
|
Revision tags: llvmorg-19.1.1 |
|
#
39b2e35f |
| 01-Oct-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV][test] Precommit tests showing codegen for unaligned load/store with zbkb
We have missed opportunities for selecting pack* instructions, that will be addressed in future patches.
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
90109d44 |
| 14-May-2024 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Improve constant materialisation for stores of i8 negative constants (#92131)
This follows the same pattern as 20e62658735a1b03ecadc. Although we
can't reduce the number of instructions use
[RISCV] Improve constant materialisation for stores of i8 negative constants (#92131)
This follows the same pattern as 20e62658735a1b03ecadc. Although we
can't reduce the number of instructions used, if we are able to use a
sign-extended 6-bit immediate then the 16-bit c.li instruction can be
selected (thus saving code size). Although this _could_ be gated so it
only happens if C is enabled, I've opted not to because at worst it's
neutral and it doesn't seem helpful to add unnecessary divergence
between the RVC and non-RVC paths.
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4 |
|
#
9067070d |
| 16-Apr-2024 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Re-separate unaligned scalar and vector memory features in the backend. (#88954)
This is largely a revert of commit
e81796671890b59c110f8e41adc7ca26f8484d20.
As #88029 shows, there exist
[RISCV] Re-separate unaligned scalar and vector memory features in the backend. (#88954)
This is largely a revert of commit
e81796671890b59c110f8e41adc7ca26f8484d20.
As #88029 shows, there exists hardware that only supports unaligned
scalar.
I'm leaving how this gets exposed to the clang interface to a future
patch.
show more ...
|
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
e8179667 |
| 01-Dec-2023 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971)
When we'd originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under th
[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971)
When we'd originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++ level
interface, and the clang driver command lines have settled on a single
unaligned flag which indicates both scalar and vector support unaligned.
Given that, let's remove the test matrix complexity for a set of
configurations which don't appear useful.
Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?
Note: The immediate trigger for this patch was finding another case
where the unaligned-vector-mem wasn't being properly serialized to IR
from clang which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
8f04d81e |
| 16-Sep-2023 |
Craig Topper <craig.topper@sifive.com> |
[SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore.
If the SRL for Hi constant folds, but we don't remoe those bits from the Lo, we can end up with strange c
[SelectionDAG][RISCV] Mask constants to narrow size in TargetLowering::expandUnalignedStore.
If the SRL for Hi constant folds, but we don't remoe those bits from the Lo, we can end up with strange constant folding through DAGCombine later. I've only seen this with constants being lowered to constant pools during lowering on RISC-V.
show more ...
|
#
17a12a27 |
| 16-Sep-2023 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Add test case to show bad codegen for unaligned i64 store of a large constant.
On the first split we create two i32 trunc stores and a srl to shift the high part down. The srl gets constant
[RISCV] Add test case to show bad codegen for unaligned i64 store of a large constant.
On the first split we create two i32 trunc stores and a srl to shift the high part down. The srl gets constant folded, but to produce a new i32 constant. But the truncstore for the low store still uses the original constant.
This original constant then gets converted to a constant pool before we revisit the stores to further split them. The constant pool prevents further constant folding of the additional srls.
After legalization is done, we run DAGCombiner and get some constant folding of srl via computeKnownBits which can peek through the constant pool load. This can create new constants that also need a constant pool.
show more ...
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
5b95bba6 |
| 24-Jul-2023 |
Luke Lau <luke@igalia.com> |
[RISCV] Set Fast flag for unaligned memory accesses
The +unaligned-scalar-mem and +unaligned-vector-mem features were added in D126085 and D149375 respectively to allow subtargets to indicate that t
[RISCV] Set Fast flag for unaligned memory accesses
The +unaligned-scalar-mem and +unaligned-vector-mem features were added in D126085 and D149375 respectively to allow subtargets to indicate that they supported misaligned loads/stores with "sufficient" performance. This is separate from whether or not the target actually supports misaligned accesses, which could be determined from Zicclsm.
This patch enables the Fast flag under the assumption that any subtarget that declares support for +unaligned-*-mem will want to opt into optimisations that take advantage of misaligned scalar accesses, such as store merging.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D150771
show more ...
|
#
eb3f2fe4 |
| 20-Jul-2023 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Revise check names for unaligned memory op tests [nfc]
This has come up a few times in review; the current ones seem to be universally confusing. Even I as the original author of most of th
[RISCV] Revise check names for unaligned memory op tests [nfc]
This has come up a few times in review; the current ones seem to be universally confusing. Even I as the original author of most of these get confused. Switch to using the SLOW/FAST naming used by x86, hopefully that's a bit clearer.
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
5ab13332 |
| 17-May-2023 |
Luke Lau <luke@igalia.com> |
[RISCV] Add tests for store merging with unaligned scalar access
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D150770
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
8e43c22d |
| 22-Mar-2023 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Use LBU for extloadi8.
The Zcb extension has c.lbu, but not c.lb. This patch makes us prefer LBU over LB if we have a choice which will enable more compression opportunities.
Reviewed By: a
[RISCV] Use LBU for extloadi8.
The Zcb extension has c.lbu, but not c.lb. This patch makes us prefer LBU over LB if we have a choice which will enable more compression opportunities.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D146270
show more ...
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
86eff6be |
| 20-Jan-2023 |
Philip Reames <preames@rivosinc.com> |
[MachineCombiner] Use default latency model when no detailed model available
This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction lat
[MachineCombiner] Use default latency model when no detailed model available
This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.
Previously, we unconditionally applied the first legal pattern - without any consideration of profitability. As a result, this change both prevents some patterns being applied, and changes which patterns are exercised. (i.e. previously the first pattern was applied, afterwards, maybe the second one is because the first wasn't profitable.)
The motivation here is two fold.
First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.
Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain and simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.
Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.
For the test diffs, I don't seen any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.
Differential Revision: https://reviews.llvm.org/D141017
show more ...
|
Revision tags: llvmorg-15.0.7 |
|
#
002005e6 |
| 22-Dec-2022 |
Hsiangkai Wang <hsiangkai@google.com> |
[RISCV] Add integer scalar instructions to isAssociativeAndCommutative
Inspired by D138107.
We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-l
[RISCV] Add integer scalar instructions to isAssociativeAndCommutative
Inspired by D138107.
We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-level parallelism by the existing MachineCombiner pass.
Differential Revision: https://reviews.llvm.org/D140530
show more ...
|
#
d64d3c5a |
| 22-Dec-2022 |
Nitin John Raj <nitin.raj@sifive.com> |
[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility
SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW h
[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility
SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW have a 3-bit register encoding. They both require the dest to also be one of the sources.
We aggressively form ADDW/SLLIW as it helps hasAllWBitUsers in RISCVISelDAGToDAG to not require recursion. So we need a pass to remove excessive -w suffixes.
Differential Revision: https://reviews.llvm.org/D139948
show more ...
|
#
1456b686 |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[RISCV] Convert some tests to opaque pointers (NFC)
|
#
d741a31a |
| 14-Dec-2022 |
Nitin John Raj <nitin.raj@sifive.com> |
[RISCV][CodeGen][SelectionDAG] Recursively check hasAllNBitUsers for logical machine opcodes
We don’t have W versions of AND/OR/XOR/ANDN/ORN/XNOR so we should recursively check their users. We shoul
[RISCV][CodeGen][SelectionDAG] Recursively check hasAllNBitUsers for logical machine opcodes
We don’t have W versions of AND/OR/XOR/ANDN/ORN/XNOR so we should recursively check their users. We should limit the recursion to SelectionDAG::MaxRecursionDepth levels.
We need to add a Depth argument, all existing callers should pass 0 to the Depth. The new recursive calls should increment it by 1. At the top of the function we should give up and return false if Depth >= SelectionDAG::MaxRecursionDepth.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139462
show more ...
|
#
e00e20a0 |
| 01-Dec-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.
These instructions requires both register operands to be compressible so I've only applied the hint if we already have a GPRC physical regis
[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.
These instructions requires both register operands to be compressible so I've only applied the hint if we already have a GPRC physical register assigned for the other register operand.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139079
show more ...
|
Revision tags: llvmorg-15.0.6 |
|
#
a2b5b584 |
| 25-Nov-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Use register allocation hints to improve use of compressed instructions.
Compressed instructions usually require one of the source registers to also be the source register. The register allo
[RISCV] Use register allocation hints to improve use of compressed instructions.
Compressed instructions usually require one of the source registers to also be the source register. The register allocator doesn't have that bias on its own.
This patch adds register allocation hints to introduce this bias. I've started with ADDI, ADDIW, and SLLI. These all have a 5-bit field for the register. If the source and dest register are the same they are guaranteed to compress as long as the immediate is also 6 bits.
This code was inspired by similar code from the SystemZ target.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D138242
show more ...
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4 |
|
#
78739fdb |
| 29-Oct-2022 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[DAG] Enable combineShiftOfShiftedLogic folds after type legalization
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which
[DAG] Enable combineShiftOfShiftedLogic folds after type legalization
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.
Fixes #57872
Differential Revision: https://reviews.llvm.org/D136042
show more ...
|
Revision tags: llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5 |
|
#
8a3b6ba7 |
| 26-May-2022 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a proc
[RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.
There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment.
Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.
Differential Revision: https://reviews.llvm.org/D126085
show more ...
|
Revision tags: llvmorg-14.0.4 |
|
#
3e5b1e9c |
| 20-May-2022 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Add test showing codegen for unaligned loads and stores of scalar types
|