Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
9122c523 |
| 15-Nov-2024 |
Pengcheng Wang <wangpengcheng.pp@bytedance.com> |
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register pressure.
Disclaimer: I haven't tested it on many cores; maybe we should turn some of these options into features. I believe downstream users must have tried this before, so feedback is welcome.
|
#
08411c85 |
| 06-Nov-2024 |
Gergely Futo <gergely.futo@hightec-rt.com> |
[RISCV] Correct fcopysign pattern for zdinx (#114954)
Correcting the pattern fixes the following error:
fatal error: error in backend: Cannot select: t17: f64 = fcopysign t5, t8
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
9a1eded9 |
| 03-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Custom legalize f16/bf16 FCOPYSIGN with Zfhmin/Zbfmin. (#107039)
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
Similar to what we did for #106886 for FNEG and FABS. Special care is
needed to handle the Sign operand being a different type.
|
#
3bdec313 |
| 01-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Custom legalize f16/bf16 FNEG/FABS with Zfhmin/Zbfmin. (#106886)
The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
eabaee0c |
| 07-Jan-2024 |
Fangrui Song <i@maskray.me> |
[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)
R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530 `call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not useful and can be removed now (matching AArch64 and PowerPC).
The GNU assembler has assembled `call foo` to R_RISCV_CALL_PLT since 2022-09 (70f35d72ef04cd23771875c1661c9975044a749c).
Without this patch, unconditionally changing MO_CALL to MO_PLT could create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler and GNU assembler.
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
ee34fa00 |
| 06-Jul-2023 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Add DAG combine for (fmv_w_x_rv64 (fmv_x_anyextw_rv64 X))
This pattern started showing up more after D151284
|
#
5ba40c7b |
| 30-Jun-2023 |
Alex Bradbury <asb@igalia.com> |
[RISCV] Custom lower FP_TO_FP16 and FP16_TO_FP to correct ABI of of libcall
As introduced in D99148, RISC-V uses the softPromoteHalf legalisation for fp16 values without zfh, with logic ensuring that f16 values are passed in lower bits of FPRs (see D98670) when F or D support is present. This legalisation produces ISD::FP_TO_FP16 and ISD::FP16_TO_FP nodes which (as described in ISDOpcodes.h) provide a "semi-softened interface for dealing with f16 (as an i16)". i.e. the return type of the FP_TO_FP16 is an integer rather than a float (and the arg of FP16_TO_FP is an integer). The remainder of the description focuses primarily on FP_TO_FP16 for ease of explanation.
FP_TO_FP16 is lowered to a libcall to `__truncsfhf2 (float)` or `__truncdfhf2 (double)`. As of D92241, `_Float16` is used as the return type of these libcalls if the host compiler accepts `_Float16` in a test input (i.e. dst_t is set to `_Float16`). `_Float16` is enabled for the RISC-V target as of D105001 and so the return value should be passed in an FPR on hard float ABIs.
This patch fixes the ABI issue in what appears to be a minimally invasive way - leaving the softPromoteHalf logic undisturbed, and lowering FP_TO_FP16 to an f32-returning libcall, converting its result to an XLen integer value.
As can be seen in the test changes, the custom lowering for FP16_TO_FP means the libcall is no longer tail-callable.
Although this patch fixes the issue, there are two open items:
* Redundant fmv.x.w and fmv.w.x pairs are now sometimes produced during lowering (not a correctness issue).
* No coverage for STRICT variants of FP16 conversion opcodes.
Differential Revision: https://reviews.llvm.org/D151284
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
7b0c4184 |
| 28-Mar-2023 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Move compressible registers to the beginning of the FP allocation order.
We don't have very many compressible FP instructions, just load and store. These instructions require the FP register to be f8-f15.
This patch changes the FP allocation order to prioritize f10-f15 first. These are also the FP argument registers. So I allocated them in reverse order starting at f15 to avoid taking the first argument registers. This appears to match gcc allocation order.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D146488
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
86eff6be |
| 20-Jan-2023 |
Philip Reames <preames@rivosinc.com> |
[MachineCombiner] Use default latency model when no detailed model available
This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.
Previously, we unconditionally applied the first legal pattern - without any consideration of profitability. As a result, this change both prevents some patterns being applied, and changes which patterns are exercised. (i.e. previously the first pattern was applied, afterwards, maybe the second one is because the first wasn't profitable.)
The motivation here is two fold.
First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.
Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain, simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.
Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.
For the test diffs, I don't see any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.
Differential Revision: https://reviews.llvm.org/D141017
|
Revision tags: llvmorg-15.0.7 |
|
#
002005e6 |
| 22-Dec-2022 |
Hsiangkai Wang <hsiangkai@google.com> |
[RISCV] Add integer scalar instructions to isAssociativeAndCommutative
Inspired by D138107.
We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-level parallelism by the existing MachineCombiner pass.
Differential Revision: https://reviews.llvm.org/D140530
|
#
7b50c183 |
| 30-Nov-2022 |
Monk Chiang <monk.chiang@sifive.com> |
[RISCV] Codegen support for Zfhmin.
The Zfhmin subset only has FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S. If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included. Since most instructions are not included in Zfhmin, most operations are promoted. The patch is primarily about making f16 a legal type.
RISC-V ISA info: https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139391
|
#
a8c79121 |
| 30-Nov-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Teach getRegAllocationHints about compressible SRAI/SRLI.
Similar to previous patches for ADDI/ADDIW/SLLI/ADD, but restricted to only cases where the register is x8-x15(GPRC reg class).
I've restricted it so that we can be precise about whether the resulting instruction would be compressible. Changing the register allocation may make some other instruction not compressible so we should try to be accurate.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D138740
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
7cbfb4eb |
| 29-Jun-2022 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Select (srl (and X, C2) as (slli (srliw X, C3), C3-C).
If C2 has 32 leading zeros and C3 trailing zeros.
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
6a54776f |
| 16-Mar-2022 |
Haocong.Lu <Haocong.Lu@streamcomputing.com> |
[RISCV] Select SRLI+SLLI for AND with leading ones mask
Select SRLI+SLLI for and i64 %x, imm if the imm is a leading ones mask. It's useful in RV64 when the mask exceeds simm32 (cannot be generated by LUI).
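The equivalence behind this selection can be checked with a small model. This is a minimal sketch, not the LLVM isel code: the helper names `is_leading_ones_mask` and `and_via_shifts` are hypothetical, and 64-bit registers are emulated with Python integers masked to 64 bits.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register

def is_leading_ones_mask(imm: int) -> bool:
    """True if imm looks like 1...10...0 with at least one trailing zero."""
    imm &= MASK64
    inverted = ~imm & MASK64  # becomes a trailing-ones mask 0...01...1
    return imm != 0 and inverted != 0 and (inverted & (inverted + 1)) == 0

def and_via_shifts(x: int, imm: int) -> int:
    """Model `and x, imm` as SRLI followed by SLLI (hypothetical helper)."""
    assert is_leading_ones_mask(imm)
    k = (~imm & MASK64).bit_length()  # number of trailing zeros in imm
    srli = (x & MASK64) >> k          # logical shift right clears the low k bits
    return (srli << k) & MASK64       # shift back into place
```

For example, `and_via_shifts(x, 0xFFFFFFFF00000000)` gives the same result as `x & 0xFFFFFFFF00000000` while needing no constant materialization.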
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121598
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5 |
|
#
a9d5bb92 |
| 06-Apr-2021 |
Kito Cheng <kito.cheng@sifive.com> |
[RISCV] Use __extendhfsf2/__truncsfhf2 for fp16 <-> fp32
`__gnu_h2f_ieee` and `__gnu_f2h_ieee` were introduced by ARM and set as the default names for fp16 and fp32 conversion in LLVM.
However, RISC-V GCC uses the default naming scheme, `__extendhfsf2` and `__truncsfhf2`, which causes a runtime ABI incompatibility.
Although we don't yet have a formal runtime ABI spec specifying this naming convention, I think it would be great to fix the incompatibility first.
I plan to create a runtime ABI spec under the psABI spec this year.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118207
|
#
a0a76fee |
| 15-Jan-2022 |
Shao-Ce SUN <shaoce@nj.iscas.ac.cn> |
[RISCV] update zfh and zfhmin extention to v1.0
`zfh` and `zfhmin` have been ratified, with version 1.0.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117098
|
#
bd653f64 |
| 11-Jan-2022 |
Haocong.Lu <Haocong.Lu@streamcomputing.com> |
[RISCV] Use shift for zero extension when Zbb and Zbp are not enabled
Now AND is used for zero extension when both Zbb and Zbp are not enabled. It may be better to use shift operation if the trailing ones mask exceeds simm12.
This patch optimizes LUI+ADDI+AND to SLLI+SRLI.
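The shift-based zero extension can be modeled in a few lines. This is a minimal sketch assuming a 64-bit register; `zext_via_shifts` is a hypothetical helper name, not something from the patch.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register

def zext_via_shifts(x: int, width: int) -> int:
    """Model zero extension of the low `width` bits as SLLI followed by SRLI."""
    assert 0 < width < 64
    shift = 64 - width
    slli = (x << shift) & MASK64  # push the unwanted high bits out of the register
    return slli >> shift          # logical shift right brings zeros back in
```

For a 16-bit zero extension the AND mask would be 0xFFFF, which does not fit in a 12-bit signed immediate, so the two shifts avoid building the mask in a register first.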
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116720
|
#
137d3474 |
| 16-Nov-2021 |
Hsiangkai Wang <kai.wang@sifive.com> |
[RISCV] Reverse the order of loading/storing callee-saved registers.
Currently, we restore the return address register as the last restoring instruction in the epilog. The next instruction is usually `ret`, which uses the return address register. Some microarchitectures have a load-to-use data hazard. To avoid it, we can separate the load instruction from its use as far as possible. In this patch, we reverse the order of restoring callee-saved registers to increase the distance between `load ra` and `ret` in the epilog.
Differential Revision: https://reviews.llvm.org/D113967
|
#
af0ecfcc |
| 22-Nov-2021 |
wangpc <pc.wang@linux.alibaba.com> |
[RISCV] Generate pseudo instruction li
Add an alias of `addi [x], zero, imm` to generate pseudo instruction li, which makes assembly much more readable. Existing tests can be updated by running the script `llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692
|
#
1b941745 |
| 26-Aug-2021 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.
This can help remove some sext.w. There are some regressions on some bswap tests, but I have an idea how to fix that for a follow up.
A new PACKW pattern is added to handle the new sext_inreg placement.
Differential Revision: https://reviews.llvm.org/D108663
|
#
010f0f00 |
| 27-Jun-2021 |
Craig Topper <craig.topper@sifive.com> |
Revert "[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions."
I thought this might help with another optimization I was thinking about, but I don't think it will. So it just wastes compile time calling computeKnownBits for no benefit.
This reverts commit 81b2f95971edd47a0057ac4a77b674d7ea620c01.
|
#
81b2f959 |
| 26-Jun-2021 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions.
|
#
dbbc95e3 |
| 01-Apr-2021 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat.
The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits.
This patch switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749
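The difference between the two strategies shows up in a short numeric model. This is a sketch of the rounding behavior only, not of the legalizer: the function names are hypothetical, and Python's `struct` module is used to round values to IEEE half and single precision.

```python
import struct

def round_to_half(x: float) -> float:
    """Round a Python float to IEEE binary16 and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def round_to_single(x: float) -> float:
    """Round a Python float to IEEE binary32 and back."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def sum_promote_float(values):
    """PromoteFloat-like model: keep the running value in single precision
    and round to half only once at the end."""
    acc = 0.0
    for v in values:
        acc = round_to_single(acc + round_to_half(v))
    return round_to_half(acc)

def sum_soft_promote_half(values):
    """softPromoteHalf-like model: promote, add in single precision,
    then round back to half after every operation."""
    acc = 0.0
    for v in values:
        acc = round_to_half(round_to_single(acc + round_to_half(v)))
    return acc
```

With the inputs `[2048.0, 1.0, 1.0]` the first model carries 2049 in single precision and ends at 2050, while the second rounds 2049 back to 2048 after each addition, so the two results differ.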
I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D99148
|
#
d61b40ed |
| 01-Apr-2021 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Improve 64-bit integer materialization for some cases.
This adds a new integer materialization strategy mainly targeted at 64-bit constants like 0xffffffff where there are 32 or more trailing ones with leading zeros. We can materialize these by using an addi -1 and srli to restore the leading zeros. This matches what gcc does.
I haven't limited to just these cases though. The implementation here takes the constant, shifts out all the leading zeros and shifts ones into the LSBs, creates the new sequence, adds an srli, and checks if this is shorter than our original strategy.
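The simplest case of this strategy, a constant made of leading zeros followed by trailing ones, can be sketched as follows. This is an illustration only: it covers just the `addi -1` plus `srli` pair and not the general search described above, and `materialize_trailing_ones` and `run` are hypothetical names rather than LLVM functions.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register

def materialize_trailing_ones(value: int):
    """Return an (op, imm) sequence for constants of the form 0...01...1,
    or None when the constant does not have that shape."""
    value &= MASK64
    if value == 0 or (value & (value + 1)) != 0:
        return None
    leading_zeros = 64 - value.bit_length()
    # addi rd, zero, -1 sets every bit; srli restores the leading zeros.
    return [("addi", -1), ("srli", leading_zeros)]

def run(sequence) -> int:
    """Tiny interpreter for the two ops used above, starting from zero."""
    reg = 0
    for op, imm in sequence:
        if op == "addi":
            reg = (reg + imm) & MASK64
        elif op == "srli":
            reg = reg >> imm
    return reg
```

For `0xffffffff` this yields `addi -1` followed by `srli 32`, and running the sequence reproduces the constant.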
I've separated the recursive portion into a standalone function so I could append the new strategy outside of the recursion. Since external users are no longer using the recursive function, I've cleaned up the external interface to return the sequence instead of taking a vector by reference.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D98821
|
Revision tags: llvmorg-12.0.0-rc4 |
|
#
a33fcafa |
| 30-Mar-2021 |
Craig Topper <craig.topper@sifive.com> |
[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not.
Without Zfh the half type isn't legal, but it could still be used as an argument/return in IR. Clang will not generate this today.
Previously we promoted the half value to float for arguments and returns if the F extension is enabled but Zfh isn't. Then depending on which ABI is enabled we would pass it in either an FPR or a GPR in float format.
If the F extension isn't enabled, it would get passed in the lower 16 bits of a GPR in half format.
With this patch the value will always be in half format and will be in the lower bits of a GPR or FPR. This should be consistent with where the bits are located when Zfh is enabled.
I've based this implementation off of how this is done on ARM.
I've manually nan-boxed the value to 32 bits using integer ops. It looks like flw, fsw, fmv.s, fmv.w.x, fmv.x.w won't canonicalize nans so should leave the value alone. I think those are the instructions that could get used on this value.
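NaN-boxing with integer ops can be illustrated on raw bit patterns. This is a sketch under the assumption that boxing means setting the upper 16 bits of the 32-bit value to all ones, as the RISC-V NaN-boxing convention for narrower values does; the exact ops the patch emits are not shown in this log, and `nan_box_half` and `as_f32` are hypothetical names.

```python
import math
import struct

def nan_box_half(half_bits: int) -> int:
    """Place 16 half-precision bits in the low half of a 32-bit pattern
    and fill the upper 16 bits with ones (assumed boxing scheme)."""
    return 0xFFFF0000 | (half_bits & 0xFFFF)

def as_f32(bits32: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision float."""
    return struct.unpack('<f', struct.pack('<I', bits32))[0]
```

Boxing the half-precision pattern for 1.0 (`0x3C00`) gives `0xFFFF3C00`, which reads as a NaN when viewed as an f32 while the low 16 bits still hold the original half value.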
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D98670
|