Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4

9122c523 | 15-Nov-2024 | Pengcheng Wang <wangpengcheng.pp@bytedance.com>
[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register pressure.
Disclaimer: I haven't tested it on many cores; maybe we should make some of these options into tunable features. I believe downstreams must have tried this before, so feedback is welcome.

aef0e77c | 05-Nov-2024 | Simon Pilgrim <llvm-dev@redking.me.uk>
[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.
These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.
Alternative to #114967
Fixes the remaining regression in #112588
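As a sanity check of the identity (a Python sketch, not the DAG code; the sign-extension helper and test values are illustrative):

```python
def sext32(v):
    """Sign-extend the low 32 bits of v to 64 bits (unsigned representation)."""
    v &= 0xFFFFFFFF
    return v | 0xFFFFFFFF00000000 if v >> 31 else v

for raw in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF):
    x = sext32(raw)
    # Bits 31..63 of x are all copies of the sign bit, so masking the LSB
    # of any shift in that range extracts the sign bit -- the same value
    # a logical shift down from the MSB produces.
    for c in range(31, 64):
        assert (x >> c) & 1 == x >> 63
```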

Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3

abc1acf8 | 14-Aug-2024 | Craig Topper <craig.topper@sifive.com>
[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)
If the upper bits of the shr aren't demanded.
This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.
There are some regressions in ARM and Thumb2.
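The fold is sound because the two forms differ only in the top ShAmt bits (zeros vs. sign copies). A hedged Python illustration with made-up values:

```python
BW = 32
M = (1 << BW) - 1

def srl(x, s):   # logical shift right on a BW-bit value
    return (x & M) >> s

def sra(x, s):   # arithmetic shift right on a BW-bit value
    hi = M ^ (M >> s)                  # sign-copy bits for a negative value
    return (x >> s) | hi if x >> (BW - 1) else x >> s

x, c1, shamt = 0xF0001234, 3, 5
demanded = (1 << (BW - shamt)) - 1     # any mask ignoring the top shamt bits
assert srl(sra(x, c1), shamt) & demanded == sra(x, c1 + shamt) & demanded
```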

Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

eabaee0c | 07-Jan-2024 | Fangrui Song <i@maskray.me>
[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)
R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530 `call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not useful and can be removed now (matching AArch64 and PowerPC).
GNU assembler has assembled `call foo` to R_RISCV_CALL_PLT since 2022-09 (70f35d72ef04cd23771875c1661c9975044a749c).
Without this patch, unconditionally changing MO_CALL to MO_PLT could create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler and GNU assembler.

Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3

86240751 | 06-Oct-2023 | Philip Reames <preames@rivosinc.com>
[RISCV] Strip W suffix from ADDIW (#68425)
The motivation of this change is simply to reduce test duplication. As can be seen in the (massive) test delta, we have many tests whose output differs only due to the use of addi on rv32 vs addiw on rv64 when the high bits are don't-care.
As an aside, we don't need to worry about the non-zero immediate
restriction on the compressed variants because we're not directly
forming the compressed variants. If we happen to get a zero immediate
for the ADDI, then either a later optimization will strip the useless
instruction or the encoder is responsible for not compressing the
instruction.
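A Python sketch of why the strip is safe when only the low 32 bits of the result are used (illustrative semantics, not the pass itself):

```python
M64 = (1 << 64) - 1

def addi(rs, imm):        # RV64 ADDI: plain 64-bit add
    return (rs + imm) & M64

def addiw(rs, imm):       # RV64 ADDIW: 32-bit add, sign-extended to 64 bits
    lo = (rs + imm) & 0xFFFFFFFF
    return lo | 0xFFFFFFFF00000000 if lo >> 31 else lo

for rs in (0, 0x7FFFFFFF, 0xFFFFFFFF, 0x123456789A):
    for imm in (-2048, -1, 0, 1, 2047):
        # The results agree in the low 32 bits; only the high
        # (don't-care) bits can differ.
        assert addi(rs, imm) & 0xFFFFFFFF == addiw(rs, imm) & 0xFFFFFFFF
```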

Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6

38f7c7eb | 07-Jun-2023 | Florian Mayer <fmayer@google.com>
Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).""
Revert broke even more stuff.
This reverts commit d5fbec30939f2c9f82475cf42c638
Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).""
Revert broke even more stuff.
This reverts commit d5fbec30939f2c9f82475cf42c638619514b5c67.

d5fbec30 | 07-Jun-2023 | Florian Mayer <fmayer@google.com>
Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C)."
Triggers UBSan error.
This reverts commit 58b2d652af49ee9d9ff2af6edd7f67f23b26bfee.

58b2d652 | 06-Jun-2023 | Craig Topper <craig.topper@sifive.com>
[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).
Where C is a simm32.
This costs an extra temporary register, but avoids a constant pool.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D152236
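For example (an illustrative sketch; the real logic lives in selectImm), a 64-bit constant whose two halves repeat a non-negative simm32 can be built in registers instead of loaded:

```python
M64 = (1 << 64) - 1

def build_repeated(c):
    """Materialize (C << 32) + C the way the isel sequence would."""
    hi = (c << 32) & M64    # SLLI C, 32
    return (hi + c) & M64   # ADD

# 0x12345678_12345678 costs a few ALU ops and a temporary register,
# but no constant-pool load.
assert build_repeated(0x12345678) == 0x1234567812345678
```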

Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init

86eff6be | 20-Jan-2023 | Philip Reames <preames@rivosinc.com>
[MachineCombiner] Use default latency model when no detailed model available
This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.
Previously, we unconditionally applied the first legal pattern, without any consideration of profitability. As a result, this change both prevents some patterns from being applied and changes which patterns are exercised (i.e., previously the first pattern was applied; afterwards, maybe the second one is, because the first wasn't profitable).
The motivation here is two fold.
First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.
Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain and simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.
Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.
For the test diffs, I don't see any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.
Differential Revision: https://reviews.llvm.org/D141017

Revision tags: llvmorg-15.0.7

002005e6 | 22-Dec-2022 | Hsiangkai Wang <hsiangkai@google.com>
[RISCV] Add integer scalar instructions to isAssociativeAndCommutative
Inspired by D138107.
We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative to increase instruction-level parallelism by the existing MachineCombiner pass.
Differential Revision: https://reviews.llvm.org/D140530
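A sketch of the payoff (hypothetical scalar code, not MIR): reassociating a serial chain shortens the dependency depth while the value is unchanged.

```python
a, b, c, d = 3, 5, 7, 11

# Serial chain, depth 3: each add depends on the previous result.
serial = ((a + b) + c) + d

# Balanced tree, depth 2: (a + b) and (c + d) can issue in parallel.
balanced = (a + b) + (c + d)

assert serial == balanced
```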

d64d3c5a | 22-Dec-2022 | Nitin John Raj <nitin.raj@sifive.com>
[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility
SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW have a 3-bit register encoding. They both require the dest to also be one of the sources.
We aggressively form ADDW/SLLIW as it helps hasAllWBitUsers in RISCVISelDAGToDAG to not require recursion. So we need a pass to remove excessive -w suffixes.
Differential Revision: https://reviews.llvm.org/D139948

e00e20a0 | 01-Dec-2022 | Craig Topper <craig.topper@sifive.com>
[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.
These instructions require both register operands to be compressible, so I've only applied the hint if we already have a GPRC physical register assigned for the other register operand.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139079

Revision tags: llvmorg-15.0.6

64612f5d | 25-Nov-2022 | Craig Topper <craig.topper@sifive.com>
[RISCV] Add ADD to getRegAllocationHints to improve use of c.add.
add can always be compressed to c.add if one of the sources is the same as the destination.
The same is not true for c.addw where the registers need to be x8-x15.

Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1

38ffa2bb | 12-Sep-2022 | Craig Topper <craig.topper@sifive.com>
[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth / 2) is a legal integer type, this urem will be expanded by DAGCombiner using multiply by a magic constant. We do have to take into account that adding the high and low halves together can produce a carry, making the sum a ((BitWidth / 2) + 1)-bit number, so we also need to add back in the carry from the first addition.
For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth).
This is based on the section "Remainder by Summing Digits" in Hacker's Delight.
The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1.
gcc already does this same trick. There are additional tricks gcc does for urem, as well as for srem, udiv, and sdiv, that I plan to add in future patches.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D130862
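A Python sketch of the remainder half of the trick (hypothetical helper name; the real code splits an illegal type in LegalizeTypes). With BitWidth 32 and divisor 3, (1 << 16) % 3 == 1, so the two halves can be summed:

```python
def urem_via_halves(x, d, bw=32):
    half = bw // 2
    assert (1 << half) % d == 1                   # precondition for the trick
    lo = x & ((1 << half) - 1)
    hi = x >> half
    s = lo + hi                                   # may be (half + 1) bits wide
    s = (s & ((1 << half) - 1)) + (s >> half)     # add the carry back in
    return s % d                                  # now only a half-width urem

for x in (0, 1, 2, 3, 0xFFFF, 0x12345678, 0xFFFFFFFF):
    assert urem_via_halves(x, 3) == x % 3
```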

Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3

47b1f836 | 10-Aug-2022 | Alex Bradbury <asb@igalia.com>
[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer could never be tailcalled. The eligibility of libcalls for tail calling is partly determined by checking TargetLowering::isInTailCallPosition and comparing the return type of the libcall and the caller. isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly (which always returns false if not implemented by the target).
This patch provides a minimal implementation of TargetLowering::isUsedByReturnOnly - enough to support tail calling libcalls on hard float ABIs. Soft-float ABIs are left for a follow on patch. libcall-tail-calls.ll also shows missed opportunities to tail call integer libcalls, but this is due to issues outside of the isUsedByReturnOnly hook.
Differential Revision: https://reviews.llvm.org/D131087

Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1

1903b991 | 07-Apr-2022 | Craig Topper <craig.topper@sifive.com>
[RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3).
SLLI is always compressible to C.SLLI as long as the source and dest registers are the same.
ANDI and SRLI are only compressible if the register is x8-x15. By using SLLI we have a better chance of generating shorter code.
I had to add one exclusion for the BEXTI case so that its pattern match could still fire.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123336
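The underlying identity, sketched in Python (illustrative names; the selection itself is in RISCVISelDAGToDAG): a mask of t trailing ones applied after a shift is a bitfield extract, which a shift pair performs without materializing the mask.

```python
M64 = (1 << 64) - 1

def slli_srli(x, c, t, xlen=64):
    """Compute (x >> c) & ((1 << t) - 1) as SRLI(SLLI(x, C2), C3)."""
    c2 = xlen - c - t            # SLLI amount: push bit c+t-1 to the MSB
    c3 = xlen - t                # SRLI amount: drop back down to bit 0
    return ((x << c2) & M64) >> c3

x, c, t = 0xDEADBEEFCAFEBABE, 12, 8
assert slli_srli(x, c, t) == (x >> c) & ((1 << t) - 1)
```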

Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2

bd653f64 | 11-Jan-2022 | Haocong.Lu <Haocong.Lu@streamcomputing.com>
[RISCV] Use shift for zero extension when Zbb and Zbp are not enabled
Currently, AND is used for zero extension when both Zbb and Zbp are not enabled. It may be better to use a shift operation if the trailing-ones mask exceeds simm12.
This patch optimizes LUI+ADDI+AND to SLLI+SRLI.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116720
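For instance (values chosen for illustration): a 20-trailing-ones mask doesn't fit in simm12, so the AND would need LUI+ADDI to build the constant, while the shift pair needs no constant at all.

```python
M64 = (1 << 64) - 1

x = 0x123456789ABCDEF0
# AND with 0xFFFFF (LUI+ADDI+AND) == SLLI by 44 then SRLI by 44.
assert ((x << 44) & M64) >> 44 == x & 0xFFFFF
```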

b645bcd9 | 10-Jan-2022 | Craig Topper <craig.topper@sifive.com>
[RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, XLen-16), (XLen-16) + C) optimization.
This can be generalized to (srl (and X, C2), C) -> (srli (slli X, XLen-C3), (XLen-C3) + C), where C2 is a mask with C3 trailing ones.
This can avoid constant materialization for C2. This is beneficial even when C2 can be selected to ANDI because the SLLI can become C.SLLI, but C.ANDI cannot cover all the immediates of ANDI.
This also enables CSE in some cases of i8 sdiv by constant codegen.

41454ab2 | 31-Dec-2021 | wangpc <pc.wang@linux.alibaba.com>
[RISCV] Use constant pool for large integers
For large integers (for example, magic numbers generated by TargetLowering::BuildSDIV when dividing by a constant), we may need about 4~8 instructions to build them. At the same time, it takes just two instructions to load a constant (with extra cycles to access memory), so it may be profitable to put these integers into the constant pool.
Reviewed By: asb, craig.topper
Differential Revision: https://reviews.llvm.org/D114950

6f7de819 | 09-Dec-2021 | Craig Topper <craig.topper@sifive.com>
[RISCV] Use MULHU for more division by constant cases.
D113805 improved handling of i32 divu/remu on RV64. The basic idea from that can be extended to (mul (and X, C2), C1) where C2 is any mask constant.
We can replace the and with an SLLI by shifting by the number of leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C1) bits. This will give the full product XLen additional trailing zeros, putting the result in the output of MULHU. If we can't use ANDI, ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register.
The downside is it may take 1 additional instruction to create C1. But since that's not on the critical path, it can hopefully be interleaved with other operations.
The previous tablegen pattern is replaced by custom isel code.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D115310

Revision tags: llvmorg-13.0.1-rc1

af0ecfcc | 22-Nov-2021 | wangpc <pc.wang@linux.alibaba.com>
[RISCV] Generate pseudo instruction li
Add an alias of `addi [x], zero, imm` to generate the pseudo instruction li, which makes assembly much more readable. For existing tests, users can update them by running the script `llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692

02bed66c | 12-Nov-2021 | Craig Topper <craig.topper@sifive.com>
[RISCV] Improve codegen for i32 udiv/urem by constant on RV64.
The division by constant optimization often produces constants that are uimm32, but not simm32. These constants require 3 or 4 instructions to materialize without Zba.
Since these instructions are often used by a multiply with a LHS that needs to be zero extended with an AND, we can switch the MUL to a MULHU by shifting both inputs left by 32. Once we shift the constant left, the upper 32 bits no longer need to be 0 so constant materialization is free to use LUI+ADDIW. This reduces the constant materialization from 4 instructions to 3 in some cases while also reducing the zero extend of the LHS from 2 shifts to 1.
Differential Revision: https://reviews.llvm.org/D113805
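A Python sketch of the core identity (hypothetical helper; the DAG node is ISD::MULHU): shifting both 32-bit operands left by 32 pushes the full product into the high half, so the multiply performs the zero extension for free.

```python
M64 = (1 << 64) - 1

def mulhu64(a, b):
    """High 64 bits of the 128-bit product of two 64-bit values."""
    return ((a & M64) * (b & M64)) >> 64

x = 0xDEADBEEF          # i32 value living in a 64-bit register
c = 0x9ABCDEF1          # uimm32 magic constant that is not simm32
assert mulhu64((x << 32) & M64, (c << 32) & M64) == x * c
```

Once shifted, the constant's upper 32 bits no longer need to be zero, which is what frees constant materialization to use LUI+ADDIW.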

Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2

d9ba1a9c | 18-Aug-2021 | Craig Topper <craig.topper@sifive.com>
[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.
We normally select these when the root node is a sext_inreg, but SimplifyDemandedBits can sometimes bypass the sext_inreg for some users. This can create a situation where sext_inreg+add/sub/mul/shl is selected to a W instruction, and then the add/sub/mul/shl is separately selected to a non-W instruction with the same inputs.
This patch tries to detect when it would still be ok to use a W instruction without the sext_inreg by checking the direct users. This can allow the W instruction to CSE with one created for a sext_inreg+add/sub/mul/shl. To minimize complexity and cost of checking, we make no attempt to determine if the CSE will happen and just always use a W instruction when we can.
Differential Revision: https://reviews.llvm.org/D107658

Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init

98d4adc2 | 20-Jul-2021 | Craig Topper <craig.topper@sifive.com>
[RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)
Replace some existing isel patterns that are covered by the new code. SLLIUWPat has been removed in favor of folding its root case into the new code. The other uses in isel patterns for shXadd.uw have been switched to using hardcoded AND masks.
This is based on the original version of D49585 from ARM. The final version of that was made a DAG combine, but I've chosen to keep it as custom isel. I'm not convinced DAG combine is as good with shift pairs as it is with and+shift. I saw some issues optimizing the shifts created by vscale lowering if an AND isn't created from a shift pair.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D106230

00c1cc86 | 18-Jul-2021 | Craig Topper <craig.topper@sifive.com>
[RISCV] Add more i32 srem/sdiv with power of 2 constant tests. NFC
Add a small power-of-2 srem test to match the existing sdiv test. Add a larger power-of-2 test to both.
The larger constant test shows materialization of a constant for an AND in the RV64 code. We should be using W shift instructions to match the RV32 code.