History log of /llvm-project/llvm/test/CodeGen/RISCV/div.ll (Results 1 – 25 of 46)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 9122c523 15-Nov-2024 Pengcheng Wang <wangpengcheng.pp@bytedance.com>

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)


This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional schedu

[RISCV] Enable bidirectional scheduling and tracking register pressure (#115445)


This is based on other targets like PPC/AArch64 and some experiments.

This PR will only enable bidirectional scheduling and tracking register
pressure.

Disclaimer: I haven't tested it on many cores, maybe we should make
some options being features. I believe downstreams must have tried
this before, so feedbacks are welcome.

show more ...


# aef0e77c 05-Nov-2024 Simon Pilgrim <llvm-dev@redking.me.uk>

[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)

If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we ca

[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)

If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.

These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.

Alternative to #114967

Fixes the remaining regression in #112588

show more ...


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# abc1acf8 14-Aug-2024 Craig Topper <craig.topper@sifive.com>

[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)

If the upper bits of the shr aren't demanded.

This helps wit

[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)

If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.

There are some regressions in ARM and Thumb2.

show more ...


Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# eabaee0c 07-Jan-2024 Fangrui Song <i@maskray.me>

[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)

R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and
R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530

[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)

R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and
R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530
`call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not
useful and can be removed now (matching AArch64 and PowerPC).

GNU assembler assembles `call foo` to RISCV_CALL_PLT since 2022-09
(70f35d72ef04cd23771875c1661c9975044a749c).

Without this patch, unconditionally changing MO_CALL to MO_PLT could
create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler
and GNU assembler.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# 86240751 06-Oct-2023 Philip Reames <preames@rivosinc.com>

[RISCV] Strip W suffix from ADDIW (#68425)

The motivation of this change is simply to reduce test duplication. As
can be seen in the (massive) test delta, we have many tests whose output
differ on

[RISCV] Strip W suffix from ADDIW (#68425)

The motivation of this change is simply to reduce test duplication. As
can be seen in the (massive) test delta, we have many tests whose output
differ only due to the use of addi on rv32 vs addiw on rv64 when the
high bits are don't care.

As an aside, we don't need to worry about the non-zero immediate
restriction on the compressed variants because we're not directly
forming the compressed variants. If we happen to get a zero immediate
for the ADDI, then either a later optimization will strip the useless
instruction or the encoder is responsible for not compressing the
instruction.

show more ...


Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6
# 38f7c7eb 07-Jun-2023 Florian Mayer <fmayer@google.com>

Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).""

Revert broke even more stuff.

This reverts commit d5fbec30939f2c9f82475cf42c638

Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).""

Revert broke even more stuff.

This reverts commit d5fbec30939f2c9f82475cf42c638619514b5c67.

show more ...


# d5fbec30 07-Jun-2023 Florian Mayer <fmayer@google.com>

Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C)."

Triggers UBSan error.

This reverts commit 58b2d652af49ee9d9ff2af6edd7f67f23b26bfee.


# 58b2d652 06-Jun-2023 Craig Topper <craig.topper@sifive.com>

[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).

Where C is a simm32.

This costs an extra temporary register, but avoids a constant pool.

Reviewe

[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).

Where C is a simm32.

This costs an extra temporary register, but avoids a constant pool.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D152236

show more ...


Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init
# 86eff6be 20-Jan-2023 Philip Reames <preames@rivosinc.com>

[MachineCombiner] Use default latency model when no detailed model available

This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction lat

[MachineCombiner] Use default latency model when no detailed model available

This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost.

Previously, we unconditionally applied the first legal pattern - without any consideration of profitability. As a result, this change both prevents some patterns being applied, and changes which patterns are exercised. (i.e. previously the first pattern was applied, afterwards, maybe the second one is because the first wasn't profitable.)

The motivation here is two fold.

First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler.

Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain and simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree.

Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch.

For the test diffs, I don't seen any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes.

Differential Revision: https://reviews.llvm.org/D141017

show more ...


Revision tags: llvmorg-15.0.7
# 002005e6 22-Dec-2022 Hsiangkai Wang <hsiangkai@google.com>

[RISCV] Add integer scalar instructions to isAssociativeAndCommutative

Inspired by D138107.

We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative
to increase instruction-l

[RISCV] Add integer scalar instructions to isAssociativeAndCommutative

Inspired by D138107.

We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative
to increase instruction-level parallelism by the existing MachineCombiner pass.

Differential Revision: https://reviews.llvm.org/D140530

show more ...


# d64d3c5a 22-Dec-2022 Nitin John Raj <nitin.raj@sifive.com>

[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility

SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW h

[RISCV] Add pass to remove W suffix from ADDIW and SLLIW to improve compressibility

SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW have a 3-bit register encoding. They both require the dest to also be one of the sources.

We aggressively form ADDW/SLLIW as it helps hasAllWBitUsers in RISCVISelDAGToDAG to not require recursion. So we need a pass to remove excessive -w suffixes.

Differential Revision: https://reviews.llvm.org/D139948

show more ...


# e00e20a0 01-Dec-2022 Craig Topper <craig.topper@sifive.com>

[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.

These instructions requires both register operands to be compressible
so I've only applied the hint if we already have a GPRC physical regis

[RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.

These instructions requires both register operands to be compressible
so I've only applied the hint if we already have a GPRC physical register
assigned for the other register operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139079

show more ...


Revision tags: llvmorg-15.0.6
# 64612f5d 25-Nov-2022 Craig Topper <craig.topper@sifive.com>

[RISCV] Add ADD to getRegAllocationHints to improve to improve use of c.add.

add can always be compressed to c.add if one of the sources is the
same as the destination.

The same is not true for c.a

[RISCV] Add ADD to getRegAllocationHints to improve to improve use of c.add.

add can always be compressed to c.add if one of the sources is the
same as the destination.

The same is not true for c.addw where the registers need to be x8-x15.

show more ...


Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1
# 38ffa2bb 12-Sep-2022 Craig Topper <craig.topper@sifive.com>

[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.

For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidt

[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.

For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862

show more ...


Revision tags: llvmorg-15.0.0, llvmorg-15.0.0-rc3
# 47b1f836 10-Aug-2022 Alex Bradbury <asb@igalia.com>

[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls

Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligib

[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls

Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087

show more ...


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 1903b991 07-Apr-2022 Craig Topper <craig.topper@sifive.com>

[RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3).

SLLI is always compressible to C.SLLI as long as the source and dest
register is the same.

ANDI and SRLI are only compressib

[RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3).

SLLI is always compressible to C.SLLI as long as the source and dest
register is the same.

ANDI and SRLI are only compressible if the register is x8-x15. By
using SLLI we have a better chance of generating shorter code.

I had to exclude one exclusion for the BEXTI case so that it's
pattern match could still fire.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123336

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# bd653f64 11-Jan-2022 Haocong.Lu <Haocong.Lu@streamcomputing.com>

[RISCV] Use shift for zero extension when Zbb and Zbp are not enabled

Now AND is used for zero extension when both Zbb and Zbp are not enabled.
It may be better to use shift operation if the trailin

[RISCV] Use shift for zero extension when Zbb and Zbp are not enabled

Now AND is used for zero extension when both Zbb and Zbp are not enabled.
It may be better to use shift operation if the trailing ones mask exceeds simm12.

This patch optimzes LUI+ADDI+AND to SLLI+SRLI.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116720

show more ...


# b645bcd9 10-Jan-2022 Craig Topper <craig.topper@sifive.com>

[RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, (XLen-16), (XLen-16) + C) optimization.

This can be generalized to (srl (and X, C2), C) ->
(srli (slli X, (XLen-C3), (XLen-C3) + C). Whe

[RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, (XLen-16), (XLen-16) + C) optimization.

This can be generalized to (srl (and X, C2), C) ->
(srli (slli X, (XLen-C3), (XLen-C3) + C). Where C2 is a mask with
C3 trailing ones.

This can avoid constant materialization for C2. This is beneficial
even when C2 can be selected to ANDI because the SLLI can become
C.SLLI, but C.ANDI cannot cover all the immediates of ANDI.

This also enables CSE in some cases of i8 sdiv by constant codegen.

show more ...


# 41454ab2 31-Dec-2021 wangpc <pc.wang@linux.alibaba.com>

[RISCV] Use constant pool for large integers

For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by constant), we may
need about 4~8 instructions to b

[RISCV] Use constant pool for large integers

For large integers (for example, magic numbers generated by
TargetLowering::BuildSDIV when dividing by constant), we may
need about 4~8 instructions to build them.
In the same time, it just takes two instructions to load
constants (with extra cycles to access memory), so it may be
profitable to put these integers into constant pool.

Reviewed By: asb, craig.topper

Differential Revision: https://reviews.llvm.org/D114950

show more ...


# 6f7de819 09-Dec-2021 Craig Topper <craig.topper@sifive.com>

[RISCV] Use MULHU for more division by constant cases.

D113805 improved handling of i32 divu/remu on RV64. The basic idea
from that can be extended to (mul (and X, C2), C1) where C2 is any
mask cons

[RISCV] Use MULHU for more division by constant cases.

D113805 improved handling of i32 divu/remu on RV64. The basic idea
from that can be extended to (mul (and X, C2), C1) where C2 is any
mask constant.

We can replace the and with an SLLI by shifting by the number of
leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C1)
bits. This will give the full product XLen additional trailing zeros,
putting the result in the output of MULHU. If we can't use ANDI,
ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register.

The downside is it make take 1 additional instruction to create C1.
But since that's not on the critical path, it can hopefully be
interleaved with other operations.

The previous tablegen pattern is replaced by custom isel code.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D115310

show more ...


Revision tags: llvmorg-13.0.1-rc1
# af0ecfcc 22-Nov-2021 wangpc <pc.wang@linux.alibaba.com>

[RISCV] Generate pseudo instruction li

Add an alias of `addi [x], zero, imm` to generate pseudo
instruction li, which makes assembly mush more readable.
For existed tests, users can update them by r

[RISCV] Generate pseudo instruction li

Add an alias of `addi [x], zero, imm` to generate pseudo
instruction li, which makes assembly mush more readable.
For existed tests, users can update them by running script
`llvm/utils/update_llc_test_checks.py`.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D112692

show more ...


# 02bed66c 12-Nov-2021 Craig Topper <craig.topper@sifive.com>

[RISCV] Improve codegen for i32 udiv/urem by constant on RV64.

The division by constant optimization often produces constants that
are uimm32, but not simm32. These constants require 3 or 4 instruct

[RISCV] Improve codegen for i32 udiv/urem by constant on RV64.

The division by constant optimization often produces constants that
are uimm32, but not simm32. These constants require 3 or 4 instructions
to materialize without Zba.

Since these instructions are often used by a multiply with a LHS
that needs to be zero extended with an AND, we can switch the MUL
to a MULHU by shifting both inputs left by 32. Once we shift the
constant left, the upper 32 bits no longer need to be 0 so constant
materialization is free to use LUI+ADDIW. This reduces the constant
materialization from 4 instructions to 3 in some cases while also
reducing the zero extend of the LHS from 2 shifts to 1.

Differential Revision: https://reviews.llvm.org/D113805

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# d9ba1a9c 18-Aug-2021 Craig Topper <craig.topper@sifive.com>

[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.

We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the

[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.

We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.

This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.

Differential Revision: https://reviews.llvm.org/D107658

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init
# 98d4adc2 20-Jul-2021 Craig Topper <craig.topper@sifive.com>

[RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)

Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding

[RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2)

Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding its root case
into the new code. The other uses in isel patterns for shXadd.uw
have been switched to using hardcoded AND masks.

This is based on the original version of D49585 from ARM. The final
version of that was made a DAG combine, but I've chosen to keep it
as custom isel. I'm not convinced DAG combine is as good with
shift pairs as it is with and+shift. I saw some issues optimizing
the shifts created by vscale lowering if an and isn't created for
from a shift pair.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106230

show more ...


# 00c1cc86 18-Jul-2021 Craig Topper <craig.topper@sifive.com>

[RISCV] Add more i32 srem/sdiv with power of 2 constant tests. NFC

Add a small power 2 srem test to match existing sdiv test. Add
larger power of 2 test to both.

The larger constant test shows mate

[RISCV] Add more i32 srem/sdiv with power of 2 constant tests. NFC

Add a small power 2 srem test to match existing sdiv test. Add
larger power of 2 test to both.

The larger constant test shows materialization of a constant
for an AND in the RV64 code. We should be using W shift instructions
to match the RV32 code.

show more ...


12