|
Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
| #
d7fe2cf8 |
| 18-Dec-2024 |
tianleliu <tianle.l.liu@intel.com> |
[InstCombine] Widen Sel width after Cmp to generate Max/Min intrinsics. (#118932)
When Sel(Cmp) are in different integer type,
From: (K and N mean width, K < N; a and b are src operands.)
bN =
[InstCombine] Widen Sel width after Cmp to generate Max/Min intrinsics. (#118932)
When Sel(Cmp) are in different integer type,
From: (K and N mean width, K < N; a and b are src operands.)
bN = Ext(bK)
cond = Cmp(aN, bN)
aK = Trunc aN
retK = Sel(cond, aK, bK)
To:
bN = Ext(bK)
cond = Cmp(aN, bN)
retN = Sel(cond, aN, bN)
retK = Trunc retN
Though Sel's operands width becomes larger, the benefit
of making type width in Sel the same as Cmp, is for combing
to max/min intrinsics, and also better performance for SIMD
instructions.
References of correctness: https://alive2.llvm.org/ce/z/Y4Kegm
https://alive2.llvm.org/ce/z/qFtjtR
Reference of generated code comparision:
https://gcc.godbolt.org/z/o97svGvYM
https://gcc.godbolt.org/z/59Ynj91ov
show more ...
|
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5 |
|
| #
979a0356 |
| 02-Dec-2024 |
Veera <32646674+veera-sivarajan@users.noreply.github.com> |
[InstCombine] Fold `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp C1` (#116888)
Fixes #82414.
General Proof: https://alive2.llvm.org/ce/z/ERjNs4
Proof for Tests: https://alive2.llvm.
[InstCombine] Fold `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2) BOp C1` (#116888)
Fixes #82414.
General Proof: https://alive2.llvm.org/ce/z/ERjNs4
Proof for Tests: https://alive2.llvm.org/ce/z/K-934G
This PR transforms `select` instructions of the form `select (Cmp X C1)
(BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`.
This helps in eliminating a noop loop in
https://github.com/rust-lang/rust/issues/123845 but does not improve
optimizations.
show more ...
|
|
Revision tags: llvmorg-19.1.4 |
|
| #
38fffa63 |
| 06-Nov-2024 |
Paul Walker <paul.walker@arm.com> |
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
|
|
Revision tags: llvmorg-19.1.3 |
|
| #
583fa4f5 |
| 15-Oct-2024 |
Alexey Bader <alexey.bader@intel.com> |
[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088)
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The
[InstCombine] Extend fcmp+select folding to minnum/maxnum intrinsics (#112088)
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.
maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}
The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead missing the folding opportunity.
The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.
New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}
The same changes are applied to the minnum intrinsic.
show more ...
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
| #
a1058776 |
| 21-Aug-2024 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be can
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
show more ...
|
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5 |
|
| #
b8f3024a |
| 24-Apr-2024 |
Andreas Jonson <andjo403@hotmail.com> |
[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop (#88776)
Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of
[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop (#88776)
Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of
range metadata for call instructions to range attributes.
show more ...
|
| #
d9a5aa8e |
| 17-Apr-2024 |
Nikita Popov <npopov@redhat.com> |
[PatternMatch] Do not accept undef elements in m_AllOnes() and friends (#88217)
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elem
[PatternMatch] Do not accept undef elements in m_AllOnes() and friends (#88217)
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.
Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.
At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.
I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.
This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
show more ...
|
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3 |
|
| #
b6bd41db |
| 21-Mar-2024 |
Noah Goldstein <goldstein.w.n@gmail.com> |
[InstCombine] Add canonicalization of `sitofp` -> `uitofp nneg`
This is essentially the same as #82404 but has the `nneg` flag which allows the backend to reliably undo the transform.
Closes #88299
|
| #
6960ace5 |
| 20-Mar-2024 |
Noah Goldstein <goldstein.w.n@gmail.com> |
Revert "[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`"
This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44.
It wasn't a particularly important transform to begin with
Revert "[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`"
This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44.
It wasn't a particularly important transform to begin with and caused some codegen regressions on targets that prefer `sitofp` so dropping.
Might re-visit along with adding `nneg` flag to `uitofp` so its easily reversable for the backend.
show more ...
|
|
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3 |
|
| #
d80d5b92 |
| 20-Feb-2024 |
Noah Goldstein <goldstein.w.n@gmail.com> |
[InstCombine] Canonicalize `(sitofp x)` -> `(uitofp x)` if `x >= 0`
Just a standard canonicalization.
Proofs: https://alive2.llvm.org/ce/z/9W4VFm
Closes #82404
|
| #
641d160a |
| 25-Feb-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[InstCombine] Fold umax(smax)/smin(umin) with non-negative constants (#82929)
This patch extends `reassociateMinMaxWithConstants` to fold the
following patterns:
```
umax (smax X, nneg C0), nneg
[InstCombine] Fold umax(smax)/smin(umin) with non-negative constants (#82929)
This patch extends `reassociateMinMaxWithConstants` to fold the
following patterns:
```
umax (smax X, nneg C0), nneg C1 --> smax X, (umax C0, C1)
smin (umin X, nneg C0), nneg C1 --> umin X, (smin/umin C0, C1)
```
Alive2: https://alive2.llvm.org/ce/z/wfEj-e
Address the comment
https://github.com/llvm/llvm-project/pull/82472#pullrequestreview-1896922897.
show more ...
|
|
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
| #
5918f623 |
| 08-Nov-2023 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a ze
[InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
show more ...
|
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
| #
cbca9ce9 |
| 31-Mar-2023 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Remove min/max special case when folding into select
Now that we canonicalize to min/max intrinsics, we no longer need to guard against this here.
In fact, it seems like the issue fro
[InstCombine] Remove min/max special case when folding into select
Now that we canonicalize to min/max intrinsics, we no longer need to guard against this here.
In fact, it seems like the issue from PR46271 was the final push for introducing the intrinsics in the first place...
show more ...
|
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
379de123 |
| 16-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Preserve instruction name in replaceInstUsesWith()
Currently InstCombine folds using the `return replaceInstUsesWith(V, Builder.CreateFoo())` pattern do not preserve the original name
[InstCombine] Preserve instruction name in replaceInstUsesWith()
Currently InstCombine folds using the `return replaceInstUsesWith(V, Builder.CreateFoo())` pattern do not preserve the original name of the instruction. To preserve the name, you either have to use something like `return FooInst::Create(...)` which is usually less nice, or go out of the way to preserve the name with takeName(). We often don't do that.
This patch instead preserves the name in replaceInstUsesWith() when replacing a named instruction with an unnamed instruction. To be conservative, I also added a zero-use check, which is a proxy for the case where the instruction was just created, rather than an existing one reused. Possibly we could drop that part.
As InstCombine tests are robust against renames this does not cause any test diffs, so I regenerated a random test to show the effects.
Differential Revision: https://reviews.llvm.org/D140192
show more ...
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
| #
4ab40eca |
| 03-Oct-2022 |
Bjorn Pettersson <bjorn.a.pettersson@ericsson.com> |
[test][InstCombine] Update some test cases to use opaque pointers
These tests cases were converted using the script at https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
Differential Re
[test][InstCombine] Update some test cases to use opaque pointers
These tests cases were converted using the script at https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
Differential Revision: https://reviews.llvm.org/D135094
show more ...
|
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
82190f91 |
| 06-May-2022 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Fold icmp of select with implied condition
When threading the icmp over the select, check whether the condition can be folded when taking into account the select condition.
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
e1608a9d |
| 24-Feb-2022 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Remove SPF min/max canonicalization
Now that we canonicalize SPF min/max to intrinsics, there's no need to canonicalize the structure of the SPF min/max itself anymore. This is concept
[InstCombine] Remove SPF min/max canonicalization
Now that we canonicalize SPF min/max to intrinsics, there's no need to canonicalize the structure of the SPF min/max itself anymore. This is conceptually NFC, but in practice does slightly impact results due to folding order differences.
show more ...
|
| #
a266af72 |
| 14-Feb-2022 |
Nikita Popov <npopov@redhat.com> |
[InstCombine] Canonicalize SPF to min/max intrinsics
Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max
[InstCombine] Canonicalize SPF to min/max intrinsics
Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max.
Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization.
Differential Revision: https://reviews.llvm.org/D98152
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1 |
|
| #
acdc419c |
| 04-Feb-2022 |
Bjorn Pettersson <bjorn.a.pettersson@ericsson.com> |
[test] Use -passes=instcombine instead of -instcombine in lots of tests. NFC
Another step moving away from the deprecated syntax of specifying pass pipeline in opt.
Differential Revision: https://r
[test] Use -passes=instcombine instead of -instcombine in lots of tests. NFC
Another step moving away from the deprecated syntax of specifying pass pipeline in opt.
Differential Revision: https://reviews.llvm.org/D119081
show more ...
|
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
e6ad9ef4 |
| 14-Dec-2021 |
Philip Reames <listmail@philipreames.com> |
[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transform
[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
1376301c |
| 06-Nov-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[InstCombine] Canonicalize range test idiom
InstCombine converts range tests of the form (X > C1 && X < C2) or (X < C1 || X > C2) into checks of the form (X + C3 < C4) or (X + C3 > C4). It is possib
[InstCombine] Canonicalize range test idiom
InstCombine converts range tests of the form (X > C1 && X < C2) or (X < C1 || X > C2) into checks of the form (X + C3 < C4) or (X + C3 > C4). It is possible to express all range tests in either of these forms (with different choices of constants), but currently neither of them is considered canonical. We may have equivalent range tests using either ult or ugt.
This proposes to canonicalize all range tests to use ult. An alternative would be to canonicalize to either ult or ugt depending on the specific constants involved -- e.g. in practice we currently generate ult for && style ranges and ugt for || style ranges when going through the insertRangeTest() helper. In fact, the "clamp like" fold was relying on this, which is why I had to tweak it to not assume whether inversion is needed based on just the predicate.
Proof: https://alive2.llvm.org/ce/z/_SP_rQ
Differential Revision: https://reviews.llvm.org/D113366
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
| #
6db163c7 |
| 13-Aug-2021 |
Qiu Chaofan <qiucofan@cn.ibm.com> |
Pre-commit two-way clamp tests
|
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
7e18cd88 |
| 20-Mar-2021 |
Nikita Popov <nikita.ppv@gmail.com> |
[InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced
This is an alternative to D98391/D98585, playing things more conservatively. If AllowRefinement == false, then we don't use InstS
[InstCombine] Whitelist non-refining folds in SimplifyWithOpReplaced
This is an alternative to D98391/D98585, playing things more conservatively. If AllowRefinement == false, then we don't use InstSimplify methods at all, and instead explicitly implement a small number of non-refining folds. Most cases are handled by constant folding, and I only had to add three folds to cover our unit tests / test-suite. While this may lose some optimization power, I think it is safer to approach from this direction, given how many issues this code has already caused.
Differential Revision: https://reviews.llvm.org/D99027
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
9d70dbdc |
| 27-Dec-2020 |
Juneyoung Lee <aqjune@gmail.com> |
[InstCombine] use poison as placeholder for undemanded elems
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic b
[InstCombine] use poison as placeholder for undemanded elems
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic because undef isn’t undefined enough. Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison. This makes a few straightforward optimizations incorrect, such as:
``` ; https://bugs.llvm.org/show_bug.cgi?id=44185
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %xv = insertelement <4 x float> %q, float %x, i32 2 %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef } ret <4 x float> %r ; %r[3] is undef } => define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %r = insertelement <4 x float> %y, float %x, i32 1 ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison }
Transformation doesn't verify! ERROR: Target is more poisonous than source ```
I’d like to suggest 1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too) 2. Updating shufflevector’s semantics to return poison element if mask is undef
Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay. m_Undef() matches PoisonValue as well, so existing optimizations will still fire.
The only concern is hidden miscompilations that will go incorrect when poison constant is given. A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(
Instead, I’ll simply locally maintain the tests and run Alive2. If there is any bug found, I’ll report it.
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93586
show more ...
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1 |
|
| #
d12ec0f7 |
| 18-Jul-2020 |
Nikita Popov <nikita.ppv@gmail.com> |
[InstCombine] Fix store merge worklist management (PR46680)
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.
Just like insertions through IRBuilder, InsertNewInstBefore() should be using the defe
[InstCombine] Fix store merge worklist management (PR46680)
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.
Just like insertions through IRBuilder, InsertNewInstBefore() should be using the deferred worklist mechanism, so that processing of newly added instructions is prioritized.
There's one side-effect of the worklist order change which could be classified as a regression. An add op gets pushed through a select that at the time is not a umax. We could add a reverse transform that tries to push adds in the reverse direction to restore a min/max, but that seems like a sure way of getting infinite loops... Seems like something that should best wait on min/max intrinsics.
Differential Revision: https://reviews.llvm.org/D84109
show more ...
|