invariant-store-vectorization.ll - OpenGrok history log for /llvm-project/llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# 7f3428d3	29-Dec-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Compute induction end values in VPlan. (#112145) Use createDerivedIV to compute IV end values directly in VPlan, instead of creating them up-front. This allows updating IV users outside the loop as follow-up. Depends on https://github.com/llvm/llvm-project/pull/110004 and https://github.com/llvm/llvm-project/pull/109975. PR: https://github.com/llvm/llvm-project/pull/112145
# 4ad0fdd1	17-Dec-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Remove reverse() of predecessors from VPInstruction::generate. This was originally done to reduce the diff for the change. Remove it and update the remaining tests. NFC modulo reordering of incoming values. Clean up after https://github.com/llvm/llvm-project/pull/114292.
Revision tags: llvmorg-19.1.6
# 462cb3cd	05-Dec-2024	Nikita Popov <npopov@redhat.com>	[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 38fffa63	06-Nov-2024	Paul Walker <paul.walker@arm.com>	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4
# a1058776	21-Aug-2024	Nikita Popov <npopov@redhat.com>	[InstCombine] Remove some of the complexity-based canonicalization (#91185) The idea behind this canonicalization is that it allows us to handle fewer patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled.
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 99d6c6d9	05-Jul-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651
# 3808ba78	20-Jun-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Model middle block via VPIRBasicBlock. (#95816) Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare to used by the terminator). The original order will be restored in https://github.com/llvm/llvm-project/pull/92651. PR: https://github.com/llvm/llvm-project/pull/95816
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 66816500	05-Jan-2024	Yingwei Zheng <dtcxzyw2333@gmail.com>	[InstCombine] Revert the `signed icmp -> unsigned icmp` canonicalization when folding `icmp Pred min|max(X, Y), Z` (#76685) This patch tries to flip the signedness of predicates when folding an unsigned icmp with a signed min/max. It will enable more optimizations as we canonicalize a signed icmp into an unsigned icmp when both operands are known to have the same sign. Fixes #76672. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u (stage1-O3: -0.00%, stage1-ReleaseThinLTO: +0.01%, stage1-ReleaseLTO-g: +0.05%, stage1-O0-g: -0.12%, stage2-O3: -0.01%, stage2-O0-g: -0.03%, stage2-clang: -0.00%) NOTE: We can flip the signedness of predicate if both operands are negative. But I don't see the benefit of handling these cases.
# 49b0e6dc	18-Dec-2023	David Sherwood <57997763+david-arm@users.noreply.github.com>	[LoopVectorize] Enable hoisting of runtime checks by default (#71538) With commit https://reviews.llvm.org/D152366 I introduced functionality that permitted the hoisting of runtime memory checks from a vectorised inner loop to the preheader of the next outer-most loop. This is useful for benchmarks like SPEC2017's x264 where the inner loop is vectorised and only has a small trip count. In such cases the runtime memory checks become expensive and since the checks never fail in the case of x264 it makes sense to do this. However, this behaviour was controlled by the flag -hoist-runtime-checks which was off by default. This patch enables this flag by default for all targets, since I believe this is a generally beneficial thing to do. I have tested this with SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1 and neoverse-n1, respectively. Similarly, I saw slight improvements in the overall geomean on both machines. The only other notable changes were a 1% drop in the roms benchmark, which was compensated for by a 1% improvement in fotonik3d.
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3
# e13bed4c	06-Oct-2023	Dmitriy Smirnov <dmitriy.smirnov@arm.com>	[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com> Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D155688
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5
# 745cfa34	17-May-2023	Nikita Popov <npopov@redhat.com>	[InstCombine] Compute known bits for multi-use add/sub We were failing to set the known bits for add/sub in the multi-use case, resulting in odd behavioral differences depending on the number of uses. Noticed while adding a consistency assertion. The test changes are essentially a revert to the state before d6498ab. These changes are not really desirable, but if we don't want them, that needs to be handled as part of the heuristic for demanded constant shrinking, not by artificially suppressing the known bits in one specific case.
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 68469a80	06-Jan-2023	Florian Hahn <flo@fhahn.com>	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unnecessary runtime unrolling for vectorized loops, e.g. PR40961. Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 88419a30	01-Sep-2022	Nikita Popov <npopov@redhat.com>	[LICM] Allow load-only scalar promotion in the presence of aliasing loads During scalar promotion, if there are additional potentially-aliasing loads outside the promoted set, we can still perform a load-only promotion. As the stores are retained, any potentially-aliasing loads will still read the correct value. This increases the number of load promotions in llvm-test-suite by a factor of two (old vs. new): licm.NumPromotionCandidates 4448 vs. 6038, licm.NumLoadPromoted 479 vs. 1069, licm.NumLoadStorePromoted 1459 vs. 1459. Unfortunately, this does have some impact on compile-time: http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions In part this is because we now have fewer early bailouts from promotion, but also due to second order effects (e.g. for one case I looked at we spend more time in SLP now). Differential Revision: https://reviews.llvm.org/D133192
# 5b400150	14-Dec-2022	Nikita Popov <npopov@redhat.com>	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.
# be51fa45	05-Dec-2022	Roman Lebedev <lebedev.ri@gmail.com>	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax
# d3d84654	07-Oct-2022	Arthur Eubanks <aeubanks@google.com>	[opt] Stop treating alias analysis specially when translating legacy opt syntax I've attempted to keep AA tests as close to their original intent as possible.
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# e6ad9ef4	14-Dec-2021	Philip Reames <listmail@philipreames.com>	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387
# 1a18de3d	13-Dec-2021	Philip Reames <listmail@philipreames.com>	Autogen a bunch of instcombine and vectorizer tests Done in advance of D115387. These are all the ones which my local script could handle, there's a couple more which need manual updates.
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# f3814ed3	19-Jul-2021	Mindong Chen <chenmindong1@huawei.com>	[LV] Re-generate check lines of some fragile tests (NFC) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D105438
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 23c2f2e6	07-Jun-2021	Florian Hahn <flo@fhahn.com>	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255
Revision tags: llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3
# b46c085d	26-Feb-2021	Roman Lebedev <lebedev.ri@gmail.com>	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863
Revision tags: llvmorg-12.0.0-rc2
# 79b1b4a5	12-Feb-2021	Sanjay Patel <spatel@rotateright.com>	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 9b296102	29-Dec-2020	Juneyoung Lee <aqjune@gmail.com>	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923
# 278aa65c	24-Dec-2020	Juneyoung Lee <aqjune@gmail.com>	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# cee313d2	17-Apr-2019	Eric Christopher <echristo@gmail.com>	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552