reduction.ll - OpenGrok history log for /llvm-project/llvm/test/Transforms/LoopVectorize/reduction.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 462cb3cd	05-Dec-2024	Nikita Popov <npopov@redhat.com>	[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z [InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy show more ...
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# 38fffa63	06-Nov-2024	Paul Walker <paul.walker@arm.com>	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 2c7786e9	03-Sep-2024	Philip Reames <preames@rivosinc.com>	Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In gen Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770) This is a follow up to 924907bc6, and is mostly motivated by consistency but does include one additional optimization. In general, we prefer 0.0 over -0.0 as the identity value for an fadd. We use that value in several places, but don't in others. So, let's be consistent and use the same identity (when nsz allows) everywhere. This creates a bunch of test churn, but due to 924907bc6, most of that churn doesn't actually indicate a change in codegen. The exception is that this change enables the use of 0.0 for nsz, but not reasoc, fadd reductions. Or said differently, it allows the neutral value of an ordered fadd reduction to be 0.0. show more ...
Revision tags: llvmorg-19.1.0-rc4
# a1058776	21-Aug-2024	Nikita Popov <npopov@redhat.com>	[InstCombine] Remove some of the complexity-based canonicalization (#91185) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be can [InstCombine] Remove some of the complexity-based canonicalization (#91185) The idea behind this canonicalization is that it allows us to handle less patterns, because we know that some will be canonicalized away. This is indeed very useful to e.g. know that constants are always on the right. However, this is only useful if the canonicalization is actually reliable. This is the case for constants, but not for arguments: Moving these to the right makes it look like the "more complex" expression is guaranteed to be on the left, but this is not actually the case in practice. It fails as soon as you replace the argument with another instruction. The end result is that it looks like things correctly work in tests, while they actually don't. We use the "thwart complexity-based canonicalization" trick to handle this in tests, but it's often a challenge for new contributors to get this right, and based on the regressions this PR originally exposed, we clearly don't get this right in many cases. For this reason, I think that it's better to remove this complexity canonicalization. It will make it much easier to write tests for commuted cases and make sure that they are handled. show more ...
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 99d6c6d9	05-Jul-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block [VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651 show more ...
# 3808ba78	20-Jun-2024	Florian Hahn <flo@fhahn.com>	[VPlan] Model middle block via VPIRBasicBlock. (#95816) Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle bloc [VPlan] Model middle block via VPIRBasicBlock. (#95816) Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare to used by the terminator). The original order will be restored in https://github.com/llvm/llvm-project/pull/92651. PR: https://github.com/llvm/llvm-project/pull/95816 show more ...
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4
# b1094776	11-Apr-2024	Yingwei Zheng <dtcxzyw2333@gmail.com>	[InstCombine] Infer nsw/nuw for trunc (#87910) This patch adds support for inferring trunc's nsw/nuw flags.
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2
# 09eb9f11	19-Mar-2024	Michele Scandale <michele.scandale@gmail.com>	[InstCombine] Fix for folding `select` into floating point binary operators. (#83200) Folding a `select` into a floating point binary operators can only be done if the result is preserved for both [InstCombine] Fix for folding `select` into floating point binary operators. (#83200) Folding a `select` into a floating point binary operators can only be done if the result is preserved for both case. In particular, if the other operand of the `select` can be a NaN, then the transformation won't preserve the result value. show more ...
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 51afb101	09-Jan-2024	Florian Hahn <flo@fhahn.com>	[LV] Create block in mask up-front if needed. (#76635) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then c [LV] Create block in mask up-front if needed. (#76635) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635 show more ...
# 66816500	05-Jan-2024	Yingwei Zheng <dtcxzyw2333@gmail.com>	[InstCombine] Revert the `signed icmp -> unsigned icmp` canonicalization when folding `icmp Pred min\|max(X, Y), Z` (#76685) This patch tries to flip the signedness of predicates when folding an uns [InstCombine] Revert the `signed icmp -> unsigned icmp` canonicalization when folding `icmp Pred min\|max(X, Y), Z` (#76685) This patch tries to flip the signedness of predicates when folding an unsigned icmp with a signed min/max. It will enable more optimizations as we canonicalizes a signed icmp into an unsigned icmp when both operands are known to have the same sign. Fixes #76672. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|-0.00%\|+0.01%\|+0.05%\|-0.12%\|-0.01%\|-0.03%\|-0.00%\| NOTE: We can flip the signedness of predicate if both operands are negative. But I don't see the benefit of handling these cases. show more ...
# d77067d0	06-Dec-2023	Nikita Popov <npopov@redhat.com>	[ValueTracking] Add dominating condition support in computeKnownBits() (#73662) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementat [ValueTracking] Add dominating condition support in computeKnownBits() (#73662) This adds support for using dominating conditions in computeKnownBits() when called from InstCombine. The implementation uses a DomConditionCache, which stores which branches may provide information that is relevant for a given value. DomConditionCache is similar to AssumptionCache, but does not try to do any kind of automatic tracking. Relevant branches have to be explicitly registered and invalidated values explicitly removed. The necessary tracking is done inside InstCombine. The reason why this doesn't just do exactly the same thing as AssumptionCache is that a lot more transforms touch branches and branch conditions than assumptions. AssumptionCache is an immutable analysis and mostly gets away with this because only a handful of places have to register additional assumptions (mostly as a result of cloning). This is very much not the case for branches. This change regresses compile-time by about ~0.2%. It also improves stage2-O0-g builds by about ~0.2%, which indicates that this change results in additional optimizations inside clang itself. Fixes https://github.com/llvm/llvm-project/issues/74242. show more ...
Revision tags: llvmorg-17.0.6
# 03d4a9d9	27-Nov-2023	Craig Topper <craig.topper@sifive.com>	[InstCombine] Set disjoint flag when turning Add into Or. (#72702) The disjoint flag was recently added to IR in #72583
Revision tags: llvmorg-17.0.5
# 23099ac2	07-Nov-2023	Philip Reames <preames@rivosinc.com>	Add known and demanded bits support for zext nneg (#70858) zext nneg was recently added to the IR in #67982. This patch teaches demanded bits and known bits about the semantics of the instruction Add known and demanded bits support for zext nneg (#70858) zext nneg was recently added to the IR in #67982. This patch teaches demanded bits and known bits about the semantics of the instruction, and adds a couple of test cases to illustrate basic functionality. show more ...
# f8742b8d	31-Oct-2023	Philip Reames <preames@rivosinc.com>	[SCEV] Teach SCEVExpander to use zext nneg when possible (#70815) zext nneg was recently added to the IR in #67982. Teaching SCEVExpander to emit nneg when possible is valuable since SCEV may have [SCEV] Teach SCEVExpander to use zext nneg when possible (#70815) zext nneg was recently added to the IR in #67982. Teaching SCEVExpander to emit nneg when possible is valuable since SCEV may have proved non-trivial facts about loop bounds which would otherwise be lost when materializing the value. show more ...
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3
# 8593c0bc	12-Oct-2023	Ramkumar Ramachandra <Ramkumar.Ramachandra@imgtec.com>	LoopVectorize/test: clean up reduction.ll; generate using UTC (NFC) (#68890) The test reduction.ll was introduced before utils/update_test_checks.py, and hence contains hand-written CHECK lines. Re LoopVectorize/test: clean up reduction.ll; generate using UTC (NFC) (#68890) The test reduction.ll was introduced before utils/update_test_checks.py, and hence contains hand-written CHECK lines. Revisit the test today, and modernize it by: - Removing extranous attributes on functions and their arguments, as LoopVectorize doesn't even look at these attributes. - Removing the target datalayout, as it is not essential for LoopVectorize. Finally, regenerate the CHECK lines using update_test_checks.py, eliminating hand-written error-prone CHECK lines. show more ...
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 7d757725	14-Dec-2022	Nikita Popov <npopov@redhat.com>	[LoopVectorize] Convert some tests to opaque pointers (NFC)
# 1e08a08a	07-Dec-2022	Roman Lebedev <lebedev.ri@gmail.com>	[NFC] Port all LoopVectorize tests to `-passes=` syntax
# be51fa45	05-Dec-2022	Roman Lebedev <lebedev.ri@gmail.com>	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# faebc6bf	11-May-2021	Florian Hahn <flo@fhahn.com>	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled [VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298. show more ...
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# 79b1b4a5	12-Feb-2021	Sanjay Patel <spatel@rotateright.com>	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promot [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552 show more ...
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 9b296102	29-Dec-2020	Juneyoung Lee <aqjune@gmail.com>	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleV Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923 show more ...
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# c23aefd7	31-Aug-2020	Roman Lebedev <lebedev.ri@gmail.com>	[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647 this must be sufficient otherwise `Elimina [NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647 this must be sufficient otherwise `EliminateDuplicatePHINodes()` would have hit issues with it already. show more ...
# bf21ce7b	29-Aug-2020	Roman Lebedev <lebedev.ri@gmail.com>	[InstCombine] Take 3: Perform trivial PHI CSE The original take 1 was 6102310d814ad73eab60a88b21dd70874f7a056f, which taught InstSimplify to do that, which seemed better at time, since we got EarlyC [InstCombine] Take 3: Perform trivial PHI CSE The original take 1 was 6102310d814ad73eab60a88b21dd70874f7a056f, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186 that reverted it: > It appears to cause compilation non-determinism and caused stage3 mismatches. Then there was take 2 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9, which was InstCombine-specific, but it again showed stage2-stage3 differences, and reverted in bdaa3f86a040b138c58de41d73d35b76fdec1380. This is quite alarming. Here, let's try to change how we find existing PHI candidate: due to the worklist order, and the way PHI nodes are inserted (it may be inserted as the first one, or maybe not), let's look at all PHI nodes in the block. Effects on vanilla llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| asm-printer.EmittedInsts \| 7942329 \| 7942457 \| 128 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 254295632 \| 254312480 \| 16848 \| 0.01% \| 0.01% \| \| correlated-value-propagation.NumPhis \| 18412 \| 18347 \| -65 \| -0.35% \| 0.35% \| \| early-cse.NumCSE \| 2183283 \| 2183267 \| -16 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 541842 \| -8263 \| -1.50% \| 1.50% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640311 \| 3644419 \| 4108 \| 0.11% \| 0.11% \| \| instcombine.NumDeadInst \| 1778204 \| 1783205 \| 5001 \| 0.28% \| 0.28% \| \| instcombine.NumPHICSEs \| 0 \| 22490 \| 22490 \| 0.00% \| 0.00% \| \| instcombine.NumWorklistIterations \| 2023272 \| 2024400 \| 1128 \| 0.06% \| 0.06% \| \| instcount.NumCallInst \| 1758395 \| 1758802 \| 407 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330545 \| -12 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1077138 \| 1077220 \| 82 \| 0.01% \| 0.01% \| \| instcount.TotalFuncs \| 101442 \| 101441 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8831946 \| 8832606 \| 660 \| 0.01% \| 0.01% \| \| simplifycfg.NumHoistCommonCode \| 24186 \| 24187 \| 1 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019813 \| 999767 \| -20046 \| -1.97% \| 1.97% \| ``` So it fires 22490 times, which is less than ~24k the take 1 did, but more than what take 2 did (22228 times) . It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good. show more ...
# bdaa3f86	29-Aug-2020	Roman Lebedev <lebedev.ri@gmail.com>	Revert "[InstCombine] Take 2: Perform trivial PHI CSE" While the original variant with doing this in InstSimplify (rightfully) caused questions and ultimately was detected to be a culprit of stage2- Revert "[InstCombine] Take 2: Perform trivial PHI CSE" While the original variant with doing this in InstSimplify (rightfully) caused questions and ultimately was detected to be a culprit of stage2-stage3 mismatch, it was expected that InstCombine-based implementation would be fine. But apparently it's not, as http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio suggests. Which suggests that somewhere in InstCombine there is a loop over nondeterministically sorted container, which causes different worklist ordering. This reverts commit 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9. show more ...
# 3e69871a	29-Aug-2020	Roman Lebedev <lebedev.ri@gmail.com>	[InstCombine] Take 2: Perform trivial PHI CSE The original take was 6102310d814ad73eab60a88b21dd70874f7a056f, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE [InstCombine] Take 2: Perform trivial PHI CSE The original take was 6102310d814ad73eab60a88b21dd70874f7a056f, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186 that reverted it : > It appears to cause compilation non-determinism and caused stage3 mismatches. However InstCombine already does many different optimizations, so it should be a safe place to do it here. Note that we still can't just compare incoming values ranges, because there is no guarantee that these PHI's we'd simplify to were already re-visited and sorted. However coming up with a test is problematic. Effects on vanilla llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| instcombine.NumPHICSEs \| 0 \| 22228 \| 22228 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 7942329 \| 7942456 \| 127 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 254295632 \| 254313792 \| 18160 \| 0.01% \| 0.01% \| \| early-cse.NumCSE \| 2183283 \| 2183272 \| -11 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 541842 \| -8263 \| -1.50% \| 1.50% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640311 \| 3666911 \| 26600 \| 0.73% \| 0.73% \| \| instcombine.NumDeadInst \| 1778204 \| 1783318 \| 5114 \| 0.29% \| 0.29% \| \| instcount.NumCallInst \| 1758395 \| 1758804 \| 409 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330549 \| -8 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1077138 \| 1077221 \| 83 \| 0.01% \| 0.01% \| \| instcount.TotalFuncs \| 101442 \| 101441 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8831946 \| 8832611 \| 665 \| 0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019813 \| 999740 \| -20073 \| -1.97% \| 1.97% \| ``` So it fires ~22k times, which is less than ~24k the take 1 did. It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good. show more ...
12 3