1.. _loop-terminology: 2 3=========================================== 4LLVM Loop Terminology (and Canonical Forms) 5=========================================== 6 7.. contents:: 8 :local: 9 10Loop Definition 11=============== 12 13Loops are an important concept for a code optimizer. In LLVM, detection 14of loops in a control-flow graph is done by :ref:`loopinfo`. It is based 15on the following definition. 16 17A loop is a subset of nodes from the control-flow graph (CFG; where 18nodes represent basic blocks) with the following properties: 19 201. The induced subgraph (which is the subgraph that contains all the 21 edges from the CFG within the loop) is strongly connected 22 (every node is reachable from all others). 23 242. All edges from outside the subset into the subset point to the same 25 node, called the **header**. As a consequence, the header dominates 26 all nodes in the loop (i.e. every execution path to any of the loop's 27 node will have to pass through the header). 28 293. The loop is the maximum subset with these properties. That is, no 30 additional nodes from the CFG can be added such that the induced 31 subgraph would still be strongly connected and the header would 32 remain the same. 33 34In computer science literature, this is often called a *natural loop*. 35In LLVM, a more generalized definition is called a 36:ref:`cycle <cycle-terminology>`. 37 38 39Terminology 40----------- 41 42The definition of a loop comes with some additional terminology: 43 44* An **entering block** (or **loop predecessor**) is a non-loop node 45 that has an edge into the loop (necessarily the header). If there is 46 only one entering block, and its only edge is to the 47 header, it is also called the loop's **preheader**. The preheader 48 dominates the loop without itself being part of the loop. 49 50* A **latch** is a loop node that has an edge to the header. 51 52* A **backedge** is an edge from a latch to the header. 53 54* An **exiting edge** is an edge from inside the loop to a node outside 55 of the loop. The source of such an edge is called an **exiting block**, its 56 target is an **exit block**. 57 58.. image:: ./loop-terminology.svg 59 :width: 400 px 60 61 62Important Notes 63--------------- 64 65This loop definition has some noteworthy consequences: 66 67* A node can be the header of at most one loop. As such, a loop can be 68 identified by its header. Due to the header being the only entry into 69 a loop, it can be called a Single-Entry-Multiple-Exits (SEME) region. 70 71 72* For basic blocks that are not reachable from the function's entry, the 73 concept of loops is undefined. This follows from the concept of 74 dominance being undefined as well. 75 76 77* The smallest loop consists of a single basic block that branches to 78 itself. In this case that block is the header, latch (and exiting 79 block if it has another edge to a different block) at the same time. 80 A single block that has no branch to itself is not considered a loop, 81 even though it is trivially strongly connected. 82 83.. image:: ./loop-single.svg 84 :width: 300 px 85 86In this case, the role of header, exiting block and latch fall to the 87same node. :ref:`loopinfo` reports this as: 88 89.. code-block:: console 90 91 $ opt input.ll -passes='print<loops>' 92 Loop at depth 1 containing: %for.body<header><latch><exiting> 93 94 95* Loops can be nested inside each other. That is, a loop's node set can 96 be a subset of another loop with a different loop header. The loop 97 hierarchy in a function forms a forest: Each top-level loop is the 98 root of the tree of the loops nested inside it. 99 100.. image:: ./loop-nested.svg 101 :width: 350 px 102 103 104* It is not possible that two loops share only a few of their nodes. 105 Two loops are either disjoint or one is nested inside the other. In 106 the example below the left and right subsets both violate the 107 maximality condition. Only the merge of both sets is considered a loop. 108 109.. image:: ./loop-nonmaximal.svg 110 :width: 250 px 111 112 113* It is also possible that two logical loops share a header, but are 114 considered a single loop by LLVM: 115 116.. code-block:: C 117 118 for (int i = 0; i < 128; ++i) 119 for (int j = 0; j < 128; ++j) 120 body(i,j); 121 122which might be represented in LLVM-IR as follows. Note that there is 123only a single header and hence just a single loop. 124 125.. image:: ./loop-merge.svg 126 :width: 400 px 127 128The :ref:`LoopSimplify <loop-terminology-loop-simplify>` pass will 129detect the loop and ensure separate headers for the outer and inner loop. 130 131.. image:: ./loop-separate.svg 132 :width: 400 px 133 134* A cycle in the CFG does not imply there is a loop. The example below 135 shows such a CFG, where there is no header node that dominates all 136 other nodes in the cycle. This is called **irreducible control-flow**. 137 138.. image:: ./loop-irreducible.svg 139 :width: 150 px 140 141The term reducible results from the ability to collapse the CFG into a 142single node by successively replacing one of three base structures with 143a single node: A sequential execution of basic blocks, acyclic conditional 144branches (or switches), and a basic block looping on itself. 145`Wikipedia <https://en.wikipedia.org/wiki/Control-flow_graph#Reducibility>`_ 146has a more formal definition, which basically says that every cycle has 147a dominating header. 148 149 150* Irreducible control-flow can occur at any level of the loop nesting. 151 That is, a loop that itself does not contain any loops can still have 152 cyclic control flow in its body; a loop that is not nested inside 153 another loop can still be part of an outer cycle; and there can be 154 additional cycles between any two loops where one is contained in the other. 155 However, an LLVM :ref:`cycle<cycle-terminology>` covers both, loops and 156 irreducible control flow. 157 158 159* The `FixIrreducible <https://llvm.org/doxygen/FixIrreducible_8h.html>`_ 160 pass can transform irreducible control flow into loops by inserting 161 new loop headers. It is not included in any default optimization pass 162 pipeline, but is required for some back-end targets. 163 164 165* Exiting edges are not the only way to break out of a loop. Other 166 possibilities are unreachable terminators, [[noreturn]] functions, 167 exceptions, signals, and your computer's power button. 168 169 170* A basic block "inside" the loop that does not have a path back to the 171 loop (i.e. to a latch or header) is not considered part of the loop. 172 This is illustrated by the following code. 173 174.. code-block:: C 175 176 for (unsigned i = 0; i <= n; ++i) { 177 if (c1) { 178 // When reaching this block, we will have exited the loop. 179 do_something(); 180 break; 181 } 182 if (c2) { 183 // abort(), never returns, so we have exited the loop. 184 abort(); 185 } 186 if (c3) { 187 // The unreachable allows the compiler to assume that this will not rejoin the loop. 188 do_something(); 189 __builtin_unreachable(); 190 } 191 if (c4) { 192 // This statically infinite loop is not nested because control-flow will not continue with the for-loop. 193 while(true) { 194 do_something(); 195 } 196 } 197 } 198 199 200* There is no requirement for the control flow to eventually leave the 201 loop, i.e. a loop can be infinite. A **statically infinite loop** is a 202 loop that has no exiting edges. A **dynamically infinite loop** has 203 exiting edges, but it is possible to be never taken. This may happen 204 only under some circumstances, such as when n == UINT_MAX in the code 205 below. 206 207.. code-block:: C 208 209 for (unsigned i = 0; i <= n; ++i) 210 body(i); 211 212It is possible for the optimizer to turn a dynamically infinite loop 213into a statically infinite loop, for instance when it can prove that the 214exiting condition is always false. Because the exiting edge is never 215taken, the optimizer can change the conditional branch into an 216unconditional one. 217 218If a is loop is annotated with 219:ref:`llvm.loop.mustprogress <langref_llvm_loop_mustprogress>` metadata, 220the compiler is allowed to assume that it will eventually terminate, even 221if it cannot prove it. For instance, it may remove a mustprogress-loop 222that does not have any side-effect in its body even though the program 223could be stuck in that loop forever. Languages such as C and 224`C++ <https://eel.is/c++draft/intro.progress#1>`_ have such 225forward-progress guarantees for some loops. Also see the 226:ref:`mustprogress <langref_mustprogress>` and 227:ref:`willreturn <langref_willreturn>` function attributes, as well as 228the older :ref:`llvm.sideeffect <llvm_sideeffect>` intrinsic. 229 230* The number of executions of the loop header before leaving the loop is 231 the **loop trip count** (or **iteration count**). If the loop should 232 not be executed at all, a **loop guard** must skip the entire loop: 233 234.. image:: ./loop-guard.svg 235 :width: 500 px 236 237Since the first thing a loop header might do is to check whether there 238is another execution and if not, immediately exit without doing any work 239(also see :ref:`loop-terminology-loop-rotate`), loop trip count is not 240the best measure of a loop's number of iterations. For instance, the 241number of header executions of the code below for a non-positive n 242(before loop rotation) is 1, even though the loop body is not executed 243at all. 244 245.. code-block:: C 246 247 for (int i = 0; i < n; ++i) 248 body(i); 249 250A better measure is the **backedge-taken count**, which is the number of 251times any of the backedges is taken before the loop. It is one less than 252the trip count for executions that enter the header. 253 254 255.. _loopinfo: 256 257LoopInfo 258======== 259 260LoopInfo is the core analysis for obtaining information about loops. 261There are few key implications of the definitions given above which 262are important for working successfully with this interface. 263 264* LoopInfo does not contain information about non-loop cycles. As a 265 result, it is not suitable for any algorithm which requires complete 266 cycle detection for correctness. 267 268* LoopInfo provides an interface for enumerating all top level loops 269 (e.g. those not contained in any other loop). From there, you may 270 walk the tree of sub-loops rooted in that top level loop. 271 272* Loops which become statically unreachable during optimization *must* 273 be removed from LoopInfo. If this can not be done for some reason, 274 then the optimization is *required* to preserve the static 275 reachability of the loop. 276 277 278.. _loop-terminology-loop-simplify: 279 280Loop Simplify Form 281================== 282 283The Loop Simplify Form is a canonical form that makes 284several analyses and transformations simpler and more effective. 285It is ensured by the LoopSimplify 286(:ref:`-loop-simplify <passes-loop-simplify>`) pass and is automatically 287added by the pass managers when scheduling a LoopPass. 288This pass is implemented in 289`LoopSimplify.h <https://llvm.org/doxygen/LoopSimplify_8h_source.html>`_. 290When it is successful, the loop has: 291 292* A preheader. 293* A single backedge (which implies that there is a single latch). 294* Dedicated exits. That is, no exit block for the loop 295 has a predecessor that is outside the loop. This implies 296 that all exit blocks are dominated by the loop header. 297 298.. _loop-terminology-lcssa: 299 300Loop Closed SSA (LCSSA) 301======================= 302 303A program is in Loop Closed SSA Form if it is in SSA form 304and all values that are defined in a loop are used only inside 305this loop. 306 307Programs written in LLVM IR are always in SSA form but not necessarily 308in LCSSA. To achieve the latter, for each value that is live across the 309loop boundary, single entry PHI nodes are inserted to each of the exit blocks 310[#lcssa-construction]_ in order to "close" these values inside the loop. 311In particular, consider the following loop: 312 313.. code-block:: C 314 315 c = ...; 316 for (...) { 317 if (c) 318 X1 = ... 319 else 320 X2 = ... 321 X3 = phi(X1, X2); // X3 defined 322 } 323 324 ... = X3 + 4; // X3 used, i.e. live 325 // outside the loop 326 327In the inner loop, the X3 is defined inside the loop, but used 328outside of it. In Loop Closed SSA form, this would be represented as follows: 329 330.. code-block:: C 331 332 c = ...; 333 for (...) { 334 if (c) 335 X1 = ... 336 else 337 X2 = ... 338 X3 = phi(X1, X2); 339 } 340 X4 = phi(X3); 341 342 ... = X4 + 4; 343 344This is still valid LLVM; the extra phi nodes are purely redundant, 345but all LoopPass'es are required to preserve them. 346This form is ensured by the LCSSA (:ref:`-lcssa <passes-lcssa>`) 347pass and is added automatically by the LoopPassManager when 348scheduling a LoopPass. 349After the loop optimizations are done, these extra phi nodes 350will be deleted by :ref:`-instcombine <passes-instcombine>`. 351 352Note that an exit block is outside of a loop, so how can such a phi "close" 353the value inside the loop since it uses it outside of it ? First of all, 354for phi nodes, as 355`mentioned in the LangRef <https://llvm.org/docs/LangRef.html#id311>`_: 356"the use of each incoming value is deemed to occur on the edge from the 357corresponding predecessor block to the current block". Now, an 358edge to an exit block is considered outside of the loop because 359if we take that edge, it leads us clearly out of the loop. 360 361However, an edge doesn't actually contain any IR, so in source code, 362we have to choose a convention of whether the use happens in 363the current block or in the respective predecessor. For LCSSA's purpose, 364we consider the use happens in the latter (so as to consider the 365use inside) [#point-of-use-phis]_. 366 367The major benefit of LCSSA is that it makes many other loop optimizations 368simpler. 369 370First of all, a simple observation is that if one needs to see all 371the outside users, they can just iterate over all the (loop closing) 372PHI nodes in the exit blocks (the alternative would be to 373scan the def-use chain [#def-use-chain]_ of all instructions in the loop). 374 375Then, consider for example 376:ref:`simple-loop-unswitch <passes-simple-loop-unswitch>` ing the loop above. 377Because it is in LCSSA form, we know that any value defined inside of 378the loop will be used either only inside the loop or in a loop closing 379PHI node. In this case, the only loop closing PHI node is X4. 380This means that we can just copy the loop and change the X4 381accordingly, like so: 382 383.. code-block:: C 384 385 c = ...; 386 if (c) { 387 for (...) { 388 if (true) 389 X1 = ... 390 else 391 X2 = ... 392 X3 = phi(X1, X2); 393 } 394 } else { 395 for (...) { 396 if (false) 397 X1' = ... 398 else 399 X2' = ... 400 X3' = phi(X1', X2'); 401 } 402 } 403 X4 = phi(X3, X3') 404 405Now, all uses of X4 will get the updated value (in general, 406if a loop is in LCSSA form, in any loop transformation, 407we only need to update the loop closing PHI nodes for the changes 408to take effect). If we did not have Loop Closed SSA form, it means that X3 could 409possibly be used outside the loop. So, we would have to introduce the 410X4 (which is the new X3) and replace all uses of X3 with that. 411However, we should note that because LLVM keeps a def-use chain 412[#def-use-chain]_ for each Value, we wouldn't need 413to perform data-flow analysis to find and replace all the uses 414(there is even a utility function, replaceAllUsesWith(), 415that performs this transformation by iterating the def-use chain). 416 417Another important advantage is that the behavior of all uses 418of an induction variable is the same. Without this, you need to 419distinguish the case when the variable is used outside of 420the loop it is defined in, for example: 421 422.. code-block:: C 423 424 for (i = 0; i < 100; i++) { 425 for (j = 0; j < 100; j++) { 426 k = i + j; 427 use(k); // use 1 428 } 429 use(k); // use 2 430 } 431 432Looking from the outer loop with the normal SSA form, the first use of k 433is not well-behaved, while the second one is an induction variable with 434base 100 and step 1. Although, in practice, and in the LLVM context, 435such cases can be handled effectively by SCEV. Scalar Evolution 436(:ref:`scalar-evolution <passes-scalar-evolution>`) or SCEV, is a 437(analysis) pass that analyzes and categorizes the evolution of scalar 438expressions in loops. 439 440In general, it's easier to use SCEV in loops that are in LCSSA form. 441The evolution of a scalar (loop-variant) expression that 442SCEV can analyze is, by definition, relative to a loop. 443An expression is represented in LLVM by an 444`llvm::Instruction <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_. 445If the expression is inside two (or more) loops (which can only 446happen if the loops are nested, like in the example above) and you want 447to get an analysis of its evolution (from SCEV), 448you have to also specify relative to what Loop you want it. 449Specifically, you have to use 450`getSCEVAtScope() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a21d6ee82eed29080d911dbb548a8bb68>`_. 451 452However, if all loops are in LCSSA form, each expression is actually 453represented by two different llvm::Instructions. One inside the loop 454and one outside, which is the loop-closing PHI node and represents 455the value of the expression after the last iteration (effectively, 456we break each loop-variant expression into two expressions and so, every 457expression is at most in one loop). You can now just use 458`getSCEV() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a30bd18ac905eacf3601bc6a553a9ff49>`_. 459and which of these two llvm::Instructions you pass to it disambiguates 460the context / scope / relative loop. 461 462.. rubric:: Footnotes 463 464.. [#lcssa-construction] To insert these loop-closing PHI nodes, one has to 465 (re-)compute dominance frontiers (if the loop has multiple exits). 466 467.. [#point-of-use-phis] Considering the point of use of a PHI entry value 468 to be in the respective predecessor is a convention across the whole LLVM. 469 The reason is mostly practical; for example it preserves the dominance 470 property of SSA. It is also just an overapproximation of the actual 471 number of uses; the incoming block could branch to another block in which 472 case the value is not actually used but there are no side-effects (it might 473 increase its live range which is not relevant in LCSSA though). 474 Furthermore, we can gain some intuition if we consider liveness: 475 A PHI is *usually* inserted in the current block because the value can't 476 be used from this point and onwards (i.e. the current block is a dominance 477 frontier). It doesn't make sense to consider that the value is used in 478 the current block (because of the PHI) since the value stops being live 479 before the PHI. In some sense the PHI definition just "replaces" the original 480 value definition and doesn't actually use it. It should be stressed that 481 this analogy is only used as an example and does not pose any strict 482 requirements. For example, the value might dominate the current block 483 but we can still insert a PHI (as we do with LCSSA PHI nodes) *and* 484 use the original value afterwards (in which case the two live ranges overlap, 485 although in LCSSA (the whole point is that) we never do that). 486 487 488.. [#def-use-chain] A property of SSA is that there exists a def-use chain 489 for each definition, which is a list of all the uses of this definition. 490 LLVM implements this property by keeping a list of all the uses of a Value 491 in an internal data structure. 492 493"More Canonical" Loops 494====================== 495 496.. _loop-terminology-loop-rotate: 497 498Rotated Loops 499------------- 500 501Loops are rotated by the LoopRotate (:ref:`loop-rotate <passes-loop-rotate>`) 502pass, which converts loops into do/while style loops and is 503implemented in 504`LoopRotation.h <https://llvm.org/doxygen/LoopRotation_8h_source.html>`_. Example: 505 506.. code-block:: C 507 508 void test(int n) { 509 for (int i = 0; i < n; i += 1) 510 // Loop body 511 } 512 513is transformed to: 514 515.. code-block:: C 516 517 void test(int n) { 518 int i = 0; 519 do { 520 // Loop body 521 i += 1; 522 } while (i < n); 523 } 524 525**Warning**: This transformation is valid only if the compiler 526can prove that the loop body will be executed at least once. Otherwise, 527it has to insert a guard which will test it at runtime. In the example 528above, that would be: 529 530.. code-block:: C 531 532 void test(int n) { 533 int i = 0; 534 if (n > 0) { 535 do { 536 // Loop body 537 i += 1; 538 } while (i < n); 539 } 540 } 541 542It's important to understand the effect of loop rotation 543at the LLVM IR level. We follow with the previous examples 544in LLVM IR while also providing a graphical representation 545of the control-flow graphs (CFG). You can get the same graphical 546results by utilizing the :ref:`view-cfg <passes-view-cfg>` pass. 547 548The initial **for** loop could be translated to: 549 550.. code-block:: none 551 552 define void @test(i32 %n) { 553 entry: 554 br label %for.header 555 556 for.header: 557 %i = phi i32 [ 0, %entry ], [ %i.next, %latch ] 558 %cond = icmp slt i32 %i, %n 559 br i1 %cond, label %body, label %exit 560 561 body: 562 ; Loop body 563 br label %latch 564 565 latch: 566 %i.next = add nsw i32 %i, 1 567 br label %for.header 568 569 exit: 570 ret void 571 } 572 573.. image:: ./loop-terminology-initial-loop.png 574 :width: 400 px 575 576Before we explain how LoopRotate will actually 577transform this loop, here's how we could convert 578it (by hand) to a do-while style loop. 579 580.. code-block:: none 581 582 define void @test(i32 %n) { 583 entry: 584 br label %body 585 586 body: 587 %i = phi i32 [ 0, %entry ], [ %i.next, %latch ] 588 ; Loop body 589 br label %latch 590 591 latch: 592 %i.next = add nsw i32 %i, 1 593 %cond = icmp slt i32 %i.next, %n 594 br i1 %cond, label %body, label %exit 595 596 exit: 597 ret void 598 } 599 600.. image:: ./loop-terminology-rotated-loop.png 601 :width: 400 px 602 603Note two things: 604 605* The condition check was moved to the "bottom" of the loop, i.e. 606 the latch. This is something that LoopRotate does by copying the header 607 of the loop to the latch. 608* The compiler in this case can't deduce that the loop will 609 definitely execute at least once so the above transformation 610 is not valid. As mentioned above, a guard has to be inserted, 611 which is something that LoopRotate will do. 612 613This is how LoopRotate transforms this loop: 614 615.. code-block:: none 616 617 define void @test(i32 %n) { 618 entry: 619 %guard_cond = icmp slt i32 0, %n 620 br i1 %guard_cond, label %loop.preheader, label %exit 621 622 loop.preheader: 623 br label %body 624 625 body: 626 %i2 = phi i32 [ 0, %loop.preheader ], [ %i.next, %latch ] 627 br label %latch 628 629 latch: 630 %i.next = add nsw i32 %i2, 1 631 %cond = icmp slt i32 %i.next, %n 632 br i1 %cond, label %body, label %loop.exit 633 634 loop.exit: 635 br label %exit 636 637 exit: 638 ret void 639 } 640 641.. image:: ./loop-terminology-guarded-loop.png 642 :width: 500 px 643 644The result is a little bit more complicated than we may expect 645because LoopRotate ensures that the loop is in 646:ref:`Loop Simplify Form <loop-terminology-loop-simplify>` 647after rotation. 648In this case, it inserted the %loop.preheader basic block so 649that the loop has a preheader and it introduced the %loop.exit 650basic block so that the loop has dedicated exits 651(otherwise, %exit would be jumped from both %latch and %entry, 652but %entry is not contained in the loop). 653Note that a loop has to be in Loop Simplify Form beforehand 654too for LoopRotate to be applied successfully. 655 656The main advantage of this form is that it allows hoisting 657invariant instructions, especially loads, into the preheader. 658That could be done in non-rotated loops as well but with 659some disadvantages. Let's illustrate them with an example: 660 661.. code-block:: C 662 663 for (int i = 0; i < n; ++i) { 664 auto v = *p; 665 use(v); 666 } 667 668We assume that loading from p is invariant and use(v) is some 669statement that uses v. 670If we wanted to execute the load only once we could move it 671"out" of the loop body, resulting in this: 672 673.. code-block:: C 674 675 auto v = *p; 676 for (int i = 0; i < n; ++i) { 677 use(v); 678 } 679 680However, now, in the case that n <= 0, in the initial form, 681the loop body would never execute, and so, the load would 682never execute. This is a problem mainly for semantic reasons. 683Consider the case in which n <= 0 and loading from p is invalid. 684In the initial program there would be no error. However, with this 685transformation we would introduce one, effectively breaking 686the initial semantics. 687 688To avoid both of these problems, we can insert a guard: 689 690.. code-block:: C 691 692 if (n > 0) { // loop guard 693 auto v = *p; 694 for (int i = 0; i < n; ++i) { 695 use(v); 696 } 697 } 698 699This is certainly better but it could be improved slightly. Notice 700that the check for whether n is bigger than 0 is executed twice (and 701n does not change in between). Once when we check the guard condition 702and once in the first execution of the loop. To avoid that, we could 703do an unconditional first execution and insert the loop condition 704in the end. This effectively means transforming the loop into a do-while loop: 705 706.. code-block:: C 707 708 if (0 < n) { 709 auto v = *p; 710 do { 711 use(v); 712 ++i; 713 } while (i < n); 714 } 715 716Note that LoopRotate does not generally do such 717hoisting. Rather, it is an enabling transformation for other 718passes like Loop-Invariant Code Motion (:ref:`-licm <passes-licm>`). 719