1 /* Predictive commoning. 2 Copyright (C) 2005, 2007, 2008, 2009, 2010 3 Free Software Foundation, Inc. 4 5 This file is part of GCC. 6 7 GCC is free software; you can redistribute it and/or modify it 8 under the terms of the GNU General Public License as published by the 9 Free Software Foundation; either version 3, or (at your option) any 10 later version. 11 12 GCC is distributed in the hope that it will be useful, but WITHOUT 13 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 14 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 15 for more details. 16 17 You should have received a copy of the GNU General Public License 18 along with GCC; see the file COPYING3. If not see 19 <http://www.gnu.org/licenses/>. */ 20 21 /* This file implements the predictive commoning optimization. Predictive 22 commoning can be viewed as CSE around a loop, and with some improvements, 23 as generalized strength reduction-- i.e., reusing values computed in 24 earlier iterations of a loop in the later ones. So far, the pass only 25 handles the most useful case, that is, reusing values of memory references. 26 If you think this is all just a special case of PRE, you are sort of right; 27 however, concentrating on loops is simpler, and makes it possible to 28 incorporate data dependence analysis to detect the opportunities, perform 29 loop unrolling to avoid copies together with renaming immediately, 30 and if needed, we could also take register pressure into account. 31 32 Let us demonstrate what is done on an example: 33 34 for (i = 0; i < 100; i++) 35 { 36 a[i+2] = a[i] + a[i+1]; 37 b[10] = b[10] + i; 38 c[i] = c[99 - i]; 39 d[i] = d[i + 1]; 40 } 41 42 1) We find data references in the loop, and split them to mutually 43 independent groups (i.e., we find components of a data dependence 44 graph). We ignore read-read dependences whose distance is not constant. 45 (TODO -- we could also ignore antidependences). In this example, we 46 find the following groups: 47 48 a[i]{read}, a[i+1]{read}, a[i+2]{write} 49 b[10]{read}, b[10]{write} 50 c[99 - i]{read}, c[i]{write} 51 d[i + 1]{read}, d[i]{write} 52 53 2) Inside each of the group, we verify several conditions: 54 a) all the references must differ in indices only, and the indices 55 must all have the same step 56 b) the references must dominate loop latch (and thus, they must be 57 ordered by dominance relation). 58 c) the distance of the indices must be a small multiple of the step 59 We are then able to compute the difference of the references (# of 60 iterations before they point to the same place as the first of them). 61 Also, in case there are writes in the loop, we split the groups into 62 chains whose head is the write whose values are used by the reads in 63 the same chain. The chains are then processed independently, 64 making the further transformations simpler. Also, the shorter chains 65 need the same number of registers, but may require lower unrolling 66 factor in order to get rid of the copies on the loop latch. 67 68 In our example, we get the following chains (the chain for c is invalid). 69 70 a[i]{read,+0}, a[i+1]{read,-1}, a[i+2]{write,-2} 71 b[10]{read,+0}, b[10]{write,+0} 72 d[i + 1]{read,+0}, d[i]{write,+1} 73 74 3) For each read, we determine the read or write whose value it reuses, 75 together with the distance of this reuse. I.e. we take the last 76 reference before it with distance 0, or the last of the references 77 with the smallest positive distance to the read. Then, we remove 78 the references that are not used in any of these chains, discard the 79 empty groups, and propagate all the links so that they point to the 80 single root reference of the chain (adjusting their distance 81 appropriately). Some extra care needs to be taken for references with 82 step 0. In our example (the numbers indicate the distance of the 83 reuse), 84 85 a[i] --> (*) 2, a[i+1] --> (*) 1, a[i+2] (*) 86 b[10] --> (*) 1, b[10] (*) 87 88 4) The chains are combined together if possible. If the corresponding 89 elements of two chains are always combined together with the same 90 operator, we remember just the result of this combination, instead 91 of remembering the values separately. We may need to perform 92 reassociation to enable combining, for example 93 94 e[i] + f[i+1] + e[i+1] + f[i] 95 96 can be reassociated as 97 98 (e[i] + f[i]) + (e[i+1] + f[i+1]) 99 100 and we can combine the chains for e and f into one chain. 101 102 5) For each root reference (end of the chain) R, let N be maximum distance 103 of a reference reusing its value. Variables R0 upto RN are created, 104 together with phi nodes that transfer values from R1 .. RN to 105 R0 .. R(N-1). 106 Initial values are loaded to R0..R(N-1) (in case not all references 107 must necessarily be accessed and they may trap, we may fail here; 108 TODO sometimes, the loads could be guarded by a check for the number 109 of iterations). Values loaded/stored in roots are also copied to 110 RN. Other reads are replaced with the appropriate variable Ri. 111 Everything is put to SSA form. 112 113 As a small improvement, if R0 is dead after the root (i.e., all uses of 114 the value with the maximum distance dominate the root), we can avoid 115 creating RN and use R0 instead of it. 116 117 In our example, we get (only the parts concerning a and b are shown): 118 for (i = 0; i < 100; i++) 119 { 120 f = phi (a[0], s); 121 s = phi (a[1], f); 122 x = phi (b[10], x); 123 124 f = f + s; 125 a[i+2] = f; 126 x = x + i; 127 b[10] = x; 128 } 129 130 6) Factor F for unrolling is determined as the smallest common multiple of 131 (N + 1) for each root reference (N for references for that we avoided 132 creating RN). If F and the loop is small enough, loop is unrolled F 133 times. The stores to RN (R0) in the copies of the loop body are 134 periodically replaced with R0, R1, ... (R1, R2, ...), so that they can 135 be coalesced and the copies can be eliminated. 136 137 TODO -- copy propagation and other optimizations may change the live 138 ranges of the temporary registers and prevent them from being coalesced; 139 this may increase the register pressure. 140 141 In our case, F = 2 and the (main loop of the) result is 142 143 for (i = 0; i < ...; i += 2) 144 { 145 f = phi (a[0], f); 146 s = phi (a[1], s); 147 x = phi (b[10], x); 148 149 f = f + s; 150 a[i+2] = f; 151 x = x + i; 152 b[10] = x; 153 154 s = s + f; 155 a[i+3] = s; 156 x = x + i; 157 b[10] = x; 158 } 159 160 TODO -- stores killing other stores can be taken into account, e.g., 161 for (i = 0; i < n; i++) 162 { 163 a[i] = 1; 164 a[i+2] = 2; 165 } 166 167 can be replaced with 168 169 t0 = a[0]; 170 t1 = a[1]; 171 for (i = 0; i < n; i++) 172 { 173 a[i] = 1; 174 t2 = 2; 175 t0 = t1; 176 t1 = t2; 177 } 178 a[n] = t0; 179 a[n+1] = t1; 180 181 The interesting part is that this would generalize store motion; still, since 182 sm is performed elsewhere, it does not seem that important. 183 184 Predictive commoning can be generalized for arbitrary computations (not 185 just memory loads), and also nontrivial transfer functions (e.g., replacing 186 i * i with ii_last + 2 * i + 1), to generalize strength reduction. */ 187 188 #include "config.h" 189 #include "system.h" 190 #include "coretypes.h" 191 #include "tm.h" 192 #include "tree.h" 193 #include "tm_p.h" 194 #include "cfgloop.h" 195 #include "tree-flow.h" 196 #include "ggc.h" 197 #include "tree-data-ref.h" 198 #include "tree-scalar-evolution.h" 199 #include "tree-chrec.h" 200 #include "params.h" 201 #include "diagnostic.h" 202 #include "tree-pass.h" 203 #include "tree-affine.h" 204 #include "tree-inline.h" 205 206 /* The maximum number of iterations between the considered memory 207 references. */ 208 209 #define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8) 210 211 /* Data references (or phi nodes that carry data reference values across 212 loop iterations). */ 213 214 typedef struct dref_d 215 { 216 /* The reference itself. */ 217 struct data_reference *ref; 218 219 /* The statement in that the reference appears. */ 220 gimple stmt; 221 222 /* In case that STMT is a phi node, this field is set to the SSA name 223 defined by it in replace_phis_by_defined_names (in order to avoid 224 pointing to phi node that got reallocated in the meantime). */ 225 tree name_defined_by_phi; 226 227 /* Distance of the reference from the root of the chain (in number of 228 iterations of the loop). */ 229 unsigned distance; 230 231 /* Number of iterations offset from the first reference in the component. */ 232 double_int offset; 233 234 /* Number of the reference in a component, in dominance ordering. */ 235 unsigned pos; 236 237 /* True if the memory reference is always accessed when the loop is 238 entered. */ 239 unsigned always_accessed : 1; 240 } *dref; 241 242 DEF_VEC_P (dref); 243 DEF_VEC_ALLOC_P (dref, heap); 244 245 /* Type of the chain of the references. */ 246 247 enum chain_type 248 { 249 /* The addresses of the references in the chain are constant. */ 250 CT_INVARIANT, 251 252 /* There are only loads in the chain. */ 253 CT_LOAD, 254 255 /* Root of the chain is store, the rest are loads. */ 256 CT_STORE_LOAD, 257 258 /* A combination of two chains. */ 259 CT_COMBINATION 260 }; 261 262 /* Chains of data references. */ 263 264 typedef struct chain 265 { 266 /* Type of the chain. */ 267 enum chain_type type; 268 269 /* For combination chains, the operator and the two chains that are 270 combined, and the type of the result. */ 271 enum tree_code op; 272 tree rslt_type; 273 struct chain *ch1, *ch2; 274 275 /* The references in the chain. */ 276 VEC(dref,heap) *refs; 277 278 /* The maximum distance of the reference in the chain from the root. */ 279 unsigned length; 280 281 /* The variables used to copy the value throughout iterations. */ 282 VEC(tree,heap) *vars; 283 284 /* Initializers for the variables. */ 285 VEC(tree,heap) *inits; 286 287 /* True if there is a use of a variable with the maximal distance 288 that comes after the root in the loop. */ 289 unsigned has_max_use_after : 1; 290 291 /* True if all the memory references in the chain are always accessed. */ 292 unsigned all_always_accessed : 1; 293 294 /* True if this chain was combined together with some other chain. */ 295 unsigned combined : 1; 296 } *chain_p; 297 298 DEF_VEC_P (chain_p); 299 DEF_VEC_ALLOC_P (chain_p, heap); 300 301 /* Describes the knowledge about the step of the memory references in 302 the component. */ 303 304 enum ref_step_type 305 { 306 /* The step is zero. */ 307 RS_INVARIANT, 308 309 /* The step is nonzero. */ 310 RS_NONZERO, 311 312 /* The step may or may not be nonzero. */ 313 RS_ANY 314 }; 315 316 /* Components of the data dependence graph. */ 317 318 struct component 319 { 320 /* The references in the component. */ 321 VEC(dref,heap) *refs; 322 323 /* What we know about the step of the references in the component. */ 324 enum ref_step_type comp_step; 325 326 /* Next component in the list. */ 327 struct component *next; 328 }; 329 330 /* Bitmap of ssa names defined by looparound phi nodes covered by chains. */ 331 332 static bitmap looparound_phis; 333 334 /* Cache used by tree_to_aff_combination_expand. */ 335 336 static struct pointer_map_t *name_expansions; 337 338 /* Dumps data reference REF to FILE. */ 339 340 extern void dump_dref (FILE *, dref); 341 void 342 dump_dref (FILE *file, dref ref) 343 { 344 if (ref->ref) 345 { 346 fprintf (file, " "); 347 print_generic_expr (file, DR_REF (ref->ref), TDF_SLIM); 348 fprintf (file, " (id %u%s)\n", ref->pos, 349 DR_IS_READ (ref->ref) ? "" : ", write"); 350 351 fprintf (file, " offset "); 352 dump_double_int (file, ref->offset, false); 353 fprintf (file, "\n"); 354 355 fprintf (file, " distance %u\n", ref->distance); 356 } 357 else 358 { 359 if (gimple_code (ref->stmt) == GIMPLE_PHI) 360 fprintf (file, " looparound ref\n"); 361 else 362 fprintf (file, " combination ref\n"); 363 fprintf (file, " in statement "); 364 print_gimple_stmt (file, ref->stmt, 0, TDF_SLIM); 365 fprintf (file, "\n"); 366 fprintf (file, " distance %u\n", ref->distance); 367 } 368 369 } 370 371 /* Dumps CHAIN to FILE. */ 372 373 extern void dump_chain (FILE *, chain_p); 374 void 375 dump_chain (FILE *file, chain_p chain) 376 { 377 dref a; 378 const char *chain_type; 379 unsigned i; 380 tree var; 381 382 switch (chain->type) 383 { 384 case CT_INVARIANT: 385 chain_type = "Load motion"; 386 break; 387 388 case CT_LOAD: 389 chain_type = "Loads-only"; 390 break; 391 392 case CT_STORE_LOAD: 393 chain_type = "Store-loads"; 394 break; 395 396 case CT_COMBINATION: 397 chain_type = "Combination"; 398 break; 399 400 default: 401 gcc_unreachable (); 402 } 403 404 fprintf (file, "%s chain %p%s\n", chain_type, (void *) chain, 405 chain->combined ? " (combined)" : ""); 406 if (chain->type != CT_INVARIANT) 407 fprintf (file, " max distance %u%s\n", chain->length, 408 chain->has_max_use_after ? "" : ", may reuse first"); 409 410 if (chain->type == CT_COMBINATION) 411 { 412 fprintf (file, " equal to %p %s %p in type ", 413 (void *) chain->ch1, op_symbol_code (chain->op), 414 (void *) chain->ch2); 415 print_generic_expr (file, chain->rslt_type, TDF_SLIM); 416 fprintf (file, "\n"); 417 } 418 419 if (chain->vars) 420 { 421 fprintf (file, " vars"); 422 for (i = 0; VEC_iterate (tree, chain->vars, i, var); i++) 423 { 424 fprintf (file, " "); 425 print_generic_expr (file, var, TDF_SLIM); 426 } 427 fprintf (file, "\n"); 428 } 429 430 if (chain->inits) 431 { 432 fprintf (file, " inits"); 433 for (i = 0; VEC_iterate (tree, chain->inits, i, var); i++) 434 { 435 fprintf (file, " "); 436 print_generic_expr (file, var, TDF_SLIM); 437 } 438 fprintf (file, "\n"); 439 } 440 441 fprintf (file, " references:\n"); 442 for (i = 0; VEC_iterate (dref, chain->refs, i, a); i++) 443 dump_dref (file, a); 444 445 fprintf (file, "\n"); 446 } 447 448 /* Dumps CHAINS to FILE. */ 449 450 extern void dump_chains (FILE *, VEC (chain_p, heap) *); 451 void 452 dump_chains (FILE *file, VEC (chain_p, heap) *chains) 453 { 454 chain_p chain; 455 unsigned i; 456 457 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 458 dump_chain (file, chain); 459 } 460 461 /* Dumps COMP to FILE. */ 462 463 extern void dump_component (FILE *, struct component *); 464 void 465 dump_component (FILE *file, struct component *comp) 466 { 467 dref a; 468 unsigned i; 469 470 fprintf (file, "Component%s:\n", 471 comp->comp_step == RS_INVARIANT ? " (invariant)" : ""); 472 for (i = 0; VEC_iterate (dref, comp->refs, i, a); i++) 473 dump_dref (file, a); 474 fprintf (file, "\n"); 475 } 476 477 /* Dumps COMPS to FILE. */ 478 479 extern void dump_components (FILE *, struct component *); 480 void 481 dump_components (FILE *file, struct component *comps) 482 { 483 struct component *comp; 484 485 for (comp = comps; comp; comp = comp->next) 486 dump_component (file, comp); 487 } 488 489 /* Frees a chain CHAIN. */ 490 491 static void 492 release_chain (chain_p chain) 493 { 494 dref ref; 495 unsigned i; 496 497 if (chain == NULL) 498 return; 499 500 for (i = 0; VEC_iterate (dref, chain->refs, i, ref); i++) 501 free (ref); 502 503 VEC_free (dref, heap, chain->refs); 504 VEC_free (tree, heap, chain->vars); 505 VEC_free (tree, heap, chain->inits); 506 507 free (chain); 508 } 509 510 /* Frees CHAINS. */ 511 512 static void 513 release_chains (VEC (chain_p, heap) *chains) 514 { 515 unsigned i; 516 chain_p chain; 517 518 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 519 release_chain (chain); 520 VEC_free (chain_p, heap, chains); 521 } 522 523 /* Frees a component COMP. */ 524 525 static void 526 release_component (struct component *comp) 527 { 528 VEC_free (dref, heap, comp->refs); 529 free (comp); 530 } 531 532 /* Frees list of components COMPS. */ 533 534 static void 535 release_components (struct component *comps) 536 { 537 struct component *act, *next; 538 539 for (act = comps; act; act = next) 540 { 541 next = act->next; 542 release_component (act); 543 } 544 } 545 546 /* Finds a root of tree given by FATHERS containing A, and performs path 547 shortening. */ 548 549 static unsigned 550 component_of (unsigned fathers[], unsigned a) 551 { 552 unsigned root, n; 553 554 for (root = a; root != fathers[root]; root = fathers[root]) 555 continue; 556 557 for (; a != root; a = n) 558 { 559 n = fathers[a]; 560 fathers[a] = root; 561 } 562 563 return root; 564 } 565 566 /* Join operation for DFU. FATHERS gives the tree, SIZES are sizes of the 567 components, A and B are components to merge. */ 568 569 static void 570 merge_comps (unsigned fathers[], unsigned sizes[], unsigned a, unsigned b) 571 { 572 unsigned ca = component_of (fathers, a); 573 unsigned cb = component_of (fathers, b); 574 575 if (ca == cb) 576 return; 577 578 if (sizes[ca] < sizes[cb]) 579 { 580 sizes[cb] += sizes[ca]; 581 fathers[ca] = cb; 582 } 583 else 584 { 585 sizes[ca] += sizes[cb]; 586 fathers[cb] = ca; 587 } 588 } 589 590 /* Returns true if A is a reference that is suitable for predictive commoning 591 in the innermost loop that contains it. REF_STEP is set according to the 592 step of the reference A. */ 593 594 static bool 595 suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step) 596 { 597 tree ref = DR_REF (a), step = DR_STEP (a); 598 599 if (!step 600 || !is_gimple_reg_type (TREE_TYPE (ref)) 601 || tree_could_throw_p (ref)) 602 return false; 603 604 if (integer_zerop (step)) 605 *ref_step = RS_INVARIANT; 606 else if (integer_nonzerop (step)) 607 *ref_step = RS_NONZERO; 608 else 609 *ref_step = RS_ANY; 610 611 return true; 612 } 613 614 /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET. */ 615 616 static void 617 aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset) 618 { 619 aff_tree delta; 620 621 tree_to_aff_combination_expand (DR_OFFSET (dr), sizetype, offset, 622 &name_expansions); 623 aff_combination_const (&delta, sizetype, tree_to_double_int (DR_INIT (dr))); 624 aff_combination_add (offset, &delta); 625 } 626 627 /* Determines number of iterations of the innermost enclosing loop before B 628 refers to exactly the same location as A and stores it to OFF. If A and 629 B do not have the same step, they never meet, or anything else fails, 630 returns false, otherwise returns true. Both A and B are assumed to 631 satisfy suitable_reference_p. */ 632 633 static bool 634 determine_offset (struct data_reference *a, struct data_reference *b, 635 double_int *off) 636 { 637 aff_tree diff, baseb, step; 638 tree typea, typeb; 639 640 /* Check that both the references access the location in the same type. */ 641 typea = TREE_TYPE (DR_REF (a)); 642 typeb = TREE_TYPE (DR_REF (b)); 643 if (!useless_type_conversion_p (typeb, typea)) 644 return false; 645 646 /* Check whether the base address and the step of both references is the 647 same. */ 648 if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0) 649 || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0)) 650 return false; 651 652 if (integer_zerop (DR_STEP (a))) 653 { 654 /* If the references have loop invariant address, check that they access 655 exactly the same location. */ 656 *off = double_int_zero; 657 return (operand_equal_p (DR_OFFSET (a), DR_OFFSET (b), 0) 658 && operand_equal_p (DR_INIT (a), DR_INIT (b), 0)); 659 } 660 661 /* Compare the offsets of the addresses, and check whether the difference 662 is a multiple of step. */ 663 aff_combination_dr_offset (a, &diff); 664 aff_combination_dr_offset (b, &baseb); 665 aff_combination_scale (&baseb, double_int_minus_one); 666 aff_combination_add (&diff, &baseb); 667 668 tree_to_aff_combination_expand (DR_STEP (a), sizetype, 669 &step, &name_expansions); 670 return aff_combination_constant_multiple_p (&diff, &step, off); 671 } 672 673 /* Returns the last basic block in LOOP for that we are sure that 674 it is executed whenever the loop is entered. */ 675 676 static basic_block 677 last_always_executed_block (struct loop *loop) 678 { 679 unsigned i; 680 VEC (edge, heap) *exits = get_loop_exit_edges (loop); 681 edge ex; 682 basic_block last = loop->latch; 683 684 for (i = 0; VEC_iterate (edge, exits, i, ex); i++) 685 last = nearest_common_dominator (CDI_DOMINATORS, last, ex->src); 686 VEC_free (edge, heap, exits); 687 688 return last; 689 } 690 691 /* Splits dependence graph on DATAREFS described by DEPENDS to components. */ 692 693 static struct component * 694 split_data_refs_to_components (struct loop *loop, 695 VEC (data_reference_p, heap) *datarefs, 696 VEC (ddr_p, heap) *depends) 697 { 698 unsigned i, n = VEC_length (data_reference_p, datarefs); 699 unsigned ca, ia, ib, bad; 700 unsigned *comp_father = XNEWVEC (unsigned, n + 1); 701 unsigned *comp_size = XNEWVEC (unsigned, n + 1); 702 struct component **comps; 703 struct data_reference *dr, *dra, *drb; 704 struct data_dependence_relation *ddr; 705 struct component *comp_list = NULL, *comp; 706 dref dataref; 707 basic_block last_always_executed = last_always_executed_block (loop); 708 709 for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) 710 { 711 if (!DR_REF (dr)) 712 { 713 /* A fake reference for call or asm_expr that may clobber memory; 714 just fail. */ 715 goto end; 716 } 717 dr->aux = (void *) (size_t) i; 718 comp_father[i] = i; 719 comp_size[i] = 1; 720 } 721 722 /* A component reserved for the "bad" data references. */ 723 comp_father[n] = n; 724 comp_size[n] = 1; 725 726 for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) 727 { 728 enum ref_step_type dummy; 729 730 if (!suitable_reference_p (dr, &dummy)) 731 { 732 ia = (unsigned) (size_t) dr->aux; 733 merge_comps (comp_father, comp_size, n, ia); 734 } 735 } 736 737 for (i = 0; VEC_iterate (ddr_p, depends, i, ddr); i++) 738 { 739 double_int dummy_off; 740 741 if (DDR_ARE_DEPENDENT (ddr) == chrec_known) 742 continue; 743 744 dra = DDR_A (ddr); 745 drb = DDR_B (ddr); 746 ia = component_of (comp_father, (unsigned) (size_t) dra->aux); 747 ib = component_of (comp_father, (unsigned) (size_t) drb->aux); 748 if (ia == ib) 749 continue; 750 751 bad = component_of (comp_father, n); 752 753 /* If both A and B are reads, we may ignore unsuitable dependences. */ 754 if (DR_IS_READ (dra) && DR_IS_READ (drb) 755 && (ia == bad || ib == bad 756 || !determine_offset (dra, drb, &dummy_off))) 757 continue; 758 759 merge_comps (comp_father, comp_size, ia, ib); 760 } 761 762 comps = XCNEWVEC (struct component *, n); 763 bad = component_of (comp_father, n); 764 for (i = 0; VEC_iterate (data_reference_p, datarefs, i, dr); i++) 765 { 766 ia = (unsigned) (size_t) dr->aux; 767 ca = component_of (comp_father, ia); 768 if (ca == bad) 769 continue; 770 771 comp = comps[ca]; 772 if (!comp) 773 { 774 comp = XCNEW (struct component); 775 comp->refs = VEC_alloc (dref, heap, comp_size[ca]); 776 comps[ca] = comp; 777 } 778 779 dataref = XCNEW (struct dref_d); 780 dataref->ref = dr; 781 dataref->stmt = DR_STMT (dr); 782 dataref->offset = double_int_zero; 783 dataref->distance = 0; 784 785 dataref->always_accessed 786 = dominated_by_p (CDI_DOMINATORS, last_always_executed, 787 gimple_bb (dataref->stmt)); 788 dataref->pos = VEC_length (dref, comp->refs); 789 VEC_quick_push (dref, comp->refs, dataref); 790 } 791 792 for (i = 0; i < n; i++) 793 { 794 comp = comps[i]; 795 if (comp) 796 { 797 comp->next = comp_list; 798 comp_list = comp; 799 } 800 } 801 free (comps); 802 803 end: 804 free (comp_father); 805 free (comp_size); 806 return comp_list; 807 } 808 809 /* Returns true if the component COMP satisfies the conditions 810 described in 2) at the beginning of this file. LOOP is the current 811 loop. */ 812 813 static bool 814 suitable_component_p (struct loop *loop, struct component *comp) 815 { 816 unsigned i; 817 dref a, first; 818 basic_block ba, bp = loop->header; 819 bool ok, has_write = false; 820 821 for (i = 0; VEC_iterate (dref, comp->refs, i, a); i++) 822 { 823 ba = gimple_bb (a->stmt); 824 825 if (!just_once_each_iteration_p (loop, ba)) 826 return false; 827 828 gcc_assert (dominated_by_p (CDI_DOMINATORS, ba, bp)); 829 bp = ba; 830 831 if (!DR_IS_READ (a->ref)) 832 has_write = true; 833 } 834 835 first = VEC_index (dref, comp->refs, 0); 836 ok = suitable_reference_p (first->ref, &comp->comp_step); 837 gcc_assert (ok); 838 first->offset = double_int_zero; 839 840 for (i = 1; VEC_iterate (dref, comp->refs, i, a); i++) 841 { 842 if (!determine_offset (first->ref, a->ref, &a->offset)) 843 return false; 844 845 #ifdef ENABLE_CHECKING 846 { 847 enum ref_step_type a_step; 848 ok = suitable_reference_p (a->ref, &a_step); 849 gcc_assert (ok && a_step == comp->comp_step); 850 } 851 #endif 852 } 853 854 /* If there is a write inside the component, we must know whether the 855 step is nonzero or not -- we would not otherwise be able to recognize 856 whether the value accessed by reads comes from the OFFSET-th iteration 857 or the previous one. */ 858 if (has_write && comp->comp_step == RS_ANY) 859 return false; 860 861 return true; 862 } 863 864 /* Check the conditions on references inside each of components COMPS, 865 and remove the unsuitable components from the list. The new list 866 of components is returned. The conditions are described in 2) at 867 the beginning of this file. LOOP is the current loop. */ 868 869 static struct component * 870 filter_suitable_components (struct loop *loop, struct component *comps) 871 { 872 struct component **comp, *act; 873 874 for (comp = &comps; *comp; ) 875 { 876 act = *comp; 877 if (suitable_component_p (loop, act)) 878 comp = &act->next; 879 else 880 { 881 dref ref; 882 unsigned i; 883 884 *comp = act->next; 885 for (i = 0; VEC_iterate (dref, act->refs, i, ref); i++) 886 free (ref); 887 release_component (act); 888 } 889 } 890 891 return comps; 892 } 893 894 /* Compares two drefs A and B by their offset and position. Callback for 895 qsort. */ 896 897 static int 898 order_drefs (const void *a, const void *b) 899 { 900 const dref *const da = (const dref *) a; 901 const dref *const db = (const dref *) b; 902 int offcmp = double_int_scmp ((*da)->offset, (*db)->offset); 903 904 if (offcmp != 0) 905 return offcmp; 906 907 return (*da)->pos - (*db)->pos; 908 } 909 910 /* Returns root of the CHAIN. */ 911 912 static inline dref 913 get_chain_root (chain_p chain) 914 { 915 return VEC_index (dref, chain->refs, 0); 916 } 917 918 /* Adds REF to the chain CHAIN. */ 919 920 static void 921 add_ref_to_chain (chain_p chain, dref ref) 922 { 923 dref root = get_chain_root (chain); 924 double_int dist; 925 926 gcc_assert (double_int_scmp (root->offset, ref->offset) <= 0); 927 dist = double_int_add (ref->offset, double_int_neg (root->offset)); 928 if (double_int_ucmp (uhwi_to_double_int (MAX_DISTANCE), dist) <= 0) 929 { 930 free (ref); 931 return; 932 } 933 gcc_assert (double_int_fits_in_uhwi_p (dist)); 934 935 VEC_safe_push (dref, heap, chain->refs, ref); 936 937 ref->distance = double_int_to_uhwi (dist); 938 939 if (ref->distance >= chain->length) 940 { 941 chain->length = ref->distance; 942 chain->has_max_use_after = false; 943 } 944 945 if (ref->distance == chain->length 946 && ref->pos > root->pos) 947 chain->has_max_use_after = true; 948 949 chain->all_always_accessed &= ref->always_accessed; 950 } 951 952 /* Returns the chain for invariant component COMP. */ 953 954 static chain_p 955 make_invariant_chain (struct component *comp) 956 { 957 chain_p chain = XCNEW (struct chain); 958 unsigned i; 959 dref ref; 960 961 chain->type = CT_INVARIANT; 962 963 chain->all_always_accessed = true; 964 965 for (i = 0; VEC_iterate (dref, comp->refs, i, ref); i++) 966 { 967 VEC_safe_push (dref, heap, chain->refs, ref); 968 chain->all_always_accessed &= ref->always_accessed; 969 } 970 971 return chain; 972 } 973 974 /* Make a new chain rooted at REF. */ 975 976 static chain_p 977 make_rooted_chain (dref ref) 978 { 979 chain_p chain = XCNEW (struct chain); 980 981 chain->type = DR_IS_READ (ref->ref) ? CT_LOAD : CT_STORE_LOAD; 982 983 VEC_safe_push (dref, heap, chain->refs, ref); 984 chain->all_always_accessed = ref->always_accessed; 985 986 ref->distance = 0; 987 988 return chain; 989 } 990 991 /* Returns true if CHAIN is not trivial. */ 992 993 static bool 994 nontrivial_chain_p (chain_p chain) 995 { 996 return chain != NULL && VEC_length (dref, chain->refs) > 1; 997 } 998 999 /* Returns the ssa name that contains the value of REF, or NULL_TREE if there 1000 is no such name. */ 1001 1002 static tree 1003 name_for_ref (dref ref) 1004 { 1005 tree name; 1006 1007 if (is_gimple_assign (ref->stmt)) 1008 { 1009 if (!ref->ref || DR_IS_READ (ref->ref)) 1010 name = gimple_assign_lhs (ref->stmt); 1011 else 1012 name = gimple_assign_rhs1 (ref->stmt); 1013 } 1014 else 1015 name = PHI_RESULT (ref->stmt); 1016 1017 return (TREE_CODE (name) == SSA_NAME ? name : NULL_TREE); 1018 } 1019 1020 /* Returns true if REF is a valid initializer for ROOT with given DISTANCE (in 1021 iterations of the innermost enclosing loop). */ 1022 1023 static bool 1024 valid_initializer_p (struct data_reference *ref, 1025 unsigned distance, struct data_reference *root) 1026 { 1027 aff_tree diff, base, step; 1028 double_int off; 1029 1030 /* Both REF and ROOT must be accessing the same object. */ 1031 if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0)) 1032 return false; 1033 1034 /* The initializer is defined outside of loop, hence its address must be 1035 invariant inside the loop. */ 1036 gcc_assert (integer_zerop (DR_STEP (ref))); 1037 1038 /* If the address of the reference is invariant, initializer must access 1039 exactly the same location. */ 1040 if (integer_zerop (DR_STEP (root))) 1041 return (operand_equal_p (DR_OFFSET (ref), DR_OFFSET (root), 0) 1042 && operand_equal_p (DR_INIT (ref), DR_INIT (root), 0)); 1043 1044 /* Verify that this index of REF is equal to the root's index at 1045 -DISTANCE-th iteration. */ 1046 aff_combination_dr_offset (root, &diff); 1047 aff_combination_dr_offset (ref, &base); 1048 aff_combination_scale (&base, double_int_minus_one); 1049 aff_combination_add (&diff, &base); 1050 1051 tree_to_aff_combination_expand (DR_STEP (root), sizetype, &step, 1052 &name_expansions); 1053 if (!aff_combination_constant_multiple_p (&diff, &step, &off)) 1054 return false; 1055 1056 if (!double_int_equal_p (off, uhwi_to_double_int (distance))) 1057 return false; 1058 1059 return true; 1060 } 1061 1062 /* Finds looparound phi node of LOOP that copies the value of REF, and if its 1063 initial value is correct (equal to initial value of REF shifted by one 1064 iteration), returns the phi node. Otherwise, NULL_TREE is returned. ROOT 1065 is the root of the current chain. */ 1066 1067 static gimple 1068 find_looparound_phi (struct loop *loop, dref ref, dref root) 1069 { 1070 tree name, init, init_ref; 1071 gimple phi = NULL, init_stmt; 1072 edge latch = loop_latch_edge (loop); 1073 struct data_reference init_dr; 1074 gimple_stmt_iterator psi; 1075 1076 if (is_gimple_assign (ref->stmt)) 1077 { 1078 if (DR_IS_READ (ref->ref)) 1079 name = gimple_assign_lhs (ref->stmt); 1080 else 1081 name = gimple_assign_rhs1 (ref->stmt); 1082 } 1083 else 1084 name = PHI_RESULT (ref->stmt); 1085 if (!name) 1086 return NULL; 1087 1088 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi)) 1089 { 1090 phi = gsi_stmt (psi); 1091 if (PHI_ARG_DEF_FROM_EDGE (phi, latch) == name) 1092 break; 1093 } 1094 1095 if (gsi_end_p (psi)) 1096 return NULL; 1097 1098 init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); 1099 if (TREE_CODE (init) != SSA_NAME) 1100 return NULL; 1101 init_stmt = SSA_NAME_DEF_STMT (init); 1102 if (gimple_code (init_stmt) != GIMPLE_ASSIGN) 1103 return NULL; 1104 gcc_assert (gimple_assign_lhs (init_stmt) == init); 1105 1106 init_ref = gimple_assign_rhs1 (init_stmt); 1107 if (!REFERENCE_CLASS_P (init_ref) 1108 && !DECL_P (init_ref)) 1109 return NULL; 1110 1111 /* Analyze the behavior of INIT_REF with respect to LOOP (innermost 1112 loop enclosing PHI). */ 1113 memset (&init_dr, 0, sizeof (struct data_reference)); 1114 DR_REF (&init_dr) = init_ref; 1115 DR_STMT (&init_dr) = phi; 1116 if (!dr_analyze_innermost (&init_dr)) 1117 return NULL; 1118 1119 if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref)) 1120 return NULL; 1121 1122 return phi; 1123 } 1124 1125 /* Adds a reference for the looparound copy of REF in PHI to CHAIN. */ 1126 1127 static void 1128 insert_looparound_copy (chain_p chain, dref ref, gimple phi) 1129 { 1130 dref nw = XCNEW (struct dref_d), aref; 1131 unsigned i; 1132 1133 nw->stmt = phi; 1134 nw->distance = ref->distance + 1; 1135 nw->always_accessed = 1; 1136 1137 for (i = 0; VEC_iterate (dref, chain->refs, i, aref); i++) 1138 if (aref->distance >= nw->distance) 1139 break; 1140 VEC_safe_insert (dref, heap, chain->refs, i, nw); 1141 1142 if (nw->distance > chain->length) 1143 { 1144 chain->length = nw->distance; 1145 chain->has_max_use_after = false; 1146 } 1147 } 1148 1149 /* For references in CHAIN that are copied around the LOOP (created previously 1150 by PRE, or by user), add the results of such copies to the chain. This 1151 enables us to remove the copies by unrolling, and may need less registers 1152 (also, it may allow us to combine chains together). */ 1153 1154 static void 1155 add_looparound_copies (struct loop *loop, chain_p chain) 1156 { 1157 unsigned i; 1158 dref ref, root = get_chain_root (chain); 1159 gimple phi; 1160 1161 for (i = 0; VEC_iterate (dref, chain->refs, i, ref); i++) 1162 { 1163 phi = find_looparound_phi (loop, ref, root); 1164 if (!phi) 1165 continue; 1166 1167 bitmap_set_bit (looparound_phis, SSA_NAME_VERSION (PHI_RESULT (phi))); 1168 insert_looparound_copy (chain, ref, phi); 1169 } 1170 } 1171 1172 /* Find roots of the values and determine distances in the component COMP. 1173 The references are redistributed into CHAINS. LOOP is the current 1174 loop. */ 1175 1176 static void 1177 determine_roots_comp (struct loop *loop, 1178 struct component *comp, 1179 VEC (chain_p, heap) **chains) 1180 { 1181 unsigned i; 1182 dref a; 1183 chain_p chain = NULL; 1184 double_int last_ofs = double_int_zero; 1185 1186 /* Invariants are handled specially. */ 1187 if (comp->comp_step == RS_INVARIANT) 1188 { 1189 chain = make_invariant_chain (comp); 1190 VEC_safe_push (chain_p, heap, *chains, chain); 1191 return; 1192 } 1193 1194 qsort (VEC_address (dref, comp->refs), VEC_length (dref, comp->refs), 1195 sizeof (dref), order_drefs); 1196 1197 for (i = 0; VEC_iterate (dref, comp->refs, i, a); i++) 1198 { 1199 if (!chain || !DR_IS_READ (a->ref) 1200 || double_int_ucmp (uhwi_to_double_int (MAX_DISTANCE), 1201 double_int_add (a->offset, 1202 double_int_neg (last_ofs))) <= 0) 1203 { 1204 if (nontrivial_chain_p (chain)) 1205 { 1206 add_looparound_copies (loop, chain); 1207 VEC_safe_push (chain_p, heap, *chains, chain); 1208 } 1209 else 1210 release_chain (chain); 1211 chain = make_rooted_chain (a); 1212 last_ofs = a->offset; 1213 continue; 1214 } 1215 1216 add_ref_to_chain (chain, a); 1217 } 1218 1219 if (nontrivial_chain_p (chain)) 1220 { 1221 add_looparound_copies (loop, chain); 1222 VEC_safe_push (chain_p, heap, *chains, chain); 1223 } 1224 else 1225 release_chain (chain); 1226 } 1227 1228 /* Find roots of the values and determine distances in components COMPS, and 1229 separates the references to CHAINS. LOOP is the current loop. */ 1230 1231 static void 1232 determine_roots (struct loop *loop, 1233 struct component *comps, VEC (chain_p, heap) **chains) 1234 { 1235 struct component *comp; 1236 1237 for (comp = comps; comp; comp = comp->next) 1238 determine_roots_comp (loop, comp, chains); 1239 } 1240 1241 /* Replace the reference in statement STMT with temporary variable 1242 NEW_TREE. If SET is true, NEW_TREE is instead initialized to the value of 1243 the reference in the statement. IN_LHS is true if the reference 1244 is in the lhs of STMT, false if it is in rhs. */ 1245 1246 static void 1247 replace_ref_with (gimple stmt, tree new_tree, bool set, bool in_lhs) 1248 { 1249 tree val; 1250 gimple new_stmt; 1251 gimple_stmt_iterator bsi, psi; 1252 1253 if (gimple_code (stmt) == GIMPLE_PHI) 1254 { 1255 gcc_assert (!in_lhs && !set); 1256 1257 val = PHI_RESULT (stmt); 1258 bsi = gsi_after_labels (gimple_bb (stmt)); 1259 psi = gsi_for_stmt (stmt); 1260 remove_phi_node (&psi, false); 1261 1262 /* Turn the phi node into GIMPLE_ASSIGN. */ 1263 new_stmt = gimple_build_assign (val, new_tree); 1264 gsi_insert_before (&bsi, new_stmt, GSI_NEW_STMT); 1265 return; 1266 } 1267 1268 /* Since the reference is of gimple_reg type, it should only 1269 appear as lhs or rhs of modify statement. */ 1270 gcc_assert (is_gimple_assign (stmt)); 1271 1272 bsi = gsi_for_stmt (stmt); 1273 1274 /* If we do not need to initialize NEW_TREE, just replace the use of OLD. */ 1275 if (!set) 1276 { 1277 gcc_assert (!in_lhs); 1278 gimple_assign_set_rhs_from_tree (&bsi, new_tree); 1279 stmt = gsi_stmt (bsi); 1280 update_stmt (stmt); 1281 return; 1282 } 1283 1284 if (in_lhs) 1285 { 1286 /* We have statement 1287 1288 OLD = VAL 1289 1290 If OLD is a memory reference, then VAL is gimple_val, and we transform 1291 this to 1292 1293 OLD = VAL 1294 NEW = VAL 1295 1296 Otherwise, we are replacing a combination chain, 1297 VAL is the expression that performs the combination, and OLD is an 1298 SSA name. In this case, we transform the assignment to 1299 1300 OLD = VAL 1301 NEW = OLD 1302 1303 */ 1304 1305 val = gimple_assign_lhs (stmt); 1306 if (TREE_CODE (val) != SSA_NAME) 1307 { 1308 gcc_assert (gimple_assign_copy_p (stmt)); 1309 val = gimple_assign_rhs1 (stmt); 1310 } 1311 } 1312 else 1313 { 1314 /* VAL = OLD 1315 1316 is transformed to 1317 1318 VAL = OLD 1319 NEW = VAL */ 1320 1321 val = gimple_assign_lhs (stmt); 1322 } 1323 1324 new_stmt = gimple_build_assign (new_tree, unshare_expr (val)); 1325 gsi_insert_after (&bsi, new_stmt, GSI_NEW_STMT); 1326 } 1327 1328 /* Returns the reference to the address of REF in the ITER-th iteration of 1329 LOOP, or NULL if we fail to determine it (ITER may be negative). We 1330 try to preserve the original shape of the reference (not rewrite it 1331 as an indirect ref to the address), to make tree_could_trap_p in 1332 prepare_initializers_chain return false more often. */ 1333 1334 static tree 1335 ref_at_iteration (struct loop *loop, tree ref, int iter) 1336 { 1337 tree idx, *idx_p, type, val, op0 = NULL_TREE, ret; 1338 affine_iv iv; 1339 bool ok; 1340 1341 if (handled_component_p (ref)) 1342 { 1343 op0 = ref_at_iteration (loop, TREE_OPERAND (ref, 0), iter); 1344 if (!op0) 1345 return NULL_TREE; 1346 } 1347 else if (!INDIRECT_REF_P (ref)) 1348 return unshare_expr (ref); 1349 1350 if (INDIRECT_REF_P (ref)) 1351 { 1352 /* Take care for INDIRECT_REF and MISALIGNED_INDIRECT_REF at 1353 the same time. */ 1354 ret = copy_node (ref); 1355 idx = TREE_OPERAND (ref, 0); 1356 idx_p = &TREE_OPERAND (ret, 0); 1357 } 1358 else if (TREE_CODE (ref) == COMPONENT_REF) 1359 { 1360 /* Check that the offset is loop invariant. */ 1361 if (TREE_OPERAND (ref, 2) 1362 && !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 2))) 1363 return NULL_TREE; 1364 1365 return build3 (COMPONENT_REF, TREE_TYPE (ref), op0, 1366 unshare_expr (TREE_OPERAND (ref, 1)), 1367 unshare_expr (TREE_OPERAND (ref, 2))); 1368 } 1369 else if (TREE_CODE (ref) == ARRAY_REF) 1370 { 1371 /* Check that the lower bound and the step are loop invariant. */ 1372 if (TREE_OPERAND (ref, 2) 1373 && !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 2))) 1374 return NULL_TREE; 1375 if (TREE_OPERAND (ref, 3) 1376 && !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 3))) 1377 return NULL_TREE; 1378 1379 ret = build4 (ARRAY_REF, TREE_TYPE (ref), op0, NULL_TREE, 1380 unshare_expr (TREE_OPERAND (ref, 2)), 1381 unshare_expr (TREE_OPERAND (ref, 3))); 1382 idx = TREE_OPERAND (ref, 1); 1383 idx_p = &TREE_OPERAND (ret, 1); 1384 } 1385 else 1386 return NULL_TREE; 1387 1388 ok = simple_iv (loop, loop, idx, &iv, true); 1389 if (!ok) 1390 return NULL_TREE; 1391 iv.base = expand_simple_operations (iv.base); 1392 if (integer_zerop (iv.step)) 1393 *idx_p = unshare_expr (iv.base); 1394 else 1395 { 1396 type = TREE_TYPE (iv.base); 1397 if (POINTER_TYPE_P (type)) 1398 { 1399 val = fold_build2 (MULT_EXPR, sizetype, iv.step, 1400 size_int (iter)); 1401 val = fold_build2 (POINTER_PLUS_EXPR, type, iv.base, val); 1402 } 1403 else 1404 { 1405 val = fold_build2 (MULT_EXPR, type, iv.step, 1406 build_int_cst_type (type, iter)); 1407 val = fold_build2 (PLUS_EXPR, type, iv.base, val); 1408 } 1409 *idx_p = unshare_expr (val); 1410 } 1411 1412 return ret; 1413 } 1414 1415 /* Get the initialization expression for the INDEX-th temporary variable 1416 of CHAIN. */ 1417 1418 static tree 1419 get_init_expr (chain_p chain, unsigned index) 1420 { 1421 if (chain->type == CT_COMBINATION) 1422 { 1423 tree e1 = get_init_expr (chain->ch1, index); 1424 tree e2 = get_init_expr (chain->ch2, index); 1425 1426 return fold_build2 (chain->op, chain->rslt_type, e1, e2); 1427 } 1428 else 1429 return VEC_index (tree, chain->inits, index); 1430 } 1431 1432 /* Marks all virtual operands of statement STMT for renaming. */ 1433 1434 void 1435 mark_virtual_ops_for_renaming (gimple stmt) 1436 { 1437 tree var; 1438 1439 if (gimple_code (stmt) == GIMPLE_PHI) 1440 { 1441 var = PHI_RESULT (stmt); 1442 if (is_gimple_reg (var)) 1443 return; 1444 1445 if (TREE_CODE (var) == SSA_NAME) 1446 var = SSA_NAME_VAR (var); 1447 mark_sym_for_renaming (var); 1448 return; 1449 } 1450 1451 update_stmt (stmt); 1452 if (gimple_vuse (stmt)) 1453 mark_sym_for_renaming (gimple_vop (cfun)); 1454 } 1455 1456 /* Returns a new temporary variable used for the I-th variable carrying 1457 value of REF. The variable's uid is marked in TMP_VARS. */ 1458 1459 static tree 1460 predcom_tmp_var (tree ref, unsigned i, bitmap tmp_vars) 1461 { 1462 tree type = TREE_TYPE (ref); 1463 tree var = create_tmp_var (type, get_lsm_tmp_name (ref, i)); 1464 1465 /* We never access the components of the temporary variable in predictive 1466 commoning. */ 1467 if (TREE_CODE (type) == COMPLEX_TYPE 1468 || TREE_CODE (type) == VECTOR_TYPE) 1469 DECL_GIMPLE_REG_P (var) = 1; 1470 1471 add_referenced_var (var); 1472 bitmap_set_bit (tmp_vars, DECL_UID (var)); 1473 return var; 1474 } 1475 1476 /* Creates the variables for CHAIN, as well as phi nodes for them and 1477 initialization on entry to LOOP. Uids of the newly created 1478 temporary variables are marked in TMP_VARS. */ 1479 1480 static void 1481 initialize_root_vars (struct loop *loop, chain_p chain, bitmap tmp_vars) 1482 { 1483 unsigned i; 1484 unsigned n = chain->length; 1485 dref root = get_chain_root (chain); 1486 bool reuse_first = !chain->has_max_use_after; 1487 tree ref, init, var, next; 1488 gimple phi; 1489 gimple_seq stmts; 1490 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop); 1491 1492 /* If N == 0, then all the references are within the single iteration. And 1493 since this is an nonempty chain, reuse_first cannot be true. */ 1494 gcc_assert (n > 0 || !reuse_first); 1495 1496 chain->vars = VEC_alloc (tree, heap, n + 1); 1497 1498 if (chain->type == CT_COMBINATION) 1499 ref = gimple_assign_lhs (root->stmt); 1500 else 1501 ref = DR_REF (root->ref); 1502 1503 for (i = 0; i < n + (reuse_first ? 0 : 1); i++) 1504 { 1505 var = predcom_tmp_var (ref, i, tmp_vars); 1506 VEC_quick_push (tree, chain->vars, var); 1507 } 1508 if (reuse_first) 1509 VEC_quick_push (tree, chain->vars, VEC_index (tree, chain->vars, 0)); 1510 1511 for (i = 0; VEC_iterate (tree, chain->vars, i, var); i++) 1512 VEC_replace (tree, chain->vars, i, make_ssa_name (var, NULL)); 1513 1514 for (i = 0; i < n; i++) 1515 { 1516 var = VEC_index (tree, chain->vars, i); 1517 next = VEC_index (tree, chain->vars, i + 1); 1518 init = get_init_expr (chain, i); 1519 1520 init = force_gimple_operand (init, &stmts, true, NULL_TREE); 1521 if (stmts) 1522 gsi_insert_seq_on_edge_immediate (entry, stmts); 1523 1524 phi = create_phi_node (var, loop->header); 1525 SSA_NAME_DEF_STMT (var) = phi; 1526 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION); 1527 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION); 1528 } 1529 } 1530 1531 /* Create the variables and initialization statement for root of chain 1532 CHAIN. Uids of the newly created temporary variables are marked 1533 in TMP_VARS. */ 1534 1535 static void 1536 initialize_root (struct loop *loop, chain_p chain, bitmap tmp_vars) 1537 { 1538 dref root = get_chain_root (chain); 1539 bool in_lhs = (chain->type == CT_STORE_LOAD 1540 || chain->type == CT_COMBINATION); 1541 1542 initialize_root_vars (loop, chain, tmp_vars); 1543 replace_ref_with (root->stmt, 1544 VEC_index (tree, chain->vars, chain->length), 1545 true, in_lhs); 1546 } 1547 1548 /* Initializes a variable for load motion for ROOT and prepares phi nodes and 1549 initialization on entry to LOOP if necessary. The ssa name for the variable 1550 is stored in VARS. If WRITTEN is true, also a phi node to copy its value 1551 around the loop is created. Uid of the newly created temporary variable 1552 is marked in TMP_VARS. INITS is the list containing the (single) 1553 initializer. */ 1554 1555 static void 1556 initialize_root_vars_lm (struct loop *loop, dref root, bool written, 1557 VEC(tree, heap) **vars, VEC(tree, heap) *inits, 1558 bitmap tmp_vars) 1559 { 1560 unsigned i; 1561 tree ref = DR_REF (root->ref), init, var, next; 1562 gimple_seq stmts; 1563 gimple phi; 1564 edge entry = loop_preheader_edge (loop), latch = loop_latch_edge (loop); 1565 1566 /* Find the initializer for the variable, and check that it cannot 1567 trap. */ 1568 init = VEC_index (tree, inits, 0); 1569 1570 *vars = VEC_alloc (tree, heap, written ? 2 : 1); 1571 var = predcom_tmp_var (ref, 0, tmp_vars); 1572 VEC_quick_push (tree, *vars, var); 1573 if (written) 1574 VEC_quick_push (tree, *vars, VEC_index (tree, *vars, 0)); 1575 1576 for (i = 0; VEC_iterate (tree, *vars, i, var); i++) 1577 VEC_replace (tree, *vars, i, make_ssa_name (var, NULL)); 1578 1579 var = VEC_index (tree, *vars, 0); 1580 1581 init = force_gimple_operand (init, &stmts, written, NULL_TREE); 1582 if (stmts) 1583 gsi_insert_seq_on_edge_immediate (entry, stmts); 1584 1585 if (written) 1586 { 1587 next = VEC_index (tree, *vars, 1); 1588 phi = create_phi_node (var, loop->header); 1589 SSA_NAME_DEF_STMT (var) = phi; 1590 add_phi_arg (phi, init, entry, UNKNOWN_LOCATION); 1591 add_phi_arg (phi, next, latch, UNKNOWN_LOCATION); 1592 } 1593 else 1594 { 1595 gimple init_stmt = gimple_build_assign (var, init); 1596 mark_virtual_ops_for_renaming (init_stmt); 1597 gsi_insert_on_edge_immediate (entry, init_stmt); 1598 } 1599 } 1600 1601 1602 /* Execute load motion for references in chain CHAIN. Uids of the newly 1603 created temporary variables are marked in TMP_VARS. */ 1604 1605 static void 1606 execute_load_motion (struct loop *loop, chain_p chain, bitmap tmp_vars) 1607 { 1608 VEC (tree, heap) *vars; 1609 dref a; 1610 unsigned n_writes = 0, ridx, i; 1611 tree var; 1612 1613 gcc_assert (chain->type == CT_INVARIANT); 1614 gcc_assert (!chain->combined); 1615 for (i = 0; VEC_iterate (dref, chain->refs, i, a); i++) 1616 if (!DR_IS_READ (a->ref)) 1617 n_writes++; 1618 1619 /* If there are no reads in the loop, there is nothing to do. */ 1620 if (n_writes == VEC_length (dref, chain->refs)) 1621 return; 1622 1623 initialize_root_vars_lm (loop, get_chain_root (chain), n_writes > 0, 1624 &vars, chain->inits, tmp_vars); 1625 1626 ridx = 0; 1627 for (i = 0; VEC_iterate (dref, chain->refs, i, a); i++) 1628 { 1629 bool is_read = DR_IS_READ (a->ref); 1630 mark_virtual_ops_for_renaming (a->stmt); 1631 1632 if (!DR_IS_READ (a->ref)) 1633 { 1634 n_writes--; 1635 if (n_writes) 1636 { 1637 var = VEC_index (tree, vars, 0); 1638 var = make_ssa_name (SSA_NAME_VAR (var), NULL); 1639 VEC_replace (tree, vars, 0, var); 1640 } 1641 else 1642 ridx = 1; 1643 } 1644 1645 replace_ref_with (a->stmt, VEC_index (tree, vars, ridx), 1646 !is_read, !is_read); 1647 } 1648 1649 VEC_free (tree, heap, vars); 1650 } 1651 1652 /* Returns the single statement in that NAME is used, excepting 1653 the looparound phi nodes contained in one of the chains. If there is no 1654 such statement, or more statements, NULL is returned. */ 1655 1656 static gimple 1657 single_nonlooparound_use (tree name) 1658 { 1659 use_operand_p use; 1660 imm_use_iterator it; 1661 gimple stmt, ret = NULL; 1662 1663 FOR_EACH_IMM_USE_FAST (use, it, name) 1664 { 1665 stmt = USE_STMT (use); 1666 1667 if (gimple_code (stmt) == GIMPLE_PHI) 1668 { 1669 /* Ignore uses in looparound phi nodes. Uses in other phi nodes 1670 could not be processed anyway, so just fail for them. */ 1671 if (bitmap_bit_p (looparound_phis, 1672 SSA_NAME_VERSION (PHI_RESULT (stmt)))) 1673 continue; 1674 1675 return NULL; 1676 } 1677 else if (ret != NULL) 1678 return NULL; 1679 else 1680 ret = stmt; 1681 } 1682 1683 return ret; 1684 } 1685 1686 /* Remove statement STMT, as well as the chain of assignments in that it is 1687 used. */ 1688 1689 static void 1690 remove_stmt (gimple stmt) 1691 { 1692 tree name; 1693 gimple next; 1694 gimple_stmt_iterator psi; 1695 1696 if (gimple_code (stmt) == GIMPLE_PHI) 1697 { 1698 name = PHI_RESULT (stmt); 1699 next = single_nonlooparound_use (name); 1700 psi = gsi_for_stmt (stmt); 1701 remove_phi_node (&psi, true); 1702 1703 if (!next 1704 || !gimple_assign_ssa_name_copy_p (next) 1705 || gimple_assign_rhs1 (next) != name) 1706 return; 1707 1708 stmt = next; 1709 } 1710 1711 while (1) 1712 { 1713 gimple_stmt_iterator bsi; 1714 1715 bsi = gsi_for_stmt (stmt); 1716 1717 name = gimple_assign_lhs (stmt); 1718 gcc_assert (TREE_CODE (name) == SSA_NAME); 1719 1720 next = single_nonlooparound_use (name); 1721 1722 mark_virtual_ops_for_renaming (stmt); 1723 gsi_remove (&bsi, true); 1724 release_defs (stmt); 1725 1726 if (!next 1727 || !gimple_assign_ssa_name_copy_p (next) 1728 || gimple_assign_rhs1 (next) != name) 1729 return; 1730 1731 stmt = next; 1732 } 1733 } 1734 1735 /* Perform the predictive commoning optimization for a chain CHAIN. 1736 Uids of the newly created temporary variables are marked in TMP_VARS.*/ 1737 1738 static void 1739 execute_pred_commoning_chain (struct loop *loop, chain_p chain, 1740 bitmap tmp_vars) 1741 { 1742 unsigned i; 1743 dref a, root; 1744 tree var; 1745 1746 if (chain->combined) 1747 { 1748 /* For combined chains, just remove the statements that are used to 1749 compute the values of the expression (except for the root one). */ 1750 for (i = 1; VEC_iterate (dref, chain->refs, i, a); i++) 1751 remove_stmt (a->stmt); 1752 } 1753 else 1754 { 1755 /* For non-combined chains, set up the variables that hold its value, 1756 and replace the uses of the original references by these 1757 variables. */ 1758 root = get_chain_root (chain); 1759 mark_virtual_ops_for_renaming (root->stmt); 1760 1761 initialize_root (loop, chain, tmp_vars); 1762 for (i = 1; VEC_iterate (dref, chain->refs, i, a); i++) 1763 { 1764 mark_virtual_ops_for_renaming (a->stmt); 1765 var = VEC_index (tree, chain->vars, chain->length - a->distance); 1766 replace_ref_with (a->stmt, var, false, false); 1767 } 1768 } 1769 } 1770 1771 /* Determines the unroll factor necessary to remove as many temporary variable 1772 copies as possible. CHAINS is the list of chains that will be 1773 optimized. */ 1774 1775 static unsigned 1776 determine_unroll_factor (VEC (chain_p, heap) *chains) 1777 { 1778 chain_p chain; 1779 unsigned factor = 1, af, nfactor, i; 1780 unsigned max = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES); 1781 1782 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 1783 { 1784 if (chain->type == CT_INVARIANT || chain->combined) 1785 continue; 1786 1787 /* The best unroll factor for this chain is equal to the number of 1788 temporary variables that we create for it. */ 1789 af = chain->length; 1790 if (chain->has_max_use_after) 1791 af++; 1792 1793 nfactor = factor * af / gcd (factor, af); 1794 if (nfactor <= max) 1795 factor = nfactor; 1796 } 1797 1798 return factor; 1799 } 1800 1801 /* Perform the predictive commoning optimization for CHAINS. 1802 Uids of the newly created temporary variables are marked in TMP_VARS. */ 1803 1804 static void 1805 execute_pred_commoning (struct loop *loop, VEC (chain_p, heap) *chains, 1806 bitmap tmp_vars) 1807 { 1808 chain_p chain; 1809 unsigned i; 1810 1811 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 1812 { 1813 if (chain->type == CT_INVARIANT) 1814 execute_load_motion (loop, chain, tmp_vars); 1815 else 1816 execute_pred_commoning_chain (loop, chain, tmp_vars); 1817 } 1818 1819 update_ssa (TODO_update_ssa_only_virtuals); 1820 } 1821 1822 /* For each reference in CHAINS, if its defining statement is 1823 phi node, record the ssa name that is defined by it. */ 1824 1825 static void 1826 replace_phis_by_defined_names (VEC (chain_p, heap) *chains) 1827 { 1828 chain_p chain; 1829 dref a; 1830 unsigned i, j; 1831 1832 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 1833 for (j = 0; VEC_iterate (dref, chain->refs, j, a); j++) 1834 { 1835 if (gimple_code (a->stmt) == GIMPLE_PHI) 1836 { 1837 a->name_defined_by_phi = PHI_RESULT (a->stmt); 1838 a->stmt = NULL; 1839 } 1840 } 1841 } 1842 1843 /* For each reference in CHAINS, if name_defined_by_phi is not 1844 NULL, use it to set the stmt field. */ 1845 1846 static void 1847 replace_names_by_phis (VEC (chain_p, heap) *chains) 1848 { 1849 chain_p chain; 1850 dref a; 1851 unsigned i, j; 1852 1853 for (i = 0; VEC_iterate (chain_p, chains, i, chain); i++) 1854 for (j = 0; VEC_iterate (dref, chain->refs, j, a); j++) 1855 if (a->stmt == NULL) 1856 { 1857 a->stmt = SSA_NAME_DEF_STMT (a->name_defined_by_phi); 1858 gcc_assert (gimple_code (a->stmt) == GIMPLE_PHI); 1859 a->name_defined_by_phi = NULL_TREE; 1860 } 1861 } 1862 1863 /* Wrapper over execute_pred_commoning, to pass it as a callback 1864 to tree_transform_and_unroll_loop. */ 1865 1866 struct epcc_data 1867 { 1868 VEC (chain_p, heap) *chains; 1869 bitmap tmp_vars; 1870 }; 1871 1872 static void 1873 execute_pred_commoning_cbck (struct loop *loop, void *data) 1874 { 1875 struct epcc_data *const dta = (struct epcc_data *) data; 1876 1877 /* Restore phi nodes that were replaced by ssa names before 1878 tree_transform_and_unroll_loop (see detailed description in 1879 tree_predictive_commoning_loop). */ 1880 replace_names_by_phis (dta->chains); 1881 execute_pred_commoning (loop, dta->chains, dta->tmp_vars); 1882 } 1883 1884 /* Base NAME and all the names in the chain of phi nodes that use it 1885 on variable VAR. The phi nodes are recognized by being in the copies of 1886 the header of the LOOP. */ 1887 1888 static void 1889 base_names_in_chain_on (struct loop *loop, tree name, tree var) 1890 { 1891 gimple stmt, phi; 1892 imm_use_iterator iter; 1893 1894 SSA_NAME_VAR (name) = var; 1895 1896 while (1) 1897 { 1898 phi = NULL; 1899 FOR_EACH_IMM_USE_STMT (stmt, iter, name) 1900 { 1901 if (gimple_code (stmt) == GIMPLE_PHI 1902 && flow_bb_inside_loop_p (loop, gimple_bb (stmt))) 1903 { 1904 phi = stmt; 1905 BREAK_FROM_IMM_USE_STMT (iter); 1906 } 1907 } 1908 if (!phi) 1909 return; 1910 1911 name = PHI_RESULT (phi); 1912 SSA_NAME_VAR (name) = var; 1913 } 1914 } 1915 1916 /* Given an unrolled LOOP after predictive commoning, remove the 1917 register copies arising from phi nodes by changing the base 1918 variables of SSA names. TMP_VARS is the set of the temporary variables 1919 for those we want to perform this. */ 1920 1921 static void 1922 eliminate_temp_copies (struct loop *loop, bitmap tmp_vars) 1923 { 1924 edge e; 1925 gimple phi, stmt; 1926 tree name, use, var; 1927 gimple_stmt_iterator psi; 1928 1929 e = loop_latch_edge (loop); 1930 for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi)) 1931 { 1932 phi = gsi_stmt (psi); 1933 name = PHI_RESULT (phi); 1934 var = SSA_NAME_VAR (name); 1935 if (!bitmap_bit_p (tmp_vars, DECL_UID (var))) 1936 continue; 1937 use = PHI_ARG_DEF_FROM_EDGE (phi, e); 1938 gcc_assert (TREE_CODE (use) == SSA_NAME); 1939 1940 /* Base all the ssa names in the ud and du chain of NAME on VAR. */ 1941 stmt = SSA_NAME_DEF_STMT (use); 1942 while (gimple_code (stmt) == GIMPLE_PHI 1943 /* In case we could not unroll the loop enough to eliminate 1944 all copies, we may reach the loop header before the defining 1945 statement (in that case, some register copies will be present 1946 in loop latch in the final code, corresponding to the newly 1947 created looparound phi nodes). */ 1948 && gimple_bb (stmt) != loop->header) 1949 { 1950 gcc_assert (single_pred_p (gimple_bb (stmt))); 1951 use = PHI_ARG_DEF (stmt, 0); 1952 stmt = SSA_NAME_DEF_STMT (use); 1953 } 1954 1955 base_names_in_chain_on (loop, use, var); 1956 } 1957 } 1958 1959 /* Returns true if CHAIN is suitable to be combined. */ 1960 1961 static bool 1962 chain_can_be_combined_p (chain_p chain) 1963 { 1964 return (!chain->combined 1965 && (chain->type == CT_LOAD || chain->type == CT_COMBINATION)); 1966 } 1967 1968 /* Returns the modify statement that uses NAME. Skips over assignment 1969 statements, NAME is replaced with the actual name used in the returned 1970 statement. */ 1971 1972 static gimple 1973 find_use_stmt (tree *name) 1974 { 1975 gimple stmt; 1976 tree rhs, lhs; 1977 1978 /* Skip over assignments. */ 1979 while (1) 1980 { 1981 stmt = single_nonlooparound_use (*name); 1982 if (!stmt) 1983 return NULL; 1984 1985 if (gimple_code (stmt) != GIMPLE_ASSIGN) 1986 return NULL; 1987 1988 lhs = gimple_assign_lhs (stmt); 1989 if (TREE_CODE (lhs) != SSA_NAME) 1990 return NULL; 1991 1992 if (gimple_assign_copy_p (stmt)) 1993 { 1994 rhs = gimple_assign_rhs1 (stmt); 1995 if (rhs != *name) 1996 return NULL; 1997 1998 *name = lhs; 1999 } 2000 else if (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)) 2001 == GIMPLE_BINARY_RHS) 2002 return stmt; 2003 else 2004 return NULL; 2005 } 2006 } 2007 2008 /* Returns true if we may perform reassociation for operation CODE in TYPE. */ 2009 2010 static bool 2011 may_reassociate_p (tree type, enum tree_code code) 2012 { 2013 if (FLOAT_TYPE_P (type) 2014 && !flag_unsafe_math_optimizations) 2015 return false; 2016 2017 return (commutative_tree_code (code) 2018 && associative_tree_code (code)); 2019 } 2020 2021 /* If the operation used in STMT is associative and commutative, go through the 2022 tree of the same operations and returns its root. Distance to the root 2023 is stored in DISTANCE. */ 2024 2025 static gimple 2026 find_associative_operation_root (gimple stmt, unsigned *distance) 2027 { 2028 tree lhs; 2029 gimple next; 2030 enum tree_code code = gimple_assign_rhs_code (stmt); 2031 tree type = TREE_TYPE (gimple_assign_lhs (stmt)); 2032 unsigned dist = 0; 2033 2034 if (!may_reassociate_p (type, code)) 2035 return NULL; 2036 2037 while (1) 2038 { 2039 lhs = gimple_assign_lhs (stmt); 2040 gcc_assert (TREE_CODE (lhs) == SSA_NAME); 2041 2042 next = find_use_stmt (&lhs); 2043 if (!next 2044 || gimple_assign_rhs_code (next) != code) 2045 break; 2046 2047 stmt = next; 2048 dist++; 2049 } 2050 2051 if (distance) 2052 *distance = dist; 2053 return stmt; 2054 } 2055 2056 /* Returns the common statement in that NAME1 and NAME2 have a use. If there 2057 is no such statement, returns NULL_TREE. In case the operation used on 2058 NAME1 and NAME2 is associative and commutative, returns the root of the 2059 tree formed by this operation instead of the statement that uses NAME1 or 2060 NAME2. */ 2061 2062 static gimple 2063 find_common_use_stmt (tree *name1, tree *name2) 2064 { 2065 gimple stmt1, stmt2; 2066 2067 stmt1 = find_use_stmt (name1); 2068 if (!stmt1) 2069 return NULL; 2070 2071 stmt2 = find_use_stmt (name2); 2072 if (!stmt2) 2073 return NULL; 2074 2075 if (stmt1 == stmt2) 2076 return stmt1; 2077 2078 stmt1 = find_associative_operation_root (stmt1, NULL); 2079 if (!stmt1) 2080 return NULL; 2081 stmt2 = find_associative_operation_root (stmt2, NULL); 2082 if (!stmt2) 2083 return NULL; 2084 2085 return (stmt1 == stmt2 ? stmt1 : NULL); 2086 } 2087 2088 /* Checks whether R1 and R2 are combined together using CODE, with the result 2089 in RSLT_TYPE, in order R1 CODE R2 if SWAP is false and in order R2 CODE R1 2090 if it is true. If CODE is ERROR_MARK, set these values instead. */ 2091 2092 static bool 2093 combinable_refs_p (dref r1, dref r2, 2094 enum tree_code *code, bool *swap, tree *rslt_type) 2095 { 2096 enum tree_code acode; 2097 bool aswap; 2098 tree atype; 2099 tree name1, name2; 2100 gimple stmt; 2101 2102 name1 = name_for_ref (r1); 2103 name2 = name_for_ref (r2); 2104 gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE); 2105 2106 stmt = find_common_use_stmt (&name1, &name2); 2107 2108 if (!stmt) 2109 return false; 2110 2111 acode = gimple_assign_rhs_code (stmt); 2112 aswap = (!commutative_tree_code (acode) 2113 && gimple_assign_rhs1 (stmt) != name1); 2114 atype = TREE_TYPE (gimple_assign_lhs (stmt)); 2115 2116 if (*code == ERROR_MARK) 2117 { 2118 *code = acode; 2119 *swap = aswap; 2120 *rslt_type = atype; 2121 return true; 2122 } 2123 2124 return (*code == acode 2125 && *swap == aswap 2126 && *rslt_type == atype); 2127 } 2128 2129 /* Remove OP from the operation on rhs of STMT, and replace STMT with 2130 an assignment of the remaining operand. */ 2131 2132 static void 2133 remove_name_from_operation (gimple stmt, tree op) 2134 { 2135 tree other_op; 2136 gimple_stmt_iterator si; 2137 2138 gcc_assert (is_gimple_assign (stmt)); 2139 2140 if (gimple_assign_rhs1 (stmt) == op) 2141 other_op = gimple_assign_rhs2 (stmt); 2142 else 2143 other_op = gimple_assign_rhs1 (stmt); 2144 2145 si = gsi_for_stmt (stmt); 2146 gimple_assign_set_rhs_from_tree (&si, other_op); 2147 2148 /* We should not have reallocated STMT. */ 2149 gcc_assert (gsi_stmt (si) == stmt); 2150 2151 update_stmt (stmt); 2152 } 2153 2154 /* Reassociates the expression in that NAME1 and NAME2 are used so that they 2155 are combined in a single statement, and returns this statement. */ 2156 2157 static gimple 2158 reassociate_to_the_same_stmt (tree name1, tree name2) 2159 { 2160 gimple stmt1, stmt2, root1, root2, s1, s2; 2161 gimple new_stmt, tmp_stmt; 2162 tree new_name, tmp_name, var, r1, r2; 2163 unsigned dist1, dist2; 2164 enum tree_code code; 2165 tree type = TREE_TYPE (name1); 2166 gimple_stmt_iterator bsi; 2167 2168 stmt1 = find_use_stmt (&name1); 2169 stmt2 = find_use_stmt (&name2); 2170 root1 = find_associative_operation_root (stmt1, &dist1); 2171 root2 = find_associative_operation_root (stmt2, &dist2); 2172 code = gimple_assign_rhs_code (stmt1); 2173 2174 gcc_assert (root1 && root2 && root1 == root2 2175 && code == gimple_assign_rhs_code (stmt2)); 2176 2177 /* Find the root of the nearest expression in that both NAME1 and NAME2 2178 are used. */ 2179 r1 = name1; 2180 s1 = stmt1; 2181 r2 = name2; 2182 s2 = stmt2; 2183 2184 while (dist1 > dist2) 2185 { 2186 s1 = find_use_stmt (&r1); 2187 r1 = gimple_assign_lhs (s1); 2188 dist1--; 2189 } 2190 while (dist2 > dist1) 2191 { 2192 s2 = find_use_stmt (&r2); 2193 r2 = gimple_assign_lhs (s2); 2194 dist2--; 2195 } 2196 2197 while (s1 != s2) 2198 { 2199 s1 = find_use_stmt (&r1); 2200 r1 = gimple_assign_lhs (s1); 2201 s2 = find_use_stmt (&r2); 2202 r2 = gimple_assign_lhs (s2); 2203 } 2204 2205 /* Remove NAME1 and NAME2 from the statements in that they are used 2206 currently. */ 2207 remove_name_from_operation (stmt1, name1); 2208 remove_name_from_operation (stmt2, name2); 2209 2210 /* Insert the new statement combining NAME1 and NAME2 before S1, and 2211 combine it with the rhs of S1. */ 2212 var = create_tmp_var (type, "predreastmp"); 2213 if (TREE_CODE (type) == COMPLEX_TYPE 2214 || TREE_CODE (type) == VECTOR_TYPE) 2215 DECL_GIMPLE_REG_P (var) = 1; 2216 add_referenced_var (var); 2217 new_name = make_ssa_name (var, NULL); 2218 new_stmt = gimple_build_assign_with_ops (code, new_name, name1, name2); 2219 2220 var = create_tmp_var (type, "predreastmp"); 2221 if (TREE_CODE (type) == COMPLEX_TYPE 2222 || TREE_CODE (type) == VECTOR_TYPE) 2223 DECL_GIMPLE_REG_P (var) = 1; 2224 add_referenced_var (var); 2225 tmp_name = make_ssa_name (var, NULL); 2226 2227 /* Rhs of S1 may now be either a binary expression with operation 2228 CODE, or gimple_val (in case that stmt1 == s1 or stmt2 == s1, 2229 so that name1 or name2 was removed from it). */ 2230 tmp_stmt = gimple_build_assign_with_ops (gimple_assign_rhs_code (s1), 2231 tmp_name, 2232 gimple_assign_rhs1 (s1), 2233 gimple_assign_rhs2 (s1)); 2234 2235 bsi = gsi_for_stmt (s1); 2236 gimple_assign_set_rhs_with_ops (&bsi, code, new_name, tmp_name); 2237 s1 = gsi_stmt (bsi); 2238 update_stmt (s1); 2239 2240 gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT); 2241 gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT); 2242 2243 return new_stmt; 2244 } 2245 2246 /* Returns the statement that combines references R1 and R2. In case R1 2247 and R2 are not used in the same statement, but they are used with an 2248 associative and commutative operation in the same expression, reassociate 2249 the expression so that they are used in the same statement. */ 2250 2251 static gimple 2252 stmt_combining_refs (dref r1, dref r2) 2253 { 2254 gimple stmt1, stmt2; 2255 tree name1 = name_for_ref (r1); 2256 tree name2 = name_for_ref (r2); 2257 2258 stmt1 = find_use_stmt (&name1); 2259 stmt2 = find_use_stmt (&name2); 2260 if (stmt1 == stmt2) 2261 return stmt1; 2262 2263 return reassociate_to_the_same_stmt (name1, name2); 2264 } 2265 2266 /* Tries to combine chains CH1 and CH2 together. If this succeeds, the 2267 description of the new chain is returned, otherwise we return NULL. */ 2268 2269 static chain_p 2270 combine_chains (chain_p ch1, chain_p ch2) 2271 { 2272 dref r1, r2, nw; 2273 enum tree_code op = ERROR_MARK; 2274 bool swap = false; 2275 chain_p new_chain; 2276 unsigned i; 2277 gimple root_stmt; 2278 tree rslt_type = NULL_TREE; 2279 2280 if (ch1 == ch2) 2281 return NULL; 2282 if (ch1->length != ch2->length) 2283 return NULL; 2284 2285 if (VEC_length (dref, ch1->refs) != VEC_length (dref, ch2->refs)) 2286 return NULL; 2287 2288 for (i = 0; (VEC_iterate (dref, ch1->refs, i, r1) 2289 && VEC_iterate (dref, ch2->refs, i, r2)); i++) 2290 { 2291 if (r1->distance != r2->distance) 2292 return NULL; 2293 2294 if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type)) 2295 return NULL; 2296 } 2297 2298 if (swap) 2299 { 2300 chain_p tmp = ch1; 2301 ch1 = ch2; 2302 ch2 = tmp; 2303 } 2304 2305 new_chain = XCNEW (struct chain); 2306 new_chain->type = CT_COMBINATION; 2307 new_chain->op = op; 2308 new_chain->ch1 = ch1; 2309 new_chain->ch2 = ch2; 2310 new_chain->rslt_type = rslt_type; 2311 new_chain->length = ch1->length; 2312 2313 for (i = 0; (VEC_iterate (dref, ch1->refs, i, r1) 2314 && VEC_iterate (dref, ch2->refs, i, r2)); i++) 2315 { 2316 nw = XCNEW (struct dref_d); 2317 nw->stmt = stmt_combining_refs (r1, r2); 2318 nw->distance = r1->distance; 2319 2320 VEC_safe_push (dref, heap, new_chain->refs, nw); 2321 } 2322 2323 new_chain->has_max_use_after = false; 2324 root_stmt = get_chain_root (new_chain)->stmt; 2325 for (i = 1; VEC_iterate (dref, new_chain->refs, i, nw); i++) 2326 { 2327 if (nw->distance == new_chain->length 2328 && !stmt_dominates_stmt_p (nw->stmt, root_stmt)) 2329 { 2330 new_chain->has_max_use_after = true; 2331 break; 2332 } 2333 } 2334 2335 ch1->combined = true; 2336 ch2->combined = true; 2337 return new_chain; 2338 } 2339 2340 /* Try to combine the CHAINS. */ 2341 2342 static void 2343 try_combine_chains (VEC (chain_p, heap) **chains) 2344 { 2345 unsigned i, j; 2346 chain_p ch1, ch2, cch; 2347 VEC (chain_p, heap) *worklist = NULL; 2348 2349 for (i = 0; VEC_iterate (chain_p, *chains, i, ch1); i++) 2350 if (chain_can_be_combined_p (ch1)) 2351 VEC_safe_push (chain_p, heap, worklist, ch1); 2352 2353 while (!VEC_empty (chain_p, worklist)) 2354 { 2355 ch1 = VEC_pop (chain_p, worklist); 2356 if (!chain_can_be_combined_p (ch1)) 2357 continue; 2358 2359 for (j = 0; VEC_iterate (chain_p, *chains, j, ch2); j++) 2360 { 2361 if (!chain_can_be_combined_p (ch2)) 2362 continue; 2363 2364 cch = combine_chains (ch1, ch2); 2365 if (cch) 2366 { 2367 VEC_safe_push (chain_p, heap, worklist, cch); 2368 VEC_safe_push (chain_p, heap, *chains, cch); 2369 break; 2370 } 2371 } 2372 } 2373 } 2374 2375 /* Prepare initializers for CHAIN in LOOP. Returns false if this is 2376 impossible because one of these initializers may trap, true otherwise. */ 2377 2378 static bool 2379 prepare_initializers_chain (struct loop *loop, chain_p chain) 2380 { 2381 unsigned i, n = (chain->type == CT_INVARIANT) ? 1 : chain->length; 2382 struct data_reference *dr = get_chain_root (chain)->ref; 2383 tree init; 2384 gimple_seq stmts; 2385 dref laref; 2386 edge entry = loop_preheader_edge (loop); 2387 2388 /* Find the initializers for the variables, and check that they cannot 2389 trap. */ 2390 chain->inits = VEC_alloc (tree, heap, n); 2391 for (i = 0; i < n; i++) 2392 VEC_quick_push (tree, chain->inits, NULL_TREE); 2393 2394 /* If we have replaced some looparound phi nodes, use their initializers 2395 instead of creating our own. */ 2396 for (i = 0; VEC_iterate (dref, chain->refs, i, laref); i++) 2397 { 2398 if (gimple_code (laref->stmt) != GIMPLE_PHI) 2399 continue; 2400 2401 gcc_assert (laref->distance > 0); 2402 VEC_replace (tree, chain->inits, n - laref->distance, 2403 PHI_ARG_DEF_FROM_EDGE (laref->stmt, entry)); 2404 } 2405 2406 for (i = 0; i < n; i++) 2407 { 2408 if (VEC_index (tree, chain->inits, i) != NULL_TREE) 2409 continue; 2410 2411 init = ref_at_iteration (loop, DR_REF (dr), (int) i - n); 2412 if (!init) 2413 return false; 2414 2415 if (!chain->all_always_accessed && tree_could_trap_p (init)) 2416 return false; 2417 2418 init = force_gimple_operand (init, &stmts, false, NULL_TREE); 2419 if (stmts) 2420 gsi_insert_seq_on_edge_immediate (entry, stmts); 2421 2422 VEC_replace (tree, chain->inits, i, init); 2423 } 2424 2425 return true; 2426 } 2427 2428 /* Prepare initializers for CHAINS in LOOP, and free chains that cannot 2429 be used because the initializers might trap. */ 2430 2431 static void 2432 prepare_initializers (struct loop *loop, VEC (chain_p, heap) *chains) 2433 { 2434 chain_p chain; 2435 unsigned i; 2436 2437 for (i = 0; i < VEC_length (chain_p, chains); ) 2438 { 2439 chain = VEC_index (chain_p, chains, i); 2440 if (prepare_initializers_chain (loop, chain)) 2441 i++; 2442 else 2443 { 2444 release_chain (chain); 2445 VEC_unordered_remove (chain_p, chains, i); 2446 } 2447 } 2448 } 2449 2450 /* Performs predictive commoning for LOOP. Returns true if LOOP was 2451 unrolled. */ 2452 2453 static bool 2454 tree_predictive_commoning_loop (struct loop *loop) 2455 { 2456 VEC (data_reference_p, heap) *datarefs; 2457 VEC (ddr_p, heap) *dependences; 2458 struct component *components; 2459 VEC (chain_p, heap) *chains = NULL; 2460 unsigned unroll_factor; 2461 struct tree_niter_desc desc; 2462 bool unroll = false; 2463 edge exit; 2464 bitmap tmp_vars; 2465 2466 if (dump_file && (dump_flags & TDF_DETAILS)) 2467 fprintf (dump_file, "Processing loop %d\n", loop->num); 2468 2469 /* Find the data references and split them into components according to their 2470 dependence relations. */ 2471 datarefs = VEC_alloc (data_reference_p, heap, 10); 2472 dependences = VEC_alloc (ddr_p, heap, 10); 2473 compute_data_dependences_for_loop (loop, true, &datarefs, &dependences); 2474 if (dump_file && (dump_flags & TDF_DETAILS)) 2475 dump_data_dependence_relations (dump_file, dependences); 2476 2477 components = split_data_refs_to_components (loop, datarefs, dependences); 2478 free_dependence_relations (dependences); 2479 if (!components) 2480 { 2481 free_data_refs (datarefs); 2482 return false; 2483 } 2484 2485 if (dump_file && (dump_flags & TDF_DETAILS)) 2486 { 2487 fprintf (dump_file, "Initial state:\n\n"); 2488 dump_components (dump_file, components); 2489 } 2490 2491 /* Find the suitable components and split them into chains. */ 2492 components = filter_suitable_components (loop, components); 2493 2494 tmp_vars = BITMAP_ALLOC (NULL); 2495 looparound_phis = BITMAP_ALLOC (NULL); 2496 determine_roots (loop, components, &chains); 2497 release_components (components); 2498 2499 if (!chains) 2500 { 2501 if (dump_file && (dump_flags & TDF_DETAILS)) 2502 fprintf (dump_file, 2503 "Predictive commoning failed: no suitable chains\n"); 2504 goto end; 2505 } 2506 prepare_initializers (loop, chains); 2507 2508 /* Try to combine the chains that are always worked with together. */ 2509 try_combine_chains (&chains); 2510 2511 if (dump_file && (dump_flags & TDF_DETAILS)) 2512 { 2513 fprintf (dump_file, "Before commoning:\n\n"); 2514 dump_chains (dump_file, chains); 2515 } 2516 2517 /* Determine the unroll factor, and if the loop should be unrolled, ensure 2518 that its number of iterations is divisible by the factor. */ 2519 unroll_factor = determine_unroll_factor (chains); 2520 scev_reset (); 2521 unroll = (unroll_factor > 1 2522 && can_unroll_loop_p (loop, unroll_factor, &desc)); 2523 exit = single_dom_exit (loop); 2524 2525 /* Execute the predictive commoning transformations, and possibly unroll the 2526 loop. */ 2527 if (unroll) 2528 { 2529 struct epcc_data dta; 2530 2531 if (dump_file && (dump_flags & TDF_DETAILS)) 2532 fprintf (dump_file, "Unrolling %u times.\n", unroll_factor); 2533 2534 dta.chains = chains; 2535 dta.tmp_vars = tmp_vars; 2536 2537 update_ssa (TODO_update_ssa_only_virtuals); 2538 2539 /* Cfg manipulations performed in tree_transform_and_unroll_loop before 2540 execute_pred_commoning_cbck is called may cause phi nodes to be 2541 reallocated, which is a problem since CHAINS may point to these 2542 statements. To fix this, we store the ssa names defined by the 2543 phi nodes here instead of the phi nodes themselves, and restore 2544 the phi nodes in execute_pred_commoning_cbck. A bit hacky. */ 2545 replace_phis_by_defined_names (chains); 2546 2547 tree_transform_and_unroll_loop (loop, unroll_factor, exit, &desc, 2548 execute_pred_commoning_cbck, &dta); 2549 eliminate_temp_copies (loop, tmp_vars); 2550 } 2551 else 2552 { 2553 if (dump_file && (dump_flags & TDF_DETAILS)) 2554 fprintf (dump_file, 2555 "Executing predictive commoning without unrolling.\n"); 2556 execute_pred_commoning (loop, chains, tmp_vars); 2557 } 2558 2559 end: ; 2560 release_chains (chains); 2561 free_data_refs (datarefs); 2562 BITMAP_FREE (tmp_vars); 2563 BITMAP_FREE (looparound_phis); 2564 2565 free_affine_expand_cache (&name_expansions); 2566 2567 return unroll; 2568 } 2569 2570 /* Runs predictive commoning. */ 2571 2572 unsigned 2573 tree_predictive_commoning (void) 2574 { 2575 bool unrolled = false; 2576 struct loop *loop; 2577 loop_iterator li; 2578 unsigned ret = 0; 2579 2580 initialize_original_copy_tables (); 2581 FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) 2582 if (optimize_loop_for_speed_p (loop)) 2583 { 2584 unrolled |= tree_predictive_commoning_loop (loop); 2585 } 2586 2587 if (unrolled) 2588 { 2589 scev_reset (); 2590 ret = TODO_cleanup_cfg; 2591 } 2592 free_original_copy_tables (); 2593 2594 return ret; 2595 } 2596