/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2018 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.  When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold or its count is lower than Exec Threshold,
   then the successor will be the seed in one of the next rounds.
   Each round has these parameters lower than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (which have not been "sent" to the next round) will be
   other seeds for this round and the secondary traces will start from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
   When there are still some unconnected traces it checks whether there exists
   a basic block BB such that BB is a successor of the last block of a trace
   and BB is a predecessor of the first block of another trace.  In this case,
   BB is duplicated, added at the end of the first trace and the traces are
   connected through it.
   The remaining traces are simply connected, so there will be a jump to the
   beginning of each of them.

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   reduces most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement the change for code size optimization, the block's index is
   selected as the key and all traces are found in one round.

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/
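
/* An illustrative example (not part of the original description): in a CFG
   where A branches to B with probability 90% and to C with probability 10%,
   and B loops back to A, the first round typically grows the trace A -> B,
   treats B -> A as a loop back edge (rotating or duplicating as described
   above), while C falls under the thresholds and becomes the seed of a
   secondary trace, in this or a later round depending on its count.  */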

#include "config.h"
#define INCLUDE_ALGORITHM /* stable_sort */
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "params.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.  */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};
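
/* For instance (an illustrative reading of the two tables above): in the
   first round an edge is only followed into a trace when its probability is
   at least 40% (400/1000 of REG_BR_PROB_BASE) and its destination's count is
   at least 50% of the hottest entry block; the final rounds accept
   everything because both thresholds drop to zero.  */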

/* If edge count is lower than DUPLICATION_THRESHOLD per mille of entry
   block the edge destination is not duplicated while connecting traces.  */
#define DUPLICATION_THRESHOLD 100

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of arrays is greater than needed,
   the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)

/* Free the memory and set the pointer to NULL.  */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
                                 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
                           profile_count, profile_probability, profile_count,
                           const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* Mark BB as visited in trace number TRACE.  */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check to see if bb should be pushed into the next round of trace
   collections or not.  Reasons for pushing the block forward are:
   1) the block is cold, we are doing partitioning, and there will be
      another round (cold partition blocks are not supposed to be
      collected into traces until the very last round); or
   2) there will be another round, and the basic block is not "hot enough"
      for the current round of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
                      profile_count count_th)
{
  bool there_exists_another_round;
  bool block_not_hot_enough;

  there_exists_another_round = round < number_of_rounds - 1;

  block_not_hot_enough = (bb->count < count_th
                          || probably_never_executed_bb_p (cfun, bb));

  if (there_exists_another_round
      && block_not_hot_enough)
    return true;
  else
    return false;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces in *N_TRACES and the
   descriptions of the traces in TRACES.  */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
        max_entry_count = e->dest->count;
    }

  /* Find the traces.  */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
        fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
                           count_threshold, traces, n_traces, i, &heap,
                           number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
        {
          basic_block bb;
          fprintf (dump_file, "Trace %d (round %d): ", i + 1,
                   traces[i].round + 1);
          for (bb = traces[i].first;
               bb != traces[i].last;
               bb = (basic_block) bb->aux)
            {
              fprintf (dump_file, "%d [", bb->index);
              bb->count.dump (dump_file);
              fprintf (dump_file, "] ");
            }
          fprintf (dump_file, "%d [", bb->index);
          bb->count.dump (dump_file);
          fprintf (dump_file, "]\n");
        }
      fflush (dump_file);
    }
}

/* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.  */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out from current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
        if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
            && bb_visited_trace (e->dest) != trace_n
            && (e->flags & EDGE_CAN_FALLTHRU)
            && !(e->flags & EDGE_COMPLEX))
          {
            if (is_preferred)
              {
                /* The best edge is preferred.  */
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is also preferred.  */
                    if (e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
            else
              {
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is preferred.  */
                    is_preferred = true;
                    best_count = e->count ();
                    best_edge = e;
                    best_bb = bb;
                  }
                else
                  {
                    if (!best_edge || e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
          }
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
         the trace.  */
      if (back_edge->dest == trace->first)
        {
          trace->first = (basic_block) best_bb->aux;
        }
      else
        {
          basic_block prev_bb;

          for (prev_bb = trace->first;
               prev_bb->aux != back_edge->dest;
               prev_bb = (basic_block) prev_bb->aux)
            ;
          prev_bb->aux = best_bb->aux;

          /* Try to get rid of uncond jump to cond jump.  */
          if (single_succ_p (prev_bb))
            {
              basic_block header = single_succ (prev_bb);

              /* Duplicate HEADER if it is a small block containing cond jump
                 in the end.  */
              if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
                  && !CROSSING_JUMP_P (BB_END (header)))
                copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
            }
        }
    }
  else
    {
      /* We have not found a suitable loop tail, so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}

/* One round of finding traces.  Find traces for BRANCH_TH and EXEC_TH i.e. do
   not include basic blocks whose probability is lower than BRANCH_TH or whose
   count is lower than EXEC_TH into traces (or whose count is lower than
   COUNT_TH).  Store the new traces into TRACES and modify the number of
   traces *N_TRACES.  Set the round (which the trace belongs to) to ROUND.
   The function expects starting basic blocks to be in *HEAP and will delete
   *HEAP and store starting points for the next round into new *HEAP.  */

static void
find_traces_1_round (int branch_th, profile_count count_th,
                     struct trace *traces, int *n_traces, int round,
                     bb_heap_t **heap, int number_of_rounds)
{
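  /* Overview of the loop below: repeatedly take the block with the lowest
     key from *HEAP, either defer it to the next round or start a new trace
     from it, grow that trace by greedily following the best suitable
     successor, and finally re-key the successors of the trace's last block
     since ending a trace can make them more attractive seeds.  */
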
  /* Heap for discarded basic blocks which are possible starting points for
     the next round.  */
  bb_heap_t *new_heap = new bb_heap_t (LONG_MIN);
  bool for_size = optimize_function_for_size_p (cfun);

  while (!(*heap)->empty ())
    {
      basic_block bb;
      struct trace *trace;
      edge best_edge, e;
      long key;
      edge_iterator ei;

      bb = (*heap)->extract_min ();
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;

      if (dump_file)
        fprintf (dump_file, "Getting bb %d\n", bb->index);

      /* If the BB's count is too low, send BB to the next round.  When
         partitioning hot/cold blocks into separate sections, make sure all
         the cold blocks (and ONLY the cold blocks) go into the (extra) final
         round.  When optimizing for size, do not push to next round.  */

      if (!for_size
          && push_to_next_round_p (bb, round, number_of_rounds,
                                   count_th))
        {
          int key = bb_to_key (bb);
          bbd[bb->index].heap = new_heap;
          bbd[bb->index].node = new_heap->insert (key, bb);

          if (dump_file)
            fprintf (dump_file,
                     "  Possible start point of next round: %d (key: %d)\n",
                     bb->index, key);
          continue;
        }

      trace = traces + *n_traces;
      trace->first = bb;
      trace->round = round;
      trace->length = 0;
      bbd[bb->index].in_trace = *n_traces;
      (*n_traces)++;

      do
        {
          bool ends_in_call;

          /* The probability and count of the best edge.  */
          profile_probability best_prob = profile_probability::uninitialized ();
          profile_count best_count = profile_count::uninitialized ();

          best_edge = NULL;
          mark_bb_visited (bb, *n_traces);
          trace->length++;

          if (dump_file)
            fprintf (dump_file, "Basic block %d was visited in trace %d\n",
                     bb->index, *n_traces);

          ends_in_call = block_ends_with_call_p (bb);

          /* Select the successor that will be placed after BB.  */
          FOR_EACH_EDGE (e, ei, bb->succs)
            {
              gcc_assert (!(e->flags & EDGE_FAKE));

              if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
                continue;

              if (bb_visited_trace (e->dest)
                  && bb_visited_trace (e->dest) != *n_traces)
                continue;

              /* If partitioning hot/cold basic blocks, don't consider edges
                 that cross section boundaries.  */
              if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
                continue;

              profile_probability prob = e->probability;
              profile_count count = e->dest->count;

              /* The only sensible preference for a call instruction is the
                 fallthru edge.  Don't bother selecting anything else.  */
              if (ends_in_call)
                {
                  if (e->flags & EDGE_CAN_FALLTHRU)
                    {
                      best_edge = e;
                      best_prob = prob;
                      best_count = count;
                    }
                  continue;
                }

              /* An edge that cannot be a fallthru edge, or an improbable or
                 infrequent successor (i.e. an unsuitable successor).  When
                 optimizing for size, ignore the probability and count.  */
              if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
                  || !prob.initialized_p ()
                  || ((prob.to_reg_br_prob_base () < branch_th
                       || e->count () < count_th) && (!for_size)))
                continue;

              if (better_edge_p (bb, e, prob, count, best_prob, best_count,
                                 best_edge))
                {
                  best_edge = e;
                  best_prob = prob;
                  best_count = count;
                }
            }

          /* If the best destination has multiple predecessors and can be
             duplicated cheaper than a jump, don't allow it to be added to
             a trace; we'll duplicate it when connecting the traces later.
             However, we need to check that this duplication wouldn't leave
             the best destination with only crossing predecessors, because
             this would change its effective partition from hot to cold.  */
          if (best_edge
              && EDGE_COUNT (best_edge->dest->preds) >= 2
              && copy_bb_p (best_edge->dest, 0))
            {
              bool only_crossing_preds = true;
              edge e;
              edge_iterator ei;
              FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
                if (e != best_edge && !(e->flags & EDGE_CROSSING))
                  {
                    only_crossing_preds = false;
                    break;
                  }
              if (!only_crossing_preds)
                best_edge = NULL;
            }

          /* If the best destination has multiple successors or predecessors,
             don't allow it to be added when optimizing for size.  This makes
             sure predecessors with smaller index are handled before the best
             destination.  It breaks long traces and reduces long jumps.

             Take if-then-else as an example.
                A
               / \
              B   C
               \ /
                D
             If we do not remove the best edge B->D/C->D, the final order might
             be A B D ... C.  C is at the end of the program.  If D's successors
             and D are complicated, we might need long jumps for A->C and C->D.
             Similar issue for order: A C D ... B.

             After removing the best edge, the final result will be ABCD/ ACBD.
             It does not add a jump compared with the previous order.  But it
             reduces the possibility of long jumps.  */
          if (best_edge && for_size
              && (EDGE_COUNT (best_edge->dest->succs) > 1
                  || EDGE_COUNT (best_edge->dest->preds) > 1))
            best_edge = NULL;

          /* Add all non-selected successors to the heaps.  */
          FOR_EACH_EDGE (e, ei, bb->succs)
            {
              if (e == best_edge
                  || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
                  || bb_visited_trace (e->dest))
                continue;

              key = bb_to_key (e->dest);

              if (bbd[e->dest->index].heap)
                {
                  /* E->DEST is already in some heap.  */
                  if (key != bbd[e->dest->index].node->get_key ())
                    {
                      if (dump_file)
                        {
                          fprintf (dump_file,
                                   "Changing key for bb %d from %ld to %ld.\n",
                                   e->dest->index,
                                   (long) bbd[e->dest->index].node->get_key (),
                                   key);
                        }
                      bbd[e->dest->index].heap->replace_key
                        (bbd[e->dest->index].node, key);
                    }
                }
              else
                {
                  bb_heap_t *which_heap = *heap;

                  profile_probability prob = e->probability;

                  if (!(e->flags & EDGE_CAN_FALLTHRU)
                      || (e->flags & EDGE_COMPLEX)
                      || !prob.initialized_p ()
                      || prob.to_reg_br_prob_base () < branch_th
                      || e->count () < count_th)
                    {
                      /* When partitioning hot/cold basic blocks, make sure
                         the cold blocks (and only the cold blocks) all get
                         pushed to the last round of trace collection.  When
                         optimizing for size, do not push to next round.  */

                      if (!for_size && push_to_next_round_p (e->dest, round,
                                                             number_of_rounds,
                                                             count_th))
                        which_heap = new_heap;
                    }

                  bbd[e->dest->index].heap = which_heap;
                  bbd[e->dest->index].node = which_heap->insert (key, e->dest);

                  if (dump_file)
                    {
                      fprintf (dump_file,
                               "  Possible start of %s round: %d (key: %ld)\n",
                               (which_heap == new_heap) ? "next" : "this",
                               e->dest->index, (long) key);
                    }

                }
            }

          if (best_edge) /* Suitable successor was found.  */
            {
              if (bb_visited_trace (best_edge->dest) == *n_traces)
                {
                  /* We do nothing with one basic block loops.  */
                  if (best_edge->dest != bb)
                    {
                      if (best_edge->count ()
                          > best_edge->dest->count.apply_scale (4, 5))
                        {
                          /* The loop has at least 4 iterations.  If the loop
                             header is not the first block of the function
                             we can rotate the loop.  */
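                          /* An illustrative reading of the test above: the
                             back edge carries more than 4/5 of the header's
                             count, so roughly at least four out of five
                             executions of the header come from inside the
                             loop.  */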

                          if (best_edge->dest
                              != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
                            {
                              if (dump_file)
                                {
                                  fprintf (dump_file,
                                           "Rotating loop %d - %d\n",
                                           best_edge->dest->index, bb->index);
                                }
                              bb->aux = best_edge->dest;
                              bbd[best_edge->dest->index].in_trace =
                                                             (*n_traces) - 1;
                              bb = rotate_loop (best_edge, trace, *n_traces);
                            }
                        }
                      else
                        {
                          /* The loop has less than 4 iterations.  */

                          if (single_succ_p (bb)
                              && copy_bb_p (best_edge->dest,
                                            optimize_edge_for_speed_p
                                            (best_edge)))
                            {
                              bb = copy_bb (best_edge->dest, best_edge, bb,
                                            *n_traces);
                              trace->length++;
                            }
                        }
                    }

                  /* Terminate the trace.  */
                  break;
                }
              else
                {
                  /* Check for a situation

                    A
                   /|
                  B |
                   \|
                    C

                  where
                  AB->count () + BC->count () >= AC->count ().
                  (i.e. 2 * B->count >= AC->count )
                  Best ordering is then A B C.

                  When optimizing for size, A B C is always the best order.

                  This situation is created for example by:

                  if (A) B;
                  C;

                  */

                  FOR_EACH_EDGE (e, ei, bb->succs)
                    if (e != best_edge
                        && (e->flags & EDGE_CAN_FALLTHRU)
                        && !(e->flags & EDGE_COMPLEX)
                        && !bb_visited_trace (e->dest)
                        && single_pred_p (e->dest)
                        && !(e->flags & EDGE_CROSSING)
                        && single_succ_p (e->dest)
                        && (single_succ_edge (e->dest)->flags
                            & EDGE_CAN_FALLTHRU)
                        && !(single_succ_edge (e->dest)->flags & EDGE_COMPLEX)
                        && single_succ (e->dest) == best_edge->dest
                        && (e->dest->count.apply_scale (2, 1)
                            >= best_edge->count () || for_size))
                      {
                        best_edge = e;
                        if (dump_file)
                          fprintf (dump_file, "Selecting BB %d\n",
                                   best_edge->dest->index);
                        break;
                      }

                  bb->aux = best_edge->dest;
                  bbd[best_edge->dest->index].in_trace = (*n_traces) - 1;
                  bb = best_edge->dest;
                }
            }
        }
      while (best_edge);
      trace->last = bb;
      bbd[trace->first->index].start_of_trace = *n_traces - 1;
      if (bbd[trace->last->index].end_of_trace != *n_traces - 1)
        {
          bbd[trace->last->index].end_of_trace = *n_traces - 1;
          /* Update the cached maximum frequency for interesting predecessor
             edges for successors of the new trace end.  */
          FOR_EACH_EDGE (e, ei, trace->last->succs)
            if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
              bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
        }

      /* The trace is terminated so we have to recount the keys in heap
         (some block can have a lower key because now one of its predecessors
         is an end of the trace).  */
      FOR_EACH_EDGE (e, ei, bb->succs)
        {
          if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
              || bb_visited_trace (e->dest))
            continue;

          if (bbd[e->dest->index].heap)
            {
              key = bb_to_key (e->dest);
              if (key != bbd[e->dest->index].node->get_key ())
                {
                  if (dump_file)
                    {
                      fprintf (dump_file,
                               "Changing key for bb %d from %ld to %ld.\n",
                               e->dest->index,
                               (long) bbd[e->dest->index].node->get_key (), key);
                    }
                  bbd[e->dest->index].heap->replace_key
                    (bbd[e->dest->index].node, key);
                }
            }
        }
    }

  delete (*heap);

  /* "Return" the new heap.  */
  *heap = new_heap;
}

/* Create a duplicate of the basic block OLD_BB and redirect edge E to it, add
   it to trace after BB, mark OLD_BB visited and update the pass's data
   structures (TRACE is the number of the trace which OLD_BB is duplicated
   to).  */

static basic_block
copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
{
  basic_block new_bb;

  new_bb = duplicate_block (old_bb, e, bb);
  BB_COPY_PARTITION (new_bb, old_bb);

  gcc_assert (e->dest == new_bb);

  if (dump_file)
    fprintf (dump_file,
             "Duplicated bb %d (created bb %d)\n",
             old_bb->index, new_bb->index);

  if (new_bb->index >= array_size
      || last_basic_block_for_fn (cfun) > array_size)
    {
      int i;
      int new_size;

      new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
      new_size = GET_ARRAY_SIZE (new_size);
      bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
      for (i = array_size; i < new_size; i++)
        {
          bbd[i].start_of_trace = -1;
          bbd[i].end_of_trace = -1;
          bbd[i].in_trace = -1;
          bbd[i].visited = 0;
          bbd[i].priority = -1;
          bbd[i].heap = NULL;
          bbd[i].node = NULL;
        }
      array_size = new_size;

      if (dump_file)
        {
          fprintf (dump_file,
                   "Growing the dynamic array to %d elements.\n",
                   array_size);
        }
    }

  gcc_assert (!bb_visited_trace (e->dest));
  mark_bb_visited (new_bb, trace);
  new_bb->aux = bb->aux;
  bb->aux = new_bb;

  bbd[new_bb->index].in_trace = trace;

  return new_bb;
}
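
/* Note on key ordering: the heaps used by this pass are min-heaps created
   with LONG_MIN as the minimal key, so a block is extracted earlier when its
   key is smaller.  bb_to_key below therefore returns negated frequencies for
   attractive starting blocks and a large positive value for blocks that
   should be picked last.  */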

/* Compute and return the key (for the heap) of the basic block BB.  */

static long
bb_to_key (basic_block bb)
{
  edge e;
  edge_iterator ei;

  /* Use index as key to align with its original order.  */
  if (optimize_function_for_size_p (cfun))
    return bb->index;

  /* Do not start in probably never executed blocks.  */

  if (BB_PARTITION (bb) == BB_COLD_PARTITION
      || probably_never_executed_bb_p (cfun, bb))
    return BB_FREQ_MAX;

  /* Prefer blocks whose predecessor is an end of some trace
     or whose predecessor edge is EDGE_DFS_BACK.  */
  int priority = bbd[bb->index].priority;
  if (priority == -1)
    {
      priority = 0;
      FOR_EACH_EDGE (e, ei, bb->preds)
        {
          if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
               && bbd[e->src->index].end_of_trace >= 0)
              || (e->flags & EDGE_DFS_BACK))
            {
              int edge_freq = EDGE_FREQUENCY (e);

              if (edge_freq > priority)
                priority = edge_freq;
            }
        }
      bbd[bb->index].priority = priority;
    }

  if (priority)
    /* The block with priority should have significantly lower key.  */
    return -(100 * BB_FREQ_MAX + 100 * priority + bb->count.to_frequency (cfun));

  return -bb->count.to_frequency (cfun);
}

/* Return true when the edge E from basic block BB is better than the temporary
   best edge (details are in function).  The probability of edge E is PROB.  The
   count of the successor is COUNT.  The current best probability is
   BEST_PROB, the best count is BEST_COUNT.
   The edge is considered to be equivalent when PROB does not differ much from
   BEST_PROB; similarly for count.  */

static bool
better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
               profile_count count, profile_probability best_prob,
               profile_count best_count, const_edge cur_best_edge)
{
  bool is_better_edge;

  /* The BEST_* values do not have to be best, but can be a bit smaller than
     maximum values.  */
  profile_probability diff_prob = best_prob.apply_scale (1, 10);
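
  /* For instance (illustrative): with BEST_PROB at 60%, DIFF_PROB is 6%, so
     a candidate only counts as clearly better above 66% and as clearly worse
     below 54%; anything in between falls through to the count comparison
     below.  */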

  /* The smaller one is better to keep the original order.  */
  if (optimize_function_for_size_p (cfun))
    return !cur_best_edge
           || cur_best_edge->dest->index > e->dest->index;

  /* Those edges are so expensive that continuing a trace is not useful
     performance wise.  */
  if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
    return false;

  if (prob > best_prob + diff_prob
      || (!best_prob.initialized_p ()
          && prob > profile_probability::guessed_never ()))
    /* The edge has higher probability than the temporary best edge.  */
    is_better_edge = true;
  else if (prob < best_prob - diff_prob)
    /* The edge has lower probability than the temporary best edge.  */
    is_better_edge = false;
  else
    {
      profile_count diff_count = best_count.apply_scale (1, 10);
      if (count < best_count - diff_count
          || (!best_count.initialized_p ()
              && count.nonzero_p ()))
        /* The edge and the temporary best edge have almost equivalent
           probabilities.  The higher count of a successor now means
           that there is another edge going into that successor.
           This successor has lower count so it is better.  */
        is_better_edge = true;
      else if (count > best_count + diff_count)
        /* This successor has higher count so it is worse.  */
        is_better_edge = false;
      else if (e->dest->prev_bb == bb)
        /* The edges have equivalent probabilities and the successors
           have equivalent frequencies.  Select the previous successor.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }

  return is_better_edge;
}

/* Return true when the edge E is better than the temporary best edge
   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb of
   E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
   BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
   TRACES record the information about traces.
   When optimizing for size, the edge with smaller index is better.
   When optimizing for speed, the edge with bigger probability or longer trace
   is better.  */

static bool
connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
                       const_edge cur_best_edge, struct trace *traces)
{
  int e_index;
  int b_index;
  bool is_better_edge;

  if (!cur_best_edge)
    return true;

  if (optimize_function_for_size_p (cfun))
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
                            : cur_best_edge->dest->index;
      /* The smaller one is better to keep the original order.  */
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;

      /* We are looking for a predecessor, so probabilities are not that
         informative.  We do not want to connect A to B because A has
         only one successor (probability is 100%) while there is an edge
         A' to B where the probability is 90% but which is much more
         frequent.  */
      if (e->count () > cur_best_edge->count ())
        /* The edge has a higher count than the temporary best edge.  */
        is_better_edge = true;
      else if (e->count () < cur_best_edge->count ())
        /* The edge has a lower count than the temporary best edge.  */
        is_better_edge = false;
      if (e->probability > cur_best_edge->probability)
        /* The edge has higher probability than the temporary best edge.  */
        is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
        /* The edge has lower probability than the temporary best edge.  */
        is_better_edge = false;
      else if (traces[bbd[e_index].end_of_trace].length > best_len)
        /* The edge and the temporary best edge have equivalent probabilities.
           The edge with longer trace is better.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }
  else
    {
      e_index = e->dest->index;

      if (e->probability > cur_best_edge->probability)
        /* The edge has higher probability than the temporary best edge.  */
        is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
        /* The edge has lower probability than the temporary best edge.  */
        is_better_edge = false;
      else if (traces[bbd[e_index].start_of_trace].length > best_len)
        /* The edge and the temporary best edge have equivalent probabilities.
           The edge with longer trace is better.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }

  return is_better_edge;
}
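
/* The connection phase below works on whole traces: for each trace that is
   not yet connected it first walks backwards, chaining in the best
   unconnected predecessor traces, then walks forwards through successor
   traces, and as a last resort duplicates a single block (copy_bb) to
   stitch two traces together, as described at the top of the file.  */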

/* Connect traces in array TRACES, N_TRACES is the count of traces.  */

static void
connect_traces (int n_traces, struct trace *traces)
{
  int i;
  bool *connected;
  bool two_passes;
  int last_trace;
  int current_pass;
  int current_partition;
  profile_count count_threshold;
  bool for_size = optimize_function_for_size_p (cfun);

  count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);

  connected = XCNEWVEC (bool, n_traces);
  last_trace = -1;
  current_pass = 1;
  current_partition = BB_PARTITION (traces[0].first);
  two_passes = false;

  if (crtl->has_bb_partition)
    for (i = 0; i < n_traces && !two_passes; i++)
      if (BB_PARTITION (traces[0].first)
          != BB_PARTITION (traces[i].first))
        two_passes = true;
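
  /* When the function has both hot and cold partitions, the traces are
     connected in two passes: the first pass handles only traces whose first
     block is in the partition of traces[0] (normally the hot partition),
     the second pass handles the other partition, so that no chain crosses
     a section boundary.  */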

  for (i = 0; i < n_traces || (two_passes && current_pass == 1) ; i++)
    {
      int t = i;
      int t2;
      edge e, best;
      int best_len;

      if (i >= n_traces)
        {
          gcc_assert (two_passes && current_pass == 1);
          i = 0;
          t = i;
          current_pass = 2;
          if (current_partition == BB_HOT_PARTITION)
            current_partition = BB_COLD_PARTITION;
          else
            current_partition = BB_HOT_PARTITION;
        }

      if (connected[t])
        continue;

      if (two_passes
          && BB_PARTITION (traces[t].first) != current_partition)
        continue;

      connected[t] = true;

      /* Find the predecessor traces.  */
      for (t2 = t; t2 > 0;)
        {
          edge_iterator ei;
          best = NULL;
          best_len = 0;
          FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
            {
              int si = e->src->index;

              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
                  && (e->flags & EDGE_CAN_FALLTHRU)
                  && !(e->flags & EDGE_COMPLEX)
                  && bbd[si].end_of_trace >= 0
                  && !connected[bbd[si].end_of_trace]
                  && (BB_PARTITION (e->src) == current_partition)
                  && connect_better_edge_p (e, true, best_len, best, traces))
                {
                  best = e;
                  best_len = traces[bbd[si].end_of_trace].length;
                }
            }
          if (best)
            {
              best->src->aux = best->dest;
              t2 = bbd[best->src->index].end_of_trace;
              connected[t2] = true;

              if (dump_file)
                {
                  fprintf (dump_file, "Connection: %d %d\n",
                           best->src->index, best->dest->index);
                }
            }
          else
            break;
        }

      if (last_trace >= 0)
        traces[last_trace].last->aux = traces[t2].first;
      last_trace = t;

      /* Find the successor traces.  */
      while (1)
        {
          /* Find the continuation of the chain.  */
          edge_iterator ei;
          best = NULL;
          best_len = 0;
          FOR_EACH_EDGE (e, ei, traces[t].last->succs)
            {
              int di = e->dest->index;

              if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
                  && (e->flags & EDGE_CAN_FALLTHRU)
                  && !(e->flags & EDGE_COMPLEX)
                  && bbd[di].start_of_trace >= 0
                  && !connected[bbd[di].start_of_trace]
                  && (BB_PARTITION (e->dest) == current_partition)
                  && connect_better_edge_p (e, false, best_len, best, traces))
                {
                  best = e;
                  best_len = traces[bbd[di].start_of_trace].length;
                }
            }

          if (for_size)
            {
              if (!best)
                /* Stop finding the successor traces.  */
                break;

              /* It is OK to connect block n with block n + 1 or a block
                 before n.  For others, only connect to the loop header.  */
              if (best->dest->index > (traces[t].last->index + 1))
                {
                  int count = EDGE_COUNT (best->dest->preds);

                  FOR_EACH_EDGE (e, ei, best->dest->preds)
                    if (e->flags & EDGE_DFS_BACK)
                      count--;

                  /* If dest has multiple predecessors, skip it.  We expect
                     that one predecessor with smaller index connects with it
                     later.  */
                  if (count != 1)
                    break;
                }

              /* Only connect Trace n with Trace n + 1.  It is conservative
                 to keep the order as close as possible to the original order.
                 It also helps to reduce long jumps.  */
              if (last_trace != bbd[best->dest->index].start_of_trace - 1)
                break;

              if (dump_file)
                fprintf (dump_file, "Connection: %d %d\n",
                         best->src->index, best->dest->index);

              t = bbd[best->dest->index].start_of_trace;
              traces[last_trace].last->aux = traces[t].first;
              connected[t] = true;
              last_trace = t;
            }
          else if (best)
            {
              if (dump_file)
                {
                  fprintf (dump_file, "Connection: %d %d\n",
                           best->src->index, best->dest->index);
                }
              t = bbd[best->dest->index].start_of_trace;
              traces[last_trace].last->aux = traces[t].first;
              connected[t] = true;
              last_trace = t;
            }
          else
            {
              /* Try to connect the traces by duplication of 1 block.  */
              edge e2;
              basic_block next_bb = NULL;
              bool try_copy = false;

              FOR_EACH_EDGE (e, ei, traces[t].last->succs)
                if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
                    && (e->flags & EDGE_CAN_FALLTHRU)
                    && !(e->flags & EDGE_COMPLEX)
                    && (!best || e->probability > best->probability))
                  {
                    edge_iterator ei;
                    edge best2 = NULL;
                    int best2_len = 0;

                    /* If the destination is a start of a trace which is only
                       one block long, then no need to search the successor
                       blocks of the trace.  Accept it.  */
                    if (bbd[e->dest->index].start_of_trace >= 0
                        && traces[bbd[e->dest->index].start_of_trace].length
                           == 1)
                      {
                        best = e;
                        try_copy = true;
                        continue;
                      }

                    FOR_EACH_EDGE (e2, ei, e->dest->succs)
                      {
                        int di = e2->dest->index;

                        if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
                            || ((e2->flags & EDGE_CAN_FALLTHRU)
                                && !(e2->flags & EDGE_COMPLEX)
                                && bbd[di].start_of_trace >= 0
                                && !connected[bbd[di].start_of_trace]
                                && BB_PARTITION (e2->dest) == current_partition
                                && e2->count () >= count_threshold
                                && (!best2
                                    || e2->probability > best2->probability
                                    || (e2->probability == best2->probability
                                        && traces[bbd[di].start_of_trace].length
                                           > best2_len))))
                          {
                            best = e;
                            best2 = e2;
                            if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
                              best2_len = traces[bbd[di].start_of_trace].length;
                            else
                              best2_len = INT_MAX;
                            next_bb = e2->dest;
                            try_copy = true;
                          }
                      }
                  }

              /* Copy tiny blocks always; copy larger blocks only when the
                 edge is traversed frequently enough.  */
              if (try_copy
                  && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
                  && copy_bb_p (best->dest,
                                optimize_edge_for_speed_p (best)
                                && (!best->count ().initialized_p ()
                                    || best->count () >= count_threshold)))
                {
                  basic_block new_bb;

                  if (dump_file)
                    {
                      fprintf (dump_file, "Connection: %d %d ",
                               traces[t].last->index, best->dest->index);
                      if (!next_bb)
                        fputc ('\n', dump_file);
                      else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
                        fprintf (dump_file, "exit\n");
                      else
                        fprintf (dump_file, "%d\n", next_bb->index);
                    }

                  new_bb = copy_bb (best->dest, best, traces[t].last, t);
                  traces[t].last = new_bb;
                  if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
                    {
                      t = bbd[next_bb->index].start_of_trace;
                      traces[last_trace].last->aux = traces[t].first;
                      connected[t] = true;
                      last_trace = t;
                    }
                  else
                    break;	/* Stop finding the successor traces.  */
                }
              else
                break;	/* Stop finding the successor traces.  */
*/ 1336*38fd1498Szrj } 1337*38fd1498Szrj } 1338*38fd1498Szrj } 1339*38fd1498Szrj 1340*38fd1498Szrj if (dump_file) 1341*38fd1498Szrj { 1342*38fd1498Szrj basic_block bb; 1343*38fd1498Szrj 1344*38fd1498Szrj fprintf (dump_file, "Final order:\n"); 1345*38fd1498Szrj for (bb = traces[0].first; bb; bb = (basic_block) bb->aux) 1346*38fd1498Szrj fprintf (dump_file, "%d ", bb->index); 1347*38fd1498Szrj fprintf (dump_file, "\n"); 1348*38fd1498Szrj fflush (dump_file); 1349*38fd1498Szrj } 1350*38fd1498Szrj 1351*38fd1498Szrj FREE (connected); 1352*38fd1498Szrj } 1353*38fd1498Szrj 1354*38fd1498Szrj /* Return true when BB can and should be copied. CODE_MAY_GROW is true 1355*38fd1498Szrj when code size is allowed to grow by duplication. */ 1356*38fd1498Szrj 1357*38fd1498Szrj static bool 1358*38fd1498Szrj copy_bb_p (const_basic_block bb, int code_may_grow) 1359*38fd1498Szrj { 1360*38fd1498Szrj int size = 0; 1361*38fd1498Szrj int max_size = uncond_jump_length; 1362*38fd1498Szrj rtx_insn *insn; 1363*38fd1498Szrj 1364*38fd1498Szrj if (EDGE_COUNT (bb->preds) < 2) 1365*38fd1498Szrj return false; 1366*38fd1498Szrj if (!can_duplicate_block_p (bb)) 1367*38fd1498Szrj return false; 1368*38fd1498Szrj 1369*38fd1498Szrj /* Avoid duplicating blocks which have many successors (PR/13430). */ 1370*38fd1498Szrj if (EDGE_COUNT (bb->succs) > 8) 1371*38fd1498Szrj return false; 1372*38fd1498Szrj 1373*38fd1498Szrj if (code_may_grow && optimize_bb_for_speed_p (bb)) 1374*38fd1498Szrj max_size *= PARAM_VALUE (PARAM_MAX_GROW_COPY_BB_INSNS); 1375*38fd1498Szrj 1376*38fd1498Szrj FOR_BB_INSNS (bb, insn) 1377*38fd1498Szrj { 1378*38fd1498Szrj if (INSN_P (insn)) 1379*38fd1498Szrj size += get_attr_min_length (insn); 1380*38fd1498Szrj } 1381*38fd1498Szrj 1382*38fd1498Szrj if (size <= max_size) 1383*38fd1498Szrj return true; 1384*38fd1498Szrj 1385*38fd1498Szrj if (dump_file) 1386*38fd1498Szrj { 1387*38fd1498Szrj fprintf (dump_file, 1388*38fd1498Szrj "Block %d can't be copied because its size = %d.\n", 1389*38fd1498Szrj bb->index, size); 1390*38fd1498Szrj } 1391*38fd1498Szrj 1392*38fd1498Szrj return false; 1393*38fd1498Szrj } 1394*38fd1498Szrj 1395*38fd1498Szrj /* Return the length of unconditional jump instruction. */ 1396*38fd1498Szrj 1397*38fd1498Szrj int 1398*38fd1498Szrj get_uncond_jump_length (void) 1399*38fd1498Szrj { 1400*38fd1498Szrj int length; 1401*38fd1498Szrj 1402*38fd1498Szrj start_sequence (); 1403*38fd1498Szrj rtx_code_label *label = emit_label (gen_label_rtx ()); 1404*38fd1498Szrj rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label)); 1405*38fd1498Szrj length = get_attr_min_length (jump); 1406*38fd1498Szrj end_sequence (); 1407*38fd1498Szrj 1408*38fd1498Szrj return length; 1409*38fd1498Szrj } 1410*38fd1498Szrj 1411*38fd1498Szrj /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. 1412*38fd1498Szrj Add a new landing pad that will just jump to the old one and split the 1413*38fd1498Szrj edges so that no EH edge crosses partitions. */ 1414*38fd1498Szrj 1415*38fd1498Szrj static void 1416*38fd1498Szrj fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb) 1417*38fd1498Szrj { 1418*38fd1498Szrj eh_landing_pad new_lp; 1419*38fd1498Szrj basic_block new_bb, last_bb; 1420*38fd1498Szrj rtx_insn *jump; 1421*38fd1498Szrj unsigned new_partition; 1422*38fd1498Szrj edge_iterator ei; 1423*38fd1498Szrj edge e; 1424*38fd1498Szrj 1425*38fd1498Szrj /* Generate the new landing-pad structure. 
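The new pad is created in the same EH region as OLD_LP and shares its post-landing pad, but it gets a fresh landing-pad label so that throws from the other partition can be redirected to it.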
*/ 1426*38fd1498Szrj new_lp = gen_eh_landing_pad (old_lp->region); 1427*38fd1498Szrj new_lp->post_landing_pad = old_lp->post_landing_pad; 1428*38fd1498Szrj new_lp->landing_pad = gen_label_rtx (); 1429*38fd1498Szrj LABEL_PRESERVE_P (new_lp->landing_pad) = 1; 1430*38fd1498Szrj 1431*38fd1498Szrj /* Put appropriate instructions in new bb. */ 1432*38fd1498Szrj rtx_code_label *new_label = emit_label (new_lp->landing_pad); 1433*38fd1498Szrj 1434*38fd1498Szrj rtx_code_label *old_label = block_label (old_bb); 1435*38fd1498Szrj jump = emit_jump_insn (targetm.gen_jump (old_label)); 1436*38fd1498Szrj JUMP_LABEL (jump) = old_label; 1437*38fd1498Szrj 1438*38fd1498Szrj /* Create new basic block to be dest for lp. */ 1439*38fd1498Szrj last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; 1440*38fd1498Szrj new_bb = create_basic_block (new_label, jump, last_bb); 1441*38fd1498Szrj new_bb->aux = last_bb->aux; 1442*38fd1498Szrj new_bb->count = old_bb->count; 1443*38fd1498Szrj last_bb->aux = new_bb; 1444*38fd1498Szrj 1445*38fd1498Szrj emit_barrier_after_bb (new_bb); 1446*38fd1498Szrj 1447*38fd1498Szrj make_single_succ_edge (new_bb, old_bb, 0); 1448*38fd1498Szrj 1449*38fd1498Szrj /* Make sure new bb is in the other partition. */ 1450*38fd1498Szrj new_partition = BB_PARTITION (old_bb); 1451*38fd1498Szrj new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION; 1452*38fd1498Szrj BB_SET_PARTITION (new_bb, new_partition); 1453*38fd1498Szrj 1454*38fd1498Szrj /* Fix up the edges. */ 1455*38fd1498Szrj for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; ) 1456*38fd1498Szrj if (e->src != new_bb && BB_PARTITION (e->src) == new_partition) 1457*38fd1498Szrj { 1458*38fd1498Szrj rtx_insn *insn = BB_END (e->src); 1459*38fd1498Szrj rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX); 1460*38fd1498Szrj 1461*38fd1498Szrj gcc_assert (note != NULL); 1462*38fd1498Szrj gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index); 1463*38fd1498Szrj XEXP (note, 0) = GEN_INT (new_lp->index); 1464*38fd1498Szrj 1465*38fd1498Szrj /* Adjust the edge to the new destination. */ 1466*38fd1498Szrj redirect_edge_succ (e, new_bb); 1467*38fd1498Szrj } 1468*38fd1498Szrj else 1469*38fd1498Szrj ei_next (&ei); 1470*38fd1498Szrj } 1471*38fd1498Szrj 1472*38fd1498Szrj 1473*38fd1498Szrj /* Ensure that all hot bbs are included in a hot path through the 1474*38fd1498Szrj procedure. This is done by calling this function twice, once 1475*38fd1498Szrj with WALK_UP true (to look for paths from the entry to hot bbs) and 1476*38fd1498Szrj once with WALK_UP false (to look for paths from hot bbs to the exit). 1477*38fd1498Szrj Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs 1478*38fd1498Szrj to BBS_IN_HOT_PARTITION. */ 1479*38fd1498Szrj 1480*38fd1498Szrj static unsigned int 1481*38fd1498Szrj sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count, 1482*38fd1498Szrj vec<basic_block> *bbs_in_hot_partition) 1483*38fd1498Szrj { 1484*38fd1498Szrj /* Callers check this. */ 1485*38fd1498Szrj gcc_checking_assert (cold_bb_count); 1486*38fd1498Szrj 1487*38fd1498Szrj /* Keep examining hot bbs while we still have some left to check 1488*38fd1498Szrj and there are remaining cold bbs. */ 1489*38fd1498Szrj vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy (); 1490*38fd1498Szrj while (! hot_bbs_to_check.is_empty () 1491*38fd1498Szrj && cold_bb_count) 1492*38fd1498Szrj { 1493*38fd1498Szrj basic_block bb = hot_bbs_to_check.pop (); 1494*38fd1498Szrj vec<edge, va_gc> *edges = walk_up ? 
bb->preds : bb->succs; 1495*38fd1498Szrj edge e; 1496*38fd1498Szrj edge_iterator ei; 1497*38fd1498Szrj profile_probability highest_probability 1498*38fd1498Szrj = profile_probability::uninitialized (); 1499*38fd1498Szrj profile_count highest_count = profile_count::uninitialized (); 1500*38fd1498Szrj bool found = false; 1501*38fd1498Szrj 1502*38fd1498Szrj /* Walk the preds/succs and check if there is at least one already 1503*38fd1498Szrj marked hot. Keep track of the most frequent pred/succ so that we 1504*38fd1498Szrj can mark it hot if we don't find one. */ 1505*38fd1498Szrj FOR_EACH_EDGE (e, ei, edges) 1506*38fd1498Szrj { 1507*38fd1498Szrj basic_block reach_bb = walk_up ? e->src : e->dest; 1508*38fd1498Szrj 1509*38fd1498Szrj if (e->flags & EDGE_DFS_BACK) 1510*38fd1498Szrj continue; 1511*38fd1498Szrj 1512*38fd1498Szrj /* Do not expect profile insanities when profile was not adjusted. */ 1513*38fd1498Szrj if (e->probability == profile_probability::never () 1514*38fd1498Szrj || e->count () == profile_count::zero ()) 1515*38fd1498Szrj continue; 1516*38fd1498Szrj 1517*38fd1498Szrj if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION) 1518*38fd1498Szrj { 1519*38fd1498Szrj found = true; 1520*38fd1498Szrj break; 1521*38fd1498Szrj } 1522*38fd1498Szrj /* The following loop will look for the hottest edge via 1523*38fd1498Szrj the edge count, if it is non-zero, then fallback to 1524*38fd1498Szrj the edge probability. */ 1525*38fd1498Szrj if (!(e->count () > highest_count)) 1526*38fd1498Szrj highest_count = e->count (); 1527*38fd1498Szrj if (!highest_probability.initialized_p () 1528*38fd1498Szrj || e->probability > highest_probability) 1529*38fd1498Szrj highest_probability = e->probability; 1530*38fd1498Szrj } 1531*38fd1498Szrj 1532*38fd1498Szrj /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot 1533*38fd1498Szrj block (or unpartitioned, e.g. the entry block) then it is ok. If not, 1534*38fd1498Szrj then the most frequent pred (or succ) needs to be adjusted. In the 1535*38fd1498Szrj case where multiple preds/succs have the same frequency (e.g. a 1536*38fd1498Szrj 50-50 branch), then both will be adjusted. */ 1537*38fd1498Szrj if (found) 1538*38fd1498Szrj continue; 1539*38fd1498Szrj 1540*38fd1498Szrj FOR_EACH_EDGE (e, ei, edges) 1541*38fd1498Szrj { 1542*38fd1498Szrj if (e->flags & EDGE_DFS_BACK) 1543*38fd1498Szrj continue; 1544*38fd1498Szrj /* Do not expect profile insanities when profile was not adjusted. */ 1545*38fd1498Szrj if (e->probability == profile_probability::never () 1546*38fd1498Szrj || e->count () == profile_count::zero ()) 1547*38fd1498Szrj continue; 1548*38fd1498Szrj /* Select the hottest edge using the edge count, if it is non-zero, 1549*38fd1498Szrj then fallback to the edge probability. */ 1550*38fd1498Szrj if (highest_count.initialized_p ()) 1551*38fd1498Szrj { 1552*38fd1498Szrj if (!(e->count () >= highest_count)) 1553*38fd1498Szrj continue; 1554*38fd1498Szrj } 1555*38fd1498Szrj else if (!(e->probability >= highest_probability)) 1556*38fd1498Szrj continue; 1557*38fd1498Szrj 1558*38fd1498Szrj basic_block reach_bb = walk_up ? e->src : e->dest; 1559*38fd1498Szrj 1560*38fd1498Szrj /* We have a hot bb with an immediate dominator that is cold. 1561*38fd1498Szrj The dominator needs to be re-marked hot. 
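(In practice the block promoted below is the most frequent non-back-edge pred or succ selected above, not necessarily the immediate dominator.)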
*/ 1562*38fd1498Szrj BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION); 1563*38fd1498Szrj if (dump_file) 1564*38fd1498Szrj fprintf (dump_file, "Promoting bb %i to hot partition to sanitize " 1565*38fd1498Szrj "profile of bb %i in %s walk\n", reach_bb->index, 1566*38fd1498Szrj bb->index, walk_up ? "backward" : "forward"); 1567*38fd1498Szrj cold_bb_count--; 1568*38fd1498Szrj 1569*38fd1498Szrj /* Now we need to examine newly-hot reach_bb to see if it is also 1570*38fd1498Szrj dominated by a cold bb. */ 1571*38fd1498Szrj bbs_in_hot_partition->safe_push (reach_bb); 1572*38fd1498Szrj hot_bbs_to_check.safe_push (reach_bb); 1573*38fd1498Szrj } 1574*38fd1498Szrj } 1575*38fd1498Szrj 1576*38fd1498Szrj return cold_bb_count; 1577*38fd1498Szrj } 1578*38fd1498Szrj 1579*38fd1498Szrj 1580*38fd1498Szrj /* Find the basic blocks that are rarely executed and need to be moved to 1581*38fd1498Szrj a separate section of the .o file (to cut down on paging and improve 1582*38fd1498Szrj cache locality). Return a vector of all edges that cross. */ 1583*38fd1498Szrj 1584*38fd1498Szrj static vec<edge> 1585*38fd1498Szrj find_rarely_executed_basic_blocks_and_crossing_edges (void) 1586*38fd1498Szrj { 1587*38fd1498Szrj vec<edge> crossing_edges = vNULL; 1588*38fd1498Szrj basic_block bb; 1589*38fd1498Szrj edge e; 1590*38fd1498Szrj edge_iterator ei; 1591*38fd1498Szrj unsigned int cold_bb_count = 0; 1592*38fd1498Szrj auto_vec<basic_block> bbs_in_hot_partition; 1593*38fd1498Szrj 1594*38fd1498Szrj propagate_unlikely_bbs_forward (); 1595*38fd1498Szrj 1596*38fd1498Szrj /* Mark which partition (hot/cold) each basic block belongs in. */ 1597*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 1598*38fd1498Szrj { 1599*38fd1498Szrj bool cold_bb = false; 1600*38fd1498Szrj 1601*38fd1498Szrj if (probably_never_executed_bb_p (cfun, bb)) 1602*38fd1498Szrj { 1603*38fd1498Szrj /* Handle profile insanities created by upstream optimizations 1604*38fd1498Szrj by also checking the incoming edge weights. If there is a non-cold 1605*38fd1498Szrj incoming edge, conservatively prevent this block from being split 1606*38fd1498Szrj into the cold section. */ 1607*38fd1498Szrj cold_bb = true; 1608*38fd1498Szrj FOR_EACH_EDGE (e, ei, bb->preds) 1609*38fd1498Szrj if (!probably_never_executed_edge_p (cfun, e)) 1610*38fd1498Szrj { 1611*38fd1498Szrj cold_bb = false; 1612*38fd1498Szrj break; 1613*38fd1498Szrj } 1614*38fd1498Szrj } 1615*38fd1498Szrj if (cold_bb) 1616*38fd1498Szrj { 1617*38fd1498Szrj BB_SET_PARTITION (bb, BB_COLD_PARTITION); 1618*38fd1498Szrj cold_bb_count++; 1619*38fd1498Szrj } 1620*38fd1498Szrj else 1621*38fd1498Szrj { 1622*38fd1498Szrj BB_SET_PARTITION (bb, BB_HOT_PARTITION); 1623*38fd1498Szrj bbs_in_hot_partition.safe_push (bb); 1624*38fd1498Szrj } 1625*38fd1498Szrj } 1626*38fd1498Szrj 1627*38fd1498Szrj /* Ensure that hot bbs are included along a hot path from the entry to exit. 1628*38fd1498Szrj Several different possibilities may include cold bbs along all paths 1629*38fd1498Szrj to/from a hot bb. One is that there are edge weight insanities 1630*38fd1498Szrj due to optimization phases that do not properly update basic block profile 1631*38fd1498Szrj counts. The second is that the entry of the function may not be hot, because 1632*38fd1498Szrj it is entered fewer times than the number of profile training runs, but there 1633*38fd1498Szrj is a loop inside the function that causes blocks within the function to be 1634*38fd1498Szrj above the threshold for hotness. 
This is fixed by walking up from hot bbs 1635*38fd1498Szrj to the entry block, and then down from hot bbs to the exit, performing 1636*38fd1498Szrj partitioning fixups as necessary. */ 1637*38fd1498Szrj if (cold_bb_count) 1638*38fd1498Szrj { 1639*38fd1498Szrj mark_dfs_back_edges (); 1640*38fd1498Szrj cold_bb_count = sanitize_hot_paths (true, cold_bb_count, 1641*38fd1498Szrj &bbs_in_hot_partition); 1642*38fd1498Szrj if (cold_bb_count) 1643*38fd1498Szrj sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition); 1644*38fd1498Szrj 1645*38fd1498Szrj hash_set <basic_block> set; 1646*38fd1498Szrj find_bbs_reachable_by_hot_paths (&set); 1647*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 1648*38fd1498Szrj if (!set.contains (bb)) 1649*38fd1498Szrj BB_SET_PARTITION (bb, BB_COLD_PARTITION); 1650*38fd1498Szrj } 1651*38fd1498Szrj 1652*38fd1498Szrj /* The format of .gcc_except_table does not allow landing pads to 1653*38fd1498Szrj be in a different partition as the throw. Fix this by either 1654*38fd1498Szrj moving or duplicating the landing pads. */ 1655*38fd1498Szrj if (cfun->eh->lp_array) 1656*38fd1498Szrj { 1657*38fd1498Szrj unsigned i; 1658*38fd1498Szrj eh_landing_pad lp; 1659*38fd1498Szrj 1660*38fd1498Szrj FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp) 1661*38fd1498Szrj { 1662*38fd1498Szrj bool all_same, all_diff; 1663*38fd1498Szrj 1664*38fd1498Szrj if (lp == NULL 1665*38fd1498Szrj || lp->landing_pad == NULL_RTX 1666*38fd1498Szrj || !LABEL_P (lp->landing_pad)) 1667*38fd1498Szrj continue; 1668*38fd1498Szrj 1669*38fd1498Szrj all_same = all_diff = true; 1670*38fd1498Szrj bb = BLOCK_FOR_INSN (lp->landing_pad); 1671*38fd1498Szrj FOR_EACH_EDGE (e, ei, bb->preds) 1672*38fd1498Szrj { 1673*38fd1498Szrj gcc_assert (e->flags & EDGE_EH); 1674*38fd1498Szrj if (BB_PARTITION (bb) == BB_PARTITION (e->src)) 1675*38fd1498Szrj all_diff = false; 1676*38fd1498Szrj else 1677*38fd1498Szrj all_same = false; 1678*38fd1498Szrj } 1679*38fd1498Szrj 1680*38fd1498Szrj if (all_same) 1681*38fd1498Szrj ; 1682*38fd1498Szrj else if (all_diff) 1683*38fd1498Szrj { 1684*38fd1498Szrj int which = BB_PARTITION (bb); 1685*38fd1498Szrj which ^= BB_HOT_PARTITION | BB_COLD_PARTITION; 1686*38fd1498Szrj BB_SET_PARTITION (bb, which); 1687*38fd1498Szrj } 1688*38fd1498Szrj else 1689*38fd1498Szrj fix_up_crossing_landing_pad (lp, bb); 1690*38fd1498Szrj } 1691*38fd1498Szrj } 1692*38fd1498Szrj 1693*38fd1498Szrj /* Mark every edge that crosses between sections. */ 1694*38fd1498Szrj 1695*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 1696*38fd1498Szrj FOR_EACH_EDGE (e, ei, bb->succs) 1697*38fd1498Szrj { 1698*38fd1498Szrj unsigned int flags = e->flags; 1699*38fd1498Szrj 1700*38fd1498Szrj /* We should never have EDGE_CROSSING set yet. */ 1701*38fd1498Szrj gcc_checking_assert ((flags & EDGE_CROSSING) == 0); 1702*38fd1498Szrj 1703*38fd1498Szrj if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun) 1704*38fd1498Szrj && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) 1705*38fd1498Szrj && BB_PARTITION (e->src) != BB_PARTITION (e->dest)) 1706*38fd1498Szrj { 1707*38fd1498Szrj crossing_edges.safe_push (e); 1708*38fd1498Szrj flags |= EDGE_CROSSING; 1709*38fd1498Szrj } 1710*38fd1498Szrj 1711*38fd1498Szrj /* Now that we've split eh edges as appropriate, allow landing pads 1712*38fd1498Szrj to be merged with the post-landing pads. 
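(The EDGE_PRESERVE flag was what kept those merges from happening earlier; with every landing pad now confined to a single partition it is no longer needed.)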
*/ 1713*38fd1498Szrj flags &= ~EDGE_PRESERVE; 1714*38fd1498Szrj 1715*38fd1498Szrj e->flags = flags; 1716*38fd1498Szrj } 1717*38fd1498Szrj 1718*38fd1498Szrj return crossing_edges; 1719*38fd1498Szrj } 1720*38fd1498Szrj 1721*38fd1498Szrj /* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru. */ 1722*38fd1498Szrj 1723*38fd1498Szrj static void 1724*38fd1498Szrj set_edge_can_fallthru_flag (void) 1725*38fd1498Szrj { 1726*38fd1498Szrj basic_block bb; 1727*38fd1498Szrj 1728*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 1729*38fd1498Szrj { 1730*38fd1498Szrj edge e; 1731*38fd1498Szrj edge_iterator ei; 1732*38fd1498Szrj 1733*38fd1498Szrj FOR_EACH_EDGE (e, ei, bb->succs) 1734*38fd1498Szrj { 1735*38fd1498Szrj e->flags &= ~EDGE_CAN_FALLTHRU; 1736*38fd1498Szrj 1737*38fd1498Szrj /* The FALLTHRU edge is also CAN_FALLTHRU edge. */ 1738*38fd1498Szrj if (e->flags & EDGE_FALLTHRU) 1739*38fd1498Szrj e->flags |= EDGE_CAN_FALLTHRU; 1740*38fd1498Szrj } 1741*38fd1498Szrj 1742*38fd1498Szrj /* If the BB ends with an invertible condjump all (2) edges are 1743*38fd1498Szrj CAN_FALLTHRU edges. */ 1744*38fd1498Szrj if (EDGE_COUNT (bb->succs) != 2) 1745*38fd1498Szrj continue; 1746*38fd1498Szrj if (!any_condjump_p (BB_END (bb))) 1747*38fd1498Szrj continue; 1748*38fd1498Szrj 1749*38fd1498Szrj rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb)); 1750*38fd1498Szrj if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0)) 1751*38fd1498Szrj continue; 1752*38fd1498Szrj invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0); 1753*38fd1498Szrj EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU; 1754*38fd1498Szrj EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU; 1755*38fd1498Szrj } 1756*38fd1498Szrj } 1757*38fd1498Szrj 1758*38fd1498Szrj /* If any destination of a crossing edge does not have a label, add label; 1759*38fd1498Szrj Convert any easy fall-through crossing edges to unconditional jumps. */ 1760*38fd1498Szrj 1761*38fd1498Szrj static void 1762*38fd1498Szrj add_labels_and_missing_jumps (vec<edge> crossing_edges) 1763*38fd1498Szrj { 1764*38fd1498Szrj size_t i; 1765*38fd1498Szrj edge e; 1766*38fd1498Szrj 1767*38fd1498Szrj FOR_EACH_VEC_ELT (crossing_edges, i, e) 1768*38fd1498Szrj { 1769*38fd1498Szrj basic_block src = e->src; 1770*38fd1498Szrj basic_block dest = e->dest; 1771*38fd1498Szrj rtx_jump_insn *new_jump; 1772*38fd1498Szrj 1773*38fd1498Szrj if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) 1774*38fd1498Szrj continue; 1775*38fd1498Szrj 1776*38fd1498Szrj /* Make sure dest has a label. */ 1777*38fd1498Szrj rtx_code_label *label = block_label (dest); 1778*38fd1498Szrj 1779*38fd1498Szrj /* Nothing to do for non-fallthru edges. */ 1780*38fd1498Szrj if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun)) 1781*38fd1498Szrj continue; 1782*38fd1498Szrj if ((e->flags & EDGE_FALLTHRU) == 0) 1783*38fd1498Szrj continue; 1784*38fd1498Szrj 1785*38fd1498Szrj /* If the block does not end with a control flow insn, then we 1786*38fd1498Szrj can trivially add a jump to the end to fixup the crossing. 1787*38fd1498Szrj Otherwise the jump will have to go in a new bb, which will 1788*38fd1498Szrj be handled by fix_up_fall_thru_edges function. */ 1789*38fd1498Szrj if (control_flow_insn_p (BB_END (src))) 1790*38fd1498Szrj continue; 1791*38fd1498Szrj 1792*38fd1498Szrj /* Make sure there's only one successor. 
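Since BB_END (src) is not a control-flow insn, src can only fall through, so the fallthru edge must be its single successor.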
*/ 1793*38fd1498Szrj gcc_assert (single_succ_p (src)); 1794*38fd1498Szrj 1795*38fd1498Szrj new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src)); 1796*38fd1498Szrj BB_END (src) = new_jump; 1797*38fd1498Szrj JUMP_LABEL (new_jump) = label; 1798*38fd1498Szrj LABEL_NUSES (label) += 1; 1799*38fd1498Szrj 1800*38fd1498Szrj emit_barrier_after_bb (src); 1801*38fd1498Szrj 1802*38fd1498Szrj /* Mark edge as non-fallthru. */ 1803*38fd1498Szrj e->flags &= ~EDGE_FALLTHRU; 1804*38fd1498Szrj } 1805*38fd1498Szrj } 1806*38fd1498Szrj 1807*38fd1498Szrj /* Find any bb's where the fall-through edge is a crossing edge (note that 1808*38fd1498Szrj these bb's must also contain a conditional jump or end with a call 1809*38fd1498Szrj instruction; we've already dealt with fall-through edges for blocks 1810*38fd1498Szrj that didn't have a conditional jump or didn't end with call instruction 1811*38fd1498Szrj in the call to add_labels_and_missing_jumps). Convert the fall-through 1812*38fd1498Szrj edge to non-crossing edge by inserting a new bb to fall-through into. 1813*38fd1498Szrj The new bb will contain an unconditional jump (crossing edge) to the 1814*38fd1498Szrj original fall through destination. */ 1815*38fd1498Szrj 1816*38fd1498Szrj static void 1817*38fd1498Szrj fix_up_fall_thru_edges (void) 1818*38fd1498Szrj { 1819*38fd1498Szrj basic_block cur_bb; 1820*38fd1498Szrj 1821*38fd1498Szrj FOR_EACH_BB_FN (cur_bb, cfun) 1822*38fd1498Szrj { 1823*38fd1498Szrj edge succ1; 1824*38fd1498Szrj edge succ2; 1825*38fd1498Szrj edge fall_thru = NULL; 1826*38fd1498Szrj edge cond_jump = NULL; 1827*38fd1498Szrj 1828*38fd1498Szrj fall_thru = NULL; 1829*38fd1498Szrj if (EDGE_COUNT (cur_bb->succs) > 0) 1830*38fd1498Szrj succ1 = EDGE_SUCC (cur_bb, 0); 1831*38fd1498Szrj else 1832*38fd1498Szrj succ1 = NULL; 1833*38fd1498Szrj 1834*38fd1498Szrj if (EDGE_COUNT (cur_bb->succs) > 1) 1835*38fd1498Szrj succ2 = EDGE_SUCC (cur_bb, 1); 1836*38fd1498Szrj else 1837*38fd1498Szrj succ2 = NULL; 1838*38fd1498Szrj 1839*38fd1498Szrj /* Find the fall-through edge. */ 1840*38fd1498Szrj 1841*38fd1498Szrj if (succ1 1842*38fd1498Szrj && (succ1->flags & EDGE_FALLTHRU)) 1843*38fd1498Szrj { 1844*38fd1498Szrj fall_thru = succ1; 1845*38fd1498Szrj cond_jump = succ2; 1846*38fd1498Szrj } 1847*38fd1498Szrj else if (succ2 1848*38fd1498Szrj && (succ2->flags & EDGE_FALLTHRU)) 1849*38fd1498Szrj { 1850*38fd1498Szrj fall_thru = succ2; 1851*38fd1498Szrj cond_jump = succ1; 1852*38fd1498Szrj } 1853*38fd1498Szrj else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2) 1854*38fd1498Szrj fall_thru = find_fallthru_edge (cur_bb->succs); 1855*38fd1498Szrj 1856*38fd1498Szrj if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))) 1857*38fd1498Szrj { 1858*38fd1498Szrj /* Check to see if the fall-thru edge is a crossing edge. */ 1859*38fd1498Szrj 1860*38fd1498Szrj if (fall_thru->flags & EDGE_CROSSING) 1861*38fd1498Szrj { 1862*38fd1498Szrj /* The fall_thru edge crosses; now check the cond jump edge, if 1863*38fd1498Szrj it exists. */ 1864*38fd1498Szrj 1865*38fd1498Szrj bool cond_jump_crosses = true; 1866*38fd1498Szrj int invert_worked = 0; 1867*38fd1498Szrj rtx_insn *old_jump = BB_END (cur_bb); 1868*38fd1498Szrj 1869*38fd1498Szrj /* Find the jump instruction, if there is one. 
*/ 1870*38fd1498Szrj 1871*38fd1498Szrj if (cond_jump) 1872*38fd1498Szrj { 1873*38fd1498Szrj if (!(cond_jump->flags & EDGE_CROSSING)) 1874*38fd1498Szrj cond_jump_crosses = false; 1875*38fd1498Szrj 1876*38fd1498Szrj /* We know the fall-thru edge crosses; if the cond 1877*38fd1498Szrj jump edge does NOT cross, and its destination is the 1878*38fd1498Szrj next block in the bb order, invert the jump 1879*38fd1498Szrj (i.e. fix it so the fall through does not cross and 1880*38fd1498Szrj the cond jump does). */ 1881*38fd1498Szrj 1882*38fd1498Szrj if (!cond_jump_crosses) 1883*38fd1498Szrj { 1884*38fd1498Szrj /* Find label in fall_thru block. We've already added 1885*38fd1498Szrj any missing labels, so there must be one. */ 1886*38fd1498Szrj 1887*38fd1498Szrj rtx_code_label *fall_thru_label 1888*38fd1498Szrj = block_label (fall_thru->dest); 1889*38fd1498Szrj 1890*38fd1498Szrj if (old_jump && fall_thru_label) 1891*38fd1498Szrj { 1892*38fd1498Szrj rtx_jump_insn *old_jump_insn 1893*38fd1498Szrj = dyn_cast <rtx_jump_insn *> (old_jump); 1894*38fd1498Szrj if (old_jump_insn) 1895*38fd1498Szrj invert_worked = invert_jump (old_jump_insn, 1896*38fd1498Szrj fall_thru_label, 0); 1897*38fd1498Szrj } 1898*38fd1498Szrj 1899*38fd1498Szrj if (invert_worked) 1900*38fd1498Szrj { 1901*38fd1498Szrj fall_thru->flags &= ~EDGE_FALLTHRU; 1902*38fd1498Szrj cond_jump->flags |= EDGE_FALLTHRU; 1903*38fd1498Szrj update_br_prob_note (cur_bb); 1904*38fd1498Szrj std::swap (fall_thru, cond_jump); 1905*38fd1498Szrj cond_jump->flags |= EDGE_CROSSING; 1906*38fd1498Szrj fall_thru->flags &= ~EDGE_CROSSING; 1907*38fd1498Szrj } 1908*38fd1498Szrj } 1909*38fd1498Szrj } 1910*38fd1498Szrj 1911*38fd1498Szrj if (cond_jump_crosses || !invert_worked) 1912*38fd1498Szrj { 1913*38fd1498Szrj /* This is the case where both edges out of the basic 1914*38fd1498Szrj block are crossing edges. Here we will fix up the 1915*38fd1498Szrj fall through edge. The jump edge will be taken care 1916*38fd1498Szrj of later. The EDGE_CROSSING flag of fall_thru edge 1917*38fd1498Szrj is unset before the call to force_nonfallthru 1918*38fd1498Szrj function because if a new basic-block is created 1919*38fd1498Szrj this edge remains in the current section boundary 1920*38fd1498Szrj while the edge between new_bb and the fall_thru->dest 1921*38fd1498Szrj becomes EDGE_CROSSING. */ 1922*38fd1498Szrj 1923*38fd1498Szrj fall_thru->flags &= ~EDGE_CROSSING; 1924*38fd1498Szrj basic_block new_bb = force_nonfallthru (fall_thru); 1925*38fd1498Szrj 1926*38fd1498Szrj if (new_bb) 1927*38fd1498Szrj { 1928*38fd1498Szrj new_bb->aux = cur_bb->aux; 1929*38fd1498Szrj cur_bb->aux = new_bb; 1930*38fd1498Szrj 1931*38fd1498Szrj /* This is done by force_nonfallthru_and_redirect. */ 1932*38fd1498Szrj gcc_assert (BB_PARTITION (new_bb) 1933*38fd1498Szrj == BB_PARTITION (cur_bb)); 1934*38fd1498Szrj 1935*38fd1498Szrj single_succ_edge (new_bb)->flags |= EDGE_CROSSING; 1936*38fd1498Szrj } 1937*38fd1498Szrj else 1938*38fd1498Szrj { 1939*38fd1498Szrj /* If a new basic-block was not created; restore 1940*38fd1498Szrj the EDGE_CROSSING flag. */ 1941*38fd1498Szrj fall_thru->flags |= EDGE_CROSSING; 1942*38fd1498Szrj } 1943*38fd1498Szrj 1944*38fd1498Szrj /* Add barrier after new jump */ 1945*38fd1498Szrj emit_barrier_after_bb (new_bb ? 
new_bb : cur_bb); 1946*38fd1498Szrj } 1947*38fd1498Szrj } 1948*38fd1498Szrj } 1949*38fd1498Szrj } 1950*38fd1498Szrj } 1951*38fd1498Szrj 1952*38fd1498Szrj /* This function checks the destination block of a "crossing jump" to 1953*38fd1498Szrj see if it has any crossing predecessors that begin with a code label 1954*38fd1498Szrj and end with an unconditional jump. If so, it returns that predecessor 1955*38fd1498Szrj block. (This is to avoid creating lots of new basic blocks that all 1956*38fd1498Szrj contain unconditional jumps to the same destination). */ 1957*38fd1498Szrj 1958*38fd1498Szrj static basic_block 1959*38fd1498Szrj find_jump_block (basic_block jump_dest) 1960*38fd1498Szrj { 1961*38fd1498Szrj basic_block source_bb = NULL; 1962*38fd1498Szrj edge e; 1963*38fd1498Szrj rtx_insn *insn; 1964*38fd1498Szrj edge_iterator ei; 1965*38fd1498Szrj 1966*38fd1498Szrj FOR_EACH_EDGE (e, ei, jump_dest->preds) 1967*38fd1498Szrj if (e->flags & EDGE_CROSSING) 1968*38fd1498Szrj { 1969*38fd1498Szrj basic_block src = e->src; 1970*38fd1498Szrj 1971*38fd1498Szrj /* Check each predecessor to see if it has a label, and contains 1972*38fd1498Szrj only one executable instruction, which is an unconditional jump. 1973*38fd1498Szrj If so, we can use it. */ 1974*38fd1498Szrj 1975*38fd1498Szrj if (LABEL_P (BB_HEAD (src))) 1976*38fd1498Szrj for (insn = BB_HEAD (src); 1977*38fd1498Szrj !INSN_P (insn) && insn != NEXT_INSN (BB_END (src)); 1978*38fd1498Szrj insn = NEXT_INSN (insn)) 1979*38fd1498Szrj { 1980*38fd1498Szrj if (INSN_P (insn) 1981*38fd1498Szrj && insn == BB_END (src) 1982*38fd1498Szrj && JUMP_P (insn) 1983*38fd1498Szrj && !any_condjump_p (insn)) 1984*38fd1498Szrj { 1985*38fd1498Szrj source_bb = src; 1986*38fd1498Szrj break; 1987*38fd1498Szrj } 1988*38fd1498Szrj } 1989*38fd1498Szrj 1990*38fd1498Szrj if (source_bb) 1991*38fd1498Szrj break; 1992*38fd1498Szrj } 1993*38fd1498Szrj 1994*38fd1498Szrj return source_bb; 1995*38fd1498Szrj } 1996*38fd1498Szrj 1997*38fd1498Szrj /* Find all BB's with conditional jumps that are crossing edges; 1998*38fd1498Szrj insert a new bb and make the conditional jump branch to the new 1999*38fd1498Szrj bb instead (make the new bb same color so conditional branch won't 2000*38fd1498Szrj be a 'crossing' edge). Insert an unconditional jump from the 2001*38fd1498Szrj new bb to the original destination of the conditional jump. */ 2002*38fd1498Szrj 2003*38fd1498Szrj static void 2004*38fd1498Szrj fix_crossing_conditional_branches (void) 2005*38fd1498Szrj { 2006*38fd1498Szrj basic_block cur_bb; 2007*38fd1498Szrj basic_block new_bb; 2008*38fd1498Szrj basic_block dest; 2009*38fd1498Szrj edge succ1; 2010*38fd1498Szrj edge succ2; 2011*38fd1498Szrj edge crossing_edge; 2012*38fd1498Szrj edge new_edge; 2013*38fd1498Szrj rtx set_src; 2014*38fd1498Szrj rtx old_label = NULL_RTX; 2015*38fd1498Szrj rtx_code_label *new_label; 2016*38fd1498Szrj 2017*38fd1498Szrj FOR_EACH_BB_FN (cur_bb, cfun) 2018*38fd1498Szrj { 2019*38fd1498Szrj crossing_edge = NULL; 2020*38fd1498Szrj if (EDGE_COUNT (cur_bb->succs) > 0) 2021*38fd1498Szrj succ1 = EDGE_SUCC (cur_bb, 0); 2022*38fd1498Szrj else 2023*38fd1498Szrj succ1 = NULL; 2024*38fd1498Szrj 2025*38fd1498Szrj if (EDGE_COUNT (cur_bb->succs) > 1) 2026*38fd1498Szrj succ2 = EDGE_SUCC (cur_bb, 1); 2027*38fd1498Szrj else 2028*38fd1498Szrj succ2 = NULL; 2029*38fd1498Szrj 2030*38fd1498Szrj /* We already took care of fall-through edges, so only one successor 2031*38fd1498Szrj can be a crossing edge. 
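Any crossing fallthru edge has already been rerouted by fix_up_fall_thru_edges, and crossing unconditional jumps are dealt with separately afterwards, so at most one of the two successors examined here can still cross.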
*/ 2032*38fd1498Szrj 2033*38fd1498Szrj if (succ1 && (succ1->flags & EDGE_CROSSING)) 2034*38fd1498Szrj crossing_edge = succ1; 2035*38fd1498Szrj else if (succ2 && (succ2->flags & EDGE_CROSSING)) 2036*38fd1498Szrj crossing_edge = succ2; 2037*38fd1498Szrj 2038*38fd1498Szrj if (crossing_edge) 2039*38fd1498Szrj { 2040*38fd1498Szrj rtx_insn *old_jump = BB_END (cur_bb); 2041*38fd1498Szrj 2042*38fd1498Szrj /* Check to make sure the jump instruction is a 2043*38fd1498Szrj conditional jump. */ 2044*38fd1498Szrj 2045*38fd1498Szrj set_src = NULL_RTX; 2046*38fd1498Szrj 2047*38fd1498Szrj if (any_condjump_p (old_jump)) 2048*38fd1498Szrj { 2049*38fd1498Szrj if (GET_CODE (PATTERN (old_jump)) == SET) 2050*38fd1498Szrj set_src = SET_SRC (PATTERN (old_jump)); 2051*38fd1498Szrj else if (GET_CODE (PATTERN (old_jump)) == PARALLEL) 2052*38fd1498Szrj { 2053*38fd1498Szrj set_src = XVECEXP (PATTERN (old_jump), 0,0); 2054*38fd1498Szrj if (GET_CODE (set_src) == SET) 2055*38fd1498Szrj set_src = SET_SRC (set_src); 2056*38fd1498Szrj else 2057*38fd1498Szrj set_src = NULL_RTX; 2058*38fd1498Szrj } 2059*38fd1498Szrj } 2060*38fd1498Szrj 2061*38fd1498Szrj if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE)) 2062*38fd1498Szrj { 2063*38fd1498Szrj rtx_jump_insn *old_jump_insn = 2064*38fd1498Szrj as_a <rtx_jump_insn *> (old_jump); 2065*38fd1498Szrj 2066*38fd1498Szrj if (GET_CODE (XEXP (set_src, 1)) == PC) 2067*38fd1498Szrj old_label = XEXP (set_src, 2); 2068*38fd1498Szrj else if (GET_CODE (XEXP (set_src, 2)) == PC) 2069*38fd1498Szrj old_label = XEXP (set_src, 1); 2070*38fd1498Szrj 2071*38fd1498Szrj /* Check to see if new bb for jumping to that dest has 2072*38fd1498Szrj already been created; if so, use it; if not, create 2073*38fd1498Szrj a new one. */ 2074*38fd1498Szrj 2075*38fd1498Szrj new_bb = find_jump_block (crossing_edge->dest); 2076*38fd1498Szrj 2077*38fd1498Szrj if (new_bb) 2078*38fd1498Szrj new_label = block_label (new_bb); 2079*38fd1498Szrj else 2080*38fd1498Szrj { 2081*38fd1498Szrj basic_block last_bb; 2082*38fd1498Szrj rtx_code_label *old_jump_target; 2083*38fd1498Szrj rtx_jump_insn *new_jump; 2084*38fd1498Szrj 2085*38fd1498Szrj /* Create new basic block to be dest for 2086*38fd1498Szrj conditional jump. */ 2087*38fd1498Szrj 2088*38fd1498Szrj /* Put appropriate instructions in new bb. */ 2089*38fd1498Szrj 2090*38fd1498Szrj new_label = gen_label_rtx (); 2091*38fd1498Szrj emit_label (new_label); 2092*38fd1498Szrj 2093*38fd1498Szrj gcc_assert (GET_CODE (old_label) == LABEL_REF); 2094*38fd1498Szrj old_jump_target = old_jump_insn->jump_target (); 2095*38fd1498Szrj new_jump = as_a <rtx_jump_insn *> 2096*38fd1498Szrj (emit_jump_insn (targetm.gen_jump (old_jump_target))); 2097*38fd1498Szrj new_jump->set_jump_target (old_jump_target); 2098*38fd1498Szrj 2099*38fd1498Szrj last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; 2100*38fd1498Szrj new_bb = create_basic_block (new_label, new_jump, last_bb); 2101*38fd1498Szrj new_bb->aux = last_bb->aux; 2102*38fd1498Szrj last_bb->aux = new_bb; 2103*38fd1498Szrj 2104*38fd1498Szrj emit_barrier_after_bb (new_bb); 2105*38fd1498Szrj 2106*38fd1498Szrj /* Make sure new bb is in same partition as source 2107*38fd1498Szrj of conditional branch. */ 2108*38fd1498Szrj BB_COPY_PARTITION (new_bb, cur_bb); 2109*38fd1498Szrj } 2110*38fd1498Szrj 2111*38fd1498Szrj /* Make old jump branch to new bb. */ 2112*38fd1498Szrj 2113*38fd1498Szrj redirect_jump (old_jump_insn, new_label, 0); 2114*38fd1498Szrj 2115*38fd1498Szrj /* Remove crossing_edge as predecessor of 'dest'. 
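The conditional jump keeps a non-crossing edge to new_bb in its own partition, and the crossing is taken over by the unconditional jump from new_bb to 'dest' (see the flag updates below).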
*/ 2116*38fd1498Szrj 2117*38fd1498Szrj dest = crossing_edge->dest; 2118*38fd1498Szrj 2119*38fd1498Szrj redirect_edge_succ (crossing_edge, new_bb); 2120*38fd1498Szrj 2121*38fd1498Szrj /* Make a new edge from new_bb to old dest; new edge 2122*38fd1498Szrj will be a successor for new_bb and a predecessor 2123*38fd1498Szrj for 'dest'. */ 2124*38fd1498Szrj 2125*38fd1498Szrj if (EDGE_COUNT (new_bb->succs) == 0) 2126*38fd1498Szrj new_edge = make_single_succ_edge (new_bb, dest, 0); 2127*38fd1498Szrj else 2128*38fd1498Szrj new_edge = EDGE_SUCC (new_bb, 0); 2129*38fd1498Szrj 2130*38fd1498Szrj crossing_edge->flags &= ~EDGE_CROSSING; 2131*38fd1498Szrj new_edge->flags |= EDGE_CROSSING; 2132*38fd1498Szrj } 2133*38fd1498Szrj } 2134*38fd1498Szrj } 2135*38fd1498Szrj } 2136*38fd1498Szrj 2137*38fd1498Szrj /* Find any unconditional branches that cross between hot and cold 2138*38fd1498Szrj sections. Convert them into indirect jumps instead. */ 2139*38fd1498Szrj 2140*38fd1498Szrj static void 2141*38fd1498Szrj fix_crossing_unconditional_branches (void) 2142*38fd1498Szrj { 2143*38fd1498Szrj basic_block cur_bb; 2144*38fd1498Szrj rtx_insn *last_insn; 2145*38fd1498Szrj rtx label; 2146*38fd1498Szrj rtx label_addr; 2147*38fd1498Szrj rtx_insn *indirect_jump_sequence; 2148*38fd1498Szrj rtx_insn *jump_insn = NULL; 2149*38fd1498Szrj rtx new_reg; 2150*38fd1498Szrj rtx_insn *cur_insn; 2151*38fd1498Szrj edge succ; 2152*38fd1498Szrj 2153*38fd1498Szrj FOR_EACH_BB_FN (cur_bb, cfun) 2154*38fd1498Szrj { 2155*38fd1498Szrj last_insn = BB_END (cur_bb); 2156*38fd1498Szrj 2157*38fd1498Szrj if (EDGE_COUNT (cur_bb->succs) < 1) 2158*38fd1498Szrj continue; 2159*38fd1498Szrj 2160*38fd1498Szrj succ = EDGE_SUCC (cur_bb, 0); 2161*38fd1498Szrj 2162*38fd1498Szrj /* Check to see if bb ends in a crossing (unconditional) jump. At 2163*38fd1498Szrj this point, no crossing jumps should be conditional. */ 2164*38fd1498Szrj 2165*38fd1498Szrj if (JUMP_P (last_insn) 2166*38fd1498Szrj && (succ->flags & EDGE_CROSSING)) 2167*38fd1498Szrj { 2168*38fd1498Szrj gcc_assert (!any_condjump_p (last_insn)); 2169*38fd1498Szrj 2170*38fd1498Szrj /* Make sure the jump is not already an indirect or table jump. */ 2171*38fd1498Szrj 2172*38fd1498Szrj if (!computed_jump_p (last_insn) 2173*38fd1498Szrj && !tablejump_p (last_insn, NULL, NULL)) 2174*38fd1498Szrj { 2175*38fd1498Szrj /* We have found a "crossing" unconditional branch. Now 2176*38fd1498Szrj we must convert it to an indirect jump. First create 2177*38fd1498Szrj reference of label, as target for jump. */ 2178*38fd1498Szrj 2179*38fd1498Szrj label = JUMP_LABEL (last_insn); 2180*38fd1498Szrj label_addr = gen_rtx_LABEL_REF (Pmode, label); 2181*38fd1498Szrj LABEL_NUSES (label) += 1; 2182*38fd1498Szrj 2183*38fd1498Szrj /* Get a register to use for the indirect jump. */ 2184*38fd1498Szrj 2185*38fd1498Szrj new_reg = gen_reg_rtx (Pmode); 2186*38fd1498Szrj 2187*38fd1498Szrj /* Generate indirect the jump sequence. */ 2188*38fd1498Szrj 2189*38fd1498Szrj start_sequence (); 2190*38fd1498Szrj emit_move_insn (new_reg, label_addr); 2191*38fd1498Szrj emit_indirect_jump (new_reg); 2192*38fd1498Szrj indirect_jump_sequence = get_insns (); 2193*38fd1498Szrj end_sequence (); 2194*38fd1498Szrj 2195*38fd1498Szrj /* Make sure every instruction in the new jump sequence has 2196*38fd1498Szrj its basic block set to be cur_bb. 
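While doing so, remember the jump insn itself; once the old jump is deleted it becomes the new BB_END of cur_bb.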
*/ 2197*38fd1498Szrj 2198*38fd1498Szrj for (cur_insn = indirect_jump_sequence; cur_insn; 2199*38fd1498Szrj cur_insn = NEXT_INSN (cur_insn)) 2200*38fd1498Szrj { 2201*38fd1498Szrj if (!BARRIER_P (cur_insn)) 2202*38fd1498Szrj BLOCK_FOR_INSN (cur_insn) = cur_bb; 2203*38fd1498Szrj if (JUMP_P (cur_insn)) 2204*38fd1498Szrj jump_insn = cur_insn; 2205*38fd1498Szrj } 2206*38fd1498Szrj 2207*38fd1498Szrj /* Insert the new (indirect) jump sequence immediately before 2208*38fd1498Szrj the unconditional jump, then delete the unconditional jump. */ 2209*38fd1498Szrj 2210*38fd1498Szrj emit_insn_before (indirect_jump_sequence, last_insn); 2211*38fd1498Szrj delete_insn (last_insn); 2212*38fd1498Szrj 2213*38fd1498Szrj JUMP_LABEL (jump_insn) = label; 2214*38fd1498Szrj LABEL_NUSES (label)++; 2215*38fd1498Szrj 2216*38fd1498Szrj /* Make BB_END for cur_bb be the jump instruction (NOT the 2217*38fd1498Szrj barrier instruction at the end of the sequence...). */ 2218*38fd1498Szrj 2219*38fd1498Szrj BB_END (cur_bb) = jump_insn; 2220*38fd1498Szrj } 2221*38fd1498Szrj } 2222*38fd1498Szrj } 2223*38fd1498Szrj } 2224*38fd1498Szrj 2225*38fd1498Szrj /* Update CROSSING_JUMP_P flags on all jump insns. */ 2226*38fd1498Szrj 2227*38fd1498Szrj static void 2228*38fd1498Szrj update_crossing_jump_flags (void) 2229*38fd1498Szrj { 2230*38fd1498Szrj basic_block bb; 2231*38fd1498Szrj edge e; 2232*38fd1498Szrj edge_iterator ei; 2233*38fd1498Szrj 2234*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 2235*38fd1498Szrj FOR_EACH_EDGE (e, ei, bb->succs) 2236*38fd1498Szrj if (e->flags & EDGE_CROSSING) 2237*38fd1498Szrj { 2238*38fd1498Szrj if (JUMP_P (BB_END (bb))) 2239*38fd1498Szrj CROSSING_JUMP_P (BB_END (bb)) = 1; 2240*38fd1498Szrj break; 2241*38fd1498Szrj } 2242*38fd1498Szrj } 2243*38fd1498Szrj 2244*38fd1498Szrj /* Reorder basic blocks using the software trace cache (STC) algorithm. */ 2245*38fd1498Szrj 2246*38fd1498Szrj static void 2247*38fd1498Szrj reorder_basic_blocks_software_trace_cache (void) 2248*38fd1498Szrj { 2249*38fd1498Szrj if (dump_file) 2250*38fd1498Szrj fprintf (dump_file, "\nReordering with the STC algorithm.\n\n"); 2251*38fd1498Szrj 2252*38fd1498Szrj int n_traces; 2253*38fd1498Szrj int i; 2254*38fd1498Szrj struct trace *traces; 2255*38fd1498Szrj 2256*38fd1498Szrj /* We are estimating the length of uncond jump insn only once since the code 2257*38fd1498Szrj for getting the insn length always returns the minimal length now. */ 2258*38fd1498Szrj if (uncond_jump_length == 0) 2259*38fd1498Szrj uncond_jump_length = get_uncond_jump_length (); 2260*38fd1498Szrj 2261*38fd1498Szrj /* We need to know some information for each basic block. 
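The bbd array below is indexed by basic block index; for each block it records which trace (if any) starts or ends at that block, which trace the block is currently in, and the heap bookkeeping used while traces are being built. As an illustration (E stands for an arbitrary edge), connecting the current trace to the trace that starts at E's destination is done the same way as in the trace-connection code earlier in this file:

      int t = bbd[E->dest->index].start_of_trace;
      traces[last_trace].last->aux = traces[t].first;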
*/ 2262*38fd1498Szrj array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun)); 2263*38fd1498Szrj bbd = XNEWVEC (bbro_basic_block_data, array_size); 2264*38fd1498Szrj for (i = 0; i < array_size; i++) 2265*38fd1498Szrj { 2266*38fd1498Szrj bbd[i].start_of_trace = -1; 2267*38fd1498Szrj bbd[i].end_of_trace = -1; 2268*38fd1498Szrj bbd[i].in_trace = -1; 2269*38fd1498Szrj bbd[i].visited = 0; 2270*38fd1498Szrj bbd[i].priority = -1; 2271*38fd1498Szrj bbd[i].heap = NULL; 2272*38fd1498Szrj bbd[i].node = NULL; 2273*38fd1498Szrj } 2274*38fd1498Szrj 2275*38fd1498Szrj traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun)); 2276*38fd1498Szrj n_traces = 0; 2277*38fd1498Szrj find_traces (&n_traces, traces); 2278*38fd1498Szrj connect_traces (n_traces, traces); 2279*38fd1498Szrj FREE (traces); 2280*38fd1498Szrj FREE (bbd); 2281*38fd1498Szrj } 2282*38fd1498Szrj 2283*38fd1498Szrj /* Return true if edge E1 is more desirable as a fallthrough edge than 2284*38fd1498Szrj edge E2 is. */ 2285*38fd1498Szrj 2286*38fd1498Szrj static bool 2287*38fd1498Szrj edge_order (edge e1, edge e2) 2288*38fd1498Szrj { 2289*38fd1498Szrj return e1->count () > e2->count (); 2290*38fd1498Szrj } 2291*38fd1498Szrj 2292*38fd1498Szrj /* Reorder basic blocks using the "simple" algorithm. This tries to 2293*38fd1498Szrj maximize the dynamic number of branches that are fallthrough, without 2294*38fd1498Szrj copying instructions. The algorithm is greedy, looking at the most 2295*38fd1498Szrj frequently executed branch first. */ 2296*38fd1498Szrj 2297*38fd1498Szrj static void 2298*38fd1498Szrj reorder_basic_blocks_simple (void) 2299*38fd1498Szrj { 2300*38fd1498Szrj if (dump_file) 2301*38fd1498Szrj fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n"); 2302*38fd1498Szrj 2303*38fd1498Szrj edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)]; 2304*38fd1498Szrj 2305*38fd1498Szrj /* First, collect all edges that can be optimized by reordering blocks: 2306*38fd1498Szrj simple jumps and conditional jumps, as well as the function entry edge. */ 2307*38fd1498Szrj 2308*38fd1498Szrj int n = 0; 2309*38fd1498Szrj edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); 2310*38fd1498Szrj 2311*38fd1498Szrj basic_block bb; 2312*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 2313*38fd1498Szrj { 2314*38fd1498Szrj rtx_insn *end = BB_END (bb); 2315*38fd1498Szrj 2316*38fd1498Szrj if (computed_jump_p (end) || tablejump_p (end, NULL, NULL)) 2317*38fd1498Szrj continue; 2318*38fd1498Szrj 2319*38fd1498Szrj /* We cannot optimize asm goto. */ 2320*38fd1498Szrj if (JUMP_P (end) && extract_asm_operands (end)) 2321*38fd1498Szrj continue; 2322*38fd1498Szrj 2323*38fd1498Szrj if (single_succ_p (bb)) 2324*38fd1498Szrj edges[n++] = EDGE_SUCC (bb, 0); 2325*38fd1498Szrj else if (any_condjump_p (end)) 2326*38fd1498Szrj { 2327*38fd1498Szrj edge e0 = EDGE_SUCC (bb, 0); 2328*38fd1498Szrj edge e1 = EDGE_SUCC (bb, 1); 2329*38fd1498Szrj /* When optimizing for size it is best to keep the original 2330*38fd1498Szrj fallthrough edges. */ 2331*38fd1498Szrj if (e1->flags & EDGE_FALLTHRU) 2332*38fd1498Szrj std::swap (e0, e1); 2333*38fd1498Szrj edges[n++] = e0; 2334*38fd1498Szrj edges[n++] = e1; 2335*38fd1498Szrj } 2336*38fd1498Szrj } 2337*38fd1498Szrj 2338*38fd1498Szrj /* Sort the edges, the most desirable first. When optimizing for size 2339*38fd1498Szrj all edges are equally desirable. 
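(When optimizing for size the sort is skipped entirely, so the edges keep the order in which they were collected above.)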
*/ 2340*38fd1498Szrj 2341*38fd1498Szrj if (optimize_function_for_speed_p (cfun)) 2342*38fd1498Szrj std::stable_sort (edges, edges + n, edge_order); 2343*38fd1498Szrj 2344*38fd1498Szrj /* Now decide which of those edges to make fallthrough edges. We set 2345*38fd1498Szrj BB_VISITED if a block already has a fallthrough successor assigned 2346*38fd1498Szrj to it. We make ->AUX of an endpoint point to the opposite endpoint 2347*38fd1498Szrj of a sequence of blocks that fall through, and ->AUX will be NULL 2348*38fd1498Szrj for a block that is in such a sequence but not an endpoint anymore. 2349*38fd1498Szrj 2350*38fd1498Szrj To start with, everything points to itself, nothing is assigned yet. */ 2351*38fd1498Szrj 2352*38fd1498Szrj FOR_ALL_BB_FN (bb, cfun) 2353*38fd1498Szrj { 2354*38fd1498Szrj bb->aux = bb; 2355*38fd1498Szrj bb->flags &= ~BB_VISITED; 2356*38fd1498Szrj } 2357*38fd1498Szrj 2358*38fd1498Szrj EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0; 2359*38fd1498Szrj 2360*38fd1498Szrj /* Now for all edges, the most desirable first, see if that edge can 2361*38fd1498Szrj connect two sequences. If it can, update AUX and BB_VISITED; if it 2362*38fd1498Szrj cannot, zero out the edge in the table. */ 2363*38fd1498Szrj 2364*38fd1498Szrj for (int j = 0; j < n; j++) 2365*38fd1498Szrj { 2366*38fd1498Szrj edge e = edges[j]; 2367*38fd1498Szrj 2368*38fd1498Szrj basic_block tail_a = e->src; 2369*38fd1498Szrj basic_block head_b = e->dest; 2370*38fd1498Szrj basic_block head_a = (basic_block) tail_a->aux; 2371*38fd1498Szrj basic_block tail_b = (basic_block) head_b->aux; 2372*38fd1498Szrj 2373*38fd1498Szrj /* An edge cannot connect two sequences if: 2374*38fd1498Szrj - it crosses partitions; 2375*38fd1498Szrj - its src is not a current endpoint; 2376*38fd1498Szrj - its dest is not a current endpoint; 2377*38fd1498Szrj - or, it would create a loop. */ 2378*38fd1498Szrj 2379*38fd1498Szrj if (e->flags & EDGE_CROSSING 2380*38fd1498Szrj || tail_a->flags & BB_VISITED 2381*38fd1498Szrj || !tail_b 2382*38fd1498Szrj || (!(head_b->flags & BB_VISITED) && head_b != tail_b) 2383*38fd1498Szrj || tail_a == tail_b) 2384*38fd1498Szrj { 2385*38fd1498Szrj edges[j] = 0; 2386*38fd1498Szrj continue; 2387*38fd1498Szrj } 2388*38fd1498Szrj 2389*38fd1498Szrj tail_a->aux = 0; 2390*38fd1498Szrj head_b->aux = 0; 2391*38fd1498Szrj head_a->aux = tail_b; 2392*38fd1498Szrj tail_b->aux = head_a; 2393*38fd1498Szrj tail_a->flags |= BB_VISITED; 2394*38fd1498Szrj } 2395*38fd1498Szrj 2396*38fd1498Szrj /* Put the pieces together, in the same order that the start blocks of 2397*38fd1498Szrj the sequences already had. The hot/cold partitioning gives a little 2398*38fd1498Szrj complication: as a first pass only do this for blocks in the same 2399*38fd1498Szrj partition as the start block, and (if there is anything left to do) 2400*38fd1498Szrj in a second pass handle the other partition. */ 2401*38fd1498Szrj 2402*38fd1498Szrj basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; 2403*38fd1498Szrj 2404*38fd1498Szrj int current_partition 2405*38fd1498Szrj = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun) 2406*38fd1498Szrj ? 
EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest 2407*38fd1498Szrj : last_tail); 2408*38fd1498Szrj bool need_another_pass = true; 2409*38fd1498Szrj 2410*38fd1498Szrj for (int pass = 0; pass < 2 && need_another_pass; pass++) 2411*38fd1498Szrj { 2412*38fd1498Szrj need_another_pass = false; 2413*38fd1498Szrj 2414*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 2415*38fd1498Szrj if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb) 2416*38fd1498Szrj { 2417*38fd1498Szrj if (BB_PARTITION (bb) != current_partition) 2418*38fd1498Szrj { 2419*38fd1498Szrj need_another_pass = true; 2420*38fd1498Szrj continue; 2421*38fd1498Szrj } 2422*38fd1498Szrj 2423*38fd1498Szrj last_tail->aux = bb; 2424*38fd1498Szrj last_tail = (basic_block) bb->aux; 2425*38fd1498Szrj } 2426*38fd1498Szrj 2427*38fd1498Szrj current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION; 2428*38fd1498Szrj } 2429*38fd1498Szrj 2430*38fd1498Szrj last_tail->aux = 0; 2431*38fd1498Szrj 2432*38fd1498Szrj /* Finally, link all the chosen fallthrough edges. */ 2433*38fd1498Szrj 2434*38fd1498Szrj for (int j = 0; j < n; j++) 2435*38fd1498Szrj if (edges[j]) 2436*38fd1498Szrj edges[j]->src->aux = edges[j]->dest; 2437*38fd1498Szrj 2438*38fd1498Szrj delete[] edges; 2439*38fd1498Szrj 2440*38fd1498Szrj /* If the entry edge no longer falls through we have to make a new 2441*38fd1498Szrj block so it can do so again. */ 2442*38fd1498Szrj 2443*38fd1498Szrj edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); 2444*38fd1498Szrj if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux) 2445*38fd1498Szrj { 2446*38fd1498Szrj force_nonfallthru (e); 2447*38fd1498Szrj e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; 2448*38fd1498Szrj } 2449*38fd1498Szrj } 2450*38fd1498Szrj 2451*38fd1498Szrj /* Reorder basic blocks. The main entry point to this file. */ 2452*38fd1498Szrj 2453*38fd1498Szrj static void 2454*38fd1498Szrj reorder_basic_blocks (void) 2455*38fd1498Szrj { 2456*38fd1498Szrj gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT); 2457*38fd1498Szrj 2458*38fd1498Szrj if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1) 2459*38fd1498Szrj return; 2460*38fd1498Szrj 2461*38fd1498Szrj set_edge_can_fallthru_flag (); 2462*38fd1498Szrj mark_dfs_back_edges (); 2463*38fd1498Szrj 2464*38fd1498Szrj switch (flag_reorder_blocks_algorithm) 2465*38fd1498Szrj { 2466*38fd1498Szrj case REORDER_BLOCKS_ALGORITHM_SIMPLE: 2467*38fd1498Szrj reorder_basic_blocks_simple (); 2468*38fd1498Szrj break; 2469*38fd1498Szrj 2470*38fd1498Szrj case REORDER_BLOCKS_ALGORITHM_STC: 2471*38fd1498Szrj reorder_basic_blocks_software_trace_cache (); 2472*38fd1498Szrj break; 2473*38fd1498Szrj 2474*38fd1498Szrj default: 2475*38fd1498Szrj gcc_unreachable (); 2476*38fd1498Szrj } 2477*38fd1498Szrj 2478*38fd1498Szrj relink_block_chain (/*stay_in_cfglayout_mode=*/true); 2479*38fd1498Szrj 2480*38fd1498Szrj if (dump_file) 2481*38fd1498Szrj { 2482*38fd1498Szrj if (dump_flags & TDF_DETAILS) 2483*38fd1498Szrj dump_reg_info (dump_file); 2484*38fd1498Szrj dump_flow_info (dump_file, dump_flags); 2485*38fd1498Szrj } 2486*38fd1498Szrj 2487*38fd1498Szrj /* Signal that rtl_verify_flow_info_1 can now verify that there 2488*38fd1498Szrj is at most one switch between hot/cold sections. 
*/ 2489*38fd1498Szrj crtl->bb_reorder_complete = true; 2490*38fd1498Szrj } 2491*38fd1498Szrj 2492*38fd1498Szrj /* Determine which partition the first basic block in the function 2493*38fd1498Szrj belongs to, then find the first basic block in the current function 2494*38fd1498Szrj that belongs to a different section, and insert a 2495*38fd1498Szrj NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the 2496*38fd1498Szrj instruction stream. When writing out the assembly code, 2497*38fd1498Szrj encountering this note will make the compiler switch between the 2498*38fd1498Szrj hot and cold text sections. */ 2499*38fd1498Szrj 2500*38fd1498Szrj void 2501*38fd1498Szrj insert_section_boundary_note (void) 2502*38fd1498Szrj { 2503*38fd1498Szrj basic_block bb; 2504*38fd1498Szrj bool switched_sections = false; 2505*38fd1498Szrj int current_partition = 0; 2506*38fd1498Szrj 2507*38fd1498Szrj if (!crtl->has_bb_partition) 2508*38fd1498Szrj return; 2509*38fd1498Szrj 2510*38fd1498Szrj FOR_EACH_BB_FN (bb, cfun) 2511*38fd1498Szrj { 2512*38fd1498Szrj if (!current_partition) 2513*38fd1498Szrj current_partition = BB_PARTITION (bb); 2514*38fd1498Szrj if (BB_PARTITION (bb) != current_partition) 2515*38fd1498Szrj { 2516*38fd1498Szrj gcc_assert (!switched_sections); 2517*38fd1498Szrj switched_sections = true; 2518*38fd1498Szrj emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb)); 2519*38fd1498Szrj current_partition = BB_PARTITION (bb); 2520*38fd1498Szrj } 2521*38fd1498Szrj } 2522*38fd1498Szrj 2523*38fd1498Szrj /* Make sure crtl->has_bb_partition matches reality even if bbpart finds 2524*38fd1498Szrj some hot and some cold basic blocks, but later one of those kinds is 2525*38fd1498Szrj optimized away. */ 2526*38fd1498Szrj crtl->has_bb_partition = switched_sections; 2527*38fd1498Szrj } 2528*38fd1498Szrj 2529*38fd1498Szrj namespace { 2530*38fd1498Szrj 2531*38fd1498Szrj const pass_data pass_data_reorder_blocks = 2532*38fd1498Szrj { 2533*38fd1498Szrj RTL_PASS, /* type */ 2534*38fd1498Szrj "bbro", /* name */ 2535*38fd1498Szrj OPTGROUP_NONE, /* optinfo_flags */ 2536*38fd1498Szrj TV_REORDER_BLOCKS, /* tv_id */ 2537*38fd1498Szrj 0, /* properties_required */ 2538*38fd1498Szrj 0, /* properties_provided */ 2539*38fd1498Szrj 0, /* properties_destroyed */ 2540*38fd1498Szrj 0, /* todo_flags_start */ 2541*38fd1498Szrj 0, /* todo_flags_finish */ 2542*38fd1498Szrj }; 2543*38fd1498Szrj 2544*38fd1498Szrj class pass_reorder_blocks : public rtl_opt_pass 2545*38fd1498Szrj { 2546*38fd1498Szrj public: 2547*38fd1498Szrj pass_reorder_blocks (gcc::context *ctxt) 2548*38fd1498Szrj : rtl_opt_pass (pass_data_reorder_blocks, ctxt) 2549*38fd1498Szrj {} 2550*38fd1498Szrj 2551*38fd1498Szrj /* opt_pass methods: */ 2552*38fd1498Szrj virtual bool gate (function *) 2553*38fd1498Szrj { 2554*38fd1498Szrj if (targetm.cannot_modify_jumps_p ()) 2555*38fd1498Szrj return false; 2556*38fd1498Szrj return (optimize > 0 2557*38fd1498Szrj && (flag_reorder_blocks || flag_reorder_blocks_and_partition)); 2558*38fd1498Szrj } 2559*38fd1498Szrj 2560*38fd1498Szrj virtual unsigned int execute (function *); 2561*38fd1498Szrj 2562*38fd1498Szrj }; // class pass_reorder_blocks 2563*38fd1498Szrj 2564*38fd1498Szrj unsigned int 2565*38fd1498Szrj pass_reorder_blocks::execute (function *fun) 2566*38fd1498Szrj { 2567*38fd1498Szrj basic_block bb; 2568*38fd1498Szrj 2569*38fd1498Szrj /* Last attempt to optimize CFG, as scheduling, peepholing and insn 2570*38fd1498Szrj splitting possibly introduced more crossjumping opportunities. 
*/ 2571*38fd1498Szrj cfg_layout_initialize (CLEANUP_EXPENSIVE); 2572*38fd1498Szrj 2573*38fd1498Szrj reorder_basic_blocks (); 2574*38fd1498Szrj cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING); 2575*38fd1498Szrj 2576*38fd1498Szrj FOR_EACH_BB_FN (bb, fun) 2577*38fd1498Szrj if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun)) 2578*38fd1498Szrj bb->aux = bb->next_bb; 2579*38fd1498Szrj cfg_layout_finalize (); 2580*38fd1498Szrj 2581*38fd1498Szrj return 0; 2582*38fd1498Szrj } 2583*38fd1498Szrj 2584*38fd1498Szrj } // anon namespace 2585*38fd1498Szrj 2586*38fd1498Szrj rtl_opt_pass * 2587*38fd1498Szrj make_pass_reorder_blocks (gcc::context *ctxt) 2588*38fd1498Szrj { 2589*38fd1498Szrj return new pass_reorder_blocks (ctxt); 2590*38fd1498Szrj } 2591*38fd1498Szrj 2592*38fd1498Szrj /* Duplicate a block (that we already know ends in a computed jump) into its 2593*38fd1498Szrj predecessors, where possible. Return whether anything is changed. */ 2594*38fd1498Szrj static bool 2595*38fd1498Szrj maybe_duplicate_computed_goto (basic_block bb, int max_size) 2596*38fd1498Szrj { 2597*38fd1498Szrj if (single_pred_p (bb)) 2598*38fd1498Szrj return false; 2599*38fd1498Szrj 2600*38fd1498Szrj /* Make sure that the block is small enough. */ 2601*38fd1498Szrj rtx_insn *insn; 2602*38fd1498Szrj FOR_BB_INSNS (bb, insn) 2603*38fd1498Szrj if (INSN_P (insn)) 2604*38fd1498Szrj { 2605*38fd1498Szrj max_size -= get_attr_min_length (insn); 2606*38fd1498Szrj if (max_size < 0) 2607*38fd1498Szrj return false; 2608*38fd1498Szrj } 2609*38fd1498Szrj 2610*38fd1498Szrj bool changed = false; 2611*38fd1498Szrj edge e; 2612*38fd1498Szrj edge_iterator ei; 2613*38fd1498Szrj for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) 2614*38fd1498Szrj { 2615*38fd1498Szrj basic_block pred = e->src; 2616*38fd1498Szrj 2617*38fd1498Szrj /* Do not duplicate BB into PRED if that is the last predecessor, or if 2618*38fd1498Szrj we cannot merge a copy of BB with PRED. */ 2619*38fd1498Szrj if (single_pred_p (bb) 2620*38fd1498Szrj || !single_succ_p (pred) 2621*38fd1498Szrj || e->flags & EDGE_COMPLEX 2622*38fd1498Szrj || pred->index < NUM_FIXED_BLOCKS 2623*38fd1498Szrj || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred))) 2624*38fd1498Szrj || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred)))) 2625*38fd1498Szrj { 2626*38fd1498Szrj ei_next (&ei); 2627*38fd1498Szrj continue; 2628*38fd1498Szrj } 2629*38fd1498Szrj 2630*38fd1498Szrj if (dump_file) 2631*38fd1498Szrj fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n", 2632*38fd1498Szrj bb->index, e->src->index); 2633*38fd1498Szrj 2634*38fd1498Szrj /* Remember if PRED can be duplicated; if so, the copy of BB merged 2635*38fd1498Szrj with PRED can be duplicated as well. */ 2636*38fd1498Szrj bool can_dup_more = can_duplicate_block_p (pred); 2637*38fd1498Szrj 2638*38fd1498Szrj /* Make a copy of BB, merge it into PRED. */ 2639*38fd1498Szrj basic_block copy = duplicate_block (bb, e, NULL); 2640*38fd1498Szrj emit_barrier_after_bb (copy); 2641*38fd1498Szrj reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred)); 2642*38fd1498Szrj merge_blocks (pred, copy); 2643*38fd1498Szrj 2644*38fd1498Szrj changed = true; 2645*38fd1498Szrj 2646*38fd1498Szrj /* Try to merge the resulting merged PRED into further predecessors. 
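PRED now ends in the copied computed jump, so the same transformation can be applied one level further up; the size budget MAX_SIZE is passed along unchanged.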
*/ 2647*38fd1498Szrj if (can_dup_more) 2648*38fd1498Szrj maybe_duplicate_computed_goto (pred, max_size); 2649*38fd1498Szrj } 2650*38fd1498Szrj 2651*38fd1498Szrj return changed; 2652*38fd1498Szrj } 2653*38fd1498Szrj 2654*38fd1498Szrj /* Duplicate the blocks containing computed gotos. This basically unfactors 2655*38fd1498Szrj computed gotos that were factored early on in the compilation process to 2656*38fd1498Szrj speed up edge based data flow. We used to not unfactor them again, which 2657*38fd1498Szrj can seriously pessimize code with many computed jumps in the source code, 2658*38fd1498Szrj such as interpreters. See e.g. PR15242. */ 2659*38fd1498Szrj static void 2660*38fd1498Szrj duplicate_computed_gotos (function *fun) 2661*38fd1498Szrj { 2662*38fd1498Szrj /* We are estimating the length of uncond jump insn only once 2663*38fd1498Szrj since the code for getting the insn length always returns 2664*38fd1498Szrj the minimal length now. */ 2665*38fd1498Szrj if (uncond_jump_length == 0) 2666*38fd1498Szrj uncond_jump_length = get_uncond_jump_length (); 2667*38fd1498Szrj 2668*38fd1498Szrj /* Never copy a block larger than this. */ 2669*38fd1498Szrj int max_size 2670*38fd1498Szrj = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS); 2671*38fd1498Szrj 2672*38fd1498Szrj bool changed = false; 2673*38fd1498Szrj 2674*38fd1498Szrj /* Try to duplicate all blocks that end in a computed jump and that 2675*38fd1498Szrj can be duplicated at all. */ 2676*38fd1498Szrj basic_block bb; 2677*38fd1498Szrj FOR_EACH_BB_FN (bb, fun) 2678*38fd1498Szrj if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb)) 2679*38fd1498Szrj changed |= maybe_duplicate_computed_goto (bb, max_size); 2680*38fd1498Szrj 2681*38fd1498Szrj /* Duplicating blocks will redirect edges and may cause hot blocks 2682*38fd1498Szrj previously reached by both hot and cold blocks to become dominated 2683*38fd1498Szrj only by cold blocks. */ 2684*38fd1498Szrj if (changed) 2685*38fd1498Szrj fixup_partitions (); 2686*38fd1498Szrj } 2687*38fd1498Szrj 2688*38fd1498Szrj namespace { 2689*38fd1498Szrj 2690*38fd1498Szrj const pass_data pass_data_duplicate_computed_gotos = 2691*38fd1498Szrj { 2692*38fd1498Szrj RTL_PASS, /* type */ 2693*38fd1498Szrj "compgotos", /* name */ 2694*38fd1498Szrj OPTGROUP_NONE, /* optinfo_flags */ 2695*38fd1498Szrj TV_REORDER_BLOCKS, /* tv_id */ 2696*38fd1498Szrj 0, /* properties_required */ 2697*38fd1498Szrj 0, /* properties_provided */ 2698*38fd1498Szrj 0, /* properties_destroyed */ 2699*38fd1498Szrj 0, /* todo_flags_start */ 2700*38fd1498Szrj 0, /* todo_flags_finish */ 2701*38fd1498Szrj }; 2702*38fd1498Szrj 2703*38fd1498Szrj class pass_duplicate_computed_gotos : public rtl_opt_pass 2704*38fd1498Szrj { 2705*38fd1498Szrj public: 2706*38fd1498Szrj pass_duplicate_computed_gotos (gcc::context *ctxt) 2707*38fd1498Szrj : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt) 2708*38fd1498Szrj {} 2709*38fd1498Szrj 2710*38fd1498Szrj /* opt_pass methods: */ 2711*38fd1498Szrj virtual bool gate (function *); 2712*38fd1498Szrj virtual unsigned int execute (function *); 2713*38fd1498Szrj 2714*38fd1498Szrj }; // class pass_duplicate_computed_gotos 2715*38fd1498Szrj 2716*38fd1498Szrj bool 2717*38fd1498Szrj pass_duplicate_computed_gotos::gate (function *fun) 2718*38fd1498Szrj { 2719*38fd1498Szrj if (targetm.cannot_modify_jumps_p ()) 2720*38fd1498Szrj return false; 2721*38fd1498Szrj return (optimize > 0 2722*38fd1498Szrj && flag_expensive_optimizations 2723*38fd1498Szrj && ! 
static void
duplicate_computed_gotos (function *fun)
{
  /* We estimate the length of an unconditional jump insn only once,
     since the code for getting the insn length always returns
     the minimal length now.  */
  if (uncond_jump_length == 0)
    uncond_jump_length = get_uncond_jump_length ();

  /* Never copy a block larger than this.  */
  int max_size
    = uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);

  bool changed = false;

  /* Try to duplicate all blocks that end in a computed jump and that
     can be duplicated at all.  */
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
      changed |= maybe_duplicate_computed_goto (bb, max_size);

  /* Duplicating blocks will redirect edges and may cause hot blocks
     previously reached by both hot and cold blocks to become dominated
     only by cold blocks.  */
  if (changed)
    fixup_partitions ();
}

namespace {

const pass_data pass_data_duplicate_computed_gotos =
{
  RTL_PASS, /* type */
  "compgotos", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_duplicate_computed_gotos : public rtl_opt_pass
{
public:
  pass_duplicate_computed_gotos (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_duplicate_computed_gotos
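
/* Block duplication only pays off for speed and always grows code, so
   the gate below requires -fexpensive-optimizations, skips functions
   optimized for size, and bails out when the target does not allow its
   jumps to be modified.  */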
bool
pass_duplicate_computed_gotos::gate (function *fun)
{
  if (targetm.cannot_modify_jumps_p ())
    return false;
  return (optimize > 0
          && flag_expensive_optimizations
          && !optimize_function_for_size_p (fun));
}

unsigned int
pass_duplicate_computed_gotos::execute (function *fun)
{
  duplicate_computed_gotos (fun);

  return 0;
}

} // anon namespace

rtl_opt_pass *
make_pass_duplicate_computed_gotos (gcc::context *ctxt)
{
  return new pass_duplicate_computed_gotos (ctxt);
}

/* This function is the main 'entrance' for the optimization that
   partitions hot and cold basic blocks into separate sections of the
   .o file (to improve performance and cache locality).  Ideally it
   would be called after all optimizations that rearrange the CFG have
   been called.  However part of this optimization may introduce new
   register usage, so it must be called before register allocation has
   occurred.  This means that this optimization is actually called
   well before the optimization that reorders basic blocks (see
   function above).

   This optimization checks the feedback information to determine
   which basic blocks are hot/cold, and updates flags on the basic
   blocks to indicate which section they belong in.  This information
   is later used for writing out sections in the .o file.  Because hot
   and cold sections can be arbitrarily large (within the bounds of
   memory), far beyond the size of a single function, it is necessary
   to fix up all edges that cross section boundaries, to make sure the
   instructions used can actually span the required distance.  The
   fixes are described below.

   Fall-through edges must be changed into jumps; it is not safe or
   legal to fall through across a section boundary.  Whenever a
   fall-through edge crossing a section boundary is encountered, a new
   basic block is inserted (in the same section as the fall-through
   source), and the fall-through edge is redirected to the new basic
   block.  The new basic block contains an unconditional jump to the
   original fall-through target.  (If the unconditional jump is
   insufficient to cross section boundaries, that is dealt with a
   little later, see below.)

   In order to deal with architectures that have short conditional
   branches (which cannot span all of memory) we take any conditional
   jump that attempts to cross a section boundary and add a level of
   indirection: it becomes a conditional jump to a new basic block, in
   the same section.  The new basic block contains an unconditional
   jump to the original target, in the other section.

   For those architectures whose unconditional branch is also
   incapable of reaching all of memory, those unconditional jumps are
   converted into indirect jumps, through a register.

   IMPORTANT NOTE: This optimization causes some messy interactions
   with the cfg cleanup optimizations; those optimizations want to
   merge blocks wherever possible, and to collapse indirect jump
   sequences (change "A jumps to B jumps to C" directly into "A jumps
   to C").  Those optimizations can undo the jump fixes that
   partitioning makes precisely to ensure that jumps attempting to
   cross section boundaries really can cover the required distance (on
   many architectures conditional or unconditional jumps cannot reach
   all of memory).  Therefore tests have to be inserted into each such
   optimization to make sure that it does not undo the fixes necessary
   for crossing partition boundaries.  This would be much less of a
   problem if we could perform this optimization later in the
   compilation, but unfortunately the fact that we may need to create
   indirect jumps (through registers) requires that this optimization
   be performed before register allocation.

   Hot and cold basic blocks are partitioned and put in separate
   sections of the .o file, to reduce paging and improve cache
   performance (hopefully).  This can result in bits of code from the
   same function being widely separated in the .o file.  However, the
   basic block structure itself does not record this separation, so we
   must take care to ensure that: 1). There are no fall_thru edges
   that cross between sections; 2). For those architectures which have
   "short" conditional branches, all conditional branches that attempt
   to cross between sections are converted to unconditional branches;
   and, 3). For those architectures which have "short" unconditional
   branches, all unconditional branches that attempt to cross between
   sections are converted to indirect jumps.

   The code for fixing up fall_thru edges that cross between hot and
   cold basic blocks does so by creating new basic blocks containing
   unconditional branches to the appropriate label in the "other"
   section.  The new basic block is then put in the same (hot or cold)
   section as the original conditional branch, and the fall_thru edge
   is modified to fall into the new basic block instead.  By adding
   this level of indirection we end up with only unconditional branches
   crossing between hot and cold sections.

   Conditional branches are dealt with by adding a level of indirection.
   A new basic block is added in the same (hot/cold) section as the
   conditional branch, and the conditional branch is retargeted to the
   new basic block.  The new basic block contains an unconditional branch
   to the original target of the conditional branch (in the other section).

   Unconditional branches are dealt with by converting them into
   indirect jumps.  */
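
/* As a concrete sketch of the fixes described above: a conditional
   branch from the hot section to a label COLD_LABEL in the cold section

	if (x) goto COLD_LABEL;

   is retargeted to a new block STUB in the hot section

	if (x) goto STUB;   ...   STUB: goto COLD_LABEL;

   so that only the unconditional jump in STUB crosses sections; a
   crossing fall-through edge is likewise redirected into a new block
   that ends in an unconditional jump to the old fall-through target;
   and on targets whose unconditional branches cannot reach all of
   memory, the crossing jump in STUB is further converted into an
   indirect jump through a register.  (STUB and COLD_LABEL are
   illustrative names only.)  */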

namespace {

const pass_data pass_data_partition_blocks =
{
  RTL_PASS, /* type */
  "bbpart", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  PROP_cfglayout, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_partition_blocks : public rtl_opt_pass
{
public:
  pass_partition_blocks (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_partition_blocks, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *);
  virtual unsigned int execute (function *);

}; // class pass_partition_blocks

bool
pass_partition_blocks::gate (function *fun)
{
  /* The optimization to partition hot/cold basic blocks into separate
     sections of the .o file does not work well with linkonce or with
     user-defined section attributes.  Don't call it if either case
     arises.  */
  return (flag_reorder_blocks_and_partition
          && optimize
          /* See pass_reorder_blocks::gate.  We should not partition if
             we are going to omit the reordering.  */
          && optimize_function_for_speed_p (fun)
          && !DECL_COMDAT_GROUP (current_function_decl)
          && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
          /* Work around a bug in GDB where read_partial_die doesn't cope
             with DIEs with DW_AT_ranges, see PR81115.  */
          && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
}

unsigned
pass_partition_blocks::execute (function *fun)
{
  vec<edge> crossing_edges;

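  /* With at most one basic block besides the entry and exit blocks,
     there is nothing to split between sections.  */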
  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
    return 0;

  df_set_flags (DF_DEFER_INSN_RESCAN);

  crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
  if (!crossing_edges.exists ())
    /* Make sure to process deferred rescans and clear changeable df flags.  */
    return TODO_df_finish;

  crtl->has_bb_partition = true;

  /* Make sure the source of any crossing edge ends in a jump and the
     destination of any crossing edge has a label.  */
  add_labels_and_missing_jumps (crossing_edges);

  /* Convert all crossing fall_thru edges to non-crossing fall-thrus
     to unconditional jumps (that jump to the original fall-through
     dest).  */
  fix_up_fall_thru_edges ();

  /* If the architecture does not have conditional branches that can
     span all of memory, convert crossing conditional branches into
     crossing unconditional branches.  */
  if (!HAS_LONG_COND_BRANCH)
    fix_crossing_conditional_branches ();

  /* If the architecture does not have unconditional branches that
     can span all of memory, convert crossing unconditional branches
     into indirect jumps.  Since adding an indirect jump also adds
     a new register usage, update the register usage information as
     well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
    fix_crossing_unconditional_branches ();

  update_crossing_jump_flags ();

  /* Clear bb->aux fields that the above routines were using.  */
  clear_aux_for_blocks ();

  crossing_edges.release ();

  /* ??? FIXME: DF generates the bb info for a block immediately.
     And by immediately, I mean *during* creation of the block.

	#0  df_bb_refs_collect
	#1  in df_bb_refs_record
	#2  in create_basic_block_structure

     Which means that the bb_has_eh_pred test in df_bb_refs_collect
     will *always* fail, because no edges can have been added to the
     block yet.  Which of course means we don't add the right
     artificial refs, which means we fail df_verify (much) later.

     The cleanest solution would seem to be to make DF_DEFER_INSN_RESCAN
     imply that we also shouldn't grab data from the new blocks those
     new insns are in either.  In this way one can create the block,
     link it up properly, and have everything Just Work later, when
     deferred insns are processed.

     In the meantime, we have no other option but to throw away all
     of the DF data and recompute it all.  */
  if (fun->eh->lp_array)
    {
      df_finish_pass (true);
      df_scan_alloc (NULL);
      df_scan_blocks ();
      /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO
         data.  We blindly generated all of them when creating the new
         landing pad.  Delete those assignments we don't use.  */
      df_set_flags (DF_LR_RUN_DCE);
      df_analyze ();
    }

  /* Make sure to process deferred rescans and clear changeable df flags.  */
  return TODO_df_finish;
}

} // anon namespace

rtl_opt_pass *
make_pass_partition_blocks (gcc::context *ctxt)
{
  return new pass_partition_blocks (ctxt);
}